ACAV100M is an automatically curated dataset of 10-second clips with high audio-visual correspondence.

Here, we display video clips clustered with 100 centroids for the audio and visual modalities. We randomly selected 2 million clips for this visualization.
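As a rough illustration of how such a view could be prepared, the sketch below samples a random subset of clips and assigns each one to one of 100 centroids, separately for audio and visual features. It assumes plain k-means, toy random feature vectors, and hypothetical names (`sample_clips`, `kmeans`, `NUM_CENTROIDS`); the page itself does not state which clustering algorithm or features were used, and the toy sizes stand in for the 2 million clips mentioned above.

```python
import random

NUM_CENTROIDS = 100  # number of centroids per modality, as stated above


def sample_clips(clip_ids, n, seed=0):
    """Uniformly sample n clip ids without replacement."""
    return random.Random(seed).sample(clip_ids, n)


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))


def kmeans(points, k, iters=5, seed=0):
    """Plain Lloyd's k-means (an assumption, not the documented method)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Toy stand-in data: 1,000 clips with 4-dimensional features per modality.
rng = random.Random(42)
all_clip_ids = list(range(1000))
audio_feats = {c: [rng.random() for _ in range(4)] for c in all_clip_ids}
visual_feats = {c: [rng.random() for _ in range(4)] for c in all_clip_ids}

# Random subset used for the visualization (400 here, 2 million on the page).
sampled = sample_clips(all_clip_ids, 400)

audio_centroids, audio_labels = kmeans(
    [audio_feats[c] for c in sampled], NUM_CENTROIDS)
visual_centroids, visual_labels = kmeans(
    [visual_feats[c] for c in sampled], NUM_CENTROIDS)

# Each sampled clip now has one audio cluster id and one visual cluster id.
clip_clusters = {c: (a, v)
                 for c, a, v in zip(sampled, audio_labels, visual_labels)}
```

Clustering the two modalities independently gives every clip a pair of cluster ids, which is one way a viewer could let users browse clips by audio cluster, by visual cluster, or by both.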
