ACAV100M is an automatically curated dataset of 10-second clips with high audio-visual correspondence.

For easier exploration, we arrange the clips by category. Categories are assigned automatically using pretrained audio (AudioSet), video (Kinetics400), and image (ImageNet) classifiers. We randomly selected 500,000 clips for this visualization.
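
The sampling and grouping described above can be illustrated with a minimal Python sketch. It is not the pipeline used to build this page: the record fields (`clip_id`, `audio_label`, `video_label`, `image_label`), the example label values, and the `sample_and_group` helper are all hypothetical names chosen for illustration, and the classifier predictions are assumed to be already attached to each clip.

```python
import random
from collections import defaultdict


def sample_and_group(clips, sample_size, label_key, seed=0):
    """Randomly sample clips, then bucket the sample by one classifier's label.

    `clips` is a list of dicts; `label_key` selects which (hypothetical)
    prediction field to group by, e.g. "audio_label".
    """
    rng = random.Random(seed)
    # Sample without replacement; cap at the number of available clips.
    chosen = rng.sample(clips, min(sample_size, len(clips)))
    groups = defaultdict(list)
    for clip in chosen:
        groups[clip[label_key]].append(clip["clip_id"])
    return dict(groups)


# Hypothetical records; real labels would come from the pretrained classifiers.
clips = [
    {"clip_id": "c1", "audio_label": "Speech", "video_label": "singing", "image_label": "microphone"},
    {"clip_id": "c2", "audio_label": "Music", "video_label": "playing guitar", "image_label": "acoustic guitar"},
    {"clip_id": "c3", "audio_label": "Speech", "video_label": "news anchoring", "image_label": "television"},
    {"clip_id": "c4", "audio_label": "Music", "video_label": "playing piano", "image_label": "grand piano"},
]

by_audio = sample_and_group(clips, sample_size=3, label_key="audio_label")
for label, ids in sorted(by_audio.items()):
    print(label, ids)
```

Changing `label_key` to `"video_label"` or `"image_label"` regroups the same sample under a different classifier's categories, which is the kind of switch a category filter would need.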
