Mulan Projects

Technology

Mulan

MuLan is a joint audio-text embedding model that links music recordings to natural language descriptions using a dual-encoder (two-tower) architecture.

Developed by Google Research, MuLan maps music audio and free-form text into a shared 128-dimensional embedding space. The model uses two separate towers, a ResNet-50 for audio and a BERT-base for text, trained on 44 million music recordings (about 370,000 hours of audio) paired with weakly associated text. Through contrastive learning, MuLan supports zero-shot music tagging and cross-modal retrieval without manually curated labels. This lets systems match complex genre or mood descriptions (e.g., 'lo-fi hip hop for studying') directly against audio.
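
To make the zero-shot tagging idea concrete, here is a minimal sketch of how candidate text tags could be ranked against one audio clip once both live in the same embedding space. It is not MuLan's code or API: the function names (l2_normalize, cosine_similarity, zero_shot_tag) are hypothetical, and the 4-dimensional vectors are made-up stand-ins for the 128-dimensional outputs of the audio and text towers.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def zero_shot_tag(audio_embedding, tag_embeddings):
    """Rank candidate text tags by similarity to one audio embedding.

    tag_embeddings maps each tag string to its (hypothetical) text-tower
    embedding. Returns (tag, score) pairs, best match first.
    """
    scores = {
        tag: cosine_similarity(audio_embedding, emb)
        for tag, emb in tag_embeddings.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy stand-in embeddings (assumption: real ones would come from the two towers).
audio = [0.9, 0.1, 0.0, 0.2]
tags = {
    "lo-fi hip hop for studying": [1.0, 0.0, 0.0, 0.1],
    "heavy metal": [0.0, 1.0, 0.2, 0.0],
    "solo piano": [0.1, 0.0, 1.0, 0.0],
}

ranking = zero_shot_tag(audio, tags)
for tag, score in ranking:
    print(f"{score:.3f}  {tag}")
```

Because no tag-specific classifier is trained, new tags can be added simply by embedding their text. Cross-modal retrieval would work the same way in reverse, ranking many audio embeddings against one text query.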

https://google-research.github.io/seanet/mulan/
