New AudioShake models double transcription accuracy and run over 5X faster

AudioShake Research

April 16, 2024

Last year, AudioShake introduced its lyric transcription service, which leveraged our award-winning stem separation to create clean, time-stamped lyric transcripts. The technology was sought out by artists like Sia and Lala Sadii, and it helped open up the catalogs of publishers like Spirit Music Group.

Six months later, we are releasing an entirely new transcription service that is five times faster and twice as accurate as the original.

The latest advancements significantly improve the accuracy of our lyric transcriptions across a range of languages. The new models raise accuracy for our ten best-performing languages to nearly 90% (and above 90% for some European languages). Across the thirty most represented languages in the world, accuracy has increased from 41% with the first version of our model to 78%.

Our team also made big strides in non-European languages. Most notably, accuracy for languages such as Chinese, Indonesian, Japanese, and Vietnamese rose to over 80%. This jump in accuracy across global languages will allow AudioShake to better serve artists, labels, and partners worldwide as they localize their music and extend its reach.

For a company focused on sound separation, lyric transcription was an interesting and challenging task for two reasons. First, it requires good a cappella separation, because background music can make it hard to understand what is being sung. Second, lyric transcription is far less studied than automatic speech recognition (ASR), and ASR tools do not work well on lyrics: they often miss distinctive spelling and formatting, and they fail to reflect the unique qualities of sung words, such as rhythm, rhyme, and emotional emphasis.

As musicians and music lovers interested in contributing to the research community, we have compiled an entirely new benchmark for more reliable assessment of lyric transcription. It includes revised data, unified industry guidelines, and new accuracy metrics, allowing us and others working on lyric transcription to measure more accurately how well lyrics are transcribed.

AudioShake’s new lyric transcription model is now available through the AudioShake API. For more information on how to use these services, please visit our Lyric Transcription page.