New lyric transcription & alignment

AudioShake Research

October 5, 2023

Models were updated in 2024; read more here.

Automated speech transcription and captioning technology has come a long way over the past few years, powered by big use cases like call centers, TV and film captioning, sports broadcasts, and more.

Lyrics are another great use case for transcription, as they serve a wide variety of purposes in the music industry: powering fan engagement and track sing-along on streaming services, creating promotional assets, identifying explicit words, and helping music supervisors and sync teams find the perfect song for an advertising brief.

The problem is, speech transcription doesn’t work so well for lyrics. For one, think of how often a singer stretches out a word: that long “o” or “a” in a chorus or at the end of a sentence. Or how a word might take on a different pronunciation, cadence, or stress to fit the rhythm and style of the song. And then there’s the issue of understanding the sung word: music is complex audio, and it can be difficult for a computer program to distinguish what is being sung amid so many instruments.

In short, automated lyric transcription is a different art from speech transcription, and one that requires a different approach.

That’s where AudioShake’s LyricSync service comes in. Our newest lyric transcription service creates clean, time-stamped lyric transcripts. Leveraging AudioShake’s best-in-class a cappella separation, LyricSync isolates the vocal and then transcribes it in both the original and new languages. The service can also timestamp and sync lyrics at the word level, achieving more than 95% accuracy (averaged over multiple languages). In other words, lyrics are time-stamped as in a karaoke bar, so you can sing along with each word in time. In seconds, the tool can transcribe lyrics in 40+ languages with high accuracy, reducing both the time and cost of transcription and alignment.
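To make word-level sync concrete, here is a minimal Python sketch of how a player might use word timestamps to highlight the word being sung. The `TimedWord` structure, the `active_word_index` helper, and the sample timings are all hypothetical illustrations; they are not LyricSync's actual output format or API.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedWord:
    """One transcribed word with start/end times in seconds (hypothetical shape)."""
    text: str
    start: float
    end: float


def active_word_index(words, t):
    """Return the index of the word being sung at time t, or None if no word is active.

    Assumes `words` is sorted by start time and that words do not overlap.
    """
    starts = [w.start for w in words]
    i = bisect_right(starts, t) - 1  # last word that starts at or before t
    if i >= 0 and t < words[i].end:
        return i
    return None


# Invented sample data: four words with a held final note.
words = [
    TimedWord("Sing", 0.50, 1.10),
    TimedWord("along", 1.20, 1.80),
    TimedWord("with", 2.00, 2.50),
    TimedWord("me", 2.60, 3.90),
]

if __name__ == "__main__":
    for t in (0.7, 1.15, 3.0):
        i = active_word_index(words, t)
        print(t, words[i].text if i is not None else "(no word)")
```

A karaoke-style display would call a lookup like this on every playback tick and highlight the returned word; the gaps between words (such as 1.15 s above) simply highlight nothing.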

With LyricSync, users can create transcriptions of any audio source for uses such as:

Lyric Videos: Produce word-by-word lyric videos in seconds to create easy promotional assets for any recording.
Emerging DSP Karaoke Formats: With time-stamped lyrics, users can take advantage of the rising popularity of karaoke formats within DSPs like Spotify.
Improved Internal Databases: Transcribe all the content in a catalog or database in bulk to make it easier to search internally for keywords or themes (e.g. “songs about the rain”).
Transcribe Older Catalogs: AudioShake can isolate the vocals from any audio file, including analog and mono tracks. Music publishers or labels can transcribe older catalogs to improve their metadata or open them up for monetization opportunities with low-cost lyric videos.
Localization: Labels can reach new global audiences by creating everything from karaoke to lyric videos in more than 40 languages.
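
As an illustration of the internal-database use case, the sketch below builds a simple keyword index over a catalog of transcripts. The track IDs, lyrics, and the `build_index` and `search` helpers are invented for this example, and plain keyword matching is only an assumption about how such a search might start; true theme search (“songs about the rain”) would likely need more than exact word matches.

```python
import re
from collections import defaultdict


def build_index(transcripts):
    """Map each lowercase word to the set of track IDs whose lyrics contain it."""
    index = defaultdict(set)
    for track_id, lyrics in transcripts.items():
        for token in re.findall(r"[a-z']+", lyrics.lower()):
            index[token].add(track_id)
    return index


def search(index, *keywords):
    """Return the sorted track IDs whose lyrics contain every keyword."""
    sets = [index.get(k.lower(), set()) for k in keywords]
    return sorted(set.intersection(*sets)) if sets else []


# Invented catalog: track IDs and lyrics are placeholders, not real songs.
catalog = {
    "track-001": "Rain keeps falling on the roof tonight",
    "track-002": "Sunshine in the morning",
    "track-003": "After the rain the sunshine comes",
}
index = build_index(catalog)

if __name__ == "__main__":
    print(search(index, "rain"))
    print(search(index, "rain", "sunshine"))
```

Because the index is built once from bulk transcripts, each later query is a set lookup rather than a scan of every lyric sheet.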

AudioShake’s LyricSync customers include Spirit Music Group, one of the world’s leading independent music publishing companies; karaoke company Singa; hip-hop label LVRN; sync licensing platform Chordal; and recording artist and TikTok star Lala Sadii, who is distributed by Downtown Artist & Label Services.
