AudioShake Voice Model Achieves Highest-Quality, State-of-the-Art Benchmark

AudioShake
May 8, 2025

Today, AudioShake announces its highest-quality vocal model to date, setting a new, state-of-the-art benchmark for vocal isolation.

While our efforts are always focused first and foremost on perceptual quality–that is, how our stems actually sound to real human ears–we also regularly test our models on quantitative benchmarks widely used across the industry. With an SDR of 13.5 dB on MUSDB-HQ, today’s vocal model release surpasses the state-of-the-art benchmark previously set by ByteDance in 2024, and builds on our success in industry challenges like the Sony Demixing Challenge.

AudioShake’s new vocal model delivers cleaner, more natural separations by capturing subtle details like long-tail reverb, preserving the original vocal’s depth and timbre, and picking up vocal harmonies more reliably. It surpasses previous models in both perceptual listening tests and quantitative benchmarks.

Already made available to some of our beta testers, AudioShake’s new vocal model has been hailed for its precision and quality. 

“The new vocal model from AudioShake is a big step up. The separation is cleaner, and there’s a noticeable boost in clarity without losing the feel of the original mix. Reverb tails are better preserved, and the stereo image stays intact, which helps the vocal sit more naturally in the track. The result is a spacious, defined sound—easily the best I’ve heard.” – Daniel Rowland, Audio Engineer and Co-Founder of Immersive Mixers

Hear for yourself how AudioShake’s vocal model has improved over the years. 

How Quality is Measured in AI Stem Separation

We rely on perceptual quality indicators–or listening tests–to help assess separation quality and how well our models isolate the natural depth, timbre, and spatial characteristics of the original vocal performance.

To assess perceptual quality, participants compared results from both the new high-quality model and our previous vocals model in internal listening tests. These tests showed a clear preference for the high-quality model in over 90% of songs, particularly excelling on challenging tracks from often underrepresented genres and live music recordings.

For a quantitative assessment, we used the Signal-to-Distortion Ratio (SDR), a standard metric in audio source separation that measures the ratio of the desired signal to the distortion introduced during separation. Our new model attained a groundbreaking score of 13.5 dB on MUSDB-HQ, compared to our previous model's 12.5 dB, reflecting a significant enhancement in separation quality. The HQ model also improved the stereo consistency of the output by more than 30%, using metrics proposed by Apple.
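For readers curious what SDR actually computes, here is a minimal sketch of the plain signal-to-distortion ratio: the energy of the reference signal over the energy of the separation error, in decibels. (Published benchmarks typically use the BSSEval variant of SDR, such as the `museval` toolkit used for MUSDB evaluations, which adds framing and filtering on top of this basic definition; the signals below are toy examples, not our test data.)

```python
import numpy as np

def sdr(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Plain SDR in dB: energy of the reference over the energy
    of the error (reference minus estimate)."""
    error = reference - estimate
    # Small epsilon avoids division by zero for a perfect estimate.
    return 10 * np.log10(np.sum(reference**2) / (np.sum(error**2) + 1e-12))

# Toy example: a 1-second 440 Hz "vocal" at 44.1 kHz, and an
# estimate that is the same signal plus a little residual noise.
rng = np.random.default_rng(0)
ref = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
est = ref + 0.01 * rng.standard_normal(ref.shape)

print(f"SDR: {sdr(ref, est):.1f} dB")
```

A higher SDR means less residual distortion relative to the target: shrinking the noise term above raises the score, and an identical estimate drives it toward infinity, which is why even a 1 dB gain on a shared benchmark represents a meaningful reduction in separation error.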

We’re proud that our new model achieves state-of-the-art performance, outperforming other commercial services as well as published research models that aren’t practical for production use due to their size. These evaluations underscore the advancements of our new model in both perceptual and quantitative terms, both of which we explored in this previous blog post.

Building in Partnership with the Music & Entertainment Industry

We’re fortunate to count some of the world’s largest music labels, film studios, and artists among our customers. From all three major label groups and iconic labels like Disney Music Group, to artists like Green Day, Nina Simone, One Republic, Sia, De La Soul, and Oscar Peterson, we are deeply grateful for the music industry’s feedback and trust. Our partners continue to challenge us–and our technology–to reach higher standards, pursue new projects, and explore fresh creative paths. Here are some vocal highlights from across the media and entertainment industry:  

AudioShake’s new vocal isolation model begins rolling out today, and will reach all our users across the AudioShake Live, AudioShake Indie, and our API platform in the coming weeks.