AI STEM SEPARATION

Instantly separate audio into stems

Isolate vocals, instruments, overlapping speakers, and effects with AudioShake’s AI stem separation — built for music, film, and media production.

Explore AudioShake Separation Models

Tailored AI for every kind of sound – from music to film.

Instrument Stem Separation

Separate vocals, drums, bass, guitar, and more from a single mixed track

Multi-Speaker Separation

Detect and isolate overlapping speakers into separate audio channels

Dialogue, Music, and Effects Separation

Create clean speech tracks while retaining background music and effects

Industry-leading technology

Why Choose AudioShake

AudioShake is a non-generative AI company focused solely on separating sound, allowing us to create the best possible technologies for any media workflow.

State-of-the-art AI vocal isolation
Best-in-class sound separation
Professional audio formats
Trusted by major labels and studios
Real-time sound separation
AUDIOSHAKE PLATFORMS

How to Get Started

Built for teams of all sizes, from indie artists to media production teams.

AudioShake Indie

AudioShake Indie gives independent artists, producers, and labels access to the same award-winning AI instrument stem separation used by all the major labels.

AudioShake Live

AudioShake Live is an intuitive, drag-and-drop AI stem separation platform designed for enterprises to create high-quality stems on demand.

AudioShake API & SDK

Bring the power of award-winning sound separation directly to apps and edge devices with AudioShake's API or SDK.

Trusted by top media companies worldwide

Frequently Asked Questions

What is AI stem separation?

AI stem separation uses AI to split a mixed recording into its component parts, or stems, such as vocals, drums, and bass. For music, AudioShake Live is our on-demand platform for industry professionals, where you can quickly upload songs and create stems from them. Get in touch for a demo and free trial. For independent artists and labels, we offer AudioShake Indie.

We've also integrated with a number of platforms in the sync and localization industries. Music supervisors can use our services on Chordal. Dubbing freelancers and studios can find our technology already embedded in workflow tools such as OOONA and Yella Umbrella, as well as in services such as Dubverse and cielo24.

If you are a developer, check out our documentation center for ways to integrate our API.

How does AudioShake's technology work?

AudioShake uses AI to recognize the different components in a piece of audio, for example, the drums in a rock song. We then isolate that component as a stem so you can use it for new purposes: sampling, sync licensing, remastering, remixes, and more.

I've seen tech like this in the past and it always falls short. What makes AudioShake different?

There have long been efforts to isolate tracks within music, but recent advances in artificial intelligence have enabled a big leap in quality. We believe AudioShake's technology is the best in the industry and significantly outperforms all other offerings. In fact, we are the winners of Sony's Demixing Challenge, which pitted our stem separation against 40 other teams, including Big Tech companies, start-ups, and research institutes.

What kinds of sound separation do you offer?

AudioShake offers a range of models, centering on music and "dirty audio" tasks:

Music: separate different instruments, or create an instrumental. You can also separate multiple singers from a track.
Dialogue, Music, & Effects: separate speech or dialogue from background audio, or separate effects from music.
Multi-speaker Separation: separate overlapping speakers in a podcast, video, or speech file.
Lyric Transcription & Alignment: transcribe lyrics from a song, then align them with word-by-word timestamps to create a karaoke-type experience.

Is this available via API or on-device?

Yes! All our separation models are available via API, and many are also available on-device. See our documentation site for details.

Do you do speech transcription and alignment?

No. For transcription, we focus only on lyrics, which is a different research task from speech. However, we work with many speech transcription and captioning services to clean dialogue before it goes through automated speech recognition (ASR).

What file formats and resolutions do you offer?

We match the resolution of your input files, supporting up to 192 kHz, and can export audio in the following file formats: WAV, MP3, AAC, FLAC, AIFF, and PCM. For transcriptions, we can export text as JSON or TXT files.

More details on the formats we offer and support can be found on our developer page. 
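
If you want to confirm that a WAV file falls within the 192 kHz limit before uploading it, a minimal sketch along these lines may help. It uses only Python's standard-library wave module and does not call any AudioShake API. The function names and the MAX_SAMPLE_RATE_HZ constant are hypothetical, chosen for illustration, and the check reads uncompressed PCM WAV files only.

```python
import wave

# Upper limit quoted in the FAQ answer above (an assumption that it applies to input files).
MAX_SAMPLE_RATE_HZ = 192_000


def wav_sample_rate(path: str) -> int:
    """Return the sample rate, in Hz, stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def within_supported_rate(path: str, limit_hz: int = MAX_SAMPLE_RATE_HZ) -> bool:
    """Return True if the file's sample rate does not exceed limit_hz."""
    return wav_sample_rate(path) <= limit_hz
```

Calling within_supported_rate("mix.wav") on a 48 kHz file would return True. Other formats, such as FLAC or AIFF, would need a different reader.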

Get in touch.