September 7, 2023

Doubling Down on Dubbing


Creators want to reach the widest audience for their work, which is why content localization (the process of translating, captioning, and/or dubbing content into a different language) is such a fast-growing industry. While Hollywood blockbusters have large localization budgets, most content, from reality TV shows and news programs through to YouTube videos, rarely has the budget or the clean, multi-tracked audio needed to use professional dubbing services.

Automated speech recognition (ASR), artificial intelligence, and machine learning technologies have helped open up localization services to a broader range of content. However, transcription accuracy can fall short when the video contains background noise or music.

That’s where AudioShake comes in. Our new dialogue, music, and effects separation technology cleanly separates dialogue and background audio from a video or audio file, so that ASR, captioning, and dubbing services can work from cleaner input audio. The result is much more accurate transcription.

“Since we started using AudioShake, our accuracy rates have improved by 25% or more in some cases. This has allowed us to transcribe audio and video content faster and more efficiently, saving valuable time and resources,” says Shanna Johnson, CEO of AudioShake customer cielo24, which works with companies like Dell, WWE, and McClatchy.

To build this model, we drew on our existing expertise in sound and instrument separation, training our AI to distinguish between vocals, instruments, and other sounds. We’ve also integrated our technology with a number of partners to deploy dialogue, music, and effects separation across a wide range of use cases and services.

Clean dialogue for captioning: OOONA provides professional management and production tools for media localization. With AudioShake’s dialogue-separation technology, users will soon be able to automatically and cleanly separate the dialogue within their files before running it through OOONA’s captioning technologies, in turn creating more accurate automatic captions.

Retain music and effects in localized videos: Yella Umbrella is adopting AudioShake’s AI to supercharge its virtual studio system for titlers, voice artists, engineers, and language experts. When users turn to Yella Umbrella’s platform, they’ll be able to use AudioShake to isolate the music and effects track from a video. Providing a dialogue reference during recording and retaining the music and effects tracks are both key to delivering a high-quality output when reviewing and mixing.

Human-AI hybrid dubbing: cielo24 takes a hybrid human-AI approach to dubbing, using AudioShake’s dialogue separation to deliver accurate transcription, translation, and localization in multiple languages at scale. Humans are involved along the transcription-translation workflow to provide quality control and ensure that creators get human-accurate results.

Fully automated dubbing: Dubverse has combined its natural language processing, neural machine translation, and deep learning algorithms with AudioShake’s technology to create a fully automated dubbing and subtitling service, at price points friendly to content creators who are just getting started.

Archival material: On the archival side, film and TV studios have used AudioShake’s technology to separate and dub older work whose audio was never split into multiple tracks. For example, AudioShake recently partnered with the German studio Pandastorm Pictures on the German localization of the BBC’s iconic TV show Doctor Who.