July 11, 2023

Building Spatial Formats with AI Stems


Those in music are closely watching the rise of spatial audio. Also known as reality audio, 3D immersive audio, or 360-degree sound, spatial audio is a type of digital surround sound in which the origin points of different sounds envelop listeners from all directions. Whereas stereo or surround sound has a limited number of channels (in the single digits), spatial audio has dozens. This allows audio to flow around, above, and below listeners in a three-dimensional, 360º space — immersing them in any scene, song, or landscape. The richness of spatial sound brings music to life in a way that has been described as “awakening”, “deeply emotive”, and “the next revolution for audio.” Already, we’ve seen spatial audio begin to proliferate in consumer products built by Apple, JBL, Samsung, Sony, and Xbox.

To make these soundscapes possible, engineers need access to the individual tracks of a song — known as stems. With stems, sound engineers can map the individual audio parts to specific points in a digital 3D space. An easy way to picture this is to imagine a sphere built around a listener, with each sound source placed around them.

This approach allows those individual sounds to seem to come from either side, behind, above, or below the listener. Engineers can also adjust the distance of sounds so that they seem closer or farther away. For example, in some spatial mixes, listeners could hear the drums in one ear and bass in the other, and then feel them switch back and forth. Or in a film, if a car is swiftly driving by, engineers can map the sound to move beside and past a listener, adding more depth to the experience of music or a movie.
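To make the sphere picture a little more concrete, here is a minimal Python sketch of the geometry only. It is not how any particular spatial renderer or AudioShake works: the `PlacedStem` class, the `pass_by` and `azimuth_of` helpers, the axis convention (x forward, y to the listener's left, z up), and the simple inverse-distance gain are all assumptions made for illustration. Real spatial formats carry richer position metadata and use far more sophisticated rendering.

```python
import math
from dataclasses import dataclass


@dataclass
class PlacedStem:
    """A stem placed around the listener (hypothetical structure for illustration)."""
    name: str
    azimuth_deg: float    # 0 = straight ahead, 90 = listener's left
    elevation_deg: float  # 0 = ear level, 90 = directly overhead
    distance_m: float     # metres from the listener

    def position(self):
        """Convert the spherical placement to (x, y, z): x forward, y left, z up."""
        az = math.radians(self.azimuth_deg)
        el = math.radians(self.elevation_deg)
        x = self.distance_m * math.cos(el) * math.cos(az)
        y = self.distance_m * math.cos(el) * math.sin(az)
        z = self.distance_m * math.sin(el)
        return (x, y, z)

    def gain(self, reference_m=1.0):
        """Assumed inverse-distance attenuation, capped at 1.0 inside reference_m."""
        return min(1.0, reference_m / max(self.distance_m, 1e-9))


def pass_by(start, end, steps):
    """Linearly interpolate (x, y, z) from start to end, inclusive; needs steps >= 2."""
    return [
        tuple(s + (e - s) * i / (steps - 1) for s, e in zip(start, end))
        for i in range(steps)
    ]


def azimuth_of(pos):
    """Horizontal angle of a position in degrees (negative = listener's right)."""
    x, y, _ = pos
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Drums hard left, bass hard right, both two metres away at ear level.
    for stem in (PlacedStem("drums", 90, 0, 2.0), PlacedStem("bass", -90, 0, 2.0)):
        print(stem.name, stem.position(), stem.gain())

    # A car passing on the listener's right, from 10 m ahead to 10 m behind.
    for pos in pass_by((10.0, -2.0, 0.0), (-10.0, -2.0, 0.0), steps=5):
        print(pos, azimuth_of(pos))
```

Swapping the azimuths of the drums and bass over time would give the "switching ears" effect described above, and the car's path shows how a moving sound is just a position that changes from one moment to the next.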

Amazon, Apple, Google, and TIDAL have all begun to introduce spatial audio in their hardware and distribution platforms. Contemporary recordings, where stems are often readily available, make it easy for artists to create and publish spatial mixes. However, with older tracks, for which master tapes or stems are unavailable or lost, labels and mixers may turn to AudioShake to create the stems from the full mix using AI.

Independent labels have helped steer the industry and listeners toward this immersive audio experience. CODISCOS, a Colombian label publishing music of independent artists from Latin America, has been active in spatial audio, mixing new and existing albums as spatial experiences. In a recent project, CODISCOS spatialized early songs from iconic Latin American artists including Israel Romero, Moncho Santana, and Omar Geles more than 30 years after their release. And labels like BMG have worked with AudioShake to open up an iconic catalog from Nina Simone and others to make spatial mixes.

Many artists recognize the value and quality that spatial provides and have sought to have their tracks turned into these compelling mixes. This is often the case for Kevin McCloskey, an audio engineer and the owner of Bigger Mixes, who creates major label-quality mixes for indie artists. When approached by a prominent artist looking for a new mix of an existing recording session, he was given only an MP3 mix after the temporary drive with all the session files was damaged. To work around the low-resolution audio and the lack of stems, McCloskey turned to AudioShake to create the stems and was then able to create a high-quality mix for the artist.

After more than six decades of stereo, spatial audio is positioned to become the industry standard for music, as well as cinema, streaming, gaming, and more. Bringing this format to tracks both old and new, analog and digital, will require the ability to access and manipulate high-quality stems, whether they come from the original project files or are separated from the full mix.