Music Source Separation
Music Source Separation (MSS), also known as stem separation, demixing, audio source separation or unmixing, is the technique of separating a single audio track into multiple audio tracks, each containing a targeted component of the mix, using methods from Music Information Retrieval (MIR). MSS is a branch of signal separation, a field established in the mid-1990s to reconstruct one or more source signals from mixtures of them. Music professionals generally use the process to separate existing recordings in order to improve the balance of the mix, remix them or remaster them. It is also essential when no multitrack or session files exist for a recording, leaving tools that can separate stems from a single audio file as the only option.
Early commercial audio source separation was non-destructive: the separated files could be recombined to sound exactly like the original, without introducing artifacts when all tracks were played back simultaneously.
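The non-destructive property can be illustrated by treating a mix as the sum of its stems. The following is a minimal sketch. It assumes that a separator returns estimates for some stems and that the remaining "other" stem is defined as the residual, which guarantees that all stems sum back to the original exactly. The helper name `complete_with_residual` is hypothetical and is not taken from any particular product.

```python
import numpy as np


def complete_with_residual(mixture, estimates):
    """Return stems that sum exactly to `mixture`.

    `estimates` maps stem names to arrays shaped like `mixture`.
    The hypothetical "other" stem absorbs whatever the estimates miss,
    so recombining all stems reproduces the input sample for sample.
    """
    stems = dict(estimates)
    stems["other"] = mixture - sum(estimates.values())
    return stems


# Toy example: a mono mixture of three synthetic sources.
rng = np.random.default_rng(0)
vocals, bass, pad = (rng.standard_normal(1000) for _ in range(3))
mixture = vocals + bass + pad

# Imperfect estimates, as a real separator would produce.
estimates = {"vocals": 0.9 * vocals + 0.05 * pad, "bass": bass}
stems = complete_with_residual(mixture, estimates)

reconstruction = sum(stems.values())
print(np.allclose(reconstruction, mixture))  # True
```

The residual stem still contains any errors in the other estimates, such as the leaked pad and the missing 10% of the vocal here. Exact reconstruction therefore does not imply clean separation.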
The technology also has a wide variety of applications outside music, including teaching, forensics, speech separation, live sound cancellation, audio restoration and VR/AR.
How AI Stem Separation Generally Works
This process, which in effect reverse-engineers stems from mastered tracks, relies on training models to identify target sources in mixtures. Millions of real isolated stems from project files are used to update the models' parameters so that they produce estimates of each target from a mixture. Large multitrack datasets are built from these isolated stems, and additional mixtures are created by adjusting and recombining them, which enlarges the dataset and trains the models to a higher degree of accuracy. Providers initially offered online stem separation because it gave access to powerful computing systems; optimizations in processing have since made many options for local processing available. CPU developments that include support for neural workloads also provide the faster processing needed to achieve the highest-fidelity separation in less time.
AI Stem Separation in Sync Music
A growing number of companies let both music publishers and their clients use stem separation for their project needs, which is especially useful for removing vocals from mixtures. These tools let editors and agents in the film and TV music industry adjust and shape songs quickly, without contacting providers and waiting for alternate versions. This makes a placement more likely, because a common problem in sync is that certain sounds in a track interfere too much with its use as underscore. Sync professionals can also take a track in unexpected directions or otherwise rework the mix to suit its placement.
AI Stem Separation for the DJ
Fast stem separation suits professional DJs looking to create unique mashups. Typically, a song is rendered into stems by placing it in the appropriate folder. When the song is then selected, the basic four stem groupings are available, and in some cases individual parts can be triggered from pads during live performance.
Notable Case Studies of AI Stem Separation
Disney Music Group has used stem separation technologies to enhance its back catalog of recordings. Beatles recordings were split and enhanced with stem separation, and the engineers involved in that work also helped advance the technology. Numerous other classic hit songs have been restored with the help of stem separation.
Stems vs AI Stem Separations
In the recording industry, "stems" has traditionally meant files bounced during the mixing process, each generally a group of similar sounds. Exported from the original project files, such stems can be produced in large numbers for many purposes. They generally offer better overall quality and allow project material to be isolated further without introducing artifacts.
AI stem separations have generally produced material best suited to volume adjustments, further effects processing or production. They have typically come in the basic four groupings: vocals, bass, drums and other. Newer approaches and more extensive model training have made it possible to isolate material beyond these four groupings. However, such separations generally exhibit spectral anomalies, bleed in other sounds, or alter some quality of the targeted sound.
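Because AI-separated stems are used mainly for volume adjustments, a simple remix amounts to scaling each stem and summing the results. The following is a minimal sketch. It assumes the four stems are equal-length NumPy arrays. The `remix` helper and the gain values are illustrative only.

```python
import numpy as np


def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def remix(stems, gains_db):
    """Scale each stem by its gain in dB (default 0 dB) and sum the result."""
    out = np.zeros_like(next(iter(stems.values())), dtype=float)
    for name, audio in stems.items():
        out += db_to_gain(gains_db.get(name, 0.0)) * audio
    return out


# Toy stems: 8000 samples of random noise per stem.
rng = np.random.default_rng(1)
stems = {name: rng.standard_normal(8000) for name in ("vocals", "bass", "drums", "other")}

# Pull the vocals down 6 dB and mute the drums (-inf dB gives a gain of 0).
mix = remix(stems, {"vocals": -6.0, "drums": -np.inf})
```

Working in decibels matches how levels are set on a mixer. Any artifacts in a stem are scaled along with it, which is one reason these stems tolerate gain changes better than large edits.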
Sound Design with AI Stem Separation Tools
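One of the spectral operations used in this kind of sound design is splitting transient content from sustained content. A classic non-AI way to do this is harmonic/percussive separation by median filtering. Sustained tones form horizontal lines in a spectrogram and transients form vertical lines, so a median along time enhances the former and a median along frequency enhances the latter.

The following is a minimal sketch under stated assumptions:
- A magnitude spectrogram (frequency by time) has already been computed.
- The kernel size of 17 and the mask exponent are arbitrary choices.
- This is not a description of how SpectraLayers or RipX work internally.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter_1d(spec, kernel, axis):
    """Median-filter a 2-D array along one axis, padding edges by repetition."""
    half = kernel // 2
    pad = [(0, 0), (0, 0)]
    pad[axis] = (half, half)
    padded = np.pad(spec, pad, mode="edge")
    windows = sliding_window_view(padded, kernel, axis=axis)
    return np.median(windows, axis=-1)


def hpss_masks(mag, kernel=17, power=2.0):
    """Soft masks splitting a magnitude spectrogram (freq x time) into
    harmonic (sustained) and percussive (transient) parts."""
    harmonic = median_filter_1d(mag, kernel, axis=1)    # smooth along time
    percussive = median_filter_1d(mag, kernel, axis=0)  # smooth along frequency
    h, p = harmonic ** power, percussive ** power
    total = h + p + 1e-12                               # avoid division by zero
    return h / total, p / total


mag = np.zeros((64, 64))
mag[20, :] = 1.0   # a sustained tone: one frequency bin in every frame
mag[:, 40] = 1.0   # a transient: every frequency bin in one frame
mask_h, mask_p = hpss_masks(mag)
sustained, transient = mask_h * mag, mask_p * mag
```

In practice, the masks would be applied to the complex STFT and inverted back to audio. Each part can then be processed as its own track.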
Using AI and other methods to target specific kinds of sounds has also enabled a new form of sound design based on spectral separation, supported by editing tools such as those in SpectraLayers and RipX. Components such as transient information and time-based information can be unmixed instantly and developed into full tracks of unconventional sound. Because these tools support stem manipulation, groove shadows and other production and dubbing techniques are easy to achieve by revealing new timbres and structures through spectral selections.
Noise Reduction and AI Stem Separation
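Restricting noise reduction to the one stem that carries the noise, as this section describes, can be sketched with basic spectral subtraction. The following is a minimal sketch under stated assumptions:
- A noise-only excerpt is available to build the noise profile.
- The signal is cut into non-overlapping rectangular frames so that the analysis inverts exactly. Practical tools use overlapping windowed STFTs and more refined gain rules.
- The names `noise_profile` and `spectral_subtract`, the frame length and the floor value are hypothetical.

```python
import numpy as np

FRAME = 400  # frame length in samples (illustrative)


def to_frames(x):
    """Split a signal into non-overlapping frames, dropping any remainder."""
    n = len(x) // FRAME * FRAME
    return x[:n].reshape(-1, FRAME)


def noise_profile(noise_only):
    """Average magnitude spectrum of a noise-only excerpt."""
    return np.abs(np.fft.rfft(to_frames(noise_only), axis=1)).mean(axis=0)


def spectral_subtract(stem, profile, floor=0.05):
    """Subtract the noise magnitude in each bin, keep the original phase,
    and never let a bin drop below `floor` times its original magnitude."""
    spec = np.fft.rfft(to_frames(stem), axis=1)
    mag = np.abs(spec)
    cleaned = np.maximum(mag - profile, floor * mag)
    phase = np.exp(1j * np.angle(spec))
    return np.fft.irfft(cleaned * phase, n=FRAME, axis=1).ravel()


sr = 8000
t = np.arange(sr) / sr  # one second, exactly 20 frames
rng = np.random.default_rng(2)
clean = np.sin(2 * np.pi * 200 * t)
hiss = 0.1 * rng.standard_normal(sr)
stems = {"vocals": clean + hiss, "bass": np.sin(2 * np.pi * 60 * t)}

# Build the profile from a separate noise-only excerpt, then clean only
# the stem that carries the noise; the bass stem is left untouched.
profile = noise_profile(0.1 * rng.standard_normal(sr))
stems["vocals"] = spectral_subtract(stems["vocals"], profile)
```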
Besides advanced noise reduction methods that learn noise profiles, an inverted approach is possible. Removing known source targets, using models for the basic four groupings and specialized models, can leave only the noise as a separate track, depending on the ensemble used. That noise track can then be discarded. Alternatively, noise may end up in only a single stem. In that case, noise reduction profiles can be applied to that stem alone, so the entire mix does not need to be processed.
Karaoke (Vocal Remover) and AI Stem Separation
One of the most popular uses of stem separation is creating an instrumental version of a song when none is known to exist or is available. Dozens of websites use the technology to attract users who want to make instrumental versions of their favorite songs.
RipX and the Melodyne Approach to Stem Separation
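Turning separated notes into MIDI, as in the approach this section describes, requires at minimum mapping each note's fundamental frequency to a MIDI note number. This uses the standard equal-temperament relation m = 69 + 12·log2(f / 440), where A4 = 440 Hz is note 69.

The following is a minimal sketch under stated assumptions:
- The frequency of a single isolated note is estimated from its strongest spectral peak.
- This is a deliberate simplification, not how RipX or Melodyne detect pitch.
- Real transcription must handle harmonics, vibrato and polyphony.

```python
import numpy as np


def peak_frequency(note, sr):
    """Frequency (Hz) of the strongest spectral peak, using a Hann window."""
    windowed = note * np.hanning(len(note))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(note), d=1.0 / sr)
    return freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin


def hz_to_midi(freq):
    """Nearest MIDI note number for a frequency (A4 = 440 Hz = note 69)."""
    return int(round(69 + 12 * np.log2(freq / 440.0)))


sr = 44100
t = np.arange(sr) / sr                       # one second of audio
note = np.sin(2 * np.pi * 261.63 * t)        # middle C (C4)
print(hz_to_midi(peak_frequency(note, sr)))  # 60
```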
The RipX DAW takes a distinctive approach to stem separation through its note-based, harmonic audio-visual representation, branded as the "Rip Audio Format". Its stem separation tool breaks a single file into several tracks in which the audio is represented as notes. These notes are highly adjustable, and the software includes specialized tools for working with the notes and with the spectral content of the capture. Effects can be applied to each note or part of a note. Because of this notation, a track can easily be swapped for entirely different sounds. The stem is therefore not only separated but also transcribed to MIDI, so it can be played as a MIDI sequence that drives other instruments. The notation resembles Melodyne's approach to extracting notes from audio, but these notes function as MIDI and audio simultaneously. RipX is thus a DAW built around stem separation and the Rip Audio Format, bringing the audio and MIDI worlds together with new tools to support this paradigm.
Stem Mastering Tool
Native Instruments created a specialized tool, the "Stem Creator Tool", for working with four-part stem tracks. It is well suited to DJing, since digital DJ systems and Native Instruments products such as Traktor and Maschine use the four-track stem structure. The tool enables quick mastering of the stems and saving them in the "stem" file format. It is free to use and provides essential mastering effects for stem-based audio.
Example Approaches and Methodologies Employed
Deep Learning
- Neural Networks
- Convolutional Neural Networks
- Recurrent Neural Networks and Transformers
- Source Separation Algorithms
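Deep learning separators are trained on mixtures paired with their isolated stems. The dataset-expansion step described under "How AI Stem Separation Generally Works" is commonly done by random remixing: stems from different songs are combined with random gains, so each training example is a new mixture whose targets are known exactly.

The following is a minimal sketch under stated assumptions:
- Stems are stored per song as equal-length NumPy arrays keyed by the four basic groupings.
- The gain range is an arbitrary illustrative choice.
- Real pipelines also crop random excerpts and apply further augmentations.

```python
import numpy as np

STEMS = ("vocals", "bass", "drums", "other")


def random_remix(songs, rng, gain_db_range=(-6.0, 3.0)):
    """Build one training pair from a list of per-song stem dictionaries.

    Each target stem is drawn from a randomly chosen song and scaled by a
    random gain; the input mixture is the sum of the chosen targets, so the
    model always sees a mixture whose true stems are known exactly.
    """
    targets = {}
    for name in STEMS:
        song = songs[rng.integers(len(songs))]
        gain_db = rng.uniform(*gain_db_range)
        targets[name] = song[name] * 10.0 ** (gain_db / 20.0)
    mixture = sum(targets.values())
    return mixture, targets


# Toy corpus: three "songs", each with four random stems of 1000 samples.
rng = np.random.default_rng(3)
songs = [{name: rng.standard_normal(1000) for name in STEMS} for _ in range(3)]

mixture, targets = random_remix(songs, rng)
```

Mixing stems from unrelated songs produces combinations that never occurred in the original recordings. This increases the number of distinct training examples drawn from the same set of isolated stems.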
Signal Processing Techniques with AI Integration
- Short-time Fourier transform (STFT)
- Independent Component Analysis (ICA)
- Non-negative Matrix Factorization (NMF)
- Computational Auditory Scene Analysis (CASA)
- Repetition-based methods
- Masking-based approaches
- End-to-end approaches
- Hybrid approaches
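Non-negative matrix factorization and masking, two of the techniques listed above, are often combined. A magnitude spectrogram V is factored as V ≈ WH, with non-negative spectral templates W and activations H. The components are then assigned to sources, and each source is recovered with a soft, Wiener-like mask.

The following is a minimal sketch under stated assumptions:
- It uses the Lee and Seung multiplicative updates for the Euclidean cost.
- It runs on a synthetic toy spectrogram.
- Component-to-source assignment is given by hand, which is an assumption. Real systems must learn or infer this grouping.

```python
import numpy as np


def nmf(V, rank, n_iter=500, seed=0, eps=1e-12):
    """Factor a non-negative matrix V (freq x time) as W @ H using
    multiplicative updates that minimize the Euclidean distance."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def soft_masks(W, H, groups, eps=1e-12):
    """One mask per source: that source's share of the model W @ H."""
    parts = [W[:, g] @ H[g, :] for g in groups]
    total = sum(parts) + eps
    return [p / total for p in parts]


# Toy spectrogram: two "instruments" with fixed spectra and different rhythms.
rng = np.random.default_rng(4)
spec_a, spec_b = rng.random(32), rng.random(32)
act_a = np.tile([1.0, 0.0], 20)             # plays on even frames
act_b = np.tile([0.0, 0.0, 1.0, 1.0], 10)   # plays in pairs of frames
V = np.outer(spec_a, act_a) + np.outer(spec_b, act_b)

W, H = nmf(V, rank=2)
# Here each source is simply given one component (an assumption).
masks = soft_masks(W, H, groups=[[0], [1]])
estimates = [m * V for m in masks]
```

Because the masks share out the mixture rather than replacing it, the estimates always sum back to V. This is the same property that masking-based deep learning separators rely on.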
Supporting Developments
- Ensemble-based approaches
- Leveraging large datasets
- Text-based source separation
- Conv-TasNet
- Wave-U-Net
- Mapping-based Methods
- SynthSOD
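Ensemble-based approaches combine the outputs of several separation models. The following is a minimal sketch. It assumes each model's output is a dictionary of equal-length waveforms for the same stems. It combines them per stem with either a weighted average or a sample-wise median. Both are simple illustrative choices, and published ensembles vary in how they weight and combine models.

```python
import numpy as np


def ensemble(outputs, weights=None, method="mean"):
    """Combine per-stem estimates from several models.

    `outputs` is a list of dicts {stem_name: waveform}, one per model.
    method="mean" takes a (weighted) average; method="median" takes the
    sample-wise median, which is less sensitive to one model's artifacts.
    """
    combined = {}
    for name in outputs[0]:
        stacked = np.stack([out[name] for out in outputs])
        if method == "median":
            combined[name] = np.median(stacked, axis=0)
        else:
            combined[name] = np.average(stacked, axis=0, weights=weights)
    return combined


model_a = {"vocals": np.array([1.0, 2.0, 3.0]), "bass": np.array([0.0, 0.0, 0.0])}
model_b = {"vocals": np.array([3.0, 2.0, 1.0]), "bass": np.array([1.0, 1.0, 1.0])}
model_c = {"vocals": np.array([2.0, 9.0, 2.0]), "bass": np.array([0.5, 0.5, 0.5])}

print(ensemble([model_a, model_b])["vocals"])                            # [2. 2. 2.]
print(ensemble([model_a, model_b, model_c], method="median")["vocals"])  # [2. 2. 2.]
```

In the median example, model_c's outlier value of 9.0 has no effect on the result. A plain mean would pull that sample up to about 4.3.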