Semantic audio
Semantic audio is the extraction of meaning from audio signals. The field centres on analysing audio to produce meaningful metadata, which can then be used in a variety of ways.
Semantic analysis
Semantic analysis of audio is performed to reveal a deeper understanding of an audio signal. This typically results in high-level metadata descriptors, such as musical chords and tempo or the identity of the person speaking, which facilitate content-based management of audio recordings. In recent years, the use of automatic data analysis techniques has grown considerably, in areas including:
- Music information retrieval
- Sound recognition
- Speech segmentation
- Automatic music transcription
- Blind source separation
- Musical similarity
- Audio indexing, hashing, searching
- Broadcast monitoring
- Musical performance analysis
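As a minimal sketch of how a high-level descriptor such as tempo might be derived from a raw signal, the example below autocorrelates a simple energy-based onset envelope and picks the strongest periodicity. The function name `estimate_tempo`, its parameters, and the synthetic click track are assumptions made for illustration; practical tempo estimators are considerably more elaborate.

```python
import numpy as np


def estimate_tempo(signal, sample_rate, hop=512, min_bpm=60.0, max_bpm=200.0):
    """Estimate tempo in BPM from the autocorrelation of an onset envelope.

    Hypothetical helper for illustration only; not taken from any library.
    """
    signal = np.asarray(signal, dtype=float)

    # Frame-wise energy: one value per block of `hop` samples.
    n_frames = len(signal) // hop
    frames = signal[: n_frames * hop].reshape(n_frames, hop)
    energy = (frames ** 2).sum(axis=1)

    # Onset envelope: keep only increases in energy, then remove the mean.
    onset = np.maximum(np.diff(energy), 0.0)
    onset = onset - onset.mean()

    # Autocorrelation at non-negative lags (lag 0 is the first element).
    ac = np.correlate(onset, onset, mode="full")[len(onset) - 1:]

    # Restrict the search to lags that correspond to plausible tempi.
    frame_rate = sample_rate / hop
    min_lag = max(1, int(np.floor(frame_rate * 60.0 / max_bpm)))
    max_lag = min(len(ac) - 1, int(np.ceil(frame_rate * 60.0 / min_bpm)))
    lag = min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))

    return 60.0 * frame_rate / lag


# Synthetic test signal (an assumption): a click every 0.5 s for 8 s.
sample_rate = 16000
clicks = np.zeros(sample_rate * 8)
for k in range(16):
    start = k * sample_rate // 2
    clicks[start:start + 80] = 1.0

bpm = estimate_tempo(clicks, sample_rate, hop=160)
print(f"Estimated tempo: {bpm:.1f} BPM")
```

The tempo range limits the search to lags between 0.3 s and 1 s, which reduces but does not eliminate the half-tempo and double-tempo ambiguity inherent in autocorrelation-based methods.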
Applications
Aside from audio retrieval and recommendation technologies, the semantics of audio signals are becoming increasingly important in, for instance, object-based audio coding and intelligent audio editing and processing. Recent product releases already demonstrate this to a great extent, but more innovative functionality relying on semantic audio analysis and management is imminent. Such functionality may utilise, for instance, audio source separation, speaker segmentation and identification, structural music segmentation, or social and Semantic Web technologies, including ontologies and linked open data.
Speech recognition is an important semantic audio application. For speech, other semantic operations include language identification, speaker identification, and gender identification. For more general audio or music, semantic operations include identifying a piece of music or a movie soundtrack.
Areas of research in semantic audio include labelling an audio waveform with where the harmonies change and what they are, where material is repeated, and which instruments are playing.
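One common way to look for repeated material is to compare every analysis frame with every other frame and flag highly similar pairs. The sketch below does this with a cosine self-similarity matrix over per-frame feature vectors. The function names, the threshold, and the toy one-hot "chroma-like" features are assumptions for illustration, not a method described in this article.

```python
import numpy as np


def self_similarity(features):
    """Cosine self-similarity matrix for an (n_frames, n_dims) feature array."""
    features = np.asarray(features, dtype=float)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # avoid division by zero
    return unit @ unit.T


def find_repeats(features, threshold=0.95, min_gap=1):
    """Return (i, j) frame pairs with j - i >= min_gap and similarity > threshold."""
    sim = self_similarity(features)
    # Keep only the part of the matrix at least `min_gap` above the diagonal.
    rows, cols = np.where(np.triu(sim, k=min_gap) > threshold)
    return list(zip(rows.tolist(), cols.tolist()))


# Toy features (an assumption): two frames of section A, two of B, two of A.
A = [1.0, 0.0, 0.0]
B = [0.0, 1.0, 0.0]
features = np.array([A, A, B, B, A, A])

for i, j in find_repeats(features, min_gap=2):
    print(f"frame {i} resembles frame {j}")
```

With real recordings the features would typically be spectral or chroma vectors, and repeated sections would appear as diagonal stripes in the similarity matrix rather than isolated pairs.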