Codec listening test


A codec listening test is a scientific study designed to compare two or more lossy audio codecs, usually with respect to perceived fidelity or compression efficiency.
Most tests take the form of a double-blind comparison. Commonly used methods include "ABX", "ABC/HR", and "MUSHRA". Various software packages allow individuals to perform this type of testing themselves with minimal assistance.

Testing methods

ABX test

In an ABX test, the listener has to identify an unknown sample X as being A or B, with A and B available for reference. Because the identity of X is hidden, the listener cannot be biased by their expectations, and requiring a statistically significant outcome ensures that the result is unlikely to be due to chance. If sample X cannot be identified reliably, with a low p-value, in a predetermined number of trials, then the null hypothesis cannot be rejected and the test provides no evidence of a perceptible difference between samples A and B. This usually indicates that the encoded version is transparent to that listener, although a failed test does not prove that no difference exists.
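The significance check described above can be sketched as a one-sided binomial test. This is a minimal illustration, assuming each trial is an independent guess with a 50% chance of being correct under the null hypothesis; the function name `abx_p_value` and the example trial counts are hypothetical and not taken from any particular ABX tool.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` answers right out of
    `trials` by pure guessing (one-sided binomial test, p = 0.5)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Count the outcomes with k >= correct hits among all 2**trials outcomes.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


# Example: 12 correct identifications in 16 trials.
print(round(abx_p_value(12, 16), 4))  # 0.0384
```

With a conventional threshold such as 0.05 (an assumption here; the threshold should be fixed before the test begins), 12 of 16 correct answers would count as a significant result, while 8 of 16 would not.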

ABC/HR test

In an ABC/HR test, C is the original, which is always available for reference. A and B are the original and the encoded version in randomized order. The listener must first distinguish the encoded version from the original before assigning it a score as a subjective judgment of its quality. Different encoded versions can then be compared against each other using these scores.

MUSHRA

In MUSHRA, the listener is presented with the reference, a certain number of test samples, a hidden version of the reference, and one or more anchors. The purpose of the anchors is to bring the scale closer to an "absolute scale", so that minor artifacts are not rated as having very bad quality.
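The composition of one MUSHRA trial can be sketched as below. This is a minimal illustration, assuming the labelled reference is presented separately and every other stimulus is shown under a blind label; the function name `build_mushra_trial`, the `S1`, `S2`, ... labels, and the condition names are hypothetical.

```python
import random


def build_mushra_trial(test_conditions, anchors, rng):
    """Return a mapping from blind labels to condition names.

    The hidden reference and the anchors are mixed in with the test
    samples and shuffled, so the listener cannot tell them apart by
    position.
    """
    conditions = ["hidden_reference", *anchors, *test_conditions]
    rng.shuffle(conditions)
    return {f"S{i}": name for i, name in enumerate(conditions, start=1)}


layout = build_mushra_trial(
    test_conditions=["codec_a", "codec_b", "codec_c"],
    anchors=["low_anchor"],
    rng=random.Random(),
)
for label, condition in layout.items():
    print(label, condition)
```

The mapping would be kept hidden from the listener until all ratings are collected, and then used to attribute each rating to its condition.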

Results

Many double-blind music listening tests have been carried out. The following table lists the results of several listening tests that have been published online. To obtain meaningful results, listening tests must compare codecs' performance at similar or identical bitrates, since the audio quality produced by any lossy encoder will be trivially improved by increasing the bitrate. If listeners cannot consistently distinguish a lossy encoder's output from the uncompressed original audio, then it may be concluded that the codec has achieved transparency.
Popular formats compared in these tests include MP3, AAC, Vorbis, Musepack, and WMA. The RealAudio Gecko, ATRAC3, QDesign, and mp3PRO formats appear in some tests, despite much lower adoption. Many encoder and decoder implementations exist for some formats, such as MP3, which is the oldest and best-known format still in widespread use today.
| Source | Dates | Formats | Bitrate (kbit/s) | Codecs | Musical genres | Samples | Listeners | Best result | Comments |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | 2001 | multiple | ~128 | MP3: Lame 3.89beta --abr 134 -h --nspsytune --athtype 2 --lowpass 16 --ns-bass -8; MP3: Xing within AudioCatalyst 2.1 128 kbit/s, high frequency mode disabled, simple stereo disabled; AAC: Liquifier Pro 5.0.0 Beta 2, Build 24 streaming 128, equalization disabled, dynamics disabled, dual mono encoding disabled, audio bandwidth overridden by the program, set at 17995 Hz; MPC: mppenc.exe version 1.7.9c -radio -ltq_gain 10 -tmn 12 -nmt 4.8; WMAv8: Windows Media Player 7.1 ; Wmadmoe.dll version 8.0.0.0371 128 kbit/s; Ogg Vorbis: Oggdrop RC2 for Windows 32 128 kbit/s |  | 1 | 16 | Musepack and AAC |  |
|  | 2001 October - 2002 January | multiple | ~128 | MP3: Lame 3.89beta --abr 134 -h --nspsytune --athtype 2 --ns-bass -8; MP3: Xing within AudioCatalyst 2.1 128 kbit/s, high frequency mode disabled, simple stereo disabled; AAC: Liquifier Pro 5.0.0 Beta 2, Build 24 streaming 128, equalization disabled, dynamics disabled, dual mono encoding disabled, audio bandwidth overridden by the program, set at 17995 Hz; MPC: mppenc.exe version 1.7.9c -radio -ltq_gain 10 -tmn 12 -nmt 4.8; WMAv8: Windows Media Player 7.1 ; Wmadmoe.dll version 8.0.0.0371 128 kbit/s; Vorbis: Oggdrop pre-RC3 for Windows 32; from CVS 128 kbit/s | Various | 3 | 25-28 | Musepack or Vorbis |  |
|  | 2002 July | multiple | ~64 | Ogg Vorbis 1.0 -b 64 --managed; Ogg Vorbis 1.0 -q 0; MMJB 7.2 mp3PRO 64; WMA8 at 64 kbit/s; QuickTime 6.0 MPEG-4 AAC Low complexity at 64 kbit/s | Various | 12 | 24-41 | mp3PRO | Both Vorbis variants were a close second. |
|  | 2003 June | AAC | 128 CBR | Psytel AAC-enc 2.15 -br 128; Ahead/Nero 5.5.10.35 128 kbit/s CBR, high quality; Sorenson Squeeze 3.5 128 kbit/s; Apple QuickTime 6.3 128 kbit/s high quality; FAAC 1.17b -a 64 | Various | 10 | 11-18 | QuickTime |  |
|  | 2003 July | multiple | ~128 | Apple QuickTime 6.3 MP4 encoder 128 kbit/s high quality; LAME MP3 Encoder 3.90.3 --alt-preset 128; Musepack 1.14 --quality 4 --xlevel; Ogg Vorbis post-1.0 CVS -q 4.25; Windows Media Audio v9 PRO bitrate-managed 2-pass VBR 128 kbit/s | Various | 12 | 14-24 | Musepack | AAC, WMA, and Vorbis tied for close second. |
|  | 2003 September | multiple | ~64 | Ahead/Nero 6.0.0.15 HE AAC VBR profile Streaming :: Medium, high quality; Ogg Vorbis post-1.0 CVS -q 0; mp3PRO VBR quality 40, Current Codec, allow M/S and IS, allow narrowing, no CRC; Real Audio Gecko 64 kbit/s; Windows Media Audio v9 VBR quality 50; QuickTime 6.3 AAC LC 64 kbit/s, Best Quality | Various | 12 | 30-43 | Nero HE-AAC | This test showed that listeners preferred 128 kbit/s MP3 audio encoded by LAME to all the tested codecs at 64 kbit/s, with greater than 99% confidence: "No codec delivers the marketing plot of same quality as MP3 at half the bitrates." |
|  | 2004 January | MP3 | ~128 | LAME encoder 3.95 --preset 128; FhG MP3 encoder from Adobe Audition 1.0 VBR quality 40, "Current - Best" codec; Apple iTunes 4.2 MP3 112 kbit/s VBR, Highest quality, joint stereo, smart encoding; GOGO-no-coda 3.12 -b 128 -a -q 0; Audioactive Encoder 2.04 128 kbit/s High Quality; Xing MP3 Encoder 1.5 VBR quality normal | Various | 12 | 11-22 | LAME | The author noted that the results may have been affected by the use of an outdated version of the Xing encoder and non-optimal settings for iTunes. |
|  | 2004 February | AAC | ~128 | Ahead/Nero AAC-enc v 2.6.2.0 -internet profile, high quality, LC; Apple iTunes 4.2 128 kbit/s; Compaact! 1.2beta3 VBR 5, high quality, LC; FAAC 1.23.5 -q 115; Real Producer 10 beta 128 kbit/s | Various | 12 | 19-29 | iTunes | The open-source FAAC codec had improved greatly since the previous test. |
|  | 2004 May | multiple | ~128 | LAME encoder 3.96 -V5 --athaa-sensitivity 1; Apple iTunes 4.2 128 kbit/s AAC; Ogg Vorbis aoTuV tuning b2 -q 4.35; Musepack 1.14b --quality 4.15 --xlevel; Sony Atrac3 132 kbit/s; Microsoft WMA9 Std Bitrate VBR 128 kbit/s | Various | 18 | 12-27 | aoTuV and Musepack |  |
|  | 2004 June | multiple | 32 CBR | LAME encoder 3.96 -b 32; Nero Ahead HE AAC+PS 32 kbit/s CBR High Quality; Ogg Vorbis post-1.0.1CVS --managed -b 32 resampled with SSRC; Real Audio 32 kbit/s stereo music codec in Helix Producer 10; QDesign Music Codec 2 Pro 32 kbit/s at 32 kHz, Quality mode; Microsoft WMA9 Std 32 kbit/s at 32 kHz; mp3PRO 32 kbit/s at 32 kHz, in Adobe Audition 1.5 | Various | 18 | 47-77 | Nero HE-AAC |  |
|  | 2004 July | multiple | ~175 | MPC: musepack -standard; MP3: LAME 3.97 alpha -V 3; -V 2; Ogg Vorbis: megamix -q 6,00; -q 6,99; -q 5,50 | Classical | 18 | 1 | Musepack |  |
|  | 2005 August | multiple | ~180 | AAC: Faac 1.24.1. Release date: end 2004. Setting: -q175; AAC: Nero Digital aacenc32 v.3.2.0.15. Release date: June 2005. Setting: -streaming; MP3: LAME 3.97 alpha 11. Release date: July 2005. Setting: -V2 --vbr-new; MPC: mppenc 1.15v. Release date: March 2005. Setting: --quality 5; Ogg Vorbis: aoTuV beta 4 based on 1.1.1. Release date: July 2005. Setting: -q6,00 | Classical | 18 | 1 | aoTuV | The author reflects on substantial improvements in Vorbis encoding since his previous test: "Vorbis is now – thanks to Aoyumi – an excellent audio format for 180 kbit/s encodings." |
|  | 2005 August | multiple | ~96 | AAC-LC: iTunes 4.9 / QuickTime 7.02 CBR 96; MP3: LAME 3.97 alpha 11 --abr 99; MPC: mppenc 1.14 --xlevel --quality 3; Ogg Vorbis: aoTuV / LANCER beta 4 based on SVN 1.1.1 -q2,00; WMA Standard: WMA 9.1 CBR 96 | Classic, various | 150 classical, 35 various | 1 | aoTuV and AAC tied, aoTuV | The author selected each participating encoder by pitting multiple encoders against one another in an initial "Darwinian phase." For example, LAME was chosen as the representative MP3 encoder because it clearly outperformed four other MP3 encoders on a subset of the full sample corpus. |
|  | 2005 December | multiple | ~140 | Nero AAC 3.1.0.2 VBR/Stereo - Streaming, 100-120 kbit/s; iTunes AAC 6.0.1.3 128 kbit/s, VBR; LAME 3.97 Beta 2 -V5 --vbr-new; Ogg Vorbis AoTuV 4.51 Beta -q 4.25; WMA Professional 9.1 Quality-Based VBR, Q50; Shine 0.1.4 -b 128 | Various | 18 | 18-30 | 4-way tie | "I think this test shows that with the current encoders, the quality at 128 kbit/s is very good... It's time to move to bitrates like 96 kbit/s or even lower." |
|  | 2006 March | AAC | 48 | 3gpp 6.3.0 48 kbit/s CBR; Coding Technologies - Winamp 5.2 beta 393 48 kbit/s CBR HE-AAC; Coding Technologies - Winamp 5.2 beta 393 48 kbit/s CBR HEv2-AAC; Nero Digital 4.9.9.95 48 kbit/s ABR HE-AAC; Nero Digital 4.9.9.96 48 kbit/s ABR HEv2-AAC; iTunes 6.0.2 48 kbit/s CBR; LAME 3.97b2 -V5 | Various | 18 | 10-20 | 5-way tie | "... it seems that overall, plain HE-AAC might be better than HE-AAC v2 at this bitrate, but a lot more samples would be needed to be able to draw definitive conclusions regarding this." |
|  | 2006 November | multiple | ~48 | Ogg Vorbis AoTuV 5 Beta -q -1; WMA Professional 10 1-pass CBR, 48 kbit/s; Nero HE-AAC May 26, 2006 -q 0.2; WMA Standard 9.2 Quality-Based VBR, Q10; iTunes AAC 7.0.2.16 48 kbit/s, CBR | Various | 20 | 22-34 | Nero HE-AAC | WMA Professional and aoTuV tied for second. |
|  | 2007 July | multiple | ~64 | Ogg Vorbis AoTuV 5 Beta -q 0; WMA Professional 10 1-pass CBR, 64 kbit/s; Nero HE-AAC Jul 20 2007 -q 0.24 | Various | 18 | 21-33 | Nero Digital and WMA Professional |  |
|  | 2008 October | MP3 | ~128 | LAME 3.98.2 -V5.7; LAME 3.97 -V5 --vbr-new; iTunes 8.0.1.11 112 kbit/s, VBR, highest quality, joint stereo, smart encoding, filter below 10 Hz; Fraunhofer IIS mp3surround CL encoder v1.5 -br 0 -m 4 -q 1 -vbri -ofl; Helix v5.1 2005.08.09 -X2 -U2 -V60; l3enc 0.99a -br 128000 -mod 1 | Various | 14 | 26-39 | 5-way tie | "The quality at 128 kbps is very good and MP3 encoders improved a lot since the last test." The author also notes that the Fraunhofer and Helix codecs are several times faster at encoding than LAME, although virtually identical in terms of perceived audio quality. |
|  | 2011 March | multiple | ~64 | Ogg Vorbis AoTuV 6.02 Beta -q 0.1; Apple HE-AAC constrained VBR, high quality, 64 kbit/s; CELT complexity 10, VBR 67.5 kbit/s; Nero HE-AAC -q 0.245 | Various | 30 | 25-13 | CELT / Opus | In the source, CELT is referred to as Opus, its name when later standardized. |
|  | 2011 July/August | LC-AAC | ~96 | Nero 1.5.4.0 -q 0.345; Apple QuickTime 7.6.9 true VBR, high quality, 96 kbit/s; Apple QuickTime 7.6.9 constrained VBR, high quality, 96 kbit/s; Fraunhofer IIS VBR 3; Coding Technologies CBR 100 kbps | Various | 20 | 25 | Apple QuickTime |  |
|  | 2013 May | MP3 | ~224 | Lame3100i -V2+; LAME 3.99.5 -V1; LAME 3.98.4 -q 0 -b 224; Helix v5.1 -X2 -U2 -V146; BladeEnc -quit -nocfg -224 | Various | 25 | 1 | 4-way tie | Most impairment grades were between 4 and 5. Both speech samples were transparent except for the low anchor. |
|  | 2014 July - September | multiple | ~96 | AAC Apple QuickTime iTunes 11.2.2 constrained VBR, high quality, 96 kbit/s; Opus 1.1 VBR, 96 kbit/s; Ogg Vorbis aoTuV Beta6.03 -q 2.2; MP3 LAME 3.99.5 VBR, -V 5; AAC FAAC v1.28 -b 96; AAC FAAC v1.28 -q 30 | Various | 40 | 33 | Opus | Opus is the clear winner, Apple AAC is second, and Ogg Vorbis and higher-bitrate LAME MP3 are statistically tied for third place. FAAC, known in advance to be inferior, was used to discard bad results and as a quality scale anchor. |
| Cunningham and McGregor | 2019 February | multiple | 192 - 1411 | Uncompressed WAV; MP3 CBR 192 kbps; AAC 192 kbps CBR; ACER low quality ~1023 kbps VBR; ACER medium quality ~1130 kbps VBR; ACER high quality ~1233 kbps VBR | Pop | 10 | 100 | 5-way tie | Participants reported no perceived differences between the uncompressed, MP3, AAC, ACER high quality, and ACER medium quality compressed audio in terms of noise and distortions, but the ACER low quality format was perceived as being of lower quality. However, in terms of participants’ perceptions of the stereo field, all formats under test performed as well as each other, with no statistically significant differences. |