SIMPLE EVALUATION METHODS FOR AUDIO CODING SYSTEMS

Marcelo Herrera Martinez

Abstract This article deals with the subjective evaluation of audio coding technologies. A set of codec and bit-rate combinations is selected and evaluated by 25 listeners. The data are statistically processed with the ANOVA method, and the results are presented.

Introduction For broadcasting and storage purposes there is a need to reduce the size of audio files in order to save bandwidth and capacity, respectively. Efficient algorithms are therefore implemented that remove irrelevancies and redundancies from the signal. Coding technologies draw, among other things, on signal processing, hearing physiology and perception. The masking phenomenon, which takes place in the inner ear on the basilar membrane, is the key to new designs of audio coding devices, specifically their psychoacoustic models. There is, however, a huge number of codecs available on the market and on the internet, and their careful evaluation has become a necessity. A number of audiophile groups have therefore become involved in this evaluation process, using both subjective and objective approaches. The present work looks for the relationship between subjective audio quality, codec and bit-rate combination, and CD quality, using statistical results obtained with ANOVA methods.

Background: Masking and coding technologies Irrelevancies and redundancies are removed from the signal in the psychoacoustic model and in the bit-streaming process. The psychoacoustic model mainly exploits the fact that, when the ear and brain process an audio signal, a stronger signal may mask a weaker signal of neighbouring frequency. This effect is known as masking, and it is explained by the mechanics of the inner ear, specifically of the basilar membrane (B.M.). As the fluid-borne mechanical vibrations of the signal travel along the membrane, they are processed at definite places on the B.M. that depend on their frequency. Moreover, depending on the intensity of the signals, a certain number of cells at that place on the B.M. are excited. Sometimes, in the case of signals of neighbouring frequency, no cells are available to process the weaker signal, and so its stimulus is not transmitted to the brain. This principle is the basis of the masking phenomenon. Fig. 1 illustrates masking in the frequency domain.

Fig. 1 Masking phenomenon

Subjective coding evaluation Objective measures such as the signal-to-noise ratio or the total block distortion are simple to compute, but they ignore psychoacoustic effects that can lead to large differences in perceived quality. Listening tests are therefore the way to inspect codec performance when a highly accurate assessment is needed. This paper presents the results of testing three codecs (aac, ogg and wma) at two bit rates (96 and 128 kbit/s) with the double-blind stimulus hidden reference method. The critical audio material was chosen to stress different aspects of codec resolution; most of the selected excerpts stress transient phenomena and high-frequency components. Tab. 1 lists the excerpts and their characteristics.

Item | Audio excerpt | Description
1 | Eric Clapton – Don’t know which way to go | Suitable for charles drum evaluation
2 | English female speaker, from mpeg web | Broadcast English female
3 | French male speaker, critical mpeg material | Broadcast French male
4 | Castanets, critical mpeg web material | Highly transient signal
5 | Erotica, Grieg | Piano classical sequence
6 | Alleluja, Adash | Female choral
7 | Eric Clapton – Preludin fugue | Drum changes, transient signal

Tab. 1 Audio excerpts used

Double-blind stimulus hidden reference method description

In this method, listeners were asked to listen to three versions of the same audio sequence: the reference, labelled „R“, and two others, labelled „A“ and „B“. One of A and B was a hidden copy of the reference and the other was the compressed signal, randomly allocated for each trial. Listeners were asked to recognise which was the compressed one and to judge its basic audio quality on a scale from one to five, as shown in Tab. 2. This procedure is commonly called the double-blind triple-stimulus with hidden reference method. The listener could switch freely between the presentations 'Reference', 'A' and 'B', and each excerpt could be repeated as often as required. The listener was asked to judge the 'Basic Audio Quality' of the 'A' and 'B' versions in each trial, and any difference from the reference was to be considered an impairment. The order of the test presentations and the position of the hidden reference were randomised for each listener.
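The randomisation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure actually used in the test: the function name `make_trial_plan`, the dictionary keys and the `seed` parameter are assumptions introduced here, and the paper does not state how the random allocation was generated.

```python
import random


def make_trial_plan(excerpts, conditions, seed=None):
    """Build a randomised trial list for one listener (hypothetical helper).

    Every excerpt is paired with every codec/bit-rate condition. The order
    of the trials is shuffled, and the hidden reference is placed at random
    in slot 'A' or 'B'; the coded version takes the other slot.
    """
    rng = random.Random(seed)
    trials = [(e, c) for e in excerpts for c in conditions]
    rng.shuffle(trials)  # presentation order differs per listener
    plan = []
    for excerpt, condition in trials:
        hidden_ref_slot = rng.choice(["A", "B"])
        coded_slot = "B" if hidden_ref_slot == "A" else "A"
        plan.append({
            "excerpt": excerpt,
            "condition": condition,
            "hidden_reference": hidden_ref_slot,
            "coded": coded_slot,
        })
    return plan


# Items 1 to 7 from Tab. 1 and the six codec/bit-rate combinations.
excerpts = list(range(1, 8))
conditions = ["96aac", "96ogg", "96wma", "128aac", "128ogg", "128wma"]
plan = make_trial_plan(excerpts, conditions, seed=1)
print(len(plan))  # 7 excerpts x 6 conditions = 42 trials
```

Using a separate seed per listener would give each listener a different presentation order while keeping the plan reproducible.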

Grade | Impairment description
5.0 | Imperceptible
4.0 | Perceptible but not annoying
3.0 | Slightly annoying
2.0 | Annoying
1.0 | Very annoying

Tab. 2 ITU-R five grade impairment scale

Statistical analysis and results Throughout the statistical analysis 'diffgrades' are used. For each trial, the diffgrade is the grade awarded to the coded version minus the grade awarded to the hidden reference. For example, when the hidden reference is graded 5.0, an impairment grade of 4.0 ('Perceptible but not annoying') for the coded version becomes a diffgrade of -1.0, whilst a grade of 5.0 ('Imperceptible') gives a diffgrade of 0.0. The mean score $\bar{u}_{jk}$ for each presentation is then calculated as:

$$\bar{u}_{jk} = \frac{1}{N}\sum_{i=1}^{N} u_{ijk} \qquad (1)$$

where:
$u_{ijk}$: score of observer $i$ for a given test condition $j$ and audio sequence $k$
$N$: number of observers.

The associated confidence interval is derived from the standard deviation and the size of each sample. The 95% confidence interval is used, which is given by:

$$[\bar{u}_{jk} - \delta_{jk},\ \bar{u}_{jk} + \delta_{jk}] \qquad (2)$$

where:

$$\delta_{jk} = t_{0.05}\,\frac{S_{jk}}{\sqrt{N}} \qquad (3)$$

and $t_{0.05}$ is the t value for a significance level of 5% (that is, a confidence level of 95%). The standard deviation for each presentation, $S_{jk}$, is given by:

$$S_{jk} = \sqrt{\sum_{i=1}^{N}\frac{(\bar{u}_{jk} - u_{ijk})^{2}}{N-1}} \qquad (4)$$
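As a rough illustration of equations (1) to (4), the following Python sketch computes diffgrades, their mean and the 95% confidence interval for one presentation. The grades are made-up example values, not data from this study, and the names `diffgrade`, `mean_with_ci` and `T_CRIT_DF4` are assumptions introduced here. The critical t value is passed in from a table so that the sketch needs only the standard library.

```python
import math
import statistics


def diffgrade(grade_coded, grade_reference):
    """Diffgrade: grade of the coded version minus grade of the reference."""
    return grade_coded - grade_reference


def mean_with_ci(scores, t_crit):
    """Return the mean (eq. 1), the half-width delta (eq. 3) and the
    confidence interval (eq. 2) for the scores of one presentation."""
    n = len(scores)
    mean = statistics.mean(scores)
    s = statistics.stdev(scores)  # sample standard deviation, N - 1 (eq. 4)
    delta = t_crit * s / math.sqrt(n)
    return mean, delta, (mean - delta, mean + delta)


# Hypothetical grades from five listeners for one codec/excerpt pair.
coded_grades = [4.0, 5.0, 4.0, 3.0, 4.0]
reference_grades = [5.0, 5.0, 5.0, 5.0, 5.0]
diffs = [diffgrade(c, r) for c, r in zip(coded_grades, reference_grades)]

# Tabulated two-sided 95% t value for N - 1 = 4 degrees of freedom.
T_CRIT_DF4 = 2.776
mean, delta, interval = mean_with_ci(diffs, T_CRIT_DF4)
print(mean, delta, interval)
```

For the 25 listeners mentioned in the abstract, the tabulated two-sided 95% t value for 24 degrees of freedom is about 2.064. An interval that includes 0.0 means the coded version is not distinguished from the reference at this confidence level.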

[Figure: seven panels titled "Codecs on item #1" to "Codecs on item #7". Horizontal axis in each panel: 96aac, 96ogg, 96wma, 128aac, 128ogg, 128wma. Vertical axis ticks lie between -2 and 0,5.]

Conclusion The results show that the ogg and wma formats perform better than aac, especially at 96 kbit/s. Ogg performed similarly to wma, and slightly better on sequences containing high-frequency components and transient signals. At 128 kbit/s, wma was generally rated better, but on transient and high-frequency signals ogg again performed slightly better. The results show that ogg and wma at 96 kbit/s can be considered CD-quality coders, which is a remarkable final result. The statistical approach of the present work, including post-screening of the test results, enables us to conclude that the ogg and wma formats can be considered CD-quality formats for a wide range of critical audio material.
