Defeating Fake-Quality MP3

Author: Noah Short
Rui Yang

School of Information Science and Technology Sun Yat-sen University Guangzhou 510275, China

[email protected]

Yun Q. Shi

Department of Electrical and Computer Engineering New Jersey Institute of Technology Newark 07102, USA

[email protected]

Jiwu Huang

School of Information Science and Technology Sun Yat-sen University Guangzhou 510275, China

[email protected]

ABSTRACT MP3 is the most popular audio format in daily use; for example, music on the Internet is often in this format. However, low bit rate MP3 files are sometimes transcoded to a high bit rate for commercial gain, resulting in fake-quality MP3. This paper presents a method to detect this kind of fake-quality MP3. Our experimental study and theoretical analysis demonstrate that fake-quality MP3 contains fewer small-valued MDCT coefficients than normal MP3. We use the number of small-valued MDCT coefficients as a feature to discriminate fake-quality MP3 from normal MP3. Experimental results show the effectiveness of the proposed method. To the best of our knowledge, this work is the first to detect fake-quality MP3.

Categories and Subject Descriptors H.4 [Information Systems Applications]: Miscellaneous

General Terms Security, Algorithms, Verification

Keywords Digital Audio Forensics, MDCT coefficient, Fake-Quality MP3

1. INTRODUCTION

Online music stores have become popular, and people can easily buy their favourite music via the Internet [1]. This music is often in MP3 format at high bit rates. However, the price of the music varies with the bit rate. Typically, a higher bit rate means higher quality because more bits are used to record the music. Some websites like Napster [2] even provide low-quality MP3 for free to attract customers. These free versions are often at low bit rates, such as 32 kbps or 64 kbps. Fig. 1 is a screenshot of online MP3 selling from amazon.com, in which the hit song "If I were a boy" by Beyonce is on sale. As seen, the format of the song is MP3 at 256 kbps.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MM&Sec’09, September 7–8, 2009, Princeton, New Jersey, USA. Copyright 2009 ACM 978-1-60558-492-8/09/09 ...$10.00.

Figure 1: The song "If I were a boy" sold on amazon.com

Unfortunately, according to the IFPI Digital Music Report 2009, more than 40 billion music files were illegally shared in 2008, giving a piracy rate of around 95% [3]. It is urgent to develop technologies to identify these pirated music files. A portion of these pirated files are "fake-quality" MP3 files. Fake-quality MP3 here means an MP3 file with a low bit rate that has been transcoded to a high bit rate; specifically, it is decompressed and then recompressed at a high bit rate, thus pretending to be of high quality. Why are fake-quality files flooding the Internet? The reason is that people may judge the quality of MP3 files only by the stated bit rate before listening, and a high bit rate version is more attractive than a low bit rate one. It is easy to create fake-quality MP3 from a free low-quality version with widely available audio editing software, such as GoldWave and CoolEdit. These fake-quality MP3 files severely affect legal MP3 selling and sharing. Hence it is necessary to develop technologies to detect fake-quality MP3s. Such technologies can also be used as a file filter to reduce the number of fake-quality MP3 files on the Internet. To the best of our knowledge, no work on fake music detection has been reported in the literature so far. The following two works come seemingly close, though. The first is a free program to detect "fake CDs", named Tau Analyzer [4], which uses high frequency analysis to distinguish original studio-based CDs from fake CDs that are reconstructed from a lossy compressed audio source. Hence, it only detects whether or not the audio signal has been compressed, not whether it has been transcoded. The second is a freeware program called "fake MP3 detector" [5]; the "fake MP3" that it can detect is either a corrupted MP3 or a non-MP3 file with an added ".mp3" file extension.
This is different from detecting the fake-quality MP3 addressed in this paper. Since the creation of fake-quality MP3 files must involve double MP3 compression, we identify fake-quality MP3 files by detecting double MP3 compression. Although detection of double compression of images and video has attracted many researchers [6]-[9], there is still no literature on detecting double compression of audio signals. As the procedures of MP3 compression, which are described in Section 2, are quite different from image or video compression, it is not straightforward to apply the existing methods designed for images and video to MP3 files. In this paper, theoretical analysis with mathematical proof reveals an important property possessed by quantized MDCT coefficients. An experimental study on a large data set verifies this property. The property is used to generate a reliable feature to discriminate fake-quality MP3 files from normal ones. Using this feature, we propose a method to detect fake-quality MP3 with high detection accuracy.

The rest of this paper is organized as follows. In Section 2, we review the procedures of MP3 compression that are relevant to our work. We investigate the property of quantized MDCT coefficients of fake-quality MP3 in Section 3. In Section 4, we perform theoretical analysis on fake-quality MP3 and then propose a method to detect it. Experimental work on detecting fake-quality MP3 is presented in Section 5. Finally, discussions and conclusions are given in Section 6.

Figure 2: Block diagram of MP3 (a) encoder, (b) decoder

Figure 3: Scale-factor band division for MP3, 44.1 kHz, long window (horizontal axis: MDCT coefficients; vertical axis: value)

2. RELEVANT PROCEDURES OF MP3 COMPRESSION

2.1 MP3 Coding

Fig. 2(a) shows the block diagram of a typical MP3 encoder [10]. The input PCM signal is first separated into 32 subbands by the analysis filterbank, and the Modified Discrete Cosine Transform (MDCT) window further divides each of these 32 subbands into 18 (long window) or 6 (short window) subbands. Hence, a total of 576 or 192 spectral lines are generated after the MDCT. The psychoacoustic model analyzes the audio content and estimates the masking thresholds. The output of this model consists of the just noticeable noise level for each subband and the information about the window type for the MDCT. According to the masking thresholds estimated by the psychoacoustic model, the spectral values are quantized via a power-law quantizer. The quantization process uses an iterative algorithm to control both the bit rate and the distortion level, so that the perceived distortion is as small as possible within the constraint of the desired bit rate. Finally, the quantized spectral values are Huffman encoded to form a bitstream.

The block diagram of a typical MP3 decoder is shown in Fig. 2(b). An MP3 stream consists of a series of frames, and each frame has its header information and compressed audio data (576 quantized MDCT coefficients in Huffman coding). First, Huffman decoding is performed on the MP3 bitstream; then the decoder restores the quantized MDCT coefficient values and the side information related to them, such as the window type that is assigned to each frame. After inverse quantization, the coefficients are transformed back to the subband domain by applying an inverse MDCT. Finally, the waveform in PCM format is reconstructed by the synthesis filterbank.

2.2 Quantization and Scaling

As discussed in the previous subsection, there are 576 MDCT coefficients in one frame. Note that, in the case of the short window, three consecutive groups of 192 coefficients are accumulated to form one frame. We represent these 576 MDCT coefficients in vector form with elements $xr_m$:

$$xr = [xr_0 \cdots xr_m \cdots xr_{575}]^T \tag{1}$$

where each element takes real values ranging from -1.0 to 1.0. When performing quantization, the components of the vector $xr$ are grouped into 22 scale-factor bands [10], as shown in Fig. 3, where there are 22 intervals on the horizontal axis and each interval corresponds to one scale-factor band, $xr_{T\{s\}}$. Hence the following formula holds.

$$xr = [xr_{T\{0\}} \cdots xr_{T\{s\}} \cdots xr_{T\{21\}}]^T \tag{2}$$

The division into scale-factor bands varies, depending on the sampling frequency. Each band has its own quantization factor. If $xr_m$ belongs to band $s$, the quantization can be expressed as follows.

$$ix_m = Q(xr_m, Z_{\{s\}}) = \mathrm{sign}(xr_m) \cdot \left[(2^{q} \cdot |xr_m|)^{\frac{3}{4}}\right] \tag{3}$$

where $ix_m$ is the integer-valued quantized MDCT coefficient, whose sign is the same as that of $xr_m$; $Z_{\{s\}}$ is the value of the scale-factor corresponding to band $s$; $q$ is a positive integer, the quantization factor corresponding to $Z_{\{s\}}$; and $[\cdot]$ means rounding to the nearest integer.
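To make the power-law quantizer of Eq. (3) concrete, the following is a minimal Python sketch. The function names `quantize` and `dequantize` are hypothetical, the tie-breaking rule (round half up) is an assumption since the text only says "rounding to the nearest integer", and a real encoder derives $q$ per scale-factor band rather than taking it as an argument.

```python
import math


def round_half_up(x: float) -> int:
    """Round a non-negative value to the nearest integer (ties go up; an assumption)."""
    return math.floor(x + 0.5)


def quantize(xr: float, q: int) -> int:
    """Eq. (3): ix = sign(xr) * [(2^q * |xr|)^(3/4)]."""
    magnitude = round_half_up((2.0 ** q * abs(xr)) ** 0.75)
    return int(math.copysign(magnitude, xr)) if magnitude else 0


def dequantize(ix: int, q: int) -> float:
    """Inverse power law without rounding: xr ~ sign(ix) * |ix|^(4/3) * 2^(-q)."""
    if ix == 0:
        return 0.0
    return math.copysign(abs(ix) ** (4.0 / 3.0) * 2.0 ** (-q), ix)


if __name__ == "__main__":
    for q in (8, 10, 12):
        print(q, quantize(0.001, q), quantize(-0.001, q))
```

A larger quantization factor maps the same coefficient to a larger integer, which is why small coefficients survive as ±1 only for a narrow range of inputs.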

3. CHARACTERISTICS OF FAKE-QUALITY MP3

3.1 Observations on Quantized MDCT Coefficients

The difference in quantized MDCT coefficient values between fake-quality MP3 and normal MP3 is not clear when the full range of coefficient values is examined visually. This is shown in Fig. 4, where the horizontal axis is the frequency index, i.e., the position among the 576 MDCT coefficients, and the vertical axis is the value of the MDCT coefficient. Fig. 4(a) is for a normal MP3 signal of 128 kbps, while Fig. 4(b) to Fig. 4(f) are for five different fake-quality MP3 signals. For instance, Fig. 4(b) is for a fake-quality MP3 of 128 kbps made from an original MP3 signal of 32 kbps. It is hard to tell Fig. 4(a) from the other five figures by visual examination because the absolute values of the MDCT coefficients vary from 0 to 8192, i.e., the dynamic range is too large. However, the signals can be discriminated by looking at the distribution of the small integer values, such as from -3 to 3, as shown in Fig. 4(g) to (l). There are many more quantized MDCT coefficients with values ±1, ±2, and ±3 in the normal MP3 than in the fake-quality MP3, no matter which bit rate the fake-quality MP3 is transcoded from.

The question now is whether this is an inherent property or only an occasional phenomenon. To find the answer, we collect statistics from 687 MP3 clips of 5 sec each for normal 128 kbps MP3 and the corresponding fake-quality MP3 (the generation of these MP3 files is discussed in Section 5), and observe the distributions of MDCT coefficient values ±1. As shown in Fig. 5, compared with normal MP3, all of the fake-quality MP3 contain clearly fewer quantized MDCT coefficients of values ±1. In the next subsection, we investigate fake-quality MP3 with different bit rates.

Figure 4: Comparison of quantized MDCT coefficients between normal 128 kbps MP3 and five types of fake-quality 128 kbps MP3. Panels (a) to (f) on the left show the quantized MDCT coefficients over the full range of coefficient values, while panels (g) to (l) on the right show only the range of values from -3 to 3. The notation "normal 128kbps" represents normal 128 kbps MP3, while "32kbps-128kbps" stands for fake-quality 128 kbps MP3 transcoded from 32 kbps MP3.

Figure 5: Histograms of MDCT coefficients of normal and fake-quality 128 kbps MP3; each histogram is generated from the 687 MP3 clips.

3.2 Number of MDCT Coefficients with Small Values

From the previous subsection, we observe that the number of small-value MDCT coefficients in normal MP3 and that in fake-quality MP3 are very different. Is it possible to discriminate normal MP3 from fake-quality MP3 based on the number of small-value MDCT coefficients? We use the Matlab function 'boxplot' to plot the distribution of the numbers of MDCT coefficients having values ±1, as shown in Fig. 6. In Fig. 6(a) the horizontal axis represents the sequential number of 200 audio clips, and the vertical axis the frame-average number of coefficients of values ±1. The number of MDCT coefficients having values ±1 in normal MP3 is clearly larger than that in fake-quality MP3. In Fig. 6(b), the horizontal axis denotes the normal case and the five fake-quality cases, and the vertical axis represents the frame-average number of values ±1. The frame-average numbers of MDCT coefficients having values ±1 for all 200 audio clips in each of the six cases are listed in non-decreasing order from the smallest to the largest. The median (50th percentile) of the numbers of ±1 is marked by a horizontal (red) line. The 95th, 75th, 25th and 5th percentiles of the numbers of ±1 are also marked in Fig. 6(b). As observed, the numbers of values ±1 for normal MP3 are all around 100, while those for fake-quality ones are all less than 60.

For comparison, Table 1 lists the mean values and standard deviations of the numbers of values ±1, as well as the 5th and 95th percentiles. The 5th percentile of the number of values ±1 for normal 128 kbps MP3 is 80.05, which is much larger than the largest 95th percentile, 52.35, among the five cases of fake-quality 128 kbps MP3. The standard deviations are all small compared with the mean values, which indicates that the number of values ±1 can be a reliable feature for discriminating fake-quality MP3 from normal MP3.

Figure 6: Distributions of numbers of coefficients with values ±1 for MP3 of 128 kbps. In (a), comparison between normal MP3 and fake MP3, the horizontal axis represents the sequential number of 200 audio clips, and the vertical axis the number of coefficients of values ±1. In (b), distributions, the horizontal axis denotes the normal case and the five fake-quality cases, and the vertical axis represents the number of coefficients with values ±1.

Table 1: Distributions of numbers of small values for MP3 of 128 kbps

| Bit Rate (kbps) | 128 | 32-128 | 40-128 | 48-128 | 56-128 | 64-128 |
|---|---|---|---|---|---|---|
| Mean Value | 98.45 | 15.24 | 18.50 | 27.83 | 44.49 | 24.94 |
| Standard Deviation | 9.77 | 6.55 | 2.56 | 2.58 | 4.94 | 6.95 |
| 5th percentile | 80.05 | 9.72 | 14.72 | 24.56 | 36.20 | 13.26 |
| 95th percentile | 111.42 | 32.10 | 22.77 | 32.82 | 52.35 | 36.02 |

Table 2: Distributions of numbers of small values for MP3 of 96 kbps and 64 kbps

| Bit Rate (kbps) | 96 | 32-96 | 40-96 | 48-96 | 56-96 | 64-96 | 64 | 32-64 | 40-64 | 48-64 |
|---|---|---|---|---|---|---|---|---|---|---|
| Mean Value | 126.08 | 13.20 | 4.34 | 4.59 | 35.38 | 42.43 | 102.80 | 4.04 | 6.01 | 24.01 |
| Standard Deviation | 14.94 | 9.66 | 3.16 | 3.59 | 9.55 | 7.78 | 14.08 | 3.25 | 3.82 | 6.06 |
| 5th percentile | 97.68 | 2.12 | 0.91 | 1.01 | 14.26 | 29.80 | 80.06 | 0.65 | 1.71 | 12.80 |
| 95th percentile | 147.67 | 36.73 | 9.86 | 10.44 | 47.13 | 55.56 | 111.43 | 9.79 | 12.01 | 32.49 |

Similar experimental studies for 96 kbps and 64 kbps MP3 audio are reported in Fig. 7 and Table 2. As observed, similar to the case of 128 kbps, there is a clear threshold between fake-quality MP3 and normal MP3.

Figure 7: Distributions of numbers of coefficients with values ±1; (a) and (b) show the distributions for 96 kbps MP3, (c) and (d) show the distributions for 64 kbps MP3.

Figure 8: The inner iteration loop of the MP3 encoder (simplified) from LAME v3.97

4. THEORETICAL ANALYSIS AND PROPOSED METHOD

In this section, we prove theoretically that the number of small-value MDCT coefficients in fake-quality MP3 is smaller than that in normal MP3. Based on this, we propose a method that uses the number of small-value MDCT coefficients as a feature to detect fake-quality MP3.

4.1 Relationship between Bit Rate and Quantization Factor

The heart of an MP3 encoder is a system of two nested iteration loops for quantization. The inner iteration loop finds the optimal quantization factor $q$ (shown as cod_info_w.global_gain in Fig. 8). The larger the number of target bits $TB$ (shown as targ_bits in Fig. 8), the more iterations are performed, which leads to a larger quantization factor. Now consider transcoding an MP3 signal from bit rate $BR_1$ to bit rate $BR_2$, where $BR_1 < BR_2$. The corresponding numbers of target bits for one frame are denoted as $TB_1$ and $TB_2$, respectively. As one frame is divided into 22 scale-factor bands in quantization (see Fig. 3), and each scale-factor band has its own quantization factor, we represent the corresponding quantization factors as $q_{1\{s\}}$ and $q_{2\{s\}}$ ($0 \le s \le 21$). Since the bit rate is the total number of bits per second, we have $BR_1 < BR_2 \Rightarrow TB_1 < TB_2$ under the constant bit rate (CBR) mode assumption. When transcoding from $BR_1$ to $BR_2$, for each scale-factor band in one frame it is naturally expected that $q_{1\{s\}} < q_{2\{s\}}$. However, if $q_{1\{s\}} < q_{2\{s\}}$ for all $s$, then the number of target bits $TB_2$ may sometimes not be achieved exactly. Hence for some $s$ ($0 \le s \le 21$), the encoder allows $q_{1\{s\}} = q_{2\{s\}}$. Thus, when $BR_1 < BR_2$, we have $q_{1\{s\}} \le q_{2\{s\}}$ for all $s$. On the other hand, if $q_{1\{s\}} = q_{2\{s\}}$ for all $s$, then there is no transcoding. Hence we have $0 \le P\{q_{1\{s\}} = q_{2\{s\}}\} < 1$ in any frame of the transcoded MP3.
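The implication $BR_1 < BR_2 \Rightarrow TB_1 < TB_2$ under CBR can be illustrated with a small Python sketch. It only computes the average bit budget per frame, assuming MPEG-1 Layer III frames of 1152 samples; the actual per-granule target (targ_bits) used inside an encoder also depends on side information and the bit reservoir, which this sketch ignores. The function name is hypothetical.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III frame length (assumption of this sketch)


def average_bits_per_frame(bit_rate_kbps: float, sample_rate_hz: float = 44100.0) -> float:
    """Average number of bits available per frame in CBR mode."""
    frames_per_second = sample_rate_hz / SAMPLES_PER_FRAME
    return bit_rate_kbps * 1000.0 / frames_per_second


if __name__ == "__main__":
    for rate in (32, 64, 128):
        print(rate, "kbps ->", round(average_bits_per_frame(rate), 2), "bits per frame")
```

At a fixed sampling frequency the budget grows linearly with the bit rate, so a higher second bit rate always gives the second encoder more bits per frame to spend.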

4.2 Probability of MDCT Coefficients of Values ±1

Fake-quality MP3 contains fewer small values because it has been compressed twice and the second bit rate $BR_2$ is larger than the first bit rate $BR_1$. Since the quantized MDCT coefficient is single-compressed for normal MP3 and double-compressed for fake-quality MP3, we denote them as $ix_S$ and $ix_D$, respectively. We now compare $ix_S$ with $ix_D$, where the quantization factors of the double compression are $q_1$ and $q_2$, respectively. For brevity, we only analyze the case where the MDCT coefficient $xr$ is non-negative, because the analysis is similar when $xr$ is negative.

The double compression consists of three steps. First, the MDCT coefficient is quantized as $Q_{q_1}(xr) = [(2^{q_1} \cdot xr)^{\frac{3}{4}}]$ with the first quantization factor $q_1$; second, the coefficient value becomes $Q_{q_1}^{-1}(Q_{q_1}(xr)) = [(2^{q_1} \cdot xr)^{\frac{3}{4}}]^{\frac{4}{3}} \cdot 2^{-q_1}$ when de-quantized with $q_1$; third, the coefficient value becomes $[[(2^{q_1} \cdot xr)^{\frac{3}{4}}] \cdot 2^{\frac{3}{4}(q_2 - q_1)}]$ during the second quantization with factor $q_2$. In summary, the MDCT coefficient of the fake-quality MP3 is double-compressed and can be expressed as follows.

$$ix_D = Q_{q_2}(Q_{q_1}^{-1}(Q_{q_1}(xr))) = \left[\left(2^{q_2} \cdot [(2^{q_1} \cdot xr)^{\frac{3}{4}}]^{\frac{4}{3}} \cdot 2^{-q_1}\right)^{\frac{3}{4}}\right] = \left[[(2^{q_1} \cdot xr)^{\frac{3}{4}}] \cdot 2^{\frac{3}{4}(q_2 - q_1)}\right] \tag{4}$$

For a meaningful comparison, we use the identical quantization factor $q_2$ for the single-compressed coefficient $ix_S$. That is,

$$ix_S = Q_{q_2}(xr) = [(2^{q_2} \cdot xr)^{\frac{3}{4}}] \tag{5}$$

We denote $P_D = P\{ix_D = 1\}$ and $P_S = P\{ix_S = 1\}$. Then we can prove that $P_D < P_S$.

$$P_S = P\{ix_S = 1\} = P\{[(2^{q_2} \cdot xr)^{\frac{3}{4}}] = 1\} = P\{0.5 \le (2^{q_2} \cdot xr)^{\frac{3}{4}} < 1.5\} \tag{6}$$

Since $BR_1 < BR_2$ holds for fake-quality MP3, from the conclusion of the previous subsection we know that $q_1$ and $q_2$ are positive integers, that $q_1 \le q_2$ for all scale-factor bands, i.e., for each MDCT coefficient, and that $P\{q_1 = q_2\} < 1$ for any frame. Hence $2^{\frac{3}{4}(q_2 - q_1)} \ge 1$. Further, we can show that

$$\begin{aligned}
P_D &= P\{ix_D = 1\} \\
&= P\{[[(2^{q_1} \cdot xr)^{\frac{3}{4}}] \cdot 2^{\frac{3}{4}(q_2 - q_1)}] = 1\} \\
&= P\{0.5 \le [(2^{q_1} \cdot xr)^{\frac{3}{4}}] \cdot 2^{\frac{3}{4}(q_2 - q_1)} < 1.5\} \\
&= P\{[(2^{q_1} \cdot xr)^{\frac{3}{4}}] = 1 \cap 1 \le 2^{\frac{3}{4}(q_2 - q_1)} < 1.5\} \\
&= P\{0.5 \le (2^{q_1} \cdot xr)^{\frac{3}{4}} < 1.5 \cap 0 \le q_2 - q_1 < 0.80\} \\
&= P\{0.5 \le (2^{q_1} \cdot xr)^{\frac{3}{4}} < 1.5 \cap q_2 = q_1\} \\
&= P\{0.5 \le (2^{q_2} \cdot xr)^{\frac{3}{4}} < 1.5 \cap q_2 = q_1\} \\
&= P_S \cdot P\{q_2 = q_1\} < P_S
\end{aligned} \tag{7}$$

In the above, note that the event $0 \le q_2 - q_1 < 0.80$ is equivalent to the event $q_2 = q_1$, because $q_1$ and $q_2$ are both non-negative integers. Since $0.5 \le (2^{q_2} \cdot xr)^{\frac{3}{4}} < 1.5$ describes an event of the single-compression case, while $q_2 = q_1$ describes an event of the double-compression case, these two events are independent. The final inequality follows from $P\{q_2 = q_1\} < 1$. Therefore, we have proved that $P_D < P_S$ for MDCT coefficients of value +1. In a similar way, we can prove that $P\{ix_D = -1\} = P\{ix_S = -1\} \cdot P\{q_2 = q_1\}$ for value −1.
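The argument above can be checked numerically. The sketch below evaluates the single-compression and double-compression quantizers on a grid of small positive coefficients and counts how often each yields the value 1. It assumes round-half-up for $[\cdot]$, integer quantization factors, and an arbitrary illustrative grid of inputs; the helper names are hypothetical.

```python
import math


def rnd(x: float) -> int:
    """Nearest-integer rounding for non-negative x (ties go up; an assumption)."""
    return math.floor(x + 0.5)


def ix_single(xr: float, q2: int) -> int:
    """Single compression with factor q2, as in Eq. (5)."""
    return rnd((2.0 ** q2 * xr) ** 0.75)


def ix_double(xr: float, q1: int, q2: int) -> int:
    """Double compression with factors q1 then q2, as in Eq. (4)."""
    first = rnd((2.0 ** q1 * xr) ** 0.75)
    return rnd(first * 2.0 ** (0.75 * (q2 - q1)))


def count_ones(q1: int, q2: int, samples):
    """Return (#samples with ix_S == 1, #samples with ix_D == 1)."""
    single = sum(1 for xr in samples if ix_single(xr, q2) == 1)
    double = sum(1 for xr in samples if ix_double(xr, q1, q2) == 1)
    return single, double


# Illustrative grid of non-negative coefficients in (0, 0.02].
SAMPLES = [k / 100000.0 for k in range(1, 2001)]

if __name__ == "__main__":
    print("q1 = q2 = 8:", count_ones(8, 8, SAMPLES))
    print("q1 = 7, q2 = 8:", count_ones(7, 8, SAMPLES))
```

When $q_1 = q_2$ the two counts coincide, and as soon as $q_2 > q_1$ the double-compressed coefficient can no longer equal 1, because the smallest nonzero intermediate value 1 is scaled by at least $2^{3/4} \approx 1.68$ and rounds to 2.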

4.3 Proposed Method to Detect Fake-Quality MP3

We now present our proposed method to detect fake-quality MP3. Given an input MP3 bitstream under scrutiny that contains a total of $M$ frames, denote the frames as $F_i$ ($1 \le i \le M$). The process of detecting fake-quality MP3 is as follows.

1. For each frame $F_i$, perform Huffman decoding to obtain the 576 quantized MDCT coefficients, denoted in vector form as $xr_i$;
2. Count the coefficients with values ±1 in $xr_i$ and denote the count by $m_i$ ($1 \le i \le M$);
3. Compute $v = \frac{1}{M} \sum_{i=1}^{M} m_i$ and use it as the feature;
4. Compare $v$ with a threshold $Th$: if $v < Th$, decide that the MP3 is fake-quality; otherwise decide that it is normal.

The threshold $Th$ should depend only on the bit rate of the given MP3. That is, if the MP3 file is fake-quality, the selected threshold should work no matter what bit rate the fake-quality MP3 file is transcoded from. The details are discussed in Section 5.
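The detection steps above can be sketched in Python as follows. The sketch assumes the Huffman decoding has already been done by some MP3 decoder and that each frame is available as a sequence of quantized integer coefficients; the function names and the toy input are hypothetical, and no particular decoder library is implied.

```python
from typing import Sequence


def count_unit_coefficients(frame: Sequence[int]) -> int:
    """Step 2: number of quantized MDCT coefficients equal to +1 or -1 in one frame."""
    return sum(1 for c in frame if abs(c) == 1)


def clip_feature(frames: Sequence[Sequence[int]]) -> float:
    """Step 3: frame-average number of +/-1 coefficients, v = (1/M) * sum(m_i)."""
    if not frames:
        raise ValueError("at least one decoded frame is required")
    return sum(count_unit_coefficients(f) for f in frames) / len(frames)


def is_fake_quality(frames: Sequence[Sequence[int]], threshold: float) -> bool:
    """Step 4: decide fake-quality when the feature v is below the threshold Th."""
    return clip_feature(frames) < threshold


if __name__ == "__main__":
    # Toy frames with 8 coefficients each instead of 576, for illustration only.
    toy_frames = [[0, 1, -1, 2, 0, 0, 1, 5], [3, 0, 0, -1, 0, 0, 0, 0]]
    print(clip_feature(toy_frames))          # 2.0
    print(is_fake_quality(toy_frames, 3.0))  # True
```

Averaging over frames keeps the feature independent of clip length, which is consistent with the use of a single threshold per bit rate.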

5. EXPERIMENTAL WORK

Two pieces of experimental work have been conducted. In the first, we created 687 short MP3 clips of 5 sec each at bit rates of 32, 40, 48, 56, 64, 96, and 128 kbps. Then, the short MP3 clips with bit rates of 32, 40, 48, 56, and 64 kbps were each decompressed and compressed again to 128, 96, and 64 kbps. These double-compressed MP3 clips serve as fake-quality MP3 clips, while the single-compressed MP3 clips of bit rates 128, 96, and 64 kbps serve as the normal MP3 clips in this experiment. Section 5.1 reports whether our proposed method can correctly distinguish the fake-quality MP3 from the normal MP3. Section 5.2 examines whether the proposed method works reliably when the MP3 encoders used in the first and second compressions are different. Similarly, Section 5.3 reports whether the proposed method works when different stereo modes are used in the two compressions.

To see whether our proposed method works for MP3 files of normal length, we conducted the second piece of experimental work. From 400 low bit rate MP3 files, with lengths varying from 29 seconds to 625 seconds, downloaded from a popular web site, we used publicly available software to create various fake-quality MP3 files. From CD recordings, we generated 400 normal MP3 files with lengths varying from 150 sec to 663 sec. Then we applied our method to see whether it can correctly detect fake-quality and normal MP3 files. This experiment is reported in Section 5.4.

According to the statistics discussed in Section 3, we determine the thresholds on the number of MDCT coefficients of values ±1 to discriminate normal MP3 from fake MP3 for different bit rates. We denote the 5th percentile of normal MP3 as $Th_h$ and the 95th percentile of fake-quality MP3 as $Th_l$. The threshold $Th$ is set as

$$Th = \frac{Th_h + \max(Th_l)}{2}$$

where $\max(Th_l)$ represents the maximum 95th percentile among all fake-quality MP3 files of the same second bit rate. A false positive error means that a normal MP3 file is determined to be fake-quality, while a false negative error means that a fake-quality MP3 file is recognized as normal. We denote the numbers of false positive errors and false negative errors as $f_p$ and $f_n$, respectively. The accuracy rate $AR$ is calculated as

$$AR = \frac{2K - (f_p + f_n)}{2K} \times 100\%$$

where $K$ is the number of test MP3 files.
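The threshold rule and the accuracy rate can be written as two short Python helpers. This is a minimal sketch; the function names are hypothetical, and the example values are the 128 kbps percentiles from Table 1 and the error counts reported for 128 kbps in Section 5.1.

```python
from typing import Iterable


def midpoint_threshold(th_h: float, th_l_values: Iterable[float]) -> float:
    """Th = (Th_h + max(Th_l)) / 2.

    th_h is the 5th percentile of normal MP3; th_l_values are the 95th
    percentiles of the fake-quality cases sharing the same second bit rate.
    """
    return (th_h + max(th_l_values)) / 2.0


def accuracy_rate(k: int, fp: int, fn: int) -> float:
    """AR = (2K - (fp + fn)) / (2K) * 100, with K normal and K fake-quality files."""
    return (2 * k - (fp + fn)) / (2 * k) * 100.0


if __name__ == "__main__":
    th = midpoint_threshold(80.06, [32.10, 22.77, 32.82, 52.35, 36.02])
    print(th)                                   # about 66.2
    print(round(accuracy_rate(687, 17, 0), 2))  # 98.76
```

The factor 2K in the denominator reflects that each test involves K normal files and K fake-quality files.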

5.1 Short MP3 Clips

The 687 WAV clips of 5 sec each used to create the MP3 clips were downloaded from two publicly available uncompressed audio databases [12, 13]. All these WAV clips are 22.05 kHz (sampling frequency), 16 bit, mono (stereo mode). All WAV clips are first encoded by LAME v3.97 at five low bit rates, namely 32, 40, 48, 56 and 64 kbps, which are widely used for low-quality MP3. CBR mode is used and the stereo mode is mono. Since 128 kbps, 96 kbps and 64 kbps are widely used on the Internet for mono mode, we transcode the low bit rate MP3 clips to each of these three bit rates.

We set the threshold for 128 kbps as $Th = \frac{80.06 + 52.35}{2} = 66.20$, where 80.06 is the 5th percentile of normal 128 kbps MP3 and 52.35 is the largest 95th percentile of fake-quality MP3 (refer to Table 1). Note that the threshold for 128 kbps MP3 files is always set to this value, no matter what bit rates the files are transcoded from. The test results for the 128 kbps case are shown in Table 3. As seen, all the fake-quality MP3 files are under the threshold. Hence, all $f_n$ are equal to 0. However, 17 normal MP3 files are decided to be fake-quality MP3 files. The detection accuracy is above 98% for all tests, no matter which bit rate the fake-quality MP3 is generated from.

Table 3: Detection results of fake-quality MP3s of 128 kbps

| Bit Rate | Th_l | Th | fn | fp | AR (%) |
|---|---|---|---|---|---|
| 32-128 | 32.09 | 66.20 | 0 | 17 | 98.76 |
| 40-128 | 22.76 | 66.20 | 0 | 17 | 98.76 |
| 48-128 | 32.81 | 66.20 | 0 | 17 | 98.76 |
| 56-128 | 52.35 | 66.20 | 0 | 17 | 98.76 |
| 64-128 | 36.02 | 66.20 | 0 | 17 | 98.76 |

To show the detection performance in a more general way, we also calculate the receiver operating characteristic (ROC) curves of detecting fake-quality MP3, shown in Fig. 9. The ROC curves are calculated by changing the threshold used for the decision. Note that Fig. 9 only displays the ROC curves for false positive rates up to 0.1 because the ROC curves remain horizontal for false positive rates from 0.1 to 1.

Figure 9: ROC curves of detecting MP3s of 128 kbps (horizontal axis: false positive rate; vertical axis: true positive rate; curves: 32k-128k, 40k-128k, 48k-128k, 56k-128k, 64k-128k)

For the cases of 96 kbps and 64 kbps, as shown in Table 4, we set the thresholds in the same way as $Th = 72.95$ for 96 kbps and $Th = 54.09$ for 64 kbps. All the fake-quality MP3 files can be detected using these thresholds, although a few normal MP3 files are misclassified.

Table 4: Detection results of MP3s of 96 kbps and 64 kbps

| Bit Rate | Th_l | Th | fn | fp | AR (%) |
|---|---|---|---|---|---|
| 32-96 | 36.73 | 72.95 | 0 | 25 | 98.18 |
| 40-96 | 9.85 | 72.95 | 0 | 25 | 98.18 |
| 48-96 | 10.44 | 72.95 | 0 | 25 | 98.18 |
| 56-96 | 47.12 | 72.95 | 0 | 25 | 98.18 |
| 64-96 | 55.56 | 72.95 | 0 | 25 | 98.18 |
| 32-64 | 9.79 | 54.09 | 0 | 1 | 99.93 |
| 40-64 | 12.01 | 54.09 | 0 | 1 | 99.93 |
| 48-64 | 32.49 | 54.09 | 0 | 1 | 99.93 |
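Since the ROC curves are obtained by sweeping the decision threshold, the procedure can be sketched in Python as below. The sketch treats fake-quality MP3 as the positive class, consistent with the definitions of false positive and false negative in this section; the function name and the toy feature values are hypothetical and are not the experimental data.

```python
from typing import List, Sequence, Tuple


def roc_points(normal_features: Sequence[float],
               fake_features: Sequence[float]) -> List[Tuple[float, float]]:
    """Sweep the threshold Th and return (false positive rate, true positive rate) pairs.

    A clip is declared fake-quality when its feature v satisfies v < Th.
    """
    thresholds = sorted(set(normal_features) | set(fake_features))
    thresholds.append(thresholds[-1] + 1.0)  # final threshold declares every clip fake
    points = []
    for th in thresholds:
        tpr = sum(1 for v in fake_features if v < th) / len(fake_features)
        fpr = sum(1 for v in normal_features if v < th) / len(normal_features)
        points.append((fpr, tpr))
    return points


if __name__ == "__main__":
    normal = [90.0, 100.0, 110.0]  # toy frame-average counts for normal clips
    fake = [20.0, 30.0, 40.0]      # toy frame-average counts for fake-quality clips
    for point in roc_points(normal, fake):
        print(point)
```

When the two feature populations do not overlap, as in the toy data, the curve passes through the point (0, 1), which corresponds to a threshold that separates the classes perfectly.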

5.2 Different MP3 Encoders

Often, the MP3 encoder of the second compression is different from the one used in the first compression. According to [14], different MP3 encoders have different statistical characteristics, which leads to differences in the final bit streams. Hence, the influence of different MP3 encoders on our method should be examined. All 687 WAV clips of 5 sec each are encoded by LAME v3.97 at five low bit rates, namely 32, 40, 48, 56 and 64 kbps. The sampling frequency is 44.1 kHz, and the stereo mode is mono. The MP3 files generated by LAME v3.97 are converted to 128 kbps by CoolEdit v2.1 with the specification "128 kbps, 44100Hz, Mono(5.5:1)", which applies an MP3 encoder different from LAME. For consistency, the threshold is again set to 66.20. Table 5 contains the results. Compared with Table 3, different MP3 encoders do not influence the performance of our method when the first bit rate is not higher than 56 kbps, but slightly affect the detection when the first bit rate is 64 kbps. This is because different MP3 encoders determine the quantization factor somewhat differently.

Table 5: Detection results of fake-quality MP3s generated from different encoder

| Bit Rate | Th_l | Th | fn | fp | AR (%) |
|---|---|---|---|---|---|
| 32-128 | 24.61 | 66.20 | 0 | 17 | 98.76 |
| 40-128 | 27.03 | 66.20 | 0 | 17 | 98.76 |
| 48-128 | 31.41 | 66.20 | 0 | 17 | 98.76 |
| 56-128 | 48.64 | 66.20 | 0 | 17 | 98.76 |
| 64-128 | 63.69 | 66.20 | 18 | 17 | 97.45 |

5.3 Different Stereo Modes

MP3 files support different stereo modes, and typically the music quality of joint stereo mode is better than that of mono mode at the same bit rate. Hence, converting low-quality mono files to joint stereo mode is widely used in making fake-quality MP3. We transcode all 687 mono MP3 clips from the five low bit rates to 128 kbps in joint stereo mode, while the sampling frequency is kept at 44.1 kHz. The same threshold of 66.20 is applied. The detection results are shown in Table 6. As observed, $f_n$ increases for the 64-128 kbps case. This can be attributed to the use of two different stereo modes in transcoding. However, all accuracy rates are above 97% even under different stereo modes.

Table 6: Detection results of MP3s with stereo mode changed

| Bit Rate | Th_l | Th | fn | fp | AR (%) |
|---|---|---|---|---|---|
| 32-128 | 9.64 | 66.20 | 0 | 17 | 98.76 |
| 40-128 | 24.09 | 66.20 | 0 | 17 | 98.76 |
| 48-128 | 29.15 | 66.20 | 0 | 17 | 98.76 |
| 56-128 | 53.23 | 66.20 | 0 | 17 | 98.76 |
| 64-128 | 64.35 | 66.20 | 20 | 17 | 97.31 |

Table 7: Detection results of fake-quality stereo MP3s generated from 64 kbps

| Bit Rate | Th_l | Th_h | Th | fn | fp | AR (%) |
|---|---|---|---|---|---|---|
| 64-128 | 50.56 | 64.60 | 57.58 | 0 | 4 | 99.50 |
| 64-192 | 32.14 | 51.94 | 42.04 | 9 | 1 | 98.75 |
| 64-256 | 34.42 | 38.40 | 36.41 | 10 | 9 | 97.62 |

5.4 Stereo MP3 Files From Internet

The above experimental results show that the proposed method performs well in detecting fake-quality MP3 of short length (5 sec each). However, in practice, MP3 files with commercial value are often longer than 150 sec. Hence, it is necessary to test whether our proposed method works for long MP3 clips, referred to as files. For this purpose, 400 low bit rate MP3 files with lengths ranging from 29 sec to 625 sec were downloaded from www.archive.org [15], which provides a large amount of legal and free MP3. These MP3 files are 44.1 kHz, stereo, CBR mode, and cover genres such as blues, pop, classical, and jazz. We use GoldWave v5.06 to transcode all of these 400 low bit rate (specifically 64 kbps) MP3 files to the three bit rates most widely used by online music stores, i.e., 256, 192, and 128 kbps. The specified format is "layer-3, 44.1 kHz, stereo". In addition to these fake-quality MP3 files, we also generate 400 real high-quality MP3 files at these three bit rates from CD recordings, including albums by Celine Dion and piano music by Beethoven. These high-quality MP3 files are used as the negative class in detecting fake-quality MP3. The experimental results are shown in Table 7. As shown, the performance on stereo MP3 files of normal length obtained from the Internet is similar to that on MP3 clips of 5 sec each, and all accuracy rates are above 97%. This is due to using the frame-average number of values ±1 as the feature. It appears that the length of the MP3 file does not affect the performance much, because an MP3 file of 5 sec contains 386 frames, which is enough to make the average number of values ±1 statistically stable.

6. DISCUSSIONS AND CONCLUSIONS

6.1 Selection of Small Values Other than ±1

From Fig. 5, it is observed that the histograms of normal MP3 and fake-quality MP3 differ not only for values ±1 but also for other small values, such as ±2 and ±3. Why, then, do we select only values ±1 as features in the above discussion and experiments? The reason is that the difference between $P\{ix_D = N\}$ and $P\{ix_S = N\}$ becomes smaller as $N$ increases (refer to the Appendix). For the case of $N = 2$, it is proved in the Appendix that $P\{ix_D = 2\} > P\{ix_S = 2\} \cdot (P\{q_2 - q_1 = 1\} + P\{q_2 = q_1\})$. Fig. 10 illustrates the distributions of the numbers of values ±2 for 200 MP3 clips of 128 kbps. We observe that the numbers of values ±2 for normal MP3 are closer to those for fake-quality MP3 than is the case for values ±1 shown in Fig. 6, although normal MP3 still contains more values ±2. This matches our theoretical analysis, i.e.,

$$\frac{P\{ix_D = 1\}}{P\{ix_S = 1\}} = P\{q_2 = q_1\} < \frac{P\{ix_D = 2\}}{P\{ix_S = 2\}} < 1.$$

[Figure 10 plots, titled "Distributions comparison between normal MP3 and fake MP3". First panel: frame-average number of values ±2 (axis 0 to 120) for each of 200 audio clips, with series for normal 128 kbps and for 32, 40, 48, 56, and 64 kbps transcoded to 128 kbps. Second panel: average number of values ±2 (axis 0 to 120) for the same six classes.]

Figure 10: Distributions of numbers of coefficients with values ±2 for MP3 of 128 kbps

As observed, it is difficult to detect fake-quality MP3 with a 100% accuracy rate using only the number of values ±1 as the feature. We believe that the numbers of values ±2, and even ±3, can be added as features when applying machine learning to detect fake-quality MP3 with better performance.
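As a rough sketch of the extended feature set suggested above, the frame-average counts of values ±1, ±2, and ±3 can be stacked into one vector per file and handed to any classifier. The function name `small_value_features`, the input layout (one list of quantized coefficients per frame), and the choice of three values are assumptions for illustration; the paper does not specify a classifier.

```python
from collections import Counter

def small_value_features(frames, max_value=3):
    """Return the frame-average counts of values +/-1, +/-2, ..., +/-max_value."""
    if not frames:
        raise ValueError("at least one frame is required")
    # Count coefficients by absolute value over the whole file.
    counts = Counter(abs(c) for frame in frames for c in frame)
    return [counts[v] / len(frames) for v in range(1, max_value + 1)]

# Toy frames of quantized coefficients (illustrative values, not real MP3 data).
frames = [[0, 1, -1, 2, 0, 1], [3, -1, 0, 0, 1, -2]]
print(small_value_features(frames))  # [2.5, 1.0, 0.5]
```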

6.2 Threshold Determination

In the above experiments, the threshold $Th$ is always set as $Th = \frac{Th_h + \max(Th_l)}{2}$. Using this threshold, almost all fake-quality MP3s can be detected with little error. However, some normal MP3s (less than 3%) are classified as fake-quality MP3s because they contain fewer MDCT coefficients of values ±1 than the threshold. As observed from Tables 3 to 7, almost all the $Th_l$ values are far from $Th$. This allows us to set the threshold lower than $\frac{Th_h + \max(Th_l)}{2}$ in situations where a very low fp is required. For instance, if fp = 0 is required for some legal consideration, we can lower $Th$ to achieve it, i.e., fp = 0 and fn ≠ 0.
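The threshold rule and the zero-fp adjustment can be illustrated with a short sketch. It assumes that a file is flagged as fake-quality when its feature is strictly below Th, so fp = 0 holds whenever Th does not exceed the smallest feature value among the normal files. The helper names and the toy feature values are hypothetical; only the two numbers passed to `midpoint_threshold` come from Table 7.

```python
def midpoint_threshold(th_h, th_l_values):
    """Th = (Th_h + max(Th_l)) / 2."""
    return (th_h + max(th_l_values)) / 2

def zero_fp_threshold(th, normal_features):
    """Lower Th just enough that no normal file falls strictly below it."""
    return min(th, min(normal_features))

def count_errors(th, fake_features, normal_features):
    fn = sum(1 for f in fake_features if f >= th)   # fakes that pass as normal
    fp = sum(1 for f in normal_features if f < th)  # normals flagged as fake
    return fn, fp

th = midpoint_threshold(64.60, [50.56])  # 64-128 row of Table 7
print(round(th, 2))                      # 57.58

# Toy feature values (not measured data).
fake, normal = [30.0, 41.5, 56.5], [56.0, 70.2, 88.9]
print(count_errors(th, fake, normal))                             # (0, 1)
print(count_errors(zero_fp_threshold(th, normal), fake, normal))  # (1, 0)
```

In the toy data, lowering the threshold removes the single false positive but lets one fake file through, which is the trade-off described above.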

6.3 Computation Complexity Analysis

The proposed method is based on statistics of the quantized MDCT coefficients, so it only needs to perform Huffman decoding on the bitstream of the given MP3 file, which is just the first step of MP3 decompression (refer to Fig. 2(b)). Compared with the inverse MDCT and the synthesis filterbank, Huffman decoding takes only a small fraction of the time of the whole decompression process. Hence, given an MP3, we can finish the detection before it is fully decompressed.

6.4 Limitations and Future Works

The first limitation is that we have investigated only the case in which the second compression is sample-synchronized with the first compression. Our theoretical analysis is also based on sample-synchronization. However, a sophisticated adversary may purposely destroy the synchronization by, say, the following steps: first, decode the low bit rate MP3 to WAV format; second, prepend some zero samples to the beginning of the WAV; finally, encode the WAV with a high bit rate. Up-sampling may also destroy sample-synchronization. Therefore, the issue of sample-synchronization is a subject for further research.

Second, only CBR is analyzed in this paper; variable bit rate (VBR) has not been considered. With VBR, the target bits are no longer the same for each frame. Some frames of a high bit rate VBR MP3 may not have more target bits than the corresponding frames of the low bit rate VBR version, so the condition $q_1 \le q_2$ sometimes fails for VBR. Since VBR is sometimes used in online music stores, a dedicated method for VBR needs to be developed.
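To see why prepended zero samples break sample-synchronization, consider the frame grid of the second encoding. The sketch below assumes MPEG-1 Layer III frames of 1152 samples per channel and ignores encoder delay; `prepend_zeros` and `frame_offset` are hypothetical helpers, and the sample values are toy data.

```python
def prepend_zeros(samples, n_zeros):
    """Return a copy of the decoded samples with n_zeros silent samples in front."""
    return [0] * n_zeros + list(samples)

def frame_offset(n_zeros, frame_size=1152):
    """Shift of the second encoder's frame grid relative to the first (0 = aligned)."""
    return n_zeros % frame_size

decoded = [5, -3, 8, 1]           # toy PCM samples, not real audio
print(prepend_zeros(decoded, 3))  # [0, 0, 0, 5, -3, 8, 1]
print(frame_offset(100))          # 100
print(frame_offset(2304))         # 0
```

Any padding length that is not a multiple of the assumed frame size leaves a nonzero offset, which is the situation not covered by the analysis in this paper.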

6.5 Conclusions

In this paper, we investigate the number of small-value quantized MDCT coefficients in MP3. We find that the distribution of the number of these small-value coefficients for normal MP3 is clearly different from that for fake-quality MP3. A theoretical proof is given, indicating the inherent nature of this finding. We then use this characteristic to detect fake-quality MP3. Our experiments show that the accuracy rates in detecting fake-quality MP3 are all over 97% for all commonly used bit rates. Finally, the limitations and future research directions are pointed out and discussed.

7. REFERENCES

[1] Understanding online music stores, http://reviews.cnet.com/online-music-store-guide/
[2] Napster Free - Listen to free streaming music online, http://free.napster.com/
[3] IFPI DIGITAL MUSIC REPORT 2009, http://www.ifpi.org/content/library/DMR2009.pdf
[4] Tau Analyzer - CD Authenticity Detector, http://true-audio.com/Tau_Analyzer_-_CD_Authenticity_Detector
[5] Fake MP3 Detector, http://www.gold-software.com/download9192.html
[6] J. Fridrich and J. Lukáš, "Estimation of primary quantization matrix in double compressed JPEG images," in Proc. DFRW'03, Cleveland, (2003).
[7] W. Wang and H. Farid, "Exposing digital forgeries in video by detecting double MPEG compression," in Proc. ACM MMSEC'06, New York, (2006).
[8] B. Li, Y. Q. Shi, and J. Huang, "Detecting Double Compressed JPEG Images by Using Mode Based First Digit Features," in Proc. MMSP'08, Queensland, Australia, October 2008.
[9] W. Chen and Y. Q. Shi, "Detection of double MPEG video compression using first digits statistics," in Proc. IWDW08, Busan, Korea, November 2008.
[10] ISO/IEC International Standard IS 11172-3, "Information technology - Coding of moving pictures and associated audio for digital storage media up to about 1.5 Mbit/s", (1992).
[11] LAME 3.97: MP3 encoder, http://lame.sourceforge.net
[12] http://soundlab.cs.princeton.edu
[13] http://www.ee.columbia.edu/~dpwe/sounds/musp/scheislan.html
[14] R. Boehm and A. Westfeld, "Statistical Characterisation of MP3 Encoders for Steganalysis," in Proc. ACM MMSEC'04, Magdeburg, Germany, (2004).
[15] http://www.archive.org/details/audio_music

APPENDIX

A. PROBABILITY OF SMALL-VALUE MDCT COEFFICIENTS

The notations $ix_D$, $ix_S$, $q_1$, $q_2$, and $xr$ have been defined in Section 4.

(I) The case for value 2 is as follows:

$$
\begin{aligned}
P\{ix_D = 2\} &= P\{[[(2^{q_1} \cdot xr)^{\frac{3}{4}}] \cdot 2^{\frac{3}{4}(q_2 - q_1)}] = 2\} \\
&= P\{([(2^{q_1} \cdot xr)^{\frac{3}{4}}] = 1 \cap 1.5 \le 2^{\frac{3}{4}(q_2 - q_1)} < 2.5) \cup ([(2^{q_1} \cdot xr)^{\frac{3}{4}}] = 2 \cap 1 \le 2^{\frac{3}{4}(q_2 - q_1)} < 1.25)\} \\
&= P\{([(2^{q_1} \cdot xr)^{\frac{3}{4}}] = 1 \cap 0.80 \le q_2 - q_1 < 1.76) \cup ([(2^{q_1} \cdot xr)^{\frac{3}{4}}] = 2 \cap 0 \le q_2 - q_1 < 0.43)\} \\
&= P\{[(2^{q_1} \cdot xr)^{\frac{3}{4}}] = 1 \cap q_2 - q_1 = 1\} + P\{[(2^{q_1} \cdot xr)^{\frac{3}{4}}] = 2 \cap q_2 - q_1 = 0\} \\
&= P\{[(2^{q_2 - 1} \cdot xr)^{\frac{3}{4}}] = 1\} \cdot P\{q_2 - q_1 = 1\} + P\{[(2^{q_2} \cdot xr)^{\frac{3}{4}}] = 2\} \cdot P\{q_2 - q_1 = 0\} \\
&= P\{[(2^{q_2} \cdot xr)^{\frac{3}{4}} \cdot 2^{-\frac{3}{4}}] = 1\} \cdot P\{q_2 - q_1 = 1\} + P\{[(2^{q_2} \cdot xr)^{\frac{3}{4}}] = 2\} \cdot P\{q_2 = q_1\} \\
&= P\{0.84 \le (2^{q_2} \cdot xr)^{\frac{3}{4}} < 2.52\} \cdot P\{q_2 - q_1 = 1\} + P\{[(2^{q_2} \cdot xr)^{\frac{3}{4}}] = 2\} \cdot P\{q_2 = q_1\} \\
&< P\{0.5 \le (2^{q_2} \cdot xr)^{\frac{3}{4}} < 2.5\} \cdot P\{q_2 - q_1 = 1\} + P\{[(2^{q_2} \cdot xr)^{\frac{3}{4}}] = 2\} \cdot P\{q_2 = q_1\} \\
&= (P\{ix_S = 1\} + P\{ix_S = 2\}) \cdot P\{q_2 - q_1 = 1\} + P\{ix_S = 2\} \cdot P\{q_2 = q_1\} \\
&= P\{ix_S = 2\} \cdot (P\{q_2 = q_1\} + P\{q_2 - q_1 = 1\}) + P\{ix_S = 1\} \cdot P\{q_2 - q_1 = 1\} \qquad (8)
\end{aligned}
$$

$$
\begin{aligned}
P\{ix_D = 2\} &= P\{0.84 \le (2^{q_2} \cdot xr)^{\frac{3}{4}} < 2.52\} \cdot P\{q_2 - q_1 = 1\} + P\{[(2^{q_2} \cdot xr)^{\frac{3}{4}}] = 2\} \cdot P\{q_2 = q_1\} \\
&> P\{1.5 \le (2^{q_2} \cdot xr)^{\frac{3}{4}} < 2.5\} \cdot P\{q_2 - q_1 = 1\} + P\{[(2^{q_2} \cdot xr)^{\frac{3}{4}}] = 2\} \cdot P\{q_2 = q_1\} \\
&= P\{[(2^{q_2} \cdot xr)^{\frac{3}{4}}] = 2\} \cdot P\{q_2 - q_1 = 1\} + P\{[(2^{q_2} \cdot xr)^{\frac{3}{4}}] = 2\} \cdot P\{q_2 = q_1\} \\
&= P\{ix_S = 2\} \cdot (P\{q_2 - q_1 = 1\} + P\{q_2 = q_1\}) \qquad (9)
\end{aligned}
$$

(II) The case for value N:

$$
\begin{aligned}
P\{ix_D = N\} &= P\{[[(2^{q_1} \cdot xr)^{\frac{3}{4}}] \cdot 2^{\frac{3}{4}(q_2 - q_1)}] = N\} \\
&= P\{([(2^{q_1} \cdot xr)^{\frac{3}{4}}] = N \cap \tfrac{N - 0.5}{N} \le 2^{\frac{3}{4}(q_2 - q_1)} < \tfrac{N + 0.5}{N}) \\
&\qquad \cup ([(2^{q_1} \cdot xr)^{\frac{3}{4}}] = N - 1 \cap \tfrac{N - 0.5}{N - 1} \le 2^{\frac{3}{4}(q_2 - q_1)} < \tfrac{N + 0.5}{N - 1}) \cup \cdots \\
&\qquad \cup ([(2^{q_1} \cdot xr)^{\frac{3}{4}}] = 1 \cap \tfrac{N - 0.5}{1} \le 2^{\frac{3}{4}(q_2 - q_1)} < \tfrac{N + 0.5}{1})\} \\
&= \sum_{n=1}^{N} \; \sum_{i = \frac{4}{3} \log_2 \frac{N - 0.5}{n}}^{\frac{4}{3} \log_2 \frac{N + 0.5}{n}} P\{[(2^{q_2 - i} \cdot xr)^{\frac{3}{4}}] = n\} \cdot P\{q_2 - q_1 = i\} \qquad (10)
\end{aligned}
$$
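The case analysis in this appendix can be checked numerically. The sketch below enumerates, for a given N, the pairs (n, i) with integer i = q2 - q1 >= 0 that satisfy (N - 0.5)/n <= 2^(3i/4) < (N + 0.5)/n, i.e., the combinations of first-stage value n and quantization gap i that yield the value N after the second compression. The function name `surviving_terms` and the search bound `max_i` are assumptions made for illustration.

```python
def surviving_terms(N, max_i=64):
    """Pairs (n, i): first-stage value n and gap i = q2 - q1 that produce value N."""
    pairs = []
    for n in range(1, N + 1):
        for i in range(max_i + 1):
            scale = 2 ** (0.75 * i)  # the factor 2^((3/4)(q2 - q1))
            if (N - 0.5) / n <= scale < (N + 0.5) / n:
                pairs.append((n, i))
    return pairs

print(surviving_terms(1))  # [(1, 0)]
print(surviving_terms(2))  # [(1, 1), (2, 0)]
```

For N = 2 the two pairs correspond to the two terms kept in case (I): the first-stage value 1 with q2 - q1 = 1, and the first-stage value 2 with q2 = q1.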