Audio Source Separation Techniques Including Novel Time-Frequency Representation Tools

Author: Katrina Warner
Audio Source Separation Techniques Including Novel Time-Frequency Representation Tools
Pablo Cancela
Doctoral Thesis, Electrical Engineering, Universidad de la República
Thesis Director: Guillermo Sapiro
Academic Director: Gregory Randall

IIE, November 2015

Pablo Cancela (UdelaR, Uruguay)

Audio Source Separation T-F Rep. Tools

Doctoral Thesis

1 / 63

Outline

1 Time-Frequency Analysis
  - Motivation
2 Review of classical tools
  - Sinusoidal based transforms
3 IIR-CQT
4 Fan Chirp Transform
  - Definition and computation
  - Pitch salience estimation
  - Fan Chirp Transform Analysis and Extensions
5 Applications using the Fan Chirp Transform
  - Pitch visualization
  - Pitch Tracking
  - Source separation
  - Query by Humming, Automatic Database Construction
6 Conclusions and Future Work


Time-Frequency representation tools

Motivation: accurately represent multiple sources in a polyphonic mix.

[Figure: Spectrogram, fs = 44100 Hz, window length = 2048 samples. Axes: Time (s), Frequency (kHz).]

Audio: audio 1 pop1.wav

Time-Frequency representation tools

Motivation: extract a singing voice from a polyphonic mix.

[Figure: two spectrograms, "Original, Left Channel (Recording 1)" and "Separated leading voice, Left Channel (Recording 1)". Axes: Time, Frequency (Hz).]

Audio: audio 2 tamy.wav

Time-Frequency representation tools

Goals
• Improve tools for the representation in time and frequency of time-varying harmonic signals.
• Facilitate the task of high-level algorithms (detection, estimation, classification, recognition, etc.).
• Explore applications in polyphonic music.
  - Visualization.
  - Source Separation.
  - Music Information Retrieval applications.


Time-Frequency representations

Difficulties
• Bounded time-frequency resolution (uncertainty principle).
• Challenge in the representation of polyphonic signals.

Goals of alternative representations
• Overcome the limitations of classical time-frequency representations.
• Enhance the representation using a priori knowledge of the signals.


Music signal analysis

Features
• Harmonic structure
• High density of harmonics in low and mid frequencies
• Strong frequency modulation at high frequencies

[Figure: Spectrogram, fs = 44100 Hz, window length = 2048 samples. Axes: Time (s), Frequency (kHz).]

Music signal analysis

Example: harmonic signals
• Can approximate music sounds within short time intervals
• Frequency modulation produces poor resolution for higher harmonics

[Figure: STFT Spectrogram. Axes: Time, Frequency.]
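The second bullet follows from the harmonic structure: the k-th partial sits at k times the fundamental, so any pitch change inside the analysis window is multiplied by k. The sketch below is only illustrative. It assumes a hypothetical pitch glide of 5 Hz across one 2048-sample window at fs = 44100 Hz, and the function name is not from the thesis.

```python
FS = 44100          # sampling rate used in the spectrograms of this deck (Hz)
WINDOW = 2048       # analysis window length (samples)

def harmonic_sweep_in_bins(f0_change_hz, harmonic, fs=FS, window=WINDOW):
    """Frequency excursion of one harmonic inside a window, in DFT bins.

    f0_change_hz is a hypothetical change of the fundamental over the window.
    """
    bin_width_hz = fs / window            # spacing between DFT bins
    sweep_hz = harmonic * f0_change_hz    # the k-th harmonic moves k times as far
    return sweep_hz / bin_width_hz

for k in (1, 5, 10, 20):
    print(f"harmonic {k:2d}: sweeps {harmonic_sweep_in_bins(5.0, k):.2f} bins")
```

Under these assumptions the fundamental stays well inside one bin while the 20th harmonic is spread over several, which is the smearing visible in the upper part of an STFT spectrogram.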


Sinusoidal based transforms

Short Time Fourier Transform (STFT) [Gabor, 1946]
• constant time-frequency resolution
• window length (and shape) determines resolution
• longer windows give:
  - better frequency resolution
  - and worse time resolution

[Figure: time-frequency tilings for the wideband and narrowband cases. Axes: Time, Frequency.]
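The trade-off can be made concrete with a little arithmetic. The sketch below uses DFT bin spacing and window duration as rough proxies for frequency and time resolution; the actual resolution also depends on the window shape. The 2048-sample window at 44100 Hz matches the spectrograms in this deck, while the other two lengths are illustrative assumptions.

```python
def stft_resolution(fs, window_length):
    """Return (bin spacing in Hz, window duration in ms) for one DFT frame."""
    bin_spacing_hz = fs / window_length
    duration_ms = 1000.0 * window_length / fs
    return bin_spacing_hz, duration_ms

for n in (256, 2048, 8192):
    df, dt = stft_resolution(44100, n)
    print(f"N = {n:5d}: bin spacing {df:7.2f} Hz, window {dt:7.2f} ms")
```

The product of the two quantities is the same for every window length, so improving one necessarily degrades the other.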

Sinusoidal based transforms

Wideband spectrogram
• poor spectral resolution
• good time resolution

Narrowband spectrogram
• good spectral resolution
• poor time resolution

Sinusoidal based transforms

Multi-resolution analysis
• computing several DFTs with different window lengths
• examples:
  - Multi-resolution FFT (MRFFT) [Dressler, 2006]
  - Constant-Q Transform (CQT) [Brown, 1991]
  - Wavelets
• better suited for music analysis

[Figure: multi-resolution time-frequency tiling. Axes: Time, Frequency.]
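As a rough illustration of "several DFTs with different window lengths", the sketch below computes the per-bin window lengths of a constant-Q analysis, where the ratio of centre frequency to bandwidth is held fixed. The choice of 12 bins per octave, the 55 Hz starting frequency, and the function name are assumptions made for the example.

```python
import math

def constant_q_window_lengths(fs, f_min, n_bins, bins_per_octave=12):
    """Window length (samples) per bin so that f_k / bandwidth_k stays constant."""
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)   # quality factor
    lengths = []
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)     # geometrically spaced centre frequency
        lengths.append((f_k, math.ceil(q * fs / f_k)))
    return q, lengths

q, lengths = constant_q_window_lengths(44100, 55.0, 25)
for f_k, n_k in lengths[::12]:
    print(f"f = {f_k:6.1f} Hz -> window of {n_k} samples")
```

Each octave halves the window length: long windows resolve closely spaced low partials, and short windows follow fast changes at high frequencies.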

Comparison of Multi-resolution vs STFT

Analysis of a non-stationary harmonic signal
• Improved time-frequency resolution for the multi-resolution approach

[Figure: STFT spectrum and spectrogram. Axes: Time, Frequency.]

[Figure: CQT spectrum and spectrogram. Axes: Time, Frequency.]


IIR-CQT

Contribution: new transform based on time-varying filtering in the frequency domain.

[Figure: three panels, "Zero-pole diagram" (axes: Real Part, Imaginary Part), "Response to shifted deltas" (axes: Frequency (rad), Magnitude), and "Time Window" (axes: Normalized Time (rad), Magnitude).]


IIR-CQT

Proposed Transform Characteristics
Design stage to define the filtering properties: determine the pole p(f) to obtain constant Q.

IIR-CQT

Highlights
• Design flexibility.
  - Restriction: slowly changing quality factor across frequencies.
• Low computational cost.
  - Linear in the number of frequency bins.
• Simple implementation.

    p = design_poles(NFFT, Q);      % pole for each frequency bin, designed for constant Q
    X = fft(fftshift(s));           % spectrum of the centred frame s
    Yp = zeros(1, NFFT/2);          % forward-pass output (Y' on the slide)
    Y = zeros(1, NFFT/2);
    Yp(1) = X(1);
    for i = 2:NFFT/2                % forward pass over frequency bins
      Yp(i) = X(i-1) + X(i) + p(i)*Yp(i-1);
    end
    Y(NFFT/2) = Yp(NFFT/2);
    for i = NFFT/2-1:-1:1           % backward pass over frequency bins
      Y(i) = Yp(i+1) + Yp(i) + p(i)*Y(i+1);
    end

Publication: "An efficient multi-resolution spectral transform for music analysis", ISMIR 2009
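For readers without MATLAB, the same forward-backward recursion can be sketched in plain Python. This is a transcription of the slide's recursion only, not the published implementation. The pole design step is not reproduced, so the pole values used in the example are hypothetical placeholders.

```python
def iir_cqt_smooth(X, p):
    """Forward-backward recursion from the slide, applied along frequency bins.

    X: list of (complex) DFT bins 0 .. NFFT/2 - 1.
    p: list of per-bin pole values, same length as X.
    """
    n = len(X)
    Yp = [0j] * n                       # forward-pass output (Y' on the slide)
    Yp[0] = X[0]
    for i in range(1, n):
        Yp[i] = X[i - 1] + X[i] + p[i] * Yp[i - 1]
    Y = [0j] * n
    Y[n - 1] = Yp[n - 1]
    for i in range(n - 2, -1, -1):
        Y[i] = Yp[i + 1] + Yp[i] + p[i] * Y[i + 1]
    return Y

# Hypothetical input: a single spectral line, smoothed with a constant pole.
spectrum = [0, 0, 1, 0, 0]
print(iir_cqt_smooth(spectrum, [0.5] * 5))
```

Each pass visits every bin once, which is where the linear cost comes from. Filtering along the frequency axis corresponds to windowing in time, so a pole closer to 1 (more smoothing across bins) implies a shorter effective time window at that frequency.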

Comparison of Multi-resolution vs STFT

Analysis of a music signal
+ sharper high harmonics
+ improved low frequency discrimination
+ note onsets better defined
− worse resolution for stationary sounds

Chirp based transforms
• chirp: linearly frequency modulated sinusoid
• examples:
  - Chirplet Transform (CT) [Mann and Haykin, 1991]
  - Fractional Fourier Transform (FrFT) [Almeida, 1994]

Comparison of classical tools

Analysis of non-stationary harmonic signals
• Sinusoidal based: inappropriate for non-stationary signals
• Chirp based: not optimal for non-stationary harmonic signals

[Figure: time-frequency tilings (axes: Time, Frequency) for three cases: STFT [Gabor, 1946], constant resolution; CQT [Brown, 1991], multiresolution; CT [Mann and Haykin, 1991] and FrFT [Almeida, 1994], "directional" resolution.]

Chirp based transforms

Fan Chirp Transform [Weruaga and Képesi, 2007]
• offers optimal resolution in a fan geometry
• simultaneously for all the partials of a harmonic chirp

[Figure: fan-shaped time-frequency tiling. Axes: Time, Frequency.]


Definition

Analysis equation

$$X(f, \alpha) \triangleq \int_{-\infty}^{\infty} x(t)\, \phi'_\alpha(t)\, e^{-j 2 \pi f \phi_\alpha(t)}\, dt, \quad \text{with} \quad \phi_\alpha(t) = \left(1 + \tfrac{1}{2} \alpha t\right) t$$

Chirps with linear instantaneous frequency $f_\alpha(t) = (1 + \alpha t) f$.

[Figure: tiling for the Fan Chirp Transform and its fan geometry, with the relative slope α marked. Axes: t, f.]

Computation

Change of variable $\hat{t} = \phi_\alpha(t)$:

$$X(f, \alpha) = \int_{-\infty}^{\infty} x\!\left(\phi_\alpha^{-1}(\hat{t})\right) e^{-j 2 \pi f \hat{t}}\, d\hat{t}$$

Fourier Transform of a time-warped version of the signal.

[Figure: three panels, "Phase and inverse function" (axes: Original time (s), Warped time (s)), "Original signal (x(t))", and "Warped signal" (axis: Time (s)).]
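The warping idea can be checked numerically. The sketch below builds a synthetic harmonic chirp whose phase follows the warping function, evaluates it at the inverse-warped instants, and compares the result with a stationary harmonic signal. The pitch, chirp rate, and time span are hypothetical values chosen so that the inverse warping stays well defined. The chirp is evaluated analytically here; a practical implementation would have to interpolate sampled audio instead.

```python
import math

def phi(t, alpha):
    """Warping function phi_alpha(t) = (1 + alpha*t/2) * t."""
    return (1.0 + 0.5 * alpha * t) * t

def phi_inv(tau, alpha):
    """Inverse warping; valid while 1 + 2*alpha*tau > 0."""
    if alpha == 0.0:
        return tau
    return (math.sqrt(1.0 + 2.0 * alpha * tau) - 1.0) / alpha

def harmonic_chirp(t, f0, alpha, n_harmonics=3):
    """Synthetic chirp whose k-th partial has frequency k*f0*(1 + alpha*t)."""
    return sum(math.cos(2.0 * math.pi * k * f0 * phi(t, alpha))
               for k in range(1, n_harmonics + 1))

f0, alpha = 200.0, 2.0                               # hypothetical pitch (Hz) and chirp rate (1/s)
taus = [-0.2 + 0.4 * n / 999 for n in range(1000)]   # uniform grid in warped time

warped = [harmonic_chirp(phi_inv(tau, alpha), f0, alpha) for tau in taus]
stationary = [sum(math.cos(2.0 * math.pi * k * f0 * tau) for k in (1, 2, 3))
              for tau in taus]
max_error = max(abs(a - b) for a, b in zip(warped, stationary))
print(f"max |warped - stationary| = {max_error:.2e}")
```

With the matching α, every partial of the warped signal has constant frequency, so an ordinary Fourier transform of the warped signal concentrates each partial in a narrow peak.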

Short Time Fan Chirp Transform

Music signal analysis
• Harmonic linear chirp model valid in short time intervals
• Application of FChT in short time frames with the appropriate α value

Pitch chirp rate α of each source should be determined.

[Figure: "Spectrogram (STFT)" and "FChT Spectrogram (STFChT)". Axes: Time, Frequency.]

[Figure: "STFChT with α=0" and "STFChT with α tuned to the harmonic chirp". Axes: Time (s), Frequency (Hz).]

Chirp rate estimation

Motivation for chirp rate estimation
• Harmonics energy more concentrated in FChT with correct α value
• Computation of FChT in a range of α values for each frame

Gathered Log Spectrum [Képesi and Weruaga, 2006]

Definition
• Accumulation of the spectrum magnitude at harmonics positions
• Use of logarithm for spectrum whitening

$$\rho_0(f) = \frac{1}{n_H} \sum_{i=1}^{n_H} \log \left| S(i \cdot f, \alpha) \right|$$

[Figure: DFT (α = 0) and FChT with optimum α (α = 1.3351). Legend: DFT, FChT. Axes: Frequency (Hz), Magnitude (dB).]
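A minimal sketch of the salience computation follows. It evaluates the gathered log spectrum on a toy magnitude spectrum with peaks at multiples of 220 Hz. The spectrum model, the candidate grid, and the number of harmonics are assumptions for illustration and do not come from the thesis.

```python
import math

def gathered_log_spectrum(magnitude, f0, n_harmonics):
    """rho_0(f0): mean of log|S| sampled at the first n_harmonics multiples of f0.

    magnitude is any callable returning |S(f, alpha)| for a fixed alpha.
    """
    return sum(math.log(magnitude(i * f0))
               for i in range(1, n_harmonics + 1)) / n_harmonics

def toy_magnitude(f, true_f0=220.0, width=5.0, floor=1e-3):
    """Hypothetical spectrum: Gaussian peaks at multiples of true_f0 over a floor."""
    nearest = max(1, round(f / true_f0)) * true_f0
    return floor + math.exp(-0.5 * ((f - nearest) / width) ** 2)

candidates = [100.0 + 0.5 * n for n in range(601)]        # 100 Hz .. 400 Hz
salience = [gathered_log_spectrum(toy_magnitude, f, 5) for f in candidates]
best = candidates[max(range(len(candidates)), key=salience.__getitem__)]
print("most salient f0 candidate:", best)
```

In this toy model a candidate at twice the true fundamental scores exactly as high as the fundamental itself, because all of its harmonics also land on spectral peaks. That is the kind of spurious peak at multiples that motivates post-processing of the GLogS.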

Gathered Log Spectrum

[Figure: GLogS with DFT (α = 0) and with FChT with optimum α. Legend: GLogS FChT, GLogS DFT. Axes: Fundamental frequency (Hz), Magnitude.]

GLogS post-processing

Motivation: spurious peaks at multiples and submultiples of the fundamental frequency.

[Figure: two panels, "GLogS − Synthesized harmonic signal" and "GLogS − Music signal". Axes: Frequency (Hz), Salience.]

Post-processing

Contribution: attenuation of harmonics and subharmonics:

$$\rho_1(f_0) = \rho_0(f_0) - \max_{q = 2, 3, \ldots} \rho_0(f_0 / q)$$

$$\rho_2(f_0) = \rho_1(f_0) - a_k\, \rho_1(k f_0), \quad k = 2, 3, \ldots$$

[Figure: two panels, "Postprocessed GLogS − Synthesized harmonic signal" and "Postprocessed GLogS − Music signal". Axes: Frequency (Hz), Salience.]

Time-frequency representation with STFChT


FChT + IIR-CQT

Contribution: FChT + IIR-CQT
• lower interfering terms
• better energy concentration
in the f0-gram α − f plane

FChT non-linear Warpings
• Quadratic: $\phi_{\alpha,\beta}(t) = \left(1 + \tfrac{1}{2} \alpha t + \tfrac{1}{3} \beta t^2\right) t$
• Learned (PCA): $\phi_{a,b}(t) = \left(1 + a\, C_1(t) + b\, C_2(t)\right) t$

Publication: "Fan Chirp Transform with Non-linear Time Warping", LASCAS 2015

FChT non-linear Warpings

Slight improvements for times with high curvature in the f0 contour.


Pitch visualization

[Figure: two pitch visualization plots. Axes: Time (s), Fundamental frequency (Hz), from 80 Hz to 1280 Hz.]

Audio: audio 3 pop1.wav, audio 4 opera.wav, video 5 opera.avi

Pitch visualization

F0-gram properties
• Pitch of simultaneous sources can be correctly represented
• Crossing pitch contours are well resolved
• Precise representation of rapid pitch fluctuations

[Figure: three F0gram excerpts, "F0gram: α with highest salience for each fundamental frequency". Axes: Time (s), Fundamental frequency (Hz).]

Publication: "Computer aided music performance analysis by means of pitch contours representation", ISMIR 2012

Melody detection

Frame-based melody candidates
• MIREX and RWC labeled databases as reference
• Frame-based detection

[Figure: first three melody candidates obtained from the F0gram. Legend: First, Second, Third. Axes: Time (s), Frequency (Hz).]

First candidate corresponds to main melody when it is most prominent.


Melody detection

FChT low-level information
• provides good candidates.
• provides slope information for the candidates, useful for a similarity measure.

Tracking, two approaches:
• Classical: spectral clustering to find short time contours + merging of short term contours to find the melody.
• Ad-hoc: tracking using similarity of contiguous frame candidates.

Best Overall Accuracy in 2008 MIREX Melody Extraction contest.

Participant        pc      drd2    rk      vr      clly2   drd1
Avg. Overall Acc   76.1%   73.2%   71.1%   67.1%   62.1%   58.6%

Source separation

Partials extraction
• Energy of partials is concentrated in the FChT. It leaks to neighbor bins in the DFT. Partials extraction is more precise using FChT.
• Front end of one of the best-performing algorithms in the 2008 Signal Separation Evaluation Campaign [Vincent et al., 2009]

[Figure: spectra labelled "FChT tuned to a source" and "DFT". Axis: Frequency (Hz).]

[Figure: "Original spectrum", "Source spectrum", and "Residual spectrum". Axis: Frequency (Hz).]

Source separation

Example

[Figure: three spectrograms, "Original signal", "Separated source", and "Residual". Horizontal axis: Time (s).]

Audio: audio 6 1 pop mix.wav, audio 6 2 pop voice.wav, audio 6 3 pop residual.wav

Source separation

Among best results in SiSEC 2008: Professionally produced music recordings
SIR: Signal to Interference Ratio
SAR: Signal to Artifacts Ratio

Proposed technique:
Recording          SIR       SAR
Tamy (L)           23 dB     9.7 dB
Tamy (R)           25.2 dB   9.7 dB
Bearlin - Roads    17.7 dB   5.2 dB

Ideal Binary Mask:
Recording          SIR       SAR
Tamy (L)           21.9 dB   10.5 dB
Tamy (R)           20.3 dB   12.3 dB
Bearlin - Roads    18.9 dB   8.5 dB

Source separation

[Figure: two spectrograms, "Original, Left Channel (Recording 1)" and "Separated leading voice, Left Channel (Recording 1)". Axes: Time, Frequency (Hz).]

Audio: audio 7 1 tamy mix.wav, audio 7 2 tamy voice.wav, audio 7 3 tamy residual.wav


Source separation for a Query by Humming Application

QBH systems limitation: DB growth

Existing Approaches
• Use manually transcribed MIDI files
• Use users' queries as the reference to search

Proposed method
• Main melody separation
• Transcription of the main melody (as if it were a query)

It is fully automatic.

Publication: "Query by humming: Automatically building the database from music recordings", Pattern Recognition Letters, 2014

Automatic QBH database construction
• Source separation based on the FChT
• Extraction of the F0 contour based on the isolated voiced signal

[Figure: block diagram. Blocks: Polyphonic audio signal, Time-frequency analysis, Audio features computation, Pitch tracking, Polyphonic pitch tracking, Tuning adjustment, Sound classification, Sound sources separation, Segmentation, Transcription, Classification, Separation, Database.]

Proof of concept results

                  Top-X hit rate (%)
           MRR    1       5       10
MIDI       0.89   88.68   89.62   91.51
audio 1    0.75   69.81   79.25   84.91
audio 2    0.76   71.70   81.13   84.91
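The two metrics in the table can be computed from the rank at which the correct song is returned for each query. The sketch below assumes the usual definitions: MRR as the mean reciprocal rank, and Top-X as the percentage of queries answered within the first X results. The ranks are made-up values, not thesis data.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank, where rank is the 1-based position of the correct song."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def top_x_hit_rate(ranks, x):
    """Percentage of queries whose correct song appears within the first x results."""
    return 100.0 * sum(1 for r in ranks if r <= x) / len(ranks)

# Hypothetical ranks of the correct song for eight queries (not thesis data).
ranks = [1, 1, 2, 1, 7, 1, 12, 3]

print(f"MRR    = {mean_reciprocal_rank(ranks):.2f}")
for x in (1, 5, 10):
    print(f"Top-{x:<2d} = {top_x_hit_rate(ranks, x):.2f} %")
```

MRR rewards placing the correct song near the top, while the Top-X rates show how often it appears anywhere in a result list of a given size.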



Conclusions
• Different T-F representations were explored
• Contribution: Novel IIR-CQT transform was proposed
• Contribution: Novel extensions to the FChT were proposed
• Contribution: F0-gram improvements allow polyphonic audio visualization
• The T-F representations were used in diverse higher level applications with good results
• Audio source separation of harmonic sounds is naturally obtained from the FChT
• Plug-in for Sonic Visualiser available for general public use.
• The implementation code has been made publicly available.


Future Work
• Envelope modelling + FChT (GMM + FChT)
• Explore other MIR applications based on these representations
• Explore existing powerful techniques using the FChT instead of the STFT (e.g. low rank + sparsity)
• Tracking of sources in the α − f − t space

Questions?

References

Almeida, L. B. (1994). The fractional Fourier transform and time-frequency representations. IEEE Transactions on Signal Processing, 42(11):3084–3091.

Brown, J. C. (1991). Calculation of a constant Q spectral transform. JASA, 89(1):425–434.

Dressler, K. (2006). Sinusoidal Extraction Using an Efficient Implementation of a Multi-Resolution FFT. In Proceedings of the DAFx-06, Montreal, Canada.

Gabor, D. (1946). Theory of communication. Journal I.E.E., 93(26):429–457.

Képesi, M. and Weruaga, L. (2006). Adaptive chirp-based time-frequency analysis of speech signals. Speech Communication, 48(5):474–492.

Mann, S. and Haykin, S. (1991). The chirplet transform: physical considerations. IEEE Transactions on Signal Processing, 41(11):2745–2761.

Vincent, E., Araki, S., and Bofill, P. (2009). The 2008 signal separation evaluation campaign: A community-based approach to large-scale evaluation. In Proc. Int. Conf. on Independent Component Analysis and Signal Separation.

Weruaga, L. and Képesi, M. (2007). The fan-chirp transform for nonstationary harmonic signals. Signal Processing, 87(6):1504–1522.
