Audio Source Separation Techniques Including Novel Time-Frequency Representation Tools
Pablo Cancela
Doctoral Thesis, Electrical Engineering, Universidad de la República
Thesis Director: Guillermo Sapiro
Academic Director: Gregory Randall
IIE, November 2015
Pablo Cancela (UdelaR, Uruguay)
Audio Source Separation T-F Rep. Tools
Doctoral Thesis
1 / 63
Outline
1. Time-Frequency Analysis
  - Motivation
2. Review of classical tools
  - Sinusoidal based transforms
3. IIR-CQT
4. Fan Chirp Transform
  - Definition and computation
  - Pitch salience estimation
  - Fan Chirp Transform Analysis and Extensions
5. Applications using the Fan Chirp Transform
  - Pitch visualization
  - Pitch Tracking
  - Source separation
  - Query by Humming, Automatic Database Construction
6. Conclusions and Future Work
Time-Frequency representation tools
Motivation: accurately represent multiple sources in a polyphonic mix.
[Figure: Spectrogram, fs = 44100 Hz, window length = 2048 samples. Axes: Time (s), Frequency (kHz).]
Audio: audio 1 pop1.wav
Time-Frequency representation tools
Motivation: extract a singing voice from a polyphonic mix.
[Figure: two spectrograms of Recording 1, left channel: the original and the separated leading voice. Axes: Time, Frequency (Hz).]
Audio: audio 2 tamy.wav
Time-Frequency representation tools
Goals
• Improve tools for the representation in time and frequency of time-varying harmonic signals.
• Facilitate the task of high-level algorithms (detection, estimation, classification, recognition, etc.).
• Explore applications in polyphonic music.
  - Visualization.
  - Source Separation.
  - Music Information Retrieval applications.
Time-Frequency representations
Difficulties
• Bounded time-frequency resolution (uncertainty principle).
• Challenge in the representation of polyphonic signals.
Goals of alternative representations
• Overcome the limitations of classical time-frequency representations.
• Enhance the representation using a priori knowledge of the signals.
Music signal analysis
Features
• Harmonic structure
• High density of harmonics in low and mid frequencies
• Large frequency modulation at high frequencies (the modulation of a harmonic grows with its order)
[Figure: Spectrogram, fs = 44100 Hz, window length = 2048 samples. Axes: Time (s), Frequency (kHz).]
Music signal analysis
Example: harmonic signals
• Can approximate music sounds within short time intervals
• Frequency modulation produces poor resolution for higher harmonics
[Figure: STFT spectrogram. Axes: Time, Frequency.]
Sinusoidal based transforms
Short Time Fourier Transform (STFT) [Gabor, 1946]
• constant time-frequency resolution
• window length (and shape) determines resolution
• longer windows give:
  - better frequency resolution
  - and worse time resolution
[Figure: time-frequency tilings for wideband and narrowband analysis. Axes: Time, Frequency.]
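The window-length trade-off can be made concrete with a small calculation. This is a minimal sketch, assuming the nominal DFT bin spacing fs/N as the measure of frequency resolution (the effective resolution also depends on the window shape); the function name `stft_resolution` is chosen here for illustration.

```python
def stft_resolution(fs, window_length):
    """Return (nominal bin spacing in Hz, window duration in seconds)."""
    bin_spacing_hz = fs / window_length
    window_duration_s = window_length / fs
    return bin_spacing_hz, window_duration_s


# Shorter windows: coarse in frequency, fine in time; longer windows: the opposite.
for n in (512, 2048, 8192):
    df, dt = stft_resolution(44100, n)
    print(f"N = {n:5d}   bin spacing = {df:7.2f} Hz   window = {1000 * dt:6.1f} ms")
```

For the settings of the spectrograms shown earlier (fs = 44100 Hz, 2048 samples) this gives a bin spacing of about 21.5 Hz and a window of about 46.4 ms; the product of the two quantities is always 1.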
Sinusoidal based transforms
Wideband spectrogram
• poor spectral resolution
• good time resolution
Narrowband spectrogram
• good spectral resolution
• poor time resolution
Sinusoidal based transforms
Multi-resolution analysis
• computing several DFTs with different window lengths
• examples:
  - Multi-resolution FFT (MRFFT) [Dressler, 2006]
  - Constant-Q Transform (CQT) [Brown, 1991]
  - Wavelets
• better suited for music analysis
[Figure: multi-resolution time-frequency tiling. Axes: Time, Frequency.]
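How "different window lengths" works out for a constant-Q analysis can be sketched as follows. The sketch assumes geometrically spaced bins and the usual relations Q = 1 / (2^(1/b) − 1) and N_k = Q·fs / f_k for b bins per octave; the function name and the parameter values are illustrative only.

```python
import math


def cqt_window_lengths(fs, f_min, bins_per_octave, n_bins):
    """Return Q and a list of (center frequency in Hz, window length in samples)."""
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    bins = []
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)
        # Constant Q: the window spans the same number of periods at every bin.
        bins.append((f_k, math.ceil(q * fs / f_k)))
    return q, bins


q, bins = cqt_window_lengths(fs=44100, f_min=55.0, bins_per_octave=12, n_bins=25)
for f_k, n_k in bins[::12]:
    print(f"f = {f_k:7.1f} Hz   window = {n_k:6d} samples")
```

Each octave up halves the window, so low notes are resolved in frequency while high partials keep good time resolution.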
Comparison of Multi-resolution vs STFT
Analysis of a non-stationary harmonic signal
• Improved time-frequency resolution for the multi-resolution approach
[Figure: STFT spectrum and spectrogram. Axes: Time, Frequency.]
[Figure: CQT spectrum and spectrogram. Axes: Time, Frequency.]
IIR-CQT
Contribution: new transform based on time-varying filtering in the frequency domain
[Figure: zero-pole diagram (Real Part, Imaginary Part); response to shifted deltas (Frequency (rad), Magnitude); time window (Normalized Time (rad), Magnitude).]
IIR-CQT
Proposed Transform Characteristics
Design stage to define the filtering properties: determine the pole p(f) to obtain constant Q.
IIR-CQT
Highlights
• Design flexibility.
  - Restriction: slowly changing quality factor in different frequencies.
• Low computational cost.
  - Linear with the number of frequency bins.
• Simple implementation:

    p = design_poles(NFFT, Q);      % one pole per frequency bin
    X = fft(fftshift(s));           % spectrum of frame s (fftshift swaps its halves)
    % forward pass; Yp stands for the intermediate sequence Y' of the slide
    Yp(1) = X(1);
    for i = 2:NFFT/2
        Yp(i) = X(i-1) + X(i) + p(i)*Yp(i-1);
    end
    % backward pass
    Y(NFFT/2) = Yp(NFFT/2);
    for i = NFFT/2-1:-1:1
        Y(i) = Yp(i+1) + Yp(i) + p(i)*Y(i+1);
    end

"An efficient multi-resolution spectral transform for music analysis", ISMIR 2009
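The two recursions of the slide can be checked on small inputs with a plain Python port. This is a sketch under assumptions: the poles are taken as given (the pole design is not reproduced here), only the first NFFT/2 bins are passed in, and the name `iir_cqt_passes` is made up for this example.

```python
def iir_cqt_passes(X, p):
    """Forward then backward first-order recursion along the frequency axis.

    X: spectrum bins (first half of the FFT), p: one pole per bin, same length.
    """
    n = len(X)
    yp = [0j] * n                      # intermediate sequence (Y' on the slide)
    yp[0] = X[0]
    for i in range(1, n):              # forward pass over increasing frequency
        yp[i] = X[i - 1] + X[i] + p[i] * yp[i - 1]
    y = [0j] * n
    y[n - 1] = yp[n - 1]
    for i in range(n - 2, -1, -1):     # backward pass over decreasing frequency
        y[i] = yp[i + 1] + yp[i] + p[i] * y[i + 1]
    return y


# A single nonzero bin is spread over its neighbours; larger poles spread it further.
print(iir_cqt_passes([1, 0, 0], [0.5, 0.5, 0.5]))
```

The cost is a fixed number of operations per bin, which matches the "linear with the number of frequency bins" claim.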
Comparison of Multi-resolution vs STFT
Analysis of a music signal
+ sharper high harmonics
+ improved low frequency discrimination
+ note onsets better defined
− worse resolution for stationary sounds
Chirp based transforms
• chirp: linearly frequency modulated sinusoid
• examples:
  - Chirplet Transform (CT) [Mann and Haykin, 1991]
  - Fractional Fourier Transform (FrFT) [Almeida, 1994]
Comparison of classical tools
Analysis of non-stationary harmonic signals
• Sinusoidal based: inappropriate for non-stationary signals
• Chirp based: not optimal for non-stationary harmonic signals
[Figure: time-frequency tilings of the three families. Axes: Time, Frequency.]
STFT [Gabor, 1946]: constant resolution
CQT [Brown, 1991]: multiresolution
CT [Mann and Haykin, 1991], FrFT [Almeida, 1994]: "directional" resolution
Chirp based transforms
Fan Chirp Transform [Weruaga and Képesi, 2007]
• offers optimal resolution in a fan geometry
• simultaneously for all the partials of a harmonic chirp
[Figure: fan-shaped time-frequency tiling. Axes: Time, Frequency.]
Definition
Analysis equation
\[ X(f, \alpha) \triangleq \int_{-\infty}^{\infty} x(t)\, \phi'_\alpha(t)\, e^{-j 2\pi f \phi_\alpha(t)}\, dt \]
with
\[ \phi_\alpha(t) = \left(1 + \tfrac{1}{2}\alpha t\right) t \]
Chirps with linear instantaneous frequency \( f_\alpha(t) = (1 + \alpha t)\, f \).
[Figure: tiling for the Fan Chirp Transform and its fan geometry, with the relative slope of the chirps marked. Axes: t, f.]
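A short worked step connects the warping to the chirp model, taking \( \phi_\alpha(t) = (1 + \tfrac{1}{2}\alpha t)\,t \). The phase of the kernel is \( 2\pi f \phi_\alpha(t) \), so its instantaneous frequency is

\[ f_\alpha(t) = \frac{1}{2\pi} \frac{d}{dt}\left[ 2\pi f\, \phi_\alpha(t) \right] = f\, \phi'_\alpha(t) = (1 + \alpha t)\, f \]

All partials therefore share the same relative slope, \( \frac{1}{f} \frac{d f_\alpha}{dt} = \alpha \), and their chirp lines meet at \( t = -1/\alpha \), where \( f_\alpha = 0 \) for every \( f \). This is the fan geometry.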
Computation
Change of variable \( \hat{t} = \phi_\alpha(t) \):
\[ X(f, \alpha) = \int_{-\infty}^{\infty} x\!\left(\phi_\alpha^{-1}(\hat{t})\right) e^{-j 2\pi f \hat{t}}\, d\hat{t} \]
Fourier Transform of a time warped version of the signal.
[Figure: phase \( \phi_\alpha(t) \) and inverse function \( \phi_\alpha^{-1}(t) \) (Original time (s), Warped time (s)); original signal \( x(t) \) and warped signal \( x(\phi_\alpha^{-1}(t)) \) (Time (s)).]
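The warping step can be sketched in a few lines of Python. Assumptions made for this example: the frame is centred at t = 0, |α| is small enough that 1 + αt > 0 over the frame, and linear interpolation is used for resampling (a real implementation may use a better interpolator). The function names are illustrative.

```python
import math


def warp(t, alpha):
    """phi_alpha(t) = (1 + alpha * t / 2) * t."""
    return (1.0 + 0.5 * alpha * t) * t


def inverse_warp(tau, alpha):
    """Solve phi_alpha(t) = tau on the branch that tends to t = tau as alpha -> 0."""
    if alpha == 0.0:
        return tau
    disc = 1.0 + 2.0 * alpha * tau
    if disc < 0.0:
        raise ValueError("tau is outside the range of the warping")
    return (math.sqrt(disc) - 1.0) / alpha


def warp_frame(x, fs, alpha, n_out):
    """Sample x(phi^-1(tau)) on a uniform grid of warped time tau (n_out >= 2)."""
    n = len(x)
    t0 = -(n - 1) / (2.0 * fs)                     # time of the first sample
    tau0, tau1 = warp(t0, alpha), warp(-t0, alpha)
    out = []
    for k in range(n_out):
        tau = tau0 + (tau1 - tau0) * k / (n_out - 1)
        pos = (inverse_warp(tau, alpha) - t0) * fs  # fractional sample index
        pos = min(max(pos, 0.0), n - 1.0)
        i = min(int(pos), n - 2)
        frac = pos - i
        out.append((1.0 - frac) * x[i] + frac * x[i + 1])
    return out
```

Taking an ordinary FFT of the output of `warp_frame` then plays the role of the integral over warped time; with α = 0 the frame is returned unchanged.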
Short Time Fan Chirp Transform
Music signal analysis
• Harmonic linear chirp model valid in short time intervals
• Application of the FChT in short time frames with the appropriate α value
The pitch chirp rate α of each source must be determined.
[Figure: Spectrogram (STFT) and FChT Spectrogram (STFChT). Axes: Time, Frequency.]
[Figure: STFChT with α = 0 and STFChT with α tuned to the harmonic chirp. Axes: Time (s), Frequency (Hz).]
Chirp rate estimation
Motivation for chirp rate estimation
• The energy of the harmonics is more concentrated in the FChT with the correct α value
• Computation of the FChT in a range of α values for each frame
Gathered Log Spectrum [Képesi and Weruaga, 2006]
Definition
• Accumulation of the spectrum magnitude at the harmonic positions
• Use of the logarithm for spectrum whitening
\[ \rho_0(f) = \frac{1}{n_H} \sum_{i=1}^{n_H} \log \left| S(i \cdot f, \alpha) \right| \]
[Figure: DFT (α = 0) and FChT with optimum α (α = 1.3351). Axes: Frequency (Hz), Magnitude (dB).]
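The accumulation over harmonic positions can be sketched as below. Assumptions of this example, not taken from the slides: the spectrum magnitude is available on a uniform grid with spacing `df`, harmonics are read at the nearest bin, a small `eps` guards the logarithm, and n_H is capped by `max_harmonics` and by the highest available frequency.

```python
import math


def glogs(mag, df, f0, max_harmonics=20, eps=1e-12):
    """Mean log-magnitude of the spectrum at the harmonics of f0."""
    top = (len(mag) - 1) * df                   # highest frequency on the grid
    n_h = min(max_harmonics, int(top // f0))
    if n_h < 1:
        raise ValueError("f0 is above the analysed band")
    total = 0.0
    for i in range(1, n_h + 1):
        k = int(round(i * f0 / df))             # nearest bin to the i-th harmonic
        total += math.log(mag[k] + eps)
    return total / n_h


# Toy spectrum: 10 Hz bins up to 1 kHz with peaks at the harmonics of 100 Hz.
mag = [1.0 if k > 0 and k % 10 == 0 else 1e-3 for k in range(101)]
for f0 in (50.0, 70.0, 100.0, 200.0):
    print(f0, round(glogs(mag, 10.0, f0, max_harmonics=10), 3))
```

In this toy case 100 Hz and 200 Hz score the same and 50 Hz scores well above an unrelated candidate such as 70 Hz, which hints at why salience peaks also appear at multiples and submultiples of the true fundamental.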
Gathered Log Spectrum
[Figure: GLogS with DFT (α = 0) and with FChT with optimum α. Axes: Fundamental frequency (Hz), Magnitude.]
GLogS post-processing
Motivation
Spurious peaks at multiples and submultiples of the fundamental frequency.
[Figure: GLogS of a synthesized harmonic signal and of a music signal. Axes: Frequency (Hz), Salience.]
Post-processing
Contribution: attenuation of harmonics and subharmonics:
\[ \rho_1(f_0) = \rho_0(f_0) - \max_{q} \rho_0(f_0 / q), \quad q = 2, 3, \ldots \]
\[ \rho_2(f_0) = \rho_1(f_0) - a_k\, \rho_1(k f_0), \quad k = 2, 3, \ldots \]
[Figure: post-processed GLogS of a synthesized harmonic signal and of a music signal. Axes: Frequency (Hz), Salience.]
Time-frequency representation with STFChT
FChT + IIR-CQT
Contribution: FChT + IIR-CQT
• lower interfering terms
• better energy concentration in the f0-gram α − f plane
FChT non-linear Warpings
• Quadratic: \( \phi_{\alpha,\beta}(t) = \left(1 + \tfrac{1}{2}\alpha t + \tfrac{1}{3}\beta t^2\right) t \)
• Learned (PCA): \( \phi_{a,b}(t) = \left(1 + a\,C_1(t) + b\,C_2(t)\right) t \)
Publication: "Fan Chirp Transform with Non-linear Time Warping", LASCAS 2015
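A worked step (not on the slide) shows what the quadratic warping models, assuming it reads \( \phi_{\alpha,\beta}(t) = (1 + \tfrac{1}{2}\alpha t + \tfrac{1}{3}\beta t^2)\,t \):

\[ \phi'_{\alpha,\beta}(t) = \frac{d}{dt}\left[ t + \tfrac{1}{2}\alpha t^2 + \tfrac{1}{3}\beta t^3 \right] = 1 + \alpha t + \beta t^2 \]

The instantaneous frequency of a partial is then \( f\,(1 + \alpha t + \beta t^2) \), so β adds curvature to the pitch trajectory within the frame, and β = 0 recovers the linear fan chirp.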
FChT non-linear Warpings
Slight improvements at times of high curvature in the f0 contour.
Pitch visualization
[Figure: two pitch visualizations. Axes: Time (s), Fundamental frequency (Hz), from 80 to 1280 Hz.]
Audio: audio 3 pop1.wav, audio 4 opera.wav, video 5 opera.avi
Pitch visualization
F0-gram properties
• Pitch of simultaneous sources can be correctly represented
• Crossing pitch contours are well resolved
• Precise representation of rapid pitch fluctuations
[Figure: F0gram, α with highest salience for each fundamental frequency (three excerpts). Axes: Time (s), Fundamental frequency (Hz).]
Publication: "Computer aided music performance analysis by means of pitch contours representation", ISMIR 2012
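The rule "α with highest salience for each fundamental frequency" can be sketched for one frame as follows, assuming the salience has already been computed on a grid of α values and f0 candidates; the data layout (`salience[a][k]`) and the function name are choices made for this example.

```python
def f0gram_frame(salience, alphas):
    """For each f0 candidate keep the best salience over alpha and the winning alpha.

    salience[a][k]: salience of f0 candidate k analysed with chirp rate alphas[a].
    """
    n_f0 = len(salience[0])
    best_values, best_alphas = [], []
    for k in range(n_f0):
        a_best = max(range(len(alphas)), key=lambda a: salience[a][k])
        best_values.append(salience[a_best][k])
        best_alphas.append(alphas[a_best])
    return best_values, best_alphas


values, slopes = f0gram_frame([[0.1, 0.5, 0.2], [0.3, 0.4, 0.9]], [-1.0, 1.0])
print(values, slopes)
```

Stacking `best_values` over frames gives one column of the F0gram per frame, and `best_alphas` keeps the local pitch slope of each candidate.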
Melody detection
Frame-based melody candidates
• MIREX and RWC labeled databases as reference
• Frame-based detection
[Figure: first three melody candidates (First, Second, Third) obtained from the F0gram. Axes: Time (s), Frequency (Hz).]
The first candidate corresponds to the main melody when it is most prominent.
Melody detection
FChT low-level information
• provides good candidates.
• provides slope information for the candidates, useful for a similarity measure.
Tracking, two approaches:
• Classical: spectral clustering to find short time contours + merging of short term contours to find the melody
• Ad-hoc: tracking using similarity of contiguous frame candidates.
Best Overall Accuracy in the 2008 MIREX Melody Extraction contest.
Participant        pc      drd2    rk      vr      clly2   drd1
Avg. Overall Acc   76.1%   73.2%   71.1%   67.1%   62.1%   58.6%
Source separation
Partials extraction
• The energy of the partials is concentrated in the FChT, whereas it leaks to neighboring bins in the DFT. Partials extraction is more precise using the FChT.
• Front end of one of the best performing algorithms in the 2008 Signal Separation Evaluation Campaign [Vincent et al., 2009]
[Figure: DFT and FChT tuned to a source. Axis: Frequency (Hz).]
[Figure: original spectrum, source spectrum and residual spectrum. Axis: Frequency (Hz).]
Source separation
Example
[Figure: original signal. Axis: Time (s).] Audio: audio 6 1 pop mix.wav
[Figure: separated source. Axis: Time (s).] Audio: audio 6 2 pop voice.wav
[Figure: residual. Axis: Time (s).] Audio: audio 6 3 pop residual.wav
Source separation
Among the best results in SiSEC 2008: professionally produced music recordings
SIR: Signal to Interference Ratio
SAR: Signal to Artifacts Ratio

Proposed technique    SIR        SAR
Tamy (L)              23 dB      9.7 dB
Tamy (R)              25.2 dB    9.7 dB
Bearlin - Roads       17.7 dB    5.2 dB

Ideal Binary Mask     SIR        SAR
Tamy (L)              21.9 dB    10.5 dB
Tamy (R)              20.3 dB    12.3 dB
Bearlin - Roads       18.9 dB    8.5 dB
Source separation
[Figure: two spectrograms of Recording 1, left channel: the original and the separated leading voice. Axes: Time, Frequency (Hz).]
Audio: audio 7 1 tamy mix.wav, audio 7 2 tamy voice.wav, audio 7 3 tamy residual.wav
Source separation for a Query by Humming Application
QBH systems limitation: DB growth
Existing approaches
• Use manually transcribed MIDI files
• Use users' queries as the reference to search
Proposed method
• Main melody separation
• Transcription of the main melody (as if it were a query)
It is fully automatic.
Publication: "Query by humming: Automatically building the database from music recordings", Pattern Recognition Letters, 2014
Automatic QBH database construction
• Source separation based on the FChT
• Extraction of the F0 contour based on the isolated voiced signal
[Figure: block diagram from the polyphonic audio signal to the database. Blocks: time-frequency analysis, audio features computation, pitch tracking, polyphonic pitch tracking, sound classification, sound sources separation, tuning adjustment, segmentation, transcription. Stages: Classification, Separation.]
Proof of concept results
                  Top-X hit rate (%)
          MRR     1        5        10
MIDI      0.89    88.68    89.62    91.51
audio 1   0.75    69.81    79.25    84.91
audio 2   0.76    71.70    81.13    84.91
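The two figures of merit in the table can be computed from the rank at which the correct song appears for each query. This is a minimal sketch assuming the usual definitions (MRR as the mean of 1/rank, Top-X hit rate as the percentage of queries ranked within the first X); the sample ranks below are made up for illustration and are not data from the thesis.

```python
def mean_reciprocal_rank(ranks):
    """ranks: 1-based rank of the correct item per query, or None if not retrieved."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


def top_x_hit_rate(ranks, x):
    """Percentage of queries whose correct item is ranked within the first x."""
    hits = sum(1 for r in ranks if r is not None and r <= x)
    return 100.0 * hits / len(ranks)


example_ranks = [1, 2, 4, 1]   # hypothetical ranks for four queries
print(mean_reciprocal_rank(example_ranks))
print([top_x_hit_rate(example_ranks, x) for x in (1, 5, 10)])
```

Since 1/rank equals 1 for every Top-1 hit, the MRR can never fall below the Top-1 hit rate expressed as a fraction, which is a quick consistency check on a results table.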
Conclusions
• Different T-F representations were explored
• Contribution: a novel IIR-CQT transform was proposed
• Contribution: novel extensions to the FChT were proposed
• Contribution: F0-gram improvements allow polyphonic audio visualization
• The T-F representations were used in diverse higher level applications with good results
• Audio source separation of harmonic sounds is naturally obtained from the FChT
• Plug-in for Sonic Visualiser available for general public use.
• The implementation code has been made publicly available.
Future Work
• Envelope modelling + FChT (GMM + FChT)
• Explore other MIR applications based on these representations
• Explore existing powerful techniques using the FChT instead of the STFT (e.g. low rank + sparsity)
• Tracking of sources in the α − f − t space
Questions?
References
Almeida, L. B. (1994). The fractional Fourier transform and time-frequency representations. IEEE Transactions on Signal Processing, 42(11):3084–3091.
Brown, J. C. (1991). Calculation of a constant Q spectral transform. JASA, 89(1):425–434.
Dressler, K. (2006). Sinusoidal Extraction Using an Efficient Implementation of a Multi-Resolution FFT. In Proceedings of the DAFx-06, Montreal, Canada.
Gabor, D. (1946). Theory of communication. Journal I.E.E., 93(26):429–457.
Képesi, M. and Weruaga, L. (2006). Adaptive chirp-based time-frequency analysis of speech signals. Speech Communication, 48(5):474–492.
Mann, S. and Haykin, S. (1991). The chirplet transform: physical considerations. IEEE Transactions on Signal Processing, 41(11):2745–2761.
Vincent, E., Araki, S., and Bofill, P. (2009). The 2008 signal separation evaluation campaign: A community-based approach to large-scale evaluation. In Proc. Int. Conf. on Independent Component Analysis and Signal Separation.
Weruaga, L. and Képesi, M. (2007). The fan-chirp transform for nonstationary harmonic signals. Signal Processing, 87(6):1504–1522.