Informed Spectral Analysis for Under Determined Audio Source Separation

Fourer Dominique, LaBRI - Université Bordeaux I, 17 November 2011

Author: Linette Jane Simmons



Introduction

Plan

- Introduction
- Introducing informed Spectral Analysis
- Application to audio source Separation problem
- Conclusion and future work

Source Separation Problem

Observation model

Instantaneous discrete-time sound mixture signal:

$$x[n] = \sum_{k=1}^{K} s_k[n] + r[n] \qquad (1)$$

where $r[n]$ is a residual noise signal.

Monaural sound mixture

- The number $K$ of sources present in the mixture is greater than the number of observations (underdetermined configuration).
- No orthogonality assumption (sources may overlap in time and frequency).

State of the Art

Purpose of the presented work

Recover each signal $s_k[n]$ from $x[n]$ with minimal distortion (in the least-squares sense).

Existing approaches for underdetermined audio source separation

- Model-based inference: estimation of source signal parameters using prior information (e.g. harmonic model, sinusoidal modelling, GMM, ...).
- Unsupervised learning: non-parametric approach that attempts to extract signal characteristics from data (e.g. ICA, NMF, sparse coding).
- Psychoacoustically motivated methods: organization of psychoacoustic cues (e.g. CASA).

Sinusoidal Modelling

Source decomposition using the stationary model for the analysis of a local frame:

$$s_k[n] = \sum_{l=1}^{L} a_l \cos(\omega_l n + \phi_l) \qquad (2)$$

where $(a, \omega, \phi) \in \mathbb{R}^3$ denote the amplitude, frequency and phase, respectively.

Why sinusoidal modelling?

- Sparse representation of musical signals (efficient for low-bit-rate coding, MPEG4-SSC/HILN),
- $a$ and $\omega$ are perceptual parameters,
- allows efficient sound transformations (e.g. time-stretching, transposition).
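The two models introduced so far, the mixture of Eq. (1) and the stationary sinusoidal frame of Eq. (2), can be sketched in a few lines. This is only an illustration: the function names `synth_frame` and `mix`, the Gaussian choice for the residual $r[n]$, and all numeric values are assumptions, not taken from the slides.

```python
import math
import random

def synth_frame(partials, n_samples):
    """Eq. (2): sum of stationary sinusoids; partials = [(a, omega, phi), ...]."""
    return [
        sum(a * math.cos(omega * n + phi) for a, omega, phi in partials)
        for n in range(n_samples)
    ]

def mix(sources, noise_std=0.0, seed=0):
    """Eq. (1): x[n] = sum_k s_k[n] + r[n]; r[n] is drawn here as white Gaussian noise (assumption)."""
    rng = random.Random(seed)
    n_samples = len(sources[0])
    return [
        sum(s[n] for s in sources) + rng.gauss(0.0, noise_std)
        for n in range(n_samples)
    ]

# Two hypothetical sources observed through one monaural mixture (K = 2, one observation).
s1 = synth_frame([(1.0, 0.30, 0.0), (0.5, 0.60, 1.0)], 256)
s2 = synth_frame([(0.8, 0.45, 0.5)], 256)
x = mix([s1, s2], noise_std=0.01)
```

With `noise_std=0.0` the mixture is exactly the sample-wise sum of the sources, which is the noiseless case of Eq. (1).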

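Sinusoidal Modelling

Parameter estimation

- The reassignment method [Kodera, Villedary & Gendrin 1976] reaches the Cramér-Rao lower bound (CRB).
- Generalized derivative method for the non-stationary model [Marchand & Depalle 2008].

$$\hat\omega(t, \omega_k) = \omega_k \underbrace{- \Im\left( \frac{S_{w'}(t, \omega_k)}{S_w(t, \omega_k)} \right)}_{-\Delta_\omega}$$

$$\hat a(t, \omega_k) = \left| \frac{S_w(t, \omega_k)}{W(\Delta_\omega)} \right|, \qquad \hat\phi = \angle\left( \frac{S_w(t, \omega_k)}{W(\Delta_\omega)} \right)$$

- $S_w$ is the STFT of $s$ using the window $w$,
- $w' = \frac{\partial w(t)}{\partial t}$ and $W = \mathcal{F}(w)$.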
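As a minimal sketch of the reassignment-style estimator above, the following code analyses one frame with a Hann window and its analytic derivative. Several choices here are assumptions made for the illustration and are not stated on the slides: the Hann window, the time reference at the centre of the frame, the coarse peak picking over DFT bins, and the factor 2 that compensates for a real cosine putting half of its amplitude at the positive frequency.

```python
import cmath
import math

def hann(u, n_win):
    """Hann window centred on u = 0, with support [-n_win/2, n_win/2]."""
    return 0.5 * (1.0 + math.cos(2.0 * math.pi * u / n_win))

def hann_derivative(u, n_win):
    """Analytic time derivative w'(u) of the window above."""
    return -(math.pi / n_win) * math.sin(2.0 * math.pi * u / n_win)

def windowed_dft(frame, omega, window):
    """One STFT bin: sum_n s[n] * window(u_n) * exp(-j * omega * u_n), with u_n = n - N/2."""
    n_win = len(frame)
    return sum(
        s * window(n - n_win / 2, n_win) * cmath.exp(-1j * omega * (n - n_win / 2))
        for n, s in enumerate(frame)
    )

def reassigned_estimate(frame):
    """Return (a_hat, omega_hat, phi_hat); phi_hat is the phase at the frame centre."""
    n_win = len(frame)
    # Coarse step: pick the positive-frequency bin with the largest magnitude.
    bins = [2.0 * math.pi * k / n_win for k in range(1, n_win // 2)]
    omega_k = max(bins, key=lambda w: abs(windowed_dft(frame, w, hann)))
    s_w = windowed_dft(frame, omega_k, hann)
    s_dw = windowed_dft(frame, omega_k, hann_derivative)
    delta = (s_dw / s_w).imag          # Delta_omega
    omega_hat = omega_k - delta        # reassigned frequency
    # W(Delta_omega): spectrum of the window at the frequency offset.
    w_delta = windowed_dft([1.0] * n_win, delta, hann)
    ratio = s_w / w_delta
    return 2.0 * abs(ratio), omega_hat, cmath.phase(ratio)

# Hypothetical noiseless test frame, parameterised at the frame centre.
N = 256
frame = [0.7 * math.cos(0.3 * (n - N / 2) + 0.4) for n in range(N)]
a_hat, omega_hat, phi_hat = reassigned_estimate(frame)
```

On a noiseless frame the three estimates land very close to the true parameters; with additive noise the residual error is what the informed approach later tries to correct.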

Sinusoidal Modelling: Theoretical bounds

Fig.: Frequency estimation

Fig.: Amplitude estimation

Fig.: Phase estimation

Introducing informed Spectral Analysis


Informed Source Separation

Why?

- Classic estimators have theoretical limitations (CRB),
- high quality is required by demanding applications,
- the original isolated source signals are available during the mixing process (recording studio).

Motivation

- Derive new informed estimators that combine classic estimation with extra information,
- optimize the rate-distortion trade-off of these estimators,
- hide the extra information in the signal itself (watermarking).

Approach comparison

Classic estimation: $x[n]$ → Estimator → $\hat s_k[n]$

Informed estimation:

- Coder: $x[n]$, $s_k[n]$ → Info Extract → $I$ → Watermarking → $x^W[n]$
- Decoder: $x^W[n]$ → Decoder → $x^W[n]$, $I$ → Estimator → $\tilde s_k[n]$

Informed Source Separation

State of the art

Classic source separation combined with extra side information:

- spectral envelope + clustering + spatial filtering [Gorlow, Marchand 2011],
- magnitude spectrogram compression + Wiener filtering [Liutkus et al. 2011],
- spectral envelope + Wiener filtering [Parvaix, Girin 2009].

These methods estimate $\tilde s_k$ without prior knowledge about the signal model or its parameters.

Model based informed analysis

Motivation

- Find the minimal amount of extra information necessary to reach a fixed target precision from any classic estimator,
- allow bit-by-bit scalable quality control,
- generalize the informed-analysis approach to allow a theoretical study.

Intuition

A classic estimator can be combined with the minimal complementary information required to systematically correct its errors and reach a target quality.

Observation

Experimentation

Histogram of the most significant bit (fixed-point binary representation) of the estimation error committed by the reassignment method for the frequency estimation of a sinusoid in white noise.

Fig.: Two histograms, for SNR = 10 dB and SNR = −8 dB. Horizontal axis: most significant bit index (1 to 24). Vertical axis: occurrences (%).

Informed spectral analysis principle

Scalar case

- Let $p \in [0, 1)$ be a real parameter and $\hat p$ its estimate (obtained by a classic spectral analysis).
- Let $C_d$ be the $d$-bit fixed-point binary coding function and $D = C^{-1}$ its inverse. Let $C = C_1, C_2, \ldots, C_d$ denote $C_d(p)$.
- Let $I_\sigma = \mathrm{msb}(C(|p - \hat p|))$, measured over a significant number of occurrences of $\hat p$ ($I_\sigma$ is the upper bound of the confidence interval of the estimator).

$$C_d(p) = \underbrace{C_1, C_2, \ldots, C_{I_\sigma - 1}}_{\text{reliable part}},\ \underbrace{C_{I_\sigma}, \ldots, C_d}_{\text{unreliable part}} \qquad (3)$$
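The coding function $C_d$ and the most-significant-bit measurement can be sketched as follows. This is an illustration under stated assumptions: the helper names are hypothetical, the code rounds to the nearest level, bits are indexed from 1 starting at the most significant bit, and the value d = 13 used in the example is inferred from the length of the bit strings shown on the slides.

```python
def encode(p, d):
    """C_d: d-bit fixed-point code of p in [0, 1), rounded to the nearest level."""
    return min(round(p * (1 << d)), (1 << d) - 1)

def decode(code, d):
    """D = C^{-1}: map a d-bit code back to [0, 1)."""
    return code / (1 << d)

def bits(code, d):
    """Bit string C_1 C_2 ... C_d, most significant bit first."""
    return format(code, "0{}b".format(d))

def msb_index(p, p_hat, d):
    """1-based index of the most significant non-zero bit of C_d(|p - p_hat|).

    Returns None when the error is too small to set any of the d bits.
    """
    err = encode(abs(p - p_hat), d)
    return None if err == 0 else d - err.bit_length() + 1

code_p = bits(encode(0.4032, 13), 13)    # '0110011100111'
index = msb_index(0.4032, 0.3831, 13)    # 6
```

Collecting `msb_index` over many noisy estimates gives the kind of histogram described on the observation slide; $I_\sigma$ is then chosen from that distribution rather than from a single realization.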

How to correct errors? 1/2

Proposed solution

Substitute the unreliable part of $C(\hat p)$ with the exact bit values extracted from $C(p)$.

- $p$ = 0.4032, $C(p)$ = 0110 01110 0111
- $\hat p$ = 0.3831 (|error| = 0.0201), $C(\hat p)$ = 0110 00100 0010
- $\tilde p$ = 0.4026 (|error| = 0.0006), $C(\tilde p)$ = 0110 01110 0010

$$|\tilde p - p| \le |\hat p - p|$$

How to correct errors? 2/2

Problem: what happens when ...

- $p$ = 0.2776, $C(p)$ = 0100 0111 00010
- $\hat p$ = 0.2473 (|error| = 0.0303), $C(\hat p)$ = 0011 1111 01010
- $\tilde p$ = 0.2161 (|error| = 0.0615), $C(\tilde p)$ = 0011 0111 01010

$$|\tilde p - p| > |\hat p - p|$$

Substitution may increase the error.
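Both substitution examples can be reproduced with a few bit operations. The sketch below assumes 13-bit codes rounded to the nearest level and 1-based bit positions counted from the most significant bit; the function name `substitute` and its arguments are hypothetical.

```python
def encode(p, d):
    """d-bit fixed-point code of p in [0, 1), rounded to the nearest level."""
    return min(round(p * (1 << d)), (1 << d) - 1)

def substitute(p, p_hat, d, first, count):
    """Replace bits first .. first+count-1 of C_d(p_hat) with those of C_d(p)."""
    shift = d - (first + count - 1)      # number of bits to the right of the field
    mask = ((1 << count) - 1) << shift
    code = (encode(p_hat, d) & ~mask) | (encode(p, d) & mask)
    return code / (1 << d)

# First example: substituting bits 5..9 reduces the error.
good = substitute(0.4032, 0.3831, 13, first=5, count=5)
# Second example: substituting bits 5..8 increases the error, because the
# leading bits of the two codes already differ (0100 versus 0011).
bad = substitute(0.2776, 0.2473, 13, first=5, count=4)
```

The second call shows the failure mode: when the estimate and the true value fall on opposite sides of a carry, the "reliable" prefix of the estimate is itself wrong, so pasting exact bits behind it moves the result further away.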

Solution

We transmit bit $I_\sigma - 1$ for verification.

- $p$ = 0.2776, $C(p)$ = 0100 0111 00010
- $\hat p$ = 0.2473 (|error| = 0.0303), $C(\hat p)$ = 0011 1111 01010

If $I_\sigma$ is exact, then $p - 2^{-I_\sigma} \le \hat p \le p + 2^{-I_\sigma}$.

When $C(p)[I_\sigma - 1] \ne C(\hat p)[I_\sigma - 1]$, we check two cases:

- $\tilde p_+$ = 0.2786 (|error| = 0.0010), $C(\tilde p_+)$ = 0100 0111 01010
- $\tilde p_-$ = 0.1536 (|error| = 0.1240), $C(\tilde p_-)$ = 0010 0111 01010

The best informed value satisfies $\hat p - 2^{-I_\sigma} \le \tilde p \le \hat p + 2^{-I_\sigma}$.
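The verification step can be sketched as follows. This is one possible reading of the slide, not the author's implementation: the coder sends bit $I_\sigma - 1$ plus `m` bits starting at position $I_\sigma$, and when the check bit disagrees the decoder tries the two neighbouring prefixes and keeps the candidate closest to $\hat p$ (in the slide's example this is the candidate that satisfies the bound around $\hat p$). The names `extract_info` and `informed_estimate`, and the values d = 13, $I_\sigma$ = 5 and m = 4, are assumptions read off the example.

```python
def encode(p, d):
    """d-bit fixed-point code of p in [0, 1), rounded to the nearest level."""
    return min(round(p * (1 << d)), (1 << d) - 1)

def extract_info(p, d, i_sigma, m):
    """Coder side: check bit (position I_sigma - 1) and the m bits from position I_sigma."""
    code = encode(p, d)
    check = (code >> (d - i_sigma + 1)) & 1
    info = (code >> (d - i_sigma + 1 - m)) & ((1 << m) - 1)
    return check, info

def informed_estimate(p_hat, check, info, d, i_sigma, m):
    """Decoder side: rebuild p~ from the blind estimate and the transmitted bits."""
    code_hat = encode(p_hat, d)
    high_shift = d - i_sigma + 1               # bits to the right of the prefix
    low_shift = high_shift - m                 # bits to the right of the info field
    prefix = code_hat >> high_shift            # bits 1 .. I_sigma - 1 of C(p_hat)
    tail = code_hat & ((1 << low_shift) - 1)   # bits of C(p_hat) after the info field

    def build(pfx):
        return (pfx << high_shift) | (info << low_shift) | tail

    if (prefix & 1) == check:
        candidates = [build(prefix)]
    else:
        # The last prefix bit disagrees: try the prefix just above and just below.
        candidates = [build(prefix + 1), build(prefix - 1)]
    candidates = [c for c in candidates if 0 <= c < (1 << d)]
    best = min(candidates, key=lambda c: abs(c - code_hat))
    return best / (1 << d)

check, info = extract_info(0.2776, 13, i_sigma=5, m=4)
p_tilde = informed_estimate(0.2473, check, info, 13, i_sigma=5, m=4)
```

On the slide's numbers the decoder selects the code 0100 0111 01010, so the informed estimate is closer to $p$ than the blind one.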

Results

Fig.: Comparison with CRB for frequency estimation with d = 16. Curves: CRB, reassignment, informed reassignment (5/16 bits), informed lower bound (5 bits). Horizontal axis: signal-to-noise ratio (dB). Vertical axis: variance of the error (log10 scale).

Fig.: Comparison with CRB for frequency estimation with d = 24. Curves: CRB, reassignment, informed reassignment (5/24 bits), informed lower bound (5 bits). Horizontal axis: signal-to-noise ratio (dB). Vertical axis: variance of the error (log10 scale).

Generalization to $P \in \mathbb{R}^3$ for the sinusoidal model

- The parameter is a vector of $\mathbb{R}^3$: $P = (a, \omega, \phi)$.
- Coding function: $C_d(P) = C_e(a), C_f(\omega), C_g(\phi)$ with $d = e + f + g$.

Distortion measure

Weighted squared error between the synthesized signals:

$$D(P, \hat P) = \sum_{n=1}^{N} w[n] \left( a \cos(\omega n + \phi) - \hat a \cos(\hat\omega n + \hat\phi) \right)^2 \qquad (4)$$
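A direct evaluation of the distortion measure can be sketched as below. The code follows one reading of Eq. (4), in which the window weights the squared sample error; the function name and the toy values are assumptions.

```python
import math

def distortion(p, p_hat, window):
    """Eq. (4): sum over n = 1..N of w[n] * (a cos(w n + phi) - a^ cos(w^ n + phi^))**2."""
    a, omega, phi = p
    a_h, omega_h, phi_h = p_hat
    return sum(
        w_n * (a * math.cos(omega * n + phi) - a_h * math.cos(omega_h * n + phi_h)) ** 2
        for n, w_n in enumerate(window, start=1)
    )

# Toy check with a rectangular window and an amplitude error only (omega = phi = 0).
d_amp = distortion((1.0, 0.0, 0.0), (0.5, 0.0, 0.0), [1.0] * 4)   # 4 * 0.5**2 = 1.0
```

Because the error is measured on the synthesized waveform, a frequency or phase error costs more when the amplitude is large, which is why the bit allocation has to depend on $a$.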

Vector quantization problem

How to:

- find the minimal $d$ for a given distortion measure,
- find $e$, $f$ and $g$ that minimize $D(P, \hat P)$ (bit allocation),
- take advantage of the dependence between parameters (e.g. it is useless to allocate bits to $\omega$ and $\phi$ if $a \approx 0$).

Solution

Entropy Constrained Unrestricted Spherical Quantization (ECUSQ) [Korten, Jensen & Heusdens 2007]:

- define the distortion as a function of the entropy $H_t$,
- define a variable-length quantizer that minimizes $D(H_t)$.

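ECUSQ

Using the high-rate assumption, $D$ can be expressed as a function of the quantization error on each component:

$$D(\tilde a, \Delta_a, \Delta_\omega, \Delta_\phi) \approx \frac{\|w\|^2}{24} \left( \Delta_a^2 + \tilde a^2 \left( \Delta_\phi^2 + \sigma_w^2 \Delta_\omega^2 \right) \right)$$

Let $f_{A,\Omega,\Phi}(a, \omega, \phi)$ be the joint probability density of $P$ and $g$ the quantizer point density. The overall average distortion is then:

$$\bar D = \frac{\|w\|^2}{24} \iiint f_{A,\Omega,\Phi}(a, \omega, \phi) \left[ g_A^{-2}(a, \omega, \phi) + \tilde a^2 \left( g_\Phi^{-2}(a, \omega, \phi) + \sigma_w^2\, g_\Omega^{-2}(a, \omega, \phi) \right) \right] da\, d\omega\, d\phi$$

with $\sigma_w^2 = \frac{1}{\|w\|^2} \sum_{n=0}^{N-1} w[n]\, n^2$.

Finally, we minimize the average distortion under the entropy constraint using a Lagrange multiplier:

$$\nu = \bar D + \lambda\, h(A, \Omega, \Phi) \qquad (5)$$

We obtain:

$$g_A(a, \omega, \phi) = \sqrt{\frac{\|w\|^2}{12 \lambda \log_2(e)}}, \qquad g_\Phi(\tilde a, \omega, \phi) = \tilde a\, g_A(a, \omega, \phi), \qquad g_\Omega(\tilde a, \omega, \phi) = \tilde a\, \sigma_w\, g_A(a, \omega, \phi)$$

with

$$\lambda = \frac{\|w\|^2\, 2^{-\frac{2}{3} \left( H_t - h(A, \Omega, \Phi) - 2 b(A) - \log_2(\sigma_w) \right)}}{12 \log_2(e)}$$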

ECUSQ

Distortion-rate function

Average distortion as a function of the entropy (under the high-rate assumption):

$$D_{ECUSQ} = \frac{\|w\|^2}{8}\, 2^{-\frac{2}{3} \left( H_t - h(A, \Omega, \Phi) - 2 b(A) - \log_2(\sigma_w) \right)} \qquad (6)$$

with $b(A) = \int f_A(a) \log_2(a)\, da$.

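ECUSQ

Quantizer point density functions

$$g_A(a, \omega, \phi) = 2^{\frac{1}{3} \left( \tilde H_t - 2 b(A) - \log_2(\sigma_w) \right)}, \qquad (7)$$

$$g_\Phi(\tilde a, \omega, \phi) = \tilde a\, g_A(a, \omega, \phi), \qquad (8)$$

$$g_\Omega(\tilde a, \omega, \phi) = \tilde a\, \sigma_w\, g_A(a, \omega, \phi), \qquad (9)$$

with $\tilde H_t = H_t - h(A, \Omega, \Phi)$.

Remarks

- Quantization steps are given by $\Delta = g^{-1}$,
- the optimal quantizers for $\omega$ and $\phi$ depend linearly on $\tilde a$,
- the bit allocation function $b_{a,\omega,\phi}$ is computed from $\lceil \log_2(g) \rceil$.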
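The relations above translate into a small helper that returns the point densities, the quantization steps and a bit count per component. This is a sketch under assumptions: the exponent of $g_A$ is grouped as $\frac{1}{3}(\tilde H_t - 2b(A) - \log_2 \sigma_w)$, which is the form obtained when the entropy constraint is combined with $g_\Phi = \tilde a\, g_A$ and $g_\Omega = \tilde a\, \sigma_w\, g_A$; the function name, the requirement $\tilde a > 0$, and the clamping of negative bit counts to zero are illustrative choices, not taken from the slides.

```python
import math

def ecusq_quantizers(h_tilde, b_a, sigma_w, a_tilde):
    """Point densities, step sizes and bit counts for the components a, phi, omega.

    h_tilde : target entropy minus the differential entropy h(A, Omega, Phi), in bits
    b_a     : b(A), the expected value of log2(a)
    sigma_w : spread of the analysis window
    a_tilde : quantized amplitude of the current sinusoid (assumed > 0)
    """
    g_a = 2.0 ** ((h_tilde - 2.0 * b_a - math.log2(sigma_w)) / 3.0)
    densities = {
        "a": g_a,
        "phi": a_tilde * g_a,
        "omega": a_tilde * sigma_w * g_a,
    }
    steps = {name: 1.0 / g for name, g in densities.items()}
    # ceil(log2(g)) bits per component; negative values are clamped to 0 (assumption).
    bits = {name: max(0, math.ceil(math.log2(g))) for name, g in densities.items()}
    return densities, steps, bits

# Hypothetical values: 42 bits of entropy budget, b(A) = 0, sigma_w = 8, unit amplitude.
densities, steps, bits = ecusq_quantizers(42.0, 0.0, 8.0, 1.0)
```

For a unit amplitude and $b(A) = 0$, the log2 of the product of the three densities equals the entropy budget, which is a quick sanity check of the allocation. Lower amplitudes shrink the densities for $\phi$ and $\omega$, so fewer bits go to components that matter less for the waveform error.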

Simulation for $P \in \mathbb{R}^3$

$\mathrm{SNR}_{target}$ = 45 dB ⇒ $H_t \approx 42$ bits

Fig.: SNR. Measured SNR (dB) as a function of the initial SNR (dB). Curves: initial SNR, blind estimation, target SNR, informed estimation ($H_t$ = 42).

Fig.: Extra-information bit rate. Amount of extra information (bits) as a function of the initial SNR (dB). Curves: $H_t$, $\omega + a + \phi$, $\omega$, $a$, $\phi$.

Application to audio source Separation problem


Method Overview

Fig.: Block diagrams of the method.

(a) Coder. Blocks: mixing process, blind estimator, information extraction, watermarking. Signals: $s_1[n], \ldots, s_k[n], \ldots, s_K[n]$, $x[n]$, $P_k$, $I_k$, $x^W[n]$.

(b) Decoder. Blocks: watermarking extraction, informed estimation, model-based synthesis, signal transformation. Signals: $x^W[n]$, $I_k$, $\tilde P_k$, $\tilde s_k[n]$, $x^W[n] - \tilde s_k[n]$, $\tilde s'_k[n]$, $x'[n]$.

Method summary: coder

Input: $s_k[n]$, the isolated source signals

Output: $x^W[n]$, the watermarked mixture

- Estimate $P_{k,l}$ from $s_k[n]$ using the reassignment method.
- Compute $b_{a,\omega,\phi}$ from $P_{k,l}$ using ECUSQ.
- Compute the binary mask mask[n].
- Estimate $I_{\sigma,k,l}$ and $I_{k,l}$ from $\hat P_{k,l}$ using the informed spectral analysis method with a simulated mixing process combined with the watermark.
- Compute $x^W[n]$ using QIM-based watermarking containing mask[n], $I_{\sigma,k,l}$ and $I_{k,l}$.

Method summary: decoder

Input: $x^W[n]$, the watermarked mixture

Output: $\tilde s_k[n]$, $\tilde P_{k,l}$, the isolated source signals and parameters

- Recover mask[n], $I_{\sigma,k,l}$ and $I_{k,l}$ by watermark extraction from $x^W[n]$, and use ECUSQ to compute $b_{a,\omega,\phi}$.
- Estimate $\hat P_{k,l}$ using mask[n] and the reassignment method.
- Compute $\tilde P_{k,l}$ with $I_{\sigma,k,l}$ and $I_{k,l}$ using the informed spectral analysis.
- Synthesize $\tilde s_k[n]$ from $\tilde P_{k,l}$.
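The coder and decoder both rely on QIM-based watermarking to carry the side information inside the mixture. The slides do not detail the scheme, so the following is only a generic scalar quantization index modulation sketch: each sample is snapped onto one of two interleaved lattices depending on the bit to embed, and the decoder picks the nearest lattice. The function names, the step size and the sample values are all hypothetical.

```python
def qim_embed(x, bit, step):
    """Quantize x onto the lattice of `bit`; the two lattices are offset by step / 2."""
    offset = bit * step / 2.0
    return step * round((x - offset) / step) + offset

def qim_extract(y, step):
    """Return the bit whose lattice has the point nearest to y."""
    d0 = abs(y - qim_embed(y, 0, step))
    d1 = abs(y - qim_embed(y, 1, step))
    return 0 if d0 <= d1 else 1

# Embed a short payload into hypothetical mixture samples, then read it back.
step = 0.01
samples = [0.1234, -0.0421, 0.3310, 0.0057]
payload = [1, 0, 1, 1]
marked = [qim_embed(x, b, step) for x, b in zip(samples, payload)]
recovered = [qim_extract(y, step) for y in marked]
```

The embedding moves each sample by at most half a step, and extraction stays correct as long as later perturbations remain below a quarter of the step, which is the usual trade-off between watermark capacity, audibility and robustness.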

Results with real sounds

Fig.: Guitar. Bitrate (kbits/s) as a function of the result SNR (dB). Legend: Informed analysis, Quantized, Watermark capacity.

Fig.: Voice. Bitrate (kbits/s) as a function of the result SNR (dB). Legend: Informed analysis, Quantized, Watermark capacity.

Conclusion and future work


Happy ending

Conclusion

We have proposed a method for the informed analysis of sound mixtures under a quality constraint.

Future work

- Theoretical study and comparison with the Shannon lower bound,
- applications to other audio signal models and estimators,
- optimization of the mask[n] computation using prior knowledge about the sound structure.