Content-based Audio Music Retrieval

Hsin-Min Wang
[email protected]
Speech, Language and Music Processing Laboratory
Institute of Information Science, Academia Sinica, Taipei, Taiwan
http://slam.iis.sinica.edu.tw
Music Information Retrieval (MIR)

Users need to find the "right" songs for
- a specific listening context (driving, studying, exercising)
- a specific mood (sad, happy, angry)
- a specific event (wedding, party)
- accompanying a video (home video)

Current solutions
- Manual browsing or selection
- Keyword search (artist, title, lyrics)
- Social recommendation
- Content-based retrieval (query-by-singing/humming, fingerprinting)
Outline

- Retrieving music by singer (query-by-example)
- Retrieving music by melody (query-by-singing/humming)
- Retrieving music by melody (query-by-example): cover song retrieval
- Retrieving music by social tags (query-by-tag)
- Retrieving music by emotion (query-by-emotion)
Retrieving Music by Singer

- Retrieving music performed by the singer of an example audio query (query-by-example): "Find me all the songs performed by the singer of this audio query."
- Users can be recommended songs performed by their favorite singer, or by singers with similar vocal characteristics.
Singer-based MIR System

Indexing phase: each music document X_i (1 ≤ i ≤ M) undergoes vocal/non-vocal segmentation followed by solo voice modeling, yielding a singer voice model λ_{s,i}.

Searching phase: the example music query Y is segmented into vocal segments Y_V and background segments Y_B; the background segments are used to train a background music model λ_{b,Y}. Each document is then scored by

  L(Y, X_i) = log p(Y_V | λ_{s,i}, λ_{b,Y}),  1 ≤ i ≤ M,

and the documents are returned as a ranked list.
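The likelihood computation and ranking step can be sketched as follows. This is a minimal illustration assuming diagonal-covariance GMM singer voice models; the model tuple layout is hypothetical, and the background model λ_{b,Y} is omitted for brevity (scoring uses the singer voice models only).

```python
import numpy as np

def gmm_loglik(frames, weights, means, variances):
    """Average log-likelihood of feature frames under a diagonal-covariance GMM.
    frames: (N, D); weights: (K,); means, variances: (K, D)."""
    # squared Mahalanobis distance per frame/component pair
    diff2 = (frames[:, None, :] - means[None, :, :]) ** 2 / variances[None, :, :]
    log_norm = -0.5 * np.log(2 * np.pi * variances).sum(axis=1)      # (K,)
    comp_ll = log_norm[None, :] - 0.5 * diff2.sum(axis=2)            # (N, K)
    # weighted log-sum-exp over components
    m = comp_ll.max(axis=1, keepdims=True)
    ll = m.squeeze(1) + np.log((np.exp(comp_ll - m) * weights[None, :]).sum(axis=1))
    return ll.mean()

def rank_by_singer(query_vocal_frames, singer_models):
    """Rank documents by log p(Y_V | lambda_{s,i}); singer_models is a list of
    (name, weights, means, variances) tuples (a hypothetical structure)."""
    scores = [(name, gmm_loglik(query_vocal_frames, w, mu, var))
              for name, w, mu, var in singer_models]
    return sorted(scores, key=lambda x: x[1], reverse=True)
```

In practice the vocal frames Y_V would come from the vocal/non-vocal segmenter, and each singer model would be trained on that singer's solo voice segments.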
Copyright Protection

Singer verification: examining whether an unknown music file contains the voice of a particular singer, e.g., whether a file found on the Internet by a spider contains Norah Jones's latest song.
Singer Verification

Training phase: music data for the target singer (H_0) and for other singers (H_1) each undergo vocal/non-vocal segmentation and solo voice modeling.

Testing phase: a test music recording X is scored against both models, and the likelihood ratio

  p(X | H_0) / p(X | H_1)

is compared against a decision threshold, yielding H_0 (Yes): X is performed by the target singer, or H_1 (No): X is not performed by the target singer.
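A minimal sketch of the decision rule, assuming the two hypothesis scores arrive as log-likelihoods (working in the log domain, so the ratio becomes a difference); the threshold value is a tunable assumption, not a value from the slides:

```python
def verify_singer(loglik_h0, loglik_h1, threshold=0.0):
    """Log-likelihood-ratio test for singer verification: accept H0
    (the recording X is performed by the target singer) when
    log p(X | H0) - log p(X | H1) exceeds the decision threshold."""
    return (loglik_h0 - loglik_h1) > threshold
```

The threshold trades off false acceptances against false rejections and would normally be tuned on held-out data.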
Query-by-Singing/Humming System

Indexing phase: each karaoke song undergoes main melody extraction, note sequence generation and smoothing, and phrase onset detection, producing an indexed document.

Searching phase: the sung query undergoes background accompaniment reduction, end-point detection, main melody extraction, and note sequence generation and smoothing. The query's note sequence is aligned against the phrase onsets of each document and compared; similarity computation and decision then return the relevant song.
Melody Similarity Comparison

Dynamic Time Warping (DTW) compares the query's note sequence q = q_1 q_2 q_3 ... q_t ... q_T against a document's note sequence u = u_1 u_2 u_3 ... u_l ... u_L by constructing a T x L distance matrix D = [D(t,l)]_{T x L}:

  D(t,l) = min{ D(t-2, l-1) + 2 d(t,l),
                D(t-1, l-1) + d(t,l),
                D(t-1, l-2) + d(t,l) },   d(t,l) = |q_t - u_l|.

Boundary conditions (r is the number of document notes on which the query may start):

  D(1,1) = d(1,1)
  D(t,1) = ∞, 2 ≤ t ≤ T
  D(t,2) = ∞, 4 ≤ t ≤ T
  D(1,l) = d(1,l), 1 ≤ l ≤ r;  D(1,l) = ∞, r < l ≤ L
  D(2,l) = d(1,l-1) + d(2,l), 2 ≤ l ≤ r+1;  D(2,l) = ∞, r+1 < l ≤ L
  D(3,2) = d(1,1) + 2 d(3,2)

Similarity between q and u:

  S(q,u) = min_{T/2 ≤ l ≤ L} D(T,l)

Complexity: O(T^2)
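The recursion and boundary conditions above can be sketched in Python. This is a minimal illustration; the default value of r (how many document notes the query may start on) is an assumption chosen for demonstration:

```python
import numpy as np

def melody_dtw_similarity(q, u, r=3):
    """DTW distance between a query note sequence q (length T) and a
    document note sequence u (length L), following the slide's recursion.
    Lower values mean higher melodic similarity."""
    T, L = len(q), len(u)
    INF = float("inf")
    d = np.abs(np.subtract.outer(q, u))      # d[t, l] = |q_t - u_l| (0-indexed)
    D = np.full((T, L), INF)
    # boundary conditions (0-indexed versions of the 1-indexed slide formulas)
    D[0, :r] = d[0, :r]                      # D(1,l) = d(1,l), 1 <= l <= r
    for l in range(1, min(r + 1, L)):
        D[1, l] = d[0, l - 1] + d[1, l]      # D(2,l) = d(1,l-1) + d(2,l)
    if T > 2 and L > 1:
        D[2, 1] = d[0, 0] + 2 * d[2, 1]      # D(3,2) = d(1,1) + 2 d(3,2)
    # main recursion with the slide's three local path constraints
    for t in range(2, T):
        for l in range(2, L):
            D[t, l] = min(D[t - 2, l - 1] + 2 * d[t, l],
                          D[t - 1, l - 1] + d[t, l],
                          D[t - 1, l - 2] + d[t, l])
    # S(q,u): best end point in the last query row, l >= T/2
    start = max(T // 2 - 1, 0)
    return D[T - 1, start:].min()
```

An identical melody yields distance 0; transposition-invariant variants would first mean-normalize the note sequences.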
Example Cover Song Pairs

  Type of within-pair difference   No. of pairs
  L                                 8
  L+S                               7
  L+T                               3
  L+S+T                             7
  L+T+N                             6
  L+S+T+N                           4
  L+A+T+N                           2
  L+S+A+T+N                        10

L: Language (Mandarin/English/Japanese), S: Singer, A: Principal Accompaniments, T: Tempo, N: Non-vocal Melodies
Melody-based Cover Song Retrieval System

Indexing phase: each song (Song 1, ..., Song M) undergoes non-vocal removal and main melody extraction, producing note sequences 1 through M.

Searching phase: the audio query undergoes main melody extraction to produce its note sequence, which is compared against the indexed note sequences; similarity computation and ranking return a ranked list.
Social Tags for Music

Music tags describe different aspects of a music clip, e.g., genre, mood, instrumentation, and users' preferences.
Music Tag Annotation and Retrieval

Annotation: a music clip is annotated by running one predictor per tag (e.g., Female, R&B, Guitar, Metal, Bass), producing a score for each tag.

Retrieval: given a tag query (e.g., "Rock"), music clips are ranked by the scores of the Rock predictor, from high relevance to low relevance, yielding a ranked list for the query.
MTML Query Interface: Coloring Tags in the Tag Cloud

Online demo: http://slam.iis.sinica.edu.tw/demo/SoTags/
Music Tagging System

The probability that tag w_m applies to song s is a mixture over K acoustic codewords (reconstructed from the garbled slide formula):

  p(w_m | s) = Σ_k β_km θ_k

where θ_k is the song's posterior weight for codeword k and β_km is the probability of tag w_m given codeword k.
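Under that mixture formula, tag scoring and query-by-tag ranking reduce to a matrix product followed by a sort. The names theta and beta below are hypothetical stand-ins for the learned codeword posterior and per-codeword tag probabilities:

```python
import numpy as np

def tag_scores(theta, beta):
    """p(w_m | s) = sum_k beta[k, m] * theta[k]: a song's tag scores from its
    codeword posterior theta (K,) and per-codeword tag probabilities beta (K, M)."""
    return theta @ beta

def rank_by_tag(thetas, beta, tag_index):
    """Query-by-tag: rank songs (given as a list of theta vectors) by the
    score of a single tag predictor, best match first."""
    scores = np.stack([tag_scores(t, beta) for t in thetas])[:, tag_index]
    return np.argsort(-scores)
```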
"Playing with Tagging" Music Player

- Visual effects in current music players are usually generated directly from the audio signal and often render meaningless or incomprehensible displays.
- Our playing-with-tagging player instead shows the dynamic tag distribution during music playback.
Emotion-based MIR Systems

- Music retrieval and organization by emotion is intuitive: music is created to convey and modulate emotions.
- Music Emotion Recognition (MER)
- Examples: the Mufin Player, and Mr. Emo developed by Yang et al.
Emotion as Categories

Music emotion recognition can be cast as a classification problem (support vector machines, hidden Markov models, etc.).

The 5 mood categories used in the MIREX Audio Mood Classification task:
- Cluster_1: passionate, rousing, confident, boisterous, rowdy
- Cluster_2: rollicking, cheerful, fun, sweet, amiable/good natured
- Cluster_3: literate, poignant, wistful, bittersweet, autumnal, brooding
- Cluster_4: humorous, silly, campy, quirky, whimsical, witty, wry
- Cluster_5: aggressive, fiery, tense/anxious, intense, volatile, visceral

There is ongoing debate on categorical models of emotion.
The Valence-Arousal Model

- Emotions are represented as numerical values (instead of discrete labels) along the valence and arousal dimensions.
- Good for visualization and intuitive; easy to capture the temporal change of emotion.
- Activation (arousal): energy or neurophysiological stimulation level.
- Evaluation (valence): pleasantness; positive vs. negative affective states.

[Figure: valence-arousal circumplex chart]
Valence-Arousal Annotations

- Emotion is subjective, but consistent aggregate patterns across users' annotations do exist.
- The dimensional emotion of a song can therefore be described by a bivariate Gaussian distribution over valence and arousal.
- The task: predict the emotion of a song as a single Gaussian.
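Fitting the per-song bivariate Gaussian from listener annotations is just a sample mean and covariance; a minimal sketch (the annotation format, a list of per-listener (valence, arousal) pairs, is an assumption):

```python
import numpy as np

def emotion_gaussian(annotations):
    """Fit a bivariate Gaussian to per-listener (valence, arousal) annotations
    of one song: returns the mean vector (2,) and covariance matrix (2, 2)."""
    va = np.asarray(annotations, dtype=float)   # shape (num_listeners, 2)
    mean = va.mean(axis=0)
    cov = np.cov(va, rowvar=False)
    return mean, cov
```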
Regression for Gaussian Parameters

The regression method directly learns five regression models from frame-based feature vectors of the audio signal, one per parameter of the bivariate Gaussian: the two means (m_Val, m_Aro), the two variances (s_Val-Val, s_Aro-Aro), and the covariance (s_Val-Aro).

There is no joint modeling and estimation of the Gaussian parameters.
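The five-regressor scheme can be sketched as follows, using plain ridge-regularized least squares as a stand-in for whatever regression method is actually employed; the function names and data layout are illustrative assumptions:

```python
import numpy as np

def fit_linear_regressor(X, y, ridge=1e-3):
    """Least-squares linear regressor with a small ridge term."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])        # append bias column
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def fit_gaussian_param_regressors(X, params):
    """Train five independent regressors, one per Gaussian parameter
    (m_Val, m_Aro, s_Val-Val, s_Val-Aro, s_Aro-Aro); params has shape (N, 5)."""
    return [fit_linear_regressor(X, params[:, j]) for j in range(5)]

def predict_gaussian_params(regressors, x):
    """Predict the five Gaussian parameters for one song-level feature vector."""
    xb = np.append(x, 1.0)
    return np.array([xb @ w for w in regressors])
```

Because each parameter is predicted independently, nothing constrains the predicted covariance matrix to be positive definite, which is exactly the "no joint modeling" limitation the slide points out.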
The Acoustic Emotion Gaussians Model

A probabilistic approach:
- Represent the acoustic features of a song by a probabilistic histogram vector (the acoustic GMM posterior representation of the song).
- Develop a model that captures the relationship between acoustic features and annotations in the VA space.
Music Emotion Recognition (MER)

Given the acoustic GMM posterior (θ_1, θ_2, ..., θ_{K-1}, θ_K) of a test song, computed from its frame-based feature vectors under the acoustic GMM, predict the emotion as a single VA Gaussian via the learned VA GMM.
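The acoustic GMM posterior representation can be sketched as follows, assuming a diagonal-covariance acoustic GMM (the parameter layout is a hypothetical illustration): each frame's component responsibilities are averaged over the song to give the histogram vector theta.

```python
import numpy as np

def acoustic_gmm_posterior(frames, weights, means, variances):
    """Posterior histogram theta_k = mean over frames of p(k | x_n) under a
    diagonal-covariance acoustic GMM; the song-level representation used for MER.
    frames: (N, D); weights: (K,); means, variances: (K, D)."""
    diff2 = (frames[:, None, :] - means[None, :, :]) ** 2 / variances[None, :, :]
    log_comp = (np.log(weights)[None, :]
                - 0.5 * np.log(2 * np.pi * variances).sum(axis=1)[None, :]
                - 0.5 * diff2.sum(axis=2))               # (N, K)
    log_comp -= log_comp.max(axis=1, keepdims=True)      # stabilize exp
    resp = np.exp(log_comp)
    resp /= resp.sum(axis=1, keepdims=True)              # p(k | x_n)
    return resp.mean(axis=0)                             # theta, shape (K,)
```

The resulting theta sums to 1, so it can be treated as a probabilistic histogram over the acoustic codewords.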
Automatic Generation of Music Video

Matched audio and video features:

  Audio                     Video
  Sound energy              Lighting key
  Tempo and beat strength   Shot change rate
  Rhythm regularity         Motion intensity
  Pitch                     Color (saturation, color energy)
Demonstration I: Video Retrieves Audio

Audio: Radiohead - No Surprises  https://www.youtube.com/watch?v=u5CVsCnxyXg
Video: Sigur Ros - Von (Heima)  http://www.youtube.com/watch?v=hme5jf2Z_ow

Demonstration II: Audio Retrieves Video

Audio: Michael Franti & Spearhead - Say Hey (I Love You)  https://www.youtube.com/watch?v=ehu3wy4WkHs
Video: Of Montreal - Wraith Pinned to the Mist and Other Things  https://www.youtube.com/watch?v=7PoJv4N1Too
Automatic Generation of Music Video Considering Temporal Emotion Flow (ACM MM 2015)
References
W. H. Tsai and H. M. Wang, "Automatic Singer Recognition of Popular Music Recordings via Estimation and Modeling of Solo Vocal Signals," IEEE Trans. on Audio, Speech, and Language Processing, 14(1), pp. 330-341, January 2006.
H. M. Yu, W. H. Tsai, and H. M. Wang, "A Query-by-Singing System for Retrieving Karaoke Music," IEEE Trans. on Multimedia, 10(8), pp. 1626-1637, December 2008.
W. H. Tsai, H. M. Yu, and H. M. Wang, "Using the Similarity of Main Melodies to Identify Cover Versions of Popular Songs for Music Document Retrieval," Journal of Information Science and Engineering, 24(6), pp. 1669-1687, November 2008.
J. C. Wang, Y. C. Shih, M. S. Wu, H. M. Wang, and S. K. Jeng, "Colorizing Tags in Tag Cloud: A Novel Query-by-Tag Music Search System," in Proceedings of ACM MM 2011, pp. 293-302, November 2011.
J. C. Wang, Y. H. Yang, H. M. Wang, and S. K. Jeng, "The Acoustic Emotion Gaussians Model for Emotion-based Music Annotation and Retrieval," in Proceedings of ACM MM 2012, pp. 89-98, October 2012.
J. C. Wang, Y. H. Yang, H. M. Wang, and S. K. Jeng, "Modeling the Affective Content of Music with a Gaussian Mixture Model," IEEE Transactions on Affective Computing, 6(1), pp. 56-68, March 2015.
More papers are available at http://slam.iis.sinica.edu.tw/paper.htm
Thank You!