Multimedia Information Retrieval
Dr. Qi Tian Department of Computer Science The University of Texas at San Antonio
[email protected] October 31, 2007
Outline
- Overview
- Problems
- Our Work
- Trends and Directions
Qi Tian Univ. of Texas at San Antonio
Multimedia Information Retrieval
Motivation
- With the explosive growth of digital media data, there is a huge demand for new tools and systems that enable average users to more efficiently and effectively search, access, process, manage, author, and share digital media content.
Overview
Multimedia Information Retrieval
Text-based Information Retrieval
- Too many images to annotate
- High cost of human interpretation
- Subjectivity of visual content, e.g., "A picture is worth a thousand words"
Content-based Retrieval
- Automatically retrieves images, video, and audio based on the visual and audio content
History
- Conference on Database Techniques for Pictorial Applications, 1979
- NSF workshop in 1992
- A more active field since 1997, when the Internet and web browsing became popular
Overview
Multimedia Information Retrieval
A VERY DIVERSIFIED FIELD!
Data Types
- Text, hypertext, image, audio, graphics, animation, paintings, video/movie, rich text, spreadsheets, slides, combinations of these, and user interaction
Research Problems
- Systems, content, services, user, evaluation, implementation, social/business, applications
Methodologies
- Database, information retrieval, signal and image processing, graphics, vision, human-computer interaction, machine learning, statistical modeling, data mining, pattern analysis, data fusion, social sciences, and domain knowledge for applications
Overview
Multimedia Information Retrieval
- Overview
- Content-based Image Retrieval
- Content-based Video Retrieval
- Content-based Audio Retrieval
Approaches
Hierarchical Levels
- High-level: bridge the semantic gap; integration of context and content; hybrid (text and content) approaches
- Mid-level: active learning, boosting, incremental learning
- Low-level: feature extraction and representation; dimension reduction and selection
Overview
Multimedia Information Retrieval
Content-based Image Retrieval
System diagram: image DB with metadata → off-line, automatic feature extraction → user interface with feature weighting and similarity ranking (implemented in C/C++ and Visual C++).
Features:
- Color: color histogram, color moments, color correlogram
- Texture: Tamura texture, co-occurrence matrices, Gabor features, wavelet moments
- Shape: Fourier descriptor
- Structure: edge-based features
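The low-level color features and similarity ranking can be sketched as follows. This is a minimal Python illustration, not the system's actual C/C++ implementation; the bin count and histogram-intersection measure are common choices, assumed here for concreteness.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Joint RGB color histogram: quantize each channel into `bins` levels,
    count pixels per (r, g, b) bin, and normalize to sum to 1."""
    quantized = (image.astype(np.uint32) * bins) // 256        # bin index in [0, bins)
    codes = (quantized[..., 0] * bins + quantized[..., 1]) * bins + quantized[..., 2]
    hist = np.bincount(codes.ravel(), minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def rank_by_similarity(query_hist, db_hists, weights=None):
    """Rank database images by (optionally weighted) histogram intersection
    with the query; higher intersection means more similar."""
    if weights is None:
        weights = np.ones_like(query_hist)
    scores = [np.sum(weights * np.minimum(query_hist, h)) for h in db_hists]
    return sorted(range(len(db_hists)), key=lambda i: -scores[i])
```

The `weights` argument corresponds to the feature-weighting step in the diagram: the user interface can re-weight feature dimensions between queries.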
Approaches
Many CBIR systems have been built. The growing list: ADL, AltaVista Photofinder, Amore, ASSERT, BDLP, Blobworld, CANDID, C-bird, Chabot, CBVQ, DrawSearch, Excalibur Visual RetrievalWare, FIDS, FIR, FOCUS, ImageFinder, ImageMiner, ImageRETRO, ImageRover, ImageScape, Jacob, LCPD, MARS (UIUC), MetaSEEk, MIR, NETRA, Photobook, Picasso, PicHunter, PicToSeek, QBIC (IBM Almaden), Quicklook2, SIMBA, SQUID, Surfimage, SYNAPSE, TODAI, VIR Image Engine, VisualSEEk, VP Image Retrieval System, WebSEEK (Columbia), WebSeer, WISE (Stanford), ...
Overview
Multimedia Information Retrieval
Content-based Video Retrieval
Traditional Video Retrieval
- Query by textual keyword
Automatic Visual Concept Detection
- e.g., indoor/outdoor, sky, car, building, US flag
- Example concepts: Airplane, Building, Car, Crowd, Desert, Explosion, Outdoor, People, Vehicle, Violence
Video Retrieval - Scene
- How to recognize a scene? Context:
  - Use proto-concepts to describe context
  - Use machine learning to link context to concepts
(Example frames annotated with proto-concepts such as Sky and Water.)
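One way to read "machine learning to link context to concepts" is: describe a frame by a histogram over proto-concepts, then learn a mapping from that context vector to a scene label. The sketch below uses a nearest-centroid classifier as a deliberately simple stand-in; the proto-concept names and the classifier choice are illustrative assumptions, not the talk's actual method.

```python
import numpy as np

# Hypothetical proto-concept vocabulary (names are illustrative only).
PROTO_CONCEPTS = ["sky", "water", "grass", "sand", "road"]

def train_centroids(context_vectors, labels):
    """Learn one mean proto-concept histogram (context) per scene label."""
    centroids = {}
    for label in set(labels):
        rows = [v for v, l in zip(context_vectors, labels) if l == label]
        centroids[label] = np.mean(rows, axis=0)
    return centroids

def classify_scene(context, centroids):
    """Assign the scene whose centroid is nearest in context space."""
    context = np.asarray(context)
    return min(centroids, key=lambda c: np.linalg.norm(context - centroids[c]))
```

A real system would replace the centroid rule with a stronger learner (e.g., an SVM) over the same context representation.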
Overview
Multimedia Information Retrieval
Content-based Audio Retrieval
- Searching sounds by their features in the waveform, statistics, or transform domains:
  - Short-time energy
  - Zero-crossing rate
  - Pitch period
  - MFCC
  - Spectrogram
  - LPC
- Audio classes: speech, music, environmental audio, silence
Applications
- Entertainment
  - Film making: searching sound effects
  - TV/radio studio: editing programs
  - Karaoke, music stores, or online shopping: query by humming the melody
- Audio/video archive management
  - Segmenting and indexing raw recordings
  - Searching and browsing audio/video clips
- Surveillance
  - Monitoring criminal or emergency events (e.g., glass breaking, explosion, cry, shot, ...)
  - Film rating
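Two of the waveform-domain features above, short-time energy and zero-crossing rate, are simple enough to sketch directly (frame length and hop size are assumed values, not from the talk):

```python
import numpy as np

def short_time_energy(signal, frame_len=256, hop=128):
    """Mean squared amplitude per frame -- separates speech/music from silence."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([np.mean(f ** 2) for f in frames])

def zero_crossing_rate(signal, frame_len=256, hop=128):
    """Fraction of sign changes per frame -- tends to be higher for noisy
    or unvoiced sounds than for voiced speech and tones."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([np.mean(np.abs(np.diff(np.sign(f))) > 0) for f in frames])
```

Thresholding these two features already gives a rough speech / music / silence segmentation; MFCC, spectrogram, and LPC features refine it in the transform domain.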
MIR
Top 10 Problems in MIR
Bridge the Semantic Gap
- Between high-level concepts (sites, objects, events) and low-level visual/audio features (color, texture, shape and structure, layout; motion; audio pitch, energy, etc.)
How to Best Combine Human Intelligence and Machine Intelligence
- Keep the human in the loop, e.g., relevance feedback
New Query Paradigms
- Query by keywords, similarity, sketching an object, sketching a trajectory, painting a rough image, etc. Can we think of useful new paradigms?
Multimedia Data Mining
- Searching for interesting/unusual patterns and correlations in multimedia has many important applications, including web search engines and intelligence data
- Work to date on data mining has been mainly on text data
How to Use Unlabeled Data
- Active learning, e.g., in relevance feedback
- Label propagation, e.g., image/video annotation
Xiong, Zhou, Tian, Rui, and Huang, "Semantic Retrieval of Video," IEEE Signal Processing Magazine, March 2006
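Keeping the human in the loop via relevance feedback is often implemented as query-point movement. Below is a minimal Rocchio-style sketch; the parameter values are conventional defaults assumed for illustration, not taken from the talk:

```python
import numpy as np

def rocchio_update(query, relevant, nonrelevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query feature vector toward user-marked relevant examples
    and away from non-relevant ones (classic query-point movement)."""
    q = alpha * np.asarray(query, dtype=float)
    if len(relevant):
        q = q + beta * np.mean(relevant, axis=0)
    if len(nonrelevant):
        q = q - gamma * np.mean(nonrelevant, axis=0)
    return q
```

Each feedback round re-ranks the database with the updated query, so a few clicks of "relevant / not relevant" steer the search without any retraining of the feature extractor.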
MIR
Top 10 Problems (continued)
Using Virtual Reality Visualization to Help
- Can we use 3D audio/visual visualization techniques to help a user navigate through the data space to browse and retrieve? e.g., 3D MARS
Incremental Learning
- Change the parameters of the retrieval algorithms incrementally, without starting from scratch every time new data arrives
Structuring Very Large Databases
- Researchers in audio/visual scene analysis and in databases and information retrieval should collaborate closely to find good ways of structuring very large multimedia databases for efficient retrieval and search
Performance Evaluation
- e.g., TRECVID for video retrieval; how about image retrieval?
What Are the Killer Applications of Multimedia Retrieval?
- e.g., medical multimedia document management
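The incremental-learning problem above (update parameters, never retrain from scratch) can be illustrated with the smallest possible case: a class prototype maintained as a running mean. This is a generic sketch, not a specific algorithm from the talk:

```python
import numpy as np

class IncrementalMean:
    """Running estimate of a class prototype that absorbs new samples one
    at a time, without revisiting earlier data."""
    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)

    def update(self, x):
        self.n += 1
        self.mean = self.mean + (np.asarray(x) - self.mean) / self.n  # running mean
        return self.mean
```

The same pattern extends to covariances (Welford's method) and to incrementally updated projections, which is what makes retrieval models adaptable as a collection grows.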
Approaches
Our Recent Work
Pipeline (useful in other applications): Image Database → Feature Extraction → Data Modeling (dimension reduction, statistical estimation) → Similarity Estimation (classification, ranking/indexing) → User, with discriminant-analysis training in the loop.
Approaches
Our Recent Work in CBIR
- Semantic Subspace Projection: bridge the semantic gap
- Hybrid Discriminant Analysis: learn high-dimensional data with small samples
- Adaptive Discriminant Projection: adaptively learn a projection from the data distribution
- Distance Measures for Similarity Estimation: investigate the relations between probability distributions, distance metrics, and mean estimation
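For context on the discriminant-analysis family above, here is the classical two-class Fisher discriminant, the baseline these hybrid/adaptive methods extend. The regularizer on the within-class scatter is a standard trick for the small-sample, high-dimension setting; this is background material, not the talk's own algorithms:

```python
import numpy as np

def fisher_direction(X1, X2, reg=1e-6):
    """Two-class Fisher discriminant: w = Sw^-1 (m1 - m2), the direction
    that best separates the class means relative to within-class scatter.
    `reg` regularizes Sw so it stays invertible with few samples."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = np.cov(X1, rowvar=False) * (len(X1) - 1)   # within-class scatter, class 1
    S2 = np.cov(X2, rowvar=False) * (len(X2) - 1)   # within-class scatter, class 2
    Sw = S1 + S2 + reg * np.eye(X1.shape[1])
    w = np.linalg.solve(Sw, m1 - m2)
    return w / np.linalg.norm(w)
```

Projecting relevant vs. non-relevant images onto `w` gives a one-dimensional score that can drive ranking in relevance feedback.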
Our Recent Work in CBIR
Related Publications
Journals
- IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)
- IEEE Transactions on Circuits and Systems for Video Technology (CSVT)
- IEEE Multimedia
- International Journal of Computer Vision (IJCV)
- Pattern Recognition (PR)
- ACM Transactions on Knowledge Discovery from Data (TKDD)
- IEEE/ACM Transactions on Computational Biology and Bioinformatics
Conferences
- ACM Multimedia
- IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- International Conference on Pattern Recognition (ICPR)
- Others, including ICME, ICIP, ICASSP, MIR, CIVR
Current Directions
- Web Image Search and Mining
- Image Annotation
- Affective Video Retrieval
- Information Fusion in MIR
- Integration of Context and Content for Multimedia Management
- Multimodal Emotion Recognition
Trends and Directions
Web Search
- Web Search 1.0: traditional text retrieval
- Web Search 2.0: page-level relevance ranking
- Web Search 3.0: object-level structured search
Object-Level Vertical Search
- MSRA Libra (http://libra.msra.cn/)
- Live Product Search (http://products.live.com)
Trends and Directions
Image Annotation
Photo sharing through the Internet has become a common practice.
- flickr.com: 19.5 million photos (30% growth/month) as of 2005
- Photo.net and airliners.net: millions of images
Most image search engines rely on textual descriptions of the images, e.g., Google, Yahoo, MSN.
In general, people do not spend time labeling or annotating their personal photos.
Trends and Directions
Image Annotation
Can a computer do this?
(Example output: Building, Sky, Lake, Landscape, Tree)
Image Annotation System
- A statistical model that relates words to image features
- Given an image, extract feature vectors
- Descriptive words: the top words ranked according to likelihood
Current work
- Li & Wang (alipr.com, 2006); Blei & Jordan (2003); Vasconcelos (UCSD-SML, 2007); Zhang et al. (MSRA, 2005); Li et al. (MSRA, 2007)
Promising direction: web image annotation, an integration of IR and content analysis
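The "statistical model relating words to image features" can be made concrete with a heavily simplified stand-in: fit one diagonal Gaussian over image features per word, then annotate a new image with the words under which its feature vector is most likely. This toy model is an illustration of the likelihood-ranking idea, not the cited systems:

```python
import numpy as np

def train_word_models(features, word_lists):
    """Fit a diagonal Gaussian over image features for each annotation word."""
    models = {}
    words = {w for ws in word_lists for w in ws}
    for w in words:
        rows = np.array([f for f, ws in zip(features, word_lists) if w in ws])
        models[w] = (rows.mean(axis=0), rows.std(axis=0) + 1e-6)
    return models

def annotate(feature, models, top_k=3):
    """Return the top-k words ranked by Gaussian log-likelihood of the feature."""
    def loglik(w):
        mu, sd = models[w]
        return -np.sum(np.log(sd) + 0.5 * ((feature - mu) / sd) ** 2)
    return sorted(models, key=loglik, reverse=True)[:top_k]
```

Real annotation systems replace the per-word Gaussian with mixture models or topic models over region features, but the train/rank structure is the same.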
Trends and Directions
Affective Video Retrieval
Affective:
- "a feeling or emotion as distinguished from cognition, thought, or action"
Real Multimedia Retrieval
- Search for a subset of the nicest holiday pictures to show to friends
- Select the most appropriate background music for a given situation
- Search for the most impressive video clips
- Search for the most appealing photographs of one and the same content
- Search for all the film comedies I like most
Alternative approach: search by
- Mood
- Match to the user's profile (like/dislike, interest/no interest)
→ Affective Video Retrieval
Trends and Directions
Underlying Idea
Video
- Temporal content flow
- Continuous transitions from one affective state to another
Temporal measurement of
- Arousal (as a function of time t)
- Valence (as a function of time t)
Combining the two curves yields the affect curve, a trajectory in the 2D arousal-valence space.
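The affect-curve construction can be sketched directly: smooth the per-frame arousal and valence measurements (affective states change gradually) and stack them into a 2D trajectory. The moving-average smoother and window size are illustrative assumptions:

```python
import numpy as np

def smooth(series, window=5):
    """Moving-average smoothing, since affective states change gradually."""
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode="same")

def affect_curve(arousal, valence, window=5):
    """Combine per-frame arousal and valence measurements into one
    trajectory through the 2D arousal-valence affect space."""
    return np.column_stack([smooth(arousal, window), smooth(valence, window)])
```

Retrieval can then match whole affect curves (e.g., "find clips whose curve ends in high-arousal, positive-valence territory") rather than individual frames.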
Trends and Directions
Affective Media Content Characterization (Hanjalic & Xu)
Pipeline: media → feature extraction → Arousal = f1(feature values), Valence = f2(feature values) → mapping AV values onto the 2D affect space → affective content characterization
Example regions of the affect space: suspense/horror; hilarious fun; a bit somber, medium excitement; romantic "feel-good"
Trends and Directions
Information Fusion in MIR
Fusion: "a merging of diverse, distinct, or separate elements into a unified whole" (Merriam-Webster dictionary).
- Feature extraction module:
  - Multiple features → vectors
  - Concatenated into a single vector
  - Feature fusion: a more discriminating hyperspace can be found in the new vector
- Matching module:
  - One type of classifier for multiple features, or
  - Multiple types of classifiers for one feature, or
  - Both
  - The output scores can be combined
- Decision module:
  - The output decisions of the classifiers can be combined
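The three fusion levels above can be sketched in a few lines; the rules shown (concatenation, sum/mean of scores, majority vote) are the standard textbook instances of each level, assumed for illustration:

```python
import numpy as np

def feature_fusion(feature_vectors):
    """Feature-level fusion: concatenate per-feature vectors into one."""
    return np.concatenate([np.asarray(v, dtype=float) for v in feature_vectors])

def score_fusion(scores, rule="sum"):
    """Matching-level fusion: combine normalized classifier scores by rule."""
    ops = {"sum": np.sum, "max": np.max, "min": np.min, "mean": np.mean}
    return ops[rule](scores)

def decision_fusion(decisions):
    """Decision-level fusion: majority vote over binary classifier outputs."""
    return int(sum(decisions) > len(decisions) / 2)
```

Note that score fusion presumes the scores are already normalized to a common scale; otherwise one classifier's range dominates the rule.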
Trends and Directions
Information Fusion in MIR: Two Forms
- Multi-modality
  - e.g., a video clip carries visual, audio, and textual information
  - Multi-modality fusion occurs at the feature extraction module
  - A single source may also be represented by multiple features, e.g., color image → color, texture, shape
- Multi-classifier (ensembles of classifiers)
  - A set of classifiers trained to solve the same problem
  - Applied to a single source or multiple sources of information
  - A single type of base classifier, or different types of classifiers (Bayesian, k-NN, SVM)
Trends and Directions
Information Fusion in MIR: Fusion Schemes
- The predictions of multiple classifiers are integrated into one fused decision by a fusion scheme
  - The outputs of the different classifiers need to be normalized
- Rule-based
  - The decision is made by a simple operation on the outputs of all classifiers
  - e.g., Max, Min, Sum, Mean (matching module)
  - e.g., AND, OR (decision module)
- Learning-based
  - The outputs of all classifiers are fed into a learning process to obtain the final decision
  - e.g., decision tree, neural network
- No scheme is guaranteed to be the best, empirically or theoretically
Applications
- Multimodal biometrics fusion (face, fingerprint, iris, palmprint, voice, hand geometry)
- Audio/visual fusion for multimodal emotion recognition
Trends and Directions
Integration of Context and Content for Multimedia Management
(Figure from Carlson & Hatfield, 1992)
Trends and Directions
Integration of Context and Content for Multimedia Management
- An increasing amount of active research in this direction
- Crucial to human-human communication and to human understanding of multimedia
  - Without context, it is difficult for a human to recognize various objects
- Enables (semi-)automatic content analysis and indexing methods to become more powerful in managing multimedia data
- Contextual information
  - e.g., cell ID for mobile phone location, GPS integrated in a digital camera, camera parameters, time information, and the identity of the producer
Trends and Directions
Integration of Context and Content for Multimedia Management (T-MM Special Issue)
Topics of interest include:
- Contextual metadata extraction
- Models for temporal context, spatial context, imaging context (e.g., camera metadata), social context, and so on
- Web context for online multimedia annotation, browsing, sharing, and reuse
- Context tagging systems, e.g., geotagging, voice annotation
- Context-aware inference algorithms
- Context-aware multi-modal fusion systems (text, document, image, video, metadata, etc.)
- Models for combining contextual and content information
- Context-aware interfaces and collaboration
- Novel methods to support and enhance social interaction, including social/affective computing and experience capture
- Applications such as using context and similarity for face and location identification
- Context-aware mobile media technology and applications
- Using context to browse and navigate large media collections
Projects in High Impact
Features: web-based, large-scale, user-participated
- A combination of the Internet and multimedia
Examples:
- Photo search: home photo management, mobile photo search, face search
- Million Book Project: tremendous space for search and multimedia data mining
- Human-centered multimedia search: search over UCC data (user preference, profile, and opinion); social search (public relations, names, personalization)
Acknowledgement
Funding Agencies
- Army Research Office (ARO)
- Department of Homeland Security (DHS)
- San Antonio Life Science Institute (SALSI)
- Center for Infrastructure Assurance and Security (CIAS)
Current Collaborators
- Academia: University of Illinois, University of Amsterdam, Chinese Academy of Sciences, University of Science and Technology of China, Institute for Infocomm Research (Singapore)
- Industry: HP Labs (Palo Alto, CA), Microsoft Research (MSR) Redmond, Microsoft Research Asia (MSRA), NEC Labs America, Kodak Research Lab, IBM T.J. Watson Research Center
Students
- Jerry Yu (2007, Kodak Research, NJ)
- Yijuan Lu (2008 expected)
- Yuning Xu
Q & A
Thank you! Questions?