Boosting Image Retrieval through Aggregating Search Results based on Visual Annotations


Author: Zoe Pearson

Ximena Olivares (1), Massimiliano Ciaramita (2), Roelof van Zwol (2)
(1) Universitat Pompeu Fabra, (2) Yahoo! Research

Introduction

Very large, continuously growing online collections of human-annotated content
Annotations are essential to make photos easily retrievable
However, retrieval models that are effective in text retrieval do not work as well for text-based image retrieval

Introduction
1. Text-based queries
Textual annotations are sparse and short [Marlow et al]
Diverse motivations: spatial, temporal, social [Ames & Naaman]
Events, personalities, media tagging [Dubinko et al]
Example tags: Illinois Chicago autumn farmers market Thanksgiving holiday thoughts prayer thankful thanks

Introduction
1. Text-based queries
2. Query by image content
The user is required to begin with a sample image

Introduction
1. Text-based queries
2. Query by image content
3. Detectors
Specific to each topic (e.g. face, skin, anchorman, ...)
Good results in narrow domains

Introduction
What about broad domains (e.g. the Web, Flickr)?
Billions of photos
Billions of users
Large quantity of annotations

Problems:
Retrieval performance of content-based image retrieval systems is lower than that of keyword-based systems
Results of search systems that use only tags are noisy and suboptimal

Introduction
Flickr allows another kind of annotation (notes)
Associate text with a visual area of the image
Highly relevant to the content
→ Visual annotation
Valuable for learning the different visual representations of an object

Motivation
The main objective of our research is to improve retrieval performance, specifically precision at early recall, by combining textual and visual information

Main idea
Use visual annotations (text & image) and rank aggregation to improve retrieval

High-level outline
1. The user performs a query (e.g. "coke can")
2. Visual annotations matching the query are selected
3. For each annotation, the top-k similar images are retrieved using content-based image retrieval
4. The result lists are aggregated to obtain the final result ranking
[Figure: pipeline diagram with the stages labelled (1) to (4)]

Content-based Image Retrieval
Research in this area is extensive
Characteristics of content-based image retrieval:
  State-of-the-art object retrieval
  Scalable to a Web domain
We adopted the framework proposed by [Sivic and Zisserman]
  Successfully applied to a collection of Flickr images, detecting buildings [Philbin et al]
  Promising results (scalability and performance)
  Baseline

Content-based Image Retrieval
1. Extract visual features and describe them
  Processed 12,000 images
  Computed Harris and Hessian features
  Described them using SIFT
2. Build a visual vocabulary
  Clustered the SIFT descriptors to create a vocabulary of 10,000 words
  Implemented an approximate k-means algorithm
  3 resulting vocabularies, based on Harris features, Hessian features, and a combination of the two
[Figure: SIFT descriptors -> k-means clustering -> visual vocabulary (10k words)]
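The vocabulary-building step can be illustrated with a minimal sketch. It runs plain (exact) k-means on toy 2-D points, whereas the slides describe an approximate k-means over SIFT descriptors with 10,000 words. The function names, the evenly spaced initialisation, and the toy data are assumptions made for illustration only.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def nearest_word(descriptor, vocabulary):
    """Index of the cluster centre (visual word) closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda i: squared_distance(descriptor, vocabulary[i]))


def build_vocabulary(descriptors, num_words, iterations=10):
    """Plain k-means; each resulting cluster centre is one visual word."""
    # Assumption: deterministic, evenly spaced initial centres for this toy case.
    step = len(descriptors) // num_words
    centers = [list(descriptors[i * step]) for i in range(num_words)]
    for _ in range(iterations):
        clusters = [[] for _ in range(num_words)]
        for descriptor in descriptors:
            clusters[nearest_word(descriptor, centers)].append(descriptor)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centre
                centers[i] = [sum(column) / len(members)
                              for column in zip(*members)]
    return centers


# Toy stand-ins for descriptors (real SIFT descriptors are 128-dimensional).
descriptors = [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0),
               (5.0, 5.1), (5.1, 5.0), (5.0, 5.0)]
vocabulary = build_vocabulary(descriptors, num_words=2)
words = [nearest_word(d, vocabulary) for d in descriptors]
```

Once the vocabulary exists, every descriptor of a new image is quantised to its nearest word with `nearest_word`, which is the step that turns an image into a set of visual words.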

Content-based Image Retrieval
3. An image can be represented as a set of visual words
  By analogy with text retrieval, images can be represented in the vector space model
  Similarity can be measured with the cosine similarity
4. The spatial distribution of visual words must be taken into account
  The spatial distribution of words is significantly more important in image retrieval than in text retrieval
[Figure: spatial distribution, cosine similarity]
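As a rough illustration of step 3, the sketch below represents each image as a bag of visual-word counts and compares two images with cosine similarity. It uses raw counts; the slides do not state the term weighting, and the spatial check from step 4 is left out. The word ids and function names are made up for the example.

```python
import math
from collections import Counter


def bag_of_visual_words(word_ids):
    """Count how often each visual word occurs in one image."""
    return Counter(word_ids)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an image with no visual words matches nothing
    return dot / (norm_a * norm_b)


# Hypothetical visual-word ids for a query annotation and two images.
query = bag_of_visual_words([3, 3, 7])
similar_image = bag_of_visual_words([3, 7, 7])
unrelated_image = bag_of_visual_words([1, 2])

score_similar = cosine_similarity(query, similar_image)
score_unrelated = cosine_similarity(query, unrelated_image)
```

Ranking the collection by this score and keeping the top k images gives one result list per visual annotation.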

Aggregating visual annotations
Borda count
Aslam & Montague compared different aggregation methods in a Web scenario => Borda count is a simple and efficient algorithm
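A minimal sketch of one common Borda count variant may help make the aggregation concrete. Each list awards k points to its first image, k - 1 to its second, and so on. Giving zero points to images that a list does not contain is an assumption, since the slides do not say how unranked images are scored. The function name and the toy lists are illustrative.

```python
from collections import defaultdict


def borda_aggregate(ranked_lists, k=25):
    """Fuse several ranked lists of image ids into one ranking."""
    scores = defaultdict(int)
    for ranking in ranked_lists:
        for position, image_id in enumerate(ranking[:k]):
            # 0-based position r earns k - r points from this list.
            scores[image_id] += k - position
    # Highest total first; ties are broken by image id to stay deterministic.
    return sorted(scores, key=lambda image_id: (-scores[image_id], image_id))


# Three toy result lists, one per visual annotation.
result_lists = [
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c", "a"],
]
fused = borda_aggregate(result_lists, k=3)
```

In this toy case "b" collects 8 points, "a" 6, "c" 3 and "d" 1, so images that many annotations agree on rise to the top of the fused ranking.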

Evaluation Task
Hypotheses:
H1: Rank aggregation using visual annotations will significantly improve retrieval performance in terms of precision
H2: Tag-based search combined with CBIR using visual annotations will improve retrieval in terms of precision

Experimental setup
Image collection: Flickr
  Images with high variability
  Extracted from the Web
  Annotated by Web users
Crawled 12,000 images (Flickr API), based on their tags
  No restriction on relevance to the surrounding tags, or on whether the object appears in the image
  Contains 59,693 unique tags (from a total of 229,672)
  Medium-size images: 500x333 pixels

Topics
Set of 30 topics, derived from Flickr search logs
  Filtered the most frequent queries for objects
Defined four categories:
1. Fruits and flowers
2. Monuments and buildings
3. Brands and logos
4. General objects

List of Topics
Strawberry Daisy Moai Sunflower Sushi roll Golden Gate McDonald logo Taj Mahal Hot air balloon Petronas Twin Towers Telephone booth Butterfly Converse Watermelon
American flag Big Ben clock tower Arc de Triomphe Clock Coke can CN tower Dice Eiffel tower Engagement ring Guitar Soccer ball Statue of Liberty Apple logo Rose Parthenon

Systems
Each system uses:
  Input: keyword-based query
  Output: ranked list of image results

S1: Text-based retrieval
  Based on the vector space model for text retrieval. Using the textual annotations, images are retrieved for a given query

S2: Content-based image retrieval using visual annotations
  The keyword-based query is used to select, at random, one of the visual annotations that match the query. We constructed 25 random runs, for which we report the average performance

Systems

S3: Aggregated ranking over the results of CBIR using visual annotations
  Search using 10 visual annotations (per topic) and retrieve the top-25 results for each annotation. Apply rank aggregation over the result lists

S4: CBIR using visual annotations and a tag filter
  Similar to S2, with an additional filter over the image annotations: tag match over the whole query

S5: Aggregated ranking over the results of content-based image retrieval using annotations and tag filters
  Similar to S3, with an additional filter over the image annotations
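The tag filter used by S4 and S5 could look roughly like the sketch below. It reads "tag match over the whole query" as "every query term must appear among the image's tags", which is an interpretation rather than something the slides spell out. The function names, image ids and tags are hypothetical.

```python
def matches_all_query_terms(query, image_tags):
    """True when every whitespace-separated query term is one of the tags."""
    tags = {tag.lower() for tag in image_tags}
    return all(term in tags for term in query.lower().split())


def apply_tag_filter(query, ranked_image_ids, tags_by_image):
    """Keep the CBIR ranking order, dropping images that fail the tag match."""
    return [
        image_id
        for image_id in ranked_image_ids
        if matches_all_query_terms(query, tags_by_image.get(image_id, ()))
    ]


# Hypothetical tags for a few images returned by content-based retrieval.
tags_by_image = {
    "img1": ["Coke", "can", "red"],
    "img2": ["can", "recycling"],
    "img3": ["coke", "can"],
}
cbir_ranking = ["img2", "img1", "img3", "img4"]
filtered = apply_tag_filter("coke can", cbir_ranking, tags_by_image)
```

Under this reading, S4 would filter the single result list of one annotation, and S5 would filter the lists of all ten annotations before the rank aggregation is applied.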

Pooling and Assessments
Topic pools
  Based on the top 25 results for each topic retrieved by each of the systems
  Assessors judged the results for a given topic as relevant or not relevant

Evaluation Measures
  Mainly interested in achieving high precision at the top of the ranking
  Focus on P@N, with N ranging from 1 to 25
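For reference, P@N is the fraction of the top N results that were judged relevant. The sketch below uses the standard definition and divides by N even when fewer than N results were returned. The ranking and the relevance judgements are invented for the example.

```python
def precision_at_n(ranked_ids, relevant_ids, n):
    """Fraction of the first n results that are in the relevant set."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    hits = sum(1 for image_id in ranked_ids[:n] if image_id in relevant_ids)
    return hits / n


# Hypothetical ranking for one topic, with the assessors' relevant set.
ranking = ["img3", "img7", "img1", "img9", "img4"]
relevant = {"img3", "img1", "img4"}

p_at_1 = precision_at_n(ranking, relevant, 1)
p_at_3 = precision_at_n(ranking, relevant, 3)
p_at_5 = precision_at_n(ranking, relevant, 5)
```

Averaging these per-topic values over all topics gives the system-level P@N figures of the kind reported in the results slides.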

Results: Feature selection & Summary statistics
Feature selection

P@10 per system and feature vocabulary (COM = combined, HAR = Harris, HES = Hessian)

System   COM    HAR    HES
S2       0.31   0.23   0.27
S3       0.48   0.49   0.48
S4       0.71   0.69   0.72
S5       0.80   0.77   0.79

Differences are not significant
Combined vocabulary is used

Statistics
S5 outperforms the other systems

System   Images Retrieved   Relevant Retrieved   P@5    P@10
S1       750                393                  0.53   0.49
S2       750                149                  0.34   0.31
S3       750                301                  0.55   0.48
S4       742                494                  0.72   0.71
S5       748                562                  0.82   0.80
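The retrieved and relevant counts in the table also allow a quick sanity check. Dividing Relevant Retrieved by Images Retrieved gives the precision over the whole result set of each system. Note that 750 equals 30 topics times 25 results, so S4 and S5 apparently returned fewer than 25 images for some topics. The snippet below is only arithmetic on the table values; the variable names are illustrative.

```python
# (system, images retrieved, relevant retrieved), copied from the table.
stats = [
    ("S1", 750, 393),
    ("S2", 750, 149),
    ("S3", 750, 301),
    ("S4", 742, 494),
    ("S5", 748, 562),
]

# Precision over everything each system returned (up to 25 images per topic).
overall_precision = {
    system: relevant / retrieved for system, retrieved, relevant in stats
}
best_system = max(overall_precision, key=overall_precision.get)

for system, value in overall_precision.items():
    print(f"{system}: {value:.3f}")
```

The ordering of the systems by this measure agrees with the P@5 and P@10 columns at the extremes: S5 is highest and S2 is lowest.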

Results: Systems comparison
[Figure: comparison of the five systems. Legend: Tags only; Visual Annot.; Agg Visual Annot.; Visual Annot. + Tags; Agg Visual Annot. + Tags. Brackets mark the comparisons that address H1 and H2.]

Results: Topic analysis
[Figure: two histograms of topic frequency against P@10]
  S1 (tags only): Avg 0.49, StD 0.24
  S5 (aggregated annotations and tags): Avg 0.80, StD 0.19
Check whether the present results are caused by anomalies in some topics
We observe a significant and uniform increase in retrieval performance across all topics



Results: Topic analysis
MAP histogram per topic
[Figure: MAP per topic for each system. Legend: Tags only; Visual Annot.; Agg Visual Annot.; Visual Annot. + Tags; Agg Visual Annot. + Tags. A second build of the slide labels the topic Butterfly.]

Conclusions and Future Work

Conclusions
  We have proposed using rank aggregation to combine the result sets of a content-based image retrieval system that uses visual annotations to retrieve similar images
  Our results clearly show that the quality of the results improves significantly when the aggregation is applied

Future work
  Aggregation strategies that can be applied pre-retrieval rather than post-retrieval
  Learn to select good visual annotations
  Detect different senses of the same keyword using visual analysis