Browsing and searching in image databases using image content

Browsing and searching in image databases using image content Professor Stefan Rüger Multimedia and Information Systems Knowledge Media Institute The Open University http://kmi.open.ac.uk/mmis

Outline Near duplicate detection Visual similarity, semantic similarity Challenges Automated Annotation MAREC collection Observations on patent image retrieval Browsing

Query with images

Built by the monks and nuns of the Nipponzan Myohoji, this was the first Peace Pagoda to be built in the western hemisphere and enshrines sacred relics of Lord Buddha. The Inauguration ceremony, on 21st September 1980, was presided over by the late most Venerable Nichidattsu Fujii, founder and …

Image retrieval
Find near-duplicate images, or
find visually similar images, or
find semantically similar images


[illustration from Suzanne Little, SocialLearn project]

Similarity: Features and distances

[Figure: query (o) and collection images (x) as points in feature space; similarity is measured by distances between points]
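To make the feature-space picture concrete, a minimal sketch follows, assuming OpenCV and NumPy as illustrative tools rather than the systems discussed in this talk: each image is mapped to a feature vector, and similarity is simply a distance between those vectors.

# Illustrative sketch: images as points in feature space, similarity as distance
import cv2
import numpy as np

def colour_histogram(path, bins=8):
    """Map an image to a feature vector: a normalised RGB colour histogram."""
    img = cv2.imread(path)  # BGR uint8 array
    hist = cv2.calcHist([img], [0, 1, 2], None, [bins] * 3, [0, 256] * 3)
    hist = hist.flatten()
    return hist / hist.sum()  # normalise so images of different sizes compare

def distance(f1, f2):
    """Euclidean distance in feature space; smaller means more similar."""
    return float(np.linalg.norm(f1 - f2))

# Rank a (hypothetical) collection of image paths against a query image:
# q = colour_histogram("query.jpg")
# ranked = sorted(collection_paths, key=lambda p: distance(q, colour_histogram(p)))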

Cool access mode using a photo!

Snaptell: Book, CD and DVD covers


doi: 10.2200/S00244ED1V01Y200912ICR010

The Open University, UK: Spot & Search

Scott Forrest: E=MC squared "Between finished surface texture and raw quarried stone. Between hard materials and soft concepts. Between text and context." [with Suzanne Little]

The Open University, UK: Spot & Search

[with Suzanne Little]

Near-duplicate detection

Works well in 2D: CD covers, wine labels, signs, ...
Less so in near-2D: buildings, vases, ...
Not so well in 3D: faces, complex objects, ...

Near-duplicate detection: Applications
Tourism
Museum guide ("What is this?")
Linking learning objects (slide and video, figures, ...)
Link into the context of an image
Plagiarism detection
Word spotting for handwritten documents
"What is this?" (parts of an engine, ...)
Shopping (where does this [take photo] sell for less?)

Near-duplicate detection: How does it work? Fingerprinting technique:
1 Compute salient points
2 Extract "characteristics" from their vicinity (features)
3 Make them invariant under rotation & scaling
4 Quantise: create visterms
5 Index as in text search engines
6 Check/enforce spatial constraints after retrieval

NDD: Compute salient points and features

E.g., SIFT features: each salient point is described by a feature vector of 128 numbers; the vector is invariant to scaling and rotation

[Lowe2004 – http://www.cs.ubc.ca/~lowe/keypoints/]
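A minimal sketch of the first two steps (salient points and local features), assuming OpenCV's SIFT implementation rather than Lowe's original code:

import cv2

# Detect salient points and compute their 128-dimensional SIFT descriptors,
# which are invariant to scale and rotation
img = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input image
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# descriptors has shape (number_of_keypoints, 128)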

NDD: Keypoint feature space clustering
All keypoint features of all images in the collection
Millions of "visterms" (cluster labels, mnemonically named, e.g. "Nine", "Geese", "Are", "Running", "Under", "A", "Wharf", "And", "Here", "I", "Am")
[Figure: keypoint features as points in feature space, grouped into clusters]

NDD: Encode all images with visterms

Jkjh Geese Bjlkj Wharf Ojkkjhhj Kssn Klkekjl Here Lkjkll Wjjkll Kkjlk Bnm Kllkgjg Lwoe Boerm ...
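A minimal sketch of the quantisation step, assuming scikit-learn's k-means as an illustrative stand-in: cluster all keypoint descriptors of the collection, treat each cluster centre as a visterm, and encode every image as the ids of its descriptors' nearest centres.

import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Hypothetical file holding all SIFT descriptors of the collection, shape (N, 128)
all_descriptors = np.load("all_sift_descriptors.npy")

# The vocabulary of visterms: one cluster centre per visterm
# (10,000 here for illustration; a production system may use far more)
vocabulary = MiniBatchKMeans(n_clusters=10_000, batch_size=10_000)
vocabulary.fit(all_descriptors)

def to_visterms(descriptors):
    """Encode one image's descriptors as a bag of visterm ids."""
    return vocabulary.predict(descriptors)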

NDD: query like text

At query time, compute salient points, keypoint features and visterms
Query against the database of images, each represented as a bag of visterms

Query Joiu Gddwd Bipoi Wueft Oiooiuui Kwwn Kpodoip Hdfd Loiopp Wiiopp Koipo Bnm Kppoyiy Lsld Bldfm ...

[with Suzanne Little]
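A minimal sketch of indexing and querying bags of visterms exactly like text, assuming a plain in-memory inverted index with tf-idf scoring (an illustration, not the engine used in the projects above):

from collections import Counter, defaultdict
import math

index = defaultdict(dict)   # visterm id -> {image id: term frequency}
doc_len = {}                # image id -> number of visterms in that image

def add_image(image_id, visterms):
    """Index one image represented as a bag of visterm ids."""
    counts = Counter(visterms)
    doc_len[image_id] = len(visterms)
    for term, tf in counts.items():
        index[term][image_id] = tf

def search(query_visterms, top_k=10):
    """Score indexed images by tf-idf overlap with the query's bag of visterms."""
    n_docs = len(doc_len)
    scores = Counter()
    for term, q_tf in Counter(query_visterms).items():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))
        for image_id, tf in postings.items():
            scores[image_id] += q_tf * tf * idf / doc_len[image_id]
    return scores.most_common(top_k)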

NDD: Check spatial constraints

[with Suzanne Little, SocialLearn project]
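A minimal sketch of the final step (checking spatial constraints), assuming OpenCV: match descriptors between query and candidate, then require that RANSAC finds a geometric transform consistent with enough of the matches.

import cv2
import numpy as np

def spatially_consistent(kp_q, des_q, kp_c, des_c, min_inliers=15):
    """Keep a retrieved candidate only if its matches agree on one homography."""
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des_q, des_c, k=2)
    # Lowe's ratio test keeps only distinctive matches
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    if len(good) < min_inliers:
        return False
    src = np.float32([kp_q[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_c[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    _, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return mask is not None and int(mask.sum()) >= min_inliers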

Near-duplicate detection: Summary. Fingerprinting technique:
1 Compute salient points
2 Extract "characteristics" from their vicinity (features)
3 Make them invariant under rotation & scaling
4 Quantise: create visterms
5 Index as in text search engines
6 Check/enforce spatial constraints after retrieval

Where are the challenges?

Image content analysis (diversity, semantic gap, polysemy)
Mapping to a higher level (semantic representation)
Automation, scale and coping with errors

Which features?

The semantic gap

1M pixels with a spatial colour distribution
faces & vase-like object
victory, triumph, ...

[http://www.sierraexpressmedia.com/archives/11148]

Polysemy

Outline Near duplicate detection Visual similarity, semantic similarity Challenges Automated Annotation MAREC collection Observations on patent image retrieval Browsing

From visual to semantic similarity

{Snow, ice, bear, grass, …}

{City, water, building, sky, …}

Example: grass classifier (very likely / may be / probably not)

Modelling semantic concepts

From training data: outdoor, sky, town, grass, crowd, tarmac, faces

Automated annotation as machine translation
Image regions ↔ words: water, grass, trees
"the beautiful sun" ↔ "le soleil beau"

Automated annotation as machine learning
Probabilistic models:
maximum entropy models
models for joint and conditional probabilities
evidence combination with Support Vector Machines

[with Magalhães, SIGIR 2005]
[with Yavlinsky and Schofield, CIVR 2005]
[with Yavlinsky, Heesch and Pickering, ICASSP May 2004]
[with Yavlinsky et al, CIVR 2005]
[with Yavlinsky, SPIE 2007]
[with Magalhães, CIVR 2007, best paper]
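As one concrete illustration of the approaches above, a minimal sketch of a per-concept classifier with scikit-learn (an assumed stand-in, not the code behind these papers): an SVM maps a global image feature vector to a probability for a concept such as "grass".

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical training data: global features and 0/1 labels for the concept "grass"
X = np.load("train_features.npy")
y = np.load("train_labels_grass.npy")

grass_classifier = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
grass_classifier.fit(X, y)

# P(grass | image) for unseen images, used to rank or annotate them
p_grass = grass_classifier.predict_proba(np.load("test_features.npy"))[:, 1]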


A simple Bayesian classifier

Use training data J and annotations w
P(w|I) is the probability of word w given an unseen image I
The model is an empirical distribution over pairs (w, J)
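A minimal sketch of such a classifier, assuming kernel density estimates over global image features (an illustration only, not the exact model of the slide): score each word w by P(w|I) ∝ p(f(I)|w) P(w), with p(.|w) estimated from the training images annotated with w.

import numpy as np
from sklearn.neighbors import KernelDensity

class BayesAnnotator:
    def __init__(self, bandwidth=0.2):
        self.bandwidth = bandwidth
        self.densities = {}  # word -> density fitted on features of images annotated with it
        self.priors = {}     # word -> P(w), its relative frequency in the training data

    def fit(self, features, annotations):
        """features: (n_images, d) array; annotations: one set of words per image."""
        n = len(annotations)
        for word in set().union(*annotations):
            idx = [i for i, words in enumerate(annotations) if word in words]
            self.densities[word] = KernelDensity(bandwidth=self.bandwidth).fit(features[idx])
            self.priors[word] = len(idx) / n

    def annotate(self, feature, top_k=5):
        """Rank words by log P(w|I) = log p(f(I)|w) + log P(w) + const."""
        f = feature.reshape(1, -1)
        scores = {w: kde.score_samples(f)[0] + np.log(self.priors[w])
                  for w, kde in self.densities.items()}
        return sorted(scores, key=scores.get, reverse=True)[:top_k]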


Automated annotation [with Yavlinsky et al, CIVR 2005] [with Yavlinsky, SPIE 2007] [with Magalhães, CIVR 2007, best paper]

Automated: water buildings city sunset aerial

[Corel Gallery 380,000]

Machine Learning is hard, and data may not help

The good door

[beholdsearch.com, 19.07.2007, now behold.cc (Yavlinsky)] [images: Flickr creative commons]

The bad wave

[beholdsearch.com, 19.07.2007, now behold.cc (Yavlinsky)] [images: Flickr creative commons]

The ugly iceberg

[beholdsearch.com, 19.07.2007, now behold.cc (Yavlinsky)] [images: Flickr creative commons]

Automated annotation: State of the art 2008
[Figure: annotation models positioned along three axes (effectiveness, flexibility, efficiency): Amir, Argillander et al 2005; Yavlinsky, Schofield, Rüger 2005; Feng, Lavrenko, Manmatha 2004; Carneiro, Vasconcelos 2005; Magalhães, Rüger 2007; Snoek, Worring et al 2006; Snoek, Gemert et al 2006; Barnard, Forsyth 2001; Duygulu et al 2002; Blei, Jordan 2003; Lavrenko, Manmatha, Jeon 2003; Jeon, Lavrenko, Manmatha 2003]

Handling annotation errors

Abstract concepts (victory, triumph, religion)
Complex concepts (barbecue, car)
Simple material (grass, tarmac, sky)

Salient keypoints

Outline Near duplicate detection Visual similarity, semantic similarity Challenges Automated Annotation MAREC collection Observations on patent image retrieval Browsing

Horizontal vs vertical

Biology, Sciences, Medicine, Cultural heritage, E-learning, Entertainment, Tourism, Media archives, Retail, Personal collections, Patents

IR, CV, ML, NLP, IE, Clustering, Systems, …

MAREC research collection and CLEF-IP

Research corpus of 19M patents: XML docs + TIFF images
CLEF-IP track 2011:
- Image-based prior art search
- Image classification

MAREC subset

212,867 patents (34k EP; 167k US; 12k WO)
2,586,767 TIFFs as attachments (min 1 per patent)
A43B: characteristic features of footwear; parts of footwear
A61B: diagnosis; surgery; identification
H01L: semiconductor devices; electric solid-state devices not otherwise provided for

Randomly picked image

Randomly picked images (zooming out)


Randomly picked 108 images

Randomly picked images (another set of 108)

Randomly picked images (3rd set of 108)

Randomly picked images (last set of 108)

Randomly picked images (zooming in)


Another random image

What have we seen?

All black/white, 1-bit bitonal
Different orientations, aspect ratios, sizes
Range of complexity (a few letters to complex sketches)
Range of different types, but predominantly:
Abstract drawings
Graphs
Flow charts
Gene sequences
Program listings
Symbols
Chemical structures
Tables
Mathematics

Bitonal scans

WO/001999/01/47/63/imgf000005_0001.tif (104x59)

Recommendation: scan in colour at a depth of 3×8 bit

Greyscale is important for different thresholding
Colour may help in various ways (significance; back side shines through)
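A minimal sketch of why the extra bit depth matters, assuming OpenCV: with an 8-bit greyscale scan the binarisation threshold can be chosen per image or per region, which is impossible once only a 1-bit bitonal scan has been kept.

import cv2

grey = cv2.imread("patent_page.png", cv2.IMREAD_GRAYSCALE)  # hypothetical 8-bit scan

# Global threshold chosen automatically from the grey-level histogram (Otsu)
_, bitonal_otsu = cv2.threshold(grey, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Local thresholds, useful when print density varies across the page
bitonal_adaptive = cv2.adaptiveThreshold(grey, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                         cv2.THRESH_BINARY, 31, 10)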

Ideal world?

Semantically marked-up entities (text & drawings obsolete)
Chemical Markup Language (CML)
Scalable Vector Graphics (SVG)
Flowchart markup language?
Tables and graphs?

Can these be extracted from PDFs or the original documents?

Automated processing

Need dedicated features for bitonal images!
What is the analogue of SIFT-type features?
I predict near-duplicate detection is possible (and useful!)
Automated extraction of structure from scans?
Word spotting

Bitonal features

For example:
Freeman chain codes
FFT
Region moments
Segmentation?
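A minimal sketch of the first feature in this list, a Freeman chain code, computed from OpenCV contours (an assumed implementation, for illustration only): walk the outer boundary of a connected component and record the direction of each unit step.

import cv2
import numpy as np

# 8-connected unit steps, indexed 0..7 (directions in image coordinates)
DIRECTIONS = {(1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
              (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7}

def freeman_chain_code(bitonal):
    """Chain code of the largest outer contour of a bitonal image (foreground non-zero)."""
    contours, _ = cv2.findContours(bitonal, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    boundary = max(contours, key=cv2.contourArea).reshape(-1, 2)  # (x, y) boundary pixels
    code = []
    for (x0, y0), (x1, y1) in zip(boundary, np.roll(boundary, -1, axis=0)):
        step = (int(x1 - x0), int(y1 - y0))
        if step != (0, 0):  # guard against degenerate one-pixel contours
            code.append(DIRECTIONS[step])
    return code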

#TIFFs per patent: Zipfian? No fat tail?

Assuming images were over-segmented, we get

US/020030/02/70/16 (4871 tifs – 12 randomly selected)

EP/000001/71/97/48 (3061 tifs – 12 randomly selected)

US/020020/03/88/67 (2181 tifs – 12 randomly selected)

US/020050/00/32/30 (1912 tifs – 12 randomly selected)

US/020050/02/59/92 (1911 tifs – 12 randomly selected)

US/020060/15/41/05 (1711 tifs – 12 randomly selected)

EP/000001/24/62/47 (1637 tifs – 12 randomly selected)

EP/000001/24/62/47 (1637 tifs – detail of random image)

US/020020/15/45/56 (1634 tifs – 12 randomly selected)

WO/001992/01/76/21 (1581 tifs – 12 randomly selected)

US/020060/28/74/98 (1504 tifs – 12 randomly selected)

Randomly selected single-image patents


Outline Near duplicate detection Visual similarity, semantic similarity Challenges Automated Annotation MAREC collection Observations on patent image retrieval Browsing

Six degrees of separation (Stanley Milgram experiment)

Connect similar objects
[Figure: browsing network around a focus image, with neighbours linked by structure, colour and text features]
[with Heesch ECIR 2004, CIVR 2005]
[with Heesch et al ACM MM 2006, TRECVID 2003-2004]
[PhD thesis Heesch, nominated for best UK thesis 2005]
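A minimal sketch of such a browsing structure, assuming scikit-learn's nearest-neighbour search as an illustrative stand-in: precompute, per facet (colour, structure, text, ...), the nearest neighbours of every image; following these links lets a user walk from any image to any other in a few hops.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def build_browsing_graph(features, k=8):
    """features: (n_images, d) matrix for one facet; returns image id -> k linked image ids."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)  # +1 because each image finds itself
    _, idx = nn.kneighbors(features)
    return {i: list(map(int, neigh[1:])) for i, neigh in enumerate(idx)}  # drop the self-link

# One graph per facet; the browser switches graphs when the user changes emphasis
# colour_graph = build_browsing_graph(colour_features)
# structure_graph = build_browsing_graph(structure_features)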

Browsing example TRECVID topic 102: Find shots from behind the pitcher in a baseball game as he throws a ball that the batter swings at

In your mind

[Alexander May: SET award 2004: best use of IT – national prize] [dataset TRECVID 2003]

Conclusions

Browsing and searching in image databases using image content Professor Stefan Rüger Multimedia and Information Systems Knowledge Media Institute The Open University http://kmi.open.ac.uk/mmis
