Browsing and searching in image databases using image content Professor Stefan Rüger Multimedia and Information Systems Knowledge Media Institute The Open University http://kmi.open.ac.uk/mmis
Outline Near duplicate detection Visual similarity, semantic similarity Challenges Automated Annotation MAREC collection Observations on patent image retrieval Browsing
Query with images
Built by the monks and nuns of the Nipponzan Myohoji, this was the first Peace Pagoda to be built in the western hemisphere and enshrines sacred relics of Lord Buddha. The Inauguration ceremony, on 21st September 1980, was presided over by the late most Venerable Nichidattsu Fujii, founder and …
Image retrieval Find near-duplicate images or find visually similar images or find semantically similar images
?
[illustration from Suzanne Little, SocialLearn project]
Similarity: Features and distances
[Figure: a query point and database points plotted in feature space; similar images lie close together]
Feature space
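The idea behind the figure can be sketched in a few lines: represent each image as a feature vector and rank the database by distance to the query's vector. The 2-d vectors below are toy stand-ins for real colour/texture features.

```python
import numpy as np

def rank_by_distance(query, database):
    """Rank database feature vectors by Euclidean distance to the query."""
    dists = np.linalg.norm(database - query, axis=1)
    return np.argsort(dists)

# Toy 2-d features (hypothetical values)
db = np.array([[0.9, 0.1], [0.2, 0.8], [0.85, 0.2]])
query = np.array([1.0, 0.0])
order = rank_by_distance(query, db)
# order[0] is the index of the image most similar to the query
```

The choice of feature space and distance is the whole game: two images are "similar" only in the sense that their feature vectors are close.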
Cool access mode using a photo!
Snaptell: Book, CD and DVD covers
doi: 10.2200/S00244ED1V01Y200912ICR010
The Open University, UK: Spot & Search
Scott Forrest: E=MC squared "Between finished surface texture and raw quarried stone. Between hard materials and soft concepts. Between text and context." More information [with Suzanne Little]
Near-duplicate detection
Works well in 2d: CD covers, wine labels, signs, …
Less well in near-2d: buildings, vases, …
Not so well in 3d: faces, complex objects, …
Near-duplicate detection: Applications
Tourism
Museum guide (“What is this?”)
Linking learning objects (slide and video, figures, …)
Link into the context of an image
Plagiarism detection
Word spotting for handwritten documents
“What is this?” (parts of an engine, …)
Shopping (where does this [take photo] sell for less?)
Near-duplicate detection: How does it work? Fingerprinting technique
1 Compute salient points
2 Extract “characteristics” from the vicinity (feature)
3 Make invariant under rotation & scaling
4 Quantise: create visterms
5 Index as in text search engines
6 Check/enforce spatial constraints after retrieval
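Step 4 (quantisation) can be sketched minimally: each keypoint descriptor is mapped to the id of its nearest cluster centre, and that id is the visterm. The 2-d vectors and centroids below are toy stand-ins for 128-d SIFT descriptors and a real k-means vocabulary.

```python
import numpy as np

def assign_visterms(features, centroids):
    """Quantise each keypoint feature to the id of its nearest
    cluster centre -- that id is the 'visterm'."""
    # pairwise distances: shape (n_features, n_centroids)
    d = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

# Toy 2-d stand-ins for 128-d SIFT descriptors (hypothetical values)
centroids = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
feats = np.array([[0.1, 0.1], [0.9, 1.1], [0.1, 0.9]])
assign_visterms(feats, centroids).tolist()  # -> [0, 1, 2]
```

With the vocabulary fixed, an image becomes a bag of visterm ids, which is exactly the representation a text search engine indexes.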
NDD: Compute salient points and features
E.g. SIFT features: each salient point is described by a feature vector of 128 numbers; the vector is invariant to scaling and rotation
[Lowe2004 – http://www.cs.ubc.ca/~lowe/keypoints/]
NDD: Keypoint feature space clustering
[Figure: all keypoint features of all images in the collection, plotted and clustered in feature space]
Cluster names: Nine Geese Are Running Under A Wharf And Here I Am …
Millions of “visterms”
Feature space
NDD: Encode all images with visterms
Jkjh Geese Bjlkj Wharf Ojkkjhhj Kssn Klkekjl Here Lkjkll Wjjkll Kkjlk Bnm Kllkgjg Lwoe Boerm ...
NDD: query like text
At query time compute salient points, keypoint features and visterms
Query against the database of images represented as bags of visterms
Query Joiu Gddwd Bipoi Wueft Oiooiuui Kwwn Kpodoip Hdfd Loiopp Wiiopp Koipo Bnm Kppoyiy Lsld Bldfm ...
[with Suzanne Little]
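Querying "like text" can be sketched with an inverted index from visterms to image ids, scored by term overlap. The image names and bags of visterms below are hypothetical, reusing the readable words from the example slides.

```python
from collections import defaultdict

def build_index(images):
    """Inverted index: visterm -> set of image ids containing it."""
    index = defaultdict(set)
    for img_id, visterms in images.items():
        for v in visterms:
            index[v].add(img_id)
    return index

def query(index, visterms):
    """Score images by number of shared visterms, as a text engine would."""
    scores = defaultdict(int)
    for v in visterms:
        for img_id in index[v]:
            scores[img_id] += 1
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical bags of visterms for two images
images = {"pagoda": {"Geese", "Wharf", "Here"},
          "cover":  {"Wharf", "Bnm"}}
index = build_index(images)
query(index, {"Geese", "Wharf"})  # "pagoda" shares 2 visterms, "cover" 1
```

A production system would add tf-idf weighting and then verify the top hits spatially, as the next step describes.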
NDD: Check spatial constraints
[with Suzanne Little, SocialLearn project]
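The spatial check can be approximated very crudely: keypoint matches between a query and a true near-duplicate tend to agree on one geometric transform. The sketch below votes over translations only (real systems typically estimate a similarity or affine transform, e.g. with RANSAC); the coordinates and the `tol` bucket size are made up.

```python
def translation_inliers(matches, tol=2.0):
    """Crude spatial check: bucket the (query - database) coordinate
    shift of each keypoint match, vote for the commonest shift and
    count how many matches are consistent with it."""
    shifts = [(round((qx - dx) / tol), round((qy - dy) / tol))
              for (qx, qy), (dx, dy) in matches]
    best = max(set(shifts), key=shifts.count)
    return shifts.count(best)

# Three matches agreeing on roughly the same shift, plus one outlier
matches = [((10, 10), (5, 5)), ((20, 12), (15, 7)),
           ((30, 30), (25, 25)), ((40, 40), (0, 0))]
translation_inliers(matches)  # 3 of 4 matches are spatially consistent
```

A high inlier count confirms the hit; a low one means the visterms matched by coincidence and the result should be demoted.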
Near-duplicate detection: Summary Fingerprinting technique
1 Compute salient points
2 Extract “characteristics” from the vicinity (feature)
3 Make invariant under rotation & scaling
4 Quantise: create visterms
5 Index as in text search engines
6 Check/enforce spatial constraints after retrieval
Where are the challenges?
Image content analysis (diversity, semantic gap, polysemy) Mapping to higher level (semantic representation) Automation, scale and coping with errors
Which features?
The semantic gap
1m pixels with a spatial colour distribution → faces & a vase-like object → victory, triumph, …
[http://www.sierraexpressmedia.com/archives/11148]
Polysemy
Outline Near duplicate detection Visual similarity, semantic similarity Challenges Automated Annotation MAREC collection Observations on patent image retrieval Browsing
From visual to semantic similarity
{Snow, ice, bear, grass, …}
{City, water, building, sky, …}
Example: a grass classifier labels image regions “very likely”, “may be” or “probably not” grass
Modelling semantic concepts
From training data: outdoor, sky, town, grass, crowd, tarmac, faces
Automated annotation as machine translation
Just as “the beautiful sun” aligns with “le soleil beau”, image regions (water, grass, trees) align with annotation words
Automated annotation as machine learning
Probabilistic models:
maximum entropy models
models for joint and conditional probabilities
evidence combination with Support Vector Machines
[with Magalhães, SIGIR 2005]
[with Yavlinsky and Schofield, CIVR 2005]
[with Yavlinsky, Heesch and Pickering, ICASSP May 2004]
[with Yavlinsky et al, CIVR 2005]
[with Yavlinsky, SPIE 2007]
[with Magalhães, CIVR 2007, best paper]
A simple Bayesian classifier
Use training data J and annotations w
P(w|I) is the probability of word w given an unseen image I
The model is an empirical distribution over pairs (w, J)
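A toy rendering of such a classifier, assuming a Gaussian kernel density for P(I|w) and the word's training frequency as the prior P(w); the 2-d features and the two-word training dictionary are invented for illustration.

```python
import math

def p_word_given_image(feature, training, sigma=1.0):
    """Toy Bayesian annotator: P(w|I) proportional to P(I|w) * P(w),
    with P(I|w) a Gaussian kernel density over the training images
    annotated with word w."""
    scores = {}
    n_total = sum(len(feats) for feats in training.values())
    for word, feats in training.items():
        kde = sum(math.exp(-sum((a - b) ** 2 for a, b in zip(feature, f))
                           / (2 * sigma ** 2))
                  for f in feats) / len(feats)
        scores[word] = kde * len(feats) / n_total  # likelihood * prior
    z = sum(scores.values())
    return {w: s / z for w, s in scores.items()}

# Hypothetical 2-d features for training images tagged 'grass' or 'water'
training = {"grass": [(0.1, 0.9), (0.2, 0.8)],
            "water": [(0.9, 0.1), (0.8, 0.2)]}
post = p_word_given_image((0.15, 0.85), training)
# 'grass' should dominate for this grass-like feature
```

The appeal of this family of models is that no per-word classifier has to be trained: the training set itself is the model.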
Automated annotation [with Yavlinsky et al CIVR 2005] [with Yavlinsky SPIE 2007] [with Magalhaes CIVR 2007, best paper]
Automated: water buildings city sunset aerial
[Corel Gallery 380,000]
Machine Learning is hard, and data may not help
The good door
[beholdsearch.com, 19.07.2007, now behold.cc (Yavlinsky)] [images: Flickr creative commons]
The bad wave
[beholdsearch.com, 19.07.2007, now behold.cc (Yavlinsky)] [images: Flickr creative commons]
The ugly iceberg
[beholdsearch.com, 19.07.2007, now behold.cc (Yavlinsky)] [images: Flickr creative commons]
Automated annotation: State of the art 2008
[Chart plotting published annotation systems along three axes: effectiveness, flexibility and efficiency]
(Amir, Argillander et al 2005), (Yavlinsky, Schofield, Rüger 2005), (Feng, Lavrenko, Manmatha 2004), (Carneiro, Vasconcelos 2005), (Magalhães, Rüger 2007), (Snoek, Worring et al 2006), (Snoek, Gemert et al 2006), (Barnard, Forsyth 2001), (Duygulu et al 2002), (Blei, Jordan 2003), (Lavrenko, Manmatha, Jeon 2003), (Jeon, Lavrenko, Manmatha 2003)
Handling annotation errors
Abstract concepts (victory, triumph, religion)
Complex concepts (barbecue, car)
Simple material (grass, tarmac, sky)
Salient keypoints
Outline Near duplicate detection Visual similarity, semantic similarity Challenges Automated Annotation MAREC collection Observations on patent image retrieval Browsing
Horizontal vs vertical
Biology Sciences Medicine Cultural heritage E-learning Entertainment Tourism Media archives Retail Personal collections Patents
IR, CV, ML, NLP, IE, Clustering, Systems, …
MAREC research collection and CLEF-IP
Research corpus of 19m patents: XML docs + tif images
CLEF-IP track 2011
- Image-based prior art search
- Image classification
MAREC subset
212,867 patents (34k EP; 167k US; 12k WO)
2,586,767 tifs as attachments (min 1 per patent)
A43B characteristic features of footwear; parts of footwear
A61B diagnosis; surgery; identification
H01L semiconductor devices; electric solid state devices not otherwise provided for
Randomly picked image
Randomly picked images (zooming out)
Randomly picked 108 images
Randomly picked images (another set of 108)
Randomly picked images (3rd set of 108)
Randomly picked images (last set of 108)
Randomly picked images (zooming in)
Another random image
What have we seen?
All black/white 1-bit bitonal
Different orientations, aspect ratios, sizes
Range of complexity (a few letters to complex sketches)
Range of different types, but predominantly:
abstract drawings, graphs, flow charts, gene sequences, program listings, symbols, chemical structures, tables, mathematics
Bitonal scans
WO/001999/01/47/63/imgf000005_0001.tif (104x59)
Recommendation: scan in colour at a depth of 3×8 bit
Greyscale is important for different thresholding
Colour may help in various ways (significance, back side shines through)
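The point about thresholding can be made concrete: a bitonal scan commits to one threshold forever, while a greyscale scan can be re-binarised as needed. A minimal sketch with a made-up "scan":

```python
import numpy as np

def binarise(grey, threshold):
    """Reduce a greyscale scan to a bitonal image at a chosen threshold.
    Keeping the greyscale original lets you pick a different threshold later."""
    return (grey > threshold).astype(np.uint8)

# Toy 'scan': a faint stroke (value 100) on a bright page (value 200)
grey = np.array([[200, 100, 200],
                 [200, 100, 200]])
binarise(grey, 128)  # the faint stroke survives as 0 (ink)
binarise(grey, 90)   # everything becomes 1: the stroke is lost
```

Once the scanner has produced only the second result, no amount of processing recovers the stroke, which is exactly why 1-bit archival scans are a problem.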
Ideal world?
Semantically marked up entities (text & drawings obsolete) Chemical Markup Language (CML) Scalable Vector Graphics (SVG) Flowchart markup language? Tables and graphs?
Can these be extracted from pdfs, original documents?
Automated processing
Need dedicated features for bitonal images!
What is the analogue of SIFT-type features?
I predict near-duplicate detection is possible (and useful!)
Automated extraction of structure from scans?
Word spotting
Bitonal features
For example:
Freeman chain codes
FFT
Region moments
Segmentation?
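Freeman chain codes are simple to sketch: walk a contour pixel by pixel and record the direction (0 to 7) of each unit step. The square contour below is a toy example; a real system would first trace contours out of the bitonal bitmap.

```python
# 8-direction Freeman chain code: map each unit step to a direction id
DIRS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
        (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def freeman_chain(contour):
    """Encode a pixel contour as its sequence of step directions."""
    return [DIRS[(x2 - x1, y2 - y1)]
            for (x1, y1), (x2, y2) in zip(contour, contour[1:])]

# Unit square traced from the origin, one pixel per step
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
freeman_chain(square)  # -> [0, 2, 4, 6]
```

The resulting direction strings are compact shape signatures that can be compared or indexed much like text, which suits bitonal line drawings well.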
Distribution of #tifs per patent: Zipfian? No fat tail?
Assuming images were oversegmented we get
US/020030/02/70/16 (4871 tifs – 12 randomly selected)
EP/000001/71/97/48 (3061 tifs – 12 randomly selected)
US/020020/03/88/67 (2181 tifs – 12 randomly selected)
US/020050/00/32/30 (1912 tifs – 12 randomly selected)
US/020050/02/59/92 (1911 tifs – 12 randomly selected)
US/020060/15/41/05 (1711 tifs – 12 randomly selected)
EP/000001/24/62/47 (1637 tifs – 12 randomly selected)
EP/000001/24/62/47 (1637 tifs – detail of random image)
US/020020/15/45/56 (1634 tifs – 12 randomly selected)
WO/001992/01/76/21 (1581 tifs – 12 randomly selected)
US/020060/28/74/98 (1504 tifs – 12 randomly selected)
Randomly selected single-image patents
Outline Near duplicate detection Visual similarity, semantic similarity Challenges Automated Annotation MAREC collection Observations on patent image retrieval Browsing
Six degrees of separation: the Stanley Milgram experiment
[Figure: browsing network connecting similar objects; one image in focus within the network structure, linked by colour and text similarity]
[with Heesch, ECIR 2004, CIVR 2005]
[with Heesch et al, ACM MM 2006, TRECVID 2003-2004]
[PhD thesis Heesch, nominated for best UK thesis 2005]
Browsing example TRECVID topic 102: Find shots from behind the pitcher in a baseball game as he throws a ball that the batter swings at
In your mind
[Alexander May: SET award 2004: best use of IT – national prize] [dataset TRECVID 2003]
Conclusions