Patent Images - A Glass-encased Tool
Mihai Lupu, Allan Hanbury, Florina Piroi (Vienna University of Technology)
Tobias Schleser (m2n consulting and development GmbH)
Roland Mörzinger, René Schuster (Joanneum Research)
Motivation
• Many patents contain images. On a dataset of 16 million patents:
  – 28% of patents have at least one image
  – Average #images/patent: 9.4
• Often images contain important information about the innovation
• Very important in a variety of engineering fields (mechanics, electronics, chemistry)
• Current patent search tools use only text
Motivation 2
• Almost all image retrieval work has focussed on images containing colour/texture
• Colour and texture features are useless in the patent domain, where drawings are mostly black-and-white line art
Outline
• Related work
• IMPEx
• Patent Image Processing
  – Figure Segmentation
  – Specific processing (Flowcharts)
• Text and Images
• The Integrated System (demo)
• Conclusion
Image Similarity in Patents
• PATSEEK
  – Extract lines
  – Graph-based similarity using a “softened” variant of the Hausdorff distance
  – Huet et al. 2001
• Patmedia
  – Informatics and Telematics Institute, Greece
  – Adaptive Hierarchical Density Histogram
• Focus on geometry
• Sidiropoulos et al., 2011
http://mklab-services.iti.gr/patmedia/
PatMedia
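To illustrate what a "softened" Hausdorff distance can look like, the sketch below contrasts the classic Hausdorff distance with the modified (mean-of-minima) variant on 2-D point sets. This is an assumption made for illustration only: the slides do not say which softening is used, and the function names are hypothetical.

```python
import math


def _directed_min_distances(a, b):
    """For each point in a, the distance to its nearest point in b."""
    return [min(math.dist(p, q) for q in b) for p in a]


def classic_hausdorff(a, b):
    """Classic Hausdorff distance: the worst-case nearest-neighbour distance.

    A single outlier point dominates the result.
    """
    return max(max(_directed_min_distances(a, b)),
               max(_directed_min_distances(b, a)))


def modified_hausdorff(a, b):
    """One possible 'softened' variant (assumption, not taken from the slides):
    average the nearest-neighbour distances instead of taking their maximum,
    so a single outlier has less influence.
    """
    forward = _directed_min_distances(a, b)
    backward = _directed_min_distances(b, a)
    return max(sum(forward) / len(forward), sum(backward) / len(backward))


if __name__ == "__main__":
    # Two line-like point sets; the second one has a single outlier at x=10.
    line_a = [(0, 0), (1, 0)]
    line_b = [(0, 0), (1, 0), (10, 0)]
    print(classic_hausdorff(line_a, line_b))   # 9.0
    print(modified_hausdorff(line_a, line_b))  # 3.0
```

The outlier pushes the classic distance to 9.0, while the averaged variant yields 3.0, which is why softened variants are attractive for noisy line drawings.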
Image Mining for Patent Exploration
• Make information in patent images accessible
• Automatic interlinking of patent text and drawing parts by sub-part segmentation and label identification
• Interactive user-guided search with manual feedback provided by the patent expert
• Variety of technical drawings: flow charts, block diagrams, time charts and graph plots
• Prototype integration into m2n Knowledge Discovery Suite based on MAREC dataset extract and patent PDFs
FIT-IT project IMPEx, Austrian Research Promotion Agency (FFG), No. 825846
Figure Extraction • Separation of multiple figures on one page
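One simple way to separate several figures on a page is to group ink pixels that lie close together and take the bounding box of each group. The sketch below does this on a tiny text "bitmap". The approach, the `gap` parameter and the name `segment_figures` are illustrative assumptions, not the method used in IMPEx.

```python
from collections import deque


def segment_figures(page, gap=1):
    """Group ink pixels ('#') into figures and return their bounding boxes.

    Two ink pixels belong to the same figure if they are at most `gap`
    pixels apart horizontally and vertically (hypothetical parameter).
    Each box is (top, left, bottom, right), inclusive.
    """
    ink = {(r, c)
           for r, row in enumerate(page)
           for c, ch in enumerate(row) if ch == "#"}
    seen = set()
    boxes = []
    for start in sorted(ink):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        top, left, bottom, right = start[0], start[1], start[0], start[1]
        # Breadth-first search over all ink pixels reachable within `gap`.
        while queue:
            r, c = queue.popleft()
            top, bottom = min(top, r), max(bottom, r)
            left, right = min(left, c), max(right, c)
            for dr in range(-gap, gap + 1):
                for dc in range(-gap, gap + 1):
                    neighbour = (r + dr, c + dc)
                    if neighbour in ink and neighbour not in seen:
                        seen.add(neighbour)
                        queue.append(neighbour)
        boxes.append((top, left, bottom, right))
    return boxes


if __name__ == "__main__":
    page = [
        "##....##",
        "##....#.",
        "........",
    ]
    print(segment_figures(page, gap=1))  # two separate figures
    print(segment_figures(page, gap=5))  # a large gap merges them into one
```

The choice of `gap` is the delicate part: too small and a single drawing with loose parts is split, too large and neighbouring figures are merged, which is one face of the variability shown on the next slide.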
Variability
Types of Patent Image
[Figure: diagram of patent image types. Labels: Figure, Photo, Diagram, BlockDiagram, State, Flowchart, Circuit, TechnicalDrawing, PlaneView, BottomView, ElevationalView, SideView, TopView, SectionalView, Graph, Waveform, Response, TimeChart]
From: S. Vrochidis, S. Papadopoulos, A. Moumtzidou, P. Sidiropoulos, E. Pianta, and I. Kompatsiaris. Towards content-based patent image retrieval: A framework perspective. World Patent Information, 32(2):94-106, 2010.
CLEF-IP 2011 Patent Image Classes
• Chemical Structure
• Abstract Drawing
• Flow Chart
• Mathematical Formula
• Program Code
• Graph
• Gene Sequence
• Character
• Table
Flowchart Analysis
• Identify
  – Number of nodes
  – Types of nodes
  – Text in nodes
  – Edges
  – Types of edges
• Specific to the domain
  – Node annotations
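The items to identify map naturally onto a small graph structure. The sketch below shows one possible representation. All class names, field names and type labels (such as "decision") are hypothetical and are not taken from the IMPEx system or from the CLEF-IP task definition.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    kind: str             # node type, e.g. "process" or "decision" (assumed labels)
    text: str = ""        # text recognised inside the node
    annotation: str = ""  # domain-specific annotation next to the node


@dataclass
class Edge:
    source: str
    target: str
    kind: str = "directed"  # edge type (assumed labels)
    label: str = ""         # text attached to the edge, e.g. "yes" / "no"


@dataclass
class Flowchart:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def summary(self):
        """Counts of nodes and edges, overall and per type."""
        return {
            "node_count": len(self.nodes),
            "node_types": dict(Counter(n.kind for n in self.nodes)),
            "edge_count": len(self.edges),
            "edge_types": dict(Counter(e.kind for e in self.edges)),
        }


if __name__ == "__main__":
    chart = Flowchart(
        nodes=[
            Node("n1", "terminal", "Start"),
            Node("n2", "decision", "Image found?", annotation="S101"),
            Node("n3", "terminal", "End"),
        ],
        edges=[Edge("n1", "n2"), Edge("n2", "n3", label="yes")],
    )
    print(chart.summary())
```

Keeping the node text and annotations on the nodes themselves is what would later allow text search to reach into the flowchart structure.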
Text and Images
The Integrated System
Patent PDFs without metadata for claims, technical devices, measurements, images, …
Pages of PDFs converted to bitmap images
Filtering of pages: pages with images vs. other page types
Optical character recognition, reference detection (Fig., Tab., …)
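Reference detection on OCR output can be approximated with a regular expression. The sketch below finds figure and table references such as "Fig. 3" or "Tab. 2" in plain text. The pattern and the name `find_references` are assumptions made for illustration; the slides do not describe how the integrated system detects references, and real OCR text would need more tolerant matching.

```python
import re

# Matches "Fig", "Fig.", "Figure", "Tab", "Tab.", "Table" (any letter case),
# followed by a number with an optional single letter suffix, e.g. "4a".
REFERENCE_PATTERN = re.compile(
    r"\b(fig(?:ure|\.)?|tab(?:le|\.)?)\s*(\d+[a-z]?)\b",
    re.IGNORECASE,
)


def find_references(text):
    """Return (kind, label) pairs in order of appearance.

    kind is normalised to "figure" or "table"; label is kept as written.
    """
    references = []
    for match in REFERENCE_PATTERN.finditer(text):
        keyword, label = match.group(1), match.group(2)
        kind = "figure" if keyword.lower().startswith("fig") else "table"
        references.append((kind, label))
    return references


if __name__ == "__main__":
    sample = "As shown in Fig. 3 and Figure 4a, the values in Tab. 2 differ."
    print(find_references(sample))
    # [('figure', '3'), ('figure', '4a'), ('table', '2')]
```

Normalising the keyword means that "FIG. 1" in a drawing sheet and "Figure 1" in the description end up with the same (kind, label) key, which is what linking text to images relies on.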
Segmentation of images into individual figures for linking figure labels with image content
M. Lupu, R. Mörzinger, T. Schleser et al: "Patent Images - a Glass encased Tool"; 12th International Conference on Knowledge Management and Knowledge Technologies (i-KNOW 2012)
Classification of figure type, e.g. abstract drawing, graph, gene sequence, table, maths, program listing, flow chart, …
R. Mörzinger et al: "Classifying Patent Images"; Conference on Multilingual and Multimodal Information Access Evaluation (CLEF 2011)
Flow chart analysis with recognition of node types, edges and annotations for semantic processing
Current work for CLEF-IP: Information Retrieval in the Intellectual Property Domain, Flowchart recognition task 2012
The Integrated System
• Demo
  – References as Finding Objects
  – Semantic Patent Viewer
Conclusions
• Patent images need special treatment
  – Different from the treatment of other images
  – Different for each type of patent image
• First step: integrate text & images through references
• Ultimate goal: full semantic search on all types of images
• Evaluation efforts at CLEF-IP (in two weeks in Rome)