Patent Images: A Glass-encased Tool

Mihai Lupu, Allan Hanbury, Florina Piroi (Vienna University of Technology)
Tobias Schleser (m2n consulting and development GmbH)
Roland Mörzinger, René Schuster (Joanneum Research)

Motivation
• Many patents contain images. On a dataset of 16 million patents:
  – 28% of patents have at least one image
  – Average #images/patent: 9.4
• Images often contain important information about the innovation
• Very important in a variety of engineering fields (mechanics, electronics, chemistry)
• Current patent search tools use only text

Motivation 2
• Almost all image retrieval work has focussed on images containing colour/texture
• Of little use in the patent domain, where most drawings are black-and-white line art

Outline
• Related work
• IMPEx
• Patent Image Processing
  – Figure Segmentation
  – Specific processing (Flowcharts)
• Text and Images
• The Integrated System (demo)
• Conclusion

Image Similarity in Patents
• PATSEEK
  – Extract lines
  – Graph-based similarity using a “softened” variant of the Hausdorff distance
  – Huet et al. 2001
• PatMedia
  – Informatics and Telematics Institute, Greece
  – Adaptive Hierarchical Density Histogram
  – Focus on geometry
  – Sidiropoulos et al., 2011
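The slide does not say which "softened" Hausdorff variant is meant. As a minimal sketch, assuming the softening replaces the worst-case nearest-neighbour distance with the mean nearest-neighbour distance, the difference from the classic distance can be illustrated on two small point sets. All function names here are hypothetical and the points stand in for extracted line features.

```python
import math


def _nearest(point, points):
    """Distance from `point` to its closest neighbour in `points`."""
    return min(math.dist(point, other) for other in points)


def directed_max(a, b):
    """Classic directed Hausdorff: worst nearest-neighbour distance from a to b."""
    return max(_nearest(p, b) for p in a)


def directed_mean(a, b):
    """Softened directed distance (assumption): mean nearest-neighbour distance."""
    return sum(_nearest(p, b) for p in a) / len(a)


def hausdorff(a, b):
    """Symmetric classic Hausdorff distance between two point sets."""
    return max(directed_max(a, b), directed_max(b, a))


def softened_hausdorff(a, b):
    """Symmetric softened variant: a single outlier no longer dominates."""
    return max(directed_mean(a, b), directed_mean(b, a))


# A straight line of points, and the same line plus one stray point.
shape = [(0, 0), (1, 0), (2, 0), (3, 0)]
noisy = shape + [(3, 10)]

print(hausdorff(shape, noisy))           # dominated by the stray point
print(softened_hausdorff(shape, noisy))  # stray point is averaged out
```

The classic distance is driven entirely by the single stray point, whereas the averaged variant reflects that most of the two shapes coincide, which is the usual reason for softening when line extraction is noisy.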

http://mklab-services.iti.gr/patmedia/
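To give a feel for a geometry-focused descriptor, the sketch below builds a hierarchical density histogram by repeatedly splitting the black pixels at their centroid and recording the share that falls into each quadrant. This is a simplified illustration only, not the published Adaptive Hierarchical Density Histogram; the function names and the fixed four-way split are assumptions.

```python
def feature_length(levels):
    """Number of features produced for a given recursion depth."""
    return sum(4 ** k for k in range(1, levels + 1))


def density_histogram(points, levels):
    """Recursive quadrant densities of black-pixel coordinates (x, y)."""
    if levels == 0:
        return []
    if not points:
        # Empty region: pad with zeros so every vector has the same length.
        return [0.0] * feature_length(levels)

    # Split at the centroid of the points rather than at the region centre.
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)

    quadrants = [[] for _ in range(4)]
    for x, y in points:
        index = (1 if x >= cx else 0) + (2 if y >= cy else 0)
        quadrants[index].append((x, y))

    # Share of this region's points that falls into each quadrant.
    features = [len(q) / len(points) for q in quadrants]
    for q in quadrants:
        features.extend(density_histogram(q, levels - 1))
    return features


square = [(0, 0), (2, 0), (0, 2), (2, 2)]
shifted = [(x + 10, y + 5) for x, y in square]

print(density_histogram(square, 1))
print(density_histogram(square, 2) == density_histogram(shifted, 2))
```

Because the split point follows the drawing's own centroid, the same shape placed elsewhere on the page yields the same vector, which is the sense in which such a descriptor focuses on geometry rather than colour or texture.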

PatMedia

Image Mining for Patent Exploration
• Make information in patent images accessible
• Automatic interlinking of patent text and drawing parts by sub-part segmentation and label identification
• Interactive user-guided search with manual feedback provided by the patent expert
• Variety of technical drawings: flow charts, block diagrams, time charts and graph plots
• Prototype integration into m2n Knowledge Discovery Suite based on MAREC dataset extract and patent PDFs

FIT-IT project IMPEx Austrian Research Promotion Agency (FFG), No. 825846

Figure Extraction
• Separation of multiple figures on one page
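The slides do not describe how figures are separated. One common starting point, sketched here as an assumption rather than the IMPEx method, is to group ink pixels that lie within a small gap of each other and take the bounding box of each group as a figure candidate. The page is a toy binary bitmap written as strings, and `segment_figures` and `gap` are hypothetical names.

```python
from collections import deque


def segment_figures(page, gap=1):
    """Return bounding boxes (top, left, bottom, right) of ink groups.

    `page` is a list of equal-length strings where '#' marks ink.
    Two ink pixels belong to the same group if they are at most `gap`
    pixels apart horizontally and vertically (gap=1 is 8-connectivity).
    """
    height = len(page)
    width = len(page[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []

    for r in range(height):
        for c in range(width):
            if page[r][c] != "#" or seen[r][c]:
                continue
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                # Visit every unseen ink pixel within `gap` of (y, x).
                for ny in range(max(0, y - gap), min(height, y + gap + 1)):
                    for nx in range(max(0, x - gap), min(width, x + gap + 1)):
                        if page[ny][nx] == "#" and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


page = [
    "##....##..",
    "#.#...##..",
    "##........",
    "..........",
    "....###...",
    "....#.#...",
]

print(segment_figures(page, gap=1))  # small gap: separate groups
print(segment_figures(page, gap=3))  # large gap: groups merge
```

The `gap` parameter is the main design choice: too small and a single figure splits into its individual strokes, too large and neighbouring figures merge, which is one reason the variability of patent pages makes this step hard.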

Variability


Types of Patent Image
[Taxonomy figure; node labels as extracted: Figure, Photo, Diagram, BlockDiagram, State, Flowchart, Circuit, TechnicalDrawing, PlaneView, BottomView, ElevationalView, Graph, SideView, TopView, Waveform, Response, TimeChart, SectionalView]

From: S. Vrochidis, S. Papadopoulos, A. Moumtzidou, P. Sidiropoulos, E. Pianta, and I. Kompatsiaris. Towards content-based patent image retrieval: A framework perspective. World Patent Information, 32(2):94-106, 2010.


CLEF-IP 2011 Patent Image Classes
• Abstract Drawing
• Chemical Structure
• Flow Chart
• Mathematical Formula
• Program Code
• Graph
• Gene Sequence
• Character
• Table

Flowchart Analysis
• Identify
  – Number of nodes
  – Types of nodes
  – Text in nodes
  – Edges
  – Types of edges
• Specific to the domain
  – Node annotations
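The items above map naturally onto a small graph structure. The sketch below is one possible representation, assuming string identifiers and free-form type names. The class names, the type vocabulary ("terminal", "decision", "process", "directed") and the example annotations are all hypothetical, not the format used in the project or in CLEF-IP.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    kind: str              # type of node, e.g. "process" or "decision"
    text: str = ""         # text recognised inside the node
    annotation: str = ""   # domain-specific label placed next to the node


@dataclass
class Edge:
    source: str
    target: str
    kind: str = "directed"  # type of edge
    label: str = ""         # text attached to the edge, if any


@dataclass
class Flowchart:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def summary(self):
        """Counts corresponding to the 'Identify' items on the slide."""
        return {
            "node_count": len(self.nodes),
            "node_kinds": dict(Counter(n.kind for n in self.nodes)),
            "edge_count": len(self.edges),
            "edge_kinds": dict(Counter(e.kind for e in self.edges)),
        }


chart = Flowchart(
    nodes=[
        Node("n1", "terminal", "Start"),
        Node("n2", "decision", "Match found?", annotation="S102"),
        Node("n3", "process", "Rank results", annotation="S103"),
    ],
    edges=[
        Edge("n1", "n2"),
        Edge("n2", "n3", label="yes"),
    ],
)

print(chart.summary())
```

Keeping the annotation separate from the node text reflects the slide's distinction between text inside a node and the domain-specific labels attached to it.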

Text and Images

The Integrated System

Patent PDFs without metadata for claims, technical devices, measurements, images, …


Pages of PDFs converted to bitmap images


Filtering of pages with images and other page types





Optical character recognition, reference detection (Fig., Tab., …)
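Reference detection on OCR output can be approximated with a regular expression. The sketch below is a minimal illustration, assuming English abbreviations such as "Fig." and "Tab." followed by a number and an optional letter; the pattern and the `find_references` helper are hypothetical and would need extending for real, noisy OCR text.

```python
import re

# Matches "Fig. 3a", "FIG 12", "Figures 4", "Tab. 2", "Table 7", ...
REFERENCE_PATTERN = re.compile(
    r"\b(?P<kind>Fig(?:ures?|s)?|Tab(?:les?)?)\.?\s*(?P<label>\d+[A-Za-z]?)",
    re.IGNORECASE,
)


def find_references(text):
    """Return (kind, label) pairs for figure and table references in text."""
    references = []
    for match in REFERENCE_PATTERN.finditer(text):
        kind = "figure" if match.group("kind").lower().startswith("fig") else "table"
        references.append((kind, match.group("label")))
    return references


sample = "As shown in Fig. 3a and FIG 12, the values in Tab. 2 differ."
print(find_references(sample))
```

Normalising the matches to a (kind, label) pair gives a key that can later be matched against labels found on the drawing pages.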


Segmentation of images into individual figures for linking figure labels with image content

M. Lupu, R. Mörzinger, T. Schleser et al: "Patent Images - a Glass encased Tool"; 12th International Conference on Knowledge Management and Knowledge Technologies (i-KNOW 2012)


Classification of figure type, e.g. abstract drawing, graph, gene sequence, table, maths, program listing, flow chart, …

R. Mörzinger et al: "Classifying Patent Images"; Conference on Multilingual and Multimodal Information Access Evaluation (CLEF 2011)


Flow chart analysis with recognition of node types, edges and annotations for semantic processing

Current work for CLEF-IP: Information Retrieval in the Intellectual Property Domain, Flowchart recognition task 2012

The Integrated System
• Demo
  – References as Finding Objects
  – Semantic Patent Viewer



Conclusions
• Patent images need special treatment
  – Different from other images
  – Different among themselves
• First step: integrate text & images through references
• Ultimate goal: full semantic search on all types of images
• Evaluation efforts at CLEF-IP (in two weeks in Rome)