Fundamentals and Applications of Image Retrieval: An Overview


Alaa H. Halawani, Alexandra Teynor, Lokesh Setia, Gerd Brunner, Hans Burkhardt

Images play an important role in conveying information. With the rapid development of computer technology, the amount of digital image data is increasing rapidly. There is an inevitable need for efficient methods that can help in searching for and retrieving the visual information a user is interested in. A flourishing retrieval technique is content-based image retrieval (CBIR), where the visual cues found in the images are exploited for representing and retrieving them. In this article we give an overview of the main concepts of CBIR. We concentrate mainly on the feature extraction stage, as it is considered the core of CBIR engines. Areas where CBIR strategies are applicable are also reviewed.

1 Introduction

As the saying goes, a picture is worth a thousand words. Humans have long used drawings to convey information: cave dwellers told us about their dangerous hunting trips through illustrations on stone walls, and the Pharaohs depicted their customs of praying on the walls of their temples. Nowadays, visual information can be found in most (if not all) areas of life. As the impact of computers on our lives becomes more and more significant, much of this information, including pictures, is being digitized. Digital imagery is becoming more and more widespread; private photo collections, medical imaging, and geographical information systems are only a few examples. As computation power grows and the cost of storage media decreases, the size of digital image collections is increasing rapidly. There is a need for techniques that enable us to access and retrieve the huge amount of information embedded in these collections, and that can present this information efficiently and conveniently. Simple manual browsing is becoming cumbersome even for private collections. Automatic image retrieval is inevitable.


1.1 Text-Based Image Retrieval

Early work on image retrieval was based on textual annotation of the images, in which keywords were used to describe the image content. Retrieval was then done using text-based search approaches [Chang & Fu 1980, Roussopoulos et al. 1988]. Text-based systems suffer from major drawbacks. First of all, manual image annotation requires considerable effort and time, and the problem becomes more severe as the image collections grow. Secondly, the description of the image content is subject to human perception [Rui et al. 1999]; different people may end up with different descriptions for the content of the image at hand. Moreover, any image information that the annotator forgets, ignores or considers unimportant at the time of annotation cannot be retrieved later. Another problem is that manual text annotation is valid only for the language(s) used for the annotation; users without a background in those languages cannot use such text-based retrieval systems.

1.2 Content-Based Image Retrieval

In recent years, the focus of research on image retrieval has shifted towards exploiting the visual cues that exist in the content of the image. This is called content-based image retrieval (CBIR). Figure 1 is an abstraction of a typical CBIR system. Different kinds of visual features, such as color, shape and texture, are extracted from the images. The result is a multidimensional feature vector that represents the image content (see section 3). The set of feature vectors from all images in the collection at hand is stored in a feature database. This is usually done offline. If new images are added to the collection, their feature vectors are extracted and added to the feature database. To accomplish the retrieval task, the user communicates with the retrieval system by supplying a query, which can be, for example, a sample image, a sketch or a drawing of what the user is searching for. The feature extraction procedure is applied to the query image online, resulting in a compact representation in the form of a feature vector, which is then compared with the feature vectors stored in the feature database to determine their similarity. Retrieval is then performed, possibly using an indexing scheme that provides an efficient way to search the image database. Some systems allow the user to influence the retrieval process by integrating the user's feedback, in the hope of generating more meaningful retrieval results. This is called relevance feedback.

Fig. 1: Content-based image retrieval system



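To make the offline/online split of figure 1 concrete, the following is a minimal sketch in Python (assuming NumPy is available); the toy extract_features function stands in for any of the feature extractors discussed in section 3, and all images here are random placeholder data.

```python
import numpy as np

def extract_features(image):
    """Toy feature extractor: a normalized gray-value histogram.
    Any of the color/texture/shape features from section 3 could
    be substituted here."""
    hist, _ = np.histogram(image, bins=32, range=(0, 256))
    return hist / hist.sum()

# Offline stage: build the feature database once.
database_images = [np.random.randint(0, 256, (64, 64)) for _ in range(100)]
feature_db = np.array([extract_features(img) for img in database_images])

# Online stage: extract the query feature and rank by distance.
query_image = np.random.randint(0, 256, (64, 64))
query_vec = extract_features(query_image)
distances = np.linalg.norm(feature_db - query_vec, axis=1)
ranking = np.argsort(distances)          # most similar images first
print("Top 5 matches:", ranking[:5])
```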

2 Query Formats

The user interacts with a CBIR system by specifying a query. There are different ways in which a user can express his/her query. One possibility is to let the user sketch the query, specifying the image elements and their spatial relationships. For example, to search for countryside images with grass and sky, the user would draw a large blue region on top of a smaller green region. Examples of such systems are [Jacobs et al. 1995, Wang et al. 1997b]. The most common type of query is the query by image example, where the user supplies a sample image and asks for the most similar images (see figure 1). Another possibility is to use multi-image queries [Iqbal & Aggarwal 2003], where more than one query image is used to define the search goal more precisely. Multi-image queries can be advantageous in supplying a more detailed knowledge representation [Iqbal & Aggarwal 2003]; they overcome the limited specification of image content by a single query image, which may lead to better results. The concept can be further extended by letting the user specify positive and negative image examples. This is applied, for example, in some systems that incorporate relevance feedback [Setia et al. 2005]. Some systems use a visual thesaurus to help the user formulate his/her query [Fauqueur & Boujemaa 2003]. The assumption is that the user has the query in mind without having a specific example image that expresses the goal. The thesaurus shows representative regions of the database. During the query, the user selects positive and negative categories from the thesaurus. The query composition is then defined as finding the images composed of regions from the positive categories and no regions from the negative categories.

3 Features

Feature extraction is the core of CBIR. The raw image data is not used directly in most computer vision tasks, for two reasons. First, the high dimensionality of the image makes it unreasonable to use the whole image. Secondly, a lot of the information embedded in the image is redundant and/or useless. Instead of using the whole image, only an expressive representation of the most important information is extracted. The process of finding this expressive representation is known as feature extraction, and the resulting representation is called the feature vector. Feature extraction can be understood as mapping the image from image space to feature space. Finding good features that adequately represent an image is still a challenging task. In the literature, a wide variety of features is used for image retrieval; the features employed vary depending on the purpose of the retrieval task. Concerning image content, one can distinguish between visual and semantic content. Features usually represent the visual content, which can be further classified as general or domain-specific. For example, the features used for searching in a diverse, general image database would commonly represent general visual content like color, texture, and shape. On the other hand, the features used for retrieval tasks like searching for human faces are domain-specific and may include domain knowledge [Rui et al. 1999]. The semantic content of an image is not easy to extract. Annotation and/or specialized inference procedures based on the visual content help to some extent in obtaining the semantic content.

3.1 Invariant Features

For many applications, the extracted features should remain unchanged (i.e., they should be invariant) if the image content is subjected to a transformation. Transformations can be geometric (e.g. rotation and scaling) and/or photometric (changes in lighting conditions). Consider an image retrieval system: the aim is to retrieve images with similar content independent of the transformations applied to the content of the image. For example, the two images shown in figure 2 should be considered similar by the retrieval system although the main object was transformed. This means that the representation of both images should remain unchanged despite the transformation applied. In the following we will concentrate on geometric transformations.

Fig. 2: The two images should be considered similar by a retrieval system, despite the geometrical transformation applied to the image content.

Given a transformation group G with elements g ∈ G acting on an image S, we would like to have a mapping F that satisfies the following equation [Schulz-Mirbach 1995]:

F(gS) = F(S), ∀ g ∈ G

Here gS denotes a group element g acting on the image S; i.e., we are looking for a mapping that maps all images of an equivalence class from the image space to the same point in the feature space [Burkhardt & Siggelkow 2000]. The above equation gives the necessary condition to achieve invariance. Geometric transformation groups include the group of translations; the group of Euclidean motions, which consists of translations and rotations; the group of similarities, which extends the Euclidean group by adding scaling; and the affine transformation group, where shear is an additional degree of freedom. Sometimes practical situations occur in which global invariance is not wanted, but only adjustable robustness against local transformations of the patterns. For instance, in optical character recognition small rotations of a letter are acceptable, but large rotations change class memberships, like Z → N, M → W, 6 → 9 etc. Similarly, too much horizontal stretching can convert a slightly bent I to an L, C or J. In these cases, there is a need for a generalization of invariant features so that they are invariant or robust with respect to subsets of the transformation group [Haasdonk et al. 2004].
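As a quick numerical illustration of the invariance condition (a sketch of ours, assuming NumPy, not taken from the cited works): if g is a 90° rotation and F a gray-value histogram, which discards all spatial arrangement, then F(gS) = F(S) holds exactly.

```python
import numpy as np

def F(image):
    """Gray-value histogram: it discards the spatial arrangement of
    pixels, so it is invariant under any permutation of pixel
    positions, including rotations and translations of the content."""
    hist, _ = np.histogram(image, bins=16, range=(0, 256))
    return hist

S = np.random.randint(0, 256, (32, 32))
gS = np.rot90(S)                  # the group element g: a 90-degree rotation

# F(gS) == F(S): the rotated image maps to the same feature vector.
assert np.array_equal(F(gS), F(S))
print("F is invariant under this rotation")
```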



3.2 Global vs. Local Features

In general, image features can be either local or global. If the features are extracted from the visual content of the entire image, they are called global features. Global features have been used successfully for image retrieval. The easiest and most famous example is the global color histogram [Swain & Ballard 1991]. The main problem of global features is that the resulting description cannot differentiate between different image parts, like the object of interest and the background. Therefore, they are usually not suitable for tasks like partial image matching and object recognition or retrieval in cluttered and complex scenes. In contrast to global features, it is possible to extract local features only from regions of interest or objects in the image and use this information to try to solve problems like those mentioned above. The main problem of most systems relying on this scheme is the required preprocessing, namely image segmentation to determine the regions of interest, which is not a simple task. Some researchers argue that coarse or inaccurate segmentation is enough for the task of retrieving images in a general database; this kind of segmentation is much easier and faster to accomplish than accurate segmentation of regions. Examples are the systems in [Wang et al. 2001, Fauqueur & Boujemaa 2004]. However, for domain-specific applications, like object matching, this scheme would generally fail. As an alternative, one can consider extracting features from patches around image pixels, ending up with a set of local feature vectors, each of which describes the local characteristics around an image pixel. Going in this direction, one immediately observes that extracting local feature vectors around all image pixels is usually too expensive in terms of extraction time, storage and time needed for matching. Besides, it is not necessary to consider all pixels, because of their redundant and/or uninformative content. Taking these facts into consideration, a subset of the image pixels should be used for the computation of local features. These pixels should represent, together with their neighborhood, the most important visual information in an image. The term usually used in the literature for this set of points is interest points. Many algorithms have been developed for detecting and extracting interest points [Lowe 2004, Harris & Stephens 1988, Loupias & Sebe 1999, Mikolajczyk & Schmid 2004]. Having identified these points, a feature vector is extracted around each interest point (see fig. 3).


These feature vectors are usually called local descriptors because they characterize the local neighborhood of a point. During the retrieval process, the similarity between a pair of images is determined by the number of matches found between their feature vectors. In recent years, this scheme has gained great attention, as it possesses several advantages that make it very well adapted to the tasks of matching, retrieval and recognition. First of all, the local features extracted around the points are usually robust to the transformations applied to the image content. Moreover, the scheme is also robust to partial occlusion, clutter, and changes in the background, as only corresponding local features in different scenes should match. This also eliminates the need for any prior segmentation. Examples of systems that adopt this scheme are [Schmid & Mohr 1997, Lowe 2004, Halawani & Tamimi 2006, Mikolajczyk & Schmid 2005]. However, the above-mentioned advantages come at the cost of storage requirements and computation and matching complexity, although this is becoming less important with the growth of computation power and the decrease of storage costs.
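The following is a minimal sketch (assuming NumPy) of how such a match count can be computed; it uses the nearest-neighbor ratio test popularized by [Lowe 2004], and the random descriptors merely stand in for real local descriptors.

```python
import numpy as np

def count_matches(desc_a, desc_b, ratio=0.8):
    """Count descriptors in image A whose nearest neighbour in image B
    is clearly better than the second nearest (Lowe's ratio test)."""
    matches = 0
    for d in desc_a:
        dists = np.linalg.norm(desc_b - d, axis=1)
        nearest, second = np.sort(dists)[:2]
        if nearest < ratio * second:
            matches += 1
    return matches

# Toy data: 50 and 60 descriptors of dimension 128 (as in SIFT).
desc_a = np.random.rand(50, 128)
desc_b = np.random.rand(60, 128)
print("similarity score:", count_matches(desc_a, desc_b))
```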

3.3 Features in CBIR

In the following, we give a review of the features used in the literature for CBIR.

3.3.1 Color Features

Color is an active area of research in CBIR. Since color has a three-dimensional domain, it is expected to yield better results than one-dimensional gray values. Colors can be equivalently represented in different color spaces; examples are the RGB, HSV, and CIE color spaces. A very compact representation of an image is obtained through the use of color moments, as done by Stricker and Orengo [Stricker & Orengo 1995]. The idea is that any color distribution can be described by its moments, and only the low-order moments (mean, variance and skewness) are used, since most of the information regarding the color distribution is concentrated in them. As the representation is very compact, the discrimination power is expected to be low. In [Sebe et al. 2000], the same principle was used, but on a local basis, considering only the neighborhood of a set of interest points during the computation.


Fig. 3: After identifying the interest points, local descriptors that characterize the neighborhood of the points are extracted.

The most commonly used color feature is the color histogram. Swain and Ballard [Swain & Ballard 1991] proposed the use of color histograms for color indexing. The color histogram was used, together with other features, in the QBIC (Query By Image Content) retrieval system [Flickner et al. 1995]. Jain and Vailaya [Jain & Vailaya 1996] used the color histogram together with a shape histogram (based on edge orientation) for retrieving images in a trademark database. Color histograms are rotation and translation invariant. The main disadvantage of conventional color histograms is that they ignore all spatial relations, and many contributions have tried to alleviate this problem. For example, the color correlogram [Huang et al. 1997] introduces, in addition to the color distribution, geometric information by keeping track of the number of pairs of certain colored pixels that occur at certain separation distances in the image. Pass et al. [Pass et al. 1996] proposed the color coherence vector (CCV), which splits the pixels in a given bin of the histogram into two classes, coherent or incoherent, based on whether or not a pixel is part of a large similarly-colored region. This makes it possible to distinguish between scattered pixels and consistent regions, which leads to better retrieval results. Siggelkow et al. [Siggelkow et al. 2001] use the principle of integral invariants to construct a global invariant feature histogram.



Integral invariants capture the local structure of the neighborhood around the points where they are computed. The idea was extended in [Halawani & Burkhardt 2004, Halawani & Burkhardt 2005] by constructing histograms of local integral invariants that are computed around interest points. Other techniques try to incorporate color-spatial information by dividing the image into subblocks from which color features are then extracted. In [Gong et al. 1995] the image is split into nine equal partitions; a local histogram is computed for each partition in addition to a global histogram. For retrieval, the user can choose to match the global histogram, the nine local histograms, or a combination of all of them. In [Lu et al. 1994] a quadtree-based color layout approach was proposed: a quadtree structure is built from the image and a multi-level histogram is then constructed. Examples of other techniques are the »color signature« [Chua et al. 1997], color sets [Smith & Chang 1996], and segmentation [Wang et al. 2001, Fauqueur & Boujemaa 2004, Fauqueur & Boujemaa 2003].
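As an illustration of the color coherence vector [Pass et al. 1996] described above, here is a minimal sketch on gray values (assuming NumPy and SciPy); the bin count and the coherence threshold tau are illustrative choices of ours, and a real implementation would operate on quantized colors.

```python
import numpy as np
from scipy import ndimage

def coherence_vector(image, n_bins=8, tau=20):
    """For each quantized intensity bin, split its pixels into
    'coherent' (part of a connected region of at least tau pixels)
    and 'incoherent' ones, in the spirit of [Pass et al. 1996]."""
    bins = (image.astype(int) * n_bins) // 256      # quantize gray values
    ccv = np.zeros((n_bins, 2), dtype=int)          # (coherent, incoherent)
    for b in range(n_bins):
        labels, n = ndimage.label(bins == b)        # connected components
        for lbl in range(1, n + 1):
            size = np.sum(labels == lbl)
            ccv[b, 0 if size >= tau else 1] += size
    return ccv.ravel()

image = np.random.randint(0, 256, (64, 64))
print(coherence_vector(image))
```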

3.3.2 Texture Features

Roughly speaking, texture is a measure that expresses qualities like smoothness, coarseness and regularity by studying the variation of the intensity of a surface. Texture is another important property used in CBIR. In [Haralick et al. 1973], the co-occurrence matrix representation of texture features was introduced. The matrix records the co-occurrence of different gray levels at a certain distance and orientation; features that represent the texture, such as contrast, correlation, and entropy, are then extracted from the matrix. Other well-known texture features are the Tamura features [Tamura et al. 1978], which are based on the human perception of texture and include coarseness, contrast, directionality, line-likeness, roughness, and regularity.
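A minimal sketch of the co-occurrence matrix and the contrast feature derived from it (assuming NumPy; the offset and quantization parameters are illustrative):

```python
import numpy as np

def cooccurrence_matrix(image, dx=1, dy=0, levels=8):
    """Count how often gray level j occurs at offset (dx, dy)
    from gray level i [Haralick et al. 1973]."""
    img = (image.astype(int) * levels) // 256        # quantize gray levels
    mat = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(h - dy):
        for x in range(w - dx):
            mat[img[y, x], img[y + dy, x + dx]] += 1
    return mat / mat.sum()                           # normalize to probabilities

def contrast(mat):
    """One of the Haralick features: weights each co-occurrence
    by the squared gray-level difference."""
    i, j = np.indices(mat.shape)
    return np.sum((i - j) ** 2 * mat)

image = np.random.randint(0, 256, (64, 64))
print("contrast:", contrast(cooccurrence_matrix(image)))
```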


Multiresolution filtering techniques, such as Gabor filters and the wavelet transform, are popular for texture feature extraction. They decompose an image into a set of filtered images, each of which represents the image information at a defined orientation and scale. Statistics of the filtered images (e.g. mean and standard deviation) can be used as features [Manjunath & Ma 1996]. In other contributions, the most significant coefficients of the transform [Jacobs et al. 1995, Wang et al. 1997b] or their energy properties [Albuz et al. 1998] are used to create a compact representation of an image. A simple yet effective method for constructing textural features is the local binary pattern operator [Ojala et al. 2000], which maps the relations between a center pixel and its neighborhood into a binary pattern that is invariant against monotonic gray-scale transformations. [Schael 2005] extends this work to be rotation invariant and to alleviate the operator's sensitivity to noise. Histograms based on these features have been used for CBIR [Siggelkow et al. 2001, Halawani & Burkhardt 2004]. Schmid and Mohr [Schmid & Mohr 1997] were among the first to use local texture descriptors for the retrieval task. They used rotation-invariant local feature vectors based on Gaussian derivative measurements around interest points. Gouet and Boujemaa [Gouet & Boujemaa 2001] extended this work to color images using color differential invariants [Montesinos et al. 1998].
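The basic 3×3 local binary pattern operator mentioned above can be sketched as follows (assuming NumPy; this is the plain operator of [Ojala et al. 2000] without the rotation-invariant extensions):

```python
import numpy as np

def lbp_image(img):
    """Basic 3x3 local binary pattern: compare each pixel's eight
    neighbours against the centre and pack the results into a byte.
    The code is unchanged by order-preserving gray-scale transformations."""
    img = img.astype(int)
    c = img[1:-1, 1:-1]                               # centre pixels
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:img.shape[0] - 1 + dy,
                        1 + dx:img.shape[1] - 1 + dx]
        code |= (neighbour >= c).astype(int) << bit
    return code

image = np.random.randint(0, 256, (64, 64))
hist, _ = np.histogram(lbp_image(image), bins=256, range=(0, 256))
print("LBP histogram (first bins):", hist[:8])
```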

3.3.3 Shape Features

Shape-based features have also been used in image retrieval applications. Shape features are usually used to describe segmented regions or objects, so their success depends on the accuracy of the segmentation scheme used. Shape features are mainly extracted from object boundaries or from the region that the object occupies. Moment invariants are an example of region-based shape features; seven moment invariants were introduced in [Hu 1962]. Fourier descriptors are boundary-based shape features: they describe the shape of an object by the Fourier transform of its boundary [Persoon & Fu 1977]. Affine-invariant Fourier descriptors were proposed in [Arbter et al. 1990]. The edge direction histogram is another well-known shape descriptor [Jain & Vailaya 1996]. Unlike other shape features, the edge direction histogram does not require an object segmentation step. The idea is to preprocess the image to extract its edges and their directions; the edge directions are then quantized and used to build the histogram. The edge direction histogram captures the general shape information and is invariant against translation.

Following the idea of the color coherence vector [Pass et al. 1996], Vailaya et al. [Vailaya et al. 1998] proposed an edge direction coherence vector that stores the number of coherent versus non-coherent edge pixels with the same directions. Van Gool et al. [Gool et al. 2001] proposed constructing local affinely invariant regions, where features are extracted from the color patterns within these regions; the feature vector describing a region is constructed using moment invariants. Lowe [Lowe 2004] introduced a highly distinctive local image descriptor which he called SIFT (Scale Invariant Feature Transform). The SIFT features are invariant against similarity transformations. The feature vector used in SIFT consists of gradient orientation histograms summarizing the content of a 16×16 region around each point.
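A minimal sketch of the edge direction histogram described above (assuming NumPy); simple image gradients stand in for a real edge detector, and the bin count and edge threshold are illustrative choices:

```python
import numpy as np

def edge_direction_histogram(img, n_bins=36, threshold=50):
    """Quantize the gradient direction of strong edge pixels into a
    histogram, in the spirit of [Jain & Vailaya 1996]."""
    img = img.astype(float)
    gy, gx = np.gradient(img)                         # simple derivatives
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 360      # 0..360 degrees
    edges = angle[magnitude > threshold]              # keep strong edges only
    hist, _ = np.histogram(edges, bins=n_bins, range=(0, 360))
    return hist / max(hist.sum(), 1)                  # normalize

image = np.random.randint(0, 256, (64, 64))
print(edge_direction_histogram(image))
```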

4 Semantic Gap

Although CBIR overcomes the problems of text-based image retrieval by automating the image description through feature extraction, it suffers from a problem directly related to the extracted features, namely the semantic gap [Smeulders et al. 2000]. The semantic gap describes the absence of correspondence between the features extracted from an image's visual content and the semantics contained in that image. This may lead to unsatisfactory or disappointing retrieval results that do not match the user's expectations: the returned images may have very similar features, while their content, from the user's point of view, is dissimilar. This is the biggest stumbling block for CBIR to gain mainstream acceptance, and it remains largely unsolved, even though quite a few articles may suggest otherwise [Wang et al. 2001, Vogel & Schiele 2006, Mojsilovic & Gomes 2002]. Relevance feedback techniques are usually used to alleviate the semantic gap problem (see section 6.1). Another possibility to skirt around the problem is to describe the image semantics by means of automatic image annotation with keywords [Li & Wang 2003, Barnard & Forsyth 2001].



In contrast to the text-based retrieval systems mentioned above, automatic annotation tries to associate keywords with an image based on an automatic analysis of its content. Retrieval is then done based on both the associated text and the image content. Due to the infinite number of semantic concepts that can be considered, automatic annotation methods are still not highly reliable.

5 Indexing

The end user is ultimately interested in a fast retrieval of images relevant to the query. Today's typical image databases are steadily increasing in size, which in turn puts higher demands on the retrieval performance of a CBIR system; speed thus becomes more and more important. To increase retrieval performance, the multi-dimensional image features are indexed. In recent years, indexing methods have gained more and more importance in CBIR applications. Various indexing methods are widely employed to increase the efficiency of image retrieval systems; indexing enables fast data retrieval by providing sophisticated address calculations. The indices enable the retrieval platform to find images similar to the query without checking each image in the database, and thus reduce the retrieval time. Trees are widely used as indexing structures; the best-known architectures are the R-tree [Guttman 1984] and its variants, such as R+-trees [Sellis et al. 1987] and R*-trees [Beckmann et al. 1990], as well as K-d-B-trees [Robinson 1981], SS-trees [White & Jain 1996] and quad-trees [Samet 1990]. The R-tree is a dynamic hierarchical tree structure where a higher-level node is a minimum bounding rectangle (MBR) that encloses a set of child MBRs or objects at the lower levels. The R*-tree improved the original R-tree through more efficient storage and a dynamic reorganization of the tree, resulting in fewer split operations. In [Oh et al. 2001] the authors propose the combination of a Self-Organizing Map (SOM) [Kohonen 2001] and an R*-tree for indexing high-dimensional feature vectors. The creation of a SOM-based R*-tree consists of two steps: the clustering of similar images and the construction of the R*-tree. In the first step, a topological feature map is created using the SOM, which provides a mapping from a high-dimensional feature vector onto a two-dimensional space.


In the second step, the R*-tree is built using the codebook vectors, i.e. the feature vectors contained in each node of the topological feature map. The actual gain of the method comes from the elimination of empty nodes in which no image is classified, which would otherwise cause unneeded disk accesses and a loss in performance. Thus the SOM-based R*-tree indexing structure is constructed with fewer nodes, which reduces the search time. The vector approximation file (VA-file) [Weber & Blott 1997] was designed for searching high-dimensional feature spaces. The VA method represents each feature vector by a corresponding signature which is very compact and provides an approximation of the vector's information.
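The following sketch conveys the flavor of such signature-based filtering (assuming NumPy). It is a simplification of ours: the real VA-file scans exact lower and upper distance bounds on the signatures rather than the heuristic pruning used here.

```python
import numpy as np

def va_signatures(features, bits=2):
    """Quantize every dimension of each feature vector into 2**bits
    cells; the resulting compact signatures approximate the vectors
    in the spirit of [Weber & Blott 1997]."""
    cells = 2 ** bits
    return np.clip((features * cells).astype(int), 0, cells - 1)

# Toy database of 1000 feature vectors in [0, 1)^16.
features = np.random.rand(1000, 16)
signatures = va_signatures(features)

query = np.random.rand(16)
q_sig = va_signatures(query[None, :])[0]

# Filtering step: rank by a cheap distance on the signatures, then
# fetch and compare only the most promising vectors exactly.
approx = np.linalg.norm(signatures - q_sig, axis=1)
candidates = np.argsort(approx)[:50]                  # coarse pruning
best = candidates[np.argmin(np.linalg.norm(features[candidates] - query, axis=1))]
print("best match:", best, "found with", len(candidates), "exact comparisons")
```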

6 Applications

6.1 General Purpose CBIR

The aim of general CBIR systems is to find similar images in a large, diverse database. In contrast to domain-specific CBIR applications, similarity in general CBIR systems can have a very broad scope which depends on the system user: the goal of the search exists only in the user's mind. The user might be interested only in particular subspaces of the feature space at a particular time. If, for example, the query image is an outdoor image showing a house surrounded by a meadow, then it is the user who decides whether he/she is searching for green areas, man-made structures, or both at the same time. In each of these cases, different features may be needed to accomplish the retrieval task: for green areas, color may be the best cue to exploit (color subspace); shape features may be much more helpful for the man-made structures (shape feature subspace); and a weighted combination of both is suitable when searching for the complete scene. In the worst case, the user might be looking for some high-level concept which can no longer be expressed in the feature space at all (the semantic gap problem). One way to solve the subspace selection problem is to give the user the ability to weight the different kinds of features manually. This has two disadvantages. The first is practical: it can be difficult for most users to assign numerical weights to features

(many people do not know what they are looking for until they see some sample results). The second problem arises with many different types of complex features, which can also be related to each other, thus requiring many weights from the user for features that do not lend themselves well to human interpretation. Relevance feedback is a powerful technique which addresses these issues without putting an additional burden on the user. Originally developed by the text-search community, it is even more effective in image search, due to the inherent subjectivity of image content. The idea is to involve the user in the search process by asking for feedback about the initial results, and to use this information to present hopefully improved results. Theoretically, the process can be repeated many times; in practice, however, the real gains are often obtained only in the initial rounds of relevance feedback. Different approaches to relevance feedback have been tried. Rui et al. [Rui et al. 1998] gave an algorithm for the automatic update of feature weights based on the user feedback. Another option is to interpret the feedback information as a two-class classification problem, the two classes being the relevant and the irrelevant images, and to use a standard classifier (e.g. an SVM) to separate the two classes [Tong & Chang 2001, Setia et al. 2005]. A relevance feedback algorithm should preferably be able to learn the user's search goal from a small amount of labelled information, to reduce the burden on the user. Another interesting possibility is long-term learning from feedback collected from different users over a period of time [Hoi & Lyu 2004]. A comprehensive review of relevance feedback can be found in [Zhou & Huang 2003]. Many general-purpose CBIR systems have been developed; SIMBA [Siggelkow et al. 2001], SIMPLIcity [Wang et al. 2001], PicToSeek [Gevers & Smeulders 2000], Viper [Squire et al. 2000] and CIRES [Iqbal & Aggarwal 2003] are a few examples. Figure 4 shows a screenshot of a search for sunset scenes using the SIMBA search engine.
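Returning to the two-class classification view of relevance feedback mentioned above, here is a minimal sketch (assuming NumPy and scikit-learn are available; the feature vectors are random placeholders, and this is not the specific algorithm of any of the cited systems):

```python
import numpy as np
from sklearn.svm import SVC       # assumes scikit-learn is installed

# Feature vectors of the images the user has already labelled.
relevant = np.random.rand(8, 32)        # positive feedback
irrelevant = np.random.rand(12, 32)     # negative feedback

X = np.vstack([relevant, irrelevant])
y = np.array([1] * len(relevant) + [0] * len(irrelevant))

clf = SVC(probability=True).fit(X, y)   # separate the two classes

# Re-rank the whole database by the probability of being relevant.
database = np.random.rand(1000, 32)
scores = clf.predict_proba(database)[:, 1]
new_ranking = np.argsort(-scores)
print("next results to show:", new_ranking[:10])
```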

6.2 Object Matching and Detection

This is a special case of image retrieval where the principle is limited to finding images that contain some particular object.



Fig. 4: »Sunset« query using the SIMBA search engine [Siggelkow et al. 2001]

Fig. 6: Searching for objects in videos [Sivic & Zisserman 2003]

Here, instances of particular objects rather than object classes are considered. For example, we do not search for images that contain the general category »books«, but rather for images that contain »book X« or »book Y«. The object instances are usually found in different placements against variable, cluttered backgrounds; partial occlusion may also occur. A good application example is robot vision, where robots are responsible for object identification and manipulation. The robot is supposed to work under realistic conditions where clutter, occlusion and other transformations may occur; ignoring these facts leads to failure or to unrealistic systems. As discussed before, local features extracted around interest points are the most suitable for accomplishing such a task. In most of the algorithms, correspondences between local features in the query and database images are established, and the database images are ranked according to the number of matches found: the higher the number of matches in an image, the more similar it is to the query image. Figure 5 shows an example of object detection in cluttered scenes using the algorithm and the database of [Halawani & Tamimi 2006]. The left part of the figure shows the query image of a cluttered scene that contains an instance of a particular book. The query image is matched against a database of object templates.


The detection result is expressed by the returned book template, shown on the right; the outline of the detected object is also shown on the query image. SIFT [Lowe 2004] is currently the best-known local descriptor used for object matching and detection. A famous commercial application of object matching using SIFT features is the robotic pet AIBO manufactured by Sony: the SIFT algorithm was implemented in AIBO's vision system to enable it to recognize its charging station and communicate with visual cards. SIFT is also used in [Sivic & Zisserman 2003] as a region descriptor in the so-called »Video Google« system that matches and retrieves objects and scenes in videos. The system searches for and localizes all occurrences of an object, chosen by the user, in a video. Figure 6 shows a query example of the system using the movie »Groundhog Day«. The image in the top row is the query frame with the outlined query region (a clock and its surroundings); the images in the second and third rows represent the retrieved frames, showing the outlines of the detected regions.

Fig. 5: Object detection example

A concept similar to object matching is scene matching, where the aim is to find images that represent the same scene. An interesting application of scene matching is vision-based robot localization [Wolf et al. 2005, Tamimi et al. 2006], where an acquired image is matched against the robot's database images so that the robot can determine where it is.

6.3 Retrieval of Object Classes

As shown, techniques for general CBIR are able to find similar images based only on pixel content, where the definition of similarity is on the color and texture level, not on a semantic level. Most users do not want to find things with just the same texture and color; they want to find semantic entities, i.e. images containing particular objects like cars, sheep or faces. This is why the main focus of research is now drawn to the recognition of visual object classes rather than the already widely researched area of traditional CBIR. A visual class is understood as objects having a similar visual appearance, as opposed to, e.g., having a similar function. Example images1 for the visual classes »motorbike« and »plane« can be seen in figure 7. One of the greatest problems to face here is intra-class variability. Imagine how different a car might look, even if we only take side views: a VW Beetle is essentially different from a Lamborghini or a Volvo.

1. www.robots.ox.ac.uk/~vgg/data3.html



Fig. 7: Examples of visual classes

However, they also share some visual characteristics: lights at the front and back, windows framed with metal, and wheels below a chassis. The effects of different viewing directions and conditions are also a problem: even the image of the very same car looks different taken from the front, the rear or the side. The difficulty can be seen in the example images1 for the class »bike« in figure 8. Successful approaches have to find characteristics of the objects that are stable within the class and discriminative against others. Sometimes it is beneficial to use context information: a ship is more likely to be found in an image if there is water around, and not grass. However, care has to be taken that we do not end up detecting water instead of ships. Ideally, objects should be recognized against cluttered backgrounds, and also if only a part of the object is visible. Again, the most promising current approaches are based on the use of so-called »image patches«. A variety of different local features can be used, ranging from the pixel values themselves [Leibe & Schiele 2003b, Fergus et al. 2003], over local filtering approaches such as Gabor filters [Mikolajczyk & Schmid 2005], to more sophisticated features such as SIFT [Lowe 2004]. Then, models are built for each class to be recognized, or the feature vectors can be used directly.

1. www.emt.tugraz.at/~pinz/data/GRAZ_02

Fig. 8: Two instances of the class »bike«


These models can be divided into two classes: those using the geometric configuration of the patches [Agarwal et al. 2004, Fergus et al. 2003], and the so-called »bag of words« representations [Csurka et al. 2004], where only the frequencies of specific patches in an image are evaluated. Depending on the model, different classifiers are used (e.g. SVMs [Wallraven et al. 2003], Winnow [Agarwal et al. 2004], Bayes [Fergus et al. 2003]). A very important step is to determine which of the extracted parts of the image (which can be many and diverse) carry the most information for classification; here, feature selection techniques [Dorko & Schmid 2005] or learning techniques [Deselaers et al. 2005] can be used. Another aspect in the context of visual object recognition is localization: we do not only want to know whether an object is contained in the image or not, we also want to know where it is. Simple approaches use sliding windows: they assign probabilities to sub-windows of different sizes that are shifted over all possible image locations [Agarwal et al. 2004]. Leibe et al. showed in [Leibe & Schiele 2003b] that it is possible to categorize and segment objects at the same time using an implicit shape model. Whenever distinct parts of an object are used for recognition, as, e.g., in the constellation model [Fergus et al. 2003], this information can be used to localize the object directly. Other methods for object class recognition can only be mentioned briefly: it is also possible to use object contours [Leibe & Schiele 2003a], 3D model fitting [Blanz & Vetter 2003], 3D invariants for biological applications [Ronneberger et al. 2002], regions [Li et al. 2004] or shape contexts [Belongie et al. 2002], to name just a few.
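A minimal sketch of the »bag of words« representation mentioned above (assuming NumPy and scikit-learn; the vocabulary size and the random descriptors are placeholders for real local features):

```python
import numpy as np
from sklearn.cluster import KMeans     # assumes scikit-learn is installed

# Local descriptors pooled from many training images (toy data).
all_descriptors = np.random.rand(5000, 128)

# Build the visual vocabulary ("codebook") by clustering.
k = 200
codebook = KMeans(n_clusters=k, n_init=3).fit(all_descriptors)

def bag_of_words(descriptors):
    """Represent an image only by how often each visual word occurs,
    discarding the geometric configuration of the patches."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=k)
    return hist / hist.sum()

image_descriptors = np.random.rand(80, 128)   # descriptors of one image
print(bag_of_words(image_descriptors)[:10])
```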

6.4 Culture and Art

Huge collections of cultural heritage are being digitized and archived. At the same time, the number of users accessing museum archives online is increasing, so it is natural to think of applying CBIR algorithms for searching in these databases. Examples of systems that search in art galleries are found in [Flickner et al. 1995, Rui et al. 1997, Li & Wang 2004]. In [Siggelkow 2002] a system for searching in large databases of stamps (more than 12000 stamp images) was developed. Instead of leafing through the collection manually, the system, called MICHELscope, allows the query stamp to be scanned; similar stamps are then retrieved based on a set of features like color, texture, size and the aspect ratio of the stamp. Figure 9 shows a screenshot of the MICHELscope, where the system finds similar stamps of the same series (Märchen der Brüder Grimm) but with strongly varying motifs.

6.5 Other Applications

CBIR may be applied in various other domains in many areas of life. One example is medicine: medical departments nowadays have large archives of medical images, and using the principles of CBIR it would be possible to assess a given case based on similar past cases found in the image archive. On the personal and private level, CBIR can be helpful in searching and organizing private photo collections [Schaffalitzky & Zisserman 2002]. With the rapid growth of the World Wide Web, general CBIR systems can be developed for searching for similar images on the internet. Moreover, CBIR can be utilized to block and filter objectionable web content, providing parents with a helpful tool to protect their children [Wang et al. 1997a].

Fig. 9: Searching for stamps with MICHELscope



Other examples of CBIR applications are security and criminal investigation (e.g., searching in fingerprint databases), copyright protection (e.g., searching in a trademark database [Jain & Vailaya 1996]) and geographical information systems. More state-of-the-art CBIR applications and algorithms can be found in [Veltkamp et al. 2001].

7 Performance Measures

It is necessary to evaluate how well a CBIR system performs, and several performance measures have been suggested in the literature. Due to space limitations, we only mention some of them; a review can be found in [Müller et al. 2001]. The most commonly used measures are precision and recall. Recall is the percentage of all relevant images that are returned by a query, while precision is the percentage of relevant images among the retrieved images. Other measures include the rank of the best match, the average normalized modified retrieval rank (ANMRR), the average rank of the relevant images, and the error rate.
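Both measures are straightforward to compute; a minimal sketch with made-up numbers:

```python
def precision_recall(retrieved, relevant):
    """retrieved: ranked list of returned image ids;
    relevant: set of ids that are actually relevant to the query."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved)   # fraction of results that are relevant
    recall = hits / len(relevant)       # fraction of relevant images found
    return precision, recall

# 10 images returned, 5 of them among the 8 relevant ones:
# precision = 0.5, recall = 0.625.
print(precision_recall(retrieved=[1, 5, 7, 9, 2, 11, 3, 30, 41, 8],
                       relevant={1, 2, 3, 4, 5, 6, 7, 20}))
```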

8 Summary

In this article we have reviewed the concepts and state-of-the-art applications of CBIR. One of the main challenges is to find features that adequately represent an image, especially for general-purpose CBIR applications. Different feature types and their properties were discussed. The way the user communicates with the CBIR system, the features used, the size of the database and the speed of retrieval are the most important factors that determine the success of a CBIR system.

References

[Agarwal et al. 2004] Agarwal, S.; Awan, A.; Roth, D.: Learning to Detect Objects in Images via a Sparse, Part-based Representation. PAMI, 26(11):1475-1490, 2004.
[Albuz et al. 1998] Albuz, E.; Kocalar, E.; Khokhar, A. A.: Scalable Image Indexing and Retrieval using Wavelets. Technical report, University of Delaware, 1998.
[Arbter et al. 1990] Arbter, K.; Snyder, W. E.; Burkhardt, H.; Hirzinger, G.: Application of Affine-Invariant Fourier Descriptors to Recognition of 3-D Objects. PAMI, 12(7):640-647, 1990.
[Barnard & Forsyth 2001] Barnard, K.; Forsyth, D.: Learning the Semantics of Words and Pictures. In: ICCV, volume 2, 2001, S. 408-415.


[Beckmann et al. 1990] Beckmann, N.; Kriegel, H.-P.; Schneider, R.; Seeger, B.: The R*-tree: an Efficient and Robust Access Method for Points and Rectangles. In: ACM SIGMOD, 1990, S. 322-331.
[Belongie et al. 2002] Belongie, S.; Malik, J.; Puzicha, J.: Shape Matching and Object Recognition using Shape Contexts. PAMI, 24(4):509-522, 2002.
[Blanz & Vetter 2003] Blanz, V.; Vetter, T.: Face Recognition Based on Fitting a 3D Morphable Model. PAMI, 25(9):1063-1074, 2003.
[Burkhardt & Siggelkow 2000] Burkhardt, H.; Siggelkow, S.: Invariant Features in Pattern Recognition – Fundamentals and Applications. In: Nonlinear Model-Based Image/Video Processing and Analysis, 2000, S. 269-307.
[Chang & Fu 1980] Chang, N.-S.; Fu, K.-S.: A Relational Database System for Images. In: Pictorial Info. Sys., 1980, S. 288-321.
[Chua et al. 1997] Chua, T.-S.; Tan, K.-L.; Ooi, B. C.: Fast Signature-Based Color-Spatial Image Retrieval. In: Intl. Conf. on Mult. Computing and Sys., 1997, S. 362-369.
[Csurka et al. 2004] Csurka, G.; Dance, C.; Willamowski, J.; Bray, C.: Visual Categorization with Bags of Keypoints. In: ECCV, 2004, S. 59-74.
[Deselaers et al. 2005] Deselaers, T.; Keysers, D.; Ney, H.: Discriminative Training for Object Recognition using Image Patches. In: CVPR, volume 2, 2005, S. 157-162.
[Dorko & Schmid 2005] Dorko, G.; Schmid, C.: Object Class Recognition Using Discriminative Local Features. Technical Report RR-5497, INRIA Rhone-Alpes, 2005.
[Fauqueur & Boujemaa 2003] Fauqueur, J.; Boujemaa, N.: Logical Query Composition from Local Visual Feature Thesaurus. In: CBMI, 2003.
[Fauqueur & Boujemaa 2004] Fauqueur, J.; Boujemaa, N.: Region-based Image Retrieval: Fast Coarse Segmentation and Fine Color Description. Journal of Visual Languages and Computing, 15(1):69-95, 2004.
[Fergus et al. 2003] Fergus, R.; Perona, P.; Zisserman, A.: Object Class Recognition by Unsupervised Scale-invariant Learning. In: CVPR, volume 2, 2003, S. 264-271.
[Flickner et al. 1995] Flickner, M.; Sawhney, H.; Niblack, W.; Ashley, J.; Huang, Q.; Dom, B.; Gorkani, M.; Hafner, J.; Lee, D.; Petkovic, D.; Steele, D.; Yanker, P.: Query by Image and Video Content: The QBIC System. IEEE Computer, 28(9):23-32, 1995.
[Gevers & Smeulders 2000] Gevers, T.; Smeulders, A.: PicToSeek: Combining Color and Shape Invariant Features for Image Retrieval. IEEE Tr. on Image Processing, 9(1):102-119, 2000.
[Gong et al. 1995] Gong, Y.; Chua, H.; Guo, X.: Image Indexing and Retrieval Based on Color Histogram. In: Intl. Conf. on Multimedia Modeling, 1995, S. 115-126.
[Gool et al. 2001] Gool, L. V.; Tuytelaars, T.; Turina, A.: Local Features for Image Retrieval. In: State-of-the-Art in Content-Based Image and Video Retrieval, 2001, S. 21-41.
[Gouet & Boujemaa 2001] Gouet, V.; Boujemaa, N.: Object-based Queries using Color Points of Interest. In: CBAIVL, 2001, S. 30-36.

[Guttman 1984] Guttman, A.: R-trees: A Dynamic Index Structure for Spatial Searching. In: ACM SIGMOD, 1984, S. 45-57.
[Haasdonk et al. 2004] Haasdonk, B.; Halawani, A.; Burkhardt, H.: Adjustable Invariant Features by Partial Haar-Integration. In: ICPR, volume 2, 2004, S. 769-774.
[Halawani & Burkhardt 2004] Halawani, A.; Burkhardt, H.: Image Retrieval by Local Evaluation of Nonlinear Kernel Functions around Salient Points. In: ICPR, volume 2, 2004, S. 955-960.
[Halawani & Burkhardt 2005] Halawani, A.; Burkhardt, H.: On Using Histograms of Local Invariant Features for Image Retrieval. In: Proc. of MVA, Tsukuba Science City, Japan, 2005, S. 538-541.
[Halawani & Tamimi 2006] Halawani, A.; Tamimi, H.: Retrieving Objects using Local Integral Invariants. In: Proceedings of the 5th International Conference on Image and Video Retrieval, Tempe, Arizona, USA, 2006.
[Haralick et al. 1973] Haralick, R. M.; Shanmugam, K.; Dinstein, I.: Texture Features for Image Classification. IEEE Tr. on Sys., Man, & Cybernetics, 3, 1973.
[Harris & Stephens 1988] Harris, C.; Stephens, M.: A Combined Corner and Edge Detector. In: Proceedings of the Fourth Alvey Vision Conference, 1988, S. 147-151.
[Hoi & Lyu 2004] Hoi, C.-H.; Lyu, M. R.: A Novel Log-based Relevance Feedback Technique in Content-based Image Retrieval. In: ACM Multimedia, 2004, S. 24-31.
[Hu 1962] Hu, M.: Visual Pattern Recognition by Moment Invariants. IEEE Tr. on Info. Theory, 8(2):179-187, 1962.
[Huang et al. 1997] Huang, J.; Kumar, S. R.; Mitra, M.; Zhu, W.-J.; Zabih, R.: Image Indexing using Color Correlograms. In: CVPR, 1997, S. 762-768.
[Iqbal & Aggarwal 2003] Iqbal, Q.; Aggarwal, J. K.: Feature Integration, Multi-image Queries, and Relevance Feedback in Image Retrieval. In: VISUAL, 2003, S. 467-474.
[Jacobs et al. 1995] Jacobs, C. E.; Finkelstein, A.; Salesin, D. H.: Fast Multiresolution Image Querying. In: SIGGRAPH, 1995, S. 277-286.
[Jain & Vailaya 1996] Jain, A. K.; Vailaya, A.: Image Retrieval Using Color and Shape. PR, 29(8):1233-1244, 1996.
[Kohonen 2001] Kohonen, T.: Self-Organizing Maps. Springer Series in Information Sciences, Vol. 30, Springer-Verlag, Berlin, 2001.
[Leibe & Schiele 2003a] Leibe, B.; Schiele, B.: Analyzing Contour and Appearance Based Methods for Object Categorization. In: CVPR, 2003.
[Leibe & Schiele 2003b] Leibe, B.; Schiele, B.: Interleaved Object Categorization and Segmentation. In: BMVC, 2003.
[Li & Wang 2003] Li, J.; Wang, J. Z.: Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach. PAMI, 25(9):1075-1088, 2003.
[Li & Wang 2004] Li, J.; Wang, J. Z.: Studying Digital Imagery of Ancient Paintings by Mixtures of Stochastic Models. IEEE Tr. on Image Processing, 13(3):340-353, 2004.
[Li et al. 2004] Li, Y.; Bilmes, J.; Shapiro, L.: Object Class Recognition using Images of Abstract Regions. In: ICPR, volume 1, 2004, S. 40-43.



[Loupias & Sebe 1999] Loupias, E.; Sebe, N.: Wavelet-based Salient Points for Image Retrieval. Technical Report TR 99.11, Laboratoire Reconnaissance de Formes et Vision, INSA Lyon, 1999.
[Lowe 2004] Lowe, D. G.: Distinctive Image Features from Scale-Invariant Keypoints. IJCV, 60(2):91-110, 2004.
[Lu et al. 1994] Lu, H.; Ooi, B.-C.; Tan, K.-L.: Efficient Image Retrieval by Color Contents. In: 1st Intl. Conf. on Appl. of Databases, 1994, S. 95-108.
[Manjunath & Ma 1996] Manjunath, B. S.; Ma, W.-Y.: Texture Features for Browsing and Retrieval of Image Data. PAMI, 18(8):837-842, 1996.
[Mikolajczyk & Schmid 2004] Mikolajczyk, K.; Schmid, C.: Scale and Affine Invariant Interest Point Detectors. IJCV, 60(1):63-86, 2004.
[Mikolajczyk & Schmid 2005] Mikolajczyk, K.; Schmid, C.: A Performance Evaluation of Local Descriptors. PAMI, 27(10):1615-1630, 2005.
[Mojsilovic & Gomes 2002] Mojsilovic, A.; Gomes, J.: Semantic Based Categorization, Browsing and Retrieval in Medical Image Databases. In: ICIP, 2002.
[Montesinos et al. 1998] Montesinos, P.; Gouet, V.; Deriche, R.: Differential Invariants for Color Images. In: ICPR, volume 1, 1998, S. 838-840.
[Müller et al. 2001] Müller, H.; Müller, W.; Squire, D.; Marchand-Maillet, S.; Pun, T.: Performance Evaluation in Content-Based Image Retrieval: Overview and Proposals. Pat. Recog. Letters, 22(5), 2001.
[Oh et al. 2001] Oh, K.; Feng, Y.; Kaneko, K.; Makinouchi, A.; Bae, S.: SOM-Based R*-tree for Similarity Retrieval. In: DASFAA, 2001, S. 182.
[Ojala et al. 2000] Ojala, T.; Pietikäinen, M.; Mäenpää, T.: Gray Scale and Rotation Invariant Texture Classification with Local Binary Patterns. In: ECCV, 2000, S. 404-420.
[Pass et al. 1996] Pass, G.; Zabih, R.; Miller, J.: Comparing Images Using Color Coherence Vectors. In: MULTIMEDIA '96, 1996, S. 65-73.
[Persoon & Fu 1977] Persoon, E.; Fu, K.-S.: Shape Discrimination using Fourier Descriptors. IEEE Tr. on Sys., Man, & Cybernetics, 7:170-179, 1977.
[Robinson 1981] Robinson, J. T.: The K-D-B-tree: a Search Structure for Large Multidimensional Dynamic Indexes. In: ACM SIGMOD, 1981, S. 10-18.
[Ronneberger et al. 2002] Ronneberger, O.; Burkhardt, H.; Schultz, E.: General-purpose Object Recognition in 3D Volume Data Sets using Gray-Scale Invariants – Classification of Airborne Pollen-Grains Recorded with a Confocal Laser Scanning Microscope. In: ICPR, volume 2, 2002.
[Roussopoulos et al. 1988] Roussopoulos, N.; Faloutsos, C.; Sellis, T.: An Efficient Pictorial Database System for PSQL. IEEE Tr. Software Eng., 14(5):639-650, 1988.


[Rui et al. 1997] Rui, Y.; Huang, T.; Mehrotra, S.: Content-Based Image Retrieval with Relevance Feedback in MARS. In: ICIP, 1997, S. 815-818.
[Rui et al. 1998] Rui, Y.; Huang, T.; Ortega, M.; Mehrotra, S.: Relevance Feedback: A Power Tool for Interactive Content-Based Image Retrieval. IEEE Tr. on Circuits and Sys. for Video Techno., 8(5):644-655, 1998.
[Rui et al. 1999] Rui, Y.; Huang, T.; Chang, S.: Image Retrieval: Current Techniques, Promising Directions and Open Issues. J-VCIR, 10(4):39-62, 1999.
[Samet 1990] Samet, H.: The Design and Analysis of Spatial Data Structures. Addison-Wesley Publishing Co. Inc., 1990.
[Schael 2005] Schael, M.: Methoden zur Konstruktion invarianter Merkmale für die Texturanalyse. PhD thesis, Universität Freiburg, 2005.
[Schaffalitzky & Zisserman 2002] Schaffalitzky, F.; Zisserman, A.: Multi-view Matching for Unordered Image Sets, or »How Do I Organize My Holiday Snaps?«. In: ECCV, 2002, S. 414-431.
[Schmid & Mohr 1997] Schmid, C.; Mohr, R.: Local Grayvalue Invariants for Image Retrieval. PAMI, 19(5):530-535, 1997.
[Schulz-Mirbach 1995] Schulz-Mirbach, H.: Invariant Features for Gray Scale Images. In: DAGM, 1995, S. 1-14.
[Sebe et al. 2000] Sebe, N.; Tian, Q.; Loupias, E.; Lew, M. S.; Huang, T. S.: Color Indexing Using Wavelet-Based Salient Points. In: CBAIVL, 2000, S. 115-119.
[Sellis et al. 1987] Sellis, T. K.; Roussopoulos, N.; Faloutsos, C.: The R+-Tree: A Dynamic Index for Multi-Dimensional Objects. In: The VLDB Journal, 1987, S. 507-518.
[Setia et al. 2005] Setia, L.; Ick, J.; Burkhardt, H.: SVM-based Relevance Feedback in Image Retrieval using Invariant Feature Histograms. In: MVA, 2005, S. 542-545.
[Siggelkow 2002] Siggelkow, S.: Feature Histograms for Content-Based Image Retrieval. PhD thesis, Universität Freiburg, 2002.
[Siggelkow et al. 2001] Siggelkow, S.; Schael, M.; Burkhardt, H.: SIMBA – Search IMages By Appearance. In: DAGM, 2001, S. 9-16.
[Sivic & Zisserman 2003] Sivic, J.; Zisserman, A.: Video Google: A Text Retrieval Approach to Object Matching in Videos. In: ICCV, volume 2, 2003, S. 1470-1477.
[Smeulders et al. 2000] Smeulders, A. W. M.; Worring, M.; Santini, S.; Gupta, A.; Jain, R.: Content-Based Image Retrieval at the End of the Early Years. PAMI, 22(12):1349-1380, 2000.
[Smith & Chang 1996] Smith, J. R.; Chang, S.-F.: VisualSEEk: a Fully Automated Content-Based Image Query System. In: ACM International Conference on Multimedia, 1996, S. 87-98.
[Squire et al. 2000] Squire, D. M.; Müller, W.; Müller, H.; Pun, T.: Content-based Query of Image Databases: Inspirations from Text Retrieval. Pattern Recogn. Lett., 21(13-14):1193-1198, 2000.
[Stricker & Orengo 1995] Stricker, M. A.; Orengo, M.: Similarity of Color Images. In: SPIE, 1995, S. 381-392.

[Swain & Ballard 1991] Swain, M. J.; Ballard, D. H.: Color Indexing. IJCV, 7(1):11-32, 1991.
[Tamimi et al. 2006] Tamimi, H.; Halawani, A.; Burkhardt, H.; Zell, A.: Appearance-based Localization of Mobile Robots using Local Integral Invariants. In: IAS-9, Tokyo, Japan, 2006.
[Tamura et al. 1978] Tamura, H.; Mori, S.; Yamawaki, T.: Texture Features Corresponding to Visual Perception. IEEE Tr. on Sys., Man, & Cybernetics, 8(6):460-473, 1978.
[Tong & Chang 2001] Tong, S.; Chang, E.: Support Vector Machine Active Learning for Image Retrieval. In: ACM Multimedia, 2001, S. 107-118.
[Vailaya et al. 1998] Vailaya, A.; Jain, A.; Zhang, H.: On Image Classification: City Images vs. Landscapes. PR, 31(12):1921-1935, 1998.
[Veltkamp et al. 2001] Veltkamp, R. C.; Burkhardt, H.; Kriegel, H.-P. (eds.): State-of-the-Art in Content-Based Image and Video Retrieval. Kluwer, B.V., 2001.
[Vogel & Schiele 2006] Vogel, J.; Schiele, B.: Semantic Scene Modeling and Retrieval for Content-Based Image Retrieval. IJCV, 2006.
[Wallraven et al. 2003] Wallraven, C.; Caputo, B.; Graf, A.: Recognition with Local Features: The Kernel Recipe. In: ICCV, 2003.
[Wang et al. 1997a] Wang, J. Z.; Wiederhold, G.; Firschein, O.: System for Screening Objectionable Images Using Daubechies' Wavelets and Color Histograms. In: 4th Intl. Workshop on Interactive Distributed Multimedia Sys. & Telecom. Services, 1997, S. 20-30.
[Wang et al. 1997b] Wang, J. Z.; Wiederhold, G.; Firschein, O.; Wei, S. X.: Wavelet-Based Image Indexing Techniques with Partial Sketch Retrieval Capability. In: Intl. Forum on Research and Techn. Advances in Digital Libraries, 1997, S. 13-24.
[Wang et al. 2001] Wang, J.; Li, J.; Wiederhold, G.: SIMPLIcity: Semantics-Sensitive Integrated Matching for Picture LIbraries. PAMI, 23(9):947-963, 2001.
[Weber & Blott 1997] Weber, R.; Blott, S.: An Approximation Based Data Structure for Similarity Search. Technical Report TR1997b, ETH Zentrum, Zurich, Switzerland, 1997.
[White & Jain 1996] White, D. A.; Jain, R.: Similarity Indexing with the SS-tree. In: ICDE, 1996, S. 516-523.
[Wolf et al. 2005] Wolf, J.; Burgard, W.; Burkhardt, H.: Robust Vision-based Localization by Combining an Image Retrieval System with Monte Carlo Localization. IEEE Tr. on Robotics, 21(2):208-216, 2005.
[Zhou & Huang 2003] Zhou, X. S.; Huang, T. S.: Relevance Feedback in Image Retrieval: A Comprehensive Review. Multimedia Syst., 8(6):536-544, 2003.



Alaa H. Halawani received his B.Sc. degree in Computer Systems Engineering from the Palestine Polytechnic University, Hebron, Palestine, in 1996. In 1997 he was awarded a DAAD scholarship to continue his education in Jordan. He received his M.Sc. in Electrical Engineering (Computer Major) from the Jordan University of Science and Technology, Irbid, Jordan, in January 2000. From February 2000 to October 2001 he worked as a lecturer at the Electrical and Computer Engineering Department of the Palestine Polytechnic University. In 2001 he was awarded a DAAD scholarship to pursue his PhD degree in Germany. He is currently finalizing his PhD studies at the Institute of Pattern Recognition and Image Processing, University of Freiburg, Germany.

Alexandra Teynor received her Dipl.-Inf. degree from the University of Applied Sciences Augsburg in 2003. Currently, she is pursuing her PhD degree in the Department of Computer Science at the University of Freiburg. Her research interests are content-based image retrieval techniques, with a focus on the recognition and localization of visual object classes.


Lokesh Setia obtained his Bachelor of Technology degree in electrical engineering from the Indian Institute of Technology, Delhi, in 1999. Thereafter he worked as a software engineer at Hughes Software Systems. In 2001, he received a scholarship from the German Academic Exchange Service (DAAD) to pursue a master's degree at the University of Applied Sciences, Offenburg. Since 2003, he has been working on his doctoral thesis at the University of Freiburg. His research interests include image retrieval, relevance feedback methods, and human cognition.

Gerd Brunner received his Master's degree in natural sciences from the Institute of Geophysics, Astrophysics and Meteorology (IGAM) of the Karl-Franzens-University in Graz, Austria, in June 2001. He is currently a PhD student at the Chair of Pattern Recognition and Image Processing, University of Freiburg, Germany. His scientific interests include pattern recognition, content-based image retrieval, invariant feature computation and, in particular, structure-based image descriptors.

Hans Burkhardt obtained the Dipl.-Ing. degree in electrical engineering in 1969, the Dr.-Ing. degree in 1974, and the Venia Legendi in 1979, all from the University of Karlsruhe. From 1969, he was a Research Assistant, and in 1975 he became Lecturer at the University of Karlsruhe.

During 1980-81, he held a scientific fellowship at the IBM Research Laboratory, San Jose, CA. In 1981, he became Professor for Control and Signal Theory at the University of Karlsruhe. During 1985-1996, he was a full Professor at the Technical University of Hamburg. From 1990 to 1996, he additionally was Scientific Advisor of the Microelectronic Application Center (MAZ) in Hamburg. Since 1997, he has been a full Professor with the Computer Science Department, University of Freiburg. His interests cover invariants in pattern recognition, optimal image-restoration methods, motion-estimation algorithms, parallel algorithms in image processing and pattern recognition, image analysis, and vision-guided control of combustion processes. In 2000, Dr. Burkhardt became President of the German Association for Pattern Recognition (DAGM). He is a Member of the Academy of Sciences and Humanities, Heidelberg, and a Fellow of the International Association for Pattern Recognition (IAPR).

Dipl.-Ing. Alaa H. Halawani
Dipl.-Inf. (FH) Alexandra Teynor
Lokesh Setia, MSc
Dipl.-Phys. Gerd Brunner
Prof. Dr.-Ing. Hans Burkhardt
Albert-Ludwigs-Universität Freiburg
Institut für Informatik
Lehrstuhl für Mustererkennung und Bildverarbeitung
Georges-Köhler-Allee 052
79110 Freiburg
{halawani, teynor, setia, gbrunner, Hans.Burkhardt}@informatik.uni-freiburg.de
www.informatik.uni-freiburg.de
