Weighted Walkthroughs Between Extended Entities for Retrieval by Spatial Arrangement

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 5, NO. 1, MARCH 2003

Weighted Walkthroughs Between Extended Entities for Retrieval by Spatial Arrangement Stefano Berretti, Alberto Del Bimbo, Member, IEEE, and Enrico Vicario, Member, IEEE

Abstract—In the access to image databases, queries based on the appearing visual features of searched data reduce the gap between the user and the engineering representation. To support this access modality, image content can be modeled in terms of different types of features such as shape, texture, color, and spatial arrangement. An original framework is presented which supports quantitative nonsymbolic representation and comparison of the mutual positioning of extended nonrectangular spatial entities. Properties of the model are expounded to develop an efficient computation technique and to motivate and assess a metric of similarity for quantitative comparison of spatial relationships. Representation and comparison of binary relationships between entities is then embedded into a graph-theoretical framework supporting representation and comparison of the spatial arrangements of a picture. Two prototype applications are described.

Index Terms—Image databases, retrieval by visual content, spatial relationships.

I. INTRODUCTION

WITH the recent advances in multimedia technology, libraries of digital images are assuming an ever increasing relevance within a wide range of information systems. Effective access to such archives requires that conventional searching techniques based on external attributes of the image, such as pressmarks, authorship, or historical keywords, be complemented by content-based queries addressing the appearing visual features of the searched data [25], [26]. Central to this retrieval approach is the creation of models abstracting images into feature spaces which support indexing and comparison of visual content. Different content modeling techniques have been investigated which address features of shape, texture, color, and spatial arrangement. In particular, spatial-based modeling addresses the relationships among sets of pixels representing spatial entities that have some low-level cohesion (e.g., homogeneous chrominance or texture) or some higher level significance. Relationships involve topological set-theoretical concepts, such as inclusion, overlapping, or adjacency [17], [18], and directional constructs, such as left of, above, below [8], [21], [20], [23].

Manuscript received September 14, 1999; revised April 12, 2001. The associate editor coordinating the review of this paper and approving it for publication was Prof. Yao Wang. The authors are with the Dipartimento di Sistemi e Informatica at the Università degli Studi di Firenze, 50139 Firenze, Italy (e-mail: [email protected]; [email protected]; [email protected]). Digital Object Identifier 10.1109/TMM.2002.802833

Models for spatial relationships in image databases have mainly followed a qualitative (symbolic) approach, in which spatial relationships are interpreted over a finite set of predefined symbolic values. In [17], the topological relationship between pixel sets is encoded by the emptiness/nonemptiness of the intersections between their inner, border, and outer parts. In [20], [21], [27], and [33], the directional relationship between point-like objects is encoded in terms of the relative quadrants in which the two points are lying. In the theory of the symbolic projection, which underlies a large part of the literature on image retrieval by spatial similarity, both directional and topological relationships between the entities in a two-dimensional (2-D) scene are reduced to the composition of the qualitative ordering relationships among their projections on two reference axes [8], [10]. In the original formulation [8], spatial entities are assimilated to points (usually the centroids) to avoid overlapping and to ensure a total and transitive ordering of the projections on each axis. This permits us to encode the bi-dimensional arrangement of a set of entities into a sequential structure, the 2-D-string, which reduces matching from quadratic to linear complexity. However, this point-like representation loses soundness when entities have a complex shape, or when their mutual distances are small with respect to individual dimensions. Much work has been done around this model to account for the extent of spatial entities, trading efficiency of match for the sake of representation soundness. In the 2-DG-string and the 2-DC-string, entities are cut in subparts with disjoint convex hulls [9], [30]. This permits us to maintain a sequential representation of the overall ordering, but may result in complex segmentation, as the cuts performed on each object depend on the total number of objects in the scene.
In the 2-D-B string [29], [31], the mutual arrangement of spatial entities is represented in terms of the interval ordering of the projections on two reference axes. This leads to a kind of 2-D extension of Allen’s interval logic [2], with 13 exclusive ordering conditions on each axis. Since projections on different axes are independent, the representation subtends the assimilation of objects to their minimum embedding rectangles, which largely reduces the capability to discriminate perceptually distant arrangements. In [12] and [32], this limit is partially smoothed by replacing extended objects through a finite set of representative points. In particular, in [12], the directional relationship between two entities is interpreted as the union of the primitive directions (up, up-right, right, down-right, down, down-left, left, up-left, and coincident) capturing the displacement between any of their respective representative points.
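The symbolic-projection encoding sketched above can be illustrated with a toy example. The entity names and the plain before-ordering used here are a simplification of the 2-D-string of [8] (which also handles coincident and more refined orderings), not the paper's own code:

```python
# Simplified sketch of a 2-D string: each entity is reduced to its
# centroid, and the scene is encoded by the orderings of the centroid
# projections on the two reference axes.

def two_d_string(entities):
    """entities: dict name -> (x, y) centroid; returns the two
    axis-wise ordering strings."""
    by_x = sorted(entities, key=lambda n: entities[n][0])
    by_y = sorted(entities, key=lambda n: entities[n][1])
    return " < ".join(by_x), " < ".join(by_y)

u, v = two_d_string({"sun": (8, 9), "tree": (2, 3), "house": (5, 2)})
print(u)  # tree < house < sun
print(v)  # house < tree < sun
```

Matching then reduces to comparing the two sequences, which is what yields the linear complexity mentioned above.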

1520-9210/03$17.00 © 2003 IEEE


In general, the effectiveness of qualitative models is basically limited by inherent Boolean classification thresholds which determine discontinuities between perceived spatial arrangements and their representation. This hinders the establishment of quantitative metrics of similarity and basically limits the robustness of comparison. These limits of Boolean matching are faced in quantitative models by associating spatial relationships with numeric values which enable the evaluation of a continuous distance between nonexactly matching arrangements. In the most common approach, directional information is represented through the orientation of the line connecting object centroids [23], [24]. This type of representation inherently requires that extended entities be replaced by a single representative point used to take the measure of orientation. This still limits the capability to distinguish perceptually dissimilar configurations. Approaches to maintain entity extension in quantitative representation of spatial relationships have been proposed in [22], [28], and [36]. In [28], the spatial arrangement of the components of a color histogram is represented through color correlograms capturing the distribution of distances between pixels belonging to different bins. In [36], a picture is segmented into color sets, and partitioned into a finite number of equally spaced slices. The spatial relationship between two color sets is modeled by the number of slices in which one color set is above the other. In [22], the spatial distribution of a set of pixel blocks with common chromatic features is indexed by the two largest angles obtained in a Delaunay triangulation over the set of block centroids.

In this paper, we develop a modeling technique, called weighted walkthroughs (WWs), specifically tailored to represent and compare spatial relationships in the application context of retrieval by visual similarity [38].
WWs enable quantitative representation and comparison of the mutual positioning of a pair of extended pixel sets, by taking into account the overall distribution of relationships among the individual pixels in the two sets. Properties of the model are expounded to derive an efficient computation technique and to motivate and assess a quantitative metric of similarity. Representation and comparison of binary spatial relationships are then embedded into a graph-theoretical framework fitting the wider requirements of retrieval by visual similarity. Concrete usage of the framework is also illustrated.

The rest of the paper is organized in three sections and an Appendix. Representation and comparison of binary spatial relationships based on WWs are expounded in Section II. In Section III, the model is extended into a graph-theoretical framework and its application in retrieval by visual content is discussed through two separate prototype systems supporting retrieval based on the spatial arrangement of annotated typed objects and chromatic patches, respectively. For this latter system, results of retrieval effectiveness are evaluated in a two-stage user-based test, addressing first a benchmark of basic arrangements and then an archive of real images. Conclusions are drawn in Section IV. Theorems and proofs are given in the Appendix.


Fig. 1. (a) Pixel sets A and B are connected by three distinct walkthroughs, represented by the pairs <1, 1>, <1, 0>, and <1, -1>. (b) Nine weights w_ij arranged in a 3 x 3 array: the central weight w_00 evaluates how many pixels in A coincide with pixels in B (overlapping); the middle weights w_0j and w_i0 evaluate how many pixels in A are aligned with pixels in B along the horizontal and vertical directions, respectively; the corner weights account for how many pixels in A are up-right, down-right, down-left, and up-left with respect to the pixels in B. (c) Weights associated with the walkthroughs connecting A to B.

II. REPRESENTATION AND COMPARISON OF BINARY SPATIAL RELATIONSHIPS

A. Weighted Walkthroughs

Given a Cartesian reference system, the projections of two pixels, a in A and b in B, on each reference axis can take three different orders: 1) before; 2) coincident; or 3) after. The combination of the two projections results in nine different bi-dimensional displacements (sometimes referred to as primitive directions), which we encode in a pair of indexes <i, j>: i = -1, 0, or +1 according to whether x_b < x_a, x_b = x_a, or x_b > x_a; and j = -1, 0, or +1 according to whether y_b < y_a, y_b = y_a, or y_b > y_a. Given two sets of pixels, A and B, multiple different primitive directions can apply at the same time to different pairs of pixels in A and B. According to this, we will say that the pair <i, j> is a walkthrough from A to B if it encodes the displacement between at least one pair of pixels belonging to A and B, respectively [see Fig. 1(a)]. In order to account for its perceptual relevance, each walkthrough <i, j> is associated with a weight w_ij measuring the number of pairs of pixels which belong to A and B and the displacement of which is captured by the direction <i, j> (note that we do not postulate that either A or B are connected sets). The weight is evaluated as an integral measure over the four-dimensional (4-D) set of pixel pairs in A and B [see Fig. 1(b) and (c)]

w_ij(A, B) = (1 / K_ij(A, B)) ∫_A ∫_B C_i(x_b - x_a) C_j(y_b - y_a) da db   (1)

where C_{-1} and C_{+1} are the characteristic functions of the negative and positive real semi-axes, respectively;


C_0 denotes the Dirac function, which here acts as a characteristic function of the singleton set {0}; and K_ij is a dimensional normalization factor depending on global measures taken on A and B. In (1), the weights with a null index (i.e., w_i0, w_0j, and w_00) are computed by integration of a quasi-everywhere-null function (the set of pixel pairs that are aligned or coincident has a null measure in a 4-D space). The Dirac function appearing in the expression of (1) reduces the dimensionality of the integration domain to enable a finite nonnull measure. To compensate for this reduction, the normalization factors K_ij have different dimensionality depending on whether the indexes i and j are equal to zero or take nonnull values

(2)

where a_A and a_B are the areas of A and B; w_A and h_A, and w_B and h_B, are the width and the height of the minimum embedding rectangles of A and B, respectively; and W and H are the width and the height of the minimum embedding rectangle of the union of A and B, respectively [see Fig. 2(a)].

B. Symbolic Computation Through Multirectangular Decomposition

The nine coefficients of the WW representation are invariant with respect to shifting and zooming of the image, and they are also reflexive (see Appendix, Theorems 4.1 and 4.2). WWs are also compositional (see Appendix, Theorem 4.3), in that the walkthroughs between an entity C and the union of two disjoint entities A and B can be derived by linear combination of the WWs between C and A and between C and B.

Compositionality permits us to reduce the integral of (1) to the linear combination of a set of subintegrals computed on any partition of entities A and B. This has a main relevance for the computational viability of WWs. In fact, if A and B are decomposed in rectangular parts, the numerical computation of the 4-D integral of (1) can be replaced through the linear combination of a set of closed-form terms representing the WWs between rectangular entities. These can be reduced to nine basic arrangements:
1) Projections of the two sets of pixels A and B are disjoint on both of the axes. Four similar cases are possible, with different left/right and upside-down positioning of A and B; in particular, closed-form weights are derived for the case in which B is in the lower-right quadrant of A.
2) Projections of A and B are coincident on one axis and disjoint on the other axis. Again, four similar cases are possible, among which we consider the case in which B is below A.
3) Projections of A and B are coincident on both the axes, i.e., the minimum embedding rectangles of A and B are coincident.

Fig. 2. (a) Global measures on A and B appearing in the normalization factors K_ij of (2). (b) Symbols involved in the upper bound for the weights w_ij.

The values computed in the basic cases are adimensional real numbers undergoing a set of bounds that are inductively maintained through linear combination, and which thus apply to the relationship between any two pixel sets A and B: each weight takes values in the interval [0, 1]; the sum of the four corner weights equals one (see Theorem 4.4 given in the Appendix); the sum of the three middle weights along one axis is upper-bounded by the following estimate (see Theorem 4.5 given in the Appendix):

(3)

where B_A denotes the part of B the projection of which on the vertical axis is included in the projection of A [see Fig. 2(b)]. The same type of estimate holds for the corresponding sum along the other axis;


the central weight w_00 equals the ratio between the square of the measure of the intersection of A and B and the product of the measures of A and B (see Appendix, Theorem 4.6)
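The definition of the weights and the compositionality property can be sketched on discrete pixel sets. The snippet below uses unnormalized pair counts, for which additivity over partitions is exact; the dimensional normalization K_ij of (2) is omitted, so this only approximates the integral weights of (1), and all names are ours:

```python
# Discrete sketch of WW computation: pair counts per primitive
# direction <i, j> are additive over partitions, which is what lets
# WWs of multirectangular entities be assembled from closed-form
# terms on rectangles.

def sign(d):
    return (d > 0) - (d < 0)  # -1, 0, or +1

def ww_counts(P, Q):
    """P, Q: iterables of (x, y) pixels; returns dict (i, j) -> count."""
    w = {(i, j): 0 for i in (-1, 0, 1) for j in (-1, 0, 1)}
    for (xp, yp) in P:
        for (xq, yq) in Q:
            w[(sign(xq - xp), sign(yq - yp))] += 1
    return w

# B entirely up-right of A: only walkthrough <1, 1> carries weight.
A = [(0, 0), (1, 0)]
B = [(3, 2), (4, 3)]
w = ww_counts(A, B)
print(w[(1, 1)])  # 4 (all 2 x 2 pixel pairs)

# Compositionality: counts against a union of disjoint parts are the
# sums of the counts against each part (cf. Theorem 4.3, unnormalized).
C = [(0, 0), (2, 1)]
B1, B2 = [(5, 5)], [(5, -3), (6, 0)]
whole = ww_counts(C, B1 + B2)
parts = {d: ww_counts(C, B1)[d] + ww_counts(C, B2)[d] for d in whole}
print(whole == parts)  # True
```

The quadratic pair loop is only for illustration; the closed-form terms on rectangular parts are what make the computation viable in practice.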

For the practical application of WWs, it is also relevant to note that, by exploiting the property of compositionality, the WWs between two spatial entities A and B can be derived in O(n_A n_B) time, where n_A and n_B are the number of rectangles in the decomposition of entities A and B, respectively [3].

C. Metric of Similarity for Weighted Walkthroughs

Different arrangements of a pair of pixel sets, A and B, are compared by a numeric spatial distance which composes the differences between homologous weights in the WWs capturing the two arrangements

(4)

where the six coefficients are nonnegative numbers with a sum equal to one (i.e., they form a convex combination), and the distance components are the following. The first component evaluates the difference in the horizontal displacement in the two spatial relationships captured by the two WWs. In fact, the corresponding sum of weights evaluates how many pixels of B are on the right-hand side of how many pixels of A, and thus comprises a measure of the degree by which B is on the right-hand side of A. This equals one iff all the pixels of B are on the right of all the pixels of A; it equals zero iff all the pixels of B are on the left of all the pixels of A; and it ranges with continuity between one and zero in the intermediate cases. Similarly to the horizontal case, the second component evaluates the difference in the vertical displacement in the two spatial relationships. The third component evaluates the difference in the alignment along the diagonal of the Cartesian reference system. In fact, the sum of the corner weights with equal indexes evaluates how many pixels in A are aligned along the diagonal with how many pixels of B. Note that, since the sum of the four corner weights is always equal to one (see Section II-B or Theorem 4.4), the three displacement and diagonal-alignment indexes encode the same information that is captured in the four corner weights [7]. The fourth component evaluates the difference in the alignment of vertical projections in the two spatial relationships. In fact, the corresponding sum of weights evaluates how many pairs of pixels in A and B share a common vertical projection:

Fig. 3. Property of continuity: if B is approximated by the minimum embedding multirectangle B_δ made up of elements of size δ, the relationship with respect to A changes up to a quantity which tends to zero when δ becomes small with respect to the size of B (see Theorem 4.7 given in the Appendix).


this is equal to one iff A and B are rectangles with coincident vertical projections; it equals zero iff the vertical projections of A and B are disjoint; and it ranges with continuity between one and zero in the intermediate cases. In a similar manner, the fifth component evaluates the difference in the alignment of horizontal projections in the two spatial relationships. The last component evaluates the difference in the overlapping in the two spatial relationships. In fact, the central weight evaluates how many pixel pairs in A and B are coincident, and thus measures the degree of overlapping of A and B. This equals one if A and B are exactly coincident; it equals zero if they are disjoint; and it ranges between one and zero in the intermediate cases.

Due to the integral nature of the weights w_ij, the distance satisfies a property of continuity which ensures that slight changes in the mutual positioning or in the distribution of pixels in two sets A and B result in slight changes in their WWs: if the set B is modified by the addition of a part, the relationship with respect to a set A changes up to a distance which is limited by a bound tending to zero when the added part becomes small with respect to B. This has a main relevance in ensuring robustness of comparison. The property of continuity also provides a quantitative basis to smooth the trade-off between accuracy and computational complexity when pixel sets are approximated to a multirectangular shape to compute their relationship by composition. To this end, Theorem 4.7 provides an upper bound on the error in the evaluation of the spatial relationship between two entities as a function of the partition grain and of the entity extension (see Fig. 3).

Due to its city-block structure, the distance is nonnegative, auto-similar, reflexive, and triangular. In addition, as a consequence of the bounds on the range of values of WWs, it can be proven to be normal (i.e., it ranges between zero and one). Axiomatic properties of the metric hold independently of the specific values assigned to the coefficients. These have five degrees of freedom which can be tuned to fit


Fig. 4. (a)–(e) Five testing cases. The table of each case reports: the reference picture and its six variations (heading row); the average and standard deviation of the user-perceived distance between each reference picture and its six variations (first two rows); and the distance between each reference picture and its six variations as evaluated by the four modeling techniques under test (last four rows).

perceptual distance depending on the specific characteristics of images and users. In particular, the balance between the horizontal and vertical coefficients permits us to change the mutual relevance of horizontal and vertical displacement, while the balance among the remaining coefficients permits us to adjust the relative weight of displacement, diagonal alignment, horizontal and vertical alignment, and overlapping, thereby impacting on the mutual relevance of directional and topological relationships.
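A city-block distance in the spirit of (4) can be sketched as a convex combination of absolute differences between weight aggregates of two WWs. The exact component formulas of Section II-C are not fully recoverable from this text, so the aggregates below are illustrative; the coefficient values are those used later in the paper's user-based assessment:

```python
# Hedged sketch of the spatial distance of (4): convex combination of
# absolute differences between homologous aggregates of two WWs.
# The grouping into displacement / alignment / overlap components is
# qualitative; index conventions here are our own assumption.

def ww_distance(w1, w2, lambdas=(0.25, 0.25, 0.2, 0.1, 0.1, 0.1)):
    """w1, w2: dicts (i, j) -> normalized weight."""
    def agg(w):
        right = w[(1, -1)] + w[(1, 1)]                 # horiz. displacement
        up = w[(-1, 1)] + w[(1, 1)]                    # vert. displacement
        diag = w[(1, 1)] + w[(-1, -1)]                 # diagonal alignment
        v_align = w[(0, -1)] + w[(0, 0)] + w[(0, 1)]   # vert. projections
        h_align = w[(-1, 0)] + w[(0, 0)] + w[(1, 0)]   # horiz. projections
        overlap = w[(0, 0)]
        return (right, up, diag, v_align, h_align, overlap)
    return sum(l * abs(a - b)
               for l, a, b in zip(lambdas, agg(w1), agg(w2)))

same = {(i, j): 1 / 9 for i in (-1, 0, 1) for j in (-1, 0, 1)}
print(ww_distance(same, same))  # 0.0
```

Since the coefficients form a convex combination and each aggregate lies in [0, 1], the distance is normal, as stated above.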



D. User-Based Assessment

The effectiveness of the metrics of similarity between binary spatial relationships based on WWs was evaluated against 2-D strings [8], 2-D-B strings [31], and centroid orientation (CO) [23]. The set of these four modeling techniques encompasses different directions in the taxonomy of representations for binary spatial relationships: 2-D and 2-D-B strings are symbolic, while CO and WWs are quantitative; 2-D string and CO replace entities through their centroids, while 2-D-B string and WWs also account for their extension; 2-D-B string extends entities to their bounding box, while WWs account only for their actual extension. The ground truth for the comparison was acquired in a test addressing the users' subjective ranking of similarity in the spatial arrangement of pairs of synthetic objects. Five subsequent test cases were considered, each defined by a reference picture composed of two entities with given colors and shapes. For each reference picture, six variations were created by modifying the mutual positioning of the two entities, but not their shape or color. The reference pictures (labeled "case a" through "case e") and their corresponding variations (labeled "1" through "6") appear in the tables of Fig. 4. For each case, the user was asked to provide a three-level ranking of the similarity in the spatial positioning of the entity pair as appearing in the reference picture and in its six variations. The reference picture and the variations were shown on separate pages, letting the user decide when to hide the reference and show the variations to rank. After the ranking, the user was also asked to position the two entities of the case so as to reproduce the reference arrangement and validate the ranking. The test was first run with three pilot users and then administered to 32 students of a postdoctoral course on multimedia. In the final version, the test took approximately 25 min.
The average rank of similarity provided by the users was complemented to provide a measure of perceptual distance between each variation and its reference picture. Average values and standard deviations of the user-provided rankings are reported in the first two rows of the tables in Fig. 4. The distance between each reference arrangement and its variations was then computed using 2-D strings, 2-D-B strings, CO, and WWs. For WWs, the six coefficients were set equal to .25, .25, .2, .1, .1, and .1, respectively.

For 2-D and 2-D-B strings, the distance between different binary arrangements was evaluated as the number of hops in the neighborhood graph. For the CO, the distance was evaluated as the angular difference between the directions connecting entity centroids. Results are reported in the last four rows of the tables in Fig. 4. For each model and for each test case, the computed ranking was finally compared against the user-provided ranking. In the comparison, computed rankings were scaled to fit into the range of user-provided rankings. The scaling factor was computed so as to minimize the mean square error over the overall set of variations of each case

mean square error   (5)
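The scaling step described above has a closed form: minimizing the sum of (u_k - a * c_k)^2 over the scaling factor a gives a = Σ u_k c_k / Σ c_k^2. A sketch with our own variable names:

```python
# Least-squares scaling of computed distances c_k onto the range of
# user-provided distances u_k, and the resulting mean square error.

def fit_scale(user, computed):
    """Closed-form minimizer of sum_k (u_k - a * c_k)^2 over a."""
    num = sum(u * c for u, c in zip(user, computed))
    den = sum(c * c for c in computed)
    return num / den if den else 0.0

def mse(user, computed):
    a = fit_scale(user, computed)
    return sum((u - a * c) ** 2 for u, c in zip(user, computed)) / len(user)

# Toy case: computed distances are exactly half the user distances,
# so the fitted scale is 2 and the residual error is zero.
user = [0.0, 1.0, 2.0, 2.0, 1.0, 0.5]
computed = [0.0, 0.5, 1.0, 1.0, 0.5, 0.25]
print(fit_scale(user, computed))  # 2.0
print(mse(user, computed))        # 0.0
```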

Results of the comparison in the five testing cases are shown in Fig. 5. Each graph refers to a testing case and compares the user-perceived distance against the ranking provided by the four modeling techniques under test. The mean square error (mse) scored by each modeling technique is reported in the table of symbols of each test case. Numeric values of the mses reported in Fig. 5 provide a synthetic clue for the comparison of the different modeling techniques in the cases under test: 1) quantitative models (WWs and orientation) always outperform qualitative ones (2-D-B and 2-D strings); 2) the object extension considered in the WWs yields better results than the punctual approximation of the CO. Apart from these synthetic remarks, the analysis of the cases under test highlights conditions which stress the inherent weaknesses of the different modeling techniques, independent of the specific composition of the testing benchmark. The symbolic classification of the 2-D-string and the 2-D-B string tends to provide a flat classification, which does not separate variations distinguished by the user. This is exacerbated by the centroid approximation in the 2-D-string. For instance, in Fig. 4, the rankings for cases d and a are characterized by a large number of equal (and null) values. The 2-D-B string introduces classification discontinuities between variations which are perceptually close. For instance, in


The limits of the punctual approximation employed by the 2-D-string and CO are emphasized when mutual distances are small with respect to the internal dimensions of entities. When entities are sufficiently near to interact, the punctual approximation may prevent the distinction of variations that are definitely separated. For instance, in Fig. 4, in case a, the CO assimilates variation 5 to the reference picture; the same happens in case c for variations 1 and 3, and in case d for variation 3. Of course, the same errors are made by the 2-D-string, which is not able to separate any two variations that are assimilated by the CO. In the table of Fig. 5, all these cases result in under-estimates of perceptual distance. As an opposite effect, when the distance between entities is small with respect to their internal size, the relationships between centroids tend to lose robustness, and a slight variation in the positioning of the entities may result in a significant variation of the mutual orientation of centroids. For instance, in Fig. 4, in case d, consider the different ranking provided by the CO and by the 2-D string for variations 3 and 4. The last two variations of testing case e create difficulties for all the models under test. In these variations, the moving object on the RHS is rotated with respect to the reference picture, and none of the spatial models under test is able to follow the distance that this induces on the user perception. Due to their numeric and integral nature, WWs are not able to create a classification discontinuity to follow the sharp distinction caused in the user perception by the presence or absence of an overlap. This is observed in Fig. 5, in case a for variation 5, and in case d for variations 2, 3, and 4. In all these cases, WWs under-estimate the perceptual distance.

III. USING WEIGHTED WALKTHROUGHS IN RETRIEVAL BY VISUAL CONTENT

Modeling of the mutual positioning of two extended pixel sets is the core of retrieval by spatial arrangement. However, for concrete application, this nucleus must be developed so as to support representation and comparison of the overall arrangement of multiple visual entities appearing within an image and a query. In this section, we position WWs into a graph-theoretical framework, and formulate and discuss the problem of match between images and queries. We then describe the application of the framework in two prototypes supporting different and complementary approaches to visual content modeling.

Fig. 5. Difference in the ranking of distance as provided by the user sample and by each of the four modeling techniques under test. For each test case, the table reports the mse on the set of six variations for each modeling technique.

Fig. 4, in case b, variations 1, 2, and 3, are definitely distinguished from the reference picture; in case a, the reference picture is separated from variations 1 and 2, and variations 1 and 2 are separated from each other; in case e, the reference picture is separated from variations 1 and 2, and variations 1 and 2 are separated from each other. The 2-D-string, which also applies a symbolic classification, appears to be less affected by this problem as thresholds in the mutual positioning are encountered less frequently by centroids than by the extrema of bounding boxes. An example in which this happens is the classification discontinuity in case a between variations 3 and 4.

A. Graph Model for Spatial Arrangements

The spatial arrangement of an image is modeled as an attributed relational graph over the set of spatial entities (6). Vertices represent significant spatial entities. Depending on the specific application context, these may be patches with homogeneous color or texture [16], [19], [34], or objects represented in the image, e.g., persons, plants, or artifacts in a painting [8], [23]. Qualifying features of each entity are captured by an attribute label taking values in a feature space. This space is


Fig. 6. Manual annotation in the creation of models for retrieval based on spatial arrangement of typed objects: (a) and (b) the picture to archive is loaded in the main window and relevant objects are outlined and annotated with a type in the object hierarchy in the lower right window; and (c) the system model of the picture is the labeled graph capturing the spatial relationships between typed objects defined by the user.

supposed to be provided with a distance satisfying the axiomatic properties of nonnegativity, normality, and reflexivity. Furthermore, edges are labeled by WWs capturing the spatial relationship between the pixel sets occupied by visual entities, and are thus compared according to the spatial distance of (4). To accommodate partial knowledge and intentional detail concealment, both edges and vertices can take a neutral label, yielding an exact match in any comparison.
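A minimal sketch of the attributed relational graph of (6) with neutral labels; the class and function names are ours, not the paper's:

```python
# Vertices carry attribute labels, edges carry WW labels; a neutral
# label (None here) matches anything exactly, i.e., at distance zero.

class ImageModel:
    def __init__(self, vertex_labels, edge_labels):
        self.vertex_labels = vertex_labels  # list; an entry may be None
        self.edge_labels = edge_labels      # dict (u, v) -> label or None

def label_distance(a, b, dist):
    """Distance between two labels; a neutral (None) label yields
    an exact match (zero distance) in any comparison."""
    if a is None or b is None:
        return 0.0
    return dist(a, b)

m = ImageModel(["sky", None], {(0, 1): None})
print(label_distance("sky", None, lambda a, b: 1.0))  # 0.0
```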

B. Comparing Spatial Arrangements

The comparison of two image models in the form of (6) consists of finding the optimal association of the entities in two


Fig. 7. Visual interaction in retrieval based on spatial arrangement of typed objects. (a) The user draws a sketch of the spatial entities characterizing the images to search, annotates them with a type from the archive hierarchy, and positions them in their expected spatial arrangement. (b) The sketch is interpreted by the system and checked against the models stored in the archive; matching images are displayed in a separate window, sorted by spatial and attribute similarity. (c) A refined query with three objects and (d) the corresponding retrieved images.


graphs representing a query specification and an image description, respectively. Specifically, the problem consists of finding an injective function, called an interpretation, which assigns the entities in the query to a subset of the entities in the description (7), and which maximizes the similarity in the attributes of pixel sets and in their spatial relationships. Attribute and spatial distance, and their association into a joint metric, can be defined by combining the metrics of distance associated with entity attributes (vertices) and WWs (edges). Using an additive composition, this can be expressed as follows. The spatial distance is the sum of the distances measured on the spatial relationships between homologous entities in the query and description graphs (8). The attribute distance is the sum of the distances measured on the attributes of query and description entities associated by the interpretation (9). The joint distance

is the sum of spatial and attribute dis-

(10) where balances the mutual relevance of spatial and attribute distance. In the comparison of the graphs representing two spatial arrangements, the similarity scored by an interpretation depends not only on the pairwise attribute similarity between associated vertices, but also on the similarity between the edges connecting associated vertices themselves. This yields a nonpolynomial complexity and rules out classical algorithms for bipartite weighted matching in the theory of network flows [1]. This complexity is a general problem for content modeling techniques which jointly account for spatial relationships and object attributes that can be matched by multiple object instances [37]. In the straightforward backtracking approach, the problem can be solved in two separate steps: Step 1) enumerate the set of injective interpretations which map the vertices of the query on a subset of the vertices of the description; Step 2) evaluate the similarity scored by each single mapping through the comparison of homologous labels on isomorphic graphs. The comparison of each two isomorphic graphs is linear in the number of vertices and edges in the graphs themselves, but the number of isomorphic graphs to compare is exponential in the number of vertices in the query and the description. Typical values for the number of entities in the query and the description basically depend on the modeling approach used to identify entities and their qualifying attributes. In retrieval by sketch based on the spatial arrangement of typed objects, the query may typically involve two to four entities and the description eight to 16 entities. In retrieval based on low-level cohesion features,


queries and descriptions tend to share by construction a common typical dimension which depends on the resolution of the segmentation process; three to six segments is a reasonable range in which an automated segmentation may achieve the best plausibility [15]. In both cases, the dimension of the query and description graphs makes the straightforward enumerative approach nonviable, and requires appropriate algorithms to manage the complexity of matching. In [4] and [5], an efficient matching algorithm based on an original look-ahead strategy is introduced and integrated within a metric indexing scheme. The computational experience reported there demonstrates that this permits us to efficiently manage the complexity of a real application context. In particular, the proposed solution turned out to be suitable for the two systems described in the rest of this section.

C. Retrieval by Spatial Arrangement of Annotated-Type Objects

WWs have been employed within an educational prototype system supporting retrieval by visual content from a library of paintings of the Italian Renaissance [13], [38]. The contents of such paintings are largely pervaded by cultural conventions about the appearance and the spatial arrangement of characters, which are properly cast in the relational framework of (6) by interpreting spatial entities as imaged objects and attributes as iconographic types [14]. Types are organized in a tree-shaped specialization hierarchy, and the attribute distance is defined as a discrete function based on the number of hops separating entities in the specialization tree. Archival is based on the manual annotation of picture contents, and querying is carried out by sketch. Fig. 6 illustrates the operation of the graphic module supporting user interaction during the archival stage.
The image to archive is loaded in the main window in the upper-left part of the screen, and the objects that are relevant for content characterization are manually outlined and annotated with a type selected from the type hierarchy in the lower-right window of the screen; new types can be dynamically added to the hierarchy according to the user's intention. When all the relevant objects have been outlined and classified, the picture is passed to the system; object contours are approximated by a multirectangular shape on a grid whose grain is set according to Theorem 4.7, so as to ensure a predefined level of accuracy in the evaluation of spatial relationships. Finally, WWs between each pair of outlined objects are computed, and the image description graph defined in Section III is constructed and stored in the archive. Fig. 7(a) and (b) illustrate the operation of the querying module; the user draws a sketch of the contour of characterizing objects, classifies them through a type from the hierarchy, and positions them so as to reproduce the expected arrangement in searched images. The sketch is interpreted by the system as a query graph and checked against the description graphs stored in the archive. Matching images are displayed in a separate window sorted by spatial and type similarity (shown below each retrieved image). In Fig. 7(c) and (d), the query is refined through the addition of a third object; the interpretation of the sketch takes into account only those spatial relationships that are marked by the user and made explicit by lines on the screen.
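Checking a query graph against a stored description graph follows the enumerative matching scheme described above. A minimal brute-force sketch (names, the data layout, and the absolute-difference label distances are illustrative assumptions; the look-ahead and indexing optimizations of [4], [5] are omitted):

```python
from itertools import permutations

def joint_distance(query, descr, d_s, d_a, lam=0.5):
    """Exhaustively search the best interpretation (injective map from
    query entities to description entities), scoring each candidate
    with the additive metric of (8)-(10). `query` and `descr` are
    dicts with 'attrs' (per-entity attribute labels) and 'ww'
    (ordered-pair spatial relationship labels); d_s and d_a are the
    spatial and attribute distances, both normalized in [0, 1]."""
    q = list(query['attrs'])
    d = list(descr['attrs'])
    best, best_map = float('inf'), None
    # One candidate per injective assignment: exponential cost.
    for target in permutations(d, len(q)):
        m = dict(zip(q, target))
        ds = sum(d_s(query['ww'][(a, b)], descr['ww'][(m[a], m[b])])
                 for a in q for b in q if a != b)
        da = sum(d_a(query['attrs'][a], descr['attrs'][m[a]]) for a in q)
        cost = lam * ds + (1 - lam) * da
        if cost < best:
            best, best_map = cost, m
    return best, best_map
```

With two query entities and three description entities, the six injective assignments are scored and the one minimizing the joint distance is returned.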


Fig. 8. (a) and (b) Two queries with the same entities in opposite positions; and (c) details and scores of spatial similarity of retrieved pictures. For the two images that do not appear in the retrieval set of query (b), the scores of spatial similarity are not reported.

In Fig. 8, two further queries with different arrangements of the same entities are shown. Values of spatial similarity are evidenced to highlight the continuous change in the evaluation of the spatial relationships in the two cases.

D. Retrieval by Spatial Arrangement of Chromatic Patches

The integral measure which underlies their definition makes WWs robust with respect to possible complexities in the spatial

distribution of entities. In particular, the inherent compositionality naturally permits us to manage entities composed of multiple nonconnected regions. This capability can be suitably exploited in retrieval by spatial arrangement of chromatic content. In this case, entities are sets of pixels with homogeneous chromatic characteristics (i.e., color clusters), and entity attributes are low-level features such as chrominance, texture, and size. In the archival stage, entities and attributes can be derived automatically through a segmentation process which clusters the color histogram and back-projects


Fig. 9. (a) Segmentation in the color space results in multiple nonconnected picture regions sharing common color attributes. In the straightforward approach, each such region comprises an individual entity. (b) Color clusters collect regions with common attributes to comprise a single entity.

color clusters onto the image space. In so doing, the spatial entity associated with a color cluster may be split into multiple nonconnected image segments (see Fig. 9). In the straightforward approach, each image segment may be regarded as an individual entity. However, this generally incurs over-segmentation, due to occlusions or to the inherent limits of automated segmentation algorithms [11]. To circumvent the problem, entities can be identified with the overall set of pixels sharing a common color cluster, without taking into account their distribution in separate nonconnected regions [22]. This augments the perceptual robustness of the representation and opens the way to encompassing spatial relationships within efficient indexing schemes. However, to make the approach viable, the spatial representation must be able to cope with a number of complexities: color clusters are usually not connected; their mutual distances may be small with respect to their dimensions; and they may be tangled in a complex arrangement evading any crisp classification. While basically limiting the significance of embedding rectangles and centroids, these complexities are naturally encompassed by WWs, which do not require connected entities and which natively support vagueness of representation. Following this approach, the metric of similarity based on the joint combination of color and WWs has been employed within a prototype retrieval engine [6], [7]. In the archiving stage, each image is modeled as an attributed relational graph, following the structure of (6). Entities are derived by clustering the color histogram in the color space through a split&merge algorithm [16], which identifies dominant color clusters approximately comprised of equal quantities of pixels. The set of pixels comprising each color cluster is modeled as a single spatial entity, regardless of the fact that the pixels may lie in nonconnected segments of the image space.
Each entity is associated with an attribute encoding the normalized coordinates of the triple of the average color of the cluster.
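The derivation of color-cluster entities can be sketched as follows; the uniform color quantization below is a crude stand-in for the split&merge histogram clustering of [16], and all names are illustrative:

```python
from collections import defaultdict

def color_cluster_entities(pixels, n_clusters=8, levels=4):
    """Group pixels into color clusters regardless of connectivity.
    `pixels` maps (x, y) -> (r, g, b). Each returned entity may span
    several nonconnected regions of the image, which is precisely the
    case WWs handle natively."""
    buckets = defaultdict(list)
    for xy, (r, g, b) in pixels.items():
        # Crude uniform quantization of the color space.
        key = (r * levels // 256, g * levels // 256, b * levels // 256)
        buckets[key].append(xy)
    # Keep the most populated buckets as the dominant clusters.
    dominant = sorted(buckets.values(), key=len, reverse=True)[:n_clusters]
    entities = []
    for pts in dominant:
        n = len(pts)
        avg = tuple(sum(pixels[p][c] for p in pts) / n for c in range(3))
        entities.append({'pixels': pts, 'color': avg})
    return entities
```

Each entity carries the full pixel set of its cluster and the average color that serves as its chromatic attribute.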

In the querying stage, the user is allowed to employ two different querying modalities: 1) by global similarity and 2) by sketch. In a query by global similarity, the user selects an example picture in the archive, and the system retrieves the pictures that are “globally” similar to the example. This means that the query graph representing the query is the same graph that represents the example picture in the archive. In a query by sketch, the user expresses the query by drawing a set of colored regions, and by arranging them so as to represent the expected appearance of searched images. This kind of query is usually comprised of two or three regions of variable size. For both querying modalities, the user is allowed to dynamically set the balance of relevance in the combination of spatial and chromatic distances, corresponding to a direct setting of the balance parameter in (10). Fig. 10(a) illustrates the querying operation; the user draws a sketch of the contour of characterizing color entities, and positions them so as to reproduce the expected arrangement in searched images. The sketch is interpreted by the system as a set of color clusters and their spatial relationships, and checked against the descriptions stored in the archive. Matching images are displayed in a separate window sorted by spatial and color similarity [see Fig. 10(b)]. Fig. 11(a) and (b) shows the expression and the retrieval results of a query by example. In this case, one of the database images is selected as the query, and the system searches for the most similar images, using all the color entities and relationships that appear in the query representation.
1) Experimental Evaluation: Using the prototype system, the capability of WWs to manage the complexity involved in retrieval by chromatic content was evaluated in a two-stage test, focusing first on a benchmark of basic synthetic arrangements of three colors, and then on a database of real images.
a) Benchmark database of basic arrangements: The first stage of the evaluation was oriented to investigate the capability


Fig. 10. (a) Query by sketch and (b) the corresponding retrieval set.

of WWs to capture differences and similarities in basic spatial arrangements of colors, abstracting from other relevant features, such as color distribution, size, and shape of color patches. The test was constructed around four reference synthetic pictures composing rectangular objects of three different colors: approximately red, blue, and yellow [see Fig. 12(a)–(d)]. Three sets of nine mutations were derived from each reference picture by a random engine changing the spatial arrangement of the rectangles, but not their color histograms [see Fig. 12(e)], in such a way that their perceptual differences are mainly biased by the spatial arrangement of patches. Each user was shown a sequence of 3 × 4 HTML pages, each showing a reference picture and a set of its nine variations. For each reference picture, users were asked to provide a three-level ranking of the similarity between

Fig. 11. (a) Query by image example and (b) the corresponding retrieval set.

the reference picture itself and each of its nine mutations. The test was administered to 25 students of a postdoctoral course on multimedia; sessions took 7 min on average. The average rank provided by the user sample was normalized to provide a value of perceptual similarity for each mutation, and then compared against the ranking of similarity provided by WWs. Results are summarized in Fig. 13: the horizontal axis is the dimension of the retrieval set, and the vertical axis is a measure of recall obtained as the sum of the perceptual similarity of the pictures included in a retrieval set of that dimension. In this representation, a perfect accordance of retrieval with the user ranking results in a concave slope, which is obtained when pictures are added to the retrieval set in order of decreasing value of similarity; a misclassification


Fig. 12. (a)–(d) Four reference synthetic pictures for the evaluation of user perceived similarity in the spatial arrangement of rectangular objects. (e) Nine mutations of the second reference picture as arranged in the form for the user test.

is highlighted by a change in the convexity of the curve, which derives from the “anticipated” retrieval of a picture with a lower value of similarity. The plotted curves compare results for WWs and for centroid orientation (CO), which resulted, from a previous stage of the experimentation, as their most plausible competitor in the application context of visual information retrieval. For all four queries, WWs closely fit the ideal user-based curve in the ranking of the first, and most relevant, variations. A significant divergence is observed only on the third query, for the ranking of the variations taking positions between 8 and 16. In all the cases under test, WWs outperform CO. In particular, CO evidences a main limit in the processing of the second query. The long sequence with horizontal slope indicates that this problem of CO derives from a misclassification which confuses variations of the query with those of different reference pictures. Analysis of the specific retrieval results indicates that CO is not able to discriminate the second reference picture, which is definitely different in the user perception, but shares an almost equal representation in terms of the centroids of color clusters.
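The cumulative recall curves of Fig. 13 can be reproduced from a system ranking and the user-assigned similarity values; a minimal sketch, with the input layout assumed:

```python
def recall_curve(ranking, perceptual):
    """Cumulative-recall curve in the style of Fig. 13: for each
    retrieval-set size k, sum the user-assigned perceptual similarity
    of the first k retrieved pictures. A ranking that agrees with the
    users yields a concave (decreasing-increment) curve; the
    'anticipated' retrieval of a low-similarity picture shows up as a
    change of convexity. `ranking` is the list of picture ids in
    retrieval order; `perceptual` maps id -> user similarity value."""
    curve, total = [], 0.0
    for pid in ranking:
        total += perceptual[pid]
        curve.append(total)
    return curve
```

Retrieving pictures in order of decreasing perceptual similarity produces strictly shrinking increments, i.e., the concave ideal curve.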


b) Benchmark database of real images: The second stage of the evaluation was aimed at extending the experimental results to the case of images with realistic complexity. To this end, the representations based on the graph model and on a global color histogram were compared through a user-based test, administered to the same set of students as the previous test. For the experiments, the system was applied to an archive of 200 reference paintings featured by the library of WebMuseum [39]. Before the start of the testing phase, users were trained with two preliminary examples, in order to ensure their understanding of the system. During the test, users were asked to retrieve a given set of eight target images (shown in Fig. 14), representing the aim of the search, by expressing queries by sketch. To this end, users were shown each target image and were requested to express a query with three regions to retrieve it [see Fig. 10(a)]. Only one trial was allowed for each target image. For each query, the ranking of similarity on the overall set of 200 pictures was evaluated using both the joint modeling of color and spatial relationships and the global color histogram. Results were summarized within two indexes of recall and precision. For each target image, recall is one if the target image appears within the set of the first 20 retrieved images, and zero otherwise, thus expressing with a true/false condition the presence of the target image within the retrieval set. Precision considers the rank scored by the target image in the retrieval set: it is one if the target image is ranked in the first position, and gradually decreases to zero as the target moves from the first toward the twentieth position (i.e., precision is assumed to be zero when the target is ranked out of the first 20 retrieved images). In this way, precision measures the system's capability of ranking images according to the implicit ordering given by the target image.
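The two indexes just defined can be sketched as a function of the rank of the target image. The linear decrease of precision is an assumption: the text states only that precision gradually decreases to zero over the first 20 positions.

```python
def recall_precision(rank, window=20):
    """Recall is 1 if the target appears within the first `window`
    retrieved images, 0 otherwise. Precision is 1 for rank 1 and
    decreases (here: linearly, an assumed interpolation) to 0 as the
    rank approaches `window`; it is 0 beyond the window. `rank` is
    the 1-based position of the target in the retrieval set."""
    recall = 1 if rank <= window else 0
    precision = (window - rank) / (window - 1) if recall else 0.0
    return recall, precision
```

Per-target system scores are then obtained by averaging these values over the set of user queries, as described next.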
System recall and precision for each of the eight target images are derived by averaging the individual values scored for a target image on the set of users' queries. Results are reported in Fig. 15(a) and (b). Fig. 15(a) compares values of recall for the proposed model (WW) and for the color histogram. For WW, results are reported for a value of the parameter which weighs the contribution of color and spatial distance in (10). Though the histogram provides an acceptable average result, it becomes completely inappropriate in two of the eight cases (the second and fourth target images), where the recall drops to zero. In contrast, search based on the WW model provides optimal results in each of the eight tested cases. The histogram is clearly disadvantaged when the system performance is measured as the rank of the target image in the retrieval set, as evidenced by the plots of precision in Fig. 15(b). By considering spatial information, the ranking provided by the system is much closer to the user expectation than that given by the global histogram. Only for one target image does the histogram outperform WW, basically due to the low spatial characterization of this image.

IV. DISCUSSION

WWs are characterized by their capability to provide a quantitative representation of the joint distribution of masses in two extended spatial entities.


Fig. 13. Recall value on the vertical axis sums up the average user-ranked value of similarity of the pictures included in the retrieval set with the dimension shown on the horizontal axis, for users (USERS), weighted walkthroughs (WW), and centroid orientation (CO). Cases (a), (b), (c), and (d) correspond to the four reference pictures of Fig. 12, respectively.

Fig. 14. Eight target images employed in the test which compares the graph model against a global color histogram.

Fig. 15. Values of (a) recall and (b) precision compared for the proposed model (WW) and for the global color histogram (Histogram). Note that the global histogram fails in ranking the second and fourth target images, whose recall and precision values are both null.

This relationship is quantified over the dense set of pixels which comprise the two entities, without reducing them to a minimum embedding rectangle or to a finite set of representative points. This improves the capability to discriminate perceptually different relationships, and makes the representation applicable also to complex and irregularly shaped entities. Matching a natural trait of vagueness in spatial perception, the relationship between extended entities is represented as the union of the primitive directions (the walkthroughs) which connect their individual pixels. The mutual relevance of different directions is accounted for by quantitative values (the weights), which enable the establishment of a quantitative metric of similarity. Breaking the limits of the Boolean classification of symbolic models, this prevents classification discontinuities and improves the capability to assimilate perceptually similar cases. Weights are computed through an integral form which satisfies a main property of compositionality. This permits us to efficiently compute the relationships between two entities by linear combination of the relationships between their parts, which is not possible for models based on symbolic classification. This is the actual basis which permits us to ensure consistency in the quantitative weighting of spatial relationships and to deal with extended entities beyond the limits of the minimum embedding rectangle approximation.
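The compositionality property can be illustrated on a discrete, pixel-level stand-in for the integral definition: the relationship of an entity with the union of two disjoint parts is the size-weighted linear combination of its relationships with the parts. Function names and the sign-based weighting below are illustrative; the paper computes the same quantities in closed form over rectangles.

```python
def ww(A, B):
    """Brute-force weights between two pixel sets given as lists of
    (x, y) coordinates: w[(i, j)] is the fraction of pixel pairs whose
    displacement has horizontal sign i and vertical sign j."""
    w = {(i, j): 0 for i in (-1, 0, 1) for j in (-1, 0, 1)}
    for ax, ay in A:
        for bx, by in B:
            w[((bx > ax) - (bx < ax), (by > ay) - (by < ay))] += 1
    n = len(A) * len(B)
    return {k: v / n for k, v in w.items()}

def compose(wb1, n1, wb2, n2):
    """Compositionality: the relationship of A with B1 ∪ B2 (B1, B2
    disjoint, of sizes n1 and n2) is the size-weighted combination of
    the relationships of A with B1 and with B2."""
    return {k: (n1 * wb1[k] + n2 * wb2[k]) / (n1 + n2) for k in wb1}
```

For disjoint parts, composing the partial weights reproduces the directly computed weights, which is what allows efficient evaluation over decompositions into rectangles.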


WWs can be employed in retrieval by visual content by positioning them within a graph-theoretical framework capturing the spatial arrangement of multiple attributed entities. This may fit different modeling schemes based on different selections of entities and their qualifying attributes. Two complementary approaches have been used, in which entities are either typed objects derived from a manual annotation or color clusters derived from automated segmentation. In particular, the inherent characteristics of the latter application case give value to the specific capability of WWs to model the spatial relationship between complex and nonconnected extended entities.


APPENDIX

Theorem 4.1: WWs are scale and shift invariant.
Proof: Shift invariance descends from the fact that each weight is a relative measure, i.e., it depends on the displacement between points in the two pixel sets rather than on their absolute position. Scale invariance derives from the linearity of integration and from the properties of the characteristic functions of the two sets. For the weights with nonnull indexes, the demonstration is an immediate consequence of the scale invariance of the integrals. When one of the indexes is equal to zero, the demonstration also involves the reduced dimensionality of the corresponding coefficients, and the fact that scale invariance is exhibited by the integral of the Dirac delta function rather than by the delta function itself.

Theorem 4.2: WWs are reflexive.
Proof: By (2) and by the properties of the Dirac function in (1).

Theorem 4.3 (compositionality): For any pixel set and for any two disjoint pixel sets, the walkthroughs toward the union of the two disjoint sets are a convex combination of the walkthroughs toward the two sets taken separately.
Proof: We consider one representative case; the remaining cases are dealt with in the same manner.

Theorem 4.4: For any two multirectangular pixel sets, the sum of the four corner weights is equal to one (11).
Proof: The demonstration runs by induction on the set of rectangles which compose the two entities. By the property of compositionality (Theorem 4.3), for any partition of one of the entities into two disjoint subparts, the corner coefficients of the walkthroughs toward the whole can be expressed as a combination of the corner coefficients of the walkthroughs toward the subparts (12). Since this is a convex combination, i.e., its coefficients are nonnegative and sum up to one (13), the same relation holds for the sums of the corner coefficients. This implies that, by recursive decomposition of the two entities, the sum of the corner coefficients is a convex combination of the sums of the corner coefficients of the WWs between elementary rectangles mutually arranged in basic cases. In all three basic cases of Section II-B, the sum of the corner elements is equal to one; therefore, any convex combination of such sums, and in particular the sum for the original pair of entities, is equal to one.
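Theorem 4.4 can be checked numerically on a discrete pixel-level stand-in for the integral definition, provided no pixel pair is axis-aligned (aligned pairs have measure zero in the continuous setting, but would divert mass to the non-corner weights in the discrete one). The sign-based weighting below is an illustrative assumption.

```python
def ww(A, B):
    """Brute-force sign-classified pair fractions between two pixel
    sets given as lists of (x, y) coordinates (discrete stand-in for
    the integral definition of Section II)."""
    w = {(i, j): 0 for i in (-1, 0, 1) for j in (-1, 0, 1)}
    for ax, ay in A:
        for bx, by in B:
            w[((bx > ax) - (bx < ax), (by > ay) - (by < ay))] += 1
    n = len(A) * len(B)
    return {k: v / n for k, v in w.items()}

# Two small pixel sets with no axis-aligned pixel pairs: the four
# corner weights absorb all the mass, as Theorem 4.4 predicts.
A = [(0, 0), (1, 1)]
B = [(3, 4), (4, 5)]
w = ww(A, B)
corners = w[(-1, -1)] + w[(-1, 1)] + w[(1, -1)] + w[(1, 1)]
```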


Theorem 4.5: For any two multirectangular pixel sets, the sum of weights appearing in (14) is upper-bounded by the ratio between the measure of the part of the second set whose projection on the vertical axis is included in the projection of the first set, and the measure of the second set itself [see Fig. 2(b)].
Proof: By (1), considering the linear sections of the two pixel sets in correspondence of each vertical coordinate, we can write (14).

Theorem 4.6: For any two multirectangular pixel sets, the central weight equals the ratio between the square of the measure of the intersection of the two sets and the product of their measures (15).
Proof: As in the proof of Theorem 4.4, we assume that (15) holds for the relationship between two multirectangular pixel sets, and we extend it to the case in which a rectangle disjoint from the second set is added to it. Without loss of generality, we can assume that the added rectangle either is disjoint from the first set or is included in it. By the property of compositionality (Theorem 4.3), and by (15), in the first case we obtain (16), while, in the latter case, we have (17).

Theorem 4.7: Let A and B be a pair of pixel sets, and let A' be the minimum multirectangular extension of A on a grid of a given size (see Fig. 3), with Δ denoting the difference between A' and A. The distance between the walkthroughs capturing the relationship between A and B and those capturing the relationship between A' and B undergoes the bound of (18).
Proof: Separate bounds are derived for the six distance components. By the property of compositionality (Theorem 4.3), the walkthroughs between A' and B can be decomposed into the contributions of A and Δ, which, by the normality of the weights, yields a two-term expression for each component. In the first term, the factor deriving from A is not higher than one according to Theorem 4.5, while the remaining coefficients can be estimated through the linear measures of the bounding boxes of Δ and B (see Fig. 3), thus yielding a bound for the first component. The same estimate can be applied to the symmetric components.


In the second term, the factor deriving from Δ, according to Theorem 4.5, can be estimated through the corresponding ratio of measures, thus yielding a bound for the second component. The same estimate can be applied to the symmetric components. Finally, for the remaining components, the property of compositionality and the normality of the weights yield the corresponding bounds.

By linear composition through the combination coefficients, the individual bounds yield the overall estimate of (18). It is worth noting that the derivation of the bounds for the distance components strictly relies on the presence of linear measures on the bounding boxes of the entities, which appear in the coefficients of (2).

REFERENCES

[1] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin, Network Flows. Englewood Cliffs, NJ: Prentice-Hall, 1993.
[2] J. F. Allen, "Maintaining knowledge about temporal intervals," Commun. ACM, vol. 26, pp. 832–843, Nov. 1983.
[3] S. Berretti, A. Del Bimbo, and E. Vicario, "Modeling spatial relationships between color clusters," Pattern Anal. Applicat., Special Issue on Image Indexation, vol. 4, no. 2/3, pp. 83–92, 2001.
[4] ——, "Efficient matching and indexing of graph models in content-based retrieval," IEEE Trans. Pattern Anal. Machine Intell., Special Section on Graph Algorithms in Computer Vision, vol. 23, pp. 1089–1105, Oct. 2001.
[5] ——, "The computational aspect of retrieval by spatial arrangement," presented at the Proc. 15th ICPR, Barcelona, Spain, Sept. 2000.
[6] ——, "Weighting spatial arrangement of colors in content-based image retrieval," presented at the Proc. IEEE ICMCS, Firenze, Italy, June 1999.
[7] ——, "Modeling spatial relationships between color sets," Proc. IEEE Int. Workshop CBAIVL, June 2000.
[8] S. K. Chang, Q. Y. Shi, and C. W. Yan, "Iconic indexing by 2-D strings," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-9, pp. 413–427, July 1987.
[9] S. K. Chang, E. Jungert, and T. Li, "Representation and retrieval of symbolic pictures using generalized 2-D strings," in Proc. SPIE Visual Communications and Image Processing IV, vol. 1199, 1989, pp. 1360–1372.
[10] S. K. Chang and E. Jungert, "Pictorial data management based upon the theory of symbolic projections," J. Vis. Lang. Comput., vol. 2, pp. 195–215, June 1991.
[11] J. M. Corridoni, A. Del Bimbo, and E. Vicario, "Image retrieval by color semantics with incomplete knowledge," J. Amer. Soc. Inf. Syst., 1998.
[12] A. Del Bimbo and E. Vicario, "Specification by-example of virtual agents behavior," IEEE Trans. Visual. Comput. Graph., vol. 1, Dec. 1995.
[13] ——, "Using weighted spatial relationships in retrieval by visual content," presented at the IEEE Workshop Content-Based Access of Image and Video Databases, Santa Barbara, CA, June 1998.


[14] A. Del Bimbo, W.-X. He, and E. Vicario, "Using weighted spatial relationships in retrieval by visual contents," in Image Description and Retrieval, E. Vicario, Ed. New York: Plenum, 1998.
[15] A. Del Bimbo, Visual Information Retrieval. New York: Academic, 1999.
[16] A. Del Bimbo, M. Mugnaini, P. Pala, and F. Turco, "Visual querying by color perceptive regions," Pattern Recognit., vol. 31, no. 9, pp. 1241–1253, 1998.
[17] M. J. Egenhofer and R. Franzosa, "Point-set topological spatial relations," Int. J. Geograph. Inf. Syst., vol. 5, no. 2, pp. 161–174, 1991.
[18] ——, "On the equivalence of topological relations," Int. J. Geograph. Inf. Syst., vol. 9, no. 2, 1992.
[19] M. Flickner, H. Shawney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steel, and P. Yonker, "Query by image and video content: The QBIC system," IEEE Computer, vol. 28, pp. 23–31, Sept. 1995.
[20] A. U. Frank, "Qualitative spatial reasoning about distances and directions in geographic space," J. Vis. Lang. Comput., vol. 3, no. 3, pp. 343–371, 1992.
[21] C. Freksa, "Using orientation information for qualitative spatial reasoning," in Proc. Int. Conf. Theories and Methods of Spatio-Temporal Reasoning in Geographic Space (Lecture Notes in Computer Science), A. U. Frank, I. Campanari, and U. Formentini, Eds. New York, 1992, pp. 162–178.
[22] Y. Tao and W. I. Grosky, "Spatial color indexing: A novel approach for content-based image retrieval," presented at the Proc. IEEE ICMCS, Firenze, Italy, June 1999.
[23] V. N. Gudivada and V. V. Raghavan, "Design and evaluation of algorithms for image retrieval by spatial similarity," ACM Trans. Inf. Syst., vol. 13, Apr. 1995.
[24] V. N. Gudivada, "Spatial knowledge representation and retrieval in 3-D image databases," presented at the Int. Conf. Multimedia and Computing Systems, Washington, DC, May 1995.
[25] V. N. Gudivada and V. V. Raghavan, "Special issue on content-based image retrieval systems," Computer, vol. 28, no. 9, 1995.
[26] A. Gupta and R. Jain, "Visual information retrieval," Commun. ACM, vol. 40, pp. 70–79, May 1997.
[27] D. Hernandez, "Relative representation of spatial knowledge: The 2-D case," Technische Universitaet Muenchen, Munich, Germany, Rep. FKI-135-90.
[28] J. Huang, S. R. Kumar, M. Mitra, W.-J. Zhu, and R. Zabih, "Image indexing using color correlograms," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 762–768, June 1997.
[29] E. Jungert, "Qualitative spatial reasoning for determination of object relations using symbolic interval projections," in IEEE Int. Workshop Visual Languages, 1993, pp. 83–87.
[30] S. Y. Lee and F. Hsu, "Spatial reasoning and similarity retrieval of images using 2-D C-strings knowledge representation," Pattern Recognit., vol. 25, no. 3, pp. 305–318, 1992.
[31] S. Y. Lee, M. C. Yang, and J. W. Cheng, "Signature file as spatial filter for iconic image database," J. Vis. Lang. Comput., vol. 3, no. 4, pp. 373–397, 1992.
[32] D. Papadias and T. Sellis, "The semantics of relations in 2-D space using representative points: Spatial indexes," in Proc. Int. Conf. Spatial Information Theory, Marciana Marina, Italy, Sept. 1993, pp. 234–247.
[33] ——, "A pictorial query by example language," J. Vis. Lang. Comput., vol. 6, 1995, pp. 53–72.
[34] J. R. Smith and S. F. Chang, "Automated image retrieval using color and texture," Columbia Univ., New York, Rep. TR 414-95-20, July 1995.
[35] ——, "VisualSEEk: A fully automated content-based image query system," presented at the Proc. ACM Multimedia, Boston, MA, Nov. 1996.
[36] J. R. Smith and C.-S. Li, "Decoding image semantics using composite region templates," presented at the Proc. IEEE CVPR'98 Workshop Content-Based Access to Image and Video Libraries, June 1998.
[37] A. Soffer and H. Samet, "Handling multiple instances of symbols in pictorial queries by image similarity," presented at the Proc. Int. Workshop IDB-MMS, Amsterdam, The Netherlands, Aug. 22–23, 1996.
[38] E. Vicario and W. X. He, "Weighted walkthroughs in retrieval by contents of pictorial data," presented at the Proc. ICIAP, Sept. 1997.
[39] WebMuseum. [Online]. Available: http://www.oir.ucf.edu


Stefano Berretti graduated in electronics engineering from the Università di Firenze, Firenze, Italy, in 1997, and received the Ph.D. degree in information and telecommunications engineering from the same university in 2001. He is currently Assistant Professor of computer engineering at the Dipartimento di Sistemi e Informatica, Università di Firenze. His main scientific interests include pattern recognition, image and video databases, and multimedia information retrieval.

Alberto Del Bimbo (M’90) is Full Professor of Computer Engineering at the Dipartimento di Sistemi e Informatica, Università degli Studi di Firenze, Firenze, Italy, the Director of the Master in Multimedia, and Deputy Rector for Research and Innovation Transfer at the same university. His scientific interests have addressed the subject of image technology and multimedia, with particular reference to object recognition and image sequence analysis, content-based retrieval for image and video databases, visual languages, and advanced man-machine interaction. He is the author of over 150 publications which have appeared in the most distinguished international journals and conference proceedings, and is the author of the monograph Visual Information Retrieval (San Mateo, CA: Morgan Kaufmann, 1999). He has also been the Guest Editor of several special issues of international journals and the Chairman of several conferences in the field of image processing, image databases, and multimedia. Prof. Del Bimbo was the President of the Italian Chapter of the International Association for Pattern Recognition (IAPR) from 1996 to 2000, and a Member of the IEEE Publications Board from 1999 to 2001. Currently, he is a Member of the Steering Committee of the IEEE International Conference on Multimedia and Expo, and of the VISUAL conference series. He serves as Associate Editor of IEEE TRANSACTIONS ON MULTIMEDIA, IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, Pattern Recognition, Pattern Analysis and Application Journal, Multimedia Tools and Applications Journal, and Journal of Visual Languages and Computing. He is a Fellow of the IAPR.


Enrico Vicario (M’95) received the Doctoral degree in electronics engineering and the Ph.D. degree in information and telecommunications engineering from the Università di Firenze, Firenze, Italy, in 1990 and 1994, respectively. Since 1998, he has been Associate Professor of Information Engineering at the Università di Firenze and the Università di Ancona, Ancona, Italy. Currently, he is Full Professor at the Dipartimento di Sistemi e Informatica, Università di Firenze. His research activity has developed on both software engineering and visual information technologies, with specific contributions in image modeling and retrieval based on spatial arrangement, visual formalisms, usability engineering, and the formal description and validation of reactive and time-dependent systems. Prof. Vicario is an Associate Editor of IEEE Multimedia, and a frequent reviewer for IEEE TRANSACTIONS ON SOFTWARE ENGINEERING and the Journal of Visual Languages and Computing.
