Data Warehousing and Data Mining: Multidimensional Index Structures

Ulf Leser, Wissensmanagement in der Bioinformatik (Knowledge Management in Bioinformatics)

Content of this Lecture

• Multidimensional Indexing
• Grid-Files
• Kd-trees

Ulf Leser: Data Warehousing und Data Mining


Multidimensional Queries

• Conditions on more than one attribute
  – Combined through AND (intersection) or OR (union)

• Partial queries: conditions on some but not all dimensions
• A multidimensional query (MDQ) selects a sub-cube
  – 2D: "All beverage sales in March 2000"
  – 4D: "All beverage sales in 2000 in Berlin to male customers"
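The idea of a multidimensional query as a set of per-dimension conditions can be sketched as follows. This is only an illustration, not part of the lecture: the dimension names (`product`, `month`, `region`), the integer encoding of the values, and the helper names `matches` and `select` are all assumptions. A partial query simply omits the dimensions it does not constrain, and the conditions are combined with AND.

```python
from typing import Dict, List, Tuple

# A record maps a dimension name to a value (hypothetical schema).
Record = Dict[str, int]
# A query maps a dimension to an inclusive (low, high) range. Dimensions
# missing from the query are unconstrained, which makes it a partial query.
Query = Dict[str, Tuple[int, int]]


def matches(record: Record, query: Query) -> bool:
    """True if the record satisfies every range condition (AND semantics)."""
    return all(low <= record[dim] <= high for dim, (low, high) in query.items())


def select(records: List[Record], query: Query) -> List[Record]:
    """Naive full scan that returns the sub-cube selected by the query."""
    return [r for r in records if matches(r, query)]


# Illustrative data only.
sales = [
    {"product": 1, "month": 3, "region": 10},
    {"product": 1, "month": 4, "region": 10},
    {"product": 2, "month": 3, "region": 20},
    {"product": 1, "month": 3, "region": 20},
]

# 2D partial query: product 1 in month 3; "region" is left unconstrained.
result = select(sales, {"product": (1, 1), "month": (3, 3)})
```

The full scan here is the baseline that a multidimensional index tries to avoid: every record is inspected regardless of how selective the query is.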


Composite Indexes

[Figure: points P1 to P7 plotted in two dimensions; axis labels: month_id, product_id]

Point   X   Y
P1      2   2
P2      2   2
P3      5   7
P4      5   6
P5      8   6
P6      8   9
P7      9   3

• Imagine a composite index on (X, Y)
• Efficiently supported
  – Box queries (conditions in dimensions X and Y)
  – Points/rectangles with X coordinate between …

• Not efficiently supported
  – Points/rectangles with Y coordinate between …


Composite Index

• One index with two attributes (X, Y)

[Figure: composite index entries as (X, Y) pairs: (1, 2), (1, 4), (8, 2), (8, 3), (9, 1)]

• A prefix of the index's attribute list must be present in the query
  – The longer the prefix in the query, the better

• Alternative: use independent indexes on each attribute
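The prefix rule can be illustrated with a minimal sketch in which a lexicographically sorted list of (X, Y) tuples stands in for the leaf level of a composite index. This is an assumption made for illustration: a real system would use a B-tree, and the function names and sample points below are hypothetical. A condition on X locates one contiguous run by binary search, while a condition on Y alone has no usable prefix and has to inspect every entry.

```python
from bisect import bisect_left, bisect_right

# Sorted (X, Y) entries standing in for the leaves of a composite index.
# The sample points are illustrative.
index = sorted([(2, 2), (2, 2), (5, 7), (5, 6), (8, 6), (8, 9), (9, 3)])


def range_on_x(index, x_low, x_high):
    """Prefix present: binary search finds one contiguous run of entries."""
    start = bisect_left(index, (x_low,))
    end = bisect_right(index, (x_high, float("inf")))
    return index[start:end]


def range_on_y(index, y_low, y_high):
    """Prefix missing: all entries must be inspected."""
    return [(x, y) for (x, y) in index if y_low <= y <= y_high]


def box(index, x_low, x_high, y_low, y_high):
    """Box query: narrow by the X prefix first, then filter on Y."""
    return [(x, y) for (x, y) in range_on_x(index, x_low, x_high)
            if y_low <= y <= y_high]


by_x = range_on_x(index, 5, 8)
by_y = range_on_y(index, 6, 7)
in_box = box(index, 5, 8, 6, 7)
```

In this sketch `range_on_x` touches only the matching run, whereas `range_on_y` is a full scan of the index. That difference is what motivates the alternative of one index per attribute.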


Independent Indexes

• One index per attribute

[Figure: two separate indexes, "Index on X" and "Index on Y"]

• Partial match query on one attribute: supported
• Partial match query on more than one attribute
  – Compute TID lists for each attribute
  – Intersect the TID lists
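The two steps for a query on more than one attribute (compute a TID list per attribute, then intersect) might look as follows. This is a sketch under stated assumptions: plain dictionaries stand in for the per-attribute indexes, the range lookup scans the keys instead of descending a tree, and the table contents and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical table: TID -> (x, y). Values are illustrative.
table = {0: (2, 2), 1: (2, 2), 2: (5, 7), 3: (5, 6), 4: (8, 6), 5: (8, 9), 6: (9, 3)}


def build_index(table, attr_pos):
    """Map each attribute value to the set of TIDs that carry it."""
    index = defaultdict(set)
    for tid, row in table.items():
        index[row[attr_pos]].add(tid)
    return index


index_x = build_index(table, 0)
index_y = build_index(table, 1)


def lookup_range(index, low, high):
    """Collect the TID list for low <= key <= high.

    A real index would locate the range by tree traversal; scanning the
    keys here is a simplification of this sketch.
    """
    tids = set()
    for key, key_tids in index.items():
        if low <= key <= high:
            tids |= key_tids
    return tids


def query(x_range, y_range):
    """Compute one TID list per attribute, then intersect them."""
    tids_x = lookup_range(index_x, *x_range)
    tids_y = lookup_range(index_y, *y_range)
    return sorted(tids_x & tids_y)


hits = query((5, 8), (6, 7))
```

Note that each TID list can be much larger than the final result, since the intersection only happens after both lists have been computed.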


Independent versus Composite Index

• Consider 3 dimensions, each with range 1, ..., 100
  – 1,000,000 points, uniformly distributed at random
  – Index blocks hold 50 keys or records
  – The index on each attribute has height 4

• Find points with 40