4/24/2009

4. Queries 4. Queries 4.1 Query processing 4.2 Queries in DW / OLAP 4.3 Physical Modeling

Data Warehousing & Data Mining Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

4.1Query processing

4.1 Query processing

• Queries are posed to the DBMS and processed before the actual evaluation Query Processor

Data

Query Evaluation Engine

Storage Manager

• How queries are answered – Queries are usually stated in a high level declarative language as SQL

Applications Programs Object Code

• For relational DB it can be mapped to relational algebra (RA)

Embedded DML Precompiler

– For evaluation it has to be translated to a low level execution plan

DML Compiler

• Expressions that can be used at physical level of the file system – E.g., for RDB physical relational algebra

DDL Interpreter

3

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

4.1Query processing Query

Parser & Translator

2

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

4

4.1 Query processing • Parsing and translation

Relational Algebra Expression

– Queries need to be translated • A scanner tokenizes the query Query Optimizer

Query Result

Evaluation Engine

Execution Plan

– DB language keywords, table names, attribute names, etc.

• The parser checks syntax and verifies relations, attributes, data types, etc. • Translate the query into its internal form

Access Paths

– Translated into relational algebra

Statistics Data

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

5

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

6

1

4/24/2009

4.1 Query processing

4.1 Query processing

• Optimization

• Evaluation

– Several relational algebra expressions might lead to the same result

– The query-execution engine takes a query-evaluation plan, executes it, and returns the answers to the query – For the result of each operator a temporary file has to be created

• But different statements might also result in very different performance

– Query optimization is the heart of every database kernel • Finding optimal plans may cost too much, but avoid crappy plans by all means

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

• Temporary files can be input for other operators • Storing the temporary files on the disk is expensive, but necessary if DB buffer is small

7

4.2 DW Queries

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

8

4.2 DW Queries

• DW queries are big queries

• OLAP usage fields

– Imply a large portion of the data – Read only queries

– Management Information • Sales per product group / area / year

• no Updates

– Government

• Redundancy a necessity – Materialized Views, special-purpose indexes, de-normalized schemas

• Population census

– Scientific databases • Geo-, Bio-Informatics

• Data is refreshed periodically

– Etc.

– E.g., Daily or weekly

• Their purpose is to analyze data

• Goal: Response Time of seconds / few minutes

– OLAP (OnLine Analytical Processing) Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

9

4.2 Why use DW

10

4.2 OLTP vs. OLAP

• ODS can also run analytical queries…but they are not so good at it • OLTP and OLAP are to each other as Water and Oil

• Tuning, Optimization – OLTP - inter-query parallelism, heuristic optimization – OLAP - intra-query parallelism, full-fledged optimization

• Freshness of Data

– Lock Conflicts: OLAP blocks OLTP

– OLTP - serializability – OLAP - reproducability

• E.g., an OLAP query can block the sales activity of all the stores trying to update the DB

• Precision

– Database design:

– OLTP - ACID – OLAP - Sampling, Confidence Intervals

• OLTP - normalized, • OLAP - de-normalized Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

11

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

12

2

4/24/2009

4.2 Why use DW

4.2 Typical analytical requests • Comparisons

• The solution is to run OLTP and OLAP separately • DW is a special sandbox for OLAP

– Show me the sales per region for this year and compare it to that of the previous year to identify discrepancies

• Multidimensional ratios – Show me the contribution to weekly profit made by all items sold in the northeast stores between 1st of May and 7th of May

– As input it uses OLTP systems – DW aggregates and replicates data

• Ranking and statistical profile

• Special schema

– New data is periodically uploaded to the Warehouse – Old data is deleted from Warehouse – Archiving done by OLTP system for legal reasons Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

13

4.2 Typical queries

– Show me sales, profit and average call volume per day for my 10 most profitable sales-people

• Custom consolidation – Show me the income statement by quarter for the last four quarters for my northeast region operations

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

14

4.2 Codd’s OLAP rules

• OLAP queries

• How do we differentiate between OLAP and nonOLAP products? - OLAP rules • Published in a controversial white paper

SELECT d1.x, d2.y, d3.z, sum(f.t1), avg(f.t2) FROM Fact f, Dim1 d1, Dim2 d2, Dim3 d3 WHERE a < d1.field < b AND d2.field = c GROUP BY d1.x, d2.y, d3.z;

– “Providing OLAP to the User-Analysts: An IT Mandate. (Arbor Software, 1993)”

• The idea is to – Select by Attributes of Dimensions

• Dr. Codd was accused that he allowed his name to be used, but did not put to much work into it Hyperions

• E.g., region = „Europe“

– Group by Attributes of Dimensions • E.g., region, month, quarter

ORACLE

Solutions

– Aggregate on measures

Arbor Software

• E.g., sum(price * volume) Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

15

16

4.2 Basic features

4.2 Codd’s OLAP rules • Rules organization

• 1) Multidimensional Conceptual View

– 12 rules + 6 (extension rules) added in 1995 – 4 feature groups • • • •

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

– Data is viewed in multidimensional form in a matrix. An enterprise becomes multidimensional • E.g., profits could be viewed by region, product, time period or scenario (actual budget, forecasts, etc.)

Basic features Special features Reporting features Dimension control

– Advantages • Multidimensional models enable more straight-forward manipulation of data – E.g., slice, dice, etc.

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

17

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

18

3

4/24/2009

4.2 Basic features

4.2 Basic features

• 2) Intuitive data manipulation

• 3) Accessibility: OLAP as a Mediator

– Existence of a GUI with drag-and-drop feature and other graphical facilities – Intuition is a vague term…

– Middleware between heterogeneous data sources and OLAP front end

• 4) Batch extraction vs. Interpretative Extraction – OLAP has to have its own staging database – It should also allow live access to external data – Similar to what HOLAP is today

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

19

4.2 Basic features

• 5) OLAP analysis models

– Categorical: typical descriptive statistics

– Formulaic: goal seeking models

• Comparison of historical values

• You know the outcome you want but you don’t know how to get there • The model keeps changing parameters and doing contemplations until it gets the desired result or proves it is impossible

– Exegetical: what we have been doing with spreadsheets (slice, dice, drill down) • Discovering reasons for what we found through the categorical model

– E.g., How can I increase the sales of bikinis in the Alaska store? The outcome can be: » Many solutions… » No solutions: Bikini sales in Alaska are doomed to failure » Unacceptable solutions: Close down all but the Alaska store

– Contemplative: what if analysis • E.g., What is the effect of closing the Alaska store, to the company

21

4.2 Basic features

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

22

4.2 Special features

• 6) Client/Server Architecture

• Special features

– Allow users to share data easily

– 9) Treatment of non-normalized data

• 7) Transparency

• Can load data also from non-RDBMS sources

– The client should not have to be aware of how connections to the OLAP engine or other data sources is made

– 10) Store OLAP results • OLAP data is expensive • Reconstructing it over and over from the live data is not a good idea

• 8) Multiuser support

– OLAP DB is a snapshot of the state of the data sources

– OLAP is read-only therefore no need for transaction control – New OLAP systems allow data query while data is being streamed from external sources Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

20

4.2 Basic features

• 5) OLAP analysis models

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

23

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

24

4

4/24/2009

4.2 Special features

4.2 Reporting features • 13) Flexible reporting

• 11) Extraction of missing values

– Vague

– 2 kinds of missing values

• 14) Uniform reporting performance

• NULL as in SQL meaning we don’t know the value of the attribute • Missing value meaning that the attribute will never have a value for that entity

– Codd required that reporting performance would not be significantly degraded by increasing the number of dimensions or database size – Sounds more like a goal then a rule

• 12) Treatment of missing values

• 15) Automatic adjustment of physical level

• All missing values are ignored by the OLAP analyzer, regardless of their source

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

– OLAP systems adjust its physical storage automatically – Dynamical adjusted HOLAP 25

4.2 Dimension control

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

26

4.2 Codd’s OLAP rules

• Dimension control

• Codd’s OLAP rules turned out not to be a success • Other attempts to define OLAP and offer OLAP guides were made by

– 16) Generic dimensionality • Each dimension must be equivalent in both its structure and operational capabilities • Controversial rule

– 17) Unlimited dimensions and aggregation levels

– The OLAP council – Analytical Solutions Forum – OLAP Solutions

• Unlimited…is physically impossible so we should settle with a large number – E.g., it should support at least 15 to 20 dimensions

– 18) Unrestricted cross-dimensional operations

• Nigel Pendse’s FASMI test

• Operation is not the same as calculation – E.g. “What is Friday divided by red?” but operation on mixed data is possible “How many red shirts did we sell on Friday?” Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

27

4.2 FASMI

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

28

4.2 FASMI

• The FASMI test:

• The FASMI test:

– Fast

– Multidimensional

• System is targeted to respond users within ~ 5 seconds

• The most important factor • Should support multidimensional conceptual views • Full support for hierarchies

– Complex analysis should take no longer than 20 second

• This can be achieved with exotic hardware and lots of precalculated scenarios

– Analysis • The system can cope with any business logic and statistical analysis that is relevant for the application and the user, and keep it easy enough for the target user

– Shared

– Information • All the data and derived information (meta-data), needed • The question is how much input data they can handle not how much GB they use to store it

• The system should implement security requirements • Not all OLAP products are read-only Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

29

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

30

5

4/24/2009

4.2 OLAP operations

4.2 Roll-up

• Typical OLAP operations

• Roll-up (drill-up)

– Roll-up – Drill-down – Slice and dice – Pivot (rotate)

– Taking the current aggregation level of fact values and doing a further aggregation – Summarize data by • Climbing up hierarchy • Or by dimensional reduction • A mix of these 2 techniques

• Other operations – Aggregate functions – Ranking and comparing – Drill-across – Drill-through

– Used for obtaining an increased generalization • E.g., from Time.Week to Time.Year

31

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

4.2 Roll-up

4.2 Roll-up

• Roll-up operations can be classified into

• Hierarchical roll-ups

– Dimensional roll-ups

– are done on the fact table and some dimension tables by climbing up the attribute hierarchies of dimensions whose hierarchies are used and having at least one attribute of each dimension

• Are done solely on the fact table by dropping one or more dimensions, where the dimensions retained are represented by their keys (basic attributes of the attribute hierarchy) – E.g., drop Client dimension Country

Region

Prod. Categ

City

Prod. Group

Store

Country

Article

Sales Turnover

Week Year

Prod. Categ

Year 33

4.2 Roll-up

Clothes …

Video recorder

Camcorder

Article

Quarter

Month

Sales Turnover

Day

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

34

TV

– Reverse of Roll-up – Represents a de-aggregate operation • From higher level of summary to lower level of summary – detailed data

– Introducing new dimensions – Requires the existence of materialized finer grained data

ALL

Electronics



Prod. Group

Store

• Drill-down (roll-down)

– In an ultimate case, hierarchical roll-up above the top level of an attribute hierarchy (attribute “ALL”) can be viewed as converting to a dimensional roll-up

Audio

Prod. Family

City

4.2 Drill-down

• Climbing above the top

Video

District

Day

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

Category

Region

Week

Month

Quarter

• E.g., climbed the Time hierarchy to Quarter and Article Client hierarchy to Prod. group

Client

District

Prod. Family

32

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

Prod. Family



• You can’t drill if you don’t have the data Prod. Group



TR-34

TS-56

Article

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

35

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

36

6

4/24/2009

4.2 Slice

4.2 Roll-up drill-down example € by bar/drinker Jim

Bob

Mary

Joe’s

45

33

30

Salitos

50

36

42

Roots

38

31

40

• Slice

€ by drinker

Roll-up by BAR

Jim

Bob

Mary

133

100

112

– Reducing the number of dimensions by taking a projection of facts on a proper subset of dimensions for some selected values of dimensions that are being dropped

Drill-down by brand € by brand/drinker

Wolters

Jim

Bob

Mary

48

40

40

Becks

45

31

37

Krombacher

40

29

35

πStoreId, TimeId, Ammount (σArticleId = LaptopId(Sales))

37

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

4.2 Slice

38

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

4.2 Dice

– Amounts to equality select condition – WHERE clause in SQL

• Dice – Amounts to range select condition on one dimension, or to equality select condition on more than one dimension

• E.g., slice Laptops Product

• E.g., Range SELECT 818

Product

πStoreId, TimeId, Amount (σArticleId ∈ {Laptop, CellP}(Sales))

Laptops CellP.

Time

818

Laptops CellP.

13.11.2008 18.12.2008

Time 13.11.2008 18.12.2008

Geography Geography Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

39

4.2 Dice

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

40

4.2 Pivot • Pivot (rotate)

• E.g., Equality SELECT on 2 dimensions Product and Time

– Refers to re-arranging data for viewing purposes • E.g., display cities down the pages and products across a page

πStoreId, Amount (σArticleId = Laptop ∧ MonthID = December(Sales))

– The simplest view of pivoting is that it selects two dimensions to aggregate the measure

Product 818

• The aggregated values are often displayed in a grid where each point in the (x, y) coordinate system corresponds to an aggregated value of the measure • The x and y coordinate values are the values of the selected two dimensions

Laptops CellP.

Time January December

– The result of pivoting is also called cross–tabulation

Geography Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

41

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

42

7

4/24/2009

4.2 Pivot

4.2 Pivot • Pivoting on City and Day

• Consider pivoting the following data

Mon Auckland

Location CityId

City

1

Well

2

Nels

3

Auck

Sales

Time

Tue

Wed

Thu

Fri

Sat

San

60

0

140

0

880

0

Nelson

550

0

0

0

100

0

0

650

540

300

0

0

0

0

50

890

1150

360

140

100

880

50

2680

TimId

Day

Wellington

230

1

Mon

SubTotal

300

2

Tue

8

310

3

Wed

7

50

4

Thu

Mon

60

550

540

1150

3

1

550

5

Fri

Tue

60

0

300

360

2

3

5

100

6

Sat

Wed

0

0

0

0

3

4

6

880

Thu

140

0

0

140

3

5

1

60

7

San

Fri

0

100

0

100

3

5

2

60

8

Mon

Sat

880

0

0

880

3

5

4

140

San

0

0

50

50

1140

650

890

2680

CityId

PerId

TimId

1

1

1

1

1

2

1

1

1

2

2

Amnt

0 Auck

SubTotal Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

43

Nels

Well

1140

SubTotal

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

4.2 Typical analytical requests

44

4.2 OLAP data visualization

• Analytical requests are hard to express

• How do these operations look like for the user?

– Most analysts and decision makers f.region, z.month, sum(a.price * a.volume) won’t enjoy it SELECT FROM Order a, Time z, PoS f

– E.g., Crystal Decisions OLAP software • 2 dimensions … is trivial • E.g., Products by Store

WHERE a.pos = f.name AND a.date = z.date GROUP BY f.region, z.month

– But wait…there are solutions

Product dimension

Store dimension

• OLAP clients allow operations to be performed through GUIs

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

SubTotal

60

45

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

4.2 OLAP data visualization

46

4.2 OLAP data visualization

• 3 dimensions

• More dimensions are difficult to represent

– We can visualize sold quantity on 3 dimensions as layers

– If we introduce Time dimension, a data cell could be represented by its 4 dimensions as follows: • • • •

Product dimension

Store dimension

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

47

Abc from Supplier dimension Batteries from Products dimension Uptown from Store dimension And Monday from Time dimension

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

48

8

4/24/2009

4.2 OLAP data visualization

4.2 OLAP data visualization

• OLAP products represent 3 and more dimensional data reducing it to a 2D layout

• Another way is by nesting on the same axis

– By picking values of the dimensions which can not be displayed • E.g., Display the number sold of Products by any of the Stores on Monday

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

49

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

4.2 OLAP data visualization

4.2 OLAP data visualization

• OLAP reporting has to be very flexible

• Drill-down operation

– The IBM way of an OLAP web based report

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

50

– Can be performed easy by going down on the hierarchy and choosing the granularity

51

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

52

4.3 Physical models

4.2 OLAP data visualization • Trends Visualization

• We have seen how it looks at the user level and on the conceptual side • But…how do operations translate from user level downwards?

– With the help of charts

– Well…it depends on the physical models used T i m e

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

53

• • • •

DOLAP (Desktop OLAP) MOLAP (Multidimensional OLAP) ROLAP (Relational OLAP) HOLAP (Hybrid OLAP)

M R H D O O L L A A P P

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

54

9

4/24/2009

4.3 Physical models

4.3 Physical models Client

• DOLAP

• MOLAP

– Developed as extension to the production system reports – The idea behind is • It downloads a small hypercube from a central point (data mart or DW) • Performs multidimensional analysis while disconnected from the data source • The computation occurs on the client

• Computation occurs in this layer during the loading step (not at query)

– Requires little investment – Most are limited to a single user – They lack the ability to manage large data sets Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

55

4.3 MOLAP

Presentation

Server MOLAP Interface

Data

MDB

56

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

4.3 MOLAP

• Advantages

• All calculations are pre-generated when the cube is created

– Excellent performance • • • •

– Presentation layer provides the multidimensional view – The OLAP server stores data in a multidimensional structure

Fast data retrieval Optimal for slicing and dicing Complex calculations All calculations are pre-generated when the cube is created

all

time

item

0-D(apex) cuboid location

supplier 1-D cuboids

time,item

time,location

item,location

location,supplier 2-D cuboids

time,supplier

time,item,location

item,supplier

time,location,supplier 3-D cuboids time,item,supplier

item,location,supplier

time, item, location, supplier Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

57

4.3 MOLAP

58

4.3 MOLAP

• Disadvantages

• Things to consider when choosing MOLAP

– Limited amount of data it can handle

– MOLAP tools traditionally have difficulty querying models with dimensions with very high cardinality (i.e., millions of members) – Some MOLAP products have difficulty updating and querying models with more than 10 dimensions

• Cube can be derived from large amount of data, but only summary level information will be included in the cube

– Requires additional investment • Cube technology are often proprietary

– Enormous amount of overhead

• It depends on

• An input file of 200 MB can expand to 5 GB with calculations

– the complexity and cardinality of the dimensions in question – the number of facts or measures stored

• Products: – Cognos (IBM), Essbase (Oracle), Microsoft Analysis Service, Palo (open source) Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

4-D(base) cuboid

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

– Other MOLAP products can handle hundreds of dimensions 59

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

60

10

4/24/2009

4.3 ROLAP

4.3 ROLAP Client

• Special schema design: e.g., star, snowflake • Special indexes: e.g., bitmap, R-Trees • Advantages

• ROLAP – Presentation layer provides the multidimensional view – The ROLAP Server generates SQL queries, from the OLAP OLAP requests, to query the RDBMS – Data is stored in RDBs

Presentation

Server

– Proven technology (relational model, DBMS) – Can handle large amounts of data (VLDBs)

ROLAP Server

• Disadvantages – Limited SQL functionalities

• Products

RDBMS

– Microsoft Analysis Service, Siebel Analytics (now Oracle BI), Micro Strategy, Mondrian (open source)

Data 61

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

4.3 ROLAP vs. MOLAP

4.3 HOLAP

• Based on OLAP needs OLAP needs

User Benefits

MIS Benefits

62

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

• HOLAP MOLAP

ROLAP

Multidimensional View





Excellent Performance



-

Analytical Flexibility



-

Real-Time Data Access

-



High Data Capacity

-



Easy Development



-

Low Structure Maintenance

-



Low Aggregate Maintenance



-

– Best of both worlds

Presentation

• Storing detailed data in RDBs • Storing aggregated data in MDBs

– Different partitioning approachesServer between MOLAP and ROLAP

HOLAP Server MDDB

• Vertical • Horizontal RDBMS Data

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

63

4.3 HOLAP

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

64

4.3 HOLAP

• Vertical partitioning

• Other approaches

– Aggregations are stored in MOLAP for fast query performance, – Detailed data in ROLAP to optimize time of cube processing (loading the data from the OLTP)

– Store some cubes in MOLAP and others in ROLAP, leveraging the fact that in a large cuboid, there will be dense and sparse sub-regions

• Horizontal partitioning – HOLAP stores some slice of data, usually the more recent one (i.e. sliced by Time dimension) in MOLAP for fast query performance – Older data in ROLAP Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

65

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

66

11

4/24/2009

4.3 Conclusions

4.3 OLAP operations

– ROLAP

• How do these operations look like?

• RDBMS - star/snowflake schema

– As queries they can be expressed through query languages as SQL or MDX – The original SQL/92 was not fit for OLAP

– MOLAP • MDBMS - Cube structures, array based storage

– ROLAP or MOLAP

• But SQL99 has extensions for OLAP functions

• Data models used play major role in performance differences

– GROUP BY, CUBE operators

– MOLAP

• But since the subject is more comprising… we will discuss it in the next lecture

• for summarized and relatively “small” volumes of data (50GB)

– ROLAP • for detailed and larger volumes of data (TB)

– HOLAP is emerging as the OLAP server of choice Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

67

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

68

Next lecture • Queries – OLAP query languages – Logical modeling - implementation

Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig

69

12