4/24/2009
4. Queries 4. Queries 4.1 Query processing 4.2 Queries in DW / OLAP 4.3 Physical Modeling
Data Warehousing & Data Mining Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
4.1Query processing
4.1 Query processing
• Queries are posed to the DBMS and processed before the actual evaluation Query Processor
Data
Query Evaluation Engine
Storage Manager
• How queries are answered – Queries are usually stated in a high level declarative language as SQL
Applications Programs Object Code
• For relational DB it can be mapped to relational algebra (RA)
Embedded DML Precompiler
– For evaluation it has to be translated to a low level execution plan
DML Compiler
• Expressions that can be used at physical level of the file system – E.g., for RDB physical relational algebra
DDL Interpreter
3
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
4.1Query processing Query
Parser & Translator
2
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
4
4.1 Query processing • Parsing and translation
Relational Algebra Expression
– Queries need to be translated • A scanner tokenizes the query Query Optimizer
Query Result
Evaluation Engine
Execution Plan
– DB language keywords, table names, attribute names, etc.
• The parser checks syntax and verifies relations, attributes, data types, etc. • Translate the query into its internal form
Access Paths
– Translated into relational algebra
Statistics Data
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
5
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
6
1
4/24/2009
4.1 Query processing
4.1 Query processing
• Optimization
• Evaluation
– Several relational algebra expressions might lead to the same result
– The query-execution engine takes a query-evaluation plan, executes it, and returns the answers to the query – For the result of each operator a temporary file has to be created
• But different statements might also result in very different performance
– Query optimization is the heart of every database kernel • Finding optimal plans may cost too much, but avoid crappy plans by all means
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
• Temporary files can be input for other operators • Storing the temporary files on the disk is expensive, but necessary if DB buffer is small
7
4.2 DW Queries
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
8
4.2 DW Queries
• DW queries are big queries
• OLAP usage fields
– Imply a large portion of the data – Read only queries
– Management Information • Sales per product group / area / year
• no Updates
– Government
• Redundancy a necessity – Materialized Views, special-purpose indexes, de-normalized schemas
• Population census
– Scientific databases • Geo-, Bio-Informatics
• Data is refreshed periodically
– Etc.
– E.g., Daily or weekly
• Their purpose is to analyze data
• Goal: Response Time of seconds / few minutes
– OLAP (OnLine Analytical Processing) Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
9
4.2 Why use DW
10
4.2 OLTP vs. OLAP
• ODS can also run analytical queries…but they are not so good at it • OLTP and OLAP are to each other as Water and Oil
• Tuning, Optimization – OLTP - inter-query parallelism, heuristic optimization – OLAP - intra-query parallelism, full-fledged optimization
• Freshness of Data
– Lock Conflicts: OLAP blocks OLTP
– OLTP - serializability – OLAP - reproducability
• E.g., an OLAP query can block the sales activity of all the stores trying to update the DB
• Precision
– Database design:
– OLTP - ACID – OLAP - Sampling, Confidence Intervals
• OLTP - normalized, • OLAP - de-normalized Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
11
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
12
2
4/24/2009
4.2 Why use DW
4.2 Typical analytical requests • Comparisons
• The solution is to run OLTP and OLAP separately • DW is a special sandbox for OLAP
– Show me the sales per region for this year and compare it to that of the previous year to identify discrepancies
• Multidimensional ratios – Show me the contribution to weekly profit made by all items sold in the northeast stores between 1st of May and 7th of May
– As input it uses OLTP systems – DW aggregates and replicates data
• Ranking and statistical profile
• Special schema
– New data is periodically uploaded to the Warehouse – Old data is deleted from Warehouse – Archiving done by OLTP system for legal reasons Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
13
4.2 Typical queries
– Show me sales, profit and average call volume per day for my 10 most profitable sales-people
• Custom consolidation – Show me the income statement by quarter for the last four quarters for my northeast region operations
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
14
4.2 Codd’s OLAP rules
• OLAP queries
• How do we differentiate between OLAP and nonOLAP products? - OLAP rules • Published in a controversial white paper
SELECT d1.x, d2.y, d3.z, sum(f.t1), avg(f.t2) FROM Fact f, Dim1 d1, Dim2 d2, Dim3 d3 WHERE a < d1.field < b AND d2.field = c GROUP BY d1.x, d2.y, d3.z;
– “Providing OLAP to the User-Analysts: An IT Mandate. (Arbor Software, 1993)”
• The idea is to – Select by Attributes of Dimensions
• Dr. Codd was accused that he allowed his name to be used, but did not put to much work into it Hyperions
• E.g., region = „Europe“
– Group by Attributes of Dimensions • E.g., region, month, quarter
ORACLE
Solutions
– Aggregate on measures
Arbor Software
• E.g., sum(price * volume) Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
15
16
4.2 Basic features
4.2 Codd’s OLAP rules • Rules organization
• 1) Multidimensional Conceptual View
– 12 rules + 6 (extension rules) added in 1995 – 4 feature groups • • • •
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
– Data is viewed in multidimensional form in a matrix. An enterprise becomes multidimensional • E.g., profits could be viewed by region, product, time period or scenario (actual budget, forecasts, etc.)
Basic features Special features Reporting features Dimension control
– Advantages • Multidimensional models enable more straight-forward manipulation of data – E.g., slice, dice, etc.
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
17
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
18
3
4/24/2009
4.2 Basic features
4.2 Basic features
• 2) Intuitive data manipulation
• 3) Accessibility: OLAP as a Mediator
– Existence of a GUI with drag-and-drop feature and other graphical facilities – Intuition is a vague term…
– Middleware between heterogeneous data sources and OLAP front end
• 4) Batch extraction vs. Interpretative Extraction – OLAP has to have its own staging database – It should also allow live access to external data – Similar to what HOLAP is today
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
19
4.2 Basic features
• 5) OLAP analysis models
– Categorical: typical descriptive statistics
– Formulaic: goal seeking models
• Comparison of historical values
• You know the outcome you want but you don’t know how to get there • The model keeps changing parameters and doing contemplations until it gets the desired result or proves it is impossible
– Exegetical: what we have been doing with spreadsheets (slice, dice, drill down) • Discovering reasons for what we found through the categorical model
– E.g., How can I increase the sales of bikinis in the Alaska store? The outcome can be: » Many solutions… » No solutions: Bikini sales in Alaska are doomed to failure » Unacceptable solutions: Close down all but the Alaska store
– Contemplative: what if analysis • E.g., What is the effect of closing the Alaska store, to the company
21
4.2 Basic features
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
22
4.2 Special features
• 6) Client/Server Architecture
• Special features
– Allow users to share data easily
– 9) Treatment of non-normalized data
• 7) Transparency
• Can load data also from non-RDBMS sources
– The client should not have to be aware of how connections to the OLAP engine or other data sources is made
– 10) Store OLAP results • OLAP data is expensive • Reconstructing it over and over from the live data is not a good idea
• 8) Multiuser support
– OLAP DB is a snapshot of the state of the data sources
– OLAP is read-only therefore no need for transaction control – New OLAP systems allow data query while data is being streamed from external sources Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
20
4.2 Basic features
• 5) OLAP analysis models
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
23
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
24
4
4/24/2009
4.2 Special features
4.2 Reporting features • 13) Flexible reporting
• 11) Extraction of missing values
– Vague
– 2 kinds of missing values
• 14) Uniform reporting performance
• NULL as in SQL meaning we don’t know the value of the attribute • Missing value meaning that the attribute will never have a value for that entity
– Codd required that reporting performance would not be significantly degraded by increasing the number of dimensions or database size – Sounds more like a goal then a rule
• 12) Treatment of missing values
• 15) Automatic adjustment of physical level
• All missing values are ignored by the OLAP analyzer, regardless of their source
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
– OLAP systems adjust its physical storage automatically – Dynamical adjusted HOLAP 25
4.2 Dimension control
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
26
4.2 Codd’s OLAP rules
• Dimension control
• Codd’s OLAP rules turned out not to be a success • Other attempts to define OLAP and offer OLAP guides were made by
– 16) Generic dimensionality • Each dimension must be equivalent in both its structure and operational capabilities • Controversial rule
– 17) Unlimited dimensions and aggregation levels
– The OLAP council – Analytical Solutions Forum – OLAP Solutions
• Unlimited…is physically impossible so we should settle with a large number – E.g., it should support at least 15 to 20 dimensions
– 18) Unrestricted cross-dimensional operations
• Nigel Pendse’s FASMI test
• Operation is not the same as calculation – E.g. “What is Friday divided by red?” but operation on mixed data is possible “How many red shirts did we sell on Friday?” Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
27
4.2 FASMI
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
28
4.2 FASMI
• The FASMI test:
• The FASMI test:
– Fast
– Multidimensional
• System is targeted to respond users within ~ 5 seconds
• The most important factor • Should support multidimensional conceptual views • Full support for hierarchies
– Complex analysis should take no longer than 20 second
• This can be achieved with exotic hardware and lots of precalculated scenarios
– Analysis • The system can cope with any business logic and statistical analysis that is relevant for the application and the user, and keep it easy enough for the target user
– Shared
– Information • All the data and derived information (meta-data), needed • The question is how much input data they can handle not how much GB they use to store it
• The system should implement security requirements • Not all OLAP products are read-only Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
29
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
30
5
4/24/2009
4.2 OLAP operations
4.2 Roll-up
• Typical OLAP operations
• Roll-up (drill-up)
– Roll-up – Drill-down – Slice and dice – Pivot (rotate)
– Taking the current aggregation level of fact values and doing a further aggregation – Summarize data by • Climbing up hierarchy • Or by dimensional reduction • A mix of these 2 techniques
• Other operations – Aggregate functions – Ranking and comparing – Drill-across – Drill-through
– Used for obtaining an increased generalization • E.g., from Time.Week to Time.Year
31
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
4.2 Roll-up
4.2 Roll-up
• Roll-up operations can be classified into
• Hierarchical roll-ups
– Dimensional roll-ups
– are done on the fact table and some dimension tables by climbing up the attribute hierarchies of dimensions whose hierarchies are used and having at least one attribute of each dimension
• Are done solely on the fact table by dropping one or more dimensions, where the dimensions retained are represented by their keys (basic attributes of the attribute hierarchy) – E.g., drop Client dimension Country
Region
Prod. Categ
City
Prod. Group
Store
Country
Article
Sales Turnover
Week Year
Prod. Categ
Year 33
4.2 Roll-up
Clothes …
Video recorder
Camcorder
Article
Quarter
Month
Sales Turnover
Day
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
34
TV
– Reverse of Roll-up – Represents a de-aggregate operation • From higher level of summary to lower level of summary – detailed data
– Introducing new dimensions – Requires the existence of materialized finer grained data
ALL
Electronics
…
Prod. Group
Store
• Drill-down (roll-down)
– In an ultimate case, hierarchical roll-up above the top level of an attribute hierarchy (attribute “ALL”) can be viewed as converting to a dimensional roll-up
Audio
Prod. Family
City
4.2 Drill-down
• Climbing above the top
Video
District
Day
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
Category
Region
Week
Month
Quarter
• E.g., climbed the Time hierarchy to Quarter and Article Client hierarchy to Prod. group
Client
District
Prod. Family
32
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
Prod. Family
…
• You can’t drill if you don’t have the data Prod. Group
…
TR-34
TS-56
Article
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
35
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
36
6
4/24/2009
4.2 Slice
4.2 Roll-up drill-down example € by bar/drinker Jim
Bob
Mary
Joe’s
45
33
30
Salitos
50
36
42
Roots
38
31
40
• Slice
€ by drinker
Roll-up by BAR
Jim
Bob
Mary
133
100
112
– Reducing the number of dimensions by taking a projection of facts on a proper subset of dimensions for some selected values of dimensions that are being dropped
Drill-down by brand € by brand/drinker
Wolters
Jim
Bob
Mary
48
40
40
Becks
45
31
37
Krombacher
40
29
35
πStoreId, TimeId, Ammount (σArticleId = LaptopId(Sales))
37
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
4.2 Slice
38
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
4.2 Dice
– Amounts to equality select condition – WHERE clause in SQL
• Dice – Amounts to range select condition on one dimension, or to equality select condition on more than one dimension
• E.g., slice Laptops Product
• E.g., Range SELECT 818
Product
πStoreId, TimeId, Amount (σArticleId ∈ {Laptop, CellP}(Sales))
Laptops CellP.
Time
818
Laptops CellP.
13.11.2008 18.12.2008
Time 13.11.2008 18.12.2008
Geography Geography Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
39
4.2 Dice
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
40
4.2 Pivot • Pivot (rotate)
• E.g., Equality SELECT on 2 dimensions Product and Time
– Refers to re-arranging data for viewing purposes • E.g., display cities down the pages and products across a page
πStoreId, Amount (σArticleId = Laptop ∧ MonthID = December(Sales))
– The simplest view of pivoting is that it selects two dimensions to aggregate the measure
Product 818
• The aggregated values are often displayed in a grid where each point in the (x, y) coordinate system corresponds to an aggregated value of the measure • The x and y coordinate values are the values of the selected two dimensions
Laptops CellP.
Time January December
– The result of pivoting is also called cross–tabulation
Geography Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
41
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
42
7
4/24/2009
4.2 Pivot
4.2 Pivot • Pivoting on City and Day
• Consider pivoting the following data
Mon Auckland
Location CityId
City
1
Well
2
Nels
3
Auck
Sales
Time
Tue
Wed
Thu
Fri
Sat
San
60
0
140
0
880
0
Nelson
550
0
0
0
100
0
0
650
540
300
0
0
0
0
50
890
1150
360
140
100
880
50
2680
TimId
Day
Wellington
230
1
Mon
SubTotal
300
2
Tue
8
310
3
Wed
7
50
4
Thu
Mon
60
550
540
1150
3
1
550
5
Fri
Tue
60
0
300
360
2
3
5
100
6
Sat
Wed
0
0
0
0
3
4
6
880
Thu
140
0
0
140
3
5
1
60
7
San
Fri
0
100
0
100
3
5
2
60
8
Mon
Sat
880
0
0
880
3
5
4
140
San
0
0
50
50
1140
650
890
2680
CityId
PerId
TimId
1
1
1
1
1
2
1
1
1
2
2
Amnt
0 Auck
SubTotal Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
43
Nels
Well
1140
SubTotal
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
4.2 Typical analytical requests
44
4.2 OLAP data visualization
• Analytical requests are hard to express
• How do these operations look like for the user?
– Most analysts and decision makers f.region, z.month, sum(a.price * a.volume) won’t enjoy it SELECT FROM Order a, Time z, PoS f
– E.g., Crystal Decisions OLAP software • 2 dimensions … is trivial • E.g., Products by Store
WHERE a.pos = f.name AND a.date = z.date GROUP BY f.region, z.month
– But wait…there are solutions
Product dimension
Store dimension
• OLAP clients allow operations to be performed through GUIs
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
SubTotal
60
45
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
4.2 OLAP data visualization
46
4.2 OLAP data visualization
• 3 dimensions
• More dimensions are difficult to represent
– We can visualize sold quantity on 3 dimensions as layers
– If we introduce Time dimension, a data cell could be represented by its 4 dimensions as follows: • • • •
Product dimension
Store dimension
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
47
Abc from Supplier dimension Batteries from Products dimension Uptown from Store dimension And Monday from Time dimension
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
48
8
4/24/2009
4.2 OLAP data visualization
4.2 OLAP data visualization
• OLAP products represent 3 and more dimensional data reducing it to a 2D layout
• Another way is by nesting on the same axis
– By picking values of the dimensions which can not be displayed • E.g., Display the number sold of Products by any of the Stores on Monday
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
49
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
4.2 OLAP data visualization
4.2 OLAP data visualization
• OLAP reporting has to be very flexible
• Drill-down operation
– The IBM way of an OLAP web based report
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
50
– Can be performed easy by going down on the hierarchy and choosing the granularity
51
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
52
4.3 Physical models
4.2 OLAP data visualization • Trends Visualization
• We have seen how it looks at the user level and on the conceptual side • But…how do operations translate from user level downwards?
– With the help of charts
– Well…it depends on the physical models used T i m e
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
53
• • • •
DOLAP (Desktop OLAP) MOLAP (Multidimensional OLAP) ROLAP (Relational OLAP) HOLAP (Hybrid OLAP)
M R H D O O L L A A P P
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
54
9
4/24/2009
4.3 Physical models
4.3 Physical models Client
• DOLAP
• MOLAP
– Developed as extension to the production system reports – The idea behind is • It downloads a small hypercube from a central point (data mart or DW) • Performs multidimensional analysis while disconnected from the data source • The computation occurs on the client
• Computation occurs in this layer during the loading step (not at query)
– Requires little investment – Most are limited to a single user – They lack the ability to manage large data sets Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
55
4.3 MOLAP
Presentation
Server MOLAP Interface
Data
MDB
56
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
4.3 MOLAP
• Advantages
• All calculations are pre-generated when the cube is created
– Excellent performance • • • •
– Presentation layer provides the multidimensional view – The OLAP server stores data in a multidimensional structure
Fast data retrieval Optimal for slicing and dicing Complex calculations All calculations are pre-generated when the cube is created
all
time
item
0-D(apex) cuboid location
supplier 1-D cuboids
time,item
time,location
item,location
location,supplier 2-D cuboids
time,supplier
time,item,location
item,supplier
time,location,supplier 3-D cuboids time,item,supplier
item,location,supplier
time, item, location, supplier Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
57
4.3 MOLAP
58
4.3 MOLAP
• Disadvantages
• Things to consider when choosing MOLAP
– Limited amount of data it can handle
– MOLAP tools traditionally have difficulty querying models with dimensions with very high cardinality (i.e., millions of members) – Some MOLAP products have difficulty updating and querying models with more than 10 dimensions
• Cube can be derived from large amount of data, but only summary level information will be included in the cube
– Requires additional investment • Cube technology are often proprietary
– Enormous amount of overhead
• It depends on
• An input file of 200 MB can expand to 5 GB with calculations
– the complexity and cardinality of the dimensions in question – the number of facts or measures stored
• Products: – Cognos (IBM), Essbase (Oracle), Microsoft Analysis Service, Palo (open source) Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
4-D(base) cuboid
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
– Other MOLAP products can handle hundreds of dimensions 59
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
60
10
4/24/2009
4.3 ROLAP
4.3 ROLAP Client
• Special schema design: e.g., star, snowflake • Special indexes: e.g., bitmap, R-Trees • Advantages
• ROLAP – Presentation layer provides the multidimensional view – The ROLAP Server generates SQL queries, from the OLAP OLAP requests, to query the RDBMS – Data is stored in RDBs
Presentation
Server
– Proven technology (relational model, DBMS) – Can handle large amounts of data (VLDBs)
ROLAP Server
• Disadvantages – Limited SQL functionalities
• Products
RDBMS
– Microsoft Analysis Service, Siebel Analytics (now Oracle BI), Micro Strategy, Mondrian (open source)
Data 61
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
4.3 ROLAP vs. MOLAP
4.3 HOLAP
• Based on OLAP needs OLAP needs
User Benefits
MIS Benefits
62
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
• HOLAP MOLAP
ROLAP
Multidimensional View
√
√
Excellent Performance
√
-
Analytical Flexibility
√
-
Real-Time Data Access
-
√
High Data Capacity
-
√
Easy Development
√
-
Low Structure Maintenance
-
√
Low Aggregate Maintenance
√
-
– Best of both worlds
Presentation
• Storing detailed data in RDBs • Storing aggregated data in MDBs
– Different partitioning approachesServer between MOLAP and ROLAP
HOLAP Server MDDB
• Vertical • Horizontal RDBMS Data
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
63
4.3 HOLAP
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
64
4.3 HOLAP
• Vertical partitioning
• Other approaches
– Aggregations are stored in MOLAP for fast query performance, – Detailed data in ROLAP to optimize time of cube processing (loading the data from the OLTP)
– Store some cubes in MOLAP and others in ROLAP, leveraging the fact that in a large cuboid, there will be dense and sparse sub-regions
• Horizontal partitioning – HOLAP stores some slice of data, usually the more recent one (i.e. sliced by Time dimension) in MOLAP for fast query performance – Older data in ROLAP Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
65
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
66
11
4/24/2009
4.3 Conclusions
4.3 OLAP operations
– ROLAP
• How do these operations look like?
• RDBMS - star/snowflake schema
– As queries they can be expressed through query languages as SQL or MDX – The original SQL/92 was not fit for OLAP
– MOLAP • MDBMS - Cube structures, array based storage
– ROLAP or MOLAP
• But SQL99 has extensions for OLAP functions
• Data models used play major role in performance differences
– GROUP BY, CUBE operators
– MOLAP
• But since the subject is more comprising… we will discuss it in the next lecture
• for summarized and relatively “small” volumes of data (50GB)
– ROLAP • for detailed and larger volumes of data (TB)
– HOLAP is emerging as the OLAP server of choice Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
67
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
68
Next lecture • Queries – OLAP query languages – Logical modeling - implementation
Data Warehousing & OLAP – Wolf-Tilo Balke – Institut für Informationssysteme – TU Braunschweig
69
12