TThheem miiss E Edduuccaattiioonn
On-site and Public, Instructor-led, Hands-on IT Training
Course Synopsis
DW1010 – Designing the Data Warehouse Duration:
Two (2) or Three (3) days.
Delivery Method:
Instructor led
www www.themisinc.com
.themisinc.com
Background and Overview Extra, unexpected querying of operational data in the high-volume, real-time transaction processing environment can often have a huge negative impact on performance. In addition, the very act of analyzing data will produce new data that the company will need to save. For example, subsets of operational data are created; combinations or aggregations are created; historical data is needed. This data my also be subsetted, aggregated, or combined in still new ways. For the information worker, who often makes ad hoc, “what-if” queries, the operational data cannot be directly modified, even temporarily, It must remain unchanged to support the day-to-day activities of the enterprise. The operational data must be completely current and up-to-the minute. The questions arises of where can you analyze this data and where do you store the new information generated? The answer lies in a new concept called a data warehouse. What Is Data Warehouse?
Brief Description
It is a facility for the extraction, management, enhancement and delivery of informational data in a user friendly environment. The data warehouse: Is a decision support data base containing information data Addresses information needs of tactical and strategic management Usually resides on a relational DBMS Provides easy access to information for business analysis. The goal of the data warehouse is to improve business analysis and decision making. It does so by improving data integration, access and availability. This workshop will focus on the specialized techniques used to model data warehouse data. It provides concrete guidelines and rules for modeling this data Warehouse data is different from production data. It is viewed differently, used different, and accessed differently. It needs to be modeled and optimized differently. This workshop will stress that this is not just an intuitive process based solely on the spontaneous judgment of a skilled analyst. It can be based on sound rules and guidelines just as production data models are. Three fundamental concepts are emphasized throughout the workshop:
Finding the right level of atomic data in the warehouse, That the data warehouse is not a database but an integrated environment consisting of different levels of data, Optimizing the data warehouse design through dimensional modeling.
Central Warehouse The workshop will walk through the process of transforming an operational data model to a central data warehouse model, such as how to:
Remove purely production data Add time and date to the data
Page 1 DW1010.docx
Copyright © Themis Inc, All rights reserved
123 South Ave E, Westfield, NJ 07090 800-756-3000
[email protected]
TThheem miiss E Edduuccaattiioonn
On-site and Public, Instructor-led, Hands-on IT Training
DW1010 – Designing the Data Warehouse
www www.themisinc.com
.themisinc.com
Add derived data Determine the right granularity of data Determine functional dependencies across summary data Separate data based upon stability Create data arrays and fact tables Accommodate levels of summarization Transform relationships to data artifacts The processes for creating informational and operational data bases are significantly different. Most crucial to this effort are the interviewing skills and facilitation techniques brought to the table when meeting the business customer. Facilitation techniques refined over many years of building data warehouse systems will be described to help elicit the important information needed when design for warehouse applications. In any good database design, five factors play a key role:
The number of occurrences of each table
The ratio of one table to another
The queries that use the data
The data accesses made by each query
Brief Description
The load factor for each query (number of times performed). The workshop stresses how these apply to the warehouse optimization. OLAP Design This workshop will take the mystery out of designing OLAP (on-line analytical processing) data through a set of clear steps: Examine warehouse central data model Define end user needs Identify facts Identify dimensions Identify dimension hierarchies Identify levels within dimension hierarchies Define dimension attributes Determine necessary calculations Evaluate complexity and feasibility of schemas (star or snowflake) Determine necessary aggregations to store Apply other optimizations Important considerations during this transformation process are the selection of the proper dimensions, choosing the proper levels of granularity of data, and denormalizing in a manner that supports optimal query performance. A clear statement of query objectives is a prerequisite to successful data warehouse implementation. Warehouse data requires an increasing focus on the date and time dimensions of data. The rules of functional dependency and normalization are differently applied in the data warehouse environment. A crucial part of building the data warehouse is the definition of meta-data requirements and the development of a meta-data repository. A generic version of the repository meta-data model for the data warehouse will be presented. It will define the components of the information warehouse repository and show who then can be mapped to data warehouse tables.
Page 2 DW1010.docx
Copyright © Themis Inc, All rights reserved
123 South Ave E, Westfield, NJ 07090 800-756-3000
[email protected]
TThheem miiss E Edduuccaattiioonn
On-site and Public, Instructor-led, Hands-on IT Training
Course Objectives What You’ll Learn
DW1010 – Designing the Data Warehouse
www www.themisinc.com
.themisinc.com
Among the most important factors you will learn at this seminar will be how to design a dimensional data warehouse, often called OLAP. By contrast, OLTP (on-line transaction processing) data models are normalized so as to reduce update problems. In addition, the focus is on current data, which is two-dimensional. Data warehouse design usually introduces a third dimension, which is that of time. For example, to support trend analysis, it is necessary to show the values of data changing over time (such as, monthly, quarterly, annually). Usually, this requires a degree of denormalization when creating the data model. The seminar will teach how to effectively accomplish this aspect of data modeling, while ensuring a quality data warehouse design. Data re-use is an important issue in the data warehouse. The corporate data model should always be the primary source of borrowing designs because data will be modeled in a standard and consistent manner. However, a single data model for usage across a large organization may sometimes result in a data solution which is too generalized for actual application design. Sometimes it will be more useful to maintain often used entities such as customer, product, organization in the business area model. The definitions and relationships are more specific to the business area under examination. A more specific decision support model may be useful when designing data warehouse applications, although data designs would continue to be re-used wherever possible. The class will raise this issue and show how this can be used to the data modeler’s advantage.
I.
Introduction to Data Warehousing
Topics Covered
Scope and Levels of Modeling Kinds of Data The Framework for Data Modeling Challenges in Data Management Five Major Characteristics of Data Warehouse Data Models – Corporate High Level Detailed Level Operational Decision Support Types and Technologies of Data Warehousing II.
Data Warehouse Methodology Explanation of Methodology Steps Iterative Nature of Development
III. Introduction To Data Modeling Definitions and Components – Entity – Attribute – Relationship – Keys – Derived Data Levels of Data Models Rules for Each Level The High Level Data Model Identifying and Defining Subject Areas
The Detailed Level Data Model Normalization – Functional Dependency – Mathematical Normalization – Natural Normalization – Application to the Data Warehouse IV. Building the Data Warehouse Model Comparison of Operational and Informational Data Case 1: Direct Access to Operational Data Case 2: Using Informational Data Bases A Data Controlled Environment Progression of Data in a Data Controlled Environment Two Types of Data Changes in the Data Warehouse: – Transformations – Optimizations V. Levels of Data In the Enterprise Four Types of Data and Systems The Warehouse and Decision Support Data Model Sources of Warehouse and Decision Data Definition and Rules The Corporate Model The Business Area Model
Page 3 DW1010.docx
Copyright © Themis Inc, All rights reserved
123 South Ave E, Westfield, NJ 07090 800-756-3000
[email protected]
TThheem miiss E Edduuccaattiioonn
On-site and Public, Instructor-led, Hands-on IT Training
DW1010 – Designing the Data Warehouse VI. Derived Data Types of Derived Data Different Rules for Handling During Analysis and Design VII. Modeling Time and History
Short Term and long Term View Four Ways of Handling Time and Date Time-series Data Capturing Business Changes Importance of Representing the Business Time Dimension
Topics Covered Continued
Facilitated Sessions Interviews Information Gathering Techniques – Events – Objectives – Queries – Goals – Decisions – Problems IX. Analyzing Current Systems Data Define Key Data Elements Data Stewards for Each Data Element Key Data Element Business Rules Define Domains and Valid Values Define Valid Ranges for Error Document Key Data Elements on Repository Validate Data Mappings Identify Key Data Elements in Source Systems Map Relationships for Repository
X. Data Transformations
.themisinc.com
Add External Data Techniques For Derived Data Different Levels of Summarization XI. Critical Warehouse Components
Definition of Fact Tables and Dimensions Creating Multidimensional Arrays Developing Fact Tables and Arrays Corporate Reference Tables The Star Database Schema The Snowflake Database Schema Meta-data Repository and Components
XII. Optimizing the Data Warehouse Design
VIII. Information Gathering
www www.themisinc.com
Remove Pure Production Data Add Time and History to the Data Identifier Add Data Derivations Find the Right Atomicity of Data Determine the Functional Dependencies in Summary Data Create Data Arrays and Fact Tables Accommodate Varying Levels of Summarization Add Summary Data Merge Like Data From Different Tables Create Arrays of Data Separate Data Based on its Stability Embed Relationships in the Data
Data Design Compromises Safe Compromises to Data – Merge Like Tables (Eliminate 1:1) – Create Arrays of Data (Violate First Normal Form) – Split Data Based on Stability and Usage (Create 1:1) – Add Indices – Encode-Decode Data Aggressive Compromises to Data – Store Derived Data – Summarize Data – Add Redundant Data – Imbed Relationship Data – Add Redundant Relationships – Add Partial Dependencies (Violate Second Normal Form) – Add Transitive Dependencies (Violate Third Normal Form) Critical Factors in Data Design – Number of Occurrences of Each Table – The Ration of One Table to Another – The Queries that Use the Data – The Data Accesses Made by Each Query – The Load Factor for Each Query (Number of Times Performed). The Steps of Optimization XIII. Data Warehouse Technology Categories of Warehouse Tools Review of Major Products
Page 4 DW1010.docx
Copyright © Themis Inc, All rights reserved
123 South Ave E, Westfield, NJ 07090 800-756-3000
[email protected]
TThheem miiss E Edduuccaattiioonn
On-site and Public, Instructor-led, Hands-on IT Training
DW1010 – Designing the Data Warehouse
Topics Covered Continued
XIV. Important Considerations and Issues
XVII.
www www.themisinc.com
.themisinc.com
Case Studies
Selected Mini-Exercises Complete Group Case Study (Moderately Sized) Complete Individual Case Study (Large)
Denormalization and Performance Archiving and Purging Data Distribution and Replication Change Control Copy Management Alternative Models for Copied Data
XVIII.
Glossary
XV. Managing Data Warehouse Projects Data Warehouse Project Structure Managing Multiple Data Warehouse Projects Data Distribution Issues XVI. Summary and Conclusion Selected Warehouse Projects Critical Success Factors
Page 5 DW1010.docx
Copyright © Themis Inc, All rights reserved
123 South Ave E, Westfield, NJ 07090 800-756-3000
[email protected]