DW1010 Designing the Data Warehouse

TThheem miiss E Edduuccaattiioonn On-site and Public, Instructor-led, Hands-on IT Training Course Synopsis DW1010 – Designing the Data Warehouse Du...
Author: Julius Pitts
1 downloads 0 Views 444KB Size
TThheem miiss E Edduuccaattiioonn

On-site and Public, Instructor-led, Hands-on IT Training

Course Synopsis

DW1010 – Designing the Data Warehouse Duration:

Two (2) or Three (3) days.

Delivery Method:

Instructor led

www www.themisinc.com

.themisinc.com

Background and Overview Extra, unexpected querying of operational data in the high-volume, real-time transaction processing environment can often have a huge negative impact on performance. In addition, the very act of analyzing data will produce new data that the company will need to save. For example, subsets of operational data are created; combinations or aggregations are created; historical data is needed. This data my also be subsetted, aggregated, or combined in still new ways. For the information worker, who often makes ad hoc, “what-if” queries, the operational data cannot be directly modified, even temporarily, It must remain unchanged to support the day-to-day activities of the enterprise. The operational data must be completely current and up-to-the minute. The questions arises of where can you analyze this data and where do you store the new information generated? The answer lies in a new concept called a data warehouse. What Is Data Warehouse?

Brief Description

It is a facility for the extraction, management, enhancement and delivery of informational data in a user friendly environment. The data warehouse:  Is a decision support data base containing information data  Addresses information needs of tactical and strategic management  Usually resides on a relational DBMS  Provides easy access to information for business analysis. The goal of the data warehouse is to improve business analysis and decision making. It does so by improving data integration, access and availability. This workshop will focus on the specialized techniques used to model data warehouse data. It provides concrete guidelines and rules for modeling this data Warehouse data is different from production data. It is viewed differently, used different, and accessed differently. It needs to be modeled and optimized differently. This workshop will stress that this is not just an intuitive process based solely on the spontaneous judgment of a skilled analyst. It can be based on sound rules and guidelines just as production data models are. Three fundamental concepts are emphasized throughout the workshop:   

Finding the right level of atomic data in the warehouse, That the data warehouse is not a database but an integrated environment consisting of different levels of data, Optimizing the data warehouse design through dimensional modeling.

Central Warehouse The workshop will walk through the process of transforming an operational data model to a central data warehouse model, such as how to:  

Remove purely production data Add time and date to the data

Page 1 DW1010.docx

Copyright © Themis Inc, All rights reserved

123 South Ave E, Westfield, NJ 07090 800-756-3000 [email protected]

TThheem miiss E Edduuccaattiioonn

On-site and Public, Instructor-led, Hands-on IT Training

DW1010 – Designing the Data Warehouse

www www.themisinc.com

.themisinc.com

 Add derived data  Determine the right granularity of data  Determine functional dependencies across summary data  Separate data based upon stability  Create data arrays and fact tables  Accommodate levels of summarization  Transform relationships to data artifacts The processes for creating informational and operational data bases are significantly different. Most crucial to this effort are the interviewing skills and facilitation techniques brought to the table when meeting the business customer. Facilitation techniques refined over many years of building data warehouse systems will be described to help elicit the important information needed when design for warehouse applications. In any good database design, five factors play a key role: 

The number of occurrences of each table



The ratio of one table to another



The queries that use the data



The data accesses made by each query

Brief Description

 The load factor for each query (number of times performed). The workshop stresses how these apply to the warehouse optimization. OLAP Design This workshop will take the mystery out of designing OLAP (on-line analytical processing) data through a set of clear steps:  Examine warehouse central data model  Define end user needs  Identify facts  Identify dimensions  Identify dimension hierarchies  Identify levels within dimension hierarchies  Define dimension attributes  Determine necessary calculations  Evaluate complexity and feasibility of schemas (star or snowflake)  Determine necessary aggregations to store  Apply other optimizations Important considerations during this transformation process are the selection of the proper dimensions, choosing the proper levels of granularity of data, and denormalizing in a manner that supports optimal query performance. A clear statement of query objectives is a prerequisite to successful data warehouse implementation. Warehouse data requires an increasing focus on the date and time dimensions of data. The rules of functional dependency and normalization are differently applied in the data warehouse environment. A crucial part of building the data warehouse is the definition of meta-data requirements and the development of a meta-data repository. A generic version of the repository meta-data model for the data warehouse will be presented. It will define the components of the information warehouse repository and show who then can be mapped to data warehouse tables.

Page 2 DW1010.docx

Copyright © Themis Inc, All rights reserved

123 South Ave E, Westfield, NJ 07090 800-756-3000 [email protected]

TThheem miiss E Edduuccaattiioonn

On-site and Public, Instructor-led, Hands-on IT Training

Course Objectives What You’ll Learn

DW1010 – Designing the Data Warehouse

www www.themisinc.com

.themisinc.com

Among the most important factors you will learn at this seminar will be how to design a dimensional data warehouse, often called OLAP. By contrast, OLTP (on-line transaction processing) data models are normalized so as to reduce update problems. In addition, the focus is on current data, which is two-dimensional. Data warehouse design usually introduces a third dimension, which is that of time. For example, to support trend analysis, it is necessary to show the values of data changing over time (such as, monthly, quarterly, annually). Usually, this requires a degree of denormalization when creating the data model. The seminar will teach how to effectively accomplish this aspect of data modeling, while ensuring a quality data warehouse design. Data re-use is an important issue in the data warehouse. The corporate data model should always be the primary source of borrowing designs because data will be modeled in a standard and consistent manner. However, a single data model for usage across a large organization may sometimes result in a data solution which is too generalized for actual application design. Sometimes it will be more useful to maintain often used entities such as customer, product, organization in the business area model. The definitions and relationships are more specific to the business area under examination. A more specific decision support model may be useful when designing data warehouse applications, although data designs would continue to be re-used wherever possible. The class will raise this issue and show how this can be used to the data modeler’s advantage.

I.

Introduction to Data Warehousing

Topics Covered

     

Scope and Levels of Modeling Kinds of Data The Framework for Data Modeling Challenges in Data Management Five Major Characteristics of Data Warehouse Data Models – Corporate  High Level  Detailed Level  Operational  Decision Support  Types and Technologies of Data Warehousing II.

Data Warehouse Methodology  Explanation of Methodology Steps  Iterative Nature of Development

III. Introduction To Data Modeling  Definitions and Components – Entity – Attribute – Relationship – Keys – Derived Data  Levels of Data Models  Rules for Each Level  The High Level Data Model  Identifying and Defining Subject Areas

 The Detailed Level Data Model  Normalization – Functional Dependency – Mathematical Normalization – Natural Normalization – Application to the Data Warehouse IV. Building the Data Warehouse Model  Comparison of Operational and Informational Data  Case 1: Direct Access to Operational Data  Case 2: Using Informational Data Bases  A Data Controlled Environment  Progression of Data in a Data Controlled Environment  Two Types of Data Changes in the Data Warehouse: – Transformations – Optimizations V. Levels of Data In the Enterprise  Four Types of Data and Systems  The Warehouse and Decision Support Data Model  Sources of Warehouse and Decision Data  Definition and Rules  The Corporate Model  The Business Area Model

Page 3 DW1010.docx

Copyright © Themis Inc, All rights reserved

123 South Ave E, Westfield, NJ 07090 800-756-3000 [email protected]

TThheem miiss E Edduuccaattiioonn

On-site and Public, Instructor-led, Hands-on IT Training

DW1010 – Designing the Data Warehouse VI. Derived Data  Types of Derived Data  Different Rules for Handling During Analysis and Design VII. Modeling Time and History     

Short Term and long Term View Four Ways of Handling Time and Date Time-series Data Capturing Business Changes Importance of Representing the Business Time Dimension

Topics Covered Continued

 Facilitated Sessions  Interviews  Information Gathering Techniques – Events – Objectives – Queries – Goals – Decisions – Problems IX. Analyzing Current Systems Data Define Key Data Elements Data Stewards for Each Data Element Key Data Element Business Rules Define Domains and Valid Values Define Valid Ranges for Error Document Key Data Elements on Repository Validate Data Mappings Identify Key Data Elements in Source Systems Map Relationships for Repository

X. Data Transformations            

.themisinc.com

 Add External Data  Techniques For Derived Data  Different Levels of Summarization XI. Critical Warehouse Components       

Definition of Fact Tables and Dimensions Creating Multidimensional Arrays Developing Fact Tables and Arrays Corporate Reference Tables The Star Database Schema The Snowflake Database Schema Meta-data Repository and Components

XII. Optimizing the Data Warehouse Design

VIII. Information Gathering

        

www www.themisinc.com

Remove Pure Production Data Add Time and History to the Data Identifier Add Data Derivations Find the Right Atomicity of Data Determine the Functional Dependencies in Summary Data Create Data Arrays and Fact Tables Accommodate Varying Levels of Summarization Add Summary Data Merge Like Data From Different Tables Create Arrays of Data Separate Data Based on its Stability Embed Relationships in the Data

 Data Design Compromises  Safe Compromises to Data – Merge Like Tables (Eliminate 1:1) – Create Arrays of Data (Violate First Normal Form) – Split Data Based on Stability and Usage (Create 1:1) – Add Indices – Encode-Decode Data  Aggressive Compromises to Data – Store Derived Data – Summarize Data – Add Redundant Data – Imbed Relationship Data – Add Redundant Relationships – Add Partial Dependencies (Violate Second Normal Form) – Add Transitive Dependencies (Violate Third Normal Form)  Critical Factors in Data Design – Number of Occurrences of Each Table – The Ration of One Table to Another – The Queries that Use the Data – The Data Accesses Made by Each Query – The Load Factor for Each Query (Number of Times Performed).  The Steps of Optimization XIII. Data Warehouse Technology  Categories of Warehouse Tools  Review of Major Products

Page 4 DW1010.docx

Copyright © Themis Inc, All rights reserved

123 South Ave E, Westfield, NJ 07090 800-756-3000 [email protected]

TThheem miiss E Edduuccaattiioonn

On-site and Public, Instructor-led, Hands-on IT Training

DW1010 – Designing the Data Warehouse

Topics Covered Continued

XIV. Important Considerations and Issues      

XVII.

www www.themisinc.com

.themisinc.com

Case Studies

 Selected Mini-Exercises  Complete Group Case Study (Moderately Sized)  Complete Individual Case Study (Large)

Denormalization and Performance Archiving and Purging Data Distribution and Replication Change Control Copy Management Alternative Models for Copied Data

XVIII.

Glossary

XV. Managing Data Warehouse Projects  Data Warehouse Project Structure  Managing Multiple Data Warehouse Projects  Data Distribution Issues XVI. Summary and Conclusion  Selected Warehouse Projects  Critical Success Factors

Page 5 DW1010.docx

Copyright © Themis Inc, All rights reserved

123 South Ave E, Westfield, NJ 07090 800-756-3000 [email protected]