Introduction: Why Is Data Integration Important?
Part 1
Overview of Data Integration
Chapter 1
Types of Data Integration
Data Integration Architectural Patterns Enterprise Application Integration (EAI) Service-Oriented Architecture (SOA) Federation Extract, Transform, Load (ETL) Common Data Integration Functionality Summary End-of-Chapter Questions
Chapter 2
An Architecture for Data Integration
What Is Reference Architecture? Reference Architecture for Data Integration Objectives of the Data Integration Reference Architecture The Data Subject Area-Based Component Design Approach A Scalable Architecture Purposes of the Data Integration Reference Architecture The Layers of the Data Integration Architecture Extract/Subscribe Processes Data Integration Guiding Principle: "Read Once, Write Many" Data Integration Guiding Principle: "Grab Everything" Initial Staging Landing Zone
Contents
Data Quality Processes What Is Data Quality? Causes of Poor Data Quality Data Quality Check Points Where to Perform a Data Quality Check Clean Staging Landing Zone Transform Processes Conforming Transform Types Calculations and Splits Transform Types Processing and Enrichment Transform Types Target Filters Transform Types Load-Ready Publish Landing Zone Load/Publish Processes Physical Load Architectures An Overall Data Architecture Summary End-of-Chapter Questions
The Business Case for a New Design Process Improving the Development Process Leveraging Process Modeling for Data Integration Overview of Data Integration Modeling Modeling to the Data Integration Architecture Data Integration Models within the SDLC Structuring Models on the Reference Architecture Conceptual Data Integration Models Logical Data Integration Models High-Level Logical Data Integration Model Logical Extraction Data Integration Models Logical Data Quality Data Integration Models Logical Transform Data Integration Models Logical Load Data Integration Models Physical Data Integration Models Converting Logical Data Integration Models to Physical Data Integration Models Target-Based Data Integration Design Technique Overview Physical Source System Data Integration Models Physical Common Component Data Integration Models Physical Subject Area Load Data Integration Models Logical Versus Physical Data Integration Models Tools for Developing Data Integration Models Industry-Based Data Integration Models Summary End-of-Chapter Questions
Case Study Overview Step 1: Build a Conceptual Data Integration Model Step 2: Build a High-Level Logical Model Data Integration Model Step 3: Build the Logical Extract DI Models Confirm the Subject Area Focus from the Data Mapping Document Review Whether the Existing Data Integration Environment Can Fulfill the Requirements Determine the Business Extraction Rules Control File Check Processing Complete the Logical Extract Data Integration Models Final Thoughts on Designing a Logical Extract DI Model Step 4: Define a Logical Data Quality DI Model Design a Logical Data Quality Data Integration Model Identify Technical and Business Data Quality Criteria Determine Absolute and Optional Data Quality Criteria Step 5: Define the Logical Transform DI Model Step 6: Define the Logical Load DI Model Step 7: Determine the Physicalization Strategy Step 8: Convert the Logical Extract Models into Physical Source System Extract DI Models Step 9: Refine the Logical Load Models into Physical Source System Subject Area Load DI Models Step 10: Package the Enterprise Business Rules into Common Component Models Step 11: Sequence the Physical DI Models Summary
Part 2
The Data Integration Systems Development Life Cycle
Chapter 5
Data Integration Analysis
Analyzing Data Integration Requirements Building a Conceptual Data Integration Model Key Conceptual Data Integration Modeling Task Steps Why Is Source System Data Discovery So Difficult? Performing Source System Data Profiling Overview of Data Profiling Key Source System Data Profiling Task Steps Reviewing/Assessing Source Data Quality Validation Checks to Assess the Data Key Review/Assess Source Data Quality Task Steps
Performing Source/Target Data Mappings Overview of Data Mapping Types of Data Mapping Key Source/Target Data Mapping Task Steps Summary End-of-Chapter Questions
Chapter 6
Data Integration Analysis Case Study
Case Study Overview Envisioned Wheeler Data Warehouse Environment Aggregations in a Data Warehouse Environment Data Integration Analysis Phase Step 1: Build a Conceptual Data Integration Model Step 2: Perform Source System Data Profiling Step 3: Review/Assess Source Data Quality Step 4: Perform Source/Target Data Mappings Summary
Chapter 7
Data Integration Logical Design
Determining High-Level Data Volumetrics Extract Sizing Disk Space Sizing File Size Impacts Component Design Key Data Integration Volumetrics Task Steps Establishing a Data Integration Architecture Identifying Data Quality Criteria Examples of Data Quality Criteria from a Target Key Data Quality Criteria Identification Task Steps Creating Logical Data Integration Models Key Logical Data Integration Model Task Steps Defining One-Time Data Conversion Load Logical Design Designing a History Conversion One-Time History Data Conversion Task Steps Summary End-of-Chapter Questions
Chapter 8
Data Integration Logical Design Case Study
Step 1: Determine High-Level Data Volumetrics Step 2: Establish the Data Integration Architecture Step 3: Identify Data Quality Criteria Step 4: Create Logical Data Integration Models Define the High-Level Logical Data Integration Model Define the Logical Extraction Data Integration Model
Define the Logical Data Quality Data Integration Model Define Logical Transform Data Integration Model Define Logical Load Data Integration Model Define Logical Data Mart Data Integration Model Develop the History Conversion Design Summary
Chapter 9
Data Integration Physical Design
Creating Component-Based Physical Designs Reviewing the Rationale for a Component-Based Design Modularity Design Principles Key Component-Based Physical Designs Creation Task Steps Preparing the DI Development Environment Key Data Integration Development Environment Preparation Task Steps Creating Physical Data Integration Models Point-to-Point Application Development—The Evolution of Data Integration Development The High-Level Logical Data Integration Model in Physical Design Design Physical Common Components Data Integration Models Design Physical Source System Extract Data Integration Models Design Physical Subject Area Load Data Integration Models Designing Parallelism into the Data Integration Models Types of Data Integration Parallel Processing Other Parallel Processing Design Considerations Parallel Processing Pitfalls Key Parallelism Design Task Steps Designing Change Data Capture Append Change Data Capture Design Complexities Key Change Data Capture Design Task Steps Finalizing the History Conversion Design From Hypothesis to Fact Finalize History Data Conversion Design Task Steps Defining Data Integration Operational Requirements Determining a Job Schedule for the Data Integration Jobs Determining a Production Support Team Key Data Integration Operational Requirements Task Steps Designing Data Integration Components for SOA Leveraging Traditional Data Integration Processes as SOA Services Appropriate Data Integration Job Types Key Data Integration Design for SOA Task Steps Summary End-of-Chapter Questions
Step 1: Create Physical Data Integration Models Instantiating the Logical Data Integration Models into a Data Integration Package Step 2: Find Opportunities to Tune through Parallel Processing Step 3: Complete Wheeler History Conversion Design Step 4: Define Data Integration Operational Requirements Developing a Job Schedule for Wheeler The Wheeler Monthly Job Schedule The Wheeler Monthly Job Flow Process Step 1: Preparation for the EDW Load Processing Process Step 2: Source System to Subject Area File Processing Process Step 3: Subject Area Files to EDW Load Processing Process Step 4: EDW-to-Product Line Profitability Data Mart Load Processing Production Support Staffing Summary
Chapter 11
Data Integration Development Cycle
Performing General Data Integration Development Activities Data Integration Development Standards Error-Handling Requirements Naming Standards Key General Development Task Steps Prototyping a Set of Data Integration Functionality The Rationale for Prototyping Benefits of Prototyping Prototyping Example Key Data Integration Prototyping Task Steps Completing/Extending Data Integration Job Code Complete/Extend Common Component Data Integration Jobs Complete/Extend the Source System Extract Data Integration Jobs Complete/Extend the Subject Area Load Data Integration Jobs Performing Data Integration Testing Data Warehousing Testing Overview Types of Data Warehousing Testing Perform Data Warehouse Unit Testing Perform Data Warehouse Integration Testing Perform Data Warehouse System and Performance Testing Perform Data Warehouse User Acceptance Testing The Role of Configuration Management in Data Integration What Is Configuration Management? Data Integration Version Control Data Integration Software Promotion Life Cycle Summary End-of-Chapter Questions
Step 1: Prototype the Common Customer Key Step 2: Develop User Test Cases Domestic OM Source System Extract Job Unit Test Case Summary
Part 3
Data Integration with Other Information Management Disciplines
Chapter 13
Data Integration and Data Governance
What Is Data Governance? Why Is Data Governance Important? Components of Data Governance Foundational Data Governance Processes Data Governance Organizational Structure Data Stewardship Processes Data Governance Functions in Data Warehousing Compliance in Data Governance Data Governance Change Management Summary End-of-Chapter Questions
Chapter 14
Metadata
What Is Metadata? The Role of Metadata in Data Integration Categories of Metadata Business Metadata Structural Metadata Navigational Metadata Analytic Metadata Operational Metadata Metadata as Part of a Reference Architecture Metadata Users Managing Metadata The Importance of Metadata Management in Data Governance Metadata Environment Current State Metadata Management Plan Metadata Management Life Cycle Summary End-of-Chapter Questions
The Data Quality Framework Key Data Quality Elements The Technical Data Quality Dimension The Business-Process Data Quality Dimension Types of Data Quality Processes The Data Quality Life Cycle The Define Phase Defining the Data Quality Scope Identifying/Defining the Data Quality Elements Developing Preventive Data Quality Processes The Audit Phase Developing a Data Quality Measurement Process Developing Data Quality Reports Auditing Data Quality by LOB or Subject Area The Renovate Phase Data Quality Assessment and Remediation Projects Data Quality SWAT Renovation Projects Data Quality Programs Final Thoughts on Data Quality Summary End-of-Chapter Questions
Write Once, Read Many Grab Everything Data Quality before Transforms Transformation Componentization Where to Perform Aggregations and Calculations Data Integration Environment Volumetric Sizing Subject Area Volumetric Sizing
Appendix C Glossary
Appendix D Case Study Models
Appendix D is an online-only appendix. Print-book readers can download the appendix at www.ibmpressbooks.com/title/9780137084937. For eBook editions, the appendix is included in the book.