Author: Spencer Lawson
Contents

Preface

Acknowledgments

About the Author

Introduction: Why Is Data Integration Important?

Part 1

Overview of Data Integration

Chapter 1

Types of Data Integration

Data Integration Architectural Patterns
Enterprise Application Integration (EAI)
Service-Oriented Architecture (SOA)
Federation
Extract, Transform, Load (ETL)
Common Data Integration Functionality
Summary
End-of-Chapter Questions

Chapter 2

An Architecture for Data Integration

What Is Reference Architecture?
Reference Architecture for Data Integration
Objectives of the Data Integration Reference Architecture
The Data Subject Area-Based Component Design Approach
A Scalable Architecture
Purposes of the Data Integration Reference Architecture
The Layers of the Data Integration Architecture
Extract/Subscribe Processes
Data Integration Guiding Principle: "Read Once, Write Many"
Data Integration Guiding Principle: "Grab Everything"
Initial Staging Landing Zone

Data Quality Processes
What Is Data Quality?
Causes of Poor Data Quality
Data Quality Check Points
Where to Perform a Data Quality Check
Clean Staging Landing Zone
Transform Processes
Conforming Transform Types
Calculations and Splits Transform Types
Processing and Enrichment Transform Types
Target Filters Transform Types
Load-Ready Publish Landing Zone
Load/Publish Processes
Physical Load Architectures
An Overall Data Architecture
Summary
End-of-Chapter Questions

Chapter 3

A Design Technique: Data Integration Modeling

The Business Case for a New Design Process
Improving the Development Process
Leveraging Process Modeling for Data Integration
Overview of Data Integration Modeling
Modeling to the Data Integration Architecture
Data Integration Models within the SDLC
Structuring Models on the Reference Architecture
Conceptual Data Integration Models
Logical Data Integration Models
High-Level Logical Data Integration Model
Logical Extraction Data Integration Models
Logical Data Quality Data Integration Models
Logical Transform Data Integration Models
Logical Load Data Integration Models
Physical Data Integration Models
Converting Logical Data Integration Models to Physical Data Integration Models
Target-Based Data Integration Design Technique Overview
Physical Source System Data Integration Models
Physical Common Component Data Integration Models
Physical Subject Area Load Data Integration Models
Logical Versus Physical Data Integration Models
Tools for Developing Data Integration Models
Industry-Based Data Integration Models
Summary
End-of-Chapter Questions

Chapter 4

Case Study: Customer Loan Data Warehouse Project

Case Study Overview
Step 1: Build a Conceptual Data Integration Model
Step 2: Build a High-Level Logical Model Data Integration Model
Step 3: Build the Logical Extract DI Models
Confirm the Subject Area Focus from the Data Mapping Document
Review Whether the Existing Data Integration Environment Can Fulfill the Requirements
Determine the Business Extraction Rules
Control File Check Processing
Complete the Logical Extract Data Integration Models
Final Thoughts on Designing a Logical Extract DI Model
Step 4: Define a Logical Data Quality DI Model
Design a Logical Data Quality Data Integration Model
Identify Technical and Business Data Quality Criteria
Determine Absolute and Optional Data Quality Criteria
Step 5: Define the Logical Transform DI Model
Step 6: Define the Logical Load DI Model
Step 7: Determine the Physicalization Strategy
Step 8: Convert the Logical Extract Models into Physical Source System Extract DI Models
Step 9: Refine the Logical Load Models into Physical Source System Subject Area Load DI Models
Step 10: Package the Enterprise Business Rules into Common Component Models
Step 11: Sequence the Physical DI Models
Summary

Part 2

The Data Integration Systems Development Life Cycle

Chapter 5

Data Integration Analysis

Analyzing Data Integration Requirements
Building a Conceptual Data Integration Model
Key Conceptual Data Integration Modeling Task Steps
Why Is Source System Data Discovery So Difficult?
Performing Source System Data Profiling
Overview of Data Profiling
Key Source System Data Profiling Task Steps
Reviewing/Assessing Source Data Quality
Validation Checks to Assess the Data
Key Review/Assess Source Data Quality Task Steps

Performing Source/Target Data Mappings
Overview of Data Mapping
Types of Data Mapping
Key Source/Target Data Mapping Task Steps
Summary
End-of-Chapter Questions

Chapter 6

Data Integration Analysis Case Study

Case Study Overview
Envisioned Wheeler Data Warehouse Environment
Aggregations in a Data Warehouse Environment
Data Integration Analysis Phase
Step 1: Build a Conceptual Data Integration Model
Step 2: Perform Source System Data Profiling
Step 3: Review/Assess Source Data Quality
Step 4: Perform Source/Target Data Mappings
Summary

Chapter 7

Data Integration Logical Design

Determining High-Level Data Volumetrics
Extract Sizing
Disk Space Sizing
File Size Impacts Component Design
Key Data Integration Volumetrics Task Steps
Establishing a Data Integration Architecture
Identifying Data Quality Criteria
Examples of Data Quality Criteria from a Target
Key Data Quality Criteria Identification Task Steps
Creating Logical Data Integration Models
Key Logical Data Integration Model Task Steps
Defining One-Time Data Conversion Load Logical Design
Designing a History Conversion
One-Time History Data Conversion Task Steps
Summary
End-of-Chapter Questions

Chapter 8

Data Integration Logical Design Case Study

Step 1: Determine High-Level Data Volumetrics
Step 2: Establish the Data Integration Architecture
Step 3: Identify Data Quality Criteria
Step 4: Create Logical Data Integration Models
Define the High-Level Logical Data Integration Model
Define the Logical Extraction Data Integration Model

Define the Logical Data Quality Data Integration Model
Define Logical Transform Data Integration Model
Define Logical Load Data Integration Model
Define Logical Data Mart Data Integration Model
Develop the History Conversion Design
Summary

Chapter 9

Data Integration Physical Design

Creating Component-Based Physical Designs
Reviewing the Rationale for a Component-Based Design
Modularity Design Principles
Key Component-Based Physical Designs Creation Task Steps
Preparing the DI Development Environment
Key Data Integration Development Environment Preparation Task Steps
Creating Physical Data Integration Models
Point-to-Point Application Development—The Evolution of Data Integration Development
The High-Level Logical Data Integration Model in Physical Design
Design Physical Common Components Data Integration Models
Design Physical Source System Extract Data Integration Models
Design Physical Subject Area Load Data Integration Models
Designing Parallelism into the Data Integration Models
Types of Data Integration Parallel Processing
Other Parallel Processing Design Considerations
Parallel Processing Pitfalls
Key Parallelism Design Task Steps
Designing Change Data Capture
Append Change Data Capture Design Complexities
Key Change Data Capture Design Task Steps
Finalizing the History Conversion Design
From Hypothesis to Fact
Finalize History Data Conversion Design Task Steps
Defining Data Integration Operational Requirements
Determining a Job Schedule for the Data Integration Jobs
Determining a Production Support Team
Key Data Integration Operational Requirements Task Steps
Designing Data Integration Components for SOA
Leveraging Traditional Data Integration Processes as SOA Services
Appropriate Data Integration Job Types
Key Data Integration Design for SOA Task Steps
Summary
End-of-Chapter Questions

Chapter 10

Data Integration Physical Design Case Study

Step 1: Create Physical Data Integration Models
Instantiating the Logical Data Integration Models into a Data Integration Package
Step 2: Find Opportunities to Tune through Parallel Processing
Step 3: Complete Wheeler History Conversion Design
Step 4: Define Data Integration Operational Requirements
Developing a Job Schedule for Wheeler
The Wheeler Monthly Job Schedule
The Wheeler Monthly Job Flow
Process Step 1: Preparation for the EDW Load Processing
Process Step 2: Source System to Subject Area File Processing
Process Step 3: Subject Area Files to EDW Load Processing
Process Step 4: EDW-to-Product Line Profitability Data Mart Load Processing
Production Support Staffing
Summary

Chapter 11

Data Integration Development Cycle

Performing General Data Integration Development Activities
Data Integration Development Standards
Error-Handling Requirements
Naming Standards
Key General Development Task Steps
Prototyping a Set of Data Integration Functionality
The Rationale for Prototyping
Benefits of Prototyping
Prototyping Example
Key Data Integration Prototyping Task Steps
Completing/Extending Data Integration Job Code
Complete/Extend Common Component Data Integration Jobs
Complete/Extend the Source System Extract Data Integration Jobs
Complete/Extend the Subject Area Load Data Integration Jobs
Performing Data Integration Testing
Data Warehousing Testing Overview
Types of Data Warehousing Testing
Perform Data Warehouse Unit Testing
Perform Data Warehouse Integration Testing
Perform Data Warehouse System and Performance Testing
Perform Data Warehouse User Acceptance Testing
The Role of Configuration Management in Data Integration
What Is Configuration Management?
Data Integration Version Control
Data Integration Software Promotion Life Cycle
Summary
End-of-Chapter Questions

Chapter 12

Data Integration Development Cycle Case Study

Step 1: Prototype the Common Customer Key
Step 2: Develop User Test Cases
Domestic OM Source System Extract Job Unit Test Case
Summary

Part 3

Data Integration with Other Information Management Disciplines

Chapter 13

Data Integration and Data Governance

What Is Data Governance?
Why Is Data Governance Important?
Components of Data Governance
Foundational Data Governance Processes
Data Governance Organizational Structure
Data Stewardship Processes
Data Governance Functions in Data Warehousing
Compliance in Data Governance
Data Governance Change Management
Summary
End-of-Chapter Questions

Chapter 14

Metadata

What Is Metadata?
The Role of Metadata in Data Integration
Categories of Metadata
Business Metadata
Structural Metadata
Navigational Metadata
Analytic Metadata
Operational Metadata
Metadata as Part of a Reference Architecture
Metadata Users
Managing Metadata
The Importance of Metadata Management in Data Governance
Metadata Environment Current State
Metadata Management Plan
Metadata Management Life Cycle
Summary
End-of-Chapter Questions

Chapter 15

Data Quality

The Data Quality Framework
Key Data Quality Elements
The Technical Data Quality Dimension
The Business-Process Data Quality Dimension
Types of Data Quality Processes
The Data Quality Life Cycle
The Define Phase
Defining the Data Quality Scope
Identifying/Defining the Data Quality Elements
Developing Preventive Data Quality Processes
The Audit Phase
Developing a Data Quality Measurement Process
Developing Data Quality Reports
Auditing Data Quality by LOB or Subject Area
The Renovate Phase
Data Quality Assessment and Remediation Projects
Data Quality SWAT Renovation Projects
Data Quality Programs
Final Thoughts on Data Quality
Summary
End-of-Chapter Questions

Appendix A Exercise Answers

Appendix B Data Integration Guiding Principles

Write Once, Read Many
Grab Everything
Data Quality before Transforms
Transformation Componentization
Where to Perform Aggregations and Calculations
Data Integration Environment Volumetric Sizing
Subject Area Volumetric Sizing

Appendix C Glossary

Appendix D Case Study Models Appendix D is an online-only appendix. Print-book readers can download the appendix at www.ibmpressbooks.com/title/9780137084937. For eBook editions, the appendix is included in the book.

Index