Best Practices for Heterogeneous Data Masking
Jagan R. Athreya, Director, Database Manageability, Oracle
Nirmalya Das, Lead DBA, Cisco
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Your Information Assets Across Heterogeneous Databases
Customer Product
Employee
Finance MS SQL
Clinical Trials
Your Information Asset Lifecycle Shared with 3rd Parties
• Almost 50% of all organizations exposed Production data in non-Production environments
• Only 16% have a system in place for de-identifying sensitive data
Source: 2010 IOUG Data Security Report
Clinical Research
IT Service Providers
Market Research
Business partners
Application Developers
Your Information Asset Protection Challenge
• Ensure comprehensive protection of your information assets across heterogeneous enterprise databases
• Reduce information lifecycle costs through automation
Clinical Research
IT Service Providers
Microsoft SQLServer
Market Research
Business partners
Application Developers
IBM DB2
Secure Test System Deployment
Oracle Data Masking

Production
LAST_NAME   SSN           SALARY
AGUILAR     203-33-3234   40,000
BENSON      323-22-2943   60,000

Test
LAST_NAME   SSN           SALARY
SMITH       111-23-1111   60,000
MILLER      222-34-1345   40,000

• Deploy secure test system by masking sensitive data
• Sensitive data never leaves the database
• Extensible template library and policies for automation
• Sophisticated masking: Condition-based, compound, deterministic
• Integrated masking and cloning
• Leverage masking templates for common data types
Data Masking using Oracle Enterprise Manager
Centrally controlled. Globally managed.
• Monitoring
• Performance Diagnostics
• Patching & Provisioning
• Configuration Management
• Data Masking
Microsoft SQLServer
IBM DB2
Data Masking Methodology

Production
LAST_NAME   SSN           SALARY
AGUILAR     203-33-3234   40,000
BENSON      323-22-2943   60,000

Non-Production
LAST_NAME   SSN           SALARY
SMITH       111-23-1111   40,000
JOHNSON     222-34-1345   60,000

IBM DB2
MS SQL

• Find: Catalog and identify sensitive data across enterprise databases
• Assess: Define the optimal data masking techniques
• Secure: Automate non-production systems through data masking
• Test: Ensure the integrity of applications through testing
FIND: Catalog and identify sensitive data across enterprise databases
ASSESS SECURE TEST
Catalog Sensitive Data in Your Enterprise Databases
Person Name
Bank Account Number
Maiden Name
Card Number (Credit or Debit Card Number)
Business Address
Tax Registration Number or National Tax ID
Business Telephone Number
Person Identification Number
Business Email Address
Welfare Pension Insurance Number
Custom Name
Unemployment Insurance Number
Employee Number
Government Affiliation ID
User Global Identifier
Military Service ID
Party Number or Customer Number
Social Insurance Number
Account Name
Pension ID Number
Mail Stop
Article Number
GPS Location
Civil Identifier Number
Student Exam Hall Ticket Number
Hafiza Number
Club Membership ID
Social Security Number
Library Card Number
Trade Union Membership Number
Identity Card Number
Pension Registration Number
Instant Messaging Address
National Insurance Number
Web site
Health Insurance Number
National Identifier
Personal Public Service Number
Passport Number
Electronic Taxpayer Identification Number
Driver’s License Number
Biometrics Data
Personal Address
Digital ID
Personal Telephone Number
Citizenship Number
Personal Email Address
Voter Identification Number
Visa Number or Work Permit
Residency Number (Green Card)
• Business-driven
• Criteria:
  – Violate government regulations
  – Violate business regulations
  – Damage shareholder value through loss of
    • Market capital
    • Valuation
    • Reputation
    • Customers
    • Lawsuits
Catalog Relationships Created Through Data Flows
Cross-database Referential Integrity

CUSTOMER
CUSTID   NAME      REP_ID
200      ACME      12
201      BIG BOX   15

• Identify business processes that create data flows across databases
• Inspect the data flows for sensitive data content
• Define cross-database referential integrity relationships for sensitive data
Importance of Referential Integrity

Before masking

EMPLOYEE
EMPID   NAME        TITLE
12      SMITH       SALESREP
13      JONES       CSR
14      ELLISON     CEO
15      FERNICOLA   SALES MGR

CUSTOMER
CUSTID   NAME     REP_ID
200      ACME     12
201      BIG CO   15

SUPPORT
CUSTID   CSR_ID
200      13

After masking

EMPLOYEE
EMPID   NAME        TITLE
526     SMITH       SALESREP
618     JONES       CSR
253     ELLISON     CEO
323     FERNICOLA   SALES MGR

CUSTOMER
CUSTID   NAME     REP_ID
200      ACME     526
201      BIG CO   323

SUPPORT
CUSTID   CSR_ID
200      618

• Database enforced
• Application enforced

Automatic Referential Integrity
Ensure application relationships while de-identifying sensitive data

Application performance fidelity
Maintains cardinality of data to ensure accurate data performance as production
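
The idea on this slide can be sketched in a few lines of Python. This is only an illustration, not how Oracle Data Masking is implemented: the helper names (`build_key_mapping`, `mask_column`) and the 100 to 999 replacement range are assumptions. One mapping is built for the employee key, and the same mapping is applied to every column that references it, so joins still resolve after masking.

```python
import random

def build_key_mapping(keys, seed=0):
    """Map each distinct original key to a distinct replacement key."""
    rng = random.Random(seed)
    originals = sorted(set(keys))
    # sample() never repeats a value, so the mapping is one-to-one
    replacements = rng.sample(range(100, 1000), len(originals))
    return dict(zip(originals, replacements))

def mask_column(rows, column, mapping):
    """Return copies of the rows with one key column replaced via the mapping."""
    return [{**row, column: mapping[row[column]]} for row in rows]

employee = [
    {"EMPID": 12, "NAME": "SMITH", "TITLE": "SALESREP"},
    {"EMPID": 13, "NAME": "JONES", "TITLE": "CSR"},
    {"EMPID": 14, "NAME": "ELLISON", "TITLE": "CEO"},
    {"EMPID": 15, "NAME": "FERNICOLA", "TITLE": "SALES MGR"},
]
customer = [
    {"CUSTID": 200, "NAME": "ACME", "REP_ID": 12},
    {"CUSTID": 201, "NAME": "BIG CO", "REP_ID": 15},
]
support = [{"CUSTID": 200, "CSR_ID": 13}]

# One mapping, reused for the parent key and every referencing column.
mapping = build_key_mapping(row["EMPID"] for row in employee)
masked_employee = mask_column(employee, "EMPID", mapping)
masked_customer = mask_column(customer, "REP_ID", mapping)
masked_support = mask_column(support, "CSR_ID", mapping)
```

Because the mapping is one-to-one, the number of distinct key values is unchanged, which is what keeps the cardinality of the masked data the same as production.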
Find and Catalog Sensitive Data
Data Finder Tool

1. Data Finder Patterns
• Define pattern match rules for Tables, columns and data
Table Name: “EMP*”
Column Name: “*SSN*”
Data Format: ### - ## - ####

2. Enterprise Data Sources
• Connect to Oracle, SQLServer and DB2 Databases
• Search for Data Finder patterns across databases

3. Data Finder Reports
Data Finder Results
• Results rendered by confidence factor
• Relevant database fields imported into the Data Privacy Catalog

4. Data Privacy Catalog
PERSON_SSN, EMP_SSN, SOC_SEC_NUM
• New database fields added and then protected
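
As a rough illustration of pattern-based discovery, the sketch below scores a column against the three example rules from this slide (table name, column name, data format). It is not the Data Finder tool itself: the function name `score_column`, the equal weighting of the three rules, and the format regex without embedded spaces are all assumptions made for the example.

```python
import fnmatch
import re

# Assumed reading of the slide's data format: three digits, two digits, four digits.
SSN_FORMAT = re.compile(r"^\d{3}-\d{2}-\d{4}$")

def score_column(table, column, samples, table_pat="EMP*", column_pat="*SSN*"):
    """Return a 0..1 confidence that a column holds SSN-like data.

    Each of the three rules contributes one third (an illustrative choice).
    """
    table_hit = fnmatch.fnmatchcase(table.upper(), table_pat)
    column_hit = fnmatch.fnmatchcase(column.upper(), column_pat)
    if samples:
        data_hit = sum(bool(SSN_FORMAT.match(s)) for s in samples) / len(samples)
    else:
        data_hit = 0.0
    return (table_hit + column_hit + data_hit) / 3

candidates = [
    ("EMPLOYEE", "EMP_SSN", ["203-33-3234", "323-22-2943"]),
    ("EMP_PAY", "SOC_SEC_NUM", ["203-33-3234"]),
    ("ORDERS", "NOTE", ["hello"]),
]
# Render results by confidence, highest first.
report = sorted(
    ((score_column(t, c, s), t, c) for t, c, s in candidates),
    reverse=True,
)
```

A column such as SOC_SEC_NUM misses the column-name rule but still ranks high on the data rule, which is the kind of field a reviewer would then import into the catalog.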
FIND ASSESS: Define the optimal data masking techniques
SECURE TEST
Comprehensive Mask Formats
Mask Primitives and User-extensible Mask Formats
• Mask primitives
  – Simple mask formats
    • ALPHA
    • NUMERIC
    • DATE
  – Simple mask techniques
    • SHUFFLE
    • RANDOMIZE
    • LOOKUP TABLE
• User-defined function
  – Extensible using PL/SQL
  – Complex rule-based logic
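
To make the three simple techniques concrete, here is a minimal Python sketch of what each one does to a column of values. The function names are hypothetical and the behaviour is a plain reading of the terms SHUFFLE, RANDOMIZE and LOOKUP TABLE, not the product's PL/SQL implementation.

```python
import random

def shuffle_mask(values, rng):
    """SHUFFLE: redistribute the existing values among the rows."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled

def randomize_numeric(length, rng):
    """RANDOMIZE with a NUMERIC format: a random digit string of fixed length."""
    return "".join(rng.choice("0123456789") for _ in range(length))

def lookup_mask(values, replacements, rng):
    """LOOKUP TABLE: replace each value with an entry from a replacement list."""
    return [rng.choice(replacements) for _ in values]

rng = random.Random(42)  # seeded only so the example is repeatable
salaries = [40000, 60000, 60000, 40000]
masked_salaries = shuffle_mask(salaries, rng)
masked_account = randomize_numeric(9, rng)
masked_names = lookup_mask(["AGUILAR", "BENSON"], ["SMITH", "MILLER", "JOHNSON"], rng)
```

Shuffling keeps the overall distribution of a column (the same salaries exist, on different rows), while randomizing and lookup replace the values outright.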
Oracle Data Masking Comprehensive and Extensible Mask Library
Mask formats for common sensitive data
Accelerates solution deployment of masking
Extensible mask routines
Enables customization of business rules
Define once, apply everywhere
Ensures consistent enforcement of policies
Oracle Data Masking
Sophisticated Masking Techniques

Compound Masking
Sets of related columns masked together, e.g. Address, City, State, Zip, Phone

Condition-based Masking
Specify separate mask format for each condition, e.g. driver’s license format for each state

SQL-expression based masking
Use SQL functions, e.g. UPPER, SUBSTR, TO_CHAR, to generate mask values, e.g. SUBSTR(%ORIG_VALUE%,1,3)||'-111-1111'
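
The three techniques can be illustrated with a short Python sketch. Everything here is an assumption for demonstration: the per-state licence formats, the replacement address rows and the function names are invented, and `mask_phone` is simply a Python rendering of the slide's SUBSTR expression.

```python
import random
import string

def mask_phone(orig_value):
    """Python rendering of SUBSTR(%ORIG_VALUE%,1,3)||'-111-1111'."""
    return orig_value[:3] + "-111-1111"

# Hypothetical licence formats: '#' is a digit, 'A' is an uppercase letter.
LICENSE_FORMATS = {"CA": "A#######", "NY": "#########"}
DEFAULT_FORMAT = "########"

def mask_license(state, rng):
    """Condition-based: the mask format depends on the row's state."""
    fmt = LICENSE_FORMATS.get(state, DEFAULT_FORMAT)
    return "".join(
        rng.choice(string.digits) if ch == "#" else rng.choice(string.ascii_uppercase)
        for ch in fmt
    )

# Hypothetical replacement rows; each tuple is internally consistent.
ADDRESSES = [
    ("1 Main St", "Springfield", "IL", "62701"),
    ("9 Oak Ave", "Albany", "NY", "12207"),
]

def mask_address(row, rng):
    """Compound: related columns are replaced together from one source row."""
    street, city, state, zip_code = rng.choice(ADDRESSES)
    return {**row, "ADDRESS": street, "CITY": city, "STATE": state, "ZIP": zip_code}

rng = random.Random(7)
phone = mask_phone("650-555-0100")
ca_license = mask_license("CA", rng)
person = mask_address(
    {"NAME": "SMITH", "ADDRESS": "x", "CITY": "y", "STATE": "z", "ZIP": "0"}, rng
)
```

Masking the address columns as a set avoids combinations such as a New York zip code paired with an Illinois city, which independent per-column masking could produce.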
Deterministic Masking - What
[Figure: SSN columns of the Payroll and Expense Reporting databases in Production and Non-Production; 203-33-3234 is masked to 111-23-1111 and 323-22-2943 to 222-34-1345 in every copy]
• Consistent masked values for given original value
• Secure one-way non-reversible
• Repeatable across refreshes
• Repeatable across databases
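
One common way to get these four properties is a keyed hash; the sketch below uses HMAC-SHA256 from the Python standard library. This is an assumed technique chosen for illustration, not a description of the product's algorithm, and the key value and function name are placeholders. Because the digest is reduced to nine digits, two different inputs can occasionally produce the same output, so this sketch does not guarantee uniqueness.

```python
import hashlib
import hmac

def mask_ssn(ssn, key):
    """Derive a masked SSN from the original with a keyed one-way hash."""
    digest = hmac.new(key, ssn.encode("utf-8"), hashlib.sha256).digest()
    number = int.from_bytes(digest[:8], "big") % 10**9
    digits = f"{number:09d}"
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"

KEY = b"example-masking-key"  # placeholder; a real key would be kept secret

# The same input and key give the same output in any database, on any refresh.
payroll_masked = mask_ssn("203-33-3234", KEY)
expense_masked = mask_ssn("203-33-3234", KEY)
```

Since the output depends only on the input value and the key, no shared lookup state is needed between the Payroll and Expense Reporting copies.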
Deterministic Masking - Why
[Figure: SSN columns of the Payroll and Expense Reporting databases in Production and Non-Production]
• Preserve referential relationships across databases
• Ensure repeatability of test cases after refreshes
• Enables incremental masking of data feeds
Deterministic Masking techniques
• Use Deterministic user-defined function for algorithmic or numerical based sensitive data
  – National Identifiers, credit card numbers
  – Derived mathematically
  – Guaranteed uniqueness
• Use “Substitute” mask primitive for context-based sensitive data
  – Names, addresses, product names, medical conditions
  – Derived through hash lookup of table with replacement values
  – Small probability of collisions
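
The difference between the two approaches can be sketched as follows. Both functions are illustrative assumptions: the affine map and its constants were picked only to demonstrate guaranteed uniqueness (it is a bijection but not a secure one-way function), and the replacement names and salt are invented.

```python
import hashlib
from math import gcd

MODULUS = 10**9                      # nine-digit identifiers
A, B = 387_420_489, 123_456_789      # illustrative constants
assert gcd(A, MODULUS) == 1          # A coprime to the modulus makes the map a bijection

def mask_number(n):
    """Mathematically derived mask: distinct inputs always give distinct outputs."""
    return (A * n + B) % MODULUS

REPLACEMENT_NAMES = ["ALEX", "BLAKE", "CASEY", "DANA", "ELLIS"]

def substitute(value, table, salt=b"demo-salt"):
    """Hash lookup into a replacement table: deterministic, but may collide."""
    digest = hashlib.sha256(salt + value.encode("utf-8")).digest()
    return table[int.from_bytes(digest[:8], "big") % len(table)]

masked_ids = {mask_number(n) for n in range(10_000)}
masked_name = substitute("SMITH", REPLACEMENT_NAMES)
```

With only five replacement names, many original names necessarily share a replacement; a larger table lowers that probability, which matches the "small probability of collisions" noted above.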
Mask Definition
Associate Mask Formats with Identified Sensitive Columns
• Automatic discovery and enforcement of referential integrity
• Registration and enforcement of referential integrity when entered as related columns
  – Application-enforced referential integrity
  – Business-process based data relationships
  – Non-Oracle database based referential integrity
• Imported via XML generated via SQL against meta data
FIND ASSESS SECURE: Automate non-production systems through data masking
TEST
Test System Setup for Oracle Databases
Creating Test Databases from Production
[Figure: the Production DB (business data tables T1 to T5, App Meta data, DB dictionary data) is cloned to a Test DB with the same contents]
• Enterprise Manager out-of-the-box workflows
• RMAN-based clone-and-masking (Recommended)
• Export-Import
• Backup and Restore
• Transportable Tablespace
Test System Setup for non-Oracle Databases
Creating Test Databases from Production using Oracle Gateways
[Figure: an IBM DB2 / Microsoft SQLServer Production DB (business data tables T1 to T5, App Meta data, DB dictionary data) is cloned to a Test DB, which exchanges data with a Staging DB through a Database gateway]
Masking Process
1. Production data copied to Test
2. Sensitive data copied to Staging
3. Sensitive data masked in Staging
4. Masked data copied from Staging to Test
Test System Setup for non-Oracle Databases
Creating Test Databases from Production using Oracle Database Gateways
Oracle Gateway Solutions
• Connect to non-Oracle databases from an Oracle environment
  – Transparently access non-Oracle data using Oracle SQL
• Makes the non-Oracle database look like a remote Oracle database
  – Location Transparency
• Target specific Gateways
  – DB2, SQL Server, Teradata, Sybase, Informix
[Figure: Oracle Applications reach IBM DB2 and Microsoft SQLServer through a heterogeneous database gateway]
Test System Setup for non-Oracle Databases
Creating Test Databases from Production using Oracle Gateways

Configuration
• Configure gateways, database links for non-Oracle databases and synonyms for non-Oracle database tables
• Create empty tables (no data) in Oracle staging database from non-Oracle databases via Gateways

Pre-Mask Copy
• Copy data from non-Oracle to Oracle in Pre-Mask step

Mask

Post-Mask Copy
• Copy masked data back to non-Oracle database
• Truncate data in Oracle staging
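
The four phases above can be walked through with two in-memory SQLite databases standing in for the non-Oracle test database and the Oracle staging database. This is only a sketch of the data movement: there is no gateway or database link here, and the table names, the `run_masking_cycle` helper and the placeholder mask are all assumptions.

```python
import sqlite3

def run_masking_cycle(test_db, staging_db, mask):
    # Configuration: an empty staging table for the sensitive column.
    staging_db.execute(
        "CREATE TABLE IF NOT EXISTS emp_stage (id INTEGER PRIMARY KEY, ssn TEXT)"
    )
    # Pre-Mask Copy: sensitive data goes from the test database to staging.
    rows = test_db.execute("SELECT id, ssn FROM emp").fetchall()
    staging_db.executemany("INSERT INTO emp_stage VALUES (?, ?)", rows)
    # Mask: values are replaced inside staging only.
    staged = staging_db.execute("SELECT id, ssn FROM emp_stage").fetchall()
    staging_db.executemany(
        "UPDATE emp_stage SET ssn = ? WHERE id = ?",
        [(mask(ssn), row_id) for row_id, ssn in staged],
    )
    # Post-Mask Copy: masked values go back, then staging is emptied.
    masked = staging_db.execute("SELECT ssn, id FROM emp_stage").fetchall()
    test_db.executemany("UPDATE emp SET ssn = ? WHERE id = ?", masked)
    staging_db.execute("DELETE FROM emp_stage")

test_db = sqlite3.connect(":memory:")      # stands in for the non-Oracle test DB
staging_db = sqlite3.connect(":memory:")   # stands in for the Oracle staging DB
test_db.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, last_name TEXT, ssn TEXT)")
test_db.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, "AGUILAR", "203-33-3234"), (2, "BENSON", "323-22-2943")],
)

# Placeholder mask for the demo; a real mask format would replace the whole value.
run_masking_cycle(test_db, staging_db, lambda ssn: "000-00-" + ssn[-4:])
masked_ssns = [r[0] for r in test_db.execute("SELECT ssn FROM emp ORDER BY id")]
```

Emptying the staging table at the end mirrors the final truncate step, so no copy of the unmasked or masked values lingers outside the test database.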
Generate Mask Execution Script
• Generate PL/SQL-based masking script upon successful validation
• Adds pre- and post-mask SQL
• Ensure uniqueness can be maintained
• Ensure formats match column data types
• Check Space availability
• Warn about Check Constraints
• Check presence of default Partitions
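
Two of these checks (uniqueness and format versus data type) lend themselves to a small worked example. The classes and the `validate` function below are hypothetical and only show the kind of reasoning such a validation step might apply; they do not reproduce the product's checks.

```python
from dataclasses import dataclass

@dataclass
class ColumnSpec:
    name: str
    data_type: str   # "NUMBER" or "VARCHAR2"
    length: int      # maximum digits or characters the column can hold
    unique: bool
    row_count: int

@dataclass
class MaskFormat:
    kind: str        # "NUMERIC" or "ALPHA"
    length: int      # length of every generated value

def validate(column, fmt):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    alphabet_size = 10 if fmt.kind == "NUMERIC" else 26
    # Uniqueness: the format must be able to produce at least one value per row.
    if column.unique and alphabet_size ** fmt.length < column.row_count:
        problems.append("format cannot produce enough unique values")
    # Data type: letters cannot be stored in a numeric column.
    if column.data_type == "NUMBER" and fmt.kind != "NUMERIC":
        problems.append("format does not match column data type")
    if fmt.length > column.length:
        problems.append("masked value is longer than the column allows")
    return problems

too_small = validate(ColumnSpec("EMPID", "NUMBER", 4, True, 50_000), MaskFormat("NUMERIC", 4))
passes = validate(ColumnSpec("SSN", "VARCHAR2", 11, True, 1_000), MaskFormat("NUMERIC", 9))
```

In the first case a four-digit format yields at most 10,000 values, which cannot cover 50,000 unique rows, so a script generated from it would fail at run time.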
Highest Performance Mask Execution
1. Capture and disable Constraints on “sensitive” table
2. Build mapping table containing original sensitive and masked values using masking routines
3. Rename “sensitive” table
4. Recreate masked table from original table replacing sensitive with masked values from mapping tables using CTAS
5. Drop Renamed table and mapping table
6. Restore constraints based on original table

• Column scalability
  – 215 columns masked across 100 tables
  – 60GB Database
  – 20 minutes
• Rows scalability
  – 100 million row table, 6 columns masked
  – Random Number
  – 1.3 hours

Test system: Linux x86, 4 CPU: Single core Pentium 4 (Northwood) [D1], Memory: 5.7G
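
The mapping-table and CTAS steps can be tried out with SQLite from Python. This is a sketch of the pattern only: SQLite stands in for Oracle, the table names are invented, the mapping values are copied from the earlier example slides, and the constraint capture and restore steps are left out.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE emp (last_name TEXT, ssn TEXT, salary INTEGER);
INSERT INTO emp VALUES ('AGUILAR', '203-33-3234', 40000),
                       ('BENSON',  '323-22-2943', 60000);

-- Mapping table: original value -> masked value.
CREATE TABLE emp_ssn_map (orig_ssn TEXT PRIMARY KEY, masked_ssn TEXT);
INSERT INTO emp_ssn_map VALUES ('203-33-3234', '111-23-1111'),
                               ('323-22-2943', '222-34-1345');

-- Rename the sensitive table, then rebuild it with CREATE TABLE AS SELECT,
-- taking the masked value from the mapping table.
ALTER TABLE emp RENAME TO emp_orig;
CREATE TABLE emp AS
    SELECT o.last_name, m.masked_ssn AS ssn, o.salary
    FROM emp_orig AS o
    JOIN emp_ssn_map AS m ON m.orig_ssn = o.ssn;

-- Drop the renamed original and the mapping table.
DROP TABLE emp_orig;
DROP TABLE emp_ssn_map;
""")

rows = db.execute("SELECT last_name, ssn, salary FROM emp ORDER BY last_name").fetchall()
tables = [r[0] for r in db.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
```

Rebuilding the table in one bulk statement, rather than updating rows in place, is the design choice the slide associates with high performance; once the original and mapping tables are dropped, no unmasked values remain.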
Oracle Data Masking
High performance, Workflow-based and Optimized for Oracle
[Figure: workflow across the Appln. Admin and DBA roles: Identify Sensitive Information, Format Library, Associate mask format with sensitive information, Mask Definition, Clone Prod to Staging, Execute Mask]
• Mask editing separate from mask execution
• One-click clone-and-mask workflow

High Performance
Rapid creation of test systems

Workflow-based
Improved security via segregation of duties

Optimized for Oracle
Increased productivity in Oracle environments through integration with cloning, flashback.
Mask Execution Options
Unique to Oracle Data Masking
• Comparing before & after values
  – To save the mapping tables to compare before and after values after a mask run during testing
• REDO and FLASHBACK log enabled
  – To allow FLASHBACK DATABASE to pre-masked state when testing masking routines
• Statistics refresh
  – To enable DBAs to run their own custom statistics generation routine
• Degree of parallelism
  – To optimize the performance of the mask execution based on the number of processors available
Data Center Friendly
• Privilege delegation support
  – Allows Mask execution using sudo or PowerBroker
• Masking script directory specification
  – Allows DBAs to specify directory location where masking script should be generated
• NEW: Masking tasks command line support via EMCLI
  – Allow integration with any database automation processes
  – Generates mask script specific to cloned schema
FIND ASSESS SECURE TEST: Ensure the integrity of applications through testing
Application Quality Management (AQM) Solutions
Application Quality Management (AQM)
• New solution offering for EM family of products
• Consists of three products
  – Data Masking Pack
  – Application Testing Suite
  – Real Application Testing
• Supports complete application management lifecycle, from development and testing to production deployment
• Integration with EM to provide single console for end-to-end application management and testing
• Integration with Oracle apps to help lower application deployment costs and improve performance and service levels
[Figure: application lifecycle across DEV, TEST and PROD: Install / Upgrade Application, Dev / Test System Provisioning, Functional Testing, Load Testing]
Customer case study
Enterprise Data Masking Solution
Nirmalya Das Lead DBA, Cisco
Business Drivers
• Cisco data must be kept private to comply with external privacy laws and regulations, for example SOX, Payment Card Industry (PCI), and the Health Insurance Portability and Accountability Act (HIPAA).
  – Visit Cisco privacy central for in-depth view of Privacy: http://www.cisco.com/web/siteassets/legal/privacy.html
  – Privacy policies and guidelines
• All other business data considered sensitive by Cisco, e.g. credit card numbers, financial data, engineering data, personnel and customer data (Personally Identifiable Information (PII)).
Current Challenges
• Cisco did not have a uniform, standardized process or tool where private data, classified as confidential or restricted, is disguised in the supporting instances to production.
• Cisco could not ensure that all private data is disguised and no exposure exists with regards to this data.
• Risk to Cisco involving fraudulent activities, loss of customer trust, damage to brand, expensive notification, remediation efforts, and violations of various regulatory and statutory requirements resulting in fines and penalties.
Project goals
• Facilitate the compliance of worldwide Data Privacy rules and regulations at Cisco
• Reduce the amount of individual manual analysis and effort required to manage and duplicate masked data among different functional areas
• Implement an enterprise-wide solution that standardizes a repeatable data-masking process and capabilities for non-production environments
• Ensure masked data is ‘fit for use’
• Provide reliable assurance that private data will not be exposed in non-production environments
• Leverage investments in existing tools where possible
RFP-based Evaluation
• 5 Vendors shortlisted through RFP process
• 2 selected for final evaluation
  – Technical proof of concept to demonstrate 5 Cisco-specified use cases
  – Other criteria: Customer references and total cost of ownership

Vendor            Use Case (60%)   Cost (30%)   Customer References (10%)   TOTAL (100%) [1]
Vendor X          3.75 / 6         1.5 / 3      0.79 / 1                    6.04
Vendor Finalist   4.50 / 6         3 / 3        0.5 / 1 [2]                 8.00

Notes
[1] Total possible score for each vendor is 10.
[2] Oracle customers were not able to provide the quantitative scoring. However, the customer reference checks have satisfactory results and therefore warrant Oracle with 0.5 of 1 score.
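
The totals in the table are plain sums of the three weighted criteria, which a few lines of Python can confirm. The dictionary layout and function name are illustrative; the finalist's reference score is taken as 0.5, as stated in note 2.

```python
# Maximum points per criterion: 60%, 30% and 10% of a 10-point total.
MAX_POINTS = {"use_case": 6, "cost": 3, "references": 1}

scores = {
    "Vendor X": {"use_case": 3.75, "cost": 1.5, "references": 0.79},
    "Vendor Finalist": {"use_case": 4.50, "cost": 3.0, "references": 0.5},
}

def total(vendor_scores):
    """Sum the criterion scores after checking each against its maximum."""
    for criterion, points in vendor_scores.items():
        if not 0 <= points <= MAX_POINTS[criterion]:
            raise ValueError(f"{criterion} score out of range: {points}")
    return round(sum(vendor_scores.values()), 2)

totals = {vendor: total(s) for vendor, s in scores.items()}
```

The result is 6.04 for Vendor X and 8.0 for the finalist, matching the TOTAL column.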
Data Masking Implementation At Cisco
Roles & Responsibilities Owner
Role
Masking Responsibilities
GBP Business
Data Steward
– Identify & prioritize sensitive data
GBP IT
Data Analyst
– Locate data in databases
– Analyze environments
– Initiate Demand Clearing
– Create & push PVCS & Kintana Packages
Developer
– Create & test masking scripts
QA Reviewer
– Review & approve Kintana package
DBA
– Participate in Demand Clearing
– Flag database in DBTS
– Update cloning instructions for database
– Set up environment
– Generate monitoring reports periodically
WIPRO
– Perform additional cloning instructions
ATS
ITRM
Auditor
– Request & inspect compliance w/ process
– Identify root cause of issues found
– Work with others to resolve systemic issues
– Refine process as it matures
Kryptos - Data Masking project, 10/25/2010
End-to-End Process Flow Diagram
Data Masking Solution Features
• Initially an Oracle database solution only.
• Data Masking software tool is a module of Oracle Enterprise Manager, currently used to monitor all Oracle databases at Cisco.
• Masked data cannot be reversed to its original value.
• Data Masking tool provides
  – Predetermined masking rules for common sensitive data, e.g. SS#, credit cards
  – Ability to create customized masking rules
• Provides User Access Control to Data Masking tool
• Provides automated Change Control process to deploy production masking rules
• Creates a script which masks data during the existing database refresh processes
Where are we now?
• Oracle data masking solution implemented
• Phase 1 with HR IT and GPSS IT successful
• Playbook created for GBPs to implement masking in Phase 2
• Continue to work with Oracle to resolve software issues - Open
Masked Data Elements in EBS application

Phase 1
GBP: Human Resources (HRMS), GPSS
Field to be Masked
• Registered Disable Flag
• Ethnic Origin
• Termination Reason Code
• Home Phone
• Base Salary
• Bonus/CAP
• Birth Date
• Country of Birth
• National Identifier
• Address
• ePM Rating
• Salary
• Sales Rep’s Annual Target (Local Currency)
• Sales Rep’s Annual Target (US Dollars)
• Sales Rep’s Annual Target by Territory (Local Currency)
• Sales Rep’s Annual Target by Territory (US Dollars)
• Sales Rep’s Quarterly Target (Local Currency)
• Sales Rep’s Quarterly Target (US Dollars)

Phase 2
GBP Finance (P2R, H2R)
Human Resources (HRMS)
Field to be Masked
• Emp. Bank Account #
• Emp. Corporate Card #
• Emp. Divorce Status
• Emp. Nationality
• Emp. Citizenship Status
• Emp. Country
• Emp. Region
• Emp. Town of Birth
• Emp. Veteran Status
• Emp. Separation Package Type
GGSG
• Pay Grade
• Clearance Level
• Clearance Bonus
Marketing
Customer & Prospect email
(MODS, CM, SMCC, SMS, GIST)
GPSS
Commission Incentive Bonus Plan Code Bonus Status Bonus Description Bonus Type OMF Opportunity $
Phase 2
• Extend the enterprise-wide masking solution to Finance, Marketing, & GGSG
• Mask sensitive data in a risk based, iterative approach
• Provide a framework to enable ongoing enterprise-wide adoption
• ITRM continued monitoring and engagement of GBPs
Life Before and After Data Masking

                               Before Masking   After Masking
Process used for masking       Manual           Automated
Data elements protected        Unknown          8 (in Phase 1)
Databases protected            1                8 (in Phase 1)
Divisions using data masking   1                2 (in Phase 1), 5 (in Phase 2)
Business Benefits
• Increase Cisco’s assurance that private data is not unnecessarily exposed and exploited
• Reduce exposure risk due to private data leakage
• Reduce the risk of failing an ICS audit or government regulations
• Increased visibility and traceability where private data is stored and masked
• Reduce effort by the project teams during project initiative development and testing, where data masking is required
• Reduce duplicate effort in defining what data needs to be masked
• Increased standardization and uniformity of data masking process Cisco wide
• Financial benefit to Cisco through improved 'value for money' potential and better management of data usage