Best Practices for Heterogeneous Data Masking

Author: Sydney Cole



Jagan R. Athreya
Director, Database Manageability, Oracle

Nirmalya Das
Lead DBA, Cisco

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.


Your Information Assets Across Heterogeneous Databases

Customer Product

Employee

Finance MS SQL

Clinical Trials


Your Information Asset Lifecycle Shared with 3rd Parties

• Almost 50% of all organizations exposed Production data in non-Production environments
• Only 16% have a system in place for de-identifying sensitive data

2010 IOUG Data Security Report

Clinical Research

IT Service Providers

Market Research

Business partners

Application Developers

Your Information Asset Protection Challenge

• Ensure comprehensive protection of your information assets across heterogeneous enterprise databases
• Reduce information lifecycle costs through automation

Clinical Research

IT Service Providers

Microsoft SQLServer

Market Research

Business partners

Application Developers

IBM DB2

Secure Test System Deployment
Oracle Data Masking

Production
LAST_NAME   SSN           SALARY
AGUILAR     203-33-3234   40,000
BENSON      323-22-2943   60,000

Test
LAST_NAME   SSN           SALARY
SMITH       111-23-1111   60,000
MILLER      222-34-1345   40,000

• Deploy secure test system by masking sensitive data
• Sensitive data never leaves the database
• Extensible template library and policies for automation
• Sophisticated masking: Condition-based, compound, deterministic
• Integrated masking and cloning
• Leverage masking templates for common data types

Data Masking using Oracle Enterprise Manager
Centrally controlled. Globally managed.

• Monitoring
• Performance Diagnostics
• Patching & Provisioning
• Configuration Management
• Data Masking

Microsoft SQLServer

IBM DB2

Data Masking Methodology

Production
LAST_NAME   SSN           SALARY
AGUILAR     203-33-3234   40,000
BENSON      323-22-2943   60,000

Non-Production
LAST_NAME   SSN           SALARY
SMITH       111-23-1111   40,000
JOHNSON     222-34-1345   60,000

IBM DB2, MS SQL

• Find: Catalog and identify sensitive data across enterprise databases
• Assess: Define the optimal data masking techniques
• Secure: Automate non-production systems through data masking
• Test: Ensure the integrity of applications through testing

FIND: Catalog and identify sensitive data across enterprise databases

ASSESS SECURE TEST


Catalog Sensitive Data in Your Enterprise Databases

Person Name

Bank Account Number

Maiden Name

Card Number (Credit or Debit Card Number)

Business Address

Tax Registration Number or National Tax ID

Business Telephone Number

Person Identification Number

Business Email Address

Welfare Pension Insurance Number

Custom Name

Unemployment Insurance Number

Employee Number

Government Affiliation ID

User Global Identifier

Military Service ID

Party Number or Customer Number

Social Insurance Number

Account Name

Pension ID Number

Mail Stop

Article Number

GPS Location

Civil Identifier Number

Student Exam Hall Ticket Number

Hafiza Number

Club Membership ID

Social Security Number

Library Card Number

Trade Union Membership Number

Identity Card Number

Pension Registration Number

Instant Messaging Address

National Insurance Number

Web site

Health Insurance Number

National Identifier

Personal Public Service Number

Passport Number

Electronic Taxpayer Identification Number

Driver’s License Number

Biometrics Data

Personal Address

Digital ID

Personal Telephone Number

Citizenship Number

Personal Email Address

Voter Identification Number

Visa Number or Work Permit

Residency Number (Green Card)

• Business-driven
• Criteria:
  – Violate government regulations
  – Violate business regulations
  – Damage shareholder value through loss of
    • Market capital
    • Valuation
    • Reputation
    • Customers
    • Lawsuits

Catalog Relationships Created Through Data Flows
Cross-database Referential Integrity

CUSTOMER
CUSTID   NAME      REP_ID
200      ACME      12
201      BIG BOX   15

• Identify business processes that create data flows across databases
• Inspect the data flows for sensitive data content
• Define cross-database referential integrity relationships for sensitive data

Importance of Referential Integrity

Before masking

EMPLOYEE
EMPID   NAME        TITLE
12      SMITH       SALESREP
13      JONES       CSR
14      ELLISON     CEO
15      FERNICOLA   SALES MGR

CUSTOMER
CUSTID   NAME     REP_ID
200      ACME     12
201      BIG CO   15

SUPPORT
CUSTID   CSR_ID
200      13

After masking

EMPLOYEE
EMPID   NAME        TITLE
526     SMITH       SALESREP
618     JONES       CSR
253     ELLISON     CEO
323     FERNICOLA   SALES MGR

CUSTOMER
CUSTID   NAME     REP_ID
200      ACME     526
201      BIG CO   323

SUPPORT
CUSTID   CSR_ID
200      618

• Database enforced
• Application enforced

Automatic Referential Integrity
Ensure application relationships while de-identifying sensitive data

Application performance fidelity
Maintains the cardinality of data so that data performance is comparable to production
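The key remapping in the EMPLOYEE, CUSTOMER and SUPPORT example can be sketched in a few lines of Python. This is only an illustration of the idea, not how Oracle Data Masking implements it: the function name `build_key_map`, the seed, and the three-digit surrogate range are assumptions made for the example. One mapping is built for the parent key and then applied to every column that references it, so joins still resolve after masking.

```python
import random

def build_key_map(keys, seed=0):
    """Map each distinct original key to a distinct random three-digit surrogate."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    distinct = sorted(set(keys))
    surrogates = rng.sample(range(100, 1000), len(distinct))  # sample() never repeats
    return dict(zip(distinct, surrogates))

# Rows taken from the slide: (EMPID, NAME, TITLE), (CUSTID, NAME, REP_ID), (CUSTID, CSR_ID)
employee = [(12, "SMITH", "SALESREP"), (13, "JONES", "CSR"),
            (14, "ELLISON", "CEO"), (15, "FERNICOLA", "SALES MGR")]
customer = [(200, "ACME", 12), (201, "BIG CO", 15)]
support = [(200, 13)]

# One map for the parent key, reused for every child column that refers to it.
key_map = build_key_map(empid for empid, _, _ in employee)

masked_employee = [(key_map[empid], name, title) for empid, name, title in employee]
masked_customer = [(custid, name, key_map[rep_id]) for custid, name, rep_id in customer]
masked_support = [(custid, key_map[csr_id]) for custid, csr_id in support]
```

Because the child columns are rewritten through the same map as the parent key, ACME's REP_ID still points at the SMITH row and the SUPPORT row still points at JONES.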


Find and Catalog Sensitive Data
Data Finder Tool

1. Data Finder Patterns
   Table Name: “EMP*”
   Column Name: “*SSN*”
   Data Format: ###-##-####
   • Define pattern match rules for tables, columns and data

2. Enterprise Data Sources
   • Connect to Oracle, SQLServer and DB2 databases
   • Search for Data Finder patterns across databases

3. Data Finder Reports (Data Finder Results)
   • Results rendered by confidence factor
   • Relevant database fields imported into the Data Privacy Catalog

4. Data Privacy Catalog
   PERSON_SSN, EMP_SSN, SOC_SEC_NUM
   • New database fields added and then protected
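The three pattern rules (table name, column name, data format) can be illustrated with a small Python sketch. The scoring below is an assumption made for the example: the slides say only that results are rendered by confidence factor, not how that factor is computed. Here each matching rule contributes one third of the score, and the data rule contributes in proportion to the share of sampled values that fit the format.

```python
import re
from fnmatch import fnmatchcase

TABLE_PATTERN = "EMP*"                          # from the slide
COLUMN_PATTERN = "*SSN*"                        # from the slide
DATA_FORMAT = re.compile(r"\d{3}-\d{2}-\d{4}")  # ###-##-####

def confidence(table, column, samples):
    """Hypothetical confidence factor in [0, 1]: one third per matching rule."""
    score = 0.0
    if fnmatchcase(table.upper(), TABLE_PATTERN):
        score += 1
    if fnmatchcase(column.upper(), COLUMN_PATTERN):
        score += 1
    if samples:
        hits = sum(1 for value in samples if DATA_FORMAT.fullmatch(value))
        score += hits / len(samples)
    return score / 3

# (table, column, sampled values); the table names are illustrative.
columns = [
    ("EMPLOYEE", "EMP_SSN", ["203-33-3234", "323-22-2943"]),
    ("EMPLOYEE", "LAST_NAME", ["AGUILAR", "BENSON"]),
    ("PERSON", "SOC_SEC_NUM", ["111-23-1111"]),
]

# Highest-confidence candidates first, as in a Data Finder report.
report = sorted(((confidence(t, c, s), t, c) for t, c, s in columns), reverse=True)
```

Note that SOC_SEC_NUM is found only through its data format, which is why the data rule matters alongside the name rules.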


FIND ASSESS: Define the optimal data masking techniques

SECURE TEST


Comprehensive Mask Formats
Mask Primitives and User-extensible Mask Formats

• Mask primitives
  – Simple mask formats
    • ALPHA
    • NUMERIC
    • DATE
  – Simple mask techniques
    • SHUFFLE
    • RANDOMIZE
    • LOOKUP TABLE
• User-defined function
  – Extensible using PL/SQL
  – Complex rule-based logic
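As a rough illustration of the three simple mask techniques, the Python sketch below shows what SHUFFLE, RANDOMIZE and LOOKUP TABLE do to a column. The function names and sample values are assumptions for the example; the product exposes these as mask primitives, not as Python functions.

```python
import random

def shuffle_column(values, rng):
    """SHUFFLE: keep the same set of values but reassign them to different rows."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled

def randomize_number(low, high, rng):
    """RANDOMIZE: draw a fresh value in [low, high], unrelated to the original."""
    return rng.randint(low, high)

def lookup_replace(values, lookup, rng):
    """LOOKUP TABLE: replace each value with an entry from a list of substitutes."""
    return [rng.choice(lookup) for _ in values]

rng = random.Random(42)  # seeded only so the sketch is repeatable
salaries = [40000, 60000, 60000, 40000]
names = ["AGUILAR", "BENSON"]

masked_salaries = shuffle_column(salaries, rng)
masked_names = lookup_replace(names, ["SMITH", "MILLER", "JOHNSON"], rng)
masked_bonus = randomize_number(1000, 5000, rng)
```

Shuffling preserves the column's value distribution, which is useful when test queries depend on realistic totals; randomizing and lookup replacement do not.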


Oracle Data Masking Comprehensive and Extensible Mask Library

Mask formats for common sensitive data

Accelerates solution deployment of masking

Extensible mask routines

Enables customization of business rules

Define once, apply everywhere

Ensures consistent enforcement of policies

Oracle Data Masking
Sophisticated Masking Techniques

Compound Masking
Sets of related columns masked together, e.g. Address, City, State, Zip, Phone

Condition-based Masking
Specify a separate mask format for each condition, e.g. driver’s license format for each state

SQL-expression based masking
Use SQL functions, e.g. UPPER, SUBSTR, TO_CHAR, to generate mask values, e.g. SUBSTR(%ORIG_VALUE%,1,3)||'-111-1111'
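The three techniques can be mimicked in Python to make the behavior concrete. This is a sketch under stated assumptions: the license formats per state and the substitute address rows are invented for illustration, and `mask_phone` is simply a Python analogue of the SQL expression shown on the slide.

```python
import random
import string

def mask_phone(orig_value):
    """Analogue of SUBSTR(%ORIG_VALUE%,1,3)||'-111-1111': keep the first three characters."""
    return orig_value[:3] + "-111-1111"

# Condition-based: one mask format per condition (here, per state).
# '#' stands for a digit, 'A' for a letter; these formats are illustrative only.
LICENSE_FORMATS = {"CA": "A#######", "NY": "#########"}
DEFAULT_FORMAT = "########"

def mask_license(state, rng):
    """Generate a license number in the format registered for the given state."""
    fmt = LICENSE_FORMATS.get(state, DEFAULT_FORMAT)
    return "".join(
        rng.choice(string.digits) if ch == "#" else rng.choice(string.ascii_uppercase)
        for ch in fmt
    )

# Compound: related columns are replaced together from one substitute row,
# so the masked city, state, zip and phone still agree with each other.
ADDRESS_ROWS = [
    ("100 Main St", "Springfield", "IL", "62701", "217-555-0100"),
    ("200 Oak Ave", "Austin", "TX", "78701", "512-555-0100"),
]

def mask_address(rng):
    """Return one consistent (address, city, state, zip, phone) tuple."""
    return rng.choice(ADDRESS_ROWS)

rng = random.Random(7)  # seeded only so the sketch is repeatable
masked_phone = mask_phone("650-555-1234")  # -> "650-111-1111"
masked_ca_license = mask_license("CA", rng)
masked_address = mask_address(rng)
```

Masking the address columns independently could pair a Texas zip code with an Illinois city; drawing the whole row at once avoids that.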

Deterministic Masking - What

Payroll and Expense Reporting
Production SSN   Non-Production SSN
203-33-3234      111-23-1111
323-22-2943      222-34-1345

• Consistent masked values for given original value
• Secure one-way non-reversible
• Repeatable across refreshes
• Repeatable across databases
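One common way to obtain these four properties is a keyed one-way hash. The sketch below uses HMAC-SHA256 from the Python standard library; it illustrates the concept and is not the algorithm used by Oracle Data Masking. The key value and the function name `mask_ssn` are assumptions. The same key must be used for every database and every refresh for the results to line up.

```python
import hashlib
import hmac

# Assumption: a secret shared by all masking runs and kept out of test systems.
SECRET_KEY = b"replace-with-a-protected-key"

def mask_ssn(ssn, key=SECRET_KEY):
    """Deterministically map an SSN string to another string in ###-##-#### form."""
    digest = hmac.new(key, ssn.encode("utf-8"), hashlib.sha256).digest()
    number = int.from_bytes(digest[:8], "big") % 10**9  # reduce to nine digits
    digits = f"{number:09d}"
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"

# The same input yields the same output in Payroll and in Expense Reporting.
payroll_masked = mask_ssn("203-33-3234")
expense_masked = mask_ssn("203-33-3234")
```

Reducing the hash to nine digits means two different inputs can occasionally produce the same output, so this sketch does not guarantee uniqueness; a column with a unique constraint needs a collision-free mapping instead.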


Deterministic Masking - Why

Payroll
Production SSN   Non-Production SSN   Non-Production SSN
203-33-3234      111-23-1111          132-43-1451
323-22-2943      222-34-1345          233-86-2853

Expense Reporting
Production SSN   Non-Production SSN   Non-Production SSN
203-33-3234      328-47-3843          111-23-1111
323-22-2943      823-43-0837          222-34-1345

• Preserve referential relationships across databases
• Ensure repeatability of test cases after refreshes
• Enables incremental masking of data feeds

Deterministic Masking techniques

• Use Deterministic user-defined function for algorithmic or numerical based sensitive data
  – National Identifiers, credit card numbers
  – Derived mathematically
  – Guaranteed uniqueness

• Use “Substitute” mask primitive for context-based sensitive data
  – Names, addresses, product names, medical conditions
  – Derived through hash lookup of table with replacement values
  – Small probability of collisions
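The difference between the two techniques can be shown with a short Python sketch. Both functions are illustrations under assumptions: the multiplier, offset and replacement names are made up, and the affine map is chosen only because it is easy to verify as a one-to-one mapping. It is trivially reversible by anyone who knows the constants, so a real deployment would need a stronger keyed permutation.

```python
import hashlib
from math import gcd

MODULUS = 10**9           # nine-digit identifiers
MULTIPLIER = 387_420_489  # 3**18; assumption, any value coprime with MODULUS works
OFFSET = 123_456_789      # assumption

# A multiplier coprime with the modulus makes the map a permutation of 0..MODULUS-1.
assert gcd(MULTIPLIER, MODULUS) == 1

def mask_unique_number(n):
    """Mathematically derived mask: distinct inputs always give distinct outputs."""
    return (MULTIPLIER * n + OFFSET) % MODULUS

SUBSTITUTES = ["SMITH", "MILLER", "JOHNSON", "BROWN", "DAVIS"]  # replacement table

def substitute(value, table=SUBSTITUTES):
    """Hash lookup: the hash of the original value selects a row of the table."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return table[int.from_bytes(digest[:8], "big") % len(table)]

masked_id = mask_unique_number(203_333_234)
masked_name = substitute("AGUILAR")
```

With the substitute approach, collisions are unavoidable once there are more distinct originals than replacement rows, which is acceptable for names and addresses but not for unique identifiers.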


Mask Definition
Associate Mask Formats with Identified Sensitive Columns

• Automatic discovery and enforcement of referential integrity
• Registration and enforcement of referential integrity when entered as related columns
  – Application-enforced referential integrity
  – Business-process based data relationships
  – Non-Oracle database based referential integrity
• Imported via XML generated via SQL against metadata

FIND ASSESS SECURE: Automate nonproduction systems through data masking

TEST


Test System Setup for Oracle Databases
Creating Test Databases from Production

[Diagram: the Production DB (business data tables T1 to T5, application metadata, DB dictionary data) is cloned to the Test DB]

• Enterprise Manager out-of-the-box workflows
• RMAN-based clone-and-masking (Recommended)
• Export-Import
• Backup and Restore
• Transportable Tablespace

Test System Setup for non-Oracle Databases
Creating Test Databases from Production using Oracle Gateways

[Diagram: an IBM DB2 or Microsoft SQLServer Production DB is cloned to a Test DB (step 1); sensitive data moves between the Test DB and the Staging DB through a database gateway (steps 2 to 4)]

Masking Process
1. Production data copied to Test
2. Sensitive data copied to Staging
3. Sensitive data masked in Staging
4. Masked data copied from Staging to Test

Test System Setup for non-Oracle Databases
Creating Test Databases from Production using Oracle Database Gateways

Oracle Gateway Solutions

• Connect to non-Oracle from an Oracle env.
  – Transparently access non-Oracle data using Oracle SQL
• Makes the non-Oracle look like a remote Oracle
  – Location Transparency
• Target specific Gateways
  – DB2, SQL Server, Teradata, Sybase, Informix

[Diagram: Oracle Applications, Heterogeneous database gateway, IBM DB2, Microsoft SQLServer]

Test System Setup for non-Oracle Databases
Creating Test Databases from Production using Oracle Gateways

Configuration
• Configure gateways, database links for non-Oracle databases and synonyms for non-Oracle database tables
• Create empty tables (no data) in Oracle staging database from non-Oracle databases via Gateways

Pre-Mask Copy
• Copy data from non-Oracle to Oracle in Pre-Mask step

Mask

Post-Mask Copy
• Copy masked data back to non-Oracle database
• Truncate data in Oracle staging
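The configuration, pre-mask copy, mask and post-mask copy sequence can be walked through with two in-memory SQLite databases standing in for the non-Oracle test database and the Oracle staging database. This is a sketch of the data flow only: SQLite, the table names, and the placeholder `mask_ssn` rule are assumptions, and no gateway or database link is involved.

```python
import sqlite3

def mask_ssn(ssn):
    """Placeholder mask rule (assumption): overwrite every digit with 9."""
    return "".join("9" if ch.isdigit() else ch for ch in ssn)

test_db = sqlite3.connect(":memory:")  # stands in for the cloned non-Oracle test database
staging = sqlite3.connect(":memory:")  # stands in for the Oracle staging database

# The test database already holds a copy of production data (the clone step).
test_db.execute("CREATE TABLE emp (empid INTEGER PRIMARY KEY, last_name TEXT, ssn TEXT)")
test_db.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                    [(1, "AGUILAR", "203-33-3234"), (2, "BENSON", "323-22-2943")])

# Configuration: create an empty staging table for the sensitive columns.
staging.execute("CREATE TABLE emp_stage (empid INTEGER PRIMARY KEY, ssn TEXT)")

# Pre-mask copy: move only the key and the sensitive column into staging.
rows = test_db.execute("SELECT empid, ssn FROM emp").fetchall()
staging.executemany("INSERT INTO emp_stage VALUES (?, ?)", rows)

# Mask: run the masking routine inside the staging database.
staging.create_function("mask_ssn", 1, mask_ssn)
staging.execute("UPDATE emp_stage SET ssn = mask_ssn(ssn)")

# Post-mask copy: write masked values back to the test database by key.
masked = staging.execute("SELECT ssn, empid FROM emp_stage").fetchall()
test_db.executemany("UPDATE emp SET ssn = ? WHERE empid = ?", masked)

# Clean up: remove the data from staging.
staging.execute("DELETE FROM emp_stage")
```

Only the key and the sensitive column travel to staging, which keeps the copy small; the non-sensitive columns never leave the test database.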

Generate Mask Execution Script

• Generate PL/SQL-based masking script upon successful validation
• Adds pre- and post-mask SQL
• Ensure uniqueness can be maintained
• Ensure formats match column data types
• Check Space availability
• Warn about Check Constraints
• Check presence of default Partitions

Highest Performance Mask Execution

Capture and disable Constraints on “sensitive” table

Build mapping table containing original sensitive and masked values using masking routines

Rename “sensitive” table

Recreate masked table from original table replacing sensitive with masked values from mapping tables using CTAS

Drop Renamed table and mapping table

Restore constraints based on original table

• Column scalability
  – 215 columns masked across 100 tables
  – 60GB Database
  – 20 minutes

• Rows scalability
  – 100 million row table, 6 columns masked
  – Random Number
  – 1.3 hours

Linux x86, 4 CPU: Single core Pentium 4 (Northwood) [D1], Memory: 5.7G
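The mapping-table and CTAS (CREATE TABLE AS SELECT) steps can be tried out in SQLite, which supports both table renaming and CTAS. The sketch below is an assumption-laden miniature, not the generated PL/SQL script: the table names and the `900-00-nnnn` masked values are invented, and the constraint capture and restore steps are left out because the sample table has no constraints.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE emp (empid INTEGER, last_name TEXT, ssn TEXT)")
db.executemany("INSERT INTO emp VALUES (?, ?, ?)",
               [(1, "AGUILAR", "203-33-3234"),
                (2, "BENSON", "323-22-2943"),
                (3, "AGUILAR", "203-33-3234")])  # repeated value on purpose

# Build the mapping table: one row per distinct original value.
db.execute("CREATE TABLE emp_map (orig_ssn TEXT PRIMARY KEY, masked_ssn TEXT)")
originals = [row[0] for row in db.execute("SELECT DISTINCT ssn FROM emp ORDER BY ssn")]
db.executemany("INSERT INTO emp_map VALUES (?, ?)",
               [(ssn, f"900-00-{i:04d}") for i, ssn in enumerate(originals, start=1)])

# Rename the sensitive table out of the way.
db.execute("ALTER TABLE emp RENAME TO emp_orig")

# Recreate the table in one pass, swapping in masked values via a join (CTAS).
db.execute("""
    CREATE TABLE emp AS
    SELECT o.empid, o.last_name, m.masked_ssn AS ssn
    FROM emp_orig AS o
    JOIN emp_map AS m ON m.orig_ssn = o.ssn
""")

# Drop the renamed original and the mapping table so no original values remain.
db.execute("DROP TABLE emp_orig")
db.execute("DROP TABLE emp_map")
```

Rebuilding the table with a single set-based statement avoids row-by-row updates, which is the usual reason a CTAS approach scales well on large tables.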

Oracle Data Masking
High performance, Workflow-based and Optimized for Oracle

[Workflow diagram: roles Appln. Admin and DBA; steps Identify Sensitive Information, Format Library, Associate mask format with sensitive information, Mask Definition, Clone Prod to Staging, Execute Mask]

• Mask editing separate from mask execution
• One-click clone-and-mask workflow

High Performance
Rapid creation of test systems

Workflow-based
Improved security via segregation of duties

Optimized for Oracle
Increased productivity in Oracle environments through integration with cloning, flashback.

Mask Execution Options
Unique to Oracle Data Masking

• Comparing before & after values
  • Saves the mapping tables so that before and after values can be compared after a mask run during testing
• REDO and FLASHBACK log enabled
  • Allows FLASHBACK DATABASE to the pre-masked state when testing masking routines
• Statistics refresh
  • Enables DBAs to run their own custom statistics generation routine
• Degree of parallelism
  • Optimizes the performance of the mask execution based on the number of processors available

Data Center Friendly

• Privilege delegation support
  • Allows mask execution using sudo or PowerBroker
• Masking script directory specification
  • Allows DBAs to specify the directory location where the masking script should be generated
• NEW: Masking tasks command line support via EMCLI
  • Allows integration with any database automation processes
  • Generates mask script specific to cloned schema

FIND ASSESS SECURE TEST: Ensure the integrity of applications through testing

Application Quality Management (AQM) Solutions

Application Quality Management (AQM)
• New solution offering for EM family of products
• Consists of three products
  • Data Masking Pack
  • Application Testing Suite
  • Real Application Testing
• Supports complete application management lifecycle, from development and testing to production deployment
• Integration with EM to provide single console for end-to-end application management and testing
• Integration with Oracle apps to help lower application deployment costs and improve performance and service levels

[Lifecycle diagram: DEV, TEST, PROD; Install / Upgrade Application, Dev / Test System Provisioning, Functional Testing, Load Testing]

Customer case study

Enterprise Data Masking Solution

Nirmalya Das Lead DBA, Cisco

Business Drivers

• Cisco data must be kept private to comply with external privacy laws and regulations, for example SOX, Payment Card Industry (PCI), and the Health Insurance Portability and Accountability Act (HIPAA).
  • Visit Cisco privacy central for an in-depth view of privacy: http://www.cisco.com/web/siteassets/legal/privacy.html
  • Privacy policies and guidelines
• All other business data considered sensitive by Cisco, e.g. credit card numbers, financial data, engineering data, personnel and customer data (Personally Identifiable Information (PII)).

Current Challenges

• Cisco did not have a uniform, standardized process or tool by which private data, classified as confidential or restricted, is disguised in the instances that support production.
• Cisco could not ensure that all private data is disguised and that no exposure exists with regard to this data.
• Risk to Cisco involving fraudulent activities, loss of customer trust, damage to brand, expensive notification, remediation efforts, and violations of various regulatory and statutory requirements resulting in fines and penalties.

Project goals

• Facilitate compliance with worldwide data privacy rules and regulations at Cisco
• Reduce the amount of individual manual analysis and effort required to manage and duplicate masked data among different functional areas
• Implement an enterprise-wide solution that standardizes a repeatable data-masking process and capabilities for non-production environments
• Ensure masked data is ‘fit for use’
• Provide reliable assurance that private data will not be exposed in non-production environments
• Leverage investments in existing tools where possible

RFP-based Evaluation

• 5 vendors shortlisted through RFP process
• 2 selected for final evaluation
  • Technical proof of concept to demonstrate 5 Cisco-specified use cases
  • Other criteria: Customer references and total cost of ownership

Vendor            Use Case (60%)   Cost (30%)   Customer References (10%)   TOTAL (100%) [1]
Vendor X          3.75 / 6         1.5 / 3      0.79 / 1                    6.04
Vendor Finalist   4.50 / 6         3 / 3        0.5 / 1 [2]                 8.00

Notes
[1] Total possible score for each vendor is 10.
[2] Oracle customers were not able to provide the quantitative scoring. However, the customer reference checks had satisfactory results and therefore warrant a score of 0.5 of 1 for Oracle.
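The totals in the scoring table can be checked by adding the three weighted components, each of which is already expressed on its own scale (6, 3 and 1 points, summing to 10). The short Python check below reads the finalist's customer reference score as 0.5, in line with the second note; the dictionary layout is just for illustration.

```python
# Points earned per criterion: (use case out of 6, cost out of 3, references out of 1)
scores = {
    "Vendor X": (3.75, 1.5, 0.79),
    "Vendor Finalist": (4.50, 3.0, 0.5),  # 0.5 per the note on reference checks
}

# Total out of 10, rounded to two decimals as in the table.
totals = {vendor: round(sum(points), 2) for vendor, points in scores.items()}
```

Under this reading the totals come to 6.04 and 8.00, matching the TOTAL column.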

Data Masking Implementation At Cisco

Roles & Responsibilities

Owner: GBP Business
• Data Steward
  – Identify & prioritize sensitive data

Owner: GBP IT
• Data Analyst
  – Locate data in databases
  – Analyze environments
  – Initiate Demand Clearing
  – Create & push PVCS & Kintana Packages
• Developer
  – Create & test masking scripts
• QA Reviewer
  – Review & approve Kintana package

Owner: ATS
• DBA
  – Participate in Demand Clearing
  – Flag database in DBTS
  – Update cloning instructions for database
  – Set up environment
  – Generate monitoring reports periodically
• WIPRO
  – Perform additional cloning instructions

Owner: ITRM
• Auditor
  – Request & inspect compliance w/ process
  – Identify root cause of issues found
  – Work with others to resolve systemic issues
  – Refine process as it matures

Kryptos - Data Masking project, 10/25/2010

End-to-End Process Flow Diagram

Data Masking Solution Features

• Initially an Oracle database solution only.
• Data Masking software tool is a module of Oracle Enterprise Manager, currently used to monitor all Oracle databases at Cisco.
• Masked data cannot be reversed to its original value.
• Data Masking tool provides
  • Predetermined masking rules for common sensitive data, e.g. SS#, credit cards
  • Ability to create customized masking rules
• Provides User Access Control to Data Masking tool
• Provides automated Change Control process to deploy production masking rules
• Creates a script which masks data during the existing database refresh processes

Where are we now?

• Oracle data masking solution implemented
• Phase 1 with HR IT and GPSS IT successful
• Playbook created for GBPs to implement masking in Phase 2
• Continue to work with Oracle to resolve software issues - Open

Masked Data Elements in EBS application

Phase 1
GBPs: Human Resources (HRMS), GPSS
Fields to be Masked:
• Registered Disable Flag
• Ethnic Origin
• Termination Reason Code
• Home Phone
• Base Salary
• Bonus/CAP
• Birth Date
• Country of Birth
• National Identifier
• Address
• ePM Rating
• Salary
• Sales Rep’s Annual Target (Local Currency)
• Sales Rep’s Annual Target (US Dollars)
• Sales Rep’s Annual Target by Territory (Local Currency)
• Sales Rep’s Annual Target by Territory (US Dollars)
• Sales Rep’s Quarterly Target (Local Currency)
• Sales Rep’s Quarterly Target (US Dollars)

Phase 2
GBPs: Finance (P2R, H2R), Human Resources (HRMS)
Fields to be Masked:
• Emp. Bank Account #
• Emp. Corporate Card #
• Emp. Divorce Status
• Emp. Nationality
• Emp. Citizenship Status
• Emp. Country
• Emp. Region
• Emp. Town of Birth
• Emp. Veteran Status
• Emp. Separation Package Type

GBP: GGSG
• Pay Grade
• Clearance Level
• Clearance Bonus

GBP: Marketing (MODS, CM, SMCC, SMS, GIST)
• Customer & Prospect email

GBP: GPSS
• Commission
• Incentive
• Bonus Plan Code
• Bonus Status
• Bonus Description
• Bonus Type
• OMF Opportunity $

Phase 2

• Extend the enterprise-wide masking solution to Finance, Marketing, & GGSG
• Mask sensitive data in a risk-based, iterative approach
• Provide a framework to enable ongoing enterprise-wide adoption
• ITRM continued monitoring and engagement of GBPs

Life Before and After Data Masking

                               Before Masking   After Masking
Process used for masking       Manual           Automated
Data elements protected        Unknown          8 (in Phase 1)
Databases protected            1                8 (in Phase 1)
Divisions using data masking   1                2 (in Phase 1), 5 (in Phase 2)

Business Benefits

• Increase Cisco’s assurance that private data is not unnecessarily exposed and exploited
• Reduce exposure risk due to private data leakage
• Reduce the risk of failing an ICS audit or government regulations
• Increased visibility and traceability where private data is stored and masked
• Reduce effort by the project teams during project initiative development and testing, where data masking is required
• Reduce duplicate effort in defining what data needs to be masked
• Increased standardization and uniformity of data masking process Cisco wide
• Financial benefit to Cisco through improved 'value for money' potential and better management of data usage