Data Masking Tool

Author: Monica Short
Table of Contents
Data Masking: Need
About the tool
Masking Algorithms
  1. Vigenere Masking
  2. Random Masking
  3. Default Masking
  4. Date Shift Masking
  5. Fixed List Masking
  6. Hash Masking
How to use
Steps to complete the template document
  General_Details
  Configuration
Important points to remember

Data Masking: Need
Data masking is the process of obscuring (masking) specific data elements within data stores. It replaces sensitive data with realistic but not real data, so that sensitive customer information is not available outside the authorized environment. Data masking is typically done while provisioning non-production environments, so that copies created to support test and development processes do not expose sensitive information.

About the tool
The data masking tool is Excel-based. It generates a reusable Informatica PowerCenter Java transformation. The user selects one of six available masking algorithms: Vigenere, Default, Random, Fixed List, Date Shift and Hash. These algorithms are detailed later in the guide. Of these, Vigenere and Date Shift are decipherable, i.e. given the masking key, the original data can be reproduced.

The tool can also create a reusable Java transformation for unmasking data that was masked using the masking transformation. Only the columns that were masked using the reversible algorithms can be restored to their original values. The configuration used to generate the masking transformation can be reused, but the generated unmasking XML will contain code only for the reversible algorithms; the other columns will be mapped directly without any transformation.

Masking Algorithms
The following algorithms are available for masking source data:

1. Vigenere Masking: In Vigenere masking, the key is an alphanumeric string that is applied repeatedly along the source data. Each letter in the key stands for a shift equal to the number of letters that precede it in the English alphabet, so A shifts by 0 and L (the 12th letter) shifts by 11. The corresponding character of the input column is shifted forward by this number. So, if the first letter in the key is L and the corresponding character in the input text is B, B is shifted by 11 places and becomes M in the masked text. The attached Vigenere matrix explains the masking algorithm. If a key character is a digit, the input character is shifted forward by that value: if the key character is 2 and the input character is B, the output character is D. Because different key characters can line up with the same letter at different positions, the same letter in the source may map to different letters in the masked data. This is a reversible masking algorithm.
Example:
Input Data: BEBACKATDAWN
Key: LEMON
Masked Data: MINOPVEFRNHR

Note: Vigenere masking masks a numeric input into a numeric output, a lower-case input into a lower-case output and an upper-case input into an upper-case output. It shouldn’t be used to mask special characters, as unmasking can produce unexpected output.
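To make the shifting rule concrete, here is a minimal Java sketch of Vigenere masking and unmasking. It is an illustration only, not the code the tool generates: the class and method names are hypothetical, and wrapping digits modulo 10 is an assumption made so that numeric input stays numeric, as the note above requires.

```java
final class VigenereSketch {
    // Shift contributed by one key character: letters count from A = 0,
    // digits use their face value. Assumes an alphanumeric ASCII key.
    static int shiftOf(char k) {
        if (k >= '0' && k <= '9') {
            return k - '0';
        }
        return Character.toUpperCase(k) - 'A';
    }

    // Rotate c inside its own class (upper case, lower case, digit).
    // Any other character passes through unchanged (assumption).
    static char rotate(char c, int shift) {
        if (c >= 'A' && c <= 'Z') return (char) ('A' + Math.floorMod(c - 'A' + shift, 26));
        if (c >= 'a' && c <= 'z') return (char) ('a' + Math.floorMod(c - 'a' + shift, 26));
        if (c >= '0' && c <= '9') return (char) ('0' + Math.floorMod(c - '0' + shift, 10));
        return c;
    }

    // direction is +1 to mask and -1 to unmask; the key repeats along the text.
    static String apply(String text, String key, int direction) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            int shift = shiftOf(key.charAt(i % key.length()));
            out.append(rotate(text.charAt(i), direction * shift));
        }
        return out.toString();
    }

    static String mask(String text, String key) {
        return apply(text, key, 1);
    }

    static String unmask(String masked, String key) {
        return apply(masked, key, -1);
    }

    public static void main(String[] args) {
        System.out.println(mask("BEBACKATDAWN", "LEMON"));   // MINOPVEFRNHR
        System.out.println(unmask("MINOPVEFRNHR", "LEMON")); // BEBACKATDAWN
    }
}
```

Unmasking applies the same shifts in the opposite direction, which is why the same key is needed to recover the original value.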

2. Random Masking: As the name suggests, random masking substitutes the source string, or a part of it, with a random number, letter or alphanumeric string, as per the inputs in the masking key rules file. It is an irreversible masking algorithm.

3. Default Masking: Default masking substitutes the source string, or a part of it, with a default string or number as given in the masking key rules file. The part of the source string to be substituted can be specified in the Excel sheet generator. It is an irreversible masking algorithm.
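The partial substitution used by Random and Default masking can be pictured with the short Java sketch below. It is a sketch under stated assumptions rather than the tool's generated code: the names are hypothetical, positions are assumed to be 1-based and inclusive, and the numeric range key (written like 23-45, as described under Configuration) is assumed to be inclusive at both ends and non-negative.

```java
import java.util.Random;

final class PartialMaskSketch {
    // Replace the characters from startPos to endPos with the replacement text.
    // Positions are 1-based and inclusive (an assumption for this sketch).
    static String replaceRange(String source, int startPos, int endPos, String replacement) {
        return source.substring(0, startPos - 1) + replacement + source.substring(endPos);
    }

    // Pick a random number from a range key such as "23-45".
    // Both bounds are treated as inclusive and non-negative (assumptions).
    static int randomInRange(String rangeKey, Random rnd) {
        int dash = rangeKey.indexOf('-');
        int low = Integer.parseInt(rangeKey.substring(0, dash));
        int high = Integer.parseInt(rangeKey.substring(dash + 1));
        return low + rnd.nextInt(high - low + 1);
    }

    public static void main(String[] args) {
        // Default masking of positions 1 to 6 with a fixed string.
        System.out.println(replaceRange("1234567890", 1, 6, "XXXXXX"));
        // Random masking of a numeric value using a range key.
        System.out.println(randomInRange("23-45", new Random()));
    }
}
```

When the replacement is shorter or longer than the range it replaces, the masked value changes length: replaceRange("ABCDEF", 2, 4, "#") returns "A#EF".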

4. Date Shift Masking: Date shift masking is used to mask date fields. The source date is increased or decreased as specified in the masking rules. The format of the output is the same as the input. It is a reversible masking algorithm.

5. Fixed List Masking: Fixed list masking masks a source column using a user-provided dictionary file. The algorithm uses a hash of the source value to look up a structurally similar replacement in the dictionary file. The dictionary file can be a list of any type, and it should contain only one column, holding the values that the input values are to be masked into. The file should not have a header record, as the code does not filter out the first record. The algorithm is designed so that the same input string always produces the same output, although the output is not necessarily unique (different inputs may map to the same dictionary value). This masking can be used where the masked data needs to maintain structural integrity, e.g. Zip, Address, meaningful names, SSN, credit card numbers etc. It is an irreversible masking algorithm.

6. Hash Masking: The Hash algorithm masks the source data using SHA-1 (Secure Hash Algorithm 1). The hash function converts the source value to bytes, which are then converted to hexadecimal before being populated in the target. This should be used only with strings. The output of this algorithm is always a string of length 40. It is an irreversible masking algorithm.
For example:
Source: Dancing with the stars
Output: 5c3347f65557bde0c244536cf3f7a4d19b71a3ef
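As an illustration of Hash masking, the following Java sketch computes a SHA-1 digest with the standard java.security.MessageDigest class and renders it as 40 hexadecimal characters. The class and method names are hypothetical, and the UTF-8 encoding of the source string is an assumption; the encoding used by the generated transformation is not stated in this guide.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class HashMaskSketch {
    // Hash the source string with SHA-1 and return the digest as lower-case hex.
    static String sha1Hex(String source) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            // UTF-8 is an assumption; the tool's actual encoding is not documented here.
            byte[] digest = md.digest(source.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so every byte becomes exactly two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha1Hex("Dancing with the stars"));
    }
}
```

A SHA-1 digest is 20 bytes, which is why the hexadecimal output always has length 40 regardless of the input length.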

How to use
To use the data masking tool, follow these general steps:
1. Fill in all the details in the macro-enabled Excel sheet (Template_Macro.xlsm) as per the masking requirements. Detailed steps are given in the next section.
2. Generate the masking XML, the unmasking XML, or both Java transformation XMLs. This is explained in the next section.
3. Import the XMLs generated in step 2 into Informatica PowerCenter Designer. Compile the Java code in the transformation.
4. The transformation is ready to be plugged into any PowerCenter mapping for masking/unmasking.

Steps to complete the template document
The macro-enabled Excel tool consists of two worksheets: General_Details and Configuration.

General_Details:
In this tab, we need to provide the following attributes:
• Destination Repository Name: Provide the name of the PowerCenter repository in which the transformation is to be used.
• Destination Folder Name: Provide the folder name into which the transformation is to be imported.
• Owner: Provide the owner of the destination folder in Informatica PowerCenter.
• Destination Path (XML): Provide the destination path on the local system in which the generated reusable Java transformation should be created. Please note that the path must already exist on the system; otherwise the file is not generated and an error message pops up. This is a mandatory field.
• Java Transformation Name: Provide the name of the Java transformation to be generated. The generated XML file will have the same name as well.

Configuration:
This tab contains a configuration table on the left and two buttons on the right. The first button generates the masking transformation, while the second button generates the unmasking transformation for the same configuration. To generate the unmasking transformation, the same configuration used for masking is to be used, but the generated XML will contain code only for the reversible algorithms; the other columns will be mapped directly without any transformation. It is recommended that the transformation name be changed from the one used to generate the masking transformation, or the masking transformation will be overwritten. The visible columns present in the configuration table are:

• Input Column Name: Provide the list of input column names as you would like them to appear in the Java transformation.
• Masking Algorithm: Provide the masking algorithm for the column. If the column doesn’t need to be masked, leave all the other fields blank. Based on the selected masking algorithm, the non-essential columns in the record are blackened out. No data is required for a blackened attribute. All the available masking algorithms are described in the Masking Algorithms section of this document.
• Format: Format is required only for the Date Shift algorithm. The column gives the format of the source data in Java syntax, e.g. yyyyMMdd.
• Start Pos: Starting position in the input string for masking. This is required for Random masking and Default masking. It is to be used if only a part of the source data is to be masked. Leaving this and the “End Pos” attribute blank results in masking of the full text in the input column.
• End Pos: End position in the input string for masking. This is required for Random masking and Default masking. It is to be used if only a part of the source data is to be masked. Leaving this and the “Start Pos” attribute blank results in masking of the complete source data.
• Masking Key: The Masking Key attribute needs to be filled with values depending on the masking algorithm used:
  • Random: If the source data type is numeric, the range of the masked value is to be provided. This is mandatory only for numeric data types. If the data type of the input strings is varchar, the key is not expected and should not be provided. E.g.
    23-45: The masked value lies between 23 and 45.
    999-2011: The string to be substituted is replaced with any number between 999 and 2011.
  • Hash: No value required (the cell is blackened out).
  • Fixed List: Server path for the dictionary file (mandatory). E.g. C:\Informatica\9.0.1\server\infa_shared\SrcFiles\Mask_dic.csv
  • Vigenere: The masking key, i.e. the text to be used to mask the source data. This can be an alphanumeric string of any length. The key can contain letters and numbers but no special characters.
  • Default: The default value which will replace part or all of the source data. If the length of the part of the input string to be replaced and the length of the masking key differ, the masked string will have a different length than the input string.
  • Date Shift: The value by which the date is to be shifted. The first two characters of the masking key denote the attribute to be changed. The codes to be used are: Year: YY, Month: MM, Days: DD, Minutes: MI and Seconds: SS. E.g.
    To increase the year by 12, masking key to be used: YY12
    To decrease the year by 12, masking key to be used: YY-12
    To increase the month by 12, masking key to be used: MM12
    To increase the day by 12, masking key to be used: DD12
• Output Port: The output port value is generated automatically in the sheet by prefixing “o_” to the input column name. This is the output port which holds the masked/unmasked value for the input port in the Java transformation. For the columns where no masking is done, the output port contains the source value without any change.

A sample configuration is shown in the figure below:

Important points to remember
1. The same configuration details can be used to generate the masking transformation as well as the unmasking transformation. However, it is suggested to change the transformation name in the General_Details worksheet, so that the masking transformation is not overwritten.
2. For the columns which needn’t be masked, only the input column name needs to be provided and all other attributes are to be left blank.
3. For Random masking of numeric data, a range value in the Masking Key attribute is mandatory. The format is as described above.
4. The dictionary file path should be provided for every source column which uses “Fixed List” masking, even if the same dictionary file is being used.
5. The unmasking transformation deciphers only the columns which were masked using the Vigenere and Date Shift algorithms. For columns where other algorithms were used, it passes the source column value directly into the target column.
6. The destination path for the generated transformation must already exist. Otherwise, the transformation is not generated.
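Point 5 relies on Date Shift keys being reversible. The Java sketch below shows one way a key such as YY12 or YY-12 could be parsed and applied, and how unmasking could negate the same amount. It is a sketch only: the class and method names are hypothetical, the fixed UTC time zone is an assumption made to keep the example deterministic, and the tool's generated code may work differently.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.TimeZone;

final class DateShiftSketch {
    // Map the two-letter code at the start of the masking key to a Calendar field.
    static int fieldFor(String code) {
        switch (code) {
            case "YY": return Calendar.YEAR;
            case "MM": return Calendar.MONTH;
            case "DD": return Calendar.DAY_OF_MONTH;
            case "MI": return Calendar.MINUTE;
            case "SS": return Calendar.SECOND;
            default: throw new IllegalArgumentException("Unknown date shift code: " + code);
        }
    }

    // Shift a date string by the amount in the key; unmask = true applies the opposite shift.
    static String shift(String value, String format, String key, boolean unmask) {
        int field = fieldFor(key.substring(0, 2));
        int amount = Integer.parseInt(key.substring(2)); // accepts "12" and "-12"
        TimeZone utc = TimeZone.getTimeZone("UTC");      // assumption: fixed zone for determinism
        SimpleDateFormat fmt = new SimpleDateFormat(format);
        fmt.setTimeZone(utc);
        fmt.setLenient(false);
        Calendar cal = Calendar.getInstance(utc);
        try {
            cal.setTime(fmt.parse(value));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Value does not match format " + format, e);
        }
        cal.add(field, unmask ? -amount : amount);
        return fmt.format(cal.getTime()); // output keeps the input format
    }

    public static void main(String[] args) {
        String masked = shift("20200115", "yyyyMMdd", "YY12", false);
        System.out.println(masked);                                   // 20320115
        System.out.println(shift(masked, "yyyyMMdd", "YY12", true));  // 20200115
    }
}
```

In this sketch, month and year shifts that land on a shorter month are clamped by Calendar (for example, January 31 plus one month becomes the last day of February), so such dates would not round-trip exactly; day, minute and second shifts do not have this limitation.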