Masking Medicare beneficiary identifiers
Using the tPatternMasking component, you can replace personally identifiable information, such as Medicare Beneficiary Identifiers (MBI), with realistic values in a consistent manner.
This scenario applies only to Talend Data Management Platform, Talend Big Data Platform, Talend Real-Time Big Data Platform, Talend MDM Platform, Talend Data Services Platform, and Talend Data Fabric.
An MBI has 11 characters, in the following order:
- A digit in the 1 to 9 range.
- A letter in the A to Z range (minus S, L, O, I, B, Z).
- A digit or a letter in the A to Z range (minus S, L, O, I, B, Z).
- A digit in the 0 to 9 range.
- A letter in the A to Z range (minus S, L, O, I, B, Z).
- A digit or a letter in the A to Z range (minus S, L, O, I, B, Z).
- A digit in the 0 to 9 range.
- A letter in the A to Z range (minus S, L, O, I, B, Z).
- A letter in the A to Z range (minus S, L, O, I, B, Z).
- A digit in the 0 to 9 range.
- A digit in the 0 to 9 range.
For example, 1EG4-TE5-MK73 is a valid MBI.
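The character layout above can be expressed as a regular expression, which is handy for checking test data before it goes through the Job. The following Python sketch is an illustration only and is not part of the Talend Job. The names `ALPHA`, `MBI_PATTERN`, and `is_valid_mbi` are hypothetical, and accepting the value with or without dashes is an assumption based on the example above.

```python
import re

# Allowed letters: A to Z minus S, L, O, I, B and Z (20 letters).
ALPHA = "".join(c for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" if c not in "SLOIBZ")
A = f"[{ALPHA}]"        # one allowed letter
AN = f"[0-9{ALPHA}]"    # one allowed digit or letter

# The 11 positions in order. Assumption: a dash may appear after
# positions 4 and 7, as in the display form 1EG4-TE5-MK73.
MBI_PATTERN = re.compile(
    rf"[1-9]{A}{AN}[0-9]-?{A}{AN}[0-9]-?{A}{A}[0-9][0-9]"
)


def is_valid_mbi(value: str) -> bool:
    """Return True if value matches the 11-character MBI layout."""
    return MBI_PATTERN.fullmatch(value) is not None


print(is_valid_mbi("1EG4-TE5-MK73"))   # True
print(is_valid_mbi("0EF6-T-F4-AC44"))  # False: the first character is 0
```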
The Job in this scenario uses three components:
- The tFixedFlowInput component generates MBIs.
- The tPatternMasking component replaces the original MBIs with random digits or letters drawn from lists of allowed values, or with random numbers drawn from specified ranges.
- The tLogRow component outputs the substitute dataset.
Setting up the Job
Procedure
- Drop the following components from the Palette onto the design workspace: tFixedFlowInput, tPatternMasking and tLogRow.
- Connect tFixedFlowInput to tPatternMasking, and tPatternMasking to tLogRow, using Row > Main links.
Configuring the input component
Procedure
Configuring the masking operations
The alpha_values.zip file contains the allowed alphabetic values: all letters in the A to Z range except S, L, O, I, B, and Z. The alphanum_values.zip file contains the allowed alphanumeric values: the letters from alpha_values.zip plus the digits 0 to 9.
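The following Python sketch lists the values that the two enumeration files are described as containing. It does not reproduce the downloadable files: the layout of one value per line with no header is an assumption, and `to_csv_lines` is a hypothetical helper.

```python
import string

# Letters excluded from MBIs.
EXCLUDED = set("SLOIBZ")

# Allowed alphabetic values (the content of alpha_values.csv in this sketch).
alpha_values = [c for c in string.ascii_uppercase if c not in EXCLUDED]

# Allowed alphanumeric values: the letters above plus the digits 0 to 9
# (the content of alphanum_values.csv in this sketch).
alphanum_values = alpha_values + list(string.digits)


def to_csv_lines(values):
    """Assumption: one value per line, no header row."""
    return "\n".join(values) + "\n"


print(len(alpha_values), len(alphanum_values))  # 20 30
```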
Before you begin
- You downloaded and extracted the alpha_values.zip and alphanum_values.zip files.
- You defined context variables that point to the alpha_values.csv and alphanum_values.csv files. For further information, see Defining context variables for a Job or Route.
Procedure
Configuring the output component and executing the Job
Procedure
Results
The tPatternMasking component alters the values from the input data and outputs original and substitute records.
The input data has been altered, but the output data looks realistic and consistent, so the substitute data remains usable for non-production purposes.
- The first character is replaced with a digit from the 1 to 9 range, as defined in the tPatternMasking properties.
- The second, fifth, eighth, and ninth characters are replaced with a letter from the list of allowed alphabetic values defined in the alpha_values.csv enumeration file.
- The third and sixth characters are replaced with one of the allowed alphanumeric values defined in the alphanum_values.csv enumeration file.
- The fourth and seventh characters are replaced with a digit from the 0 to 9 range, as defined in the tPatternMasking properties.
- The last two characters are replaced with a number from the 0 to 99 range, as defined in the tPatternMasking properties.
- The input uses dashes as separators and they remain unchanged in the output.
The tPatternMasking component outputs null for 0EF6-T-F4-AC44 because this value is invalid: the first character, 0, is out of the specified range ("1,9").
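The masking rules described above can be approximated in plain Python to see how each position is handled. This is a minimal sketch, not Talend's implementation of tPatternMasking. Assumptions: numbers in the 0 to 99 range are zero-padded to two characters so the output keeps the MBI length; consistency is imitated by seeding a random generator from a hash of the input and a hypothetical secret `seed`; `mask_mbi` and `FIELDS` are hypothetical names.

```python
import hashlib
import random

# Allowed letters: A to Z minus S, L, O, I, B and Z.
ALPHA = [c for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" if c not in "SLOIBZ"]
DIGITS_0_9 = [str(d) for d in range(10)]
DIGITS_1_9 = [str(d) for d in range(1, 10)]
ALPHANUM = ALPHA + DIGITS_0_9
# Assumption: the 0 to 99 range is zero-padded to two characters.
NUM_0_99 = [f"{n:02d}" for n in range(100)]
DASH = ["-"]  # separators are validated but never replaced

# Hypothetical field layout for values such as 1EG4-TE5-MK73.
FIELDS = [DIGITS_1_9, ALPHA, ALPHANUM, DIGITS_0_9, DASH,
          ALPHA, ALPHANUM, DIGITS_0_9, DASH,
          ALPHA, ALPHA, NUM_0_99]


def mask_mbi(value, seed="hypothetical-secret"):
    """Return a substitute MBI, or None if value does not fit FIELDS."""
    # Assumption: consistency comes from seeding the generator with a hash
    # of the secret and the input, so equal inputs give equal outputs.
    digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).hexdigest()
    rng = random.Random(int(digest, 16))
    out, pos = [], 0
    for allowed in FIELDS:
        width = len(allowed[0])          # 2 for NUM_0_99, otherwise 1
        chunk = value[pos:pos + width]
        if chunk not in allowed:
            return None                  # invalid input: output null
        out.append(chunk if allowed is DASH else rng.choice(allowed))
        pos += width
    # Reject trailing characters beyond the expected layout.
    return "".join(out) if pos == len(value) else None


for mbi in ["1EG4-TE5-MK73", "0EF6-T-F4-AC44"]:
    print(mbi, "->", mask_mbi(mbi))
```

Because the generator is seeded from the input value, masking the same MBI twice returns the same substitute in this sketch, while an invalid value such as 0EF6-T-F4-AC44 returns None, which mirrors the null output described above.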