Data Masking effects
Text and semantic types
For textual data, Talend Data Preparation automatically suggests one of the predefined semantic types, one of your custom semantic types, or the Text type. Predefined and custom semantic types can be based on either a regular expression or a dictionary of values.
The following table lists the masking routines available for a column with the Text type, or with any predefined or custom semantic type, and shows their effects on the example value Talend in 2018 is awesome. Indexes are counted from 1 and intervals include both the beginning and end positions.
Masking routine | Description | Parameters | Output |
---|---|---|---|
Semantic masking | For regular expression-based semantic types, generates random records that match the regular expression pattern. For dictionary-based semantic types, randomly replaces the records with values taken from the dictionary used to create the semantic type. Note: Semantic types built with regular expressions that are not compatible with the dk.brics.automaton library do not support semantic masking; every character of the record is randomly replaced instead. | Masking mode: Random or Repeatable | Äåòçôî ëð 1889 òn äipïåvu |
Keep characters between two positions | All the characters included in the selected interval remain as is, while the ones outside the interval are deleted. | Beginning index: 11; End index: 25 | 2018 is awesome |
Generate from Char Pattern | A record with random characters is created from the pattern of your choice. | Character pattern: aaaaaa 9999 aaaaaaa; Masking mode: Random or Repeatable | õaßayè 8908 æluäco |
Remove characters between two positions | All the characters included in the selected interval are removed, while the ones outside the interval remain as is. | Beginning index: 7; End index: 14 | Talend is awesome |
Replace all | All the characters are replaced with the substitute of your choice. | Replacement: x; Masking mode: Random or Repeatable | xxxxxxxxxxxxxxxxxxxxxxxxx |
Replace all digits | All the digits are replaced with the substitute of your choice. Letters are kept as is. | Replacement: 9; Masking mode: Random or Repeatable | Talend in 9999 is awesome |
Replace all letters | All the letters are replaced with the substitute of your choice. Digits are kept as is. | Replacement: y; Masking mode: Random or Repeatable | yyyyyy yy 2018 yy yyyyyyy |
Replace characters between two positions | All the characters included in the selected interval are replaced, while the ones outside the interval remain as is. | Beginning index: 1; End index: 6; Replacement: a; Masking mode: Random or Repeatable | aaaaaa in 2018 is awesome |
Replace first n characters | Replaces the first n characters with the substitute of your choice, while the following ones remain as is. | Number of characters: 17; Replacement: @; Masking mode: Random or Repeatable | @@@@@@@@@@@@@@@@@ awesome |
Replace last n characters | Replaces the last n characters with the substitute of your choice, while the previous ones remain as is. | Number of characters: 10; Replacement: !; Masking mode: Random or Repeatable | Talend in 2018 !!!!!!!!!! |
Keep first n digits and replace following ones | Keeps the first n digits as is and replaces the subsequent ones with random digits. Non-digit characters remain as is. | Number of digits: 1; Masking mode: Random or Repeatable | Talend in 2436 is awesome |
Keep last n digits and replace previous ones | Keeps the last n digits as is and replaces the previous ones with random digits. Non-digit characters remain as is. | Number of digits: 2; Masking mode: Random or Repeatable | Talend in 1618 is awesome |
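
To make the position-based and character-class routines in the table concrete, the following Python sketch reproduces their documented outputs on the example value. It is an illustration only, not Talend's implementation: the function names are hypothetical, the 1-based inclusive indexing is inferred from the outputs in the table, and the Masking mode parameter is ignored.

```python
import random

SAMPLE = "Talend in 2018 is awesome"

# All function names below are hypothetical; indexes are 1-based and
# inclusive, which is inferred from the outputs in the table.

def keep_between(value, begin, end):
    """Keep the characters at positions begin..end, delete the rest."""
    return value[begin - 1:end]

def remove_between(value, begin, end):
    """Remove the characters at positions begin..end, keep the rest."""
    return value[:begin - 1] + value[end:]

def replace_between(value, begin, end, replacement):
    """Replace each character at positions begin..end with replacement."""
    span = value[begin - 1:end]
    return value[:begin - 1] + replacement * len(span) + value[end:]

def replace_first_n(value, n, replacement):
    """Replace the first n characters (spaces included)."""
    return replace_between(value, 1, n, replacement)

def replace_last_n(value, n, replacement):
    """Replace the last n characters (spaces included)."""
    n = min(n, len(value))
    return value[:len(value) - n] + replacement * n

def replace_all_digits(value, replacement):
    """Replace every digit, keep all other characters."""
    return "".join(replacement if ch.isdigit() else ch for ch in value)

def replace_all_letters(value, replacement):
    """Replace every letter, keep all other characters."""
    return "".join(replacement if ch.isalpha() else ch for ch in value)

def keep_first_n_digits(value, n, rng=None):
    """Keep the first n digits, replace later digits with random digits."""
    rng = rng or random.Random()
    out, seen = [], 0
    for ch in value:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen <= n else str(rng.randint(0, 9)))
        else:
            out.append(ch)
    return "".join(out)

def keep_last_n_digits(value, n, rng=None):
    """Keep the last n digits by applying the same logic right to left."""
    return keep_first_n_digits(value[::-1], n, rng)[::-1]

print(keep_between(SAMPLE, 11, 25))         # 2018 is awesome
print(remove_between(SAMPLE, 7, 14))        # Talend is awesome
print(replace_between(SAMPLE, 1, 6, "a"))   # aaaaaa in 2018 is awesome
print(replace_first_n(SAMPLE, 17, "@"))     # @@@@@@@@@@@@@@@@@ awesome
print(replace_last_n(SAMPLE, 10, "!"))      # Talend in 2018 !!!!!!!!!!
print(keep_first_n_digits(SAMPLE, 1))       # digits after the leading 2 vary
```

The two digit-preserving routines produce different output on each run, so only the kept digits and the non-digit characters can be compared with the table.
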
Numeric values
The following table lists the masking routines available for a column containing numeric values, with the Integer or Decimal type, and shows their effects on the example value 21803.
Masking routine | Parameters | Output |
---|---|---|
Replace with random value | Maximum variation (%): 10; Masking mode: Random or Repeatable | 21499 |
Generate value between two values | Minimum value: 20000; Maximum value: 22000; Masking mode: Random or Repeatable | 21876 |
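
As a rough illustration of the two numeric routines, the Python sketch below draws a value within a percentage band around the original and a value between two bounds. The function names, the uniform distribution, and the rounding are assumptions. The `repeatable_rng` helper reflects one possible reading of the Repeatable mode, namely that the same input always yields the same masked output; this page does not define the mode, so treat that behavior as an assumption as well.

```python
import random

# Hypothetical helper names; the distribution and rounding are assumptions.

def replace_with_random_value(value, max_variation_pct, rng=None):
    """Return a random integer within +/- max_variation_pct of value."""
    rng = rng or random.Random()
    delta = abs(value) * max_variation_pct / 100
    return round(rng.uniform(value - delta, value + delta))

def generate_between(minimum, maximum, rng=None):
    """Return a random integer between minimum and maximum (inclusive)."""
    rng = rng or random.Random()
    return rng.randint(minimum, maximum)

def repeatable_rng(value, secret="demo-seed"):
    """Assumed 'Repeatable' behavior: seed the generator from the input,
    so the same input always produces the same masked value."""
    return random.Random(f"{secret}:{value}")

original = 21803
print(replace_with_random_value(original, 10))   # somewhere in 19623..23983
print(generate_between(20000, 22000))            # somewhere in 20000..22000

# With a generator seeded from the input, repeated calls agree.
first = replace_with_random_value(original, 10, repeatable_rng(original))
second = replace_with_random_value(original, 10, repeatable_rng(original))
print(first == second)                           # True
```

The outputs shown in the table, 21499 and 21876, both fall within the ranges this sketch draws from.
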
Dates
The following table lists the masking routines available for a column with the Date semantic type, and shows their effects on the example value 05/04/2018.
Masking routine | Parameters | Output |
---|---|---|
Replace with random date | Maximum variation (in days): 365; Masking mode: Random or Repeatable | 23/11/2017 |
Keep year and set day and month to 01/01 | | 01/01/2018 |
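
The two date routines can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's implementation: the function names are hypothetical, the example value is assumed to use the day/month/year format (the output 23/11/2017 only parses that way), and the random shift is assumed to be a whole number of days drawn uniformly in either direction.

```python
import random
from datetime import datetime, timedelta

# Assumed format: day/month/year, as suggested by the outputs in the table.
DATE_FORMAT = "%d/%m/%Y"

def replace_with_random_date(value, max_variation_days, rng=None):
    """Shift the date by a random number of days within the variation."""
    rng = rng or random.Random()
    original = datetime.strptime(value, DATE_FORMAT)
    shift = rng.randint(-max_variation_days, max_variation_days)
    return (original + timedelta(days=shift)).strftime(DATE_FORMAT)

def keep_year(value):
    """Keep the year and set the day and month to 01/01."""
    original = datetime.strptime(value, DATE_FORMAT)
    return original.replace(month=1, day=1).strftime(DATE_FORMAT)

print(keep_year("05/04/2018"))                       # 01/01/2018
print(replace_with_random_date("05/04/2018", 365))   # varies on each run
```
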