Standardizing values using dictionaries
Standardizing values in cells with autocompletion
When you edit a cell in a column whose semantic type is based on a dictionary or a compound type, the application automatically suggests values from this dictionary as you type, so that the whole column follows the same standard.
In this example, you are working on a dataset containing customer data, including US state codes. Because not all occurrences of Texas have been entered in the two-letter code format, part of the data in the State column is considered invalid, as shown by the quality bar. You are going to isolate the rows containing errors, edit one of the cells with autocompletion, and apply the change to all identical cells to fix the whole column in a single operation.
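To make the idea concrete, the following Python sketch mimics the two steps outside the application: suggesting dictionary values as you type, then applying one correction to all identical cells. It is only an illustration, not how Talend Data Preparation is implemented. The names STATE_CODES, suggest, and replace_identical are hypothetical, and the dictionary is a small assumed subset of state codes.

```python
# Hypothetical dictionary of valid values: an assumed subset of US state codes.
STATE_CODES = ["TN", "TX", "UT", "VA", "VT"]


def suggest(typed, dictionary):
    """Return the dictionary values that start with the typed text, ignoring case."""
    prefix = typed.strip().upper()
    return [value for value in dictionary if value.upper().startswith(prefix)]


def replace_identical(cells, old, new):
    """Replace every cell equal to `old` with `new`; other cells are left untouched."""
    return [new if cell == old else cell for cell in cells]


# Sample State column where Texas was not entered as a two-letter code.
column = ["TX", "Texas", "UT", "Texas", "VA"]

# Typing "t" narrows the suggestions to the codes starting with T.
suggestions = suggest("t", STATE_CODES)

# Picking "TX" for one cell and applying it to all identical cells fixes the column.
fixed = replace_identical(column, "Texas", "TX")
```

Here, suggestions contains TN and TX, and every Texas cell in fixed becomes TX while the other rows stay unchanged.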
Before you begin
Procedure
Results
Automatically standardizing values in a column
You can use the Standardize value (fuzzy matching) function to find the closest valid value for each invalid value in a column.
The function checks the invalid data contained in a column against the current semantic type, and retrieves the correct values if the selected matching threshold is reached. This function is only available if the semantic type is based on a dictionary of values or a compound type that is either present by default in Talend Data Preparation or that you have created using Talend Dictionary Service. For more information on how to create custom semantic types, or edit the existing ones, see Enriching the semantic types libraries.
Let's say that you have to work on a dataset containing various information on customers based in the United States, such as their names, email addresses, and the state they live in.
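The general principle of threshold-based fuzzy matching can be sketched in Python with the standard difflib module. This is an assumption-laden illustration: the similarity measure used by the application is not described here, so difflib's ratio stands in for it, and the names US_STATES and standardize, the 0.8 threshold, and the sample values are all hypothetical.

```python
from difflib import get_close_matches

# Hypothetical dictionary: an assumed subset of valid US state names.
US_STATES = ["Texas", "Utah", "Vermont", "Virginia", "Nevada"]


def standardize(value, dictionary, threshold=0.8):
    """Return the closest dictionary value if its similarity reaches the threshold.

    Valid values are returned as is. Invalid values with no match at or above
    the threshold are also returned unchanged, so they remain flagged as invalid.
    """
    if value in dictionary:
        return value
    # difflib's similarity ratio is a stand-in for the application's own measure.
    matches = get_close_matches(value, dictionary, n=1, cutoff=threshold)
    return matches[0] if matches else value


column = ["Texas", "Texs", "Vermnt", "Atlantis"]
standardized = [standardize(value, US_STATES) for value in column]
```

In this sketch, Texs and Vermnt are close enough to a dictionary entry to be replaced, while Atlantis stays as it is because nothing in the dictionary reaches the threshold. Lowering the threshold would correct more values at the risk of wrong replacements.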
As you can see in the header of the State column, the data has been recognized as US States, but as shown in the quality bar, some of the entries contain invalid names.
In a single action, you will replace those invalid values with the correct values from the US State dictionary, or semantic type, which contains an exhaustive list of the US states.
Procedure
Results
The incorrect values have been standardized, using the dictionary of US States.