Removing duplicate values
The profiling results of the column analysis show that the email and phone columns contain some duplicate records. For details, see Showing analysis results.
From the analysis results, you can generate out-of-the-box Jobs that separate unique records from duplicate records in the selected columns. By default, such Jobs output all the duplicates to a delimited reject file and write the unique values to the database used in the analysis.
You can follow the same procedure to remove duplicates from the Email or Phone columns.
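To illustrate the kind of separation such a Job performs, here is a minimal Python sketch. It is not the code that Talend Studio generates. It assumes that the first record seen for each value is treated as unique and that later occurrences are rejected; the generated Job may apply a different rule. The function names, the sample rows, and the semicolon delimiter are hypothetical.

```python
import csv
import io


def split_unique_duplicates(rows, key):
    """Return (uniques, duplicates).

    Assumption: the first row seen for each key value counts as unique,
    and every later row with the same value counts as a duplicate.
    """
    seen = set()
    uniques, duplicates = [], []
    for row in rows:
        value = row[key]
        if value in seen:
            duplicates.append(row)
        else:
            seen.add(value)
            uniques.append(row)
    return uniques, duplicates


def to_delimited(rows, fieldnames, delimiter=";"):
    """Serialize rows as delimited text, similar in spirit to a reject file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data; not taken from the gettingstarted database.
customers = [
    {"id": 1, "email": "ann@example.com"},
    {"id": 2, "email": "bob@example.com"},
    {"id": 3, "email": "ann@example.com"},
]

uniques, duplicates = split_unique_duplicates(customers, "email")
reject_text = to_delimited(duplicates, ["id", "email"])
```

In this sketch, `uniques` stands in for the rows that would be written to the database table, and `reject_text` stands in for the content of the delimited reject file.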
Before you begin
- You have opened the Profiling perspective in Talend Studio.
- You have created and executed the column analysis. For further information, see Identifying anomalies in data.
Procedure
Results
Duplicate values are written to the output file and unique records are written to a new table in the gettingstarted database in MySQL.