Validating data using data quality rules and filtering the invalid data
The Job in this scenario uses:
- The tDataQualityRules component to validate data using a data quality rule from Talend Cloud Data Stewardship.
- The tFilterRow component to separate the data on which the rule cannot be executed from the other invalid data.
The Job uses the following data quality rule:
You can download and extract the following file: scholarship_programs.zip.
Setting up the Job
Procedure
Configuring the tFixedFlowInput component
Procedure
Configuring the tDataQualityRules component
Before you begin
- You have created at least one data quality rule in Talend Cloud Data Stewardship.
- You have a personal access token. To generate one, see https://help.talend.com/r/en-US/Cloud/management-console-user-guide/cloud-access-token.
Tip: Talend Cloud Data Stewardship is used in this example. You can also enter the URL of the hybrid version of Talend Data Stewardship 8.0 R2022-07 or later, or the URL of Talend Cloud Data Inventory.
About this task
Procedure
Configuring the tExtractJSONFields component
About this task
Procedure
Configuring the tFilterRow component
About this task
- NE: The rule cannot be executed on the data.
- Invalid: The data fulfill the condition but not the validation expression.
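The split between the two statuses can be pictured with a minimal Java sketch. It assumes that the filter condition is the Status.compareTo("NE") == 0 comparison that appears in the errorMessage column of the Job output. The class and method names (StatusFilterSketch, isNotExecutable, split) are hypothetical and are not part of any Talend API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: this is not code generated by Talend Studio.
final class StatusFilterSketch {

    // Same comparison as the one shown in the errorMessage column of the Job output.
    static boolean isNotExecutable(String status) {
        return status.compareTo("NE") == 0;
    }

    // Each row is {ID, Status}. Returns two lists of IDs:
    // index 0 holds the rows that match the filter (NE),
    // index 1 holds the rows rejected by the filter (other invalid data).
    static List<List<String>> split(List<String[]> rows) {
        List<String> notExecutable = new ArrayList<>();
        List<String> otherInvalid = new ArrayList<>();
        for (String[] row : rows) {
            if (isNotExecutable(row[1])) {
                notExecutable.add(row[0]);
            } else {
                otherInvalid.add(row[0]);
            }
        }
        return Arrays.asList(notExecutable, otherInvalid);
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1002161", "NE"});
        rows.add(new String[] {"1002147", "Invalid"});
        List<List<String>> result = split(rows);
        System.out.println("NE: " + result.get(0));
        System.out.println("OtherInvalid: " + result.get(1));
    }
}
```

In this sketch, the rows that match the comparison play the role of the filter output and the remaining rows play the role of the reject output.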
Procedure
Configuring the tLogRow components
Procedure
Saving and executing the Job
Procedure
Save your Job and press F6 to execute it.
Results
- Status code is 2632 and the date is 2021-09-01.
OR
- Status code is not 2632 and the program ends with 10AB or 10AC.
.-------+----+-------+-----------+-----------.
| main |
|=------+----+-------+-----------+----------=|
|ID |Code|Program|EffectiveOn|Nationality|
|=------+----+-------+-----------+----------=|
|1002145|2632|2310AB |2021-09-01 |Canada |
|1002146|1132|3610AC |2021-10-01 |US |
|1002149|2632|3711AB |2021-09-01 |US |
|1002150|2632|2310AB |2021-09-01 |US |
|1002151|2632|3711AB |2021-09-01 |US |
|1002152|2632|2310AB |2021-09-01 |US |
|1002153|2632|3711AB |2021-09-01 |US |
|1002155|2675|3610AC |2021-10-01 |US |
|1002156|2675|2310AB |2021-10-01 |Canada |
|1002158|2632|2310AB |2021-09-01 |US |
|1002159|1132|2310AB |2021-10-01 |US |
|1002162|2675|2310AB |2021-10-01 |US |
|1002163|2632|3610AC |2021-09-01 |US |
'-------+----+-------+-----------+-----------'
.-------+----+-------+-----------+-----------+------.
| NE |
|=------+----+-------+-----------+-----------+-----=|
|ID |Code|Program|EffectiveOn|Nationality|Status|
|=------+----+-------+-----------+-----------+-----=|
|1002161|abcd|3611AC |2021-09-01 |US |NE |
'-------+----+-------+-----------+-----------+------'
The NE status means that the rule cannot be executed on the data. In this scenario, the status code is abcd, but the rule condition is "is equal to 2632".
The other invalid data are in one of the following cases:
- Status code is 2632 but the date is not 2021-09-01.
OR
- Status code is not 2632 and the program does not end with 10AB or 10AC.
.-------+----+-------+-----------+-----------+-------+----------------------------------.
| OtherInvalid |
|=------+----+-------+-----------+-----------+-------+---------------------------------=|
|ID |Code|Program|EffectiveOn|Nationality|Status |errorMessage |
|=------+----+-------+-----------+-----------+-------+---------------------------------=|
|1002147|2632|3610AC |2022-01-01 |Canada |Invalid|Status.compareTo("NE") == 0 failed|
|1002148|2632|2310AB |2022-01-01 |Canada |Invalid|Status.compareTo("NE") == 0 failed|
|1002154|2632|3610AC |2022-01-01 |Canada |Invalid|Status.compareTo("NE") == 0 failed|
|1002157|2632|3610AC |2022-01-01 |Canada |Invalid|Status.compareTo("NE") == 0 failed|
|1002160|2632|2310AB |2022-01-01 |Canada |Invalid|Status.compareTo("NE") == 0 failed|
|1002164|2675|3611AC |2021-09-01 |US |Invalid|Status.compareTo("NE") == 0 failed|
'-------+----+-------+-----------+-----------+-------+----------------------------------'
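The conditions listed in these results can be summarized in a short Java sketch. It only mirrors the rule as described on this page and is not the rule definition stored in Talend Cloud Data Stewardship. The class and method names (ScholarshipRuleSketch, classify) are hypothetical, and treating a non-numeric status code as NE is an assumption based on the abcd record.

```java
// Hypothetical illustration only: mirrors the rule as described in the results,
// not the actual rule definition used by tDataQualityRules.
final class ScholarshipRuleSketch {

    // Returns "NE", "Valid" or "Invalid" for one record.
    static String classify(String code, String program, String effectiveOn) {
        int statusCode;
        try {
            statusCode = Integer.parseInt(code);
        } catch (NumberFormatException e) {
            // Assumption: a code such as "abcd" cannot be compared with 2632,
            // so the rule cannot be executed on the record.
            return "NE";
        }
        boolean valid;
        if (statusCode == 2632) {
            // Status code is 2632: the date must be 2021-09-01.
            valid = "2021-09-01".equals(effectiveOn);
        } else {
            // Status code is not 2632: the program must end with 10AB or 10AC.
            valid = program.endsWith("10AB") || program.endsWith("10AC");
        }
        return valid ? "Valid" : "Invalid";
    }

    public static void main(String[] args) {
        // Records taken from the main, NE and OtherInvalid tables.
        System.out.println(classify("2632", "2310AB", "2021-09-01"));
        System.out.println(classify("abcd", "3611AC", "2021-09-01"));
        System.out.println(classify("2675", "3611AC", "2021-09-01"));
    }
}
```

Checking a record from each table against this sketch is a quick way to confirm why it landed in the main, NE or OtherInvalid output.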
When you save the Job, a JAR file containing the rule definition is created in:
studio-path\configuration\.m2\repository\org\talend\libraries\rules-number