Drop the following components from the Palette onto the design
workspace: tFixedFlowInput,
tDataQualityRules,
tExtractJSONFields, tFilterRow
and three tLogRow components.
Using Main links, connect:
tFixedFlowInput to
tDataQualityRules
tDataQualityRules to one
tLogRow
tExtractJSONFields to
tFilterRow
Using Filter links, connect
tFilterRow to another
tLogRow.
Using Reject links, connect:
tDataQualityRules to
tExtractJSONFields
tFilterRow to the last
tLogRow
Tip: If you want valid data only, connect
tDataQualityRules to only one
tLogRow component using a Main
link. You do not need the tExtractJSONFields and
tFilterRow components.
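The links described above can be summarized as a small data structure. This is only an illustrative sketch, not something Talend Studio generates or reads: the labels tLogRow_1, tLogRow_2 and tLogRow_3 are placeholder names for the three tLogRow components, and the helper function is hypothetical.

```python
# Each tuple is (source component, link type, target component).
# tLogRow_1..3 are placeholder labels for the three tLogRow components.
LINKS = [
    ("tFixedFlowInput", "Main", "tDataQualityRules"),
    ("tDataQualityRules", "Main", "tLogRow_1"),
    ("tDataQualityRules", "Reject", "tExtractJSONFields"),
    ("tExtractJSONFields", "Main", "tFilterRow"),
    ("tFilterRow", "Filter", "tLogRow_2"),
    ("tFilterRow", "Reject", "tLogRow_3"),
]


def downstream(component):
    """Return the (link type, target) pairs that leave a component."""
    return [(link, target) for source, link, target in LINKS if source == component]


for source, link, target in LINKS:
    print(f"{source} --{link}--> {target}")
```

Reading the list top to bottom follows a row through the Job: valid rows leave tDataQualityRules on the Main link, while rejected rows continue to tExtractJSONFields and tFilterRow.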
Configuring the tFixedFlowInput component
Procedure
Double-click tFixedFlowInput to open its Basic
settings view.
Select Built-In as Schema and
click […] next to Edit
schema.
To add columns, click the [+] button, then click OK.
Select Use Inline Content (delimited file).
Enter your data. You can use the data from the dataset in the following file: scholarship_programs.zip.
Configuring the tDataQualityRules component
Before you begin
You have created at least one data quality rule in Talend Cloud Data Stewardship.
Tip: Talend Cloud Data Stewardship is used in this example. You can also enter the URL of the hybrid version
of Talend Data Stewardship 8.0 R2022-07 or later, or the URL of Talend Cloud Data Inventory.
About this task
You need the
Rules - View permission to retrieve the rules.
For more information, see the predefined user roles for the
app you are using.
Procedure
Select Built-in as Output
schema.
To retrieve the schema from the input component, click Sync
columns.
From the Application drop-down list, select Data
Stewardship.
Enter the URL to Talend Cloud Data Stewardship:
"https://tds.us.cloud.talend.com/rulerepository/api/v1".
To retrieve the data quality rules, click Refresh.
If you update the data quality rules in Talend Cloud Data Stewardship, click Refresh to retrieve the latest version.
Click [...] next to Configure DQ
rules.
Select the rule and associate the rule variables to the input columns.
You can associate the data types from Talend Studio
with some data types from Talend Cloud Data Stewardship.
See Associating data types from the properties.
If no rules or input columns are available, verify that:
Data quality rules have been retrieved in DQ rule library
timestamp.
The input schema is correct.
To validate the configuration, click OK.
Configuring the tExtractJSONFields component
About this task
The tExtractJSONFields component makes
the rejected data more readable so that you can easily filter it using the
tFilterRow component.
Procedure
Select Built-In as Property Type
and Schema.
Click [...] next to Edit
schema.
The INVALID_STATUS column has been added to the input
schema. You can rename the column in the output schema.
Select JsonPath and the API
version 2.1.0.
Enter the wildcard * in the Loop Jsonpath
query field to select all elements.
Select INVALID_STATUS from the JSON
Field drop-down list.
Define the
JSON query in the
Mapping table.
The columns in the schema are listed in the Column field of the
Mapping table.
In the Json query column of the
Mapping table, enter
executionResults[0].status for
Status.
executionResults[0].status is the path to
status, where [0]
represents the execution details of the first rule. For the execution
details of the second rule, use [1]; for the third
rule, use [2]; and so on. This makes it easier to filter the data using the tFilterRow
component.
Example
The INVALID_STATUS is:
[{"executionResults":[{"ruleName":"ScholarshipProgram_Else","inputColumn":["EffectiveOn","Code","Program"],"status":"Invalid"}]}].
tExtractJSONFields transforms it into
Invalid.
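To make the extraction concrete, here is a minimal Python sketch of what the loop query * combined with the executionResults[0].status path amounts to on the example value above. It uses the standard json module rather than Talend's JsonPath engine, and extract_status is a hypothetical helper name.

```python
import json

# The INVALID_STATUS value from the example above.
invalid_status = (
    '[{"executionResults":[{"ruleName":"ScholarshipProgram_Else",'
    '"inputColumn":["EffectiveOn","Code","Program"],"status":"Invalid"}]}]'
)


def extract_status(raw, rule_index=0):
    """Loop over every top-level element (the * loop query) and read
    executionResults[rule_index].status from each one."""
    elements = json.loads(raw)
    return [element["executionResults"][rule_index]["status"] for element in elements]


print(extract_status(invalid_status))  # ['Invalid']
```

Passing rule_index=1 would correspond to executionResults[1].status, that is, the second rule. With this example value it would raise an IndexError because only one rule was executed.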
Configuring the tFilterRow component
About this task
The tFilterRow component in this
scenario creates two categories:
NE: The rule cannot be executed on the data.
Invalid: The data fulfill the condition but not the validation expression.
Procedure
In Conditions, click the [+] button to add a condition row.
In InputColumn, select
Status.
In Value, enter "NE".
Remember: If you add more than one column, define the logical
operator above the table:
And/Or. By default,
And is selected.
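The effect of this condition can be sketched in Python as follows. The sketch assumes that the condition is an equality test on Status (the function and operator settings are not shown in this procedure), and the sample rows are hypothetical.

```python
def route(rows, column="Status", value="NE"):
    """Split rows the way the filter is assumed to work: rows matching
    the condition go to the Filter output, all others to Reject."""
    filter_output, reject_output = [], []
    for row in rows:
        if row[column] == value:
            filter_output.append(row)
        else:
            reject_output.append(row)
    return filter_output, reject_output


# Hypothetical rows, as they might look after tExtractJSONFields.
rows = [
    {"Code": "abcd", "Status": "NE"},
    {"Code": "2632", "Status": "Invalid"},
]
ne_rows, invalid_rows = route(rows)
print(ne_rows)       # rows sent over the Filter link
print(invalid_rows)  # rows sent over the Reject link
```

In the Job, the Filter output feeds one tLogRow and the Reject output feeds the last tLogRow, which gives the two categories NE and Invalid.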
Configuring the tLogRow components
Procedure
Double-click the tLogRow components to open their
Basic settings view.
Select Built-in as Schema and
click Sync columns.
In the Mode area, select Table (print values
in cells of a table).
Saving and executing the Job
Procedure
Save your Job and press F6 to execute it.
Results
The tLogRow from the Main link returns the
students whose:
Status code is 2632 and the date is
2021-09-01.
OR
Status code is not 2632 and the program ends with
10AB or 10AC.
The NE status means that the rule cannot be executed on the data.
In this scenario, the status code is abcd but the rule condition
is "is equal to 2632".
The remaining invalid data are rows where:
Status code is 2632 but the date is not
2021-09-01
or
Status code is not 2632 and the program does not end with
10AB or
10AC.
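The three outcomes described in these results can be reproduced with a short Python sketch. It is an approximation inferred from the results above, not the rule definition stored in Talend Cloud Data Stewardship. The function name, the "Valid" label, the string date format, the sample values and the choice to model NE as a code that cannot be read as a number are all assumptions.

```python
def classify(code, effective_on, program):
    """Approximate the outcome of the scholarship rule for one row."""
    try:
        code_value = int(code)
    except ValueError:
        # Assumption: a non-numeric code such as "abcd" cannot be
        # compared with 2632, so the rule is not executed (NE).
        return "NE"
    if code_value == 2632:
        return "Valid" if effective_on == "2021-09-01" else "Invalid"
    # Code is not 2632: the program must end with 10AB or 10AC.
    return "Valid" if program.endswith(("10AB", "10AC")) else "Invalid"


# Hypothetical rows: (Code, EffectiveOn, Program)
samples = [
    ("2632", "2021-09-01", "SCI10ZZ"),
    ("2632", "2022-01-15", "SCI10AB"),
    ("1000", "2021-09-01", "SCI10AC"),
    ("1000", "2021-09-01", "SCI10AD"),
    ("abcd", "2021-09-01", "SCI10AB"),
]
for row in samples:
    print(row, "->", classify(*row))
```

Rows classified as Valid correspond to the tLogRow on the Main link, NE rows to the tLogRow on the Filter link, and Invalid rows to the tLogRow on the Reject link of tFilterRow.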