tDataQualityRules
Validates data using data quality rules by connecting Talend Studio to:
- Talend Cloud Data Stewardship. For more information, see What is a data quality rule? in the Talend Cloud Data Stewardship User Guide.
- The hybrid version of Talend Data Stewardship 8.0 R2022-07 and later. For more information, see What is a data quality rule? in the Talend Data Stewardship User Guide. For more information on the hybrid version, see What is hybrid for Talend Cloud?.
- Talend Cloud Data Inventory. For more information, see What is a data quality rule? in the Talend Cloud Data Inventory User Guide. You can connect to this app in Talend Studio 8.0 R2023-06 and later.
A data quality rule is a set of business requirements that define the values your data must comply with.
To retrieve the data quality rules into a JAR file, tDataQualityRules connects Talend Studio to one of the apps listed above. Talend Studio then uses the retrieved library to apply the data quality rules to your data. For more information on how the component works, see tDataQualityRules local and Cloud/hybrid process.
The component sorts the data into two groups:
- Valid:
  - Valid: The data fulfill the condition and the validation expression, or the data fulfill the alternative validation expression only.
  - Not applicable (NA): The data do not fulfill the condition. The rule cannot be applied to the data.
  These data follow the Main flow.
- Invalid:
  - Invalid: The data fulfill the condition but not the validation expression.
  - Not executable (NE): The rule cannot be executed on the data.
  These data follow the Reject flow.
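The routing described above can be sketched in Java as follows. This is only an illustration, not the component's implementation: the class, enum, and method names are hypothetical, and the classify method is simplified because it does not model the alternative validation expression.

```java
// Illustrative sketch only. All names are hypothetical and are not part of the Talend API.
class DqOutcomeSketch {

    // The two output flows of the component.
    enum Flow { MAIN, REJECT }

    // The four outcomes and the flow each one follows.
    enum Outcome {
        VALID(Flow.MAIN),
        NOT_APPLICABLE(Flow.MAIN),
        INVALID(Flow.REJECT),
        NOT_EXECUTABLE(Flow.REJECT);

        final Flow flow;

        Outcome(Flow flow) {
            this.flow = flow;
        }
    }

    // Simplified classification of one record against one rule.
    // Assumption: the alternative validation expression is not modeled here.
    static Outcome classify(boolean executable, boolean conditionMet, boolean validationMet) {
        if (!executable) {
            return Outcome.NOT_EXECUTABLE; // the rule cannot be executed on the data
        }
        if (!conditionMet) {
            return Outcome.NOT_APPLICABLE; // the condition is not fulfilled
        }
        return validationMet ? Outcome.VALID : Outcome.INVALID;
    }

    public static void main(String[] args) {
        // Print each outcome with the flow it follows.
        for (Outcome outcome : Outcome.values()) {
            System.out.println(outcome + " -> " + outcome.flow);
        }
    }
}
```

Storing the flow on each outcome keeps the Main/Reject decision in one place, so a record's flow can be read directly from its outcome.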
This component is not shipped with your Talend Studio by default. You need to install it using the Feature Manager. For more information, see Installing features using the Feature Manager.
You need the Qlik Talend Cloud Enterprise Edition to use the tDataQualityRules component.
- In local mode, Apache Spark 3.0 and later.
- Cloudera Data Engineering service with Apache Spark 3.1 or 3.2.
tDataQualityRules Standard properties
These properties are used to configure tDataQualityRules running in the standard Job framework.
The standard tDataQualityRules component belongs to the Data Quality family.
Basic Settings
Output schema and Edit schema |
A schema is a row description that defines the number of fields (columns) to be processed and passed on to the next component. Click Sync columns to retrieve the schema from the previous component connected in the Job. Select the Schema type:
Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available: View schema: choose this option to view the schema only. Change to built-in property: choose this option to change the schema to Built-in for local changes. Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. The supported types are: Boolean, Date, Double, Dynamic, Float, Integer, Long, Short, and String. |
Application | Select the application from which you want to retrieve the data quality rules. This option is available from Talend Studio 8.0 R2024-01 onwards. |
URL |
Important: You need the Rules - View permission to retrieve the rules. For more information, see the predefined user roles for the app you are using.
Enter the URL of the app selected from the Application drop-down list. When the URL does not match the selected app, the Job may fail. The following URLs are supported:
|
Token | Enter your personal access token. To generate one, see https://help.talend.com/r/en-US/Cloud/management-console-user-guide/cloud-access-token. |
DQ rule library timestamp | After you enter the URL and token, click Refresh. If the URL and token are correct, the data quality rules are retrieved into Talend Studio and the library timestamp is displayed in the format yyyy-MM-dd hh:mm:ss (library_number). When you update the data quality rules in the Cloud or hybrid application, click Refresh to retrieve the latest version. |
Configure DQ rules | Associate the variables of the rule with the input data. The rules are retrieved from the library:
If no rules or input columns are available, verify that:
You can associate the data types from Talend Studio with some data types from Talend Cloud Data Stewardship or hybrid Talend Data Stewardship. See Associating data types below. To apply more rules,
click |
Associating data types
Rule variable from the app* | Input column (from Talend Studio) |
---|---|
Number | Double, Float, Integer, Long, Short, and String |
Boolean | Boolean |
Text | String |
Date | Date |
Any data types supported in the application | Dynamic |
* You can enter the URL of:
|
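The associations in the table above can be expressed as a small lookup. The following Java sketch is only an illustration: the class and method names are hypothetical, the types are represented as plain strings for readability, and it assumes that the rule variable type passed in is one the application supports.

```java
import java.util.Map;
import java.util.Set;

// Illustrative sketch only. All names are hypothetical and are not part of the Talend API.
class DqTypeMappingSketch {

    // Rule variable type (from the app) -> input column types (from Talend Studio)
    // that can be associated with it, as listed in the table.
    static final Map<String, Set<String>> COMPATIBLE = Map.of(
            "Number", Set.of("Double", "Float", "Integer", "Long", "Short", "String"),
            "Boolean", Set.of("Boolean"),
            "Text", Set.of("String"),
            "Date", Set.of("Date"));

    // Returns true when the input column type can be associated with the rule variable type.
    static boolean canAssociate(String ruleVariableType, String inputColumnType) {
        // A Dynamic column can be associated with any data type supported in the application.
        // Assumption: ruleVariableType is a supported type.
        if ("Dynamic".equals(inputColumnType)) {
            return true;
        }
        return COMPATIBLE.getOrDefault(ruleVariableType, Set.of()).contains(inputColumnType);
    }

    public static void main(String[] args) {
        System.out.println(canAssociate("Number", "String"));
        System.out.println(canAssociate("Text", "Integer"));
    }
}
```

Note that the mapping is not symmetric: a String input column can be associated with a Number rule variable, but an Integer input column cannot be associated with a Text rule variable.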
Advanced Settings
tStatCatcher statistics | Select this check box to gather the Job processing metadata at the Job level as well as at each component level. |