tDataprepRun Standard properties for a cloud deployment
These properties are used to configure the cloud version of tDataprepRun running in the Standard Job framework.
The Standard tDataprepRun component belongs to the Talend Data Preparation family.
To use the cloud deployment version of the tDataprepRun component, select Cloud from the Deployment drop-down list of the component's basic settings.
The component in this framework is available in all subscription-based Talend products.
Basic settings
Deployment |
From the drop-down list, select your deployment type, Cloud in this case, and click Apply. With this setting, your data does not transit through Talend Cloud: the preparation is retrieved from the server and applied locally, within the Job. |
Property Type |
Either Built-in or Repository. Built-in: no property data is stored centrally. Repository: select the repository file in which the properties are stored.
|
Data Preparation server |
From the drop-down list, select the data center that corresponds to your region:
|
Authentication type |
From the drop-down list, select the method used to authenticate to Talend Cloud Data Preparation: Basic, to log in with your email address and password, or PAT, to use a Personal Access Token. |
Login |
Type the email address that you use to log in to the Talend Cloud Data Preparation application. This field is only available if you select Basic from the Authentication type drop-down list. |
Password |
Click the [...] button and type your user password for the Talend Cloud Data Preparation application, between double quotes. This field is only available if you select Basic from the Authentication type drop-down list. |
PAT |
Click the [...] button and enter your Personal Access Token between double quotes. A PAT can be retrieved from the Profile preferences page of Talend Cloud. See Generating a Personal Access Token for more information. This field is only available if you select PAT from the Authentication type drop-down list. |
Dynamic preparation selection |
Select this check box to define a preparation path and version using context variables or manually entered values. The preparation will be dynamically selected at runtime. |
Preparation path |
Use a context variable or enter a value to define the preparation path. Paths with or without the leading / are supported. This field is only available if you select the Dynamic preparation selection check box. |
Preparation version |
Use a context variable or enter a value to define the version of the preparation to use. Preparation versions are referenced by their number: to run version #2 of a preparation, for example, the expected value is "2". To use the current state of the preparation, the expected value is "Current state". This field is only available if you select the Dynamic preparation selection check box. |
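Since both fields take plain string values, a Job can validate them before the tDataprepRun step executes. The sketch below is a hypothetical pre-flight check, not part of the Talend API: the class and method names are illustrative, and it only encodes the value formats described above (a path with or without the leading /, and a version that is either a number or the literal "Current state").

```java
// Hypothetical pre-flight check for dynamic preparation selection.
// None of these names come from the Talend API; they only illustrate
// the expected value formats.
public class DynamicPrepCheck {

    // A version value is valid if it is a plain number ("2") or the
    // literal string "Current state" (the latest state of the preparation).
    static boolean isValidVersion(String v) {
        return "Current state".equals(v) || v.matches("\\d+");
    }

    // Paths are accepted with or without the leading slash, so a Job
    // may normalize them to one canonical form before use.
    static String normalizePath(String path) {
        return path.startsWith("/") ? path : "/" + path;
    }

    public static void main(String[] args) {
        System.out.println(normalizePath("Marketing/clean_customers")); // /Marketing/clean_customers
        System.out.println(isValidVersion("2"));              // true
        System.out.println(isValidVersion("Current state"));  // true
        System.out.println(isValidVersion("latest"));         // false
    }
}
```

Failing fast on an invalid context value in a tJava step, for example, gives a clearer error than letting the preparation lookup fail at runtime.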
Preparation identifier |
Use the [...] button to select a preparation from the list. |
Preparation version |
Use the [...] button to select a version from the list. Preparation versions are referenced by their number, from the oldest to the most recent, in ascending order. After you select a version, the corresponding version identifier is displayed in the Preparation version field. If you select the current version, it is displayed as HEAD. |
Schema and Edit Schema |
A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. When you create a Spark Job, avoid the reserved word line when naming the fields. Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:
View schema: select this option to view the schema only.
Change to built-in property: select this option to change the schema to Built-in for local changes.
Update repository connection: select this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs.
Click Sync columns to retrieve the schema from the previous component connected in the Job. |
Guess Schema |
Click this button to retrieve the schema from the preparation defined in the Preparation identifier field. |
Advanced settings
tStatCatcher Statistics |
Select this check box to gather the Job processing metadata at the Job level as well as at each component level. |
Use Dictionary |
Select this check box if your preparation includes steps that modify the semantic types of the columns or functions that use semantic types. When selected, you need to configure the following parameters:
|
Usage
Usage rule |
This component is an intermediary step. It requires an input flow as well as an output flow. |
Best practice |
When using the dynamic mode, any lookup dataset called as a dependency of the Job must use the same technology as the source dataset of the preparation; otherwise, the Job fails. A good practice to work around this limitation is to specify the dependencies in advance, using a tLibraryLoad component for example, and to connect it to the tDataprepRun subJob with an On Subjob Ok link. |
Limitations |
|