
tDataprepRun Standard properties for a cloud deployment

These properties are used to configure the cloud version of tDataprepRun running in the Standard Job framework.

The Standard tDataprepRun component belongs to the Talend Data Preparation family.

To use the cloud deployment version of the tDataprepRun component, select Cloud from the Deployment drop-down list of the component's basic settings.

The component in this framework is available in all subscription-based Talend products.

Warning: The cloud version of tDataprepRun has a large number of dependencies. When you use this component in ESB use cases and the build type of the Job is set to Talend Runtime (OSGi), increase the memory allocated to Talend Studio by specifying -Xmx2048m or higher in the .ini file that corresponds to your operating system, in the Talend Studio installation directory. Otherwise, the Job might fail with an OutOfMemoryError exception.

Basic settings

Deployment

From the drop-down list, select your current deployment type, Cloud in this case, and click Apply.

With this setting, your data does not transit through Talend Cloud, and the preparation runs locally.

Property Type

Either Built-in or Repository.

  • Built-in: You create and store the schema locally for this component only.
  • Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

Data Preparation server

From the drop-down list, select the data center that corresponds to your region:

  • EU
  • US
  • US-West
  • AU
  • AP
  • Custom (to enter the endpoint manually)

Authentication type

  • Basic (deprecated), to log in to your Talend Cloud Data Preparation instance using your login and password.
  • PAT, to log in to your Talend Cloud Data Preparation instance using a Personal Access Token.
Login

Deprecated. Type the email address that you use to log in to the Talend Cloud Data Preparation application.

This field is only available if you select Basic from the Authentication type drop-down list.

Password

Deprecated. Click the [...] button and type your password for the Talend Cloud Data Preparation application between double quotes.

This field is only available if you select Basic from the Authentication type drop-down list.

PAT

Click the [...] button and enter your Personal Access Token between double quotes. You can retrieve a PAT from the Profile preferences page of Talend Cloud. See Generating a Personal Access Token for more information.

This field is only available if you select PAT from the Authentication type drop-down list.

Dynamic preparation selection

Select this check box to define a preparation path and version using context variables or manually entered values. The preparation will be dynamically selected at runtime.

Preparation path

Use a context variable to define a preparation path. Paths with or without the initial / are supported.

This field is only available if you select the Dynamic preparation selection check box.

Preparation version

Use a context variable to define the version of the preparation to use. Preparation versions are referenced by their number. For example, to run version 2 of a preparation, the expected value is "2". To use the current version of the preparation, the expected value is "Current state".

This field is only available if you select the Dynamic preparation selection check box.
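The following sketch illustrates the kind of values the Preparation path and Preparation version fields expect in dynamic mode. DynamicPreparationValues and its methods are hypothetical helpers, not part of Talend; the example path Marketing/customers_cleanup is made up, and the rule that version numbers start at 1 is an assumption.

```java
/**
 * Hypothetical helper (not part of Talend) that prepares the values a Job
 * could assign to the context variables used for dynamic preparation selection.
 */
public final class DynamicPreparationValues {

    /** Value documented for running the current version of a preparation. */
    public static final String CURRENT_STATE = "Current state";

    private DynamicPreparationValues() {}

    /**
     * Paths with or without the initial "/" are both accepted, so this sketch
     * only trims whitespace and makes the leading slash explicit.
     */
    public static String normalizePath(String path) {
        if (path == null || path.isBlank()) {
            throw new IllegalArgumentException("Preparation path must not be empty");
        }
        String trimmed = path.trim();
        return trimmed.startsWith("/") ? trimmed : "/" + trimmed;
    }

    /**
     * Returns a version number as a string (for example "2"), or
     * "Current state" when no version number is given.
     */
    public static String versionValue(Integer versionNumber) {
        if (versionNumber == null) {
            return CURRENT_STATE;
        }
        // Assumption: version numbers are positive integers starting at 1.
        if (versionNumber < 1) {
            throw new IllegalArgumentException("Version numbers start at 1");
        }
        return Integer.toString(versionNumber);
    }

    public static void main(String[] args) {
        System.out.println(normalizePath("Marketing/customers_cleanup"));
        System.out.println(versionValue(2));
        System.out.println(versionValue(null));
    }
}
```

In a Job, the returned strings would typically be stored in context variables (for example a hypothetical context.prep_path and context.prep_version) that the Preparation path and Preparation version fields reference.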

Preparation identifier

Use the [...] button to select a preparation from the list.

Preparation version

Use the [...] button to select a version from the list.

Preparation versions are referenced by their number, starting from the oldest in ascending order. After you select a version, the corresponding preparation ID is displayed in the Preparation version field. If you select the current version, it is displayed as HEAD in the field.

Schema and Edit Schema

A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. When you create a Spark Job, avoid the reserved word line when naming the fields.

Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available:

  • View schema: choose this option to view the schema only.

  • Change to built-in property: choose this option to change the schema to Built-in for local changes.

  • Update repository connection: choose this option to change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion.

    If you just want to propagate the changes to the current Job, you can select No upon completion and choose this schema metadata again in the Repository Content window.

Click Sync columns to retrieve the schema from the previous component connected in the Job.

Guess Schema

Click this button to retrieve the schema from the preparation defined in the Preparation identifier field.

Advanced settings

tStatCatcher Statistics

Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Use Dictionary

Select this check box if your preparation includes steps that modify the semantic types of the columns or functions that use semantic types. When selected, you need to configure the following parameters:

  • Personal access token: Click the [...] button and enter, between double quotes, a Personal Access Token. You can retrieve one from the Profile preferences page of Talend Cloud.

    This field is only available if you have selected Basic as Authentication type.

  • Dictionary version: Click the [...] button to select the latest version from the list.

Usage

Usage rule

This component is an intermediary step. It requires both an input flow and an output flow.

Best practice

When using dynamic mode, any lookup dataset called as a dependency of the Job must use the same technology as the source dataset of the preparation; otherwise, the Job fails. To work around this, specify the dependencies in advance, for example with a tLibraryLoad component, and connect that component to the subJob containing tDataprepRun with an On Subjob Ok link.

Limitations

  • If the dataset is updated after the tDataprepRun component has been configured, the schema needs to be fetched again.
  • The cloud deployment mode does not work when using a Talend Runtime or a Cloud Engine for Design.
  • The cloud deployment is not compatible with ESB.
  • Preparations created before the R2024-12 monthly release must be opened at least once in the Talend Cloud Data Preparation interface. This ensures that the preparations are persisted in the Talend Cloud Data Preparation database and can be used in Talend Studio Jobs regardless of the state of the Remote Engine Gen2.
