Talend Cloud Pipeline Designer concepts
The following definitions will help you understand the main concepts of Talend Cloud Pipeline Designer.
- Remote Engine Gen2: A Remote Engine Gen2 is a secure execution engine on which you can safely run pipelines. It gives you control over your execution environment and resources, because you create and configure the engine in your own environment (Virtual Private Cloud or on premises). A Remote Engine Gen2 ensures:
  - Data processing in a safe and secure environment, as Talend never has access to the data and resources of your pipelines
  - Optimal performance and security, by bringing the computation to the data (data locality) rather than moving large volumes of data to the computation
- Cloud Engine for Design: The Cloud Engine for Design is a built-in runner that lets you design pipelines easily without having to set up any processing engine. With this engine you can run two pipelines in parallel. For advanced processing of data, it is recommended to install the secure Remote Engine Gen2.
- Connection: Connections are environments or systems where datasets are stored, including databases, file systems, distributed systems or platforms, etc. Because connections are reusable, the connection information for these systems only needs to be set up once.
- Dataset: Datasets are collections of data. They can be database tables, file names, topics (Kafka), file paths (HDFS), etc. You can also create test datasets that you enter manually and store in a test connection, and even import local files as datasets. Several datasets can be connected to the same system (one-to-many connectivity) and are stored in reusable connections, as sketched below.
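For illustration only: connections and datasets are managed through the Talend Cloud web interface, but the one-to-many relationship between them can be pictured as plain data structures. The class names, fields, and values below are hypothetical and are not part of any Talend API.

```python
from dataclasses import dataclass

# Hypothetical model, for illustration only -- not a Talend API.
@dataclass
class Connection:
    """Reusable connection information for a storage system."""
    name: str
    url: str
    credentials: str

@dataclass
class Dataset:
    """A collection of data reachable through a connection."""
    name: str
    connection: Connection  # many datasets can share one connection
    locator: str            # table name, topic, file path, ...

# One connection, set up once...
pg = Connection(name="prod-postgres",
                url="jdbc:postgresql://db:5432/sales",
                credentials="***")

# ...reused by several datasets (one-to-many connectivity).
orders = Dataset(name="orders", connection=pg, locator="public.orders")
customers = Dataset(name="customers", connection=pg, locator="public.customers")
```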
- Pipeline: Pipelines consist of a process (similar to a Talend Job) that keeps listening for incoming data, and a pipe through which data comes from a source, the dataset, and is sent to a destination. Pipelines can be (see the sketch after this list):
  - Batch, or bounded, which means the data is collected and the pipeline stops once all the data is processed
  - Streaming, or unbounded, which means the pipeline never stops reading data unless you stop it
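The bounded/unbounded distinction can be sketched in plain Python. This is a conceptual illustration, not how Talend executes pipelines; `process` and the stream source's `poll()` method are hypothetical stand-ins.

```python
import time

def process(record):
    """Stand-in transformation step (hypothetical)."""
    print(record)

def batch_pipeline(records):
    """Bounded: process the collected data, then stop."""
    for record in records:
        process(record)
    # Reaching this point means all data was processed; the run ends.

def streaming_pipeline(source):
    """Unbounded: keep reading new data until the pipeline is stopped."""
    while True:                 # runs until externally stopped
        record = source.poll()  # hypothetical poll() on a stream source
        if record is not None:
            process(record)
        else:
            time.sleep(0.1)     # no data yet; wait and poll again

# A bounded run over a finite collection terminates by itself:
batch_pipeline(["r1", "r2", "r3"])
```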
- Processor: Processors are components that you add to your pipelines to transform the incoming batch or streaming data and return the transformed data to the next step of the pipeline.
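Conceptually, a processor behaves like a function applied to each incoming record, whether the data is batch or streaming. The sketch below is illustrative only; the function and record shape are hypothetical, not Talend code.

```python
def uppercase_name(record: dict) -> dict:
    """Hypothetical processor: transform one incoming record and
    return the result to the next step of the pipeline."""
    out = dict(record)
    out["name"] = out["name"].upper()
    return out

# The same processor applies to every record flowing through the pipe:
incoming = [{"name": "ada"}, {"name": "grace"}]
transformed = [uppercase_name(r) for r in incoming]
print(transformed)  # [{'name': 'ADA'}, {'name': 'GRACE'}]
```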
- Sample: Your data is displayed in the form of a sample, retrieved from the dataset metadata.