Talend Managed / Customer Managed decision tree and use cases
This article describes the two main approaches for deploying and executing the Talend artifacts that you build with the platform: Talend Managed/Serverless and Customer Managed.
The decision tree and use case sections below help you map these approaches to your specific requirements.
The diagram below describes the main decision points for choosing whether the Talend Managed or the Customer Managed approach best fits your needs. You can adapt these principles to many other, finer-grained environment variations.
Batch ETL and Event-driven processes
Batch processing is useful when you need to periodically ingest large amounts of data in a limited time window, but configuring and scaling a cluster of virtual machines to process complex batch Jobs can be difficult to manage. The trend is increasingly toward pushdown processing: ELT for databases, or executing a Job on a big data cluster.
In the event-driven model, data is processed when an event happens, such as a file arrival or a data update. This model is often used in scenarios such as e-commerce sites passing customer orders to downstream applications, or financial institutions processing their partners’ transaction files. Consider this example: a financial company hosts a data lake on AWS S3 where partner companies upload transaction files. Downstream data integration ETL processes then process the partner files to update the financial company’s own ERP system. This processing can run immediately through an event-driven framework, or on a schedule as a batch pattern, depending on the financial company’s business and regulatory requirements.
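The file-arrival example above can be sketched in Python. This is a minimal illustration, not Talend's mechanism: it assumes the standard AWS S3 event notification layout (`Records`, `eventName`, `s3.bucket.name`, `s3.object.key`, with URL-encoded keys), and `trigger_job` is a hypothetical callback standing in for whatever starts the downstream ETL process.

```python
from urllib.parse import unquote_plus


def extract_uploaded_files(event):
    """Return (bucket, key) pairs for every object-created record in an S3 event."""
    files = []
    for record in event.get("Records", []):
        # Ignore deletions and other non-upload notifications.
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in notifications (spaces arrive as "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        files.append((bucket, key))
    return files


def handle_event(event, trigger_job):
    """Call the hypothetical trigger_job once per uploaded file; return the count."""
    files = extract_uploaded_files(event)
    for bucket, key in files:
        trigger_job(bucket, key)
    return len(files)
```

In a scheduled batch pattern, the same `trigger_job` logic would instead run at fixed times over every file that has accumulated since the previous run.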
Benefits of a Talend Managed approach:
- Operational efficiency: Use Talend native managed services and features, including monitoring, scheduling, and deployment, which fit natively into the batch process.
- Time-to-value: Quick to set up the environment and development tools.
- Lower maintenance costs: No costs for installing or supporting Talend Cloud native services (such as TMC interfaces).
- Vertical scalability: If you need more capacity for a single Job, you can increase the size of your remote engine machines.
- Horizontal scalability (multiple Jobs): Use multiple engines and the ‘virtual server’ concept for round-robin allocation.
Benefits of a Customer-managed approach:
- Native horizontal scalability: Generally, with a Talend ETL Job, you have one execution instance of it running at any given time. However, certain use cases do allow for multiple instances of the same Job to run in parallel. You may also have multiple Jobs that place different load demands on your compute infrastructure at different times.
- Zero Talend server applications: No need to host Talend server applications for Job executions.
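The round-robin allocation mentioned in the list above can be illustrated with a short Python sketch. It shows only the general round-robin idea, not how Talend implements its ‘virtual server’ concept; the function name and the job and engine labels are hypothetical.

```python
from itertools import cycle


def allocate_round_robin(jobs, engines):
    """Assign each job to the next engine in turn, wrapping around at the end."""
    if not engines:
        raise ValueError("at least one engine is required")
    engine_cycle = cycle(engines)
    # Jobs are taken in order; each one gets the next engine in the cycle.
    return {job: next(engine_cycle) for job in jobs}


# Example: three Jobs spread over two engines.
allocation = allocate_round_robin(["job_a", "job_b", "job_c"], ["engine_1", "engine_2"])
```

Round-robin spreads Jobs evenly by count, not by load, so it works best when the Jobs place broadly similar demands on the engines.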
For more information, see the following articles:
Pushdown ETL and big data Jobs
For customers that leverage big data platforms (such as AWS EMR) or cloud data warehouses (such as Snowflake), Talend provides execution pushdown capabilities through generated native Spark code or ELT code. Instead of extracting and transforming the data before it is written (as is the case with Talend ETL Jobs running in the Remote Engine), this mechanism lets the target platform execute the heavy data processing. The executables generated by Talend (ELT or big data Jobs) are first copied to the target platform (data warehouse or big data platform) and then processed in place. You can therefore run pushdown processes with either the Talend Managed or the Customer Managed approach. However, because the compute in a pushdown scenario is not done by the Talend Job, data service, or Route, avoid the additional complexity of the Customer Managed approach unless you already use it for other purposes.
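The difference between the two styles can be sketched in Python, with an in-memory SQLite database standing in for the target data warehouse. This is an illustration under stated assumptions, not Talend-generated code: the table names, columns, and the transformation (upper-case the currency, keep positive amounts) are all hypothetical.

```python
import sqlite3


def build_demo_connection():
    """Create an in-memory database with a staging table and two target tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE staging_tx (id INTEGER, currency TEXT, amount_cents INTEGER);
        CREATE TABLE tx_etl (id INTEGER, currency TEXT, amount_cents INTEGER);
        CREATE TABLE tx_elt (id INTEGER, currency TEXT, amount_cents INTEGER);
        INSERT INTO staging_tx VALUES (1, 'usd', 1250), (2, 'eur', -300), (3, 'gbp', 990);
        """
    )
    return conn


def load_with_etl(conn):
    """ETL style: pull rows to the client, transform them here, then write them."""
    rows = conn.execute("SELECT id, currency, amount_cents FROM staging_tx").fetchall()
    transformed = [(i, cur.upper(), amt) for i, cur, amt in rows if amt > 0]
    conn.executemany("INSERT INTO tx_etl VALUES (?, ?, ?)", transformed)


def load_with_elt(conn):
    """ELT pushdown style: send one SQL statement; the database does the work."""
    conn.execute(
        "INSERT INTO tx_elt "
        "SELECT id, UPPER(currency), amount_cents FROM staging_tx WHERE amount_cents > 0"
    )


conn = build_demo_connection()
load_with_etl(conn)
load_with_elt(conn)
etl_rows = conn.execute("SELECT * FROM tx_etl ORDER BY id").fetchall()
elt_rows = conn.execute("SELECT * FROM tx_elt ORDER BY id").fetchall()
```

Both functions produce the same rows. In the ELT version no data leaves the database, which is why the machine that submits the statement needs little compute of its own.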
REST/SOAP Data Services and Mediation Routes
For real-time application integration, Talend supports developing both Data Services Jobs (RESTful and SOAP) and Mediation Routes, and these can be deployed into the Talend Managed platform as a native option. Talend can also generate RESTful Data Services Jobs or Mediation Routes as self-contained microservices (Spring Boot), so they can be deployed into a Customer-managed container or serverless platform without depending on infrastructure applications such as Talend Runtime. Talend does not support publishing SOAP Data Services as microservices directly, but SOAP services built using Mediation Routes are allowed. SOAP defines a more rigid set of messaging patterns than REST, and its rules matter because no level of standardization is possible without them. SOAP Data Services also need to involve more infrastructure services, such as security, authorization, authentication, and a service registry. Those services are usually handled by external API gateways, which are beyond the scope of this document.
Benefits of a Talend Managed approach:
- Operational efficiency: Use Talend native managed services including monitoring, scheduling, deployment, and so on.
- Time-to-value: Quick to set up the environment (Runtime or microservice + Remote Engine pair) and the development tool (Talend Studio).
- Lower maintenance costs: No costs for installing or supporting Talend Cloud native services (such as TMC interfaces).
Benefits of a Customer-managed approach:
- Native horizontal scalability for higher demands: A container orchestration or serverless platform scales out microservices when API demand increases. For instance, online shopping companies need a stable and scalable environment for their web services applications, especially when they must be 100% available during sales events such as Black Friday or Boxing Day.
- Zero Talend server applications: No need for hosting Talend server applications for Web services or real-time mediation routes.
Both approaches require sufficient engine tokens. Your account team can assist with licensing questions.