
Adding a new database type

Talend Data Preparation allows direct connections to various types of databases. You can use them as sources to create new datasets.

It is possible to manually enrich the list of databases from which you can import data.

The list of available database types for dataset creation depends on the JDBC drivers that you have stored in the <components_catalog_path>/.m2 folder.

Let's say that you have some customer data stored in an Oracle database, and you want to import it into Talend Data Preparation to perform cleansing operations. To make this new data source available in the Talend Data Preparation interface, you will add a JDBC driver .jar file specific to Oracle databases to the Components Catalog folder structure.

In a Big Data context, if you want to run preparations made on data from your Oracle database on the Hadoop cluster, the same driver must also be added to the Spark Job Server folder structure.

You do not need to stop or restart any of the services to complete the following procedure.

Before you begin

The Components Catalog server and the Spark Job Server are installed and running on a Linux machine.

Procedure

  1. Download the latest Oracle JDBC driver, ojdbc7.jar, from the Oracle website.
  2. Create the <components_catalog_path>/.m2/jdbc-drivers/oracle/7/ folder.
    Warning: The folder structure must follow this template: .m2/jdbc-drivers/<database_name>/<jdbc_version>.
  3. Copy the ojdbc7.jar file into the newly created folder.
  4. Change the name of the file from ojdbc7.jar to oracle-7.jar.
    Warning: The file name must follow this template: <database_name>-<jdbc_version>.

    Renaming the .jar file and structuring the folders this way ensures naming consistency and makes them Maven compliant.
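    Steps 2 through 4 can be sketched as a few shell commands. The paths below are placeholders: COMPONENTS_CATALOG stands in for your actual <components_catalog_path>, and an empty file stands in for the real ojdbc7.jar download.

    ```shell
    # Placeholder for your actual Components Catalog install path.
    COMPONENTS_CATALOG="${COMPONENTS_CATALOG:-./components_catalog_demo}"

    # Step 2: create the Maven-style folder .m2/jdbc-drivers/<database_name>/<jdbc_version>
    DRIVER_DIR="$COMPONENTS_CATALOG/.m2/jdbc-drivers/oracle/7"
    mkdir -p "$DRIVER_DIR"

    # For this sketch only: stand in for the real ojdbc7.jar with an empty file.
    [ -f ojdbc7.jar ] || touch ojdbc7.jar

    # Steps 3-4: copy the driver into place, renamed to <database_name>-<jdbc_version>.jar
    cp ojdbc7.jar "$DRIVER_DIR/oracle-7.jar"
    ```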

  5. Update the <components_catalog_path>/config/jdbc_config.json file by adding the following lines:
    ,
    {
        "id" : "Oracle Thin",
        "class" : "oracle.jdbc.driver.OracleDriver",
        "url" : "jdbc:oracle:thin:@myhost:1521:thedb",
        "paths" :
        [
            {"path" : "mvn:jdbc-drivers/oracle/7"}
        ]
    }
    Where:
    • id is the value that will be displayed in the Talend Data Preparation interface as Database type.
    • class is the driver class used to communicate with the database.
    • url is the URL template to access a database.
    • path follows this model: mvn:jdbc-drivers/my_database_name/my_version

    If the database configuration requires more than one .jar file, rename them according to the template mentioned earlier and add each one to its dedicated .m2/jdbc-drivers/<jar_name>/<jdbc_version> folder, as you did for the Oracle driver. For a database that needs two .jar files, for example, you would end up with the following two files:

    .m2/jdbc-drivers/<jar_name_1>/<version>/<jar_name_1>-<version>.jar and .m2/jdbc-drivers/<jar_name_2>/<version>/<jar_name_2>-<version>.jar

    To finish the configuration, update the <components_catalog_path>/config/jdbc_config.json file using the following model:

    ,
    {
        "id" : "Database_type",
        "class" : "<driver_class>",
        "url" : "<url_to_access_database>",
        "paths" :
        [
            {"path" : "mvn:jdbc-drivers/jar_1/version"},
            {"path" : "mvn:jdbc-drivers/jar_2/version"}
        ]
    }
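    Because the snippet above is appended, comma first, inside an existing JSON array, a misplaced comma is the most common mistake at this step. This is not part of the official procedure, but a quick syntax check with python3 (assumed available on the Linux machine) catches such errors before the services read the file. The sample below recreates the Oracle entry in a standalone file for illustration; run the same check against your real <components_catalog_path>/config/jdbc_config.json.

    ```shell
    # Recreate the Oracle entry as a standalone, well-formed JSON array.
    cat > jdbc_config_sample.json <<'EOF'
    [
        {
            "id" : "Oracle Thin",
            "class" : "oracle.jdbc.driver.OracleDriver",
            "url" : "jdbc:oracle:thin:@myhost:1521:thedb",
            "paths" :
            [
                {"path" : "mvn:jdbc-drivers/oracle/7"}
            ]
        }
    ]
    EOF

    # Validate the syntax; a stray comma makes this command fail.
    python3 -m json.tool jdbc_config_sample.json > /dev/null && echo "valid JSON"
    ```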
  6. To enable export on the Hadoop cluster for the new dataset type, copy the oracle-7.jar file to the <spark_job_server_path>/datastreams-deps/ folder.
  7. Copy the changes made in the <components_catalog_path>/config/jdbc_config.json file, and paste them into the <spark_job_server_path>/jdbc_config.json file.
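    Steps 6 and 7 can likewise be sketched in the shell. The COMPONENTS_CATALOG and SPARK_JOB_SERVER paths are placeholders for your actual install locations, and the sketch creates placeholder source files so it can run standalone. Note that step 7 asks you to copy only the new entry; replacing the whole Spark Job Server file, as shown here, is only safe if the two files are otherwise identical.

    ```shell
    # Placeholders for your actual install paths.
    COMPONENTS_CATALOG="${COMPONENTS_CATALOG:-./components_catalog_demo}"
    SPARK_JOB_SERVER="${SPARK_JOB_SERVER:-./spark_job_server_demo}"

    # For this sketch only: create placeholder source files if they do not exist.
    mkdir -p "$COMPONENTS_CATALOG/.m2/jdbc-drivers/oracle/7" "$COMPONENTS_CATALOG/config"
    [ -f "$COMPONENTS_CATALOG/.m2/jdbc-drivers/oracle/7/oracle-7.jar" ] || \
        touch "$COMPONENTS_CATALOG/.m2/jdbc-drivers/oracle/7/oracle-7.jar"
    [ -f "$COMPONENTS_CATALOG/config/jdbc_config.json" ] || \
        echo '[]' > "$COMPONENTS_CATALOG/config/jdbc_config.json"

    # Step 6: make the renamed driver visible to the Spark Job Server.
    mkdir -p "$SPARK_JOB_SERVER/datastreams-deps"
    cp "$COMPONENTS_CATALOG/.m2/jdbc-drivers/oracle/7/oracle-7.jar" \
       "$SPARK_JOB_SERVER/datastreams-deps/"

    # Step 7: replicate the JDBC configuration. A straight copy assumes both
    # files were identical before your edit; otherwise merge the new entry by hand.
    cp "$COMPONENTS_CATALOG/config/jdbc_config.json" "$SPARK_JOB_SERVER/jdbc_config.json"
    ```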

Results

The Oracle database is now available in the database type drop-down list in the import form.

When exporting a preparation made on data stored in your Oracle database, you can choose to process the data on the Talend Data Preparation server, or on a Hadoop cluster if you are using Big Data.
