When you select the Custom option from the
Distribution drop-down list, you are connecting to a Hadoop
distribution that is not among the distributions provided on that
Distribution list in Talend Studio.
About this task
Note that custom versions are not officially supported by Talend.
Talend and its community provide you with the opportunity to connect to custom
versions from Talend Studio but cannot guarantee that the configuration of
whichever version you choose will be easy, due to the wide range of Hadoop
distributions and versions that are available. As such, you should only attempt
to set up such a connection if you have sufficient Hadoop experience to handle
any issues on your own.
After selecting this Custom option, click
[...] to display the Import custom
definition dialog box and proceed as follows:
-
Depending on your situation, select Import from existing
version or Import from zip to
configure the custom Hadoop distribution you want to connect to.
-
If you have the configuration zip file of the custom Hadoop distribution you
need to connect to, select Import from zip.
-
Otherwise, select Import from existing
version to import an officially supported Hadoop distribution
as a base and then customize it by following the wizard. This
approach requires knowledge of the configuration of the Hadoop
distribution to be used.
Note that the check boxes in the wizard allow you to select the Hadoop elements
you need to import. Not all of the check boxes are always displayed in your wizard;
which ones appear depends on the context in which you are creating the connection.
For example, if you are creating this connection for Spark, then only the
Spark check box appears.
-
Whether you have selected Import from existing version
or Import from zip, verify that each check box next to
the Hadoop element you need to import has been selected.
-
Click OK and then in the pop-up warning, click
Yes to accept overwriting any custom setup of JAR
files previously implemented.
Once done, the Custom Hadoop version definition dialog
box becomes active.
This dialog box lists the Hadoop elements and their jar files you are
importing.
-
If you have selected Import from zip, click
OK to validate the imported
configuration.
If you have selected Import from existing version as base,
you may still need to add more jar files to customize that version. To do so, from
the tab of the Hadoop element you need to customize, for example, the
Spark tab, click the [+]
button to open the Select libraries dialog box.
-
Select the External libraries option to open its
view.
-
Browse to and select any jar file you need to import.
-
Click OK to validate the changes and to close the
Select libraries dialog box.
Once done, the selected jar file appears on the list in the tab of the Hadoop
element being configured.
Note that if you need to share the custom Hadoop setup with another
Talend Studio, you can export this custom connection from the
Custom Hadoop version definition window using the export button.
-
In the Custom Hadoop version definition dialog
box, click OK to validate the customized
configuration. This brings you back to the configuration view in which you have
selected the Custom option.
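Before you use Import from zip, it can help to check which jar files the
configuration zip file actually contains, so that you can compare them with the list
shown in the Custom Hadoop version definition dialog box. The following is a minimal
sketch that is not part of Talend Studio: the class name ListZipJars and the method
listJars are hypothetical, and the sketch only assumes that the zip file contains
jar files, since the internal layout of the file is not described here.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not a Talend API: lists the jar entries of a zip file.
class ListZipJars {

    // Returns the names of all non-directory entries ending in ".jar"
    // (case-insensitive) in the given zip archive, sorted alphabetically.
    static List<String> listJars(Path zipPath) throws IOException {
        List<String> jars = new ArrayList<>();
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                if (!entry.isDirectory() && name.toLowerCase().endsWith(".jar")) {
                    jars.add(name);
                }
            }
        }
        Collections.sort(jars);
        return jars;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java ListZipJars.java <path-to-configuration-zip>");
            System.exit(1);
        }
        // Print one jar entry per line for comparison with the dialog box.
        for (String jar : listJars(Paths.get(args[0]))) {
            System.out.println(jar);
        }
    }
}
```

Run it with the path to your configuration zip file as the only argument. The
output is the list of jar entries found in the archive. It does not tell you
whether Talend Studio will accept the file.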
Results
Now that the configuration of the custom Hadoop version has been set up and you are back
in the Hadoop connection configuration view, you can continue entering the other
parameters required by the connection.
If the custom Hadoop version you need to connect to contains YARN and you want to use it,
select the Use YARN check box next to the Distribution list.
A video is available at the following link. Using HDFS as an example, it demonstrates how
to set up the connection to a custom Hadoop cluster, also referred to as an unsupported
Hadoop distribution: How to add an unsupported
Hadoop distribution to the Studio.