
Exporting a Kerberos-secured Hive dataset to HDFS

To enable exports of Hive datasets to HDFS in a Kerberos-secured Cloudera environment, you must edit the Spark Job Server configuration files.

Important: Make sure that the keytab file used to authenticate to HDFS is accessible to all the workers on the cluster.
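Because the Spark executors authenticate with this keytab, a quick way to catch permission problems early is to verify that the file is readable on every worker. The following is a minimal sketch, assuming a POSIX shell on the workers; the `check_keytab` helper name and the keytab path are placeholders, not part of the product:

```shell
# check_keytab: hypothetical helper that reports whether a keytab path
# is readable by the current user (the one running the Spark workers).
check_keytab() {
  if [ -r "$1" ]; then
    echo "readable"
  else
    echo "not readable"
  fi
}

# Example call with the placeholder path used later in this procedure:
check_keytab /path/to/the/keytab/keytab_file.keytab

# If the MIT Kerberos client tools are installed, you can additionally
# confirm that the keytab actually yields a ticket, for example:
#   kinit -kt /path/to/the/keytab/keytab_file.keytab <your_principal>
```

Run the check on each worker node, not only on the host where the Spark Job Server runs.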

Procedure

  1. Create a <sjs_path>/jobserver_gss.conf file, and add the following configuration parameters:
    com.sun.security.jgss.initiate {
    com.sun.security.auth.module.Krb5LoginModule required
    useTicketCache=false
    doNotPrompt=true
    useKeyTab=true
    keyTab="/path/to/the/keytab/keytab_file.keytab"
    principal="your@principalHere"
    debug=true;
    };
  2. In the <sjs_path>/manager_start.sh file, set these parameters with the following values to reference the previously created <sjs_path>/jobserver_gss.conf file:
    KRB5_OPTS="-Djava.security.auth.login.config=jobserver_gss.conf
     -Djava.security.krb5.debug=true
     -Djava.security.krb5.conf=/path/to/krb5.conf
     -Djavax.security.auth.useSubjectCredsOnly=false"
     --conf "spark.executor.extraJavaOptions=$LOGGING_OPTS $KRB5_OPTS"
     --conf "spark.yarn.dist.files=/path/to/jobserver_gss.conf"
     --proxy-user $4
     --driver-java-options "$GC_OPTS $JAVA_OPTS $LOGGING_OPTS $CONFIG_OVERRIDES $JDBC_PROPERTIES $KRB5_OPTS"
  3. When importing your dataset in Talend Data Preparation, the JDBC URL used to connect to Hive must follow this format:
    jdbc:hive2://host:10000/default;principal=<your_principal>
  4. Copy the <components_catalog_path>/config/jdbc_config.json file that contains the Hive driver to the Spark Job Server installation folder.
  5. Copy the .jar files from the <components_catalog_path>/.m2 folder to the <sjs_path>/datastreams-deps folder.
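Steps 4 and 5 are plain file copies, so they can be sketched as a small shell helper. This is a hedged sketch, not part of the product: the `copy_hive_deps` name is made up, and its two arguments stand in for your actual <components_catalog_path> and <sjs_path> locations:

```shell
# copy_hive_deps: hypothetical helper covering steps 4 and 5.
#   $1 = components catalog path (<components_catalog_path>)
#   $2 = Spark Job Server path  (<sjs_path>)
copy_hive_deps() {
  # Step 4: copy the JDBC configuration file that contains the Hive driver.
  cp "$1/config/jdbc_config.json" "$2/"
  # Step 5: copy every .jar from the local .m2 folder into the
  # Spark Job Server dependency folder.
  find "$1/.m2" -name '*.jar' -exec cp {} "$2/datastreams-deps/" \;
}
```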

Results

You can now export your Hive datasets to HDFS.
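As a final sanity check, you can assemble the JDBC URL from step 3 and try it outside Talend Data Preparation. The `build_hive_url` helper below is illustrative only; the host, port, database, and principal values are assumptions you must replace with your own:

```shell
# build_hive_url: hypothetical helper that builds a Hive JDBC URL in the
# form required by step 3: jdbc:hive2://host:port/db;principal=...
build_hive_url() {
  # $1 = host, $2 = port, $3 = database, $4 = Kerberos principal
  echo "jdbc:hive2://$1:$2/$3;principal=$4"
}

# Example with placeholder values:
build_hive_url host 10000 default "hive/host@EXAMPLE.COM"

# If beeline is available, the resulting URL can be tested directly:
#   beeline -u "$(build_hive_url host 10000 default 'hive/host@EXAMPLE.COM')"
```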
