
Machine Learning 101 - Decision Trees

This article explains how to develop a machine learning routine based on decision trees.

Overview

This hands-on tutorial demonstrates the basics of developing a machine learning routine using Talend and Spark. Specifically, it uses decision tree learning to classify real-life bank marketing data. Upon completion, you will have a working knowledge of how machine learning is integrated into a Talend workflow, along with some reusable code snippets.
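As a rough illustration of what decision tree learning does when it classifies records, the following minimal Python sketch picks the threshold on one numeric feature that best separates two classes, using Gini impurity as the split criterion. It is not the code Talend or Spark generates. The function names `gini` and `best_split` and the toy values are assumptions made up for this example and are not taken from the Bank Marketing dataset.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' split.

    Returns (None, gini(labels)) if no candidate split lowers the impurity.
    """
    best = (None, gini(labels))
    # Every distinct value except the largest is a candidate threshold.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        # Impurity of the two child nodes, weighted by their sizes.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: a numeric feature and a yes/no outcome.
durations = [30, 45, 60, 200, 250, 400]
subscribed = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(durations, subscribed)
```

On this toy data the two classes are perfectly separated at a threshold of 60, so the weighted impurity of the split is 0.0. A full decision tree repeats this search over every feature and then recursively over each resulting subset.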

The source data used in this tutorial is the Bank Marketing dataset, retrieved from the UCI Machine Learning Repository (Irvine, CA: University of California, School of Information and Computer Science). It is available in the public domain and is attributed to: [Moro et al., 2014] S. Moro, P. Cortez and P. Rita. "A Data-Driven Approach to Predict the Success of Bank Telemarketing." Decision Support Systems, Elsevier, 62:22-31, June 2014.

Prerequisites

You have:
  • Hortonworks Data Platform (HDP) 2.4 installed and configured. You can also use the Hortonworks Sandbox, a downloadable virtual machine (VM). For more information, see Create HDFS Metadata - Hortonworks.
  • Basic knowledge of:
    • The Hadoop ecosystem's tools and technologies.
    • Hadoop Distributed File System (HDFS) and Spark.
  • Working knowledge of Talend Studio and Talend Big Data Platform.
  • Talend Big Data Platform installed and configured.
