
Machine Learning 101 - Decision Trees

This article explains how to develop a machine learning routine that uses decision trees.

Overview

This hands-on tutorial demonstrates the basics of developing a machine learning routine using Talend and Spark. Specifically, it uses decision tree learning to classify real-life bank marketing data. Upon completion, you will have a working knowledge of how machine learning is integrated into a Talend workflow, along with some reusable code snippets.
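Decision tree learning builds a classifier by repeatedly splitting the data on the feature threshold that best separates the classes. The sketch below is a minimal, pure-Python illustration of a single split step scored with Gini impurity, which is the default impurity measure of Spark ML's `DecisionTreeClassifier`. It is not the Talend or Spark implementation. The function names `gini` and `best_split` and the toy call-duration values are hypothetical, chosen only for illustration, and are not taken from the Bank Marketing dataset.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values: Sequence[float],
               labels: Sequence[str]) -> Tuple[Optional[float], float]:
    """Find the 'value <= threshold' split with the lowest weighted Gini impurity.

    Candidate thresholds are midpoints between consecutive distinct sorted values.
    Returns (None, gini(labels)) when no split improves on the unsplit node.
    """
    n = len(values)
    pairs = sorted(zip(values, labels))
    best_threshold: Optional[float] = None
    best_score = gini(labels)  # impurity of the node before splitting
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # equal values cannot be separated by a threshold
        threshold = (lo + hi) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Weight each child's impurity by its share of the rows.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


if __name__ == "__main__":
    # Hypothetical toy data: contact duration (seconds) and whether the client subscribed.
    durations = [60, 90, 120, 400, 500, 650]
    subscribed = ["no", "no", "no", "yes", "yes", "no"]
    threshold, score = best_split(durations, subscribed)
    print(threshold, round(score, 3))  # prints "260.0 0.222"
```

A full tree repeats this step on each resulting subset until a stopping condition such as `maxDepth` is reached. For continuous features, Spark evaluates only a limited set of binned candidate thresholds (controlled by `maxBins`) rather than every midpoint as this sketch does.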

The source data used in this tutorial, the Bank Marketing dataset, was retrieved from the UCI Machine Learning Repository (Irvine, CA: University of California, School of Information and Computer Science). It is available in the public domain and is attributed to: [Moro et al., 2014] S. Moro, P. Cortez, and P. Rita. "A Data-Driven Approach to Predict the Success of Bank Telemarketing." Decision Support Systems, Elsevier, 62:22-31, June 2014.

Prerequisites

You have:
  • Hortonworks Data Platform (HDP) 2.4 installed and configured. You can also use the Hortonworks Sandbox, a downloadable virtual machine (VM). For more information, see Create HDFS Metadata - Hortonworks.
  • Basic knowledge of:
    • Hadoop ecosystem tools and technologies.
    • Hadoop Distributed File System (HDFS) and Spark.
  • Working knowledge of Talend Studio and Talend Big Data Platform.
  • Talend Big Data Platform installed and configured.
