Tuning hyper-parameters and using K-fold cross-validation to improve the matching model

Testing the model using the K-fold cross-validation technique

The K-fold cross-validation technique assesses how well the model will perform on an independent dataset.

To test the model, the dataset is split into k subsets and the Random forest algorithm is run k times:

  • At each iteration, one of the k subsets is retained as the validation set and the remaining k-1 subsets form the training set.
  • A score is computed for each of the k runs, and the k scores are then averaged into a global score, as in the sketch after this list.
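The following is a minimal sketch of this procedure, assuming a Python environment with scikit-learn; the synthetic dataset and classifier settings are illustrative stand-ins, not Talend's internal implementation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the labeled matching dataset (assumption).
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

model = RandomForestClassifier(n_estimators=10, max_depth=5, random_state=42)

# cv=5 splits the dataset into k=5 subsets; each of the 5 runs trains on
# 4 subsets and validates on the remaining one.
scores = cross_val_score(model, X, y, cv=5)

# The k per-run scores are averaged into a single global score.
print("Score per fold:", scores)
print("Global score:", scores.mean())
```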

Tuning the Random forest algorithm hyper-parameters using grid search

You can specify values for the two Random forest algorithm hyper-parameters:

  • The number of decision trees
  • The maximum depth of a decision tree

To improve the quality of the model and tune the hyper-parameters, grid search builds a model for each combination of the two Random forest hyper-parameter values within the limits you specify.

For example:

  • The number of trees ranges from 5 to 50 with a step of 5.
  • The tree depth ranges from 5 to 10 with a step of 1.

In this example, grid search evaluates 60 different combinations: 10 possible tree counts × 6 possible depths.

Only the combination of the two hyper-parameter values that produces the best model is retained, as measured by the K-fold cross-validation score.
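The sketch below illustrates this tuning step with scikit-learn's GridSearchCV, reusing the example ranges above; as before, the dataset and estimator are illustrative assumptions rather than Talend's internals.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

param_grid = {
    "n_estimators": list(range(5, 55, 5)),  # number of trees: 5 to 50, step 5 -> 10 values
    "max_depth": list(range(5, 11)),        # tree depth: 5 to 10, step 1 -> 6 values
}

# 10 x 6 = 60 candidate models, each scored by 5-fold cross-validation.
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(X, y)

# Only the best-scoring combination of the two hyper-parameters is retained.
print("Best combination:", search.best_params_)
print("Best cross-validation score:", search.best_score_)
```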
