Doing continuous matching

If you want to match new records against a clean data set, you do not need to restart the matching process from scratch.

You can index the clean set once and reuse it to do continuous matching.

To perform continuous matching tasks, Elasticsearch version 5.1.2 or later must be running.

The continuous matching process is made up of the following steps:

  1. The first step consists of computing suffixes to separate clean and deduplicated records from a data set and indexing them in Elasticsearch using tMatchIndex.

    For an example of how to index data in Elasticsearch using tMatchIndex, see Indexing a reference data set in Elasticsearch on Talend Help Center (https://help.talend.com).

  2. The second step consists of comparing the indexed records with new records having the same schema and outputting matching and non-matching records using tMatchIndexPredict. This component uses the pairing and matching models generated by tMatchPairing and tMatchModel.

    For an example of how to match new records against records from a reference data set, see Doing continuous matching using tMatchIndexPredict on Talend Help Center (https://help.talend.com).

You can then clean and deduplicate the non-matching records using tRuleSurvivorship and populate the clean data set indexed in Elasticsearch using tMatchIndex.
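
The overall loop can be illustrated with a minimal Python sketch. This is not how the Talend components are implemented: a plain dictionary stands in for the Elasticsearch index, a generic string similarity from the standard library stands in for the models generated by tMatchPairing and tMatchModel, and the function names, the `name` field, the minimum suffix length and the threshold are all hypothetical. The sketch also skips the tRuleSurvivorship cleaning step and adds the non-matching records to the index directly.

```python
from difflib import SequenceMatcher

MIN_SUFFIX_LEN = 4  # assumed minimum suffix length used as a lookup key


def suffixes(value, min_len=MIN_SUFFIX_LEN):
    """Return every suffix of the normalized value with at least min_len characters."""
    text = "".join(value.lower().split())
    if len(text) <= min_len:
        return {text}
    return {text[i:] for i in range(len(text) - min_len + 1)}


def index_records(index, records, key="name"):
    """Add records to the in-memory index under each suffix of the chosen field."""
    for record in records:
        for suffix in suffixes(record[key]):
            index.setdefault(suffix, []).append(record)


def match_new_records(index, new_records, key="name", threshold=0.85):
    """Split new records into (matching, non_matching) against the indexed set."""
    matching, non_matching = [], []
    for record in new_records:
        # Candidates are indexed records that share at least one suffix.
        candidates = {}
        for suffix in suffixes(record[key]):
            for candidate in index.get(suffix, []):
                candidates[id(candidate)] = candidate
        # Keep the candidate with the highest similarity score.
        best, best_score = None, 0.0
        for candidate in candidates.values():
            score = SequenceMatcher(
                None, record[key].lower(), candidate[key].lower()
            ).ratio()
            if score > best_score:
                best, best_score = candidate, score
        if best is not None and best_score >= threshold:
            matching.append((record, best))
        else:
            non_matching.append(record)
    return matching, non_matching


# Step 1: index the clean, deduplicated set once.
clean_set = [{"id": 1, "name": "Jonathan Smith"}, {"id": 2, "name": "Maria Garcia"}]
index = {}
index_records(index, clean_set)

# Step 2: compare a new batch with the same schema against the index.
new_batch = [{"id": 10, "name": "Jonathon Smith"}, {"id": 11, "name": "Wei Chen"}]
matching, non_matching = match_new_records(index, new_batch)

# Final step: add the non-matching records to the index for later batches.
index_records(index, non_matching)
```

Because the clean set stays indexed between runs, each new batch only repeats the comparison and update steps, not the initial indexing.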
