
How does tMatchPairing compute the sample of suspect duplicate pairs?

The list of suspect duplicate pairs can be very large, so you label only a subset of it to identify the potential groups of duplicates.

You can then use machine learning to predict labels for the whole list. tMatchPairing can also output a sample of this list: you set the sample size manually, and the pairs in the sample are chosen at random.
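The random selection described above can be illustrated with a minimal Python sketch. This is not the tMatchPairing implementation: the function name `sample_suspect_pairs`, the `seed` parameter, and the representation of a pair as a tuple of record IDs are all assumptions made for illustration. It only shows what drawing a fixed-size random sample from a list of suspect pairs looks like.

```python
import random


def sample_suspect_pairs(pairs, sample_size, seed=None):
    """Return a random sample of suspect pairs (hypothetical helper).

    pairs: list of suspect duplicate pairs, here tuples of record IDs.
    sample_size: the manually fixed size of the sample.
    seed: optional seed, assumed here only to make the draw repeatable.
    """
    rng = random.Random(seed)
    # Never ask for more pairs than the list contains.
    k = min(sample_size, len(pairs))
    # Draw k distinct pairs uniformly at random, without replacement.
    return rng.sample(pairs, k)


# Illustrative data: 1000 made-up suspect pairs of record IDs.
suspect_pairs = [(i, i + 1) for i in range(1000)]
sample = sample_suspect_pairs(suspect_pairs, 10, seed=42)
print(len(sample))
```

Because the draw is uniform and without replacement, every suspect pair has the same chance of appearing in the sample and no pair appears twice.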

For an example of how to handle grouping tasks to decide on the relationships among pairs of records, see the Talend Data Stewardship documentation.
