
What is data matching?

Data matching is the process of finding records that represent the same entity in a dataset.

General definition

Data matching enables you to:
  • Find duplicates, potential duplicates, and non-duplicates in a data source;
  • Analyze data and return weighted match probabilities;
  • Merge identical or similar entries into a single entry; and
  • Reduce disparity across different data sources.
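
As a rough illustration of the first two points, the following Python sketch scores a pair of records with a weighted string similarity and sorts the pair into one of three groups. It is not how any particular product computes matches: the field names, the weights, and the two thresholds are hypothetical values chosen for the example, and the similarity measure is simply the one from Python's standard `difflib` module.

```python
from difflib import SequenceMatcher

# Hypothetical per-field weights and thresholds, chosen only for illustration.
WEIGHTS = {"name": 0.6, "city": 0.4}
DUPLICATE_THRESHOLD = 0.9
POTENTIAL_THRESHOLD = 0.7


def similarity(a: str, b: str) -> float:
    """Return a case-insensitive similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(r1: dict, r2: dict) -> float:
    """Combine the per-field similarities into one weighted score."""
    return sum(w * similarity(r1[f], r2[f]) for f, w in WEIGHTS.items())


def classify(r1: dict, r2: dict) -> str:
    """Label a record pair according to its weighted score."""
    score = match_score(r1, r2)
    if score >= DUPLICATE_THRESHOLD:
        return "duplicate"
    if score >= POTENTIAL_THRESHOLD:
        return "potential duplicate"
    return "non-duplicate"


if __name__ == "__main__":
    a = {"name": "John Smith", "city": "Paris"}
    b = {"name": "Jon Smith", "city": "Paris"}
    print(classify(a, b), round(match_score(a, b), 3))
```

Pairs that land in the middle group are the ones you would typically review by hand before merging; moving the two thresholds changes how many pairs end up there.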

Record linkage

Record linkage consists of identifying records that refer to the same entity in a dataset.

Two types of record linkage exist:
  • Deterministic record linkage, which is based on identifiers that match; and
  • Probabilistic record linkage, which is based on the probability that identifiers match.
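
The difference between the two types can be sketched in a few lines of Python. The deterministic function links two records only when their identifiers are present and equal. The probabilistic function adds up an agreement weight per field, in the style of the Fellegi-Sunter model, and links the records when the total reaches a threshold. The field names, the m and u probabilities, and the threshold are all hypothetical values chosen for the example; in practice they would be estimated from your data.

```python
import math


def deterministic_link(r1: dict, r2: dict, keys=("national_id",)) -> bool:
    """Link only when every identifier is present and exactly equal."""
    return all(
        r1.get(k) is not None and r1.get(k) == r2.get(k) for k in keys
    )


# Hypothetical probabilities per field:
#   m = P(field agrees | records refer to the same entity)
#   u = P(field agrees | records refer to different entities)
FIELD_PROBS = {
    "last_name": (0.95, 0.01),
    "birth_year": (0.98, 0.05),
    "zip": (0.90, 0.10),
}


def probabilistic_weight(r1: dict, r2: dict) -> float:
    """Sum log2 likelihood ratios over the fields (higher = more likely a match)."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if r1.get(field) == r2.get(field):
            total += math.log2(m / u)              # agreement adds evidence
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement removes it
    return total


def probabilistic_link(r1: dict, r2: dict, threshold: float = 5.0) -> bool:
    """Link when the total weight reaches the (hypothetical) threshold."""
    return probabilistic_weight(r1, r2) >= threshold
```

With these example values, two records that agree on last name and birth year but differ on zip code are still linked by the probabilistic function, whereas the deterministic function would reject any pair whose identifier differs or is missing.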
