
Improving the Talend Trust Score™

The following example shows you how you can improve the Talend Trust Score™ using Talend Cloud Data Inventory and Talend Cloud Data Preparation.

In this example, you work for an e-commerce company. Some orders have not been shipped yet. While looking into the order progress, you notice that some country names and tax identification numbers (TINs) are wrong.

Here is a sample of the dataset:
Sample of a dataset containing information from an e-commerce company.

Checking the current Talend Trust Score™

Procedure

  1. Go to the Datasets tab.
  2. To find the dataset whose Talend Trust Score™ you want to improve, filter the datasets.
    Add filter option selected.
  3. In this example, use the tags to filter the datasets.
    Tag in orders selected to filter the datasets.
    The dataset list is filtered. The Talend Trust Score™ is 3.38/5.

What to do next

To improve the Talend Trust Score™, you can create a preparation from this dataset. See Creating a preparation from the dataset.

For more information on preparations, see the Talend Cloud Data Preparation User Guide.

Creating a preparation from the dataset

Procedure

  1. In Talend Cloud Data Inventory, go to the Datasets tab.
    Your dataset list is filtered according to the filter applied in the previous section.
  2. Hover over the dataset and click the Preparations icon.
    Preparations icon selected.
    The Preparations wizard opens.
  3. Click Add.
    You are redirected to Talend Cloud Data Preparation and the preparation is created.

What to do next

You can now configure the preparation.

Configuring the preparation

About this task

In this example, you use functions from Talend Cloud Data Preparation.

Procedure

  1. To correct the country names, use the fuzzy matching function.
    1. Select the column: delivery_country.
    2. In the right panel, select Column and start typing fuzzy matching.
    3. Select the function Standardize value (fuzzy matching).
    4. Set the Match threshold to Default (> 80%).
    5. Click Submit. The step is added to the preparation steps in the left panel and the country names are corrected. For example, United Staates is replaced by United States.
  2. To convert the country codes, use a conversion function. The delivery_country column is still selected.
    1. In the right panel, select Column and start typing convert.
    2. Select the function Convert country names and codes.
    3. Set From to ISO country code and To to English country name.
    4. Click Submit. The country names are converted. For example, CA is replaced by Canada.
  3. To correct the TIN, use the lookup feature.
    It lets you match the data from the current preparation with a reference dataset. For more information, see Dynamically using the data from another dataset.
    You need to associate matching columns.
    1. Select the column: customer_id. In this example, this column is the matching one.
    2. Click the lookup icon above the right panel.
      Lookup icon selected.
      The Lookup panel opens as the right panel.
    3. Click Select dataset.
    4. Select the reference dataset and click Select. You are back to the Lookup panel and the reference dataset is displayed below the preparation.
      Lookup panel with reference dataset displayed below the preparation.
    5. In Current preparation and Lookup dataset, select customer_id.
    6. Select the column from the reference dataset to be added to the preparation.
      In this example, you want to correct the TIN. You need to select customer_tax_id.
      customer_tax_id selected from the Import columns option.
    7. Click Submit. The step is added to the preparation steps in the left panel.
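To make the effect of these three functions on the data concrete, the following Python sketch approximates them outside Talend. It is not Talend's implementation and makes several assumptions: difflib's SequenceMatcher ratio stands in for Talend's own similarity measure, fuzzy matching is assumed to replace each value with the most frequent similar value in the column, the code table is a hypothetical four-entry excerpt of ISO 3166-1 alpha-2 codes, and the lookup is modeled as a left join on customer_id. The function names, sample rows, and TIN values are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; difflib's ratio is a stand-in for Talend's metric."""
    return SequenceMatcher(None, a, b).ratio()


def standardize_fuzzy(values: List[str], threshold: float = 0.8) -> List[str]:
    """Replace each value with the most frequent value whose similarity is > threshold."""
    counts = Counter(values)
    canonical: Dict[str, str] = {}
    for value in counts:
        # Candidates: every distinct value above the threshold, including the value itself.
        similar = [w for w in counts if similarity(value, w) > threshold]
        # Highest count wins; ties go to the alphabetically first value.
        canonical[value] = min(similar, key=lambda w: (-counts[w], w))
    return [canonical[v] for v in values]


# Hypothetical excerpt of ISO 3166-1 alpha-2 codes (the real function covers all codes).
ISO_TO_NAME = {"CA": "Canada", "DE": "Germany", "FR": "France", "US": "United States"}


def convert_codes(values: List[str]) -> List[str]:
    """Replace known ISO country codes with English names; keep other values unchanged."""
    return [ISO_TO_NAME.get(v, v) for v in values]


def lookup(rows: List[Row], reference: List[Row], key: str, column: str) -> List[Row]:
    """Add `column` from `reference` to each row, matching on `key` (left join)."""
    index = {ref[key]: ref[column] for ref in reference}
    return [{**row, column: index.get(row[key])} for row in rows]


# Hypothetical sample data.
orders: List[Row] = [
    {"order_id": "1", "customer_id": "C1", "delivery_country": "United States"},
    {"order_id": "2", "customer_id": "C2", "delivery_country": "United Staates"},
    {"order_id": "3", "customer_id": "C3", "delivery_country": "CA"},
    {"order_id": "4", "customer_id": "C1", "delivery_country": "United States"},
]
customers: List[Row] = [
    {"customer_id": "C1", "customer_tax_id": "TIN-001"},
    {"customer_id": "C3", "customer_tax_id": "TIN-003"},
]

# Steps 1 and 2 on delivery_country, then step 3 (lookup on customer_id).
countries = convert_codes(standardize_fuzzy([r["delivery_country"] for r in orders]))
prepared = [{**r, "delivery_country": c} for r, c in zip(orders, countries)]
prepared = lookup(prepared, customers, key="customer_id", column="customer_tax_id")

print([r["delivery_country"] for r in prepared])
# ['United States', 'United States', 'Canada', 'United States']
print([r["customer_tax_id"] for r in prepared])
# ['TIN-001', None, 'TIN-003', 'TIN-001']
```

In this sketch, a customer_id with no match in the reference dataset (C2 here) gets an empty value. A lookup can therefore only fill in the TINs of customers that the reference dataset contains.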

Running the preparation

Procedure

  1. Click Run.
  2. Select the current dataset to output the prepared data.
    When you have selected the dataset, the Additional parameters are displayed.
  3. Select Update as Action.
  4. In Operation keys, type order_id.
  5. Select (Create entry) and click Next.
    order_id shown in the Operation keys drop-down list.
  6. Map customer_tax_id with customer_tin.
    customer_tax_id mapped with customer_tin.

    If the preparation has more columns than the reference dataset, you can leave some columns unmapped.

  7. Click Next.
  8. Select the run profile and click Run.
    Running the preparation can take a few minutes. You can check the run progress in the Run history above the right panel.
    Run history button clicked.
  9. To see the improvements on your dataset, go back to Talend Cloud Data Inventory.
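The Update action with order_id as the operation key can be pictured as a keyed update of the target dataset. The following Python sketch is a simplified model under stated assumptions, not the Talend runtime. Rows are dictionaries. Prepared rows whose order_id is absent from the dataset are ignored rather than inserted. delivery_country is assumed to be mapped to the column of the same name, alongside the customer_tax_id to customer_tin mapping from step 6. The function name run_update and the sample values are hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def run_update(target: List[Row], prepared: List[Row], operation_key: str,
               mapping: Dict[str, str]) -> List[Row]:
    """Update target rows that share `operation_key` with a prepared row.

    `mapping` maps a prepared column to a target column. Target columns that
    no mapping entry points to keep their original values. Assumption: prepared
    rows with no matching key in `target` are ignored, not inserted.
    """
    by_key = {row[operation_key]: row for row in prepared}
    updated: List[Row] = []
    for row in target:
        new_row = dict(row)  # copy so the input dataset is left untouched
        source = by_key.get(row[operation_key])
        if source is not None:
            for src_col, dst_col in mapping.items():
                new_row[dst_col] = source[src_col]
        updated.append(new_row)
    return updated


# Hypothetical sample data: the dataset before the run and the prepared rows.
dataset: List[Row] = [
    {"order_id": "1", "delivery_country": "United Staates", "customer_tin": None},
    {"order_id": "2", "delivery_country": "CA", "customer_tin": "0000"},
]
prepared: List[Row] = [
    {"order_id": "1", "delivery_country": "United States", "customer_tax_id": "TIN-001"},
    {"order_id": "2", "delivery_country": "Canada", "customer_tax_id": "TIN-002"},
]
mapping = {"delivery_country": "delivery_country", "customer_tax_id": "customer_tin"}

result = run_update(dataset, prepared, operation_key="order_id", mapping=mapping)
for row in result:
    print(row)
```

Matching on order_id, rather than on row position, keeps each correction attached to the right order even if the prepared rows come back in a different order.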

Checking the new Talend Trust Score™

Procedure

  1. To see the improvements from the preparation, open the dataset and refresh the sample from the Sample view.
    Refresh button in the Head sample view.
    You can see that all the country names are now correct and that some TINs have also been filled in.
  2. To see the effects on the dataset quality, go to the dataset overview.
    The Talend Trust Score™ has increased from 3.38/5 to 3.57/5.
    Talend Trust Score™ of 3.57/5 illustrated.

What to do next

Data quality rules can also help you improve the Talend Trust Score™. For more information, see Managing data quality rules.
