Generating the data profiling and sampling

When you enable data profiling during metadata import, Talend Data Catalog incrementally profiles all tables or files in the import scope and collects sample rows.

As data profiling and metadata import processes share the data store connectivity and scope details, you do not need to configure the data profiling connectivity explicitly.

Data sampling and data profiling can be defined and performed independently.

Data sampling and data profiling can be performed as part of the model harvesting or on demand.

Data sampling and profiling are required to perform auto-tagging for data classification.

Before you begin

  • Make sure that the bridge of the data source supports data profiling.
  • Make sure that you have been assigned an object role with the Data Management capability.

Procedure

  1. Open the Import Options tab to enable the data profiling and/or sampling options.
  2. Select the Data Profiling check box and define the number of rows to profile.
  3. Select the Data Sampling check box and define the number of rows for preview.
  4. Select the Profile only objects that are not profiled yet check box to run data profiling only on imported objects that have not been profiled yet.
    If the check box is cleared, Talend Data Catalog re-profiles imported objects based on their last modification time.
  5. Select the Data Classification check box to automatically run data classification on the newly profiled objects.
  6. Select the Hide data using Sensitivity Label check box, then select a sensitivity label from the list to apply it to the newly imported objects in the scope.
  7. Save your changes.
  8. To run or refresh the data profiling and/or sampling, do one of the following:
    • Re-import the model and go to the object page.
    • Generate the data profiling and sampling from any level of an imported object, including Tables/Files/Views (Classifier), Schema/Package, Model, or File System folder.
      1. Go to the object page.
      2. In the Data Request SQL area, specify your SQL query on the object as needed. The Data Request SQL is used after the re-harvesting.
      3. In the More actions menu, click Generate Data Sampling and Profiling.
      4. Configure the options as needed.
      5. Click OK to run the operation.
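
The effect of the Profile only objects that are not profiled yet check box (step 4) can be illustrated with a small sketch. This is not Talend Data Catalog code or its API. The names ImportedObject and select_objects_to_profile are hypothetical, and the sketch assumes that "based on their last modification time" means an object is re-profiled when it was modified after its last profiling run.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class ImportedObject:
    """Hypothetical stand-in for an imported table, file, or view."""
    name: str
    last_modified: datetime
    last_profiled: Optional[datetime] = None  # None means never profiled


def select_objects_to_profile(
    objects: List[ImportedObject], only_unprofiled: bool
) -> List[ImportedObject]:
    """Return the objects that a profiling run would process.

    only_unprofiled=True mirrors the selected check box: only objects
    that have never been profiled are picked.
    only_unprofiled=False mirrors the cleared check box: objects that
    were modified after their last profiling run are picked as well
    (an assumption about what "based on last modification time" means).
    """
    selected = []
    for obj in objects:
        if obj.last_profiled is None:
            # Never profiled: always in scope.
            selected.append(obj)
        elif not only_unprofiled and obj.last_modified > obj.last_profiled:
            # Changed since the last profiling run: re-profile.
            selected.append(obj)
    return selected
```

With the check box selected, an object that changed after its last profiling run is skipped until the check box is cleared or profiling is generated on demand from the object page.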
