Sampling and profiling data
Talend Data Catalog provides the option to profile and sample the data contained in files and tables during the harvesting process.
Before the harvesting process, you can specify how many records to profile and how many to keep as a sample for later visualization.
- Data profiling helps you discover business knowledge embedded in the data itself, improves your understanding of the data, and enables you to classify data. The profiling process creates a summary with statistics and charts, which helps you find out whether the correct data is available at the appropriate level of detail.
- The data sampling provides sample rows from the dataset.
This information is available when you navigate to the file or table object page, or when you look at individual fields or columns of the file or table.
Sampled data and profiling results are hidden from most users by default. To see them, you must be assigned an object role with the Data Viewing capability for the model. You can also hide the sample data and profiling results for specific models.
The application can store and display the following data profiling details for table/view and column objects:
- Counts (standard and custom counts, like empty and valid rows)
- Values (distinct values and their counts)
- Patterns (patterns and their counts)
- Data types (inferred data types and their counts)
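To make these four categories more concrete, the following Python sketch computes a comparable summary for a single column. It is only an illustration and not the way Talend Data Catalog implements profiling: the function names, the pattern notation (`9` for digits, `A`/`a` for letters), the type inference rules, and the `profile_rows` and `sample_rows` parameters are all hypothetical stand-ins for the number of records to profile and the number to keep as a sample.

```python
import re
from collections import Counter


def value_pattern(value: str) -> str:
    """Replace digits with '9' and letters with 'A'/'a'; keep other characters."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)
    return "".join(out)


def infer_type(value: str) -> str:
    """Very rough type inference, for illustration only."""
    if re.fullmatch(r"[+-]?\d+", value):
        return "integer"
    if re.fullmatch(r"[+-]?\d+\.\d+", value):
        return "decimal"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return "date"
    return "string"


def profile_column(values, profile_rows, sample_rows):
    """Profile the first `profile_rows` values and keep `sample_rows` as a sample."""
    profiled = values[:profile_rows]
    non_empty = [v for v in profiled if v is not None and v.strip() != ""]
    return {
        # Counts: total, empty, and distinct non-empty values
        "counts": {
            "rows": len(profiled),
            "empty": len(profiled) - len(non_empty),
            "distinct": len(set(non_empty)),
        },
        # Values: each distinct value and how often it occurs
        "values": Counter(non_empty),
        # Patterns: each character pattern and how often it occurs
        "patterns": Counter(value_pattern(v) for v in non_empty),
        # Data types: each inferred type and how often it occurs
        "data_types": Counter(infer_type(v) for v in non_empty),
        # Sample: raw rows kept for later visualization
        "sample": profiled[:sample_rows],
    }


# Hypothetical column content; the last value falls outside the profiling limit.
column = ["75001", "75002", "", "N/A", "75001", "2024-01-15", None, "13.5"]
result = profile_column(column, profile_rows=7, sample_rows=3)

for section in ("counts", "values", "patterns", "data_types", "sample"):
    print(section, result[section])
```

In this sketch the profiling limit and the sample size are independent: statistics are computed over the first seven values, while only the first three are retained as sample rows.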