tReservoirSampling Standard properties

These properties are used to configure tReservoirSampling running in the Standard Job framework.

The Standard tReservoirSampling component belongs to the Data Quality family.

The component in this framework is available in Talend Data Management Platform, Talend Big Data Platform, Talend Real Time Big Data Platform, Talend Data Services Platform, and in Talend Data Fabric.

Basic settings

Schema and Edit schema

A schema is a row description. It defines the number of fields (columns) to be processed and passed on to the next component. When you create a Spark Job, avoid using the reserved word "line" as a field name.

Click Sync columns to retrieve the schema from the previous component in the Job.

Built-In: You create and store the schema locally for this component only.

Repository: You have already created the schema and stored it in the Repository. You can reuse it in various projects and Job designs.

Sample Size

Set how many rows to sample from the input flow.
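
The component name suggests reservoir sampling, although this page does not document the component's internals. As a minimal sketch of that general technique (not the component's actual implementation), the Java code below keeps a fixed-size sample from a flow of unknown length. The class name ReservoirSketch, the sample method, and its parameters are hypothetical and chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of classic reservoir sampling (Algorithm R).
// This is NOT the code that tReservoirSampling generates.
class ReservoirSketch {

    // Returns up to sampleSize rows picked uniformly at random from the flow.
    // If the flow holds fewer rows than sampleSize, all rows are returned.
    static <T> List<T> sample(Iterator<T> rows, int sampleSize, Random rng) {
        List<T> reservoir = new ArrayList<>();
        int seen = 0; // number of rows read so far
        while (rows.hasNext()) {
            T row = rows.next();
            seen++;
            if (reservoir.size() < sampleSize) {
                // Fill the reservoir with the first sampleSize rows.
                reservoir.add(row);
            } else {
                // Draw an index in [0, seen); the new row replaces an
                // existing one only if the index falls inside the reservoir.
                int j = rng.nextInt(seen);
                if (j < sampleSize) {
                    reservoir.set(j, row);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            input.add(i);
        }
        // Sample Size = 10 in this example.
        List<Integer> picked = sample(input.iterator(), 10, new Random());
        System.out.println(picked);
    }
}
```

In this sketch the memory used depends only on the sample size, not on the number of input rows, which is why the technique suits flows whose length is not known in advance.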

Advanced settings

Seed for random generator

Enter a number to use as the seed if you want to extract the same sample in different executions of the Job.

Repeating the execution with a different seed value results in a different data sample being extracted.

Keep this field empty if you want to extract a different data sample each time you execute the Job.
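
The effect of the seed can be illustrated with a short Java sketch. It assumes a generic seeded pseudo-random generator (java.util.Random) and is not the component's actual code. The class SeededSampleSketch, its methods, and the use of a null seed to stand for an empty field are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of how a seed makes a random sample repeatable.
// This is NOT the code that tReservoirSampling generates.
class SeededSampleSketch {

    // A null seed stands for an empty "Seed for random generator" field:
    // the generator is then seeded differently on each execution.
    static Random newGenerator(Long seed) {
        return seed == null ? new Random() : new Random(seed);
    }

    // Reservoir-style sampling over an in-memory list of rows.
    static List<Integer> sample(List<Integer> rows, int sampleSize, Long seed) {
        Random rng = newGenerator(seed);
        List<Integer> reservoir = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            if (i < sampleSize) {
                reservoir.add(rows.get(i));
            } else {
                int j = rng.nextInt(i + 1); // index in [0, i]
                if (j < sampleSize) {
                    reservoir.set(j, rows.get(i));
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rows.add(i);
        }
        // Same seed: both calls return the same sample.
        System.out.println(sample(rows, 5, 12345L).equals(sample(rows, 5, 12345L)));
        // No seed: the sample usually differs from one execution to the next.
        System.out.println(sample(rows, 5, null));
    }
}
```

Fixing the seed is useful when a sample must be reproduced later, for example to rerun an analysis on exactly the same rows.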

tStatCatcher Statistics

Select this check box to collect log data at the component level.

Usage

Usage rule

This component lets you test profiling analyses on a data sample and obtain results similar to those on the full data set.

tReservoirSampling cannot currently be used in Map/Reduce Jobs.
