Working on large datasets
By default, a dataset that exceeds 10,000 rows in Talend Cloud Data Preparation is considered a large dataset.
Although there is no limit on the size of the dataset that you can create, the export settings and the display differ for large datasets. You work on a sample that shows the first 10,000 rows, but your preparation can also be applied to the rest of the dataset. The following scenario uses a dataset containing 50,000 rows as an example.
Fetching more data from a large dataset
When you work on a large dataset in Talend Cloud Data Preparation, for example one with 50,000 rows, only a sample of the first 10,000 rows is displayed.
You can start preparing your data and apply functions as you would for any other dataset. The difference appears when you apply a filter of any type to your data. Because you are working on a sample, the filter initially retrieves only the matching rows among the first 10,000. You can then fetch more matching rows from the remaining 40,000 and refine your preparation based on this new sample.
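The sample-then-fetch behavior described above can be pictured with a minimal Python sketch. It is only a conceptual model, not the Talend Cloud Data Preparation implementation or API: the function names, the `limit` parameter, and the 50,000-row test dataset are all hypothetical, and the way the product actually selects additional rows may differ.

```python
from itertools import islice

SAMPLE_SIZE = 10_000  # default sample size stated in this section


def matching_in_sample(rows, predicate, sample_size=SAMPLE_SIZE):
    """Return the rows among the first `sample_size` that satisfy the filter."""
    return [row for row in islice(rows, sample_size) if predicate(row)]


def fetch_more_matching(rows, predicate, already_scanned, limit):
    """Scan rows beyond the sample and return up to `limit` more matches.

    `limit` is a hypothetical cap used only for this illustration.
    """
    remaining = islice(rows, already_scanned, None)
    return list(islice((row for row in remaining if predicate(row)), limit))


def is_french(row):
    """Hypothetical filter: keep rows whose country is FR."""
    return row["country"] == "FR"


# Hypothetical 50,000-row dataset; every fifth row has country "FR".
dataset = [
    {"id": i, "country": "FR" if i % 5 == 0 else "US"}
    for i in range(50_000)
]

# Filter applied to the displayed sample (first 10,000 rows only).
in_sample = matching_in_sample(dataset, is_french)

# Fetch further matches from the remaining 40,000 rows.
more = fetch_more_matching(
    dataset, is_french, already_scanned=SAMPLE_SIZE, limit=SAMPLE_SIZE
)

print(len(in_sample), len(more))
```

In this model, the filter first sees only the matches that fall within the sample, and the second call extends the result with matches from the rows that were not displayed.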