Skip to main content Skip to complementary content

Creating "or" conditions on numerical data

When working on datasets containing numerical data, you can use a filter to isolate rows that include a specific value, or select a range of values via the Chart panel to create the corresponding filter. However, to create a filter that returns the rows corresponding to a list of values, in other words an "or" filter, you need to proceed differently. You will need to create a regular expression matching all the numerical values of your choice, and filter on rows that match this regular expression.

Using the car dealership dataset, for the sake of the example, you will create a filter to display the sales for cars that were released on specific dates. You will then be able to export the customer information corresponding to this filter only.

Adding a preparation for the car dealership dataset

Add a preparation to start preparing and cleansing your data.

You can create a preparation from a dataset already available in Talend Cloud Data Preparation or one of your local files. When you add a preparation with the corresponding button, it will be created in the folder in which you are currently working. Furthermore, your preparation will be automatically saved in the preparations list, and all the changes you make are also saved automatically.

Before you begin

You have created the car_dealership dataset in Talend Cloud Data Preparation using the previously downloaded file.

Procedure

  1. From the homepage, click Preparations to open the list of preparations.
  2. Click the Add Preparation button.
  3. From the list of Datasets, select car_dealership to use it as source material.
    Add a preparation window with a list of datasets available to create a preparation, including car_dealership.
    The Preparation name field is automatically filled in but you can change the preparation name.
  4. Click Submit.

Results

Your dataset opens with an empty recipe, and you can start adding preparation steps. All your changes are automatically saved.

Creating a regular expression to match the car years

You will create a regular expression to match all the dates you are interested in, namely 1950, 1960, 1970, 1980, 1990, 2000 and 2010.

Procedure

  1. Click the header of the car_year column to select its content.
  2. In the functions panel, on the right side of the screen, type Match pattern and click the result to open the options for the associated function.
  3. In the Pattern field, select Other.
  4. From the Manual pattern drop-down list, select RegEx.
  5. In the Manual pattern field, enter the following regular expression:
    ^1950$|^1960$|^1970$|^1980$|^1990$|^2000$|^2010$

    This expression will match all the rows with sales involving cars from the year 1950, 1960, 1970 and so on.

  6. Click Submit.

Results

A new column is created, where rows that match the regular expression are marked as true, while those who do not are marked as false.
A new column is added to the preparation to show is values match the expression.

Filtering on these values to create an "or" condition

Now that you have a way of knowing which rows match the regular expression, you will be able to retrieve them all by creating a filter on this new column.

Procedure

  1. Click the header of the car_year_matching column to select its content.
    In the data profiling area, on the bottom right of the screen, you can see a horizontal bar chart showing the repartition between the true and false values in the column. Point your mouse over each bar to display the exact number of occurrences.
    Bar chart showing the repartition of true and false values.
  2. To create the filter, click the true bar.
    You can see from the filter bar that the filter was applied on the dataset, and customers data is only displayed if it matches the condition previously set by the regular expression.
    A filter is applied to only show the rows with the true value.

Results

With this filter based on not just one, but several numerical values at the same time, you have isolated sales involving cars from the specified years. From there you can apply any function and work on this sample only.

Exporting the filtered rows

Now that you have highlighted and prepared a restricted list of customers, you will export it as a local file.

Procedure

  1. Click the Export button.
  2. Select the All data radio button and make sure the Apply filters switch is activated.
    That way, the export will include all the potential rows that match the filter, even beyond the sample if your dataset is bigger than that.
  3. Choose the file format you want to use when exporting your data.
    • If you choose Local CSV file, select which field delimiter, text enclosure and escape characters to use and enter a name for the file to export.
    • If you choose Local XLSX file, choose a name for the file to export.
    • If you choose Amazon S3, enter your credentials and other information to store your file on Amazon S3.

Results

The data you cleansed using your preparation is exported to a local file.

Did this page help you?

If you find any issues with this page or its content – a typo, a missing step, or a technical error – please let us know!