Skip to main content Skip to complementary content

Cleansing data coming from a human resource management system

The HRMS Export dataset corresponds to an Excel file that has been exported from an American human resource management system (or HRMS).

It contains the full list of employees since the creation of the company with their name, job title, hiring date, departure date if any and bank information for their salary. In this dataset, the dates are in the American date format and you want to transform them to the French date format so that they can be used with French software solutions. Also, you want to extract the account number from the IBAN number for the French accounts.

Download the file: HRMS_export.xlsx.

Adding a preparation for the HRMS export dataset

Add a preparation to start preparing and cleansing your data.

You can create a preparation from a dataset already available in Talend Cloud Data Preparation or one of your local files. When you add a preparation with the corresponding button, it will be created in the folder in which you are currently working. Furthermore, your preparation will be automatically saved in the preparations list, and all the changes you make are also saved automatically.

Before you begin

You have created the HRMS_export dataset in Talend Cloud Data Preparation using the previously downloaded file.

Procedure

  1. From the homepage, click Preparations to open the list of preparations.
  2. Click the Add Preparation button.
  3. From the list of Datasets, select HRMS_export to use it as source material.
    Add a preparation window with a list of datasets available to create a preparation, including HRMS_export.
    The Preparation name field is automatically filled in but you can change the preparation name.
  4. Click Submit.

Results

Your dataset opens with an empty recipe, and you can start adding preparation steps. All your changes are automatically saved.

Converting dates to the French format

As the date formats used across the world are not the same, you may need to change the format used in a column containing dates.

You will change the date format that is used in this dataset, from the American format, to the French format.

Procedure

  1. Select the entry_date column.
  2. In the functions panel, on the right side of the screen, type Change date format and click the result to open the options for the associated function.
  3. Configure the function as follows:
    • In the Current format field, select I don't know, best guess.
    • In the New format field, select Other.
    • In the Your format field, enter dd/MM/yyyy.

    The French format that you want to use, dd/MM/yyyy, is not available by default so you need to enter it as a custom value in the Your format field.

    For example, this will change 12/25/2015 to 25/12/2015.

  4. Click Submit to apply the function.

    You have the possibility to output the result of this function in a new column by selecting the Create new column check box.

Results

The date format is changed in the selected column.

Extracting the bank account number

If you want to take part of the text contained in a cell and reuse it elsewhere, you can extract part of the text.

The HRMS Export preparation contains French International Bank Account Numbers (IBAN). An IBAN is a 33-characters code, including spaces. It is made of a Country code, two check digits, a five-digit bank identifier, a five-digit branch identifier, an eleven-digit account number, and two final check digits.

You will extract the account number part of those IBAN, to a new column.

Before you begin

It is recommended to remove unnecessary blank spaces from the text records and to make sure the cells have the same length before proceeding.

Procedure

  1. Select the iban column.
  2. In the functions panel, type Extract a value by index and click the result to display the options of the associated function.
  3. Configure the function as follows:
    • Select the Create new column check box.
    • In the From field, select From beginning.
    • In the To field, select To index.
    • In the End index field, enter 5.

    The selection will start at the 17th character, spaces included, and end two characters before the end.

  4. Click the Submit button to extract the selection you made to a new column, iban_substring in this case.

Results

The text corresponding to the selection you made is extracted to a new column, that you can rename if you want.

Exporting the prepared HRMS data

Once your preparation is complete, you may want to export the data you have cleansed.

The preparation on the hrms_export.xlsx aimed at changing the date format and extracting the account numbers from the IBAN, is now complete and you can export it.

Procedure

  1. Click the Export button.
  2. Choose the file format you want to use when exporting your data:
    • If you choose Local CSV file, select which field delimiter, text enclosure and escape characters to use and enter a name for the file to export.
    • If you choose Local XLSX file, choose a name for the file to export.
    • If you choose Amazon S3, enter your credentials and other information to store your file on Amazon S3.

Results

The data you cleansed using your preparation is exported to a local file.

Did this page help you?

If you find any issues with this page or its content – a typo, a missing step, or a technical error – please let us know!