Changing the semantic type of the popularity column

The semantic type corresponds to the category of the data (names, emails, phone numbers, etc.). If the semantic type detected for a column is not the one you want, you can manually change it to one of the predefined types, based on your knowledge of the data.

In the case of the movies_gsg dataset, the sample shows that most columns have been assigned a type that corresponds to the actual data, such as String for titles, Date for release dates, or Language code iso2 for the original language. However, the popularity column is marked as geographical coordinates, which is not correct in this context. The way the data is formatted does match how coordinates can be written, but you will update the type so that it is more in line with the actual content of the column.

Procedure

  1. Click the header of the popularity column.
    When you click the Edit icon next to the semantic type in the right panel, you can see the top matching type, geographical coordinates in this case, as well as the more standard types such as Text, Integer, Decimal, or Boolean. The geographical coordinates type was assigned automatically because 99% of the values are compatible with it; the missing 1% corresponds to the only invalid value in the column.
  2. From the list of available types, select Decimal.
  3. Repeat these steps to change the type of the runtime column to the more suitable Decimal as well.
  4. Click Apply 2 changes.

Results

You have changed the semantic type of the popularity and runtime columns. Because the Decimal type matches 100% of the data in the popularity column, the quality bar in the column header no longer shows orange.
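
To illustrate why a column can be highly compatible with one type and fully compatible with another, here is a minimal Python sketch that computes the share of values matching a pattern. It is not the product's detection logic: the two regular expressions, the match_rate helper, and the sample values are all assumptions made up for this example.

```python
import re

# Hypothetical patterns for illustration only; the actual rules used
# to detect semantic types are not described on this page.
COORDINATE = re.compile(r"-?\d{1,3}\.\d+")   # e.g. 12.345
DECIMAL = re.compile(r"-?\d+(\.\d+)?")       # e.g. 1234.5 or 42


def match_rate(values, pattern):
    """Return the share of non-empty values that fully match the pattern."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return 0.0
    matched = sum(1 for v in non_empty if pattern.fullmatch(v))
    return matched / len(non_empty)


# Made-up sample values, not taken from the movies_gsg dataset.
sample = ["12.345", "7.9", "0.6", "1234.5"]

coord_rate = match_rate(sample, COORDINATE)
decimal_rate = match_rate(sample, DECIMAL)

print(f"coordinates: {coord_rate:.0%}, decimal: {decimal_rate:.0%}")
```

In this sketch, the value 1234.5 does not fit the coordinate pattern but is still a valid decimal. A single value like this lowers the match rate for one type while the other type still covers every value.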
