Data mining types
When you create a column analysis in Talend Studio, you can see a Datamining Type box next to each of the columns you want to analyze.
The selected type in the box represents the data mining type of the associated column.
These data mining types help Talend Studio to choose the appropriate metrics for the associated column since not all indicators (or metrics) can be computed on all data types.
Available data mining types are: Nominal, Interval, Unstructured Text and Other. The sections below describe these data mining types.
Nominal
Nominal data is categorical data whose values or observations can be assigned a numerical code, where the numbers are simply labels. You can count nominal data, but you cannot order or measure it.
In Talend Studio, the mining type of textual data is set to nominal. For example, a column called WEATHER with the values: sun, cloud and rain is nominal.
Likewise, a column called POSTAL_CODE that has the values 52200 and 75014 is nominal, despite its numerical values. Such data is of nominal type because each value identifies a postal code in France. Computing mathematical quantities such as the average on such data makes no sense. In such a case, you should set the data mining type of the column to Nominal yourself, because Talend Studio currently has no way to guess the correct type of data automatically.
The same is true for primary-key or foreign-key data. Keys are usually represented by numerical data, but their data mining type is Nominal.
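The following Python sketch illustrates why counting is meaningful for nominal data while averaging is not. It is not how Talend Studio computes its indicators; the sample values and variable names are assumptions made up for the illustration.

```python
from collections import Counter

# Hypothetical sample values for a POSTAL_CODE column.
postal_codes = [52200, 75014, 75014, 52200, 75014]

# Counting occurrences is meaningful for nominal data:
# each code is a label, and we can ask how often each label appears.
frequencies = Counter(postal_codes)
most_common_code, occurrences = frequencies.most_common(1)[0]

# An average can be computed mechanically, but the result is not a
# postal code and describes nothing about the data.
meaningless_average = sum(postal_codes) / len(postal_codes)

print(frequencies)
print(most_common_code, occurrences)
print(meaningless_average)
```

The frequency table answers a real question (which postal code appears most often), whereas the average is a number that corresponds to no postal code in the column.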
Interval
This data mining type is used for numerical data and time data. Averages can be computed on this kind of data. Databases sometimes store numerical quantities in textual fields.
In Talend Studio, it is possible to declare the data mining type of a textual column (e.g. a column of type VARCHAR) as Interval. In that case, the data should be treated as numerical data and summary statistics should be available.
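As a rough illustration of what treating a textual column as Interval data means, the Python sketch below converts text values to numbers and computes a few summary statistics. The sample values and the choice of statistics are assumptions for the example, not a description of the indicators Talend Studio actually provides.

```python
import statistics

# Hypothetical values of a VARCHAR column that stores quantities as text.
raw_values = ["12.5", "7", "3.5", "9"]

# Interpret each textual value as a number.
numbers = [float(value) for value in raw_values]

# Summary statistics only become meaningful once the data is numerical.
summary = {
    "min": min(numbers),
    "max": max(numbers),
    "mean": statistics.mean(numbers),
    "median": statistics.median(numbers),
}

print(summary)
```

If a value in the column cannot be parsed as a number, `float()` raises a `ValueError`, which is a reminder that this declaration only makes sense when the textual column really holds numerical quantities.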
Unstructured text
This data mining type is dedicated to handling unstructured textual data.
For example, the data mining type of a column called COMMENT that contains commentary text cannot be Nominal, since the text in it is unstructured. Still, you may want to see the duplicate values of such a column, hence the need for this separate data mining type.
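The kind of duplicate check mentioned above can be sketched in Python as follows. The comment values are invented for the example, and the code only illustrates the idea of finding repeated free-text values; it does not reproduce Talend Studio's own duplicate indicator.

```python
from collections import Counter

# Hypothetical values for a COMMENT column containing free text.
comments = [
    "Delivery was late.",
    "Great service, thank you!",
    "Delivery was late.",
    "Package arrived damaged.",
]

# Count how many times each exact text value occurs.
counts = Counter(comments)

# Keep only the values that appear more than once.
duplicates = {text: n for text, n in counts.items() if n > 1}

print(duplicates)
```

This comparison is exact: two comments that differ only in case or whitespace are counted as different values.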
Other
This type designates data that Talend Studio does not know how to handle.