Fields
An entity is composed of records and fields that hold data; a record is composed of fields that are populated with data. Field metadata is critical to the ingest, validation, and profiling of data. Each field is described by specific metadata that can be viewed and/or edited.
Field Information: General Information
Select the (view details) icon to display column attributes, metadata, and statistical information about the loaded data.
Select the (sample data) icon to display sample records for that field. To display sample data for multiple fields, select the fields and then select the top-level Sample Data button.
Select editable fields or select options from the dropdowns.
Name | Created when data is ingested.
Business Name | User-defined.
Business Description | User-defined.
Technical Description | User-defined. Freeform field to describe technical characteristics of the data.
Internal Data Type | The data type as stored in the Receiving Directory. Supported Data Types: Note that Qlik Catalog converts DOUBLE/DECIMAL internal fields to scientific notation when the value is very large or very small (more than seven decimal places).
Last Updated at | Auto-generated. Provides the date and time the metadata was last updated (ISO standard).
Index | Column sequence number: the position of the field in the table (ex. 1, 2, 3, 4…).
Field Information: Properties
Source ingest Information (key/value) properties can be added from the second modal tab, Properties.
Key/Value pairs, also known as attribute-value pairs, are specific to the object level at which they are applied.
Select the (plus) icon (Add Property) to open a drop-down with optional field properties.
Field Information: Lineage
Parent Lineage shows the root source of the field data. Child Lineage shows the source of the field data and identifies any other Qlik Catalog objects using this field. Select the arrow icon to display lineage information.
Field Information: Assigning tags to a field
Tags assist in locating and organizing data. Assign tags in the Field Information box under the Tags tab by typing in the Add a tag field and pressing Tab or Enter.
Field Information: Comments
Field Information Comments allows authorized users to view and edit details and properties of the selected field. An authorized user can create a Comment Topic and then type Comment Details in the boxes indicated. To enter additional comments, select + Add Comments, which creates another comment field. Save each comment; a Success message appears above the box tabs. Comments are subject to collaborative review and can be saved as Draft or Approved.
Field Information: Data Distribution
Field-level profiling statistics and data distributions are calculated for each field and recalculated with each successive data load.
Profile values
Profiling metrics for field data provide the following top-level information:
Cardinality | The number of unique values for that field. Cardinality can be examined by Percent, Count, and Value.
Survey Count | The number of records in the field.
Survey Type | An index describing the distribution method. The following Survey Types are represented: Census (every value in the field is counted for an exact distribution); Sample (there were too many unique values in the field to be counted efficiently, so a sample of values was used to estimate the cardinality and distribution); Log10Survey (a counting method used for distributions with high cardinality, that is, a high number of different values in the specified range).
Reading (FIELD) data distribution
Cardinality
Estimated cardinality is denoted by the "approximately equal" symbol (≈). For INTEGER Log10 and STRING samples, exact cardinality cannot be computed, so an estimated cardinality is computed and displayed with this symbol.
Intervals
Reading intervals: square bracket [] and parenthesis () notation is used to indicate half-open intervals. For example, '[10.0, 100.0)' indicates an interval from '10.0' to '100.0' that is inclusive of '10.0' but exclusive of '100.0'. In other words, [10.0, 100.0) is the set of all real numbers between 10.0 and 100.0, including 10.0 but not 100.0. Numbers within that interval may come very close to 100.0 (for example, 99.9999999), but 100.0 itself is not included; it falls in the next represented interval (ex. '[100.0, 10000.0)').
Note that intervals are listed in descending order of occurring frequency rather than value.
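The bracket and parenthesis rules above can be checked with a short Python sketch. The names `Interval`, `parse_interval`, and `order_by_frequency` are hypothetical helpers written for illustration only and are not part of Qlik Catalog; the counts in the usage lines are made-up sample values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    low: float
    high: float
    low_closed: bool   # True when the interval starts with '['
    high_closed: bool  # True when the interval ends with ']'

    def contains(self, x: float) -> bool:
        # A square bracket includes the endpoint; a parenthesis excludes it.
        above = x >= self.low if self.low_closed else x > self.low
        below = x <= self.high if self.high_closed else x < self.high
        return above and below


def parse_interval(text: str) -> Interval:
    """Parse notation such as '[10.0, 100.0)' into an Interval."""
    text = text.strip()
    if len(text) < 2 or text[0] not in "[(" or text[-1] not in "])":
        raise ValueError(f"not an interval: {text!r}")
    low_text, high_text = text[1:-1].split(",")
    return Interval(float(low_text), float(high_text),
                    text[0] == "[", text[-1] == "]")


def order_by_frequency(rows):
    """Sort (interval_text, count) rows by count, most frequent first."""
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Usage with hypothetical counts: 100.0 belongs to the second interval only.
first = parse_interval("[10.0, 100.0)")
second = parse_interval("[100.0, 10000.0)")
print(first.contains(99.9999999), first.contains(100.0), second.contains(100.0))
print(order_by_frequency([("[10.0, 100.0)", 5), ("[100.0, 10000.0)", 9)]))
```

Sorting by count rather than by the interval's low value mirrors the descending-frequency ordering noted above.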
Scientific notation
Data distribution intervals are notated scientifically to help represent very large and small numbers in a way that is easy to read and understand.
Qlik Catalog covers the following ranges:
- INTEGER (-1E+18, 1E+18)
- 18 digits, negative to positive
- DOUBLE or DECIMAL (-1.0E+38, 1.0E+38)
- 38 digits, negative to positive
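As a rough illustration of the notation and ranges listed above, the following Python sketch formats numbers in scientific notation and checks a value against the INTEGER range. The helper names `to_scientific` and `in_integer_range` are hypothetical, and treating the parentheses as exclusive endpoints is an assumption based on the interval notation used elsewhere in this topic.

```python
def to_scientific(value: float, digits: int = 1) -> str:
    """Format a number as mantissa and exponent, for example 1.0E+18."""
    return f"{value:.{digits}E}"


def in_integer_range(n: int) -> bool:
    # Assumption: (-1E+18, 1E+18) excludes both endpoints,
    # which allows integers of up to 18 digits.
    return -10**18 < n < 10**18


print(to_scientific(1e18, 0))          # mantissa without decimals
print(to_scientific(-1.0e38))          # lower bound of DOUBLE/DECIMAL
print(to_scientific(0.00000012345, 2)) # a very small value
print(in_integer_range(999_999_999_999_999_999), in_integer_range(10**18))
```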
Census vs. sample for string fields: Qlik Catalog samples data to effectively build a histogram of unique data value distribution. Columns with cardinality < 4001 conduct a census that includes every unique observation; columns with cardinality beyond that range conduct a sample.
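The census threshold can be sketched in Python as follows. This is only an illustration of the rule stated above (cardinality < 4001 means census), not Qlik Catalog's implementation; `survey_type` and `census` are hypothetical names, and how the product draws its sample is not described here, so sampling is not shown.

```python
from collections import Counter

CENSUS_CARDINALITY_LIMIT = 4001  # census applies when cardinality < 4001


def survey_type(values) -> str:
    """Return 'Census' or 'Sample' based on the number of unique values."""
    cardinality = len(set(values))
    return "Census" if cardinality < CENSUS_CARDINALITY_LIMIT else "Sample"


def census(values):
    """Count every unique value exactly, most frequent first."""
    return Counter(values).most_common()


colors = ["red", "blue", "red", "green", "red"]
print(survey_type(colors))
print(census(colors))
```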
For Integer, Double, and Decimal numeric fields, Qlik Catalog profiling conducts a LOG10_SURVEY, which effectively builds a histogram distribution of log10(numeric_observation). LOG10_SURVEY results are presented with Survey Count, Survey Type description, and survey profile stats: Percent, Count, Value (between) [low value, high value].
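The idea of a histogram over log10 of each observation can be sketched in Python. This is a simplified illustration under stated assumptions: each bucket covers one power of ten, [10^k, 10^(k+1)), only positive values are counted, and the helper names `decade` and `log10_survey` are hypothetical. The intervals Qlik Catalog actually reports may be wider or handled differently.

```python
import math
from collections import Counter


def decade(x: float) -> int:
    """Return k such that 10**k <= x < 10**(k + 1), for x > 0."""
    k = math.floor(math.log10(x))
    # Guard against floating-point rounding at bucket boundaries.
    if x >= 10.0 ** (k + 1):
        k += 1
    elif x < 10.0 ** k:
        k -= 1
    return k


def log10_survey(values):
    """Return (percent, count, low, high) rows, most frequent bucket first."""
    positives = [v for v in values if v > 0]  # assumption: skip zero and negatives
    counts = Counter(decade(v) for v in positives)
    total = len(positives)
    return [
        (100.0 * count / total, count, 10.0 ** k, 10.0 ** (k + 1))
        for k, count in counts.most_common()
    ]


for percent, count, low, high in log10_survey([12.0, 50.0, 99.9, 150.0, 0.5]):
    print(f"{percent:.1f}%  {count}  [{low}, {high})")
```

Each row carries the same kinds of statistics named above: Percent, Count, and the low and high values of the interval.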