Working with data mappings
Some data flow processes cannot be harvested using the bridges. If these processes are not modeled in Talend Data Catalog, they leave gaps in the lineage and impact analysis answers and give an incomplete picture of the physical architecture of your systems.
Data mapping specifications are high-level logical definitions of the way data "flows" from any number of source models into elements of a target model.
These mappings are specified in a simple web-based drag-and-drop mapping editor and documented with descriptive text. You may also define pseudo operations using an operation editor.
- A query mapping is the most flexible. You define a column-by-column mapping for all the columns in the target table. Query mappings can include joins, filters, transformations, and so on (see the sketch after this list). Each query mapping is defined for one target classifier.
When defining a query mapping, the target classifier can come from any data store model defined in the configuration. Sources can come from any data store models, and because you can have many source classifiers, they may come from multiple source contents in the same mapping.
- A bulk mapping is assumed to be for source and target tables that have matching column names or positions. You can still define filters, but the mappings at the column level are simple name or position matching. Each bulk mapping is defined for one target schema and one source schema.
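The exact syntax of the data mapping script format is described in the tutorial file referenced below. Purely as an illustration, the following hypothetical sketch uses plain SQL DML (the basis of the script format) to convey the difference between the two kinds of mapping; all schema, table, and column names are invented.

    -- Hypothetical query mapping: one target classifier (dw.DIM_CUSTOMER),
    -- with sources joined and filtered, and a column-level transformation.
    INSERT INTO dw.DIM_CUSTOMER (CUSTOMER_ID, FULL_NAME, COUNTRY_CODE)
    SELECT c.ID,
           CONCAT(c.FIRST_NAME, ' ', c.LAST_NAME),    -- transformation
           a.COUNTRY
    FROM   staging.CUSTOMER c
    JOIN   staging.ADDRESS  a ON a.CUSTOMER_ID = c.ID -- join
    WHERE  c.ACTIVE = 1;                               -- filter

    -- Hypothetical bulk mapping: each target table is populated from the
    -- source table with matching column names, with no column-level logic.
    INSERT INTO dw.CUSTOMER SELECT * FROM staging.CUSTOMER;
    INSERT INTO dw.ADDRESS  SELECT * FROM staging.ADDRESS;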
Any data mapping with bulk mappings and/or query mappings can be exported to and imported from the data mapping script format without loss. This format is based on the standard database SQL Data Manipulation Language (SQL DML) syntax and includes both of the following (see the outline after this list):
- The data connection data models (for example, database schemas, tables, and columns) of the source and target data stores, faithfully representing any supported technology (RDBMS, NoSQL, File Systems).
- The data integration (DI/ETL/ELT/CDC) for the data flow lineage between these data stores.
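The actual directives of the script format are defined in the tutorial file referenced below. Assuming invented names throughout, a script might therefore be outlined as standard DDL declaring the data models, followed by DML expressing the data flow lineage:

    -- 1. Data connection data models (DDL) for source and target data stores
    CREATE SCHEMA staging;
    CREATE TABLE staging.ORDERS (ORDER_ID INT, AMOUNT DECIMAL(10,2));
    CREATE SCHEMA dw;
    CREATE TABLE dw.FACT_ORDERS (ORDER_ID INT, AMOUNT DECIMAL(10,2));

    -- 2. Data integration (DI/ETL/ELT/CDC) data flow lineage (DML)
    INSERT INTO dw.FACT_ORDERS (ORDER_ID, AMOUNT)
    SELECT ORDER_ID, AMOUNT FROM staging.ORDERS;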
The specifics of the syntax for the data mapping script format are explained in a sample file at <TDC_HOME>/TalendDataCatalog/conf/MIRModelBridgeTemplate/DataMappingScript/DataMappingScriptTutorial.sql
Data mapping scripts can be edited, or generated entirely from scratch to model (simulate) a DI/ETL/ELT/CDC tool that is not among the tools supported for native model import.
The data mapping scripts can then be imported as independent DI models or imported back into a data mapping. The resulting data flow lineage is the same in both cases.
The new data mapping script format provides a number of advantages over the older metadata Excel format (now deprecated), including:
- The data mapping script format supports a full round trip (export/import) from/to a data mapping, which is not possible with the older metadata Excel format, where the export is intended more for reporting and the import for bootstrapping.
- The data mapping script format files are generally much more compact than the very verbose metadata Excel format files based on classifier/table maps and feature/column maps.
- The data mapping script format is easier to generate from SQL based DI/ETL/ELT/CDC tools.
- The data mapping script format supports both bulk mappings and query mappings, while the metadata Excel format is quite limited in its handling of bulk mappings.
- The data mapping script format supports more source and target data store technologies (RDBMS, NoSQL, File Systems) than the metadata Excel format.
Therefore, exporting and importing data mapping script files is the recommended method, and the legacy metadata Excel format is now deprecated.