Data Mapping Script Tutorial Sample
The specifics of the syntax for the data mapping script format are explained in a sample file at:
$MM_HOME/conf/MIRModelBridgeTemplate/DataMappingScript/DataMappingScriptTutorial.sql
/*
This data mapping script illustrates the use of classic database SQL Data Manipulation Language (SQL DML) syntax for
moving data from hierarchical file data lake to relational database data warehouse through flattening and staging areas.
The tutorial presents mapping capabilities from simple to complex
starting from the relational database end of the data movement and going to the hierarchical files processing beginning.
*/
/*
Below is the simplified syntax of the supported statements.
===== CREATE CONNECTION statement =====
CREATE CONNECTION <connection_name> TYPE <connection_type> ";";
<connection_type> =
"Relational Database"
|"Oracle Database"
|"Spanner Database"
|"Hive Database"
|"BigQuery Database"
|"File System"
|"Google Cloud Storage"
|"Amazon S3"
|"Messaging System"
|"Kafka"
|"NoSQL Database"
|"Cosmos DB"
|"MongoDB"
|"Tableau Extract Data Source";
===== SELECT INTO statement =====
[<with_clause_ansi>]
SELECT <column_exp> [AS <column_alias>] {, <column_exp> [AS <column_alias>]} INTO (<table_name> | <file_path>) FROM (<table_name> | <file_path>)
[<join_clause_ansi>][<group_by_clause_ansi>][<where_clause_ansi>][<order_by_clause_ansi>] ";";
<column_exp> =
<column_name>
| <arithmetical_expression>
| <function_call>
| <const> ;
All the literals must be escaped with the double quotes if they contain special characters or spaces. <file_path> uses a relaxed literal rule and follow the following pattern [^ \n\r\t\;.@\"]+
*/
-- define a mapping between tables in the same database
-- define the database connection by specifying its name and type.
-- the connection name should match the database server name to enable automatic connection resolution.
-- the database server name follows the server[:port] format.
CREATE Connection "dw:1521" type "Oracle Database";
-- bulk mapping from t1 table in staging schema to t1 table in warehouse schema
SELECT *
INTO warehouse.t1
FROM staging.t1;
-- column level mapping of a column x, and b and c columns to y using b+c operation
SELECT
a as x,
b+c as y
INTO warehouse.t2
FROM staging.t2;
-- define a mapping between different databases
-- the last defined connection is used in SELECT by default.
-- You can reference a table in another connection using @connection_name syntax
-- define a new database connection
CREATE Connection "eds:1521" type "Oracle Database";
SELECT *
INTO warehouse.t1@"dw:1521"
FROM eds_schema.t1;
-- define a mapping between files and tables
-- the connection name should match the file system model name to enable automatic connection resolution.
-- the file system model name can include the import root directory
-- file system connection of type AWS S3 where the name includes the root bucket
CREATE Connection "us-west-2.amazonaws.com/mybucket" type "Amazon S3";
-- mapping between file and table columns
SELECT
col1 as x,
col2 as y
INTO eds_schema.t1@"eds:1521"
FROM folder/file1.csv;
-- when a file contains hierarchal structure you can use dot separator to depict a data field hierarchy
-- here is a mapping example for for following JSON structure
-- {
-- "attr1": "",
-- "obj1":
-- {
-- "attr21": "",
-- "attr22": ""
-- }
-- }
SELECT
f.attr1 as x,
f.obj1.attr22 as y,
(f.obj1.attr22+f.obj1.attr22)/365 as z
INTO eds_schema.t1@"eds:1521"
FROM folder/file.json f;
-- you can specify conditional SQL logic using WHERE, JOIN and GROUP by clauses
SELECT s1.a as x
INTO warehouse.t1@"dw:1521"
FROM staging.t1@"dw:1521" s1,
staging.t2@"dw:1521" s2
WHERE s1.id = s2.id;
-- TBD: Should we requre aliases for all source classifiers? SELECT src.attr1 as x FROM s.t src
-- TBD: What separators should we use for file folder hierarchy? dot, slash, backslash?