Step 2: Loading changes from the source database table into the Hive external table
This step reads only the changes from the source database table and loads them into
the Hive external table employee_extnl.
Procedure
The Big Data Batch Job is as follows:
The source table is filtered by the last updated timestamp, which is maintained in the cdc_control table. This is done by using the following SQL in the Where condition of the tMysqlInput component:
where cdc.Table_Name='employee_table' and emp.`Record_DateTime` > cdc.Last_executed
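The effect of this Where condition can be sketched outside of Talend. The following is a minimal illustration using Python's sqlite3 module with hypothetical sample data (the column values and table layout are assumptions for the demo, not the actual tutorial data): only rows whose Record_DateTime is later than the Last_executed value stored for the table are picked up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical minimal versions of the source and control tables.
cur.execute("CREATE TABLE employee_table (Emp_Id INTEGER, Emp_Name TEXT, Record_DateTime TEXT)")
cur.execute("CREATE TABLE cdc_control (Table_Name TEXT, Last_executed TEXT)")

cur.executemany(
    "INSERT INTO employee_table VALUES (?, ?, ?)",
    [
        (1, "Alice", "2023-01-01 10:00:00"),  # loaded in a previous run
        (2, "Bob",   "2023-02-15 09:30:00"),  # new since the last run
    ],
)
cur.execute("INSERT INTO cdc_control VALUES ('employee_table', '2023-01-31 00:00:00')")

# Same filter as the tMysqlInput Where condition: take only rows newer
# than the last recorded run time for this table.
rows = cur.execute(
    """
    SELECT emp.Emp_Id, emp.Emp_Name
    FROM employee_table emp
    JOIN cdc_control cdc ON cdc.Table_Name = 'employee_table'
    WHERE emp.Record_DateTime > cdc.Last_executed
    """
).fetchall()

print(rows)  # only the record added after the last run: [(2, 'Bob')]
```

Because the timestamps are stored in a lexicographically sortable format, a plain string comparison is enough here; a real MySQL source would compare native DATETIME values the same way.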
The tAggregateRow loads one row per run into the cdc_control table. It performs an update else insert operation on the table: if a record for the table already exists, it is updated with the run time of the Job. The run time can be set by using the TalendDate.getCurrentDate() function.
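The update else insert step can be sketched as an upsert. The following is a minimal illustration in Python with sqlite3 (the record_run helper and the use of datetime.now() in place of TalendDate.getCurrentDate() are assumptions for the demo): repeated runs keep exactly one control row per source table, refreshed with the latest run time.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE cdc_control (Table_Name TEXT PRIMARY KEY, Last_executed TEXT)")

def record_run(table_name: str) -> None:
    """Update-else-insert: store the current run time for the table,
    analogous to what the Job does with TalendDate.getCurrentDate()."""
    now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    cur.execute(
        "INSERT INTO cdc_control (Table_Name, Last_executed) VALUES (?, ?) "
        "ON CONFLICT(Table_Name) DO UPDATE SET Last_executed = excluded.Last_executed",
        (table_name, now),
    )

record_run("employee_table")  # first run inserts the control row
record_run("employee_table")  # later runs update it in place
count = cur.execute("SELECT COUNT(*) FROM cdc_control").fetchone()[0]
print(count)  # still one row for the table
```

The ON CONFLICT clause requires SQLite 3.24 or later; MySQL expresses the same idea with INSERT ... ON DUPLICATE KEY UPDATE, which is what the Talend component's update-else-insert setting generates behind the scenes.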
The following shows the data in the source employee_table table after
new records are added:
Run the Job.
The following shows the data in the employee_extnl
external Hive table after the Job is run: