Writing the aggregated data about street incidents to EMR
Procedure
Double-click the tFileOutputParquet component to
open its Component view.
Select the Define a storage
configuration component check box and then select the tS3Configuration component you configured in the
previous steps.
Click Sync columns to ensure that tFileOutputParquet retrieves the schema from the output side of tAggregateRow.
In the Folder/File field, enter the name of the folder to be used to store the aggregated data in the S3 bucket specified in tS3Configuration. For example, if you enter /sample_user, the folder called sample_user at the root of the bucket is used at runtime to store the output of your Job.
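As an illustration of how the Folder/File value relates to the bucket, the following Python sketch joins a bucket name and a folder path into a storage URI. The function name resolve_output_uri, the bucket name my-bucket, and the s3a:// scheme are assumptions made for this example; the actual bucket and connection details come from your tS3Configuration component.

```python
def resolve_output_uri(bucket: str, folder: str, scheme: str = "s3a") -> str:
    """Join a bucket name and a Folder/File value into a storage URI.

    Hypothetical helper for illustration only; it is not part of the Studio.
    """
    # The leading slash means "relative to the bucket root", so it is
    # stripped before the folder name is appended to the bucket.
    key = folder.strip("/")
    if not key:
        raise ValueError("folder must name a directory below the bucket root")
    return f"{scheme}://{bucket}/{key}"


# "my-bucket" is an assumed bucket name.
print(resolve_output_uri("my-bucket", "/sample_user"))
# prints: s3a://my-bucket/sample_user
```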
From the Action drop-down list, select Create if the folder to be used does not yet exist in the bucket; if the folder already exists, select Overwrite.
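The rule for choosing between Create and Overwrite can be sketched as a simple existence check. This is a minimal illustration that uses a temporary local directory as a stand-in for the S3 bucket; the function name choose_action is hypothetical and the Studio does not expose such a helper.

```python
import tempfile
from pathlib import Path


def choose_action(folder: Path) -> str:
    """Return the Action value suggested by the rule in the step above."""
    # Existing folder: its contents must be replaced. Otherwise: create it.
    return "Overwrite" if folder.exists() else "Create"


# A temporary local directory stands in for the bucket root (assumption).
with tempfile.TemporaryDirectory() as bucket_root:
    target = Path(bucket_root) / "sample_user"
    before = choose_action(target)  # folder does not exist yet
    target.mkdir()
    after = choose_action(target)   # folder now exists

print(before, after)
# prints: Create Overwrite
```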
Click Run to open its view, then click the Spark Configuration tab to configure the Spark connection.
Select the Use local mode check box to test your Job locally before eventually submitting it to the remote Spark cluster.
In local mode, the Studio builds the Spark environment within itself on the fly and runs the Job in it. Each processor of the local machine is used as a Spark worker to perform the computations.
In this mode, your local file system is used; therefore, if you have placed configuration components such as tS3Configuration or tHDFSConfiguration in your Job, deactivate them, because they provide connection information to a remote file system.
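For readers familiar with Spark, using each local processor as a worker resembles the Spark master URL local[*], which runs Spark in a single process with one worker thread per logical core, while local[N] uses N threads. The sketch below only builds such a URL; whether the Studio uses exactly this setting internally is an assumption, and local_master_url is a hypothetical helper.

```python
import os
from typing import Optional


def local_master_url(workers: Optional[int] = None) -> str:
    """Build a Spark local-mode master URL.

    None yields "local[*]" (one worker thread per logical core);
    a positive integer yields "local[N]".
    """
    if workers is None:
        return "local[*]"
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return f"local[{workers}]"


# os.cpu_count() may return None on some platforms, so fall back to 1.
cores = os.cpu_count() or 1
print(local_master_url())       # all logical cores
print(local_master_url(cores))  # the same count, written explicitly
```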
In the Component view of tFileOutputParquet, change the file path in the Folder/File field to a local directory and adjust the selection in the Action drop-down list accordingly, that is, create a new folder or overwrite the existing one.
On the Run tab, click Basic Run, and in this view click Run to execute your Job locally and test its design logic.
When your Job runs successfully, clear the Use local mode check box in the Spark Configuration view of the Run tab. Then, in the design workspace of your Job, activate the configuration components and revert the changes you made in tFileOutputParquet for the local test.
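The switch to a local test and back can be pictured as deriving a temporary copy of the settings while keeping the remote values intact, so that there is always something to revert to. The following sketch is purely illustrative: the OutputSettings class, its field names, and the local path /tmp/sample_user are assumptions and do not correspond to any Studio API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class OutputSettings:
    """Hypothetical snapshot of the settings touched in this procedure."""
    use_local_mode: bool            # Use local mode check box
    storage_component_active: bool  # tS3Configuration activated or not
    folder: str                     # Folder/File field
    action: str                     # Action drop-down list


def for_local_test(remote: OutputSettings, local_folder: str, action: str) -> OutputSettings:
    """Return a modified copy for a local run; the original is left unchanged."""
    return replace(
        remote,
        use_local_mode=True,
        storage_component_active=False,
        folder=local_folder,
        action=action,
    )


remote = OutputSettings(False, True, "/sample_user", "Create")
# "/tmp/sample_user" is an assumed local directory.
local = for_local_test(remote, "/tmp/sample_user", "Overwrite")

print(local)   # settings used for the local test
print(remote)  # untouched settings to revert to afterwards
```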