The tFileInputDelimited components are configured to load data from DBFS
into the Job.
Procedure
-
In the Repository, expand the File delimited node under the Metadata node, then expand the movies file connection node and its child node
to display the movies schema metadata node.
-
Double-click this schema metadata node to open its
wizard.
-
Click the export button to save the schema to a
local directory.
-
Double-click the movie tFileInputDelimited component to open its Component view.
-
Ensure that the Define a storage configuration component
check box is clear. The component then reads data directly from the
file system of the Spark cluster, which is defined later in the Spark
configuration tab. In this scenario, this file system is DBFS.
-
Click Edit schema to open the schema editor, then
click the import button to import
the schema of the movie data that you previously exported from the File
delimited metadata in the Repository.
-
In the Folder/File field, enter the path pointing to the
movie data stored in DBFS.
-
In the Header field, enter 1
without any quotation marks. The component then treats the first
row of the data as the header.
-
Double-click the director
tFileInputDelimited component to open its Component view.
-
Ensure that the Define a storage configuration component
check box is clear for the same reason as explained in the previous steps.
-
Click the [...] button next to Edit schema to open the schema editor.
-
Click the [+] button twice to add two rows and, in
the Column column, rename them to ID and Name, respectively.
-
Click OK to validate these changes and accept the
propagation prompted by the pop-up dialog box.
-
In the Folder/File field, enter the path where the director data
is stored. As explained in Uploading files to DBFS (Databricks File System),
this data has been written to /FileStore/ychen/movie_library/directors.txt.
-
In the Field separator field, enter a
comma (,), as this is the separator used by the director data.
Results
The input components are now configured to load the movie
data and the director data into the Job.
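
To make the effect of the Header and Field separator settings concrete, the following is a minimal Python sketch of how a delimited reader could apply them. It is not the code that the Job generates. The read_delimited helper, the sample rows, the movie column names, and the assumption that the director file has no header row are all hypothetical and serve only as an illustration; the real data is read from DBFS by the components.

```python
import csv
import io


def read_delimited(text, field_separator=",", header=0, column_names=None):
    """Parse delimited text, skip `header` leading rows, and optionally
    map each remaining row onto the given schema column names.

    Hypothetical helper for illustration; it is not part of any Talend API.
    """
    reader = csv.reader(io.StringIO(text), delimiter=field_separator)
    rows = [row for row in reader if row]  # ignore blank lines
    data_rows = rows[header:]              # Header = N skips the first N rows
    if column_names is None:
        return data_rows
    return [dict(zip(column_names, row)) for row in data_rows]


# Hypothetical director rows: comma-separated, assumed to have no header row,
# mapped onto the ID and Name columns defined in the schema editor.
directors_sample = "1,Director A\n2,Director B\n"
directors = read_delimited(
    directors_sample,
    field_separator=",",
    header=0,
    column_names=["ID", "Name"],
)

# Hypothetical movie rows: Header = 1 means the first row is not data.
movies_sample = "movieID,title\n10,Movie X\n11,Movie Y\n"
movies = read_delimited(movies_sample, field_separator=",", header=1)

print(directors)
print(movies)
```

With Header set to 1, the first row of the movie sample is skipped rather than returned as a record, which is why the value is entered as a plain number without quotation marks.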