Flink comes with an integrated interactive Scala Shell.
It can be used in a local setup as well as in a cluster setup.
To use the shell with an integrated Flink cluster, run bin/start-scala-shell.sh local
in the root directory of your Flink binary distribution. To run the shell on a
cluster, see the Setup section below.
Usage
The shell supports DataSet, DataStream, Table API and SQL.
Four different Environments are automatically prebound after startup.
Use “benv” and “senv” to access the Batch and Streaming ExecutionEnvironment respectively.
Use “btenv” and “stenv” to access BatchTableEnvironment and StreamTableEnvironment respectively.
DataSet API
The following example will execute the wordcount program in the Scala shell:
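A minimal sketch of such a session, assuming it is typed into a running Scala Shell where benv is prebound and the Flink Scala API implicits are already imported. The two input lines are sample data chosen for illustration.

```scala
// Run inside the Flink Scala Shell; benv is prebound there.
// Sample input chosen for illustration.
val text = benv.fromElements(
  "to be or not to be",
  "that is the question")

// Trailing dots keep the expression open, so it can be entered line by line.
val counts = text.
  flatMap { _.toLowerCase.split("\\W+") }. // split each line into words
  map { (_, 1) }.                          // pair every word with a count of 1
  groupBy(0).                              // group by the word (tuple field 0)
  sum(1)                                   // sum the counts (tuple field 1)

// print() triggers execution and shows the result in the terminal.
counts.print()
```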
The print() command will automatically send the specified tasks to the JobManager for execution and will show the result of the computation in the terminal.
Results can also be written to a file. In that case nothing runs until you call execute on benv, so the program must end with that call.
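The difference from print() can be sketched as follows, again assuming a running Scala Shell with benv prebound. The output path /tmp/wordcount-output and the job name are hypothetical values picked for this example, and the overwrite mode is used only so the snippet can be rerun.

```scala
// Run inside the Flink Scala Shell; benv is prebound there.
import org.apache.flink.core.fs.FileSystem.WriteMode

val words = benv.fromElements("to be or not to be")

val wordCounts = words.
  flatMap { _.toLowerCase.split("\\W+") }.
  map { (_, 1) }.
  groupBy(0).
  sum(1)

// Declares a file sink only; no job is submitted yet.
// The path is a hypothetical example location.
wordCounts.writeAsText("/tmp/wordcount-output", WriteMode.OVERWRITE)

// Submits the job to the JobManager and runs it.
benv.execute("MyProgram")
```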
DataStream API
Similar to the batch program above, we can execute a streaming program through the DataStream API:
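A sketch of the streaming variant, assuming a running Scala Shell with senv prebound. The input lines and the job name are illustrative. A key selector function is used for keyBy here; that is a choice made for this sketch, not necessarily the form used elsewhere.

```scala
// Run inside the Flink Scala Shell; senv is prebound there.
val textStreaming = senv.fromElements(
  "to be or not to be",
  "that is the question")

val countsStreaming = textStreaming.
  flatMap { _.toLowerCase.split("\\W+") }. // split each line into words
  map { (_, 1) }.                          // pair every word with a count of 1
  keyBy(_._1).                             // partition the stream by word
  sum(1)                                   // running sum of the counts

// Only registers a print sink; nothing is executed yet.
countsStreaming.print()

// Starts the streaming job.
senv.execute("Streaming Wordcount")
```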
Note that in the streaming case, the print operation does not trigger execution directly; the job starts only when execute is called on senv.
The Flink Shell comes with command history and auto-completion.
Table API
The example below is a wordcount program using Table API:
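A hedged sketch of such a program, assuming a running Scala Shell with senv and stenv prebound, the Table API Scala implicits already imported by the shell, and a Flink 1.9/1.10-era Table API (symbol expressions, and join applied to a table function call). Later releases changed these calls, so treat the exact method names as version-dependent. The input lines and job name are illustrative.

```scala
// Run inside the Flink Scala Shell; senv and stenv are prebound there.
// Assumes a Flink 1.9/1.10-era Table API.
import org.apache.flink.table.functions.TableFunction

// Turn a small stream of lines into a table with a single column 'text.
val textSource = stenv.fromDataStream(
  senv.fromElements(
    "to be or not to be",
    "that is the question"),
  'text)

// Table function that emits one row per word of the input string.
class $Split extends TableFunction[String] {
  def eval(s: String): Unit = {
    s.toLowerCase.split("\\W+").foreach(collect)
  }
}
val split = new $Split

// Join each line with its words, count per word, and print the updates.
textSource.
  join(split('text) as 'word).
  groupBy('word).
  select('word, 'word.count as 'count).
  toRetractStream[(String, Long)].
  print

senv.execute("Table Wordcount")
```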
Note that prefixing the class name of the TableFunction with $ is a workaround for an issue where Scala generates an incorrect inner class name.
SQL
The following example is a wordcount program written in SQL:
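A hedged sketch of the SQL variant, under the same assumptions as any Table API code in this shell: senv and stenv are prebound, the Table API Scala implicits are already imported, and the release is a Flink 1.9/1.10-era one where registerTable, registerFunction and sqlQuery are available. The table name text_source, the function name split and the job name are illustrative.

```scala
// Run inside the Flink Scala Shell; senv and stenv are prebound there.
// Assumes a Flink 1.9/1.10-era Table API.
import org.apache.flink.table.functions.TableFunction

val textSource = stenv.fromDataStream(
  senv.fromElements(
    "to be or not to be",
    "that is the question"),
  'text)

// Make the table visible to SQL under an illustrative name.
stenv.registerTable("text_source", textSource)

// Table function that emits one row per word of the input string.
class $Split extends TableFunction[String] {
  def eval(s: String): Unit = {
    s.toLowerCase.split("\\W+").foreach(collect)
  }
}
stenv.registerFunction("split", new $Split)

// Lateral join against the table function, then count per word.
val result = stenv.sqlQuery(
  """SELECT T.word, count(T.word) AS `count`
    |FROM text_source
    |JOIN LATERAL table(split(text)) AS T(word)
    |ON TRUE
    |GROUP BY T.word""".stripMargin)

result.toRetractStream[(String, Long)].print

senv.execute("SQL Wordcount")
```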
Adding external dependencies
It is possible to add external classpaths to the Scala Shell. These are sent to the JobManager automatically alongside your shell program when you call execute.
Use the parameter -a <path/to/jar.jar> or --addclasspath <path/to/jar.jar> to load additional classes.
Setup
To get an overview of the options the Scala Shell provides, run bin/start-scala-shell.sh --help
Local
To use the shell with an integrated Flink cluster, run bin/start-scala-shell.sh local
Remote
To use the shell with a running cluster, start it with the keyword remote
and supply the host and port of the JobManager: bin/start-scala-shell.sh remote <hostname> <portnumber>
Yarn Scala Shell cluster
The shell can deploy a Flink cluster to YARN, which is used exclusively by the
shell. The number of YARN containers can be controlled by the parameter -n <arg>.
The shell deploys a new Flink cluster on YARN and connects to it.
You can also specify options for the YARN cluster, such as the memory for
the JobManager or the name of the YARN application.
For example, to start a YARN cluster for the Scala Shell with two TaskManagers,
run bin/start-scala-shell.sh yarn -n 2
For all other options, see the full reference at the bottom.
Yarn Session
If you have previously deployed a Flink cluster using the Flink Yarn Session,
the Scala Shell can connect to it with bin/start-scala-shell.sh yarn