StreamExecutionEnvironment#

StreamExecutionEnvironment#

The StreamExecutionEnvironment is the context in which a streaming program is executed. A LocalStreamEnvironment will cause execution in the attached JVM, a RemoteStreamEnvironment will cause execution on a remote setup.

The environment provides methods to control the job execution (such as setting the parallelism or the fault tolerance/checkpointing parameters) and to interact with the outside world (data access).

StreamExecutionEnvironment.get_config()

Gets the config object.

StreamExecutionEnvironment.set_parallelism(...)

Sets the parallelism for operations executed through this environment.

StreamExecutionEnvironment.set_max_parallelism(...)

Sets the maximum degree of parallelism defined for the program.

StreamExecutionEnvironment.register_slot_sharing_group(...)

Register a slot sharing group with its resource spec.

StreamExecutionEnvironment.get_parallelism()

Gets the parallelism with which operation are executed by default.

StreamExecutionEnvironment.get_max_parallelism()

Gets the maximum degree of parallelism defined for the program.

StreamExecutionEnvironment.set_runtime_mode(...)

Sets the runtime execution mode for the application RuntimeExecutionMode.

StreamExecutionEnvironment.set_buffer_timeout(...)

Sets the maximum time frequency (milliseconds) for the flushing of the output buffers.

StreamExecutionEnvironment.get_buffer_timeout()

Gets the maximum time frequency (milliseconds) for the flushing of the output buffers.

StreamExecutionEnvironment.disable_operator_chaining()

Disables operator chaining for streaming operators.

StreamExecutionEnvironment.is_chaining_enabled()

Returns whether operator chaining is enabled.

StreamExecutionEnvironment.get_checkpoint_config()

Gets the checkpoint config, which defines values like checkpoint interval, delay between checkpoints, etc.

StreamExecutionEnvironment.enable_checkpointing(...)

Enables checkpointing for the streaming job.

StreamExecutionEnvironment.get_checkpoint_interval()

Returns the checkpointing interval or -1 if checkpointing is disabled.

StreamExecutionEnvironment.get_checkpointing_mode()

Returns the checkpointing mode (exactly-once vs.

StreamExecutionEnvironment.enable_changelog_state_backend(enabled)

Enable the change log for current state backend.

StreamExecutionEnvironment.is_changelog_state_backend_enabled()

Gets the enable status of change log for state backend.

StreamExecutionEnvironment.set_default_savepoint_directory(...)

Sets the default savepoint directory, where savepoints will be written to if none is explicitly provided when triggered.

StreamExecutionEnvironment.get_default_savepoint_directory()

Gets the default savepoint directory for this Job.

StreamExecutionEnvironment.configure(...)

Sets all relevant options contained in the Configuration.

StreamExecutionEnvironment.add_python_file(...)

Adds a python dependency which could be python files, python packages or local directories.

StreamExecutionEnvironment.set_python_requirements(...)

Specifies a requirements.txt file which defines the third-party dependencies.

StreamExecutionEnvironment.add_python_archive(...)

Adds a python archive file.

StreamExecutionEnvironment.set_python_executable(...)

Sets the path of the python interpreter which is used to execute the python udf workers.

StreamExecutionEnvironment.add_jars(*jars_path)

Adds a list of jar files that will be uploaded to the cluster and referenced by the job.

StreamExecutionEnvironment.add_classpaths(...)

Adds a list of URLs that are added to the classpath of each user code classloader of the program.

StreamExecutionEnvironment.get_default_local_parallelism()

Gets the default parallelism that will be used for the local execution environment.

StreamExecutionEnvironment.set_default_local_parallelism(...)

Sets the default parallelism that will be used for the local execution environment.

StreamExecutionEnvironment.execute([job_name])

Triggers the program execution.

StreamExecutionEnvironment.execute_async([...])

Triggers the program asynchronously.

StreamExecutionEnvironment.get_execution_plan()

Creates the plan with which the system will execute the program, and returns it as a String using a JSON representation of the execution data flow graph.

StreamExecutionEnvironment.register_cached_file(...)

Registers a file at the distributed cache under the given name.

StreamExecutionEnvironment.get_execution_environment([...])

Creates an execution environment that represents the context in which the program is currently executed.

StreamExecutionEnvironment.create_input(...)

Create an input data stream with InputFormat.

StreamExecutionEnvironment.add_source(...[, ...])

Adds a data source to the streaming topology.

StreamExecutionEnvironment.from_source(...)

Adds a data Source to the environment to get a DataStream.

StreamExecutionEnvironment.from_collection(...)

Creates a data stream from the given non-empty collection.

StreamExecutionEnvironment.is_unaligned_checkpoints_enabled()

Returns whether Unaligned Checkpoints are enabled.

StreamExecutionEnvironment.is_force_unaligned_checkpoints()

Returns whether Unaligned Checkpoints are force-enabled.

StreamExecutionEnvironment.close()

Close and clean up the execution environment.

RuntimeExecutionMode#

Runtime execution mode of DataStream programs. Among other things, this controls task scheduling, network shuffle behavior, and time semantics. Some operations will also change their record emission behaviour based on the configured execution mode.

STREAMING:

The Pipeline will be executed with Streaming Semantics. All tasks will be deployed before execution starts, checkpoints will be enabled, and both processing and event time will be fully supported.

BATCH:

The Pipeline will be executed with Batch Semantics. Tasks will be scheduled gradually based on the scheduling region they belong, shuffles between regions will be blocking, watermarks are assumed to be “perfect” i.e. no late data, and processing time is assumed to not advance during execution.

AUTOMATIC:

Flink will set the execution mode to BATCH if all sources are bounded, or STREAMING if there is at least one source which is unbounded.

RuntimeExecutionMode.STREAMING

RuntimeExecutionMode.BATCH

RuntimeExecutionMode.AUTOMATIC

SlotSharingGroup#

SlotSharingGroup(j_slot_sharing_group)

Describe the name and the different resource components of a slot sharing group.

MemorySize([j_memory_size, bytes_size])

MemorySize is a representation of a number of bytes, viewable in different units.