class ExecutionEnvironment extends AnyRef
The ExecutionEnvironment is the context in which a program is executed. A local environment will cause execution in the current JVM, a remote environment will cause execution on a remote cluster installation.
The environment provides methods to control the job execution (such as setting the parallelism) and to interact with the outside world (data access).
To get an execution environment use the methods on the companion object:
- ExecutionEnvironment#getExecutionEnvironment
- ExecutionEnvironment#createLocalEnvironment
- ExecutionEnvironment#createRemoteEnvironment
Use ExecutionEnvironment#getExecutionEnvironment to get the correct environment depending on where the program is executed. If it is run inside an IDE a local environment will be created. If the program is submitted to a cluster a remote execution environment will be created.
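For illustration, a minimal sketch of the three entry points (the host, port, and jar path below are hypothetical):

    import org.apache.flink.api.scala._

    // Picks a local or remote environment depending on where the program runs.
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Explicit alternatives:
    val localEnv = ExecutionEnvironment.createLocalEnvironment(4)
    val remoteEnv = ExecutionEnvironment.createRemoteEnvironment(
      "jobmanager-host", 6123, "/path/to/program.jar")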
- Annotations
- @deprecated @Public()
- Deprecated
- (Since version 1.17.0)
Instance Constructors
- new ExecutionEnvironment(javaEnv: java.ExecutionEnvironment)
- Deprecated
All Flink Scala APIs are deprecated and will be removed in a future Flink version. You can still build your application in Scala, but you should move to the Java version of the DataStream and/or Table API.
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##(): Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def addDefaultKryoSerializer[T <: Serializer[_] with Serializable](clazz: Class[_], serializer: T): Unit
Registers a default serializer for the given class and its sub-classes at Kryo.
Note that the serializer instance must be serializable (as defined by java.io.Serializable), because it may be distributed to the worker nodes by java serialization.
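As a sketch, a serializable Kryo serializer for a hypothetical Point type might look as follows (Point and PointSerializer are illustrative, not part of Flink):

    import com.esotericsoftware.kryo.{Kryo, Serializer}
    import com.esotericsoftware.kryo.io.{Input, Output}
    import org.apache.flink.api.scala._

    // Hypothetical type to be handled by Kryo.
    case class Point(x: Double, y: Double)

    // Mixes in Serializable so the instance can be shipped to the workers.
    class PointSerializer extends Serializer[Point] with Serializable {
      override def write(kryo: Kryo, output: Output, p: Point): Unit = {
        output.writeDouble(p.x)
        output.writeDouble(p.y)
      }
      override def read(kryo: Kryo, input: Input, tpe: Class[Point]): Point =
        Point(input.readDouble(), input.readDouble())
    }

    val env = ExecutionEnvironment.getExecutionEnvironment
    env.addDefaultKryoSerializer(classOf[Point], new PointSerializer)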
- def addDefaultKryoSerializer(clazz: Class[_], serializer: Class[_ <: Serializer[_]]): Unit
Registers a default serializer for the given class and its sub-classes at Kryo.
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clearJobListeners(): Unit
Clear all registered JobListeners.
- Annotations
- @PublicEvolving()
- def clone(): AnyRef
- Attributes
- protected[java.lang]
- Definition Classes
- AnyRef
- Annotations
- @native() @throws( ... )
- def configure(configuration: ReadableConfig, classLoader: ClassLoader): Unit
Sets all relevant options contained in the ReadableConfig, such as org.apache.flink.configuration.PipelineOptions#CACHED_FILES. It will reconfigure ExecutionEnvironment and ExecutionConfig.
It will change the value of a setting only if a corresponding option was set in the configuration. If a key is not present, the current value of a field will remain untouched.
- configuration
a configuration to read the values from
- classLoader
a class loader to use when loading classes
- Annotations
- @PublicEvolving()
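A minimal sketch; the "name:...,path:..." entry format for pipeline.cached-files follows the option's documentation, and the file path is hypothetical:

    import org.apache.flink.api.scala._
    import org.apache.flink.configuration.{Configuration, PipelineOptions}
    import java.util.Collections

    val env = ExecutionEnvironment.getExecutionEnvironment

    val conf = new Configuration()
    conf.set(PipelineOptions.CACHED_FILES,
      Collections.singletonList("name:words,path:file:///tmp/words.txt"))

    // Applies every recognized option to this environment and its ExecutionConfig.
    env.configure(conf, Thread.currentThread().getContextClassLoader)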
- def createInput[T](inputFormat: InputFormat[T, _])(implicit arg0: ClassTag[T], arg1: TypeInformation[T]): DataSet[T]
Generic method to create an input DataSet with an org.apache.flink.api.common.io.InputFormat.
- def createProgramPlan(jobName: String = ""): Plan
Creates the program's org.apache.flink.api.common.Plan. The plan is a description of all data sources, data sinks, and operations and how they interact, as an isolated unit that can be executed with a PipelineExecutor. Obtaining a plan and starting it with an executor is an alternative way to run a program and is only possible if the program consists solely of distributed operations.
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def execute(jobName: String): JobExecutionResult
Triggers the program execution. The environment will execute all parts of the program that have resulted in a "sink" operation. Sink operations are, for example, printing results (DataSet.print), writing results (e.g. DataSet.writeAsText or DataSet.write), or other generic data sinks created with DataSet.output.
The program execution will be logged and displayed with the given name.
- returns
The result of the job execution, containing elapsed time and accumulators.
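A minimal sketch (the output path is hypothetical); note that nothing runs until execute() is called:

    import org.apache.flink.api.scala._

    val env = ExecutionEnvironment.getExecutionEnvironment

    env.generateSequence(1, 100)
      .map(_ * 2)
      .writeAsText("file:///tmp/doubled") // defines a sink, does not run yet

    val result = env.execute("Doubling job")
    println(s"Ran for ${result.getNetRuntime} ms")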
- def execute(): JobExecutionResult
Triggers the program execution. The environment will execute all parts of the program that have resulted in a "sink" operation. Sink operations are, for example, printing results (DataSet.print), writing results (e.g. DataSet.writeAsText or DataSet.write), or other generic data sinks created with DataSet.output.
The program execution will be logged and displayed with a generated default name.
- returns
The result of the job execution, containing elapsed time and accumulators.
- def executeAsync(jobName: String): JobClient
Triggers the program execution asynchronously. The environment will execute all parts of the program that have resulted in a "sink" operation. Sink operations are, for example, printing results (DataSet.print), writing results (e.g. DataSet.writeAsText or DataSet.write), or other generic data sinks created with DataSet.output.
The program execution will be logged and displayed with the given name.
ATTENTION: The caller of this method is responsible for managing the lifecycle of the returned JobClient. This means calling JobClient#close() at the end of its usage. Otherwise, there may be resource leaks depending on the JobClient implementation.
- returns
A JobClient that can be used to communicate with the submitted job, available once job submission has succeeded.
- Annotations
- @PublicEvolving()
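A sketch of the intended lifecycle, assuming a Flink version where JobClient#getJobExecutionResult takes no arguments (the output path is hypothetical):

    import org.apache.flink.api.scala._

    val env = ExecutionEnvironment.getExecutionEnvironment
    env.generateSequence(1, 100).writeAsText("file:///tmp/out")

    val jobClient = env.executeAsync("Async job")
    try {
      // Blocks here for brevity; real code would usually compose the future.
      val result = jobClient.getJobExecutionResult.get()
      println(s"Job ${jobClient.getJobID} took ${result.getNetRuntime} ms")
    } finally {
      jobClient.close() // the caller owns the JobClient's lifecycle
    }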
- def executeAsync(): JobClient
Triggers the program execution asynchronously. The environment will execute all parts of the program that have resulted in a "sink" operation. Sink operations are, for example, printing results (DataSet.print), writing results (e.g. DataSet.writeAsText or DataSet.write), or other generic data sinks created with DataSet.output.
The program execution will be logged and displayed with a generated default name.
ATTENTION: The caller of this method is responsible for managing the lifecycle of the returned JobClient. This means calling JobClient#close() at the end of its usage. Otherwise, there may be resource leaks depending on the JobClient implementation.
- returns
A JobClient that can be used to communicate with the submitted job, available once job submission has succeeded.
- Annotations
- @PublicEvolving()
- def finalize(): Unit
- Attributes
- protected[java.lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
- def fromCollection[T](data: Iterator[T])(implicit arg0: ClassTag[T], arg1: TypeInformation[T]): DataSet[T]
Creates a DataSet from the given Iterator.
Note that this operation will result in a non-parallel data source, i.e. a data source with a parallelism of one.
- def fromCollection[T](data: Iterable[T])(implicit arg0: ClassTag[T], arg1: TypeInformation[T]): DataSet[T]
Creates a DataSet from the given non-empty Iterable.
Note that this operation will result in a non-parallel data source, i.e. a data source with a parallelism of one.
- def fromElements[T](data: T*)(implicit arg0: ClassTag[T], arg1: TypeInformation[T]): DataSet[T]
Creates a new data set that contains the given elements.
Note that this operation will result in a non-parallel data source, i.e. a data source with a parallelism of one.
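A short sketch of the collection-based sources; all of them produce data sources with parallelism one:

    import org.apache.flink.api.scala._

    val env = ExecutionEnvironment.getExecutionEnvironment

    val words: DataSet[String] = env.fromElements("to", "be", "or", "not")
    val nums: DataSet[Int] = env.fromCollection(Seq(1, 2, 3, 4))
    val evens: DataSet[Int] = env.fromCollection(Iterator.from(0, 2).take(10))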
- def fromParallelCollection[T](iterator: SplittableIterator[T])(implicit arg0: ClassTag[T], arg1: TypeInformation[T]): DataSet[T]
Creates a new data set that contains elements in the iterator. The iterator is splittable, allowing the framework to create a parallel data source that returns the elements in the iterator.
- def generateSequence(from: Long, to: Long): DataSet[Long]
Creates a new data set that contains a sequence of numbers. The data set will be created in parallel, so there is no guarantee about the order of the elements.
- from
The number to start at (inclusive).
- to
The number to stop at (inclusive).
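For example, a parallel source of the numbers 1 to 1000 (both inclusive):

    import org.apache.flink.api.scala._

    val env = ExecutionEnvironment.getExecutionEnvironment

    // Created in parallel; element order is not deterministic.
    val numbers: DataSet[Long] = env.generateSequence(1, 1000)
    val sum: DataSet[Long] = numbers.reduce(_ + _)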
- final def getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def getConfig: ExecutionConfig
Gets the config object.
- def getExecutionPlan(): String
Creates the plan with which the system will execute the program, and returns it as a String using a JSON representation of the execution data flow graph.
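A sketch; sinks must be defined before asking for the plan (the output path is hypothetical):

    import org.apache.flink.api.scala._

    val env = ExecutionEnvironment.getExecutionEnvironment
    env.generateSequence(1, 10).writeAsText("file:///tmp/out")

    // JSON description of the dataflow graph.
    println(env.getExecutionPlan())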
- def getJavaEnv: java.ExecutionEnvironment
- returns
the underlying Java ExecutionEnvironment.
- def getLastJobExecutionResult: JobExecutionResult
Gets the JobExecutionResult of the last executed job.
- def getParallelism: Int
Returns the default parallelism for this execution environment. Note that this value can be overridden by individual operations using DataSet.setParallelism.
- def getRestartStrategy: RestartStrategyConfiguration
Returns the specified restart strategy configuration.
- returns
The restart strategy configuration to be used
- Annotations
- @PublicEvolving()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- def readCsvFile[T](filePath: String, lineDelimiter: String = "\n", fieldDelimiter: String = ",", quoteCharacter: Character = null, ignoreFirstLine: Boolean = false, ignoreComments: String = null, lenient: Boolean = false, includedFields: Array[Int] = null, pojoFields: Array[String] = null)(implicit arg0: ClassTag[T], arg1: TypeInformation[T]): DataSet[T]
Creates a DataSet by reading the given CSV file. The type parameter must be used to specify a Tuple type that has the same number of fields as there are fields in the CSV file. If the number of fields in the CSV file is not the same, the includedFields parameter can be used to only read specific fields.
- filePath
The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path").
- lineDelimiter
The string that separates lines, defaults to newline.
- fieldDelimiter
The string that separates individual fields, defaults to ",".
- quoteCharacter
The character to use for quoted String parsing, disabled by default.
- ignoreFirstLine
Whether the first line in the file should be ignored.
- ignoreComments
Lines that start with the given String are ignored, disabled by default.
- lenient
Whether the parser should silently ignore malformed lines.
- includedFields
The fields in the file that should be read. By default, all fields are read.
- pojoFields
The fields of the POJO which are mapped to CSV fields.
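A sketch reading a hypothetical semicolon-delimited file with a header row, first into Tuples and then into a case class (case class fields are matched positionally):

    import org.apache.flink.api.scala._

    val env = ExecutionEnvironment.getExecutionEnvironment

    val tuples: DataSet[(String, Int)] = env.readCsvFile[(String, Int)](
      "file:///tmp/word-counts.csv",
      fieldDelimiter = ";",
      ignoreFirstLine = true)

    case class WordCount(word: String, count: Int)
    val wc: DataSet[WordCount] = env.readCsvFile[WordCount](
      "file:///tmp/word-counts.csv",
      fieldDelimiter = ";",
      ignoreFirstLine = true)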
- def readFile[T](inputFormat: FileInputFormat[T], filePath: String)(implicit arg0: ClassTag[T], arg1: TypeInformation[T]): DataSet[T]
Creates a new DataSource by reading the specified file using the custom org.apache.flink.api.common.io.FileInputFormat.
- def readFileOfPrimitives[T](filePath: String, delimiter: String = "\n")(implicit arg0: ClassTag[T], arg1: TypeInformation[T]): DataSet[T]
Creates a DataSet that represents the primitive type produced by reading the given file in delimited fashion. This method is similar to readCsvFile with a single field, but it produces a DataSet of the primitive type rather than of Tuples. The type parameter must be used to specify the primitive type.
- filePath
The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path").
- delimiter
The string that separates the primitives, defaults to newline.
- def readTextFile(filePath: String, charsetName: String = "UTF-8"): DataSet[String]
Creates a DataSet of Strings produced by reading the given file line-wise.
- filePath
The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path").
- charsetName
The name of the character set used to read the file. Default is UTF-8.
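For example (the input path is hypothetical); each line of the file becomes one String element:

    import org.apache.flink.api.scala._

    val env = ExecutionEnvironment.getExecutionEnvironment

    val lines: DataSet[String] = env.readTextFile("file:///tmp/input.txt")
    val nonEmpty: DataSet[String] = lines.filter(_.nonEmpty)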
- def readTextFileWithValue(filePath: String, charsetName: String = "UTF-8"): DataSet[StringValue]
Creates a DataSet of Strings produced by reading the given file line-wise. This method is similar to readTextFile, but it produces a DataSet with mutable StringValue objects, rather than Java Strings. StringValues can be used to tune implementations to be less object and garbage collection heavy.
- filePath
The path of the file, as a URI (e.g., "file:///some/local/file" or "hdfs://host:port/file/path").
- charsetName
The name of the character set used to read the file. Default is UTF-8.
- def registerCachedFile(filePath: String, name: String, executable: Boolean = false): Unit
Registers a file at the distributed cache under the given name. The file will be accessible from any user-defined function in the (distributed) runtime under a local path. Files may be local files (which will be distributed via BlobServer), or files in a distributed file system. The runtime will copy the files temporarily to a local cache, if needed.
The org.apache.flink.api.common.functions.RuntimeContext can be obtained inside UDFs via org.apache.flink.api.common.functions.RichFunction#getRuntimeContext and provides access via org.apache.flink.api.common.functions.RuntimeContext#getDistributedCache.
- filePath
The path of the file, as a URI (e.g. "file:///some/path" or "hdfs://host:port/and/path")
- name
The name under which the file is registered.
- executable
Flag indicating whether the file should be executable.
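A sketch of the registration and lookup round trip with a hypothetical stop-word file:

    import org.apache.flink.api.scala._
    import org.apache.flink.api.common.functions.RichMapFunction
    import org.apache.flink.configuration.Configuration
    import scala.io.Source

    val env = ExecutionEnvironment.getExecutionEnvironment
    env.registerCachedFile("hdfs:///data/stopwords.txt", "stopwords")

    val cleaned = env.fromElements("the", "quick", "brown", "fox")
      .map(new RichMapFunction[String, String] {
        @transient private var stopwords: Set[String] = _

        override def open(parameters: Configuration): Unit = {
          // The runtime has already copied the registered file to a local path.
          val file = getRuntimeContext.getDistributedCache.getFile("stopwords")
          val src = Source.fromFile(file)
          try stopwords = src.getLines().toSet finally src.close()
        }

        override def map(word: String): String =
          if (stopwords.contains(word)) "" else word
      })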
- def registerJobListener(jobListener: JobListener): Unit
Registers a JobListener in this environment. The JobListener will be notified on specific job status changes.
- Annotations
- @PublicEvolving()
- def registerType(typeClass: Class[_]): Unit
Registers the given type with the serialization stack. If the type is eventually serialized as a POJO, then the type is registered with the POJO serializer. If the type ends up being serialized with Kryo, then it will be registered at Kryo to make sure that only tags are written.
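A sketch with a hypothetical type hierarchy; registering the concrete subtype lets the serializers write compact tags instead of full class names:

    import org.apache.flink.api.scala._

    // Hypothetical types appearing in the job's data sets.
    class Shape { var name: String = "" }
    class Circle extends Shape { var radius: Double = 0.0 }

    val env = ExecutionEnvironment.getExecutionEnvironment
    env.registerType(classOf[Circle])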
- def registerTypeWithKryoSerializer(clazz: Class[_], serializer: Class[_ <: Serializer[_]]): Unit
Registers the given type with the serializer at the KryoSerializer.
- def registerTypeWithKryoSerializer[T <: Serializer[_] with Serializable](clazz: Class[_], serializer: T): Unit
Registers the given type with the serializer at the KryoSerializer.
Note that the serializer instance must be serializable (as defined by java.io.Serializable), because it may be distributed to the worker nodes by java serialization.
- def setParallelism(parallelism: Int): Unit
Sets the parallelism for operations executed through this environment. Setting a parallelism of x here will cause all operators (such as join, map, reduce) to run with x parallel instances. This value can be overridden by specific operations using DataSet.setParallelism.
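A short sketch of the environment default and a per-operator override:

    import org.apache.flink.api.scala._

    val env = ExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(4) // default for all operators of this job

    // A single operator can still override the environment default:
    val squares = env.generateSequence(1, 100)
      .map(n => n * n)
      .setParallelism(1)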
- def setRestartStrategy(restartStrategyConfiguration: RestartStrategyConfiguration): Unit
Sets the restart strategy configuration. The configuration specifies which restart strategy will be used for the execution graph in case of a restart.
- restartStrategyConfiguration
Restart strategy configuration to be set
- Annotations
- @PublicEvolving()
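For example, a fixed-delay strategy (3 attempts, with the pause between attempts given in milliseconds):

    import org.apache.flink.api.scala._
    import org.apache.flink.api.common.restartstrategy.RestartStrategies

    val env = ExecutionEnvironment.getExecutionEnvironment
    env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, 10000L))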
- final def synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- def union[T](sets: Seq[DataSet[T]]): DataSet[T]
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @throws( ... )
Deprecated Value Members
- def getNumberOfExecutionRetries: Int
Gets the number of times the system will try to re-execute failed tasks. A value of "-1" indicates that the system default value (as defined in the configuration) should be used.
- Annotations
- @Deprecated @PublicEvolving()
- Deprecated
This method will be replaced by getRestartStrategy. The FixedDelayRestartStrategyConfiguration contains the number of execution retries.
- def setNumberOfExecutionRetries(numRetries: Int): Unit
Sets the number of times that failed tasks are re-executed. A value of zero effectively disables fault tolerance. A value of "-1" indicates that the system default value (as defined in the configuration) should be used.
- Annotations
- @Deprecated @PublicEvolving()
- Deprecated
This method will be replaced by setRestartStrategy(). The FixedDelayRestartStrategyConfiguration contains the number of execution retries.