All Flink Scala APIs are deprecated and will be removed in a future Flink version. You can still build your application in Scala, but you should move to the Java version of the DataStream API, the Table API, or both.
Scala API Extensions #
To keep the Scala and Java APIs reasonably consistent, some features that allow a high level of expressiveness in Scala have been left out of the standard APIs for both batch and streaming.
If you want to enjoy the full Scala experience, you can opt in to extensions that enhance the Scala API via implicit conversions.
To use all the available extensions, add a single import for the DataStream API:
import org.apache.flink.streaming.api.scala.extensions._
Alternatively, you can import individual extensions à la carte to use only those you prefer.
Accept partial functions #
Normally, the DataStream API does not accept anonymous pattern matching functions to deconstruct tuples, case classes or collections, like the following:
val data: DataStream[(Int, String, Double)] = // [...]
data.map {
case (id, name, temperature) => // [...]
// The previous line causes the following compilation error:
// "The argument types of an anonymous function must be fully known. (SLS 8.5)"
}
This extension adds methods to the DataStream Scala API, each of which corresponds one-to-one to a method of the standard API and delegates to it. Unlike the originals, these delegating methods do support anonymous pattern matching functions.
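The delegation idea can be illustrated without Flink. The following is a minimal sketch in plain Scala, not Flink's actual implementation: `Mapper`, `MiniStream`, `MiniExtensions` and `Demo` are hypothetical names invented for this illustration. It assumes the difficulty comes from `map` being overloaded, so the extension exposes a non-overloaded `mapWith` through an implicit class, which gives the compiler a single expected function type for the `case` clause.

```scala
// Stand-in for a Java-style functional interface (hypothetical, not a Flink type).
trait Mapper[T, R] { def apply(value: T): R }

// Hypothetical miniature "stream" whose map is overloaded, as in the scenario above.
final class MiniStream[T](val elements: List[T]) {
  def map[R](fun: T => R): MiniStream[R] =
    new MiniStream(elements.map(fun))
  def map[R](mapper: Mapper[T, R]): MiniStream[R] =
    new MiniStream(elements.map(e => mapper.apply(e)))
}

object MiniExtensions {
  // Adds a non-overloaded method that simply delegates to the function-typed map.
  implicit class AcceptPartialFunctions[T](stream: MiniStream[T]) {
    def mapWith[R](fun: T => R): MiniStream[R] = stream.map(fun)
  }
}

object Demo {
  import MiniExtensions._

  val data = new MiniStream(List((1, "a", 20.5), (2, "b", 31.0)))

  // Depending on the Scala version, the compiler may reject the overloaded call:
  //   data.map { case (id, _, temperature) => (id, temperature) }
  // mapWith has exactly one signature, so the pattern match type-checks.
  val pairs: MiniStream[(Int, Double)] =
    data.mapWith { case (id, _, temperature) => (id, temperature) }
}
```

Because `mapWith` only forwards to `map`, the behavior of the stream is unchanged; the extension affects how the call site is type-checked, nothing more.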
DataStream API #
Method | Original | Stream type |
---|---|---|
mapWith | map | DataStream |
flatMapWith | flatMap | DataStream |
filterWith | filter | DataStream |
keyingBy | keyBy | DataStream |
mapWith | map | ConnectedDataStream |
flatMapWith | flatMap | ConnectedDataStream |
keyingBy | keyBy | ConnectedDataStream |
reduceWith | reduce | KeyedStream, WindowedStream |
projecting | apply | JoinedStream |
For more information on the semantics of each method, please refer to the DataStream API documentation.
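As an illustration of a few of the methods listed above, here is a hedged sketch that chains `filterWith`, `mapWith`, `keyingBy` and `reduceWith` on a stream of tuples. It assumes a Flink version that still ships the Scala DataStream API on the classpath; the object name `SensorJob`, the job name and the sample readings are made up for this example.

```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.extensions._

object SensorJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Hypothetical sample data: (sensor id, sensor name, temperature).
    val readings: DataStream[(Int, String, Double)] = env.fromElements(
      (1, "kitchen", 21.5),
      (2, "attic", 31.0),
      (1, "kitchen", 23.0)
    )

    readings
      // Keep only readings above 20 degrees.
      .filterWith { case (_, _, temperature) => temperature > 20.0 }
      // Drop the name, keeping (id, temperature).
      .mapWith { case (id, _, temperature) => (id, temperature) }
      // Key by sensor id; this yields a KeyedStream.
      .keyingBy { case (id, _) => id }
      // Running maximum temperature per sensor.
      .reduceWith { case ((id, t1), (_, t2)) => (id, math.max(t1, t2)) }
      .print()

    env.execute("accept-partial-functions-sketch")
  }
}
```

Note that `reduceWith` is called only after `keyingBy`, in line with the table, which lists it for KeyedStream and WindowedStream rather than for a plain DataStream.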
To use only this extension, add the following import for the DataSet extensions:
import org.apache.flink.api.scala.extensions.acceptPartialFunctions
and the following import for the DataStream extensions:
import org.apache.flink.streaming.api.scala.extensions.acceptPartialFunctions
The following snippet shows a minimal example of how to use these extension methods together (with the DataStream API):
object Main {
  // Brings StreamExecutionEnvironment and the implicit TypeInformation into scope.
  import org.apache.flink.streaming.api.scala._
  import org.apache.flink.streaming.api.scala.extensions._

  case class Point(x: Double, y: Double)

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val ds = env.fromElements(Point(1, 2), Point(3, 4), Point(5, 6))
    ds.filterWith {
      case Point(x, _) => x > 1
    }.mapWith {
      case Point(x, y) => (x, y)
    }.flatMapWith {
      case (x, y) => Seq("x" -> x, "y" -> y)
    }.keyingBy {
      case (id, value) => id
    }.reduceWith {
      // reduceWith is defined on KeyedStream, so it has to follow keyingBy.
      case ((id, v1), (_, v2)) => (id, v1 + v2)
    }
  }
}