Package org.apache.flink.api.scala

The Flink Scala API. org.apache.flink.api.scala.ExecutionEnvironment is the starting-point of any Flink program. It can be used to read from local files, HDFS, or other sources. org.apache.flink.api.scala.DataSet is the main abstraction of data in Flink. It provides operations that create new DataSets via transformations. org.apache.flink.api.scala.GroupedDataSet provides operations on grouped data that results from org.apache.flink.api.scala.DataSet.groupBy().

Use org.apache.flink.api.scala.ExecutionEnvironment.getExecutionEnvironment to obtain an execution environment. This will either create a local environment or a remote environment, depending on the context where your program is executing.
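
For example, a minimal word count sketch (a hypothetical example; the object and value names are not part of this API):

import org.apache.flink.api.scala._

object WordCount {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val text = env.fromElements("to be", "or not to be")
    val counts = text
      .flatMap { _.split(" ") }
      .map { (_, 1) }
      .groupBy(0)
      .sum(1)
    counts.print()
  }
}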


Type Members

  1. class AggregateDataSet[T] extends DataSet[T]

    The result of DataSet.aggregate. This can be used to chain more aggregations to the one aggregate operator.
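
    For example, a minimal sketch (the input DataSet is hypothetical):

    val input: DataSet[(Int, String, Double)] = ...
    // sum returns an AggregateDataSet; andMin chains a second aggregation
    // onto the same aggregate operator
    val aggregated = input.sum(0).andMin(2)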

    T

    The type of the DataSet, i.e., the type of the elements of the DataSet.

    Annotations
    @Public()
  2. class CoGroupDataSet[L, R] extends DataSet[(Array[L], Array[R])]

    A specific DataSet that results from a coGroup operation. The result of a default coGroup is a tuple containing two arrays of values from the two sides of the coGroup. The result of the coGroup can be changed by specifying a custom coGroup function using the apply method or by providing a RichCoGroupFunction.

    Example:

    val left = ...
    val right = ...
    val coGroupResult = left.coGroup(right).where(0, 2).equalTo(0, 1) {
      (left, right) => new MyCoGroupResult(left.min, right.max)
    }

    Or, using key selector functions with tuple data types:

    val left = ...
    val right = ...
    val coGroupResult = left.coGroup(right).where(_._1).equalTo(_._1) {
      (left, right) => new MyCoGroupResult(left.max, right.min)
    }
    L

    Type of the left input of the coGroup.

    R

    Type of the right input of the coGroup.

    Annotations
    @Public()
  3. class CrossDataSet[L, R] extends DataSet[(L, R)]

    A specific DataSet that results from a cross operation. The result of a default cross is a tuple containing the two values from the two sides of the cartesian product. The result of the cross can be changed by specifying a custom cross function using the apply method or by providing a RichCrossFunction.

    Example:

    val left = ...
    val right = ...
    val crossResult = left.cross(right) {
      (left, right) => new MyCrossResult(left, right)
    }
    L

    Type of the left input of the cross.

    R

    Type of the right input of the cross.

    Annotations
    @Public()
  4. class DataSet[T] extends AnyRef

    The DataSet, the basic abstraction of Flink. This represents a collection of elements of a specific type T. The operations in this class can be used to create new DataSets and to combine two DataSets. The methods of ExecutionEnvironment can be used to create a DataSet from an external source, such as files in HDFS. The write* methods can be used to write the elements to storage.

    All operations accept either a lambda function or an operation-specific function object for specifying the operation. For example, using a lambda:

    val input: DataSet[String] = ...
    val mapped = input flatMap { _.split(" ") }

    And using a FlatMapFunction:

    val input: DataSet[String] = ...
    val mapped = input flatMap { new FlatMapFunction[String, String] {
      def flatMap(in: String, out: Collector[String]): Unit = {
        in.split(" ") foreach { out.collect(_) }
      }
    } }

    A rich function can be used when more control is required, for example for accessing the RuntimeContext. The rich function for flatMap is RichFlatMapFunction, all other functions are named similarly. All functions are available in package org.apache.flink.api.common.functions.
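
    For example, a minimal sketch (hypothetical names) that tags each emitted word with the subtask index obtained from the RuntimeContext:

    val input: DataSet[String] = ...
    val tagged = input.flatMap(new RichFlatMapFunction[String, String] {
      def flatMap(in: String, out: Collector[String]): Unit = {
        // getRuntimeContext is only available in rich functions
        val subtask = getRuntimeContext.getIndexOfThisSubtask
        in.split(" ") foreach { w => out.collect(s"$subtask: $w") }
      }
    })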

    The elements are partitioned depending on the parallelism of the ExecutionEnvironment or of one specific DataSet.

    Most of the operations have an implicit TypeInformation parameter. This is supplied by an implicit conversion in the org.apache.flink.api.scala package. For this to work, createTypeInformation needs to be imported. This is normally achieved with a wildcard import:

    import org.apache.flink.api.scala._
    T

    The type of the DataSet, i.e., the type of the elements of the DataSet.

    Annotations
    @Public()
  5. class ExecutionEnvironment extends AnyRef

    The ExecutionEnvironment is the context in which a program is executed. A local environment will cause execution in the current JVM, a remote environment will cause execution on a remote cluster installation.

    The environment provides methods to control the job execution (such as setting the parallelism) and to interact with the outside world (data access).

    To get an execution environment use the methods on the companion object:

    Use ExecutionEnvironment#getExecutionEnvironment to get the correct environment depending on where the program is executed. If it is run inside an IDE, a local environment will be created. If the program is submitted to a cluster, a remote execution environment will be created.
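
    For example, a minimal sketch (the input path is hypothetical):

    val env = ExecutionEnvironment.getExecutionEnvironment
    env.setParallelism(4)                                 // control job execution
    val text = env.readTextFile("hdfs:///path/to/input")  // data access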

    Annotations
    @Public()
  6. class GroupedDataSet[T] extends AnyRef

    A DataSet to which a grouping key was added. Operations work on groups of elements with the same key (aggregate, reduce, and reduceGroup).

    A secondary sort order can be added with sortGroup, but it only takes effect for the group-at-a-time operations, i.e. reduceGroup.
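
    For example, a minimal sketch (assuming import org.apache.flink.api.scala._; the input DataSet is hypothetical):

    import org.apache.flink.api.common.operators.Order

    val input: DataSet[(String, Int)] = ...
    val largestPerKey = input
      .groupBy(0)                      // group on the first tuple field
      .sortGroup(1, Order.DESCENDING)  // secondary sort within each group
      .reduceGroup { values => values.next() }  // group-at-a-time operation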

    Annotations
    @Public()
  7. class JoinDataSet[L, R] extends DataSet[(L, R)] with JoinFunctionAssigner[L, R]

    A specific DataSet that results from a join operation. The result of a default join is a tuple containing the two values from the two sides of the join. The result of the join can be changed by specifying a custom join function using the apply method or by providing a RichFlatJoinFunction.

    Example:

    val left = ...
    val right = ...
    val joinResult = left.join(right).where(0, 2).equalTo(0, 1) {
      (left, right) => new MyJoinResult(left, right)
    }

    Or, using key selector functions with tuple data types:

    val left = ...
    val right = ...
    val joinResult = left.join(right).where(_._1).equalTo(_._1) {
      (left, right) => new MyJoinResult(left, right)
    }
    L

    Type of the left input of the join.

    R

    Type of the right input of the join.

    Annotations
    @Public()
  8. trait JoinFunctionAssigner[L, R] extends AnyRef

    Annotations
    @Public()
  9. class PartitionSortedDataSet[T] extends DataSet[T]

    The result of DataSet.sortPartition. This can be used to append additional sort fields to the one sort-partition operator.
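
    For example, a minimal sketch (the input DataSet is hypothetical):

    import org.apache.flink.api.common.operators.Order

    val input: DataSet[(Int, String)] = ...
    // the first sortPartition creates the sort-partition operator; the
    // second call appends an additional sort field to that same operator
    val sorted = input
      .sortPartition(0, Order.ASCENDING)
      .sortPartition(1, Order.DESCENDING)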

    T

    The type of the DataSet, i.e., the type of the elements of the DataSet.

    Annotations
    @Public()
  10. class SelectByMaxFunction[T] extends ReduceFunction[T]

    A SelectByMaxFunction that works with Scala tuples.

    Annotations
    @Internal()
  11. class SelectByMinFunction[T] extends ReduceFunction[T]

    A SelectByMinFunction that works with Scala tuples.

    Annotations
    @Internal()
  12. class Tuple2CaseClassSerializer[T1, T2] extends ScalaCaseClassSerializer[(T1, T2)] with SelfResolvingTypeSerializer[(T1, T2)]

  13. class UnfinishedCoGroupOperation[L, R] extends UnfinishedKeyPairOperation[L, R, CoGroupDataSet[L, R]]

    An unfinished coGroup operation that results from DataSet.coGroup. The keys for the left and right side must be specified using first where and then equalTo. For example:

    val left = ...
    val right = ...
    val coGroupResult = left.coGroup(right).where(...).equalTo(...)
    L

    The type of the left input of the coGroup.

    R

    The type of the right input of the coGroup.

    Annotations
    @Public()
  14. class UnfinishedJoinOperation[L, R] extends UnfinishedJoinOperationBase[L, R, JoinDataSet[L, R]]

    An unfinished inner join operation that results from calling DataSet.join(). The keys for the left and right side must be specified using first where and then equalTo.

    For example:

    val left = ...
    val right = ...
    val joinResult = left.join(right).where(...).equalTo(...)
    L

    The type of the left input of the join.

    R

    The type of the right input of the join.

    Annotations
    @Public()
  15. class UnfinishedOuterJoinOperation[L, R] extends UnfinishedJoinOperationBase[L, R, JoinFunctionAssigner[L, R]]

    An unfinished outer join operation that results from calling, e.g., DataSet.fullOuterJoin(). The keys for the left and right side must be specified using first where and then equalTo.

    Note that a join function must always be specified explicitly when constructing an outer join operator.

    For example:

    val left = ...
    val right = ...
    val joinResult = left.fullOuterJoin(right).where(...).equalTo(...) {
      (first, second) => ...
    }
    L

    The type of the left input of the join.

    R

    The type of the right input of the join.

    Annotations
    @Public()

Value Members

  1. object ClosureCleaner

    Annotations
    @Internal()
  2. object ExecutionEnvironment

    Annotations
    @Public()
  3. def createTuple2TypeInformation[T1, T2](t1: TypeInformation[T1], t2: TypeInformation[T2]): TypeInformation[(T1, T2)]

  4. implicit macro def createTypeInformation[T]: TypeInformation[T]

  5. package extensions

    acceptPartialFunctions extends the original DataSet with methods with unique names that delegate to core higher-order functions (e.g. map) so that we can work around the fact that overloaded methods taking functions as parameters can't also accept partial functions. This makes it possible to directly apply pattern matching to decompose inputs such as tuples, case classes and collections.

    The following is a small example that showcases how these extensions work on a Flink data set:

    object Main {
      import org.apache.flink.api.scala.extensions._
      case class Point(x: Double, y: Double)
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment
        val ds = env.fromElements(Point(1, 2), Point(3, 4), Point(5, 6))
        ds.filterWith {
          case Point(x, _) => x > 1
        }.reduceWith {
          case (Point(x1, y1), Point(x2, y2)) => Point(x1 + x2, y1 + y2)
        }.mapWith {
          case Point(x, y) => (x, y)
        }.flatMapWith {
          case (x, y) => Seq('x' -> x, 'y' -> y)
        }.groupingBy {
          case (id, value) => id
        }
      }
    }

    The extension consists of several implicit conversions over all the data set representations that could gain from this feature. To use this set of extension methods, the user has to explicitly opt in by importing org.apache.flink.api.scala.extensions.acceptPartialFunctions.

    For more information and usage examples please consult the Apache Flink official documentation.

  6. def getCallLocationName(depth: Int = 3): String

  7. package metrics

  8. package operators

  9. implicit val scalaNothingTypeInfo: TypeInformation[Nothing]

  10. package typeutils

  11. package utils

