Class org.apache.flink.api.scala.AggregateDataSet

class AggregateDataSet[T] extends DataSet[T]

The result of DataSet.aggregate. This can be used to chain more aggregations to the one aggregate operator.

T

The type of the DataSet, i.e., the type of the elements of the DataSet.
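
An AggregateDataSet is obtained from the aggregation shorthands on DataSet, for example (a minimal sketch; the tuple type and field positions are illustrative, and the usual import org.apache.flink.api.scala._ is assumed):

    val input: DataSet[(Int, String, Double)] = ...
    val result = input
      .sum(0)     // returns an AggregateDataSet
      .andMax(2)  // chains a second aggregation onto the same operator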

Annotations
@Public()
Linear Supertypes
DataSet[T], AnyRef, Any

Instance Constructors

  1. new AggregateDataSet(set: ScalaAggregateOperator[T])(implicit arg0: ClassTag[T])


Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def aggregate(agg: Aggregations, field: String): AggregateDataSet[T]

    Creates a new DataSet by aggregating the specified field using the given aggregation function. Since this is not a keyed DataSet the aggregation will be performed on the whole collection of elements.

    This only works on CaseClass DataSets.
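
    For example (an illustrative sketch; the case class and field name are hypothetical, with Aggregations imported from org.apache.flink.api.java.aggregation and import org.apache.flink.api.scala._ assumed):

    case class WordCount(word: String, count: Int)
    val counts: DataSet[WordCount] = ...
    val totals = counts.aggregate(Aggregations.SUM, "count")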

    Definition Classes
    DataSet
  5. def aggregate(agg: Aggregations, field: Int): AggregateDataSet[T]

    Creates a new DataSet by aggregating the specified tuple field using the given aggregation function. Since this is not a keyed DataSet the aggregation will be performed on the whole collection of elements.

    This only works on Tuple DataSets.

    Definition Classes
    DataSet
  6. def and(agg: Aggregations, field: String): AggregateDataSet[T]

    Adds the given aggregation on the given field to the previous aggregation operation.

    This only works on CaseClass DataSets.

  7. def and(agg: Aggregations, field: Int): AggregateDataSet[T]

    Adds the given aggregation on the given field to the previous aggregation operation.

    This only works on Tuple DataSets.
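
    For example (a sketch with illustrative field positions; Aggregations is imported from org.apache.flink.api.java.aggregation):

    val data: DataSet[(String, Int, Double)] = ...
    val agg = data
      .aggregate(Aggregations.SUM, 1)
      .and(Aggregations.MAX, 2)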

  8. def andMax(field: String): AggregateDataSet[T]

    Syntactic sugar for and with MAX

  9. def andMax(field: Int): AggregateDataSet[T]

    Syntactic sugar for and with MAX

  10. def andMin(field: String): AggregateDataSet[T]

    Syntactic sugar for and with MIN

  11. def andMin(field: Int): AggregateDataSet[T]

    Syntactic sugar for and with MIN

  12. def andSum(field: String): AggregateDataSet[T]

    Syntactic sugar for and with SUM

  13. def andSum(field: Int): AggregateDataSet[T]

    Syntactic sugar for and with SUM

  14. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  15. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  16. def coGroup[O](other: DataSet[O])(implicit arg0: ClassTag[O]): UnfinishedCoGroupOperation[T, O]

    For each key in this DataSet and the other DataSet, create a tuple containing a list of elements for that key from both DataSets. To specify the join keys the where and isEqualTo methods must be used. For example:

    val left: DataSet[(String, Int, Int)] = ...
    val right: DataSet[(Int, String, Int)] = ...
    val coGrouped = left.coGroup(right).where(0).isEqualTo(1)

    A custom coGroup function can be used if more control over the result is required. This can either be given as a lambda or a custom CoGroupFunction. For example:

    val left: DataSet[(String, Int, Int)] = ...
    val right: DataSet[(Int, String, Int)] = ...
    val coGrouped = left.coGroup(right).where(0).isEqualTo(1) { (l, r) =>
      // l and r are of type Iterator
      (l.min, r.max)
    }

    A coGroup function with a Collector can be used to implement a filter directly in the coGroup or to output more than one value. This type of coGroup function does not return a value; instead, values are emitted using the collector:

    val left: DataSet[(String, Int, Int)] = ...
    val right: DataSet[(Int, String, Int)] = ...
    val coGrouped = left.coGroup(right).where(0).isEqualTo(1) {
      (l, r, out: Collector[(String, Int)]) =>
        out.collect((l.min, r.max))
        out.collect((l.max, r.min))
      }
    Definition Classes
    DataSet
  17. def collect(): Seq[T]

    Convenience method to get the elements of a DataSet as a List. As a DataSet can contain a lot of data, this method should be used with caution.

    returns

    A Seq containing the elements of the DataSet

    Definition Classes
    DataSet
    Annotations
    @throws( classOf[Exception] )
    See also

    org.apache.flink.api.java.Utils.CollectHelper

  18. def combineGroup[R](fun: (Iterator[T], Collector[R]) ⇒ Unit)(implicit arg0: TypeInformation[R], arg1: ClassTag[R]): DataSet[R]

    Applies a GroupCombineFunction on a grouped DataSet. A GroupCombineFunction is similar to a GroupReduceFunction but does not perform a full data exchange. Instead, the GroupCombineFunction calls the combine method once per partition for combining a group of results. This operator is suitable for combining values into an intermediate format before doing a proper groupReduce where the data is shuffled across the nodes for further reduction. The GroupReduce operator can also be supplied with a combiner by implementing the RichGroupReduce function. The combine method of the RichGroupReduce function demands input and output type to be the same. The GroupCombineFunction, on the other hand, can have an arbitrary output type.
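
    For example (a minimal sketch computing per-partition word pre-counts; names are illustrative and import org.apache.flink.api.scala._ is assumed):

    val words: DataSet[String] = ...
    val preCounts = words.combineGroup {
      (it: Iterator[String], out: Collector[(String, Int)]) =>
        val counts = scala.collection.mutable.Map[String, Int]().withDefaultValue(0)
        it.foreach { w => counts(w) += 1 }
        counts.foreach(out.collect)
    }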

    Definition Classes
    DataSet
  19. def combineGroup[R](combiner: GroupCombineFunction[T, R])(implicit arg0: TypeInformation[R], arg1: ClassTag[R]): DataSet[R]

    Applies a GroupCombineFunction on a grouped DataSet. A GroupCombineFunction is similar to a GroupReduceFunction but does not perform a full data exchange. Instead, the GroupCombineFunction calls the combine method once per partition for combining a group of results. This operator is suitable for combining values into an intermediate format before doing a proper groupReduce where the data is shuffled across the nodes for further reduction. The GroupReduce operator can also be supplied with a combiner by implementing the RichGroupReduce function. The combine method of the RichGroupReduce function demands input and output type to be the same. The GroupCombineFunction, on the other hand, can have an arbitrary output type.

    Definition Classes
    DataSet
  20. def count(): Long

    Convenience method to get the count (number of elements) of a DataSet

    returns

    A long integer that represents the number of elements in the set

    Definition Classes
    DataSet
    Annotations
    @throws( classOf[Exception] )
    See also

    org.apache.flink.api.java.Utils.CountHelper

  21. def cross[O](other: DataSet[O]): CrossDataSet[T, O]

    Creates a new DataSet by forming the cartesian product of this DataSet and the other DataSet.

    The default cross result is a DataSet with 2-Tuples of the combined values. A custom cross function can be used if more control over the result is required. This can either be given as a lambda or a custom CrossFunction. For example:

    val left: DataSet[(String, Int, Int)] = ...
    val right: DataSet[(Int, String, Int)] = ...
    val product = left.cross(right) { (l, r) => (l._2, r._3) }
    Definition Classes
    DataSet
  22. def crossWithHuge[O](other: DataSet[O]): CrossDataSet[T, O]

    Special cross operation for explicitly telling the system that the left side is assumed to be a lot smaller than the right side of the cartesian product.

    Definition Classes
    DataSet
  23. def crossWithTiny[O](other: DataSet[O]): CrossDataSet[T, O]

    Special cross operation for explicitly telling the system that the right side is assumed to be a lot smaller than the left side of the cartesian product.

    Definition Classes
    DataSet
  24. def distinct(firstField: String, otherFields: String*): DataSet[T]

    Returns a distinct set of this DataSet using expression keys.

    The field position keys specify the fields of Tuples or Pojos on which the decision is made if two elements are distinct or not.

    The field expression keys specify the fields of a org.apache.flink.api.common.typeutils.CompositeType (e.g., Tuple or Pojo type) on which the decision is made if two elements are distinct or not. In case of a org.apache.flink.api.common.typeinfo.AtomicType, only the wildcard expression ("_") is valid.

    firstField

    First field position on which the distinction of the DataSet is decided

    otherFields

    Zero or more field positions on which the distinction of the DataSet is decided

    Definition Classes
    DataSet
  25. def distinct(fields: Int*): DataSet[T]

    Returns a distinct set of a tuple DataSet using field position keys.

    The field position keys specify the fields of Tuples on which the decision is made if two Tuples are distinct or not.

    Note: Field position keys can only be specified for Tuple DataSets.

    fields

    One or more field positions on which the distinction of the DataSet is decided.

    Definition Classes
    DataSet
  26. def distinct(): DataSet[T]

    Returns a distinct set of this DataSet.

    If the input is a composite type (Tuple or Pojo type), distinct is performed on all fields and each field must be a key type.

    Definition Classes
    DataSet
  27. def distinct[K](fun: (T) ⇒ K)(implicit arg0: TypeInformation[K]): DataSet[T]

    Creates a new DataSet containing the distinct elements of this DataSet. The decision whether two elements are distinct or not is made using the return value of the given function.

    fun

    The function which extracts the key values from the DataSet on which the distinction of the DataSet is decided.
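
    For example (an illustrative sketch; the key extractor is hypothetical):

    val people: DataSet[(String, Int)] = ...
    // keep one element per lower-cased name
    val unique = people.distinct { p => p._1.toLowerCase }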

    Definition Classes
    DataSet
  28. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  29. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  30. def filter(fun: (T) ⇒ Boolean): DataSet[T]

    Creates a new DataSet that contains only the elements satisfying the given filter predicate.

    Definition Classes
    DataSet
  31. def filter(filter: FilterFunction[T]): DataSet[T]

    Creates a new DataSet that contains only the elements satisfying the given filter predicate.

    Definition Classes
    DataSet
  32. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  33. def first(n: Int): DataSet[T]

    Creates a new DataSet containing the first n elements of this DataSet.

    Definition Classes
    DataSet
  34. def flatMap[R](fun: (T) ⇒ TraversableOnce[R])(implicit arg0: TypeInformation[R], arg1: ClassTag[R]): DataSet[R]

    Creates a new DataSet by applying the given function to every element and flattening the results.

    Definition Classes
    DataSet
  35. def flatMap[R](fun: (T, Collector[R]) ⇒ Unit)(implicit arg0: TypeInformation[R], arg1: ClassTag[R]): DataSet[R]

    Creates a new DataSet by applying the given function to every element and flattening the results.

    Definition Classes
    DataSet
  36. def flatMap[R](flatMapper: FlatMapFunction[T, R])(implicit arg0: TypeInformation[R], arg1: ClassTag[R]): DataSet[R]

    Creates a new DataSet by applying the given function to every element and flattening the results.

    Definition Classes
    DataSet
  37. def fullOuterJoin[O](other: DataSet[O], strategy: JoinHint): UnfinishedOuterJoinOperation[T, O]

    Special fullOuterJoin operation for explicitly telling the system what join strategy to use. If null is given as the join strategy, then the optimizer will pick the strategy.

    Definition Classes
    DataSet
  38. def fullOuterJoin[O](other: DataSet[O]): UnfinishedOuterJoinOperation[T, O]

    Creates a new DataSet by performing a full outer join of this DataSet with the other DataSet, by combining two elements of two DataSets on key equality. Elements of both DataSets that do not have a matching element on the opposing side are joined with null and emitted to the resulting DataSet.

    To specify the join keys the where and equalTo methods must be used. For example:

    val left: DataSet[(String, Int, Int)] = ...
    val right: DataSet[(Int, String, Int)] = ...
    val joined = left.fullOuterJoin(right).where(0).equalTo(1)

    When using an outer join you are required to specify a join function. For example:

    val joined = left.fullOuterJoin(right).where(0).equalTo(1) {
      (left, right) =>
        val a = if (left == null) null else left._1
        val b = if (right == null) null else right._3
        (a, b)
    }
    Definition Classes
    DataSet
  39. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  40. def getExecutionEnvironment: ExecutionEnvironment

    Returns the execution environment associated with the current DataSet.

    returns

    associated execution environment

    Definition Classes
    DataSet
  41. def getParallelism: Int

    Returns the parallelism of this operation.

    Definition Classes
    DataSet
  42. def getType(): TypeInformation[T]

    Returns the TypeInformation for the elements of this DataSet.

    Definition Classes
    DataSet
  43. def groupBy(firstField: String, otherFields: String*): GroupedDataSet[T]

    Creates a GroupedDataSet which provides operations on groups of elements. Elements are grouped based on the given fields.

    This will not create a new DataSet, it will just attach the field names which will be used for grouping when executing a grouped operation.

    Definition Classes
    DataSet
  44. def groupBy(fields: Int*): GroupedDataSet[T]

    Creates a GroupedDataSet which provides operations on groups of elements. Elements are grouped based on the given tuple fields.

    This will not create a new DataSet, it will just attach the tuple field positions which will be used for grouping when executing a grouped operation.

    This only works on Tuple DataSets.
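
    For example (a minimal word-count sketch, assuming import org.apache.flink.api.scala._):

    val words: DataSet[String] = ...
    val counts = words
      .map { w => (w, 1) }
      .groupBy(0)
      .sum(1)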

    Definition Classes
    DataSet
  45. def groupBy[K](fun: (T) ⇒ K)(implicit arg0: TypeInformation[K]): GroupedDataSet[T]

    Creates a GroupedDataSet which provides operations on groups of elements. Elements are grouped based on the value returned by the given function.

    This will not create a new DataSet, it will just attach the key function which will be used for grouping when executing a grouped operation.

    Definition Classes
    DataSet
  46. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  47. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  48. def iterate(maxIterations: Int)(stepFunction: (DataSet[T]) ⇒ DataSet[T]): DataSet[T]

    Creates a new DataSet by performing bulk iterations using the given step function. The iterations terminate when maxIterations iterations have been performed.

    For example:

    val input: DataSet[(String, Int)] = ...
    val iterated = input.iterate(5) { previous =>
      val next = previous.map { x => (x._1, x._2 + 1) }
      next
    }

    This example increases the second field of the tuple by 1 in each of the 5 iterations, i.e. by 5 in total.

    Definition Classes
    DataSet
  49. def iterateDelta[R](workset: DataSet[R], maxIterations: Int, keyFields: Array[String], solutionSetUnManaged: Boolean)(stepFunction: (DataSet[T], DataSet[R]) ⇒ (DataSet[T], DataSet[R]))(implicit arg0: ClassTag[R]): DataSet[T]

    Creates a new DataSet by performing delta (or workset) iterations using the given step function. At the beginning this DataSet is the solution set and workset is the Workset. The iteration step function gets the current solution set and workset and must output the delta for the solution set and the workset for the next iteration.

    Note: The syntax of delta iterations is very likely going to change soon.

    Definition Classes
    DataSet
  50. def iterateDelta[R](workset: DataSet[R], maxIterations: Int, keyFields: Array[String])(stepFunction: (DataSet[T], DataSet[R]) ⇒ (DataSet[T], DataSet[R]))(implicit arg0: ClassTag[R]): DataSet[T]

    Creates a new DataSet by performing delta (or workset) iterations using the given step function. At the beginning this DataSet is the solution set and workset is the Workset. The iteration step function gets the current solution set and workset and must output the delta for the solution set and the workset for the next iteration.

    Note: The syntax of delta iterations is very likely going to change soon.

    Definition Classes
    DataSet
  51. def iterateDelta[R](workset: DataSet[R], maxIterations: Int, keyFields: Array[Int], solutionSetUnManaged: Boolean)(stepFunction: (DataSet[T], DataSet[R]) ⇒ (DataSet[T], DataSet[R]))(implicit arg0: ClassTag[R]): DataSet[T]

    Creates a new DataSet by performing delta (or workset) iterations using the given step function. At the beginning this DataSet is the solution set and workset is the Workset. The iteration step function gets the current solution set and workset and must output the delta for the solution set and the workset for the next iteration.

    Note: The syntax of delta iterations is very likely going to change soon.

    Definition Classes
    DataSet
  52. def iterateDelta[R](workset: DataSet[R], maxIterations: Int, keyFields: Array[Int])(stepFunction: (DataSet[T], DataSet[R]) ⇒ (DataSet[T], DataSet[R]))(implicit arg0: ClassTag[R]): DataSet[T]

    Creates a new DataSet by performing delta (or workset) iterations using the given step function. At the beginning this DataSet is the solution set and workset is the Workset. The iteration step function gets the current solution set and workset and must output the delta for the solution set and the workset for the next iteration.

    Note: The syntax of delta iterations is very likely going to change soon.
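
    For example (an illustrative sketch; the types, key field and step logic are hypothetical, and import org.apache.flink.api.scala._ is assumed):

    val initialSolution: DataSet[(Long, Double)] = ...
    val initialWorkset: DataSet[(Long, Double)] = ...
    val result = initialSolution.iterateDelta(initialWorkset, 10, Array(0)) {
      (solution, workset) =>
        val delta = workset.map { x => (x._1, x._2 / 2) }
        // the delta updates the solution set and also forms the next workset
        (delta, delta)
    }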

    Definition Classes
    DataSet
  53. def iterateWithTermination(maxIterations: Int)(stepFunction: (DataSet[T]) ⇒ (DataSet[T], DataSet[_])): DataSet[T]

    Creates a new DataSet by performing bulk iterations using the given step function. The first DataSet the step function returns is the input for the next iteration, the second DataSet is the termination criterion. The iterations terminate when either the termination criterion DataSet contains no elements or when maxIterations iterations have been performed.

    For example:

    val input: DataSet[(String, Int)] = ...
    val iterated = input.iterateWithTermination(5) { previous =>
      val next = previous.map { x => (x._1, x._2 + 1) }
      val term = next.filter { _._2 < 3 }
      (next, term)
    }

    This example will simply increase the second field of the Tuples until they are no longer smaller than 3.

    Definition Classes
    DataSet
  54. def join[O](other: DataSet[O], strategy: JoinHint): UnfinishedJoinOperation[T, O]

    Special join operation for explicitly telling the system what join strategy to use. If null is given as the join strategy, then the optimizer will pick the strategy.

    Definition Classes
    DataSet
  55. def join[O](other: DataSet[O]): UnfinishedJoinOperation[T, O]

    Creates a new DataSet by joining this DataSet with the other DataSet. To specify the join keys the where and equalTo methods must be used. For example:

    val left: DataSet[(String, Int, Int)] = ...
    val right: DataSet[(Int, String, Int)] = ...
    val joined = left.join(right).where(0).equalTo(1)

    The default join result is a DataSet with 2-Tuples of the joined values. In the above example that would be ((String, Int, Int), (Int, String, Int)). A custom join function can be used if more control over the result is required. This can either be given as a lambda or a custom JoinFunction. For example:

    val left: DataSet[(String, Int, Int)] = ...
    val right: DataSet[(Int, String, Int)] = ...
    val joined = left.join(right).where(0).equalTo(1) { (l, r) =>
      (l._1, r._2)
    }

    A join function with a Collector can be used to implement a filter directly in the join or to output more than one value. This type of join function does not return a value; instead, values are emitted using the collector:

    val left: DataSet[(String, Int, Int)] = ...
    val right: DataSet[(Int, String, Int)] = ...
    val joined = left.join(right).where(0).equalTo(1) {
      (l, r, out: Collector[(String, Int)]) =>
        if (l._2 > 4) {
          out.collect((l._1, r._3))
          out.collect((l._1, r._1))
        }
      }
    Definition Classes
    DataSet
  56. def joinWithHuge[O](other: DataSet[O]): UnfinishedJoinOperation[T, O]

    Special join operation for explicitly telling the system that the left side is assumed to be a lot smaller than the right side of the join.

    Definition Classes
    DataSet
  57. def joinWithTiny[O](other: DataSet[O]): UnfinishedJoinOperation[T, O]

    Special join operation for explicitly telling the system that the right side is assumed to be a lot smaller than the left side of the join.

    Definition Classes
    DataSet
  58. def leftOuterJoin[O](other: DataSet[O], strategy: JoinHint): UnfinishedOuterJoinOperation[T, O]

    An outer join on the left side.

    Elements of the left side (i.e. this) that do not have a matching element on the other side are joined with null and emitted to the resulting DataSet.

    other

    The other DataSet with which this DataSet is joined.

    strategy

    The strategy that should be used to execute the join. If null is given, then the optimizer will pick the join strategy.

    returns

    An UnfinishedOuterJoinOperation to continue with the definition of the join transformation

    Definition Classes
    DataSet
    See also

    #fullOuterJoin

  59. def leftOuterJoin[O](other: DataSet[O]): UnfinishedOuterJoinOperation[T, O]

    An outer join on the left side.

    Elements of the left side (i.e. this) that do not have a matching element on the other side are joined with null and emitted to the resulting DataSet.

    other

    The other DataSet with which this DataSet is joined.

    returns

    An UnfinishedOuterJoinOperation to continue with the definition of the join transformation

    Definition Classes
    DataSet
    See also

    #fullOuterJoin

  60. def map[R](fun: (T) ⇒ R)(implicit arg0: TypeInformation[R], arg1: ClassTag[R]): DataSet[R]

    Creates a new DataSet by applying the given function to every element of this DataSet.

    Definition Classes
    DataSet
  61. def map[R](mapper: MapFunction[T, R])(implicit arg0: TypeInformation[R], arg1: ClassTag[R]): DataSet[R]

    Creates a new DataSet by applying the given function to every element of this DataSet.

    Definition Classes
    DataSet
  62. def mapPartition[R](fun: (Iterator[T]) ⇒ TraversableOnce[R])(implicit arg0: TypeInformation[R], arg1: ClassTag[R]): DataSet[R]

    Creates a new DataSet by applying the given function to each parallel partition of the DataSet.

    This function is intended for operations that cannot be applied to individual elements and that require no grouping of elements. To transform individual elements, the use of map and flatMap is preferable.
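
    For example (a minimal sketch counting elements per partition, assuming import org.apache.flink.api.scala._):

    val lines: DataSet[String] = ...
    val partitionSizes = lines.mapPartition { it => Seq(it.size) }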

    Definition Classes
    DataSet
  63. def mapPartition[R](fun: (Iterator[T], Collector[R]) ⇒ Unit)(implicit arg0: TypeInformation[R], arg1: ClassTag[R]): DataSet[R]

    Creates a new DataSet by applying the given function to each parallel partition of the DataSet.

    This function is intended for operations that cannot be applied to individual elements and that require no grouping of elements. To transform individual elements, the use of map and flatMap is preferable.

    Definition Classes
    DataSet
  64. def mapPartition[R](partitionMapper: MapPartitionFunction[T, R])(implicit arg0: TypeInformation[R], arg1: ClassTag[R]): DataSet[R]

    Creates a new DataSet by applying the given function to each parallel partition of the DataSet.

    This function is intended for operations that cannot be applied to individual elements and that require no grouping of elements. To transform individual elements, the use of map and flatMap is preferable.

    Definition Classes
    DataSet
  65. def max(field: String): AggregateDataSet[T]

    Syntactic sugar for aggregate with MAX

    Definition Classes
    DataSet
  66. def max(field: Int): AggregateDataSet[T]

    Syntactic sugar for aggregate with MAX

    Definition Classes
    DataSet
  67. def maxBy(fields: Int*): DataSet[T]

    Selects an element with maximum value.

    The maximum is computed over the specified fields in lexicographical order.

    Example 1: Given a data set with elements [0, 1], [1, 0], the results will be:

    maxBy(0) → [1, 0]
    maxBy(1) → [0, 1]

    Example 2: Given a data set with elements [0, 0], [0, 1], the results will be:

    maxBy(0, 1) → [0, 1]

    If multiple values with maximum value at the specified fields exist, a random one will be picked. Internally, this operation is implemented as a ReduceFunction.

    Definition Classes
    DataSet
  68. def min(field: String): AggregateDataSet[T]

    Syntactic sugar for aggregate with MIN

    Definition Classes
    DataSet
  69. def min(field: Int): AggregateDataSet[T]

    Syntactic sugar for aggregate with MIN

    Definition Classes
    DataSet
  70. def minBy(fields: Int*): DataSet[T]

    Selects an element with minimum value.

    The minimum is computed over the specified fields in lexicographical order.

    Example 1: Given a data set with elements [0, 1], [1, 0], the results will be:

    minBy(0) → [0, 1]
    minBy(1) → [1, 0]

    Example 2: Given a data set with elements [0, 0], [0, 1], the results will be:

    minBy(0, 1) → [0, 0]

    If multiple values with minimum value at the specified fields exist, a random one will be picked. Internally, this operation is implemented as a ReduceFunction.

    Definition Classes
    DataSet
  71. def minResources: ResourceSpec

    Returns the minimum resources of this operation.

    Definition Classes
    DataSet
    Annotations
    @PublicEvolving()
  72. def name(name: String): DataSet[T]

    Sets the name of the DataSet. This will appear in logs and graphical representations of the execution graph.

    Definition Classes
    DataSet
  73. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  74. final def notify(): Unit

    Definition Classes
    AnyRef
  75. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  76. def output(outputFormat: OutputFormat[T]): DataSink[T]

    Emits this DataSet using a custom org.apache.flink.api.common.io.OutputFormat.

    Definition Classes
    DataSet
  77. def partitionByHash[K](fun: (T) ⇒ K)(implicit arg0: TypeInformation[K]): DataSet[T]

    Partitions a DataSet using the specified key selector function.

    Important: This operation shuffles the whole DataSet over the network and can take a significant amount of time.

    Definition Classes
    DataSet
  78. def partitionByHash(firstField: String, otherFields: String*): DataSet[T]

    Hash-partitions a DataSet on the specified fields.

    Important: This operation shuffles the whole DataSet over the network and can take a significant amount of time.

    Definition Classes
    DataSet
  79. def partitionByHash(fields: Int*): DataSet[T]

    Hash-partitions a DataSet on the specified tuple field positions.

    Important: This operation shuffles the whole DataSet over the network and can take a significant amount of time.

    Definition Classes
    DataSet
  80. def partitionByRange[K](fun: (T) ⇒ K)(implicit arg0: TypeInformation[K]): DataSet[T]

    Range-partitions a DataSet using the specified key selector function.

    Important: This operation requires an extra pass over the DataSet to compute the range boundaries and shuffles the whole DataSet over the network. This can take a significant amount of time.

    Definition Classes
    DataSet
  81. def partitionByRange(firstField: String, otherFields: String*): DataSet[T]

    Range-partitions a DataSet on the specified fields.

    Important: This operation requires an extra pass over the DataSet to compute the range boundaries and shuffles the whole DataSet over the network. This can take a significant amount of time.

    Definition Classes
    DataSet
  82. def partitionByRange(fields: Int*): DataSet[T]

    Range-partitions a DataSet on the specified tuple field positions.

    Important: This operation requires an extra pass over the DataSet to compute the range boundaries and shuffles the whole DataSet over the network. This can take a significant amount of time.

    Definition Classes
    DataSet
  83. def partitionCustom[K](partitioner: Partitioner[K], fun: (T) ⇒ K)(implicit arg0: TypeInformation[K]): DataSet[T]

    Partitions a DataSet on the key returned by the selector, using a custom partitioner. This method takes the key selector to get the key to partition on, and a partitioner that accepts the key type.

    Note: This method works only on single field keys, i.e. the selector cannot return tuples of fields.
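
    For example (an illustrative sketch; the modulo partitioner is hypothetical, and Partitioner is imported from org.apache.flink.api.common.functions):

    val byMod = new Partitioner[Int] {
      override def partition(key: Int, numPartitions: Int): Int =
        math.abs(key % numPartitions)
    }
    val data: DataSet[(Int, String)] = ...
    val partitioned = data.partitionCustom(byMod, _._1)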

    Definition Classes
    DataSet
  84. def partitionCustom[K](partitioner: Partitioner[K], field: String)(implicit arg0: TypeInformation[K]): DataSet[T]

    Partitions a POJO DataSet on the specified key fields using a custom partitioner. This method takes the key expression to partition on, and a partitioner that accepts the key type.

    Note: This method works only on single field keys.

    Definition Classes
    DataSet
  85. def partitionCustom[K](partitioner: Partitioner[K], field: Int)(implicit arg0: TypeInformation[K]): DataSet[T]

    Partitions a tuple DataSet on the specified key fields using a custom partitioner. This method takes the key position to partition on, and a partitioner that accepts the key type.

    Note: This method works only on single field keys.

    Definition Classes
    DataSet
  86. def preferredResources: ResourceSpec

    Returns the preferred resources of this operation.

    Definition Classes
    DataSet
    Annotations
    @PublicEvolving()
  87. def print(): Unit

    Prints the elements in a DataSet to the standard output stream System.out of the JVM that calls the print() method. For programs that are executed in a cluster, this method needs to gather the contents of the DataSet back to the client, to print it there.

    The string written for each element is defined by the AnyRef.toString method.

    This method immediately triggers the program execution, similar to the collect() and count() methods.

    Definition Classes
    DataSet
  88. def printOnTaskManager(prefix: String): DataSink[T]

    Writes a DataSet to the standard output streams (stdout) of the TaskManagers that execute the program (or more specifically, the data sink operators). On a typical cluster setup, the data will appear in the TaskManagers' .out files.

    To print the data to the console or stdout stream of the client process instead, use the print() method.

    For each element of the DataSet the result of AnyRef.toString() is written.

    prefix

    The string to prefix each line of the output with. This helps to identify outputs from different printing sinks.

    returns

    The DataSink operator that writes the DataSet.

    Definition Classes
    DataSet
  89. def printToErr(): Unit

    Prints the elements in a DataSet to the standard error stream System.err of the JVM that calls the printToErr() method. For programs that are executed in a cluster, this method needs to gather the contents of the DataSet back to the client, to print it there.

    The string written for each element is defined by the AnyRef.toString method.

    This method immediately triggers the program execution, similar to the collect() and count() methods.

    Definition Classes
    DataSet
  90. def rebalance(): DataSet[T]

    Enforces a re-balancing of the DataSet, i.e., the DataSet is evenly distributed over all parallel instances of the following task. This can help to improve performance in case of heavy data skew and compute-intensive operations.

    Important: This operation shuffles the whole DataSet over the network and can take a significant amount of time.

    returns

    The rebalanced DataSet.

    Definition Classes
    DataSet
  91. def reduce(fun: (T, T) ⇒ T): DataSet[T]

    Creates a new DataSet by merging the elements of this DataSet using an associative reduce function.

    Definition Classes
    DataSet
  92. def reduce(reducer: ReduceFunction[T]): DataSet[T]

    Creates a new DataSet by merging the elements of this DataSet using an associative reduce function.

    Definition Classes
    DataSet
  93. def reduceGroup[R](fun: (Iterator[T]) ⇒ R)(implicit arg0: TypeInformation[R], arg1: ClassTag[R]): DataSet[R]

    Creates a new DataSet by passing all elements in this DataSet to the group reduce function.

    Definition Classes
    DataSet
  94. def reduceGroup[R](fun: (Iterator[T], Collector[R]) ⇒ Unit)(implicit arg0: TypeInformation[R], arg1: ClassTag[R]): DataSet[R]

    Creates a new DataSet by passing all elements in this DataSet to the group reduce function. The function can output zero or more elements using the Collector. The concatenation of the emitted values will form the resulting DataSet.
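
    For example (a minimal sketch emitting two summary records; assuming import org.apache.flink.api.scala._):

    val numbers: DataSet[Int] = ...
    val stats = numbers.reduceGroup {
      (it: Iterator[Int], out: Collector[(String, Int)]) =>
        val values = it.toSeq
        out.collect(("sum", values.sum))
        out.collect(("count", values.size))
    }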

    Definition Classes
    DataSet
  95. def reduceGroup[R](reducer: GroupReduceFunction[T, R])(implicit arg0: TypeInformation[R], arg1: ClassTag[R]): DataSet[R]

    Creates a new DataSet by passing all elements in this DataSet to the group reduce function. The function can output zero or more elements using the Collector. The concatenation of the emitted values will form the resulting DataSet.

    Definition Classes
    DataSet
  96. def registerAggregator(name: String, aggregator: Aggregator[_]): DataSet[T]

    Registers an org.apache.flink.api.common.aggregators.Aggregator for the iteration. Aggregators can be used to maintain simple statistics during the iteration, such as number of elements processed. The aggregators compute global aggregates: After each iteration step, the values are globally aggregated to produce one aggregate that represents statistics across all parallel instances. The value of an aggregator can be accessed in the next iteration.

    Aggregators can be accessed inside a function via org.apache.flink.api.common.functions.AbstractRichFunction#getIterationRuntimeContext.

    name

    The name under which the aggregator is registered.

    aggregator

    The aggregator to register.

    Definition Classes
    DataSet
    Annotations
    @PublicEvolving()
  97. def rightOuterJoin[O](other: DataSet[O], strategy: JoinHint): UnfinishedOuterJoinOperation[T, O]

    An outer join on the right side.

    Elements of the right side (i.e. other) that do not have a matching element on this side are joined with null and emitted to the resulting DataSet.

    other

    The other DataSet with which this DataSet is joined.

    strategy

    The strategy that should be used to execute the join. If null is given, then the optimizer will pick the join strategy.

    returns

    An UnfinishedOuterJoinOperation to continue with the definition of the join transformation

    Definition Classes
    DataSet
    See also

    #fullOuterJoin

  98. def rightOuterJoin[O](other: DataSet[O]): UnfinishedOuterJoinOperation[T, O]

    An outer join on the right side.

    Elements of the right side (i.e. other) that do not have a matching element on this side are joined with null and emitted to the resulting DataSet.

    other

    The other DataSet with which this DataSet is joined.

    returns

    An UnfinishedOuterJoinOperation to continue with the definition of the join transformation

    Definition Classes
    DataSet
    See also

    #fullOuterJoin

  99. def setParallelism(parallelism: Int): DataSet[T]

    Sets the parallelism of this operation. This must be at least 1.

    Definition Classes
    DataSet
  100. def sortPartition[K](fun: (T) ⇒ K, order: Order)(implicit arg0: TypeInformation[K]): DataSet[T]

    Locally sorts the partitions of the DataSet on the extracted key in the specified order. The DataSet can be sorted on multiple values by returning a tuple from the KeySelector.

    Note that no additional sort keys can be appended to a KeySelector sort. To sort the partitions by multiple values using a KeySelector, the KeySelector must return a tuple consisting of those values.
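
    For example (an illustrative sketch sorting each partition by two values; Order is imported from org.apache.flink.api.common.operators and import org.apache.flink.api.scala._ is assumed):

    val data: DataSet[(String, Int)] = ...
    val sorted = data.sortPartition(t => (t._1, t._2), Order.ASCENDING)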

    Definition Classes
    DataSet
  101. def sortPartition(field: String, order: Order): DataSet[T]

    Locally sorts the partitions of the DataSet on the specified field in the specified order. The DataSet can be sorted on multiple fields by chaining sortPartition() calls.

    Definition Classes
    DataSet
  102. def sortPartition(field: Int, order: Order): DataSet[T]

    Locally sorts the partitions of the DataSet on the specified field in the specified order. The DataSet can be sorted on multiple fields by chaining sortPartition() calls.

    Definition Classes
    DataSet
  103. def sum(field: String): AggregateDataSet[T]

    Syntactic sugar for aggregate with SUM

    Definition Classes
    DataSet
  104. def sum(field: Int): AggregateDataSet[T]

    Syntactic sugar for aggregate with SUM

    Definition Classes
    DataSet
  105. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  106. def toString(): String

    Definition Classes
    AnyRef → Any
  107. def union(other: DataSet[T]): DataSet[T]

    Creates a new DataSet containing the elements from both this DataSet and the other DataSet.

    Definition Classes
    DataSet
  108. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  109. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  110. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  111. def withBroadcastSet(data: DataSet[_], name: String): DataSet[T]

    Adds a certain data set as a broadcast set to this operator. Broadcast data sets are available at all parallel instances of this operator. A broadcast data set is registered under a certain name, and can be retrieved under that name from the operator's runtime context via org.apache.flink.api.common.functions.RuntimeContext.getBroadcastVariable(String).

    The runtime context itself is available in all UDFs via org.apache.flink.api.common.functions.AbstractRichFunction#getRuntimeContext()
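
    For example (an illustrative sketch; the broadcast set name "factors" and the map logic are hypothetical, and imports of RichMapFunction, Configuration and org.apache.flink.api.scala._ are assumed):

    val toBroadcast: DataSet[Int] = ...
    val data: DataSet[String] = ...
    val result = data
      .map(new RichMapFunction[String, String] {
        private var factors: java.util.List[Int] = _
        override def open(parameters: Configuration): Unit = {
          factors = getRuntimeContext.getBroadcastVariable[Int]("factors")
        }
        override def map(value: String): String = s"$value/${factors.size}"
      })
      .withBroadcastSet(toBroadcast, "factors")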

    data

    The data set to be broadcast.

    name

    The name under which the broadcast data set is registered and can be retrieved.

    returns

    The operator itself, to allow chaining function calls.

    Definition Classes
    DataSet
  112. def withForwardedFields(forwardedFields: String*): DataSet[T]

    Definition Classes
    DataSet
  113. def withForwardedFieldsFirst(forwardedFields: String*): DataSet[T]

    Definition Classes
    DataSet
  114. def withForwardedFieldsSecond(forwardedFields: String*): DataSet[T]

    Definition Classes
    DataSet
  115. def withParameters(parameters: Configuration): DataSet[T]

    Definition Classes
    DataSet
  116. def write(outputFormat: FileOutputFormat[T], filePath: String, writeMode: WriteMode = null): DataSink[T]

    Writes this DataSet to the specified location using a custom org.apache.flink.api.common.io.FileOutputFormat.

    Definition Classes
    DataSet
  117. def writeAsCsv(filePath: String, rowDelimiter: String = ..., fieldDelimiter: String = ..., writeMode: WriteMode = null): DataSink[T]

    Writes this DataSet to the specified location as CSV file(s).

    This only works on Tuple DataSets. For individual tuple fields AnyRef.toString is used.
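
    For example (an illustrative sketch; the path and delimiters are hypothetical):

    val pairs: DataSet[(String, Int)] = ...
    pairs.writeAsCsv("file:///tmp/out", rowDelimiter = "\n", fieldDelimiter = ",")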

    Definition Classes
    DataSet
    See also

    org.apache.flink.api.java.DataSet#writeAsText(String)

  118. def writeAsText(filePath: String, writeMode: WriteMode = null): DataSink[T]

    Writes this DataSet to the specified location. This uses AnyRef.toString on each element.

    Definition Classes
    DataSet
    See also

    org.apache.flink.api.java.DataSet#writeAsText(String)

Deprecated Value Members

  1. def print(sinkIdentifier: String): DataSink[T]

    Writes a DataSet to the standard output stream (stdout) with a sink identifier prefixed. This uses AnyRef.toString on each element.

    sinkIdentifier

    The string to prefix the output with.

    Definition Classes
    DataSet
    Annotations
    @deprecated @PublicEvolving()
    Deprecated
  2. def printToErr(sinkIdentifier: String): DataSink[T]

    Writes a DataSet to the standard error stream (stderr) with a sink identifier prefixed. This uses AnyRef.toString on each element.

    sinkIdentifier

    The string to prefix the output with.

    Definition Classes
    DataSet
    Annotations
    @deprecated @PublicEvolving()
    Deprecated

Inherited from DataSet[T]

Inherited from AnyRef

Inherited from Any
