public class CoGroupDataSet<L,R> extends DataSet<scala.Tuple2<Object,Object>>
DataSet
that results from a coGroup
operation. The result of a default coGroup
is a tuple containing two arrays of values from the two sides of the coGroup. The result of the
coGroup can be changed by specifying a custom coGroup function using the apply
method or by
providing a RichCoGroupFunction
.
Example:
val left = ...
val right = ...
val coGroupResult = left.coGroup(right).where(0, 2).isEqualTo(0, 1) {
(left, right) => new MyCoGroupResult(left.min, right.max)
}
Or, using key selector functions with tuple data types:
val left = ...
val right = ...
val coGroupResult = left.coGroup(right).where({_._1}).isEqualTo({_._1) {
(left, right) => new MyCoGroupResult(left.max, right.min)
}
Constructor and Description |
---|
CoGroupDataSet(CoGroupOperator<L,R,scala.Tuple2<Object,Object>> defaultCoGroup,
DataSet<L> leftInput,
DataSet<R> rightInput,
Keys<L> leftKeys,
Keys<R> rightKeys) |
Modifier and Type | Method and Description |
---|---|
<O> DataSet<O> |
apply(CoGroupFunction<L,R,O> coGrouper,
TypeInformation<O> evidence$5,
scala.reflect.ClassTag<O> evidence$6)
Creates a new
DataSet by passing each pair of co-grouped element lists to the given
function. |
<O> DataSet<O> |
apply(scala.Function2<scala.collection.Iterator<L>,scala.collection.Iterator<R>,O> fun,
TypeInformation<O> evidence$1,
scala.reflect.ClassTag<O> evidence$2)
Creates a new
DataSet where the result for each pair of co-grouped element lists is the
result of the given function. |
<O> DataSet<O> |
apply(scala.Function3<scala.collection.Iterator<L>,scala.collection.Iterator<R>,Collector<O>,scala.runtime.BoxedUnit> fun,
TypeInformation<O> evidence$3,
scala.reflect.ClassTag<O> evidence$4)
Creates a new
DataSet where the result for each pair of co-grouped element lists is the
result of the given function. |
<K> Partitioner<K> |
getPartitioner()
Gets the custom partitioner used by this join, or null, if none is set.
|
CoGroupDataSet<L,R> |
sortFirstGroup(int field,
Order order)
Adds a secondary sort key to the first input of this
CoGroupDataSet . |
CoGroupDataSet<L,R> |
sortFirstGroup(String field,
Order order)
Adds a secondary sort key to the first input of this
CoGroupDataSet . |
CoGroupDataSet<L,R> |
sortSecondGroup(int field,
Order order)
Adds a secondary sort key to the second input of this
CoGroupDataSet . |
CoGroupDataSet<L,R> |
sortSecondGroup(String field,
Order order)
Adds a secondary sort key to the second input of this
CoGroupDataSet . |
<K> CoGroupDataSet<L,R> |
withPartitioner(Partitioner<K> partitioner,
TypeInformation<K> evidence$7) |
aggregate, aggregate, clean, coGroup, collect, combineGroup, combineGroup, count, cross, crossWithHuge, crossWithTiny, distinct, distinct, distinct, distinct, filter, filter, first, flatMap, flatMap, flatMap, fullOuterJoin, fullOuterJoin, getExecutionEnvironment, getParallelism, getType, groupBy, groupBy, groupBy, iterate, iterateDelta, iterateDelta, iterateDelta, iterateDelta, iterateWithTermination, javaSet, join, join, joinWithHuge, joinWithTiny, leftOuterJoin, leftOuterJoin, map, map, mapPartition, mapPartition, mapPartition, max, max, min, min, name, output, partitionByHash, partitionByHash, partitionByHash, partitionByRange, partitionByRange, partitionByRange, partitionCustom, partitionCustom, partitionCustom, print, print, printOnTaskManager, printToErr, printToErr, rebalance, reduce, reduce, reduceGroup, reduceGroup, reduceGroup, registerAggregator, rightOuterJoin, rightOuterJoin, setParallelism, sortPartition, sortPartition, sortPartition, sum, sum, union, withBroadcastSet, withForwardedFields, withForwardedFieldsFirst, withForwardedFieldsSecond, withParameters, write, writeAsCsv, writeAsText
public <O> DataSet<O> apply(scala.Function2<scala.collection.Iterator<L>,scala.collection.Iterator<R>,O> fun, TypeInformation<O> evidence$1, scala.reflect.ClassTag<O> evidence$2)
DataSet
where the result for each pair of co-grouped element lists is the
result of the given function.public <O> DataSet<O> apply(scala.Function3<scala.collection.Iterator<L>,scala.collection.Iterator<R>,Collector<O>,scala.runtime.BoxedUnit> fun, TypeInformation<O> evidence$3, scala.reflect.ClassTag<O> evidence$4)
DataSet
where the result for each pair of co-grouped element lists is the
result of the given function. The function can output zero or more elements using the
Collector
which will form the result.public <O> DataSet<O> apply(CoGroupFunction<L,R,O> coGrouper, TypeInformation<O> evidence$5, scala.reflect.ClassTag<O> evidence$6)
DataSet
by passing each pair of co-grouped element lists to the given
function. The function can output zero or more elements using the Collector
which will form
the result.
A RichCoGroupFunction
can be used to access the
broadcast variables and the RuntimeContext
.
public <K> CoGroupDataSet<L,R> withPartitioner(Partitioner<K> partitioner, TypeInformation<K> evidence$7)
public <K> Partitioner<K> getPartitioner()
public CoGroupDataSet<L,R> sortFirstGroup(int field, Order order)
CoGroupDataSet
.
This only works on Tuple DataSets.
public CoGroupDataSet<L,R> sortFirstGroup(String field, Order order)
CoGroupDataSet
.public CoGroupDataSet<L,R> sortSecondGroup(int field, Order order)
CoGroupDataSet
.
This only works on Tuple DataSets.
public CoGroupDataSet<L,R> sortSecondGroup(String field, Order order)
CoGroupDataSet
.Copyright © 2014–2017 The Apache Software Foundation. All rights reserved.