public class GroupedDataSet<T> extends Object
DataSet
to which a grouping key was added. Operations work on groups of elements with the
same key (aggregate
, reduce
, and reduceGroup
).
A secondary sort order can be added with sortGroup, but this is only used when using one
of the group-at-a-time operations, i.e. reduceGroup
.
Constructor and Description |
---|
GroupedDataSet(DataSet<T> set,
Keys<T> keys,
scala.reflect.ClassTag<T> evidence$1) |
Modifier and Type | Method and Description |
---|---|
AggregateDataSet<T> |
aggregate(Aggregations agg,
int field)
Creates a new
DataSet by aggregating the specified field using the given aggregation
function. |
AggregateDataSet<T> |
aggregate(Aggregations agg,
String field)
Creates a new
DataSet by aggregating the specified tuple field using the given aggregation
function. |
<R> DataSet<R> |
combineGroup(scala.Function2<scala.collection.Iterator<T>,Collector<R>,scala.runtime.BoxedUnit> fun,
TypeInformation<R> evidence$10,
scala.reflect.ClassTag<R> evidence$11)
Applies a CombineFunction on a grouped
DataSet . |
<R> DataSet<R> |
combineGroup(GroupCombineFunction<T,R> combiner,
TypeInformation<R> evidence$12,
scala.reflect.ClassTag<R> evidence$13)
Applies a CombineFunction on a grouped
DataSet . |
DataSet<T> |
first(int n)
Creates a new DataSet containing the first
n elements of each group of this DataSet. |
<K> Partitioner<K> |
getCustomPartitioner()
Gets the custom partitioner to be used for this grouping, or null, if
none was defined.
|
AggregateDataSet<T> |
max(int field)
Syntactic sugar for
aggregate with MAX |
AggregateDataSet<T> |
max(String field)
Syntactic sugar for
aggregate with MAX |
AggregateDataSet<T> |
min(int field)
Syntactic sugar for
aggregate with MIN |
AggregateDataSet<T> |
min(String field)
Syntactic sugar for
aggregate with MIN |
DataSet<T> |
reduce(scala.Function2<T,T,T> fun)
Creates a new
DataSet by merging the elements of each group (elements with the same key)
using an associative reduce function. |
DataSet<T> |
reduce(ReduceFunction<T> reducer)
Creates a new
DataSet by merging the elements of each group (elements with the same key)
using an associative reduce function. |
<R> DataSet<R> |
reduceGroup(scala.Function1<scala.collection.Iterator<T>,R> fun,
TypeInformation<R> evidence$4,
scala.reflect.ClassTag<R> evidence$5)
Creates a new
DataSet by passing for each group (elements with the same key) the list
of elements to the group reduce function. |
<R> DataSet<R> |
reduceGroup(scala.Function2<scala.collection.Iterator<T>,Collector<R>,scala.runtime.BoxedUnit> fun,
TypeInformation<R> evidence$6,
scala.reflect.ClassTag<R> evidence$7)
Creates a new
DataSet by passing for each group (elements with the same key) the list
of elements to the group reduce function. |
<R> DataSet<R> |
reduceGroup(GroupReduceFunction<T,R> reducer,
TypeInformation<R> evidence$8,
scala.reflect.ClassTag<R> evidence$9)
Creates a new
DataSet by passing for each group (elements with the same key) the list
of elements to the GroupReduceFunction . |
<K> GroupedDataSet<T> |
sortGroup(scala.Function1<T,K> fun,
Order order,
TypeInformation<K> evidence$2)
Adds a secondary sort key to this
GroupedDataSet . |
GroupedDataSet<T> |
sortGroup(int field,
Order order)
Adds a secondary sort key to this
GroupedDataSet . |
GroupedDataSet<T> |
sortGroup(String field,
Order order)
Adds a secondary sort key to this
GroupedDataSet . |
AggregateDataSet<T> |
sum(int field)
Syntactic sugar for
aggregate with SUM |
AggregateDataSet<T> |
sum(String field)
Syntactic sugar for
aggregate with SUM |
<K> GroupedDataSet<T> |
withPartitioner(Partitioner<K> partitioner,
TypeInformation<K> evidence$3)
Sets a custom partitioner for the grouping.
|
public GroupedDataSet<T> sortGroup(int field, Order order)
GroupedDataSet
. This will only have an effect if you
use one of the group-at-a-time, i.e. reduceGroup
.
This only works on Tuple DataSets.
public GroupedDataSet<T> sortGroup(String field, Order order)
GroupedDataSet
. This will only have an effect if you
use one of the group-at-a-time, i.e. reduceGroup
.
This only works on CaseClass DataSets.
public <K> GroupedDataSet<T> sortGroup(scala.Function1<T,K> fun, Order order, TypeInformation<K> evidence$2)
GroupedDataSet
. This will only have an effect if you
use one of the group-at-a-time, i.e. reduceGroup
.
This works on any data type.
public <K> GroupedDataSet<T> withPartitioner(Partitioner<K> partitioner, TypeInformation<K> evidence$3)
public <K> Partitioner<K> getCustomPartitioner()
public AggregateDataSet<T> aggregate(Aggregations agg, String field)
DataSet
by aggregating the specified tuple field using the given aggregation
function. Since this is a keyed DataSet the aggregation will be performed on groups of
tuples with the same key.
This only works on Tuple DataSets.
public AggregateDataSet<T> aggregate(Aggregations agg, int field)
DataSet
by aggregating the specified field using the given aggregation
function. Since this is a keyed DataSet the aggregation will be performed on groups of
elements with the same key.
This only works on CaseClass DataSets.
public AggregateDataSet<T> sum(int field)
aggregate
with SUM
public AggregateDataSet<T> max(int field)
aggregate
with MAX
public AggregateDataSet<T> min(int field)
aggregate
with MIN
public AggregateDataSet<T> sum(String field)
aggregate
with SUM
public AggregateDataSet<T> max(String field)
aggregate
with MAX
public AggregateDataSet<T> min(String field)
aggregate
with MIN
public DataSet<T> reduce(scala.Function2<T,T,T> fun)
DataSet
by merging the elements of each group (elements with the same key)
using an associative reduce function.public DataSet<T> reduce(ReduceFunction<T> reducer)
DataSet
by merging the elements of each group (elements with the same key)
using an associative reduce function.public <R> DataSet<R> reduceGroup(scala.Function1<scala.collection.Iterator<T>,R> fun, TypeInformation<R> evidence$4, scala.reflect.ClassTag<R> evidence$5)
public <R> DataSet<R> reduceGroup(scala.Function2<scala.collection.Iterator<T>,Collector<R>,scala.runtime.BoxedUnit> fun, TypeInformation<R> evidence$6, scala.reflect.ClassTag<R> evidence$7)
public <R> DataSet<R> reduceGroup(GroupReduceFunction<T,R> reducer, TypeInformation<R> evidence$8, scala.reflect.ClassTag<R> evidence$9)
public <R> DataSet<R> combineGroup(scala.Function2<scala.collection.Iterator<T>,Collector<R>,scala.runtime.BoxedUnit> fun, TypeInformation<R> evidence$10, scala.reflect.ClassTag<R> evidence$11)
DataSet
. A
CombineFunction is similar to a GroupReduceFunction but does not
perform a full data exchange. Instead, the CombineFunction calls
the combine method once per partition for combining a group of
results. This operator is suitable for combining values into an
intermediate format before doing a proper groupReduce where the
data is shuffled across the node for further reduction. The
GroupReduce operator can also be supplied with a combiner by
implementing the RichGroupReduce function. The combine method of
the RichGroupReduce function demands input and output type to be
the same. The CombineFunction, on the other side, can have an
arbitrary output type.public <R> DataSet<R> combineGroup(GroupCombineFunction<T,R> combiner, TypeInformation<R> evidence$12, scala.reflect.ClassTag<R> evidence$13)
DataSet
. A
CombineFunction is similar to a GroupReduceFunction but does not
perform a full data exchange. Instead, the CombineFunction calls
the combine method once per partition for combining a group of
results. This operator is suitable for combining values into an
intermediate format before doing a proper groupReduce where the
data is shuffled across the node for further reduction. The
GroupReduce operator can also be supplied with a combiner by
implementing the RichGroupReduce function. The combine method of
the RichGroupReduce function demands input and output type to be
the same. The CombineFunction, on the other side, can have an
arbitrary output type.Copyright © 2014–2017 The Apache Software Foundation. All rights reserved.