Type Parameters:
T - The type of the DataSource on which the SplitDataProperties are defined.

@PublicEvolving
public class SplitDataProperties<T>
extends Object
implements GenericDataSourceBase.SplitDataProperties<T>
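SplitDataProperties define data properties on InputSplits generated by the InputFormat of a DataSource.

InputSplits are units of input which are distributed among and assigned to parallel data source subtasks. SplitDataProperties can define that the elements which are generated by the associated InputFormat are

- partitioned on one or more fields across InputSplits, i.e., all elements with the same (combination of) key(s) are located in the same input split;
- grouped on one or more fields within an InputSplit, i.e., all elements of an input split that have the same (combination of) key(s) are emitted one after the other;
- ordered on one or more fields within an InputSplit, i.e., all elements within an input split are emitted in the defined order.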
IMPORTANT: SplitDataProperties can improve the execution of a program because certain data reorganization steps such as shuffling or sorting can be avoided. HOWEVER, if SplitDataProperties are not correctly defined, the result of the program might be wrong!
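Because a wrong declaration silently produces wrong results, it can help to check a partitioning claim against sample splits before declaring it. The following is a minimal sketch, not part of Flink: it assumes each split can be materialized as a list of rows, and that each row is an `Object[]` indexed by field position (mirroring the int field positions used by `splitsPartitionedBy(int...)`). The names `PartitionedCheck`, `split`, `key` and `isPartitionedBy` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Flink. A split is modeled as a list of rows,
// and a row as an Object[] indexed by field position.
final class PartitionedCheck {

    // Builds one split from the given rows.
    static List<Object[]> split(Object[]... rows) {
        return Arrays.asList(rows);
    }

    // Projects the key fields of a row; List.equals compares them field by field.
    static List<Object> key(Object[] row, int[] fields) {
        List<Object> k = new ArrayList<>(fields.length);
        for (int f : fields) {
            k.add(row[f]);
        }
        return k;
    }

    // True if every key combination occurs in at most one split, which is the
    // property that splitsPartitionedBy(int...) asserts about the input splits.
    static boolean isPartitionedBy(List<List<Object[]>> splits, int... fields) {
        Map<List<Object>, Integer> splitOfKey = new HashMap<>();
        for (int i = 0; i < splits.size(); i++) {
            for (Object[] row : splits.get(i)) {
                Integer first = splitOfKey.putIfAbsent(key(row, fields), i);
                if (first != null && first != i) {
                    return false; // the same key combination occurs in two splits
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<Object[]>> splits = Arrays.asList(
                split(new Object[] {"a", 1}, new Object[] {"a", 2}),
                split(new Object[] {"b", 1}));
        System.out.println(isPartitionedBy(splits, 0)); // true
        System.out.println(isPartitionedBy(splits, 1)); // false: 1 occurs in both splits
    }
}
```

A key combination may repeat freely inside one split; only its appearance in a second split breaks the property.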
See Also:
InputSplit, InputFormat, DataSource
Modifier and Type | Class and Description |
---|---|
static class | SplitDataProperties.SourcePartitionerMarker<T>: A custom partitioner to mark compatible split partitionings. |
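The table only says that SourcePartitionerMarker marks compatible split partitionings. A minimal sketch of that idea, assuming compatibility means "same partition-method ID" (the ID passed to `splitsPartitionedBy(String partitionMethodId, ...)`) and that the marker is never used to route records. `KeyPartitioner` is a simplified interface declared locally so the sketch compiles on its own; it is not Flink's `Partitioner`. `MethodIdMarker` and the IDs in `main` are hypothetical.

```java
import java.util.Objects;

// Simplified stand-in for a partitioner interface, declared locally so the
// sketch compiles on its own. It is NOT Flink's Partitioner.
interface KeyPartitioner<K> {
    int partition(K key, int numPartitions);
}

// Hypothetical marker partitioner. It carries only the ID of the method that
// was used to partition the data across splits; it never routes records.
final class MethodIdMarker<K> implements KeyPartitioner<K> {
    private final String partitionMethodId;

    MethodIdMarker(String partitionMethodId) {
        this.partitionMethodId = Objects.requireNonNull(partitionMethodId);
    }

    @Override
    public int partition(K key, int numPartitions) {
        // The splits are already partitioned; invoking the marker is a bug.
        throw new UnsupportedOperationException("marker only, must not be invoked");
    }

    // Two markers are equal (the partitionings are treated as compatible)
    // exactly when they carry the same partition-method ID.
    @Override
    public boolean equals(Object o) {
        return o instanceof MethodIdMarker
                && partitionMethodId.equals(((MethodIdMarker<?>) o).partitionMethodId);
    }

    @Override
    public int hashCode() {
        return partitionMethodId.hashCode();
    }

    public static void main(String[] args) {
        MethodIdMarker<Long> users = new MethodIdMarker<>("hash-by-user-id");
        MethodIdMarker<Long> orders = new MethodIdMarker<>("hash-by-user-id");
        MethodIdMarker<Long> other = new MethodIdMarker<>("range-by-date");
        System.out.println(users.equals(orders)); // true: same method ID
        System.out.println(users.equals(other));  // false: different method IDs
    }
}
```

Comparing by ID lets two independently created sources be recognized as partitioned the same way without inspecting any data; `equals` and `hashCode` are overridden together so the markers also behave correctly as map keys.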
Constructor and Description |
---|
SplitDataProperties(DataSource<T> source): Creates SplitDataProperties for the given DataSource. |
SplitDataProperties(TypeInformation<T> type): Creates SplitDataProperties for the given data type. |
Modifier and Type | Method and Description |
---|---|
int[] | getSplitGroupKeys() |
Ordering | getSplitOrder() |
Partitioner<T> | getSplitPartitioner() |
int[] | getSplitPartitionKeys() |
SplitDataProperties<T> | splitsGroupedBy(int... groupFields): Defines that the data within an input split is grouped on the fields defined by the field positions. |
SplitDataProperties<T> | splitsGroupedBy(String groupFields): Defines that the data within an input split is grouped on the fields defined by the field expressions. |
SplitDataProperties<T> | splitsOrderedBy(int[] orderFields, Order[] orders): Defines that the data within an input split is sorted on the fields defined by the field positions in the specified orders. |
SplitDataProperties<T> | splitsOrderedBy(String orderFields, Order[] orders): Defines that the data within an input split is sorted on the fields defined by the field expressions in the specified orders. |
SplitDataProperties<T> | splitsPartitionedBy(int... partitionFields): Defines that data is partitioned across input splits on the fields defined by field positions. |
SplitDataProperties<T> | splitsPartitionedBy(String partitionFields): Defines that data is partitioned across input splits on the fields defined by field expressions. |
SplitDataProperties<T> | splitsPartitionedBy(String partitionMethodId, int... partitionFields): Defines that data is partitioned using a specific partitioning method across input splits on the fields defined by field positions. |
SplitDataProperties<T> | splitsPartitionedBy(String partitionMethodId, String partitionFields): Defines that data is partitioned using an identifiable method across input splits on the fields defined by field expressions. |
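Grouping is weaker than ordering: within a split, records with equal keys only need to be adjacent, in any key order, and the same key may still occur in other splits. A minimal sketch of a check for this property, not part of Flink, assuming each split is materialized as a list of rows and each row is an `Object[]` indexed by field position (as with `splitsGroupedBy(int...)`). `GroupedCheck` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Flink. A split is modeled as a list of rows,
// and a row as an Object[] indexed by field position.
final class GroupedCheck {

    static List<Object[]> split(Object[]... rows) {
        return Arrays.asList(rows);
    }

    static List<Object> key(Object[] row, int[] fields) {
        List<Object> k = new ArrayList<>(fields.length);
        for (int f : fields) {
            k.add(row[f]);
        }
        return k;
    }

    // True if, inside every split, rows with equal keys form one contiguous run.
    // Keys may appear in any order, and the same key may occur in several splits.
    static boolean isGroupedBy(List<List<Object[]>> splits, int... fields) {
        for (List<Object[]> split : splits) {
            Set<List<Object>> finished = new HashSet<>(); // keys whose run has ended
            List<Object> current = null;
            for (Object[] row : split) {
                List<Object> k = key(row, fields);
                if (!k.equals(current)) {
                    if (finished.contains(k)) {
                        return false; // key reappears after its run ended
                    }
                    if (current != null) {
                        finished.add(current);
                    }
                    current = k;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<Object[]>> splits = Arrays.asList(
                split(new Object[] {"b"}, new Object[] {"b"}, new Object[] {"a"}),
                split(new Object[] {"a"}, new Object[] {"c"}, new Object[] {"a"}));
        System.out.println(isGroupedBy(splits.subList(0, 1), 0)); // true: grouped, not sorted
        System.out.println(isGroupedBy(splits, 0));               // false: "a" interrupted by "c"
    }
}
```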
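public SplitDataProperties(TypeInformation<T> type)
Creates SplitDataProperties for the given data type.
Parameters:
type - The data type of the SplitDataProperties.

public SplitDataProperties(DataSource<T> source)
Creates SplitDataProperties for the given DataSource.
Parameters:
source - The DataSource for which the SplitDataProperties are created.

public SplitDataProperties<T> splitsPartitionedBy(int... partitionFields)
Defines that data is partitioned across input splits on the fields defined by field positions. All records sharing the same key (combination) must be contained in a single input split.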
IMPORTANT: Providing wrong information with SplitDataProperties can cause wrong results!
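Parameters:
partitionFields - The field positions of the partitioning keys.
Returns:
This SplitDataProperties object.

public SplitDataProperties<T> splitsPartitionedBy(String partitionMethodId, int... partitionFields)
Defines that data is partitioned using a specific partitioning method across input splits on the fields defined by field positions. All records sharing the same key (combination) must be contained in a single input split.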
IMPORTANT: Providing wrong information with SplitDataProperties can cause wrong results!
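Parameters:
partitionMethodId - An ID for the method that was used to partition the data across splits.
partitionFields - The field positions of the partitioning keys.
Returns:
This SplitDataProperties object.

public SplitDataProperties<T> splitsPartitionedBy(String partitionFields)
Defines that data is partitioned across input splits on the fields defined by field expressions. All records sharing the same key (combination) must be contained in a single input split.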
IMPORTANT: Providing wrong information with SplitDataProperties can cause wrong results!
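Parameters:
partitionFields - The field expressions of the partitioning keys.
Returns:
This SplitDataProperties object.

public SplitDataProperties<T> splitsPartitionedBy(String partitionMethodId, String partitionFields)
Defines that data is partitioned using an identifiable method across input splits on the fields defined by field expressions. All records sharing the same key (combination) must be contained in a single input split.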
IMPORTANT: Providing wrong information with SplitDataProperties can cause wrong results!
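Parameters:
partitionMethodId - An ID for the method that was used to partition the data across splits.
partitionFields - The field expressions of the partitioning keys.
Returns:
This SplitDataProperties object.

public SplitDataProperties<T> splitsGroupedBy(int... groupFields)
Defines that the data within an input split is grouped on the fields defined by the field positions. All records of an input split that share the same key (combination) must be emitted one after the other.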
IMPORTANT: Providing wrong information with SplitDataProperties can cause wrong results!
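Parameters:
groupFields - The field positions of the grouping keys.
Returns:
This SplitDataProperties object.

public SplitDataProperties<T> splitsGroupedBy(String groupFields)
Defines that the data within an input split is grouped on the fields defined by the field expressions. All records of an input split that share the same key (combination) must be emitted one after the other.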
IMPORTANT: Providing wrong information with SplitDataProperties can cause wrong results!
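Parameters:
groupFields - The field expressions of the grouping keys.
Returns:
This SplitDataProperties object.

public SplitDataProperties<T> splitsOrderedBy(int[] orderFields, Order[] orders)
Defines that the data within an input split is sorted on the fields defined by the field positions in the specified orders. All records of an input split must be emitted in the defined order.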
IMPORTANT: Providing wrong information with SplitDataProperties can cause wrong results!
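Parameters:
orderFields - The field positions of the ordering keys.
orders - The orders of the fields.
Returns:
This SplitDataProperties object.

public SplitDataProperties<T> splitsOrderedBy(String orderFields, Order[] orders)
Defines that the data within an input split is sorted on the fields defined by the field expressions in the specified orders. All records of an input split must be emitted in the defined order.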
IMPORTANT: Providing wrong information with SplitDataProperties can cause wrong results!
Parameters:
orderFields - The field expressions of the ordering keys.
orders - The orders of the fields.
Returns:
This SplitDataProperties object.

public int[] getSplitPartitionKeys()
Specified by:
getSplitPartitionKeys in interface GenericDataSourceBase.SplitDataProperties<T>

public Partitioner<T> getSplitPartitioner()
Specified by:
getSplitPartitioner in interface GenericDataSourceBase.SplitDataProperties<T>

public int[] getSplitGroupKeys()
Specified by:
getSplitGroupKeys in interface GenericDataSourceBase.SplitDataProperties<T>

public Ordering getSplitOrder()
Specified by:
getSplitOrder in interface GenericDataSourceBase.SplitDataProperties<T>
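The ordering property combines several key fields lexicographically, each with its own direction, and constrains only the sequence of records inside each split. A minimal sketch of a check for it, not part of Flink, assuming each split is materialized as a list of `Object[]` rows indexed by field position (as with `splitsOrderedBy(int[], Order[])`). `SortOrder` is a local two-value enum standing in for Flink's `Order`; only ascending and descending are modeled. `OrderedCheck` and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Local stand-in for Flink's Order enum (assumption: only ascending and
// descending are modeled here).
enum SortOrder { ASCENDING, DESCENDING }

// Hypothetical helper, not part of Flink. A split is modeled as a list of rows,
// and a row as an Object[] indexed by field position.
final class OrderedCheck {

    static List<Object[]> split(Object[]... rows) {
        return Arrays.asList(rows);
    }

    // Lexicographic comparison on the given fields, each in its own direction.
    // Assumes the key fields hold mutually comparable values (e.g. all Integer).
    @SuppressWarnings("unchecked")
    static int compare(Object[] a, Object[] b, int[] fields, SortOrder[] orders) {
        for (int i = 0; i < fields.length; i++) {
            int c = Integer.signum(((Comparable<Object>) a[fields[i]]).compareTo(b[fields[i]]));
            if (c != 0) {
                return orders[i] == SortOrder.ASCENDING ? c : -c;
            }
        }
        return 0;
    }

    // True if the rows of every split follow the declared order. The order
    // across splits is not constrained, matching "within an input split".
    static boolean isOrderedBy(List<List<Object[]>> splits, int[] fields, SortOrder[] orders) {
        if (fields.length != orders.length) {
            throw new IllegalArgumentException("one order per field is required");
        }
        for (List<Object[]> split : splits) {
            for (int r = 1; r < split.size(); r++) {
                if (compare(split.get(r - 1), split.get(r), fields, orders) > 0) {
                    return false; // adjacent rows violate the declared order
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<Object[]>> splits = Arrays.asList(
                split(new Object[] {1, "b"}, new Object[] {1, "a"}, new Object[] {2, "z"}),
                split(new Object[] {0, "x"}));
        int[] fields = {0, 1};
        // Field 0 ascending, field 1 descending.
        SortOrder[] orders = {SortOrder.ASCENDING, SortOrder.DESCENDING};
        System.out.println(isOrderedBy(splits, fields, orders)); // true
    }
}
```

Comparing only adjacent rows is enough because the comparison is a total order on the key fields; the second split may start with a smaller key than the first split ends with.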
Copyright © 2014–2019 The Apache Software Foundation. All rights reserved.