Type Parameters:
T - The type of the DataSource on which the SplitDataProperties are defined.

@PublicEvolving
public class SplitDataProperties<T> extends Object implements GenericDataSourceBase.SplitDataProperties<T>
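SplitDataProperties define data properties on the InputSplits generated by the InputFormat of a DataSource.

InputSplits are units of input which are distributed among and assigned to parallel data source subtasks. SplitDataProperties can define that the elements generated by the associated InputFormat are:
- partitioned on one or more fields across InputSplits, i.e., all elements with the same key (combination) are located in the same input split;
- grouped on one or more fields within an InputSplit, i.e., all elements of a split that share the same key (combination) are emitted as a single sequence, one after the other;
- ordered on one or more fields within an InputSplit, i.e., all elements within a split are in the defined order.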
IMPORTANT: SplitDataProperties can improve the execution of a program because certain data reorganization steps such as shuffling or sorting can be avoided. HOWEVER, if SplitDataProperties are not correctly defined, the result of the program might be wrong!
See Also: InputSplit, InputFormat, DataSource
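Because a wrong declaration can silently produce wrong results, it may be worth checking a sample of the input before declaring a property. The following is a minimal sketch of such a check for the "partitioned across splits" property, written in plain Java without any Flink dependency. The class `SplitPartitioningCheck` and its method are hypothetical helpers, not part of Flink, and each inner list is assumed to hold the key values of the records read from one split.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of the Flink API. */
final class SplitPartitioningCheck {

    /**
     * Returns true if every key value occurs in exactly one split.
     * Each inner list models the keys of the records read from one input split.
     */
    static boolean isPartitionedAcrossSplits(List<List<Integer>> splits) {
        // Remembers the index of the first split in which each key was seen.
        Map<Integer, Integer> keyToSplit = new HashMap<>();
        for (int splitIdx = 0; splitIdx < splits.size(); splitIdx++) {
            for (Integer key : splits.get(splitIdx)) {
                Integer previous = keyToSplit.putIfAbsent(key, splitIdx);
                if (previous != null && previous != splitIdx) {
                    return false; // the same key shows up in two different splits
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<Integer>> good = List.of(List.of(1, 1, 2), List.of(3, 4));
        List<List<Integer>> bad = List.of(List.of(1, 2), List.of(2, 3));
        System.out.println(isPartitionedAcrossSplits(good));
        System.out.println(isPartitionedAcrossSplits(bad));
    }
}
```

In the second sample the key 2 appears in both splits, so declaring that data as partitioned on this key would be incorrect.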
Modifier and Type | Class and Description |
---|---|
static class | SplitDataProperties.SourcePartitionerMarker<T>: A custom partitioner to mark compatible split partitionings. |
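One way to read "mark compatible split partitionings" is that two sources declaring the same partitioning method ID can be treated as partitioned in the same way. The sketch below illustrates that reading under stated assumptions: the class `PartitionMethodMarker` is hypothetical, it is not Flink's SourcePartitionerMarker, and comparing markers by ID is an assumption made for illustration.

```java
import java.util.Objects;

/** Hypothetical stand-in for a marker partitioner; not Flink's actual class. */
final class PartitionMethodMarker {

    private final String partitionMethodId;

    PartitionMethodMarker(String partitionMethodId) {
        this.partitionMethodId = Objects.requireNonNull(partitionMethodId);
    }

    /** A marker does not route records itself; it only names a partitioning method. */
    int partition(Object key, int numPartitions) {
        throw new UnsupportedOperationException(
                "Marker only: the data is assumed to be partitioned already.");
    }

    /** Assumption: two markers are compatible exactly when their method IDs match. */
    @Override
    public boolean equals(Object other) {
        return other instanceof PartitionMethodMarker
                && partitionMethodId.equals(((PartitionMethodMarker) other).partitionMethodId);
    }

    @Override
    public int hashCode() {
        return partitionMethodId.hashCode();
    }
}
```

Under this reading, the ID string acts as a contract: it should only be reused for sources whose splits were really produced by the same partitioning method.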
Constructor and Description |
---|
SplitDataProperties(DataSource<T> source): Creates SplitDataProperties for the given DataSource. |
SplitDataProperties(TypeInformation<T> type): Creates SplitDataProperties for the given data type. |
Modifier and Type | Method and Description |
---|---|
int[] | getSplitGroupKeys() |
Ordering | getSplitOrder() |
Partitioner<T> | getSplitPartitioner() |
int[] | getSplitPartitionKeys() |
SplitDataProperties<T> | splitsGroupedBy(int... groupFields): Defines that the data within an input split is grouped on the fields defined by the field positions. |
SplitDataProperties<T> | splitsGroupedBy(String groupFields): Defines that the data within an input split is grouped on the fields defined by the field expressions. |
SplitDataProperties<T> | splitsOrderedBy(int[] orderFields, Order[] orders): Defines that the data within an input split is sorted on the fields defined by the field positions in the specified orders. |
SplitDataProperties<T> | splitsOrderedBy(String orderFields, Order[] orders): Defines that the data within an input split is sorted on the fields defined by the field expressions in the specified orders. |
SplitDataProperties<T> | splitsPartitionedBy(int... partitionFields): Defines that data is partitioned across input splits on the fields defined by field positions. |
SplitDataProperties<T> | splitsPartitionedBy(String partitionFields): Defines that data is partitioned across input splits on the fields defined by field expressions. |
SplitDataProperties<T> | splitsPartitionedBy(String partitionMethodId, int... partitionFields): Defines that data is partitioned using a specific partitioning method across input splits on the fields defined by field positions. |
SplitDataProperties<T> | splitsPartitionedBy(String partitionMethodId, String partitionFields): Defines that data is partitioned using an identifiable method across input splits on the fields defined by field expressions. |
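The grouping and ordering properties in the table above both describe the layout of records within a single split, but they are not the same guarantee. The sketch below checks each one on the key sequence of one split. It is plain Java with no Flink dependency; `SplitLayoutCheck` and its methods are hypothetical helpers, keys are assumed to be non-null integers, and a boolean flag stands in for an ascending or descending order.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not part of the Flink API. */
final class SplitLayoutCheck {

    /** True if all records with the same key form one contiguous run. */
    static boolean isGrouped(List<Integer> keysInSplit) {
        Set<Integer> finished = new HashSet<>(); // keys whose run has already ended
        Integer current = null;                  // key of the run being read
        for (Integer key : keysInSplit) {
            if (!key.equals(current)) {
                if (current != null) {
                    finished.add(current);
                }
                if (finished.contains(key)) {
                    return false; // this key already had an earlier, separate run
                }
                current = key;
            }
        }
        return true;
    }

    /** True if keys never decrease (ascending) or never increase (descending). */
    static boolean isOrdered(List<Integer> keysInSplit, boolean ascending) {
        for (int i = 1; i < keysInSplit.size(); i++) {
            int cmp = keysInSplit.get(i - 1).compareTo(keysInSplit.get(i));
            if (ascending ? cmp > 0 : cmp < 0) {
                return false;
            }
        }
        return true;
    }
}
```

A sequence such as 2, 2, 1, 1, 3 passes `isGrouped` but fails `isOrdered` in either direction, so for that split only the grouping property could be declared safely.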
public SplitDataProperties(TypeInformation<T> type)
Parameters:
type - The data type of the SplitDataProperties.

public SplitDataProperties(DataSource<T> source)
Parameters:
source - The DataSource for which the SplitDataProperties are created.

public SplitDataProperties<T> splitsPartitionedBy(int... partitionFields)
IMPORTANT: Providing wrong information with SplitDataProperties can cause wrong results!
Parameters:
partitionFields - The field positions of the partitioning keys.

public SplitDataProperties<T> splitsPartitionedBy(String partitionMethodId, int... partitionFields)
IMPORTANT: Providing wrong information with SplitDataProperties can cause wrong results!
Parameters:
partitionMethodId - An ID for the method that was used to partition the data across splits.
partitionFields - The field positions of the partitioning keys.

public SplitDataProperties<T> splitsPartitionedBy(String partitionFields)
IMPORTANT: Providing wrong information with SplitDataProperties can cause wrong results!
Parameters:
partitionFields - The field expressions of the partitioning keys.

public SplitDataProperties<T> splitsPartitionedBy(String partitionMethodId, String partitionFields)
IMPORTANT: Providing wrong information with SplitDataProperties can cause wrong results!
Parameters:
partitionMethodId - An ID for the method that was used to partition the data across splits.
partitionFields - The field expressions of the partitioning keys.

public SplitDataProperties<T> splitsGroupedBy(int... groupFields)
IMPORTANT: Providing wrong information with SplitDataProperties can cause wrong results!
Parameters:
groupFields - The field positions of the grouping keys.

public SplitDataProperties<T> splitsGroupedBy(String groupFields)
IMPORTANT: Providing wrong information with SplitDataProperties can cause wrong results!
Parameters:
groupFields - The field expressions of the grouping keys.

public SplitDataProperties<T> splitsOrderedBy(int[] orderFields, Order[] orders)
IMPORTANT: Providing wrong information with SplitDataProperties can cause wrong results!
Parameters:
orderFields - The field positions of the ordering keys.
orders - The orders of the fields.

public SplitDataProperties<T> splitsOrderedBy(String orderFields, Order[] orders)
IMPORTANT: Providing wrong information with SplitDataProperties can cause wrong results!
Parameters:
orderFields - The field expressions of the ordering keys.
orders - The orders of the fields.

public int[] getSplitPartitionKeys()
Specified by:
getSplitPartitionKeys in interface GenericDataSourceBase.SplitDataProperties<T>

public Partitioner<T> getSplitPartitioner()
Specified by:
getSplitPartitioner in interface GenericDataSourceBase.SplitDataProperties<T>

public int[] getSplitGroupKeys()
Specified by:
getSplitGroupKeys in interface GenericDataSourceBase.SplitDataProperties<T>

public Ordering getSplitOrder()
Specified by:
getSplitOrder in interface GenericDataSourceBase.SplitDataProperties<T>
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.