@PublicEvolving public interface DataDistribution extends IOReadableWritable, Serializable
Modifier and Type | Method and Description |
---|---|
Object[] | getBucketBoundary(int bucketNum, int totalNumBuckets): Returns the upper bound of bucket number bucketNum, given that the distribution is to be split into totalNumBuckets buckets. |
TypeInformation[] | getKeyTypes(): Gets the types of the key fields by which the DataSet is partitioned. |
int | getNumberOfFields(): The number of fields in the (composite) key. |
Methods inherited from interface IOReadableWritable: read, write
Object[] getBucketBoundary(int bucketNum, int totalNumBuckets)
Returns the upper bound of bucket number bucketNum, given that the distribution is to be split into totalNumBuckets buckets.
Assuming n buckets, let B_i be the result of calling getBucketBoundary(i, n). The distribution then partitions the data domain in the following fashion:
(-inf, B_1] (B_1, B_2] ... (B_n-2, B_n-1] (B_n-1, inf)
Note: The last bucket's upper bound is discarded by many algorithms. The last bucket is assumed to hold all values v such that v > getBucketBoundary(n-1, n), where n is the number of buckets.
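The partitioning rule above can be illustrated with a small standalone sketch. It does not implement the Flink interface and has no Flink dependency; the class `UniformIntBoundaries` and the helper `bucketOf` are hypothetical names introduced only for this example. It assumes a single integer key spread uniformly over a closed range, and it assumes bucket indices start at 1, following the B_1 ... B_n-1 notation (a real implementation may index buckets differently).

```java
/**
 * Standalone sketch of the bucket-boundary contract (no Flink dependency).
 * UniformIntBoundaries and bucketOf are hypothetical names, not Flink API.
 */
final class UniformIntBoundaries {
    private final int min; // smallest key in the domain (inclusive)
    private final int max; // largest key in the domain (inclusive)

    UniformIntBoundaries(int min, int max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        this.min = min;
        this.max = max;
    }

    /**
     * Mirrors the shape of getBucketBoundary: returns the upper bound B_i of
     * bucket i as a one-element array (one entry per key field).
     * Assumption: bucketNum is 1-based, as in the B_1 ... B_n-1 notation.
     */
    Object[] getBucketBoundary(int bucketNum, int totalNumBuckets) {
        long width = (long) max - min + 1;
        long upper = min + (width * bucketNum) / totalNumBuckets - 1;
        return new Object[] {Integer.valueOf((int) upper)};
    }

    /**
     * Finds the bucket for a value using the rule
     * (-inf, B_1] (B_1, B_2] ... (B_n-1, inf).
     * The upper bound of the last bucket is never consulted.
     */
    int bucketOf(int value, int totalNumBuckets) {
        for (int i = 1; i < totalNumBuckets; i++) {
            int upper = (Integer) getBucketBoundary(i, totalNumBuckets)[0];
            if (value <= upper) {
                return i;
            }
        }
        return totalNumBuckets; // everything above B_n-1 lands in the last bucket
    }

    public static void main(String[] args) {
        UniformIntBoundaries dist = new UniformIntBoundaries(0, 99);
        for (int v : new int[] {-5, 24, 25, 74, 75, 1000}) {
            System.out.println(v + " -> bucket " + dist.bucketOf(v, 4));
        }
    }
}
```

Values below the declared range fall into the first bucket and values above it into the last, which matches the open-ended intervals at both ends of the partitioning.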
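bucketNum - The number of the bucket for which to get the upper bound.
totalNumBuckets - The number of buckets to split the data into.
int getNumberOfFields()
The number of fields in the (composite) key. This determines the length of the array returned by getBucketBoundary(int, int).
TypeInformation[] getKeyTypes()
Gets the types of the key fields by which the DataSet is partitioned.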
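For a composite key, each boundary array carries one entry per key field, so its length equals the value reported by getNumberOfFields(). The sketch below shows that relationship with a hypothetical two-field key (a country code and a user id). The class `CompositeKeyBoundaryCheck` and its methods are illustrative names only, and the field-by-field (lexicographic) comparison is an assumption about how a key might be ordered against a boundary, not a statement of how Flink does it.

```java
/**
 * Standalone sketch: a composite-key boundary has one entry per key field.
 * All names here are hypothetical and not part of the Flink API.
 */
final class CompositeKeyBoundaryCheck {
    /** Stands in for the value an implementation would return from getNumberOfFields(). */
    static final int NUMBER_OF_FIELDS = 2;

    /** Builds a boundary for the hypothetical (country, userId) key. */
    static Object[] boundary(String country, int userId) {
        return new Object[] {country, Integer.valueOf(userId)};
    }

    /**
     * Compares a key with a boundary field by field (assumed lexicographic order).
     * Returns a negative, zero, or positive number like Comparable.compareTo.
     */
    @SuppressWarnings({"unchecked", "rawtypes"})
    static int compare(Object[] key, Object[] boundary) {
        if (key.length != NUMBER_OF_FIELDS || boundary.length != NUMBER_OF_FIELDS) {
            throw new IllegalArgumentException("expected " + NUMBER_OF_FIELDS + " fields");
        }
        for (int i = 0; i < NUMBER_OF_FIELDS; i++) {
            int c = ((Comparable) key[i]).compareTo(boundary[i]);
            if (c != 0) {
                return c; // the first differing field decides the order
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        Object[] upper = boundary("DE", 10);
        Object[] key = boundary("DE", 5);
        // A key belongs to a bucket (B_i-1, B_i] when it compares <= B_i.
        System.out.println("key <= boundary: " + (compare(key, upper) <= 0));
    }
}
```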
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.