@PublicEvolving public interface DataDistribution extends IOReadableWritable, Serializable
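|Modifier and Type|Method and Description|
|Object|getBucketBoundary(int bucketNum, int totalNumBuckets): Returns the i'th bucket's upper bound, given that the distribution is to be split into totalNumBuckets buckets.|
|TypeInformation[]|getKeyTypes(): Gets the type of the key by which the dataSet is partitioned.|
|int|getNumberOfFields(): The number of fields in the (composite) key.|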
Object getBucketBoundary(int bucketNum, int totalNumBuckets)
Assuming n buckets, let B_i be the result of calling getBucketBoundary(i, n). The distribution then partitions the data domain into the following ranges:
(-inf, B_1] (B_1, B_2] ... (B_n-2, B_n-1] (B_n-1, inf)
Note: The last bucket's upper bound is actually discarded by many algorithms. The last bucket is assumed to hold all values v such that v > getBucketBoundary(n-1, n), where n is the number of buckets.
Parameters:
bucketNum - The number of the bucket for which to get the upper bound.
totalNumBuckets - The number of buckets to split the data into.
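The partitioning scheme described for getBucketBoundary can be sketched as follows. This is a minimal illustration only: UniformLongDistribution and bucketOf are hypothetical names, the class is a standalone stand-in that does not implement DataDistribution, it returns a plain long rather than an Object, and it assumes bucketNum runs from 1 to n-1 as in the B_i notation used here.

```java
/** Illustrative stand-in only; this is not Flink's DataDistribution interface. */
final class UniformLongDistribution {
    private final long min;
    private final long max;

    UniformLongDistribution(long min, long max) {
        this.min = min;
        this.max = max;
    }

    /**
     * Returns B_i, the upper bound of bucket i, for i in 1..n-1.
     * Assumption: evenly spaced boundaries over [min, max]; the
     * multiplication may overflow for very large ranges.
     */
    long getBucketBoundary(int bucketNum, int totalNumBuckets) {
        return min + (max - min) * bucketNum / totalNumBuckets;
    }

    /** Returns the 1-based index of the bucket that holds v when splitting into n buckets. */
    int bucketOf(long v, int n) {
        for (int i = 1; i < n; i++) {
            if (v <= getBucketBoundary(i, n)) {
                return i; // v lies in (B_{i-1}, B_i]; for i == 1 this is (-inf, B_1]
            }
        }
        // v > B_{n-1}: the last bucket, whose own upper bound is never consulted
        return n;
    }
}
```

With min = 0, max = 100 and n = 4, the boundaries B_1, B_2, B_3 are 25, 50 and 75, so the value 25 falls into bucket 1 (upper bounds are inclusive) and 26 falls into bucket 2. Any value above 75 lands in bucket 4, which matches the note that the last bucket's upper bound is not needed.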
Copyright © 2014–2023 The Apache Software Foundation. All rights reserved.