Interface DataDistribution

    • Method Detail

      • getBucketBoundary

        Object[] getBucketBoundary(int bucketNum,
                                   int totalNumBuckets)
        Returns the upper bound of bucket number bucketNum, given that the distribution is to be split into totalNumBuckets buckets.

        Assuming n buckets, let B_i be the result of calling getBucketBoundary(i, n). The distribution then partitions the data domain as follows:

         (-inf, B_1] (B_1, B_2] ... (B_(n-2), B_(n-1)] (B_(n-1), inf)
         

        Note: The last bucket's upper bound is actually discarded by many algorithms. The last bucket is assumed to hold all values v such that v > getBucketBoundary(n-1, n), where n is the number of buckets.

        Parameters:
        bucketNum - The number of the bucket for which to get the upper bound.
        totalNumBuckets - The number of buckets to split the data into.
        Returns:
        A record whose values act as bucket boundaries for the specified bucket.
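
        The partitioning described above can be illustrated with a minimal, self-contained sketch. The class UniformIntDistributionSketch and its helper bucketOf are hypothetical names introduced here and are not part of the documented API; the sketch also assumes that bucket numbers run from 1 to n, following the B_i notation, which the description does not state explicitly for real callers.

```java
import java.util.Arrays;

/** Hypothetical stand-in (not a library class): uniform distribution over the ints [min, max]. */
class UniformIntDistributionSketch {
    private final int min;
    private final int max;

    UniformIntDistributionSketch(int min, int max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        this.min = min;
        this.max = max;
    }

    /** Upper bound B_i of bucket i, for i in 1..totalNumBuckets (1-based numbering is an assumption). */
    Object[] getBucketBoundary(int bucketNum, int totalNumBuckets) {
        if (totalNumBuckets < 1 || bucketNum < 1 || bucketNum > totalNumBuckets) {
            throw new IllegalArgumentException("bucketNum must lie in 1..totalNumBuckets");
        }
        long width = (long) max - min + 1;
        long bound = min + (width * bucketNum) / totalNumBuckets - 1;
        // Single-field key, so the boundary record has exactly one value.
        return new Object[] {(int) bound};
    }

    int getNumberOfFields() {
        return 1;
    }

    /** Illustrative helper: 1-based index of the bucket that holds the given value. */
    int bucketOf(int value, int totalNumBuckets) {
        // Only B_1 .. B_(n-1) are consulted; the last bucket's own bound is ignored.
        for (int i = 1; i < totalNumBuckets; i++) {
            int upper = (Integer) getBucketBoundary(i, totalNumBuckets)[0];
            if (value <= upper) {
                return i;
            }
        }
        return totalNumBuckets;
    }

    public static void main(String[] args) {
        UniformIntDistributionSketch d = new UniformIntDistributionSketch(0, 99);
        for (int i = 1; i <= 4; i++) {
            System.out.println("B_" + i + " = " + Arrays.toString(d.getBucketBoundary(i, 4)));
        }
        System.out.println("bucket of 25   = " + d.bucketOf(25, 4));
        System.out.println("bucket of 1000 = " + d.bucketOf(1000, 4));
    }
}
```

        In this sketch, a value above max still lands in the last bucket, which mirrors the note that the last bucket's upper bound is not consulted.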
      • getNumberOfFields

        int getNumberOfFields()
        The number of fields in the (composite) key. This determines how many fields of each record define its bucket. The value must equal the length of the array returned by getBucketBoundary(int, int).
        Returns:
        The number of fields in the (composite) key.
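
        The relationship between getNumberOfFields() and the boundary array can be sketched for a two-field key. TwoFieldDistributionSketch and its compare helper are hypothetical names, not part of the documented API. The sketch assumes a key of two digits (tens, ones), each in 0..9, 1-based bucket numbers, and a field-by-field (lexicographic) comparison of keys against boundaries; the description above does not specify how composite keys are compared.

```java
import java.util.Arrays;

/** Hypothetical stand-in (not a library class): composite key of two digits (tens, ones). */
class TwoFieldDistributionSketch {
    private static final int FIELDS = 2;

    /** Upper bound of bucket bucketNum as a two-value record; 1-based numbering is an assumption. */
    Object[] getBucketBoundary(int bucketNum, int totalNumBuckets) {
        if (totalNumBuckets < 1 || totalNumBuckets > 100
                || bucketNum < 1 || bucketNum > totalNumBuckets) {
            throw new IllegalArgumentException("need 1 <= bucketNum <= totalNumBuckets <= 100");
        }
        // Treat the key as the number tens * 10 + ones in 0..99 and split that range evenly.
        int bound = (100 * bucketNum) / totalNumBuckets - 1;
        return new Object[] {bound / 10, bound % 10};
    }

    /** Must match the length of every array returned by getBucketBoundary. */
    int getNumberOfFields() {
        return FIELDS;
    }

    /** Illustrative field-by-field comparison of a key against a boundary record. */
    static int compare(Object[] key, Object[] boundary) {
        for (int f = 0; f < key.length; f++) {
            int c = Integer.compare((Integer) key[f], (Integer) boundary[f]);
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        TwoFieldDistributionSketch d = new TwoFieldDistributionSketch();
        Object[] b2 = d.getBucketBoundary(2, 4);
        System.out.println("B_2 = " + Arrays.toString(b2));
        System.out.println("length matches field count: " + (b2.length == d.getNumberOfFields()));
        System.out.println("(5, 0) vs B_2: " + compare(new Object[] {5, 0}, b2));
    }
}
```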
      • getKeyTypes

        TypeInformation[] getKeyTypes()
        Gets the types of the key fields by which the DataSet is partitioned.
        Returns:
        The types of the key fields by which the DataSet is partitioned.