Class Buckets<IN,​BucketID>

  • Type Parameters:
    IN - The type of input elements.
    BucketID - The type of ids for the buckets, as returned by the BucketAssigner.

    @Internal
    public class Buckets<IN,​BucketID>
    extends Object
    The manager of the different active buckets in the StreamingFileSink.

    This class is responsible for all bucket-related operations and the actual StreamingFileSink is just plugging in the functionality offered by this class to the lifecycle of the operator.

    • Method Detail

      • initializeState

        public void initializeState​(ListState<byte[]> bucketStates,
                                    ListState<Long> partCounterState)
                             throws Exception
        Initializes the state after recovery from a failure.

        During this process:

        1. we set the initial value for part counter to the maximum value used before across all tasks and buckets. This guarantees that we do not overwrite valid data,
        2. we commit any pending files for previous checkpoints (previous to the last successful one from which we restore),
        3. we resume writing to the previous in-progress file of each bucket, and
        4. if we receive multiple states for the same bucket, we merge them.
        Parameters:
        bucketStates - the state holding recovered state about active buckets.
        partCounterState - the state holding the max previously used part counters.
        Throws:
        Exception - if anything goes wrong during retrieving the state or restoring/committing of any in-progress/pending part files
      • commitUpToCheckpoint

        public void commitUpToCheckpoint​(long checkpointId)
                                  throws IOException
        Throws:
        IOException
      • onProcessingTime

        public void onProcessingTime​(long timestamp)
                              throws Exception
        Throws:
        Exception
      • close

        public void close()