Class FsCheckpointStreamFactory

  • All Implemented Interfaces:
    CheckpointStreamFactory
    Direct Known Subclasses:
    FsCheckpointStorageLocation

    public class FsCheckpointStreamFactory
    extends Object
    implements CheckpointStreamFactory
    A CheckpointStreamFactory that produces streams that write to a FileSystem. The streams from the factory put their data into files with a random name, within the given directory.

    If the state written to the stream is fewer bytes than a configurable threshold, then no files are written, but the state is returned inline in the state handle instead. This reduces the problem of many small files that have only few bytes.

    Note on directory creation

    The given target directory must already exist, this factory does not ensure that the directory gets created. That is important, because if this factory checked for directory existence, there would be many checks per checkpoint (from each TaskManager and operator) and such floods of directory existence checks can be prohibitive on larger scale setups for some file systems.

    For example many S3 file systems (like Hadoop's s3a) use HTTP HEAD requests to check for the existence of a directory. S3 sometimes limits the number of HTTP HEAD requests to a few hundred per second only. Those numbers are easily reached by moderately large setups. Surprisingly (and fortunately), the actual state writing (POST) have much higher quotas.