public class FsCheckpointStreamFactory extends Object implements CheckpointStreamFactory
CheckpointStreamFactorythat produces streams that write to a
FileSystem. The streams from the factory put their data into files with a random name, within the given directory.
If the state written to the stream is fewer bytes than a configurable threshold, then no files are written, but the state is returned inline in the state handle instead. This reduces the problem of many small files that have only few bytes.
The given target directory must already exist, this factory does not ensure that the directory gets created. That is important, because if this factory checked for directory existence, there would be many checks per checkpoint (from each TaskManager and operator) and such floods of directory existence checks can be prohibitive on larger scale setups for some file systems.
For example many S3 file systems (like Hadoop's s3a) use HTTP HEAD requests to check for the existence of a directory. S3 sometimes limits the number of HTTP HEAD requests to a few hundred per second only. Those numbers are easily reached by moderately large setups. Surprisingly (and fortunately), the actual state writing (POST) have much higher quotas.
|Modifier and Type||Class and Description|
|Modifier and Type||Field and Description|
Maximum size of state that is stored with the metadata, rather than in files.
|Constructor and Description|
Creates a new stream factory that stores its checkpoint data in the file system and location defined by the given Path.
|Modifier and Type||Method and Description|
Creates an new
public static final int MAX_FILE_STATE_THRESHOLD
public FsCheckpointStreamFactory(FileSystem fileSystem, Path checkpointDirectory, Path sharedStateDirectory, int fileStateSizeThreshold, int writeBufferSize)
Important: The given checkpoint directory must already exist. Refer to the class-level JavaDocs for an explanation why this factory must not try and create the checkpoints.
fileSystem- The filesystem to write to.
checkpointDirectory- The directory for checkpoint exclusive state data.
sharedStateDirectory- The directory for shared checkpoint data.
fileStateSizeThreshold- State up to this size will be stored as part of the metadata, rather than in files
writeBufferSize- The write buffer size.
public FsCheckpointStreamFactory.FsCheckpointStateOutputStream createCheckpointStateOutputStream(CheckpointedStateScope scope) throws IOException
CheckpointStreamFactory.CheckpointStateOutputStream. When the stream is closed, it returns a state handle that can retrieve the state back.
scope- The state's scope, whether it is exclusive or shared.
IOException- Exceptions may occur while creating the stream and should be forwarded.
Copyright © 2014–2023 The Apache Software Foundation. All rights reserved.