Class FlinkS3FileSystem
- java.lang.Object
-
- org.apache.flink.core.fs.FileSystem
-
- org.apache.flink.runtime.fs.hdfs.HadoopFileSystem
-
- org.apache.flink.fs.s3.common.FlinkS3FileSystem
-
- All Implemented Interfaces:
EntropyInjectingFileSystem
,IFileSystem
,PathsCopyingFileSystem
- Direct Known Subclasses:
FlinkS3PrestoFileSystem
public class FlinkS3FileSystem extends HadoopFileSystem implements EntropyInjectingFileSystem, PathsCopyingFileSystem
Implementation of the FlinkFileSystem
interface for S3. This class implements the common behavior implemented directly by Flink and delegates common calls to an implementation of Hadoop's filesystem abstraction.Optionally this
FlinkS3FileSystem
can use the s5cmd tool to speed up copying files.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
FlinkS3FileSystem.S5CmdConfiguration
POJO representing parameters to configure s5cmd.-
Nested classes/interfaces inherited from class org.apache.flink.core.fs.FileSystem
FileSystem.FSKey, FileSystem.WriteMode
-
Nested classes/interfaces inherited from interface org.apache.flink.core.fs.PathsCopyingFileSystem
PathsCopyingFileSystem.CopyRequest
-
-
Field Summary
Fields Modifier and Type Field Description static long
S3_MULTIPART_MIN_PART_SIZE
The minimum size of a part in the multipart upload, except for the last part: 5 MIBytes.
-
Constructor Summary
Constructors Constructor Description FlinkS3FileSystem(org.apache.hadoop.fs.FileSystem hadoopS3FileSystem, FlinkS3FileSystem.S5CmdConfiguration s5CmdConfiguration, String localTmpDirectory, String entropyInjectionKey, int entropyLength, S3AccessHelper s3UploadHelper, long s3uploadPartSize, int maxConcurrentUploadsPerStream)
Creates a FlinkS3FileSystem based on the given Hadoop S3 file system.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description boolean
canCopyPaths(Path source, Path destination)
Tells if thisFileSystem
supports an optimised way to directly copy between given paths.void
copyFiles(List<PathsCopyingFileSystem.CopyRequest> requests, ICloseableRegistry closeableRegistry)
List ofPathsCopyingFileSystem.CopyRequest
to copy in batch by thisPathsCopyingFileSystem
.RecoverableWriter
createRecoverableWriter()
Creates a newRecoverableWriter
.String
generateEntropy()
Creates a string with random entropy to be injected into a path.String
getEntropyInjectionKey()
Gets the marker string that represents the substring of a path to be replaced by the entropy characters.String
getLocalTmpDir()
-
Methods inherited from class org.apache.flink.runtime.fs.hdfs.HadoopFileSystem
create, create, createRecoverableWriter, delete, exists, getDefaultBlockSize, getFileBlockLocations, getFileStatus, getHadoopFileSystem, getHomeDirectory, getUri, getWorkingDirectory, isDistributedFS, listStatus, mkdirs, open, open, rename, toHadoopPath
-
Methods inherited from class org.apache.flink.core.fs.FileSystem
create, get, getDefaultFsUri, getLocalFileSystem, getUnguardedFileSystem, initialize, initialize, initOutPathDistFS, initOutPathLocalFS
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.flink.core.fs.IFileSystem
create, createRecoverableWriter, delete, exists, getFileBlockLocations, getFileStatus, getHomeDirectory, getUri, getWorkingDirectory, initOutPathDistFS, initOutPathLocalFS, isDistributedFS, listStatus, mkdirs, open, open, rename
-
-
-
-
Field Detail
-
S3_MULTIPART_MIN_PART_SIZE
public static final long S3_MULTIPART_MIN_PART_SIZE
The minimum size of a part in the multipart upload, except for the last part: 5 MIBytes.- See Also:
- Constant Field Values
-
-
Constructor Detail
-
FlinkS3FileSystem
public FlinkS3FileSystem(org.apache.hadoop.fs.FileSystem hadoopS3FileSystem, @Nullable FlinkS3FileSystem.S5CmdConfiguration s5CmdConfiguration, String localTmpDirectory, @Nullable String entropyInjectionKey, int entropyLength, @Nullable S3AccessHelper s3UploadHelper, long s3uploadPartSize, int maxConcurrentUploadsPerStream)
Creates a FlinkS3FileSystem based on the given Hadoop S3 file system. The given Hadoop file system object is expected to be initialized already.This constructor additionally configures the entropy injection for the file system.
- Parameters:
hadoopS3FileSystem
- The Hadoop FileSystem that will be used under the hood.s5CmdConfiguration
- Configuration of the s5cmd.entropyInjectionKey
- The substring that will be replaced by entropy or removed.entropyLength
- The number of random alphanumeric characters to inject as entropy.
-
-
Method Detail
-
canCopyPaths
public boolean canCopyPaths(Path source, Path destination)
Description copied from interface:IFileSystem
Tells if thisFileSystem
supports an optimised way to directly copy between given paths. In other words if it implementsPathsCopyingFileSystem
.At least one of, either source or destination belongs to this
IFileSystem
. One of them can point to the local file system. In other words this request can correspond to either: downloading a file from the remote file system, uploading a file to the remote file system or duplicating a file in the remote file system.- Specified by:
canCopyPaths
in interfaceIFileSystem
- Specified by:
canCopyPaths
in interfacePathsCopyingFileSystem
- Parameters:
source
- The path of the source file to duplicatedestination
- The path where to duplicate the source file- Returns:
- true, if this
IFileSystem
can perform this operation more quickly compared to the generic code path of using streams.
-
copyFiles
public void copyFiles(List<PathsCopyingFileSystem.CopyRequest> requests, ICloseableRegistry closeableRegistry) throws IOException
Description copied from interface:PathsCopyingFileSystem
List ofPathsCopyingFileSystem.CopyRequest
to copy in batch by thisPathsCopyingFileSystem
. In case of an exception some files might have been already copied fully or partially. Caller should clean this up. Copy can be interrupted by theCloseableRegistry
.- Specified by:
copyFiles
in interfacePathsCopyingFileSystem
- Throws:
IOException
-
getEntropyInjectionKey
@Nullable public String getEntropyInjectionKey()
Description copied from interface:EntropyInjectingFileSystem
Gets the marker string that represents the substring of a path to be replaced by the entropy characters.You can disable entropy injection if you return null here.
- Specified by:
getEntropyInjectionKey
in interfaceEntropyInjectingFileSystem
-
generateEntropy
public String generateEntropy()
Description copied from interface:EntropyInjectingFileSystem
Creates a string with random entropy to be injected into a path.- Specified by:
generateEntropy
in interfaceEntropyInjectingFileSystem
-
getLocalTmpDir
public String getLocalTmpDir()
-
createRecoverableWriter
public RecoverableWriter createRecoverableWriter() throws IOException
Description copied from interface:IFileSystem
Creates a newRecoverableWriter
. A recoverable writer creates streams that can persist and recover their intermediate state. Persisting and recovering intermediate state is a core building block for writing to files that span multiple checkpoints.The returned object can act as a shared factory to open and recover multiple streams.
This method is optional on file systems and various file system implementations may not support this method, throwing an
UnsupportedOperationException
.- Specified by:
createRecoverableWriter
in interfaceIFileSystem
- Overrides:
createRecoverableWriter
in classHadoopFileSystem
- Returns:
- A RecoverableWriter for this file system.
- Throws:
IOException
- Thrown, if the recoverable writer cannot be instantiated.
-
-