Class FileSourceSplit
- java.lang.Object
-
- org.apache.flink.connector.file.src.FileSourceSplit
-
- All Implemented Interfaces:
Serializable
,SourceSplit
@PublicEvolving public class FileSourceSplit extends Object implements SourceSplit, Serializable
ASourceSplit
that represents a file, or a region of a file.The split has an offset and an end, which defines the region of the file represented by the split. For splits representing the while file, the offset is zero and the length is the file size.
The split may furthermore have a "reader position", which is the checkpointed position from a reader previously reading this split. This position is typically null when the split is assigned from the enumerator to the readers, and is non-null when the readers checkpoint their state in a file source split.
This class is
Serializable
for convenience. For Flink's internal serialization (both for RPC and for checkpoints), theFileSourceSplitSerializer
is used.- See Also:
- Serialized Form
-
-
Constructor Summary
Constructors Constructor Description FileSourceSplit(String id, Path filePath, long offset, long length, long fileModificationTime, long fileSize)
Constructs a split with host information.FileSourceSplit(String id, Path filePath, long offset, long length, long fileModificationTime, long fileSize, String... hostnames)
Constructs a split with host information.FileSourceSplit(String id, Path filePath, long offset, long length, long fileModificationTime, long fileSize, String[] hostnames, CheckpointedPosition readerPosition)
Constructs a split with host information.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description long
fileModificationTime()
Returns the modification time of the file, fromFileStatus.getModificationTime()
.long
fileSize()
Returns the full file size in bytes, fromFileStatus.getLen()
.Optional<CheckpointedPosition>
getReaderPosition()
Gets the (checkpointed) position of the reader, if set.String[]
hostnames()
Gets the hostnames of the nodes storing the file range described by this split.long
length()
Returns the number of bytes in the file region described by this source split.long
offset()
Returns the start of the file region referenced by this source split.Path
path()
Gets the file's path.String
splitId()
Get the split id of this source split.String
toString()
FileSourceSplit
updateWithCheckpointedPosition(CheckpointedPosition position)
Creates a copy of this split where the checkpointed position is replaced by the given new position.
-
-
-
Constructor Detail
-
FileSourceSplit
public FileSourceSplit(String id, Path filePath, long offset, long length, long fileModificationTime, long fileSize)
Constructs a split with host information.- Parameters:
id
- The unique ID of this source split.filePath
- The path to the file.offset
- The start (inclusive) of the split's rage in the file.length
- The number of bytes in the split (starting from the offset)fileModificationTime
- The modification time of the filefileSize
- The size of the full file
-
FileSourceSplit
public FileSourceSplit(String id, Path filePath, long offset, long length, long fileModificationTime, long fileSize, String... hostnames)
Constructs a split with host information.- Parameters:
filePath
- The path to the file.offset
- The start (inclusive) of the split's rage in the file.length
- The number of bytes in the split (starting from the offset)fileModificationTime
- The modification time of the filefileSize
- The size of the full filehostnames
- The hostnames of the nodes storing the split's file range.
-
FileSourceSplit
public FileSourceSplit(String id, Path filePath, long offset, long length, long fileModificationTime, long fileSize, String[] hostnames, @Nullable CheckpointedPosition readerPosition)
Constructs a split with host information.- Parameters:
filePath
- The path to the file.offset
- The start (inclusive) of the split's rage in the file.length
- The number of bytes in the split (starting from the offset)fileModificationTime
- The modification time of the filefileSize
- The size of the full filehostnames
- The hostnames of the nodes storing the split's file range.
-
-
Method Detail
-
splitId
public String splitId()
Description copied from interface:SourceSplit
Get the split id of this source split.- Specified by:
splitId
in interfaceSourceSplit
- Returns:
- id of this source split.
-
path
public Path path()
Gets the file's path.
-
offset
public long offset()
Returns the start of the file region referenced by this source split. The position is inclusive, the value indicates the first byte that is part of the split.
-
length
public long length()
Returns the number of bytes in the file region described by this source split.
-
fileModificationTime
public long fileModificationTime()
Returns the modification time of the file, fromFileStatus.getModificationTime()
.
-
fileSize
public long fileSize()
Returns the full file size in bytes, fromFileStatus.getLen()
.
-
hostnames
public String[] hostnames()
Gets the hostnames of the nodes storing the file range described by this split. The returned array is empty, if no host information is available.Host information is typically only available on specific file systems, like HDFS.
-
getReaderPosition
public Optional<CheckpointedPosition> getReaderPosition()
Gets the (checkpointed) position of the reader, if set. This value is typically absent for splits when assigned from the enumerator to the readers, and present when the splits are recovered from a checkpoint.
-
updateWithCheckpointedPosition
public FileSourceSplit updateWithCheckpointedPosition(@Nullable CheckpointedPosition position)
Creates a copy of this split where the checkpointed position is replaced by the given new position.IMPORTANT: Subclasses that add additional information to the split must override this method to return that subclass type. This contract is enforced by checks in the file source implementation. We did not try to enforce this contract via generics in this split class, because it leads to very ugly and verbose use of generics.
-
-