@PublicEvolving public class FileSourceSplit extends Object implements SourceSplit, Serializable
SourceSplit
that represents a file, or a region of a file.
The split has an offset and an end, which defines the region of the file represented by the split. For splits representing the while file, the offset is zero and the length is the file size.
The split may furthermore have a "reader position", which is the checkpointed position from a reader previously reading this split. This position is typically null when the split is assigned from the enumerator to the readers, and is non-null when the readers checkpoint their state in a file source split.
This class is Serializable
for convenience. For Flink's internal serialization (both
for RPC and for checkpoints), the FileSourceSplitSerializer
is used.
Constructor and Description |
---|
FileSourceSplit(String id,
Path filePath,
long offset,
long length)
Deprecated.
You should use
FileSourceSplit(String, Path, long, long, long, long) |
FileSourceSplit(String id,
Path filePath,
long offset,
long length,
long fileModificationTime,
long fileSize)
Constructs a split with host information.
|
FileSourceSplit(String id,
Path filePath,
long offset,
long length,
long fileModificationTime,
long fileSize,
String... hostnames)
Constructs a split with host information.
|
FileSourceSplit(String id,
Path filePath,
long offset,
long length,
long fileModificationTime,
long fileSize,
String[] hostnames,
CheckpointedPosition readerPosition)
Constructs a split with host information.
|
FileSourceSplit(String id,
Path filePath,
long offset,
long length,
String... hostnames)
Deprecated.
|
FileSourceSplit(String id,
Path filePath,
long offset,
long length,
String[] hostnames,
CheckpointedPosition readerPosition)
Deprecated.
|
Modifier and Type | Method and Description |
---|---|
long |
fileModificationTime()
Returns the modification time of the file, from
FileStatus.getModificationTime() . |
long |
fileSize()
Returns the full file size in bytes, from
FileStatus.getLen() . |
Optional<CheckpointedPosition> |
getReaderPosition()
Gets the (checkpointed) position of the reader, if set.
|
String[] |
hostnames()
Gets the hostnames of the nodes storing the file range described by this split.
|
long |
length()
Returns the number of bytes in the file region described by this source split.
|
long |
offset()
Returns the start of the file region referenced by this source split.
|
Path |
path()
Gets the file's path.
|
String |
splitId()
Get the split id of this source split.
|
String |
toString() |
FileSourceSplit |
updateWithCheckpointedPosition(CheckpointedPosition position)
Creates a copy of this split where the checkpointed position is replaced by the given new
position.
|
public FileSourceSplit(String id, Path filePath, long offset, long length, long fileModificationTime, long fileSize)
id
- The unique ID of this source split.filePath
- The path to the file.offset
- The start (inclusive) of the split's rage in the file.length
- The number of bytes in the split (starting from the offset)fileModificationTime
- The modification time of the filefileSize
- The size of the full filepublic FileSourceSplit(String id, Path filePath, long offset, long length, long fileModificationTime, long fileSize, String... hostnames)
filePath
- The path to the file.offset
- The start (inclusive) of the split's rage in the file.length
- The number of bytes in the split (starting from the offset)fileModificationTime
- The modification time of the filefileSize
- The size of the full filehostnames
- The hostnames of the nodes storing the split's file range.public FileSourceSplit(String id, Path filePath, long offset, long length, long fileModificationTime, long fileSize, String[] hostnames, @Nullable CheckpointedPosition readerPosition)
filePath
- The path to the file.offset
- The start (inclusive) of the split's rage in the file.length
- The number of bytes in the split (starting from the offset)fileModificationTime
- The modification time of the filefileSize
- The size of the full filehostnames
- The hostnames of the nodes storing the split's file range.@Deprecated public FileSourceSplit(String id, Path filePath, long offset, long length)
FileSourceSplit(String, Path, long, long, long, long)
@Deprecated public FileSourceSplit(String id, Path filePath, long offset, long length, String... hostnames)
FileSourceSplit(String, Path, long, long, long, long,
String...)
@Deprecated public FileSourceSplit(String id, Path filePath, long offset, long length, String[] hostnames, @Nullable CheckpointedPosition readerPosition)
FileSourceSplit(String, Path, long, long, long, long,
String[], CheckpointedPosition)
public String splitId()
SourceSplit
splitId
in interface SourceSplit
public Path path()
public long offset()
public long length()
public long fileModificationTime()
FileStatus.getModificationTime()
.public long fileSize()
FileStatus.getLen()
.public String[] hostnames()
Host information is typically only available on specific file systems, like HDFS.
public Optional<CheckpointedPosition> getReaderPosition()
public FileSourceSplit updateWithCheckpointedPosition(@Nullable CheckpointedPosition position)
IMPORTANT: Subclasses that add additional information to the split must override this method to return that subclass type. This contract is enforced by checks in the file source implementation. We did not try to enforce this contract via generics in this split class, because it leads to very ugly and verbose use of generics.
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.