@Public public abstract class DelimitedInputFormat<OT> extends FileInputFormat<OT> implements CheckpointableInputFormat<FileInputSplit,Long>
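Base implementation for input formats that split the input at a delimiter into records. The parsing of the record bytes into the record has to be implemented in the readRecord(Object, byte[], int, int) method.
The default delimiter is the newline character '\n'.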
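As a rough illustration of the contract described above, the sketch below shows what a readRecord implementation for string records might look like. The class name StringRecordSketch is hypothetical, and the class deliberately does not extend DelimitedInputFormat so that it compiles with the JDK alone; the UTF-8 decoding and the trailing carriage-return handling are assumptions of this sketch, not documented behavior.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a DelimitedInputFormat<String> subclass.
// It only mirrors the readRecord signature; it is not the Flink class.
class StringRecordSketch {

    // Mirrors readRecord(OT reuse, byte[] bytes, int offset, int numBytes).
    static String readRecord(String reuse, byte[] bytes, int offset, int numBytes) {
        // Assumption of this sketch: drop a trailing '\r' so that files with
        // "\r\n" line endings yield clean lines under the default '\n' delimiter.
        if (numBytes > 0 && bytes[offset + numBytes - 1] == '\r') {
            numBytes--;
        }
        // Strings are immutable, so the reuse object is ignored here.
        return new String(bytes, offset, numBytes, StandardCharsets.UTF_8);
    }
}
```

Only the bytes in the range [offset, offset + numBytes) belong to the record; the rest of the buffer may hold other records and should not be read.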
Nested classes/interfaces inherited from class FileInputFormat: FileInputFormat.FileBaseStatistics, FileInputFormat.InputSplitOpenThread
Modifier and Type | Field and Description |
---|---|
protected byte[] | currBuffer |
protected int | currLen |
protected int | currOffset |
protected static String | RECORD_DELIMITER: The configuration key to set the record delimiter. |
Fields inherited from class FileInputFormat: currentSplit, ENUMERATE_NESTED_FILES_FLAG, enumerateNestedFiles, filePath, INFLATER_INPUT_STREAM_FACTORIES, minSplitSize, numSplits, openTimeout, READ_WHOLE_SPLIT_FLAG, splitLength, splitStart, stream, unsplittable
Modifier | Constructor and Description |
---|---|
public | DelimitedInputFormat() |
protected | DelimitedInputFormat(Path filePath, Configuration configuration) |
Modifier and Type | Method and Description |
---|---|
void | close(): Closes the input by releasing all buffers and closing the file input stream. |
void | configure(Configuration parameters): Configures this input format by reading the path to the file from the configuration and the string that defines the record delimiter. |
int | getBufferSize() |
Charset | getCharset(): Get the character set used for the row delimiter. |
Long | getCurrentState(): Returns the split currently being read, along with its current state. |
byte[] | getDelimiter() |
int | getLineLengthLimit() |
int | getNumLineSamples() |
FileInputFormat.FileBaseStatistics | getStatistics(BaseStatistics cachedStats): Obtains basic file statistics containing only file size. |
protected void | initializeSplit(FileInputSplit split, Long state): Initialization method that is called after opening or reopening an input split. |
protected static void | loadConfigParameters(Configuration parameters) |
protected static void | loadGlobalConfigParams(): Deprecated. Please use loadConfigParameters(Configuration config |
OT | nextRecord(OT record): Reads the next record from the input. |
void | open(FileInputSplit split): Opens the given input split. |
boolean | reachedEnd(): Checks whether the current split is at its end. |
protected boolean | readLine() |
abstract OT | readRecord(OT reuse, byte[] bytes, int offset, int numBytes): This function parses the given byte array which represents a serialized record. |
void | reopen(FileInputSplit split, Long state): Restores the state of a parallel instance reading from an InputFormat. |
void | setBufferSize(int bufferSize) |
void | setCharset(String charset): Set the name of the character set used for the row delimiter. |
void | setDelimiter(byte[] delimiter) |
void | setDelimiter(char delimiter) |
void | setDelimiter(String delimiter) |
void | setLineLengthLimit(int lineLengthLimit) |
void | setNumLineSamples(int numLineSamples) |
Methods inherited from class FileInputFormat: acceptFile, createInputSplits, decorateInputStream, extractFileExtension, getFilePath, getFilePaths, getFileStats, getFileStats, getInflaterInputStreamFactory, getInputSplitAssigner, getMinSplitSize, getNestedFileEnumeration, getNumSplits, getOpenTimeout, getSplitLength, getSplitStart, getSupportedCompressionFormats, registerInflaterInputStreamFactory, setFilePath, setFilePath, setFilePaths, setFilePaths, setFilesFilter, setMinSplitSize, setNestedFileEnumeration, setNumSplits, setOpenTimeout, supportsMultiPaths, testForUnsplittable, toString
Methods inherited from class RichInputFormat: closeInputFormat, getRuntimeContext, openInputFormat, setRuntimeContext
protected transient byte[] currBuffer
protected transient int currOffset
protected transient int currLen
protected static final String RECORD_DELIMITER
public DelimitedInputFormat()
protected DelimitedInputFormat(Path filePath, Configuration configuration)
@Deprecated protected static void loadGlobalConfigParams()
Deprecated. Please use loadConfigParameters(Configuration config
protected static void loadConfigParameters(Configuration parameters)
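@PublicEvolving public Charset getCharset()
Get the character set used for the row delimiter. This is also used by subclasses to interpret field delimiters, comment strings, and for configuring FieldParsers.
@PublicEvolving public void setCharset(String charset)
Set the name of the character set used for the row delimiter. This is also used by subclasses to interpret field delimiters, comment strings, and for configuring FieldParsers.
These fields are interpreted when set. Changing the charset thereafter may cause unexpected results.
Parameters:
charset - name of the charset
public byte[] getDelimiter()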
public void setDelimiter(byte[] delimiter)
public void setDelimiter(char delimiter)
public void setDelimiter(String delimiter)
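The warning about changing the charset after string-valued fields are set is easier to see with concrete bytes. The sketch below is plain JDK code, not Flink's implementation: the helper class DelimiterEncodingSketch and its encode method are hypothetical and only show that the same delimiter string maps to different byte sequences under different charsets. How and when DelimitedInputFormat itself encodes a String delimiter is not stated on this page.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: shows that a delimiter's byte form depends on the charset.
class DelimiterEncodingSketch {

    static byte[] encode(String delimiter, Charset charset) {
        return delimiter.getBytes(charset);
    }

    public static void main(String[] args) {
        // One byte in UTF-8, two bytes in UTF-16LE.
        System.out.println(Arrays.toString(encode("\n", StandardCharsets.UTF_8)));
        System.out.println(Arrays.toString(encode("\n", StandardCharsets.UTF_16LE)));
    }
}
```

A delimiter that was turned into bytes under one charset will not match input that is encoded in another, which is one way "unexpected results" can arise.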
public int getLineLengthLimit()
public void setLineLengthLimit(int lineLengthLimit)
public int getBufferSize()
public void setBufferSize(int bufferSize)
public int getNumLineSamples()
public void setNumLineSamples(int numLineSamples)
public abstract OT readRecord(OT reuse, byte[] bytes, int offset, int numBytes) throws IOException
This function parses the given byte array which represents a serialized record.
Parameters:
reuse - An optionally reusable object.
bytes - Binary data of serialized records.
offset - The offset at which to start reading the record data.
numBytes - The number of bytes that can be read starting at the offset position.
Throws:
IOException - if the record could not be read.
public void configure(Configuration parameters)
Specified by:
configure in interface InputFormat<OT,FileInputSplit>
Overrides:
configure in class FileInputFormat<OT>
Parameters:
parameters - The configuration object to read the parameters from.
See Also:
InputFormat.configure(org.apache.flink.configuration.Configuration)
public FileInputFormat.FileBaseStatistics getStatistics(BaseStatistics cachedStats) throws IOException
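Description copied from class: FileInputFormat
Obtains basic file statistics containing only file size.
Specified by:
getStatistics in interface InputFormat<OT,FileInputSplit>
Overrides:
getStatistics in class FileInputFormat<OT>
Parameters:
cachedStats - The statistics that were cached. May be null.
Throws:
IOException
See Also:
InputFormat.getStatistics(org.apache.flink.api.common.io.statistics.BaseStatistics)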
public void open(FileInputSplit split) throws IOException
Opens the given input split.
Specified by:
open in interface InputFormat<OT,FileInputSplit>
Overrides:
open in class FileInputFormat<OT>
Parameters:
split - The input split to open.
Throws:
IOException - Thrown, if the split could not be opened due to an I/O problem.
See Also:
FileInputFormat.open(org.apache.flink.core.fs.FileInputSplit)
public boolean reachedEnd()
Checks whether the current split is at its end.
Specified by:
reachedEnd in interface InputFormat<OT,FileInputSplit>
public OT nextRecord(OT record) throws IOException
Description copied from interface: InputFormat
Reads the next record from the input.
When this method is called, the input format is guaranteed to be opened.
Specified by:
nextRecord in interface InputFormat<OT,FileInputSplit>
Parameters:
record - Object that may be reused.
Throws:
IOException - Thrown, if an I/O error occurred.
public void close() throws IOException
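Closes the input by releasing all buffers and closing the file input stream.
Specified by:
close in interface InputFormat<OT,FileInputSplit>
Overrides:
close in class FileInputFormat<OT>
Throws:
IOException - Thrown, if the closing of the file stream causes an I/O error.
protected final boolean readLine() throws IOException
Throws:
IOException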
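This page gives no description for readLine(), so the following is only a conceptual model of delimiter-based splitting, not the actual implementation. The class DelimiterScanSketch is hypothetical: it scans an in-memory buffer for a (possibly multi-byte) delimiter and reports each record as an {offset, numBytes} pair, the same shape of arguments that readRecord receives. Buffer refills, split boundaries, and the line length limit are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of splitting a buffer into records at a delimiter.
class DelimiterScanSketch {

    /** Returns one {offset, numBytes} pair per record found in buf. */
    static List<int[]> split(byte[] buf, byte[] delim) {
        if (delim.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<int[]> records = new ArrayList<>();
        int start = 0; // first byte of the record currently being scanned
        int i = 0;
        while (i <= buf.length - delim.length) {
            if (matchesAt(buf, i, delim)) {
                records.add(new int[] {start, i - start});
                i += delim.length; // skip over the delimiter itself
                start = i;
            } else {
                i++;
            }
        }
        // Trailing bytes without a closing delimiter form the last record.
        if (start < buf.length) {
            records.add(new int[] {start, buf.length - start});
        }
        return records;
    }

    private static boolean matchesAt(byte[] buf, int pos, byte[] delim) {
        for (int j = 0; j < delim.length; j++) {
            if (buf[pos + j] != delim[j]) {
                return false;
            }
        }
        return true;
    }
}
```

Returning offsets rather than copies keeps the record bytes in the shared buffer, which is consistent with readRecord taking a byte array plus an offset and a length.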
@PublicEvolving public Long getCurrentState() throws IOException
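Description copied from interface: CheckpointableInputFormat
Returns the split currently being read, along with its current state.
Specified by:
getCurrentState in interface CheckpointableInputFormat<FileInputSplit,Long>
Throws:
IOException - Thrown if the creation of the state object failed.
@PublicEvolving public void reopen(FileInputSplit split, Long state) throws IOException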
Description copied from interface: CheckpointableInputFormat
Restores the state of a parallel instance reading from an InputFormat. This is necessary when recovering from a task failure. When this method is called, the input format is guaranteed to be configured.
NOTE: The caller has to make sure that the provided split is the one to which the state belongs.
Specified by:
reopen in interface CheckpointableInputFormat<FileInputSplit,Long>
Parameters:
split - The split to be opened.
state - The state from which to start. This can contain the offset, but also other data, depending on the input format.
Throws:
IOException
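The getCurrentState() / reopen(split, state) pair can be pictured with a small offset-based model. The sketch below is an assumption-laden illustration in plain JDK code: the class OffsetCheckpointSketch is hypothetical, it reads from an in-memory byte array instead of a FileInputSplit, it hard-codes '\n' as the delimiter, and it treats the Long state as the offset of the next unread record. It is not Flink's implementation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical reader whose checkpoint state is the offset of the next record.
class OffsetCheckpointSketch {
    private final byte[] data;
    private int offset; // position of the next unread record

    // A null state models a fresh open(); a non-null state models reopen().
    OffsetCheckpointSketch(byte[] data, Long state) {
        this.data = data;
        this.offset = (state == null) ? 0 : state.intValue();
    }

    // Analogous in spirit to getCurrentState(): where to resume reading.
    Long getCurrentState() {
        return (long) offset;
    }

    // Returns the next '\n'-delimited record, or null at the end of the data.
    String nextRecord() {
        if (offset >= data.length) {
            return null;
        }
        int end = offset;
        while (end < data.length && data[end] != '\n') {
            end++;
        }
        String record = new String(data, offset, end - offset, StandardCharsets.UTF_8);
        offset = Math.min(end + 1, data.length); // step past the delimiter
        return record;
    }
}
```

A reader restored from a saved state continues with the record that follows the last one returned before the state was taken, which is why the caller must pair the state with the split it came from.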
protected void initializeSplit(FileInputSplit split, @Nullable Long state) throws IOException
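Initialization method that is called after opening or reopening an input split.
Parameters:
split - Split that was opened or reopened
state - Checkpointed state if the split was reopened
Throws:
IOException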
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.