public class RowCsvInputFormat extends AbstractCsvInputFormat<Row>
Input format that reads CSV into Row.
Differences from the old CSV format, org.apache.flink.api.java.io.RowCsvInputFormat:
1. When a row is too short, the new format emits the row with the remaining fields filled with null. The old format skips the too-short row.
2. The new format removes the escape character. The old format keeps the escape character.

Limitations of the new CSV input format that can be improved over time:
1. The comment character is not configurable. It is always "#".
2. Multi-character field delimiters are not supported.
3. Reading the first N is not supported; it throws an exception.
4. The line delimiter can only be configured as "\r", "\n", or "\r\n".
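The short-row difference described above can be illustrated with a small, self-contained sketch. It does not use Flink or the real parser: the class `ShortRowDemo` and its methods are hypothetical, the comma split is a simplification (no quoting, escaping, or comments), and the two methods only mimic the behaviors stated in the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShortRowDemo {

    /** Mimics the new format as described: a too-short row is emitted, padded with nulls. */
    public static List<String[]> readNewStyle(List<String> lines, int arity) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",", -1);
            // Arrays.copyOf pads an object array with null when the new length is larger.
            rows.add(fields.length < arity ? Arrays.copyOf(fields, arity) : fields);
        }
        return rows;
    }

    /** Mimics the old format as described: a too-short row is skipped. */
    public static List<String[]> readOldStyle(List<String> lines, int arity) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",", -1);
            if (fields.length < arity) {
                continue; // row dropped
            }
            rows.add(fields);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1,alice,30", "2,bob");
        for (String[] row : readNewStyle(lines, 3)) {
            System.out.println(Arrays.toString(row)); // second row ends with null
        }
        System.out.println(readOldStyle(lines, 3).size()); // only the complete row remains
    }
}
```

Downstream code that moves from the old format to the new one may therefore need a null check on trailing fields, since rows that were previously dropped now arrive with null values.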
Modifier and Type | Class and Description |
---|---|
static class | RowCsvInputFormat.Builder: A builder for creating a RowCsvInputFormat. |
Nested classes inherited from class FileInputFormat: FileInputFormat.FileBaseStatistics, FileInputFormat.InputSplitOpenThread
Fields inherited from class AbstractCsvInputFormat: csvInputStream, csvSchema
Fields inherited from class FileInputFormat: currentSplit, ENUMERATE_NESTED_FILES_FLAG, enumerateNestedFiles, filePath, INFLATER_INPUT_STREAM_FACTORIES, minSplitSize, numSplits, openTimeout, READ_WHOLE_SPLIT_FLAG, splitLength, splitStart, stream, unsplittable
Modifier and Type | Method and Description |
---|---|
static RowCsvInputFormat.Builder | builder(TypeInformation<Row> typeInfo, Path... filePaths): Create a builder. |
Row | nextRecord(Row record): Reads the next record from the input. |
void | open(FileInputSplit split): Opens an input stream to the file defined in the input format. |
boolean | reachedEnd(): Method used to check if the end of the input is reached. |
Methods inherited from class AbstractCsvInputFormat: supportsMultiPaths
Methods inherited from class FileInputFormat: acceptFile, close, configure, createInputSplits, decorateInputStream, extractFileExtension, getFilePath, getFilePaths, getFileStats, getFileStats, getInflaterInputStreamFactory, getInputSplitAssigner, getMinSplitSize, getNestedFileEnumeration, getNumSplits, getOpenTimeout, getSplitLength, getSplitStart, getStatistics, registerInflaterInputStreamFactory, setFilePath, setFilePath, setFilePaths, setFilePaths, setFilesFilter, setMinSplitSize, setNestedFileEnumeration, setNumSplits, setOpenTimeout, testForUnsplittable, toString
Methods inherited from class RichInputFormat: closeInputFormat, getRuntimeContext, openInputFormat, setRuntimeContext
public void open(FileInputSplit split) throws IOException
Description copied from class: FileInputFormat
Opens an input stream to the file defined in the input format. The stream is actually opened in an asynchronous thread to make sure any interruptions to the thread working on the input format do not reach the file system.
Specified by: open in interface InputFormat<Row,FileInputSplit>
Overrides: open in class AbstractCsvInputFormat<Row>
Parameters: split - The split to be opened.
Throws: IOException - Thrown, if the split could not be opened due to an I/O problem.

public boolean reachedEnd()
Description copied from interface: InputFormat
Method used to check if the end of the input is reached. When this method is called, the input format is guaranteed to be opened.

public Row nextRecord(Row record) throws IOException
Description copied from interface: InputFormat
Reads the next record from the input. When this method is called, the input format is guaranteed to be opened.
Parameters: record - Object that may be reused.
Throws: IOException - Thrown, if an I/O error occurred.

public static RowCsvInputFormat.Builder builder(TypeInformation<Row> typeInfo, Path... filePaths)
Create a builder.
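The call order implied by open, reachedEnd, and nextRecord can be sketched as a simple read loop. The example below is a stand-in, not Flink code: `RecordSource` and `ListSource` are hypothetical types that mirror only the three documented method signatures, and the "split" is an in-memory list of lines rather than a FileInputSplit.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ReadLoopDemo {

    /** Hypothetical stand-in exposing only the three methods documented above. */
    public interface RecordSource<T, S> {
        void open(S split) throws IOException;
        boolean reachedEnd() throws IOException;
        T nextRecord(T reuse) throws IOException;
    }

    /** Toy in-memory source: the "split" is a list of comma-separated lines. */
    public static class ListSource implements RecordSource<String[], List<String>> {
        private Iterator<String> lines;

        @Override
        public void open(List<String> split) {
            lines = split.iterator();
        }

        @Override
        public boolean reachedEnd() {
            return !lines.hasNext();
        }

        @Override
        public String[] nextRecord(String[] reuse) {
            // The reuse object is ignored here; a real format may fill and return it.
            return lines.next().split(",", -1);
        }
    }

    /** Opens the split first, then polls reachedEnd() before every nextRecord() call. */
    public static <T, S> List<T> readAll(RecordSource<T, S> source, S split) {
        List<T> records = new ArrayList<>();
        try {
            source.open(split);
            T reuse = null;
            while (!source.reachedEnd()) {
                T record = source.nextRecord(reuse);
                if (record != null) {
                    records.add(record);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }
}
```

The loop calls open before anything else because both reachedEnd and nextRecord are documented as being called only once the input format is opened.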
Copyright © 2014–2022 The Apache Software Foundation. All rights reserved.