Package org.apache.flink.api.common.io
Class GenericCsvInputFormat<OT>
- java.lang.Object
-
- org.apache.flink.api.common.io.RichInputFormat<OT,FileInputSplit>
-
- org.apache.flink.api.common.io.FileInputFormat<OT>
-
- org.apache.flink.api.common.io.DelimitedInputFormat<OT>
-
- org.apache.flink.api.common.io.GenericCsvInputFormat<OT>
-
- All Implemented Interfaces:
Serializable
,CheckpointableInputFormat<FileInputSplit,Long>
,InputFormat<OT,FileInputSplit>
,InputSplitSource<FileInputSplit>
- Direct Known Subclasses:
CsvInputFormat
@Internal public abstract class GenericCsvInputFormat<OT> extends DelimitedInputFormat<OT>
- See Also:
- Serialized Form
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.flink.api.common.io.FileInputFormat
FileInputFormat.FileBaseStatistics, FileInputFormat.InputSplitOpenThread
-
-
Field Summary
Fields Modifier and Type Field Description protected int
commentCount
protected byte[]
commentPrefix
protected boolean[]
fieldIncluded
protected int
invalidLineCount
protected boolean
lineDelimiterIsLinebreak
-
Fields inherited from class org.apache.flink.api.common.io.DelimitedInputFormat
currBuffer, currLen, currOffset, RECORD_DELIMITER
-
Fields inherited from class org.apache.flink.api.common.io.FileInputFormat
currentSplit, enumerateNestedFiles, INFLATER_INPUT_STREAM_FACTORIES, minSplitSize, numSplits, openTimeout, READ_WHOLE_SPLIT_FLAG, splitLength, splitStart, stream, unsplittable
-
-
Constructor Summary
Constructors Modifier Constructor Description protected
GenericCsvInputFormat()
protected
GenericCsvInputFormat(Path filePath)
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description protected static void
checkAndCoSort(int[] positions, Class<?>[] types)
protected static void
checkForMonotonousOrder(int[] positions, Class<?>[] types)
void
close()
Closes the input by releasing all buffers and closing the file input stream.void
enableQuotedStringParsing(char quoteCharacter)
byte[]
getCommentPrefix()
byte[]
getFieldDelimiter()
protected FieldParser<?>[]
getFieldParsers()
protected Class<?>[]
getGenericFieldTypes()
int
getNumberOfFieldsTotal()
int
getNumberOfNonNullFields()
protected void
initializeSplit(FileInputSplit split, Long offset)
Initialization method that is called after opening or reopening an input split.boolean
isLenient()
boolean
isSkippingFirstLineAsHeader()
protected boolean
parseRecord(Object[] holders, byte[] bytes, int offset, int numBytes)
void
setCharset(String charset)
Set the name of the character set used for the row delimiter.void
setCommentPrefix(String commentPrefix)
void
setFieldDelimiter(String delimiter)
protected void
setFieldsGeneric(boolean[] includedMask, Class<?>[] fieldTypes)
protected void
setFieldsGeneric(int[] sourceFieldIndices, Class<?>[] fieldTypes)
protected void
setFieldTypesGeneric(Class<?>... fieldTypes)
void
setLenient(boolean lenient)
void
setSkipFirstLineAsHeader(boolean skipFirstLine)
protected int
skipFields(byte[] bytes, int startPos, int limit, byte[] delim)
-
Methods inherited from class org.apache.flink.api.common.io.DelimitedInputFormat
configure, getBufferSize, getCharset, getCurrentState, getDelimiter, getLineLengthLimit, getNumLineSamples, getStatistics, loadConfigParameters, nextRecord, open, reachedEnd, readLine, readRecord, reopen, setBufferSize, setDelimiter, setDelimiter, setDelimiter, setLineLengthLimit, setNumLineSamples
-
Methods inherited from class org.apache.flink.api.common.io.FileInputFormat
acceptFile, createInputSplits, decorateInputStream, extractFileExtension, getFilePaths, getFileStats, getFileStats, getInflaterInputStreamFactory, getInputSplitAssigner, getMinSplitSize, getNestedFileEnumeration, getNumSplits, getOpenTimeout, getSplitLength, getSplitStart, getSupportedCompressionFormats, registerInflaterInputStreamFactory, setFilePath, setFilePath, setFilePaths, setFilePaths, setFilesFilter, setMinSplitSize, setNestedFileEnumeration, setNumSplits, setOpenTimeout, testForUnsplittable, toString
-
Methods inherited from class org.apache.flink.api.common.io.RichInputFormat
closeInputFormat, getRuntimeContext, openInputFormat, setRuntimeContext
-
-
-
-
Constructor Detail
-
GenericCsvInputFormat
protected GenericCsvInputFormat()
-
GenericCsvInputFormat
protected GenericCsvInputFormat(Path filePath)
-
-
Method Detail
-
getNumberOfFieldsTotal
public int getNumberOfFieldsTotal()
-
getNumberOfNonNullFields
public int getNumberOfNonNullFields()
-
setCharset
public void setCharset(String charset)
Description copied from class:DelimitedInputFormat
Set the name of the character set used for the row delimiter. This is also used by subclasses to interpret field delimiters, comment strings, and for configuringFieldParser
s.These fields are interpreted when set. Changing the charset thereafter may cause unexpected results.
- Overrides:
setCharset
in classDelimitedInputFormat<OT>
- Parameters:
charset
- name of the charset
-
getCommentPrefix
public byte[] getCommentPrefix()
-
setCommentPrefix
public void setCommentPrefix(String commentPrefix)
-
getFieldDelimiter
public byte[] getFieldDelimiter()
-
setFieldDelimiter
public void setFieldDelimiter(String delimiter)
-
isLenient
public boolean isLenient()
-
setLenient
public void setLenient(boolean lenient)
-
isSkippingFirstLineAsHeader
public boolean isSkippingFirstLineAsHeader()
-
setSkipFirstLineAsHeader
public void setSkipFirstLineAsHeader(boolean skipFirstLine)
-
enableQuotedStringParsing
public void enableQuotedStringParsing(char quoteCharacter)
-
getFieldParsers
protected FieldParser<?>[] getFieldParsers()
-
getGenericFieldTypes
protected Class<?>[] getGenericFieldTypes()
-
setFieldTypesGeneric
protected void setFieldTypesGeneric(Class<?>... fieldTypes)
-
setFieldsGeneric
protected void setFieldsGeneric(int[] sourceFieldIndices, Class<?>[] fieldTypes)
-
setFieldsGeneric
protected void setFieldsGeneric(boolean[] includedMask, Class<?>[] fieldTypes)
-
initializeSplit
protected void initializeSplit(FileInputSplit split, Long offset) throws IOException
Description copied from class:DelimitedInputFormat
Initialization method that is called after opening or reopening an input split.- Overrides:
initializeSplit
in classDelimitedInputFormat<OT>
- Parameters:
split
- Split that was opened or reopenedoffset
- Checkpointed state if the split was reopened- Throws:
IOException
-
close
public void close() throws IOException
Description copied from class:DelimitedInputFormat
Closes the input by releasing all buffers and closing the file input stream.- Specified by:
close
in interfaceInputFormat<OT,FileInputSplit>
- Overrides:
close
in classDelimitedInputFormat<OT>
- Throws:
IOException
- Thrown, if the closing of the file stream causes an I/O error.
-
parseRecord
protected boolean parseRecord(Object[] holders, byte[] bytes, int offset, int numBytes) throws ParseException
- Throws:
ParseException
-
skipFields
protected int skipFields(byte[] bytes, int startPos, int limit, byte[] delim)
-
checkAndCoSort
protected static void checkAndCoSort(int[] positions, Class<?>[] types)
-
checkForMonotonousOrder
protected static void checkForMonotonousOrder(int[] positions, Class<?>[] types)
-
-