Package org.apache.flink.formats.csv
Class CsvReaderFormat<T>
- java.lang.Object
-
- org.apache.flink.connector.file.src.reader.SimpleStreamFormat<T>
-
- org.apache.flink.formats.csv.CsvReaderFormat<T>
-
- Type Parameters:
T
- The type of the returned elements.
- All Implemented Interfaces:
Serializable
,ResultTypeQueryable<T>
,StreamFormat<T>
@PublicEvolving public class CsvReaderFormat<T> extends SimpleStreamFormat<T>
AStreamFormat
for reading CSV files.The following example shows how to create a
CsvReaderFormat
where the schema for CSV parsing is automatically derived based on the fields of a POJO class.
Note: you might need to addCsvReaderFormat<SomePojo> csvFormat = CsvReaderFormat.forPojo(SomePojo.class); FileSource<SomePojo> source = FileSource.forRecordStreamFormat(csvFormat, Path.fromLocalFile(filesPath)).build();
@JsonPropertyOrder({field1, field2, ...})
annotation from theJackson
library to your class definition with the fields order exactly matching those of the CSV file columns).If you need more fine-grained control over the CSV schema or the parsing options, use the more low-level
forSchema
static factory method based on theJackson
library utilities:Function<CsvMapper, CsvSchema> schemaGenerator = mapper -> mapper.schemaFor(SomePojo.class) .withColumnSeparator('|'); CsvReaderFormat<SomePojo> csvFormat = CsvReaderFormat.forSchema(() -> new CsvMapper(), schemaGenerator, TypeInformation.of(SomePojo.class)); FileSource<SomePojo> source = FileSource.forRecordStreamFormat(csvFormat, Path.fromLocalFile(filesPath)).build();
- See Also:
- Serialized Form
-
-
Field Summary
-
Fields inherited from interface org.apache.flink.connector.file.src.reader.StreamFormat
FETCH_IO_SIZE
-
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description StreamFormat.Reader<T>
createReader(Configuration config, FSDataInputStream stream)
Creates a new reader.static <T> CsvReaderFormat<T>
forPojo(Class<T> pojoType)
Builds a newCsvReaderFormat
for reading CSV files mapped to the provided POJO class definition.static <T> CsvReaderFormat<T>
forSchema(org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema schema, TypeInformation<T> typeInformation)
Builds a newCsvReaderFormat
using aCsvSchema
.static <T> CsvReaderFormat<T>
forSchema(SerializableSupplier<org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper> mapperFactory, SerializableFunction<org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper,org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema> schemaGenerator, TypeInformation<T> typeInformation)
Builds a newCsvReaderFormat
using aCsvSchema
generator andCsvMapper
factory.TypeInformation<T>
getProducedType()
Gets the type produced by this format.CsvReaderFormat<T>
withIgnoreParseErrors()
Returns a newCsvReaderFormat
configured to ignore all parsing errors.-
Methods inherited from class org.apache.flink.connector.file.src.reader.SimpleStreamFormat
createReader, isSplittable, restoreReader
-
-
-
-
Method Detail
-
forSchema
public static <T> CsvReaderFormat<T> forSchema(org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema schema, TypeInformation<T> typeInformation)
Builds a newCsvReaderFormat
using aCsvSchema
.- Type Parameters:
T
- The type of the returned elements.- Parameters:
schema
- The Jackson CSV schema configured for parsing specific CSV files.typeInformation
- The Flink type descriptor of the returned elements.
-
forSchema
public static <T> CsvReaderFormat<T> forSchema(SerializableSupplier<org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper> mapperFactory, SerializableFunction<org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper,org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema> schemaGenerator, TypeInformation<T> typeInformation)
Builds a newCsvReaderFormat
using aCsvSchema
generator andCsvMapper
factory.- Type Parameters:
T
- The type of the returned elements.- Parameters:
mapperFactory
- The factory creating theCsvMapper
.schemaGenerator
- A generator that creates and configures the Jackson CSV schema for parsing specific CSV files, from a mapper created by the mapper factory.typeInformation
- The Flink type descriptor of the returned elements.
-
forPojo
public static <T> CsvReaderFormat<T> forPojo(Class<T> pojoType)
Builds a newCsvReaderFormat
for reading CSV files mapped to the provided POJO class definition. Produced reader uses default mapper and schema settings, useforSchema
if you need customizations.- Type Parameters:
T
- The type of the returned elements.- Parameters:
pojoType
- The type class of the POJO.
-
withIgnoreParseErrors
public CsvReaderFormat<T> withIgnoreParseErrors()
Returns a newCsvReaderFormat
configured to ignore all parsing errors. All the other options directly carried over from the subject of the method call.
-
createReader
public StreamFormat.Reader<T> createReader(Configuration config, FSDataInputStream stream) throws IOException
Description copied from class:SimpleStreamFormat
Creates a new reader. This method is called both for the creation of new reader (from the beginning of a file) and for restoring checkpointed readers.If the reader previously checkpointed an offset, then the input stream will be positioned to that particular offset. Readers checkpoint an offset by returning a value from the method
StreamFormat.Reader.getCheckpointedPosition()
method with an offset other thanCheckpointedPosition.NO_OFFSET
).- Specified by:
createReader
in classSimpleStreamFormat<T>
- Throws:
IOException
-
getProducedType
public TypeInformation<T> getProducedType()
Description copied from class:SimpleStreamFormat
Gets the type produced by this format. This type will be the type produced by the file source as a whole.- Specified by:
getProducedType
in interfaceResultTypeQueryable<T>
- Specified by:
getProducedType
in interfaceStreamFormat<T>
- Specified by:
getProducedType
in classSimpleStreamFormat<T>
- Returns:
- The data type produced by this function or input format.
-
-