T
- The type of the returned elements.@PublicEvolving public class CsvReaderFormat<T> extends SimpleStreamFormat<T>
StreamFormat
for reading CSV files.
The following example shows how to create a CsvReaderFormat
where the schema for CSV
parsing is automatically derived based on the fields of a POJO class.
CsvReaderFormat<SomePojo> csvFormat = CsvReaderFormat.forPojo(SomePojo.class);
FileSource<SomePojo> source =
FileSource.forRecordStreamFormat(csvFormat, Path.fromLocalFile(filesPath)).build();
Note: you might need to add @JsonPropertyOrder({field1, field2, ...})
annotation from
the Jackson
library to your class definition with the fields order exactly matching those
of the CSV file columns).
If you need more fine-grained control over the CSV schema or the parsing options, use the more
low-level forSchema
static factory method based on the Jackson
library utilities:
Function<CsvMapper, CsvSchema> schemaGenerator =
mapper -> mapper.schemaFor(SomePojo.class)
.withColumnSeparator('|');
CsvReaderFormat<SomePojo> csvFormat =
CsvReaderFormat.forSchema(() -> new CsvMapper(), schemaGenerator, TypeInformation.of(SomePojo.class));
FileSource<SomePojo> source =
FileSource.forRecordStreamFormat(csvFormat, Path.fromLocalFile(filesPath)).build();
FETCH_IO_SIZE
Modifier and Type | Method and Description |
---|---|
StreamFormat.Reader<T> |
createReader(Configuration config,
FSDataInputStream stream)
Creates a new reader.
|
static <T> CsvReaderFormat<T> |
forPojo(Class<T> pojoType)
Builds a new
CsvReaderFormat for reading CSV files mapped to the provided POJO class
definition. |
static <T> CsvReaderFormat<T> |
forSchema(org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper mapper,
org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema schema,
TypeInformation<T> typeInformation)
Deprecated.
This method is limited to serializable
CsvMappers , preventing
the usage of certain Jackson modules (like the Java 8 Date/Time Serializers ). Use #forSchema(Supplier, Function,
TypeInformation) instead. |
static <T> CsvReaderFormat<T> |
forSchema(org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema schema,
TypeInformation<T> typeInformation)
Builds a new
CsvReaderFormat using a CsvSchema . |
static <T> CsvReaderFormat<T> |
forSchema(SerializableSupplier<org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper> mapperFactory,
SerializableFunction<org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper,org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema> schemaGenerator,
TypeInformation<T> typeInformation)
Builds a new
CsvReaderFormat using a CsvSchema generator and CsvMapper factory. |
TypeInformation<T> |
getProducedType()
Gets the type produced by this format.
|
CsvReaderFormat<T> |
withIgnoreParseErrors()
Returns a new
CsvReaderFormat configured to ignore all parsing errors. |
createReader, isSplittable, restoreReader
public static <T> CsvReaderFormat<T> forSchema(org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema schema, TypeInformation<T> typeInformation)
CsvReaderFormat
using a CsvSchema
.T
- The type of the returned elements.schema
- The Jackson CSV schema configured for parsing specific CSV files.typeInformation
- The Flink type descriptor of the returned elements.@Deprecated public static <T> CsvReaderFormat<T> forSchema(org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper mapper, org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema schema, TypeInformation<T> typeInformation)
CsvMappers
, preventing
the usage of certain Jackson modules (like the Java 8 Date/Time Serializers
). Use #forSchema(Supplier, Function,
TypeInformation)
instead.public static <T> CsvReaderFormat<T> forSchema(SerializableSupplier<org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper> mapperFactory, SerializableFunction<org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper,org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema> schemaGenerator, TypeInformation<T> typeInformation)
CsvReaderFormat
using a CsvSchema
generator and CsvMapper
factory.T
- The type of the returned elements.mapperFactory
- The factory creating the CsvMapper
.schemaGenerator
- A generator that creates and configures the Jackson CSV schema for
parsing specific CSV files, from a mapper created by the mapper factory.typeInformation
- The Flink type descriptor of the returned elements.public static <T> CsvReaderFormat<T> forPojo(Class<T> pojoType)
CsvReaderFormat
for reading CSV files mapped to the provided POJO class
definition. Produced reader uses default mapper and schema settings, use forSchema
if
you need customizations.T
- The type of the returned elements.pojoType
- The type class of the POJO.public CsvReaderFormat<T> withIgnoreParseErrors()
CsvReaderFormat
configured to ignore all parsing errors. All the other
options directly carried over from the subject of the method call.public StreamFormat.Reader<T> createReader(Configuration config, FSDataInputStream stream) throws IOException
SimpleStreamFormat
If the reader previously checkpointed an offset, then the input stream will be positioned
to that particular offset. Readers checkpoint an offset by returning a value from the method
Reader#getCheckpointedPosition()
method with an offset other than CheckpointedPosition.NO_OFFSET
).
createReader
in class SimpleStreamFormat<T>
IOException
public TypeInformation<T> getProducedType()
SimpleStreamFormat
getProducedType
in interface ResultTypeQueryable<T>
getProducedType
in interface StreamFormat<T>
getProducedType
in class SimpleStreamFormat<T>
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.