Formats#

Avro#

AvroSchema(j_schema)

Wraps a Java org.apache.avro.Schema object.

GenericRecordAvroTypeInfo(schema)

A TypeInformation of Avro's GenericRecord, including the schema.

AvroInputFormat(path, schema)

Provides a FileInputFormat for Avro records.
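A minimal sketch of reading an Avro file, assuming the Avro format jar is available to the job. The `User` schema, `USER_SCHEMA_JSON` and `read_avro_file` are hypothetical names used only for illustration; `AvroSchema.parse_string` and `StreamExecutionEnvironment.create_input` are the PyFlink calls assumed here.

```python
import json

# Hypothetical record schema, written as an Avro JSON schema string.
USER_SCHEMA_JSON = json.dumps({
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
    ],
})


def read_avro_file(env, path):
    """Return a DataStream of records read from the Avro file at `path`."""
    # PyFlink is imported locally so the schema above can be inspected
    # without a Flink installation.
    from pyflink.datastream.formats.avro import AvroInputFormat, AvroSchema

    schema = AvroSchema.parse_string(USER_SCHEMA_JSON)
    return env.create_input(AvroInputFormat(path, schema))
```

A caller would pass a `StreamExecutionEnvironment` and a file path, then continue with ordinary DataStream operations on the returned stream.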

AvroBulkWriters()

Convenience builder to create a BulkWriterFactory for Avro types.
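One possible way to combine AvroBulkWriters with a FileSink, sketched under the assumption that the stream carries Python dicts matching the schema. The `Order` schema and the helper `write_avro` are hypothetical; the `map` call only attaches a GenericRecordAvroTypeInfo so the writer receives Avro records.

```python
import json

# Hypothetical schema for the records being written.
ORDER_SCHEMA_JSON = json.dumps({
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "item", "type": "string"},
    ],
})


def write_avro(ds, output_dir):
    """Attach a bulk Avro file sink to the DataStream `ds`."""
    # Local imports keep the schema definition usable without Flink installed.
    from pyflink.datastream.connectors.file_system import FileSink
    from pyflink.datastream.formats.avro import (
        AvroBulkWriters,
        AvroSchema,
        GenericRecordAvroTypeInfo,
    )

    schema = AvroSchema.parse_string(ORDER_SCHEMA_JSON)
    sink = FileSink.for_bulk_format(
        output_dir, AvroBulkWriters.for_generic_record(schema)
    ).build()
    # The identity map does not change the data; it declares the Avro type.
    ds.map(lambda e: e, output_type=GenericRecordAvroTypeInfo(schema)).sink_to(sink)
```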

AvroRowDeserializationSchema([record_class, ...])

Deserialization schema from Avro bytes to Row.

AvroRowSerializationSchema([record_class, ...])

Serialization schema that serializes to Avro binary format.

CSV#

CsvSchema(j_schema, row_type)

CsvSchema holds the schema information of a CSV file, corresponding to the Java com.fasterxml.jackson.dataformat.csv.CsvSchema class.

CsvSchemaBuilder()

CsvSchemaBuilder builds a CsvSchema, corresponding to the Java com.fasterxml.jackson.dataformat.csv.CsvSchema.Builder class.

CsvReaderFormat(j_csv_format)

The StreamFormat for reading CSV files.

CsvBulkWriters()

CsvBulkWriters builds a BulkWriterFactory that writes Rows with a predefined CSV schema to partitioned files in bulk.
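The CSV classes above are typically used together: a schema built once, then handed to a reader format or a bulk writer. The following is a sketch only; the column names, `COLUMN_SEPARATOR` and the three helper functions are hypothetical.

```python
# Hypothetical separator shared by the reader and the writer.
COLUMN_SEPARATOR = ","


def build_schema():
    """Build a two-column CsvSchema: a BIGINT `id` and a string `name`."""
    # PyFlink imports are local so this module loads without Flink installed.
    from pyflink.datastream.formats.csv import CsvSchema
    from pyflink.table import DataTypes

    return (
        CsvSchema.builder()
        .add_number_column("id", number_type=DataTypes.BIGINT())
        .add_string_column("name")
        .set_column_separator(COLUMN_SEPARATOR)
        .build()
    )


def csv_source(path):
    """Create a FileSource that reads CSV rows from `path`."""
    from pyflink.datastream.connectors.file_system import FileSource
    from pyflink.datastream.formats.csv import CsvReaderFormat

    reader_format = CsvReaderFormat.for_schema(build_schema())
    return FileSource.for_record_stream_format(reader_format, path).build()


def csv_sink(output_dir):
    """Create a FileSink that writes Rows as CSV files under `output_dir`."""
    from pyflink.datastream.connectors.file_system import FileSink
    from pyflink.datastream.formats.csv import CsvBulkWriters

    writer_factory = CsvBulkWriters.for_schema(build_schema())
    return FileSink.for_bulk_format(output_dir, writer_factory).build()
```

The source would be passed to `env.from_source(...)` and the sink to `ds.sink_to(...)`; sharing one schema keeps both ends of the pipeline consistent.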

CsvRowDeserializationSchema(...)

Deserialization schema from CSV to Flink types.

CsvRowSerializationSchema(...)

Serialization schema that serializes an object of Flink types into CSV bytes.

Json#

JsonRowDeserializationSchema(...)

Deserialization schema from JSON to Flink types.

JsonRowSerializationSchema(...)

Serialization schema that serializes an object of Flink types into JSON bytes.
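Both JSON schemas are created through builders that take a row type. A minimal sketch, assuming the classes are importable from `pyflink.datastream.formats.json` (older releases expose them elsewhere); the field names and `build_json_schemas` are hypothetical.

```python
# Hypothetical field names of the JSON objects being processed.
FIELD_NAMES = ["id", "name"]


def build_json_schemas():
    """Return a (deserialization, serialization) schema pair for one row type."""
    # Local imports: building the schemas needs a Flink installation,
    # loading this module does not.
    from pyflink.common import Types
    from pyflink.datastream.formats.json import (
        JsonRowDeserializationSchema,
        JsonRowSerializationSchema,
    )

    row_type = Types.ROW_NAMED(FIELD_NAMES, [Types.LONG(), Types.STRING()])
    # Note the differing builder method names on the two classes.
    deserializer = JsonRowDeserializationSchema.builder().type_info(row_type).build()
    serializer = JsonRowSerializationSchema.builder().with_type_info(row_type).build()
    return deserializer, serializer
```

The resulting objects are usually handed to a connector, for example as the value deserializer of a source and the value serializer of a sink.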

Orc#

OrcBulkWriters()

Convenience builder to create a BulkWriterFactory that writes records with a predefined schema into ORC files in batches.
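A sketch of writing Rows to ORC files, assuming the ORC format jar is available and that `for_row_type` accepts a row type plus writer and Hadoop configurations. The field names and `write_orc` are hypothetical.

```python
# Hypothetical column names shared by the ORC row type and the stream type.
ORC_FIELD_NAMES = ["id", "name"]


def write_orc(ds, output_dir):
    """Attach a bulk ORC file sink to the DataStream `ds`."""
    # PyFlink imports are local so this module loads without Flink installed.
    from pyflink.common import Configuration, Types
    from pyflink.datastream.connectors.file_system import FileSink
    from pyflink.datastream.formats.orc import OrcBulkWriters
    from pyflink.table import DataTypes

    row_type = DataTypes.ROW([
        DataTypes.FIELD(ORC_FIELD_NAMES[0], DataTypes.BIGINT()),
        DataTypes.FIELD(ORC_FIELD_NAMES[1], DataTypes.STRING()),
    ])
    sink = FileSink.for_bulk_format(
        output_dir,
        OrcBulkWriters.for_row_type(
            row_type=row_type,
            writer_properties=Configuration(),
            hadoop_config=Configuration(),
        ),
    ).build()
    # The identity map declares a Row type matching `row_type` for the writer.
    stream_type = Types.ROW_NAMED(ORC_FIELD_NAMES, [Types.LONG(), Types.STRING()])
    ds.map(lambda e: e, output_type=stream_type).sink_to(sink)
```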

Parquet#

AvroParquetReaders()

A convenience builder to create a reader format that reads individual Avro records from a Parquet stream.
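A sketch of reading Avro records from Parquet files, assuming the Parquet and Avro format jars are available. The `Sensor` schema and `avro_parquet_source` are hypothetical names.

```python
import json

# Hypothetical Avro schema describing the records stored in the Parquet files.
SENSOR_SCHEMA_JSON = json.dumps({
    "type": "record",
    "name": "Sensor",
    "fields": [
        {"name": "sensor_id", "type": "string"},
        {"name": "reading", "type": "double"},
    ],
})


def avro_parquet_source(path):
    """Create a FileSource that yields Avro generic records from `path`."""
    # Local imports keep the schema definition usable without Flink installed.
    from pyflink.datastream.connectors.file_system import FileSource
    from pyflink.datastream.formats.avro import AvroSchema
    from pyflink.datastream.formats.parquet import AvroParquetReaders

    schema = AvroSchema.parse_string(SENSOR_SCHEMA_JSON)
    reader_format = AvroParquetReaders.for_generic_record(schema)
    return FileSource.for_record_stream_format(reader_format, path).build()
```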

AvroParquetWriters()

Convenience builder to create Parquet BulkWriterFactory instances for Avro types.

ParquetColumnarRowInputFormat(row_type[, ...])

A ParquetVectorizedInputFormat that provides a RowData iterator.
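A sketch of using this format as a bulk file format for a FileSource. The column names, `BATCH_SIZE` and `parquet_row_source` are hypothetical, and the keyword arguments shown are assumed to be the ones hidden behind the elided part of the signature above.

```python
# Hypothetical number of rows read per vectorized batch.
BATCH_SIZE = 2048


def parquet_row_source(path):
    """Create a FileSource that reads Parquet files at `path` as rows."""
    # PyFlink imports are local so this module loads without Flink installed.
    from pyflink.common import Configuration
    from pyflink.datastream.connectors.file_system import FileSource
    from pyflink.datastream.formats.parquet import ParquetColumnarRowInputFormat
    from pyflink.table import DataTypes

    # Only the columns listed in the row type are projected from the files.
    row_type = DataTypes.ROW([
        DataTypes.FIELD("id", DataTypes.BIGINT()),
        DataTypes.FIELD("name", DataTypes.STRING()),
    ])
    input_format = ParquetColumnarRowInputFormat(
        row_type=row_type,
        hadoop_config=Configuration(),
        batch_size=BATCH_SIZE,
        is_utc_timestamp=False,
        is_case_sensitive=True,
    )
    return FileSource.for_bulk_file_format(input_format, path).build()
```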

ParquetBulkWriters()

Convenience builder to create a BulkWriterFactory that writes records with a predefined schema into Parquet files in batches.