public class OrcRowInputFormat extends OrcInputFormat<Row> implements ResultTypeQueryable<Row>
Row
Nested classes/interfaces inherited from class FileInputFormat: FileInputFormat.FileBaseStatistics, FileInputFormat.InputSplitOpenThread
Fields inherited from class OrcInputFormat: batchSize, conf, conjunctPredicates, reader, schema, selectedFields
Fields inherited from class FileInputFormat: currentSplit, ENUMERATE_NESTED_FILES_FLAG, enumerateNestedFiles, filePath, INFLATER_INPUT_STREAM_FACTORIES, minSplitSize, numSplits, openTimeout, READ_WHOLE_SPLIT_FLAG, splitLength, splitStart, stream, unsplittable
Constructor and Description |
---|
OrcRowInputFormat(String path, String schemaString, Configuration orcConfig) Creates an OrcRowInputFormat. |
OrcRowInputFormat(String path, String schemaString, Configuration orcConfig, int batchSize) Creates an OrcRowInputFormat. |
OrcRowInputFormat(String path, org.apache.orc.TypeDescription orcSchema, Configuration orcConfig, int batchSize) Creates an OrcRowInputFormat. |
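Two of the constructors take a batchSize, documented as the number of Row objects to read in a batch. As a rough illustration of what batch-wise reading means, the sketch below drains an iterator in chunks of at most batchSize elements. It uses only the Java standard library; BatchReaderSketch and readBatch are hypothetical names and do not reflect how the Flink ORC reader is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a batch-oriented reader; not part of Flink or ORC.
final class BatchReaderSketch {

    // Pulls at most batchSize rows from the source and returns them as one batch.
    // An empty list signals that the source is exhausted.
    static <T> List<T> readBatch(Iterator<T> source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        while (batch.size() < batchSize && source.hasNext()) {
            batch.add(source.next());
        }
        return batch;
    }

    public static void main(String[] args) {
        Iterator<String> rows = Arrays.asList("r0", "r1", "r2", "r3", "r4").iterator();
        List<String> batch;
        // Keep requesting batches of up to two rows until the source runs dry.
        while (!(batch = readBatch(rows, 2)).isEmpty()) {
            System.out.println(batch);
        }
    }
}
```

In this sketch a larger batchSize means fewer calls to readBatch at the cost of holding more rows in memory at once; whether the same trade-off applies to a given ORC job is something to measure rather than assume.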
Modifier and Type | Method and Description |
---|---|
TypeInformation<Row> | getProducedType() Gets the data type (as a TypeInformation) produced by this function or input format. |
void | open(FileInputSplit fileSplit) Opens an input stream to the file defined in the input format. |
void | selectFields(int... selectedFields) Selects the fields from the ORC schema that are returned by InputFormat. |
Methods inherited from class OrcInputFormat: addPredicate, close, closeInputFormat, nextRecord, reachedEnd, supportsMultiPaths
Methods inherited from class FileInputFormat: acceptFile, configure, createInputSplits, decorateInputStream, extractFileExtension, getFilePath, getFilePaths, getFileStats, getFileStats, getInflaterInputStreamFactory, getInputSplitAssigner, getMinSplitSize, getNestedFileEnumeration, getNumSplits, getOpenTimeout, getSplitLength, getSplitStart, getStatistics, registerInflaterInputStreamFactory, setFilePath, setFilePath, setFilePaths, setFilePaths, setFilesFilter, setMinSplitSize, setNestedFileEnumeration, setNumSplits, setOpenTimeout, testForUnsplittable, toString
Methods inherited from class RichInputFormat: getRuntimeContext, openInputFormat, setRuntimeContext
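selectFields(int... selectedFields) takes indices into the ORC schema and restricts the returned rows to those fields. The sketch below illustrates index-based projection on a plain Object[] row. ProjectionSketch and project are hypothetical names, the code does not use any Flink or ORC class, and keeping the fields in the order the indices were given is an assumption of the sketch.

```java
import java.util.Arrays;

// Hypothetical illustration of index-based field selection; not Flink code.
final class ProjectionSketch {

    // Returns a new row holding only the fields at the given indices,
    // in the order the indices were passed (an assumption of this sketch).
    static Object[] project(Object[] fullRow, int... selectedFields) {
        Object[] projected = new Object[selectedFields.length];
        for (int i = 0; i < selectedFields.length; i++) {
            int idx = selectedFields[i];
            if (idx < 0 || idx >= fullRow.length) {
                throw new IllegalArgumentException("Field index out of range: " + idx);
            }
            projected[i] = fullRow[idx];
        }
        return projected;
    }

    public static void main(String[] args) {
        // A row with three fields; select the third and the first.
        Object[] row = {"alice", 42, 3.5};
        System.out.println(Arrays.toString(project(row, 2, 0)));
    }
}
```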
public OrcRowInputFormat(String path, String schemaString, Configuration orcConfig)
Creates an OrcRowInputFormat.
Parameters:
path - The path to read ORC files from.
schemaString - The schema of the ORC files as String.
orcConfig - The configuration to read the ORC files with.
public OrcRowInputFormat(String path, String schemaString, Configuration orcConfig, int batchSize)
Creates an OrcRowInputFormat.
Parameters:
path - The path to read ORC files from.
schemaString - The schema of the ORC files as String.
orcConfig - The configuration to read the ORC files with.
batchSize - The number of Row objects to read in a batch.
public OrcRowInputFormat(String path, org.apache.orc.TypeDescription orcSchema, Configuration orcConfig, int batchSize)
Creates an OrcRowInputFormat.
Parameters:
path - The path to read ORC files from.
orcSchema - The schema of the ORC files as ORC TypeDescription.
orcConfig - The configuration to read the ORC files with.
batchSize - The number of Row objects to read in a batch.
public void selectFields(int... selectedFields)
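Description copied from class: OrcInputFormat
Selects the fields from the ORC schema that are returned by InputFormat.
Overrides: selectFields in class OrcInputFormat<Row>
Parameters:
selectedFields - The indices of the fields of the ORC schema that are returned by the InputFormat.
public void open(FileInputSplit fileSplit) throws IOException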
Description copied from class: FileInputFormat
Opens an input stream to the file defined in the input format. The stream is actually opened in an asynchronous thread to make sure any interruptions to the thread working on the input format do not reach the file system.
Specified by: open in interface InputFormat<Row,FileInputSplit>
Overrides: open in class FileInputFormat<Row>
Parameters:
fileSplit - The split to be opened.
Throws:
IOException - Thrown if the split could not be opened due to an I/O problem.
public TypeInformation<Row> getProducedType()
Description copied from interface: ResultTypeQueryable
Gets the data type (as a TypeInformation) produced by this function or input format.
Specified by: getProducedType in interface ResultTypeQueryable<Row>
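The description of open(FileInputSplit) says the stream is opened in an asynchronous thread so that interruptions of the working thread do not reach the file system. A minimal sketch of that general pattern, using only java.util.concurrent, might look as follows. AsyncOpenSketch and openWithTimeout are hypothetical names; this is not Flink's FileInputFormat.InputSplitOpenThread, and the timeout handling shown here is an assumption of the sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration of opening a stream on a helper thread; not Flink code.
final class AsyncOpenSketch {

    // Runs the opener on a separate daemon thread and waits up to timeoutMillis for it.
    // If the caller is interrupted, only the wait is aborted: the opener thread is
    // never interrupted, so the interruption does not reach the underlying I/O call.
    static InputStream openWithTimeout(Callable<InputStream> opener, long timeoutMillis)
            throws IOException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "split-opener");
            t.setDaemon(true);
            return t;
        });
        try {
            Future<InputStream> pending = executor.submit(opener);
            return pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The opener may still finish later; a fuller version would close that stream.
            throw new IOException("Opening the stream timed out after " + timeoutMillis + " ms", e);
        } catch (ExecutionException e) {
            throw new IOException("Opening the stream failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the caller's interrupt flag
            throw new IOException("Interrupted while waiting for the stream to open", e);
        } finally {
            executor.shutdown(); // lets a running opener finish without interrupting it
        }
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for a file system stream.
        try (InputStream in = openWithTimeout(() -> new ByteArrayInputStream(new byte[] {1, 2, 3}), 1000L)) {
            System.out.println("first byte: " + in.read());
        }
    }
}
```

The sketch keeps the file-system call on its own thread and surfaces every failure mode as an IOException, mirroring the exception declared by open(FileInputSplit).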
Copyright © 2014–2021 The Apache Software Foundation. All rights reserved.