public abstract class OrcInputFormat<T> extends FileInputFormat<T>
Nested classes/interfaces inherited from class FileInputFormat: FileInputFormat.FileBaseStatistics, FileInputFormat.InputSplitOpenThread
Modifier and Type | Field and Description |
---|---|
protected int | batchSize |
protected Configuration | conf |
protected ArrayList<OrcSplitReader.Predicate> | conjunctPredicates |
protected OrcSplitReader<T> | reader |
protected org.apache.orc.TypeDescription | schema |
protected int[] | selectedFields |
Fields inherited from class FileInputFormat: currentSplit, ENUMERATE_NESTED_FILES_FLAG, enumerateNestedFiles, filePath, INFLATER_INPUT_STREAM_FACTORIES, minSplitSize, numSplits, openTimeout, READ_WHOLE_SPLIT_FLAG, splitLength, splitStart, stream, unsplittable
Constructor and Description |
---|
OrcInputFormat(Path path, org.apache.orc.TypeDescription orcSchema, Configuration orcConfig, int batchSize): Creates an OrcInputFormat. |
Modifier and Type | Method and Description |
---|---|
void | addPredicate(OrcSplitReader.Predicate predicate): Adds a filter predicate to reduce the number of rows to be returned by the input format. |
void | close(): Closes the file input stream of the input format. |
void | closeInputFormat(): Closes this InputFormat instance. |
T | nextRecord(T reuse): Reads the next record from the input. |
boolean | reachedEnd(): Method used to check if the end of the input is reached. |
void | selectFields(int... selectedFields): Selects the fields from the ORC schema that are returned by InputFormat. |
boolean | supportsMultiPaths(): Override this method to support multiple paths. |
Methods inherited from class FileInputFormat: acceptFile, configure, createInputSplits, decorateInputStream, extractFileExtension, getFilePath, getFilePaths, getFileStats, getFileStats, getInflaterInputStreamFactory, getInputSplitAssigner, getMinSplitSize, getNestedFileEnumeration, getNumSplits, getOpenTimeout, getSplitLength, getSplitStart, getStatistics, open, registerInflaterInputStreamFactory, setFilePath, setFilePath, setFilePaths, setFilePaths, setFilesFilter, setMinSplitSize, setNestedFileEnumeration, setNumSplits, setOpenTimeout, testForUnsplittable, toString
Methods inherited from class RichInputFormat: getRuntimeContext, openInputFormat, setRuntimeContext
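The method summary implies the usual InputFormat read loop: check reachedEnd(), call nextRecord(reuse), and call close() once the input is exhausted. The sketch below models that contract with a hypothetical in-memory stand-in, MiniFormat, so it runs without Flink or ORC on the classpath. Only the method names reachedEnd, nextRecord, and close come from this page; the class names and the use of StringBuilder as the reusable record type are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReadLoopSketch {

    /** Hypothetical stand-in for an opened input format; NOT the Flink class. */
    static final class MiniFormat {
        private final Iterator<String> rows;
        private boolean closed;

        MiniFormat(List<String> rows) {
            this.rows = rows.iterator();
        }

        /** Mirrors reachedEnd(): true once no more records are available. */
        boolean reachedEnd() {
            return !rows.hasNext();
        }

        /** Mirrors nextRecord(T reuse): overwrites and returns the reuse object. */
        StringBuilder nextRecord(StringBuilder reuse) {
            reuse.setLength(0);
            reuse.append(rows.next());
            return reuse;
        }

        /** Mirrors close(): releases the (here imaginary) input stream. */
        void close() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    /** Drains the format; each record is copied because the reuse object is overwritten. */
    static List<String> readAll(MiniFormat format) {
        List<String> out = new ArrayList<>();
        StringBuilder reuse = new StringBuilder();
        try {
            while (!format.reachedEnd()) {
                StringBuilder record = format.nextRecord(reuse);
                out.add(record.toString());
            }
        } finally {
            // Close even if reading fails part-way through.
            format.close();
        }
        return out;
    }

    public static void main(String[] args) {
        MiniFormat format = new MiniFormat(Arrays.asList("a", "b", "c"));
        System.out.println(readAll(format));
    }
}
```

Copying each record before the next call matters whenever a reuse object is passed in, since the same instance may be handed back with new contents.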
protected int batchSize
protected Configuration conf
protected org.apache.orc.TypeDescription schema
protected int[] selectedFields
protected ArrayList<OrcSplitReader.Predicate> conjunctPredicates
protected transient OrcSplitReader<T> reader
public OrcInputFormat(Path path, org.apache.orc.TypeDescription orcSchema, Configuration orcConfig, int batchSize)
Parameters:
path - The path to read ORC files from.
orcSchema - The schema of the ORC files as ORC TypeDescription.
orcConfig - The configuration to read the ORC files with.
batchSize - The number of Row objects to read in a batch.
public void selectFields(int... selectedFields)
Parameters:
selectedFields - The indices of the fields of the ORC schema that are returned by the InputFormat.
public void addPredicate(OrcSplitReader.Predicate predicate)
Note: Predicates can significantly reduce the amount of data that is read. However, the OrcInputFormat does not guarantee that all returned rows satisfy the predicates. Moreover, a predicate is only applied if the field it references is among the selected fields.
Parameters:
predicate - The filter predicate.
public void close() throws IOException
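Description copied from class: FileInputFormat
Closes the file input stream of the input format.
Specified by: close in interface InputFormat<T,FileInputSplit>
Overrides: close in class FileInputFormat<T>
Throws: IOException - Thrown, if the input could not be closed properly.
public void closeInputFormat() throws IOException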
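Description copied from class: RichInputFormat
Closes this InputFormat instance. Resources allocated during RichInputFormat.openInputFormat() should be closed in this method.
Overrides: closeInputFormat in class RichInputFormat<T,FileInputSplit>
Throws: IOException - in case closing the resources failed
See Also: InputFormat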
public boolean reachedEnd() throws IOException
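Description copied from interface: InputFormat
Method used to check if the end of the input is reached. When this method is called, the input format is guaranteed to be opened.
Throws: IOException - Thrown, if an I/O error occurred.
public T nextRecord(T reuse) throws IOException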
Description copied from interface: InputFormat
Reads the next record from the input. When this method is called, the input format is guaranteed to be opened.
Parameters:
reuse - Object that may be reused.
Throws: IOException - Thrown, if an I/O error occurred.
public boolean supportsMultiPaths()
Description copied from class: FileInputFormat
Override this method to support multiple paths.
Overrides: supportsMultiPaths in class FileInputFormat<T>
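The note on addPredicate says that returned rows are not guaranteed to satisfy the predicates, so a caller that needs exact results may want to re-apply the filter after reading. The sketch below illustrates that idea together with index-based field selection, using plain Object[] rows instead of Flink types. The helper names project and recheck are hypothetical, and the assumption that projected rows keep the order given in selectedFields is ours, not something this page states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class PredicateRecheckSketch {

    /** Keeps only the columns at the given indices, in the given order (assumed behavior). */
    static Object[] project(Object[] row, int... selectedFields) {
        Object[] out = new Object[selectedFields.length];
        for (int i = 0; i < selectedFields.length; i++) {
            out[i] = row[selectedFields[i]];
        }
        return out;
    }

    /** Re-applies a filter to rows that a best-effort pushdown may have let through. */
    static List<Object[]> recheck(List<Object[]> rows, Predicate<Object[]> filter) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] row : rows) {
            if (filter.test(row)) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical schema (id, name, amount); fields 0 and 2 are selected.
        List<Object[]> returned = new ArrayList<>();
        returned.add(project(new Object[] {1, "a", 50}, 0, 2));
        // This row violates "amount > 10" but is included to model a best-effort filter.
        returned.add(project(new Object[] {2, "b", 5}, 0, 2));

        // After projection, amount sits at position 1 of each row.
        List<Object[]> exact = recheck(returned, row -> (Integer) row[1] > 10);
        for (Object[] row : exact) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Note that the re-check addresses the projected position of the field, which is why a predicate on a field that was not selected cannot be evaluated at this stage.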
Copyright © 2014–2020 The Apache Software Foundation. All rights reserved.