Package org.apache.flink.orc.vector
Class Vectorizer<T>
- java.lang.Object
-
- org.apache.flink.orc.vector.Vectorizer<T>
-
- Type Parameters:
T
- The type of the element
- All Implemented Interfaces:
Serializable
- Direct Known Subclasses:
RowDataVectorizer
@PublicEvolving public abstract class Vectorizer<T> extends Object implements Serializable
This class provides an abstracted set of methods to handle the lifecycle of VectorizedRowBatch. Users have to extend this class and override the vectorize() method with the logic to transform the element to a VectorizedRowBatch.
- See Also:
- Serialized Form
-
-
Constructor Summary
Constructors
Constructor | Description
Vectorizer(String schema) |
-
Method Summary
Modifier and Type | Method | Description
void | addUserMetadata(String key, ByteBuffer value) | Adds arbitrary user metadata to the outgoing ORC file.
org.apache.orc.TypeDescription | getSchema() | Provides the ORC schema.
void | setWriter(org.apache.orc.Writer writer) | Users are not supposed to use this method since this is intended to be used only by the OrcBulkWriter.
abstract void | vectorize(T element, org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch) | Transforms the provided element to ColumnVectors and sets them in the exposed VectorizedRowBatch.
-
-
-
Constructor Detail
-
Vectorizer
public Vectorizer(String schema)
-
-
Method Detail
-
getSchema
public org.apache.orc.TypeDescription getSchema()
Provides the ORC schema.
- Returns:
- the ORC schema
-
setWriter
public void setWriter(org.apache.orc.Writer writer)
Users are not supposed to use this method since this is intended to be used only by the OrcBulkWriter.
- Parameters:
writer
- the underlying ORC Writer.
-
addUserMetadata
public void addUserMetadata(String key, ByteBuffer value)
Adds arbitrary user metadata to the outgoing ORC file. Users who want to add new metadata dynamically, based either on the input or on data from an external system, can do so by calling addUserMetadata(...) inside the overridden vectorize() method.
- Parameters:
key
- a key to label the data with.
value
- the contents of the metadata.
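As an illustration of calling addUserMetadata(...) from inside vectorize(), the following is a minimal sketch, not a reference implementation. The LineVectorizer class, the single-column schema, and the metadata key "example.last-line" are hypothetical names chosen for this example. It assumes a writer has already been supplied through setWriter(...), which the OrcBulkWriter is described as doing; calling it without a writer is expected to fail.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import org.apache.flink.orc.vector.Vectorizer;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

/** Hypothetical vectorizer that writes each String element as one ORC row. */
class LineVectorizer extends Vectorizer<String> {

    /** Hypothetical metadata key, used only in this illustration. */
    static final String LAST_LINE_KEY = "example.last-line";

    LineVectorizer() {
        // Assumed schema for the example: a single string column.
        super("struct<line:string>");
    }

    @Override
    public void vectorize(String element, VectorizedRowBatch batch) throws IOException {
        byte[] bytes = element.getBytes(StandardCharsets.UTF_8);

        // Append the element as the next row of the batch.
        BytesColumnVector lineCol = (BytesColumnVector) batch.cols[0];
        int row = batch.size++;
        lineCol.setVal(row, bytes);

        // Metadata derived from the input: remember the most recent element.
        // Assumes setWriter(...) has already been called by the bulk writer.
        addUserMetadata(LAST_LINE_KEY, ByteBuffer.wrap(bytes));
    }
}
```

Because the key is reused on every call, the value stored in the file should be the one from the last element processed; a vectorizer that needs per-element metadata would have to derive a distinct key for each entry.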
-
vectorize
public abstract void vectorize(T element, org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch) throws IOException
Transforms the provided element to ColumnVectors and sets them in the exposed VectorizedRowBatch.
- Parameters:
element
- The input element
batch
- The batch to write the ColumnVectors to
- Throws:
IOException
- if there is an error while transforming the input.
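A subclass implementing vectorize() might look like the sketch below. The Person type, the PersonVectorizer class, and the schema string "struct<name:string,age:int>" are assumptions made for this illustration. The sketch also assumes that the batch was created from the same schema, so that column 0 is a BytesColumnVector and column 1 is a LongColumnVector.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.flink.orc.vector.Vectorizer;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

/** Hypothetical element type used only for this illustration. */
class Person {
    final String name;
    final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

/** Hypothetical vectorizer for a schema such as "struct<name:string,age:int>". */
class PersonVectorizer extends Vectorizer<Person> {

    PersonVectorizer(String schema) {
        super(schema);
    }

    @Override
    public void vectorize(Person element, VectorizedRowBatch batch) throws IOException {
        // Column order follows the field order of the schema string.
        BytesColumnVector nameCol = (BytesColumnVector) batch.cols[0];
        LongColumnVector ageCol = (LongColumnVector) batch.cols[1];

        // Claim the next free row in the batch, then fill each column at that row.
        int row = batch.size++;
        nameCol.setVal(row, element.name.getBytes(StandardCharsets.UTF_8));
        ageCol.vector[row] = element.age;
    }
}
```

Incrementing batch.size before writing the column values keeps the row index and the row count in step, so each call appends exactly one row.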
-
-