T - The type of the element
@PublicEvolving
public abstract class Vectorizer<T> extends Object implements Serializable
This class provides an abstracted set of methods to handle the lifecycle of VectorizedRowBatch.
Users have to extend this class and override the vectorize() method with the logic to
transform the element to a VectorizedRowBatch.
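As a rough illustration of such a subclass, the sketch below converts a hypothetical `Person` element into one row of the batch. The `Person` class, the schema string `struct<name:string,age:int>`, and the column order are assumptions made for this example only; the import path for `Vectorizer` assumes the flink-orc module.

```java
import org.apache.flink.orc.vector.Vectorizer;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical element type, used only for illustration.
class Person {
    final String name;
    final int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

// Sketch of a Vectorizer for the assumed schema "struct<name:string,age:int>".
class PersonVectorizer extends Vectorizer<Person> {
    private static final long serialVersionUID = 1L;

    PersonVectorizer(String schema) {
        super(schema);
    }

    @Override
    public void vectorize(Person element, VectorizedRowBatch batch) throws IOException {
        // Column 0 holds the string field, column 1 the int field
        // (ORC int columns are backed by LongColumnVector).
        BytesColumnVector nameCol = (BytesColumnVector) batch.cols[0];
        LongColumnVector ageCol = (LongColumnVector) batch.cols[1];

        // Claim the next free row in the batch, then fill each column at that row.
        int row = batch.size++;
        nameCol.setVal(row, element.name.getBytes(StandardCharsets.UTF_8));
        ageCol.vector[row] = element.age;
    }
}
```

An instance would typically be created as `new PersonVectorizer("struct<name:string,age:int>")`; the schema string must match the column vectors that `vectorize()` casts to.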
Constructor and Description |
---|
Vectorizer(String schema) |
Modifier and Type | Method and Description |
---|---|
void | addUserMetadata(String key, ByteBuffer value): Adds arbitrary user metadata to the outgoing ORC file. |
org.apache.orc.TypeDescription | getSchema(): Provides the ORC schema. |
void | setWriter(org.apache.orc.Writer writer): Users are not supposed to use this method since this is intended to be used only by the OrcBulkWriter. |
abstract void | vectorize(T element, org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch): Transforms the provided element to ColumnVectors and sets them in the exposed VectorizedRowBatch. |
public Vectorizer(String schema)
public org.apache.orc.TypeDescription getSchema()
public void setWriter(org.apache.orc.Writer writer)
Users are not supposed to use this method since this is intended to be used only by the OrcBulkWriter.
writer - the underlying ORC Writer.
public void addUserMetadata(String key, ByteBuffer value)
Users who want to add new metadata dynamically, based either on the input or on an
external system, can do so by calling addUserMetadata(...) inside the overridden
vectorize() method.
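A minimal sketch of that pattern follows, assuming a single-column schema `struct<line:string>` and a hypothetical metadata key `last.line`; neither comes from this page. Because addUserMetadata(...) presumably forwards to the ORC Writer that the OrcBulkWriter supplies through setWriter, the vectorizer is meant to run inside an OrcBulkWriter rather than be called on its own.

```java
import org.apache.flink.orc.vector.Vectorizer;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch for the assumed schema "struct<line:string>".
class LineVectorizer extends Vectorizer<String> {
    private static final long serialVersionUID = 1L;

    LineVectorizer(String schema) {
        super(schema);
    }

    @Override
    public void vectorize(String element, VectorizedRowBatch batch) throws IOException {
        byte[] bytes = element.getBytes(StandardCharsets.UTF_8);

        // Write the element into the next free row of the only column.
        BytesColumnVector lineCol = (BytesColumnVector) batch.cols[0];
        int row = batch.size++;
        lineCol.setVal(row, bytes);

        // Metadata derived from the input; "last.line" is a hypothetical key.
        addUserMetadata("last.line", ByteBuffer.wrap(bytes));
    }
}
```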
key - a key to label the data with.
value - the contents of the metadata.
public abstract void vectorize(T element, org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch) throws IOException
element - The input element
batch - The batch to write the ColumnVectors
IOException - if there is an error while transforming the input.
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.