Class Vectorizer<T>

  • Type Parameters:
    T - The type of the element
    All Implemented Interfaces:
    Serializable
    Direct Known Subclasses:
    RowDataVectorizer

    @PublicEvolving
    public abstract class Vectorizer<T>
    extends Object
    implements Serializable
    This class provides an abstract set of methods for handling the lifecycle of a VectorizedRowBatch.

    Users must extend this class and override the vectorize() method with the logic that transforms an element into a VectorizedRowBatch.

    See Also:
    Serialized Form
    • Constructor Detail

      • Vectorizer

        public Vectorizer(String schema)
    • Method Detail

      • getSchema

        public org.apache.orc.TypeDescription getSchema()
        Provides the ORC schema.
        Returns:
        the ORC schema
      • setWriter

        public void setWriter(org.apache.orc.Writer writer)
        Users should not call this method; it is intended to be used only by the OrcBulkWriter.
        Parameters:
        writer - the underlying ORC Writer.
      • addUserMetadata

        public void addUserMetadata(String key,
                                    ByteBuffer value)
        Adds arbitrary user metadata to the outgoing ORC file.

        Users who want to add new metadata dynamically, based either on the input or on an external system, can do so by calling addUserMetadata(...) inside the overridden vectorize() method.

        Parameters:
        key - a key to label the data with.
        value - the contents of the metadata.
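
Because the value parameter is a ByteBuffer, textual metadata has to be encoded before it is passed in. The following is a minimal sketch using only java.nio; the class name MetadataValues and its encode/decode helpers are hypothetical and not part of Flink or ORC.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Flink or ORC) for building metadata values.
final class MetadataValues {

    private MetadataValues() {}

    // Encodes a string as UTF-8 bytes wrapped in a ByteBuffer.
    static ByteBuffer encode(String text) {
        return ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
    }

    // Decodes a UTF-8 ByteBuffer back into a string.
    // Works on a duplicate so the caller's buffer position is left unchanged.
    static String decode(ByteBuffer value) {
        return StandardCharsets.UTF_8.decode(value.duplicate()).toString();
    }
}
```

Inside an overridden vectorize() method, a call might then look like addUserMetadata("source", MetadataValues.encode("orders-topic")), where the key and value shown are illustrative only.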
      • vectorize

        public abstract void vectorize(T element,
                                       org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch batch)
                                throws IOException
        Transforms the provided element to ColumnVectors and sets them in the exposed VectorizedRowBatch.
        Parameters:
        element - The input element
        batch - The batch to write the ColumnVectors to
        Throws:
        IOException - if there is an error while transforming the input.
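
As an illustration of the contract described above, a subclass might look like the following sketch. It assumes flink-orc and its ORC/Hive vector dependencies are on the classpath, that Vectorizer lives in the org.apache.flink.orc.vector package, and that the schema string passed to the constructor is "struct<name:string,age:int>". The Person type and the PersonVectorizer class are hypothetical and exist only for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.flink.orc.vector.Vectorizer;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

// Hypothetical vectorizer for a two-field element. It assumes a schema such as
// "struct<name:string,age:int>", so column 0 holds strings and column 1 holds ints.
class PersonVectorizer extends Vectorizer<PersonVectorizer.Person> {

    // Hypothetical element type, not part of Flink.
    static class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    PersonVectorizer(String schema) {
        super(schema);
    }

    @Override
    public void vectorize(Person element, VectorizedRowBatch batch) throws IOException {
        // ORC string columns are backed by BytesColumnVector, int columns by LongColumnVector.
        BytesColumnVector nameCol = (BytesColumnVector) batch.cols[0];
        LongColumnVector ageCol = (LongColumnVector) batch.cols[1];

        // Claim the next free row in the batch, then fill each column at that row.
        int row = batch.size++;
        nameCol.setVal(row, element.name.getBytes(StandardCharsets.UTF_8));
        ageCol.vector[row] = element.age;
    }
}
```

The casts depend on the column order in the schema string, so the schema and the vectorize() body have to be kept in sync.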