T
- public abstract class HCatInputFormatBase<T> extends RichInputFormat<T,HadoopInputSplit> implements ResultTypeQueryable<T>
Data can be returned as HCatRecord
or Flink-native tuple.
Note: Flink tuples might only support a limited number of fields (depending on the API).
Modifier and Type | Field and Description |
---|---|
protected String[] |
fieldNames |
protected org.apache.hive.hcatalog.data.schema.HCatSchema |
outputSchema |
Constructor and Description |
---|
HCatInputFormatBase() |
HCatInputFormatBase(String database,
String table)
Creates a HCatInputFormat for the given database and table.
|
HCatInputFormatBase(String database,
String table,
Configuration config)
Creates a HCatInputFormat for the given database, table, and
Configuration . |
Modifier and Type | Method and Description |
---|---|
HCatInputFormatBase<T> |
asFlinkTuples()
Specifies that the InputFormat returns Flink tuples instead of
HCatRecord . |
protected abstract T |
buildFlinkTuple(T t,
org.apache.hive.hcatalog.data.HCatRecord record) |
void |
close()
Method that marks the end of the life-cycle of an input split.
|
void |
configure(Configuration parameters)
Configures this input format.
|
HadoopInputSplit[] |
createInputSplits(int minNumSplits)
Creates the different splits of the input that can be processed in parallel.
|
Configuration |
getConfiguration()
Returns the
Configuration of the HCatInputFormat. |
HCatInputFormatBase<T> |
getFields(String... fields)
Specifies the fields which are returned by the InputFormat and their order.
|
InputSplitAssigner |
getInputSplitAssigner(HadoopInputSplit[] inputSplits)
Gets the type of the input splits that are processed by this input format.
|
protected abstract int |
getMaxFlinkTupleSize() |
org.apache.hive.hcatalog.data.schema.HCatSchema |
getOutputSchema()
Returns the
HCatSchema of the HCatRecord
returned by this InputFormat. |
TypeInformation<T> |
getProducedType()
Gets the data type (as a
TypeInformation ) produced by this function or input format. |
BaseStatistics |
getStatistics(BaseStatistics cachedStats)
Gets the basic statistics from the input described by this format.
|
T |
nextRecord(T record)
Reads the next record from the input.
|
void |
open(HadoopInputSplit split)
Opens a parallel instance of the input format to work on a split.
|
boolean |
reachedEnd()
Method used to check if the end of the input is reached.
|
HCatInputFormatBase<T> |
withFilter(String filter)
Specifies a SQL-like filter condition on the table's partition columns.
|
closeInputFormat, getRuntimeContext, openInputFormat, setRuntimeContext
protected String[] fieldNames
protected org.apache.hive.hcatalog.data.schema.HCatSchema outputSchema
public HCatInputFormatBase()
public HCatInputFormatBase(String database, String table) throws IOException
HCatRecord
.
The return type of the InputFormat can be changed to Flink-native tuples by calling
asFlinkTuples()
.database
- The name of the database to read from.table
- The name of the table to read.IOException
public HCatInputFormatBase(String database, String table, Configuration config) throws IOException
Configuration
.
By default, the InputFormat returns HCatRecord
.
The return type of the InputFormat can be changed to Flink-native tuples by calling
asFlinkTuples()
.database
- The name of the database to read from.table
- The name of the table to read.config
- The Configuration for the InputFormat.IOException
public HCatInputFormatBase<T> getFields(String... fields) throws IOException
fields
- The fields and their order which are returned by the InputFormat.IOException
public HCatInputFormatBase<T> withFilter(String filter) throws IOException
filter
- A SQL-like filter condition on the table's partition columns.IOException
public HCatInputFormatBase<T> asFlinkTuples() throws org.apache.hive.hcatalog.common.HCatException
HCatRecord
.
Note: Flink tuples might only support a limited number of fields (depending on the API).
org.apache.hive.hcatalog.common.HCatException
protected abstract int getMaxFlinkTupleSize()
public Configuration getConfiguration()
Configuration
of the HCatInputFormat.public org.apache.hive.hcatalog.data.schema.HCatSchema getOutputSchema()
HCatSchema
of the HCatRecord
returned by this InputFormat.public void configure(Configuration parameters)
InputFormat
This method is always called first on a newly instantiated input format.
configure
in interface InputFormat<T,HadoopInputSplit>
parameters
- The configuration with all parameters (note: not the Flink config but the TaskConfig).public BaseStatistics getStatistics(BaseStatistics cachedStats) throws IOException
InputFormat
When this method is called, the input format it guaranteed to be configured.
getStatistics
in interface InputFormat<T,HadoopInputSplit>
cachedStats
- The statistics that were cached. May be null.IOException
public HadoopInputSplit[] createInputSplits(int minNumSplits) throws IOException
InputFormat
When this method is called, the input format it guaranteed to be configured.
createInputSplits
in interface InputFormat<T,HadoopInputSplit>
createInputSplits
in interface InputSplitSource<HadoopInputSplit>
minNumSplits
- The minimum desired number of splits. If fewer are created, some parallel
instances may remain idle.IOException
- Thrown, when the creation of the splits was erroneous.public InputSplitAssigner getInputSplitAssigner(HadoopInputSplit[] inputSplits)
InputFormat
getInputSplitAssigner
in interface InputFormat<T,HadoopInputSplit>
getInputSplitAssigner
in interface InputSplitSource<HadoopInputSplit>
public void open(HadoopInputSplit split) throws IOException
InputFormat
When this method is called, the input format it guaranteed to be configured.
open
in interface InputFormat<T,HadoopInputSplit>
split
- The split to be opened.IOException
- Thrown, if the spit could not be opened due to an I/O problem.public boolean reachedEnd() throws IOException
InputFormat
When this method is called, the input format it guaranteed to be opened.
reachedEnd
in interface InputFormat<T,HadoopInputSplit>
IOException
- Thrown, if an I/O error occurred.public T nextRecord(T record) throws IOException
InputFormat
When this method is called, the input format it guaranteed to be opened.
nextRecord
in interface InputFormat<T,HadoopInputSplit>
record
- Object that may be reused.IOException
- Thrown, if an I/O error occurred.protected abstract T buildFlinkTuple(T t, org.apache.hive.hcatalog.data.HCatRecord record) throws org.apache.hive.hcatalog.common.HCatException
org.apache.hive.hcatalog.common.HCatException
public void close() throws IOException
InputFormat
When this method is called, the input format it guaranteed to be opened.
close
in interface InputFormat<T,HadoopInputSplit>
IOException
- Thrown, if the input could not be closed properly.public TypeInformation<T> getProducedType()
ResultTypeQueryable
TypeInformation
) produced by this function or input format.getProducedType
in interface ResultTypeQueryable<T>
Copyright © 2014–2019 The Apache Software Foundation. All rights reserved.