T
- public abstract class HCatInputFormatBase<T> extends RichInputFormat<T,HadoopInputSplit> implements ResultTypeQueryable<T>
Data can be returned as HCatRecord
or Flink-native
tuple.
Note: Flink tuples might only support a limited number of fields (depending on the API).
Modifier and Type | Field and Description |
---|---|
protected String[] |
fieldNames |
protected org.apache.hive.hcatalog.data.schema.HCatSchema |
outputSchema |
Constructor and Description |
---|
HCatInputFormatBase() |
HCatInputFormatBase(String database,
String table)
Creates a HCatInputFormat for the given database and table.
|
HCatInputFormatBase(String database,
String table,
Configuration config)
Creates a HCatInputFormat for the given database, table, and
Configuration . |
Modifier and Type | Method and Description |
---|---|
HCatInputFormatBase<T> |
asFlinkTuples()
Specifies that the InputFormat returns Flink tuples instead of
HCatRecord . |
protected abstract T |
buildFlinkTuple(T t,
org.apache.hive.hcatalog.data.HCatRecord record) |
void |
close()
Method that marks the end of the life-cycle of an input split.
|
void |
configure(Configuration parameters)
Configures this input format.
|
HadoopInputSplit[] |
createInputSplits(int minNumSplits)
Computes the input splits.
|
Configuration |
getConfiguration()
Returns the
Configuration of the HCatInputFormat. |
HCatInputFormatBase<T> |
getFields(String... fields)
Specifies the fields which are returned by the InputFormat and their order.
|
InputSplitAssigner |
getInputSplitAssigner(HadoopInputSplit[] inputSplits)
Returns the assigner for the input splits.
|
protected abstract int |
getMaxFlinkTupleSize() |
org.apache.hive.hcatalog.data.schema.HCatSchema |
getOutputSchema()
Returns the
HCatSchema of the HCatRecord returned by this InputFormat. |
TypeInformation<T> |
getProducedType()
Gets the data type (as a
TypeInformation ) produced by this function or input format. |
BaseStatistics |
getStatistics(BaseStatistics cachedStats)
Gets the basic statistics from the input described by this format.
|
T |
nextRecord(T record)
Reads the next record from the input.
|
void |
open(HadoopInputSplit split)
Opens a parallel instance of the input format to work on a split.
|
boolean |
reachedEnd()
Method used to check if the end of the input is reached.
|
HCatInputFormatBase<T> |
withFilter(String filter)
Specifies a SQL-like filter condition on the table's partition columns.
|
closeInputFormat, getRuntimeContext, openInputFormat, setRuntimeContext
protected String[] fieldNames
protected org.apache.hive.hcatalog.data.schema.HCatSchema outputSchema
public HCatInputFormatBase()
public HCatInputFormatBase(String database, String table) throws IOException
HCatRecord
. The return type of the InputFormat
can be changed to Flink-native tuples by calling asFlinkTuples()
.database
- The name of the database to read from.table
- The name of the table to read.IOException
public HCatInputFormatBase(String database, String table, Configuration config) throws IOException
Configuration
. By default, the InputFormat returns HCatRecord
. The return type of the InputFormat can be changed
to Flink-native tuples by calling asFlinkTuples()
.database
- The name of the database to read from.table
- The name of the table to read.config
- The Configuration for the InputFormat.IOException
public HCatInputFormatBase<T> getFields(String... fields) throws IOException
fields
- The fields and their order which are returned by the InputFormat.IOException
public HCatInputFormatBase<T> withFilter(String filter) throws IOException
filter
- A SQL-like filter condition on the table's partition columns.IOException
public HCatInputFormatBase<T> asFlinkTuples() throws org.apache.hive.hcatalog.common.HCatException
HCatRecord
.
Note: Flink tuples might only support a limited number of fields (depending on the API).
org.apache.hive.hcatalog.common.HCatException
protected abstract int getMaxFlinkTupleSize()
public Configuration getConfiguration()
Configuration
of the HCatInputFormat.public org.apache.hive.hcatalog.data.schema.HCatSchema getOutputSchema()
HCatSchema
of the HCatRecord
returned by this InputFormat.public void configure(Configuration parameters)
InputFormat
This method is always called first on a newly instantiated input format.
configure
in interface InputFormat<T,HadoopInputSplit>
parameters
- The configuration with all parameters (note: not the Flink config but the
TaskConfig).public BaseStatistics getStatistics(BaseStatistics cachedStats) throws IOException
InputFormat
When this method is called, the input format is guaranteed to be configured.
getStatistics
in interface InputFormat<T,HadoopInputSplit>
cachedStats
- The statistics that were cached. May be null.IOException
public HadoopInputSplit[] createInputSplits(int minNumSplits) throws IOException
InputSplitSource
createInputSplits
in interface InputFormat<T,HadoopInputSplit>
createInputSplits
in interface InputSplitSource<HadoopInputSplit>
minNumSplits
- Number of minimal input splits, as a hint.IOException
public InputSplitAssigner getInputSplitAssigner(HadoopInputSplit[] inputSplits)
InputSplitSource
getInputSplitAssigner
in interface InputFormat<T,HadoopInputSplit>
getInputSplitAssigner
in interface InputSplitSource<HadoopInputSplit>
public void open(HadoopInputSplit split) throws IOException
InputFormat
When this method is called, the input format it guaranteed to be configured.
open
in interface InputFormat<T,HadoopInputSplit>
split
- The split to be opened.IOException
- Thrown, if the spit could not be opened due to an I/O problem.public boolean reachedEnd() throws IOException
InputFormat
When this method is called, the input format it guaranteed to be opened.
reachedEnd
in interface InputFormat<T,HadoopInputSplit>
IOException
- Thrown, if an I/O error occurred.public T nextRecord(T record) throws IOException
InputFormat
When this method is called, the input format it guaranteed to be opened.
nextRecord
in interface InputFormat<T,HadoopInputSplit>
record
- Object that may be reused.IOException
- Thrown, if an I/O error occurred.protected abstract T buildFlinkTuple(T t, org.apache.hive.hcatalog.data.HCatRecord record) throws org.apache.hive.hcatalog.common.HCatException
org.apache.hive.hcatalog.common.HCatException
public void close() throws IOException
InputFormat
When this method is called, the input format it guaranteed to be opened.
close
in interface InputFormat<T,HadoopInputSplit>
IOException
- Thrown, if the input could not be closed properly.public TypeInformation<T> getProducedType()
ResultTypeQueryable
TypeInformation
) produced by this function or input format.getProducedType
in interface ResultTypeQueryable<T>
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.