@Internal public abstract class AbstractTableInputFormat<T> extends RichInputFormat<T,TableInputSplit>
Abstract InputFormat to read data from HBase tables.

Modifier and Type | Field and Description |
---|---|
protected org.apache.hadoop.hbase.client.Connection | connection |
protected byte[] | currentRow |
protected boolean | endReached |
protected static org.slf4j.Logger | LOG |
protected org.apache.hadoop.hbase.client.RegionLocator | regionLocator |
protected org.apache.hadoop.hbase.client.ResultScanner | resultScanner - HBase iterator wrapper. |
protected org.apache.hadoop.hbase.client.Scan | scan |
protected long | scannedRows |
protected byte[] | serializedConfig |
protected org.apache.hadoop.hbase.client.Table | table |

Constructor and Description |
---|
AbstractTableInputFormat(org.apache.hadoop.conf.Configuration hConf) |

Modifier and Type | Method and Description |
---|---|
void | close() - Method that marks the end of the life-cycle of an input split. |
void | closeTable() |
void | configure(Configuration parameters) - Configures this input format. |
TableInputSplit[] | createInputSplits(int minNumSplits) - Computes the input splits. |
org.apache.hadoop.hbase.client.Connection | getConnection() |
protected org.apache.hadoop.conf.Configuration | getHadoopConfiguration() |
InputSplitAssigner | getInputSplitAssigner(TableInputSplit[] inputSplits) - Returns the assigner for the input splits. |
protected abstract org.apache.hadoop.hbase.client.Scan | getScanner() - Returns an instance of Scan that retrieves the required subset of records from the HBase table. |
BaseStatistics | getStatistics(BaseStatistics cachedStatistics) - Gets the basic statistics from the input described by this format. |
protected abstract String | getTableName() - What table is to be read. |
protected boolean | includeRegionInScan(byte[] startKey, byte[] endKey) - Test if the given region is to be included in the scan while splitting the regions of a table. |
protected abstract void | initTable() - Creates a Scan object and opens the HTable connection to initialize the HBase table. |
protected abstract T | mapResultToOutType(org.apache.hadoop.hbase.client.Result r) - HBase returns an instance of Result. |
T | nextRecord(T reuse) - Reads the next record from the input. |
void | open(TableInputSplit split) - Creates a Scan object and opens the HTable connection. |
boolean | reachedEnd() - Method used to check if the end of the input is reached. |

Methods inherited from class RichInputFormat: closeInputFormat, getRuntimeContext, openInputFormat, setRuntimeContext

protected static final org.slf4j.Logger LOG
protected boolean endReached
protected transient org.apache.hadoop.hbase.client.Connection connection
protected transient org.apache.hadoop.hbase.client.Table table
protected transient org.apache.hadoop.hbase.client.RegionLocator regionLocator
protected transient org.apache.hadoop.hbase.client.Scan scan
protected org.apache.hadoop.hbase.client.ResultScanner resultScanner
protected byte[] currentRow
protected long scannedRows
protected byte[] serializedConfig
public AbstractTableInputFormat(org.apache.hadoop.conf.Configuration hConf)

protected abstract void initTable() throws IOException
Creates a Scan object and opens the HTable connection to initialize the HBase table.
Throws:
IOException - Thrown if the connection could not be opened due to an I/O problem.

protected abstract org.apache.hadoop.hbase.client.Scan getScanner()
Returns an instance of Scan that retrieves the required subset of records from the HBase table.

protected abstract String getTableName()
Returns the name of the table to be read. Each instance of a TableInputFormat derivative can read only a single table.

protected abstract T mapResultToOutType(org.apache.hadoop.hbase.client.Result r)
HBase returns an instance of Result. This method maps the returned Result instance into the output type T.
Parameters:
r - The Result instance from HBase that needs to be converted.
Returns:
An instance of T that contains the data of Result.

public void configure(Configuration parameters)
Description copied from interface: InputFormat
Configures this input format. This method is always called first on a newly instantiated input format.
Parameters:
parameters - The configuration with all parameters (note: not the Flink config but the TaskConfig).

protected org.apache.hadoop.conf.Configuration getHadoopConfiguration()
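
The mapResultToOutType hook described above can be illustrated with a minimal, dependency-free sketch. Everything in it is a hypothetical stand-in: FakeResult replaces the HBase Result class, ResultMapper mirrors only the shape of the abstract method, and the User type and the "info:name" column are invented for illustration. It is not the Flink or HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an HBase Result: a row key plus cells keyed by "family:qualifier". */
final class FakeResult {
    private final byte[] row;
    private final Map<String, byte[]> cells = new HashMap<>();

    FakeResult(String rowKey) {
        this.row = rowKey.getBytes(StandardCharsets.UTF_8);
    }

    FakeResult put(String column, String value) {
        cells.put(column, value.getBytes(StandardCharsets.UTF_8));
        return this;
    }

    byte[] getRow() {
        return row;
    }

    /** Returns the raw cell value, or null if the column is absent. */
    byte[] getValue(String column) {
        return cells.get(column);
    }
}

/** Mirrors the shape of the abstract hook: one result in, one T out. */
abstract class ResultMapper<T> {
    protected abstract T mapResultToOutType(FakeResult r);
}

/** Hypothetical output type T. */
final class User {
    final String id;
    final String name;

    User(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

/** Concrete mapper: decodes the row key and one column into a User. */
final class UserMapper extends ResultMapper<User> {
    @Override
    protected User mapResultToOutType(FakeResult r) {
        String id = new String(r.getRow(), StandardCharsets.UTF_8);
        byte[] rawName = r.getValue("info:name");
        // A missing cell is mapped to null instead of failing the whole record.
        String name = rawName == null ? null : new String(rawName, StandardCharsets.UTF_8);
        return new User(id, name);
    }
}
```

A real subclass would receive an org.apache.hadoop.hbase.client.Result instead of FakeResult; the decoding step keeps the same structure.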

public void open(TableInputSplit split) throws IOException
Creates a Scan object and opens the HTable connection. The connection is opened in this method and closed in close().
Parameters:
split - The split to be opened.
Throws:
IOException - Thrown if the split could not be opened due to an I/O problem.

public T nextRecord(T reuse) throws IOException
Description copied from interface: InputFormat
Reads the next record from the input. When this method is called, the input format is guaranteed to be opened.
Parameters:
reuse - Object that may be reused.
Throws:
IOException - Thrown if an I/O error occurred.

public boolean reachedEnd() throws IOException
Description copied from interface: InputFormat
Method used to check if the end of the input is reached. When this method is called, the input format is guaranteed to be opened.
Throws:
IOException - Thrown if an I/O error occurred.

public void close() throws IOException
Description copied from interface: InputFormat
Method that marks the end of the life-cycle of an input split. When this method is called, the input format is guaranteed to be opened.
Throws:
IOException - Thrown if the input could not be closed properly.

public void closeTable()
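
The open, reachedEnd, nextRecord and close methods above form a per-split read loop. The sketch below shows one way such a loop can be driven, under stated assumptions: RowReader is a hypothetical stand-in, rows are plain strings, and an Iterator plays the role of the resultScanner field. The field names endReached, scannedRows and currentRow are borrowed from the field list; the logic is illustrative and not the Flink implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for a scan-backed input format; rows are plain strings. */
final class RowReader {
    private Iterator<String> resultScanner; // stand-in for the HBase ResultScanner
    private boolean endReached;
    private long scannedRows;
    private String currentRow;

    /** Opens one split: resets per-split state and creates the scanner. */
    void open(List<String> splitRows) {
        resultScanner = splitRows.iterator();
        endReached = false;
        scannedRows = 0;
        currentRow = null;
    }

    boolean reachedEnd() {
        return endReached;
    }

    /** Returns the next row, or null once the scanner is exhausted. */
    String nextRecord() {
        if (resultScanner == null) {
            throw new IllegalStateException("open() must be called before nextRecord()");
        }
        if (resultScanner.hasNext()) {
            currentRow = resultScanner.next();
            scannedRows++;
            return currentRow;
        }
        endReached = true;
        return null;
    }

    /** Closes the split; the reader can be opened again for another split. */
    void close() {
        resultScanner = null;
    }

    long scannedRows() {
        return scannedRows;
    }

    /** Drives the life-cycle for one split: open, read until the end, close. */
    static List<String> readAll(RowReader reader, List<String> splitRows) {
        List<String> out = new ArrayList<>();
        reader.open(splitRows);
        try {
            while (!reader.reachedEnd()) {
                String record = reader.nextRecord();
                if (record != null) {
                    out.add(record);
                }
            }
        } finally {
            reader.close();
        }
        return out;
    }
}
```

Closing in a finally block matches the statement that the connection opened in open is released in close, even when reading fails part-way through.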

public TableInputSplit[] createInputSplits(int minNumSplits) throws IOException
Description copied from interface: InputSplitSource
Computes the input splits.
Parameters:
minNumSplits - The minimum number of input splits, as a hint.
Throws:
IOException
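
As a rough illustration of what computing input splits can mean for a region-partitioned table, the sketch below creates one split per region and clips each region to the scan range. It is a sketch under assumptions, not the Flink implementation: KeyRangeSplit and SplitPlanner are hypothetical names, row keys are ASCII strings rather than byte arrays, an empty string means "unbounded" (the HBase convention for empty start and end keys), and the minNumSplits hint is ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for TableInputSplit: a numbered key range [startRow, endRow). */
final class KeyRangeSplit {
    final int splitNumber;
    final String startRow; // "" means from the beginning of the table
    final String endRow;   // "" means to the end of the table

    KeyRangeSplit(int splitNumber, String startRow, String endRow) {
        this.splitNumber = splitNumber;
        this.startRow = startRow;
        this.endRow = endRow;
    }
}

final class SplitPlanner {
    /**
     * Creates one split per region that overlaps the scan range.
     * Each element of regions is {startKey, endKey}, sorted by start key.
     */
    static List<KeyRangeSplit> createInputSplits(String[][] regions, String scanStart, String scanStop) {
        List<KeyRangeSplit> splits = new ArrayList<>();
        for (String[] region : regions) {
            String regionStart = region[0];
            String regionEnd = region[1];

            // Skip regions that lie entirely outside the scan range.
            boolean startsBeforeScanStop = scanStop.isEmpty() || regionStart.compareTo(scanStop) < 0;
            boolean endsAfterScanStart = regionEnd.isEmpty() || scanStart.compareTo(regionEnd) < 0;
            if (!startsBeforeScanStop || !endsAfterScanStart) {
                continue;
            }

            // Clip the region to the scan range; "" sorts first, so max() handles the open start.
            String start = scanStart.compareTo(regionStart) > 0 ? scanStart : regionStart;
            String stop;
            if (regionEnd.isEmpty()) {
                stop = scanStop;
            } else if (scanStop.isEmpty()) {
                stop = regionEnd;
            } else {
                stop = regionEnd.compareTo(scanStop) < 0 ? regionEnd : scanStop;
            }
            splits.add(new KeyRangeSplit(splits.size(), start, stop));
        }
        return splits;
    }
}
```

For regions ["", "g"), ["g", "p"), ["p", "") and a scan from "c" to "k", this yields two splits: ["c", "g") and ["g", "k").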
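
protected boolean includeRegionInScan(byte[] startKey, byte[] endKey)
Tests if the given region is to be included in the scan while splitting the regions of a table.
Parameters:
startKey - Start key of the region.
endKey - End key of the region.

public InputSplitAssigner getInputSplitAssigner(TableInputSplit[] inputSplits)
Description copied from interface: InputSplitSource
Returns the assigner for the input splits.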
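
Because includeRegionInScan is a protected, non-abstract method, a subclass can override it to skip regions during splitting. The sketch below shows one possible override that keeps only regions overlapping a fixed key range. RegionFilter and RangeRegionFilter are hypothetical stand-ins; the include-everything default is an assumption made for this sketch, and an empty end key is treated as "last region" following the HBase convention. Arrays.compareUnsigned requires Java 9 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical base class; assumed default: every region is included. */
class RegionFilter {
    protected boolean includeRegionInScan(byte[] startKey, byte[] endKey) {
        return true;
    }
}

/** Keeps only regions that overlap the key range [from, to); both bounds must be non-empty. */
final class RangeRegionFilter extends RegionFilter {
    private final byte[] from;
    private final byte[] to;

    RangeRegionFilter(String from, String to) {
        this.from = from.getBytes(StandardCharsets.UTF_8);
        this.to = to.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    protected boolean includeRegionInScan(byte[] startKey, byte[] endKey) {
        // Row keys are ordered as unsigned bytes, so compare them that way.
        boolean startsBeforeTo = Arrays.compareUnsigned(startKey, to) < 0;
        // An empty end key marks the last region, which has no upper bound.
        boolean endsAfterFrom = endKey.length == 0 || Arrays.compareUnsigned(from, endKey) < 0;
        return startsBeforeTo && endsAfterFrom;
    }
}
```

With the range ["g", "p"), the region ["", "g") is excluded because its end key is exclusive, ["g", "p") is included, and ["p", "") is excluded.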

public BaseStatistics getStatistics(BaseStatistics cachedStatistics)
Description copied from interface: InputFormat
Gets the basic statistics from the input described by this format. When this method is called, the input format is guaranteed to be configured.
Parameters:
cachedStatistics - The statistics that were cached. May be null.

@VisibleForTesting public org.apache.hadoop.hbase.client.Connection getConnection()

Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.