public abstract class AbstractTableInputFormat<T> extends RichInputFormat<T,TableInputSplit>
InputFormat to read data from HBase tables.
Modifier and Type | Field and Description |
---|---|
protected byte[] | currentRow |
protected boolean | endReached |
protected static org.slf4j.Logger | LOG |
protected org.apache.hadoop.hbase.client.ResultScanner | resultScanner: HBase iterator wrapper. |
protected org.apache.hadoop.hbase.client.Scan | scan |
protected long | scannedRows |
protected org.apache.hadoop.hbase.client.HTable | table |
Constructor and Description |
---|
AbstractTableInputFormat() |
Modifier and Type | Method and Description |
---|---|
void | close(): Method that marks the end of the life-cycle of an input split. |
void | closeInputFormat(): Closes this InputFormat instance. |
abstract void | configure(Configuration parameters): Creates a Scan object and opens the HTable connection. |
TableInputSplit[] | createInputSplits(int minNumSplits): Creates the different splits of the input that can be processed in parallel. |
InputSplitAssigner | getInputSplitAssigner(TableInputSplit[] inputSplits): Returns the assigner for the input splits. |
protected abstract org.apache.hadoop.hbase.client.Scan | getScanner(): Returns an instance of Scan that retrieves the required subset of records from the HBase table. |
BaseStatistics | getStatistics(BaseStatistics cachedStatistics): Gets the basic statistics from the input described by this format. |
protected abstract String | getTableName(): What table is to be read. |
protected boolean | includeRegionInScan(byte[] startKey, byte[] endKey): Test if the given region is to be included in the scan while splitting the regions of a table. |
protected abstract T | mapResultToOutType(org.apache.hadoop.hbase.client.Result r): HBase returns an instance of Result. |
T | nextRecord(T reuse): Reads the next record from the input. |
void | open(TableInputSplit split): Opens a parallel instance of the input format to work on a split. |
boolean | reachedEnd(): Method used to check if the end of the input is reached. |
Methods inherited from class RichInputFormat: getRuntimeContext, openInputFormat, setRuntimeContext
protected static final org.slf4j.Logger LOG
protected boolean endReached
protected transient org.apache.hadoop.hbase.client.HTable table
protected transient org.apache.hadoop.hbase.client.Scan scan
protected org.apache.hadoop.hbase.client.ResultScanner resultScanner
protected byte[] currentRow
protected long scannedRows
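protected abstract org.apache.hadoop.hbase.client.Scan getScanner()
Returns an instance of Scan that retrieves the required subset of records from the HBase table.
protected abstract String getTableName()
What table is to be read. Each instance of a TableInputFormat derivative can read from only a single table.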
protected abstract T mapResultToOutType(org.apache.hadoop.hbase.client.Result r)
HBase returns an instance of Result. This method maps the returned Result instance into the output type T.
Parameters:
r - The Result instance from HBase that needs to be converted.
Returns:
An instance of T that contains the data of Result.
public abstract void configure(Configuration parameters)
Creates a Scan object and opens the HTable connection. Both are opened here because they are needed in createInputSplits, which is called before the openInputFormat method.
The connection is opened in this method and closed in closeInputFormat().
Parameters:
parameters - The configuration that is to be used.
See Also:
Configuration
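As an illustration of the abstract hooks getTableName() and mapResultToOutType(Result), the following is a minimal sketch that compiles without Flink or HBase on the classpath. RowStandIn, TableSourceSketch, VisitCountSource and the table name "visits" are hypothetical stand-ins, not part of this API; the sketch also assumes the cell value holds an 8-byte big-endian long and the row key is UTF-8 text.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical stand-in for org.apache.hadoop.hbase.client.Result:
// a row key plus a single cell value, both as raw bytes.
final class RowStandIn {
    final byte[] rowKey;
    final byte[] value;

    RowStandIn(byte[] rowKey, byte[] value) {
        this.rowKey = rowKey;
        this.value = value;
    }
}

// Hypothetical stand-in that only mirrors the shape of two abstract hooks.
abstract class TableSourceSketch<T> {
    protected abstract String getTableName();

    protected abstract T mapResultToOutType(RowStandIn r);
}

// Hypothetical subclass: turns each row into a (user id, visit count) pair.
final class VisitCountSource extends TableSourceSketch<Map.Entry<String, Long>> {
    @Override
    protected String getTableName() {
        return "visits"; // one table per instance
    }

    @Override
    protected Map.Entry<String, Long> mapResultToOutType(RowStandIn r) {
        // Assumption: the row key is UTF-8 text.
        String userId = new String(r.rowKey, StandardCharsets.UTF_8);
        // Assumption: the value is an 8-byte big-endian long.
        long count = ByteBuffer.wrap(r.value).getLong();
        return new SimpleEntry<>(userId, count);
    }
}
```

A real subclass would extend AbstractTableInputFormat<T> and read cells from the HBase Result instead of the stand-in row.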
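public void open(TableInputSplit split) throws IOException
Description copied from interface: InputFormat
Opens a parallel instance of the input format to work on a split.
When this method is called, the input format is guaranteed to be configured.
Parameters:
split - The split to be opened.
Throws:
IOException - Thrown, if the split could not be opened due to an I/O problem.
public T nextRecord(T reuse) throws IOException
Description copied from interface: InputFormat
Reads the next record from the input.
When this method is called, the input format is guaranteed to be opened.
Parameters:
reuse - Object that may be reused.
Throws:
IOException - Thrown, if an I/O error occurred.
public boolean reachedEnd() throws IOException
Description copied from interface: InputFormat
Method used to check if the end of the input is reached.
When this method is called, the input format is guaranteed to be opened.
Throws:
IOException - Thrown, if an I/O error occurred.
public void close() throws IOException
Description copied from interface: InputFormat
Method that marks the end of the life-cycle of an input split.
When this method is called, the input format is guaranteed to be opened.
Throws:
IOException - Thrown, if the input could not be closed properly.
public void closeInputFormat() throws IOException
Description copied from class: RichInputFormat
Closes this InputFormat instance. Resources allocated during RichInputFormat.openInputFormat() should be closed in this method.
Overrides:
closeInputFormat in class RichInputFormat<T,TableInputSplit>
Throws:
IOException - in case closing the resources failed
See Also:
InputFormat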
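The per-split call order of open, reachedEnd, nextRecord and close can be sketched as a simple driver loop. This is an illustration only: ListBackedFormat and ReadLoopSketch are hypothetical stand-ins backed by in-memory lists, and the choice to return null and set endReached once the iterator is exhausted is an assumption of the sketch, not a statement about this class's implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in with the same four-method read path.
final class ListBackedFormat {
    private final List<List<String>> splits; // one inner list per split
    private Iterator<String> scanner;        // plays the role of resultScanner
    private boolean endReached;
    private long scannedRows;

    ListBackedFormat(List<List<String>> splits) {
        this.splits = splits;
    }

    int numSplits() {
        return splits.size();
    }

    void open(int split) {
        scanner = splits.get(split).iterator();
        endReached = false;
        scannedRows = 0;
    }

    boolean reachedEnd() {
        return endReached;
    }

    // Assumption of this sketch: returns null and flips endReached
    // once the scanner has no more rows.
    String nextRecord(String reuse) {
        if (scanner.hasNext()) {
            scannedRows++;
            return scanner.next();
        }
        endReached = true;
        return null;
    }

    void close() {
        scanner = null;
    }
}

final class ReadLoopSketch {
    // Reads every split in order: open, poll until the end, then close.
    static List<String> readAll(ListBackedFormat format) {
        List<String> out = new ArrayList<>();
        for (int split = 0; split < format.numSplits(); split++) {
            format.open(split);
            try {
                while (!format.reachedEnd()) {
                    String record = format.nextRecord(null);
                    if (record != null) {
                        out.add(record);
                    }
                }
            } finally {
                format.close(); // always release the per-split scanner
            }
        }
        return out;
    }
}
```

In a real job the framework drives this loop; configure and createInputSplits run before it, and closeInputFormat runs once after the last split.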
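public TableInputSplit[] createInputSplits(int minNumSplits) throws IOException
Description copied from interface: InputFormat
Creates the different splits of the input that can be processed in parallel.
When this method is called, the input format is guaranteed to be configured.
Parameters:
minNumSplits - The minimum desired number of splits. If fewer are created, some parallel instances may remain idle.
Throws:
IOException - Thrown, if the splits could not be created.
protected boolean includeRegionInScan(byte[] startKey, byte[] endKey)
Test if the given region is to be included in the scan while splitting the regions of a table.
Parameters:
startKey - Start key of the region
endKey - End key of the region
public InputSplitAssigner getInputSplitAssigner(TableInputSplit[] inputSplits)
Description copied from interface: InputFormat
Returns the assigner for the input splits.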
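One way a subclass might use the includeRegionInScan(startKey, endKey) hook is to skip regions that cannot overlap the scanned row range. The sketch below is a hypothetical filter, not the behaviour of this class: RegionFilterSketch and its fields are invented for illustration, it assumes an empty byte array marks an open-ended boundary and that keys compare as unsigned bytes in lexicographic order, and it needs Java 9 or later for Arrays.compareUnsigned.

```java
import java.util.Arrays;

// Hypothetical helper: keeps a region only if its key range [startKey, endKey)
// overlaps the scan range [scanStart, scanStop).
final class RegionFilterSketch {
    private final byte[] scanStart; // inclusive; empty means "from the first row"
    private final byte[] scanStop;  // exclusive; empty means "to the last row"

    RegionFilterSketch(byte[] scanStart, byte[] scanStop) {
        this.scanStart = scanStart;
        this.scanStop = scanStop;
    }

    // Same parameter shape as includeRegionInScan(byte[] startKey, byte[] endKey).
    boolean includeRegionInScan(byte[] startKey, byte[] endKey) {
        // The region must begin before the scan stops...
        boolean regionStartsBeforeScanStops =
                scanStop.length == 0 || Arrays.compareUnsigned(startKey, scanStop) < 0;
        // ...and must end after the scan starts.
        boolean regionEndsAfterScanStarts =
                endKey.length == 0 || Arrays.compareUnsigned(scanStart, endKey) < 0;
        return regionStartsBeforeScanStops && regionEndsAfterScanStarts;
    }
}
```

Skipping non-overlapping regions this way would keep createInputSplits from producing splits that can never return a row.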
public BaseStatistics getStatistics(BaseStatistics cachedStatistics)
Description copied from interface: InputFormat
Gets the basic statistics from the input described by this format.
When this method is called, the input format is guaranteed to be configured.
Parameters:
cachedStatistics - The statistics that were cached. May be null.
Copyright © 2014–2020 The Apache Software Foundation. All rights reserved.