TableInputFormat (flink 1.1-SNAPSHOT API)

java.lang.Object
- org.apache.flink.api.common.io.RichInputFormat<T,TableInputSplit>
- - org.apache.flink.addons.hbase.TableInputFormat<T>

All Implemented Interfaces:

Serializable, InputFormat<T,TableInputSplit>, InputSplitSource<TableInputSplit>
```
public abstract class TableInputFormat<T extends Tuple>
extends RichInputFormat<T,TableInputSplit>
```
InputFormat subclass that wraps access to HTables.

See Also:

Serialized Form

Field Summary

Fields
Modifier and Type	Field and Description
`protected org.apache.hadoop.hbase.client.Scan`	`scan`
`protected org.apache.hadoop.hbase.client.HTable`	`table`

Constructor Summary

Constructors
Constructor and Description
`TableInputFormat()`

Method Summary

Modifier and Type	Method and Description
`void`	`close()` Method that marks the end of the life-cycle of an input split.
`void`	`closeInputFormat()` Closes this InputFormat instance.
`void`	`configure(Configuration parameters)` Creates a `Scan` object and opens the `HTable` connection.
`TableInputSplit[]`	`createInputSplits(int minNumSplits)` Creates the different splits of the input that can be processed in parallel.
`InputSplitAssigner`	`getInputSplitAssigner(TableInputSplit[] inputSplits)` Returns the assigner that hands the given input splits to the parallel instances of this input format.
`protected abstract org.apache.hadoop.hbase.client.Scan`	`getScanner()` Returns an instance of Scan that retrieves the required subset of records from the HBase table.
`BaseStatistics`	`getStatistics(BaseStatistics cachedStatistics)` Gets the basic statistics from the input described by this format.
`protected abstract String`	`getTableName()` What table is to be read.
`protected boolean`	`includeRegionInSplit(byte[] startKey, byte[] endKey)` Test if the given region is to be included in the InputSplit while splitting the regions of a table.
`protected abstract T`	`mapResultToTuple(org.apache.hadoop.hbase.client.Result r)` The output from HBase is always an instance of `Result`.
`T`	`nextRecord(T reuse)` Reads the next record from the input.
`void`	`open(TableInputSplit split)` Opens a parallel instance of the input format to work on a split.
`boolean`	`reachedEnd()` Method used to check if the end of the input is reached.

Methods inherited from class org.apache.flink.api.common.io.RichInputFormat
getRuntimeContext, openInputFormat, setRuntimeContext

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - table
```
protected transient org.apache.hadoop.hbase.client.HTable table
```
  - scan
```
protected transient org.apache.hadoop.hbase.client.Scan scan
```
- Constructor Detail
  - TableInputFormat
```
public TableInputFormat()
```
- Method Detail
  - getScanner
```
protected abstract org.apache.hadoop.hbase.client.Scan getScanner()
```
    Returns an instance of Scan that retrieves the required subset of records from the HBase table.
    
    Returns:
    
    The appropriate instance of Scan for this use case.
  - getTableName
```
protected abstract String getTableName()
```
    Returns the name of the table to read. Each instance of a TableInputFormat subclass can read only a single table.
    
    Returns:
    
    The name of the table
  - mapResultToTuple
```
protected abstract T mapResultToTuple(org.apache.hadoop.hbase.client.Result r)
```
    The output from HBase is always an instance of Result. This method copies the data from the Result instance into the required Tuple.
    
    Parameters:
    
    r - The Result instance from HBase that needs to be converted
    
    Returns:
    
    The appropriate instance of Tuple that contains the needed information.
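    
    The mapping step can be sketched without the HBase and Flink dependencies. In the example below, `UserVisits` stands in for the Tuple type and a row key plus a `Map` of cell values stands in for `Result`; these types, the `stats:visits` column name, and the 8-byte big-endian encoding of the counter are all assumptions made for illustration, not part of this API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical stand-in for the Tuple subtype a real subclass would return.
final class UserVisits {
    String user;
    long visits;
}

// Hypothetical helper that plays the role of mapResultToTuple(Result r).
final class ResultMapping {
    // rowKey and cells stand in for the contents of an HBase Result.
    static UserVisits mapRow(byte[] rowKey, Map<String, byte[]> cells) {
        UserVisits tuple = new UserVisits();
        // Row keys are raw bytes; this sketch assumes they hold UTF-8 text.
        tuple.user = new String(rowKey, StandardCharsets.UTF_8);
        // Assumed column "stats:visits", stored as an 8-byte big-endian long.
        byte[] raw = cells.get("stats:visits");
        tuple.visits = (raw == null) ? 0L : ByteBuffer.wrap(raw).getLong();
        return tuple;
    }
}
```

    A real `mapResultToTuple` implementation would read the same values from the `Result` argument and return an instance of `T`. Treating a missing cell as zero is a choice made for this sketch only.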
  - configure
```
public void configure(Configuration parameters)
```
    Creates a Scan object and opens the HTable connection. Both are set up here because createInputSplits needs them, and createInputSplits is called before openInputFormat. The connection is therefore opened in configure(Configuration) and closed in closeInputFormat().
    
    Parameters:
    
    parameters - The configuration that is to be used
    
    See Also:
    
    Configuration
  - open
```
public void open(TableInputSplit split)
          throws IOException
```
    Description copied from interface: InputFormat
    
    Opens a parallel instance of the input format to work on a split.
    When this method is called, the input format is guaranteed to be configured.
    
    Parameters:
    
    split - The split to be opened.
    
    Throws:
    
    IOException - Thrown, if the split could not be opened due to an I/O problem.
  - reachedEnd
```
public boolean reachedEnd()
                   throws IOException
```
    Description copied from interface: InputFormat
    
    Method used to check if the end of the input is reached.
    When this method is called, the input format is guaranteed to be opened.
    
    Returns:
    
    True if the end is reached, otherwise false.
    
    Throws:
    
    IOException - Thrown, if an I/O error occurred.
  - nextRecord
```
public T nextRecord(T reuse)
             throws IOException
```
    Description copied from interface: InputFormat
    
    Reads the next record from the input.
    When this method is called, the input format is guaranteed to be opened.
    
    Parameters:
    
    reuse - Object that may be reused.
    
    Returns:
    
    Read record.
    
    Throws:
    
    IOException - Thrown, if an I/O error occurred.
  - close
```
public void close()
           throws IOException
```
    Description copied from interface: InputFormat
    
    Method that marks the end of the life-cycle of an input split. Should be used to close channels and streams and release resources. After this method returns without an error, the input is assumed to be correctly read.
    When this method is called, the input format is guaranteed to be opened.
    
    Throws:
    
    IOException - Thrown, if the input could not be closed properly.
  - closeInputFormat
```
public void closeInputFormat()
```
    Description copied from class: RichInputFormat
    
    Closes this InputFormat instance. This method is called once per parallel instance. Resources allocated during RichInputFormat.openInputFormat() should be closed in this method.
    
    Overrides:
    
    closeInputFormat in class RichInputFormat<T extends Tuple,TableInputSplit>
    
    See Also:
    
    InputFormat
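    
    The order in which the lifecycle methods of this class are called can be sketched with a small in-memory stand-in. `InMemoryFormat` and `LifecycleDemo` are hypothetical and only mirror the method names documented here; splits are reduced to list indexes, and everything runs on one instance in one thread, whereas a real job may create and read splits in different processes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical in-memory stand-in; it is not a Flink InputFormat.
final class InMemoryFormat {
    final List<String> calls = new ArrayList<>(); // records the call order
    private List<List<String>> splits;
    private Iterator<String> rows;

    void configure() {
        calls.add("configure");
        // Stands in for creating the Scan and opening the table.
        splits = Arrays.asList(Arrays.asList("r1", "r2"), Arrays.asList("r3"));
    }

    int createInputSplits() {
        calls.add("createInputSplits");
        return splits.size();
    }

    void open(int split) {
        calls.add("open");
        rows = splits.get(split).iterator();
    }

    boolean reachedEnd() {
        return !rows.hasNext();
    }

    String nextRecord(String reuse) {
        return rows.next();
    }

    void close() {
        calls.add("close");
        rows = null;
    }

    void closeInputFormat() {
        calls.add("closeInputFormat");
        splits = null; // stands in for closing the table connection
    }
}

final class LifecycleDemo {
    // Drives one format instance through all of its splits.
    static List<String> readAll(InMemoryFormat format) {
        List<String> out = new ArrayList<>();
        format.configure();
        int numSplits = format.createInputSplits();
        for (int i = 0; i < numSplits; i++) {
            format.open(i);
            try {
                while (!format.reachedEnd()) {
                    String record = format.nextRecord(null);
                    if (record != null) {
                        out.add(record);
                    }
                }
            } finally {
                format.close(); // once per split
            }
        }
        format.closeInputFormat(); // once per instance
        return out;
    }
}
```

    The sketch shows `open` and `close` bracketing each split, while `configure` and `closeInputFormat` bracket the whole instance. That pairing is why the table connection opened in `configure` is released in `closeInputFormat` and not in `close`.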
  - createInputSplits
```
public TableInputSplit[] createInputSplits(int minNumSplits)
                                    throws IOException
```
    Description copied from interface: InputFormat
    
    Creates the different splits of the input that can be processed in parallel.
    When this method is called, the input format is guaranteed to be configured.
    
    Parameters:
    
    minNumSplits - The minimum desired number of splits. If fewer are created, some parallel instances may remain idle.
    
    Returns:
    
    The splits of this input that can be processed in parallel.
    
    Throws:
    
    IOException - Thrown, when the creation of the splits was erroneous.
  - includeRegionInSplit
```
protected boolean includeRegionInSplit(byte[] startKey,
                                       byte[] endKey)
```
    Test if the given region is to be included in the InputSplit while splitting the regions of a table.
    This optimization is useful when there is a specific reason to exclude an entire region from the job (so that it does not contribute to the InputSplits), given the region's start and end keys.
    It helps, for example, when a job repeatedly remembers the last-processed top record and revisits only the [last, current) interval. Besides reducing the number of InputSplits, this also reduces the load on the region server, because the keys are ordered.
    
    Note: endKey.length == 0 is possible for the last (most recent) region.
    Override this method to exclude whole regions in bulk. By default, no region is excluded (that is, all regions are included).
    
    Parameters:
    
    startKey - Start key of the region
    
    endKey - End key of the region
    
    Returns:
    
    true, if this region needs to be included as part of the input (default).
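    
    As a rough illustration of such an override, the sketch below keeps only regions whose key range overlaps a fixed window `[lo, hi)`. It is written in plain Java so that it runs on its own: `RegionRangeFilter`, its fields, and the `key` helper are hypothetical names, and the hand-written unsigned comparison is assumed to match the lexicographic byte ordering of HBase row keys.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; in a real subclass this logic would live in the
// overridden includeRegionInSplit(byte[] startKey, byte[] endKey).
final class RegionRangeFilter {
    private final byte[] lo; // inclusive lower bound of the wanted keys
    private final byte[] hi; // exclusive upper bound of the wanted keys

    RegionRangeFilter(byte[] lo, byte[] hi) {
        this.lo = lo;
        this.hi = hi;
    }

    // Unsigned lexicographic comparison of two byte arrays.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    boolean includeRegionInSplit(byte[] startKey, byte[] endKey) {
        // An empty startKey sorts before every other key, so no special case.
        boolean startsBeforeHi = compare(startKey, hi) < 0;
        // An empty endKey marks the last region, which has no upper bound.
        boolean endsAfterLo = endKey.length == 0 || compare(endKey, lo) > 0;
        return startsBeforeHi && endsAfterLo;
    }

    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

    With a window of `[m, t)`, a region spanning `[k, p)` is kept and a region spanning `[a, f)` is skipped. The empty `endKey` check matters because the last region would otherwise compare as ending before every non-empty key.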
  - getInputSplitAssigner
```
public InputSplitAssigner getInputSplitAssigner(TableInputSplit[] inputSplits)
```
    Description copied from interface: InputFormat
    
    Returns the assigner that distributes the given input splits among the parallel instances of this input format.
    
    Returns:
    
    The assigner for the input splits.
  - getStatistics
```
public BaseStatistics getStatistics(BaseStatistics cachedStatistics)
```
    Description copied from interface: InputFormat
    
    Gets the basic statistics from the input described by this format. If the input format does not know how to create those statistics, it may return null. This method optionally gets a cached version of the statistics. The input format may examine them and decide whether it directly returns them without spending effort to re-gather the statistics.
    When this method is called, the input format is guaranteed to be configured.
    
    Parameters:
    
    cachedStatistics - The statistics that were cached. May be null.
    
    Returns:
    
    The base statistics for the input, or null, if not available.
