public final class HadoopDataInputStream extends FSDataInputStream implements ByteBufferReadable

Concrete subclass of FSDataInputStream for Hadoop's input streams. This supports all file systems supported by Hadoop, such as HDFS and S3 (S3a/S3n).

| Modifier and Type | Field and Description |
|---|---|
| static int | MIN_SKIP_BYTES — Minimum number of bytes to skip forward before we issue a seek instead of discarding read. |
| Constructor and Description |
|---|
| HadoopDataInputStream(org.apache.hadoop.fs.FSDataInputStream fsDataInputStream) — Creates a new data input stream from the given Hadoop input stream. |
| Modifier and Type | Method and Description |
|---|---|
| int | available() |
| void | close() |
| void | forceSeek(long seekPos) — Positions the stream to the given location. |
| org.apache.hadoop.fs.FSDataInputStream | getHadoopInputStream() — Gets the wrapped Hadoop input stream. |
| long | getPos() — Gets the current position in the input stream. |
| int | read() |
| int | read(byte[] buffer, int offset, int length) |
| int | read(ByteBuffer byteBuffer) — Reads up to byteBuffer.remaining() bytes into byteBuffer. |
| int | read(long position, ByteBuffer byteBuffer) — Reads up to byteBuffer.remaining() bytes into byteBuffer from a given position in the file and returns the number of bytes read. |
| void | seek(long seekPos) — Seek to the given offset from the start of the file. |
| long | skip(long n) |
| void | skipFully(long bytes) — Skips over a given number of bytes in the stream. |
Methods inherited from class java.io.InputStream: mark, markSupported, read, reset
public static final int MIN_SKIP_BYTES

Minimum number of bytes to skip forward before we issue a seek instead of discarding read.

The current value is just a magic number. In the long run, this value could become configurable, but for now it is a conservative, relatively small value that should bring safe improvements for small skips (e.g., when reading metadata), which would otherwise hurt the most with frequent seeks.

The optimal value depends on the DFS implementation and configuration, plus the underlying file system. For now, this number is chosen "big enough" to provide improvements for smaller seeks, and "small enough" to avoid disadvantages over real seeks. While the minimum should be the page size, a true optimum per system would be the number of bytes that can be consumed sequentially within the seek time. Unfortunately, seek time is not constant, and devices, the OS, and the DFS may also use read buffers and read-ahead.
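The skip-versus-seek trade-off described above can be sketched as a small decision helper. This is an illustration only, not Flink's actual implementation: the class name SeekHeuristic, the method decide, and the concrete 1 MiB threshold are assumptions for the example (the source only says the real value is a conservative "magic number"), and the real stream keeps this logic inside its seek(long) implementation.

```java
// Sketch of the skip-vs-seek decision. Names and the threshold value
// are illustrative assumptions, not the real Flink code.
public class SeekHeuristic {

    // Hypothetical threshold; the real MIN_SKIP_BYTES value is an
    // unspecified conservative "magic number".
    static final int MIN_SKIP_BYTES = 1024 * 1024;

    /**
     * Decides how to move from currentPos to targetPos: small forward
     * moves are served by sequentially skipping (discarding read data),
     * everything else warrants a real seek against the DFS.
     */
    static String decide(long currentPos, long targetPos) {
        long delta = targetPos - currentPos;
        if (delta == 0L) {
            return "noop";   // already at the target position
        } else if (delta > 0L && delta <= MIN_SKIP_BYTES) {
            return "skip";   // discard-and-read beats an expensive DFS seek
        } else {
            return "seek";   // backwards move or large forward jump
        }
    }

    public static void main(String[] args) {
        System.out.println(decide(0, 0));          // noop
        System.out.println(decide(0, 4096));       // skip
        System.out.println(decide(4096, 0));       // seek
        System.out.println(decide(0, 10_000_000)); // seek
    }
}
```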
public HadoopDataInputStream(org.apache.hadoop.fs.FSDataInputStream fsDataInputStream)

Creates a new data input stream from the given Hadoop input stream.

Parameters:
fsDataInputStream - the Hadoop input stream

public void seek(long seekPos) throws IOException
Description copied from class: FSDataInputStream

Seek to the given offset from the start of the file.

Specified by:
seek in class FSDataInputStream
Parameters:
seekPos - the desired offset
Throws:
IOException - Thrown if an error occurred while seeking inside the input stream.

public long getPos() throws IOException
Description copied from class: FSDataInputStream

Gets the current position in the input stream.

Specified by:
getPos in class FSDataInputStream
Throws:
IOException - Thrown if an I/O error occurred in the underlying stream implementation while accessing the stream's position.

public int read() throws IOException
Specified by:
read in class InputStream
Throws:
IOException
public void close() throws IOException
close
in interface Closeable
close
in interface AutoCloseable
close
in class InputStream
IOException
public int read(@Nonnull byte[] buffer, int offset, int length) throws IOException
Overrides:
read in class InputStream
Throws:
IOException
public int available() throws IOException
Overrides:
available in class InputStream
Throws:
IOException
public long skip(long n) throws IOException
Overrides:
skip in class InputStream
Throws:
IOException
public org.apache.hadoop.fs.FSDataInputStream getHadoopInputStream()

Gets the wrapped Hadoop input stream.

Returns:
the wrapped Hadoop input stream
public void forceSeek(long seekPos) throws IOException
Positions the stream to the given location. In contrast to seek(long), this method will always issue a "seek" command to the dfs and may not replace it by skip(long) for small seeks.

Notice that the underlying DFS implementation can still decide to do skip instead of seek.

Parameters:
seekPos - the position to seek to.
Throws:
IOException
public void skipFully(long bytes) throws IOException
Skips over a given number of bytes in the stream.

Parameters:
bytes - the number of bytes to skip.
Throws:
IOException
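A skipFully-style helper is needed because java.io.InputStream.skip(long) is allowed to skip fewer bytes than requested, so the call must be retried until the full count is consumed. The sketch below demonstrates this against a plain InputStream rather than the Hadoop stream; the helper's name and its end-of-stream handling are assumptions for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SkipFullyDemo {

    /** Keeps calling skip() until exactly 'bytes' bytes are consumed. */
    static void skipFully(InputStream in, long bytes) throws IOException {
        while (bytes > 0) {
            long skipped = in.skip(bytes);
            if (skipped > 0) {
                bytes -= skipped;
            } else if (in.read() >= 0) {
                bytes -= 1; // skip() made no progress; consume one byte directly
            } else {
                throw new IOException("Unexpected end of stream while skipping");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5});
        skipFully(in, 4);
        System.out.println(in.read()); // 5: the first four bytes were skipped
    }
}
```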
public int read(ByteBuffer byteBuffer) throws IOException
Description copied from interface: ByteBufferReadable

Reads up to byteBuffer.remaining() bytes into byteBuffer. After a successful call, byteBuffer.position() will be advanced by the number of bytes read and byteBuffer.limit() should be unchanged.

In the case of an exception, the values of byteBuffer.position() and byteBuffer.limit() are undefined, and callers should be prepared to recover from this eventuality. Implementations should treat 0-length requests as legitimate and must not signal an error upon their receipt.
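The position/limit contract above can be observed with any JDK channel that fills a ByteBuffer. The snippet below uses java.nio.channels.Channels over an in-memory stream purely as a stand-in for the Hadoop-backed stream; the semantics shown (position advanced by the bytes read, limit untouched) are the same ones this method promises.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

public class ByteBufferReadDemo {
    public static void main(String[] args) throws IOException {
        // Stand-in for the Hadoop stream: any ReadableByteChannel
        // follows the same ByteBuffer read contract.
        ReadableByteChannel ch =
                Channels.newChannel(new ByteArrayInputStream("hello world".getBytes()));

        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.limit(5);                       // cap the read at buf.remaining() == 5 bytes

        int n = ch.read(buf);               // reads up to buf.remaining() bytes

        System.out.println(n);              // 5
        System.out.println(buf.position()); // advanced by the bytes read: 5
        System.out.println(buf.limit());    // unchanged: 5
    }
}
```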
Specified by:
read in interface ByteBufferReadable
Parameters:
byteBuffer - the ByteBuffer to receive the results of the read operation.
Throws:
IOException - if there is some error performing the read

public int read(long position, ByteBuffer byteBuffer) throws IOException
Description copied from interface: ByteBufferReadable

Reads up to byteBuffer.remaining() bytes into byteBuffer from a given position in the file and returns the number of bytes read. Callers should use byteBuffer.limit(...) to control the size of the desired read and byteBuffer.position(...) to control the offset into the buffer the data should be written to.

After a successful call, byteBuffer.position() will be advanced by the number of bytes read and byteBuffer.limit() will be unchanged.

In the case of an exception, the state of the buffer (the contents of the buffer, the buf.position(), the buf.limit(), etc.) is undefined, and callers should be prepared to recover from this eventuality.

Implementations should treat 0-length requests as legitimate and must not signal an error upon their receipt.
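The JDK offers the same positional-read shape as FileChannel.read(ByteBuffer, long), which makes a convenient stand-in for demonstrating the contract: limit(...) caps the read size, position(...) picks where the data lands in the buffer, and the file offset is independent of the stream's current position. The temp-file setup is just scaffolding for the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PositionalReadDemo {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("positional-read", ".bin");
        try {
            Files.write(tmp, "abcdefghij".getBytes());

            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                ByteBuffer buf = ByteBuffer.allocate(10);
                buf.position(2);          // data lands starting at buffer index 2
                buf.limit(6);             // request at most 4 bytes

                int n = ch.read(buf, 5L); // read from file offset 5

                System.out.println(n);                 // 4
                System.out.println((char) buf.get(2)); // f (the byte at file offset 5)
                System.out.println(buf.limit());       // unchanged: 6
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```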
Specified by:
read in interface ByteBufferReadable
Parameters:
position - position within file
byteBuffer - the ByteBuffer to receive the results of the read operation.
Throws:
IOException - if there is some error performing the read

Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.