Class HadoopFileSystem
java.lang.Object
  org.apache.flink.core.fs.FileSystem
    org.apache.flink.runtime.fs.hdfs.HadoopFileSystem
- All Implemented Interfaces:
IFileSystem
- Direct Known Subclasses:
FlinkOSSFileSystem, FlinkS3FileSystem
public class HadoopFileSystem extends FileSystem
A FileSystem that wraps a Hadoop File System.
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.flink.core.fs.FileSystem
FileSystem.FSKey, FileSystem.WriteMode
-
-
Constructor Summary
Constructor and Description
- HadoopFileSystem(org.apache.hadoop.fs.FileSystem hadoopFileSystem)
Wraps the given Hadoop File System object as a Flink File System object.
-
Method Summary
Modifier and Type, Method, and Description
- HadoopDataOutputStream create(Path f, boolean overwrite, int bufferSize, short replication, long blockSize)
Opens an FSDataOutputStream at the indicated Path.
- HadoopDataOutputStream create(Path f, FileSystem.WriteMode overwrite)
Opens an FSDataOutputStream to a new file at the given path.
- RecoverableWriter createRecoverableWriter()
Creates a new RecoverableWriter.
- RecoverableWriter createRecoverableWriter(Map<String,String> conf)
Creates a new RecoverableWriter.
- boolean delete(Path f, boolean recursive)
Delete a file.
- boolean exists(Path f)
Checks if the path exists.
- long getDefaultBlockSize()
Return the number of bytes that large input files should optimally be split into to minimize I/O time.
- BlockLocation[] getFileBlockLocations(FileStatus file, long start, long len)
Return an array containing hostnames, offset and size of portions of the given file.
- FileStatus getFileStatus(Path f)
Return a file status object that represents the path.
- org.apache.hadoop.fs.FileSystem getHadoopFileSystem()
Gets the underlying Hadoop FileSystem.
- Path getHomeDirectory()
Returns the path of the user's home directory in this file system.
- URI getUri()
Returns a URI whose scheme and authority identify this file system.
- Path getWorkingDirectory()
Returns the path of the file system's current working directory.
- boolean isDistributedFS()
Returns true if this is a distributed file system.
- FileStatus[] listStatus(Path f)
List the statuses of the files/directories in the given path if the path is a directory.
- boolean mkdirs(Path f)
Make the given file and all non-existent parents into directories.
- HadoopDataInputStream open(Path f)
Opens an FSDataInputStream at the indicated Path.
- HadoopDataInputStream open(Path f, int bufferSize)
Opens an FSDataInputStream at the indicated Path.
- boolean rename(Path src, Path dst)
Renames the file/directory src to dst.
- static org.apache.hadoop.fs.Path toHadoopPath(Path path)
-
Methods inherited from class org.apache.flink.core.fs.FileSystem
create, get, getDefaultFsUri, getLocalFileSystem, getUnguardedFileSystem, initialize, initialize, initOutPathDistFS, initOutPathLocalFS
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.flink.core.fs.IFileSystem
canCopyPaths
-
-
-
-
Constructor Detail
-
HadoopFileSystem
public HadoopFileSystem(org.apache.hadoop.fs.FileSystem hadoopFileSystem)
Wraps the given Hadoop File System object as a Flink File System object. The given Hadoop file system object is expected to be initialized already.
- Parameters:
hadoopFileSystem - The Hadoop FileSystem that will be used under the hood.
-
-
Method Detail
-
getHadoopFileSystem
public org.apache.hadoop.fs.FileSystem getHadoopFileSystem()
Gets the underlying Hadoop FileSystem.
- Returns:
The underlying Hadoop FileSystem.
-
getWorkingDirectory
public Path getWorkingDirectory()
Description copied from interface: IFileSystem
Returns the path of the file system's current working directory.
- Specified by:
getWorkingDirectory in interface IFileSystem
- Specified by:
getWorkingDirectory in class FileSystem
- Returns:
the path of the file system's current working directory
-
getHomeDirectory
public Path getHomeDirectory()
Description copied from interface: IFileSystem
Returns the path of the user's home directory in this file system.
- Specified by:
getHomeDirectory in interface IFileSystem
- Specified by:
getHomeDirectory in class FileSystem
- Returns:
the path of the user's home directory in this file system
-
getUri
public URI getUri()
Description copied from interface: IFileSystem
Returns a URI whose scheme and authority identify this file system.
- Specified by:
getUri in interface IFileSystem
- Specified by:
getUri in class FileSystem
- Returns:
a URI whose scheme and authority identify this file system
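To make "scheme and authority" concrete, the following stdlib-only sketch reduces a full file URI to the part that would identify its file system. It does not call HadoopFileSystem; the class name FsUriSketch, the method fileSystemUri, and the example address are hypothetical, and the exact form of the URI returned by a given Hadoop file system may differ.

```java
import java.net.URI;

final class FsUriSketch {
    /** Keeps only the scheme and authority of a file URI, dropping path, query, and fragment. */
    static URI fileSystemUri(URI fileUri) {
        String authority = fileUri.getAuthority();
        if (authority == null) {
            // URIs such as file:///tmp/x carry no authority component.
            return URI.create(fileUri.getScheme() + ":///");
        }
        return URI.create(fileUri.getScheme() + "://" + authority);
    }

    public static void main(String[] args) {
        // Hypothetical HDFS address, used for illustration only.
        URI file = URI.create("hdfs://namenode:8020/user/flink/data.csv");
        System.out.println(fileSystemUri(file));
    }
}
```

Two files whose URIs reduce to the same scheme and authority live on the same file system in this sense.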
-
getFileStatus
public FileStatus getFileStatus(Path f) throws IOException
Description copied from interface: IFileSystem
Return a file status object that represents the path.
- Specified by:
getFileStatus in interface IFileSystem
- Specified by:
getFileStatus in class FileSystem
- Parameters:
f - The path we want information from
- Returns:
a FileStatus object
- Throws:
FileNotFoundException - when the path does not exist
IOException - see specific implementation
-
getFileBlockLocations
public BlockLocation[] getFileBlockLocations(FileStatus file, long start, long len) throws IOException
Description copied from interface: IFileSystem
Return an array containing hostnames, offset and size of portions of the given file. For a nonexistent file or region, null will be returned. This call is most helpful with DFS, where it returns hostnames of machines that contain the given file. The FileSystem will simply return an element containing 'localhost'.
- Specified by:
getFileBlockLocations in interface IFileSystem
- Specified by:
getFileBlockLocations in class FileSystem
- Throws:
IOException
-
open
public HadoopDataInputStream open(Path f, int bufferSize) throws IOException
Description copied from interface: IFileSystem
Opens an FSDataInputStream at the indicated Path.
- Specified by:
open in interface IFileSystem
- Specified by:
open in class FileSystem
- Parameters:
f - the file name to open
bufferSize - the size of the buffer to be used.
- Throws:
IOException
-
open
public HadoopDataInputStream open(Path f) throws IOException
Description copied from interface: IFileSystem
Opens an FSDataInputStream at the indicated Path.
- Specified by:
open in interface IFileSystem
- Specified by:
open in class FileSystem
- Parameters:
f - the file to open
- Throws:
IOException
-
create
public HadoopDataOutputStream create(Path f, boolean overwrite, int bufferSize, short replication, long blockSize) throws IOException
Description copied from class: FileSystem
Opens an FSDataOutputStream at the indicated Path.
This method is deprecated, because most of its parameters are ignored by most file systems. To control, for example, the replication factor and block size in the Hadoop Distributed File System, make sure that the respective Hadoop configuration file is either linked from the Flink configuration, or in the classpath of either Flink or the user code.
- Overrides:
create in class FileSystem
- Parameters:
f - the file name to open
overwrite - if a file with this name already exists, then if true, the file will be overwritten, and if false an error will be thrown.
bufferSize - the size of the buffer to be used.
replication - required block replication for the file.
blockSize - the size of the file blocks
- Throws:
IOException - Thrown, if the stream could not be opened because of an I/O error, or because a file already exists at that path and the write mode indicates to not overwrite the file.
-
create
public HadoopDataOutputStream create(Path f, FileSystem.WriteMode overwrite) throws IOException
Description copied from interface: IFileSystem
Opens an FSDataOutputStream to a new file at the given path.
If the file already exists, the behavior depends on the given WriteMode. If the mode is set to FileSystem.WriteMode.NO_OVERWRITE, then this method fails with an exception.
- Specified by:
create in interface IFileSystem
- Specified by:
create in class FileSystem
- Parameters:
f - The file path to write to
overwrite - The action to take if a file or directory already exists at the given path.
- Returns:
The stream to the new file at the target path.
- Throws:
IOException - Thrown, if the stream could not be opened because of an I/O error, or because a file already exists at that path and the write mode indicates to not overwrite the file.
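The write-mode contract described above can be illustrated with a local-file analogy built on java.nio.file. This is a sketch of the documented behavior only, not the HadoopFileSystem implementation: WriteModeSketch and its Mode enum are hypothetical stand-ins for FileSystem.WriteMode, and Path here is java.nio.file.Path rather than Flink's Path.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class WriteModeSketch {
    /** Hypothetical stand-in for FileSystem.WriteMode. */
    enum Mode { NO_OVERWRITE, OVERWRITE }

    /** Opens a stream to the file, honoring the given mode if the file already exists. */
    static OutputStream create(Path f, Mode mode) throws IOException {
        if (mode == Mode.NO_OVERWRITE) {
            // CREATE_NEW fails with FileAlreadyExistsException (an IOException) if f exists.
            return Files.newOutputStream(f, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
        }
        // OVERWRITE: create the file if needed and discard any previous content.
        return Files.newOutputStream(f, StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
    }

    /** Returns true if creating an existing file with NO_OVERWRITE was rejected. */
    static boolean secondCreateRejected() {
        try {
            Path f = Files.createTempFile("writemode", ".txt"); // the file exists from here on
            try {
                try (OutputStream out = create(f, Mode.OVERWRITE)) {
                    out.write(1); // overwriting an existing file is allowed
                }
                try (OutputStream out = create(f, Mode.NO_OVERWRITE)) {
                    out.write(2);
                    return false; // not expected: the file already exists
                } catch (FileAlreadyExistsException e) {
                    return true;
                }
            } finally {
                Files.deleteIfExists(f);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Callers that must never clobber existing output would pass the NO_OVERWRITE mode and handle the resulting IOException.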
-
delete
public boolean delete(Path f, boolean recursive) throws IOException
Description copied from interface: IFileSystem
Delete a file.
- Specified by:
delete in interface IFileSystem
- Specified by:
delete in class FileSystem
- Parameters:
f - the path to delete
recursive - if the path is a directory and this is set to true, the directory is deleted; otherwise an exception is thrown. For a file, recursive can be set to either true or false.
- Returns:
true if delete is successful, false otherwise
- Throws:
IOException
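The role of the recursive flag can be sketched with a local-file analogy. The code below follows the contract as documented above and is not the Hadoop-backed implementation; DeleteSketch is a hypothetical name, Path is java.nio.file.Path, and returning false for a missing path is an assumption of this sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class DeleteSketch {
    /** Deletes a file or directory; a non-empty directory requires recursive == true. */
    static boolean delete(Path f, boolean recursive) throws IOException {
        if (!Files.exists(f)) {
            return false; // assumption: nothing to delete counts as unsuccessful
        }
        if (Files.isDirectory(f) && recursive) {
            List<Path> paths;
            try (Stream<Path> walk = Files.walk(f)) {
                // Reverse order puts children before their parent directories.
                paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            for (Path p : paths) {
                Files.delete(p);
            }
            return true;
        }
        // For a non-empty directory this throws (typically DirectoryNotEmptyException).
        Files.delete(f);
        return true;
    }

    /** Returns true if the non-recursive delete was rejected and the recursive one succeeded. */
    static boolean demo() {
        try {
            Path dir = Files.createTempDirectory("delete-sketch");
            Files.createFile(dir.resolve("part-0"));
            boolean rejected = false;
            try {
                delete(dir, false);
            } catch (IOException e) {
                rejected = true;
            }
            boolean deleted = delete(dir, true);
            return rejected && deleted && !Files.exists(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```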
-
exists
public boolean exists(Path f) throws IOException
Description copied from interface: IFileSystem
Checks if the path exists.
- Specified by:
exists in interface IFileSystem
- Overrides:
exists in class FileSystem
- Parameters:
f - source file
- Throws:
IOException
-
listStatus
public FileStatus[] listStatus(Path f) throws IOException
Description copied from interface: IFileSystem
List the statuses of the files/directories in the given path if the path is a directory.
- Specified by:
listStatus in interface IFileSystem
- Specified by:
listStatus in class FileSystem
- Parameters:
f - given path
- Returns:
the statuses of the files/directories in the given path
- Throws:
IOException
-
mkdirs
public boolean mkdirs(Path f) throws IOException
Description copied from interface: IFileSystem
Make the given file and all non-existent parents into directories. Has the semantics of Unix 'mkdir -p'. Existence of the directory hierarchy is not an error.
- Specified by:
mkdirs in interface IFileSystem
- Specified by:
mkdirs in class FileSystem
- Parameters:
f - the directory/directories to be created
- Returns:
true if at least one new directory has been created, false otherwise
- Throws:
IOException - thrown if an I/O error occurs while creating the directory
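The 'mkdir -p' semantics and the documented return value can be sketched on the local file system. This is an analogy that follows the contract as written above, not the wrapped Hadoop call; MkdirsSketch is a hypothetical name and Path is java.nio.file.Path.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class MkdirsSketch {
    /** Creates dir and any missing parents; returns true only if something new was created. */
    static boolean mkdirs(Path dir) throws IOException {
        boolean existed = Files.isDirectory(dir);
        // createDirectories does not fail when the hierarchy already exists.
        Files.createDirectories(dir);
        return !existed;
    }

    /** Returns true if the first call created directories and the second call did not. */
    static boolean demo() {
        try {
            Path base = Files.createTempDirectory("mkdirs-sketch");
            Path target = base.resolve("a").resolve("b").resolve("c");
            boolean first = mkdirs(target);  // creates a, a/b, and a/b/c
            boolean second = mkdirs(target); // hierarchy exists: no error, nothing created
            // Clean up from the deepest directory back up to base.
            for (Path p = target; p != null && p.startsWith(base); p = p.getParent()) {
                Files.delete(p);
            }
            return first && !second;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```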
-
rename
public boolean rename(Path src, Path dst) throws IOException
Description copied from interface: IFileSystem
Renames the file/directory src to dst.
- Specified by:
rename in interface IFileSystem
- Specified by:
rename in class FileSystem
- Parameters:
src - the file/directory to rename
dst - the new name of the file/directory
- Returns:
true if the renaming was successful, false otherwise
- Throws:
IOException
-
getDefaultBlockSize
public long getDefaultBlockSize()
Description copied from class: FileSystem
Return the number of bytes that large input files should optimally be split into to minimize I/O time.
- Overrides:
getDefaultBlockSize in class FileSystem
- Returns:
the number of bytes that large input files should optimally be split into to minimize I/O time
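As a worked example of how such a block size might be used, the sketch below cuts a file length into (start, length) ranges of at most one block each. It is plain arithmetic and assumes nothing about how Flink actually computes input splits; SplitSketch and splits are hypothetical names, and the sizes in main are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitSketch {
    /** Returns {start, length} pairs that cover fileLength bytes in chunks of at most blockSize. */
    static List<long[]> splits(long fileLength, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<long[]> result = new ArrayList<>();
        for (long start = 0; start < fileLength; start += blockSize) {
            // The last range may be shorter than a full block.
            long length = Math.min(blockSize, fileLength - start);
            result.add(new long[] {start, length});
        }
        return result;
    }

    public static void main(String[] args) {
        // A 300-byte file with 128-byte blocks yields ranges of 128, 128, and 44 bytes.
        for (long[] s : splits(300L, 128L)) {
            System.out.println("start=" + s[0] + " len=" + s[1]);
        }
    }
}
```

Each (start, length) pair has the same shape as the start and len arguments of getFileBlockLocations.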
-
isDistributedFS
public boolean isDistributedFS()
Description copied from interface: IFileSystem
Returns true if this is a distributed file system. A distributed file system here means that the file system is shared among all Flink processes that participate in a cluster or job and that all these processes can see the same files.
- Specified by:
isDistributedFS in interface IFileSystem
- Specified by:
isDistributedFS in class FileSystem
- Returns:
True, if this is a distributed file system, false otherwise.
-
createRecoverableWriter
public RecoverableWriter createRecoverableWriter() throws IOException
Description copied from interface: IFileSystem
Creates a new RecoverableWriter. A recoverable writer creates streams that can persist and recover their intermediate state. Persisting and recovering intermediate state is a core building block for writing to files that span multiple checkpoints.
The returned object can act as a shared factory to open and recover multiple streams.
This method is optional on file systems, and some file system implementations may not support it, throwing an UnsupportedOperationException.
- Specified by:
createRecoverableWriter in interface IFileSystem
- Overrides:
createRecoverableWriter in class FileSystem
- Returns:
A RecoverableWriter for this file system.
- Throws:
IOException - Thrown, if the recoverable writer cannot be instantiated.
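The idea of persisting and recovering intermediate state can be illustrated with a local-file analogy. The sketch below is not Flink's RecoverableWriter and makes no claim about how any implementation works; RecoverableStreamSketch and its methods are hypothetical names. It assumes that the persisted state is just a byte offset and that recovery means truncating the file back to that offset.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class RecoverableStreamSketch {
    /** Appends text, flushes it to disk, and returns the file length as the persisted state. */
    static long writeAndPersist(Path file, String text) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true);
            return ch.size();
        }
    }

    /** Recovers by discarding every byte written after the persisted offset. */
    static void recover(Path file, long persistedOffset) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ch.truncate(persistedOffset);
        }
    }

    /** Writes past a checkpoint, simulates a failure, and returns the recovered content. */
    static String demo() {
        try {
            Path file = Files.createTempFile("recoverable", ".txt");
            long checkpoint = writeAndPersist(file, "committed;");
            writeAndPersist(file, "uncommitted;"); // written after the checkpoint
            recover(file, checkpoint);             // roll back to the checkpointed offset
            String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            Files.delete(file);
            return content;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this analogy the offset returned at the checkpoint is what would be stored with the checkpoint, so a file can span several checkpoints without losing or duplicating data.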
-
createRecoverableWriter
public RecoverableWriter createRecoverableWriter(Map<String,String> conf) throws IOException
Description copied from interface: IFileSystem
Creates a new RecoverableWriter. A recoverable writer creates streams that can persist and recover their intermediate state. Persisting and recovering intermediate state is a core building block for writing to files that span multiple checkpoints.
The returned object can act as a shared factory to open and recover multiple streams.
This method is optional on file systems, and some file system implementations may not support it, throwing an UnsupportedOperationException.
- Specified by:
createRecoverableWriter in interface IFileSystem
- Overrides:
createRecoverableWriter in class FileSystem
- Parameters:
conf - A map that contains a flag indicating whether the writer should not write to local storage, and that can provide more information to instantiate the writer.
- Returns:
A RecoverableWriter for this file system.
- Throws:
IOException - Thrown, if the recoverable writer cannot be instantiated.
-
toHadoopPath
public static org.apache.hadoop.fs.Path toHadoopPath(Path path)
-
-