Class HadoopFileSystem

    • Constructor Detail

      • HadoopFileSystem

        public HadoopFileSystem(org.apache.hadoop.fs.FileSystem hadoopFileSystem)
        Wraps the given Hadoop File System object as a Flink File System object. The given Hadoop file system object is expected to be initialized already.
        Parameters:
        hadoopFileSystem - The Hadoop FileSystem that will be used under the hood.
    • Method Detail

      • getHadoopFileSystem

        public org.apache.hadoop.fs.FileSystem getHadoopFileSystem()
        Gets the underlying Hadoop FileSystem.
        Returns:
        The underlying Hadoop FileSystem.
      • getHomeDirectory

        public Path getHomeDirectory()
        Description copied from interface: IFileSystem
        Returns the path of the user's home directory in this file system.
        Specified by:
        getHomeDirectory in interface IFileSystem
        Specified by:
        getHomeDirectory in class FileSystem
        Returns:
        the path of the user's home directory in this file system.
      • getUri

        public URI getUri()
        Description copied from interface: IFileSystem
        Returns a URI whose scheme and authority identify this file system.
        Specified by:
        getUri in interface IFileSystem
        Specified by:
        getUri in class FileSystem
        Returns:
        a URI whose scheme and authority identify this file system
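        As an illustration of "scheme and authority identify this file system", the sketch below reduces a full file URI to just those two components using only the JDK. The class name FileSystemUriSketch, the helper fileSystemUriOf, and the host namenode:8020 are hypothetical; this is not the code behind getUri().

```java
import java.net.URI;

final class FileSystemUriSketch {

    /** Keeps the scheme and authority of a file URI and replaces the path with the root "/". */
    static URI fileSystemUriOf(URI fileUri) {
        // Resolving an absolute path against the URI keeps its scheme and authority.
        return fileUri.resolve("/");
    }

    public static void main(String[] args) {
        // Hypothetical HDFS location, used only for illustration.
        URI file = URI.create("hdfs://namenode:8020/user/flink/input.csv");
        URI fs = fileSystemUriOf(file);
        System.out.println(fs.getScheme() + " / " + fs.getAuthority());
    }
}
```

        Two files whose URIs reduce to the same scheme and authority would, under this reading, live in the same file system.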
      • getFileBlockLocations

        public BlockLocation[] getFileBlockLocations(FileStatus file,
                                                     long start,
                                                     long len)
                                              throws IOException
        Description copied from interface: IFileSystem
        Returns an array containing the hostnames, offset, and size of portions of the given file. For a nonexistent file or region, null is returned. This call is most helpful with a DFS, where it returns the hostnames of the machines that contain the given file. The FileSystem will simply return an element containing 'localhost'.
        Specified by:
        getFileBlockLocations in interface IFileSystem
        Specified by:
        getFileBlockLocations in class FileSystem
        Throws:
        IOException
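        To make the start and len parameters concrete, the following JDK-only sketch selects the blocks that overlap a byte range. The Block class is a hypothetical stand-in carrying the same three pieces of information (hosts, offset, size); it is not Flink's BlockLocation, and the host names and sizes are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BlockRangeSketch {

    /** Hypothetical stand-in for a block location: start offset, length, and hosting machines. */
    static final class Block {
        final long offset;
        final long length;
        final String[] hosts;

        Block(long offset, long length, String... hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    /** Returns the blocks that overlap the byte range [start, start + len). */
    static List<Block> overlapping(List<Block> blocks, long start, long len) {
        List<Block> result = new ArrayList<>();
        long end = start + len;
        for (Block b : blocks) {
            // Two half-open ranges overlap when each one starts before the other ends.
            if (b.offset < end && start < b.offset + b.length) {
                result.add(b);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A 300-byte file stored as three blocks on three hypothetical hosts.
        List<Block> file = Arrays.asList(
                new Block(0, 128, "host-a"),
                new Block(128, 128, "host-b"),
                new Block(256, 44, "host-c"));
        for (Block b : overlapping(file, 100, 100)) {
            System.out.println(b.offset + "+" + b.length + " on " + String.join(",", b.hosts));
        }
    }
}
```

        A scheduler could use such a result to place a reader for the range on one of the listed hosts.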
      • create

        public HadoopDataOutputStream create(Path f,
                                             boolean overwrite,
                                             int bufferSize,
                                             short replication,
                                             long blockSize)
                                      throws IOException
        Description copied from class: FileSystem
        Opens an FSDataOutputStream at the indicated Path.

        This method is deprecated because most of its parameters are ignored by most file systems. To control, for example, the replication factor and block size in the Hadoop Distributed File System, make sure that the respective Hadoop configuration file is either linked from the Flink configuration or on the classpath of either Flink or the user code.

        Overrides:
        create in class FileSystem
        Parameters:
        f - the file name to open
        overwrite - if true, an existing file with this name is overwritten; if false, an error is thrown when a file with this name already exists.
        bufferSize - the size of the buffer to be used.
        replication - required block replication for the file.
        blockSize - the size of the file blocks
        Throws:
        IOException - Thrown if the stream could not be opened because of an I/O error, or because a file already exists at that path and the write mode indicates not to overwrite the file.
      • create

        public HadoopDataOutputStream create(Path f,
                                             FileSystem.WriteMode overwrite)
                                      throws IOException
        Description copied from interface: IFileSystem
        Opens an FSDataOutputStream to a new file at the given path.

        If the file already exists, the behavior depends on the given WriteMode. If the mode is set to FileSystem.WriteMode.NO_OVERWRITE, then this method fails with an exception.

        Specified by:
        create in interface IFileSystem
        Specified by:
        create in class FileSystem
        Parameters:
        f - The file path to write to
        overwrite - The action to take if a file or directory already exists at the given path.
        Returns:
        The stream to the new file at the target path.
        Throws:
        IOException - Thrown if the stream could not be opened because of an I/O error, or because a file already exists at that path and the write mode indicates not to overwrite the file.
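        The overwrite behavior described above can be imitated on a local disk with java.nio.file. In this sketch the WriteMode enum is a local stand-in for FileSystem.WriteMode, Path is java.nio.file.Path rather than Flink's Path, and the mapping to StandardOpenOption is an assumption made for illustration, not what HadoopFileSystem does internally.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class WriteModeSketch {

    /** Local stand-in for FileSystem.WriteMode; not the Flink enum. */
    enum WriteMode { NO_OVERWRITE, OVERWRITE }

    static OutputStream create(Path file, WriteMode mode) throws IOException {
        if (mode == WriteMode.NO_OVERWRITE) {
            // CREATE_NEW fails with FileAlreadyExistsException (an IOException) if the file exists.
            return Files.newOutputStream(file, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
        }
        // OVERWRITE: create the file if missing, otherwise discard its old content.
        return Files.newOutputStream(file, StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
    }

    /** Writes the text and reports whether the stream could be opened under the given mode. */
    static boolean tryWrite(Path file, WriteMode mode, String text) {
        try (OutputStream out = create(file, mode)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    static String demo() {
        try {
            Path file = Files.createTempDirectory("write-mode-sketch").resolve("out.txt");
            boolean first = tryWrite(file, WriteMode.NO_OVERWRITE, "first");   // file is new
            boolean second = tryWrite(file, WriteMode.NO_OVERWRITE, "second"); // file already exists
            boolean third = tryWrite(file, WriteMode.OVERWRITE, "third");      // replaces the content
            String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            return first + "," + second + "," + third + "," + content;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```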
      • delete

        public boolean delete(Path f,
                              boolean recursive)
                       throws IOException
        Description copied from interface: IFileSystem
        Delete a file.
        Specified by:
        delete in interface IFileSystem
        Specified by:
        delete in class FileSystem
        Parameters:
        f - the path to delete
        recursive - if the path is a directory and this is set to true, the directory is deleted; otherwise an exception is thrown. For a file, recursive can be set to either true or false.
        Returns:
        true if delete is successful, false otherwise
        Throws:
        IOException
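        The role of the recursive flag can be sketched with java.nio.file on a local directory. This is a JDK-only analogue under the assumption that a directory with children is the case that needs recursive set to true; returning false for a missing path is also an assumption of the sketch, and none of this is HadoopFileSystem's own code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class RecursiveDeleteSketch {

    /** Deletes a file or directory; a directory with children requires recursive == true. */
    static boolean delete(Path path, boolean recursive) throws IOException {
        if (!Files.exists(path)) {
            return false; // nothing to delete (assumption of this sketch)
        }
        if (recursive && Files.isDirectory(path)) {
            List<Path> entries;
            try (Stream<Path> walk = Files.walk(path)) {
                // Reverse order lists children before their parent directory.
                entries = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            for (Path entry : entries) {
                Files.delete(entry);
            }
            return true;
        }
        // For a directory that still has children this throws an IOException.
        Files.delete(path);
        return true;
    }

    static String demo() {
        try {
            Path dir = Files.createTempDirectory("delete-sketch");
            Files.createFile(dir.resolve("part-0"));
            boolean nonRecursiveFailed;
            try {
                delete(dir, false);
                nonRecursiveFailed = false;
            } catch (IOException e) {
                nonRecursiveFailed = true; // the directory still has a child
            }
            boolean deleted = delete(dir, true);
            return nonRecursiveFailed + "," + deleted + "," + Files.exists(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```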
      • mkdirs

        public boolean mkdirs(Path f)
                       throws IOException
        Description copied from interface: IFileSystem
        Make the given file and all non-existent parents into directories. Has the semantics of Unix 'mkdir -p'. Existence of the directory hierarchy is not an error.
        Specified by:
        mkdirs in interface IFileSystem
        Specified by:
        mkdirs in class FileSystem
        Parameters:
        f - the directory/directories to be created
        Returns:
        true if at least one new directory has been created, false otherwise
        Throws:
        IOException - thrown if an I/O error occurs while creating the directory
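        The 'mkdir -p' semantics and the documented return value can be imitated locally as follows. This is a minimal sketch on top of java.nio.file.Files.createDirectories; the class and the nested directory names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class MkdirsSketch {

    /** Creates the directory and any missing parents; returns true only if something new was created. */
    static boolean mkdirs(Path dir) throws IOException {
        if (Files.isDirectory(dir)) {
            return false; // already present: not an error, but nothing was created
        }
        // Creates all missing parents; throws if a non-directory file is in the way.
        Files.createDirectories(dir);
        return true;
    }

    static String demo() {
        try {
            Path nested = Files.createTempDirectory("mkdirs-sketch").resolve("a").resolve("b").resolve("c");
            boolean first = mkdirs(nested);  // creates a, a/b and a/b/c
            boolean second = mkdirs(nested); // hierarchy already exists
            return first + "," + second + "," + Files.isDirectory(nested);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```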
      • rename

        public boolean rename(Path src,
                              Path dst)
                       throws IOException
        Description copied from interface: IFileSystem
        Renames the file/directory src to dst.
        Specified by:
        rename in interface IFileSystem
        Specified by:
        rename in class FileSystem
        Parameters:
        src - the file/directory to rename
        dst - the new name of the file/directory
        Returns:
        true if the renaming was successful, false otherwise
        Throws:
        IOException
      • getDefaultBlockSize

        public long getDefaultBlockSize()
        Description copied from class: FileSystem
        Returns the number of bytes that large input files should optimally be split into to minimize I/O time.
        Overrides:
        getDefaultBlockSize in class FileSystem
        Returns:
        the number of bytes that large input files should optimally be split into to minimize I/O time
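        One way a caller might use this value is to estimate how many block-sized splits a file yields. The helper below is a hypothetical piece of arithmetic, not Flink's split logic, and the 128-byte block size in main is chosen only to keep the numbers small.

```java
final class SplitCountSketch {

    /** Number of block-sized pieces needed to cover fileLength bytes (ceiling division). */
    static long splitCount(long fileLength, long blockSize) {
        if (fileLength < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and blockSize > 0");
        }
        return (fileLength + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        // 300 bytes with a block size of 128 need two full pieces and one partial piece.
        System.out.println(splitCount(300, 128));
    }
}
```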
      • isDistributedFS

        public boolean isDistributedFS()
        Description copied from interface: IFileSystem
        Returns true if this is a distributed file system. A distributed file system here means that the file system is shared among all Flink processes that participate in a cluster or job and that all these processes can see the same files.
        Specified by:
        isDistributedFS in interface IFileSystem
        Specified by:
        isDistributedFS in class FileSystem
        Returns:
        True, if this is a distributed file system, false otherwise.
      • createRecoverableWriter

        public RecoverableWriter createRecoverableWriter()
                                                  throws IOException
        Description copied from interface: IFileSystem
        Creates a new RecoverableWriter. A recoverable writer creates streams that can persist and recover their intermediate state. Persisting and recovering intermediate state is a core building block for writing to files that span multiple checkpoints.

        The returned object can act as a shared factory to open and recover multiple streams.

        This method is optional: file system implementations that do not support it throw an UnsupportedOperationException.

        Specified by:
        createRecoverableWriter in interface IFileSystem
        Overrides:
        createRecoverableWriter in class FileSystem
        Returns:
        A RecoverableWriter for this file system.
        Throws:
        IOException - Thrown, if the recoverable writer cannot be instantiated.
      • createRecoverableWriter

        public RecoverableWriter createRecoverableWriter(Map<String,String> conf)
                                                  throws IOException
        Description copied from interface: IFileSystem
        Creates a new RecoverableWriter. A recoverable writer creates streams that can persist and recover their intermediate state. Persisting and recovering intermediate state is a core building block for writing to files that span multiple checkpoints.

        The returned object can act as a shared factory to open and recover multiple streams.

        This method is optional: file system implementations that do not support it throw an UnsupportedOperationException.

        Specified by:
        createRecoverableWriter in interface IFileSystem
        Overrides:
        createRecoverableWriter in class FileSystem
        Parameters:
        conf - A map that contains a flag indicating whether the writer should not write to local storage, and that can provide more information to instantiate the writer.
        Returns:
        A RecoverableWriter for this file system.
        Throws:
        IOException - Thrown, if the recoverable writer cannot be instantiated.
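        The idea of a stream that can "persist and recover its intermediate state" can be illustrated on a local file: persisting remembers a byte offset, and recovering truncates the file back to that offset before writing continues. The class below is a hypothetical JDK-only analogue; it does not use the RecoverableWriter interface and is not how the Hadoop-backed writer is implemented.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class RecoverableStreamSketch implements AutoCloseable {

    private final FileChannel channel;

    private RecoverableStreamSketch(FileChannel channel) {
        this.channel = channel;
    }

    /** Opens a fresh stream, discarding any previous content of the file. */
    static RecoverableStreamSketch open(Path file) throws IOException {
        return new RecoverableStreamSketch(FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING));
    }

    /** Reopens the file and discards everything written after the persisted offset. */
    static RecoverableStreamSketch recover(Path file, long persistedOffset) throws IOException {
        FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE);
        ch.truncate(persistedOffset);
        ch.position(persistedOffset);
        return new RecoverableStreamSketch(ch);
    }

    void write(String text) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
    }

    /** Forces the data to storage and returns the offset a checkpoint would remember. */
    long persist() throws IOException {
        channel.force(true);
        return channel.position();
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    static String demo() {
        try {
            Path file = Files.createTempFile("recoverable-sketch", ".txt");
            long checkpoint = 0L;
            try (RecoverableStreamSketch out = open(file)) {
                out.write("abc");
                checkpoint = out.persist(); // "abc" belongs to the checkpoint
                out.write("XYZ");           // written after the checkpoint, dropped on recovery
            }
            try (RecoverableStreamSketch out = recover(file, checkpoint)) {
                out.write("def");           // continues right after the persisted offset
            }
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

        In this sketch the persisted offset plays the role of the intermediate state that survives a failure between two checkpoints.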
      • toHadoopPath

        public static org.apache.hadoop.fs.Path toHadoopPath(Path path)