Interface ShuffleMaster<T extends ShuffleDescriptor>

    • Method Detail

      • start

        default void start()
                    throws Exception
        Starts this shuffle master as a service. Initialization can be performed here, for example obtaining access to and connecting to an external system.
        Throws:
        Exception
      • close

        default void close()
                    throws Exception
        Closes this shuffle master service, which should release all resources. A shuffle master is closed only when the cluster is shut down.
        Specified by:
        close in interface AutoCloseable
        Throws:
        Exception
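        The start/close contract above can be illustrated with a minimal sketch. It does not use the real Flink types: Lifecycle and RecordingMaster are hypothetical stand-ins that mirror only the two documented method signatures, and the sketch assumes the caller drives the lifecycle through try-with-resources (possible because close is specified by AutoCloseable).

```java
import java.util.ArrayList;
import java.util.List;

public class ShuffleMasterLifecycleSketch {

    /** Hypothetical stand-in mirroring only the two lifecycle methods documented above. */
    interface Lifecycle extends AutoCloseable {
        default void start() throws Exception {}

        @Override
        default void close() throws Exception {}
    }

    /** Hypothetical implementation that records the order of lifecycle calls. */
    static final class RecordingMaster implements Lifecycle {
        final List<String> events = new ArrayList<>();

        @Override
        public void start() {
            // A real implementation might connect to an external shuffle system here.
            events.add("start");
        }

        @Override
        public void close() {
            // A real implementation would release connections and other resources here.
            events.add("close");
        }
    }

    /** Runs one start/serve/close cycle and returns the recorded call order. */
    public static List<String> runOnce() {
        RecordingMaster master = new RecordingMaster();
        try (RecordingMaster m = master) {
            m.start();
            m.events.add("serving");
        } // close() is invoked automatically when the block exits
        return master.events;
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

        Because the overriding methods declare no checked exceptions, callers of this particular stand-in need no try/catch; an implementation that talks to an external system would more likely keep the throws Exception clause.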
      • registerJob

        default void registerJob(JobShuffleContext context)
        Registers the target job, together with the corresponding JobShuffleContext, with this shuffle master. Through the shuffle context, one can obtain basic information such as the job ID and the job configuration. The context also enables the ShuffleMaster to notify the JobMaster about lost result partitions, so that the JobMaster can identify and reproduce unavailable partitions earlier.
        Parameters:
        context - the corresponding shuffle context of the target job.
      • unregisterJob

        default void unregisterJob(JobID jobID)
        Unregisters the target job from this shuffle master, which means the corresponding job has reached a global termination state and all the allocated resources except for the cluster partitions can be cleared.
        Parameters:
        jobID - ID of the target job to be unregistered.
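        One way to picture the registerJob/unregisterJob pairing is a per-job registry. The sketch below is illustrative only: Context is a hypothetical stand-in for JobShuffleContext, job IDs are plain strings, and rejecting a duplicate registration is an assumption of this sketch rather than documented behavior.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class JobRegistrySketch {

    /** Hypothetical stand-in for JobShuffleContext; only the job ID is modeled. */
    public interface Context {
        String jobId();
    }

    // Per-job state held by the (hypothetical) shuffle master.
    private final Map<String, Context> contexts = new ConcurrentHashMap<>();

    /** Mirrors registerJob: keep the context so the job can be notified later. */
    public void registerJob(Context context) {
        // Assumption of this sketch: registering the same job twice is an error.
        if (contexts.putIfAbsent(context.jobId(), context) != null) {
            throw new IllegalStateException("Job already registered: " + context.jobId());
        }
    }

    /** Mirrors unregisterJob: drop per-job state once the job has terminated. */
    public void unregisterJob(String jobId) {
        contexts.remove(jobId);
    }

    public boolean isRegistered(String jobId) {
        return contexts.containsKey(jobId);
    }

    public static void main(String[] args) {
        JobRegistrySketch master = new JobRegistrySketch();
        master.registerJob(() -> "job-1");
        System.out.println(master.isRegistered("job-1"));
        master.unregisterJob("job-1");
        System.out.println(master.isRegistered("job-1"));
    }
}
```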
      • registerPartitionWithProducer

        CompletableFuture<T> registerPartitionWithProducer(JobID jobID,
                                                           PartitionDescriptor partitionDescriptor,
                                                           ProducerDescriptor producerDescriptor)
        Asynchronously registers a partition and its producer with the shuffle service.

        The returned shuffle descriptor is an internal handle that identifies the partition within the shuffle service. The descriptor should provide enough information to read data from or write data to the partition.

        Parameters:
        jobID - job ID of the corresponding job which registered the partition
        partitionDescriptor - general job graph information about the partition
        producerDescriptor - general producer information (location, execution id, connection info)
        Returns:
        future with the partition shuffle descriptor used for producer/consumer deployment and their data exchange.
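        The asynchronous shape of this method can be sketched without the Flink types. In the example below, Descriptor is a hypothetical stand-in for a ShuffleDescriptor, the three parameters are reduced to strings, and the future is completed immediately, which is an assumption that would fit a shuffle service needing no remote round trip.

```java
import java.util.concurrent.CompletableFuture;

public class RegisterPartitionSketch {

    /** Hypothetical stand-in for a ShuffleDescriptor: just enough to locate the partition. */
    static final class Descriptor {
        final String partitionId;
        final String producerAddress;

        Descriptor(String partitionId, String producerAddress) {
            this.partitionId = partitionId;
            this.producerAddress = producerAddress;
        }
    }

    /** Mirrors the shape of registerPartitionWithProducer using string stand-ins. */
    static CompletableFuture<Descriptor> register(
            String jobId, String partitionId, String producerAddress) {
        // Completed immediately here; a remote service would complete the future later.
        return CompletableFuture.completedFuture(new Descriptor(partitionId, producerAddress));
    }

    /** Registers a partition and renders the resulting descriptor as text. */
    public static String describe(String jobId, String partitionId, String producerAddress) {
        return register(jobId, partitionId, producerAddress)
                .thenApply(d -> d.partitionId + "@" + d.producerAddress)
                .join();
    }

    public static void main(String[] args) {
        System.out.println(describe("job-1", "p-0", "host:1234"));
    }
}
```

        Callers that must not block would chain further stages with thenApply or thenCompose instead of calling join.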
      • releasePartitionExternally

        void releasePartitionExternally(ShuffleDescriptor shuffleDescriptor)
        Releases any external resources occupied by the given partition.

        This call triggers the release of any resources that the given partition occupies in external systems outside of the producer executor. This is mostly relevant for batch jobs and blocking result partitions. The producer-local resources are managed by ShuffleDescriptor.storesLocalResourcesOn() and ShuffleEnvironment.releasePartitionsLocally(Collection).

        Parameters:
        shuffleDescriptor - shuffle descriptor of the result partition to release externally.
      • getPartitionWithMetrics

        default CompletableFuture<Collection<PartitionWithMetrics>> getPartitionWithMetrics(JobID jobId,
                                                                                            Duration timeout,
                                                                                            Set<ResultPartitionID> expectedPartitions)
        Retrieves the specified partitions (identified by expectedPartitions) together with their metrics. The metrics include the sizes of the sub-partitions in a result partition.
        Parameters:
        jobId - ID of the target job
        timeout - The timeout used for retrieving the specified partitions.
        expectedPartitions - The set of identifiers for the result partitions whose metrics are to be fetched.
        Returns:
        A future containing the collection of partitions, with their metrics, that could be retrieved from the expected partitions within the specified timeout period.
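        A rough sketch of the "return what could be retrieved within the timeout" behavior follows. It is not the Flink implementation: partitions are identified by strings, metrics are reduced to an array of sub-partition byte sizes, the report method is a hypothetical helper, and falling back to an empty result on timeout is an assumption of this sketch. It relies on CompletableFuture.completeOnTimeout, which requires Java 9 or later.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

public class PartitionMetricsSketch {

    // Hypothetical store: sub-partition sizes in bytes, keyed by partition ID.
    private final Map<String, long[]> reported = new ConcurrentHashMap<>();

    /** Hypothetical helper that records the metrics of one partition. */
    public void report(String partitionId, long... subpartitionBytes) {
        reported.put(partitionId, subpartitionBytes);
    }

    /** Returns the metrics of those expected partitions that are known to this master. */
    public CompletableFuture<Map<String, long[]>> getPartitionWithMetrics(
            Duration timeout, Set<String> expectedPartitions) {
        CompletableFuture<Map<String, long[]>> lookup =
                CompletableFuture.supplyAsync(
                        () -> {
                            Map<String, long[]> found = new HashMap<>();
                            for (String id : expectedPartitions) {
                                long[] sizes = reported.get(id);
                                if (sizes != null) {
                                    found.put(id, sizes); // unknown partitions are skipped
                                }
                            }
                            return found;
                        });
        // Assumption: if the lookup is too slow, complete with an empty result instead of failing.
        return lookup.completeOnTimeout(
                Collections.emptyMap(), timeout.toMillis(), TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        PartitionMetricsSketch master = new PartitionMetricsSketch();
        master.report("p-0", 10L, 20L);
        Map<String, long[]> result =
                master.getPartitionWithMetrics(Duration.ofSeconds(5), Set.of("p-0", "p-1")).join();
        System.out.println(result.keySet());
    }
}
```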
      • restoreState

        default void restoreState(List<ShuffleMasterSnapshot> snapshots)
        Restores the state of the shuffle master from the provided snapshots.
      • notifyPartitionRecoveryStarted

        default void notifyPartitionRecoveryStarted(JobID jobId)
        Notifies that the recovery process of result partitions has started.
        Parameters:
        jobId - ID of the target job