Package org.apache.flink.runtime.shuffle
Class NettyShuffleMaster
- java.lang.Object
-
- org.apache.flink.runtime.shuffle.NettyShuffleMaster
-
- All Implemented Interfaces:
AutoCloseable
,ShuffleMaster<NettyShuffleDescriptor>
public class NettyShuffleMaster extends Object implements ShuffleMaster<NettyShuffleDescriptor>
DefaultShuffleMaster
for netty and local file based shuffle implementation.
-
-
Constructor Summary
Constructors Constructor Description NettyShuffleMaster(ShuffleMasterContext shuffleMasterContext)
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description void
close()
Closes this shuffle master service which should release all resources.MemorySize
computeShuffleMemorySizeForTask(TaskInputsOutputsDescriptor desc)
JM announces network memory requirement from the calculating result of this method.CompletableFuture<Collection<PartitionWithMetrics>>
getPartitionWithMetrics(JobID jobId, Duration timeout, Set<ResultPartitionID> expectedPartitions)
Retrieves specified partitions and their metrics (identified byexpectedPartitions
), the metrics include sizes of sub-partitions in a result partition.void
notifyPartitionRecoveryStarted(JobID jobId)
Notifies that the recovery process of result partitions has started.void
registerJob(JobShuffleContext context)
Registers the target job together with the correspondingJobShuffleContext
to this shuffle master.CompletableFuture<NettyShuffleDescriptor>
registerPartitionWithProducer(JobID jobID, PartitionDescriptor partitionDescriptor, ProducerDescriptor producerDescriptor)
Asynchronously register a partition and its producer with the shuffle service.void
releasePartitionExternally(ShuffleDescriptor shuffleDescriptor)
Release any external resources occupied by the given partition.void
snapshotState(CompletableFuture<ShuffleMasterSnapshot> snapshotFuture, ShuffleMasterSnapshotContext context)
Triggers a snapshot of the shuffle master's state.boolean
supportsBatchSnapshot()
Whether the shuffle master supports taking snapshot in batch scenarios ifBatchExecutionOptions.JOB_RECOVERY_ENABLED
is true.void
unregisterJob(JobID jobId)
Unregisters the target job from this shuffle master, which means the corresponding job has reached a global termination state and all the allocated resources except for the cluster partitions can be cleared.-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.flink.runtime.shuffle.ShuffleMaster
restoreState, start
-
-
-
-
Constructor Detail
-
NettyShuffleMaster
public NettyShuffleMaster(ShuffleMasterContext shuffleMasterContext)
-
-
Method Detail
-
registerPartitionWithProducer
public CompletableFuture<NettyShuffleDescriptor> registerPartitionWithProducer(JobID jobID, PartitionDescriptor partitionDescriptor, ProducerDescriptor producerDescriptor)
Description copied from interface:ShuffleMaster
Asynchronously register a partition and its producer with the shuffle service.The returned shuffle descriptor is an internal handle which identifies the partition internally within the shuffle service. The descriptor should provide enough information to read from or write data to the partition.
- Specified by:
registerPartitionWithProducer
in interfaceShuffleMaster<NettyShuffleDescriptor>
- Parameters:
jobID
- job ID of the corresponding job which registered the partitionpartitionDescriptor
- general job graph information about the partitionproducerDescriptor
- general producer information (location, execution id, connection info)- Returns:
- future with the partition shuffle descriptor used for producer/consumer deployment and their data exchange.
-
releasePartitionExternally
public void releasePartitionExternally(ShuffleDescriptor shuffleDescriptor)
Description copied from interface:ShuffleMaster
Release any external resources occupied by the given partition.This call triggers release of any resources which are occupied by the given partition in the external systems outside of the producer executor. This is mostly relevant for the batch jobs and blocking result partitions. The producer local resources are managed by
ShuffleDescriptor.storesLocalResourcesOn()
andShuffleEnvironment.releasePartitionsLocally(Collection)
.- Specified by:
releasePartitionExternally
in interfaceShuffleMaster<NettyShuffleDescriptor>
- Parameters:
shuffleDescriptor
- shuffle descriptor of the result partition to release externally.
-
computeShuffleMemorySizeForTask
public MemorySize computeShuffleMemorySizeForTask(TaskInputsOutputsDescriptor desc)
JM announces network memory requirement from the calculating result of this method. Please note that the calculating algorithm depends on both I/O details of a vertex and network configuration, which means we should always keep the consistency of configurations between JM, RM and TM in fine-grained resource management, thus to guarantee that the processes of memory announcing and allocating respect each other.- Specified by:
computeShuffleMemorySizeForTask
in interfaceShuffleMaster<NettyShuffleDescriptor>
- Parameters:
desc
- describes task inputs and outputs information for shuffle memory calculation.- Returns:
- shuffle memory size for a task with the given
TaskInputsOutputsDescriptor
.
-
getPartitionWithMetrics
public CompletableFuture<Collection<PartitionWithMetrics>> getPartitionWithMetrics(JobID jobId, Duration timeout, Set<ResultPartitionID> expectedPartitions)
Description copied from interface:ShuffleMaster
Retrieves specified partitions and their metrics (identified byexpectedPartitions
), the metrics include sizes of sub-partitions in a result partition.- Specified by:
getPartitionWithMetrics
in interfaceShuffleMaster<NettyShuffleDescriptor>
- Parameters:
jobId
- ID of the target jobtimeout
- The timeout used for retrieve the specified partitions.expectedPartitions
- The set of identifiers for the result partitions whose metrics are to be fetched.- Returns:
- A future will contain a collection of the partitions with their metrics that could be retrieved from the expected partitions within the specified timeout period.
-
registerJob
public void registerJob(JobShuffleContext context)
Description copied from interface:ShuffleMaster
Registers the target job together with the correspondingJobShuffleContext
to this shuffle master. Through the shuffle context, one can obtain some basic information like job ID, job configuration. It enables ShuffleMaster to notify JobMaster about lost result partitions, so that JobMaster can identify and reproduce unavailable partitions earlier.- Specified by:
registerJob
in interfaceShuffleMaster<NettyShuffleDescriptor>
- Parameters:
context
- the corresponding shuffle context of the target job.
-
unregisterJob
public void unregisterJob(JobID jobId)
Description copied from interface:ShuffleMaster
Unregisters the target job from this shuffle master, which means the corresponding job has reached a global termination state and all the allocated resources except for the cluster partitions can be cleared.- Specified by:
unregisterJob
in interfaceShuffleMaster<NettyShuffleDescriptor>
- Parameters:
jobId
- ID of the target job to be unregistered.
-
supportsBatchSnapshot
public boolean supportsBatchSnapshot()
Description copied from interface:ShuffleMaster
Whether the shuffle master supports taking snapshot in batch scenarios ifBatchExecutionOptions.JOB_RECOVERY_ENABLED
is true. If it returns true, Flink will callShuffleMaster.snapshotState(java.util.concurrent.CompletableFuture<org.apache.flink.runtime.shuffle.ShuffleMasterSnapshot>, org.apache.flink.runtime.shuffle.ShuffleMasterSnapshotContext)
to take snapshot, and callShuffleMaster.restoreState(java.util.List<org.apache.flink.runtime.shuffle.ShuffleMasterSnapshot>)
to restore the state of shuffle master.- Specified by:
supportsBatchSnapshot
in interfaceShuffleMaster<NettyShuffleDescriptor>
-
snapshotState
public void snapshotState(CompletableFuture<ShuffleMasterSnapshot> snapshotFuture, ShuffleMasterSnapshotContext context)
Description copied from interface:ShuffleMaster
Triggers a snapshot of the shuffle master's state.- Specified by:
snapshotState
in interfaceShuffleMaster<NettyShuffleDescriptor>
-
notifyPartitionRecoveryStarted
public void notifyPartitionRecoveryStarted(JobID jobId)
Description copied from interface:ShuffleMaster
Notifies that the recovery process of result partitions has started.- Specified by:
notifyPartitionRecoveryStarted
in interfaceShuffleMaster<NettyShuffleDescriptor>
- Parameters:
jobId
- ID of the target job
-
close
public void close() throws Exception
Description copied from interface:ShuffleMaster
Closes this shuffle master service which should release all resources. A shuffle master will only be closed when the cluster is shut down.- Specified by:
close
in interfaceAutoCloseable
- Specified by:
close
in interfaceShuffleMaster<NettyShuffleDescriptor>
- Throws:
Exception
-
-