T
- partition shuffle descriptor used for producer/consumer deployment and their data
exchange.public interface ShuffleMaster<T extends ShuffleDescriptor> extends AutoCloseable
JobMaster
.Modifier and Type | Method and Description |
---|---|
default void |
close()
Closes this shuffle master service which should release all resources.
|
default MemorySize |
computeShuffleMemorySizeForTask(TaskInputsOutputsDescriptor taskInputsOutputsDescriptor)
Compute shuffle memory size for a task with the given
TaskInputsOutputsDescriptor . |
default CompletableFuture<Collection<PartitionWithMetrics>> |
getPartitionWithMetrics(JobID jobId,
Duration timeout,
Set<ResultPartitionID> expectedPartitions)
Retrieves specified partitions and their metrics (identified by
expectedPartitions ),
the metrics include sizes of sub-partitions in a result partition. |
default void |
notifyPartitionRecoveryStarted(JobID jobId)
Notifies that the recovery process of result partitions has started.
|
default void |
registerJob(JobShuffleContext context)
Registers the target job together with the corresponding
JobShuffleContext to this
shuffle master. |
CompletableFuture<T> |
registerPartitionWithProducer(JobID jobID,
PartitionDescriptor partitionDescriptor,
ProducerDescriptor producerDescriptor)
Asynchronously register a partition and its producer with the shuffle service.
|
void |
releasePartitionExternally(ShuffleDescriptor shuffleDescriptor)
Release any external resources occupied by the given partition.
|
default void |
restoreState(List<ShuffleMasterSnapshot> snapshots)
Restores the state of the shuffle master from the provided snapshots.
|
default void |
snapshotState(CompletableFuture<ShuffleMasterSnapshot> snapshotFuture,
ShuffleMasterSnapshotContext context)
Triggers a snapshot of the shuffle master's state.
|
default void |
start()
Starts this shuffle master as a service.
|
default boolean |
supportsBatchSnapshot()
Whether the shuffle master supports taking snapshot in batch scenarios if
BatchExecutionOptions.JOB_RECOVERY_ENABLED is true. |
default void |
unregisterJob(JobID jobID)
Unregisters the target job from this shuffle master, which means the corresponding job has
reached a global termination state and all the allocated resources except for the cluster
partitions can be cleared.
|
default void start() throws Exception
Exception
default void close() throws Exception
close
in interface AutoCloseable
Exception
default void registerJob(JobShuffleContext context)
JobShuffleContext
to this
shuffle master. Through the shuffle context, one can obtain some basic information like job
ID, job configuration. It enables ShuffleMaster to notify JobMaster about lost result
partitions, so that JobMaster can identify and reproduce unavailable partitions earlier.context
- the corresponding shuffle context of the target job.default void unregisterJob(JobID jobID)
jobID
- ID of the target job to be unregistered.CompletableFuture<T> registerPartitionWithProducer(JobID jobID, PartitionDescriptor partitionDescriptor, ProducerDescriptor producerDescriptor)
The returned shuffle descriptor is an internal handle which identifies the partition internally within the shuffle service. The descriptor should provide enough information to read from or write data to the partition.
jobID
- job ID of the corresponding job which registered the partitionpartitionDescriptor
- general job graph information about the partitionproducerDescriptor
- general producer information (location, execution id, connection
info)void releasePartitionExternally(ShuffleDescriptor shuffleDescriptor)
This call triggers release of any resources which are occupied by the given partition in
the external systems outside of the producer executor. This is mostly relevant for the batch
jobs and blocking result partitions. The producer local resources are managed by ShuffleDescriptor.storesLocalResourcesOn()
and ShuffleEnvironment.releasePartitionsLocally(Collection)
.
shuffleDescriptor
- shuffle descriptor of the result partition to release externally.default MemorySize computeShuffleMemorySizeForTask(TaskInputsOutputsDescriptor taskInputsOutputsDescriptor)
TaskInputsOutputsDescriptor
.taskInputsOutputsDescriptor
- describes task inputs and outputs information for shuffle
memory calculation.TaskInputsOutputsDescriptor
.default CompletableFuture<Collection<PartitionWithMetrics>> getPartitionWithMetrics(JobID jobId, Duration timeout, Set<ResultPartitionID> expectedPartitions)
expectedPartitions
),
the metrics include sizes of sub-partitions in a result partition.jobId
- ID of the target jobtimeout
- The timeout used for retrieve the specified partitions.expectedPartitions
- The set of identifiers for the result partitions whose metrics are
to be fetched.default boolean supportsBatchSnapshot()
BatchExecutionOptions.JOB_RECOVERY_ENABLED
is true. If it
returns true, Flink will call snapshotState(java.util.concurrent.CompletableFuture<org.apache.flink.runtime.shuffle.ShuffleMasterSnapshot>, org.apache.flink.runtime.shuffle.ShuffleMasterSnapshotContext)
to take snapshot, and call restoreState(java.util.List<org.apache.flink.runtime.shuffle.ShuffleMasterSnapshot>)
to restore the state of shuffle master.default void snapshotState(CompletableFuture<ShuffleMasterSnapshot> snapshotFuture, ShuffleMasterSnapshotContext context)
default void restoreState(List<ShuffleMasterSnapshot> snapshots)
default void notifyPartitionRecoveryStarted(JobID jobId)
jobId
- ID of the target jobCopyright © 2014–2024 The Apache Software Foundation. All rights reserved.