public class SlotPool extends RpcEndpoint implements SlotPoolGateway, AllocatedSlotActions
The slot pool serves slot requests issued by the ExecutionGraph. It attempts to acquire new slots
from the ResourceManager when it cannot serve a slot request itself. If no ResourceManager is currently available,
or the ResourceManager declines the request, or the request times out, the slot pool fails the slot request. The slot pool also
holds all the slots that were offered to it and accepted, and can thus provide registered free slots even if the
ResourceManager is down. Slots are released only when they are no longer useful, e.g. when the job is fully running
but some slots are still free.
Every allocation and slot offer is identified by a self-generated AllocationID, which is used to eliminate ambiguities.
TODO: Make pending requests location-preference aware.
TODO: Pass location preferences to the ResourceManager when sending a slot request.
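The fallback order in the class description (serve from the pool's own free slots first, then ask the ResourceManager, and fail if none is connected) can be illustrated with a small toy model. This is a minimal sketch, not Flink's implementation: `SlotPoolFallbackSketch`, `ResourceManagerStub`, and the use of plain strings as slot ids are all hypothetical names introduced for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CompletableFuture;

/** Toy model of the fallback order described above; NOT Flink's SlotPool. */
final class SlotPoolFallbackSketch {

    /** Hypothetical stand-in for a ResourceManager connection. */
    interface ResourceManagerStub {
        // A decline or timeout would surface as an exceptionally completed future.
        CompletableFuture<String> requestSlot();
    }

    // Slots that were offered to the pool and accepted (slot ids are plain strings here).
    private final Deque<String> freeSlots = new ArrayDeque<>();
    private ResourceManagerStub resourceManager; // null while disconnected

    void offer(String slotId) { freeSlots.add(slotId); }
    void connect(ResourceManagerStub rm) { resourceManager = rm; }
    void disconnect() { resourceManager = null; }

    CompletableFuture<String> allocate() {
        // 1. Serve from the internal pool; this works even without a ResourceManager.
        String free = freeSlots.poll();
        if (free != null) {
            return CompletableFuture.completedFuture(free);
        }
        // 2. No free slot and no ResourceManager: fail the request.
        if (resourceManager == null) {
            CompletableFuture<String> failed = new CompletableFuture<>();
            failed.completeExceptionally(
                    new IllegalStateException("no ResourceManager available"));
            return failed;
        }
        // 3. Otherwise ask the ResourceManager for a new slot.
        return resourceManager.requestSlot();
    }

    public static void main(String[] args) {
        SlotPoolFallbackSketch pool = new SlotPoolFallbackSketch();
        pool.offer("slot-1");
        System.out.println(pool.allocate().join());                     // slot-1
        System.out.println(pool.allocate().isCompletedExceptionally()); // true
    }
}
```

The sketch deliberately leaves out timeouts, slot sharing, and the RPC main-thread model, all of which the real class has to deal with.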
Modifier and Type | Class and Description |
---|---|
protected static class | SlotPool.AvailableSlots: Organize all available slots from different points of view. |
Nested classes inherited from class RpcEndpoint: RpcEndpoint.MainThreadExecutor
Modifier and Type | Field and Description |
---|---|
protected Map<SlotSharingGroupId,SlotSharingManager> | slotSharingManagers: Managers for the different slot sharing groups. |
Fields inherited from class RpcEndpoint: log, rpcServer
Modifier | Constructor and Description |
---|---|
protected | SlotPool(RpcService rpcService, JobID jobId, SchedulingStrategy schedulingStrategy) |
 | SlotPool(RpcService rpcService, JobID jobId, SchedulingStrategy schedulingStrategy, Clock clock, Time rpcTimeout, Time idleSlotTimeout) |
Modifier and Type | Method and Description |
---|---|
CompletableFuture<LogicalSlot> | allocateSlot(SlotRequestId slotRequestId, ScheduledUnit task, SlotProfile slotProfile, boolean allowQueuedScheduling, Time allocationTimeout): Requests to allocate a slot for the given ScheduledUnit. |
void | connectToResourceManager(ResourceManagerGateway resourceManagerGateway): Connects the SlotPool to the given ResourceManager. |
void | disconnectResourceManager(): Disconnects the slot pool from its current ResourceManager. |
void | failAllocation(AllocationID allocationID, Exception cause): Fails the specified allocation and releases the corresponding slot if there is one. |
protected org.apache.flink.runtime.jobmaster.slotpool.SlotPool.AllocatedSlots | getAllocatedSlots() |
protected SlotPool.AvailableSlots | getAvailableSlots() |
SlotOwner | getSlotOwner(): Gets the slot owner implementation for this pool. |
SlotProvider | getSlotProvider(): Gets the slot provider implementation for this pool. |
CompletableFuture<Boolean> | offerSlot(TaskManagerLocation taskManagerLocation, TaskManagerGateway taskManagerGateway, SlotOffer slotOffer): Slot offering by TaskExecutor with AllocationID. |
CompletableFuture<Collection<SlotOffer>> | offerSlots(TaskManagerLocation taskManagerLocation, TaskManagerGateway taskManagerGateway, Collection<SlotOffer> offers): Offers multiple slots to the SlotPool. |
CompletableFuture<Void> | postStop(): User overridable callback. |
CompletableFuture<Acknowledge> | registerTaskManager(ResourceID resourceID): Registers a TaskManager with this pool; only slots that come from a registered TaskManager are considered valid. |
CompletableFuture<Acknowledge> | releaseSlot(SlotRequestId slotRequestId, SlotSharingGroupId slotSharingGroupId, Throwable cause): Releases the slot with the given SlotRequestId. |
CompletableFuture<Acknowledge> | releaseTaskManager(ResourceID resourceId, Exception cause): Unregisters a TaskManager from this pool; all related slots are released and their tasks are canceled. |
void | start(): Starts the rpc endpoint. |
void | start(JobMasterId jobMasterId, String newJobManagerAddress): Starts the slot pool to accept RPC calls. |
void | suspend(): Suspends this pool, meaning it has lost its authority to accept and distribute slots. |
protected void | timeoutPendingSlotRequest(SlotRequestId slotRequestId) |
Methods inherited from class RpcEndpoint: callAsync, getAddress, getEndpointId, getHostname, getMainThreadExecutor, getRpcService, getSelfGateway, getTerminationFuture, runAsync, scheduleRunAsync, scheduleRunAsync, shutDown, stop, validateRunsInMainThread
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface RpcGateway: getAddress, getHostname
protected final Map<SlotSharingGroupId,SlotSharingManager> slotSharingManagers
@VisibleForTesting protected SlotPool(RpcService rpcService, JobID jobId, SchedulingStrategy schedulingStrategy)
public SlotPool(RpcService rpcService, JobID jobId, SchedulingStrategy schedulingStrategy, Clock clock, Time rpcTimeout, Time idleSlotTimeout)
public void start()
Description copied from class: RpcEndpoint
Starts the rpc endpoint.
IMPORTANT: Whenever you override this method, call the parent implementation to enable rpc processing. It is advised to make the parent call last.
Overrides: start in class RpcEndpoint
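The advice to call the parent implementation last when overriding start() can be sketched as follows. `EndpointStub` is a hypothetical stand-in for an RPC endpoint base class, not Flink's RpcEndpoint; the only assumption it encodes is that the parent's start() is what enables message processing.

```java
/** Hypothetical stand-in for an RPC endpoint base class; NOT Flink's RpcEndpoint. */
abstract class EndpointStub {
    private boolean processing;

    /** Enables message processing (assumed behaviour of the parent start()). */
    public void start() {
        processing = true;
    }

    public boolean isProcessing() {
        return processing;
    }
}

/** Subclass that prepares its own state before enabling processing. */
final class PoolEndpointSketch extends EndpointStub {
    private boolean initialized;

    @Override
    public void start() {
        // Prepare internal state first, so that no message can be
        // processed before the endpoint is fully set up.
        initialized = true;
        // Parent call last: only now does processing get enabled.
        super.start();
    }

    boolean isInitialized() {
        return initialized;
    }

    public static void main(String[] args) {
        PoolEndpointSketch endpoint = new PoolEndpointSketch();
        endpoint.start();
        System.out.println(endpoint.isInitialized() && endpoint.isProcessing()); // true
    }
}
```

Making the parent call last means the endpoint never accepts work while its own fields are still half-initialized.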
public void start(JobMasterId jobMasterId, String newJobManagerAddress) throws Exception
Start the slot pool to accept RPC calls.
Parameters:
jobMasterId - The necessary leader id for running the job.
newJobManagerAddress - for the slot requests which are sent to the resource manager
Throws:
Exception
public CompletableFuture<Void> postStop()
Description copied from class: RpcEndpoint
User overridable callback.
This method is called when the RpcEndpoint is being shut down. The method is guaranteed to be executed in the main thread context and can be used to clean up internal state.
IMPORTANT: This method should never be called directly by the user.
Overrides: postStop in class RpcEndpoint
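public void suspend()
Suspends this pool, meaning it has lost its authority to accept and distribute slots.
Specified by: suspend in interface SlotPoolGateway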
public SlotOwner getSlotOwner()
Gets the slot owner implementation for this pool.
This method does not mutate state and can be called directly (no RPC indirection).
public SlotProvider getSlotProvider()
Gets the slot provider implementation for this pool.
This method does not mutate state and can be called directly (no RPC indirection).
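public void connectToResourceManager(ResourceManagerGateway resourceManagerGateway)
Description copied from interface: SlotPoolGateway
Connects the SlotPool to the given ResourceManager.
Specified by: connectToResourceManager in interface SlotPoolGateway
Parameters:
resourceManagerGateway - The RPC gateway for the resource manager.
public void disconnectResourceManager()
Description copied from interface: SlotPoolGateway
Disconnects the slot pool from its current ResourceManager. The slot pool will still be able to serve slots from its internal pool.
Specified by: disconnectResourceManager in interface SlotPoolGateway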
public CompletableFuture<LogicalSlot> allocateSlot(SlotRequestId slotRequestId, ScheduledUnit task, SlotProfile slotProfile, boolean allowQueuedScheduling, Time allocationTimeout)
Description copied from interface: SlotPoolGateway
Requests to allocate a slot for the given ScheduledUnit. The request is uniquely identified by the provided SlotRequestId, which can also be used to release the slot via AllocatedSlotActions.releaseSlot(SlotRequestId, SlotSharingGroupId, Throwable).
The allocated slot will fulfill the requested ResourceProfile, and the pool tries to place it on one of the preferred locations.
If the returned future does not have to be completed right away (i.e. the slot request can be queued), allowQueuedScheduling must be set to true.
Specified by: allocateSlot in interface SlotPoolGateway
Parameters:
slotRequestId - identifying the requested slot
task - for which to allocate a slot
slotProfile - profile that specifies the requirements for the requested slot
allowQueuedScheduling - true if the slot request can be queued (i.e. the returned future does not have to be completed right away)
allocationTimeout - for the operation
Returns:
future of the allocated LogicalSlot
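The effect of allowQueuedScheduling on the returned future can be sketched with a toy model. This is an illustration under stated assumptions, not the real allocateSlot: `QueuedSchedulingSketch` and its string slot ids are hypothetical, and failing immediately when queuing is disallowed and no slot is free is an assumption of this sketch. The allocation timeout is left out.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

/** Toy model of queued vs. immediate slot requests; NOT Flink's SlotPool. */
final class QueuedSchedulingSketch {
    private final Queue<String> freeSlots = new ArrayDeque<>();
    private final Queue<CompletableFuture<String>> pending = new ArrayDeque<>();

    CompletableFuture<String> allocate(boolean allowQueuedScheduling) {
        String slot = freeSlots.poll();
        if (slot != null) {
            // A free slot exists: the future is completed right away in both modes.
            return CompletableFuture.completedFuture(slot);
        }
        CompletableFuture<String> result = new CompletableFuture<>();
        if (allowQueuedScheduling) {
            // The request may wait: hand out an incomplete future and queue it.
            pending.add(result);
        } else {
            // Assumption of this sketch: without queuing, the request fails at once.
            result.completeExceptionally(
                    new IllegalStateException("no slot available and queuing not allowed"));
        }
        return result;
    }

    /** A newly offered slot first satisfies the oldest queued request, if any. */
    void offer(String slotId) {
        CompletableFuture<String> waiting = pending.poll();
        if (waiting != null) {
            waiting.complete(slotId);
        } else {
            freeSlots.add(slotId);
        }
    }

    public static void main(String[] args) {
        QueuedSchedulingSketch pool = new QueuedSchedulingSketch();
        CompletableFuture<String> queued = pool.allocate(true);
        System.out.println(queued.isDone()); // false
        pool.offer("slot-1");
        System.out.println(queued.join());   // slot-1
    }
}
```

A caller that passes true must therefore be prepared to receive a future that completes later, for example by chaining a callback rather than blocking on it.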
public CompletableFuture<Acknowledge> releaseSlot(SlotRequestId slotRequestId, @Nullable SlotSharingGroupId slotSharingGroupId, Throwable cause)
Description copied from interface: AllocatedSlotActions
Releases the slot with the given SlotRequestId. If the slot belonged to a slot sharing group, then the corresponding SlotSharingGroupId has to be provided. Additionally, one can provide a cause for the slot release.
Specified by: releaseSlot in interface AllocatedSlotActions
Parameters:
slotRequestId - identifying the slot to release
slotSharingGroupId - identifying the slot sharing group to which the slot belongs, null if none
cause - of the slot release, null if none
public CompletableFuture<Collection<SlotOffer>> offerSlots(TaskManagerLocation taskManagerLocation, TaskManagerGateway taskManagerGateway, Collection<SlotOffer> offers)
Description copied from interface: SlotPoolGateway
Offers multiple slots to the SlotPool. The slot offerings can be individually accepted or rejected by returning the collection of accepted slot offers.
Specified by: offerSlots in interface SlotPoolGateway
Parameters:
taskManagerLocation - from which the slot offers originate
taskManagerGateway - to talk to the slot offerer
offers - slot offers which are offered to the SlotPool
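The contract of accepting or rejecting each offer individually, by returning only the accepted ones, can be sketched as below. This is a simplified illustration, not the real offerSlots: `OfferSlotsSketch` and `Offer` are hypothetical, ids are plain strings, the result is returned synchronously rather than as a future, and the only acceptance rule modelled is the one from registerTaskManager (offers from unregistered TaskManagers are not valid).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of per-offer acceptance; NOT Flink's SlotPool. */
final class OfferSlotsSketch {

    /** Hypothetical offer: the offering TaskManager and the allocation it refers to. */
    static final class Offer {
        final String taskManagerId;
        final String allocationId;

        Offer(String taskManagerId, String allocationId) {
            this.taskManagerId = taskManagerId;
            this.allocationId = allocationId;
        }
    }

    private final Set<String> registeredTaskManagers = new HashSet<>();

    void registerTaskManager(String taskManagerId) {
        registeredTaskManagers.add(taskManagerId);
    }

    /** Returns the accepted subset; every offer missing from the result was rejected. */
    Collection<Offer> offerSlots(Collection<Offer> offers) {
        List<Offer> accepted = new ArrayList<>();
        for (Offer offer : offers) {
            // Only offers from registered TaskManagers are considered valid.
            if (registeredTaskManagers.contains(offer.taskManagerId)) {
                accepted.add(offer);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        OfferSlotsSketch pool = new OfferSlotsSketch();
        pool.registerTaskManager("tm-1");
        Collection<Offer> accepted = pool.offerSlots(Arrays.asList(
                new Offer("tm-1", "alloc-1"),
                new Offer("tm-2", "alloc-2")));
        System.out.println(accepted.size()); // 1
    }
}
```

Returning the accepted subset lets the offering side work out which slots were rejected by comparing the result with what it sent.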
public CompletableFuture<Boolean> offerSlot(TaskManagerLocation taskManagerLocation, TaskManagerGateway taskManagerGateway, SlotOffer slotOffer)
Slot offering by TaskExecutor with AllocationID.
Specified by: offerSlot in interface SlotPoolGateway
Parameters:
taskManagerLocation - location from which the offer comes
taskManagerGateway - TaskManager gateway
slotOffer - the offered slot
public void failAllocation(AllocationID allocationID, Exception cause)
Fails the specified allocation and releases the corresponding slot if there is one.
Specified by: failAllocation in interface SlotPoolGateway
Parameters:
allocationID - Represents the allocation which should be failed
cause - The cause of the failure
public CompletableFuture<Acknowledge> registerTaskManager(ResourceID resourceID)
Registers a TaskManager with this pool; only slots that come from a registered TaskManager are considered valid.
Specified by: registerTaskManager in interface SlotPoolGateway
Parameters:
resourceID - The id of the TaskManager
public CompletableFuture<Acknowledge> releaseTaskManager(ResourceID resourceId, Exception cause)
Unregisters a TaskManager from this pool; all related slots are released and their tasks are canceled.
Specified by: releaseTaskManager in interface SlotPoolGateway
Parameters:
resourceId - The id of the TaskManager
cause - for the releasing of the TaskManager
@VisibleForTesting protected void timeoutPendingSlotRequest(SlotRequestId slotRequestId)
@VisibleForTesting protected org.apache.flink.runtime.jobmaster.slotpool.SlotPool.AllocatedSlots getAllocatedSlots()
@VisibleForTesting protected SlotPool.AvailableSlots getAvailableSlots()
Copyright © 2014–2019 The Apache Software Foundation. All rights reserved.