Class ClusterResourceManager

  • All Implemented Interfaces:
    ResourceCheck

    public class ClusterResourceManager
    extends java.lang.Object
    implements ResourceCheck
    A cluster resource manager which provides a view over the allocatable resources within a Kubernetes cluster and allows to simulate scheduling pods with a defined number of required resources.

    The goal is to provide a good indicator for whether resources needed for autoscaling are going to be available. This is achieved by pulling the node resource usage from the Kubernetes cluster at a regular configurable interval, after which we use this data to simulate adding / removing resources (pods). Note that this is merely a (pretty good) heuristic because the Kubernetes scheduler has the final saying. However, we prevent 99% of the scenarios after pipeline outages which can lead to massive scale up where all pipelines may be scaled up at the same time and exhaust the number of available resources.

    The simulation can run on a fixed set of Kubernetes nodes. Additionally, if we detect that the cluster is using the Kubernetes Cluster Autoscaler, we will use this data to extrapolate the number of nodes to the maximum defined nodes in the autoscaler configuration.

    We currently track CPU and memory. Ephemeral storage is missing because there is no easy way to get node statics on free storage.

    • Constructor Summary

      Constructors 
      Constructor Description
      ClusterResourceManager​(java.time.Duration refreshInterval, io.fabric8.kubernetes.client.KubernetesClient kubernetesClient)  
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      static ClusterResourceManager of​(org.apache.flink.configuration.Configuration config, io.fabric8.kubernetes.client.KubernetesClient client)  
      boolean trySchedule​(int currentInstances, int newInstances, double cpuPerInstance, org.apache.flink.configuration.MemorySize memoryPerInstance)
      Simulates scheduling the provided number of TaskManager instances.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • ClusterResourceManager

        public ClusterResourceManager​(java.time.Duration refreshInterval,
                                      io.fabric8.kubernetes.client.KubernetesClient kubernetesClient)
    • Method Detail

      • of

        public static ClusterResourceManager of​(org.apache.flink.configuration.Configuration config,
                                                io.fabric8.kubernetes.client.KubernetesClient client)
      • trySchedule

        public boolean trySchedule​(int currentInstances,
                                   int newInstances,
                                   double cpuPerInstance,
                                   org.apache.flink.configuration.MemorySize memoryPerInstance)
        Description copied from interface: ResourceCheck
        Simulates scheduling the provided number of TaskManager instances.
        Specified by:
        trySchedule in interface ResourceCheck
        Parameters:
        currentInstances - The current number of instances.
        newInstances - The new number of instances.
        cpuPerInstance - The number of CPU per instances.
        memoryPerInstance - The total memory size per instances.
        Returns:
        true if a scheduling configuration was found, false otherwise.