This documentation is for an unreleased version of the Apache Flink Kubernetes Operator. We recommend you use the latest stable version.
The core user facing API of the Flink Kubernetes Operator is the FlinkDeployment and FlinkSessionJob Custom Resources (CR).
Custom Resources are extensions of the Kubernetes API and define new object types. In our case the FlinkDeployment CR defines Flink Application and Session cluster deployments. The FlinkSessionJob CR defines the session job on the Session cluster and each Session cluster can run multiple FlinkSessionJob.
Once the Flink Kubernetes Operator is installed and running in your Kubernetes environment, it will continuously watch FlinkDeployment and FlinkSessionJob objects submitted by the user to detect new CR and changes to existing ones. In case you haven’t deployed the operator yet, please check out the quickstart for detailed instructions on how to get started.
With these two Custom Resources, we can support two different operational models:
- Flink application managed by the
- Empty Flink session managed by the
FlinkDeployment+ multiple jobs managed by the
FlinkSessionJobs. The operations on the session jobs are independent of each other.
FlinkDeployment objects are defined in YAML format by the user and must contain the following required fields:
// Deployment specs of your Flink Session/Application
kind fields have fixed values while
spec control the actual Flink deployment.
The Flink operator will subsequently add status information to your FlinkDeployment object based on the observed deployment state:
kubectl get flinkdeployment my-deployment -o yaml
While the status contains a lot of information that the operator tracks about the deployment, it is considered to be internal to the operator logic and users should only rely on it with care.
FlinkDeployment spec overview #
spec is the most important part of the
FlinkDeployment as it describes the desired Flink Application or Session cluster.
The spec contains all the information the operator need to deploy and manage your Flink deployments, including docker images, configurations, desired state etc.
Most deployments will define at least the following fields:
image: Docker used to run Flink job and task manager processes
flinkVersion: Flink version used in the image (
serviceAccount: Kubernetes service account used by the Flink pods
taskManager, jobManager: Job and Task manager pod resource specs (cpu, memory, ephemeralStorage)
flinkConfiguration: Map of Flink configuration overrides such as HA and checkpointing configs
job: Job Spec for Application deployments
The Flink Kubernetes Operator supports two main types of deployments: Application and Session
Application deployments manage a single job deployment in Application mode while Session deployments manage Flink Session clusters without providing any job management for it. The type of cluster created depends on the
spec provided by the user as we will see in the next sections.
Application Deployments #
To create an Application deployment users must define the
job (JobSpec) field in their deployment spec.
jarURI: URI of the job jar
parallelism: Parallelism of the job
upgradeMode: Upgrade mode of the job (stateless/savepoint/last-state)
state: Desired state of the job (running/suspended)
Once created FlinkDeployment yamls can be submitted through kubectl:
kubectl apply -f your-deployment.yaml
Session Cluster Deployments #
Session clusters use a similar spec to application clusters with the only difference that
job is not defined.
For Session clusters the operator only provides very basic management and monitoring that cover:
- Start Session cluster
- Monitor overall cluster health
- Stop / Delete Session cluster
Cluster Deployment Modes #
On-top of the deployment types the Flink Kubernetes Operator also supports two modes of deployments: Native and Standalone
Native cluster deployment is the default deployment mode and uses Flink’s built in integration with Kubernetes when deploying the cluster. This integration means the Flink cluster communicates directly with Kubernetes and allows it to manage Kubernetes resources, e.g. dynamically allocate and de-allocate TaskManager pods.
For standard Operator use running your own Flink Jobs Native mode is recommended.
Standalone cluster deployment simply uses Kubernetes as an orchestration platform that the Flink cluster is running on. Flink is unaware that it is running on Kubernetes and therefore all Kubernetes resources need to be managed externally, by the Kubernetes Operator.
In Standalone mode the Flink cluster doesn’t have access to the Kubernetes cluster so this can increase security. If unknown or external code is being ran on the Flink cluster then Standalone mode adds another layer of security.
The deployment mode can be set using the
mode field in the deployment spec.
The FlinkSessionJob have a similar structure to FlinkDeployment with the following required fields:
FlinkSessionJob spec overview #
The spec contains the information to submit a session job to the session cluster. Mostly, it will define at least the following fields:
- deploymentName: The name of the target session cluster’s CR
- job: The specification for the Session Job
The job specification has the same structure in FlinkSessionJobs and FlinkDeployments, but in FlinkSessionJobs the jarUri can contain remote sources too. It leverages the Flink filesystem mechanism to download the jar and submit to the session cluster. So the FlinkSessionJob must be run with an existing session cluster managed by the FlinkDeployment.
To support jar from different filesystems, you should extend the base docker image as below, and put the related filesystem jar to the plugin dir and deploy the operator. For example, to support the hadoop fs resource:
COPY flink-hadoop-fs-1.15-SNAPSHOT.jar $FLINK_PLUGINS_DIR/hadoop-fs/
Alternatively, if you use helm to install flink-kubernetes-operator, it allows you to specify a postStart hook to download the required plugins.
- Last-state upgradeMode is currently not supported for FlinkSessionJobs