REST API #
Flink 具有监控 API ,可用于查询正在运行的作业以及最近完成的作业的状态和统计信息。该监控 API 被用于 Flink 自己的仪表盘,同时也可用于自定义监控工具。
该监控 API 是 REST-ful 风格的,可以接受 HTTP 请求并返回 JSON 格式的数据。
概览 #
该监控 API 由作为 JobManager
一部分运行的 web 服务器提供支持。默认情况下,该服务器监听 8081 端口,端口号可以通过修改 Flink 配置文件的 rest.port
进行配置。请注意,该监控 API 的 web 服务器和仪表盘的 web 服务器目前是相同的,因此在同一端口一起运行。不过,它们响应不同的 HTTP URL 。
在多个 JobManager
的情况下(为了高可用),每个 JobManager 将运行自己的监控 API 实例,当 JobManager 被选举成为集群 leader 时,该实例将提供已完成和正在运行作业的相关信息。
拓展 #
该 REST API
后端位于 flink-runtime
项目中。核心类是 org.apache.flink.runtime.webmonitor.WebMonitorEndpoint
,用来配置服务器和请求路由。
我们使用 Netty
和 Netty Router
库来处理 REST
请求和转换 URL 。选择该选项是因为这种组合具有轻量级依赖关系,并且 Netty HTTP
的性能非常好。
添加新的请求,需要
- 添加一个新的 MessageHeaders 类,作为新请求的接口,
- 添加一个新的 AbstractRestHandler 类,该类接收并处理 MessageHeaders 类的请求,
- 将处理程序添加到 org.apache.flink.runtime.webmonitor.WebMonitorEndpoint#initializeHandlers() 中。
一个很好的例子是使用 org.apache.flink.runtime.rest.messages.JobExceptionsHeaders
的 org.apache.flink.runtime.rest.handler.job.JobExceptionsHandler
。
API #
该 REST API
已版本化,可以通过在 URL 前面加上版本前缀来查询指定版本。前缀格式始终为 v[version_number] 。 例如,要访问版本 1 的 /foo/bar
接口,将查询 /v1/foo/bar
。
如果未指定版本, Flink 将默认使用支持该请求的最旧版本。
查询 不支持/不存在 的版本将返回 404
错误。
这些 API 中存在几种异步操作,例如:trigger savepoint
、 rescale a job
。它们将返回 triggerid
来标识你刚刚执行的 POST
请求,然后你需要使用该 triggerid
查询该操作的状态。
JobManager #
The OpenAPI specification is still experimental.
API reference #
/cluster |
|
Verb: DELETE |
Response code: 200 OK |
Shuts down the cluster | |
/config |
|
Verb: GET |
Response code: 200 OK |
Returns the configuration of the WebUI. | |
/datasets |
|
Verb: GET |
Response code: 200 OK |
Returns all cluster data sets. | |
/datasets/delete/:triggerid |
|
Verb: GET |
Response code: 200 OK |
Returns the status for the delete operation of a cluster data set. | |
Path parameters | |
|
|
/datasets/:datasetid |
|
Verb: DELETE |
Response code: 202 Accepted |
Triggers the deletion of a cluster data set. This async operation would return a 'triggerid' for further query identifier. | |
Path parameters | |
|
|
/jars |
|
Verb: GET |
Response code: 200 OK |
Returns a list of all jars previously uploaded via '/jars/upload'. | |
/jars/upload |
|
Verb: POST |
Response code: 200 OK |
Uploads a jar to the cluster. The jar must be sent as multi-part data. Make sure that the "Content-Type" header is set to "application/x-java-archive", as some http libraries do not add the header by default. Using 'curl' you can upload a jar via 'curl -X POST -H "Expect:" -F "jarfile=@path/to/flink-job.jar" http://hostname:port/jars/upload'. | |
/jars/:jarid |
|
Verb: DELETE |
Response code: 200 OK |
Deletes a jar previously uploaded via '/jars/upload'. | |
Path parameters | |
|
|
/jars/:jarid/plan |
|
Verb: POST |
Response code: 200 OK |
Returns the dataflow plan of a job contained in a jar previously uploaded via '/jars/upload'. Program arguments can be passed both via the JSON request (recommended) or query parameters. | |
Path parameters | |
|
|
Query parameters | |
|
|
/jars/:jarid/run |
|
Verb: POST |
Response code: 200 OK |
Submits a job by running a jar previously uploaded via '/jars/upload'. Program arguments can be passed both via the JSON request (recommended) or query parameters. | |
Path parameters | |
|
|
Query parameters | |
|
|
/jobmanager/config |
|
Verb: GET |
Response code: 200 OK |
Returns the cluster configuration. | |
/jobmanager/environment |
|
Verb: GET |
Response code: 200 OK |
Returns the jobmanager environment. | |
/jobmanager/logs |
|
Verb: GET |
Response code: 200 OK |
Returns the list of log files on the JobManager. | |
/jobmanager/metrics |
|
Verb: GET |
Response code: 200 OK |
Provides access to job manager metrics. | |
Query parameters | |
|
|
/jobmanager/thread-dump |
|
Verb: GET |
Response code: 200 OK |
Returns the thread dump of the JobManager. | |
/jobs |
|
Verb: GET |
Response code: 200 OK |
Returns an overview over all jobs and their current state. | |
/jobs |
|
Verb: POST |
Response code: 202 Accepted |
Submits a job. This call is primarily intended to be used by the Flink client. This call expects a multipart/form-data request that consists of file uploads for the serialized JobGraph, jars and distributed cache artifacts and an attribute named "request" for the JSON payload. | |
/jobs/metrics |
|
Verb: GET |
Response code: 200 OK |
Provides access to aggregated job metrics. | |
Query parameters | |
|
|
/jobs/overview |
|
Verb: GET |
Response code: 200 OK |
Returns an overview over all jobs. | |
/jobs/:jobid |
|
Verb: GET |
Response code: 200 OK |
Returns details of a job. | |
Path parameters | |
|
|
/jobs/:jobid |
|
Verb: PATCH |
Response code: 202 Accepted |
Terminates a job. | |
Path parameters | |
|
|
Query parameters | |
|
|
/jobs/:jobid/accumulators |
|
Verb: GET |
Response code: 200 OK |
Returns the accumulators for all tasks of a job, aggregated across the respective subtasks. | |
Path parameters | |
|
|
Query parameters | |
|
|
/jobs/:jobid/checkpoints |
|
Verb: GET |
Response code: 200 OK |
Returns checkpointing statistics for a job. | |
Path parameters | |
|
|
/jobs/:jobid/checkpoints |
|
Verb: POST |
Response code: 202 Accepted |
Triggers a checkpoint. The 'checkpointType' parameter does not support 'INCREMENTAL' option for now. See FLINK-33723. This async operation would return a 'triggerid' for further query identifier. | |
Path parameters | |
|
|
/jobs/:jobid/checkpoints/config |
|
Verb: GET |
Response code: 200 OK |
Returns the checkpointing configuration. | |
Path parameters | |
|
|
/jobs/:jobid/checkpoints/details/:checkpointid |
|
Verb: GET |
Response code: 200 OK |
Returns details for a checkpoint. | |
Path parameters | |
|
|
/jobs/:jobid/checkpoints/details/:checkpointid/subtasks/:vertexid |
|
Verb: GET |
Response code: 200 OK |
Returns checkpoint statistics for a task and its subtasks. | |
Path parameters | |
|
|
/jobs/:jobid/checkpoints/:triggerid |
|
Verb: GET |
Response code: 200 OK |
Returns the status of a checkpoint trigger operation. | |
Path parameters | |
|
|
/jobs/:jobid/clientHeartbeat |
|
Verb: PATCH |
Response code: 202 Accepted |
Report the jobClient's aliveness. | |
Path parameters | |
|
|
/jobs/:jobid/config |
|
Verb: GET |
Response code: 200 OK |
Returns the configuration of a job. | |
Path parameters | |
|
|
/jobs/:jobid/exceptions |
|
Verb: GET |
Response code: 200 OK |
Returns the most recent exceptions that have been handled by Flink for this job. The 'exceptionHistory.truncated' flag defines whether exceptions were filtered out through the GET parameter. The backend collects only a specific amount of most recent exceptions per job. This can be configured through web.exception-history-size in the Flink configuration. The following first-level members are deprecated: 'root-exception', 'timestamp', 'all-exceptions', and 'truncated'. Use the data provided through 'exceptionHistory', instead. | |
Path parameters | |
|
|
Query parameters | |
|
|
/jobs/:jobid/execution-result |
|
Verb: GET |
Response code: 200 OK |
Returns the result of a job execution. Gives access to the execution time of the job and to all accumulators created by this job. | |
Path parameters | |
|
|
/jobs/:jobid/jobmanager/config |
|
Verb: GET |
Response code: 200 OK |
Returns the jobmanager's configuration of a specific job. | |
Path parameters | |
|
|
/jobs/:jobid/jobmanager/environment |
|
Verb: GET |
Response code: 200 OK |
Returns the jobmanager's environment of a specific job. | |
Path parameters | |
|
|
/jobs/:jobid/jobmanager/log-url |
|
Verb: GET |
Response code: 200 OK |
Returns the log url of jobmanager of a specific job. | |
Path parameters | |
|
|
/jobs/:jobid/metrics |
|
Verb: GET |
Response code: 200 OK |
Provides access to job metrics. | |
Path parameters | |
|
|
Query parameters | |
|
|
/jobs/:jobid/plan |
|
Verb: GET |
Response code: 200 OK |
Returns the dataflow plan of a job. | |
Path parameters | |
|
|
/jobs/:jobid/rescaling |
|
Verb: PATCH |
Response code: 200 OK |
Triggers the rescaling of a job. This async operation would return a 'triggerid' for further query identifier. | |
Path parameters | |
|
|
Query parameters | |
|
|
/jobs/:jobid/rescaling/:triggerid |
|
Verb: GET |
Response code: 200 OK |
Returns the status of a rescaling operation. | |
Path parameters | |
|
|
/jobs/:jobid/resource-requirements |
|
Verb: GET |
Response code: 200 OK |
Request details on the job's resource requirements. | |
Path parameters | |
|
|
/jobs/:jobid/resource-requirements |
|
Verb: PUT |
Response code: 200 OK |
Request to update job's resource requirements. | |
Path parameters | |
|
|
/jobs/:jobid/savepoints |
|
Verb: POST |
Response code: 202 Accepted |
Triggers a savepoint, and optionally cancels the job afterwards. This async operation would return a 'triggerid' for further query identifier. | |
Path parameters | |
|
|
/jobs/:jobid/savepoints/:triggerid |
|
Verb: GET |
Response code: 200 OK |
Returns the status of a savepoint operation. | |
Path parameters | |
|
|
/jobs/:jobid/status |
|
Verb: GET |
Response code: 200 OK |
Returns the current status of a job execution. | |
Path parameters | |
|
|
/jobs/:jobid/stop |
|
Verb: POST |
Response code: 202 Accepted |
Stops a job with a savepoint. Optionally, it can also emit a MAX_WATERMARK before taking the savepoint to flush out any state waiting for timers to fire. This async operation would return a 'triggerid' for further query identifier. | |
Path parameters | |
|
|
/jobs/:jobid/taskmanagers/:taskmanagerid/log-url |
|
Verb: GET |
Response code: 200 OK |
Returns the log url of jobmanager of a specific job. | |
Path parameters | |
|
|
/jobs/:jobid/vertices/:vertexid |
|
Verb: GET |
Response code: 200 OK |
Returns details for a task, with a summary for each of its subtasks. | |
Path parameters | |
|
|
/jobs/:jobid/vertices/:vertexid/accumulators |
|
Verb: GET |
Response code: 200 OK |
Returns user-defined accumulators of a task, aggregated across all subtasks. | |
Path parameters | |
|
|
/jobs/:jobid/vertices/:vertexid/backpressure |
|
Verb: GET |
Response code: 200 OK |
Returns back-pressure information for a job, and may initiate back-pressure sampling if necessary. | |
Path parameters | |
|
|
/jobs/:jobid/vertices/:vertexid/flamegraph |
|
Verb: GET |
Response code: 200 OK |
Returns flame graph information for a vertex, and may initiate flame graph sampling if necessary. | |
Path parameters | |
|
|
Query parameters | |
|
|
/jobs/:jobid/vertices/:vertexid/jm-operator-metrics |
|
Verb: GET |
Response code: 200 OK |
Provides access to jobmanager operator metrics. This is an operator that executes on the jobmanager and the coordinator for FLIP 27 sources is one example of such an operator. | |
Path parameters | |
|
|
Query parameters | |
|
|
/jobs/:jobid/vertices/:vertexid/metrics |
|
Verb: GET |
Response code: 200 OK |
Provides access to task metrics. | |
Path parameters | |
|
|
Query parameters | |
|
|
/jobs/:jobid/vertices/:vertexid/subtasks/accumulators |
|
Verb: GET |
Response code: 200 OK |
Returns all user-defined accumulators for all subtasks of a task. | |
Path parameters | |
|
|
/jobs/:jobid/vertices/:vertexid/subtasks/metrics |
|
Verb: GET |
Response code: 200 OK |
Provides access to aggregated subtask metrics. | |
Path parameters | |
|
|
Query parameters | |
|
|
/jobs/:jobid/vertices/:vertexid/subtasks/:subtaskindex |
|
Verb: GET |
Response code: 200 OK |
Returns details of the current or latest execution attempt of a subtask. | |
Path parameters | |
|
|
/jobs/:jobid/vertices/:vertexid/subtasks/:subtaskindex/attempts/:attempt |
|
Verb: GET |
Response code: 200 OK |
Returns details of an execution attempt of a subtask. Multiple execution attempts happen in case of failure/recovery. | |
Path parameters | |
|
|
/jobs/:jobid/vertices/:vertexid/subtasks/:subtaskindex/attempts/:attempt/accumulators |
|
Verb: GET |
Response code: 200 OK |
Returns the accumulators of an execution attempt of a subtask. Multiple execution attempts happen in case of failure/recovery. | |
Path parameters | |
|
|
/jobs/:jobid/vertices/:vertexid/subtasks/:subtaskindex/metrics |
|
Verb: GET |
Response code: 200 OK |
Provides access to subtask metrics. | |
Path parameters | |
|
|
Query parameters | |
|
|
/jobs/:jobid/vertices/:vertexid/subtasktimes |
|
Verb: GET |
Response code: 200 OK |
Returns time-related information for all subtasks of a task. | |
Path parameters | |
|
|
/jobs/:jobid/vertices/:vertexid/taskmanagers |
|
Verb: GET |
Response code: 200 OK |
Returns task information aggregated by task manager. | |
Path parameters | |
|
|
/jobs/:jobid/vertices/:vertexid/watermarks |
|
Verb: GET |
Response code: 200 OK |
Returns the watermarks for all subtasks of a task. | |
Path parameters | |
|
|
/overview |
|
Verb: GET |
Response code: 200 OK |
Returns an overview over the Flink cluster. | |
/savepoint-disposal |
|
Verb: POST |
Response code: 200 OK |
Triggers the desposal of a savepoint. This async operation would return a 'triggerid' for further query identifier. | |
/savepoint-disposal/:triggerid |
|
Verb: GET |
Response code: 200 OK |
Returns the status of a savepoint disposal operation. | |
Path parameters | |
|
|
/taskmanagers |
|
Verb: GET |
Response code: 200 OK |
Returns an overview over all task managers. | |
/taskmanagers/metrics |
|
Verb: GET |
Response code: 200 OK |
Provides access to aggregated task manager metrics. | |
Query parameters | |
|
|
/taskmanagers/:taskmanagerid |
|
Verb: GET |
Response code: 200 OK |
Returns details for a task manager. "metrics.memorySegmentsAvailable" and "metrics.memorySegmentsTotal" are deprecated. Please use "metrics.nettyShuffleMemorySegmentsAvailable" and "metrics.nettyShuffleMemorySegmentsTotal" instead. | |
Path parameters | |
|
|
/taskmanagers/:taskmanagerid/logs |
|
Verb: GET |
Response code: 200 OK |
Returns the list of log files on a TaskManager. | |
Path parameters | |
|
|
/taskmanagers/:taskmanagerid/metrics |
|
Verb: GET |
Response code: 200 OK |
Provides access to task manager metrics. | |
Path parameters | |
|
|
Query parameters | |
|
|
/taskmanagers/:taskmanagerid/thread-dump |
|
Verb: GET |
Response code: 200 OK |
Returns the thread dump of the requested TaskManager. | |
Path parameters | |
|
|