Source code for pyflink.datastream.time_characteristic
################################################################################
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
################################################################################
from enum import Enum
from pyflink.java_gateway import get_gateway
__all__ = ['TimeCharacteristic']
[docs]class TimeCharacteristic(Enum):
"""
The time characteristic defines how the system determines time for time-dependent
order and operations that depend on time (such as time windows).
:data:`ProcessingTime`:
Processing time for operators means that the operator uses the system clock of the machine
to determine the current time of the data stream. Processing-time windows trigger based
on wall-clock time and include whatever elements happen to have arrived at the operator at
that point in time.
Using processing time for window operations results in general in quite non-deterministic
results, because the contents of the windows depends on the speed in which elements arrive.
It is, however, the cheapest method of forming windows and the method that introduces the
least latency.
:data:`IngestionTime`:
Ingestion time means that the time of each individual element in the stream is determined
when the element enters the Flink streaming data flow. Operations like windows group the
elements based on that time, meaning that processing speed within the streaming dataflow
does not affect windowing, but only the speed at which sources receive elements.
Ingestion time is often a good compromise between processing time and event time.
It does not need any special manual form of watermark generation, and events are typically
not too much out-or-order when they arrive at operators; in fact, out-of-orderness can
only be introduced by streaming shuffles or split/join/union operations. The fact that
elements are not very much out-of-order means that the latency increase is moderate,
compared to event time.
:data:`EventTime`:
Event time means that the time of each individual element in the stream (also called event)
is determined by the event's individual custom timestamp. These timestamps either exist in
the elements from before they entered the Flink streaming dataflow, or are user-assigned at
the sources. The big implication of this is that it allows for elements to arrive in the
sources and in all operators out of order, meaning that elements with earlier timestamps may
arrive after elements with later timestamps.
Operators that window or order data with respect to event time must buffer data until they
can be sure that all timestamps for a certain time interval have been received. This is
handled by the so called "time watermarks".
Operations based on event time are very predictable - the result of windowing operations
is typically identical no matter when the window is executed and how fast the streams
operate. At the same time, the buffering and tracking of event time is also costlier than
operating with processing time, and typically also introduces more latency. The amount of
extra cost depends mostly on how much out of order the elements arrive, i.e., how long the
time span between the arrival of early and late elements is. With respect to the
"time watermarks", this means that the cost typically depends on how early or late the
watermarks can be generated for their timestamp.
In relation to :data:`IngestionTime`, the event time is similar, but refers the event's
original time, rather than the time assigned at the data source. Practically, that means
that event time has generally more meaning, but also that it takes longer to determine
that all elements for a certain time have arrived.
"""
ProcessingTime = 0
IngestionTime = 1
EventTime = 2
@staticmethod
def _from_j_time_characteristic(j_time_characteristic) -> 'TimeCharacteristic':
return TimeCharacteristic[j_time_characteristic.name()]
def _to_j_time_characteristic(self):
gateway = get_gateway()
JTimeCharacteristic = gateway.jvm.org.apache.flink.streaming.api.TimeCharacteristic
return getattr(JTimeCharacteristic, self.name)