T - The type of the elements that result from this Transformation
@Internal public abstract class Transformation<T> extends Object
A Transformation represents the operation that creates a DataStream. Every DataStream has an underlying Transformation that is the origin of said DataStream.
API operations such as DataStream#map create a tree of Transformations underneath. When the stream program is to be executed this graph is translated to a StreamGraph using StreamGraphGenerator.
A Transformation does not necessarily correspond to a physical operation at runtime. Some operations are only logical concepts. Examples of this are union, split/select data stream, partitioning.
The following graph of Transformations:
       Source              Source
          +                   +
          |                   |
          v                   v
      Rebalance         HashPartition
          +                   +
          |                   |
          |                   |
          +------>Union<------+
                    +
                    |
                    v
                  Split
                    +
                    |
                    v
                  Select
                    +
                    v
                   Map
                    +
                    |
                    v
                  Sink
Would result in this graph of operations at runtime:
       Source              Source
          +                   +
          |                   |
          |                   |
          +------->Map<-------+
                    +
                    |
                    v
                  Sink
The information about partitioning, union, and split/select ends up being encoded in the edges that connect the sources to the map operation.
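To make the collapsing of logical transformations concrete, here is a minimal, self-contained sketch. `ToyTransformation` and `ToyGraphTranslator` are hypothetical names invented for this illustration. They are not Flink classes, and the sketch does not attempt to reproduce what StreamGraphGenerator actually does. The two sources are named Source1 and Source2 only so they can be told apart.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy node of a transformation graph. Hypothetical; not Flink's Transformation. */
final class ToyTransformation {
    final String name;
    final boolean physical; // false for purely logical steps such as union or partitioning
    final List<ToyTransformation> inputs;

    ToyTransformation(String name, boolean physical, ToyTransformation... inputs) {
        this.name = name;
        this.physical = physical;
        this.inputs = Arrays.asList(inputs);
    }
}

/** Folds logical nodes into annotations on the edges between physical nodes. */
final class ToyGraphTranslator {

    /** Returns one line per runtime edge, for example "A -> B via [Union]". */
    static List<String> physicalEdges(ToyTransformation sink) {
        List<String> edges = new ArrayList<>();
        collectEdgesInto(sink, edges);
        return edges;
    }

    // Starts one upstream walk per input of a physical node.
    private static void collectEdgesInto(ToyTransformation target, List<String> edges) {
        for (ToyTransformation input : target.inputs) {
            walkUpstream(input, target, new ArrayList<>(), edges);
        }
    }

    // Logical nodes are recorded on the path; a physical node closes the edge.
    private static void walkUpstream(ToyTransformation node, ToyTransformation target,
                                     List<String> logicalPath, List<String> edges) {
        if (node.physical) {
            edges.add(node.name + " -> " + target.name + " via " + logicalPath);
            collectEdgesInto(node, edges); // continue with edges ending at this node
            return;
        }
        List<String> extended = new ArrayList<>(logicalPath);
        extended.add(0, node.name); // prepend so the path reads upstream to downstream
        for (ToyTransformation input : node.inputs) {
            walkUpstream(input, target, extended, edges);
        }
    }

    /** Builds the example graph described in the text and returns its sink. */
    static ToyTransformation exampleSink() {
        ToyTransformation source1 = new ToyTransformation("Source1", true);
        ToyTransformation source2 = new ToyTransformation("Source2", true);
        ToyTransformation rebalance = new ToyTransformation("Rebalance", false, source1);
        ToyTransformation hash = new ToyTransformation("HashPartition", false, source2);
        ToyTransformation union = new ToyTransformation("Union", false, rebalance, hash);
        ToyTransformation split = new ToyTransformation("Split", false, union);
        ToyTransformation select = new ToyTransformation("Select", false, split);
        ToyTransformation map = new ToyTransformation("Map", true, select);
        return new ToyTransformation("Sink", true, map);
    }

    public static void main(String[] args) {
        physicalEdges(exampleSink()).forEach(System.out::println);
    }
}
```

Running `main` prints three edges: `Map -> Sink via []`, `Source1 -> Map via [Rebalance, Union, Split, Select]`, and `Source2 -> Map via [HashPartition, Union, Split, Select]`. Only Source, Map, and Sink survive as nodes; everything else is carried on the edges. The sketch does not deduplicate shared upstream nodes, which is acceptable for this tree-shaped example.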
Modifier and Type | Field and Description |
---|---|
protected long | bufferTimeout |
protected String | description |
protected int | id |
protected String | name |
protected TypeInformation<T> | outputType |
protected boolean | typeUsed |
static int | UPPER_BOUND_MAX_PARALLELISM |
Constructor and Description |
---|
Transformation(String name, TypeInformation<T> outputType, int parallelism): Creates a new Transformation with the given name, output type and parallelism. |
Transformation(String name, TypeInformation<T> outputType, int parallelism, boolean parallelismConfigured): Creates a new Transformation with the given name, output type and parallelism. |
Modifier and Type | Method and Description |
---|---|
Optional<Integer> | declareManagedMemoryUseCaseAtOperatorScope(ManagedMemoryUseCase managedMemoryUseCase, int weight): Declares that this transformation contains a certain operator scope managed memory use case. |
void | declareManagedMemoryUseCaseAtSlotScope(ManagedMemoryUseCase managedMemoryUseCase): Declares that this transformation contains a certain slot scope managed memory use case. |
boolean | equals(Object o) |
long | getBufferTimeout(): Returns the buffer timeout of this Transformation. |
String | getCoLocationGroupKey(): NOTE: This is an internal undocumented feature for now. |
String | getDescription(): Returns the description of this Transformation. |
int | getId(): Returns the unique ID of this Transformation. |
abstract List<Transformation<?>> | getInputs(): Returns the transformations that are the immediate predecessors of the current transformation in the transformation graph. |
Map<ManagedMemoryUseCase,Integer> | getManagedMemoryOperatorScopeUseCaseWeights(): Get operator scope use cases that this transformation needs managed memory for, and the use-case-specific weights for this transformation. |
Set<ManagedMemoryUseCase> | getManagedMemorySlotScopeUseCases(): Get slot scope use cases that this transformation needs managed memory for. |
int | getMaxParallelism(): Gets the maximum parallelism for this stream transformation. |
ResourceSpec | getMinResources(): Gets the minimum resource of this stream transformation. |
String | getName(): Returns the name of this Transformation. |
static int | getNewNodeId() |
TypeInformation<T> | getOutputType(): Returns the output type of this Transformation as a TypeInformation. |
int | getParallelism(): Returns the parallelism of this Transformation. |
ResourceSpec | getPreferredResources(): Gets the preferred resource of this stream transformation. |
Optional<SlotSharingGroup> | getSlotSharingGroup(): Returns the slot sharing group of this transformation if present. |
List<Transformation<?>> | getTransitivePredecessors(): Returns all transitive predecessor Transformations of this Transformation. |
protected abstract List<Transformation<?>> | getTransitivePredecessorsInternal(): Returns all transitive predecessor Transformations of this Transformation. |
String | getUid(): Returns the user-specified ID of this transformation. |
String | getUserProvidedNodeHash(): Gets the user provided hash. |
int | hashCode() |
boolean | isParallelismConfigured() |
void | setBufferTimeout(long bufferTimeout): Set the buffer timeout of this Transformation. |
void | setCoLocationGroupKey(String coLocationGroupKey): NOTE: This is an internal undocumented feature for now. |
void | setDescription(String description): Changes the description of this Transformation. |
void | setMaxParallelism(int maxParallelism): Sets the maximum parallelism for this stream transformation. |
void | setName(String name): Changes the name of this Transformation. |
void | setOutputType(TypeInformation<T> outputType): Tries to fill in the type information. |
void | setParallelism(int parallelism): Sets the parallelism of this Transformation. |
void | setParallelism(int parallelism, boolean parallelismConfigured) |
void | setResources(ResourceSpec minResources, ResourceSpec preferredResources): Sets the minimum and preferred resources for this stream transformation. |
void | setSlotSharingGroup(SlotSharingGroup slotSharingGroup): Sets the slot sharing group of this transformation. |
void | setSlotSharingGroup(String slotSharingGroupName): Sets the slot sharing group of this transformation. |
void | setUid(String uid): Sets an ID for this Transformation. |
void | setUidHash(String uidHash): Sets a user provided hash for this operator. |
String | toString() |
protected void | updateManagedMemoryStateBackendUseCase(boolean hasStateBackend) |
public static final int UPPER_BOUND_MAX_PARALLELISM
protected final int id
protected String name
protected String description
protected TypeInformation<T> outputType
protected boolean typeUsed
protected long bufferTimeout
public Transformation(String name, TypeInformation<T> outputType, int parallelism)
Creates a new Transformation with the given name, output type and parallelism.
name - The name of the Transformation, this will be shown in Visualizations and the Log
outputType - The output type of this Transformation
parallelism - The parallelism of this Transformation
public Transformation(String name, TypeInformation<T> outputType, int parallelism, boolean parallelismConfigured)
Creates a new Transformation with the given name, output type and parallelism.
name - The name of the Transformation, this will be shown in Visualizations and the Log
outputType - The output type of this Transformation
parallelism - The parallelism of this Transformation
parallelismConfigured - If true, the parallelism of the transformation is explicitly set and should be respected. Otherwise the parallelism can be changed at runtime.
public static int getNewNodeId()
public int getId()
Returns the unique ID of this Transformation.
public void setName(String name)
Changes the name of this Transformation.
public String getName()
Returns the name of this Transformation.
public void setDescription(String description)
Changes the description of this Transformation.
public String getDescription()
Returns the description of this Transformation.
public int getParallelism()
Returns the parallelism of this Transformation.
public void setParallelism(int parallelism)
Sets the parallelism of this Transformation.
parallelism - The new parallelism to set on this Transformation.
public void setParallelism(int parallelism, boolean parallelismConfigured)
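public boolean isParallelismConfigured()
public int getMaxParallelism()
Gets the maximum parallelism for this stream transformation.
public void setMaxParallelism(int maxParallelism)
Sets the maximum parallelism for this stream transformation.
maxParallelism - Maximum parallelism for this stream transformation.
public void setResources(ResourceSpec minResources, ResourceSpec preferredResources)
Sets the minimum and preferred resources for this stream transformation.
minResources - The minimum resource of this transformation.
preferredResources - The preferred resource of this transformation.
public ResourceSpec getMinResources()
Gets the minimum resource of this stream transformation.
public ResourceSpec getPreferredResources()
Gets the preferred resource of this stream transformation.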
public Optional<Integer> declareManagedMemoryUseCaseAtOperatorScope(ManagedMemoryUseCase managedMemoryUseCase, int weight)
Declares that this transformation contains a certain operator scope managed memory use case.
managedMemoryUseCase - The use case that this transformation declares needing managed memory for.
weight - Use-case-specific weights for this transformation. Used for sharing managed memory across transformations for OPERATOR scope use cases. Check the individual ManagedMemoryUseCase for the specific weight definition.
public void declareManagedMemoryUseCaseAtSlotScope(ManagedMemoryUseCase managedMemoryUseCase)
Declares that this transformation contains a certain slot scope managed memory use case.
managedMemoryUseCase - The use case that this transformation declares needing managed memory for.
protected void updateManagedMemoryStateBackendUseCase(boolean hasStateBackend)
public Map<ManagedMemoryUseCase,Integer> getManagedMemoryOperatorScopeUseCaseWeights()
Get operator scope use cases that this transformation needs managed memory for, and the use-case-specific weights for this transformation. Check the individual ManagedMemoryUseCase for the specific weight definition.
public Set<ManagedMemoryUseCase> getManagedMemorySlotScopeUseCases()
Get slot scope use cases that this transformation needs managed memory for.
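The text above says that operator-scope weights are used for sharing managed memory across transformations, but it does not say how a weight translates into an amount of memory. The sketch below shows one plausible reading, a purely proportional split. Both the proportional rule and the helper `WeightSharingSketch` are assumptions made for illustration, not Flink behavior; the actual meaning of a weight is defined per ManagedMemoryUseCase.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; not part of Flink. */
final class WeightSharingSketch {

    /**
     * Splits totalBytes among consumers in proportion to their weights.
     * Proportional splitting is an assumption of this sketch.
     */
    static Map<String, Long> share(long totalBytes, Map<String, Integer> weights) {
        long weightSum = 0;
        for (int weight : weights.values()) {
            weightSum += weight;
        }
        Map<String, Long> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            // Integer division: any remainder is simply left unassigned here.
            long bytes = weightSum == 0 ? 0L : totalBytes * entry.getValue() / weightSum;
            shares.put(entry.getKey(), bytes);
        }
        return shares;
    }
}
```

With a total of 100 bytes and weights 1 and 3, this sketch assigns 25 and 75 bytes respectively.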
public void setUidHash(String uidHash)
Sets a user provided hash for this operator.
The user provided hash is an alternative to the generated hashes, which is considered when identifying an operator through the default hash mechanics fails (e.g. because of changes between Flink versions).
Important: this should be used as a workaround or for troubleshooting. The provided hash needs to be unique per transformation and job. Otherwise, job submission will fail. Furthermore, you cannot assign a user-specified hash to intermediate nodes in an operator chain; trying to do so will cause your job to fail.
A use case for this is migration between Flink versions, or changing a job in a way that changes the automatically generated hashes. In this case, providing the previous hashes directly through this method (e.g. obtained from old logs) can help to reestablish a lost mapping from states to their target operator.
uidHash - The user provided hash for this operator. This will become the JobVertexID, which is shown in the logs and web ui.
public String getUserProvidedNodeHash()
Gets the user provided hash.
public void setUid(String uid)
Sets an ID for this Transformation. This will later be hashed to a uidHash, which is then used to create the JobVertexID (that is shown in logs and the web ui).
The specified ID is used to assign the same operator ID across job submissions (for example when starting a job from a savepoint).
Important: this ID needs to be unique per transformation and job. Otherwise, job submission will fail.
uid - The unique user-specified ID of this transformation.
public String getUid()
Returns the user-specified ID of this transformation.
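Two properties of user-specified IDs described above lend themselves to a small sketch: the same uid always maps to the same hash, which is what lets state be matched across submissions, and a duplicate uid within one job is rejected. The class `UidSketch` is hypothetical, and MD5 is used only as a convenient stand-in for a deterministic hash; the source does not say which hash function Flink applies.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical illustration of uid handling; not Flink code. */
final class UidSketch {
    private final Set<String> uidsInJob = new HashSet<>();

    /** Registers a uid for one job and fails on duplicates. */
    void register(String uid) {
        if (!uidsInJob.add(uid)) {
            throw new IllegalStateException("Duplicate uid in job: " + uid);
        }
    }

    /** Deterministic stand-in hash (MD5 here is an assumption, not Flink's choice). */
    static String toyHash(String uid) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(uid.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }
}
```

Because `toyHash` depends only on the uid string, resubmitting a job with unchanged uids yields unchanged hashes, whereas a hash derived from the job's structure could change whenever the job graph changes.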
public Optional<SlotSharingGroup> getSlotSharingGroup()
Returns the slot sharing group of this transformation if present.
See Also: setSlotSharingGroup(SlotSharingGroup)
public void setSlotSharingGroup(String slotSharingGroupName)
Sets the slot sharing group of this transformation.
Initially, an operation is in the default slot sharing group. This can be explicitly set using setSlotSharingGroup("default").
slotSharingGroupName - The slot sharing group's name.
public void setSlotSharingGroup(SlotSharingGroup slotSharingGroup)
Sets the slot sharing group of this transformation.
Initially, an operation is in the default slot sharing group. This can be explicitly set by constructing a SlotSharingGroup with name "default".
slotSharingGroup - The slot sharing group, which contains the name and its resource spec.
public void setCoLocationGroupKey(@Nullable String coLocationGroupKey)
Sets the key that identifies the co-location group. Operators with the same co-location key will have their corresponding subtasks placed into the same slot by the scheduler.
Setting this to null means there is no co-location constraint.
@Nullable public String getCoLocationGroupKey()
Gets the key that identifies the co-location group. Operators with the same co-location key will have their corresponding subtasks placed into the same slot by the scheduler.
If this is null (which is the default), it means there is no co-location constraint.
public void setOutputType(TypeInformation<T> outputType)
Tries to fill in the type information.
outputType - The type information to fill in.
IllegalStateException - Thrown, if the type information has been accessed before.
public TypeInformation<T> getOutputType()
Returns the output type of this Transformation as a TypeInformation. Once this has been called, the output type can no longer be changed using setOutputType(org.apache.flink.api.common.typeinfo.TypeInformation<T>).
Returns: The output type of this Transformation
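The interplay between setOutputType and getOutputType amounts to a "freeze on first read" contract. The following sketch models that contract in isolation. `FrozenTypeSketch` is a hypothetical class, a plain String stands in for TypeInformation, and the `typeUsed` flag is only assumed to play the role suggested by the field of the same name listed above.

```java
/** Hypothetical model of a value that becomes immutable once it has been read. */
final class FrozenTypeSketch {
    private String outputType;  // stand-in for TypeInformation<T>
    private boolean typeUsed;   // set on first read, mirroring the documented field name

    FrozenTypeSketch(String outputType) {
        this.outputType = outputType;
    }

    /** Allowed only while nobody has read the type yet. */
    void setOutputType(String newType) {
        if (typeUsed) {
            throw new IllegalStateException(
                    "The output type was already accessed and can no longer be changed.");
        }
        this.outputType = newType;
    }

    /** Reading the type freezes it. */
    String getOutputType() {
        typeUsed = true;
        return outputType;
    }
}
```

The point of such a design is that any consumer that has already observed the type can rely on it staying the same for the rest of the program.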
public void setBufferTimeout(long bufferTimeout)
Set the buffer timeout of this Transformation. The timeout defines how long data may linger in a partially full buffer before being sent over the network.
Lower timeouts lead to lower tail latencies, but may affect throughput. For Flink 1.5+, timeouts of 1ms are feasible for jobs with high parallelism.
A value of -1 means that the default buffer timeout should be used. A value of zero indicates that no buffering should happen, and all records/events should be immediately sent through the network, without additional buffering.
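The special values -1 and 0 can be summarized in a few lines of code. This is a sketch of the documented semantics only: `BufferTimeoutSketch` is a hypothetical helper, the default timeout is passed in as a parameter because the text does not state its value, and rejecting values below -1 is an assumption of the sketch.

```java
/** Hypothetical helper that interprets a configured buffer timeout. */
final class BufferTimeoutSketch {
    static final long USE_DEFAULT = -1L;
    static final long NO_BUFFERING = 0L;

    /** Resolves the timeout that would effectively apply, in milliseconds. */
    static long effectiveTimeout(long configured, long defaultTimeoutMillis) {
        if (configured < USE_DEFAULT) {
            // Assumption: values below -1 have no defined meaning and are rejected.
            throw new IllegalArgumentException(
                    "Buffer timeout must be -1, 0, or positive: " + configured);
        }
        return configured == USE_DEFAULT ? defaultTimeoutMillis : configured;
    }

    /** True if every record is sent immediately, without additional buffering. */
    static boolean flushesAfterEveryRecord(long effectiveTimeout) {
        return effectiveTimeout == NO_BUFFERING;
    }
}
```

For example, with a hypothetical default of 100 ms, a configured value of -1 resolves to 100, a value of 5 stays 5, and a value of 0 means every record is flushed immediately.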
public long getBufferTimeout()
Returns the buffer timeout of this Transformation.
See Also: setBufferTimeout(long)
protected abstract List<Transformation<?>> getTransitivePredecessorsInternal()
Returns all transitive predecessor Transformations of this Transformation.
This is, for example, used when determining whether a feedback edge of an iteration actually has the iteration head as a predecessor.
public final List<Transformation<?>> getTransitivePredecessors()
Returns all transitive predecessor Transformations of this Transformation.
This is, for example, used when determining whether a feedback edge of an iteration actually has the iteration head as a predecessor. This method is just a public wrapper on top of the getTransitivePredecessorsInternal method. It uses caching internally.
public abstract List<Transformation<?>> getInputs()
Returns the transformations that are the immediate predecessors of the current transformation in the transformation graph.
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.