public abstract class OptimizerNode extends Object implements Visitable<OptimizerNode>, EstimateProvider, DumpableNode<OptimizerNode>
Nodes in the DAG correspond (almost) one-to-one to the operators in a program. The optimizer DAG is constructed to hold the additional information that the optimizer needs:
Modifier and Type | Class and Description |
---|---|
static class |
OptimizerNode.UnclosedBranchDescriptor
Description of an unclosed branch.
|
Modifier and Type | Field and Description |
---|---|
protected List<PlanNode> |
cachedPlans |
protected Set<OptimizerNode> |
closedBranchingNodes |
protected int |
costWeight |
protected long |
estimatedNumRecords |
protected long |
estimatedOutputSize |
protected List<OptimizerNode> |
hereJoinedBranches |
protected int |
id |
static int |
MAX_DYNAMIC_PATH_COST_WEIGHT |
protected boolean |
onDynamicPath |
protected List<OptimizerNode.UnclosedBranchDescriptor> |
openBranches |
protected Set<FieldSet> |
uniqueFields |
Modifier | Constructor and Description |
---|---|
|
OptimizerNode(Operator<?> op)
Creates a new optimizer node that represents the given program operator.
|
protected |
OptimizerNode(OptimizerNode toCopy) |
Modifier and Type | Method and Description |
---|---|
abstract void |
accept(Visitor<OptimizerNode> visitor)
This method implements the visit of a depth-first graph traversing visitor.
|
void |
addBroadcastConnection(String name,
DagConnection broadcastConnection)
Adds the broadcast connection identified by the given
name to this node. |
protected void |
addClosedBranch(OptimizerNode alreadyClosed) |
protected void |
addClosedBranches(Set<OptimizerNode> alreadyClosed) |
void |
addOutgoingConnection(DagConnection connection)
Adds a new outgoing connection to this node.
|
protected boolean |
areBranchCompatible(PlanNode plan1,
PlanNode plan2)
Checks whether to candidate plans for the sub-plan of this node are comparable.
|
void |
clearInterestingProperties() |
abstract void |
computeInterestingPropertiesForInputs(CostEstimator estimator)
Tells the node to compute the interesting properties for its inputs.
|
protected abstract void |
computeOperatorSpecificDefaultEstimates(DataStatistics statistics) |
void |
computeOutputEstimates(DataStatistics statistics)
Causes this node to compute its output estimates (such as number of rows, size in bytes)
based on the inputs and the compiler hints.
|
abstract void |
computeUnclosedBranchStack()
This method causes the node to compute the description of open branches in its sub-plan.
|
protected List<OptimizerNode.UnclosedBranchDescriptor> |
computeUnclosedBranchStackForBroadcastInputs(List<OptimizerNode.UnclosedBranchDescriptor> branchesSoFar) |
void |
computeUnionOfInterestingPropertiesFromSuccessors()
Computes all the interesting properties that are relevant to this node.
|
abstract List<PlanNode> |
getAlternativePlans(CostEstimator estimator)
Computes the plan alternatives for this node, an implicitly for all nodes that are children
of this node.
|
protected List<OptimizerNode.UnclosedBranchDescriptor> |
getBranchesForParent(DagConnection toParent) |
List<String> |
getBroadcastConnectionNames()
Return the list of names associated with broadcast inputs for this node.
|
List<DagConnection> |
getBroadcastConnections()
Return the list of inputs associated with broadcast variables for this node.
|
Set<OptimizerNode> |
getClosedBranchingNodes() |
int |
getCostWeight() |
Iterable<DumpableConnection<OptimizerNode>> |
getDumpableInputs() |
float |
getEstimatedAvgWidthPerOutputRecord()
Gets the estimated number of bytes per record.
|
long |
getEstimatedNumRecords()
Gets the estimated number of records in the output of this node.
|
long |
getEstimatedOutputSize()
Gets the estimated output size from this node.
|
int |
getId()
Gets the ID of this node.
|
abstract List<DagConnection> |
getIncomingConnections()
Gets all incoming connections of this node.
|
InterestingProperties |
getInterestingProperties()
Gets the properties that are interesting for this node to produce.
|
int |
getMaxDepth() |
long |
getMinimalMemoryAcrossAllSubTasks()
Gets the amount of memory that all subtasks of this task have jointly available.
|
List<OptimizerNode.UnclosedBranchDescriptor> |
getOpenBranches() |
Operator<?> |
getOperator()
Gets the operator represented by this optimizer node.
|
abstract String |
getOperatorName()
Gets the name of this node, which is the name of the function/operator, or data source / data
sink.
|
OptimizerNode |
getOptimizerNode() |
List<DagConnection> |
getOutgoingConnections()
The list of outgoing connections from this node to succeeding tasks.
|
int |
getParallelism()
Gets the parallelism for the operator represented by this optimizer node.
|
PlanNode |
getPlanNode() |
Iterable<OptimizerNode> |
getPredecessors()
Gets an iterator over the predecessors.
|
abstract SemanticProperties |
getSemanticProperties() |
Set<FieldSet> |
getUniqueFields()
Gets the FieldSets which are unique in the output of the node.
|
boolean |
hasUnclosedBranches() |
boolean |
haveAllOutputConnectionInterestingProperties()
Checks, if all outgoing connections have their interesting properties set from their target
nodes.
|
void |
identifyDynamicPath(int costWeight) |
void |
initId(int id)
Sets the ID of this node.
|
boolean |
isBranching()
Checks whether this node has branching output.
|
boolean |
isOnDynamicPath() |
void |
markAllOutgoingConnectionsAsPipelineBreaking() |
protected boolean |
mergeLists(List<OptimizerNode.UnclosedBranchDescriptor> child1open,
List<OptimizerNode.UnclosedBranchDescriptor> child2open,
List<OptimizerNode.UnclosedBranchDescriptor> result,
boolean markJoinedBranchesAsPipelineBreaking)
The node IDs are assigned in graph-traversal order (pre-order), hence, each list is sorted by
ID in ascending order and all consecutive lists start with IDs in ascending order.
|
protected void |
prunePlanAlternatives(List<PlanNode> plans) |
protected void |
prunePlanAlternativesWithCommonBranching(List<PlanNode> plans) |
protected void |
readStubAnnotations()
Reads all stub annotations, i.e. which fields remain constant, what cardinality bounds the
functions have, which fields remain unique.
|
protected void |
readUniqueFieldsAnnotation() |
protected void |
removeClosedBranches(List<OptimizerNode.UnclosedBranchDescriptor> openList) |
void |
setBroadcastInputs(Map<Operator<?>,OptimizerNode> operatorToNode,
ExecutionMode defaultExchangeMode)
This function connects the operators that produce the broadcast inputs to this operator.
|
void |
setEstimatedNumRecords(long estimatedNumRecords) |
void |
setEstimatedOutputSize(long estimatedOutputSize) |
abstract void |
setInput(Map<Operator<?>,OptimizerNode> operatorToNode,
ExecutionMode defaultExchangeMode)
This function connects the predecessors to this operator.
|
void |
setParallelism(int parallelism)
Sets the parallelism for this optimizer node.
|
String |
toString() |
public static final int MAX_DYNAMIC_PATH_COST_WEIGHT
protected List<OptimizerNode.UnclosedBranchDescriptor> openBranches
protected Set<OptimizerNode> closedBranchingNodes
protected List<OptimizerNode> hereJoinedBranches
protected long estimatedOutputSize
protected long estimatedNumRecords
protected int id
protected int costWeight
protected boolean onDynamicPath
public OptimizerNode(Operator<?> op)
op
- The operator that the node represents.protected OptimizerNode(OptimizerNode toCopy)
public abstract String getOperatorName()
public abstract void setInput(Map<Operator<?>,OptimizerNode> operatorToNode, ExecutionMode defaultExchangeMode)
operatorToNode
- The map from program operators to optimizer nodes.defaultExchangeMode
- The data exchange mode to use, if the operator does not specify
one.public void setBroadcastInputs(Map<Operator<?>,OptimizerNode> operatorToNode, ExecutionMode defaultExchangeMode)
operatorToNode
- The map from program operators to optimizer nodes.defaultExchangeMode
- The data exchange mode to use, if the operator does not specify
one.CompilerException
public abstract List<DagConnection> getIncomingConnections()
public abstract void computeInterestingPropertiesForInputs(CostEstimator estimator)
estimator
- The CostEstimator
instance to use for plan cost estimation.public abstract void computeUnclosedBranchStack()
openBranches
field to a stack of unclosed branches, the latest one top. A branch is considered closed, if
some later node sees all of the branching node's outputs, no matter if there have been more
branches to different paths in the meantime.protected List<OptimizerNode.UnclosedBranchDescriptor> computeUnclosedBranchStackForBroadcastInputs(List<OptimizerNode.UnclosedBranchDescriptor> branchesSoFar)
public abstract List<PlanNode> getAlternativePlans(CostEstimator estimator)
getAlternatives()
on its
children to get their plan alternatives, and build its own alternatives on top of those.estimator
- The cost estimator used to estimate the costs of each plan alternative.public abstract void accept(Visitor<OptimizerNode> visitor)
preVisit()
method, then hand the visitor to their children, and
finally call the postVisit()
method.accept
in interface Visitable<OptimizerNode>
visitor
- The graph traversing visitor.Visitable.accept(org.apache.flink.util.Visitor)
public abstract SemanticProperties getSemanticProperties()
public Iterable<OptimizerNode> getPredecessors()
DumpableNode
getPredecessors
in interface DumpableNode<OptimizerNode>
public int getId()
public void initId(int id)
id
- The id for this node.public void addBroadcastConnection(String name, DagConnection broadcastConnection)
name
to this node.broadcastConnection
- The connection to add.public List<String> getBroadcastConnectionNames()
public List<DagConnection> getBroadcastConnections()
public void addOutgoingConnection(DagConnection connection)
connection
- The connection to add.public List<DagConnection> getOutgoingConnections()
public Operator<?> getOperator()
public int getParallelism()
ExecutionConfig.PARALLELISM_DEFAULT
then the system will take the
default number of parallel instances.public void setParallelism(int parallelism)
parallelism
- The parallelism to set. If this value is ExecutionConfig.PARALLELISM_DEFAULT
then the system will take the default number of
parallel instances.IllegalArgumentException
- If the parallelism is smaller than one.public long getMinimalMemoryAcrossAllSubTasks()
public boolean isOnDynamicPath()
public void identifyDynamicPath(int costWeight)
public int getCostWeight()
public int getMaxDepth()
public InterestingProperties getInterestingProperties()
public long getEstimatedOutputSize()
EstimateProvider
getEstimatedOutputSize
in interface EstimateProvider
public long getEstimatedNumRecords()
EstimateProvider
getEstimatedNumRecords
in interface EstimateProvider
public void setEstimatedOutputSize(long estimatedOutputSize)
public void setEstimatedNumRecords(long estimatedNumRecords)
public float getEstimatedAvgWidthPerOutputRecord()
EstimateProvider
getEstimatedAvgWidthPerOutputRecord
in interface EstimateProvider
public boolean isBranching()
public void markAllOutgoingConnectionsAsPipelineBreaking()
public boolean haveAllOutputConnectionInterestingProperties()
public void computeUnionOfInterestingPropertiesFromSuccessors()
public void clearInterestingProperties()
public void computeOutputEstimates(DataStatistics statistics)
statistics
- The statistics object which may be accessed to get statistical information.
The parameter may be null, if no statistics are available.protected abstract void computeOperatorSpecificDefaultEstimates(DataStatistics statistics)
protected void readStubAnnotations()
protected void readUniqueFieldsAnnotation()
public Set<FieldSet> getUniqueFields()
protected void prunePlanAlternativesWithCommonBranching(List<PlanNode> plans)
public boolean hasUnclosedBranches()
public Set<OptimizerNode> getClosedBranchingNodes()
public List<OptimizerNode.UnclosedBranchDescriptor> getOpenBranches()
protected List<OptimizerNode.UnclosedBranchDescriptor> getBranchesForParent(DagConnection toParent)
protected void removeClosedBranches(List<OptimizerNode.UnclosedBranchDescriptor> openList)
protected void addClosedBranches(Set<OptimizerNode> alreadyClosed)
protected void addClosedBranch(OptimizerNode alreadyClosed)
protected boolean areBranchCompatible(PlanNode plan1, PlanNode plan2)
a) There is no branch in the sub-plan of this node b) Both candidates have the same candidate as the child at the last open branch.
plan1
- The root node of the first candidate plan.plan2
- The root node of the second candidate plan.protected final boolean mergeLists(List<OptimizerNode.UnclosedBranchDescriptor> child1open, List<OptimizerNode.UnclosedBranchDescriptor> child2open, List<OptimizerNode.UnclosedBranchDescriptor> result, boolean markJoinedBranchesAsPipelineBreaking)
markJoinedBranchesAsPipelineBreaking
- True, if thepublic OptimizerNode getOptimizerNode()
getOptimizerNode
in interface DumpableNode<OptimizerNode>
public PlanNode getPlanNode()
getPlanNode
in interface DumpableNode<OptimizerNode>
public Iterable<DumpableConnection<OptimizerNode>> getDumpableInputs()
getDumpableInputs
in interface DumpableNode<OptimizerNode>
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.