Class Plan
- java.lang.Object
-
- org.apache.flink.api.common.Plan
-
-
Field Summary
Fields Modifier and Type Field Description protected HashMap<String,DistributedCache.DistributedCacheEntry>
cacheFile
Hash map for files in the distributed cache: registered name to cache entry.protected int
defaultParallelism
The default parallelism to use for nodes that have no explicitly specified parallelism.protected ExecutionConfig
executionConfig
Config object for runtime execution parameters.protected String
jobName
The name of the job.protected List<GenericDataSinkBase<?>>
sinks
A collection of all sinks in the plan.
-
Constructor Summary
Constructors Constructor Description Plan(Collection<? extends GenericDataSinkBase<?>> sinks)
Creates a new program plan, describing the data flow that ends at the given data sinks.Plan(Collection<? extends GenericDataSinkBase<?>> sinks, int defaultParallelism)
Creates a new program plan with the given default parallelism, describing the data flow that ends at the given data sinks.Plan(Collection<? extends GenericDataSinkBase<?>> sinks, String jobName)
Creates a new dataflow plan with the given name, describing the data flow that ends at the given data sinks.Plan(Collection<? extends GenericDataSinkBase<?>> sinks, String jobName, int defaultParallelism)
Creates a new program plan with the given name and default parallelism, describing the data flow that ends at the given data sinks.Plan(GenericDataSinkBase<?> sink)
Creates a new program plan with single data sink.Plan(GenericDataSinkBase<?> sink, int defaultParallelism)
Creates a new program plan with single data sink and the given default parallelism.Plan(GenericDataSinkBase<?> sink, String jobName)
Creates a new program plan with the given name, containing initially a single data sink.Plan(GenericDataSinkBase<?> sink, String jobName, int defaultParallelism)
Creates a new program plan with the given name and default parallelism, containing initially a single data sink.
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description void
accept(Visitor<Operator<?>> visitor)
Traverses the job depth first from all data sinks on towards the sources.void
addDataSink(GenericDataSinkBase<?> sink)
Adds a data sink to the set of sinks in this program.Set<Map.Entry<String,DistributedCache.DistributedCacheEntry>>
getCachedFiles()
Return the registered cached files.Collection<? extends GenericDataSinkBase<?>>
getDataSinks()
Gets all the data sinks of this job.int
getDefaultParallelism()
Gets the default parallelism for this job.ExecutionConfig
getExecutionConfig()
Gets the execution config object.Configuration
getJobConfiguration()
JobID
getJobId()
Gets the ID of the job that the dataflow plan belongs to.String
getJobName()
Gets the name of this job.int
getMaximumParallelism()
String
getPostPassClassName()
Gets the optimizer post-pass class for this job.void
registerCachedFile(String name, DistributedCache.DistributedCacheEntry entry)
Register cache files at program level.void
setDefaultParallelism(int defaultParallelism)
Sets the default parallelism for this plan.void
setExecutionConfig(ExecutionConfig executionConfig)
Sets the runtime config object defining execution parameters.void
setJobConfiguration(Configuration jobConfiguration)
void
setJobId(JobID jobId)
Sets the ID of the job that the dataflow plan belongs to.void
setJobName(String jobName)
Sets the jobName for this Plan.
-
-
-
Field Detail
-
sinks
protected final List<GenericDataSinkBase<?>> sinks
A collection of all sinks in the plan. Since the plan is traversed from the sinks to the sources, this collection must contain all the sinks.
-
jobName
protected String jobName
The name of the job.
-
defaultParallelism
protected int defaultParallelism
The default parallelism to use for nodes that have no explicitly specified parallelism.
-
cacheFile
protected HashMap<String,DistributedCache.DistributedCacheEntry> cacheFile
Hash map for files in the distributed cache: registered name to cache entry.
-
executionConfig
protected ExecutionConfig executionConfig
Config object for runtime execution parameters.
-
-
Constructor Detail
-
Plan
public Plan(Collection<? extends GenericDataSinkBase<?>> sinks, String jobName)
Creates a new dataflow plan with the given name, describing the data flow that ends at the given data sinks.If not all of the sinks of a data flow are given to the plan, the flow might not be translated entirely.
- Parameters:
sinks
- The collection will the sinks of the job's data flow.jobName
- The name to display for the job.
-
Plan
public Plan(Collection<? extends GenericDataSinkBase<?>> sinks, String jobName, int defaultParallelism)
Creates a new program plan with the given name and default parallelism, describing the data flow that ends at the given data sinks.If not all of the sinks of a data flow are given to the plan, the flow might not be translated entirely.
- Parameters:
sinks
- The collection will the sinks of the job's data flow.jobName
- The name to display for the job.defaultParallelism
- The default parallelism for the job.
-
Plan
public Plan(GenericDataSinkBase<?> sink, String jobName)
Creates a new program plan with the given name, containing initially a single data sink.If not all of the sinks of a data flow are given, the flow might not be translated entirely, but only the parts of the flow reachable by traversing backwards from the given data sinks.
- Parameters:
sink
- The data sink of the data flow.jobName
- The name to display for the job.
-
Plan
public Plan(GenericDataSinkBase<?> sink, String jobName, int defaultParallelism)
Creates a new program plan with the given name and default parallelism, containing initially a single data sink.If not all of the sinks of a data flow are given, the flow might not be translated entirely, but only the parts of the flow reachable by traversing backwards from the given data sinks.
- Parameters:
sink
- The data sink of the data flow.jobName
- The name to display for the job.defaultParallelism
- The default parallelism for the job.
-
Plan
public Plan(Collection<? extends GenericDataSinkBase<?>> sinks)
Creates a new program plan, describing the data flow that ends at the given data sinks. The display name for the job is generated using a timestamp.If not all of the sinks of a data flow are given, the flow might not be translated entirely, but only the parts of the flow reachable by traversing backwards from the given data sinks.
- Parameters:
sinks
- The collection will the sinks of the data flow.
-
Plan
public Plan(Collection<? extends GenericDataSinkBase<?>> sinks, int defaultParallelism)
Creates a new program plan with the given default parallelism, describing the data flow that ends at the given data sinks. The display name for the job is generated using a timestamp.If not all of the sinks of a data flow are given, the flow might not be translated entirely, but only the parts of the flow reachable by traversing backwards from the given data sinks.
- Parameters:
sinks
- The collection will the sinks of the data flow.defaultParallelism
- The default parallelism for the job.
-
Plan
public Plan(GenericDataSinkBase<?> sink)
Creates a new program plan with single data sink. The display name for the job is generated using a timestamp.If not all of the sinks of a data flow are given to the plan, the flow might not be translated entirely.
- Parameters:
sink
- The data sink of the data flow.
-
Plan
public Plan(GenericDataSinkBase<?> sink, int defaultParallelism)
Creates a new program plan with single data sink and the given default parallelism. The display name for the job is generated using a timestamp.If not all of the sinks of a data flow are given to the plan, the flow might not be translated entirely.
- Parameters:
sink
- The data sink of the data flow.defaultParallelism
- The default parallelism for the job.
-
-
Method Detail
-
addDataSink
public void addDataSink(GenericDataSinkBase<?> sink)
Adds a data sink to the set of sinks in this program.- Parameters:
sink
- The data sink to add.
-
getDataSinks
public Collection<? extends GenericDataSinkBase<?>> getDataSinks()
Gets all the data sinks of this job.- Returns:
- All sinks of the program.
-
getJobName
public String getJobName()
Gets the name of this job.- Returns:
- The name of the job.
-
setJobName
public void setJobName(String jobName)
Sets the jobName for this Plan.- Parameters:
jobName
- The jobName to set.
-
getJobId
public JobID getJobId()
Gets the ID of the job that the dataflow plan belongs to. If this ID is not set, then the dataflow represents its own independent job.- Returns:
- The ID of the job that the dataflow plan belongs to.
-
setJobId
public void setJobId(JobID jobId)
Sets the ID of the job that the dataflow plan belongs to. If this ID is set tonull
, then the dataflow represents its own independent job.- Parameters:
jobId
- The ID of the job that the dataflow plan belongs to.
-
getDefaultParallelism
public int getDefaultParallelism()
Gets the default parallelism for this job. That degree is always used when an operator is not explicitly given a parallelism.- Returns:
- The default parallelism for the plan.
-
setDefaultParallelism
public void setDefaultParallelism(int defaultParallelism)
Sets the default parallelism for this plan. That degree is always used when an operator is not explicitly given a parallelism.- Parameters:
defaultParallelism
- The default parallelism for the plan.
-
getPostPassClassName
public String getPostPassClassName()
Gets the optimizer post-pass class for this job. The post-pass typically creates utility classes for data types and is specific to a particular data model (record, tuple, Scala, ...)- Returns:
- The name of the class implementing the optimizer post-pass.
-
getExecutionConfig
public ExecutionConfig getExecutionConfig()
Gets the execution config object.- Returns:
- The execution config object.
-
setExecutionConfig
public void setExecutionConfig(ExecutionConfig executionConfig)
Sets the runtime config object defining execution parameters.- Parameters:
executionConfig
- The execution config to use.
-
accept
public void accept(Visitor<Operator<?>> visitor)
Traverses the job depth first from all data sinks on towards the sources.- Specified by:
accept
in interfaceVisitable<Operator<?>>
- Parameters:
visitor
- The visitor to be called with this object as the parameter.- See Also:
Visitable.accept(Visitor)
-
registerCachedFile
public void registerCachedFile(String name, DistributedCache.DistributedCacheEntry entry) throws IOException
Register cache files at program level.- Parameters:
entry
- contains all relevant informationname
- user defined name of that file- Throws:
IOException
-
getCachedFiles
public Set<Map.Entry<String,DistributedCache.DistributedCacheEntry>> getCachedFiles()
Return the registered cached files.- Returns:
- Set of (name, filePath) pairs
-
getMaximumParallelism
public int getMaximumParallelism()
-
setJobConfiguration
public void setJobConfiguration(Configuration jobConfiguration)
-
getJobConfiguration
public Configuration getJobConfiguration()
-
-