DataGeneratorSource (Flink : 2.0-SNAPSHOT API)

java.lang.Object
- org.apache.flink.connector.datagen.source.DataGeneratorSource<OUT>

All Implemented Interfaces:

Serializable, Source<OUT,NumberSequenceSource.NumberSequenceSplit,Collection<NumberSequenceSource.NumberSequenceSplit>>, SourceReaderFactory<OUT,NumberSequenceSource.NumberSequenceSplit>, ResultTypeQueryable<OUT>, OutputTypeConfigurable<OUT>
```
@Experimental
public class DataGeneratorSource<OUT>
extends Object
implements Source<OUT,NumberSequenceSource.NumberSequenceSplit,Collection<NumberSequenceSource.NumberSequenceSplit>>, ResultTypeQueryable<OUT>, OutputTypeConfigurable<OUT>
```
A data source that produces N data points in parallel. This source is useful for testing and for cases that just need a stream of N events of any kind.
The source splits the sequence into as many parallel sub-sequences as there are parallel source readers.
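As a rough illustration of this splitting, the following plain-Java sketch divides the index range 0..count-1 into contiguous sub-ranges, one per reader. It is a sketch under stated assumptions, not Flink's implementation: the class `SplitSketch`, the method `splitRange`, and the choice to give the first sub-ranges one extra element when the count does not divide evenly are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; Flink's actual split logic may differ.
class SplitSketch {

    /** Splits [0, count - 1] into at most `readers` contiguous {from, to} ranges (both inclusive). */
    static List<long[]> splitRange(long count, int readers) {
        if (count < 0 || readers <= 0) {
            throw new IllegalArgumentException("count must be >= 0 and readers must be > 0");
        }
        List<long[]> ranges = new ArrayList<>();
        long base = count / readers;      // minimum size of every sub-range
        long remainder = count % readers; // the first `remainder` sub-ranges get one extra element
        long from = 0;
        for (int i = 0; i < readers; i++) {
            long size = base + (i < remainder ? 1 : 0);
            if (size > 0) {
                // Readers that would receive zero elements get no range at all.
                ranges.add(new long[] {from, from + size - 1});
            }
            from += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 1000 elements across 4 readers: 0..249, 250..499, 500..749, 750..999
        for (long[] range : splitRange(1000, 4)) {
            System.out.println(range[0] + ".." + range[1]);
        }
    }
}
```

Each reader walks its own sub-range in ascending order, which is why ordering holds within a sub-sequence but not across readers.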
Users can supply a GeneratorFunction that maps the (sub-)sequences of Long values to the generated events. For instance, the following code produces the elements "Number: 0", "Number: 1", ..., "Number: 999".
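```
// Obtain the execution environment that the source is attached to.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Maps each generated index to the element that is emitted.
GeneratorFunction<Long, String> generatorFunction = index -> "Number: " + index;

// Emits 1000 elements, for indices 0 to 999.
DataGeneratorSource<String> source =
        new DataGeneratorSource<>(generatorFunction, 1000, Types.STRING);

DataStreamSource<String> stream =
        env.fromSource(source,
        WatermarkStrategy.noWatermarks(),
        "Generator Source");
```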
The order of elements depends on the parallelism. Each sub-sequence will be produced in order. Consequently, if the parallelism is limited to one, this will produce one sequence in order from "Number: 0" to "Number: 999".
Note that this approach also makes it possible to produce deterministic watermarks at the source based on the generated events and a custom WatermarkStrategy.
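One way to read this note: if each event's timestamp is a pure function of its index, the event-time progression is identical on every run. The plain-Java sketch below shows only that idea. `TimestampSketch`, `Event`, `START_MILLIS`, and `STEP_MILLIS` are hypothetical names, and no Flink API is used; in a Flink job, such a timestamp would typically be read by a timestamp assigner configured on the WatermarkStrategy.

```java
import java.util.function.LongFunction;

// Hypothetical illustration: timestamps derived purely from the generated index.
class TimestampSketch {

    /** A generated event that carries its own event-time timestamp. */
    static final class Event {
        final long index;
        final long timestampMillis;

        Event(long index, long timestampMillis) {
            this.index = index;
            this.timestampMillis = timestampMillis;
        }
    }

    // Assumed values for illustration: events start at epoch 0 and are one second apart.
    static final long START_MILLIS = 0L;
    static final long STEP_MILLIS = 1_000L;

    // The same index always yields the same timestamp, so repeated runs agree.
    static final LongFunction<Event> GENERATOR =
            index -> new Event(index, START_MILLIS + index * STEP_MILLIS);

    public static void main(String[] args) {
        Event event = GENERATOR.apply(5);
        System.out.println(event.index + " @ " + event.timestampMillis); // prints "5 @ 5000"
    }
}
```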
This source has built-in support for rate limiting. The following code produces an effectively unbounded stream of Long values (in practice, Long.MAX_VALUE will never be reached) at an overall source rate, across all source subtasks, of 100 events per second.
```
// Identity mapping: each generated index is emitted as-is.
GeneratorFunction<Long, Long> generatorFunction = index -> index;

DataGeneratorSource<Long> source =
        new DataGeneratorSource<>(
             generatorFunction,
             Long.MAX_VALUE,
             RateLimiterStrategy.perSecond(100),
             Types.LONG);
```
This source is always bounded. For very long sequences (for example, when the count is set to Long.MAX_VALUE), users may want to execute the application in streaming mode because, although the produced stream is bounded, its end is too far away to be reached in practice.
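To make the distance to that end bound concrete, the following back-of-the-envelope sketch computes how long a count of Long.MAX_VALUE would take at the overall rate of 100 events per second used in the rate-limiting example. The class and method names are hypothetical, and the per-subtask figure assumes the overall rate is divided evenly across subtasks, which is an assumption made for illustration rather than a statement about Flink's internals.

```java
// Hypothetical illustration: simple arithmetic, no Flink API involved.
class BoundSketch {

    static final long SECONDS_PER_YEAR = 365L * 24 * 60 * 60;

    /** Whole years needed to emit `count` events at `eventsPerSecond` overall. */
    static long yearsToFinish(long count, long eventsPerSecond) {
        return (count / eventsPerSecond) / SECONDS_PER_YEAR;
    }

    /** Rate per subtask, assuming the overall rate is shared evenly (an assumption). */
    static double perSubtaskRate(double overallRate, int parallelism) {
        return overallRate / parallelism;
    }

    public static void main(String[] args) {
        // Roughly 2.9 billion years at 100 events per second.
        System.out.println(yearsToFinish(Long.MAX_VALUE, 100) + " years");
        // With 4 subtasks, each subtask would emit about 25 events per second.
        System.out.println(perSubtaskRate(100, 4) + " events/s per subtask");
    }
}
```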
See Also:

Serialized Form

Constructor Summary

Constructors
Constructor and Description
`DataGeneratorSource(GeneratorFunction<Long,OUT> generatorFunction, long count, RateLimiterStrategy rateLimiterStrategy, TypeInformation<OUT> typeInfo)` Instantiates a new `DataGeneratorSource`.
`DataGeneratorSource(GeneratorFunction<Long,OUT> generatorFunction, long count, TypeInformation<OUT> typeInfo)` Instantiates a new `DataGeneratorSource`.

Method Summary

Modifier and Type	Method and Description
`SplitEnumerator<NumberSequenceSource.NumberSequenceSplit,Collection<NumberSequenceSource.NumberSequenceSplit>>`	`createEnumerator(SplitEnumeratorContext<NumberSequenceSource.NumberSequenceSplit> enumContext)` Creates a new SplitEnumerator for this source, starting a new input.
`SourceReader<OUT,NumberSequenceSource.NumberSequenceSplit>`	`createReader(SourceReaderContext readerContext)` Creates a new reader to read data from the splits it gets assigned.
`Boundedness`	`getBoundedness()` Get the boundedness of this source.
`SimpleVersionedSerializer<Collection<NumberSequenceSource.NumberSequenceSplit>>`	`getEnumeratorCheckpointSerializer()` Creates the serializer for the `SplitEnumerator` checkpoint.
`GeneratorFunction<Long,OUT>`	`getGeneratorFunction()`
`TypeInformation<OUT>`	`getProducedType()` Gets the data type (as a `TypeInformation`) produced by this function or input format.
`SimpleVersionedSerializer<NumberSequenceSource.NumberSequenceSplit>`	`getSplitSerializer()` Creates a serializer for the source splits.
`SplitEnumerator<NumberSequenceSource.NumberSequenceSplit,Collection<NumberSequenceSource.NumberSequenceSplit>>`	`restoreEnumerator(SplitEnumeratorContext<NumberSequenceSource.NumberSequenceSplit> enumContext, Collection<NumberSequenceSource.NumberSequenceSplit> checkpoint)` Restores an enumerator from a checkpoint.
`void`	`setOutputType(TypeInformation<OUT> outTypeInfo, ExecutionConfig executionConfig)` Is called by the `org.apache.flink.streaming.api.graph.StreamGraph#addOperator(Integer, String, StreamOperator, TypeInformation, TypeInformation, String)` method when the `org.apache.flink.streaming.api.graph.StreamGraph` is generated.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - DataGeneratorSource
```
public DataGeneratorSource(GeneratorFunction<Long,OUT> generatorFunction,
                           long count,
                           TypeInformation<OUT> typeInfo)
```
    Instantiates a new DataGeneratorSource.
    
    Parameters:
    
    generatorFunction - The GeneratorFunction that maps each Long index to a generated data point.
    
    count - The number of generated data points.
    
    typeInfo - The type of the produced data points.
  - DataGeneratorSource
```
public DataGeneratorSource(GeneratorFunction<Long,OUT> generatorFunction,
                           long count,
                           RateLimiterStrategy rateLimiterStrategy,
                           TypeInformation<OUT> typeInfo)
```
    Instantiates a new DataGeneratorSource.
    
    Parameters:
    
    generatorFunction - The GeneratorFunction that maps each Long index to a generated data point.
    
    count - The number of generated data points.
    
    rateLimiterStrategy - The strategy for rate limiting.
    
    typeInfo - The type of the produced data points.
- Method Detail
  - setOutputType
```
public void setOutputType(TypeInformation<OUT> outTypeInfo,
                          ExecutionConfig executionConfig)
```
    Description copied from interface: OutputTypeConfigurable
    
    Is called by the org.apache.flink.streaming.api.graph.StreamGraph#addOperator(Integer, String, StreamOperator, TypeInformation, TypeInformation, String) method when the org.apache.flink.streaming.api.graph.StreamGraph is generated. The method is called with the output TypeInformation which is also used for the org.apache.flink.streaming.runtime.tasks.StreamTask output serializer.
    
    Specified by:
    
    setOutputType in interface OutputTypeConfigurable<OUT>
    
    Parameters:
    
    outTypeInfo - Output type information of the org.apache.flink.streaming.runtime.tasks.StreamTask
    
    executionConfig - Execution configuration
  - getGeneratorFunction
```
@VisibleForTesting
public GeneratorFunction<Long,OUT> getGeneratorFunction()
```
  - getProducedType
```
public TypeInformation<OUT> getProducedType()
```
    Description copied from interface: ResultTypeQueryable
    
    Gets the data type (as a TypeInformation) produced by this function or input format.
    
    Specified by:
    
    getProducedType in interface ResultTypeQueryable<OUT>
    
    Returns:
    
    The data type produced by this function or input format.
  - getBoundedness
```
public Boundedness getBoundedness()
```
    Description copied from interface: Source
    
    Get the boundedness of this source.
    
    Specified by:
    
    getBoundedness in interface Source<OUT,NumberSequenceSource.NumberSequenceSplit,Collection<NumberSequenceSource.NumberSequenceSplit>>
    
    Returns:
    
    the boundedness of this source.
  - createReader
```
public SourceReader<OUT,NumberSequenceSource.NumberSequenceSplit> createReader(SourceReaderContext readerContext)
                                                                        throws Exception
```
    Description copied from interface: SourceReaderFactory
    
    Creates a new reader to read data from the splits it gets assigned. The reader starts fresh and does not have any state to resume.
    
    Specified by:
    
    createReader in interface SourceReaderFactory<OUT,NumberSequenceSource.NumberSequenceSplit>
    
    Parameters:
    
    readerContext - The context for the source reader.
    
    Returns:
    
    A new SourceReader.
    
    Throws:
    
    Exception - The implementor is free to forward all exceptions directly. Exceptions thrown from this method cause task failure/recovery.
  - restoreEnumerator
```
public SplitEnumerator<NumberSequenceSource.NumberSequenceSplit,Collection<NumberSequenceSource.NumberSequenceSplit>> restoreEnumerator(SplitEnumeratorContext<NumberSequenceSource.NumberSequenceSplit> enumContext,
                                                                                                                                        Collection<NumberSequenceSource.NumberSequenceSplit> checkpoint)
```
    Description copied from interface: Source
    
    Restores an enumerator from a checkpoint.
    
    Specified by:
    
    restoreEnumerator in interface Source<OUT,NumberSequenceSource.NumberSequenceSplit,Collection<NumberSequenceSource.NumberSequenceSplit>>
    
    Parameters:
    
    enumContext - The context for the restored split enumerator.
    
    checkpoint - The checkpoint to restore the SplitEnumerator from.
    
    Returns:
    
    A SplitEnumerator restored from the given checkpoint.
  - createEnumerator
```
public SplitEnumerator<NumberSequenceSource.NumberSequenceSplit,Collection<NumberSequenceSource.NumberSequenceSplit>> createEnumerator(SplitEnumeratorContext<NumberSequenceSource.NumberSequenceSplit> enumContext)
```
    Description copied from interface: Source
    
    Creates a new SplitEnumerator for this source, starting a new input.
    
    Specified by:
    
    createEnumerator in interface Source<OUT,NumberSequenceSource.NumberSequenceSplit,Collection<NumberSequenceSource.NumberSequenceSplit>>
    
    Parameters:
    
    enumContext - The context for the split enumerator.
    
    Returns:
    
    A new SplitEnumerator.
  - getSplitSerializer
```
public SimpleVersionedSerializer<NumberSequenceSource.NumberSequenceSplit> getSplitSerializer()
```
    Description copied from interface: Source
    
    Creates a serializer for the source splits. Splits are serialized when sending them from enumerator to reader, and when checkpointing the reader's current state.
    
    Specified by:
    
    getSplitSerializer in interface Source<OUT,NumberSequenceSource.NumberSequenceSplit,Collection<NumberSequenceSource.NumberSequenceSplit>>
    
    Returns:
    
    The serializer for the split type.
  - getEnumeratorCheckpointSerializer
```
public SimpleVersionedSerializer<Collection<NumberSequenceSource.NumberSequenceSplit>> getEnumeratorCheckpointSerializer()
```
    Description copied from interface: Source
    
    Creates the serializer for the SplitEnumerator checkpoint. The serializer is used for the result of the SplitEnumerator.snapshotState(long) method.
    
    Specified by:
    
    getEnumeratorCheckpointSerializer in interface Source<OUT,NumberSequenceSource.NumberSequenceSplit,Collection<NumberSequenceSource.NumberSequenceSplit>>
    
    Returns:
    
    The serializer for the SplitEnumerator checkpoint.
