implicit class DataSetUtils[T] extends AnyRef
This class provides simple utility methods for zipping elements in a data set with an index or with a unique identifier, sampling elements from a data set.
- Annotations
- @PublicEvolving()
- Deprecated
All Flink Scala APIs are deprecated and will be removed in a future Flink major version. You can still build your application in Scala, but you should move to the Java version of either the DataStream and/or Table API.
- See also
- Alphabetic
- By Inheritance
- DataSetUtils
- AnyRef
- Any
- Hide All
- Show All
- Public
- All
Instance Constructors
-
new
DataSetUtils(self: DataSet[T])(implicit arg0: TypeInformation[T], arg1: ClassTag[T])
- self
Data Set
- Deprecated
All Flink Scala APIs are deprecated and will be removed in a future Flink major version. You can still build your application in Scala, but you should move to the Java version of either the DataStream and/or Table API.
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
checksumHashCode(): ChecksumHashCode
Convenience method to get the count (number of elements) of a DataSet as well as the checksum (sum over element hashes).
Convenience method to get the count (number of elements) of a DataSet as well as the checksum (sum over element hashes).
- returns
A ChecksumHashCode with the count and checksum of elements in the data set.
- See also
org.apache.flink.api.java.Utils.ChecksumHashCodeHelper
-
def
clone(): AnyRef
- Attributes
- protected[java.lang]
- Definition Classes
- AnyRef
- Annotations
- @native() @throws( ... )
-
def
countElementsPerPartition: DataSet[(Int, Long)]
Method that goes over all the elements in each partition in order to retrieve the total number of elements.
Method that goes over all the elements in each partition in order to retrieve the total number of elements.
- returns
a data set of tuple2 consisting of (subtask index, number of elements mappings)
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
finalize(): Unit
- Attributes
- protected[java.lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
def
partitionByRange[K](distribution: DataDistribution, fun: (T) ⇒ K)(implicit arg0: TypeInformation[K]): DataSet[T]
Range-partitions a DataSet using the specified key selector function.
-
def
partitionByRange(distribution: DataDistribution, firstField: String, otherFields: String*): DataSet[T]
Range-partitions a DataSet on the specified fields.
-
def
partitionByRange(distribution: DataDistribution, fields: Int*): DataSet[T]
Range-partitions a DataSet on the specified tuple field positions.
-
def
sample(withReplacement: Boolean, fraction: Double, seed: Long = Utils.RNG.nextLong()): DataSet[T]
Generate a sample of DataSet by the probability fraction of each element.
Generate a sample of DataSet by the probability fraction of each element.
- withReplacement
Whether element can be selected more than once.
- fraction
Probability that each element is chosen, should be [0,1] without replacement, and [0, ∞) with replacement. While fraction is larger than 1, the elements are expected to be selected multi times into sample on average.
- seed
Random number generator seed.
- returns
The sampled DataSet
-
def
sampleWithSize(withReplacement: Boolean, numSamples: Int, seed: Long = Utils.RNG.nextLong()): DataSet[T]
Generate a sample of DataSet with fixed sample size.
Generate a sample of DataSet with fixed sample size.
NOTE: Sample with fixed size is not as efficient as sample with fraction, use sample with fraction unless you need exact precision.
- withReplacement
Whether element can be selected more than once.
- numSamples
The expected sample size.
- seed
Random number generator seed.
- returns
The sampled DataSet
- val self: DataSet[T]
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @throws( ... )
-
def
zipWithIndex: DataSet[(Long, T)]
Method that takes a set of subtask index, total number of elements mappings and assigns ids to all the elements from the input data set.
Method that takes a set of subtask index, total number of elements mappings and assigns ids to all the elements from the input data set.
- returns
a data set of tuple 2 consisting of consecutive ids and initial values.
-
def
zipWithUniqueId: DataSet[(Long, T)]
Method that assigns a unique id to all the elements of the input data set.
Method that assigns a unique id to all the elements of the input data set.
- returns
a data set of tuple 2 consisting of ids and initial values.