Type Parameters:
T - The type of the sampler.

@Internal
public class ReservoirSamplerWithoutReplacement<T>
extends DistributedRandomSampler<T>
The algorithm is implemented using the DistributedRandomSampler interface. In the first phase, we generate
random numbers as the weights for each element and select the top K elements as the output of each partition.
In the second phase, we select the top K elements from all the outputs of the first phase.
This implementation refers to the algorithm described in "Optimal Random Sampling from Distributed Streams Revisited".
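Two facts justify the scheme in the paragraph above. They are stated here as a standard argument for random-key sampling, not as the cited paper's proof, and the notation (D, P_j, u_x, Top_K) is introduced only for this note.

```latex
% D = P_1 \cup \dots \cup P_m with |D| = n; every element x draws an independent weight u_x \sim \mathrm{Uniform}(0,1).
% \mathrm{Top}_K(A) = the K elements of A with the largest weights (all of A when |A| \le K).
\begin{align*}
\Pr\bigl[\mathrm{Top}_K(D) = S\bigr] &= \binom{n}{K}^{-1}
  && \text{for every } S \subseteq D \text{ with } |S| = K \le n, \\
\mathrm{Top}_K(D) &= \mathrm{Top}_K\Bigl(\,\bigcup_{j=1}^{m} \mathrm{Top}_K(P_j)\Bigr).
\end{align*}
```

The first line holds because the weights are distinct with probability 1, so their ranking is a uniformly random permutation of D. The second holds because an element of P_j that ranks among the global top K has fewer than K elements of P_j above it, so it already survives the first phase; the second phase only discards elements that were never in the global top K.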
Fields inherited from class DistributedRandomSampler: emptyIntermediateIterable, numSamples
Fields inherited from class RandomSampler: emptyIterable, EPSILON
Constructor | Description |
---|---|
ReservoirSamplerWithoutReplacement(int numSamples) | Create a new sampler with reservoir size and a default random number generator. |
ReservoirSamplerWithoutReplacement(int numSamples, long seed) | Create a new sampler with reservoir size and the seed for the random number generator. |
ReservoirSamplerWithoutReplacement(int numSamples, Random random) | Create a new sampler with reservoir size and a supplied random number generator. |
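The three constructors differ only in where the randomness comes from. The sketch below is a hypothetical stand-in (SeededSamplerSketch is not a Flink class) that mirrors the three forms by chaining into the Random-taking constructor, and shows why the seed form is useful: the same seed and the same input order reproduce the same sample. It assumes java.util.Random throughout; which generator Flink itself constructs is not stated on this page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in mirroring the three constructor forms; not Flink's class.
class SeededSamplerSketch<T> {
    private final int numSamples;
    private final Random random;

    // Unseeded generator: samples differ between runs.
    SeededSamplerSketch(int numSamples) {
        this(numSamples, new Random());
    }

    // Fixed seed: the same input in the same order gets the same weights, hence the same sample.
    SeededSamplerSketch(int numSamples, long seed) {
        this(numSamples, new Random(seed));
    }

    // Caller-supplied generator; all constructors funnel here, so numSamples is validated once.
    SeededSamplerSketch(int numSamples, Random random) {
        if (numSamples < 0) {
            throw new IllegalArgumentException("numSamples must be non-negative");
        }
        this.numSamples = numSamples;
        this.random = random;
    }

    // Draws one uniform weight per element and keeps the numSamples largest.
    List<T> sample(List<T> input) {
        double[] weights = new double[input.size()];
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < input.size(); i++) {
            weights[i] = random.nextDouble();
            indices.add(i);
        }
        // Order indices by weight, largest first.
        indices.sort(Comparator.comparingDouble((Integer i) -> weights[i]).reversed());
        List<T> result = new ArrayList<>();
        for (int i = 0; i < Math.min(numSamples, indices.size()); i++) {
            result.add(input.get(indices.get(i)));
        }
        return result;
    }
}
```

Funnelling every constructor into one keeps the validation and field assignment in a single place; the seeded form is then just a convenience for tests and reproducible experiments.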
Modifier and Type | Method and Description |
---|---|
Iterator<IntermediateSampleData<T>> | sampleInPartition(Iterator<T> input): Sample algorithm for the first phase. |
Methods inherited from class DistributedRandomSampler: sample, sampleInCoordinator
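The split between sampleInPartition and the inherited sampleInCoordinator can be sketched with plain Java collections. Everything here is hypothetical: TwoPhaseSketch and its nested Weighted class stand in for the real class and IntermediateSampleData<T>, and sample simply chains the two phases, which is an assumption about how the inherited sample method combines them. Each partition keeps its numSamples heaviest elements with a size-bounded min-heap, and the coordinator applies the same selection to the concatenated outputs without redrawing weights.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

// Hypothetical two-phase sketch; method names mirror the summary above, but this is not Flink's code.
class TwoPhaseSketch<T> {

    // Stand-in for IntermediateSampleData<T>: an element tagged with its random weight.
    static final class Weighted<T> {
        final double weight;
        final T element;

        Weighted(double weight, T element) {
            this.weight = weight;
            this.element = element;
        }
    }

    private final int numSamples;
    private final Random random;

    TwoPhaseSketch(int numSamples, Random random) {
        if (numSamples < 0) {
            throw new IllegalArgumentException("numSamples must be non-negative");
        }
        this.numSamples = numSamples;
        this.random = random;
    }

    // One pass with a min-heap of at most numSamples entries; the head is the lightest survivor.
    private List<Weighted<T>> keepHeaviest(Iterator<Weighted<T>> candidates) {
        PriorityQueue<Weighted<T>> heap =
                new PriorityQueue<>((a, b) -> Double.compare(a.weight, b.weight));
        while (candidates.hasNext()) {
            Weighted<T> c = candidates.next();
            if (heap.size() < numSamples) {
                heap.add(c);
            } else if (!heap.isEmpty() && c.weight > heap.peek().weight) {
                heap.poll(); // evict the current lightest survivor
                heap.add(c);
            }
        }
        return new ArrayList<>(heap);
    }

    // Phase one (per partition): draw a fresh uniform weight per element, keep the top numSamples.
    Iterator<Weighted<T>> sampleInPartition(Iterator<T> input) {
        Iterator<Weighted<T>> tagged = new Iterator<Weighted<T>>() {
            @Override
            public boolean hasNext() {
                return input.hasNext();
            }

            @Override
            public Weighted<T> next() {
                return new Weighted<>(random.nextDouble(), input.next());
            }
        };
        return keepHeaviest(tagged).iterator();
    }

    // Phase two (coordinator): keep the top numSamples of all partition outputs, reusing their weights.
    Iterator<T> sampleInCoordinator(Iterator<Weighted<T>> partitionOutputs) {
        List<T> result = new ArrayList<>();
        for (Weighted<T> w : keepHeaviest(partitionOutputs)) {
            result.add(w.element);
        }
        return result.iterator();
    }

    // Single-partition convenience: run both phases back to back.
    Iterator<T> sample(Iterator<T> input) {
        return sampleInCoordinator(sampleInPartition(input));
    }
}
```

In a distributed run, the outputs of every partition's sampleInPartition call would be gathered and concatenated before a single sampleInCoordinator call; because each element carries its weight, the second phase compares the same random keys that the first phase drew.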
public ReservoirSamplerWithoutReplacement(int numSamples, Random random)
Parameters:
numSamples - Maximum number of samples to retain in the reservoir; must be non-negative.
random - Instance of random number generator for sampling.

public ReservoirSamplerWithoutReplacement(int numSamples)
Parameters:
numSamples - Maximum number of samples to retain in the reservoir; must be non-negative.

public ReservoirSamplerWithoutReplacement(int numSamples, long seed)
Parameters:
numSamples - Maximum number of samples to retain in the reservoir; must be non-negative.
seed - Random number generator seed.

public Iterator<IntermediateSampleData<T>> sampleInPartition(Iterator<T> input)
Description copied from class: DistributedRandomSampler
Sample algorithm for the first phase.
Specified by:
sampleInPartition in class DistributedRandomSampler<T>
Parameters:
input - The DataSet input of each partition.

Copyright © 2014–2021 The Apache Software Foundation. All rights reserved.