T - The type of sample.
@Internal public class ReservoirSamplerWithReplacement<T> extends DistributedRandomSampler<T>
Compared with ReservoirSamplerWithoutReplacement, the main difference is that, in the first phase, weights are generated for each element K times, so that each element can be selected multiple times.
This implementation follows the algorithm described in "Optimal Random Sampling from Distributed Streams Revisited".
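The first-phase idea described above can be illustrated with a minimal, self-contained sketch. It is not the Flink source: the class `ReplacementSamplingSketch`, the `Weighted` holder, and the parameter name `k` are hypothetical names chosen for this example. The sketch assumes that each element draws `k` random weights and that the `k` largest weights seen in the partition are kept, so a single element may fill more than one slot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

// Illustrative sketch only; all names here are hypothetical, not Flink API.
class ReplacementSamplingSketch {

    /** An element paired with the random weight that won it a slot. */
    static final class Weighted<T> implements Comparable<Weighted<T>> {
        final double weight;
        final T element;

        Weighted(double weight, T element) {
            this.weight = weight;
            this.element = element;
        }

        @Override
        public int compareTo(Weighted<T> other) {
            return Double.compare(weight, other.weight);
        }
    }

    /** First phase on one partition: keep the k largest weights seen. */
    static <T> List<Weighted<T>> sampleInPartition(Iterator<T> input, int k, Random random) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        // Min-heap: the head is the smallest weight currently kept.
        PriorityQueue<Weighted<T>> queue = new PriorityQueue<>(Math.max(1, k));
        while (input.hasNext()) {
            T element = input.next();
            // Draw k weights per element so one element can occupy several slots.
            for (int i = 0; i < k; i++) {
                double w = random.nextDouble();
                if (queue.size() < k) {
                    queue.add(new Weighted<>(w, element));
                } else if (w > queue.peek().weight) {
                    // Evict the smallest kept weight in favor of the new one.
                    queue.poll();
                    queue.add(new Weighted<>(w, element));
                }
            }
        }
        return new ArrayList<>(queue);
    }

    public static void main(String[] args) {
        List<String> partition = Arrays.asList("a", "b", "c", "d");
        for (Weighted<String> s : sampleInPartition(partition.iterator(), 3, new Random(7L))) {
            System.out.println(s.element + " (weight " + s.weight + ")");
        }
    }
}
```

Because every element gets `k` chances, a partition holding a single element yields `k` copies of that element, which a sampler without replacement could not produce.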
Fields inherited from class DistributedRandomSampler: emptyIntermediateIterable, numSamples
Fields inherited from class RandomSampler: emptyIterable, EPSILON
Constructor and Description |
---|
ReservoirSamplerWithReplacement(int numSamples): Create a sampler with fixed sample size and default random number generator. |
ReservoirSamplerWithReplacement(int numSamples, long seed): Create a sampler with fixed sample size and random number generator seed. |
ReservoirSamplerWithReplacement(int numSamples, Random random): Create a sampler with fixed sample size and random number generator. |
Modifier and Type | Method and Description |
---|---|
Iterator<IntermediateSampleData<T>> | sampleInPartition(Iterator<T> input): Sample algorithm for the first phase. |
Methods inherited from class DistributedRandomSampler: sample, sampleInCoordinator
public ReservoirSamplerWithReplacement(int numSamples)
numSamples - Number of selected elements, must be non-negative.

public ReservoirSamplerWithReplacement(int numSamples, long seed)
numSamples - Number of selected elements, must be non-negative.
seed - Random number generator seed.

public ReservoirSamplerWithReplacement(int numSamples, Random random)
numSamples - Number of selected elements, must be non-negative.
random - Random number generator.

public Iterator<IntermediateSampleData<T>> sampleInPartition(Iterator<T> input)
Description copied from class: DistributedRandomSampler
Specified by: sampleInPartition in class DistributedRandomSampler<T>
input - The DataSet input of each partition.

Copyright © 2014–2021 The Apache Software Foundation. All rights reserved.