public interface Bucketer<T> extends Serializable
BucketingSink
to put emitted elements into rolling files.
The BucketingSink
can be writing to many buckets at a time, and it is responsible for managing
a set of active buckets. Whenever a new element arrives it will ask the Bucketer
for the bucket
path the element should fall in. The Bucketer
can, for example, determine buckets based on
system time.
Modifier and Type | Method and Description |
---|---|
org.apache.hadoop.fs.Path |
getBucketPath(Clock clock,
org.apache.hadoop.fs.Path basePath,
T element)
Returns the
Path of a bucket file. |
org.apache.hadoop.fs.Path getBucketPath(Clock clock, org.apache.hadoop.fs.Path basePath, T element)
Path
of a bucket file.basePath
- The base path containing all the buckets.element
- The current element being processed.Path
of the bucket which the provided element should fall in. This
should include the basePath
and also the subtaskIndex
to avoid clashes with
parallel sinks.Copyright © 2014–2019 The Apache Software Foundation. All rights reserved.