public class HiveSourceFileEnumerator extends Object implements FileEnumerator
A FileEnumerator implementation for the Hive source, which generates splits based on HiveTablePartitions.

Modifier and Type | Class and Description |
---|---|
static class | HiveSourceFileEnumerator.PartitionFilesSizeCalculator: The calculator to calculate the total bytes with weight for a partition. |
static class | HiveSourceFileEnumerator.Provider: A factory to create HiveSourceFileEnumerator. |
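
The description of PartitionFilesSizeCalculator mentions "total bytes with weight" without defining the weight. The sketch below shows one plausible reading, assuming that the weight is a fixed per-file open cost added to each file's length. The class name `WeightedPartitionSizeSketch`, the method `totalBytesWithWeight`, and the `openCostBytes` parameter are hypothetical and are not part of the Hive connector API.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not the connector's implementation. */
class WeightedPartitionSizeSketch {

    /**
     * Sums the file lengths of one partition, charging an assumed fixed
     * open cost per file so that many small files weigh more than one
     * large file holding the same number of bytes.
     */
    static long totalBytesWithWeight(List<Long> fileLengths, long openCostBytes) {
        long total = 0L;
        for (long length : fileLengths) {
            total += length + openCostBytes;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Long> manySmallFiles = Arrays.asList(10L, 10L, 10L, 10L);
        List<Long> oneLargeFile = Arrays.asList(40L);
        // Both partitions hold 40 raw bytes, but their weighted totals differ.
        System.out.println(totalBytesWithWeight(manySmallFiles, 5L));
        System.out.println(totalBytesWithWeight(oneLargeFile, 5L));
    }
}
```

Under this assumption, a partition made of many small files reports a larger weighted size than a single file with the same raw byte count, which would push a split planner toward fewer, larger splits for it.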
Constructor and Description |
---|
HiveSourceFileEnumerator(List<HiveTablePartition> partitions, org.apache.hadoop.mapred.JobConf jobConf) |
Modifier and Type | Method and Description |
---|---|
static List<HiveSourceSplit> | createInputSplits(int minNumSplits, List<HiveTablePartition> partitions, org.apache.hadoop.mapred.JobConf jobConf, boolean isForParallelismInfer) |
Collection<FileSourceSplit> | enumerateSplits(Path[] paths, int minDesiredSplits): Generates all file splits for the relevant files under the given paths. |
static int | getNumFiles(List<HiveTablePartition> partitions, org.apache.hadoop.mapred.JobConf jobConf) |
public HiveSourceFileEnumerator(List<HiveTablePartition> partitions, org.apache.hadoop.mapred.JobConf jobConf)
public Collection<FileSourceSplit> enumerateSplits(Path[] paths, int minDesiredSplits) throws IOException
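Description copied from interface: FileEnumerator
Generates all file splits for the relevant files under the given paths. The minDesiredSplits is an optional hint indicating how many splits would be necessary to exploit parallelism properly.
Specified by: enumerateSplits in interface FileEnumerator
Throws: IOException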
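
To make the role of the minDesiredSplits hint more concrete, here is a minimal sketch of how an enumerator could turn such a hint into a target split size and then cut files into byte ranges. It is an illustration under stated assumptions, not the logic of HiveSourceFileEnumerator: the class `SplitHintSketch`, its methods, and the `minSplitBytes` and `maxSplitBytes` bounds are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not the connector's implementation. */
class SplitHintSketch {

    /**
     * Derives a target split size from the total input size and the hint,
     * clamped to the assumed range [minSplitBytes, maxSplitBytes].
     */
    static long targetSplitBytes(
            long totalBytes, int minDesiredSplits, long minSplitBytes, long maxSplitBytes) {
        // Treat a missing or non-positive hint as "one split is enough".
        long bytesPerSplit = totalBytes / Math.max(1, minDesiredSplits);
        return Math.min(maxSplitBytes, Math.max(minSplitBytes, bytesPerSplit));
    }

    /** Cuts one file into {offset, length} ranges of at most splitBytes each. */
    static List<long[]> splitFile(long fileLength, long splitBytes) {
        if (splitBytes <= 0) {
            throw new IllegalArgumentException("splitBytes must be positive");
        }
        List<long[]> ranges = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += splitBytes) {
            ranges.add(new long[] {offset, Math.min(splitBytes, fileLength - offset)});
        }
        return ranges;
    }

    public static void main(String[] args) {
        long[] fileLengths = {300L, 100L};
        long target = targetSplitBytes(420L, 4, 10L, 1_000L);
        System.out.println("target split bytes: " + target);
        for (long length : fileLengths) {
            System.out.println(length + " bytes -> " + splitFile(length, target).size() + " range(s)");
        }
    }
}
```

Because the hint is optional, the sketch only uses it to pick a split size and never fails when the resulting number of ranges differs from the requested count.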
public static List<HiveSourceSplit> createInputSplits(int minNumSplits, List<HiveTablePartition> partitions, org.apache.hadoop.mapred.JobConf jobConf, boolean isForParallelismInfer) throws IOException
Throws: IOException
public static int getNumFiles(List<HiveTablePartition> partitions, org.apache.hadoop.mapred.JobConf jobConf) throws IOException
Throws: IOException
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.