@Internal public class FileSystemCommitter extends Object
It's used to commit data to FileSystem table in batch mode.
Data consistency: 1.For task failure: will launch a new task and create a PartitionTempFileManager
, this will clean previous temporary files (This simple design can make
it easy to delete the invalid temporary directory of the task, but it also causes that our
directory does not support the same task to start multiple backups to run). 2.For job master
commit failure when overwrite: this may result in unfinished intermediate results, but if we try
to run job again, the final result must be correct (because the intermediate result will be
overwritten). 3.For job master commit failure when append: This can lead to inconsistent data.
But, considering that the commit action is a single point of execution, and only moves files and
updates metadata, it will be faster, so the probability of inconsistency is relatively small.
Constructor and Description |
---|
FileSystemCommitter(FileSystemFactory factory,
TableMetaStoreFactory metaStoreFactory,
boolean overwrite,
Path tmpPath,
int partitionColumnSize,
boolean isToLocal,
ObjectIdentifier identifier,
LinkedHashMap<String,String> staticPartitions,
List<PartitionCommitPolicy> policies) |
Modifier and Type | Method and Description |
---|---|
void |
commitPartitions()
For committing job's output after successful batch job completion.
|
void |
commitPartitions(BiPredicate<Integer,Integer> taskAttemptFilter)
Commits the partitions with a filter to filter out invalid task attempt files.
|
void |
commitPartitionsWithFiles(Map<String,List<Path>> partitionsFiles)
For committing job's output after successful batch job completion, it will commit with the
given partitions and corresponding files written which means it'll move the temporary files
to partition's location.
|
public FileSystemCommitter(FileSystemFactory factory, TableMetaStoreFactory metaStoreFactory, boolean overwrite, Path tmpPath, int partitionColumnSize, boolean isToLocal, ObjectIdentifier identifier, LinkedHashMap<String,String> staticPartitions, List<PartitionCommitPolicy> policies)
public void commitPartitions() throws Exception
Exception
public void commitPartitions(BiPredicate<Integer,Integer> taskAttemptFilter) throws Exception
taskAttemptFilter
- the filter that accepts subtaskIndex and attemptNumberException
- if partition commitment failspublic void commitPartitionsWithFiles(Map<String,List<Path>> partitionsFiles) throws Exception
Exception
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.