Class GroupReduceCombineDriver<IN,​OUT>

  • Type Parameters:
    IN - The data type consumed by the combiner.
    OUT - The data type produced by the combiner.
    All Implemented Interfaces:
    Driver<GroupCombineFunction<IN,​OUT>,​OUT>

    public class GroupReduceCombineDriver<IN,​OUT>
    extends Object
    implements Driver<GroupCombineFunction<IN,​OUT>,​OUT>
    Non-chained combine driver which is used for a CombineGroup transformation or a GroupReduce transformation where the user supplied a RichGroupReduceFunction with a combine method. The combining is performed in memory with a lazy approach which only combines elements which currently fit in the sorter. This may lead to a partial solution. In the case of the RichGroupReduceFunction this partial result is transformed into a proper deterministic result. The CombineGroup uses the GroupCombineFunction interface which allows to combine values of type IN to any type of type OUT. In contrast, the RichGroupReduceFunction requires the combine method to have the same input and output type to be able to reduce the elements after the combine from IN to OUT.

    The GroupReduceCombineDriver uses a combining iterator over its input. The output of the iterator is emitted.

    • Constructor Detail

      • GroupReduceCombineDriver

        public GroupReduceCombineDriver()
    • Method Detail

      • getNumberOfInputs

        public int getNumberOfInputs()
        Description copied from interface: Driver
        Gets the number of inputs that the task has.
        Specified by:
        getNumberOfInputs in interface Driver<IN,​OUT>
        Returns:
        The number of inputs.
      • getStubType

        public Class<GroupCombineFunction<IN,​OUT>> getStubType()
        Description copied from interface: Driver
        Gets the class of the stub type that is run by this task. For example, a MapTask should return MapFunction.class.
        Specified by:
        getStubType in interface Driver<IN,​OUT>
        Returns:
        The class of the stub type run by the task.
      • getNumberOfDriverComparators

        public int getNumberOfDriverComparators()
        Description copied from interface: Driver
        Gets the number of comparators required for this driver.
        Specified by:
        getNumberOfDriverComparators in interface Driver<IN,​OUT>
        Returns:
        The number of comparators required for this driver.
      • prepare

        public void prepare()
                     throws Exception
        Description copied from interface: Driver
        This method is called before the user code is opened. An exception thrown by this method signals failure of the task.
        Specified by:
        prepare in interface Driver<IN,​OUT>
        Throws:
        Exception - Exceptions may be forwarded and signal task failure.
      • run

        public void run()
                 throws Exception
        Description copied from interface: Driver
        The main operation method of the task. It should call the user code with the data subsets until the input is depleted.
        Specified by:
        run in interface Driver<IN,​OUT>
        Throws:
        Exception - Any exception thrown by this method signals task failure. Because exceptions in the user code typically signal situations where this instance in unable to proceed, exceptions from the user code should be forwarded.
      • cleanup

        public void cleanup()
                     throws Exception
        Description copied from interface: Driver
        This method is invoked in any case (clean termination and exception) at the end of the tasks operation.
        Specified by:
        cleanup in interface Driver<IN,​OUT>
        Throws:
        Exception - Exceptions may be forwarded.
      • cancel

        public void cancel()
        Description copied from interface: Driver
        This method is invoked when the driver must aborted in mid processing. It is invoked asynchronously by a different thread.
        Specified by:
        cancel in interface Driver<IN,​OUT>
      • getOversizedRecordCount

        public long getOversizedRecordCount()
        Gets the number of oversized records handled by this combiner.
        Returns:
        The number of oversized records handled by this combiner.