@Internal public class RowComparator extends CompositeTypeComparator<Row>
Comparator for Row.
Note: Since comparators are used only in the DataSet API for batch use cases, this comparator assumes the latest serialization format and ignores Row.getKind() for simplicity of the implementation and efficiency.
Constructor and Description |
---|
RowComparator(int arity, int[] keyPositions, TypeComparator<Object>[] comparators, TypeSerializer<Object>[] serializers, boolean[] orders) General constructor for RowComparator. |
Modifier and Type | Method and Description |
---|---|
int | compare(Row first, Row second) Compares two records in object form. |
int | compareSerialized(DataInputView firstSource, DataInputView secondSource) Compares two records in serialized form. |
int | compareToReference(TypeComparator<Row> referencedComparator) Compares the element that has been set as reference in this type accessor to the element set as reference in the given type accessor. |
TypeComparator<Row> | duplicate() Creates a copy of this class. |
boolean | equalToReference(Row candidate) Checks whether the given element is equal to the element that has been set as the comparison reference in this comparator instance. |
int | extractKeys(Object record, Object[] target, int index) Extracts the key fields from a record. |
void | getFlatComparator(List<TypeComparator> flatComparators) |
int | getNormalizeKeyLen() Gets the number of bytes that the normalized key would maximally take. |
int | hash(Row record) Computes a hash value for the given record. |
boolean | invertNormalizedKey() Flags whether normalized key comparisons should be inverted, i.e. whether the key should be interpreted as descending. |
boolean | isNormalizedKeyPrefixOnly(int keyBytes) Checks whether the given number of bytes for a normalized key is only a prefix to determine the order of elements of the data type for which this comparator provides the comparison methods. |
void | putNormalizedKey(Row record, MemorySegment target, int offset, int numBytes) Writes a normalized key for the given record into the target byte array, starting at the specified position and writing exactly the given number of bytes. |
Row | readWithKeyDenormalization(Row reuse, DataInputView source) Reads the record back while de-normalizing the key fields. |
void | setReference(Row toCompare) Sets the given element as the comparison reference for future calls to TypeComparator.equalToReference(Object) and TypeComparator.compareToReference(TypeComparator). |
boolean | supportsNormalizedKey() Checks whether the data type supports the creation of a normalized key for comparison. |
boolean | supportsSerializationWithKeyNormalization() Checks whether this comparator supports serializing the record in a format that replaces its keys by a normalized key. |
void | writeWithKeyNormalization(Row record, DataOutputView target) Writes the record in such a fashion that all keys are normalized and placed at the beginning of the serialized data. |
Methods inherited from class CompositeTypeComparator: getFlatComparators
Methods inherited from class TypeComparator: compareAgainstReference, supportsCompareAgainstReference
public RowComparator(int arity, int[] keyPositions, TypeComparator<Object>[] comparators, TypeSerializer<Object>[] serializers, boolean[] orders)
General constructor for RowComparator.
Parameters:
arity - the number of fields of the Row
keyPositions - key positions describe which fields are keys in what order
comparators - non-null-aware comparators for the key fields, in the same order as the key fields
serializers - serializers to deserialize the first n fields for comparison
orders - sorting orders for the fields
public void getFlatComparator(List<TypeComparator> flatComparators)
Specified by:
getFlatComparator in class CompositeTypeComparator<Row>
public int hash(Row record)
Description copied from class: TypeComparator
Computes a hash value for the given record.
The hash code is typically not used as-is in hash tables and for partitioning; it is further scrambled so that a projection of the hash values onto a lower-cardinality space results in a rather uniform value distribution. However, any collisions produced by this method cannot be undone. While it is NOT important to create hash codes that cover the full spectrum of bits in the integer, it IS important to avoid collisions as much as possible when combining two values.
Specified by:
hash in class TypeComparator<Row>
Parameters:
record - The record to be hashed.
See Also:
Object.hashCode()
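To illustrate the scrambling step described for hash(Row), here is a minimal sketch that combines per-field hash codes and mixes the bits before projecting the value onto a small number of partitions. The class `KeyHashSketch`, the multiplier 31, and the use of the MurmurHash3 32-bit finalizer as the mixing function are assumptions chosen for illustration; none of them is taken from RowComparator or from Flink's partitioning code, and rows are modeled as plain `Object[]` arrays.

```java
import java.util.Objects;

// Hypothetical helper, not part of Flink.
final class KeyHashSketch {

    /** Combines the hash codes of the key fields; null fields hash to 0. */
    static int hashKeys(Object[] row, int[] keyPositions) {
        int code = 0;
        for (int pos : keyPositions) {
            // Multiply before adding so that the field order influences the result.
            code = 31 * code + Objects.hashCode(row[pos]);
        }
        return code;
    }

    /** MurmurHash3 32-bit finalizer: spreads differences across all bits. */
    static int scramble(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Projects the scrambled hash onto the range [0, numPartitions). */
    static int partition(Object[] row, int[] keyPositions, int numPartitions) {
        return Math.floorMod(scramble(hashKeys(row, keyPositions)), numPartitions);
    }
}
```

Two rows that agree on all key fields always receive the same partition, regardless of their non-key fields; a collision between different keys in `hashKeys` survives the scrambling, which is why the combining step matters more than the bit coverage.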
public void setReference(Row toCompare)
Description copied from class: TypeComparator
Sets the given element as the comparison reference for future calls to TypeComparator.equalToReference(Object) and TypeComparator.compareToReference(TypeComparator). This method must set the given element into this comparator instance's state. If the comparison happens on a subset of the fields from the record, this method may extract those fields.
A typical example for checking the equality of two elements is the following:
E e1 = ...;
E e2 = ...;
TypeComparator<E> acc = ...;
acc.setReference(e1);
boolean equal = acc.equalToReference(e2);
The rationale behind this method is that elements are typically compared using certain features that are extracted from them (such as de-serializing a subset of fields). When setting the reference, this extraction happens. The extraction needs to happen only once per element, even though an element is often compared to multiple other elements, such as when finding equal elements in the process of grouping the elements.
Specified by:
setReference in class TypeComparator<Row>
Parameters:
toCompare - The element to set as the comparison reference.
public boolean equalToReference(Row candidate)
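Description copied from class: TypeComparator
Checks whether the given element is equal to the element that has been set as the comparison reference in this comparator instance.
Specified by:
equalToReference in class TypeComparator<Row>
Parameters:
candidate - The candidate to check.
See Also:
TypeComparator.setReference(Object)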
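The set-reference-then-compare pattern described for setReference and equalToReference can be sketched as follows. This is a simplified illustration under stated assumptions: the class `ReferenceKeySketch` is hypothetical, rows are modeled as plain `Object[]` arrays rather than Flink Row instances, and the actual RowComparator may store its reference differently.

```java
import java.util.Arrays;

// Hypothetical stand-in for a comparator that keeps a reference element.
final class ReferenceKeySketch {
    private final int[] keyPositions;
    private Object[] referenceKeys; // key fields extracted from the reference

    ReferenceKeySketch(int[] keyPositions) {
        this.keyPositions = keyPositions.clone();
    }

    /** Extracts the key fields once and stores them as comparator state. */
    void setReference(Object[] row) {
        referenceKeys = extract(row);
    }

    /** Compares only the key fields of the candidate against the stored ones. */
    boolean equalToReference(Object[] candidate) {
        if (referenceKeys == null) {
            throw new IllegalStateException("no reference has been set");
        }
        return Arrays.equals(referenceKeys, extract(candidate));
    }

    private Object[] extract(Object[] row) {
        Object[] keys = new Object[keyPositions.length];
        for (int i = 0; i < keyPositions.length; i++) {
            keys[i] = row[keyPositions[i]];
        }
        return keys;
    }
}
```

When grouping, one reference can then be checked against many candidates while the reference's key fields are extracted only once.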
public int compareToReference(TypeComparator<Row> referencedComparator)
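Description copied from class: TypeComparator
This method compares the element that has been set as reference in this type accessor to the element set as reference in the given type accessor. Similar to comparing two elements e1 and e2 via a comparator, this method can be used in the following way.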
E e1 = ...;
E e2 = ...;
TypeComparator<E> acc1 = ...;
TypeComparator<E> acc2 = ...;
acc1.setReference(e1);
acc2.setReference(e2);
int comp = acc1.compareToReference(acc2);
The rationale behind this method is that elements are typically compared using certain features that are extracted from them (such as de-serializing a subset of fields). When setting the reference, this extraction happens. The extraction needs to happen only once per element, even though an element is typically compared to many other elements when establishing a sorted order. The actual comparison performed by this method may be very cheap, as it happens on the extracted features.
Specified by:
compareToReference in class TypeComparator<Row>
Parameters:
referencedComparator - The type accessors where the element for comparison has been set as reference.
Returns:
A value smaller than zero, if the reference value of referencedAccessors is smaller than the reference value of this type accessor; a value greater than zero, if it is larger; zero, if both are equal.
See Also:
TypeComparator.setReference(Object)
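The direction of the result is easy to get backwards, so here is a small sketch of the sign convention as the return-value description for compareToReference states it: the result is negative when the other accessor's reference is smaller than this one's. The class `IntReferenceSketch` is hypothetical and works on plain `int` values; it only illustrates the convention and is not RowComparator's implementation.

```java
// Hypothetical accessor holding an int reference.
final class IntReferenceSketch {
    private int reference;

    void setReference(int value) {
        reference = value;
    }

    /**
     * Negative if the other accessor's reference is smaller than this one's,
     * positive if it is larger, zero if both are equal.
     */
    int compareToReference(IntReferenceSketch other) {
        return Integer.compare(other.reference, this.reference);
    }
}
```

With references 5 in `acc1` and 3 in `acc2`, `acc1.compareToReference(acc2)` is negative, because the reference held by the argument is the smaller one.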
public int compare(Row first, Row second)
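Description copied from class: TypeComparator
Compares two records in object form. The return value indicates the order of the two in the same way as defined by Comparator.compare(Object, Object).
Specified by:
compare in class TypeComparator<Row>
Parameters:
first - The first record.
second - The second record.
Returns:
An integer defining the order among the objects in the same way as Comparator.compare(Object, Object).
See Also:
Comparator.compare(Object, Object)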
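As a rough illustration of how key positions and sorting orders can drive an object-form comparison, the sketch below walks the key fields in the given order and lets the first differing key decide. It is a simplified model under several assumptions: rows are plain `Object[]` arrays with non-null, mutually comparable key fields, `orders[i] == true` is taken to mean ascending, and the class `KeyOrderCompareSketch` is hypothetical rather than a description of RowComparator's internals.

```java
// Hypothetical helper, not part of Flink.
final class KeyOrderCompareSketch {

    /**
     * Compares two rows key by key in the order given by keyPositions.
     * orders[i] == true means ascending for the i-th key (an assumption here),
     * false means descending.
     */
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int compare(Object[] first, Object[] second, int[] keyPositions, boolean[] orders) {
        for (int i = 0; i < keyPositions.length; i++) {
            Comparable a = (Comparable) first[keyPositions[i]];
            Comparable b = (Comparable) second[keyPositions[i]];
            // signum keeps the later negation safe even for extreme return values.
            int cmp = Integer.signum(a.compareTo(b));
            if (cmp != 0) {
                // The first differing key decides; flip the sign for descending keys.
                return orders[i] ? cmp : -cmp;
            }
        }
        return 0; // all key fields are equal
    }
}
```

Fields that are not listed in `keyPositions` never influence the result, so two rows with equal keys compare as equal even if their remaining fields differ.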
public int compareSerialized(DataInputView firstSource, DataInputView secondSource) throws IOException
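Description copied from class: TypeComparator
Compares two records in serialized form. The return value indicates the order of the two in the same way as defined by Comparator.compare(Object, Object).
This method may de-serialize the records or compare them directly based on their binary representation.
Specified by:
compareSerialized in class TypeComparator<Row>
Parameters:
firstSource - The input view containing the first record.
secondSource - The input view containing the second record.
Returns:
An integer defining the order among the objects in the same way as Comparator.compare(Object, Object).
Throws:
IOException - Thrown, if any of the input views raised an exception when reading the records.
See Also:
Comparator.compare(Object, Object)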
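The idea of comparing records in serialized form can be sketched with the JDK's own `java.io.DataInput` instead of Flink's DataInputView. The record layout used here (a 4-byte int key followed by a string payload) and the class `SerializedCompareSketch` are assumptions made up for this example; the point is only that the comparison reads the key fields from the stream and never materializes the whole record.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Flink.
final class SerializedCompareSketch {

    /** Hypothetical layout: a 4-byte int key followed by a string payload. */
    static byte[] serialize(int key, String payload) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(key);
            out.writeUTF(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads only the leading key of each record and compares those. */
    static int compareSerialized(DataInput firstSource, DataInput secondSource) throws IOException {
        return Integer.compare(firstSource.readInt(), secondSource.readInt());
    }

    /** Convenience wrapper that compares two serialized records held in byte arrays. */
    static int compareBytes(byte[] first, byte[] second) {
        try {
            return compareSerialized(
                    new DataInputStream(new ByteArrayInputStream(first)),
                    new DataInputStream(new ByteArrayInputStream(second)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The payload is never decoded during the comparison, which is the saving that a serialized comparison is meant to provide.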
public boolean supportsNormalizedKey()
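Description copied from class: TypeComparator
Checks whether the data type supports the creation of a normalized key for comparison.
Specified by:
supportsNormalizedKey in class TypeComparator<Row>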
public boolean supportsSerializationWithKeyNormalization()
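Description copied from class: TypeComparator
Checks whether this comparator supports serializing the record in a format that replaces its keys by a normalized key.
Specified by:
supportsSerializationWithKeyNormalization in class TypeComparator<Row>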
public int getNormalizeKeyLen()
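Description copied from class: TypeComparator
Gets the number of bytes that the normalized key would maximally take. A value of Integer.MAX_VALUE is interpreted as infinite.
Specified by:
getNormalizeKeyLen in class TypeComparator<Row>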
public boolean isNormalizedKeyPrefixOnly(int keyBytes)
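Description copied from class: TypeComparator
Checks whether the given number of bytes for a normalized key is only a prefix to determine the order of elements of the data type for which this comparator provides the comparison methods.
Specified by:
isNormalizedKeyPrefixOnly in class TypeComparator<Row>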
public void putNormalizedKey(Row record, MemorySegment target, int offset, int numBytes)
Description copied from class: TypeComparator
Writes a normalized key for the given record into the target byte array, starting at the specified position and writing exactly the given number of bytes. Note that the comparison of the bytes treats them as unsigned bytes:
int byteI = bytes[i] & 0xFF;
If the meaningful part of the normalized key takes less than the given number of bytes, then it must be padded. Padding is typically required for variable-length data types, such as strings. The padding uses a special character, either 0 or 0xff, depending on whether shorter values are sorted to the beginning or the end.
This method is similar to NormalizableKey.copyNormalizedKey(MemorySegment, int, int). In the case that multiple fields of a record contribute to the normalized key, it is crucial that the fields align on the byte level, i.e. that every field always takes up the exact same number of bytes.
Specified by:
putNormalizedKey in class TypeComparator<Row>
Parameters:
record - The record for which to create the normalized key.
target - The byte array into which to write the normalized key bytes.
offset - The offset in the byte array, where to start writing the normalized key bytes.
numBytes - The number of bytes to be written exactly.
See Also:
NormalizableKey.copyNormalizedKey(MemorySegment, int, int)
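The rules for normalized keys described for putNormalizedKey (unsigned byte comparison, padding of short values, fixed width per field) can be sketched as follows. This example writes into a plain `byte[]` rather than a Flink MemorySegment, and the class `NormalizedKeySketch` with its int and ASCII-string encodings is an assumption made for illustration, not the encoding RowComparator uses.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Flink.
final class NormalizedKeySketch {

    /** Writes a 4-byte, big-endian key whose unsigned byte order matches int order. */
    static void putIntKey(int value, byte[] target, int offset) {
        int flipped = value ^ Integer.MIN_VALUE; // flip the sign bit so negatives sort first
        target[offset] = (byte) (flipped >>> 24);
        target[offset + 1] = (byte) (flipped >>> 16);
        target[offset + 2] = (byte) (flipped >>> 8);
        target[offset + 3] = (byte) flipped;
    }

    /** Writes exactly numBytes: a prefix of the ASCII string, padded with 0 if it is shorter. */
    static void putAsciiKey(String value, byte[] target, int offset, int numBytes) {
        byte[] raw = value.getBytes(StandardCharsets.US_ASCII);
        for (int i = 0; i < numBytes; i++) {
            // Padding with 0 sorts shorter values to the beginning.
            target[offset + i] = i < raw.length ? raw[i] : 0;
        }
    }

    /** Compares two keys byte by byte, treating each byte as unsigned. */
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    }
}
```

A composite key for an (int, String) pair could reserve bytes 0 to 3 for the int and bytes 4 to 7 for the string prefix; because every field always occupies the same byte range, a single byte-wise comparison orders the records by the int first and by the string prefix second.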
public void writeWithKeyNormalization(Row record, DataOutputView target) throws IOException
Description copied from class: TypeComparator
Writes the record in such a fashion that all keys are normalized and placed at the beginning of the serialized data. The method supportsSerializationWithKeyNormalization() allows checking whether this comparator supports that.
Specified by:
writeWithKeyNormalization in class TypeComparator<Row>
Parameters:
record - The record to write.
target - The stream to which to write the data.
Throws:
IOException
See Also:
TypeComparator.supportsSerializationWithKeyNormalization(), TypeComparator.readWithKeyDenormalization(Object, DataInputView), NormalizableKey.copyNormalizedKey(MemorySegment, int, int)
public Row readWithKeyDenormalization(Row reuse, DataInputView source) throws IOException
Description copied from class: TypeComparator
Reads the record back while de-normalizing the key fields. Whether this is supported is indicated by the supportsSerializationWithKeyNormalization() method.
Specified by:
readWithKeyDenormalization in class TypeComparator<Row>
Parameters:
reuse - The reuse object into which to read the record data.
source - The stream from which to read the data.
Throws:
IOException
See Also:
TypeComparator.supportsSerializationWithKeyNormalization(), TypeComparator.writeWithKeyNormalization(Object, DataOutputView), NormalizableKey.copyNormalizedKey(MemorySegment, int, int)
public boolean invertNormalizedKey()
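Description copied from class: TypeComparator
Flags whether normalized key comparisons should be inverted, i.e. whether the key should be interpreted as descending.
Specified by:
invertNormalizedKey in class TypeComparator<Row>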
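One way a sorter could consume the flag returned by invertNormalizedKey() is to reverse the result of the byte-wise key comparison when the flag is set. The sketch below shows that idea only; the class `InvertedKeySketch` is hypothetical, the keys are assumed to be equally long byte arrays, and how Flink's sorters actually apply the flag is not shown here.

```java
// Hypothetical helper, not part of Flink.
final class InvertedKeySketch {

    /** Unsigned lexicographic comparison of two equally long normalized keys. */
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return 0;
    }

    /** Applies the inversion flag: a descending sort swaps the two operands. */
    static int compareNormalized(byte[] a, byte[] b, boolean invert) {
        return invert ? compareUnsigned(b, a) : compareUnsigned(a, b);
    }
}
```

Swapping the operands instead of negating the result avoids any sign overflow concerns and leaves equal keys equal in both directions.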
public TypeComparator<Row> duplicate()
Description copied from class: TypeComparator
Creates a copy of this class.
Specified by:
duplicate in class TypeComparator<Row>
public int extractKeys(Object record, Object[] target, int index)
Description copied from class: TypeComparator
Extracts the key fields from a record.
Specified by:
extractKeys in class TypeComparator<Row>
Parameters:
record - The record that contains the key(s).
target - The array to write the key(s) into.
index - The offset of the target array to start writing into.
Copyright © 2014–2024 The Apache Software Foundation. All rights reserved.