Class TypeComparator<T>
- java.lang.Object
-
- org.apache.flink.api.common.typeutils.TypeComparator<T>
-
- Type Parameters:
T
- The data type that the comparator works on.
- All Implemented Interfaces:
Serializable
- Direct Known Subclasses:
BasicTypeComparator, BooleanValueComparator, ByteValueComparator, CharValueComparator, CompositeTypeComparator, CopyableValueComparator, DoubleValueComparator, FixedLengthByteKeyAndValueComparator, FloatValueComparator, GenericTypeComparator, IntValueComparator, LocalDateComparator, LocalDateTimeComparator, LongValueComparator, NullAwareComparator, NullValueComparator, PrimitiveArrayComparator, ShortValueComparator, StringValueComparator, ValueComparator, VariableLengthByteKeyAndValueComparator, WritableComparator
@PublicEvolving public abstract class TypeComparator<T> extends Object implements Serializable
This class describes the methods that are required for a data type to be handled by the pact runtime. Specifically, it contains the methods used for hashing, comparing, and creating auxiliary structures. The methods in this class depend not only on the record, but also on which fields of a record are used for the comparison or hashing. That set of fields is typically a subset of a record's fields. In general, this class assumes the same contract on hash codes and equality as defined for
Object.equals(Object).
Implementations are stateful, because several methods require setting one record as the reference for comparisons and later comparing a candidate against it. Therefore, implementations are not thread safe. The runtime ensures that no instance is used by more than one thread and creates a separate copy for each thread instead. It is hence imperative that the copies created by the
duplicate()
method share no state with the instance from which they were copied: they have to be deep copies.
-
-
Constructor Summary
TypeComparator()
-
Method Summary
Modifier and Type, Method, and Description
abstract int compare(T first, T second)
- Compares two records in object form.
int compareAgainstReference(Comparable[] keys)
abstract int compareSerialized(DataInputView firstSource, DataInputView secondSource)
- Compares two records in serialized form.
abstract int compareToReference(TypeComparator<T> referencedComparator)
- Compares the element that has been set as reference in this comparator to the element set as reference in the given comparator.
abstract TypeComparator<T> duplicate()
- Creates a copy of this comparator.
abstract boolean equalToReference(T candidate)
- Checks whether the given element is equal to the element that has been set as the comparison reference in this comparator instance.
abstract int extractKeys(Object record, Object[] target, int index)
- Extracts the key fields from a record.
abstract TypeComparator[] getFlatComparators()
- Gets the field comparators.
abstract int getNormalizeKeyLen()
- Gets the number of bytes that the normalized key would maximally take.
abstract int hash(T record)
- Computes a hash value for the given record.
abstract boolean invertNormalizedKey()
- Flags whether the normalized key should be interpreted as inverted, i.e. descending.
abstract boolean isNormalizedKeyPrefixOnly(int keyBytes)
- Checks whether the given number of bytes for a normalized key is only a prefix and thus insufficient to fully determine the order of elements of the data type for which this comparator provides the comparison methods.
abstract void putNormalizedKey(T record, MemorySegment target, int offset, int numBytes)
- Writes a normalized key for the given record into the target byte array, starting at the specified position and writing exactly the given number of bytes.
abstract T readWithKeyDenormalization(T reuse, DataInputView source)
- Reads the record back while de-normalizing the key fields.
abstract void setReference(T toCompare)
- Sets the given element as the comparison reference for future calls to equalToReference(Object) and compareToReference(TypeComparator).
boolean supportsCompareAgainstReference()
abstract boolean supportsNormalizedKey()
- Checks whether the data type supports the creation of a normalized key for comparison.
abstract boolean supportsSerializationWithKeyNormalization()
- Checks whether this comparator supports serializing the record in a format that replaces its keys by a normalized key.
abstract void writeWithKeyNormalization(T record, DataOutputView target)
- Writes the record in such a fashion that all keys are normalized and placed at the beginning of the serialized data.
-
-
-
Method Detail
-
hash
public abstract int hash(T record)
Computes a hash value for the given record. The hash value should include all fields in the record that are relevant to the comparison.
The hash code is typically not used as is in hash tables and for partitioning. It is further scrambled so that projecting the hash values onto a lower-cardinality space results in a fairly uniform value distribution. However, any collisions produced by this method cannot be undone. While it is NOT important to create hash codes that cover the full spectrum of bits in the integer, it IS important to avoid collisions as much as possible when combining two values.
- Parameters:
record
- The record to be hashed.
- Returns:
- A hash value for the record.
- See Also:
Object.hashCode()
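The advice on combining values can be illustrated with a small sketch. The class PairHashSketch and its methods are hypothetical and not part of Flink, and the multiplier 31 is an assumption borrowed from common Java practice. The sketch contrasts an XOR combination, which collides for every record whose two key fields are equal, with a multiply-and-add combination that depends on the field position.

```java
// Hypothetical helper, not part of Flink: contrasts two ways of combining field hashes.
final class PairHashSketch {

    // Weak: XOR maps every record with two equal fields to 0,
    // and (a, b) always collides with (b, a).
    static int xorCombine(int a, int b) {
        return a ^ b;
    }

    // Better: multiply-and-add makes the result depend on the field position.
    static int orderedCombine(int a, int b) {
        return 31 * a + b;
    }

    public static void main(String[] args) {
        System.out.println(xorCombine(7, 7) == xorCombine(9, 9));         // true: a collision
        System.out.println(orderedCombine(7, 7) == orderedCombine(9, 9)); // false
    }
}
```

Because later scrambling cannot separate two records that already share a hash value, the choice of combination matters more than how the resulting bits are distributed.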
-
setReference
public abstract void setReference(T toCompare)
Sets the given element as the comparison reference for future calls to equalToReference(Object) and compareToReference(TypeComparator). This method must store the given element in this comparator instance's state. If the comparison happens on a subset of the fields from the record, this method may extract those fields.
A typical example for checking the equality of two elements is the following:
    E e1 = ...;
    E e2 = ...;
    TypeComparator<E> acc = ...;
    acc.setReference(e1);
    boolean equal = acc.equalToReference(e2);
The rationale behind this method is that elements are typically compared using certain features that are extracted from them (such as de-serializing a subset of fields). When setting the reference, this extraction happens. The extraction needs to happen only once per element, even though an element is often compared to multiple other elements, such as when finding equal elements in the process of grouping the elements.
- Parameters:
toCompare
- The element to set as the comparison reference.
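The extract-once idea can be sketched with a simplified stand-in. FirstFieldComparatorSketch is a hypothetical class that does not extend Flink's TypeComparator. It assumes records are String arrays whose first entry is the only key field.

```java
// Hypothetical, simplified stand-in for a comparator; it does not extend Flink's TypeComparator.
final class FirstFieldComparatorSketch {

    // Extracted key of the current reference; this is the per-instance state.
    private String referenceKey;

    // Extracts the key field once, so later comparisons do not repeat the work.
    void setReference(String[] toCompare) {
        this.referenceKey = toCompare[0];
    }

    // Compares only the key field of the candidate against the stored key.
    boolean equalToReference(String[] candidate) {
        return referenceKey.equals(candidate[0]);
    }
}
```

When grouping, a caller would set the first record of a group as the reference and then test each following record with equalToReference until it returns false.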
-
equalToReference
public abstract boolean equalToReference(T candidate)
Checks whether the given element is equal to the element that has been set as the comparison reference in this comparator instance.
- Parameters:
candidate
- The candidate to check.
- Returns:
- True, if the element is equal to the comparison reference, false otherwise.
- See Also:
setReference(Object)
-
compareToReference
public abstract int compareToReference(TypeComparator<T> referencedComparator)
Compares the element that has been set as reference in this comparator to the element set as reference in the given comparator. Similar to comparing two elements e1 and e2 via a comparator, this method can be used in the following way:
    E e1 = ...;
    E e2 = ...;
    TypeComparator<E> acc1 = ...;
    TypeComparator<E> acc2 = ...;
    acc1.setReference(e1);
    acc2.setReference(e2);
    int comp = acc1.compareToReference(acc2);
The rationale behind this method is that elements are typically compared using certain features that are extracted from them (such as de-serializing a subset of fields). When setting the reference, this extraction happens. The extraction needs to happen only once per element, even though an element is typically compared to many other elements when establishing a sorted order. The actual comparison performed by this method may be very cheap, as it happens on the extracted features.
- Parameters:
referencedComparator
- The comparator in which the element for comparison has been set as reference.
- Returns:
- A value smaller than zero, if the reference value of referencedComparator is smaller than the reference value of this comparator; a value greater than zero, if it is larger; zero, if both are equal.
- See Also:
setReference(Object)
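The sign convention in the Returns description is easy to get backwards, so a minimal sketch may help. IntReferenceComparatorSketch is a hypothetical stand-in for an int comparator and is not Flink's class. Note that the argument's reference comes first in the comparison.

```java
// Hypothetical stand-in for an int comparator; not Flink's class.
final class IntReferenceComparatorSketch {

    // The element set as the comparison reference.
    private int reference;

    void setReference(int toCompare) {
        this.reference = toCompare;
    }

    // Negative if the other comparator's reference is smaller than this one's,
    // positive if it is larger, zero if both are equal.
    int compareToReference(IntReferenceComparatorSketch referencedComparator) {
        return Integer.compare(referencedComparator.reference, this.reference);
    }
}
```

With references 5 in this comparator and 3 in the given one, the result is negative, which is the opposite of what Integer.compare(5, 3) would return.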
-
supportsCompareAgainstReference
public boolean supportsCompareAgainstReference()
-
compare
public abstract int compare(T first, T second)
Compares two records in object form. The return value indicates the order of the two in the same way as defined by Comparator.compare(Object, Object).
- Parameters:
first
- The first record.
second
- The second record.
- Returns:
- An integer defining the order among the objects in the same way as Comparator.compare(Object, Object).
- See Also:
Comparator.compare(Object, Object)
-
compareSerialized
public abstract int compareSerialized(DataInputView firstSource, DataInputView secondSource) throws IOException
Compares two records in serialized form. The return value indicates the order of the two in the same way as defined by Comparator.compare(Object, Object).
This method may de-serialize the records or compare them directly based on their binary representation.
- Parameters:
firstSource
- The input view containing the first record.
secondSource
- The input view containing the second record.
- Returns:
- An integer defining the order among the objects in the same way as Comparator.compare(Object, Object).
- Throws:
IOException
- Thrown, if any of the input views raised an exception when reading the records.
- See Also:
Comparator.compare(Object, Object)
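As a rough illustration of the de-serializing variant, the following sketch compares two records that each consist of a single int. SerializedIntCompareSketch is a hypothetical class, and java.io.DataInput is used here as a stand-in for DataInputView so that the example needs no Flink dependency.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch; java.io.DataInput stands in for Flink's DataInputView.
final class SerializedIntCompareSketch {

    // Reads one int key from each source and orders them like Comparator.compare.
    static int compareSerialized(DataInput firstSource, DataInput secondSource) throws IOException {
        int first = firstSource.readInt();
        int second = secondSource.readInt();
        return Integer.compare(first, second);
    }

    // Helper for trying the sketch: serializes a single int and wraps it for reading.
    static DataInput serialized(int value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new DataOutputStream(bytes).writeInt(value);
        return new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
    }
}
```

A comparator for larger records would read only the key fields this way and skip or ignore the remaining bytes.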
-
supportsNormalizedKey
public abstract boolean supportsNormalizedKey()
Checks whether the data type supports the creation of a normalized key for comparison.
- Returns:
- True, if the data type supports the creation of a normalized key for comparison, false otherwise.
-
supportsSerializationWithKeyNormalization
public abstract boolean supportsSerializationWithKeyNormalization()
Checks whether this comparator supports serializing the record in a format that replaces its keys by a normalized key.
- Returns:
- True, if the comparator supports that specific form of serialization, false if not.
-
getNormalizeKeyLen
public abstract int getNormalizeKeyLen()
Gets the number of bytes that the normalized key would maximally take. A value of Integer.MAX_VALUE is interpreted as infinite.
- Returns:
- The number of bytes that the normalized key would maximally take.
-
isNormalizedKeyPrefixOnly
public abstract boolean isNormalizedKeyPrefixOnly(int keyBytes)
Checks whether the given number of bytes for a normalized key is only a prefix and thus insufficient to fully determine the order of elements of the data type for which this comparator provides the comparison methods. For example, if the data type is ordered with respect to an integer value it contains, then this method would return true if the number of key bytes is smaller than four.
- Returns:
- True, if the given number of bytes is only a prefix, false otherwise.
-
putNormalizedKey
public abstract void putNormalizedKey(T record, MemorySegment target, int offset, int numBytes)
Writes a normalized key for the given record into the target byte array, starting at the specified position and writing exactly the given number of bytes. Note that the comparison of the bytes treats them as unsigned bytes:
    int byteI = bytes[i] & 0xFF;
If the meaningful part of the normalized key takes fewer than the given number of bytes, then it must be padded. Padding is typically required for variable-length data types, such as strings. The padding uses a special character, either 0 or 0xff, depending on whether shorter values are sorted to the beginning or the end.
This method is similar to NormalizableKey.copyNormalizedKey(MemorySegment, int, int). In the case that multiple fields of a record contribute to the normalized key, it is crucial that the fields align on the byte field, i.e. that every field always takes up the exact same number of bytes.
- Parameters:
record
- The record for which to create the normalized key.
target
- The byte array into which to write the normalized key bytes.
offset
- The offset in the byte array at which to start writing the normalized key bytes.
numBytes
- The exact number of bytes to be written.
- See Also:
NormalizableKey.copyNormalizedKey(MemorySegment, int, int)
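A minimal sketch of a normalized key for an int value follows. IntNormalizedKeySketch is hypothetical, a plain byte[] stands in for MemorySegment, and flipping the sign bit is an assumed encoding chosen so that unsigned byte comparison matches signed integer order. Bytes beyond the four meaningful ones are padded with 0.

```java
// Hypothetical sketch; a plain byte[] stands in for Flink's MemorySegment.
final class IntNormalizedKeySketch {

    // Writes exactly numBytes bytes: up to 4 key bytes, most significant first, then 0 padding.
    static void putNormalizedKey(int value, byte[] target, int offset, int numBytes) {
        // Flipping the sign bit makes negative values sort before positive ones
        // when the bytes are compared as unsigned.
        int flipped = value ^ Integer.MIN_VALUE;
        for (int i = 0; i < numBytes; i++) {
            target[offset + i] = i < 4 ? (byte) (flipped >>> (24 - 8 * i)) : 0;
        }
    }

    // Compares two keys of equal length byte by byte, treating each byte as unsigned.
    static int compareKeys(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return 0;
    }

    // With fewer than 4 bytes, the key determines the order only partially.
    static boolean isNormalizedKeyPrefixOnly(int keyBytes) {
        return keyBytes < 4;
    }
}
```

With only two key bytes, the values 1 and 2 produce identical keys, which is why a prefix-only key requires a fallback to a full comparison on ties.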
-
writeWithKeyNormalization
public abstract void writeWithKeyNormalization(T record, DataOutputView target) throws IOException
Writes the record in such a fashion that all keys are normalized and placed at the beginning of the serialized data. This must only be used when the full normalized key is used for all the key fields. The method supportsSerializationWithKeyNormalization() allows checking that.
- Parameters:
record
- The record object to write.
target
- The stream to which to write the data.
- Throws:
IOException
- See Also:
supportsSerializationWithKeyNormalization(), readWithKeyDenormalization(Object, DataInputView), NormalizableKey.copyNormalizedKey(MemorySegment, int, int)
-
readWithKeyDenormalization
public abstract T readWithKeyDenormalization(T reuse, DataInputView source) throws IOException
Reads the record back while de-normalizing the key fields. This must only be used when the full normalized key is used for all the key fields, which is indicated by the supportsSerializationWithKeyNormalization() method.
- Parameters:
reuse
- The reuse object into which to read the record data.
source
- The stream from which to read the data.
- Throws:
IOException
- See Also:
supportsSerializationWithKeyNormalization(), writeWithKeyNormalization(Object, DataOutputView), NormalizableKey.copyNormalizedKey(MemorySegment, int, int)
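The write and read methods form a round trip, which the following sketch shows for a record that is a single int key. KeyNormalizedIntSketch is hypothetical, java.io.DataOutput and java.io.DataInput stand in for DataOutputView and DataInputView, and the sign-bit flip is an assumed normalization rather than a statement about any particular Flink comparator.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical sketch; java.io.DataOutput and DataInput stand in for Flink's views.
final class KeyNormalizedIntSketch {

    // Writes the key in normalized form: sign bit flipped, high byte first
    // (DataOutput.writeInt writes the high byte first).
    static void writeWithKeyNormalization(int record, DataOutput target) throws IOException {
        target.writeInt(record ^ Integer.MIN_VALUE);
    }

    // Reverses the normalization to recover the original value.
    static int readWithKeyDenormalization(DataInput source) throws IOException {
        return source.readInt() ^ Integer.MIN_VALUE;
    }
}
```

Since the serialized bytes are already the full normalized key, records written this way can be ordered by comparing their leading bytes as unsigned values, without de-serializing them first.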
-
invertNormalizedKey
public abstract boolean invertNormalizedKey()
Flags whether the normalized key should be interpreted as inverted, i.e. descending.
- Returns:
- True, if all normalized key comparisons should invert the sign of the comparison result, false if the normalized key should be used as is.
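How a caller might honor this flag can be sketched as follows. InvertedKeyCompareSketch is a hypothetical helper, not Flink's sorter code, and it assumes both keys have the same length.

```java
// Hypothetical sketch of how a caller might apply the invert flag; not Flink's sorter code.
final class InvertedKeyCompareSketch {

    // Compares normalized keys as unsigned bytes and flips the sign if requested.
    static int compare(byte[] a, byte[] b, boolean invertNormalizedKey) {
        int result = 0;
        for (int i = 0; i < a.length && result == 0; i++) {
            result = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
        }
        return invertNormalizedKey ? -result : result;
    }
}
```

The key bytes themselves stay unchanged for descending order. Only the sign of the comparison result is inverted.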
-
duplicate
public abstract TypeComparator<T> duplicate()
Creates a copy of this comparator. The copy must be deep, such that no state set in the copy affects this instance of the comparator class.
- Returns:
- A deep copy of this comparator instance.
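The deep-copy requirement can be illustrated with a small stateful class. DuplicableComparatorSketch is hypothetical and not Flink's class. It assumes the reference state is an int array of key values.

```java
// Hypothetical stateful comparator; not Flink's class.
final class DuplicableComparatorSketch {

    // Mutable reference state; must never be shared between copies.
    private int[] referenceKeys = new int[0];

    void setReference(int[] keys) {
        this.referenceKeys = keys.clone();
    }

    boolean equalToReference(int[] candidateKeys) {
        return java.util.Arrays.equals(referenceKeys, candidateKeys);
    }

    // Deep copy: the new instance gets its own array instead of sharing this one.
    DuplicableComparatorSketch duplicate() {
        DuplicableComparatorSketch copy = new DuplicableComparatorSketch();
        copy.referenceKeys = this.referenceKeys.clone();
        return copy;
    }
}
```

Setting a new reference on the copy leaves the original's reference untouched, which is what allows each thread to work with its own instance.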
-
extractKeys
public abstract int extractKeys(Object record, Object[] target, int index)
Extracts the key fields from a record. This is for use by the PairComparator to provide interoperability between different record types. Note that at least one key should be extracted.
- Parameters:
record
- The record that contains the key(s).
target
- The array to write the key(s) into.
index
- The offset of the target array to start writing into.
- Returns:
- The number of keys added to target.
-
getFlatComparators
public abstract TypeComparator[] getFlatComparators()
Gets the field comparators. This is used together with extractKeys(Object, Object[], int) to provide interoperability between different record types. Note that this should return at least one comparator and that the number of comparators must match the number of extracted keys.
- Returns:
- An Array of Comparators for the extracted keys.
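The contract of extractKeys can be sketched for a record type with two key fields. PersonKeyExtractorSketch and its nested Person type are hypothetical and exist only for illustration. A matching getFlatComparators() would then return two comparators, one per extracted key.

```java
// Hypothetical sketch for a two-field record type; not Flink's class.
final class PersonKeyExtractorSketch {

    // Illustrative record type in which both fields are keys.
    static final class Person {
        final String name;
        final Integer age;

        Person(String name, Integer age) {
            this.name = name;
            this.age = age;
        }
    }

    // Writes the key fields into target, starting at index, and returns how many were written.
    static int extractKeys(Object record, Object[] target, int index) {
        Person person = (Person) record;
        target[index] = person.name;
        target[index + 1] = person.age;
        return 2;
    }
}
```

The returned count lets a caller that combines several comparators know at which offset the next one should continue writing.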
-
compareAgainstReference
public int compareAgainstReference(Comparable[] keys)
-
-