Different distance metrics are convenient for different types of analysis. FlinkML provides
built-in implementations of many standard distance metrics. You can also create custom
distance metrics by implementing the `DistanceMetric` trait.
Currently, FlinkML supports the following metrics:
Metric | Description |
---|---|
Euclidean Distance | $$d(\x, \y) = \sqrt{\sum_{i=1}^n \left(x_i - y_i \right)^2}$$ |
Squared Euclidean Distance | $$d(\x, \y) = \sum_{i=1}^n \left(x_i - y_i \right)^2$$ |
Cosine Distance | $$d(\x, \y) = 1 - \frac{\x^T \y}{\Vert \x \Vert \Vert \y \Vert}$$ |
Chebyshev Distance | $$d(\x, \y) = \max_{i}\left(\left \vert x_i - y_i \right\vert \right)$$ |
Manhattan Distance | $$d(\x, \y) = \sum_{i=1}^n \left\vert x_i - y_i \right\vert$$ |
Minkowski Distance | $$d(\x, \y) = \left( \sum_{i=1}^{n} \left\vert x_i - y_i \right\vert^p \right)^{\rfrac{1}{p}}$$ |
Tanimoto Distance | $$d(\x, \y) = 1 - \frac{\x^T\y}{\Vert \x \Vert^2 + \Vert \y \Vert^2 - \x^T\y}$$ with $\x$ and $\y$ being bit-vectors |
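
To make the formulas in the table concrete, here is a plain Scala sketch of several of them over `Array[Double]`. It does not use FlinkML's own classes: the object name `DistanceFormulas` and its method names are illustrative assumptions, not part of the library. The sketch also shows that the Minkowski distance reduces to the Manhattan distance for $p = 1$ and to the Euclidean distance for $p = 2$.

```scala
// Plain-Scala sketch of formulas from the table above. Illustrative only:
// these helpers are hypothetical and are NOT FlinkML's implementations.
object DistanceFormulas {

  private def requireSameSize(x: Array[Double], y: Array[Double]): Unit =
    require(x.length == y.length, "vectors must have the same dimension")

  // Sum of absolute coordinate differences.
  def manhattan(x: Array[Double], y: Array[Double]): Double = {
    requireSameSize(x, y)
    x.zip(y).map { case (a, b) => math.abs(a - b) }.sum
  }

  // Sum of squared coordinate differences.
  def squaredEuclidean(x: Array[Double], y: Array[Double]): Double = {
    requireSameSize(x, y)
    x.zip(y).map { case (a, b) => (a - b) * (a - b) }.sum
  }

  // Square root of the squared Euclidean distance.
  def euclidean(x: Array[Double], y: Array[Double]): Double =
    math.sqrt(squaredEuclidean(x, y))

  // Largest absolute coordinate difference (0.0 for empty vectors).
  def chebyshev(x: Array[Double], y: Array[Double]): Double = {
    requireSameSize(x, y)
    x.zip(y).foldLeft(0.0) { case (m, (a, b)) => math.max(m, math.abs(a - b)) }
  }

  // Minkowski distance of order p; the absolute value keeps it well
  // defined for odd or fractional p.
  def minkowski(x: Array[Double], y: Array[Double], p: Double): Double = {
    requireSameSize(x, y)
    require(p >= 1.0, "p must be at least 1")
    val sum = x.zip(y).map { case (a, b) => math.pow(math.abs(a - b), p) }.sum
    math.pow(sum, 1.0 / p)
  }

  // One minus the cosine of the angle between x and y.
  // Yields NaN if either vector has zero norm.
  def cosine(x: Array[Double], y: Array[Double]): Double = {
    requireSameSize(x, y)
    val dot = x.zip(y).map { case (a, b) => a * b }.sum
    val normX = math.sqrt(x.map(v => v * v).sum)
    val normY = math.sqrt(y.map(v => v * v).sum)
    1.0 - dot / (normX * normY)
  }
}
```

For example, with $\x = (1, 2, 3)$ and $\y = (4, 6, 3)$ the coordinate differences are $3$, $4$ and $0$, so the Manhattan distance is $7$, the Euclidean distance is $5$ and the Chebyshev distance is $4$.
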
You can create your own distance metric by implementing the `DistanceMetric` trait.
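
A custom metric could look roughly like the sketch below. It assumes that the trait declares a single `distance` method that takes two vectors and returns a `Double`. So that the example compiles without Flink on the classpath, a local stand-in trait over `Array[Double]` replaces the FlinkML types; in a real job you would extend FlinkML's `DistanceMetric` and use its vector type instead. The Canberra distance and the names `CanberraDistanceMetric` and `CanberraExample` are hypothetical and chosen only for illustration.

```scala
// Stand-in for FlinkML's DistanceMetric trait (assumption: the real trait
// takes FlinkML vector arguments; Array[Double] is used here so the sketch
// runs without Flink on the classpath).
trait DistanceMetric extends Serializable {
  def distance(a: Array[Double], b: Array[Double]): Double
}

// Hypothetical custom metric: Canberra distance,
// the sum over i of |a_i - b_i| / (|a_i| + |b_i|).
class CanberraDistanceMetric extends DistanceMetric {
  override def distance(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    a.zip(b).map { case (x, y) =>
      val denominator = math.abs(x) + math.abs(y)
      // Treat 0/0 terms as contributing nothing to the distance.
      if (denominator == 0.0) 0.0 else math.abs(x - y) / denominator
    }.sum
  }
}

// Companion object so the metric can be created without `new`.
object CanberraDistanceMetric {
  def apply(): CanberraDistanceMetric = new CanberraDistanceMetric()
}

object CanberraExample {
  def main(args: Array[String]): Unit = {
    val metric = CanberraDistanceMetric()
    // |1 - 3| / (1 + 3) + |2 - 2| / (2 + 2) = 0.5
    println(metric.distance(Array(1.0, 2.0), Array(3.0, 2.0)))
  }
}
```

Extending `Serializable` in the stand-in reflects that a metric used inside a distributed job has to be shipped to the workers; whether the FlinkML trait does this itself should be checked against the library source.
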