public class MinHashLSHModel extends Object
A model that generates hash values using the model data computed by `MinHashLSH`.
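The sketch below shows, in plain Java, what MinHash "hash values" are for sets given as non-negative integer indices (for example, the non-zero positions of a sparse vector). It is a minimal sketch, not Flink ML's code. The class `MinHashSketch`, the hash family h(x) = ((1 + x) * a + b) mod p, and the prime p = 2038074743 (the value Spark MLlib's MinHashLSH uses) are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical MinHash sketch (not Flink ML code). Each hash function is assumed to be
 * h(x) = ((1 + x) * a + b) mod PRIME; the MinHash value of a set is the minimum of h over it.
 */
final class MinHashSketch {
    // Assumed prime modulus (borrowed from Spark MLlib's MinHashLSH); larger than any coefficient.
    static final long PRIME = 2038074743L;

    private final long[] coefA;
    private final long[] coefB;

    MinHashSketch(int numHashFunctions, long seed) {
        Random rnd = new Random(seed);
        coefA = new long[numHashFunctions];
        coefB = new long[numHashFunctions];
        for (int j = 0; j < numHashFunctions; j++) {
            coefA[j] = 1 + rnd.nextInt((int) PRIME - 1); // a in [1, PRIME)
            coefB[j] = rnd.nextInt((int) PRIME);         // b in [0, PRIME)
        }
    }

    /** Returns one MinHash value per hash function for a non-empty set of non-negative indices. */
    long[] signature(int[] indices) {
        if (indices.length == 0) {
            throw new IllegalArgumentException("MinHash is undefined for an empty set");
        }
        long[] sig = new long[coefA.length];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (int x : indices) {
            if (x < 0) {
                throw new IllegalArgumentException("indices must be non-negative: " + x);
            }
            for (int j = 0; j < sig.length; j++) {
                // (1 + x) <= 2^31 and a < 2^31, so the product stays below 2^62: no long overflow.
                long h = ((1L + x) * coefA[j] + coefB[j]) % PRIME;
                sig[j] = Math.min(sig[j], h);
            }
        }
        return sig;
    }

    /** Fraction of positions where two signatures agree; estimates the Jaccard similarity. */
    static double estimateSimilarity(long[] sigA, long[] sigB) {
        if (sigA.length != sigB.length || sigA.length == 0) {
            throw new IllegalArgumentException("signatures must be non-empty and equally long");
        }
        int equal = 0;
        for (int j = 0; j < sigA.length; j++) {
            if (sigA[j] == sigB[j]) {
                equal++;
            }
        }
        return (double) equal / sigA.length;
    }

    public static void main(String[] args) {
        MinHashSketch mh = new MinHashSketch(128, 42L);
        long[] a = mh.signature(new int[] {1, 2, 3, 4, 5, 6});
        long[] b = mh.signature(new int[] {4, 5, 6, 7, 8, 9});
        // The exact Jaccard similarity of these two sets is 3 / 9; the estimate should be near it.
        System.out.println("estimated similarity = " + estimateSimilarity(a, b));
    }
}
```

The signature is order-independent, and the signature of a union equals the element-wise minimum of the two signatures, which is what makes per-position agreement a usable similarity estimate.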
Modifier and Type | Field and Description |
---|---|
protected org.apache.flink.table.api.Table | modelDataTable |
Inherited fields: INPUT_COL, OUTPUT_COL
Constructor and Description |
---|
MinHashLSHModel() |
Modifier and Type | Method and Description |
---|---|
org.apache.flink.table.api.Table | approxNearestNeighbors(org.apache.flink.table.api.Table dataset, Vector key, int k): An overloaded version of `approxNearestNeighbors` with "distCol" as the default value of `distCol`. |
org.apache.flink.table.api.Table | approxNearestNeighbors(org.apache.flink.table.api.Table dataset, Vector key, int k, String distCol): Approximately finds at most k items from a dataset that have the closest distance to a given item. |
org.apache.flink.table.api.Table | approxSimilarityJoin(org.apache.flink.table.api.Table datasetA, org.apache.flink.table.api.Table datasetB, double threshold, String idCol): An overloaded version of `approxSimilarityJoin` with "distCol" as the default value of `distCol`. |
org.apache.flink.table.api.Table | approxSimilarityJoin(org.apache.flink.table.api.Table datasetA, org.apache.flink.table.api.Table datasetB, double threshold, String idCol, String distCol): Joins two datasets to approximately find all pairs of rows whose distance is smaller than or equal to the threshold. |
org.apache.flink.table.api.Table[] | getModelData(): Gets a list of tables representing the model data. |
Map<Param<?>,Object> | getParamMap(): Returns a map which should contain a value for every parameter that meets one of the following conditions. |
static MinHashLSHModel | load(org.apache.flink.table.api.bridge.java.StreamTableEnvironment tEnv, String path): Loads model data from path. |
void | save(String path): Saves the metadata and bounded data of this stage to the given path. |
T | setModelData(org.apache.flink.table.api.Table... inputs): Sets model data using the given list of tables. |
org.apache.flink.table.api.Table[] | transform(org.apache.flink.table.api.Table... inputs): Applies the AlgoOperator on the given input tables and returns the result tables. |
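Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface WithParams: get, getParam, set
Inherited input-column accessors: getInputCol, setInputCol
Inherited output-column accessors: getOutputCol, setOutputCol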
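public void save(String path) throws IOException
Description copied from interface: Stage
Saves the metadata and bounded data of this stage to the given path.
Throws: IOException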
public static MinHashLSHModel load(org.apache.flink.table.api.bridge.java.StreamTableEnvironment tEnv, String path) throws IOException
Loads model data from path.
Parameters:
tEnv - A StreamTableEnvironment instance.
path - Model path.
Throws: IOException
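The save/load pair persists a model to a path and restores it later. The plain-Java round trip below illustrates that contract only. The class `MinHashModelData`, the file name `model-data.txt`, the comma-separated layout, and the idea that the model data is a pair of coefficient arrays are all hypothetical; they do not describe Flink ML's actual on-disk format. Like the documented methods, `save` and `load` declare `IOException`, and `roundTrip` is a demo helper that wraps it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical model-data holder; the file name and text layout are invented for this sketch. */
final class MinHashModelData {
    private static final String FILE_NAME = "model-data.txt"; // hypothetical file name

    final long[] coefA;
    final long[] coefB;

    MinHashModelData(long[] coefA, long[] coefB) {
        if (coefA.length != coefB.length) {
            throw new IllegalArgumentException("coefficient arrays must have the same length");
        }
        this.coefA = coefA.clone();
        this.coefB = coefB.clone();
    }

    /** Writes both coefficient arrays, one comma-separated line each, under the given directory. */
    void save(Path dir) throws IOException {
        Files.createDirectories(dir);
        Files.write(dir.resolve(FILE_NAME), List.of(join(coefA), join(coefB)), StandardCharsets.UTF_8);
    }

    /** Reads back what save(dir) wrote; fails if the file is missing or malformed. */
    static MinHashModelData load(Path dir) throws IOException {
        List<String> lines = Files.readAllLines(dir.resolve(FILE_NAME), StandardCharsets.UTF_8);
        if (lines.size() != 2) {
            throw new IOException("expected 2 lines of model data, found " + lines.size());
        }
        return new MinHashModelData(parse(lines.get(0)), parse(lines.get(1)));
    }

    /** Demo helper: saves to a fresh temporary directory and loads the result back. */
    static MinHashModelData roundTrip(MinHashModelData data) {
        try {
            Path dir = Files.createTempDirectory("minhash-model");
            data.save(dir);
            return load(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String join(long[] values) {
        return Arrays.stream(values).mapToObj(v -> Long.toString(v)).collect(Collectors.joining(","));
    }

    private static long[] parse(String line) {
        // "".split(",") yields [""], so the empty case is handled explicitly.
        return line.isEmpty()
                ? new long[0]
                : Arrays.stream(line.split(",")).mapToLong(Long::parseLong).toArray();
    }

    public static void main(String[] args) {
        MinHashModelData original = new MinHashModelData(new long[] {3L, 17L}, new long[] {0L, 42L});
        MinHashModelData restored = roundTrip(original);
        System.out.println(Arrays.toString(restored.coefA) + " " + Arrays.toString(restored.coefB));
    }
}
```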
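public T setModelData(org.apache.flink.table.api.Table... inputs)
Description copied from interface: Model
Sets model data using the given list of tables.
Specified by: setModelData in interface Model<T extends org.apache.flink.ml.feature.lsh.LSHModel<T>>
Parameters:
inputs - a list of tables
public org.apache.flink.table.api.Table[] getModelData()
Description copied from interface: Model
Gets a list of tables representing the model data.
Specified by: getModelData in interface Model<T extends org.apache.flink.ml.feature.lsh.LSHModel<T>>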
public Map<Param<?>,Object> getParamMap()
Description copied from interface: WithParams
Returns a map which should contain a value for every parameter that meets one of the following conditions:
1) set(...) has been called to set a value for this parameter.
2) The parameter is a public final field of this WithParams instance. This includes fields inherited from its interfaces and super-classes.
A subclass that implements this interface can meet this requirement by returning a member field of the given map type, after initializing that field with the ParamUtils.initializeMapWithDefaultValues(Map, WithParams) method.
Specified by: getParamMap in interface WithParams<T extends org.apache.flink.ml.feature.lsh.LSHModel<T>>
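The two conditions can be illustrated without Flink ML on the classpath. This sketch uses hypothetical stand-ins (`SimpleParamsHolder`, `SimpleParam`, `HasSimpleInputCol`, `HasSimpleOutputCol`) in place of Flink ML's `WithParams` and `Param`, and its `initializeMapWithDefaultValues` is a reflection-based analogue written for this example, not the real `ParamUtils` method. Parameters declared as public final fields (here inherited from interfaces) receive their defaults, and `set(...)` overrides them.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder whose member map starts out with the default of every declared param. */
class SimpleParamsHolder implements HasSimpleInputCol, HasSimpleOutputCol {
    private final Map<SimpleParam<?>, Object> paramMap = new HashMap<>();

    SimpleParamsHolder() {
        initializeMapWithDefaultValues(paramMap, this);
    }

    /**
     * Analogue of the documented initialization step: Class.getFields() returns public fields,
     * including those inherited from interfaces and superclasses (condition 2).
     */
    static void initializeMapWithDefaultValues(Map<SimpleParam<?>, Object> map, Object instance) {
        for (Field field : instance.getClass().getFields()) {
            boolean isFinal = Modifier.isFinal(field.getModifiers());
            if (isFinal && SimpleParam.class.isAssignableFrom(field.getType())) {
                try {
                    SimpleParam<?> param = (SimpleParam<?>) field.get(instance);
                    map.putIfAbsent(param, param.defaultValue);
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
    }

    /** Condition 1: an explicitly set value replaces the default. */
    <T> SimpleParamsHolder set(SimpleParam<T> param, T value) {
        paramMap.put(param, value);
        return this;
    }

    Map<SimpleParam<?>, Object> getParamMap() {
        return paramMap;
    }

    public static void main(String[] args) {
        SimpleParamsHolder holder = new SimpleParamsHolder().set(HasSimpleOutputCol.OUTPUT_COL, "hashes");
        // HashMap iteration order is unspecified, so the entries may print in either order.
        System.out.println(holder.getParamMap());
    }
}

/** Hypothetical parameter definition (a stand-in, not Flink ML's Param class). */
final class SimpleParam<T> {
    final String name;
    final T defaultValue;

    SimpleParam(String name, T defaultValue) {
        this.name = name;
        this.defaultValue = defaultValue;
    }

    @Override
    public String toString() {
        return name;
    }
}

/** Hypothetical mixin interfaces; interface fields are implicitly public static final. */
interface HasSimpleInputCol {
    SimpleParam<String> INPUT_COL = new SimpleParam<>("inputCol", "input");
}

interface HasSimpleOutputCol {
    SimpleParam<String> OUTPUT_COL = new SimpleParam<>("outputCol", "output");
}
```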
public org.apache.flink.table.api.Table[] transform(org.apache.flink.table.api.Table... inputs)
Description copied from interface: AlgoOperator
Applies the AlgoOperator on the given input tables and returns the result tables.
Specified by: transform in interface AlgoOperator<T extends org.apache.flink.ml.feature.lsh.LSHModel<T>>
Parameters:
inputs - a list of tables
public org.apache.flink.table.api.Table approxNearestNeighbors(org.apache.flink.table.api.Table dataset, Vector key, int k, String distCol)
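Approximately finds at most k items from a dataset that have the closest distance to a given item.
Parameters:
dataset - The dataset in which to search for nearest neighbors.
key - The item to search for.
k - The maximum number of nearest neighbors.
distCol - The output column storing the distance between each neighbor and the key.
public org.apache.flink.table.api.Table approxNearestNeighbors(org.apache.flink.table.api.Table dataset, Vector key, int k)
An overloaded version of `approxNearestNeighbors` with "distCol" as the default value of `distCol`.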
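The idea behind an approximate nearest-neighbor query can be sketched in plain Java over sets of indices. Hash the key and every row, keep only rows that collide with the key in at least one hash table, rank those candidates by exact Jaccard distance, and return at most k of them. The candidate rule (all functions within a table must agree; any one table suffices), the Jaccard metric, the hash family, and the class `MinHashNearestNeighbors` are assumptions for illustration, not a description of how Flink ML implements `approxNearestNeighbors`. Rows that never collide with the key are never returned, which is why the result is approximate and may hold fewer than k items.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

/** Hypothetical LSH nearest-neighbor search over sets of indices (not Flink ML's implementation). */
final class MinHashNearestNeighbors {
    static final long PRIME = 2038074743L; // assumed prime modulus of h(x) = ((1 + x) * a + b) mod PRIME

    private final long[][] coefA; // [table][function]
    private final long[][] coefB;

    MinHashNearestNeighbors(int numTables, int functionsPerTable, long seed) {
        Random rnd = new Random(seed);
        coefA = new long[numTables][functionsPerTable];
        coefB = new long[numTables][functionsPerTable];
        for (int t = 0; t < numTables; t++) {
            for (int f = 0; f < functionsPerTable; f++) {
                coefA[t][f] = 1 + rnd.nextInt((int) PRIME - 1);
                coefB[t][f] = rnd.nextInt((int) PRIME);
            }
        }
    }

    /** MinHash value for every (table, function) pair of a non-empty set of non-negative indices. */
    long[][] hash(Set<Integer> indices) {
        if (indices.isEmpty() || indices.stream().anyMatch(x -> x < 0)) {
            throw new IllegalArgumentException("expected a non-empty set of non-negative indices");
        }
        long[][] out = new long[coefA.length][];
        for (int t = 0; t < coefA.length; t++) {
            out[t] = new long[coefA[t].length];
            for (int f = 0; f < coefA[t].length; f++) {
                long min = Long.MAX_VALUE;
                for (int x : indices) {
                    min = Math.min(min, ((1L + x) * coefA[t][f] + coefB[t][f]) % PRIME);
                }
                out[t][f] = min;
            }
        }
        return out;
    }

    /** Exact Jaccard distance: 1 - |A intersect B| / |A union B|. */
    static double jaccardDistance(Set<Integer> a, Set<Integer> b) {
        Set<Integer> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<Integer> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0.0 : 1.0 - (double) intersection.size() / union.size();
    }

    /** Returns the ids of at most k rows, ordered by increasing Jaccard distance to the key. */
    List<Integer> approxNearestNeighbors(Map<Integer, Set<Integer>> dataset, Set<Integer> key, int k) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        long[][] keyHash = hash(key);
        List<Integer> candidates = new ArrayList<>();
        for (Map.Entry<Integer, Set<Integer>> row : dataset.entrySet()) {
            long[][] rowHash = hash(row.getValue());
            for (int t = 0; t < rowHash.length; t++) {
                // Candidate if every function of at least one table agrees with the key.
                if (Arrays.equals(rowHash[t], keyHash[t])) {
                    candidates.add(row.getKey());
                    break;
                }
            }
        }
        // Only candidates are ranked by exact distance; non-colliding rows are skipped entirely.
        candidates.sort(Comparator.comparingDouble(id -> jaccardDistance(dataset.get(id), key)));
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }

    public static void main(String[] args) {
        MinHashNearestNeighbors lsh = new MinHashNearestNeighbors(3, 2, 7L);
        Map<Integer, Set<Integer>> data = Map.of(
                0, Set.of(1, 2, 3, 4),
                1, Set.of(1, 2, 3, 9),
                2, Set.of(100, 200, 300));
        System.out.println(lsh.approxNearestNeighbors(data, Set.of(1, 2, 3, 4), 2));
    }
}
```

More tables raise the chance that a true neighbor collides with the key (recall), while more functions per table make accidental collisions rarer (precision).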
public org.apache.flink.table.api.Table approxSimilarityJoin(org.apache.flink.table.api.Table datasetA, org.apache.flink.table.api.Table datasetB, double threshold, String idCol, String distCol)
Joins two datasets to approximately find all pairs of rows whose distance is smaller than or equal to the threshold.
Parameters:
datasetA - One dataset.
datasetB - The other dataset.
threshold - The distance threshold.
idCol - A column in the two datasets to identify each row.
distCol - The output column storing the distance between each pair of rows.
public org.apache.flink.table.api.Table approxSimilarityJoin(org.apache.flink.table.api.Table datasetA, org.apache.flink.table.api.Table datasetB, double threshold, String idCol)
An overloaded version of `approxSimilarityJoin` with "distCol" as the default value of `distCol`.
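A similarity join can be sketched the same way: bucket the rows of the first dataset by hash value, probe those buckets with the rows of the second, deduplicate the candidate pairs, and keep the pairs whose exact distance is at most `threshold`. This plain-Java sketch is an illustration under assumptions, not Flink ML's implementation. `MinHashSimilarityJoin` and `Match` are hypothetical, map keys stand in for `idCol`, `Match.distance` stands in for `distCol`, and each hash table uses a single MinHash function to keep the code short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

/** Hypothetical MinHash similarity join; one MinHash function per hash table (simplification). */
final class MinHashSimilarityJoin {
    static final long PRIME = 2038074743L; // assumed prime modulus

    private final long[] coefA; // one (a, b) pair per hash table
    private final long[] coefB;

    MinHashSimilarityJoin(int numTables, long seed) {
        Random rnd = new Random(seed);
        coefA = new long[numTables];
        coefB = new long[numTables];
        for (int t = 0; t < numTables; t++) {
            coefA[t] = 1 + rnd.nextInt((int) PRIME - 1);
            coefB[t] = rnd.nextInt((int) PRIME);
        }
    }

    /** One joined pair: the ids from each side (idCol) and their distance (distCol). */
    static final class Match {
        final int idA;
        final int idB;
        final double distance;

        Match(int idA, int idB, double distance) {
            this.idA = idA;
            this.idB = idB;
            this.distance = distance;
        }
    }

    /** h(x) = ((1 + x) * a + b) mod PRIME, minimized over a non-empty set of non-negative indices. */
    private long minHash(int table, Set<Integer> indices) {
        if (indices.isEmpty() || indices.stream().anyMatch(x -> x < 0)) {
            throw new IllegalArgumentException("expected a non-empty set of non-negative indices");
        }
        long min = Long.MAX_VALUE;
        for (int x : indices) {
            min = Math.min(min, ((1L + x) * coefA[table] + coefB[table]) % PRIME);
        }
        return min;
    }

    /** Exact Jaccard distance: 1 - |A intersect B| / |A union B|. */
    static double jaccardDistance(Set<Integer> a, Set<Integer> b) {
        Set<Integer> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<Integer> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0.0 : 1.0 - (double) intersection.size() / union.size();
    }

    /** Returns each candidate pair (deduplicated) whose exact distance is at most threshold. */
    List<Match> approxSimilarityJoin(
            Map<Integer, Set<Integer>> datasetA, Map<Integer, Set<Integer>> datasetB, double threshold) {
        // Bucket the rows of datasetA by (table index, hash value).
        Map<List<Long>, List<Integer>> buckets = new HashMap<>();
        for (Map.Entry<Integer, Set<Integer>> row : datasetA.entrySet()) {
            for (int t = 0; t < coefA.length; t++) {
                List<Long> bucket = List.of((long) t, minHash(t, row.getValue()));
                buckets.computeIfAbsent(bucket, unused -> new ArrayList<>()).add(row.getKey());
            }
        }
        // Probe with the rows of datasetB; sharing any bucket makes a pair a candidate.
        Set<List<Integer>> seen = new HashSet<>();
        List<Match> result = new ArrayList<>();
        for (Map.Entry<Integer, Set<Integer>> row : datasetB.entrySet()) {
            for (int t = 0; t < coefA.length; t++) {
                List<Long> bucket = List.of((long) t, minHash(t, row.getValue()));
                for (int idA : buckets.getOrDefault(bucket, List.of())) {
                    if (!seen.add(List.of(idA, row.getKey()))) {
                        continue; // pair already evaluated through another table
                    }
                    double d = jaccardDistance(datasetA.get(idA), row.getValue());
                    if (d <= threshold) {
                        result.add(new Match(idA, row.getKey(), d));
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        MinHashSimilarityJoin join = new MinHashSimilarityJoin(4, 11L);
        List<Match> matches = join.approxSimilarityJoin(
                Map.of(1, Set.of(1, 2, 3, 4), 2, Set.of(1, 2, 3, 5)),
                Map.of(10, Set.of(1, 2, 3, 4), 20, Set.of(90, 91)),
                0.5);
        for (Match m : matches) {
            System.out.println(m.idA + " <-> " + m.idB + " distance " + m.distance);
        }
    }
}
```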
Copyright © 2019–2023 The Apache Software Foundation. All rights reserved.