Utility functions for LSH models
Usage
ml_approx_nearest_neighbors(
model,
dataset,
key,
num_nearest_neighbors,
dist_col = "distCol"
)
ml_approx_similarity_join(
model,
dataset_a,
dataset_b,
threshold,
dist_col = "distCol"
)
Arguments
- model
A fitted LSH model, returned by either
ft_minhash_lsh() or ft_bucketed_random_projection_lsh().
- dataset
The dataset to search for nearest neighbors of the key.
- key
Feature vector representing the item to search for.
- num_nearest_neighbors
The maximum number of nearest neighbors.
- dist_col
Output column for storing the distance between each result row and the key.
- dataset_a
One of the datasets to join.
- dataset_b
Another dataset to join.
- threshold
The distance threshold for row pairs. Only pairs whose distance is smaller than this value are returned.
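Examples
The following is a minimal sketch of how the two functions might be used together, not a definitive recipe. It assumes a local Spark installation reachable through spark_connect(master = "local"), and the toy data, the table names (points_a, points_b), the column names (features, hashes), and the parameter values (bucket_length = 2, num_hash_tables = 3, threshold = 1.5) are all illustrative choices rather than anything prescribed by this page. It also assumes that key can be passed as a plain numeric vector. The estimator is built from the connection and fitted with ml_fit() so that a fitted model, rather than a transformed table, is passed as model.

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark installation is available.
sc <- spark_connect(master = "local")

# Hypothetical toy data: two small sets of 2-D points.
df_a <- data.frame(id = 1:4, x = c(1, 1, -1, -1), y = c(1, -1, -1, 1))
df_b <- data.frame(id = 5:8, x = c(1, -1, 0, 0), y = c(0, 0, 1, -1))

# Assemble the numeric columns into a single vector column named "features".
points_a <- copy_to(sc, df_a, "points_a", overwrite = TRUE) %>%
  ft_vector_assembler(input_cols = c("x", "y"), output_col = "features")
points_b <- copy_to(sc, df_b, "points_b", overwrite = TRUE) %>%
  ft_vector_assembler(input_cols = c("x", "y"), output_col = "features")

# Build the LSH estimator from the connection, then fit it to get a model.
lsh <- ft_bucketed_random_projection_lsh(
  sc,
  input_col = "features",
  output_col = "hashes",
  bucket_length = 2,
  num_hash_tables = 3
)
model <- ml_fit(lsh, points_a)

# Approximate nearest neighbors of the point (1, 0) within points_a.
# At most num_nearest_neighbors rows come back; the distance is in "distCol".
neighbors <- ml_approx_nearest_neighbors(
  model,
  points_a,
  key = c(1, 0),
  num_nearest_neighbors = 2
)
neighbor_ids <- neighbors %>% select(id, distCol) %>% collect()

# Approximate similarity join: row pairs from points_a and points_b whose
# distance is smaller than the threshold.
pairs <- ml_approx_similarity_join(
  model,
  points_a,
  points_b,
  threshold = 1.5
)
pair_columns <- colnames(pairs)

spark_disconnect(sc)
```

Because both functions are approximate, the result can contain fewer rows than an exact search would return; raising num_hash_tables on the estimator generally trades speed for better recall.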