Nearest neighbor search

A Survey on Learning to Hash

Jingdong Wang, Ting Zhang, Jingkuan Song, Nicu Sebe, Heng Tao Shen · 2017

Computer science

Nearest neighbor search is a problem of finding the data points from the database such that the distances from them to the query point are the smallest. Learning to hash is one of the major solutions to this problem and has been widely stu…
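As a point of reference for the definition above, exact nearest neighbor search can be sketched as an exhaustive scan. This is a minimal illustration, not a method from the survey: the function name `nearest_neighbors`, the sample points, and the choice of Euclidean distance are all assumptions made for the example.

```python
import math

def nearest_neighbors(database, query, k=1):
    """Return the k database points closest to `query`.

    Hypothetical helper: uses Euclidean distance and a full scan,
    which is the baseline that hashing-based methods approximate.
    """
    # Rank every point by its distance to the query, then keep the first k.
    ranked = sorted(database, key=lambda point: math.dist(point, query))
    return ranked[:k]

# Illustrative data, not taken from any cited paper.
database = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (2.0, 2.5)]
result = nearest_neighbors(database, (1.2, 1.1), k=2)
print(result)
```

The scan touches every point for each query, which is the cost that approximate methods such as learned hashing aim to reduce.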

k-Nearest Neighbour Classifiers - A Tutorial

Pádraig Cunningham, Sarah Jane Delany · 2021

Computer science

Perhaps the most straightforward classifier in the arsenal of Machine Learning techniques is the Nearest Neighbour Classifier: classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to…
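The idea described in this abstract can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the tutorial's own code: the name `knn_classify`, the Euclidean distance, the unweighted majority vote, and the toy training set are all chosen for the example.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training examples.

    `train` is a list of (feature_tuple, label) pairs (hypothetical format).
    """
    # Find the k training examples closest to the query.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count the labels of those neighbours and return the most frequent one.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Illustrative training set with two well-separated classes.
train = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.3), "a"),
    ((4.0, 4.2), "b"),
    ((4.1, 3.9), "b"),
]
print(knn_classify(train, (1.1, 1.0)))
print(knn_classify(train, (4.0, 4.0)))
```

Distance-weighted voting and other distance measures are common variations; this sketch uses the simplest form.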

Deep Hashing Network for Efficient Similarity Retrieval

Han Zhu, Mingsheng Long, Jianmin Wang, Yue Cao · 2016

Computer science Mathematics

Due to the storage and retrieval efficiency, hashing has been widely deployed to approximate nearest neighbor search for large-scale multimedia retrieval. Supervised hashing, which improves the quality of hash coding by exploiting the sema…

Billion-Scale Similarity Search with GPUs

Jeff Johnson, Matthijs Douze, Hervé Jégou · 2019

Computer science

Similarity search finds application in specialized database systems handling complex data such as images or videos, which are typically represented by high-dimensional features and require specific indexing structures. This paper tackles t…

Collaborative Metric Learning

Cheng-Kang Hsieh, Longqi Yang, Yin Cui, Tsung-Yi Lin, Serge Belongie, et al. · 2017

Computer science Mathematics Economics

Metric learning algorithms produce distance metrics that capture the important relationships among data. In this work, we study the connection between metric learning and collaborative filtering. We propose Collaborative Metric Learning (C…

Asymmetric Deep Supervised Hashing

Qing-Yuan Jiang, Wu-Jun Li · 2018

Computer science

Hashing has been widely used for large-scale approximate nearest neighbor search because of its storage and search efficiency. Recent work has found that deep supervised hashing can significantly outperform non-deep supervised hashing in m…

Deep Quantization Network for Efficient Image Retrieval

Yue Cao, Mingsheng Long, Jianmin Wang, Han Zhu, Qingfu Wen · 2016

Computer science

Hashing has been widely applied to approximate nearest neighbor search for large-scale multimedia retrieval. Supervised hashing improves the quality of hash coding by exploiting the semantic similarity on data pairs and has received increa…

Graph PCA Hashing for Similarity Search

Xiaofeng Zhu, Xuelong Li, Shichao Zhang, Zongben Xu, Litao Yu, et al. · 2017

Computer science Mathematics Political science

This paper proposes a new hashing framework to conduct similarity search via the following steps: first, employing linear clustering methods to obtain a set of representative data points and a set of landmarks of the big dataset; second, u…

Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs

Yu. A. Malkov, D. A. Yashunin · 2016

Computer science Mathematics Economics

We present a new approach for the approximate K-nearest neighbor search based on navigable small world graphs with controllable hierarchy (Hierarchical NSW, HNSW). The proposed solution is fully graph-based, without any need for additional…

Deep Graph-neighbor Coherence Preserving Network for Unsupervised Cross-modal Hashing

Jun Yu, Hao Zhou, Yibing Zhan, Dacheng Tao · 2021

Computer science Mathematics

Unsupervised cross-modal hashing (UCMH) has become a hot topic recently. Current UCMH focuses on exploring data similarities. However, current UCMH methods calculate the similarity between two data items, mainly relying on the two items' cross-m…

Exploring Nearest Neighbor Approaches for Image Captioning

Jacob Devlin, Saurabh Gupta, Ross Girshick, Margaret Mitchell, C. Lawrence Zitnick · 2015

Computer science

We explore a variety of nearest neighbor baseline approaches for image captioning. These approaches find a set of nearest neighbor images in the training set from which a caption may be borrowed for the query image. We select a caption for…

Levenshtein Distance, Sequence Comparison and Biological Database Search

Bonnie Berger, Michael S. Waterman, Yun William Yu · 2020

Computer science Biology Economics

Levenshtein edit distance has played a central role—both past and present—in sequence alignment in particular and biological database similarity search in general. We start our review with a history of dynamic programming algorithms for co…
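For readers unfamiliar with the measure, Levenshtein distance is commonly computed with a dynamic programming recurrence. The sketch below is a standard textbook formulation, not code from the review; the function name `levenshtein` and the unit cost for every edit are assumptions of the example.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string `a` into string `b`."""
    # prev[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`; start with the empty prefix.
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # Distance from a[:i] to the empty string is i deletions.
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,                      # delete char_a
                curr[j - 1] + 1,                  # insert char_b
                prev[j - 1] + substitution_cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # prints 3
```

Keeping only two rows of the table limits memory to O(len(b)), while the running time remains proportional to the product of the two string lengths.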

Optimization of distance formula in K-Nearest Neighbor method

Arif Ridho Lubis, Muharman Lubis, Al-Khowarizmi Al-Khowarizmi · 2020

Mathematics Computer science

K-Nearest Neighbor (KNN) is a method for classifying objects based on the learning data closest to the object, determined by comparing previous and current data. In the learning process, KNN calculates the distance of the nea…

Efficient Hyperparameter Tuning with Grid Search for Text Categorization using kNN Approach with BM25 Similarity

Raji Ghawi, Jürgen Pfeffer · 2019

Computer science Mathematics

In machine learning, hyperparameter tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. Several approaches have been widely adopted for hyperparameter tuning, which is typically a time consuming pro…

Meta-Path Guided Embedding for Similarity Search in Large-Scale Heterogeneous Information Networks

Jingbo Shang, Meng Qu, Jialu Liu, Lance Kaplan, Jiawei Han, et al. · 2016

Computer science

Most real-world data can be modeled as heterogeneous information networks (HINs) consisting of vertices of multiple types and their relationships. Search for similar vertices of the same type in large HINs, such as bibliographic networks a…

A comparative analysis of trajectory similarity measures

Yaguang Tao, Alan Both, Rodrigo I. Silveira, Kevin Buchin, Stef Sijben, et al. · 2021

Computer science Mathematics Engineering

Computing trajectory similarity is a fundamental operation in movement analytics, required in search, clustering, and classification of trajectories, for example. Yet the range of different but interrelated trajectory similarity measures c…

JOSIE

Erkang Zhu, Dong Deng, Fatemeh Nargesian, Renée J. Miller · 2019

Computer science Mathematics Engineering

We present a new solution for finding joinable tables in massive data lakes: given a table and one join column, find tables that can be joined with the given table on the largest number of distinct values. The problem can be formulated as …

mTM-align: a server for fast protein structure database search and multiple protein structure alignment

Runze Dong, Shuo Pan, Zhenling Peng, Yang Zhang, Jianyi Yang · 2018

Computer science Biology Geography

With the rapid increase of the number of protein structures in the Protein Data Bank, it becomes urgent to develop algorithms for efficient protein structure comparisons. In this article, we present the mTM-align server, which consists of …

DrugShot: querying biomedical search terms to retrieve prioritized lists of small molecules

Eryk Kropiwnicki, Alexander Lachmann, Daniel Clarke, Zhuorui Xie, Kathleen M. Jagodnik, et al. · 2022

Computer science Medicine Biology

Background PubMed contains millions of abstracts that co-mention terms that describe drugs with other biomedical terms such as genes or diseases. Unique opportunities exist for leveraging these co-mentions by integrating them with other dr…

Fast Open Modification Spectral Library Searching through Approximate Nearest Neighbor Indexing

Wout Bittremieux, Pieter Meysman, William Stafford Noble, Kris Laukens · 2018

Computer science

Open modification searching (OMS) is a powerful search strategy that identifies peptides carrying any type of modification by allowing a modified spectrum to match against its unmodified variant by using a very wide precursor mass window. …

An Improved DBSCAN Algorithm Based on the Neighbor Similarity and Fast Nearest Neighbor Query

Shanshan Li · 2020

Computer science

DBSCAN is the most famous density-based clustering algorithm, and density-based clustering is one of the main clustering paradigms. However, there are many redundant distance computations in the DBSCAN clustering process, due to the brute-force Range-Query used…

Binary Hashing for Approximate Nearest Neighbor Search on Big Data: A Survey

Yuan Cao, Heng Qi, Wenrui Zhou, Jien Kato, Keqiu Li, et al. · 2017

Computer science Mathematics

Nearest neighbor search is a fundamental problem in various domains, such as computer vision, data mining, and machine learning. With the explosive growth of data on the Internet, many new data structures using spatial partitions and recur…

Sequential Discrete Hashing for Scalable Cross-Modality Similarity Retrieval

Li Liu, Zijia Lin, Ling Shao, Fumin Shen, Guiguang Ding, et al. · 2016

Computer science Mathematics

With the dramatic development of the Internet, how to exploit large-scale retrieval techniques for multimodal web data has become one of the most popular but challenging problems in computer vision and multimedia. Recently, hashing methods…

A method for satellite time series anomaly detection based on fast-DTW and improved-KNN

Langfu Cui, Qingzhen Zhang, Yan Shi, Liman Yang, Yixuan Wang, et al. · 2022

Computer science Mathematics Engineering

In satellite anomaly detection, there are problems such as unbalanced sample distribution, few fault samples, and unobvious anomaly characteristics. These problems make existing anomaly detection methods difficult to train …

Spectral Multimodal Hashing and Its Application to Multimedia Retrieval

Yi Zhen, Yue Gao, Dit‐Yan Yeung, Hongyuan Zha, Xuelong Li · 2015

Computer science Sociology

In recent years, multimedia retrieval has sparked much research interest in the multimedia, pattern recognition, and data mining communities. Although some attempts have been made along this direction, performing fast multimodal search at …

MLS3RDUH: Deep Unsupervised Hashing via Manifold based Local Semantic Similarity Structure Reconstructing

Rong-Cheng Tu, Xian-Ling Mao, Wei Wei · 2020

Computer science

Most of the unsupervised hashing methods usually map images into semantic similarity-preserving hash codes by constructing local semantic similarity structure as guiding information, i.e., treating each point similar to its k nearest neigh…

Activity-relevant similarity values for fingerprints and implications for similarity searching

Swarit Jasial, Ye Hu, Martin Vogt, Jürgen Bajorath · 2016

Computer science Mathematics Biology

A largely unsolved problem in chemoinformatics is the issue of how calculated compound similarity relates to activity similarity, which is central to many applications. In general, activity relationships are predicted from calculated simil…

Improving Approximate Nearest Neighbor Search through Learned Adaptive Early Termination

Conglong Li, Minjia Zhang, David G. Andersen, Yuxiong He · 2020

Computer science

In applications ranging from image search to recommendation systems, the problem of identifying a set of "similar" real-valued vectors to a query vector plays a critical role. However, retrieving these vectors and computing the correspondi…

A Distributed Storage and Computation k-Nearest Neighbor Algorithm Based Cloud-Edge Computing for Cyber-Physical-Social Systems

Wei Zhang, Xiaohong Chen, Yueqi Liu, Qian Xi · 2020

Computer science Engineering

The k-nearest neighbor (kNN) algorithm is a classic supervised machine learning algorithm. It is widely used in cyber-physical-social systems (CPSS) to analyze and mine data. However, in practical CPSS applications, the standard linear kNN…

A generalized fuzzy k-nearest neighbor regression model based on Minkowski distance

Mahinda Mailagaha Kumbure, Pasi Luukka · 2021

Mathematics Computer science

The fuzzy k-nearest neighbor (FKNN) algorithm, one of the most well-known and effective supervised learning techniques, has often been used in data classification problems but rarely in regression settings. This paper introduces a new, mor…
