A Survey on Similarity Measures in Text Mining

Vijaymeena M.K, K. Kavitha

2016 · Open Access · DOI: https://doi.org/10.5121/mlaij.2016.3103 · OA: W2327620174

The volume of text resources in digital libraries and on the internet has been increasing, and organizing these text documents has become a practical need. Clustering is the technique used to automatically organize a large number of objects into a small number of coherent groups. These documents are widely used for information retrieval and natural language processing tasks. Clustering algorithms require a metric for quantifying how dissimilar two given documents are. This difference is often measured by a similarity measure such as Euclidean distance or cosine similarity. The similarity measure process in text mining can be used to identify a suitable clustering algorithm for a specific problem. This survey discusses existing work on text similarity by partitioning it into three significant approaches: string-based, knowledge-based, and corpus-based similarities.
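To make the two measures named in the abstract concrete, the following is a minimal sketch of Euclidean distance and cosine similarity computed over raw term-frequency vectors. The whitespace tokenization, the helper names (`term_frequencies`, `euclidean_distance`, `cosine_similarity`), and the sample sentences are assumptions made for illustration; they are not taken from the survey, which may use different weighting or preprocessing.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Lowercase the text, split on whitespace, and count each token."""
    return Counter(text.lower().split())


def euclidean_distance(a, b):
    """Straight-line distance between two term-frequency vectors."""
    vocab = set(a) | set(b)
    # Counter returns 0 for terms that a document does not contain.
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in vocab))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty documents
    return dot / (norm_a * norm_b)


# Hypothetical sample documents.
doc1 = term_frequencies("text mining groups similar documents")
doc2 = term_frequencies("text mining clusters similar documents")
doc3 = term_frequencies("stock prices fell sharply today")

print(cosine_similarity(doc1, doc2), euclidean_distance(doc1, doc2))
print(cosine_similarity(doc1, doc3), euclidean_distance(doc1, doc3))
```

In this sketch a smaller Euclidean distance and a larger cosine similarity both indicate more closely related documents, so a clustering algorithm that expects a dissimilarity could use the distance directly or something like one minus the cosine value.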