A Survey on Similarity Measures in Text Mining
· 2016
· Open Access
· DOI: https://doi.org/10.5121/mlaij.2016.3103
· OA: W2327620174
The volume of text resources in digital libraries and on the internet has been increasing, and organizing these text documents has become a practical need. Clustering is the technique used to automatically organize a large number of objects into a small number of coherent groups. The resulting document groups are widely used in information retrieval and natural language processing tasks. Clustering algorithms require a metric that quantifies how dissimilar two given documents are. This difference is often measured with a similarity measure such as Euclidean distance or cosine similarity. Evaluating similarity measures in text mining can also help identify a suitable clustering algorithm for a specific problem. This survey discusses existing work on text similarity by partitioning it into three significant approaches: string-based, knowledge-based, and corpus-based similarities.
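As a rough illustration of the two measures named in the abstract, the sketch below computes cosine similarity and Euclidean distance between two documents represented as raw term-frequency vectors. The function names (term_frequencies, cosine_similarity, euclidean_distance), the whitespace tokenization, and the sample sentences are assumptions made for this example; they are not taken from the survey, which may weight or preprocess terms differently.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Lowercase the text, split on whitespace, and count each token.

    Whitespace tokenization is a simplifying assumption for this sketch.
    """
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    # Only terms present in both documents contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty documents
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two term-frequency vectors."""
    # A Counter returns 0 for missing terms, so the union of both
    # vocabularies can be iterated safely.
    vocab = a.keys() | b.keys()
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in vocab))


if __name__ == "__main__":
    doc1 = term_frequencies("text mining groups similar text documents")
    doc2 = term_frequencies("clustering groups similar documents automatically")
    print("cosine similarity:", cosine_similarity(doc1, doc2))
    print("euclidean distance:", euclidean_distance(doc1, doc2))
```

Cosine similarity depends only on the direction of the vectors, so it is insensitive to document length, whereas Euclidean distance grows with differences in raw counts. That contrast is one reason the choice of measure can affect which clustering algorithm behaves well on a given collection.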