Clustering high-dimensional data

K-Profiles: A Nonlinear Clustering Method for Pattern Detection in High Dimensional Data Open

Kai Wang, Qing Zhao, Jianwei Lu, Tianwei Yu · 2015

With modern technologies such as microarray, deep sequencing, and liquid chromatography-mass spectrometry (LC-MS), it is possible to measure the expression levels of thousands of genes/proteins simultaneously to unravel important biologica…

Improved Deep Embedded Clustering with Local Structure Preservation Open

Xifeng Guo, Long Gao, Xinwang Liu, Jianping Yin · 2017

Computer science Philosophy

Deep clustering learns deep feature representations that favor clustering task using neural networks. Some pioneering work proposes to simultaneously learn embedded features and perform clustering by explicitly defining a clustering orient…

Multi-view Subspace Clustering Open

Hongchang Gao, Feiping Nie, Xuelong Li, Heng Huang · 2015

Computer science Mathematics Geography

For many computer vision applications, the data sets distribute on certain low-dimensional subspaces. Subspace clustering is to find such underlying subspaces and cluster the data points correctly. In this paper, we propose a novel multi-v…

Multi-view clustering: A survey Open

Yan Yang, Hao Wang · 2018

Computer science

In the big data era, the data are generated from different sources or observed from different views. These data are referred to as multi-view data. Unleashing the power of knowledge in multi-view data is very important in big data mining a…

Large-Scale Multi-View Subspace Clustering in Linear Time Open

Zhao Kang, Wang-Tao Zhou, Zhitong Zhao, Junming Shao, Meng Han, et al. · 2020

Computer science Mathematics Geography

A plethora of multi-view subspace clustering (MVSC) methods have been proposed over the past few years. Researchers manage to boost clustering accuracy from different points of view. However, many state-of-the-art MVSC algorithms, typicall…

Multiview Spectral Clustering via Structured Low-Rank Matrix Factorization Open

Yang Wang, Lin Wu, Xuemin Lin, Junbin Gao · 2018

Computer science Mathematics Political science

Multiview data clustering attracts more attention than their single-view counterparts due to the fact that leveraging multiple independent and complementary information from multiview feature spaces outperforms the single one. Multiview sp…

Deep learning-based clustering approaches for bioinformatics Open

Md. Rezaul Karim, Oya Beyan, Achille Zappa, Ivan G. Costa, Dietrich Rebholz‐Schuhmann, et al. · 2019

Computer science Philosophy Political science

Clustering is central to many data-driven bioinformatics research and serves a powerful computational method. In particular, clustering helps at analyzing unstructured and high-dimensional data in the form of sequences, expressions, texts …

Structured Sparse Subspace Clustering: A Joint Affinity Learning and Subspace Clustering Framework Open

Chun-Guang Li, Chong You, René Vidal · 2017

Computer science Mathematics

Subspace clustering refers to the problem of segmenting data drawn from a union of subspaces. State-of-the-art approaches for solving this problem follow a two-stage approach. In the first step, an affinity matrix is learned from the data …

The Application of Unsupervised Clustering Methods to Alzheimer’s Disease Open

Hany Alashwal, Mohamed El Halaby, Jacob J. Crouse, Areeg Abdalla, Ahmed A. Moustafa · 2019

Computer science Mathematics

Clustering is a powerful machine learning tool for detecting structures in datasets. In the medical field, clustering has been proven to be a powerful tool for discovering patterns and structure in labeled and unlabeled datasets. Unlike su…

Deep Neural Networks for High Dimension, Low Sample Size Data Open

Bo Liu, Ying Wei, Yu Zhang, Qiang Yang · 2017

Computer science Mathematics Biology

Deep neural networks (DNN) have achieved breakthroughs in applications with large sample size. However, when facing high dimension, low sample size (HDLSS) data, such as the phenotype prediction problem using genetic data in bioinformatics…

Robust continuous clustering Open

Sohil Shah, Vladlen Koltun · 2017

Computer science Mathematics

Significance: Clustering is a fundamental experimental procedure in data analysis. It is used in virtually all natural and social sciences and has played a central role in biology, astronomy, psychology, medicine, and chemistry. Despite the…

dropClust: efficient clustering of ultra-large scRNA-seq data Open

Debajyoti Sinha, Akhilesh Kumar, Himanshu Kumar, Sanghamitra Bandyopadhyay, Debarka Sengupta · 2018

Biology Computer science Philosophy

Droplet based single cell transcriptomics has recently enabled parallel screening of tens of thousands of single cells. Clustering methods that scale for such high dimensional data without compromising accuracy are scarce. We exploit Local…

A Unified Framework for Representation-Based Subspace Clustering of Out-of-Sample and Large-Scale Data Open

Xi Peng, Huajin Tang, Lei Zhang, Yi Zhang, Shijie Xiao · 2015

Computer science Mathematics

Under the framework of spectral clustering, the key of subspace clustering is building a similarity graph, which describes the neighborhood relations among data points. Some recent works build the graph using sparse, low-rank, and l2-norm…

One-Pass Incomplete Multi-View Clustering Open

Menglei Hu, Songcan Chen · 2019

Computer science Physics

Real data are often with multiple modalities or from multiple heterogeneous sources, thus forming so-called multi-view data, which receives more and more attentions in machine learning. Multi-view clustering (MVC) becomes its important par…

A Hybrid Semi-Supervised Anomaly Detection Model for High-Dimensional Data Open

Hongchao Song, Zhuqing Jiang, Aidong Men, Bo Yang · 2017

Computer science Mathematics Physics

Anomaly detection, which aims to identify observations that deviate from a nominal sample, is a challenging task for high-dimensional data. Traditional distance-based anomaly detection methods compute the neighborhood distance between each…

SAFE-clustering: Single-cell Aggregated (from Ensemble) clustering for single-cell RNA-seq data Open

Yuchen Yang, Ruth Huh, Houston Culpepper, Yuan Lin, Michael I. Love, et al. · 2018

Computer science Mathematics

Motivation: Accurately clustering cell types from a mass of heterogeneous cells is a crucial first step for the analysis of single-cell RNA-seq (scRNA-Seq) data. Although several methods have been recently developed, they utilize different …

Clustering Algorithm for a Healthcare Dataset Using Silhouette Score Value Open

Godwin Ogbuabor, Ugwoke F. N · 2018

Computer science

The huge amount of healthcare data, coupled with the need for data analysis tools has made data mining interesting research areas. Data mining tools and techniques help to discover and understand hidden patterns in a dataset which may not b…

Variable selection methods for model-based clustering Open

Michael Fop, Thomas Brendan Murphy · 2018

Computer science Mathematics

Model-based clustering is a popular approach for clustering multivariate data which has seen applications in numerous fields. Nowadays, high-dimensional data are more and more common and the model-based clustering approach has adapted to d…

Unsupervised Deep Embedding for Clustering Analysis Open

Junyuan Xie, Ross Girshick, Ali Farhadi · 2015

Computer science Philosophy

Clustering is central to many data-driven application domains and has been studied extensively in terms of distance functions and grouping algorithms. Relatively little work has focused on learning representations for clustering. In this p…

Transfer Prototype-Based Fuzzy Clustering Open

Zhaohong Deng, Yizhang Jiang, Fu-Lai Chung, Hisao Ishibuchi, Kup‐Sze Choi, et al. · 2015

Computer science

The traditional prototype based clustering methods, such as the well-known fuzzy c-mean (FCM) algorithm, usually need sufficient data to find a good clustering partition. If the available data is limited or scarce, most of the existing …

Sliding Window-Based Fault Detection From High-Dimensional Data Streams Open

Liangwei Zhang, Jing Lin, Ramin Karim · 2016

Computer science Mathematics

High-dimensional data streams are becoming increasingly ubiquitous in industrial systems. Efficient detection of system faults from these data can ensure the reliability and safety of the system. The difficulties brought about by high dime…

Deep Temporal Clustering: Fully Unsupervised Learning of Time-Domain Features Open

Naveen Sai Madiraju, Seid M. Sadat, Dimitry Fisher, H. Karimabadi · 2018

Computer science

Unsupervised learning of time series data, also known as temporal clustering, is a challenging problem in machine learning. Here we propose a novel algorithm, Deep Temporal Clustering (DTC), to naturally integrate dimensionality reduction …

Data Clustering: Algorithms and Its Applications Open

Jelili Oyelade, Itunuoluwa Isewon, Olufunke Oladipupo, Onyeka Emebo, Zacchaeus O. Omogbadegun, et al. · 2019

Computer science

Data is useless if information or knowledge that can be used for further reasoning cannot be inferred from it. Cluster analysis, based on some criteria, shares data into important, practical or both categories (clusters) based on share…

Clustering of single-cell multi-omics data with a multimodal deep learning method Open

Xiang Lin, Tian Tian, Zhi Wei, Hákon Hákonarson · 2022

Computer science

Single-cell multimodal sequencing technologies are developed to simultaneously profile different modalities of data in the same cell. It provides a unique opportunity to jointly analyze multimodal data at the single-cell level for the iden…

Distance‐based clustering of mixed data Open

Michel van de Velden, Alfonso Iodice D’Enza, Angelos Markos · 2018

Computer science Economics

Cluster analysis comprises of several unsupervised techniques aiming to identify a subgroup (cluster) structure underlying the observations of a data set. The desired cluster allocation is such that it assigns similar observations to the s…

Joint dimension reduction and clustering analysis of single-cell RNA-seq and spatial transcriptomics data Open

Wei Liu, Xu Liao, Yi Yang, Huazhen Lin, Joe Yeong, et al. · 2022

Computer science Mathematics

Dimension reduction and (spatial) clustering is usually performed sequentially; however, the low-dimensional embeddings estimated in the dimension-reduction step may not be relevant to the class labels inferred in the clustering step. We t…

Entropy-based consensus clustering for patient stratification Open

Hongfu Liu, Rui Zhao, Hong-Sheng Fang, Feixiong Cheng, Yun Fu, et al. · 2017

Computer science Physics

Motivation: Patient stratification or disease subtyping is crucial for precision medicine and personalized treatment of complex diseases. The increasing availability of high-throughput molecular data provides a great opportunity for patient…

SHARP: hyperfast and accurate processing of single-cell RNA-seq data via ensemble random projection Open

Shibiao Wan, Junil Kim, Kyoung‐Jae Won · 2020

Computer science Biology Business

To process large-scale single-cell RNA-sequencing (scRNA-seq) data effectively without excessive distortion during dimension reduction, we present SHARP, an ensemble random projection-based algorithm that is scalable to clustering 10 milli…

Integrative clustering methods of multi‐omics data for molecule‐based cancer classifications Open

Dongfang Wang, Jin Gu · 2016

Computer science Biology

One goal of precise oncology is to re‐classify cancer based on molecular features rather than its tissue origin. Integrative clustering of large‐scale multi‐omics data is an important way for molecule‐based cancer classification. The data …

HDclassif: An R Package for Model-Based Clustering and Discriminant Analysis of High-Dimensional Data Open

Laurent Bergé, Charles Bouveyron, Stéphane Girard · 2020

Computer science Mathematics

This paper presents the R package HDclassif which is devoted to the clustering and the discriminant analysis of high-dimensional data. The classification methods proposed in the package result from a new parametrization of the Gaussian mix…