Statistical Significance of Clustering Using Soft Thresholding
2020 · Open Access · DOI: https://doi.org/10.17615/dvba-7p60
Clustering methods have led to a number of important discoveries in bioinformatics and beyond. A major challenge in their use is determining which clusters represent important underlying structure, as opposed to spurious sampling artifacts. This challenge is especially serious when the data are very high-dimensional, a setting in which very few methods are available. Statistical Significance of Clustering (SigClust) is a recently developed cluster evaluation tool for high-dimensional, low-sample-size data. An important component of the SigClust approach is the very definition of a single cluster as a subset of data sampled from a multivariate Gaussian distribution. The implementation of SigClust requires estimating the eigenvalues of the covariance matrix of the null multivariate Gaussian distribution. We show that the original eigenvalue estimation can lead to a test that suffers from severe inflation of type-I error in the important case where there are a few very large eigenvalues. This paper addresses this critical challenge using a novel likelihood-based soft thresholding approach to estimate these eigenvalues, which leads to a much improved SigClust. Major improvements in SigClust performance are shown by both mathematical analysis, based on the new notion of Theoretical Cluster Index, and extensive simulation studies. Applications to some cancer genomic data further demonstrate the usefulness of these improvements.
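As a rough illustration of the eigenvalue step the abstract describes, the sketch below contrasts hard and soft thresholding of sample eigenvalues against a background noise variance. It is not taken from the paper: the function names, the `sigma2` noise level, and the rule of choosing the shift `tau` so that the total variance is preserved are all assumptions made for illustration, and the paper's likelihood-based derivation may differ in detail.

```python
def hard_threshold(sample_eigs, sigma2):
    """Raise every eigenvalue below the assumed noise floor sigma2 up to sigma2."""
    return [max(float(e), sigma2) for e in sample_eigs]


def soft_threshold(sample_eigs, sigma2, tol=1e-12, max_iter=200):
    """Shift eigenvalues down by tau, then floor them at sigma2.

    tau >= 0 is found by bisection so that the thresholded eigenvalues sum
    to the same total as the sample eigenvalues (an assumed constraint,
    chosen here for illustration). If no such tau exists, the function
    falls back to tau = 0, which is plain hard thresholding.
    """
    eigs = [float(e) for e in sample_eigs]
    if not eigs:
        return []
    total = sum(eigs)

    def shifted(tau):
        return [max(e - tau, sigma2) for e in eigs]

    # With every eigenvalue at the floor the sum is len(eigs) * sigma2;
    # if that already exceeds the total, no shift can match the total.
    if len(eigs) * sigma2 > total:
        return shifted(0.0)

    # The sum of shifted(tau) is continuous and non-increasing in tau,
    # so bisection on [0, max(eigs) - sigma2] brackets the solution.
    lo, hi = 0.0, max(max(eigs) - sigma2, 0.0)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if sum(shifted(mid)) > total:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return shifted(hi)


if __name__ == "__main__":
    eigs = [100.0, 50.0, 1.0, 0.5, 0.1]  # a few very large eigenvalues
    print(hard_threshold(eigs, sigma2=1.0))
    print(soft_threshold(eigs, sigma2=1.0))
```

In this toy input, hard thresholding raises the small eigenvalues to the floor and so increases the total variance from 151.6 to 153, while the soft version shifts the two large eigenvalues down by roughly 0.7 to restore the original total. The case with a few very large eigenvalues is the one the abstract singles out as problematic for the original estimate.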
Related Topics
- Type
- article
- Language
- en
- Landing Page
- https://doi.org/10.17615/dvba-7p60
- OA Status
- green
- Cited By
- 1
- References
- 30
- Related Works
- 10
- OpenAlex ID
- https://openalex.org/W2949610149
Raw OpenAlex JSON
- OpenAlex ID
- https://openalex.org/W2949610149
- DOI
- https://doi.org/10.17615/dvba-7p60
- Title
- Statistical Significance of Clustering Using Soft Thresholding
- Type
- article
- Language
- en
- Publication year
- 2020
- Publication date
- 2020-11-04
- Authors
- Hanwen Huang, Yufeng Liu, Ming Yuan, J. S. Marron
- Landing page
- https://doi.org/10.17615/dvba-7p60
- Open access
- Yes
- OA status
- green
- OA URL
- https://doi.org/10.17615/dvba-7p60
- Concepts
- Thresholding, Cluster analysis, Artificial intelligence, Pattern recognition (psychology), Computer science, Image (mathematics)
- Cited by
- 1
- Citations by year (recent)
- 2013: 1
- References (count)
- 30
- Related works (count)
- 10
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W2949610149 |
| doi | https://doi.org/10.17615/dvba-7p60 |
| ids.doi | https://doi.org/10.17615/dvba-7p60 |
| ids.mag | 2949610149 |
| ids.openalex | https://openalex.org/W2949610149 |
| fwci | 0.0 |
| type | article |
| title | Statistical Significance of Clustering Using Soft Thresholding |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11901 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.998199999332428 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Bayesian Methods and Mixture Models |
| topics[1].id | https://openalex.org/T10885 |
| topics[1].field.id | https://openalex.org/fields/13 |
| topics[1].field.display_name | Biochemistry, Genetics and Molecular Biology |
| topics[1].score | 0.995199978351593 |
| topics[1].domain.id | https://openalex.org/domains/1 |
| topics[1].domain.display_name | Life Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1312 |
| topics[1].subfield.display_name | Molecular Biology |
| topics[1].display_name | Gene expression and cancer classification |
| topics[2].id | https://openalex.org/T10887 |
| topics[2].field.id | https://openalex.org/fields/13 |
| topics[2].field.display_name | Biochemistry, Genetics and Molecular Biology |
| topics[2].score | 0.9896000027656555 |
| topics[2].domain.id | https://openalex.org/domains/1 |
| topics[2].domain.display_name | Life Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1312 |
| topics[2].subfield.display_name | Molecular Biology |
| topics[2].display_name | Bioinformatics and Genomic Networks |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C191178318 |
| concepts[0].level | 3 |
| concepts[0].score | 0.7440460324287415 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q2256906 |
| concepts[0].display_name | Thresholding |
| concepts[1].id | https://openalex.org/C73555534 |
| concepts[1].level | 2 |
| concepts[1].score | 0.698887288570404 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q622825 |
| concepts[1].display_name | Cluster analysis |
| concepts[2].id | https://openalex.org/C154945302 |
| concepts[2].level | 1 |
| concepts[2].score | 0.5147466659545898 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[2].display_name | Artificial intelligence |
| concepts[3].id | https://openalex.org/C153180895 |
| concepts[3].level | 2 |
| concepts[3].score | 0.46673545241355896 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q7148389 |
| concepts[3].display_name | Pattern recognition (psychology) |
| concepts[4].id | https://openalex.org/C41008148 |
| concepts[4].level | 0 |
| concepts[4].score | 0.4404182732105255 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[4].display_name | Computer science |
| concepts[5].id | https://openalex.org/C115961682 |
| concepts[5].level | 2 |
| concepts[5].score | 0.04672834277153015 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q860623 |
| concepts[5].display_name | Image (mathematics) |
| keywords[0].id | https://openalex.org/keywords/thresholding |
| keywords[0].score | 0.7440460324287415 |
| keywords[0].display_name | Thresholding |
| keywords[1].id | https://openalex.org/keywords/cluster-analysis |
| keywords[1].score | 0.698887288570404 |
| keywords[1].display_name | Cluster analysis |
| keywords[2].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[2].score | 0.5147466659545898 |
| keywords[2].display_name | Artificial intelligence |
| keywords[3].id | https://openalex.org/keywords/pattern-recognition |
| keywords[3].score | 0.46673545241355896 |
| keywords[3].display_name | Pattern recognition (psychology) |
| keywords[4].id | https://openalex.org/keywords/computer-science |
| keywords[4].score | 0.4404182732105255 |
| keywords[4].display_name | Computer science |
| keywords[5].id | https://openalex.org/keywords/image |
| keywords[5].score | 0.04672834277153015 |
| keywords[5].display_name | Image (mathematics) |
| language | en |
| locations[0].id | doi:10.17615/dvba-7p60 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S7407051488 |
| locations[0].source.type | repository |
| locations[0].source.is_oa | False |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | UNC Libraries |
| locations[0].source.host_organization | |
| locations[0].source.host_organization_name | |
| locations[0].license | |
| locations[0].pdf_url | |
| locations[0].version | |
| locations[0].raw_type | article-journal |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | https://doi.org/10.17615/dvba-7p60 |
| indexed_in | datacite |
| authorships[0].author.id | https://openalex.org/A5018496840 |
| authorships[0].author.orcid | https://orcid.org/0000-0003-2021-755X |
| authorships[0].author.display_name | Hanwen Huang |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Huang, Hanwen |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5100376614 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-1686-0545 |
| authorships[1].author.display_name | Yufeng Liu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Liu, Yufeng |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5070196780 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-4415-8606 |
| authorships[2].author.display_name | Ming Yuan |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Yuan, Ming |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5060651495 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | J. S. Marron |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Marron, J. S. |
| authorships[3].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://doi.org/10.17615/dvba-7p60 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Statistical Significance of Clustering Using Soft Thresholding |
| has_fulltext | True |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11901 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.998199999332428 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Bayesian Methods and Mixture Models |
| related_works | https://openalex.org/W2899084033, https://openalex.org/W2748952813, https://openalex.org/W2953058328, https://openalex.org/W1542224353, https://openalex.org/W1661087619, https://openalex.org/W2750730210, https://openalex.org/W2236974868, https://openalex.org/W2106145857, https://openalex.org/W2033914206, https://openalex.org/W2042327336 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2013 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 1 |
| best_oa_location.id | doi:10.17615/dvba-7p60 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S7407051488 |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | False |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | UNC Libraries |
| best_oa_location.source.host_organization | |
| best_oa_location.source.host_organization_name | |
| best_oa_location.license | |
| best_oa_location.pdf_url | |
| best_oa_location.version | |
| best_oa_location.raw_type | article-journal |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | https://doi.org/10.17615/dvba-7p60 |
| primary_location.id | doi:10.17615/dvba-7p60 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S7407051488 |
| primary_location.source.type | repository |
| primary_location.source.is_oa | False |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | UNC Libraries |
| primary_location.source.host_organization | |
| primary_location.source.host_organization_name | |
| primary_location.license | |
| primary_location.pdf_url | |
| primary_location.version | |
| primary_location.raw_type | article-journal |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | https://doi.org/10.17615/dvba-7p60 |
| publication_date | 2020-11-04 |
| publication_year | 2020 |
| referenced_works | https://openalex.org/W2165009258, https://openalex.org/W2161289668, https://openalex.org/W3098834468, https://openalex.org/W2135311088, https://openalex.org/W2132619562, https://openalex.org/W2024866056, https://openalex.org/W2127218421, https://openalex.org/W1480376833, https://openalex.org/W1966096622, https://openalex.org/W2098290597, https://openalex.org/W2133822638, https://openalex.org/W2009569524, https://openalex.org/W1989727964, https://openalex.org/W2048178552, https://openalex.org/W2011832962, https://openalex.org/W2799061466, https://openalex.org/W2132555912, https://openalex.org/W1991767154, https://openalex.org/W2168980979, https://openalex.org/W2021137021, https://openalex.org/W1579271636, https://openalex.org/W2106084579, https://openalex.org/W2011978385, https://openalex.org/W2133707124, https://openalex.org/W2108435369, https://openalex.org/W2102330962, https://openalex.org/W3106319742, https://openalex.org/W2081746825, https://openalex.org/W2018821242, https://openalex.org/W2017818759 |
| referenced_works_count | 30 |
| cited_by_percentile_year.max | 94 |
| cited_by_percentile_year.min | 89 |
| countries_distinct_count | 0 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile.value | 0.00557626 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | False |