An ensemble method for estimating the number of clusters in a big data set using multiple random samples
2023 · Open Access · DOI: https://doi.org/10.1186/s40537-023-00709-4
Clustering a big dataset without knowing the number of clusters presents a big challenge to many existing clustering algorithms. In this paper, we propose a Random Sample Partition-based Centers Ensemble (RSPCE) algorithm to identify the number of clusters in a big dataset. In this algorithm, a set of disjoint random samples is selected from the big dataset, and the I-niceDP algorithm is used to identify the number of clusters and initial centers in each sample. Subsequently, a cluster ball model is proposed to merge two clusters in the random samples that are likely sampled from the same cluster in the big dataset. Finally, based on the ball model, the RSPCE ensemble method is used to ensemble the results of all samples into the final result as a set of initial cluster centers in the big dataset. Intensive experiments were conducted on both synthetic and real datasets to validate the feasibility and effectiveness of the proposed RSPCE algorithm. The experimental results show that the ensemble result from multiple random samples is a reliable approximation of the actual number of clusters, and the RSPCE algorithm is scalable to big data.
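The merging step described in the abstract can be illustrated with a minimal sketch. This is not the RSPCE algorithm itself: the per-sample centers and radii are assumed to be given (the abstract obtains them with I-niceDP, which is not reproduced here), and the merge rule used below, "two balls belong together when the distance between their centers is at most the larger radius", is a hypothetical stand-in for the paper's cluster ball model. The function name `merge_balls` and the toy inputs are likewise illustrative.

```python
import math
from collections import defaultdict


def merge_balls(balls):
    """Group (center, radius) balls and return one averaged center per group.

    ASSUMPTION: two balls are merged when the distance between their
    centers is <= the larger of the two radii. The actual criterion of
    the paper's ball model may differ.
    """
    parent = list(range(len(balls)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(balls)):
        for j in range(i + 1, len(balls)):
            (ci, ri), (cj, rj) = balls[i], balls[j]
            if math.dist(ci, cj) <= max(ri, rj):
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, (center, _) in enumerate(balls):
        groups[find(i)].append(center)

    # Average the member centers of each group, coordinate by coordinate.
    return [
        tuple(sum(coords) / len(members) for coords in zip(*members))
        for members in groups.values()
    ]


# Toy per-sample results: each entry is (center, radius).
sample_1 = [((0.0, 0.0), 1.0), ((10.0, 10.0), 1.0)]
sample_2 = [((0.4, 0.0), 1.0), ((10.0, 9.6), 1.0), ((20.0, 0.0), 1.0)]

centers = merge_balls(sample_1 + sample_2)
estimated_k = len(centers)  # 3 for these toy inputs
```

In this toy case the two samples disagree on the number of clusters (2 versus 3), and pooling their balls recovers three groups, which is the kind of consensus the ensemble step is meant to produce.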
Related Topics
- Type: article
- Language: en
- Landing Page: https://doi.org/10.1186/s40537-023-00709-4
- PDF URL: https://journalofbigdata.springeropen.com/counter/pdf/10.1186/s40537-023-00709-4
- OA Status: gold
- Cited By: 11
- References: 33
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4362580199
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4362580199 (Canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.1186/s40537-023-00709-4 (Digital Object Identifier)
- Title: An ensemble method for estimating the number of clusters in a big data set using multiple random samples (Work title)
- Type: article (OpenAlex work type)
- Language: en (Primary language)
- Publication year: 2023 (Year of publication)
- Publication date: 2023-04-01 (Full publication date if available)
- Authors: Mohammad Sultan Mahmud, Joshua Zhexue Huang, Rukhsana Ruby, Kaishun Wu (List of authors in order)
- Landing page: https://doi.org/10.1186/s40537-023-00709-4 (Publisher landing page)
- PDF URL: https://journalofbigdata.springeropen.com/counter/pdf/10.1186/s40537-023-00709-4 (Direct link to full text PDF)
- Open access: Yes (Whether a free full text is available)
- OA status: gold (Open access status per OpenAlex)
- OA URL: https://journalofbigdata.springeropen.com/counter/pdf/10.1186/s40537-023-00709-4 (Direct OA link when available)
- Concepts: Computer science, Big data, Cluster analysis, Disjoint sets, Scalability, Data mining, Cluster (spacecraft), Ensemble learning, Partition (number theory), Merge (version control), Artificial intelligence, Mathematics, Database, Information retrieval, Combinatorics, Programming language (Top concepts (fields/topics) attached by OpenAlex)
- Cited by: 11 (Total citation count in OpenAlex)
- Citations by year (recent): 2025: 4, 2024: 5, 2023: 2 (Per-year citation counts (last 5 years))
- References (count): 33 (Number of works referenced by this work)
- Related works (count): 10 (Other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4362580199 |
| doi | https://doi.org/10.1186/s40537-023-00709-4 |
| ids.doi | https://doi.org/10.1186/s40537-023-00709-4 |
| ids.openalex | https://openalex.org/W4362580199 |
| fwci | 2.80987176 |
| type | article |
| title | An ensemble method for estimating the number of clusters in a big data set using multiple random samples |
| awards[0].id | https://openalex.org/G7924544171 |
| awards[0].funder_id | https://openalex.org/F4320321001 |
| awards[0].display_name | |
| awards[0].funder_award_id | 61972261 |
| awards[0].funder_display_name | National Natural Science Foundation of China |
| biblio.issue | 1 |
| biblio.volume | 10 |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10637 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9995999932289124 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Advanced Clustering Algorithms Research |
| topics[1].id | https://openalex.org/T12761 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9898999929428101 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Data Stream Mining Techniques |
| topics[2].id | https://openalex.org/T10057 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9896000027656555 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1707 |
| topics[2].subfield.display_name | Computer Vision and Pattern Recognition |
| topics[2].display_name | Face and Expression Recognition |
| funders[0].id | https://openalex.org/F4320321001 |
| funders[0].ror | https://ror.org/01h0zpd94 |
| funders[0].display_name | National Natural Science Foundation of China |
| is_xpac | False |
| apc_list.value | 1060 |
| apc_list.currency | GBP |
| apc_list.value_usd | 1300 |
| apc_paid.value | 1060 |
| apc_paid.currency | GBP |
| apc_paid.value_usd | 1300 |
| concepts[0].id | https://openalex.org/C41008148 |
| concepts[0].level | 0 |
| concepts[0].score | 0.7328628301620483 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[0].display_name | Computer science |
| concepts[1].id | https://openalex.org/C75684735 |
| concepts[1].level | 2 |
| concepts[1].score | 0.7236572504043579 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q858810 |
| concepts[1].display_name | Big data |
| concepts[2].id | https://openalex.org/C73555534 |
| concepts[2].level | 2 |
| concepts[2].score | 0.7233508825302124 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q622825 |
| concepts[2].display_name | Cluster analysis |
| concepts[3].id | https://openalex.org/C45340560 |
| concepts[3].level | 2 |
| concepts[3].score | 0.617287814617157 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q215382 |
| concepts[3].display_name | Disjoint sets |
| concepts[4].id | https://openalex.org/C48044578 |
| concepts[4].level | 2 |
| concepts[4].score | 0.49542343616485596 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q727490 |
| concepts[4].display_name | Scalability |
| concepts[5].id | https://openalex.org/C124101348 |
| concepts[5].level | 1 |
| concepts[5].score | 0.4827520251274109 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q172491 |
| concepts[5].display_name | Data mining |
| concepts[6].id | https://openalex.org/C164866538 |
| concepts[6].level | 2 |
| concepts[6].score | 0.47820353507995605 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q367351 |
| concepts[6].display_name | Cluster (spacecraft) |
| concepts[7].id | https://openalex.org/C45942800 |
| concepts[7].level | 2 |
| concepts[7].score | 0.47628775238990784 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q245652 |
| concepts[7].display_name | Ensemble learning |
| concepts[8].id | https://openalex.org/C42812 |
| concepts[8].level | 2 |
| concepts[8].score | 0.4711272418498993 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q1082910 |
| concepts[8].display_name | Partition (number theory) |
| concepts[9].id | https://openalex.org/C197129107 |
| concepts[9].level | 2 |
| concepts[9].score | 0.4377145767211914 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q1921621 |
| concepts[9].display_name | Merge (version control) |
| concepts[10].id | https://openalex.org/C154945302 |
| concepts[10].level | 1 |
| concepts[10].score | 0.27778148651123047 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[10].display_name | Artificial intelligence |
| concepts[11].id | https://openalex.org/C33923547 |
| concepts[11].level | 0 |
| concepts[11].score | 0.2049068808555603 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[11].display_name | Mathematics |
| concepts[12].id | https://openalex.org/C77088390 |
| concepts[12].level | 1 |
| concepts[12].score | 0.07603713870048523 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q8513 |
| concepts[12].display_name | Database |
| concepts[13].id | https://openalex.org/C23123220 |
| concepts[13].level | 1 |
| concepts[13].score | 0.0 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q816826 |
| concepts[13].display_name | Information retrieval |
| concepts[14].id | https://openalex.org/C114614502 |
| concepts[14].level | 1 |
| concepts[14].score | 0.0 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q76592 |
| concepts[14].display_name | Combinatorics |
| concepts[15].id | https://openalex.org/C199360897 |
| concepts[15].level | 1 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[15].display_name | Programming language |
| keywords[0].id | https://openalex.org/keywords/computer-science |
| keywords[0].score | 0.7328628301620483 |
| keywords[0].display_name | Computer science |
| keywords[1].id | https://openalex.org/keywords/big-data |
| keywords[1].score | 0.7236572504043579 |
| keywords[1].display_name | Big data |
| keywords[2].id | https://openalex.org/keywords/cluster-analysis |
| keywords[2].score | 0.7233508825302124 |
| keywords[2].display_name | Cluster analysis |
| keywords[3].id | https://openalex.org/keywords/disjoint-sets |
| keywords[3].score | 0.617287814617157 |
| keywords[3].display_name | Disjoint sets |
| keywords[4].id | https://openalex.org/keywords/scalability |
| keywords[4].score | 0.49542343616485596 |
| keywords[4].display_name | Scalability |
| keywords[5].id | https://openalex.org/keywords/data-mining |
| keywords[5].score | 0.4827520251274109 |
| keywords[5].display_name | Data mining |
| keywords[6].id | https://openalex.org/keywords/cluster |
| keywords[6].score | 0.47820353507995605 |
| keywords[6].display_name | Cluster (spacecraft) |
| keywords[7].id | https://openalex.org/keywords/ensemble-learning |
| keywords[7].score | 0.47628775238990784 |
| keywords[7].display_name | Ensemble learning |
| keywords[8].id | https://openalex.org/keywords/partition |
| keywords[8].score | 0.4711272418498993 |
| keywords[8].display_name | Partition (number theory) |
| keywords[9].id | https://openalex.org/keywords/merge |
| keywords[9].score | 0.4377145767211914 |
| keywords[9].display_name | Merge (version control) |
| keywords[10].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[10].score | 0.27778148651123047 |
| keywords[10].display_name | Artificial intelligence |
| keywords[11].id | https://openalex.org/keywords/mathematics |
| keywords[11].score | 0.2049068808555603 |
| keywords[11].display_name | Mathematics |
| keywords[12].id | https://openalex.org/keywords/database |
| keywords[12].score | 0.07603713870048523 |
| keywords[12].display_name | Database |
| language | en |
| locations[0].id | doi:10.1186/s40537-023-00709-4 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S2737955091 |
| locations[0].source.issn | 2196-1115 |
| locations[0].source.type | journal |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | 2196-1115 |
| locations[0].source.is_core | True |
| locations[0].source.is_in_doaj | True |
| locations[0].source.display_name | Journal Of Big Data |
| locations[0].source.host_organization | https://openalex.org/P4310319900 |
| locations[0].source.host_organization_name | Springer Science+Business Media |
| locations[0].source.host_organization_lineage | https://openalex.org/P4310319900, https://openalex.org/P4310319965 |
| locations[0].source.host_organization_lineage_names | Springer Science+Business Media, Springer Nature |
| locations[0].license | cc-by |
| locations[0].pdf_url | https://journalofbigdata.springeropen.com/counter/pdf/10.1186/s40537-023-00709-4 |
| locations[0].version | publishedVersion |
| locations[0].raw_type | journal-article |
| locations[0].license_id | https://openalex.org/licenses/cc-by |
| locations[0].is_accepted | True |
| locations[0].is_published | True |
| locations[0].raw_source_name | Journal of Big Data |
| locations[0].landing_page_url | https://doi.org/10.1186/s40537-023-00709-4 |
| locations[1].id | pmh:oai:doaj.org/article:fd86b4da6db542a487c3aef204d0855b |
| locations[1].is_oa | False |
| locations[1].source.id | https://openalex.org/S4306401280 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | False |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | DOAJ (DOAJ: Directory of Open Access Journals) |
| locations[1].source.host_organization | |
| locations[1].source.host_organization_name | |
| locations[1].license | |
| locations[1].pdf_url | |
| locations[1].version | submittedVersion |
| locations[1].raw_type | article |
| locations[1].license_id | |
| locations[1].is_accepted | False |
| locations[1].is_published | False |
| locations[1].raw_source_name | Journal of Big Data, Vol 10, Iss 1, Pp 1-33 (2023) |
| locations[1].landing_page_url | https://doaj.org/article/fd86b4da6db542a487c3aef204d0855b |
| indexed_in | crossref, doaj |
| authorships[0].author.id | https://openalex.org/A5056369956 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-5795-787X |
| authorships[0].author.display_name | Mohammad Sultan Mahmud |
| authorships[0].countries | CN |
| authorships[0].affiliations[0].institution_ids | https://openalex.org/I180726961 |
| authorships[0].affiliations[0].raw_affiliation_string | Big Data Institute, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, China |
| authorships[0].institutions[0].id | https://openalex.org/I180726961 |
| authorships[0].institutions[0].ror | https://ror.org/01vy4gh70 |
| authorships[0].institutions[0].type | education |
| authorships[0].institutions[0].lineage | https://openalex.org/I180726961 |
| authorships[0].institutions[0].country_code | CN |
| authorships[0].institutions[0].display_name | Shenzhen University |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Mohammad Sultan Mahmud |
| authorships[0].is_corresponding | False |
| authorships[0].raw_affiliation_strings | Big Data Institute, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, China |
| authorships[1].author.id | https://openalex.org/A5003347359 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-6797-2571 |
| authorships[1].author.display_name | Joshua Zhexue Huang |
| authorships[1].countries | CN |
| authorships[1].affiliations[0].institution_ids | https://openalex.org/I180726961 |
| authorships[1].affiliations[0].raw_affiliation_string | Big Data Institute, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, China |
| authorships[1].institutions[0].id | https://openalex.org/I180726961 |
| authorships[1].institutions[0].ror | https://ror.org/01vy4gh70 |
| authorships[1].institutions[0].type | education |
| authorships[1].institutions[0].lineage | https://openalex.org/I180726961 |
| authorships[1].institutions[0].country_code | CN |
| authorships[1].institutions[0].display_name | Shenzhen University |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Joshua Zhexue Huang |
| authorships[1].is_corresponding | False |
| authorships[1].raw_affiliation_strings | Big Data Institute, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, China |
| authorships[2].author.id | https://openalex.org/A5014049285 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-8373-9542 |
| authorships[2].author.display_name | Rukhsana Ruby |
| authorships[2].countries | CN |
| authorships[2].affiliations[0].institution_ids | https://openalex.org/I4210136793 |
| authorships[2].affiliations[0].raw_affiliation_string | Guangdong Laboratory of Artificial Intelligence and Digital Economy, Shenzhen, 518107, China |
| authorships[2].institutions[0].id | https://openalex.org/I4210136793 |
| authorships[2].institutions[0].ror | https://ror.org/03qdqbt06 |
| authorships[2].institutions[0].type | facility |
| authorships[2].institutions[0].lineage | https://openalex.org/I4210136793 |
| authorships[2].institutions[0].country_code | CN |
| authorships[2].institutions[0].display_name | Peng Cheng Laboratory |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Rukhsana Ruby |
| authorships[2].is_corresponding | False |
| authorships[2].raw_affiliation_strings | Guangdong Laboratory of Artificial Intelligence and Digital Economy, Shenzhen, 518107, China |
| authorships[3].author.id | https://openalex.org/A5001188748 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-2216-0737 |
| authorships[3].author.display_name | Kaishun Wu |
| authorships[3].countries | CN |
| authorships[3].affiliations[0].institution_ids | https://openalex.org/I180726961 |
| authorships[3].affiliations[0].raw_affiliation_string | National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, 518060, China |
| authorships[3].institutions[0].id | https://openalex.org/I180726961 |
| authorships[3].institutions[0].ror | https://ror.org/01vy4gh70 |
| authorships[3].institutions[0].type | education |
| authorships[3].institutions[0].lineage | https://openalex.org/I180726961 |
| authorships[3].institutions[0].country_code | CN |
| authorships[3].institutions[0].display_name | Shenzhen University |
| authorships[3].author_position | last |
| authorships[3].raw_author_name | Kaishun Wu |
| authorships[3].is_corresponding | False |
| authorships[3].raw_affiliation_strings | National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, 518060, China |
| has_content.pdf | True |
| has_content.grobid_xml | True |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://journalofbigdata.springeropen.com/counter/pdf/10.1186/s40537-023-00709-4 |
| open_access.oa_status | gold |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | An ensemble method for estimating the number of clusters in a big data set using multiple random samples |
| has_fulltext | True |
| is_retracted | False |
| updated_date | 2025-11-06T03:46:38.306776 |
| primary_topic.id | https://openalex.org/T10637 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9995999932289124 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Advanced Clustering Algorithms Research |
| related_works | https://openalex.org/W4390608645, https://openalex.org/W4256429076, https://openalex.org/W4247566972, https://openalex.org/W4394895745, https://openalex.org/W1971174658, https://openalex.org/W2960264696, https://openalex.org/W2015634066, https://openalex.org/W2048339306, https://openalex.org/W2028295504, https://openalex.org/W3046629113 |
| cited_by_count | 11 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 4 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 5 |
| counts_by_year[2].year | 2023 |
| counts_by_year[2].cited_by_count | 2 |
| locations_count | 2 |
| best_oa_location.id | doi:10.1186/s40537-023-00709-4 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S2737955091 |
| best_oa_location.source.issn | 2196-1115 |
| best_oa_location.source.type | journal |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | 2196-1115 |
| best_oa_location.source.is_core | True |
| best_oa_location.source.is_in_doaj | True |
| best_oa_location.source.display_name | Journal Of Big Data |
| best_oa_location.source.host_organization | https://openalex.org/P4310319900 |
| best_oa_location.source.host_organization_name | Springer Science+Business Media |
| best_oa_location.source.host_organization_lineage | https://openalex.org/P4310319900, https://openalex.org/P4310319965 |
| best_oa_location.source.host_organization_lineage_names | Springer Science+Business Media, Springer Nature |
| best_oa_location.license | cc-by |
| best_oa_location.pdf_url | https://journalofbigdata.springeropen.com/counter/pdf/10.1186/s40537-023-00709-4 |
| best_oa_location.version | publishedVersion |
| best_oa_location.raw_type | journal-article |
| best_oa_location.license_id | https://openalex.org/licenses/cc-by |
| best_oa_location.is_accepted | True |
| best_oa_location.is_published | True |
| best_oa_location.raw_source_name | Journal of Big Data |
| best_oa_location.landing_page_url | https://doi.org/10.1186/s40537-023-00709-4 |
| primary_location.id | doi:10.1186/s40537-023-00709-4 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S2737955091 |
| primary_location.source.issn | 2196-1115 |
| primary_location.source.type | journal |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | 2196-1115 |
| primary_location.source.is_core | True |
| primary_location.source.is_in_doaj | True |
| primary_location.source.display_name | Journal Of Big Data |
| primary_location.source.host_organization | https://openalex.org/P4310319900 |
| primary_location.source.host_organization_name | Springer Science+Business Media |
| primary_location.source.host_organization_lineage | https://openalex.org/P4310319900, https://openalex.org/P4310319965 |
| primary_location.source.host_organization_lineage_names | Springer Science+Business Media, Springer Nature |
| primary_location.license | cc-by |
| primary_location.pdf_url | https://journalofbigdata.springeropen.com/counter/pdf/10.1186/s40537-023-00709-4 |
| primary_location.version | publishedVersion |
| primary_location.raw_type | journal-article |
| primary_location.license_id | https://openalex.org/licenses/cc-by |
| primary_location.is_accepted | True |
| primary_location.is_published | True |
| primary_location.raw_source_name | Journal of Big Data |
| primary_location.landing_page_url | https://doi.org/10.1186/s40537-023-00709-4 |
| publication_date | 2023-04-01 |
| publication_year | 2023 |
| referenced_works | https://openalex.org/W1996881001, https://openalex.org/W1987971958, https://openalex.org/W2071949631, https://openalex.org/W2884562637, https://openalex.org/W2036816792, https://openalex.org/W2883811944, https://openalex.org/W2773612851, https://openalex.org/W3018761215, https://openalex.org/W3092037388, https://openalex.org/W3112798043, https://openalex.org/W2165835468, https://openalex.org/W2740924709, https://openalex.org/W2952380632, https://openalex.org/W2890125629, https://openalex.org/W2043028407, https://openalex.org/W2118211060, https://openalex.org/W2625743308, https://openalex.org/W1979607399, https://openalex.org/W1967168896, https://openalex.org/W1648123822, https://openalex.org/W2037183202, https://openalex.org/W2116984363, https://openalex.org/W2942755646, https://openalex.org/W1992929897, https://openalex.org/W2806794445, https://openalex.org/W2807006342, https://openalex.org/W1990063425, https://openalex.org/W2153293405, https://openalex.org/W2520619240, https://openalex.org/W2135945534, https://openalex.org/W2051224630, https://openalex.org/W4235169531, https://openalex.org/W2162833336 |
| referenced_works_count | 33 |
| abstract_inverted_index.a | 2, 12, 25, 40, 46, 77, 127, 171 |
| abstract_inverted_index.In | 20, 43 |
| abstract_inverted_index.as | 126 |
| abstract_inverted_index.in | 39, 73, 87, 99, 133 |
| abstract_inverted_index.is | 52, 62, 81, 113, 170, 184 |
| abstract_inverted_index.of | 9, 37, 48, 68, 119, 129, 153, 174, 178 |
| abstract_inverted_index.on | 105, 141 |
| abstract_inverted_index.to | 15, 33, 64, 83, 115, 147, 186 |
| abstract_inverted_index.we | 23 |
| abstract_inverted_index.The | 158 |
| abstract_inverted_index.all | 120 |
| abstract_inverted_index.and | 58, 70, 144, 151, 180 |
| abstract_inverted_index.are | 92 |
| abstract_inverted_index.big | 3, 13, 41, 56, 101, 135, 187 |
| abstract_inverted_index.set | 47, 128 |
| abstract_inverted_index.the | 7, 35, 55, 59, 66, 88, 96, 100, 106, 109, 117, 123, 134, 149, 154, 163, 175, 181 |
| abstract_inverted_index.two | 85 |
| abstract_inverted_index.ball | 79, 107 |
| abstract_inverted_index.both | 142 |
| abstract_inverted_index.each | 74 |
| abstract_inverted_index.from | 54, 95, 166 |
| abstract_inverted_index.into | 122 |
| abstract_inverted_index.many | 16 |
| abstract_inverted_index.real | 145 |
| abstract_inverted_index.same | 97 |
| abstract_inverted_index.show | 161 |
| abstract_inverted_index.that | 91, 162 |
| abstract_inverted_index.this | 21, 44 |
| abstract_inverted_index.used | 63, 114 |
| abstract_inverted_index.were | 139 |
| abstract_inverted_index.RSPCE | 110, 156, 182 |
| abstract_inverted_index.based | 104 |
| abstract_inverted_index.data. | 188 |
| abstract_inverted_index.final | 124 |
| abstract_inverted_index.merge | 84 |
| abstract_inverted_index.model | 80 |
| abstract_inverted_index.Random | 26 |
| abstract_inverted_index.Sample | 27 |
| abstract_inverted_index.actual | 176 |
| abstract_inverted_index.likely | 93 |
| abstract_inverted_index.method | 112 |
| abstract_inverted_index.model, | 108 |
| abstract_inverted_index.number | 8, 36, 67, 177 |
| abstract_inverted_index.paper, | 22 |
| abstract_inverted_index.random | 50, 89, 168 |
| abstract_inverted_index.result | 125, 165 |
| abstract_inverted_index.(RSPCE) | 31 |
| abstract_inverted_index.Centers | 29 |
| abstract_inverted_index.centers | 72, 132 |
| abstract_inverted_index.cluster | 78, 98, 131 |
| abstract_inverted_index.dataset | 4 |
| abstract_inverted_index.initial | 71, 130 |
| abstract_inverted_index.knowing | 6 |
| abstract_inverted_index.propose | 24 |
| abstract_inverted_index.results | 118, 160 |
| abstract_inverted_index.sample. | 75 |
| abstract_inverted_index.sampled | 94 |
| abstract_inverted_index.samples | 51, 90, 121, 169 |
| abstract_inverted_index.without | 5 |
| abstract_inverted_index.Abstract | 0 |
| abstract_inverted_index.Ensemble | 30 |
| abstract_inverted_index.Finally, | 103 |
| abstract_inverted_index.I-niceDP | 60 |
| abstract_inverted_index.clusters | 10, 38, 69, 86 |
| abstract_inverted_index.dataset, | 57 |
| abstract_inverted_index.dataset. | 42, 102, 136 |
| abstract_inverted_index.datasets | 146 |
| abstract_inverted_index.disjoint | 49 |
| abstract_inverted_index.ensemble | 111, 116, 164 |
| abstract_inverted_index.existing | 17 |
| abstract_inverted_index.identify | 34, 65 |
| abstract_inverted_index.multiple | 167 |
| abstract_inverted_index.presents | 11 |
| abstract_inverted_index.proposed | 82, 155 |
| abstract_inverted_index.reliable | 172 |
| abstract_inverted_index.scalable | 185 |
| abstract_inverted_index.selected | 53 |
| abstract_inverted_index.validate | 148 |
| abstract_inverted_index.Intensive | 137 |
| abstract_inverted_index.algorithm | 32, 61, 183 |
| abstract_inverted_index.challenge | 14 |
| abstract_inverted_index.clusters, | 179 |
| abstract_inverted_index.conducted | 140 |
| abstract_inverted_index.synthetic | 143 |
| abstract_inverted_index.Clustering | 1 |
| abstract_inverted_index.algorithm, | 45 |
| abstract_inverted_index.algorithm. | 157 |
| abstract_inverted_index.clustering | 18 |
| abstract_inverted_index.algorithms. | 19 |
| abstract_inverted_index.experiments | 138 |
| abstract_inverted_index.feasibility | 150 |
| abstract_inverted_index.experimental | 159 |
| abstract_inverted_index.Subsequently, | 76 |
| abstract_inverted_index.approximation | 173 |
| abstract_inverted_index.effectiveness | 152 |
| abstract_inverted_index.Partition-based | 28 |
| cited_by_percentile_year.max | 98 |
| cited_by_percentile_year.min | 94 |
| countries_distinct_count | 1 |
| institutions_distinct_count | 4 |
| citation_normalized_percentile.value | 0.90011244 |
| citation_normalized_percentile.is_in_top_1_percent | False |
| citation_normalized_percentile.is_in_top_10_percent | True |
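
The `abstract_inverted_index.*` rows above store the abstract as a map from each word to the positions at which it occurs. A minimal sketch of how such an index can be turned back into running text follows; the function name `rebuild_abstract` is illustrative, and the `excerpt` dictionary is hand-copied from the first fifteen positions of the table (with later positions of `a` omitted) rather than fetched from any API.

```python
def rebuild_abstract(inverted_index):
    """Reconstruct text from a {word: [positions]} inverted index."""
    positions = {}
    for word, indexes in inverted_index.items():
        for index in indexes:
            positions[index] = word
    # Emit the words in position order, separated by single spaces.
    return " ".join(positions[i] for i in sorted(positions))


# Excerpt covering positions 0..14 of the rows listed in the table.
excerpt = {
    "Abstract": [0],
    "Clustering": [1],
    "a": [2, 12],
    "big": [3, 13],
    "dataset": [4],
    "without": [5],
    "knowing": [6],
    "the": [7],
    "number": [8],
    "of": [9],
    "clusters": [10],
    "presents": [11],
    "challenge": [14],
}

text = rebuild_abstract(excerpt)
print(text)
```

Position 0 holds the literal heading word "Abstract", so the rebuilt string starts with it; dropping that first token gives the opening of the abstract as it appears at the top of this page.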