Constructing Decision Trees from Data Streams Article Swipe
YOU?
·
· 2024
· Open Access
·
· DOI: https://doi.org/10.48550/arxiv.2403.19867
In this work, we present data stream algorithms to compute optimal splits for decision tree learning. In particular, given a data stream of observations \(x_i\) and their corresponding labels \(y_i\), without the i.i.d. assumption, the objective is to identify the optimal split \(j\) that partitions the data into two sets, minimizing the mean squared error (for regression) or the misclassification rate and Gini impurity (for classification). We propose several efficient streaming algorithms that require sublinear space and use a small number of passes to solve these problems. These algorithms can also be extended to the MapReduce model. Our results, while not directly comparable, complements the seminal work of Domingos-Hulten (KDD 2000) and Hulten-Spencer-Domingos (KDD 2001).
Related Topics
- Type
- preprint
- Language
- en
- Landing Page
- http://arxiv.org/abs/2403.19867
- https://arxiv.org/pdf/2403.19867
- OA Status
- green
- Related Works
- 10
- OpenAlex ID
- https://openalex.org/W4393399066
Raw OpenAlex JSON
- OpenAlex ID
-
https://openalex.org/W4393399066Canonical identifier for this work in OpenAlex
- DOI
-
https://doi.org/10.48550/arxiv.2403.19867Digital Object Identifier
- Title
-
Constructing Decision Trees from Data StreamsWork title
- Type
-
preprintOpenAlex work type
- Language
-
enPrimary language
- Publication year
-
2024Year of publication
- Publication date
-
2024-03-28Full publication date if available
- Authors
-
Huy Pham, Hoang Ta, Hoa T. VuList of authors in order
- Landing page
-
https://arxiv.org/abs/2403.19867Publisher landing page
- PDF URL
-
https://arxiv.org/pdf/2403.19867Direct link to full text PDF
- Open access
-
YesWhether a free full text is available
- OA status
-
greenOpen access status per OpenAlex
- OA URL
-
https://arxiv.org/pdf/2403.19867Direct OA link when available
- Concepts
-
Massively parallel, Computer science, Decision tree, Tree (set theory), Parallel computing, Data mining, Mathematics, CombinatoricsTop concepts (fields/topics) attached by OpenAlex
- Cited by
-
0Total citation count in OpenAlex
- Related works (count)
-
10Other works algorithmically related by OpenAlex
Full payload
| id | https://openalex.org/W4393399066 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2403.19867 |
| ids.doi | https://doi.org/10.48550/arxiv.2403.19867 |
| ids.openalex | https://openalex.org/W4393399066 |
| fwci | |
| type | preprint |
| title | Constructing Decision Trees from Data Streams |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T10317 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9911999702453613 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1705 |
| topics[0].subfield.display_name | Computer Networks and Communications |
| topics[0].display_name | Advanced Database Systems and Queries |
| topics[1].id | https://openalex.org/T11106 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9779000282287598 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1711 |
| topics[1].subfield.display_name | Signal Processing |
| topics[1].display_name | Data Management and Algorithms |
| topics[2].id | https://openalex.org/T12761 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9707000255584717 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Data Stream Mining Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C190475519 |
| concepts[0].level | 2 |
| concepts[0].score | 0.7015042304992676 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q544384 |
| concepts[0].display_name | Massively parallel |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.645659863948822 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C84525736 |
| concepts[2].level | 2 |
| concepts[2].score | 0.6176921129226685 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q831366 |
| concepts[2].display_name | Decision tree |
| concepts[3].id | https://openalex.org/C113174947 |
| concepts[3].level | 2 |
| concepts[3].score | 0.4674352705478668 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q2859736 |
| concepts[3].display_name | Tree (set theory) |
| concepts[4].id | https://openalex.org/C173608175 |
| concepts[4].level | 1 |
| concepts[4].score | 0.36261361837387085 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q232661 |
| concepts[4].display_name | Parallel computing |
| concepts[5].id | https://openalex.org/C124101348 |
| concepts[5].level | 1 |
| concepts[5].score | 0.24253100156784058 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q172491 |
| concepts[5].display_name | Data mining |
| concepts[6].id | https://openalex.org/C33923547 |
| concepts[6].level | 0 |
| concepts[6].score | 0.1386147439479828 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[6].display_name | Mathematics |
| concepts[7].id | https://openalex.org/C114614502 |
| concepts[7].level | 1 |
| concepts[7].score | 0.09071439504623413 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q76592 |
| concepts[7].display_name | Combinatorics |
| keywords[0].id | https://openalex.org/keywords/massively-parallel |
| keywords[0].score | 0.7015042304992676 |
| keywords[0].display_name | Massively parallel |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.645659863948822 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/decision-tree |
| keywords[2].score | 0.6176921129226685 |
| keywords[2].display_name | Decision tree |
| keywords[3].id | https://openalex.org/keywords/tree |
| keywords[3].score | 0.4674352705478668 |
| keywords[3].display_name | Tree (set theory) |
| keywords[4].id | https://openalex.org/keywords/parallel-computing |
| keywords[4].score | 0.36261361837387085 |
| keywords[4].display_name | Parallel computing |
| keywords[5].id | https://openalex.org/keywords/data-mining |
| keywords[5].score | 0.24253100156784058 |
| keywords[5].display_name | Data mining |
| keywords[6].id | https://openalex.org/keywords/mathematics |
| keywords[6].score | 0.1386147439479828 |
| keywords[6].display_name | Mathematics |
| keywords[7].id | https://openalex.org/keywords/combinatorics |
| keywords[7].score | 0.09071439504623413 |
| keywords[7].display_name | Combinatorics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2403.19867 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2403.19867 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2403.19867 |
| locations[1].id | doi:10.48550/arxiv.2403.19867 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2403.19867 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5042503438 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-0565-1924 |
| authorships[0].author.display_name | Huy Pham |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Pham, Huy |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5094302499 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Hoang Ta |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Ta, Hoang |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5034746607 |
| authorships[2].author.orcid | https://orcid.org/0000-0001-8873-0208 |
| authorships[2].author.display_name | Hoa T. Vu |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Vu, Hoa T. |
| authorships[2].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2403.19867 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2024-04-02T00:00:00 |
| display_name | Constructing Decision Trees from Data Streams |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T10317 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9911999702453613 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1705 |
| primary_topic.subfield.display_name | Computer Networks and Communications |
| primary_topic.display_name | Advanced Database Systems and Queries |
| related_works | https://openalex.org/W2023839151, https://openalex.org/W1774183074, https://openalex.org/W2057488824, https://openalex.org/W2334687145, https://openalex.org/W2178011914, https://openalex.org/W4235962491, https://openalex.org/W2061778832, https://openalex.org/W1513001507, https://openalex.org/W2257153718, https://openalex.org/W3134702077 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2403.19867 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2403.19867 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2403.19867 |
| primary_location.id | pmh:oai:arXiv.org:2403.19867 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2403.19867 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2403.19867 |
| publication_date | 2024-03-28 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 19, 78 |
| abstract_inverted_index.In | 0, 16 |
| abstract_inverted_index.We | 66 |
| abstract_inverted_index.be | 91 |
| abstract_inverted_index.is | 36 |
| abstract_inverted_index.of | 22, 81, 107 |
| abstract_inverted_index.or | 57 |
| abstract_inverted_index.to | 8, 37, 83, 93 |
| abstract_inverted_index.we | 3 |
| abstract_inverted_index.Our | 97 |
| abstract_inverted_index.and | 25, 61, 76, 111 |
| abstract_inverted_index.can | 89 |
| abstract_inverted_index.for | 12 |
| abstract_inverted_index.not | 100 |
| abstract_inverted_index.the | 31, 34, 39, 45, 51, 58, 94, 104 |
| abstract_inverted_index.two | 48 |
| abstract_inverted_index.use | 77 |
| abstract_inverted_index.(KDD | 109, 113 |
| abstract_inverted_index.(for | 55, 64 |
| abstract_inverted_index.Gini | 62 |
| abstract_inverted_index.also | 90 |
| abstract_inverted_index.data | 5, 20, 46 |
| abstract_inverted_index.into | 47 |
| abstract_inverted_index.mean | 52 |
| abstract_inverted_index.rate | 60 |
| abstract_inverted_index.that | 43, 72 |
| abstract_inverted_index.this | 1 |
| abstract_inverted_index.tree | 14 |
| abstract_inverted_index.work | 106 |
| abstract_inverted_index.2000) | 110 |
| abstract_inverted_index.These | 87 |
| abstract_inverted_index.\(j\) | 42 |
| abstract_inverted_index.error | 54 |
| abstract_inverted_index.given | 18 |
| abstract_inverted_index.sets, | 49 |
| abstract_inverted_index.small | 79 |
| abstract_inverted_index.solve | 84 |
| abstract_inverted_index.space | 75 |
| abstract_inverted_index.split | 41 |
| abstract_inverted_index.their | 26 |
| abstract_inverted_index.these | 85 |
| abstract_inverted_index.while | 99 |
| abstract_inverted_index.work, | 2 |
| abstract_inverted_index.2001). | 114 |
| abstract_inverted_index.i.i.d. | 32 |
| abstract_inverted_index.labels | 28 |
| abstract_inverted_index.model. | 96 |
| abstract_inverted_index.number | 80 |
| abstract_inverted_index.passes | 82 |
| abstract_inverted_index.splits | 11 |
| abstract_inverted_index.stream | 6, 21 |
| abstract_inverted_index.\(x_i\) | 24 |
| abstract_inverted_index.compute | 9 |
| abstract_inverted_index.optimal | 10, 40 |
| abstract_inverted_index.present | 4 |
| abstract_inverted_index.propose | 67 |
| abstract_inverted_index.require | 73 |
| abstract_inverted_index.seminal | 105 |
| abstract_inverted_index.several | 68 |
| abstract_inverted_index.squared | 53 |
| abstract_inverted_index.without | 30 |
| abstract_inverted_index.\(y_i\), | 29 |
| abstract_inverted_index.decision | 13 |
| abstract_inverted_index.directly | 101 |
| abstract_inverted_index.extended | 92 |
| abstract_inverted_index.identify | 38 |
| abstract_inverted_index.impurity | 63 |
| abstract_inverted_index.results, | 98 |
| abstract_inverted_index.MapReduce | 95 |
| abstract_inverted_index.efficient | 69 |
| abstract_inverted_index.learning. | 15 |
| abstract_inverted_index.objective | 35 |
| abstract_inverted_index.problems. | 86 |
| abstract_inverted_index.streaming | 70 |
| abstract_inverted_index.sublinear | 74 |
| abstract_inverted_index.algorithms | 7, 71, 88 |
| abstract_inverted_index.minimizing | 50 |
| abstract_inverted_index.partitions | 44 |
| abstract_inverted_index.assumption, | 33 |
| abstract_inverted_index.comparable, | 102 |
| abstract_inverted_index.complements | 103 |
| abstract_inverted_index.particular, | 17 |
| abstract_inverted_index.regression) | 56 |
| abstract_inverted_index.observations | 23 |
| abstract_inverted_index.corresponding | 27 |
| abstract_inverted_index.Domingos-Hulten | 108 |
| abstract_inverted_index.classification). | 65 |
| abstract_inverted_index.misclassification | 59 |
| abstract_inverted_index.Hulten-Spencer-Domingos | 112 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile |