MEDS-Tab: Automated tabularization and baseline methods for MEDS datasets
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2411.00200
Effective, reliable, and scalable development of machine learning (ML) solutions for structured electronic health record (EHR) data requires the ability to reliably generate high-quality baseline models for diverse supervised learning tasks in an efficient and performant manner. Historically, producing such baseline models has been a largely manual effort: individual researchers would need to decide on the particular featurization and tabularization processes to apply to their individual raw, longitudinal data; and then train a supervised model over those data to produce a baseline result to compare novel methods against, all for just one task and one dataset. In this work, powered by complementary advances in core data standardization through the MEDS framework, we dramatically simplify and accelerate this process of tabularizing irregularly sampled time-series data, providing researchers the ability to automatically and scalably featurize and tabularize their longitudinal EHR data across tens of thousands of individual features, hundreds of millions of clinical events, and diverse windowing horizons and aggregation strategies, all before ultimately leveraging these tabular data to automatically produce high-caliber XGBoost baselines in a highly computationally efficient manner. This system scales to dramatically larger datasets than tabularization tools currently available to the community and enables researchers with any MEDS format dataset to immediately begin producing reliable and performant baseline prediction results on various tasks, with minimal human effort required. This system will greatly enhance the reliability, reproducibility, and ease of development of powerful ML solutions for health problems across diverse datasets and clinical settings.
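To make "tabularizing irregularly sampled time-series data" over several windowing horizons and aggregation strategies concrete, here is a minimal standard-library sketch. It is not MEDS-Tab's implementation or API. The event layout `(subject_id, time, code, numeric_value)`, the toy codes, the window names, and the `tabularize` helper are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical toy events: (subject_id, time, code, numeric_value).
EVENTS = [
    (1, datetime(2024, 1, 1), "HR", 80.0),
    (1, datetime(2024, 1, 5), "HR", 90.0),
    (1, datetime(2024, 1, 6), "LAB//GLUCOSE", 5.5),
    (2, datetime(2024, 1, 6), "HR", 70.0),
]

# Assumed look-back horizons; real pipelines would make these configurable.
WINDOWS = {"1d": timedelta(days=1), "7d": timedelta(days=7)}


def tabularize(events, subject_id, index_time, windows=WINDOWS):
    """Return one flat feature row for a subject at a given index time.

    Each feature is named code/window/aggregation and summarizes the
    events of that code falling inside the look-back window.
    """
    row = defaultdict(float)
    for sid, time, code, value in events:
        if sid != subject_id or time > index_time:
            continue  # skip other subjects and events after the index time
        for name, width in windows.items():
            if index_time - time <= width:
                row[f"{code}/{name}/count"] += 1
                row[f"{code}/{name}/sum"] += value
    return dict(row)


row = tabularize(EVENTS, subject_id=1, index_time=datetime(2024, 1, 6))
for feature in sorted(row):
    print(feature, row[feature])
```

Each output row is a fixed-width vector regardless of how many events a subject has, which is the property that lets a tabular learner such as XGBoost consume longitudinal data.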
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2411.00200 (PDF: https://arxiv.org/pdf/2411.00200)
- OA Status: green
- Cited By: 1
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4404344739
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4404344739 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2411.00200 (Digital Object Identifier)
- Title: MEDS-Tab: Automated tabularization and baseline methods for MEDS datasets (work title)
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024 (year of publication)
- Publication date: 2024-10-31 (full publication date if available)
- Authors: Nassim Oufattole, Teya Bergamaschi, Aleksia Kolo, Hyewon Jeong, Hanna Gaggin, Collin M. Stultz, Matthew B. A. McDermott (list of authors in order)
- Landing page: https://arxiv.org/abs/2411.00200 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2411.00200 (direct link to full text PDF)
- Open access: Yes (whether a free full text is available)
- OA status: green (open access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2411.00200 (direct OA link when available)
- Concepts: Baseline (sea), Computer science, Political science, Law (top concepts, i.e. fields/topics, attached by OpenAlex)
- Cited by: 1 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 1 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
Full payload
| id | https://openalex.org/W4404344739 |
|---|---|
| doi | https://doi.org/10.48550/arxiv.2411.00200 |
| ids.doi | https://doi.org/10.48550/arxiv.2411.00200 |
| ids.openalex | https://openalex.org/W4404344739 |
| fwci | |
| type | preprint |
| title | MEDS-Tab: Automated tabularization and baseline methods for MEDS datasets |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T12535 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.2630999982357025 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Machine Learning and Data Classification |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C12725497 |
| concepts[0].level | 2 |
| concepts[0].score | 0.757630467414856 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q810247 |
| concepts[0].display_name | Baseline (sea) |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.43440529704093933 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C17744445 |
| concepts[2].level | 0 |
| concepts[2].score | 0.1845722198486328 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q36442 |
| concepts[2].display_name | Political science |
| concepts[3].id | https://openalex.org/C199539241 |
| concepts[3].level | 1 |
| concepts[3].score | 0.0 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q7748 |
| concepts[3].display_name | Law |
| keywords[0].id | https://openalex.org/keywords/baseline |
| keywords[0].score | 0.757630467414856 |
| keywords[0].display_name | Baseline (sea) |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.43440529704093933 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/political-science |
| keywords[2].score | 0.1845722198486328 |
| keywords[2].display_name | Political science |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2411.00200 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2411.00200 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2411.00200 |
| locations[1].id | doi:10.48550/arxiv.2411.00200 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2411.00200 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5059929364 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Nassim Oufattole |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Oufattole, Nassim |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5114634775 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Teya Bergamaschi |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Bergamaschi, Teya |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5028540249 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Aleksia Kolo |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Kolo, Aleksia |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5101998939 |
| authorships[3].author.orcid | https://orcid.org/0000-0003-1230-870X |
| authorships[3].author.display_name | Hyewon Jeong |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Jeong, Hyewon |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5114634776 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Hanna Gaggin |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Gaggin, Hanna |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5024941370 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-3415-242X |
| authorships[5].author.display_name | Collin M. Stultz |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Stultz, Collin M. |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5083993242 |
| authorships[6].author.orcid | https://orcid.org/0000-0001-6048-9707 |
| authorships[6].author.display_name | Matthew B. A. McDermott |
| authorships[6].author_position | last |
| authorships[6].raw_author_name | McDermott, Matthew B. A. |
| authorships[6].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2411.00200 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2024-11-14T00:00:00 |
| display_name | MEDS-Tab: Automated tabularization and baseline methods for MEDS datasets |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T12535 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.2630999982357025 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Machine Learning and Data Classification |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2899084033, https://openalex.org/W2748952813, https://openalex.org/W2383111961, https://openalex.org/W2365952365, https://openalex.org/W2352448290, https://openalex.org/W2380820513, https://openalex.org/W2913146933, https://openalex.org/W2372385138, https://openalex.org/W4296359239 |
| cited_by_count | 1 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2411.00200 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2411.00200 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2411.00200 |
| primary_location.id | pmh:oai:arXiv.org:2411.00200 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2411.00200 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2411.00200 |
| publication_date | 2024-10-31 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 44, 71, 79, 172 |
| abstract_inverted_index.In | 95 |
| abstract_inverted_index.ML | 232 |
| abstract_inverted_index.an | 32 |
| abstract_inverted_index.by | 99 |
| abstract_inverted_index.in | 31, 102, 171 |
| abstract_inverted_index.of | 5, 117, 140, 142, 146, 148, 228, 230 |
| abstract_inverted_index.on | 53, 210 |
| abstract_inverted_index.to | 20, 51, 60, 62, 77, 82, 127, 165, 180, 189, 200 |
| abstract_inverted_index.we | 110 |
| abstract_inverted_index.EHR | 136 |
| abstract_inverted_index.all | 87, 158 |
| abstract_inverted_index.and | 2, 34, 57, 68, 92, 113, 129, 132, 151, 155, 192, 205, 226, 240 |
| abstract_inverted_index.any | 196 |
| abstract_inverted_index.for | 10, 26, 88, 234 |
| abstract_inverted_index.has | 42 |
| abstract_inverted_index.one | 90, 93 |
| abstract_inverted_index.the | 18, 54, 107, 125, 190, 223 |
| abstract_inverted_index.(ML) | 8 |
| abstract_inverted_index.MEDS | 108, 197 |
| abstract_inverted_index.This | 177, 218 |
| abstract_inverted_index.been | 43 |
| abstract_inverted_index.core | 103 |
| abstract_inverted_index.data | 16, 76, 104, 137, 164 |
| abstract_inverted_index.ease | 227 |
| abstract_inverted_index.just | 89 |
| abstract_inverted_index.need | 50 |
| abstract_inverted_index.over | 74 |
| abstract_inverted_index.raw, | 65 |
| abstract_inverted_index.such | 39 |
| abstract_inverted_index.task | 91 |
| abstract_inverted_index.tens | 139 |
| abstract_inverted_index.than | 184 |
| abstract_inverted_index.then | 69 |
| abstract_inverted_index.this | 96, 115 |
| abstract_inverted_index.will | 220 |
| abstract_inverted_index.with | 195, 213 |
| abstract_inverted_index.(EHR) | 15 |
| abstract_inverted_index.apply | 61 |
| abstract_inverted_index.begin | 202 |
| abstract_inverted_index.data, | 122 |
| abstract_inverted_index.data; | 67 |
| abstract_inverted_index.human | 215 |
| abstract_inverted_index.model | 73 |
| abstract_inverted_index.novel | 84 |
| abstract_inverted_index.tasks | 30 |
| abstract_inverted_index.their | 63, 134 |
| abstract_inverted_index.these | 162 |
| abstract_inverted_index.those | 75 |
| abstract_inverted_index.tools | 186 |
| abstract_inverted_index.train | 70 |
| abstract_inverted_index.work, | 97 |
| abstract_inverted_index.would | 49 |
| abstract_inverted_index.across | 138, 237 |
| abstract_inverted_index.before | 159 |
| abstract_inverted_index.decide | 52 |
| abstract_inverted_index.effort | 216 |
| abstract_inverted_index.format | 198 |
| abstract_inverted_index.health | 13, 235 |
| abstract_inverted_index.highly | 173 |
| abstract_inverted_index.larger | 182 |
| abstract_inverted_index.manual | 46 |
| abstract_inverted_index.models | 25, 41 |
| abstract_inverted_index.record | 14 |
| abstract_inverted_index.result | 81 |
| abstract_inverted_index.scales | 179 |
| abstract_inverted_index.system | 178, 219 |
| abstract_inverted_index.tasks, | 212 |
| abstract_inverted_index.XGBoost | 169 |
| abstract_inverted_index.ability | 19, 126 |
| abstract_inverted_index.compare | 83 |
| abstract_inverted_index.dataset | 199 |
| abstract_inverted_index.diverse | 27, 152, 238 |
| abstract_inverted_index.enables | 193 |
| abstract_inverted_index.enhance | 222 |
| abstract_inverted_index.events, | 150 |
| abstract_inverted_index.greatly | 221 |
| abstract_inverted_index.largely | 45 |
| abstract_inverted_index.machine | 6 |
| abstract_inverted_index.manner. | 36, 176 |
| abstract_inverted_index.methods | 85 |
| abstract_inverted_index.minimal | 214 |
| abstract_inverted_index.powered | 98 |
| abstract_inverted_index.process | 116 |
| abstract_inverted_index.produce | 78, 167 |
| abstract_inverted_index.results | 209 |
| abstract_inverted_index.sampled | 120 |
| abstract_inverted_index.tabular | 163 |
| abstract_inverted_index.through | 106 |
| abstract_inverted_index.various | 211 |
| abstract_inverted_index.advances | 101 |
| abstract_inverted_index.against, | 86 |
| abstract_inverted_index.baseline | 24, 40, 80, 207 |
| abstract_inverted_index.clinical | 149, 241 |
| abstract_inverted_index.dataset. | 94 |
| abstract_inverted_index.datasets | 183, 239 |
| abstract_inverted_index.generate | 22 |
| abstract_inverted_index.horizons | 154 |
| abstract_inverted_index.hundreds | 145 |
| abstract_inverted_index.learning | 7, 29 |
| abstract_inverted_index.millions | 147 |
| abstract_inverted_index.powerful | 231 |
| abstract_inverted_index.problems | 236 |
| abstract_inverted_index.reliable | 204 |
| abstract_inverted_index.reliably | 21 |
| abstract_inverted_index.requires | 17 |
| abstract_inverted_index.scalable | 3 |
| abstract_inverted_index.scalably | 130 |
| abstract_inverted_index.simplify | 112 |
| abstract_inverted_index.available | 188 |
| abstract_inverted_index.baselines | 170 |
| abstract_inverted_index.community | 191 |
| abstract_inverted_index.currently | 187 |
| abstract_inverted_index.efficient | 33, 175 |
| abstract_inverted_index.features, | 144 |
| abstract_inverted_index.featurize | 131 |
| abstract_inverted_index.processes | 59 |
| abstract_inverted_index.producing | 38, 203 |
| abstract_inverted_index.providing | 123 |
| abstract_inverted_index.reliable, | 1 |
| abstract_inverted_index.required. | 217 |
| abstract_inverted_index.settings. | 242 |
| abstract_inverted_index.solutions | 9, 233 |
| abstract_inverted_index.thousands | 141 |
| abstract_inverted_index.windowing | 153 |
| abstract_inverted_index.Effective, | 0 |
| abstract_inverted_index.accelerate | 114 |
| abstract_inverted_index.electronic | 12 |
| abstract_inverted_index.framework, | 109 |
| abstract_inverted_index.individual | 64, 143 |
| abstract_inverted_index.leveraging | 161 |
| abstract_inverted_index.particular | 55 |
| abstract_inverted_index.performant | 35, 206 |
| abstract_inverted_index.prediction | 208 |
| abstract_inverted_index.structured | 11 |
| abstract_inverted_index.supervised | 28, 72 |
| abstract_inverted_index.tabularize | 133 |
| abstract_inverted_index.ultimately | 160 |
| abstract_inverted_index.aggregation | 156 |
| abstract_inverted_index.development | 4, 229 |
| abstract_inverted_index.immediately | 201 |
| abstract_inverted_index.irregularly | 119 |
| abstract_inverted_index.researchers | 48, 124, 194 |
| abstract_inverted_index.strategies, | 157 |
| abstract_inverted_index.time-series | 121 |
| abstract_inverted_index.dramatically | 111, 181 |
| abstract_inverted_index.high-caliber | 168 |
| abstract_inverted_index.high-quality | 23 |
| abstract_inverted_index.longitudinal | 66, 135 |
| abstract_inverted_index.reliability, | 224 |
| abstract_inverted_index.tabularizing | 118 |
| abstract_inverted_index.Historically, | 37 |
| abstract_inverted_index.automatically | 128, 166 |
| abstract_inverted_index.complementary | 100 |
| abstract_inverted_index.featurization | 56 |
| abstract_inverted_index.tabularization | 58, 185 |
| abstract_inverted_index.computationally | 174 |
| abstract_inverted_index.standardization | 105 |
| abstract_inverted_index.reproducibility, | 225 |
| abstract_inverted_index.effort--individual | 47 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 7 |
| citation_normalized_percentile |
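
The `abstract_inverted_index.*` rows above map each token of the abstract to the word positions where it occurs. A minimal sketch of turning such rows back into running text follows. The helper names `parse_rows` and `rebuild_tokens` are hypothetical, and the sample uses only eight rows copied from the table, so only the opening words are recovered.

```python
PREFIX = "abstract_inverted_index."

# A few rows copied verbatim from the payload table above.
ROWS = [
    "| abstract_inverted_index.Effective, | 0 |",
    "| abstract_inverted_index.reliable, | 1 |",
    "| abstract_inverted_index.and | 2, 34, 57, 68, 92, 113, 129, 132, 151, 155, 192, 205, 226, 240 |",
    "| abstract_inverted_index.scalable | 3 |",
    "| abstract_inverted_index.development | 4, 229 |",
    "| abstract_inverted_index.of | 5, 117, 140, 142, 146, 148, 228, 230 |",
    "| abstract_inverted_index.machine | 6 |",
    "| abstract_inverted_index.learning | 7, 29 |",
]


def parse_rows(lines):
    """Parse two-cell table rows into {token: [positions]}."""
    index = {}
    for line in lines:
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 2 or not cells[0].startswith(PREFIX):
            continue  # not an inverted-index row
        token = cells[0][len(PREFIX):]
        index[token] = [int(p) for p in cells[1].split(",")]
    return index


def rebuild_tokens(inverted_index):
    """Invert {token: [positions]} into tokens ordered by position."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    return [by_position[pos] for pos in sorted(by_position)]


tokens = rebuild_tokens(parse_rows(ROWS))
opening = " ".join(tokens[:8])
print(opening)
```

With the full set of rows, joining all tokens in position order reproduces the abstract shown at the top of the page.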