Extracting Social Determinants of Health from Pediatric Patient Notes Using Large Language Models: Novel Corpus and Methods
2024 · Open Access · DOI: https://doi.org/10.48550/arxiv.2404.00826
Social determinants of health (SDoH) play a critical role in shaping health outcomes, particularly in pediatric populations where interventions can have long-term implications. SDoH are frequently studied in the Electronic Health Record (EHR), which provides a rich repository for diverse patient data. In this work, we present a novel annotated corpus, the Pediatric Social History Annotation Corpus (PedSHAC), and evaluate the automatic extraction of detailed SDoH representations using fine-tuned and in-context learning methods with Large Language Models (LLMs). PedSHAC comprises annotated social history sections from 1,260 clinical notes obtained from pediatric patients within the University of Washington (UW) hospital system. Employing an event-based annotation scheme, PedSHAC captures ten distinct health determinants to encompass living and economic stability, prior trauma, education access, substance use history, and mental health with an overall annotator agreement of 81.9 F1. Our proposed fine-tuning LLM-based extractors achieve high performance at 78.4 F1 for event arguments. In-context learning approaches with GPT-4 demonstrate promise for reliable SDoH extraction with limited annotated examples, with extraction performance at 82.3 F1 for event triggers.
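
The abstract reports every result as an F1 score: inter-annotator agreement (81.9), argument extraction (78.4), and trigger extraction (82.3). To show what that number measures, here is a minimal sketch of exact-match micro F1 over sets of annotations. It assumes each annotation is a hashable tuple such as (note_id, event_type, start, end). The tuple layout, the event labels, and the `micro_f1` helper are hypothetical. The paper's actual scoring may use different matching criteria, for example in how trigger spans and argument values are compared. Agreement can be scored the same way by treating one annotator's labels as the reference.

```python
from typing import Hashable, Iterable


def micro_f1(gold: Iterable[Hashable], predicted: Iterable[Hashable]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for an exact-match set comparison.

    Items are assumed to be hashable tuples, e.g. (note_id, event_type, start, end);
    this layout is hypothetical and not taken from the paper.
    """
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)  # items present in both gold and predictions
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return precision, recall, f1


# Toy trigger annotations with made-up labels (not from PedSHAC).
gold = {("n1", "EventA", 10, 16), ("n1", "EventB", 30, 37), ("n2", "EventC", 5, 9)}
pred = {("n1", "EventA", 10, 16), ("n2", "EventC", 5, 9), ("n2", "EventB", 20, 26)}
p, r, f = micro_f1(gold, pred)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # two of three match on each side
```

Micro averaging pools true positives across all notes and event types before computing precision and recall. Frequent event types therefore carry more weight than rare ones.
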
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2404.00826
- PDF: https://arxiv.org/pdf/2404.00826
- OA Status: green
- Cited By: 2
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4393725148
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4393725148 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2404.00826 (Digital Object Identifier)
- Title: Extracting Social Determinants of Health from Pediatric Patient Notes Using Large Language Models: Novel Corpus and Methods
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2024
- Publication date: 2024-03-31 (full publication date, if available)
- Authors (in order): Yujuan Fu, Giridhar Kaushik Ramachandran, Nicholas J Dobbins, Namu Park, Michael Leu, Abby R. Rosenberg, Kevin Lybarger, Fei Xia, Özlem Uzuner, Meliha Yetişgen
- Landing page: https://arxiv.org/abs/2404.00826 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2404.00826 (direct link to the full-text PDF)
- Open access: Yes (a free full text is available)
- OA status: green (open-access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2404.00826 (direct open-access link, when available)
- Concepts: Natural language processing, Social determinants of health, Computer science, Linguistics, Medicine, Public health, Pathology, Philosophy (top concepts attached by OpenAlex)
- Cited by: 2 (total citation count in OpenAlex)
- Citations by year (recent): 2025: 1, 2024: 1 (per-year citation counts, last 5 years)
- Related works (count): 10 (other works algorithmically related by OpenAlex)
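
The keys in the payload table are flattened paths into the nested JSON of the OpenAlex work record. A `.` descends into an object, and `[i]` indexes a list, as in `authorships[0].author.display_name`. The sketch below reproduces that flattening under two assumptions about how the table is built. First, lists of objects are indexed, while lists of scalars (such as `indexed_in` or `related_works`) are kept whole and shown comma-joined. Second, nulls appear as blank cells. The `flatten` and `render` helpers are hypothetical and are not part of any OpenAlex client.

```python
from typing import Any


def flatten(value: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested JSON into dotted paths such as 'authorships[0].author.id'."""
    flat: dict[str, Any] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            # Objects add a '.key' segment (no leading dot at the top level).
            path = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten(child, path))
    elif isinstance(value, list) and value and all(isinstance(item, dict) for item in value):
        # Lists of objects are indexed: 'topics[0]', 'topics[1]', ...
        for i, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{i}]"))
    else:
        # Scalars, nulls, and lists of scalars are kept whole as leaf values.
        flat[prefix] = value
    return flat


def render(leaf: Any) -> str:
    """Format a leaf as the table shows it: null as blank, lists comma-joined."""
    if leaf is None:
        return ""
    if isinstance(leaf, list):
        return ", ".join(str(item) for item in leaf)
    return str(leaf)


# Small excerpt of a work record, with values copied from this page.
record = {
    "id": "https://openalex.org/W4393725148",
    "fwci": None,
    "indexed_in": ["arxiv", "datacite"],
    "authorships": [
        {"author": {"display_name": "Yujuan Fu"}, "author_position": "first"},
    ],
    "counts_by_year": [{"year": 2025, "cited_by_count": 1}],
}

for key, leaf in flatten(record).items():
    print(f"| {key} | {render(leaf)} |")
```

The abstract's inverted index (a dict of word to list of positions) also flattens into one row per word under this rule. That matches rows such as `abstract_inverted_index.a | 6, 35, 47`.
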
Full payload
| field | value |
|---|---|
| id | https://openalex.org/W4393725148 |
| doi | https://doi.org/10.48550/arxiv.2404.00826 |
| ids.doi | https://doi.org/10.48550/arxiv.2404.00826 |
| ids.openalex | https://openalex.org/W4393725148 |
| fwci | |
| type | preprint |
| title | Extracting Social Determinants of Health from Pediatric Patient Notes Using Large Language Models: Novel Corpus and Methods |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T11610 |
| topics[0].field.id | https://openalex.org/fields/36 |
| topics[0].field.display_name | Health Professions |
| topics[0].score | 0.9462000131607056 |
| topics[0].domain.id | https://openalex.org/domains/4 |
| topics[0].domain.display_name | Health Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/3600 |
| topics[0].subfield.display_name | General Health Professions |
| topics[0].display_name | Food Security and Health in Diverse Populations |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C204321447 |
| concepts[0].level | 1 |
| concepts[0].score | 0.45427852869033813 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[0].display_name | Natural language processing |
| concepts[1].id | https://openalex.org/C78491826 |
| concepts[1].level | 3 |
| concepts[1].score | 0.44062745571136475 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q3045352 |
| concepts[1].display_name | Social determinants of health |
| concepts[2].id | https://openalex.org/C41008148 |
| concepts[2].level | 0 |
| concepts[2].score | 0.4011094570159912 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[2].display_name | Computer science |
| concepts[3].id | https://openalex.org/C41895202 |
| concepts[3].level | 1 |
| concepts[3].score | 0.33887597918510437 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q8162 |
| concepts[3].display_name | Linguistics |
| concepts[4].id | https://openalex.org/C71924100 |
| concepts[4].level | 0 |
| concepts[4].score | 0.2196812629699707 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q11190 |
| concepts[4].display_name | Medicine |
| concepts[5].id | https://openalex.org/C138816342 |
| concepts[5].level | 2 |
| concepts[5].score | 0.08985158801078796 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q189603 |
| concepts[5].display_name | Public health |
| concepts[6].id | https://openalex.org/C142724271 |
| concepts[6].level | 1 |
| concepts[6].score | 0.0844348669052124 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q7208 |
| concepts[6].display_name | Pathology |
| concepts[7].id | https://openalex.org/C138885662 |
| concepts[7].level | 0 |
| concepts[7].score | 0.0 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[7].display_name | Philosophy |
| keywords[0].id | https://openalex.org/keywords/natural-language-processing |
| keywords[0].score | 0.45427852869033813 |
| keywords[0].display_name | Natural language processing |
| keywords[1].id | https://openalex.org/keywords/social-determinants-of-health |
| keywords[1].score | 0.44062745571136475 |
| keywords[1].display_name | Social determinants of health |
| keywords[2].id | https://openalex.org/keywords/computer-science |
| keywords[2].score | 0.4011094570159912 |
| keywords[2].display_name | Computer science |
| keywords[3].id | https://openalex.org/keywords/linguistics |
| keywords[3].score | 0.33887597918510437 |
| keywords[3].display_name | Linguistics |
| keywords[4].id | https://openalex.org/keywords/medicine |
| keywords[4].score | 0.2196812629699707 |
| keywords[4].display_name | Medicine |
| keywords[5].id | https://openalex.org/keywords/public-health |
| keywords[5].score | 0.08985158801078796 |
| keywords[5].display_name | Public health |
| keywords[6].id | https://openalex.org/keywords/pathology |
| keywords[6].score | 0.0844348669052124 |
| keywords[6].display_name | Pathology |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2404.00826 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2404.00826 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2404.00826 |
| locations[1].id | doi:10.48550/arxiv.2404.00826 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2404.00826 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5102542035 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Yujuan Fu |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Fu, Yujuan |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5009919464 |
| authorships[1].author.orcid | https://orcid.org/0000-0002-9800-6149 |
| authorships[1].author.display_name | Giridhar Kaushik Ramachandran |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Ramachandran, Giridhar Kaushik |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5043896692 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-3598-8747 |
| authorships[2].author.display_name | Nicholas J Dobbins |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Dobbins, Nicholas J |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5037431914 |
| authorships[3].author.orcid | https://orcid.org/0009-0002-4646-2561 |
| authorships[3].author.display_name | Namu Park |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Park, Namu |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5094347559 |
| authorships[4].author.orcid | |
| authorships[4].author.display_name | Michael Leu |
| authorships[4].author_position | middle |
| authorships[4].raw_author_name | Leu, Michael |
| authorships[4].is_corresponding | False |
| authorships[5].author.id | https://openalex.org/A5069092052 |
| authorships[5].author.orcid | https://orcid.org/0000-0002-7608-607X |
| authorships[5].author.display_name | Abby R. Rosenberg |
| authorships[5].author_position | middle |
| authorships[5].raw_author_name | Rosenberg, Abby R. |
| authorships[5].is_corresponding | False |
| authorships[6].author.id | https://openalex.org/A5060395338 |
| authorships[6].author.orcid | https://orcid.org/0000-0001-5798-2664 |
| authorships[6].author.display_name | Kevin Lybarger |
| authorships[6].author_position | middle |
| authorships[6].raw_author_name | Lybarger, Kevin |
| authorships[6].is_corresponding | False |
| authorships[7].author.id | https://openalex.org/A5100676785 |
| authorships[7].author.orcid | https://orcid.org/0000-0003-4343-1444 |
| authorships[7].author.display_name | Fei Xia |
| authorships[7].author_position | middle |
| authorships[7].raw_author_name | Xia, Fei |
| authorships[7].is_corresponding | False |
| authorships[8].author.id | https://openalex.org/A5070926324 |
| authorships[8].author.orcid | https://orcid.org/0000-0001-8011-9850 |
| authorships[8].author.display_name | Özlem Uzuner |
| authorships[8].author_position | middle |
| authorships[8].raw_author_name | Uzuner, Ozlem |
| authorships[8].is_corresponding | False |
| authorships[9].author.id | https://openalex.org/A5002548520 |
| authorships[9].author.orcid | https://orcid.org/0000-0001-9919-9811 |
| authorships[9].author.display_name | Meliha Yetişgen |
| authorships[9].author_position | last |
| authorships[9].raw_author_name | Yetisgen, Meliha |
| authorships[9].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2404.00826 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Extracting Social Determinants of Health from Pediatric Patient Notes Using Large Language Models: Novel Corpus and Methods |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T11610 |
| primary_topic.field.id | https://openalex.org/fields/36 |
| primary_topic.field.display_name | Health Professions |
| primary_topic.score | 0.9462000131607056 |
| primary_topic.domain.id | https://openalex.org/domains/4 |
| primary_topic.domain.display_name | Health Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/3600 |
| primary_topic.subfield.display_name | General Health Professions |
| primary_topic.display_name | Food Security and Health in Diverse Populations |
| related_works | https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W2358668433, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W2382290278, https://openalex.org/W2478288626, https://openalex.org/W4391913857, https://openalex.org/W2350741829, https://openalex.org/W2530322880 |
| cited_by_count | 2 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 1 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2404.00826 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2404.00826 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2404.00826 |
| primary_location.id | pmh:oai:arXiv.org:2404.00826 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2404.00826 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2404.00826 |
| publication_date | 2024-03-31 |
| publication_year | 2024 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 6, 35, 47 |
| abstract_inverted_index.F1 | 145, 169 |
| abstract_inverted_index.In | 42 |
| abstract_inverted_index.an | 101, 128 |
| abstract_inverted_index.at | 143, 167 |
| abstract_inverted_index.in | 9, 14, 27 |
| abstract_inverted_index.of | 2, 63, 95, 132 |
| abstract_inverted_index.to | 111 |
| abstract_inverted_index.we | 45 |
| abstract_inverted_index.F1. | 134 |
| abstract_inverted_index.Our | 135 |
| abstract_inverted_index.and | 58, 69, 114, 124 |
| abstract_inverted_index.are | 24 |
| abstract_inverted_index.can | 19 |
| abstract_inverted_index.for | 38, 146, 156, 170 |
| abstract_inverted_index.ten | 107 |
| abstract_inverted_index.the | 28, 51, 60, 93 |
| abstract_inverted_index.use | 122 |
| abstract_inverted_index.(UW) | 97 |
| abstract_inverted_index.78.4 | 144 |
| abstract_inverted_index.81.9 | 133 |
| abstract_inverted_index.82.3 | 168 |
| abstract_inverted_index.SDoH | 23, 65, 158 |
| abstract_inverted_index.from | 84, 89 |
| abstract_inverted_index.have | 20 |
| abstract_inverted_index.high | 141 |
| abstract_inverted_index.play | 5 |
| abstract_inverted_index.rich | 36 |
| abstract_inverted_index.role | 8 |
| abstract_inverted_index.this | 43 |
| abstract_inverted_index.with | 73, 127, 152, 160, 164 |
| abstract_inverted_index.1,260 | 85 |
| abstract_inverted_index.GPT-4 | 153 |
| abstract_inverted_index.Large | 74 |
| abstract_inverted_index.data. | 41 |
| abstract_inverted_index.event | 147, 171 |
| abstract_inverted_index.notes | 87 |
| abstract_inverted_index.novel | 48 |
| abstract_inverted_index.prior | 117 |
| abstract_inverted_index.using | 67 |
| abstract_inverted_index.where | 17 |
| abstract_inverted_index.which | 33 |
| abstract_inverted_index.work, | 44 |
| abstract_inverted_index.(EHR), | 32 |
| abstract_inverted_index.(SDoH) | 4 |
| abstract_inverted_index.Corpus | 56 |
| abstract_inverted_index.Health | 30 |
| abstract_inverted_index.Models | 76 |
| abstract_inverted_index.Record | 31 |
| abstract_inverted_index.Social | 0, 53 |
| abstract_inverted_index.health | 3, 11, 109, 126 |
| abstract_inverted_index.living | 113 |
| abstract_inverted_index.mental | 125 |
| abstract_inverted_index.social | 81 |
| abstract_inverted_index.within | 92 |
| abstract_inverted_index.(LLMs). | 77 |
| abstract_inverted_index.History | 54 |
| abstract_inverted_index.PedSHAC | 78, 105 |
| abstract_inverted_index.access, | 120 |
| abstract_inverted_index.achieve | 140 |
| abstract_inverted_index.corpus, | 50 |
| abstract_inverted_index.diverse | 39 |
| abstract_inverted_index.history | 82 |
| abstract_inverted_index.limited | 161 |
| abstract_inverted_index.methods | 72 |
| abstract_inverted_index.overall | 129 |
| abstract_inverted_index.patient | 40 |
| abstract_inverted_index.present | 46 |
| abstract_inverted_index.promise | 155 |
| abstract_inverted_index.scheme, | 104 |
| abstract_inverted_index.shaping | 10 |
| abstract_inverted_index.studied | 26 |
| abstract_inverted_index.system. | 99 |
| abstract_inverted_index.trauma, | 118 |
| abstract_inverted_index.Language | 75 |
| abstract_inverted_index.captures | 106 |
| abstract_inverted_index.clinical | 86 |
| abstract_inverted_index.critical | 7 |
| abstract_inverted_index.detailed | 64 |
| abstract_inverted_index.distinct | 108 |
| abstract_inverted_index.economic | 115 |
| abstract_inverted_index.evaluate | 59 |
| abstract_inverted_index.history, | 123 |
| abstract_inverted_index.hospital | 98 |
| abstract_inverted_index.learning | 71, 150 |
| abstract_inverted_index.obtained | 88 |
| abstract_inverted_index.patients | 91 |
| abstract_inverted_index.proposed | 136 |
| abstract_inverted_index.provides | 34 |
| abstract_inverted_index.reliable | 157 |
| abstract_inverted_index.sections | 83 |
| abstract_inverted_index.Employing | 100 |
| abstract_inverted_index.LLM-based | 138 |
| abstract_inverted_index.Pediatric | 52 |
| abstract_inverted_index.agreement | 131 |
| abstract_inverted_index.annotated | 49, 80, 162 |
| abstract_inverted_index.annotator | 130 |
| abstract_inverted_index.automatic | 61 |
| abstract_inverted_index.comprises | 79 |
| abstract_inverted_index.education | 119 |
| abstract_inverted_index.encompass | 112 |
| abstract_inverted_index.examples, | 163 |
| abstract_inverted_index.long-term | 21 |
| abstract_inverted_index.outcomes, | 12 |
| abstract_inverted_index.pediatric | 15, 90 |
| abstract_inverted_index.substance | 121 |
| abstract_inverted_index.triggers. | 172 |
| abstract_inverted_index.(PedSHAC), | 57 |
| abstract_inverted_index.Annotation | 55 |
| abstract_inverted_index.Electronic | 29 |
| abstract_inverted_index.In-context | 149 |
| abstract_inverted_index.University | 94 |
| abstract_inverted_index.Washington | 96 |
| abstract_inverted_index.annotation | 103 |
| abstract_inverted_index.approaches | 151 |
| abstract_inverted_index.arguments. | 148 |
| abstract_inverted_index.extraction | 62, 159, 165 |
| abstract_inverted_index.extractors | 139 |
| abstract_inverted_index.fine-tuned | 68 |
| abstract_inverted_index.frequently | 25 |
| abstract_inverted_index.in-context | 70 |
| abstract_inverted_index.repository | 37 |
| abstract_inverted_index.stability, | 116 |
| abstract_inverted_index.demonstrate | 154 |
| abstract_inverted_index.event-based | 102 |
| abstract_inverted_index.fine-tuning | 137 |
| abstract_inverted_index.performance | 142, 166 |
| abstract_inverted_index.populations | 16 |
| abstract_inverted_index.determinants | 1, 110 |
| abstract_inverted_index.particularly | 13 |
| abstract_inverted_index.implications. | 22 |
| abstract_inverted_index.interventions | 18 |
| abstract_inverted_index.representations | 66 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 10 |
| citation_normalized_percentile | |
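
OpenAlex does not distribute abstracts as plain text. Instead, the `abstract_inverted_index` field maps each whitespace-delimited token to the word positions where it occurs. Punctuation stays attached to its token, as in `outcomes,` or `(SDoH)`. The sketch below rebuilds readable text by sorting tokens on their positions. It assumes single spaces between tokens, so any original line breaks or double spaces are not recovered. The `reconstruct_abstract` helper is hypothetical. The excerpt uses entries from the rows above, truncated to positions 0 through 12.

```python
def reconstruct_abstract(inverted_index: dict[str, list[int]]) -> str:
    """Rebuild plain text from an OpenAlex-style abstract_inverted_index."""
    positions: dict[int, str] = {}
    for word, indices in inverted_index.items():
        for i in indices:
            positions[i] = word  # each position holds exactly one token
    # Order tokens by position and join them with single spaces.
    return " ".join(positions[i] for i in sorted(positions))


# Entries copied from this record, keeping only positions 0-12.
excerpt = {
    "Social": [0], "determinants": [1], "of": [2], "health": [3, 11],
    "(SDoH)": [4], "play": [5], "a": [6], "critical": [7], "role": [8],
    "in": [9], "shaping": [10], "outcomes,": [12],
}
print(reconstruct_abstract(excerpt))
# → Social determinants of health (SDoH) play a critical role in shaping health outcomes,
```

Applied to all index rows (positions 0 through 172), the same function should yield the abstract quoted at the top of this page.
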