Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions

Hanjie Chen, Zhouxiang Fang, Yash Singla, Mark Dredze

2025 · Open Access · DOI: https://doi.org/10.18653/v1/2025.naacl-long.182

Related Topics

Computer Science

Artificial Intelligence

Business

Marketing

Concepts

Benchmarking, Computer science, Question answering, Natural language processing, Information retrieval, Artificial intelligence, Business, Marketing

Metadata

Type: article
Language: en
Landing Page: https://doi.org/10.18653/v1/2025.naacl-long.182
PDF: https://aclanthology.org/2025.naacl-long.182.pdf
OA Status: gold
Cited By: 9
Related Works: 10
OpenAlex ID: https://openalex.org/W4411119840

All OpenAlex metadata

OpenAlex ID: https://openalex.org/W4411119840
Canonical identifier for this work in OpenAlex

DOI: https://doi.org/10.18653/v1/2025.naacl-long.182
Digital Object Identifier

Title: Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions
Work title

Type: article
OpenAlex work type

Language: en
Primary language

Publication year: 2025
Year of publication

Publication date: 2025-01-01
Full publication date if available

Authors: Hanjie Chen, Zhouxiang Fang, Yash Singla, Mark Dredze
List of authors in order

Landing page: https://doi.org/10.18653/v1/2025.naacl-long.182
Publisher landing page

PDF URL: https://aclanthology.org/2025.naacl-long.182.pdf
Direct link to full text PDF

Open access: Yes
Whether a free full text is available

OA status: gold
Open access status per OpenAlex

OA URL: https://aclanthology.org/2025.naacl-long.182.pdf
Direct OA link when available

Concepts: Benchmarking, Computer science, Question answering, Natural language processing, Information retrieval, Artificial intelligence, Business, Marketing
Top concepts (fields/topics) attached by OpenAlex

Cited by: 9
Total citation count in OpenAlex

Citations by year (recent): 2025: 9
Per-year citation counts (last 5 years)

Related works (count): 10
Other works algorithmically related by OpenAlex

Full payload

id	https://openalex.org/W4411119840
doi	https://doi.org/10.18653/v1/2025.naacl-long.182
ids.doi	https://doi.org/10.18653/v1/2025.naacl-long.182
ids.openalex	https://openalex.org/W4411119840
fwci	43.37770632
type	article
title	Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions
biblio.issue
biblio.volume
biblio.last_page	3599
biblio.first_page	3563
topics[0].id	https://openalex.org/T10028
topics[0].field.id	https://openalex.org/fields/17
topics[0].field.display_name	Computer Science
topics[0].score	0.9979000091552734
topics[0].domain.id	https://openalex.org/domains/3
topics[0].domain.display_name	Physical Sciences
topics[0].subfield.id	https://openalex.org/subfields/1702
topics[0].subfield.display_name	Artificial Intelligence
topics[0].display_name	Topic Modeling
topics[1].id	https://openalex.org/T10181
topics[1].field.id	https://openalex.org/fields/17
topics[1].field.display_name	Computer Science
topics[1].score	0.9398999810218811
topics[1].domain.id	https://openalex.org/domains/3
topics[1].domain.display_name	Physical Sciences
topics[1].subfield.id	https://openalex.org/subfields/1702
topics[1].subfield.display_name	Artificial Intelligence
topics[1].display_name	Natural Language Processing Techniques
is_xpac	False
apc_list
apc_paid
concepts[0].id	https://openalex.org/C86251818
concepts[0].level	2
concepts[0].score	0.8345228433609009
concepts[0].wikidata	https://www.wikidata.org/wiki/Q816754
concepts[0].display_name	Benchmarking
concepts[1].id	https://openalex.org/C41008148
concepts[1].level	0
concepts[1].score	0.7253850102424622
concepts[1].wikidata	https://www.wikidata.org/wiki/Q21198
concepts[1].display_name	Computer science
concepts[2].id	https://openalex.org/C44291984
concepts[2].level	2
concepts[2].score	0.6612468957901001
concepts[2].wikidata	https://www.wikidata.org/wiki/Q1074173
concepts[2].display_name	Question answering
concepts[3].id	https://openalex.org/C204321447
concepts[3].level	1
concepts[3].score	0.5817055106163025
concepts[3].wikidata	https://www.wikidata.org/wiki/Q30642
concepts[3].display_name	Natural language processing
concepts[4].id	https://openalex.org/C23123220
concepts[4].level	1
concepts[4].score	0.3987928032875061
concepts[4].wikidata	https://www.wikidata.org/wiki/Q816826
concepts[4].display_name	Information retrieval
concepts[5].id	https://openalex.org/C154945302
concepts[5].level	1
concepts[5].score	0.38624852895736694
concepts[5].wikidata	https://www.wikidata.org/wiki/Q11660
concepts[5].display_name	Artificial intelligence
concepts[6].id	https://openalex.org/C144133560
concepts[6].level	0
concepts[6].score	0.0
concepts[6].wikidata	https://www.wikidata.org/wiki/Q4830453
concepts[6].display_name	Business
concepts[7].id	https://openalex.org/C162853370
concepts[7].level	1
concepts[7].score	0.0
concepts[7].wikidata	https://www.wikidata.org/wiki/Q39809
concepts[7].display_name	Marketing
keywords[0].id	https://openalex.org/keywords/benchmarking
keywords[0].score	0.8345228433609009
keywords[0].display_name	Benchmarking
keywords[1].id	https://openalex.org/keywords/computer-science
keywords[1].score	0.7253850102424622
keywords[1].display_name	Computer science
keywords[2].id	https://openalex.org/keywords/question-answering
keywords[2].score	0.6612468957901001
keywords[2].display_name	Question answering
keywords[3].id	https://openalex.org/keywords/natural-language-processing
keywords[3].score	0.5817055106163025
keywords[3].display_name	Natural language processing
keywords[4].id	https://openalex.org/keywords/information-retrieval
keywords[4].score	0.3987928032875061
keywords[4].display_name	Information retrieval
keywords[5].id	https://openalex.org/keywords/artificial-intelligence
keywords[5].score	0.38624852895736694
keywords[5].display_name	Artificial intelligence
language	en
locations[0].id	doi:10.18653/v1/2025.naacl-long.182
locations[0].is_oa	True
locations[0].source
locations[0].license	cc-by
locations[0].pdf_url	https://aclanthology.org/2025.naacl-long.182.pdf
locations[0].version	publishedVersion
locations[0].raw_type	proceedings-article
locations[0].license_id	https://openalex.org/licenses/cc-by
locations[0].is_accepted	True
locations[0].is_published	True
locations[0].raw_source_name	Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
locations[0].landing_page_url	https://doi.org/10.18653/v1/2025.naacl-long.182
indexed_in	crossref
authorships[0].author.id	https://openalex.org/A5101511644
authorships[0].author.orcid	https://orcid.org/0009-0001-5547-6634
authorships[0].author.display_name	Hanjie Chen
authorships[0].author_position	first
authorships[0].raw_author_name	Hanjie Chen
authorships[0].is_corresponding	False
authorships[1].author.id	https://openalex.org/A5111042556
authorships[1].author.orcid
authorships[1].author.display_name	Zhouxiang Fang
authorships[1].author_position	middle
authorships[1].raw_author_name	Zhouxiang Fang
authorships[1].is_corresponding	False
authorships[2].author.id	https://openalex.org/A5029401961
authorships[2].author.orcid
authorships[2].author.display_name	Yash Singla
authorships[2].author_position	middle
authorships[2].raw_author_name	Yash Singla
authorships[2].is_corresponding	False
authorships[3].author.id	https://openalex.org/A5024437840
authorships[3].author.orcid	https://orcid.org/0000-0002-0422-2474
authorships[3].author.display_name	Mark Dredze
authorships[3].author_position	last
authorships[3].raw_author_name	Mark Dredze
authorships[3].is_corresponding	False
has_content.pdf	True
has_content.grobid_xml	True
is_paratext	False
open_access.is_oa	True
open_access.oa_url	https://aclanthology.org/2025.naacl-long.182.pdf
open_access.oa_status	gold
open_access.any_repository_has_fulltext	False
created_date	2025-10-10T00:00:00
display_name	Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions
has_fulltext	False
is_retracted	False
updated_date	2025-11-06T03:46:38.306776
primary_topic.id	https://openalex.org/T10028
primary_topic.field.id	https://openalex.org/fields/17
primary_topic.field.display_name	Computer Science
primary_topic.score	0.9979000091552734
primary_topic.domain.id	https://openalex.org/domains/3
primary_topic.domain.display_name	Physical Sciences
primary_topic.subfield.id	https://openalex.org/subfields/1702
primary_topic.subfield.display_name	Artificial Intelligence
primary_topic.display_name	Topic Modeling
related_works	https://openalex.org/W4238897586, https://openalex.org/W435179959, https://openalex.org/W2619091065, https://openalex.org/W2059640416, https://openalex.org/W1490753184, https://openalex.org/W2284465472, https://openalex.org/W2291782699, https://openalex.org/W1993948687, https://openalex.org/W2000169967, https://openalex.org/W3204019825
cited_by_count	9
counts_by_year[0].year	2025
counts_by_year[0].cited_by_count	9
locations_count	1
best_oa_location.id	doi:10.18653/v1/2025.naacl-long.182
best_oa_location.is_oa	True
best_oa_location.source
best_oa_location.license	cc-by
best_oa_location.pdf_url	https://aclanthology.org/2025.naacl-long.182.pdf
best_oa_location.version	publishedVersion
best_oa_location.raw_type	proceedings-article
best_oa_location.license_id	https://openalex.org/licenses/cc-by
best_oa_location.is_accepted	True
best_oa_location.is_published	True
best_oa_location.raw_source_name	Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
best_oa_location.landing_page_url	https://doi.org/10.18653/v1/2025.naacl-long.182
primary_location.id	doi:10.18653/v1/2025.naacl-long.182
primary_location.is_oa	True
primary_location.source
primary_location.license	cc-by
primary_location.pdf_url	https://aclanthology.org/2025.naacl-long.182.pdf
primary_location.version	publishedVersion
primary_location.raw_type	proceedings-article
primary_location.license_id	https://openalex.org/licenses/cc-by
primary_location.is_accepted	True
primary_location.is_published	True
primary_location.raw_source_name	Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
primary_location.landing_page_url	https://doi.org/10.18653/v1/2025.naacl-long.182
publication_date	2025-01-01
publication_year	2025
referenced_works_count	0
abstract_inverted_index
cited_by_percentile_year.max	99
cited_by_percentile_year.min	98
countries_distinct_count	0
institutions_distinct_count	4
sustainable_development_goals[0].id	https://metadata.un.org/sdg/4
sustainable_development_goals[0].score	0.7200000286102295
sustainable_development_goals[0].display_name	Quality Education
citation_normalized_percentile.value	0.99712389
citation_normalized_percentile.is_in_top_1_percent	True
citation_normalized_percentile.is_in_top_10_percent	True
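
The payload above is a flattened view of a nested record: each line is a key path such as `topics[0].field.id`, a tab, and an optional value. The following is a minimal sketch of how such lines could be folded back into nested dictionaries and lists. The function names `parse_path` and `unflatten` are hypothetical and not part of any OpenAlex tooling, and the sketch assumes that a missing value means null and that every value should stay a string.

```python
import re

# Matches either a plain key segment ("topics") or a list index ("[0]").
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def parse_path(path):
    """Split 'topics[0].field.id' into ['topics', 0, 'field', 'id']."""
    return [name if name else int(index) for name, index in _TOKEN.findall(path)]


def _child(node, part, default):
    """Return node[part], creating it (and padding lists) when absent."""
    if isinstance(part, int):
        while len(node) <= part:
            node.append(None)
        if node[part] is None:
            node[part] = default
        return node[part]
    return node.setdefault(part, default)


def unflatten(lines):
    """Rebuild a nested dict from 'key<TAB>value' lines (values kept as strings)."""
    root = {}
    for line in lines:
        key, _, value = line.partition("\t")
        parts = parse_path(key.strip())
        if not parts:
            continue  # skip blank lines
        node = root
        # Walk every segment but the last, creating containers as needed.
        for part, nxt in zip(parts, parts[1:]):
            node = _child(node, part, [] if isinstance(nxt, int) else {})
        last = parts[-1]
        value = value.strip() or None  # assumption: empty value means null
        if isinstance(last, int):
            while len(node) <= last:
                node.append(None)
        node[last] = value
    return root


record = unflatten([
    "id\thttps://openalex.org/W4411119840",
    "topics[0].id\thttps://openalex.org/T10028",
    "topics[0].score\t0.9979000091552734",
    "topics[1].id\thttps://openalex.org/T10181",
    "biblio.issue",
])
print(record["topics"][1]["id"])
```

Values such as scores and booleans remain strings in this sketch; converting them would need a schema the page does not provide.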