Explanations of Large Language Models Explain Language Representations in the Brain

Maryam Rahimi, Yadollah Yaghoobzadeh, Mohammad Reza Daliri

2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2502.14671

Large language models (LLMs) not only exhibit human-like performance but also share computational principles with the brain's language processing mechanisms. While prior research has focused on mapping LLMs' internal representations to neural activity, we propose a novel approach using explainable AI (XAI) to strengthen this link. Applying attribution methods, we quantify the influence of preceding words on LLMs' next-word predictions and use these explanations to predict fMRI data from participants listening to narratives. We find that attribution methods robustly predict brain activity across the language network, revealing a hierarchical pattern: explanations from early layers align with the brain's initial language processing stages, while later layers correspond to more advanced stages. Additionally, layers with greater influence on next-word prediction, reflected in higher attribution scores, demonstrate stronger brain alignment. These results underscore XAI's potential for exploring the neural basis of language and suggest brain alignment for assessing the biological plausibility of explanation methods.
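
The pipeline described in the abstract ends with predicting fMRI responses from attribution-derived features. As a rough illustration only, the sketch below fits a linear encoding model on synthetic data and scores it by per-voxel correlation. The abstract does not state which regression model or evaluation metric the authors use; ridge regression, Pearson correlation, the random stand-in data, and all names (`ridge_fit`, `voxelwise_correlation`, `alpha`) are assumptions made for this example.

```python
import numpy as np


def ridge_fit(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def voxelwise_correlation(Y_true, Y_pred):
    """Pearson correlation between measured and predicted responses, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom


# Synthetic stand-ins (assumption): rows are time points, columns of X are
# attribution-derived features, columns of Y are voxel responses.
rng = np.random.default_rng(0)
n_train, n_test, n_features, n_voxels = 200, 50, 10, 5
X = rng.standard_normal((n_train + n_test, n_features))
W_true = rng.standard_normal((n_features, n_voxels))
Y = X @ W_true + 0.5 * rng.standard_normal((n_train + n_test, n_voxels))

# Fit on the training split, then score predictions on held-out time points.
W = ridge_fit(X[:n_train], Y[:n_train], alpha=1.0)
scores = voxelwise_correlation(Y[n_train:], X[n_train:] @ W)
print(scores)
```

In a real analysis, `X` would hold attribution scores from one model layer aligned to the fMRI sampling times, and repeating the fit for each layer would give the kind of layer-by-region comparison the abstract reports.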

Related Topics

Computer Science

Artificial Intelligence

Philosophy

Concepts

Linguistics, Computer science, Language model, Psychology, Cognitive science, Natural language processing, Cognitive psychology, Artificial intelligence, Philosophy

Metadata

Type: preprint
Language: en
Landing Page: http://arxiv.org/abs/2502.14671
PDF: https://arxiv.org/pdf/2502.14671
OA Status: green
Related Works: 10
OpenAlex ID: https://openalex.org/W4407810269

All OpenAlex metadata

OpenAlex ID: https://openalex.org/W4407810269
Canonical identifier for this work in OpenAlex

DOI: https://doi.org/10.48550/arxiv.2502.14671
Digital Object Identifier

Title: Explanations of Large Language Models Explain Language Representations in the Brain
Work title

Type: preprint
OpenAlex work type

Language: en
Primary language

Publication year: 2025
Year of publication

Publication date: 2025-02-20
Full publication date if available

Authors: Maryam Rahimi, Yadollah Yaghoobzadeh, Mohammad Reza Daliri
List of authors in order

Landing page: https://arxiv.org/abs/2502.14671
Publisher landing page

PDF URL: https://arxiv.org/pdf/2502.14671
Direct link to full text PDF

Open access: Yes
Whether a free full text is available

OA status: green
Open access status per OpenAlex

OA URL: https://arxiv.org/pdf/2502.14671
Direct OA link when available

Concepts: Linguistics, Computer science, Language model, Psychology, Cognitive science, Natural language processing, Cognitive psychology, Artificial intelligence, Philosophy
Top concepts (fields/topics) attached by OpenAlex

Cited by: 0
Total citation count in OpenAlex

Related works (count): 10
Other works algorithmically related by OpenAlex

Full payload

id	https://openalex.org/W4407810269
doi	https://doi.org/10.48550/arxiv.2502.14671
ids.doi	https://doi.org/10.48550/arxiv.2502.14671
ids.openalex	https://openalex.org/W4407810269
fwci
type	preprint
title	Explanations of Large Language Models Explain Language Representations in the Brain
biblio.issue
biblio.volume
biblio.last_page
biblio.first_page
topics[0].id	https://openalex.org/T10181
topics[0].field.id	https://openalex.org/fields/17
topics[0].field.display_name	Computer Science
topics[0].score	0.6962000131607056
topics[0].domain.id	https://openalex.org/domains/3
topics[0].domain.display_name	Physical Sciences
topics[0].subfield.id	https://openalex.org/subfields/1702
topics[0].subfield.display_name	Artificial Intelligence
topics[0].display_name	Natural Language Processing Techniques
topics[1].id	https://openalex.org/T10028
topics[1].field.id	https://openalex.org/fields/17
topics[1].field.display_name	Computer Science
topics[1].score	0.692799985408783
topics[1].domain.id	https://openalex.org/domains/3
topics[1].domain.display_name	Physical Sciences
topics[1].subfield.id	https://openalex.org/subfields/1702
topics[1].subfield.display_name	Artificial Intelligence
topics[1].display_name	Topic Modeling
topics[2].id	https://openalex.org/T12090
topics[2].field.id	https://openalex.org/fields/33
topics[2].field.display_name	Social Sciences
topics[2].score	0.6773999929428101
topics[2].domain.id	https://openalex.org/domains/2
topics[2].domain.display_name	Social Sciences
topics[2].subfield.id	https://openalex.org/subfields/3316
topics[2].subfield.display_name	Cultural Studies
topics[2].display_name	Language and cultural evolution
is_xpac	False
apc_list
apc_paid
concepts[0].id	https://openalex.org/C41895202
concepts[0].level	1
concepts[0].score	0.5019333362579346
concepts[0].wikidata	https://www.wikidata.org/wiki/Q8162
concepts[0].display_name	Linguistics
concepts[1].id	https://openalex.org/C41008148
concepts[1].level	0
concepts[1].score	0.45831555128097534
concepts[1].wikidata	https://www.wikidata.org/wiki/Q21198
concepts[1].display_name	Computer science
concepts[2].id	https://openalex.org/C137293760
concepts[2].level	2
concepts[2].score	0.4227461814880371
concepts[2].wikidata	https://www.wikidata.org/wiki/Q3621696
concepts[2].display_name	Language model
concepts[3].id	https://openalex.org/C15744967
concepts[3].level	0
concepts[3].score	0.3894190788269043
concepts[3].wikidata	https://www.wikidata.org/wiki/Q9418
concepts[3].display_name	Psychology
concepts[4].id	https://openalex.org/C188147891
concepts[4].level	1
concepts[4].score	0.3826119601726532
concepts[4].wikidata	https://www.wikidata.org/wiki/Q147638
concepts[4].display_name	Cognitive science
concepts[5].id	https://openalex.org/C204321447
concepts[5].level	1
concepts[5].score	0.3745190501213074
concepts[5].wikidata	https://www.wikidata.org/wiki/Q30642
concepts[5].display_name	Natural language processing
concepts[6].id	https://openalex.org/C180747234
concepts[6].level	1
concepts[6].score	0.346797913312912
concepts[6].wikidata	https://www.wikidata.org/wiki/Q23373
concepts[6].display_name	Cognitive psychology
concepts[7].id	https://openalex.org/C154945302
concepts[7].level	1
concepts[7].score	0.3335057497024536
concepts[7].wikidata	https://www.wikidata.org/wiki/Q11660
concepts[7].display_name	Artificial intelligence
concepts[8].id	https://openalex.org/C138885662
concepts[8].level	0
concepts[8].score	0.11132815480232239
concepts[8].wikidata	https://www.wikidata.org/wiki/Q5891
concepts[8].display_name	Philosophy
keywords[0].id	https://openalex.org/keywords/linguistics
keywords[0].score	0.5019333362579346
keywords[0].display_name	Linguistics
keywords[1].id	https://openalex.org/keywords/computer-science
keywords[1].score	0.45831555128097534
keywords[1].display_name	Computer science
keywords[2].id	https://openalex.org/keywords/language-model
keywords[2].score	0.4227461814880371
keywords[2].display_name	Language model
keywords[3].id	https://openalex.org/keywords/psychology
keywords[3].score	0.3894190788269043
keywords[3].display_name	Psychology
keywords[4].id	https://openalex.org/keywords/cognitive-science
keywords[4].score	0.3826119601726532
keywords[4].display_name	Cognitive science
keywords[5].id	https://openalex.org/keywords/natural-language-processing
keywords[5].score	0.3745190501213074
keywords[5].display_name	Natural language processing
keywords[6].id	https://openalex.org/keywords/cognitive-psychology
keywords[6].score	0.346797913312912
keywords[6].display_name	Cognitive psychology
keywords[7].id	https://openalex.org/keywords/artificial-intelligence
keywords[7].score	0.3335057497024536
keywords[7].display_name	Artificial intelligence
keywords[8].id	https://openalex.org/keywords/philosophy
keywords[8].score	0.11132815480232239
keywords[8].display_name	Philosophy
language	en
locations[0].id	pmh:oai:arXiv.org:2502.14671
locations[0].is_oa	True
locations[0].source.id	https://openalex.org/S4306400194
locations[0].source.issn
locations[0].source.type	repository
locations[0].source.is_oa	True
locations[0].source.issn_l
locations[0].source.is_core	False
locations[0].source.is_in_doaj	False
locations[0].source.display_name	arXiv (Cornell University)
locations[0].source.host_organization	https://openalex.org/I205783295
locations[0].source.host_organization_name	Cornell University
locations[0].source.host_organization_lineage	https://openalex.org/I205783295
locations[0].license
locations[0].pdf_url	https://arxiv.org/pdf/2502.14671
locations[0].version	submittedVersion
locations[0].raw_type	text
locations[0].license_id
locations[0].is_accepted	False
locations[0].is_published	False
locations[0].raw_source_name
locations[0].landing_page_url	http://arxiv.org/abs/2502.14671
locations[1].id	doi:10.48550/arxiv.2502.14671
locations[1].is_oa	True
locations[1].source.id	https://openalex.org/S4306400194
locations[1].source.issn
locations[1].source.type	repository
locations[1].source.is_oa	True
locations[1].source.issn_l
locations[1].source.is_core	False
locations[1].source.is_in_doaj	False
locations[1].source.display_name	arXiv (Cornell University)
locations[1].source.host_organization	https://openalex.org/I205783295
locations[1].source.host_organization_name	Cornell University
locations[1].source.host_organization_lineage	https://openalex.org/I205783295
locations[1].license
locations[1].pdf_url
locations[1].version
locations[1].raw_type	article
locations[1].license_id
locations[1].is_accepted	False
locations[1].is_published
locations[1].raw_source_name
locations[1].landing_page_url	https://doi.org/10.48550/arxiv.2502.14671
indexed_in	arxiv, datacite
authorships[0].author.id	https://openalex.org/A5101837226
authorships[0].author.orcid	https://orcid.org/0000-0003-4867-3178
authorships[0].author.display_name	Maryam Rahimi
authorships[0].author_position	first
authorships[0].raw_author_name	Rahimi, Maryam
authorships[0].is_corresponding	False
authorships[1].author.id	https://openalex.org/A5031030600
authorships[1].author.orcid	https://orcid.org/0000-0003-0646-0852
authorships[1].author.display_name	Yadollah Yaghoobzadeh
authorships[1].author_position	middle
authorships[1].raw_author_name	Yaghoobzadeh, Yadollah
authorships[1].is_corresponding	False
authorships[2].author.id	https://openalex.org/A5086383664
authorships[2].author.orcid	https://orcid.org/0000-0001-9241-8751
authorships[2].author.display_name	Mohammad Reza Daliri
authorships[2].author_position	last
authorships[2].raw_author_name	Daliri, Mohammad Reza
authorships[2].is_corresponding	False
has_content.pdf	False
has_content.grobid_xml	False
is_paratext	False
open_access.is_oa	True
open_access.oa_url	https://arxiv.org/pdf/2502.14671
open_access.oa_status	green
open_access.any_repository_has_fulltext	False
created_date	2025-02-22T00:00:00
display_name	Explanations of Large Language Models Explain Language Representations in the Brain
has_fulltext	False
is_retracted	False
updated_date	2025-11-06T06:51:31.235846
primary_topic.id	https://openalex.org/T10181
primary_topic.field.id	https://openalex.org/fields/17
primary_topic.field.display_name	Computer Science
primary_topic.score	0.6962000131607056
primary_topic.domain.id	https://openalex.org/domains/3
primary_topic.domain.display_name	Physical Sciences
primary_topic.subfield.id	https://openalex.org/subfields/1702
primary_topic.subfield.display_name	Artificial Intelligence
primary_topic.display_name	Natural Language Processing Techniques
related_works	https://openalex.org/W2169518243, https://openalex.org/W405926467, https://openalex.org/W2252095989, https://openalex.org/W4402742086, https://openalex.org/W1965611333, https://openalex.org/W4404882811, https://openalex.org/W2106335228, https://openalex.org/W2105076537, https://openalex.org/W2032228042, https://openalex.org/W1505098369
cited_by_count	0
locations_count	2
best_oa_location.id	pmh:oai:arXiv.org:2502.14671
best_oa_location.is_oa	True
best_oa_location.source.id	https://openalex.org/S4306400194
best_oa_location.source.issn
best_oa_location.source.type	repository
best_oa_location.source.is_oa	True
best_oa_location.source.issn_l
best_oa_location.source.is_core	False
best_oa_location.source.is_in_doaj	False
best_oa_location.source.display_name	arXiv (Cornell University)
best_oa_location.source.host_organization	https://openalex.org/I205783295
best_oa_location.source.host_organization_name	Cornell University
best_oa_location.source.host_organization_lineage	https://openalex.org/I205783295
best_oa_location.license
best_oa_location.pdf_url	https://arxiv.org/pdf/2502.14671
best_oa_location.version	submittedVersion
best_oa_location.raw_type	text
best_oa_location.license_id
best_oa_location.is_accepted	False
best_oa_location.is_published	False
best_oa_location.raw_source_name
best_oa_location.landing_page_url	http://arxiv.org/abs/2502.14671
primary_location.id	pmh:oai:arXiv.org:2502.14671
primary_location.is_oa	True
primary_location.source.id	https://openalex.org/S4306400194
primary_location.source.issn
primary_location.source.type	repository
primary_location.source.is_oa	True
primary_location.source.issn_l
primary_location.source.is_core	False
primary_location.source.is_in_doaj	False
primary_location.source.display_name	arXiv (Cornell University)
primary_location.source.host_organization	https://openalex.org/I205783295
primary_location.source.host_organization_name	Cornell University
primary_location.source.host_organization_lineage	https://openalex.org/I205783295
primary_location.license
primary_location.pdf_url	https://arxiv.org/pdf/2502.14671
primary_location.version	submittedVersion
primary_location.raw_type	text
primary_location.license_id
primary_location.is_accepted	False
primary_location.is_published	False
primary_location.raw_source_name
primary_location.landing_page_url	http://arxiv.org/abs/2502.14671
publication_date	2025-02-20
publication_year	2025
referenced_works_count	0
abstract_inverted_index.a	35, 87
abstract_inverted_index.AI	40
abstract_inverted_index.We	73
abstract_inverted_index.in	118
abstract_inverted_index.of	53, 135, 146
abstract_inverted_index.on	25, 56, 115
abstract_inverted_index.to	30, 42, 64, 71, 106
abstract_inverted_index.we	33, 49
abstract_inverted_index.and	60, 137
abstract_inverted_index.but	9
abstract_inverted_index.for	130, 141
abstract_inverted_index.has	23
abstract_inverted_index.not	4
abstract_inverted_index.the	15, 51, 83, 96, 132, 143
abstract_inverted_index.use	61
abstract_inverted_index.also	10
abstract_inverted_index.data	67
abstract_inverted_index.fMRI	66
abstract_inverted_index.find	74
abstract_inverted_index.from	68, 91
abstract_inverted_index.more	107
abstract_inverted_index.only	5
abstract_inverted_index.that	75
abstract_inverted_index.this	44
abstract_inverted_index.with	14, 95, 112
abstract_inverted_index.(XAI)	41
abstract_inverted_index.LLMs'	27, 57
abstract_inverted_index.Large	0
abstract_inverted_index.These	125
abstract_inverted_index.While	20
abstract_inverted_index.XAI's	128
abstract_inverted_index.align	94
abstract_inverted_index.basis	134
abstract_inverted_index.brain	80, 123, 139
abstract_inverted_index.early	92
abstract_inverted_index.later	103
abstract_inverted_index.link.	45
abstract_inverted_index.novel	36
abstract_inverted_index.prior	21
abstract_inverted_index.share	11
abstract_inverted_index.these	62
abstract_inverted_index.using	38
abstract_inverted_index.while	102
abstract_inverted_index.words	55
abstract_inverted_index.(LLMs)	3
abstract_inverted_index.across	82
abstract_inverted_index.higher	119
abstract_inverted_index.layers	93, 104, 111
abstract_inverted_index.models	2
abstract_inverted_index.neural	31, 133
abstract_inverted_index.brain's	16, 97
abstract_inverted_index.exhibit	6
abstract_inverted_index.focused	24
abstract_inverted_index.greater	113
abstract_inverted_index.initial	98
abstract_inverted_index.mapping	26
abstract_inverted_index.methods	77
abstract_inverted_index.predict	65, 79
abstract_inverted_index.propose	34
abstract_inverted_index.results	126
abstract_inverted_index.stages,	101
abstract_inverted_index.stages.	109
abstract_inverted_index.suggest	138
abstract_inverted_index.Applying	46
abstract_inverted_index.activity	81
abstract_inverted_index.advanced	108
abstract_inverted_index.approach	37
abstract_inverted_index.internal	28
abstract_inverted_index.language	1, 17, 84, 99, 136
abstract_inverted_index.methods,	48
abstract_inverted_index.methods.	148
abstract_inverted_index.network,	85
abstract_inverted_index.pattern:	89
abstract_inverted_index.quantify	50
abstract_inverted_index.research	22
abstract_inverted_index.robustly	78
abstract_inverted_index.stronger	122
abstract_inverted_index.activity,	32
abstract_inverted_index.alignment	140
abstract_inverted_index.assessing	142
abstract_inverted_index.exploring	131
abstract_inverted_index.influence	52, 114
abstract_inverted_index.listening	70
abstract_inverted_index.next-word	58, 116
abstract_inverted_index.potential	129
abstract_inverted_index.preceding	54
abstract_inverted_index.revealing	86
abstract_inverted_index.alignment.	124
abstract_inverted_index.biological	144
abstract_inverted_index.correspond	105
abstract_inverted_index.human-like	7
abstract_inverted_index.principles	13
abstract_inverted_index.processing	18, 100
abstract_inverted_index.strengthen	43
abstract_inverted_index.underscore	127
abstract_inverted_index.attribution	47, 76, 120
abstract_inverted_index.explainable	39
abstract_inverted_index.explanation	147
abstract_inverted_index.mechanisms.	19
abstract_inverted_index.narratives.	72
abstract_inverted_index.performance	8
abstract_inverted_index.predictions	59
abstract_inverted_index.explanations	63, 90
abstract_inverted_index.hierarchical	88
abstract_inverted_index.participants	69
abstract_inverted_index.plausibility	145
abstract_inverted_index.Additionally,	110
abstract_inverted_index.computational	12
abstract_inverted_index.representations	29
abstract_inverted_index.scores$\unicode{x2014}$demonstrate	121
abstract_inverted_index.prediction$\unicode{x2014}$reflected	117
cited_by_percentile_year
countries_distinct_count	0
institutions_distinct_count	3
citation_normalized_percentile
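
The `abstract_inverted_index` fields above store the abstract as a map from each token to the positions where it occurs. A minimal sketch of turning such a map back into running text follows. It assumes positions arrive as lists of integers, as in OpenAlex's JSON (this page prints them comma-separated instead). The function name `rebuild_abstract` and the `sample` dictionary are illustrative; `sample` copies only the first nine positions from the payload above.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {token: [positions]} inverted index."""
    by_position = {}
    for token, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = token
    # Emit tokens in ascending position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))


# Excerpt of the payload above, truncated to positions 0 through 8.
sample = {
    "Large": [0],
    "language": [1],
    "models": [2],
    "(LLMs)": [3],
    "not": [4],
    "only": [5],
    "exhibit": [6],
    "human-like": [7],
    "performance": [8],
}

text = rebuild_abstract(sample)
print(text)  # Large language models (LLMs) not only exhibit human-like performance
```

Punctuation stays attached to its token (for example `link.` at position 45), so joining on spaces is enough to recover the original sentence boundaries.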