Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning

2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2502.09022

Transformer-based language models have achieved significant success; however, their internal mechanisms remain largely opaque due to the complexity of non-linear interactions and high-dimensional operations. While previous studies have demonstrated that these models implicitly embed reasoning trees, humans typically employ various distinct logical reasoning mechanisms to complete the same task. It is still unclear which multi-step reasoning mechanisms are used by language models to solve such tasks. In this paper, we aim to address this question by investigating the mechanistic interpretability of language models, particularly in the context of multi-step reasoning tasks. Specifically, we employ circuit analysis and self-influence functions to evaluate the changing importance of each token throughout the reasoning process, allowing us to map the reasoning paths adopted by the model. We apply this methodology to the GPT-2 model on the indirect object identification (IOI) prediction task and demonstrate that the underlying circuits reveal a human-interpretable reasoning process used by the model.

Related Topics

Electrical Engineering

Engineering

Computer Security

Voltage

Concepts

Transformer, Key (lock), Electronic circuit, Computer science, Electrical engineering, Engineering, Computer security, Voltage

Metadata

Type: preprint
Language: en
Landing Page: http://arxiv.org/abs/2502.09022
PDF: https://arxiv.org/pdf/2502.09022
OA Status: green
Related Works: 10
OpenAlex ID: https://openalex.org/W4407571466

All OpenAlex metadata

OpenAlex ID: https://openalex.org/W4407571466
Canonical identifier for this work in OpenAlex

DOI: https://doi.org/10.48550/arxiv.2502.09022
Digital Object Identifier

Title: Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning
Work title

Type: preprint
OpenAlex work type

Language: en
Primary language

Publication year: 2025
Year of publication

Publication date: 2025-02-13
Full publication date if available

Authors: Lin Zhang, Li Hu, Di Wang
List of authors in order

Landing page: https://arxiv.org/abs/2502.09022
Publisher landing page

PDF URL: https://arxiv.org/pdf/2502.09022
Direct link to full text PDF

Open access: Yes
Whether a free full text is available

OA status: green
Open access status per OpenAlex

OA URL: https://arxiv.org/pdf/2502.09022
Direct OA link when available

Concepts: Transformer, Key (lock), Electronic circuit, Computer science, Electrical engineering, Engineering, Computer security, Voltage
Top concepts (fields/topics) attached by OpenAlex

Cited by: 0
Total citation count in OpenAlex

Related works (count): 10
Other works algorithmically related by OpenAlex

Full payload

id	https://openalex.org/W4407571466
doi	https://doi.org/10.48550/arxiv.2502.09022
ids.doi	https://doi.org/10.48550/arxiv.2502.09022
ids.openalex	https://openalex.org/W4407571466
fwci
type	preprint
title	Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning
biblio.issue
biblio.volume
biblio.last_page
biblio.first_page
topics[0].id	https://openalex.org/T13083
topics[0].field.id	https://openalex.org/fields/17
topics[0].field.display_name	Computer Science
topics[0].score	0.2249000072479248
topics[0].domain.id	https://openalex.org/domains/3
topics[0].domain.display_name	Physical Sciences
topics[0].subfield.id	https://openalex.org/subfields/1702
topics[0].subfield.display_name	Artificial Intelligence
topics[0].display_name	Advanced Text Analysis Techniques
is_xpac	False
apc_list
apc_paid
concepts[0].id	https://openalex.org/C66322947
concepts[0].level	3
concepts[0].score	0.6317414045333862
concepts[0].wikidata	https://www.wikidata.org/wiki/Q11658
concepts[0].display_name	Transformer
concepts[1].id	https://openalex.org/C26517878
concepts[1].level	2
concepts[1].score	0.5742504596710205
concepts[1].wikidata	https://www.wikidata.org/wiki/Q228039
concepts[1].display_name	Key (lock)
concepts[2].id	https://openalex.org/C134146338
concepts[2].level	2
concepts[2].score	0.45581525564193726
concepts[2].wikidata	https://www.wikidata.org/wiki/Q1815901
concepts[2].display_name	Electronic circuit
concepts[3].id	https://openalex.org/C41008148
concepts[3].level	0
concepts[3].score	0.39002496004104614
concepts[3].wikidata	https://www.wikidata.org/wiki/Q21198
concepts[3].display_name	Computer science
concepts[4].id	https://openalex.org/C119599485
concepts[4].level	1
concepts[4].score	0.2118561565876007
concepts[4].wikidata	https://www.wikidata.org/wiki/Q43035
concepts[4].display_name	Electrical engineering
concepts[5].id	https://openalex.org/C127413603
concepts[5].level	0
concepts[5].score	0.194955974817276
concepts[5].wikidata	https://www.wikidata.org/wiki/Q11023
concepts[5].display_name	Engineering
concepts[6].id	https://openalex.org/C38652104
concepts[6].level	1
concepts[6].score	0.11442786455154419
concepts[6].wikidata	https://www.wikidata.org/wiki/Q3510521
concepts[6].display_name	Computer security
concepts[7].id	https://openalex.org/C165801399
concepts[7].level	2
concepts[7].score	0.0
concepts[7].wikidata	https://www.wikidata.org/wiki/Q25428
concepts[7].display_name	Voltage
keywords[0].id	https://openalex.org/keywords/transformer
keywords[0].score	0.6317414045333862
keywords[0].display_name	Transformer
keywords[1].id	https://openalex.org/keywords/key
keywords[1].score	0.5742504596710205
keywords[1].display_name	Key (lock)
keywords[2].id	https://openalex.org/keywords/electronic-circuit
keywords[2].score	0.45581525564193726
keywords[2].display_name	Electronic circuit
keywords[3].id	https://openalex.org/keywords/computer-science
keywords[3].score	0.39002496004104614
keywords[3].display_name	Computer science
keywords[4].id	https://openalex.org/keywords/electrical-engineering
keywords[4].score	0.2118561565876007
keywords[4].display_name	Electrical engineering
keywords[5].id	https://openalex.org/keywords/engineering
keywords[5].score	0.194955974817276
keywords[5].display_name	Engineering
keywords[6].id	https://openalex.org/keywords/computer-security
keywords[6].score	0.11442786455154419
keywords[6].display_name	Computer security
language	en
locations[0].id	pmh:oai:arXiv.org:2502.09022
locations[0].is_oa	True
locations[0].source.id	https://openalex.org/S4306400194
locations[0].source.issn
locations[0].source.type	repository
locations[0].source.is_oa	True
locations[0].source.issn_l
locations[0].source.is_core	False
locations[0].source.is_in_doaj	False
locations[0].source.display_name	arXiv (Cornell University)
locations[0].source.host_organization	https://openalex.org/I205783295
locations[0].source.host_organization_name	Cornell University
locations[0].source.host_organization_lineage	https://openalex.org/I205783295
locations[0].license
locations[0].pdf_url	https://arxiv.org/pdf/2502.09022
locations[0].version	submittedVersion
locations[0].raw_type	text
locations[0].license_id
locations[0].is_accepted	False
locations[0].is_published	False
locations[0].raw_source_name
locations[0].landing_page_url	http://arxiv.org/abs/2502.09022
locations[1].id	doi:10.48550/arxiv.2502.09022
locations[1].is_oa	True
locations[1].source.id	https://openalex.org/S4306400194
locations[1].source.issn
locations[1].source.type	repository
locations[1].source.is_oa	True
locations[1].source.issn_l
locations[1].source.is_core	False
locations[1].source.is_in_doaj	False
locations[1].source.display_name	arXiv (Cornell University)
locations[1].source.host_organization	https://openalex.org/I205783295
locations[1].source.host_organization_name	Cornell University
locations[1].source.host_organization_lineage	https://openalex.org/I205783295
locations[1].license	cc-by
locations[1].pdf_url
locations[1].version
locations[1].raw_type	article
locations[1].license_id	https://openalex.org/licenses/cc-by
locations[1].is_accepted	False
locations[1].is_published
locations[1].raw_source_name
locations[1].landing_page_url	https://doi.org/10.48550/arxiv.2502.09022
indexed_in	arxiv, datacite
authorships[0].author.id	https://openalex.org/A5102007811
authorships[0].author.orcid	https://orcid.org/0000-0002-2985-8145
authorships[0].author.display_name	Lin Zhang
authorships[0].author_position	first
authorships[0].raw_author_name	Zhang, Lin
authorships[0].is_corresponding	False
authorships[1].author.id	https://openalex.org/A5011519506
authorships[1].author.orcid	https://orcid.org/0000-0001-7003-2903
authorships[1].author.display_name	Li Hu
authorships[1].author_position	middle
authorships[1].raw_author_name	Hu, Lijie
authorships[1].is_corresponding	False
authorships[2].author.id	https://openalex.org/A5058374376
authorships[2].author.orcid	https://orcid.org/0000-0002-9729-6455
authorships[2].author.display_name	Di Wang
authorships[2].author_position	last
authorships[2].raw_author_name	Wang, Di
authorships[2].is_corresponding	False
has_content.pdf	False
has_content.grobid_xml	False
is_paratext	False
open_access.is_oa	True
open_access.oa_url	https://arxiv.org/pdf/2502.09022
open_access.oa_status	green
open_access.any_repository_has_fulltext	False
created_date	2025-10-10T00:00:00
display_name	Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning
has_fulltext	False
is_retracted	False
updated_date	2025-11-06T06:51:31.235846
primary_topic.id	https://openalex.org/T13083
primary_topic.field.id	https://openalex.org/fields/17
primary_topic.field.display_name	Computer Science
primary_topic.score	0.2249000072479248
primary_topic.domain.id	https://openalex.org/domains/3
primary_topic.domain.display_name	Physical Sciences
primary_topic.subfield.id	https://openalex.org/subfields/1702
primary_topic.subfield.display_name	Artificial Intelligence
primary_topic.display_name	Advanced Text Analysis Techniques
related_works	https://openalex.org/W4391375266, https://openalex.org/W2899084033, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W4391913857, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W4396696052
cited_by_count	0
locations_count	2
best_oa_location.id	pmh:oai:arXiv.org:2502.09022
best_oa_location.is_oa	True
best_oa_location.source.id	https://openalex.org/S4306400194
best_oa_location.source.issn
best_oa_location.source.type	repository
best_oa_location.source.is_oa	True
best_oa_location.source.issn_l
best_oa_location.source.is_core	False
best_oa_location.source.is_in_doaj	False
best_oa_location.source.display_name	arXiv (Cornell University)
best_oa_location.source.host_organization	https://openalex.org/I205783295
best_oa_location.source.host_organization_name	Cornell University
best_oa_location.source.host_organization_lineage	https://openalex.org/I205783295
best_oa_location.license
best_oa_location.pdf_url	https://arxiv.org/pdf/2502.09022
best_oa_location.version	submittedVersion
best_oa_location.raw_type	text
best_oa_location.license_id
best_oa_location.is_accepted	False
best_oa_location.is_published	False
best_oa_location.raw_source_name
best_oa_location.landing_page_url	http://arxiv.org/abs/2502.09022
primary_location.id	pmh:oai:arXiv.org:2502.09022
primary_location.is_oa	True
primary_location.source.id	https://openalex.org/S4306400194
primary_location.source.issn
primary_location.source.type	repository
primary_location.source.is_oa	True
primary_location.source.issn_l
primary_location.source.is_core	False
primary_location.source.is_in_doaj	False
primary_location.source.display_name	arXiv (Cornell University)
primary_location.source.host_organization	https://openalex.org/I205783295
primary_location.source.host_organization_name	Cornell University
primary_location.source.host_organization_lineage	https://openalex.org/I205783295
primary_location.license
primary_location.pdf_url	https://arxiv.org/pdf/2502.09022
primary_location.version	submittedVersion
primary_location.raw_type	text
primary_location.license_id
primary_location.is_accepted	False
primary_location.is_published	False
primary_location.raw_source_name
primary_location.landing_page_url	http://arxiv.org/abs/2502.09022
publication_date	2025-02-13
publication_year	2025
referenced_works_count	0
abstract_inverted_index.a	131, 142
abstract_inverted_index.In	66
abstract_inverted_index.It	49
abstract_inverted_index.We	122
abstract_inverted_index.by	59, 75, 119, 147
abstract_inverted_index.in	84
abstract_inverted_index.is	50
abstract_inverted_index.of	18, 80, 87, 104
abstract_inverted_index.on	130
abstract_inverted_index.to	15, 44, 62, 71, 99, 113, 126
abstract_inverted_index.us	112
abstract_inverted_index.we	69, 92
abstract_inverted_index.aim	70
abstract_inverted_index.and	21, 96, 135
abstract_inverted_index.are	57
abstract_inverted_index.due	14
abstract_inverted_index.map	114
abstract_inverted_index.the	16, 46, 77, 85, 101, 108, 115, 120, 127, 138, 148
abstract_inverted_index.each	105
abstract_inverted_index.have	3, 27
abstract_inverted_index.same	47
abstract_inverted_index.such	64
abstract_inverted_index.task	133
abstract_inverted_index.that	29, 137
abstract_inverted_index.this	67, 73, 124
abstract_inverted_index.used	58, 146
abstract_inverted_index.(IOI)	134
abstract_inverted_index.GPT-2	128
abstract_inverted_index.While	24
abstract_inverted_index.apply	123
abstract_inverted_index.embed	33
abstract_inverted_index.model	129
abstract_inverted_index.paths	117
abstract_inverted_index.solve	63
abstract_inverted_index.still	51
abstract_inverted_index.task.	48
abstract_inverted_index.their	8
abstract_inverted_index.these	30
abstract_inverted_index.token	106
abstract_inverted_index.which	53
abstract_inverted_index.employ	38, 93
abstract_inverted_index.humans	36
abstract_inverted_index.model.	121, 149
abstract_inverted_index.models	2, 31, 61
abstract_inverted_index.opaque	13
abstract_inverted_index.paper,	68
abstract_inverted_index.remain	11
abstract_inverted_index.reveal	141
abstract_inverted_index.tasks.	65, 90
abstract_inverted_index.trees,	35
abstract_inverted_index.address	72
abstract_inverted_index.adopted	118
abstract_inverted_index.circuit	94
abstract_inverted_index.context	86
abstract_inverted_index.largely	12
abstract_inverted_index.logical	41
abstract_inverted_index.models,	82
abstract_inverted_index.process	145
abstract_inverted_index.studies	26
abstract_inverted_index.unclear	52
abstract_inverted_index.various	39
abstract_inverted_index.achieved	4
abstract_inverted_index.allowing	111
abstract_inverted_index.analysis	95
abstract_inverted_index.changing	102
abstract_inverted_index.circuits	140
abstract_inverted_index.complete	45
abstract_inverted_index.distinct	40
abstract_inverted_index.evaluate	100
abstract_inverted_index.however,	7
abstract_inverted_index.internal	9
abstract_inverted_index.language	1, 60, 81
abstract_inverted_index.previous	25
abstract_inverted_index.process,	110
abstract_inverted_index.question	74
abstract_inverted_index.success;	6
abstract_inverted_index.functions	98
abstract_inverted_index.reasoning	34, 42, 55, 89, 109, 116, 144
abstract_inverted_index.typically	37
abstract_inverted_index.complexity	17
abstract_inverted_index.implicitly	32
abstract_inverted_index.importance	103
abstract_inverted_index.mechanisms	10, 43, 56
abstract_inverted_index.multi-step	54, 88
abstract_inverted_index.non-linear	19
abstract_inverted_index.prediction	132
abstract_inverted_index.throughout	107
abstract_inverted_index.underlying	139
abstract_inverted_index.demonstrate	136
abstract_inverted_index.mechanistic	78
abstract_inverted_index.methodology	125
abstract_inverted_index.operations.	23
abstract_inverted_index.significant	5
abstract_inverted_index.demonstrated	28
abstract_inverted_index.interactions	20
abstract_inverted_index.particularly	83
abstract_inverted_index.Specifically,	91
abstract_inverted_index.investigating	76
abstract_inverted_index.self-influence	97
abstract_inverted_index.high-dimensional	22
abstract_inverted_index.interpretability	79
abstract_inverted_index.Transformer-based	0
abstract_inverted_index.human-interpretable	143
cited_by_percentile_year
countries_distinct_count	0
institutions_distinct_count	3
citation_normalized_percentile
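
The abstract_inverted_index rows in the payload above store the abstract as a map from each word to the word positions where it occurs. As a minimal sketch of how readable text might be recovered from such rows, the Python below parses the flattened rows and places each word at its positions. It assumes each row is a dotted key and a comma-separated position list separated by a tab; the helper names parse_inverted_index and rebuild_text are hypothetical and are not part of any OpenAlex tooling.

```python
def parse_inverted_index(rows, prefix="abstract_inverted_index."):
    """Collect {word: [positions]} from flattened 'key<TAB>positions' rows.

    Assumption: rows are tab-separated, as in the payload dump above.
    Rows with a different key prefix or an empty value are skipped.
    """
    index = {}
    for row in rows:
        key, _, value = row.partition("\t")
        if not key.startswith(prefix) or not value.strip():
            continue
        word = key[len(prefix):]  # keeps punctuation such as "task."
        index[word] = [int(p) for p in value.split(",")]
    return index


def rebuild_text(index, stop=None):
    """Place every word at its recorded positions and join them in order.

    If `stop` is given, only positions below it are used, which is handy
    when just a subset of the rows has been loaded.
    """
    placed = {}
    for word, positions in index.items():
        for pos in positions:
            if stop is None or pos < stop:
                placed[pos] = word
    return " ".join(placed[pos] for pos in sorted(placed))


# A few rows copied from the payload above (tabs written as \t).
rows = [
    "abstract_inverted_index.have\t3, 27",
    "abstract_inverted_index.language\t1, 60, 81",
    "abstract_inverted_index.models\t2, 31, 61",
    "abstract_inverted_index.achieved\t4",
    "abstract_inverted_index.success;\t6",
    "abstract_inverted_index.significant\t5",
    "abstract_inverted_index.Transformer-based\t0",
]

index = parse_inverted_index(rows)
opening = rebuild_text(index, stop=7)
print(opening)
# Transformer-based language models have achieved significant success;
```

Passing every abstract_inverted_index row and leaving stop unset should yield the full abstract shown at the top of this record, since the positions run from 0 to 149 without gaps.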