Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning
2025 · Open Access · DOI: https://doi.org/10.48550/arxiv.2502.09022
Transformer-based language models have achieved significant success; however, their internal mechanisms remain largely opaque due to the complexity of non-linear interactions and high-dimensional operations. While previous studies have demonstrated that these models implicitly embed reasoning trees, humans typically employ various distinct logical reasoning mechanisms to complete the same task. It is still unclear which multi-step reasoning mechanisms are used by language models to solve such tasks. In this paper, we aim to address this question by investigating the mechanistic interpretability of language models, particularly in the context of multi-step reasoning tasks. Specifically, we employ circuit analysis and self-influence functions to evaluate the changing importance of each token throughout the reasoning process, allowing us to map the reasoning paths adopted by the model. We apply this methodology to the GPT-2 model on the indirect object identification (IOI) prediction task and demonstrate that the underlying circuits reveal a human-interpretable reasoning process used by the model.
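The abstract does not spell out how self-influence is computed. As a point of reference only, the classic influence-function definition (Koh and Liang, 2017) scores an example z by its self-influence I(z, z) = ∇L(z)ᵀ H⁻¹ ∇L(z), where H is the Hessian of the average training loss; large values flag examples the model finds hard to fit. The sketch below assumes that definition and applies it to a hypothetical two-weight logistic regression rather than to GPT-2 tokens, so it illustrates the quantity, not the authors' token-level procedure. The damping value, the data and the hand-picked weights are all illustrative assumptions.

```python
import math

# Assumption: a toy logistic regression with two weights and no bias stands in
# for the model; the paper itself works with GPT-2 circuits.


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def grad_loss(w, x, y):
    """Gradient of the binary cross-entropy loss for one example (x, y)."""
    p = sigmoid(w[0] * x[0] + w[1] * x[1])
    return [(p - y) * x[0], (p - y) * x[1]]


def hessian(w, data, damping=0.01):
    """Average loss Hessian p(1-p) x x^T over the data, plus damping * I."""
    h = [[0.0, 0.0], [0.0, 0.0]]
    for x, _ in data:
        p = sigmoid(w[0] * x[0] + w[1] * x[1])
        s = p * (1.0 - p)
        for i in range(2):
            for j in range(2):
                h[i][j] += s * x[i] * x[j] / len(data)
    h[0][0] += damping
    h[1][1] += damping
    return h


def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def self_influence(w, data):
    """Return I(z, z) = g^T H^{-1} g for every example z in data."""
    h_inv = inverse_2x2(hessian(w, data))
    scores = []
    for x, y in data:
        g = grad_loss(w, x, y)
        hg = [h_inv[0][0] * g[0] + h_inv[0][1] * g[1],
              h_inv[1][0] * g[0] + h_inv[1][1] * g[1]]
        scores.append(g[0] * hg[0] + g[1] * hg[1])
    return scores


# Hypothetical data: the last point resembles the positives but is labeled 0.
data = [([1.0, 0.5], 1), ([0.8, 1.0], 1), ([-1.0, -0.2], 0), ([1.0, 0.9], 0)]
w = [1.0, 0.5]  # hand-picked weights standing in for a trained model
scores = self_influence(w, data)
print(scores)
```

Because the damped Hessian is positive definite, every score is non-negative, and with these toy numbers the fourth point receives the largest self-influence. Forming and inverting the exact Hessian is only feasible for very small models; at GPT-2 scale some approximation would be required.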
- Type: preprint
- Language: en
- Landing page: http://arxiv.org/abs/2502.09022 (PDF: https://arxiv.org/pdf/2502.09022)
- OA status: green
- Related works: 10
- OpenAlex ID: https://openalex.org/W4407571466
Raw OpenAlex JSON
- OpenAlex ID: https://openalex.org/W4407571466 (canonical identifier for this work in OpenAlex)
- DOI: https://doi.org/10.48550/arxiv.2502.09022 (Digital Object Identifier)
- Title: Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning
- Type: preprint (OpenAlex work type)
- Language: en (primary language)
- Publication year: 2025
- Publication date: 2025-02-13
- Authors (in order): Lin Zhang, Lijie Hu, Di Wang
- Landing page: https://arxiv.org/abs/2502.09022 (publisher landing page)
- PDF URL: https://arxiv.org/pdf/2502.09022 (direct link to the full-text PDF)
- Open access: Yes (a free full text is available)
- OA status: green (open-access status per OpenAlex)
- OA URL: https://arxiv.org/pdf/2502.09022 (direct open-access link)
- Concepts: Transformer, Key (lock), Electronic circuit, Computer science, Electrical engineering, Engineering, Computer security, Voltage (top concepts attached by OpenAlex)
- Cited by: 0 (total citation count in OpenAlex)
- Related works: 10 (other works algorithmically related by OpenAlex)
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4407571466 |
| doi | https://doi.org/10.48550/arxiv.2502.09022 |
| ids.doi | https://doi.org/10.48550/arxiv.2502.09022 |
| ids.openalex | https://openalex.org/W4407571466 |
| fwci | |
| type | preprint |
| title | Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T13083 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.2249000072479248 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Advanced Text Analysis Techniques |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C66322947 |
| concepts[0].level | 3 |
| concepts[0].score | 0.6317414045333862 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q11658 |
| concepts[0].display_name | Transformer |
| concepts[1].id | https://openalex.org/C26517878 |
| concepts[1].level | 2 |
| concepts[1].score | 0.5742504596710205 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q228039 |
| concepts[1].display_name | Key (lock) |
| concepts[2].id | https://openalex.org/C134146338 |
| concepts[2].level | 2 |
| concepts[2].score | 0.45581525564193726 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q1815901 |
| concepts[2].display_name | Electronic circuit |
| concepts[3].id | https://openalex.org/C41008148 |
| concepts[3].level | 0 |
| concepts[3].score | 0.39002496004104614 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[3].display_name | Computer science |
| concepts[4].id | https://openalex.org/C119599485 |
| concepts[4].level | 1 |
| concepts[4].score | 0.2118561565876007 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q43035 |
| concepts[4].display_name | Electrical engineering |
| concepts[5].id | https://openalex.org/C127413603 |
| concepts[5].level | 0 |
| concepts[5].score | 0.194955974817276 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q11023 |
| concepts[5].display_name | Engineering |
| concepts[6].id | https://openalex.org/C38652104 |
| concepts[6].level | 1 |
| concepts[6].score | 0.11442786455154419 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q3510521 |
| concepts[6].display_name | Computer security |
| concepts[7].id | https://openalex.org/C165801399 |
| concepts[7].level | 2 |
| concepts[7].score | 0.0 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q25428 |
| concepts[7].display_name | Voltage |
| keywords[0].id | https://openalex.org/keywords/transformer |
| keywords[0].score | 0.6317414045333862 |
| keywords[0].display_name | Transformer |
| keywords[1].id | https://openalex.org/keywords/key |
| keywords[1].score | 0.5742504596710205 |
| keywords[1].display_name | Key (lock) |
| keywords[2].id | https://openalex.org/keywords/electronic-circuit |
| keywords[2].score | 0.45581525564193726 |
| keywords[2].display_name | Electronic circuit |
| keywords[3].id | https://openalex.org/keywords/computer-science |
| keywords[3].score | 0.39002496004104614 |
| keywords[3].display_name | Computer science |
| keywords[4].id | https://openalex.org/keywords/electrical-engineering |
| keywords[4].score | 0.2118561565876007 |
| keywords[4].display_name | Electrical engineering |
| keywords[5].id | https://openalex.org/keywords/engineering |
| keywords[5].score | 0.194955974817276 |
| keywords[5].display_name | Engineering |
| keywords[6].id | https://openalex.org/keywords/computer-security |
| keywords[6].score | 0.11442786455154419 |
| keywords[6].display_name | Computer security |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2502.09022 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2502.09022 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2502.09022 |
| locations[1].id | doi:10.48550/arxiv.2502.09022 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2502.09022 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5102007811 |
| authorships[0].author.orcid | https://orcid.org/0000-0002-2985-8145 |
| authorships[0].author.display_name | Lin Zhang |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Zhang, Lin |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5011519506 |
| authorships[1].author.orcid | https://orcid.org/0000-0001-7003-2903 |
| authorships[1].author.display_name | Li Hu |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Hu, Lijie |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5058374376 |
| authorships[2].author.orcid | https://orcid.org/0000-0002-9729-6455 |
| authorships[2].author.display_name | Di Wang |
| authorships[2].author_position | last |
| authorships[2].raw_author_name | Wang, Di |
| authorships[2].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2502.09022 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T13083 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.2249000072479248 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Advanced Text Analysis Techniques |
| related_works | https://openalex.org/W4391375266, https://openalex.org/W2899084033, https://openalex.org/W2748952813, https://openalex.org/W2390279801, https://openalex.org/W4391913857, https://openalex.org/W2358668433, https://openalex.org/W4396701345, https://openalex.org/W2376932109, https://openalex.org/W2001405890, https://openalex.org/W4396696052 |
| cited_by_count | 0 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2502.09022 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2502.09022 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2502.09022 |
| primary_location.id | pmh:oai:arXiv.org:2502.09022 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2502.09022 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2502.09022 |
| publication_date | 2025-02-13 |
| publication_year | 2025 |
| referenced_works_count | 0 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 3 |
| citation_normalized_percentile | |
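
OpenAlex does not deliver abstracts as plain text; its `abstract_inverted_index` field maps each word to the list of positions at which it occurs. A minimal Python sketch of turning such an index back into readable text follows, using the entries of this work's index for positions 0 to 6 as sample input. The helper name `decode_inverted_index` is hypothetical and not part of any OpenAlex client library.

```python
def decode_inverted_index(index: dict[str, list[int]]) -> str:
    """Rebuild plain text from an OpenAlex-style abstract_inverted_index."""
    positions: dict[int, str] = {}
    for word, offsets in index.items():
        # A repeated word lists several offsets; place it at each one.
        for offset in offsets:
            positions[offset] = word
    # Emit the words in positional order, separated by single spaces.
    return " ".join(positions[i] for i in sorted(positions))


# Entries from this work's index, restricted to positions 0-6.
sample = {
    "Transformer-based": [0],
    "language": [1],
    "models": [2],
    "have": [3],
    "achieved": [4],
    "significant": [5],
    "success;": [6],
}
print(decode_inverted_index(sample))
```

Running it prints the opening words of the abstract, `Transformer-based language models have achieved significant success;`. Repeated words, such as `language` at positions 1, 60 and 81 in the full index, are handled because each position is written separately; original whitespace and line breaks cannot be recovered, since the index stores only tokens.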