Sparse Autoencoders Find Highly Interpretable Features in Language Models

Hoagy Cunningham, Aidan Ewart, Logan Riggs, Robert P. Huben, Lee Sharkey

2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2309.08600

One of the roadblocks to a better understanding of neural networks' internals is polysemanticity, where neurons appear to activate in multiple, semantically distinct contexts. Polysemanticity prevents us from identifying concise, human-understandable explanations for what neural networks are doing internally. One hypothesised cause of polysemanticity is superposition, where neural networks represent more features than they have neurons by assigning features to an overcomplete set of directions in activation space, rather than to individual neurons. Here, we attempt to identify those directions, using sparse autoencoders to reconstruct the internal activations of a language model. These autoencoders learn sets of sparsely activating features that are more interpretable and monosemantic than directions identified by alternative approaches, where interpretability is measured by automated methods. Moreover, we show that with our learned set of features, we can pinpoint the features that are causally responsible for counterfactual behaviour on the indirect object identification task (Wang et al., 2022) to a finer degree than previous decompositions. This work indicates that it is possible to resolve superposition in language models using a scalable, unsupervised method. Our method may serve as a foundation for future mechanistic interpretability work, which we hope will enable greater model transparency and steerability.
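The abstract describes training a sparse autoencoder to reconstruct a language model's internal activations so that its hidden units (the learned features) activate sparsely. The sketch below shows a forward pass and objective under stated assumptions: a single ReLU hidden layer with a bias, a decoder tied to the encoder weights, squared reconstruction error, and an L1 penalty weighted by a hypothetical coefficient `alpha`. It is plain Python for illustration, not the authors' implementation; all function names are hypothetical and the training loop is omitted.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # assumed layout: one row per dictionary feature, each of length d


def relu(v: Vector) -> Vector:
    """Elementwise ReLU; keeps feature activations non-negative."""
    return [max(0.0, x) for x in v]


def encode(W: Matrix, b: Vector, x: Vector) -> Vector:
    """Feature activations c = ReLU(W x + b) for one activation vector x."""
    return relu([sum(w_k * x_k for w_k, x_k in zip(row, x)) + b_j
                 for row, b_j in zip(W, b)])


def decode(W: Matrix, c: Vector) -> Vector:
    """Tied decoder x_hat = W^T c: a weighted sum of the dictionary directions."""
    d = len(W[0])
    return [sum(c[j] * W[j][k] for j in range(len(W))) for k in range(d)]


def sae_loss(W: Matrix, b: Vector, x: Vector, alpha: float) -> float:
    """Squared reconstruction error plus alpha times the L1 norm of the activations."""
    c = encode(W, b, x)
    x_hat = decode(W, c)
    reconstruction = sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))
    sparsity = sum(abs(cj) for cj in c)
    return reconstruction + alpha * sparsity


if __name__ == "__main__":
    # Toy overcomplete dictionary: three unit-norm feature directions in 2-D space.
    W = [[1.0, 0.0], [0.0, 1.0], [-0.6, -0.8]]
    b = [0.0, 0.0, 0.0]
    x = [1.0, 0.0]
    print(encode(W, b, x))               # [1.0, 0.0, 0.0]
    print(sae_loss(W, b, x, alpha=0.1))  # 0.1
```

In the toy usage, three directions in a two-dimensional space form an overcomplete dictionary, the superposition setting the abstract describes. An input aligned with one direction activates only that feature, is reconstructed exactly, and pays only the L1 cost of that single active feature.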

Related Topics

Computer Science

Artificial Intelligence

Machine Learning

Deep Learning

Superposition Principle

Mathematical Analysis

Concepts

Interpretability, Computer science, Artificial intelligence, Set (abstract data type), Artificial neural network, Counterfactual thinking, Identification (biology), Machine learning, Deep neural networks, Language model, Superposition principle, Scalability, Task (project management), Natural language processing, Mathematics, Management, Programming language, Epistemology, Biology, Mathematical analysis, Economics, Philosophy, Botany, Database

Metadata

Type: preprint
Language: en
Landing Page: http://arxiv.org/abs/2309.08600
PDF: https://arxiv.org/pdf/2309.08600
OA Status: green
Cited By: 34
Related Works: 10
OpenAlex ID: https://openalex.org/W4386839891

All OpenAlex metadata

OpenAlex ID: https://openalex.org/W4386839891 (canonical identifier for this work in OpenAlex)
DOI: https://doi.org/10.48550/arxiv.2309.08600 (Digital Object Identifier)
Title: Sparse Autoencoders Find Highly Interpretable Features in Language Models (work title)
Type: preprint (OpenAlex work type)
Language: en (primary language)
Publication year: 2023 (year of publication)
Publication date: 2023-09-15 (full publication date if available)
Authors: Hoagy Cunningham, Aidan Ewart, Logan Riggs, Robert P. Huben, Lee Sharkey (list of authors in order)
Landing page: https://arxiv.org/abs/2309.08600 (publisher landing page)
PDF URL: https://arxiv.org/pdf/2309.08600 (direct link to full-text PDF)
Open access: Yes (whether a free full text is available)
OA status: green (open access status per OpenAlex)
OA URL: https://arxiv.org/pdf/2309.08600 (direct OA link when available)
Concepts: Interpretability, Computer science, Artificial intelligence, Set (abstract data type), Artificial neural network, Counterfactual thinking, Identification (biology), Machine learning, Deep neural networks, Language model, Superposition principle, Scalability, Task (project management), Natural language processing, Mathematics, Management, Programming language, Epistemology, Biology, Mathematical analysis, Economics, Philosophy, Botany, Database (top fields and topics attached by OpenAlex)
Cited by: 34 (total citation count in OpenAlex)
Citations by year (recent): 2025: 28, 2024: 5, 2023: 1 (per-year citation counts, last 5 years)
Related works (count): 10 (other works algorithmically related by OpenAlex)

Full payload

id	https://openalex.org/W4386839891
doi	https://doi.org/10.48550/arxiv.2309.08600
ids.doi	https://doi.org/10.48550/arxiv.2309.08600
ids.openalex	https://openalex.org/W4386839891
fwci
type	preprint
title	Sparse Autoencoders Find Highly Interpretable Features in Language Models
biblio.issue
biblio.volume
biblio.last_page
biblio.first_page
topics[0].id	https://openalex.org/T12026
topics[0].field.id	https://openalex.org/fields/17
topics[0].field.display_name	Computer Science
topics[0].score	0.9973999857902527
topics[0].domain.id	https://openalex.org/domains/3
topics[0].domain.display_name	Physical Sciences
topics[0].subfield.id	https://openalex.org/subfields/1702
topics[0].subfield.display_name	Artificial Intelligence
topics[0].display_name	Explainable Artificial Intelligence (XAI)
topics[1].id	https://openalex.org/T10028
topics[1].field.id	https://openalex.org/fields/17
topics[1].field.display_name	Computer Science
topics[1].score	0.9886000156402588
topics[1].domain.id	https://openalex.org/domains/3
topics[1].domain.display_name	Physical Sciences
topics[1].subfield.id	https://openalex.org/subfields/1702
topics[1].subfield.display_name	Artificial Intelligence
topics[1].display_name	Topic Modeling
topics[2].id	https://openalex.org/T11689
topics[2].field.id	https://openalex.org/fields/17
topics[2].field.display_name	Computer Science
topics[2].score	0.9696999788284302
topics[2].domain.id	https://openalex.org/domains/3
topics[2].domain.display_name	Physical Sciences
topics[2].subfield.id	https://openalex.org/subfields/1702
topics[2].subfield.display_name	Artificial Intelligence
topics[2].display_name	Adversarial Robustness in Machine Learning
is_xpac	False
apc_list
apc_paid
concepts[0].id	https://openalex.org/C2781067378
concepts[0].level	2
concepts[0].score	0.9515140056610107
concepts[0].wikidata	https://www.wikidata.org/wiki/Q17027399
concepts[0].display_name	Interpretability
concepts[1].id	https://openalex.org/C41008148
concepts[1].level	0
concepts[1].score	0.7414587140083313
concepts[1].wikidata	https://www.wikidata.org/wiki/Q21198
concepts[1].display_name	Computer science
concepts[2].id	https://openalex.org/C154945302
concepts[2].level	1
concepts[2].score	0.6781646013259888
concepts[2].wikidata	https://www.wikidata.org/wiki/Q11660
concepts[2].display_name	Artificial intelligence
concepts[3].id	https://openalex.org/C177264268
concepts[3].level	2
concepts[3].score	0.6143465638160706
concepts[3].wikidata	https://www.wikidata.org/wiki/Q1514741
concepts[3].display_name	Set (abstract data type)
concepts[4].id	https://openalex.org/C50644808
concepts[4].level	2
concepts[4].score	0.5453928112983704
concepts[4].wikidata	https://www.wikidata.org/wiki/Q192776
concepts[4].display_name	Artificial neural network
concepts[5].id	https://openalex.org/C108650721
concepts[5].level	2
concepts[5].score	0.49951791763305664
concepts[5].wikidata	https://www.wikidata.org/wiki/Q1783253
concepts[5].display_name	Counterfactual thinking
concepts[6].id	https://openalex.org/C116834253
concepts[6].level	2
concepts[6].score	0.49792933464050293
concepts[6].wikidata	https://www.wikidata.org/wiki/Q2039217
concepts[6].display_name	Identification (biology)
concepts[7].id	https://openalex.org/C119857082
concepts[7].level	1
concepts[7].score	0.4939609169960022
concepts[7].wikidata	https://www.wikidata.org/wiki/Q2539
concepts[7].display_name	Machine learning
concepts[8].id	https://openalex.org/C2984842247
concepts[8].level	3
concepts[8].score	0.4862119257450104
concepts[8].wikidata	https://www.wikidata.org/wiki/Q197536
concepts[8].display_name	Deep neural networks
concepts[9].id	https://openalex.org/C137293760
concepts[9].level	2
concepts[9].score	0.47205859422683716
concepts[9].wikidata	https://www.wikidata.org/wiki/Q3621696
concepts[9].display_name	Language model
concepts[10].id	https://openalex.org/C27753989
concepts[10].level	2
concepts[10].score	0.46201837062835693
concepts[10].wikidata	https://www.wikidata.org/wiki/Q284885
concepts[10].display_name	Superposition principle
concepts[11].id	https://openalex.org/C48044578
concepts[11].level	2
concepts[11].score	0.4402826428413391
concepts[11].wikidata	https://www.wikidata.org/wiki/Q727490
concepts[11].display_name	Scalability
concepts[12].id	https://openalex.org/C2780451532
concepts[12].level	2
concepts[12].score	0.4115604758262634
concepts[12].wikidata	https://www.wikidata.org/wiki/Q759676
concepts[12].display_name	Task (project management)
concepts[13].id	https://openalex.org/C204321447
concepts[13].level	1
concepts[13].score	0.3449555039405823
concepts[13].wikidata	https://www.wikidata.org/wiki/Q30642
concepts[13].display_name	Natural language processing
concepts[14].id	https://openalex.org/C33923547
concepts[14].level	0
concepts[14].score	0.1207151710987091
concepts[14].wikidata	https://www.wikidata.org/wiki/Q395
concepts[14].display_name	Mathematics
concepts[15].id	https://openalex.org/C187736073
concepts[15].level	1
concepts[15].score	0.0
concepts[15].wikidata	https://www.wikidata.org/wiki/Q2920921
concepts[15].display_name	Management
concepts[16].id	https://openalex.org/C199360897
concepts[16].level	1
concepts[16].score	0.0
concepts[16].wikidata	https://www.wikidata.org/wiki/Q9143
concepts[16].display_name	Programming language
concepts[17].id	https://openalex.org/C111472728
concepts[17].level	1
concepts[17].score	0.0
concepts[17].wikidata	https://www.wikidata.org/wiki/Q9471
concepts[17].display_name	Epistemology
concepts[18].id	https://openalex.org/C86803240
concepts[18].level	0
concepts[18].score	0.0
concepts[18].wikidata	https://www.wikidata.org/wiki/Q420
concepts[18].display_name	Biology
concepts[19].id	https://openalex.org/C134306372
concepts[19].level	1
concepts[19].score	0.0
concepts[19].wikidata	https://www.wikidata.org/wiki/Q7754
concepts[19].display_name	Mathematical analysis
concepts[20].id	https://openalex.org/C162324750
concepts[20].level	0
concepts[20].score	0.0
concepts[20].wikidata	https://www.wikidata.org/wiki/Q8134
concepts[20].display_name	Economics
concepts[21].id	https://openalex.org/C138885662
concepts[21].level	0
concepts[21].score	0.0
concepts[21].wikidata	https://www.wikidata.org/wiki/Q5891
concepts[21].display_name	Philosophy
concepts[22].id	https://openalex.org/C59822182
concepts[22].level	1
concepts[22].score	0.0
concepts[22].wikidata	https://www.wikidata.org/wiki/Q441
concepts[22].display_name	Botany
concepts[23].id	https://openalex.org/C77088390
concepts[23].level	1
concepts[23].score	0.0
concepts[23].wikidata	https://www.wikidata.org/wiki/Q8513
concepts[23].display_name	Database
keywords[0].id	https://openalex.org/keywords/interpretability
keywords[0].score	0.9515140056610107
keywords[0].display_name	Interpretability
keywords[1].id	https://openalex.org/keywords/computer-science
keywords[1].score	0.7414587140083313
keywords[1].display_name	Computer science
keywords[2].id	https://openalex.org/keywords/artificial-intelligence
keywords[2].score	0.6781646013259888
keywords[2].display_name	Artificial intelligence
keywords[3].id	https://openalex.org/keywords/set
keywords[3].score	0.6143465638160706
keywords[3].display_name	Set (abstract data type)
keywords[4].id	https://openalex.org/keywords/artificial-neural-network
keywords[4].score	0.5453928112983704
keywords[4].display_name	Artificial neural network
keywords[5].id	https://openalex.org/keywords/counterfactual-thinking
keywords[5].score	0.49951791763305664
keywords[5].display_name	Counterfactual thinking
keywords[6].id	https://openalex.org/keywords/identification
keywords[6].score	0.49792933464050293
keywords[6].display_name	Identification (biology)
keywords[7].id	https://openalex.org/keywords/machine-learning
keywords[7].score	0.4939609169960022
keywords[7].display_name	Machine learning
keywords[8].id	https://openalex.org/keywords/deep-neural-networks
keywords[8].score	0.4862119257450104
keywords[8].display_name	Deep neural networks
keywords[9].id	https://openalex.org/keywords/language-model
keywords[9].score	0.47205859422683716
keywords[9].display_name	Language model
keywords[10].id	https://openalex.org/keywords/superposition-principle
keywords[10].score	0.46201837062835693
keywords[10].display_name	Superposition principle
keywords[11].id	https://openalex.org/keywords/scalability
keywords[11].score	0.4402826428413391
keywords[11].display_name	Scalability
keywords[12].id	https://openalex.org/keywords/task
keywords[12].score	0.4115604758262634
keywords[12].display_name	Task (project management)
keywords[13].id	https://openalex.org/keywords/natural-language-processing
keywords[13].score	0.3449555039405823
keywords[13].display_name	Natural language processing
keywords[14].id	https://openalex.org/keywords/mathematics
keywords[14].score	0.1207151710987091
keywords[14].display_name	Mathematics
language	en
locations[0].id	pmh:oai:arXiv.org:2309.08600
locations[0].is_oa	True
locations[0].source.id	https://openalex.org/S4306400194
locations[0].source.issn
locations[0].source.type	repository
locations[0].source.is_oa	True
locations[0].source.issn_l
locations[0].source.is_core	False
locations[0].source.is_in_doaj	False
locations[0].source.display_name	arXiv (Cornell University)
locations[0].source.host_organization	https://openalex.org/I205783295
locations[0].source.host_organization_name	Cornell University
locations[0].source.host_organization_lineage	https://openalex.org/I205783295
locations[0].license
locations[0].pdf_url	https://arxiv.org/pdf/2309.08600
locations[0].version	submittedVersion
locations[0].raw_type	text
locations[0].license_id
locations[0].is_accepted	False
locations[0].is_published	False
locations[0].raw_source_name
locations[0].landing_page_url	http://arxiv.org/abs/2309.08600
locations[1].id	doi:10.48550/arxiv.2309.08600
locations[1].is_oa	True
locations[1].source.id	https://openalex.org/S4306400194
locations[1].source.issn
locations[1].source.type	repository
locations[1].source.is_oa	True
locations[1].source.issn_l
locations[1].source.is_core	False
locations[1].source.is_in_doaj	False
locations[1].source.display_name	arXiv (Cornell University)
locations[1].source.host_organization	https://openalex.org/I205783295
locations[1].source.host_organization_name	Cornell University
locations[1].source.host_organization_lineage	https://openalex.org/I205783295
locations[1].license	cc-by
locations[1].pdf_url
locations[1].version
locations[1].raw_type	article
locations[1].license_id	https://openalex.org/licenses/cc-by
locations[1].is_accepted	False
locations[1].is_published
locations[1].raw_source_name
locations[1].landing_page_url	https://doi.org/10.48550/arxiv.2309.08600
indexed_in	arxiv, datacite
authorships[0].author.id	https://openalex.org/A5104170748
authorships[0].author.orcid
authorships[0].author.display_name	Hoagy Cunningham
authorships[0].author_position	first
authorships[0].raw_author_name	Cunningham, Hoagy
authorships[0].is_corresponding	False
authorships[1].author.id	https://openalex.org/A5092897279
authorships[1].author.orcid
authorships[1].author.display_name	Aidan Ewart
authorships[1].author_position	middle
authorships[1].raw_author_name	Ewart, Aidan
authorships[1].is_corresponding	False
authorships[2].author.id	https://openalex.org/A5113003425
authorships[2].author.orcid
authorships[2].author.display_name	Logan Riggs
authorships[2].author_position	middle
authorships[2].raw_author_name	Riggs, Logan
authorships[2].is_corresponding	False
authorships[3].author.id	https://openalex.org/A5109030017
authorships[3].author.orcid
authorships[3].author.display_name	Robert P. Huben
authorships[3].author_position	middle
authorships[3].raw_author_name	Huben, Robert
authorships[3].is_corresponding	False
authorships[4].author.id	https://openalex.org/A5002207803
authorships[4].author.orcid	https://orcid.org/0009-0009-2137-6027
authorships[4].author.display_name	Lee Sharkey
authorships[4].author_position	last
authorships[4].raw_author_name	Sharkey, Lee
authorships[4].is_corresponding	False
has_content.pdf	False
has_content.grobid_xml	False
is_paratext	False
open_access.is_oa	True
open_access.oa_url	https://arxiv.org/pdf/2309.08600
open_access.oa_status	green
open_access.any_repository_has_fulltext	False
created_date	2025-10-10T00:00:00
display_name	Sparse Autoencoders Find Highly Interpretable Features in Language Models
has_fulltext	False
is_retracted	False
updated_date	2025-11-06T06:51:31.235846
primary_topic.id	https://openalex.org/T12026
primary_topic.field.id	https://openalex.org/fields/17
primary_topic.field.display_name	Computer Science
primary_topic.score	0.9973999857902527
primary_topic.domain.id	https://openalex.org/domains/3
primary_topic.domain.display_name	Physical Sciences
primary_topic.subfield.id	https://openalex.org/subfields/1702
primary_topic.subfield.display_name	Artificial Intelligence
primary_topic.display_name	Explainable Artificial Intelligence (XAI)
related_works	https://openalex.org/W3201448254, https://openalex.org/W2905433371, https://openalex.org/W4286970243, https://openalex.org/W2964449086, https://openalex.org/W4319993887, https://openalex.org/W4297789176, https://openalex.org/W2768346313, https://openalex.org/W2963249138, https://openalex.org/W2998594699, https://openalex.org/W2968060152
cited_by_count	34
counts_by_year[0].year	2025
counts_by_year[0].cited_by_count	28
counts_by_year[1].year	2024
counts_by_year[1].cited_by_count	5
counts_by_year[2].year	2023
counts_by_year[2].cited_by_count	1
locations_count	2
best_oa_location.id	pmh:oai:arXiv.org:2309.08600
best_oa_location.is_oa	True
best_oa_location.source.id	https://openalex.org/S4306400194
best_oa_location.source.issn
best_oa_location.source.type	repository
best_oa_location.source.is_oa	True
best_oa_location.source.issn_l
best_oa_location.source.is_core	False
best_oa_location.source.is_in_doaj	False
best_oa_location.source.display_name	arXiv (Cornell University)
best_oa_location.source.host_organization	https://openalex.org/I205783295
best_oa_location.source.host_organization_name	Cornell University
best_oa_location.source.host_organization_lineage	https://openalex.org/I205783295
best_oa_location.license
best_oa_location.pdf_url	https://arxiv.org/pdf/2309.08600
best_oa_location.version	submittedVersion
best_oa_location.raw_type	text
best_oa_location.license_id
best_oa_location.is_accepted	False
best_oa_location.is_published	False
best_oa_location.raw_source_name
best_oa_location.landing_page_url	http://arxiv.org/abs/2309.08600
primary_location.id	pmh:oai:arXiv.org:2309.08600
primary_location.is_oa	True
primary_location.source.id	https://openalex.org/S4306400194
primary_location.source.issn
primary_location.source.type	repository
primary_location.source.is_oa	True
primary_location.source.issn_l
primary_location.source.is_core	False
primary_location.source.is_in_doaj	False
primary_location.source.display_name	arXiv (Cornell University)
primary_location.source.host_organization	https://openalex.org/I205783295
primary_location.source.host_organization_name	Cornell University
primary_location.source.host_organization_lineage	https://openalex.org/I205783295
primary_location.license
primary_location.pdf_url	https://arxiv.org/pdf/2309.08600
primary_location.version	submittedVersion
primary_location.raw_type	text
primary_location.license_id
primary_location.is_accepted	False
primary_location.is_published	False
primary_location.raw_source_name
primary_location.landing_page_url	http://arxiv.org/abs/2309.08600
publication_date	2023-09-15
publication_year	2023
referenced_works_count	0
abstract_inverted_index.a	5, 89, 149, 169, 178
abstract_inverted_index.an	60
abstract_inverted_index.as	177
abstract_inverted_index.by	56, 109, 116
abstract_inverted_index.in	19, 65, 165
abstract_inverted_index.is	12, 44, 114, 160
abstract_inverted_index.it	159
abstract_inverted_index.of	1, 8, 42, 63, 88, 96, 127
abstract_inverted_index.on	141
abstract_inverted_index.to	4, 17, 59, 70, 76, 83, 148, 162
abstract_inverted_index.us	26
abstract_inverted_index.we	74, 120, 129, 186
abstract_inverted_index.One	0, 39
abstract_inverted_index.Our	173
abstract_inverted_index.and	104, 193
abstract_inverted_index.are	36, 101, 135
abstract_inverted_index.can	130
abstract_inverted_index.for	32, 138, 180
abstract_inverted_index.may	175
abstract_inverted_index.our	124
abstract_inverted_index.set	62, 126
abstract_inverted_index.the	2, 85, 132, 142
abstract_inverted_index.This	155
abstract_inverted_index.from	27
abstract_inverted_index.have	54
abstract_inverted_index.hope	187
abstract_inverted_index.more	50, 102
abstract_inverted_index.sets	95
abstract_inverted_index.show	121
abstract_inverted_index.task	146
abstract_inverted_index.than	52, 69, 106, 152
abstract_inverted_index.that	100, 122, 134, 158
abstract_inverted_index.they	53
abstract_inverted_index.what	33
abstract_inverted_index.will	188
abstract_inverted_index.with	123
abstract_inverted_index.work	156
abstract_inverted_index.Here,	73
abstract_inverted_index.These	92
abstract_inverted_index.cause	41
abstract_inverted_index.doing	37
abstract_inverted_index.finer	150
abstract_inverted_index.learn	94
abstract_inverted_index.model	191
abstract_inverted_index.serve	176
abstract_inverted_index.those	78
abstract_inverted_index.using	80, 168
abstract_inverted_index.where	14, 46, 112
abstract_inverted_index.which	185
abstract_inverted_index.work,	184
abstract_inverted_index.appear	16
abstract_inverted_index.better	6
abstract_inverted_index.degree	151
abstract_inverted_index.enable	189
abstract_inverted_index.future	181
abstract_inverted_index.method	174
abstract_inverted_index.model.	91
abstract_inverted_index.models	167
abstract_inverted_index.neural	9, 34, 47
abstract_inverted_index.object	144
abstract_inverted_index.rather	68
abstract_inverted_index.space,	67
abstract_inverted_index.sparse	81
abstract_inverted_index.attempt	75
abstract_inverted_index.greater	190
abstract_inverted_index.learned	125
abstract_inverted_index.method.	172
abstract_inverted_index.neurons	15, 55
abstract_inverted_index.resolve	163
abstract_inverted_index.activate	18
abstract_inverted_index.causally	136
abstract_inverted_index.concise,	29
abstract_inverted_index.distinct	22
abstract_inverted_index.features	51, 58, 99, 133
abstract_inverted_index.identify	77
abstract_inverted_index.indirect	143
abstract_inverted_index.internal	86
abstract_inverted_index.language	90, 166
abstract_inverted_index.measured	115
abstract_inverted_index.methods.	118
abstract_inverted_index.networks	35, 48
abstract_inverted_index.neurons.	72
abstract_inverted_index.pinpoint	131
abstract_inverted_index.possible	161
abstract_inverted_index.prevents	25
abstract_inverted_index.previous	153
abstract_inverted_index.sparsely	97
abstract_inverted_index.Moreover,	119
abstract_inverted_index.assigning	57
abstract_inverted_index.automated	117
abstract_inverted_index.behaviour	140
abstract_inverted_index.contexts.	23
abstract_inverted_index.features,	128
abstract_inverted_index.indicates	157
abstract_inverted_index.internals	11
abstract_inverted_index.multiple,	20
abstract_inverted_index.networks'	10
abstract_inverted_index.represent	49
abstract_inverted_index.scalable,	170
abstract_inverted_index.activating	98
abstract_inverted_index.activation	66
abstract_inverted_index.directions	64, 107
abstract_inverted_index.foundation	179
abstract_inverted_index.identified	108
abstract_inverted_index.individual	71
abstract_inverted_index.roadblocks	3
abstract_inverted_index.activations	87
abstract_inverted_index.alternative	110
abstract_inverted_index.approaches,	111
abstract_inverted_index.directions,	79
abstract_inverted_index.identifying	28
abstract_inverted_index.internally.	38
abstract_inverted_index.mechanistic	182
abstract_inverted_index.reconstruct	84
abstract_inverted_index.responsible	137
abstract_inverted_index.autoencoders	82, 93
abstract_inverted_index.explanations	31
abstract_inverted_index.hypothesised	40
abstract_inverted_index.monosemantic	105
abstract_inverted_index.overcomplete	61
abstract_inverted_index.semantically	21
abstract_inverted_index.transparency	192
abstract_inverted_index.unsupervised	171
abstract_inverted_index.interpretable	103
abstract_inverted_index.steerability.	194
abstract_inverted_index.superposition	164
abstract_inverted_index.understanding	7
abstract_inverted_index.counterfactual	139
abstract_inverted_index.identification	145
abstract_inverted_index.Polysemanticity	24
abstract_inverted_index.decompositions.	154
abstract_inverted_index.polysemanticity	43
abstract_inverted_index.interpretability	113, 183
abstract_inverted_index.human-understandable	30
abstract_inverted_index.\textit{superposition},	45
abstract_inverted_index.\textit{polysemanticity},	13
abstract_inverted_index.\citep{wang2022interpretability}	147
cited_by_percentile_year
countries_distinct_count	0
institutions_distinct_count	5
sustainable_development_goals[0].id	https://metadata.un.org/sdg/4
sustainable_development_goals[0].score	0.6399999856948853
sustainable_development_goals[0].display_name	Quality Education
citation_normalized_percentile
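In the payload above, the abstract appears only as `abstract_inverted_index` entries: each key names a word and each value lists the zero-based positions at which that word occurs in the abstract, because OpenAlex distributes abstracts in this inverted form rather than as plain text. A minimal sketch that rebuilds the plain abstract from lines of this dump, assuming the `key<TAB>value` layout shown here with comma-separated positions; the function names are hypothetical and the line format is inferred from this listing, not from an OpenAlex specification.

```python
from typing import Dict, Iterable, List

PREFIX = "abstract_inverted_index."


def parse_inverted_index(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Collect word -> positions from 'abstract_inverted_index.<word>\t<p1, p2, ...>' lines."""
    index: Dict[str, List[int]] = {}
    for line in lines:
        # Skip every other payload field (ids, concepts, locations, ...).
        if not line.startswith(PREFIX) or "\t" not in line:
            continue
        key, value = line.split("\t", 1)
        # Strip only the prefix: words such as "model." contain dots themselves.
        word = key[len(PREFIX):]
        index[word] = [int(p) for p in value.split(",") if p.strip()]
    return index


def rebuild_abstract(index: Dict[str, List[int]]) -> str:
    """Place each word at every one of its positions, then join in position order."""
    slots: Dict[int, str] = {}
    for word, positions in index.items():
        for pos in positions:
            slots[pos] = word
    return " ".join(slots[pos] for pos in sorted(slots))


if __name__ == "__main__":
    sample = [
        "abstract_inverted_index.One\t0",
        "abstract_inverted_index.of\t1",
        "abstract_inverted_index.the\t2",
        "abstract_inverted_index.roadblocks\t3",
        "language\ten",  # a non-abstract field, ignored by the parser
    ]
    print(rebuild_abstract(parse_inverted_index(sample)))  # One of the roadblocks
```

Applied to all of the `abstract_inverted_index` lines in this record, the same two functions should reproduce the abstract word for word as stored in the raw record, including LaTeX tokens such as `\citep{wang2022interpretability}` that the index retains.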