Sparse Autoencoders Find Highly Interpretable Features in Language Models
2023 · Open Access · DOI: https://doi.org/10.48550/arxiv.2309.08600
One of the roadblocks to a better understanding of neural networks' internals is polysemanticity, where neurons appear to activate in multiple, semantically distinct contexts. Polysemanticity prevents us from identifying concise, human-understandable explanations for what neural networks are doing internally. One hypothesised cause of polysemanticity is superposition, where neural networks represent more features than they have neurons by assigning features to an overcomplete set of directions in activation space, rather than to individual neurons. Here, we attempt to identify those directions, using sparse autoencoders to reconstruct the internal activations of a language model. These autoencoders learn sets of sparsely activating features that are more interpretable and monosemantic than directions identified by alternative approaches, where interpretability is measured by automated methods. Moreover, we show that with our learned set of features, we can pinpoint the features that are causally responsible for counterfactual behaviour on the indirect object identification task (Wang et al., 2022) to a finer degree than previous decompositions. This work indicates that it is possible to resolve superposition in language models using a scalable, unsupervised method. Our method may serve as a foundation for future mechanistic interpretability work, which we hope will enable greater model transparency and steerability.
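The core idea in the abstract, reconstructing activations through an overcomplete and sparsely activating feature layer, can be illustrated with a small sketch. This is not the authors' implementation: the ReLU encoder, the L1 sparsity penalty, the toy dimensions, and every name (`sae_forward`, `sae_loss`, `l1_coeff`, `d_model`, `d_dict`) are assumptions chosen for illustration, and the code uses only the Python standard library so that it runs without extra dependencies.

```python
import random

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode an activation vector into features, then reconstruct it."""
    pre = [p + b for p, b in zip(matvec(W_enc, x), b_enc)]
    f = [max(0.0, p) for p in pre]  # ReLU (assumed): features are non-negative
    x_hat = [p + b for p, b in zip(matvec(W_dec, f), b_dec)]
    return f, x_hat

def sae_loss(x, f, x_hat, l1_coeff):
    """Squared reconstruction error plus an L1 penalty on features (assumed)."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = sum(abs(a) for a in f)
    return recon + l1_coeff * sparsity

random.seed(0)
d_model, d_dict = 4, 12  # overcomplete: more features than activation dimensions
x = [random.gauss(0, 1) for _ in range(d_model)]  # stand-in for a model activation
W_enc = [[random.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_dict)]
b_enc = [0.0] * d_dict
W_dec = [[random.gauss(0, 0.1) for _ in range(d_dict)] for _ in range(d_model)]
b_dec = [0.0] * d_model

f, x_hat = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(x, f, x_hat, l1_coeff=1e-3)
```

In a trained autoencoder of this shape, each column of the decoder matrix would play the role of one candidate feature direction in activation space; here the weights are random, so the sketch only shows the data flow and the form of the objective.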
Related Topics
- Type: preprint
- Language: en
- Landing Page: http://arxiv.org/abs/2309.08600, https://arxiv.org/pdf/2309.08600
- OA Status: green
- Cited By: 34
- Related Works: 10
- OpenAlex ID: https://openalex.org/W4386839891
Raw OpenAlex JSON
| Field | Value | Description |
|---|---|---|
| OpenAlex ID | https://openalex.org/W4386839891 | Canonical identifier for this work in OpenAlex |
| DOI | https://doi.org/10.48550/arxiv.2309.08600 | Digital Object Identifier |
| Title | Sparse Autoencoders Find Highly Interpretable Features in Language Models | Work title |
| Type | preprint | OpenAlex work type |
| Language | en | Primary language |
| Publication year | 2023 | Year of publication |
| Publication date | 2023-09-15 | Full publication date if available |
| Authors | Hoagy Cunningham, Aidan Ewart, Logan Riggs, Robert P. Huben, Lee Sharkey | List of authors in order |
| Landing page | https://arxiv.org/abs/2309.08600 | Publisher landing page |
| PDF URL | https://arxiv.org/pdf/2309.08600 | Direct link to full text PDF |
| Open access | Yes | Whether a free full text is available |
| OA status | green | Open access status per OpenAlex |
| OA URL | https://arxiv.org/pdf/2309.08600 | Direct OA link when available |
| Concepts | Interpretability, Computer science, Artificial intelligence, Set (abstract data type), Artificial neural network, Counterfactual thinking, Identification (biology), Machine learning, Deep neural networks, Language model, Superposition principle, Scalability, Task (project management), Natural language processing, Mathematics, Management, Programming language, Epistemology, Biology, Mathematical analysis, Economics, Philosophy, Botany, Database | Top concepts (fields/topics) attached by OpenAlex |
| Cited by | 34 | Total citation count in OpenAlex |
| Citations by year (recent) | 2025: 28, 2024: 5, 2023: 1 | Per-year citation counts (last 5 years) |
| Related works (count) | 10 | Other works algorithmically related by OpenAlex |
Full payload
| Field | Value |
|---|---|
| id | https://openalex.org/W4386839891 |
| doi | https://doi.org/10.48550/arxiv.2309.08600 |
| ids.doi | https://doi.org/10.48550/arxiv.2309.08600 |
| ids.openalex | https://openalex.org/W4386839891 |
| fwci | |
| type | preprint |
| title | Sparse Autoencoders Find Highly Interpretable Features in Language Models |
| biblio.issue | |
| biblio.volume | |
| biblio.last_page | |
| biblio.first_page | |
| topics[0].id | https://openalex.org/T12026 |
| topics[0].field.id | https://openalex.org/fields/17 |
| topics[0].field.display_name | Computer Science |
| topics[0].score | 0.9973999857902527 |
| topics[0].domain.id | https://openalex.org/domains/3 |
| topics[0].domain.display_name | Physical Sciences |
| topics[0].subfield.id | https://openalex.org/subfields/1702 |
| topics[0].subfield.display_name | Artificial Intelligence |
| topics[0].display_name | Explainable Artificial Intelligence (XAI) |
| topics[1].id | https://openalex.org/T10028 |
| topics[1].field.id | https://openalex.org/fields/17 |
| topics[1].field.display_name | Computer Science |
| topics[1].score | 0.9886000156402588 |
| topics[1].domain.id | https://openalex.org/domains/3 |
| topics[1].domain.display_name | Physical Sciences |
| topics[1].subfield.id | https://openalex.org/subfields/1702 |
| topics[1].subfield.display_name | Artificial Intelligence |
| topics[1].display_name | Topic Modeling |
| topics[2].id | https://openalex.org/T11689 |
| topics[2].field.id | https://openalex.org/fields/17 |
| topics[2].field.display_name | Computer Science |
| topics[2].score | 0.9696999788284302 |
| topics[2].domain.id | https://openalex.org/domains/3 |
| topics[2].domain.display_name | Physical Sciences |
| topics[2].subfield.id | https://openalex.org/subfields/1702 |
| topics[2].subfield.display_name | Artificial Intelligence |
| topics[2].display_name | Adversarial Robustness in Machine Learning |
| is_xpac | False |
| apc_list | |
| apc_paid | |
| concepts[0].id | https://openalex.org/C2781067378 |
| concepts[0].level | 2 |
| concepts[0].score | 0.9515140056610107 |
| concepts[0].wikidata | https://www.wikidata.org/wiki/Q17027399 |
| concepts[0].display_name | Interpretability |
| concepts[1].id | https://openalex.org/C41008148 |
| concepts[1].level | 0 |
| concepts[1].score | 0.7414587140083313 |
| concepts[1].wikidata | https://www.wikidata.org/wiki/Q21198 |
| concepts[1].display_name | Computer science |
| concepts[2].id | https://openalex.org/C154945302 |
| concepts[2].level | 1 |
| concepts[2].score | 0.6781646013259888 |
| concepts[2].wikidata | https://www.wikidata.org/wiki/Q11660 |
| concepts[2].display_name | Artificial intelligence |
| concepts[3].id | https://openalex.org/C177264268 |
| concepts[3].level | 2 |
| concepts[3].score | 0.6143465638160706 |
| concepts[3].wikidata | https://www.wikidata.org/wiki/Q1514741 |
| concepts[3].display_name | Set (abstract data type) |
| concepts[4].id | https://openalex.org/C50644808 |
| concepts[4].level | 2 |
| concepts[4].score | 0.5453928112983704 |
| concepts[4].wikidata | https://www.wikidata.org/wiki/Q192776 |
| concepts[4].display_name | Artificial neural network |
| concepts[5].id | https://openalex.org/C108650721 |
| concepts[5].level | 2 |
| concepts[5].score | 0.49951791763305664 |
| concepts[5].wikidata | https://www.wikidata.org/wiki/Q1783253 |
| concepts[5].display_name | Counterfactual thinking |
| concepts[6].id | https://openalex.org/C116834253 |
| concepts[6].level | 2 |
| concepts[6].score | 0.49792933464050293 |
| concepts[6].wikidata | https://www.wikidata.org/wiki/Q2039217 |
| concepts[6].display_name | Identification (biology) |
| concepts[7].id | https://openalex.org/C119857082 |
| concepts[7].level | 1 |
| concepts[7].score | 0.4939609169960022 |
| concepts[7].wikidata | https://www.wikidata.org/wiki/Q2539 |
| concepts[7].display_name | Machine learning |
| concepts[8].id | https://openalex.org/C2984842247 |
| concepts[8].level | 3 |
| concepts[8].score | 0.4862119257450104 |
| concepts[8].wikidata | https://www.wikidata.org/wiki/Q197536 |
| concepts[8].display_name | Deep neural networks |
| concepts[9].id | https://openalex.org/C137293760 |
| concepts[9].level | 2 |
| concepts[9].score | 0.47205859422683716 |
| concepts[9].wikidata | https://www.wikidata.org/wiki/Q3621696 |
| concepts[9].display_name | Language model |
| concepts[10].id | https://openalex.org/C27753989 |
| concepts[10].level | 2 |
| concepts[10].score | 0.46201837062835693 |
| concepts[10].wikidata | https://www.wikidata.org/wiki/Q284885 |
| concepts[10].display_name | Superposition principle |
| concepts[11].id | https://openalex.org/C48044578 |
| concepts[11].level | 2 |
| concepts[11].score | 0.4402826428413391 |
| concepts[11].wikidata | https://www.wikidata.org/wiki/Q727490 |
| concepts[11].display_name | Scalability |
| concepts[12].id | https://openalex.org/C2780451532 |
| concepts[12].level | 2 |
| concepts[12].score | 0.4115604758262634 |
| concepts[12].wikidata | https://www.wikidata.org/wiki/Q759676 |
| concepts[12].display_name | Task (project management) |
| concepts[13].id | https://openalex.org/C204321447 |
| concepts[13].level | 1 |
| concepts[13].score | 0.3449555039405823 |
| concepts[13].wikidata | https://www.wikidata.org/wiki/Q30642 |
| concepts[13].display_name | Natural language processing |
| concepts[14].id | https://openalex.org/C33923547 |
| concepts[14].level | 0 |
| concepts[14].score | 0.1207151710987091 |
| concepts[14].wikidata | https://www.wikidata.org/wiki/Q395 |
| concepts[14].display_name | Mathematics |
| concepts[15].id | https://openalex.org/C187736073 |
| concepts[15].level | 1 |
| concepts[15].score | 0.0 |
| concepts[15].wikidata | https://www.wikidata.org/wiki/Q2920921 |
| concepts[15].display_name | Management |
| concepts[16].id | https://openalex.org/C199360897 |
| concepts[16].level | 1 |
| concepts[16].score | 0.0 |
| concepts[16].wikidata | https://www.wikidata.org/wiki/Q9143 |
| concepts[16].display_name | Programming language |
| concepts[17].id | https://openalex.org/C111472728 |
| concepts[17].level | 1 |
| concepts[17].score | 0.0 |
| concepts[17].wikidata | https://www.wikidata.org/wiki/Q9471 |
| concepts[17].display_name | Epistemology |
| concepts[18].id | https://openalex.org/C86803240 |
| concepts[18].level | 0 |
| concepts[18].score | 0.0 |
| concepts[18].wikidata | https://www.wikidata.org/wiki/Q420 |
| concepts[18].display_name | Biology |
| concepts[19].id | https://openalex.org/C134306372 |
| concepts[19].level | 1 |
| concepts[19].score | 0.0 |
| concepts[19].wikidata | https://www.wikidata.org/wiki/Q7754 |
| concepts[19].display_name | Mathematical analysis |
| concepts[20].id | https://openalex.org/C162324750 |
| concepts[20].level | 0 |
| concepts[20].score | 0.0 |
| concepts[20].wikidata | https://www.wikidata.org/wiki/Q8134 |
| concepts[20].display_name | Economics |
| concepts[21].id | https://openalex.org/C138885662 |
| concepts[21].level | 0 |
| concepts[21].score | 0.0 |
| concepts[21].wikidata | https://www.wikidata.org/wiki/Q5891 |
| concepts[21].display_name | Philosophy |
| concepts[22].id | https://openalex.org/C59822182 |
| concepts[22].level | 1 |
| concepts[22].score | 0.0 |
| concepts[22].wikidata | https://www.wikidata.org/wiki/Q441 |
| concepts[22].display_name | Botany |
| concepts[23].id | https://openalex.org/C77088390 |
| concepts[23].level | 1 |
| concepts[23].score | 0.0 |
| concepts[23].wikidata | https://www.wikidata.org/wiki/Q8513 |
| concepts[23].display_name | Database |
| keywords[0].id | https://openalex.org/keywords/interpretability |
| keywords[0].score | 0.9515140056610107 |
| keywords[0].display_name | Interpretability |
| keywords[1].id | https://openalex.org/keywords/computer-science |
| keywords[1].score | 0.7414587140083313 |
| keywords[1].display_name | Computer science |
| keywords[2].id | https://openalex.org/keywords/artificial-intelligence |
| keywords[2].score | 0.6781646013259888 |
| keywords[2].display_name | Artificial intelligence |
| keywords[3].id | https://openalex.org/keywords/set |
| keywords[3].score | 0.6143465638160706 |
| keywords[3].display_name | Set (abstract data type) |
| keywords[4].id | https://openalex.org/keywords/artificial-neural-network |
| keywords[4].score | 0.5453928112983704 |
| keywords[4].display_name | Artificial neural network |
| keywords[5].id | https://openalex.org/keywords/counterfactual-thinking |
| keywords[5].score | 0.49951791763305664 |
| keywords[5].display_name | Counterfactual thinking |
| keywords[6].id | https://openalex.org/keywords/identification |
| keywords[6].score | 0.49792933464050293 |
| keywords[6].display_name | Identification (biology) |
| keywords[7].id | https://openalex.org/keywords/machine-learning |
| keywords[7].score | 0.4939609169960022 |
| keywords[7].display_name | Machine learning |
| keywords[8].id | https://openalex.org/keywords/deep-neural-networks |
| keywords[8].score | 0.4862119257450104 |
| keywords[8].display_name | Deep neural networks |
| keywords[9].id | https://openalex.org/keywords/language-model |
| keywords[9].score | 0.47205859422683716 |
| keywords[9].display_name | Language model |
| keywords[10].id | https://openalex.org/keywords/superposition-principle |
| keywords[10].score | 0.46201837062835693 |
| keywords[10].display_name | Superposition principle |
| keywords[11].id | https://openalex.org/keywords/scalability |
| keywords[11].score | 0.4402826428413391 |
| keywords[11].display_name | Scalability |
| keywords[12].id | https://openalex.org/keywords/task |
| keywords[12].score | 0.4115604758262634 |
| keywords[12].display_name | Task (project management) |
| keywords[13].id | https://openalex.org/keywords/natural-language-processing |
| keywords[13].score | 0.3449555039405823 |
| keywords[13].display_name | Natural language processing |
| keywords[14].id | https://openalex.org/keywords/mathematics |
| keywords[14].score | 0.1207151710987091 |
| keywords[14].display_name | Mathematics |
| language | en |
| locations[0].id | pmh:oai:arXiv.org:2309.08600 |
| locations[0].is_oa | True |
| locations[0].source.id | https://openalex.org/S4306400194 |
| locations[0].source.issn | |
| locations[0].source.type | repository |
| locations[0].source.is_oa | True |
| locations[0].source.issn_l | |
| locations[0].source.is_core | False |
| locations[0].source.is_in_doaj | False |
| locations[0].source.display_name | arXiv (Cornell University) |
| locations[0].source.host_organization | https://openalex.org/I205783295 |
| locations[0].source.host_organization_name | Cornell University |
| locations[0].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[0].license | |
| locations[0].pdf_url | https://arxiv.org/pdf/2309.08600 |
| locations[0].version | submittedVersion |
| locations[0].raw_type | text |
| locations[0].license_id | |
| locations[0].is_accepted | False |
| locations[0].is_published | False |
| locations[0].raw_source_name | |
| locations[0].landing_page_url | http://arxiv.org/abs/2309.08600 |
| locations[1].id | doi:10.48550/arxiv.2309.08600 |
| locations[1].is_oa | True |
| locations[1].source.id | https://openalex.org/S4306400194 |
| locations[1].source.issn | |
| locations[1].source.type | repository |
| locations[1].source.is_oa | True |
| locations[1].source.issn_l | |
| locations[1].source.is_core | False |
| locations[1].source.is_in_doaj | False |
| locations[1].source.display_name | arXiv (Cornell University) |
| locations[1].source.host_organization | https://openalex.org/I205783295 |
| locations[1].source.host_organization_name | Cornell University |
| locations[1].source.host_organization_lineage | https://openalex.org/I205783295 |
| locations[1].license | cc-by |
| locations[1].pdf_url | |
| locations[1].version | |
| locations[1].raw_type | article |
| locations[1].license_id | https://openalex.org/licenses/cc-by |
| locations[1].is_accepted | False |
| locations[1].is_published | |
| locations[1].raw_source_name | |
| locations[1].landing_page_url | https://doi.org/10.48550/arxiv.2309.08600 |
| indexed_in | arxiv, datacite |
| authorships[0].author.id | https://openalex.org/A5104170748 |
| authorships[0].author.orcid | |
| authorships[0].author.display_name | Hoagy Cunningham |
| authorships[0].author_position | first |
| authorships[0].raw_author_name | Cunningham, Hoagy |
| authorships[0].is_corresponding | False |
| authorships[1].author.id | https://openalex.org/A5092897279 |
| authorships[1].author.orcid | |
| authorships[1].author.display_name | Aidan Ewart |
| authorships[1].author_position | middle |
| authorships[1].raw_author_name | Ewart, Aidan |
| authorships[1].is_corresponding | False |
| authorships[2].author.id | https://openalex.org/A5113003425 |
| authorships[2].author.orcid | |
| authorships[2].author.display_name | Logan Riggs |
| authorships[2].author_position | middle |
| authorships[2].raw_author_name | Riggs, Logan |
| authorships[2].is_corresponding | False |
| authorships[3].author.id | https://openalex.org/A5109030017 |
| authorships[3].author.orcid | |
| authorships[3].author.display_name | Robert P. Huben |
| authorships[3].author_position | middle |
| authorships[3].raw_author_name | Huben, Robert |
| authorships[3].is_corresponding | False |
| authorships[4].author.id | https://openalex.org/A5002207803 |
| authorships[4].author.orcid | https://orcid.org/0009-0009-2137-6027 |
| authorships[4].author.display_name | Lee Sharkey |
| authorships[4].author_position | last |
| authorships[4].raw_author_name | Sharkey, Lee |
| authorships[4].is_corresponding | False |
| has_content.pdf | False |
| has_content.grobid_xml | False |
| is_paratext | False |
| open_access.is_oa | True |
| open_access.oa_url | https://arxiv.org/pdf/2309.08600 |
| open_access.oa_status | green |
| open_access.any_repository_has_fulltext | False |
| created_date | 2025-10-10T00:00:00 |
| display_name | Sparse Autoencoders Find Highly Interpretable Features in Language Models |
| has_fulltext | False |
| is_retracted | False |
| updated_date | 2025-11-06T06:51:31.235846 |
| primary_topic.id | https://openalex.org/T12026 |
| primary_topic.field.id | https://openalex.org/fields/17 |
| primary_topic.field.display_name | Computer Science |
| primary_topic.score | 0.9973999857902527 |
| primary_topic.domain.id | https://openalex.org/domains/3 |
| primary_topic.domain.display_name | Physical Sciences |
| primary_topic.subfield.id | https://openalex.org/subfields/1702 |
| primary_topic.subfield.display_name | Artificial Intelligence |
| primary_topic.display_name | Explainable Artificial Intelligence (XAI) |
| related_works | https://openalex.org/W3201448254, https://openalex.org/W2905433371, https://openalex.org/W4286970243, https://openalex.org/W2964449086, https://openalex.org/W4319993887, https://openalex.org/W4297789176, https://openalex.org/W2768346313, https://openalex.org/W2963249138, https://openalex.org/W2998594699, https://openalex.org/W2968060152 |
| cited_by_count | 34 |
| counts_by_year[0].year | 2025 |
| counts_by_year[0].cited_by_count | 28 |
| counts_by_year[1].year | 2024 |
| counts_by_year[1].cited_by_count | 5 |
| counts_by_year[2].year | 2023 |
| counts_by_year[2].cited_by_count | 1 |
| locations_count | 2 |
| best_oa_location.id | pmh:oai:arXiv.org:2309.08600 |
| best_oa_location.is_oa | True |
| best_oa_location.source.id | https://openalex.org/S4306400194 |
| best_oa_location.source.issn | |
| best_oa_location.source.type | repository |
| best_oa_location.source.is_oa | True |
| best_oa_location.source.issn_l | |
| best_oa_location.source.is_core | False |
| best_oa_location.source.is_in_doaj | False |
| best_oa_location.source.display_name | arXiv (Cornell University) |
| best_oa_location.source.host_organization | https://openalex.org/I205783295 |
| best_oa_location.source.host_organization_name | Cornell University |
| best_oa_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| best_oa_location.license | |
| best_oa_location.pdf_url | https://arxiv.org/pdf/2309.08600 |
| best_oa_location.version | submittedVersion |
| best_oa_location.raw_type | text |
| best_oa_location.license_id | |
| best_oa_location.is_accepted | False |
| best_oa_location.is_published | False |
| best_oa_location.raw_source_name | |
| best_oa_location.landing_page_url | http://arxiv.org/abs/2309.08600 |
| primary_location.id | pmh:oai:arXiv.org:2309.08600 |
| primary_location.is_oa | True |
| primary_location.source.id | https://openalex.org/S4306400194 |
| primary_location.source.issn | |
| primary_location.source.type | repository |
| primary_location.source.is_oa | True |
| primary_location.source.issn_l | |
| primary_location.source.is_core | False |
| primary_location.source.is_in_doaj | False |
| primary_location.source.display_name | arXiv (Cornell University) |
| primary_location.source.host_organization | https://openalex.org/I205783295 |
| primary_location.source.host_organization_name | Cornell University |
| primary_location.source.host_organization_lineage | https://openalex.org/I205783295 |
| primary_location.license | |
| primary_location.pdf_url | https://arxiv.org/pdf/2309.08600 |
| primary_location.version | submittedVersion |
| primary_location.raw_type | text |
| primary_location.license_id | |
| primary_location.is_accepted | False |
| primary_location.is_published | False |
| primary_location.raw_source_name | |
| primary_location.landing_page_url | http://arxiv.org/abs/2309.08600 |
| publication_date | 2023-09-15 |
| publication_year | 2023 |
| referenced_works_count | 0 |
| abstract_inverted_index.a | 5, 89, 149, 169, 178 |
| abstract_inverted_index.an | 60 |
| abstract_inverted_index.as | 177 |
| abstract_inverted_index.by | 56, 109, 116 |
| abstract_inverted_index.in | 19, 65, 165 |
| abstract_inverted_index.is | 12, 44, 114, 160 |
| abstract_inverted_index.it | 159 |
| abstract_inverted_index.of | 1, 8, 42, 63, 88, 96, 127 |
| abstract_inverted_index.on | 141 |
| abstract_inverted_index.to | 4, 17, 59, 70, 76, 83, 148, 162 |
| abstract_inverted_index.us | 26 |
| abstract_inverted_index.we | 74, 120, 129, 186 |
| abstract_inverted_index.One | 0, 39 |
| abstract_inverted_index.Our | 173 |
| abstract_inverted_index.and | 104, 193 |
| abstract_inverted_index.are | 36, 101, 135 |
| abstract_inverted_index.can | 130 |
| abstract_inverted_index.for | 32, 138, 180 |
| abstract_inverted_index.may | 175 |
| abstract_inverted_index.our | 124 |
| abstract_inverted_index.set | 62, 126 |
| abstract_inverted_index.the | 2, 85, 132, 142 |
| abstract_inverted_index.This | 155 |
| abstract_inverted_index.from | 27 |
| abstract_inverted_index.have | 54 |
| abstract_inverted_index.hope | 187 |
| abstract_inverted_index.more | 50, 102 |
| abstract_inverted_index.sets | 95 |
| abstract_inverted_index.show | 121 |
| abstract_inverted_index.task | 146 |
| abstract_inverted_index.than | 52, 69, 106, 152 |
| abstract_inverted_index.that | 100, 122, 134, 158 |
| abstract_inverted_index.they | 53 |
| abstract_inverted_index.what | 33 |
| abstract_inverted_index.will | 188 |
| abstract_inverted_index.with | 123 |
| abstract_inverted_index.work | 156 |
| abstract_inverted_index.Here, | 73 |
| abstract_inverted_index.These | 92 |
| abstract_inverted_index.cause | 41 |
| abstract_inverted_index.doing | 37 |
| abstract_inverted_index.finer | 150 |
| abstract_inverted_index.learn | 94 |
| abstract_inverted_index.model | 191 |
| abstract_inverted_index.serve | 176 |
| abstract_inverted_index.those | 78 |
| abstract_inverted_index.using | 80, 168 |
| abstract_inverted_index.where | 14, 46, 112 |
| abstract_inverted_index.which | 185 |
| abstract_inverted_index.work, | 184 |
| abstract_inverted_index.appear | 16 |
| abstract_inverted_index.better | 6 |
| abstract_inverted_index.degree | 151 |
| abstract_inverted_index.enable | 189 |
| abstract_inverted_index.future | 181 |
| abstract_inverted_index.method | 174 |
| abstract_inverted_index.model. | 91 |
| abstract_inverted_index.models | 167 |
| abstract_inverted_index.neural | 9, 34, 47 |
| abstract_inverted_index.object | 144 |
| abstract_inverted_index.rather | 68 |
| abstract_inverted_index.space, | 67 |
| abstract_inverted_index.sparse | 81 |
| abstract_inverted_index.attempt | 75 |
| abstract_inverted_index.greater | 190 |
| abstract_inverted_index.learned | 125 |
| abstract_inverted_index.method. | 172 |
| abstract_inverted_index.neurons | 15, 55 |
| abstract_inverted_index.resolve | 163 |
| abstract_inverted_index.activate | 18 |
| abstract_inverted_index.causally | 136 |
| abstract_inverted_index.concise, | 29 |
| abstract_inverted_index.distinct | 22 |
| abstract_inverted_index.features | 51, 58, 99, 133 |
| abstract_inverted_index.identify | 77 |
| abstract_inverted_index.indirect | 143 |
| abstract_inverted_index.internal | 86 |
| abstract_inverted_index.language | 90, 166 |
| abstract_inverted_index.measured | 115 |
| abstract_inverted_index.methods. | 118 |
| abstract_inverted_index.networks | 35, 48 |
| abstract_inverted_index.neurons. | 72 |
| abstract_inverted_index.pinpoint | 131 |
| abstract_inverted_index.possible | 161 |
| abstract_inverted_index.prevents | 25 |
| abstract_inverted_index.previous | 153 |
| abstract_inverted_index.sparsely | 97 |
| abstract_inverted_index.Moreover, | 119 |
| abstract_inverted_index.assigning | 57 |
| abstract_inverted_index.automated | 117 |
| abstract_inverted_index.behaviour | 140 |
| abstract_inverted_index.contexts. | 23 |
| abstract_inverted_index.features, | 128 |
| abstract_inverted_index.indicates | 157 |
| abstract_inverted_index.internals | 11 |
| abstract_inverted_index.multiple, | 20 |
| abstract_inverted_index.networks' | 10 |
| abstract_inverted_index.represent | 49 |
| abstract_inverted_index.scalable, | 170 |
| abstract_inverted_index.activating | 98 |
| abstract_inverted_index.activation | 66 |
| abstract_inverted_index.directions | 64, 107 |
| abstract_inverted_index.foundation | 179 |
| abstract_inverted_index.identified | 108 |
| abstract_inverted_index.individual | 71 |
| abstract_inverted_index.roadblocks | 3 |
| abstract_inverted_index.activations | 87 |
| abstract_inverted_index.alternative | 110 |
| abstract_inverted_index.approaches, | 111 |
| abstract_inverted_index.directions, | 79 |
| abstract_inverted_index.identifying | 28 |
| abstract_inverted_index.internally. | 38 |
| abstract_inverted_index.mechanistic | 182 |
| abstract_inverted_index.reconstruct | 84 |
| abstract_inverted_index.responsible | 137 |
| abstract_inverted_index.autoencoders | 82, 93 |
| abstract_inverted_index.explanations | 31 |
| abstract_inverted_index.hypothesised | 40 |
| abstract_inverted_index.monosemantic | 105 |
| abstract_inverted_index.overcomplete | 61 |
| abstract_inverted_index.semantically | 21 |
| abstract_inverted_index.transparency | 192 |
| abstract_inverted_index.unsupervised | 171 |
| abstract_inverted_index.interpretable | 103 |
| abstract_inverted_index.steerability. | 194 |
| abstract_inverted_index.superposition | 164 |
| abstract_inverted_index.understanding | 7 |
| abstract_inverted_index.counterfactual | 139 |
| abstract_inverted_index.identification | 145 |
| abstract_inverted_index.Polysemanticity | 24 |
| abstract_inverted_index.decompositions. | 154 |
| abstract_inverted_index.polysemanticity | 43 |
| abstract_inverted_index.interpretability | 113, 183 |
| abstract_inverted_index.human-understandable | 30 |
| abstract_inverted_index.\textit{superposition}, | 45 |
| abstract_inverted_index.\textit{polysemanticity}, | 13 |
| abstract_inverted_index.\citep{wang2022interpretability} | 147 |
| cited_by_percentile_year | |
| countries_distinct_count | 0 |
| institutions_distinct_count | 5 |
| sustainable_development_goals[0].id | https://metadata.un.org/sdg/4 |
| sustainable_development_goals[0].score | 0.6399999856948853 |
| sustainable_development_goals[0].display_name | Quality Education |
| citation_normalized_percentile | |
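
The `abstract_inverted_index.*` rows in the table above store the abstract as a map from each word to the positions at which it occurs. A minimal sketch of how such an index could be turned back into running text follows, assuming the index has been loaded into a Python dict. The function name `rebuild_abstract` and the `sample` dict are illustrative and not part of any OpenAlex tooling; the sample values are copied from the corresponding rows of the table.

```python
def rebuild_abstract(inverted_index):
    """Rebuild text from a {word: [positions]} inverted index."""
    by_position = {}
    for word, positions in inverted_index.items():
        for pos in positions:
            by_position[pos] = word
    # Emit words in position order, separated by single spaces.
    return " ".join(by_position[pos] for pos in sorted(by_position))

# A few entries taken from the table rows above.
sample = {
    "One": [0, 39],
    "of": [1, 8, 42, 63, 88, 96, 127],
    "the": [2, 85, 132, 142],
    "roadblocks": [3],
    "to": [4, 17, 59, 70, 76, 83, 148, 162],
    "a": [5, 89, 149, 169, 178],
    "better": [6],
    "understanding": [7],
}

# Keep only the first eight positions so the partial sample is contiguous.
prefix = {w: [p for p in ps if p < 8] for w, ps in sample.items()}
first_words = rebuild_abstract(prefix)
print(first_words)  # One of the roadblocks to a better understanding
```

Applied to the complete index, the same function would reproduce the abstract, including the raw `\textit{...}` and `\citep{...}` tokens that the index stores verbatim.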