Chen Agassy
Enhancing Automated Interpretability with Output-Centric Feature Descriptions
Automated interpretability pipelines generate natural language descriptions for the concepts represented by features in large language models (LLMs), such as plants or the first word in a sentence. These descriptions are derived using inpu…