Publications

Please see my Google Scholar page for an up-to-date list.

Preprints & In Submission

The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability
Aaron Mueller, Jannik Brinkmann, Millicent Li, Samuel Marks, Koyena Pal, Nikhil Prakash, Can Rager, Aruna Sankaranarayanan, Arnab Sen Sharma, Jiuding Sun, Eric Todd, David Bau, Yonatan Belinkov. arXiv preprint. [paper]

SAEs Are Good for Steering – If You Select the Right Features
Dana Arad, Aaron Mueller, Yonatan Belinkov. arXiv preprint. [paper] [code]

How to Improve the Robustness of Closed-source Models on NLI
Joe Stacey, Lisa Alazraki, Aran Ubhi, Beyza Ermis, Aaron Mueller, Marek Rei. arXiv preprint. [paper]

Peer-reviewed Articles

2025

Bigger is not always better: The importance of human-scale language modeling for psycholinguistics
Ethan Gotlieb Wilcox, Michael Hu, Aaron Mueller, Tal Linzen, Alex Warstadt, Leshem Choshen, Chengxu Zhuang, Ryan Cotterell, Adina Williams. Journal of Memory and Language (JML). [paper]

MIB: A Mechanistic Interpretability Benchmark
Aaron Mueller*, Atticus Geiger*, Sarah Wiegreffe, Dana Arad, Iván Arcuschin, Adam Belfki, Yik Siu Chan, Jaden Fiotto-Kaufman, Tal Haklay, Michael Hanna, Jing Huang, Rohan Gupta, Yaniv Nikankin, Hadas Orgad, Nikhil Prakash, Anja Reusch, Aruna Sankaranarayanan, Shun Shao, Alessandro Stolfo, Martin Tutek, Amir Zur, David Bau, Yonatan Belinkov. To appear at the International Conference on Machine Learning (ICML). [website] [paper] [code] [data] [leaderboard]

Position-aware Automatic Circuit Discovery
Tal Haklay, Hadas Orgad, Aaron Mueller, Yonatan Belinkov. To appear at the Association for Computational Linguistics (ACL). [paper] [code]

Characterizing the Role of Similarity in the Property Inferences of Language Models
Juan Diego Rodriguez, Aaron Mueller, Kanishka Misra. Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL). [paper] [code]

Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages
Jannik Brinkmann, Chris Wendler, Christian Bartelt, Aaron Mueller. Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL). [paper] [code]

Incremental Sentence Processing Mechanisms in Autoregressive Transformer Language Models
Michael Hanna*, Aaron Mueller*. Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL). [paper] [code]

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks, Can Rager, Eric J. Michaud, Yonatan Belinkov, David Bau, Aaron Mueller. International Conference on Learning Representations (ICLR). [paper] [code]

NNsight and NDIF: Democratizing Access to Foundation Model Internals
Jaden Fiotto-Kaufman, Alexander R Loftus, Eric Todd, Jannik Brinkmann, Caden Juang, Koyena Pal, Can Rager, Aaron Mueller, Samuel Marks, Arnab Sen Sharma, Francesca Lucchetti, Michael Ripa, Adam Belfki, Nikhil Prakash, Sumeet Multani, Carla Brodley, Arjun Guha, Jonathan Bell, Byron Wallace, David Bau. International Conference on Learning Representations (ICLR). [paper] [website] [source]

Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics
Yaniv Nikankin, Anja Reusch, Aaron Mueller, Yonatan Belinkov. International Conference on Learning Representations (ICLR). [paper] [code]

2024

Missed Causes and Ambiguous Effects: Counterfactuals Pose Challenges for Interpreting Neural Networks (Honorable Mention for Top Paper)
Aaron Mueller. Mechanistic Interpretability Workshop at the 2024 International Conference on Machine Learning (ICML). [paper]

Disjoint Processing Mechanisms of Hierarchical and Linear Grammars in Large Language Models
Aruna Sankaranarayanan, Dylan Hadfield-Menell, Aaron Mueller. Workshop on LLMs and Cognition at the 2024 International Conference on Machine Learning (ICML). [paper]

In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax
Aaron Mueller, Albert Webson, Jackson Petty, Tal Linzen. North American Chapter of the Association for Computational Linguistics (NAACL). [paper] [code]

Function Vectors in Large Language Models
Eric Todd, Millicent L. Li, Arnab Sen Sharma, Aaron Mueller, Byron C. Wallace, David Bau. International Conference on Learning Representations (ICLR). [website] [paper] [code]

2023

How to Plant Trees 🌳 in Language Models: Data and Architectural Effects on the Emergence of Syntactic Inductive Biases
Aaron Mueller, Tal Linzen. Association for Computational Linguistics (ACL). [paper] [code]

Meta-learning with Demonstration Retrieval for Efficient Few-shot Learning
Aaron Mueller, Kanika Narang, Lambert Mathias, Qifan Wang, Hamed Firooz. Findings of the Association for Computational Linguistics (ACL). [paper]

Language Model Acceptability Judgments Are Not Always Robust to Context (Outstanding Paper Award)
Koustuv Sinha, Jon Gauthier, Aaron Mueller, Kanishka Misra, Keren Fuentes, Roger Levy, Adina Williams. Association for Computational Linguistics (ACL). [paper]

Inverse Scaling: When Bigger Isn't Better (Featured Paper)
Ian R. McKenzie, Alexander Lyzhov, Michael Martin Pieler, Alicia Parrish, Aaron Mueller, Ameya Prabhu, Euan McLean, Xudong Shen, Joe Cavanagh, Andrew George Gritsevskiy, Derik Kauffman, Aaron T. Kirtland, Zhengping Zhou, Yuhui Zhang, Sicong Huang, Daniel Wurgaft, Max Weiss, Alexis Ross, Gabriel Recchia, Alisa Liu, Jiacheng Liu, Tom Tseng, Tomasz Korbak, Najoung Kim, Samuel R. Bowman, Ethan Perez. Transactions on Machine Learning Research (TMLR). [paper]

What Do NLP Researchers Believe? Results of the NLP Community Metasurvey
Julian Michael, Ari Holtzman, Alicia Parrish, Aaron Mueller, Alex Wang, Angelica Chen, Divyam Madaan, Nikita Nangia, Richard Yuanzhe Pang, Jason Phang, Samuel R. Bowman. Association for Computational Linguistics (ACL). [paper]

2022

Coloring the Blank Slate: Pre-training Imparts a Hierarchical Inductive Bias to Sequence-to-sequence Models
Aaron Mueller, Robert Frank, Tal Linzen, Luheng Wang, Sebastian Schuster. Association for Computational Linguistics (ACL). [paper] [code]

Label Semantic Aware Pre-training for Few-shot Text Classification
Aaron Mueller, Jason Krone, Salvatore Romeo, Saab Mansour, Elman Mansimov, Yi Zhang, Dan Roth. Association for Computational Linguistics (ACL). [paper]

Causal Analysis of Syntactic Agreement Neurons in Multilingual Language Models
Aaron Mueller, Yu Xia, Tal Linzen. Conference on Computational Natural Language Learning (CoNLL). [paper] [code]

Bernice: A Multilingual Pre-trained Encoder for Twitter
Alexandra DeLucia, Shijie Wu, Aaron Mueller, Carlos Aguirre, Mark Dredze. Empirical Methods in Natural Language Processing (EMNLP). [paper]

2021

Fine-tuning Encoders for Improved Monolingual and Zero-shot Polylingual Neural Topic Modeling
Aaron Mueller, Mark Dredze. North American Chapter of the Association for Computational Linguistics (NAACL). [paper] [poster] [code] [bib]

Causal Analysis of Syntactic Agreement Mechanisms in Neural Language Models
Aaron Mueller*, Matthew Finlayson*, Sebastian Gehrmann, Stuart Shieber, Tal Linzen, Yonatan Belinkov. Association for Computational Linguistics (ACL). [paper] [code] [bib]

Decoding Methods for Neural Narrative Generation
Alexandra DeLucia*, Aaron Mueller*, Xiang Lisa Li, João Sedoc. Workshop on Generation Evaluation and Metrics (GEM), at the Association for Computational Linguistics (ACL). [paper] [code] [bib]

Demographic Representation and Collective Storytelling in the Me Too Twitter Hashtag Activism Movement
Aaron Mueller, Zach Wood-Doughty, Silvio Amir, Mark Dredze, Alicia Lynn Nobles. Proceedings of the Association for Computing Machinery (ACM) on Human-Computer Interaction (HCI), Vol. 5 (CSCW1). [paper] [bib]

2020

Cross-Linguistic Syntactic Evaluation of Word Prediction Models
Aaron Mueller, Garrett Nicolai, Panayiota Petrou-Zeniou, Natalia Talmina, Tal Linzen. Association for Computational Linguistics (ACL). [paper] [code] [bib]

An Analysis of Massively Multilingual Neural Machine Translation for Low-Resource Languages
Aaron Mueller, Garrett Nicolai, Arya D. McCarthy, Dylan Lewis, Winston Wu, David Yarowsky. Language Resources and Evaluation Conference (LREC). [paper] [bib]

The Johns Hopkins University Bible Corpus: 1600+ Tongues for Typological Exploration
Arya D. McCarthy, Rachel Wicks, Dylan Lewis, Aaron Mueller, Winston Wu, Oliver Adams, Garrett Nicolai, Matt Post, David Yarowsky. Language Resources and Evaluation Conference (LREC). [paper] [bib]

Fine-grained Morphosyntactic Analysis and Generation Tools for More Than One Thousand Languages
Garrett Nicolai, Dylan Lewis, Arya D. McCarthy, Aaron Mueller, Winston Wu, David Yarowsky. Language Resources and Evaluation Conference (LREC). [paper] [bib]

2019

Quantity Doesn't Buy Quality Syntax with Neural Language Models
Marten van Schijndel, Aaron Mueller, Tal Linzen. Empirical Methods in Natural Language Processing & International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). [paper] [poster] [bib]

Modeling Color Terminology Across Thousands of Languages
Arya D. McCarthy, Winston Wu, Aaron Mueller, Bill Watson, David Yarowsky. Empirical Methods in Natural Language Processing & International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). [paper] [bib]

Sentence-Level Adaptation for Low-Resource Neural Machine Translation
Aaron Mueller*, Yash Kumar Lal*. Workshop on Technologies for MT of Low Resource Languages (LoResMT), at Machine Translation Summit (MTSummit). [paper] [bib]

Aaron Mueller
