Publications
Please see my Google Scholar page for an up-to-date list.
Preprints & In Submission
-
The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability
Aaron Mueller, Jannik Brinkmann, Millicent Li, Samuel Marks, Koyena Pal, Nikhil Prakash, Can Rager, Aruna Sankaranarayanan, Arnab Sen Sharma, Jiuding Sun, Eric Todd, David Bau, Yonatan Belinkov.
arXiv preprint. [paper]
-
CRISP: Persistent Concept Unlearning via Sparse Autoencoders
Tomer Ashuach, Dana Arad, Aaron Mueller, Martin Tutek, Yonatan Belinkov.
arXiv preprint. [paper]
-
How to Improve the Robustness of Closed-source Models on NLI
Joe Stacey, Lisa Alazraki, Aran Ubhi, Beyza Ermis, Aaron Mueller, Marek Rei.
arXiv preprint. [paper]
Peer-reviewed Articles
-
Bigger is not always better: The importance of human-scale language modeling for psycholinguistics
Ethan Gotlieb Wilcox, Michael Hu, Aaron Mueller, Tal Linzen, Alex Warstadt, Leshem Choshen, Chengxu Zhuang, Ryan Cotterell, Adina Williams. Journal of Memory and Language (JML). [paper]
-
MIB: A Mechanistic Interpretability Benchmark
Aaron Mueller*, Atticus Geiger*, Sarah Wiegreffe, Dana Arad, Iván Arcuschin, Adam Belfki, Yik Siu Chan, Jaden Fiotto-Kaufman, Tal Haklay, Michael Hanna, Jing Huang, Rohan Gupta, Yaniv Nikankin, Hadas Orgad, Nikhil Prakash, Anja Reusch, Aruna Sankaranarayanan, Shun Shao, Alessandro Stolfo, Martin Tutek, Amir Zur, David Bau, Yonatan Belinkov. International Conference on Machine Learning (ICML). [website] [paper] [code] [data] [leaderboard]
-
NNsight and NDIF: Democratizing Access to Foundation Model Internals
Jaden Fiotto-Kaufman, Alexander R Loftus, Eric Todd, Jannik Brinkmann, Caden Juang, Koyena Pal, Can Rager, Aaron Mueller, Samuel Marks, Arnab Sen Sharma, Francesca Lucchetti, Michael Ripa, Adam Belfki, Nikhil Prakash, Sumeet Multani, Carla Brodley, Arjun Guha, Jonathan Bell, Byron Wallace, David Bau. International Conference on Learning Representations (ICLR). [paper] [website] [source]
-
Inverse Scaling: When Bigger Isn't Better (Featured Paper)
Ian R. McKenzie, Alexander Lyzhov, Michael Martin Pieler, Alicia Parrish, Aaron Mueller, Ameya Prabhu, Euan McLean, Xudong Shen, Joe Cavanagh, Andrew George Gritsevskiy, Derik Kauffman, Aaron T. Kirtland, Zhengping Zhou, Yuhui Zhang, Sicong Huang, Daniel Wurgaft, Max Weiss, Alexis Ross, Gabriel Recchia, Alisa Liu, Jiacheng Liu, Tom Tseng, Tomasz Korbak, Najoung Kim, Samuel R. Bowman, Ethan Perez. Transactions on Machine Learning Research (TMLR). [paper]
-
What Do NLP Researchers Believe? Results of the NLP Community Metasurvey
Julian Michael, Ari Holtzman, Alicia Parrish, Aaron Mueller, Alex Wang, Angelica Chen, Divyam Madaan, Nikita Nangia, Richard Yuanzhe Pang, Jason Phang, Samuel R. Bowman. Association for Computational Linguistics (ACL). [paper]
Proceedings and Other
-
Findings of the Second BabyLM Challenge: Sample-efficient Pretraining on Developmentally Plausible Corpora
Michael Y. Hu, Aaron Mueller, Candace Ross, Adina Williams, Tal Linzen, Chengxu Zhuang, Ryan Cotterell, Leshem Choshen, Alex Warstadt, Ethan Gotlieb Wilcox. Proceedings of the shared task at the Conference on Computational Natural Language Learning (CoNLL). [website] [paper]
-
Findings of the BabyLM Challenge: Sample-efficient Pretraining on Developmentally Plausible Corpora
Alex Warstadt*, Aaron Mueller*, Leshem Choshen, Ethan Wilcox, Chengxu Zhuang, Juan Ciro, Rafael Mosquera, Bhargavi Paranjabe, Adina Williams, Tal Linzen, Ryan Cotterell. Proceedings of the shared task at the Conference on Computational Natural Language Learning (CoNLL). [website] [paper]