Aaron Mueller

Zuckerman postdoctoral fellow working with Yonatan Belinkov and David Bau on interpretability and robustness in language models. Incoming Assistant Professor at Boston University Computer Science (Fall 2025).

Email: λ@bu.edu, where λ=amueller

About

I am interested in evaluating and improving the robustness of language models, as well as the methods used to understand them. My work spans causal and mechanistic interpretability methods; evaluations of language models inspired by linguistic principles and findings from cognitive science; and more sample-efficient language models.

I completed my Ph.D. in Computer Science at the Center for Language and Speech Processing at Johns Hopkins University under the supervision of Tal Linzen and Mark Dredze. My dissertation analyzed the behaviors and mechanisms underlying emergent syntactic abilities in neural language models. My Ph.D. studies were supported by a National Science Foundation Graduate Research Fellowship.

I completed my B.S. in Computer Science and B.S. in Linguistics at the University of Kentucky, where I was a Gaines Fellow and Patterson Scholar. My thesis, which focused on neural machine translation for low-resource French dialects, was advised by Ramakanth Kavuluru and Mark Richard Lauersdorf.


News

2025/10

Upcoming keynote at INTERPLAY (COLM workshop)

2025/08

Upcoming keynote at the New England Mechanistic Interpretability Workshop

2025/07

Paper on position-aware circuit discovery to appear at ACL!

2025/07

Paper on benchmarking progress in mechanistic interpretability to appear at ICML!

2025/04

Three papers at ICLR

2025/04

Three papers at NAACL

2025/04

Panelist in the ACL mentorship session at NAACL

2025/02

Attended and gave a talk at the Bellairs Workshop on Causality

2024/12

Invited talk at Tel Aviv University

2024/11

Invited talk at the Technion

2024/07

Presented a paper at the ICML Mech Interp workshop! Counterfactuals are everywhere in mech interp, but they have key issues that can bias our results if we're not careful.

2024/07

New preprint! NNsight and NDIF are tools for democratizing access to and control over the internals of large foundation models.

2024/07

New preprint on the benefits of human-scale language modeling

2024/07

Invited talks at Saarland University and EPFL