Assistant Professor of Computer Science at Boston University, working on interpretability, robustness, and sample efficiency.
Email: λ@bu.edu, where λ=amueller
I am interested in evaluating and improving the robustness of language models. My work spans causal and mechanistic interpretability methods; evaluations of language models inspired by linguistic principles and findings in cognitive science; and the development and analysis of more sample-efficient language models.
I completed my Ph.D. in Computer Science at the Center for Language and Speech Processing at Johns Hopkins University under the supervision of Tal Linzen and Mark Dredze. My dissertation analyzed the behaviors and mechanisms underlying emergent syntactic competencies in neural language models. My Ph.D. studies were supported by a National Science Foundation Graduate Research Fellowship.
Upcoming keynote at INTERPLAY (COLM workshop)
Upcoming keynote at the New England Mechanistic Interpretability Workshop
Paper on position-aware circuit discovery to appear at ACL!
Paper on benchmarking progress in mechanistic interpretability to appear at ICML!
New preprint: how can we improve OOD generalization without access to model weights?
New preprint: when steering with SAE features, it's crucial to distinguish between input features and output features
Three papers at NAACL
Panelist in the ACL mentorship session at NAACL
Attended and gave a talk at the Bellairs Workshop on Causality
Invited talk at Tel Aviv University
Invited talk at the Technion
Presented a paper at the ICML Mech Interp workshop! Counterfactuals are everywhere in mech interp, but they have key issues that can bias results if we're not careful.
New preprint! NNsight and NDIF are tools for democratizing access to and control over the internals of large foundation models.
New preprint on the benefits of human-scale language modeling
Invited talks at Saarland University and EPFL