Prospective Students

Thanks for your interest in joining our lab! We're recruiting PhD students and postdocs to start at BU in Fall 2025. This page addresses what our lab works on, how the group is structured, and how to apply. If you have questions that are not answered here, please feel free to email me with [Prospective Student] in the subject line.

Applying

Prospective PhD students: If you are interested in joining our lab and are not currently a BU student, please apply to the Boston University Computer Science PhD program by December 15. I will review all applications that mention my name. Please do not email me with your application materials. During admissions, if I see a good fit, I will contact you for an interview.

Prospective postdocs: Please contact me with your CV, a description of your research interests, and any outside funding you plan to apply for. Ideally, you should reach out about a year before you'd like to start. This may seem early, but many fellowships have very long timelines.

Current BU PhD students who are not my advisees: Email me and we can discuss!

Current BU master's students and undergraduates: Feel free to reach out, but note that I have limited availability for MS and undergraduate supervision. I will likely agree to work only with students who have performed well in at least one of my advanced courses or who already have relevant prior experience.

What does our lab work on?

Broadly, our lab's areas of research are natural language processing (NLP), computational linguistics, interpretability, and evaluation. Our main aim is to design methods, datasets, and theoretical frameworks that (i) reveal how language is learned, understood, produced, and used in natural language systems (including the mind); (ii) allow us to decipher and precisely edit/control the causal mechanisms underlying these capabilities in language models; and (iii) enable more efficient and robust language modeling.

We value linguistics and cognitive science expertise as much as machine learning expertise. Currently, our methods focus primarily on neural networks, but precedent suggests that this could quickly change. Regardless of the methods used to model language, our lab will always work with language data.

Below is a summary of the directions we are currently pursuing.

Interpretability. Language models can accomplish amazing things, but they also often fail at surprisingly simple tasks. Can we understand and predict how and why language models will generalize in particular ways? What causal mechanisms underlie their behaviors, and can we edit these to improve generalization?
- Samuel Marks, Can Rager, Eric J. Michaud, Yonatan Belinkov, David Bau, Aaron Mueller. "Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models."
- Matthew Finlayson*, Aaron Mueller*, Sebastian Gehrmann, Stuart Shieber, Tal Linzen, Yonatan Belinkov. "Causal Analysis of Syntactic Agreement Mechanisms in Neural Language Models." ACL 2021.
- Eric Todd, Millicent Li, Arnab Sen Sharma, Aaron Mueller, Byron C. Wallace, David Bau. "Function Vectors in Large Language Models." ICLR 2024.

Evaluation. What are language models capable of? Do they process or produce language in human-like ways? What should we even be measuring? None of these questions are settled, but answers to them have significant implications for the kinds of work we should be pursuing.
- Aaron Mueller, Albert Webson, Jackson Petty, Tal Linzen. "In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax." NAACL 2024.
- Aaron Mueller, Robert Frank, Tal Linzen, Luheng Wang, Sebastian Schuster. "Coloring the Blank Slate: Pre-training Imparts a Hierarchical Inductive Bias to Sequence-to-sequence Models." Findings of ACL 2022.
- Aaron Mueller, Garrett Nicolai, Panayiota Petrou-Zeniou, Natalia Talmina, Tal Linzen. "Cross-Linguistic Syntactic Evaluation of Word Prediction Models." ACL 2020.

Sample efficiency. By age 13, a human acquires the ability to robustly understand and produce language after being exposed to fewer than 100 million words. In contrast, state-of-the-art language models are exposed to billions to trillions of words, far more than a human would hear or read in their lifetime! How can we improve language models given a more human-like amount of linguistic data? What kinds of signals and methods will be required for this, and can cognitive science/linguistics inspire better methods? Building efficient systems has many practical and scientific advantages, including the following:
- Learnability. What kinds of data are required for a particular phenomenon to be learned? We can empirically test this if we ensure that our datasets are cognitively plausible.
- Accessibility. If less data is required to train better systems, it becomes faster and easier to iterate on language modeling methods and architectures. It also reduces the financial and computational opportunity cost of training a good language model, which enables more diverse research directions.

- Alex Warstadt*, Aaron Mueller*, Leshem Choshen, Ethan Wilcox, Chengxu Zhuang, Juan Ciro, Rafael Mosquera, Bhargavi Paranjape, Adina Williams, Tal Linzen, Ryan Cotterell. "Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora." CoNLL 2023.
- Aaron Mueller, Tal Linzen. "How to Plant Trees in Language Models: Data and Architectural Effects on the Emergence of Syntactic Inductive Biases." ACL 2023.

Causality. When we give explanations of how or why certain systems (whether computational or human) behave in certain ways, we want our explanations to be causally efficacious. That is, we want to capture the true graph of causes and effects, rather than merely capturing commonly co-occurring events that do not actually explain the behavior. But what do we mean by cause and effect? How can we apply ideas from the causality literature to better analyze and understand language models? Can we automate the process of causally explaining language model behaviors?
- Aaron Mueller. "Missed Causes and Ambiguous Effects: Counterfactuals Pose Challenges for Interpreting Neural Networks." ICML 2024 Workshop on Mechanistic Interpretability (Honorable Mention – Best Paper).
- Aaron Mueller, Jannik Brinkmann, Millicent Li, Samuel Marks, Koyena Pal, Nikhil Prakash, Can Rager, Aruna Sankaranarayanan, Arnab Sen Sharma, Jiuding Sun, Eric Todd, David Bau, Yonatan Belinkov. "The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability."

Advising philosophy

The scientific process is highly non-linear. There are many solutions to most problems, and many dead ends and challenging obstacles along the way. My role is to guide and unblock you at each step of the process, and to help you learn how to independently conduct research. I will be more hands-on earlier in your career, always with the ultimate goal of guiding you toward the ability to devise and pursue your own ideas.

Goals

The purpose of a PhD program is to train new scientists, to generate new knowledge, and to help one learn the cultural and scientific norms of a research community. By the end of a PhD, one should have the ability to (i) devise deep but well-scoped research questions, (ii) design and implement well-controlled experiments, (iii) lead a focused project to completion, and (iv) present one's ideas and work effectively, both in writing and orally. At a higher level, one should be able to independently design and pursue one's own research agenda.

Group structure

On average, I expect the group to consist of about five to eight PhD students and zero to two postdocs. The group should be a mixture of researchers with diverse interests and expertise, from cognitive science and linguistics to deep learning and mathematics. This size keeps the group small enough that everyone knows what everyone else is doing and interacts regularly with everyone else. This is important, as peer mentoring can often be just as effective as, if not more effective than, advisor-student mentoring! It also keeps the group large enough that many influences and priorities are always informing the direction of our work.

Interaction

I want to play an active role in my students' research projects. I plan to meet one-on-one with each of my students at least once a week. These meetings can consist of anything from project planning and technical discussion to career planning and general life check-ins. They are in addition to project-specific meetings. We will also hold a formal lab meeting each week.

Work-life balance

Work-life balance is essential for one's intellectual and physical health. We encourage a culture of pursuing hobbies outside of Boston University. We also plan to do at least one social lab outing each semester. (Of course, the pace and amount of work necessary for successful research can vary widely from week to week! But on average, I believe that a healthy balance will lead to better long-term outcomes.)

Aaron Mueller