How can we get computers to understand and generate human language? This is among the most challenging, and currently the most rapidly advancing, problems in contemporary artificial intelligence. Natural language systems are deployed in the world in a growing number of forms, including chatbots, code assistants, and web agents. This course provides an introduction to the engineering and science that underlie current NLP systems.

Prerequisites

Highly recommended prerequisites: A prior machine learning course is not required, but it will be very useful to have taken one before this course.

Learning objectives
Students will:

  1. Gain exposure to foundational ideas in NLP.
  2. Understand the theory underlying current NLP ideas.
  3. Learn how to implement each element of the contemporary NLP pipeline.


Logistics



News

Watch this section for homework and project updates!



Course Schedule

Note: we will almost definitely alter this schedule! Order may also change depending on the availability of guests.

Date Topic Homework Readings Notes
Jan 20, 2025 Course introduction
  • What is NLP?
  • Overview of course topics
HW-1 released (ungraded)
Jan 22, 2025 Text classification
  • Logistic regression
  • Gradient descent
  • Features
  • Machine learning basics
HW0 released
Jan 27, 2025 Introduction to language modeling

Tokenization
  • The type-token distinction
  • Feature engineering
Jan 29, 2025 Sequence modeling
  • Review of probability theory
  • N-grams
Feb 3, 2025 Neural sequence modeling I
  • Feed-forward neural networks
  • Backpropagation
  • Embeddings
HW0 due
HW1 released
Feb 5, 2025 Neural sequence modeling II
  • Recurrent neural networks
  • LSTMs
Feb 10, 2025 Attention
  • Transformers
  • Parallel processing
Feb 12, 2025 Large language models I
  • Pre-training
  • Autoregressive language models: GPT
HW1 due
HW2 released
Feb 17, 2025 Large language models II
  • Masked language models: BERT
  • Sequence-to-sequence models: T5
Feb 19, 2025 Evaluating language models
Feb 24, 2025 NLP tasks I
  • Prompting
  • In-context learning
Feb 26, 2025 NLP tasks II
  • Fine-tuning
  • Low-rank adapters
Mar 3, 2025 Post-training I
  • Intro to reinforcement learning
  • Reinforcement learning from human feedback (RLHF)
HW2 due
Final project proposal released
Mar 5, 2025 Post-training II
  • Direct preference optimization (DPO)
  • Reinforcement learning with verifiable rewards (RLVR)
Mar 10, 2025 Spring break - no class
Mar 12, 2025 Spring break - no class
Mar 17, 2025 Morphology and syntax I
  • Intro to morphology and syntax
  • POS tagging
  • Hidden Markov models (HMMs)
HW3 released
Mar 19, 2025 Morphology and syntax II
  • (Probabilistic) context-free grammars
  • Constituency parsing
  • CKY
Mar 24, 2025 Morphology and syntax III
  • Dependency parsing
  • Shift-reduce
Mar 26, 2025
  • Semantics
    • Models of meaning
    • Semantic role labeling
    • Semantic parsing
HW3 due
Mar 31, 2025 Exam
Apr 2, 2025
  • Review of exam
  • Discourse and pragmatics
    • The structure of conversation
    • Coreference resolution
Final project proposal due
Apr 7, 2025 NLP tasks I: Classification
  • Paraphrase detection
  • Natural language inference
  • Multiple-choice question answering
Apr 9, 2025 NLP tasks II: Generation
  • Machine translation
  • Open-ended question answering
Apr 14, 2025 Multilingual NLP
  • Multilingual pre-training
  • Low-resource languages
Apr 16, 2025 Interpretability and evaluation I
  • Best practices in evaluation
  • Out-of-distribution generalization
Apr 21, 2025 Interpretability and evaluation II
  • (Mechanistic) interpretability
  • Model editing
Midway report due
Apr 23, 2025 Retrieval and tool use
Apr 28, 2025 Guest lecture
Apr 30, 2025 Final project help session
TBD Final report due



List of Topics

By the end of this course, you should be familiar with each of the following topics. Items with an asterisk* may be on the exam.

Grading

The course is graded out of 100 total points.

Homeworks: 15 points

The homeworks are largely for your benefit as study tools. You may use AI in any way you wish to complete the homeworks. Regardless of whether you decide to use AI tools, you are fully responsible for what you submit.


Exam: 35 points

There will be one exam about 2/3 of the way through the course. See the list of topics above for a guide to what the exam will cover. You may not use any electronic resources during the exam; this includes the textbook, AI tools, the internet, and text messages.
This will be an open-note exam! If you bring notes, they must be on one physical piece of paper. Electronic notes will not be allowed.


Final project: 50 points

This is an open-ended project where you will review and pursue an NLP topic of your choosing.


Grading of the final project will be based on the following:

Can we publish our final project? It is feasible to convert a course project into an academic publication, but it can take a lot of work! I encourage those interested to discuss this with me at the end of the semester.


Policies and Conduct


Outside Resources & AI Policy

AI tools are fully allowed for the homeworks. I recommend doing the assignments on your own as exam preparation, but for the purpose of grading, you may complete them entirely with AI if you so choose. It is the student's responsibility to verify any submitted content.

AI tools are allowed for the final project. Our policy here is more nuanced: you may use AI as a tool, but do not use AI as a crutch or replacement for thinking. What's the difference?

AI as a tool includes:

AI as a crutch/replacement includes:

The line between tool and crutch can be fuzzy, so if you're unsure, I recommend asking ahead of time! I promise not to judge if you ask before you turn in the assignment. :)

No AI tools are allowed during exams. These will be hand-written in class.

I strongly encourage you to use any outside source at your disposal when doing the homework and your final project. Your reports and code should be original, but you may take inspiration from existing papers as long as you give them proper credit. When doing your project, feel free to base your implementations on publicly available code as well (as long as you make significant modifications to accommodate your original idea), but be sure to give proper credit in your report and your GitHub README if you do so.

For the final project, failing to properly cite an outside source is equivalent to taking credit for ideas that are not your own, which is plagiarism.


Academic Integrity

Read through BU's Academic Conduct Code. All students are expected to abide by these guidelines. In the context of this class, it's particularly important that you cite the source of your ideas, facts, and/or methods, and do not claim someone else's work as your own.


Absence and Late Work Policy

Attendance will not be taken. Attend lectures as you wish.

You have 6 free late days that you can use however you wish, with no excuse necessary. Using a late day means that you can still receive full credit for the assignment with no late penalty. Once you have used all your late days, each additional late day incurs a 10% drop in the score. Five days after an assignment's due date, the assignment can no longer be turned in, regardless of whether you use your free late days. This applies to all homeworks. It also applies to the proposal and midway report for the final project, but not to the final report, which must be turned in on time. Note that a late day is a step function: turning in a homework 5 minutes late is equivalent to turning it in 23 hours late, so if you know you'll be late, we recommend taking the extra time to verify your understanding of the material.
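As a rough illustration of the late-day arithmetic described above, here is a minimal Python sketch. It rests on several assumptions that the policy text does not settle: the 10% drop is read as 10 percentage points of full credit per penalized day (it could also be read multiplicatively), free late days are spent automatically before any penalty applies, and a submission past the five-day cutoff receives no credit. The function and constant names are hypothetical and are not part of any course tooling.

```python
import math

FREE_LATE_DAYS = 6            # free late days per student (from the policy)
PENALTY_PERCENT_PER_DAY = 10  # assumption: 10 percentage points per penalized day
MAX_DAYS_LATE = 5             # submissions close 5 days after the due date


def late_days(hours_late: float) -> int:
    """Step function: any fraction of a day late counts as a full late day."""
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def credit_after_lateness(hours_late: float, free_days_left: int) -> tuple[int, int]:
    """Return (percent of credit kept, free late days remaining)."""
    days = late_days(hours_late)
    if days > MAX_DAYS_LATE:
        # Past the cutoff: assumed to earn no credit; no free days are spent.
        return 0, free_days_left
    free_used = min(days, free_days_left)  # assumption: free days are used first
    penalized_days = days - free_used
    percent = max(0, 100 - PENALTY_PERCENT_PER_DAY * penalized_days)
    return percent, free_days_left - free_used


# 5 minutes late and 23 hours late both cost one late day.
print(late_days(5 / 60), late_days(23))             # 1 1
# 30 hours late with one free day left: one free day, one penalized day.
print(credit_after_lateness(30, free_days_left=1))  # (90, 0)
```

Under these assumptions, a student with all 6 free days who submits 23 hours late keeps full credit and has 5 free days left. If in doubt about how a specific case is counted, ask the course staff.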

Extensions can be negotiated in cases of medical emergency or other sudden pressing circumstances. Students should contact the course staff ASAP and negotiate before the assignment's original due date. If this applies to the first homework, please come talk to us the first day of class.

Exams cannot easily be made up. If you know you cannot make an exam day, you must notify us at least 14 days in advance so that we can make alternate arrangements.


Accommodations

Boston University's policy is to provide reasonable accommodations to students with qualifying disabilities who are enrolled in Boston University courses. Students seeking accommodations must engage in an interactive process with, and provide appropriate documentation of their disability to, Disability & Access Services (DAS). If this applies, please get in touch with me as soon as possible to discuss accommodations; note that students are not required to disclose information regarding their disability, if applicable, but should request approval for such accommodations through DAS beforehand.


Religious Observance

Students are permitted to be absent from class, including classes involving examinations, labs, excursions, and other special events, for purposes of religious observance. In-class, take-home and lab assignments, and other work shall be made up in consultation with the student's instructors. More details are available in BU's religious observance policy.