Syllabus
CS 499
Natural Language Processing (NLP)
Instructor
Ziyu Yao (ziyuyao [at] gmu [dot] edu)
Office Hours: by appointment (virtual or in person at ENGR 4415).
Teaching Assistant
Wenjie Xi (wxi [at] gmu [dot] edu)
Office Hours: Tue 1-2pm & 3-4pm, ENGR 4456
Meets
Monday and Wednesday, 3:00 to 4:15 PM, David King Jr. Hall 2053.
Safe Return to Campus: Students are expected to follow the university's Safe-Return-to-Campus Policy (including mask wearing, daily health check, etc.) when attending any class. Please review the policy before coming to campus and the classroom. Note that students who choose not to abide by these expectations will be referred to the Office of Student Conduct for failure to comply.
Course Web Page
https://nlp.cs.gmu.edu/course/cs499-spring22/
We will use Blackboard for course materials/assignments/grading, and Piazza for Q&A (sign up link here).
Course Description
Massive amounts of information in our daily life are expressed in natural language. In this class, we will study how to build computing systems that can process, understand, and communicate in natural language. This field is called natural language processing, or NLP. This class will focus on introducing fundamental concepts in NLP, and will cover techniques and necessary programming skills for building machine learning/deep learning-based NLP models. In the last several classes, we will further study cutting-edge research problems in NLP, including text generation, question answering, neural network interpretation, interactive learning, multilingual NLP, and so on.
Prerequisites
CS310 (Data Structure), CS330 (Formal Methods and Models), and proficiency in Python programming. Please contact the instructor if you have questions about the necessary background.
Class Format
The class will be in-person. Each class will take the following format:
- Reading: Before the class, you will be pointed to some reading materials (see "Reading Materials" in the course schedule). Reading before the class is not required but highly recommended.
- Summary/Elaboration/Q&A: In the class, the instructor will summarize important points from the reading material, elaborating on details that were not included in the reading while fielding any questions. New material on cutting-edge methods, or a deep look into one salient method, will also be covered.
- In-class Coding Exercise/Quiz: In some classes, the instructor will provide coding templates, and the students will need to complete coding exercises in class. Sometimes this will be a quiz with a couple of questions. The purpose of the exercises/quizzes is to help students get familiar with concepts covered in class and/or kick-start their assignments. Student performance will be graded; therefore, students are advised to attend class on time. A laptop computer is required for classes that have coding exercises.
- Presentation: In the last class, students need to present their final projects in class.
Grading
There will be no midterm or final exam. Your final grade will depend on:
- In-class Coding Exercise/Quiz: 20%.
- Assignments: 40%. There will be four assignments in total. Each assignment will be an independent "small" coding project. In the project, the instructor will provide a code template and students will be instructed to complete the project.
- Final Project and Presentation: 40%. This includes (1) submitting a project proposal in the middle of the semester; (2) extending from your proposal, submitting an intermediate project report describing your progress and plan; (3) submitting the final project report with source code at the end of the semester; and (4) presenting your project in the last class. See details and instructions following the link.
Your final letter grade will be assigned according to the following boundaries (depending on class performance, the instructor may shift these boundaries down to raise students' grades):
Letter Grade | Points (out of 100) |
---|---|
A+ | 100+ (w/ extra credits) |
A | 95-100 |
A- | 90-94 |
B+ | 85-89 |
B | 82-84 |
B- | 78-81 |
C+ | 74-77 |
C | 72-73 |
C- | 70-71 |
D | 60-69 |
F | 0-59 |
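As a rough illustration of how the weights and the letter-grade table above combine, here is a minimal Python sketch. It is not an official grade calculator: the names `WEIGHTS`, `LETTER_CUTOFFS`, `weighted_total`, and `letter_grade` are hypothetical, and the sketch assumes that each component is scored out of 100, that fractional totals fall into the band whose lower bound they reach, and that only totals above 100 (via extra credit) earn an A+. The instructor may also shift the boundaries down, which the sketch does not model.

```python
# Component weights from the Grading section (each component assumed scored 0-100).
WEIGHTS = {"exercises": 0.20, "assignments": 0.40, "project": 0.40}

# (lower bound, letter) pairs from the table, highest first.
# Assumption: a fractional total such as 94.5 falls into the lower band (A-).
LETTER_CUTOFFS = [
    (95, "A"), (90, "A-"), (85, "B+"), (82, "B"), (78, "B-"),
    (74, "C+"), (72, "C"), (70, "C-"), (60, "D"), (0, "F"),
]


def weighted_total(scores):
    """Combine per-component scores (0-100 each) into a total out of 100."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(total):
    """Map a total to a letter using the unshifted boundaries."""
    if total > 100:  # assumption: A+ requires extra credit beyond 100
        return "A+"
    for lower_bound, letter in LETTER_CUTOFFS:
        if total >= lower_bound:
            return letter
    return "F"


if __name__ == "__main__":
    example = {"exercises": 90, "assignments": 85, "project": 92}
    total = weighted_total(example)
    print(round(total, 1), letter_grade(total))
```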
Late Day Policy for Assignments:
In case there are unforeseen circumstances that don’t let you turn in your assignment on time, 4 late days total over the four assignments will be allowed. Notes:
(1) Late days cannot be applied to the final project;
(2) The last two assignments are harder than the others, so it’d be a good idea to try to save your late days for them if possible;
(3) The late days cannot be used fractionally, e.g., submitting the assignment 1 hour late will incur 1 late day.
Assignments that are late beyond the allowed late days will be graded down by 5% per day.
In the case of a serious illness or other excused absence, as defined by university policies (including providing necessary evidence), coursework submissions will be accepted late by the same number of days as the excused absence.
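The late-day arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the official grading procedure: the function names are hypothetical, remaining late days are assumed to be applied automatically before any penalty, and the 5% per day is assumed to be deducted linearly from the assignment's raw score (the policy does not say whether it compounds).

```python
import math

FREE_LATE_DAYS = 4       # total budget shared across the four assignments
PENALTY_PER_DAY = 0.05   # 5% per day once the budget is exhausted


def late_days_used(hours_late):
    """Round lateness up to whole days: 1 hour late counts as 1 late day."""
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def apply_late_policy(raw_score, hours_late, late_days_remaining):
    """Return (adjusted score, late days left) for one assignment.

    Assumption: free late days are consumed first, and each extra day
    removes 5% of the raw score (linear, not compounding).
    """
    days = late_days_used(hours_late)
    covered = min(days, late_days_remaining)
    penalized_days = days - covered
    multiplier = max(0.0, 1.0 - PENALTY_PER_DAY * penalized_days)
    return raw_score * multiplier, late_days_remaining - covered


if __name__ == "__main__":
    # One hour late with a full budget: costs one late day, no penalty.
    print(apply_late_policy(100, 1, FREE_LATE_DAYS))
    # 49 hours late (3 days) with 1 late day left: 2 penalized days.
    print(apply_late_policy(80, 49, 1))
```

Under these assumptions, a submission that is 49 hours late with one late day remaining uses that day and is penalized for the other two.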
Class Attendance Policy:
As we will have coding exercises/quizzes in some classes, attendance is highly recommended. However, in the case of a serious illness or other excused absence, as defined by university policies (including providing necessary evidence), students will be excused and the exercise/quiz grade will be dropped. I expect such cases to be relatively rare; if you will be away for more than 2 classes over the semester, please consult the instructor in advance.
Readings
Students should be able to understand the course content just by following the lecture and by doing the readings. However, the following textbooks serve as good references.
- Jurafsky and Martin, Speech and Language Processing, 3rd edition [Dec. 30, 2020 version (used in class)] [latest version] (Referred to as "JM");
- Jacob Eisenstein, Natural Language Processing [online] (Referred to as "Eisenstein").
Tentative Schedule
# | Date | Topic | Reading Materials | Assignments |
---|---|---|---|---|
1 | 01/24 | Introduction and Class Outline | | |
2 | 01/26 | Working with Text in Python | JM Ch2 | |
3 | 01/31 | N-gram Language Models | JM Ch3.1-3.4 | Assignment 1 Out |
4 | 02/02 | Classification 1 | JM Ch4.7-4.9, Ch11.5 | |
5 | 02/07 | Classification 2 | JM Ch5.1-5.5, Ch4.1-4.4, Eisenstein Ch2.3-2.4, 2.6 | |
6 | 02/09 | Classification 3 | JM Ch5.6 | |
7 | 02/14 | Neural 1: Feedforward Neural Networks | JM Ch7.1-7.4; blog by Michael Nielsen; DL book; PyTorch basics | Assignment 1 Due; Assignment 2 Out |
8 | 02/16 | Neural 2: Word Embeddings | JM Ch6; Mikolov et al., 2013a&b | |
9 | 02/21 | Neural 3: RNN-based Neural Language Models | JM Ch9-9.3; "understand LSTM" blog by Olah; "gradient vanishing" blog by Nielsen; Karpathy et al. 2015 | |
10 | 02/23 | Sequence 1: POS tagging, HMMs | JM Ch8.1-8.2, 8.4 | |
11 | 02/28 | Sequence 2: NER, CRFs | JM Ch8.5, Eisenstein Ch7.5.3 | Assignment 2 Due; Assignment 3 Out |
12 | 03/02 | Parsing 1: Dependency Parsing | JM Ch14 | |
13 | 03/07 | Parsing 2: Constituency Parsing | JM Ch12.1-12.5, 13.1-13.2, 13.4 | |
14 | 03/09 | Assignment Q&A | | |
15 | 03/14 | Spring Recess - No Class | | |
16 | 03/16 | Spring Recess - No Class | | |
17 | 03/21 | Parsing 2 (cont'd): Constituency Parsing | Eisenstein Ch12-12.3, JM Ch15-15.3, JM Ch12.6.1 | Assignment 4 Out |
18 | 03/23 | Parsing 3: Semantic Parsing | Eisenstein Ch13 | Assignment 3 Due |
19 | 03/28 | Machine Translation | Eisenstein 18.1-18.2 | |
20 | 03/30 | Neural 4: Seq2Seq & Attention, Transformers | Eisenstein 18.3.1; Attention-based NMT; Transformer paper and Alammar's blog | |
21 | 04/04 | Contextual Representations and Pre-training | Peters et al., 2018 (ELMo); Devlin et al., 2019 (BERT); OpenAI GPT2 | Project Proposal Due |
22 | 04/06 | Contextual Representations and Pre-training 2 | | Assignment 4 Due |
23 | 04/11 | Text Generation | JM Ch24; See et al., 2017 | |
24 | 04/13 | Question Answering 1 | JM Ch23; ACL20 tutorial | |
25 | 04/18 | Question Answering 2 | | Project Progress Due |
26 | 04/20 | Interpreting NNs | | |
27 | 04/25 | Assignment Q&A | | |
28 | 04/27 | Multilingual NLP & Ethics | | |
29 | 05/02 | Misc. & Wrap-up | | |
30 | 05/04 | Final Project Presentation | | Final Project Due on 05/09 |