Project

CS 499

Natural Language Processing (NLP)

Instructor

Ziyu Yao (ziyuyao [at] gmu [dot] edu)
Office Hours: by appointment (virtual or in person at ENGR4415).

Requirements

Students are free to complete the final project either individually or in a group of no more than 3 students. If working individually, the project can be less ambitious but should not be less complete; if working in a group, all students should contribute equally, and all students will receive the same grade for the project.

Students are allowed to combine this project with their research or other course projects. However, the project must still involve NLP concepts from this course. If the students are not sure whether their other projects could be combined with this course project, they should email the instructor for confirmation.
Note that any external resources used in this project must be clearly cited in the reports.

Students should talk to the instructor as early as possible if they have questions about the project requirements and evaluation plan.

Project Ideas

Students are encouraged to propose any ideas they are passionate about, as long as the ideas fall within the scope of this course. The key requirement is to do creative work using the NLP knowledge learned in this class. Here are a few examples for reference:
  • Reproducing a state-of-the-art model: Students can pick one NLP task of interest and reproduce one of its state-of-the-art models. Note that reproducing here means implementing the same model from scratch, without using any available source code. The reproduced model should show performance comparable to the reported state-of-the-art results.
  • Improving an existing task: Students can also propose an improvement on an existing task. In this case, students should include in their project proposal the specific task, the dataset(s), and a rough plan for the improvement (i.e., what you plan to do and how you expect it to improve the task). Students are allowed to use the available source code of existing models for the same task (as baselines); however, they should still show hands-on implementation, such as revising or adding components to the existing models, or implementing a new model from scratch.
  • Tackling a new NLP task: Students are encouraged to propose new NLP tasks that have not been explored in the literature. In this case, students should include in their project proposal the specific task, a clear data plan (i.e., how you plan to collect the training/test data), and a rough plan for the models. Since the students are among the first to explore this task, it is acceptable to construct simpler models using available source code.
  • Building a tool or a demo system with creativity: There has been excellent work on building tools or demo systems for model debugging, model interpretation, and so on. Students are encouraged to build such a system either from scratch or by leveraging existing frameworks. However, in any case, the students should show creativity, such as proposing new angles of model debugging/interpretation or applying an existing analysis to a new task. Simply running an existing system on an already-analyzed task will not earn a high grade.

Resources

Students may find the following resources useful, although they are encouraged to explore any others.
  • Workshops (e.g., a list of workshops at ACL'22 and EMNLP'21) are venues that particularly welcome junior researchers to join and present their research. Workshops are typically about popular or emerging research topics in this field; therefore, skimming through the lists may help students identify NLP tasks of interest.

    Some workshops also come with “shared tasks”, which are open challenges with training/test data provided by the organizers, such as the document-grounded dialogue and the fact verification challenge. While some tasks have finished, students interested in these topics could still use the provided dataset and explore more effective models.

  • Research resources about COVID-19: The outbreak of COVID-19 has completely transformed our lives. The past years have witnessed a tremendous amount of scientific research about COVID-19. In the field of NLP, there were two emergency workshops, one at ACL'20 and one at EMNLP'20. Students are encouraged to check out the posted resources and the existing research on this topic. Do any of them inspire you to do something to help?

Submissions

Throughout the course, students will need to make three submissions about the course project (see Syllabus for due dates):
  • Sometime in the second half of the semester, submitting a project proposal;
  • About three quarters of the way through the semester, submitting a project progress report describing the progress and plan;
  • At the end of the semester, submitting a final project report along with the source code implementation. Note that students will also need to present their project in the last class, which is one week before the report submission deadline. This means that by the time of the presentation, the students should have completed 90% of the project; the last week is meant for report writing, source code cleanup, etc.

Project Proposal

The project proposal should include at least the following items:
  • What problem you want to address, and the motivation (especially if it is a new or less-studied problem);
  • What dataset(s) you plan to use; if you plan to collect a new dataset, describe the procedure and the source data;
  • How you plan to pursue this project: students should include a tentative plan for their expected progress by the time of the progress check;
  • (Optional, a brief literature study) Has anyone already tried to tackle the task (i.e., the baselines)? Or is there any relevant work which inspires your research? Include a brief literature survey on the same/similar problem/method in your proposal.
A PDF proposal must be turned in through Blackboard by the due date. Please remember to include your team information. There is no other format requirement for the proposal. Students are encouraged to discuss the project idea with the instructor before the deadline. Feedback will be provided.

Project Progress Checking

To help the instructor track the project progress, students will need to submit a PDF summarizing their project progress. This report can directly expand on the project proposal by adding what the students have already done; that is, students can reuse their project proposal document and gradually add more content as the project progresses.

For example, students working on improving an existing task may add descriptions and results of the baseline models (if using open-source code); similarly, students working on a new NLP task may have finished the data collection and could describe the procedure and data statistics in the report. Students who are unclear about this requirement should contact the instructor immediately.

Students are expected to complete the intermediate plan described in their project proposal. If for any reason the students are unable to show intermediate progress, they should include a justification in the report and inform the instructor.

Final Project Submission

As before, the final project report can expand on the project proposal and the progress report, with all "plans" replaced by what has actually been done. In addition to what was included in the project proposal and progress report, the final project report should describe:
  • What methods you have proposed, including all technical details;
  • Experimental results, plus analysis and discussion of the results;
  • Possible future work extending from this project.
The final project submission should include a PDF project report, as well as a compressed folder of the students' source code. A README document is required in the source code submission, so that the TA/instructor can reproduce your results. Both will be submitted through Blackboard.

Project Evaluation

The course project accounts for 40% of the total grade. In general, your project will be graded based on:

  • Whether you have conducted creative work using NLP knowledge; this means that simply running an existing (open-source) implementation and repeating the same experiments as in prior work will not earn you a decent score;
  • Your writing performance, i.e., whether you have clearly described all items following the submission requirements;
  • Your actual contribution/workload (in terms of both idea novelty and engineering effort); in particular, if you work in a team, then I expect to see correspondingly more output than from students who work individually.

Specifically, each submission will be scored and evaluated as follows:

  • Project proposal: 5%

    • The submission will be evaluated based on whether the proposed idea is appropriate (in terms of topics, novelty, and expected workload divided by the team size).
  • Progress report: 10%

    • The submission will be evaluated based on whether substantial effort following the project proposal has been made by the time of submission.
  • Final submission and presentation: 25%

    • The submission will be evaluated holistically based on the aforementioned three bullet points (creativity with NLP, writing, and actual contribution/workload). This also covers the in-class presentation, for which students are graded on whether they can present their work to the class in a clear and compelling way.
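
As a quick check of the arithmetic, the three weights above (5%, 10%, and 25%) add up to the 40% that the project contributes to the total grade. The sketch below is only an illustration: the function name and the input format (a fraction between 0 and 1 earned on each submission) are assumptions made for the example, not the official grading procedure.

```python
# Weights from the breakdown above, in percent of the total course grade.
WEIGHTS = {"proposal": 5, "progress_report": 10, "final": 25}


def project_contribution(fractions):
    """Return the points (out of 40) that the project adds to the course grade.

    `fractions` maps each submission name to the fraction of its score
    earned, between 0 and 1. This input format is a hypothetical choice
    made for this example.
    """
    for name in WEIGHTS:
        if not 0.0 <= fractions[name] <= 1.0:
            raise ValueError(f"fraction for {name!r} must be in [0, 1]")
    return sum(WEIGHTS[name] * fractions[name] for name in WEIGHTS)


# The three weights together make up the 40% allotted to the project.
assert sum(WEIGHTS.values()) == 40

if __name__ == "__main__":
    example = {"proposal": 0.8, "progress_report": 0.9, "final": 0.9}
    print(project_contribution(example))
```

Because the final submission and presentation carry 25 of the 40 points, most of the project grade depends on the final outcome rather than on the earlier checkpoints.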