Projects

Dataset Maps

As language technologies become ubiquitous, there are increasing efforts to expand the language diversity and coverage of natural language processing (NLP) systems. Arguably, the most important factor influencing the quality of modern NLP systems is data availability.

Efficient NLP/AI

We study building NLP/AI models with limited supervision, especially for low-resource domains (e.g., healthcare).

Human-AI Interaction

We explore how machine learning systems can interact with humans effectively. This includes conversing with humans through dialogue, as well as proactively collaborating with and learning from humans during decision making.

Information Intelligence

We explore computational approaches for information intelligence tasks such as question answering and information extraction.

Language and Code

We seek to build natural language interfaces that allow humans to communicate easily with computers. This requires modeling natural language, programming languages, and their interplay. Applications of this research include semantic parsing and general-purpose code generation.

Language Models

Research on language models, including prompt engineering, LLM reasoning, LLM interpretability, and applications of LLMs in other disciplines.

OCR

This NEH-funded project focuses on the development of modern Optical Character Recognition (OCR) and post-correction tools tailored for Indigenous Latin American languages.

Fairness

Advances in natural language processing (NLP) technology now make it possible to perform many tasks through natural language or over natural language data – automatic systems can answer questions, perform web search, or command our computers to perform specific tasks.

Speech

Most languages of the world are “oral”: they are not traditionally written and even if an alphabet exists, the community doesn’t usually use it. Hence, building NLP systems that can directly operate on speech input is paramount.

Morphology

Human language is marked by considerable diversity around the world, and the surface form of languages varies substantially. Morphology describes the way in which different word forms arise from lexemes. Computational morphology attempts to reproduce this process across languages, or uses machine learning models to model or discover the morphophonological processes that exist in a language.