Natural language processing (NLP) aims to enable computers to use human languages – so that people can, for example, interact with computers naturally; or communicate with people who don’t speak a common language; or manipulate speech or text data at scales not otherwise possible. The NLP group at George Mason Computer Science is interested in all aspects of NLP, with a focus on building tools for under-served languages, and constructing natural language interfaces that can reliably assist humans in knowledge acquisition and task completion.
We are currently working on multilingual models, on building Machine Translation robust to L2-language variations, on NLP for documentation of endangered languages, on exploring the interplay between language and code, on constructing interactive natural language interfaces, and on improving the efficiency of NLP models.
Our research is/has been supported by the following organizations/companies:
We study building NLP/AI models with limited supervision, especially for low-resource domains (e.g., healthcare).
We explore how machine learning systems can interact with humans effectively. This includes being able to converse with humans through dialogues, as well as proactively collaborate with and learn from humans during decision making.
We seek to build natural language interfaces that allow humans to communicate with computers/machines easily. This requires modeling natural language, programming language, and their interplay. Applications of this research include semantic parsing and general-purpose code generation.
This NEH-funded project focuses on the development of modern Optical Character Recognition (OCR) and post-correction tools tailored for Indigenous Latin American Languages.
Advances in natural language processing (NLP) technology now make it possible to perform many tasks through natural language or over natural language data – automatic systems can answer questions, perform web search, or command our computers to perform specific tasks.
Most languages of the world are “oral”: they are not traditionally written and even if an alphabet exists, the community doesn’t usually use it. Hence, building NLP systems that can directly operate on speech input is paramount.
Human language is marked by considerable diversity around the world, and the surface form of languages varies substantially. Morphology describes the way in which different word forms arise from lexemes. Computational morphology attempts to reproduce this process across languages, or uses machine learning models to model or discover the morphophonological processes that exist in a language.
NLP systems are typically trained and evaluated in “clean” settings, over data without significant noise. However, systems deployed in the real world need to deal with vast amounts of noise. At GMU NLP we work towards making NLP systems more robust to several types of noise (adversarial or naturally occurring).
Language Documentation aims to produce a permanent record of a language as used by its community, typically a formal grammatical description along with a lexicon. Our group works on integrating NLP systems into the documentation workflow, aiming to speed up the process and support the work of field linguists and language communities.
Machine Translation is the task of translating between human languages using computers. Starting from simple word-for-word rule-based systems in the 1950s, we now have large multilingual neural models that can learn to translate between dozens of languages.
An exciting research direction that we pursue at GMU NLP is building multi-lingual and polyglot systems. The languages of the world often share similar characteristics, and training systems cross-lingually allows us to leverage these similarities and overcome data scarcity issues.
Computational Linguistics, Machine Translation, Speech Recognition, NLP for Endangered Languages
Human-AI Interaction, Language and Code, Efficient NLP/AI
Computational linguistics, Natural language processing, Machine learning
Natural Language Processing, Machine Learning, Computer Vision, Common Sense Reasoning
Natural Language Processing, Fairness in AI, Multilingual NLP, Machine Learning, Deep Learning
Natural Language Processing, Machine Translation, Machine Learning, Data Mining
Natural Language Processing, Machine Learning, Computer Vision
Language Processing for Bemba