Graham Neubig

Associate Professor, Carnegie Mellon University

  • Pittsburgh, PA

Graham Neubig's research is concerned with language and its role in human communication.

Biography

Graham Neubig's research is concerned with language and its role in human communication. In particular, his long-term research goal is to break down barriers in human-human or human-machine communication through the development of natural language processing (NLP) technologies. This includes the development of technology for machine translation, which helps break down barriers in communication for people who speak different languages, and natural language understanding, which helps computers understand and respond to human language. Within this overall goal of breaking down barriers to human communication, he has focused on several aspects of language that both make it interesting as a scientific subject and hold potential for the construction of practical systems.

Areas of Expertise

Machine Learning
Natural Language Processing
Machine Translation
Spoken Language Processing

Media Appearances

AI isn't ready to do your job

Business Insider  online

2025-04-22

AI agents aren't ready to do your job. Researchers at CMU staffed a fake company with AI agents, and the results were disastrous. "While agents may be used to accelerate some portion of the tasks that human workers are doing, they are likely not a replacement for all tasks at the moment," said Graham Neubig (School of Computer Science).

Where DeepL Beats ChatGPT in Machine Translation with Graham Neubig

Slator  online

2023-07-14

In this week’s SlatorPod, we are joined by Graham Neubig, Associate Professor of Computer Science at Carnegie Mellon University, to discuss his research on multilingual natural language processing (NLP) and machine translation (MT).

Angry Bing chatbot just mimicking humans, say experts

ARY News  online

2023-02-18

“I think this is basically mimicking conversations that it’s seen online,” said Graham Neubig, an associate professor at Carnegie Mellon University’s Language Technologies Institute.

Industry Expertise

Education/Learning
Research

Education

University of Illinois Urbana-Champaign

B.S.

Computer Science

2005

Kyoto University

M.S.

Informatics

2010

Kyoto University

Ph.D.

Informatics

2012

Articles

Can we automate scientific reviewing?

Journal of Artificial Intelligence Research

2022

The rapid development of science and technology has been accompanied by an exponential growth in peer-reviewed scientific publications. At the same time, the review of each paper is a laborious process that must be carried out by subject matter experts. Thus, providing high-quality reviews of this growing number of papers is a significant challenge. In this work, we ask the question “can we automate scientific reviewing?”, discussing the possibility of using natural language processing (NLP) models to generate peer reviews for scientific papers. Because it is non-trivial to define what a “good” review is in the first place, we first discuss possible evaluation metrics that could be used to judge success in this task.

AmericasNLI: Machine translation and natural language inference systems for Indigenous languages of the Americas

Frontiers in Artificial Intelligence

2022

Little attention has been paid to the development of human language technology for truly low-resource languages—i.e., languages with limited amounts of digitally available text data, such as Indigenous languages. However, it has been shown that pretrained multilingual models are able to perform crosslingual transfer in a zero-shot setting even for low-resource languages which are unseen during pretraining. Yet, prior work evaluating performance on unseen languages has largely been limited to shallow token-level tasks. It remains unclear if zero-shot learning of deeper semantic tasks is possible for unseen languages. To explore this question, we present AmericasNLI, a natural language inference dataset covering 10 Indigenous languages of the Americas.

DIRE and its data: Neural decompiled variable renamings with respect to software class

ACM Transactions on Software Engineering and Methodology

2023

The decompiler is one of the most common tools for examining executable binaries without the corresponding source code. It transforms binaries into high-level code, reversing the compilation process. Unfortunately, decompiler output is far from readable because the decompilation process is often incomplete. State-of-the-art techniques use machine learning to predict missing information like variable names. While these approaches are often able to suggest good variable names in context, no existing work examines how the selection of training data influences these machine learning models. We investigate how data provenance and the quality of training data affect performance, and how well, if at all, trained models generalize across software domains. We focus on the variable renaming problem using one such machine learning model, DIRE.
