Eric Nyberg

Professor, Carnegie Mellon University

  • Pittsburgh, PA

Eric Nyberg builds software applications that can understand and process human language.

Contact

Carnegie Mellon University


Biography

Noted for his contributions to automatic text translation, information retrieval, and automatic question answering, Eric Nyberg builds software applications that can understand and process human language. For the past decade, he has worked on question-answering technology, often in collaboration with colleagues at IBM. Since 2007, he and his CMU colleagues have participated in the Open Advancement of Question Answering (OAQA), a collaboration with IBM that led to the development of Watson, the question-answering computing system that defeated human opponents in nationally televised matches of Jeopardy!. He currently directs the Master of Computational Data Science (MCDS) program. He is also a co-founder and chief data scientist at Cognistx and serves on the Scientific Advisory Board of Fairhair.ai.

Areas of Expertise

Automatic Text Translation
Processing Human Language
Automated Question Answering
IBM Watson
Information Retrieval
Artificial Intelligence

Media Appearances

Meltwater acquires Algo, an AI-based news and data tracker

TechCrunch  online

2017-08-29

Michelsen is also not the only notable name in Algo’s pedigree: the company’s tech was partly developed by Eric Nyberg, a natural language pioneer and veteran who played a big role in the development of Watson at IBM, and is now the lead of the Language Technologies Institute at Carnegie Mellon University. Nyberg is an advisor to Algo.


Super-Smart Retail, Coming Soon To A Device Near You

Forbes  online

2015-08-04

One of the founders is Eric Nyberg, PhD, a professor in the Language Technologies Institute at Carnegie Mellon University. Eric directs the Master's Program in Computational Data Science. He was very involved in CMU's partnership with IBM in the development of Watson™ that ultimately triumphed over human competitors in the Jeopardy! Challenge.


IBM readies Watson for post-Jeopardy life

CNN Money  online

2011-02-14

Watson didn't come cheap. IBM (IBM, Fortune 500) won't disclose how much it has invested in the project, but Eric Nyberg, a Carnegie Mellon computer science professor who has worked on Watson, estimates that the project cost IBM up to $100 million.



Industry Expertise

Computer Networking
Computer Hardware
Computer Software

Accomplishments

Allen Newell Award for Research Excellence


Education

Boston University

B.A.

Carnegie Mellon University

Ph.D.

Computational Linguistics

Patents

Integrated and authoring and translation system

US6658627

2003

The present invention is a system of integrated, computer-based processes for monolingual information development and multilingual translation. An interactive text editor enforces lexical and grammatical constraints on a natural language subset used by the authors to create their text, which they help disambiguate to ensure translatability. The resulting translatable source language text undergoes machine translation into any one of a set of target languages, without the translated text requiring any postediting.
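As a rough illustration of the kind of lexical constraint the patent describes, an authoring editor can flag words that fall outside an approved source-language subset before the text is sent to machine translation. The lexicon and messages below are illustrative assumptions, not the patented implementation:

```python
# Toy controlled-language check: flag words outside an approved lexicon,
# the kind of lexical constraint an authoring editor could enforce before
# machine translation. The lexicon here is an illustrative assumption.

APPROVED = {"press", "the", "start", "button", "to", "begin", "stop"}


def check_sentence(sentence: str):
    """Return (ok, offending_words) for a candidate source sentence."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    bad = [w for w in words if w not in APPROVED]
    return (len(bad) == 0, bad)


ok, bad = check_sentence("Press the start button to begin.")
print(ok)    # True: every word is in the approved subset
ok2, bad2 = check_sentence("Depress the initiate control.")
print(bad2)  # words the author must rephrase before translation
```

A real system would also enforce grammatical constraints and interactively prompt the author to disambiguate, per the abstract above; this sketch covers only the lexical layer.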


Integrated authoring and translation system

US5677835

1997

The present invention is a system of integrated, computer-based processes for monolingual information development and multilingual translation. An interactive text editor enforces lexical and grammatical constraints on a natural language subset used by the authors to create their text, which they help disambiguate to ensure translatability. The resulting translatable source language text undergoes machine translation into any one of a set of target languages, without the translated text requiring any postediting.


Natural language processing system and method for parsing a plurality of input symbol sequences into syntactically or pragmatically correct word messages

US5299125

1994

A Natural Language Processing System utilizes a symbol parsing layer in combination with an intelligent word parsing layer to produce a syntactically or pragmatically correct output sentence or other word message. Initially, a plurality of polysemic symbol sequences are input through a keyboard segmented into a plurality of semantic, syntactic, or pragmatic segments including agent, action and patient segments, for example. One polysemic symbol sequence, including a plurality of polysemic symbols, is input from each of the three segments of the keyboard.


Articles

Distribution-aware Goal Prediction and Conformant Model-based Planning for Safe Autonomous Driving

arXiv preprint

2022

The feasibility of collecting a large amount of expert demonstrations has inspired growing research interest in learning-to-drive settings, where models learn by imitating the driving behaviour of experts. However, relying exclusively on imitation can limit agents' generalisability to novel scenarios outside the support of the training data. In this paper, we address this challenge by factorising the driving task, based on the intuition that modular architectures are more generalisable and more robust to changes in the environment than monolithic, end-to-end frameworks. Specifically, we draw inspiration from the trajectory forecasting community and reformulate the learning-to-drive task as obstacle-aware perception and grounding, distribution-aware goal prediction, and model-based planning. First, we train the obstacle-aware perception module to extract a salient representation of the visual context. Then, we learn a multi-modal goal distribution by performing conditional density estimation using normalising flows. Finally, we ground candidate trajectory predictions in road geometry and plan actions based on vehicle dynamics. Under the CARLA simulator, we report state-of-the-art results on the CARNOVEL benchmark.


Knowledge-driven scene priors for semantic audio-visual embodied navigation

arXiv preprint

2022

Generalisation to unseen contexts remains a challenge for embodied navigation agents. In the context of semantic audio-visual navigation (SAVi) tasks, the notion of generalisation should include both generalising to unseen indoor visual scenes as well as generalising to unheard sounding objects. However, previous SAVi task definitions do not include evaluation conditions on truly novel sounding objects, resorting instead to evaluating agents on unheard sound clips of known objects; meanwhile, previous SAVi methods do not include explicit mechanisms for incorporating domain knowledge about object and region semantics. These weaknesses limit the development and assessment of models' abilities to generalise their learned experience. In this work, we introduce the use of knowledge-driven scene priors in the semantic audio-visual embodied navigation task: we combine semantic information from our novel knowledge graph that encodes object-region relations, spatial knowledge from dual Graph Encoder Networks, and background knowledge from a series of pre-training tasks -- all within a reinforcement learning framework for audio-visual navigation. We also define a new audio-visual navigation sub-task, where agents are evaluated on novel sounding objects, as opposed to unheard clips of known objects. We show improvements over strong baselines in generalisation to unseen regions and novel sounding objects, within the Habitat-Matterport3D simulation environment, under the SoundSpaces task.


Using Implicit Feedback to Improve Question Generation

arXiv preprint

2023

Question Generation (QG) is a Natural Language Processing (NLP) task that aims at automatically generating questions from text. Many applications can benefit from automatically generated questions, but often those questions must be curated, either by selecting or editing them. This curation is informative in its own right, but it is typically done post-generation, so the effort is wasted; moreover, most existing systems cannot easily incorporate this feedback. In this work, we present a system, GEN, that learns from such (implicit) feedback. Following a pattern-based approach, it takes as input a small set of sentence/question pairs and creates patterns which are then applied to new, unseen sentences. Each generated question, after being corrected by the user, is used as a new seed in the next iteration, so more patterns are created each time. We also take advantage of the user's corrections to score the patterns and thereby rank the generated questions. Results show that GEN improves by learning from both levels of implicit feedback when compared to the version with no learning, considering the top 5, 10, and 20 questions. Improvements go up to 10%, depending on the metric and strategy used.
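The iterative loop the abstract describes (patterns induced from seed sentence/question pairs, user corrections fed back both as new seeds and as pattern scores) can be sketched roughly as follows; the lexical pattern representation and the scoring scheme here are illustrative assumptions, not GEN's actual implementation:

```python
# Toy sketch of a GEN-style feedback loop (illustrative only; the real
# system induces far richer patterns than this lexical template).

from dataclasses import dataclass


@dataclass
class Pattern:
    """A lexical template: question text with a slot for the varying span."""
    template: str   # e.g. "Who wrote {X}?"
    trigger: str    # word in the sentence that precedes the slot filler
    score: float = 1.0  # raised/lowered by user corrections

    def apply(self, sentence: str):
        # Assumes "<subject> <trigger> <object>." word order, as in the seeds.
        words = sentence.rstrip(".").split()
        if self.trigger in words:
            i = words.index(self.trigger)
            return self.template.format(X=" ".join(words[i + 1:]))
        return None


def induce_pattern(sentence: str, question: str) -> Pattern:
    """Create a pattern from one seed pair by slotting out the shared tail."""
    words = sentence.rstrip(".").split()
    trigger, filler = words[1], " ".join(words[2:])
    return Pattern(template=question.replace(filler, "{X}"), trigger=trigger)


class GenLoop:
    def __init__(self, seeds):
        self.patterns = [induce_pattern(s, q) for s, q in seeds]

    def generate(self, sentence: str):
        """Rank candidate questions by the score of the pattern that made them."""
        ranked = sorted(self.patterns, key=lambda p: -p.score)
        return [(p.apply(sentence), p) for p in ranked if p.apply(sentence)]

    def feedback(self, sentence: str, pattern: Pattern, corrected: str):
        """A user correction rescores the pattern and becomes a new seed."""
        pattern.score += 1.0 if pattern.apply(sentence) == corrected else -0.5
        self.patterns.append(induce_pattern(sentence, corrected))


loop = GenLoop([("Tolstoy wrote War and Peace.", "Who wrote War and Peace?")])
candidates = loop.generate("Orwell wrote 1984.")
print(candidates[0][0])  # -> "Who wrote 1984?"
```

Each accepted or corrected question grows the pattern pool, so later iterations generate more candidates and rank them by accumulated pattern scores, mirroring the two levels of implicit feedback described above.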

