Deva Ramanan is a professor in the Robotics Institute at Carnegie Mellon University and the director of the CMU Argo AI Center for Autonomous Vehicle Research. The Center engages in fundamental research to produce advanced perception and next-generation decision-making algorithms that enable vehicles to perceive and navigate autonomously in diverse real-world urban conditions. His research interests span computer vision and machine learning, with a focus on visual recognition, often motivated by the task of understanding people from visual data. He served as a program chair of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018. He is on the editorial board of the International Journal of Computer Vision and is an associate editor for the IEEE Transactions on Pattern Analysis and Machine Intelligence. He regularly serves as a senior program committee member for CVPR, the International Conference on Computer Vision, and the European Conference on Computer Vision. He also regularly serves on NSF panels for computer vision and machine learning.
Areas of Expertise (5)
Machine Learning Embedded in Systems
3-D Vision and Recognition
Visual Servoing and Visual Tracking
Sensing & Perception
Graphics & Creative Tools
Media Appearances (5)
Generative modeling tool renders 2D sketches in 3D
Tech Xplore online
"As long as you can draw a sketch, you can make your own customized 3D model," said RI doctoral candidate Kangle Deng, who was part of the research team with Zhu, Professor Deva Ramanan and Ph.D. student Gengshan Yang.
Self-driving cars would be nowhere without HD maps
"Even though a traffic light and the moon may resemble each other, a self-driving system should use a combination of contextual cues — including spatial, temporal and prior knowledge — to tell them apart," Deva Ramanan, principal scientist at self-driving tech competitor Argo AI, explains in a blog post.
New Perception Metric Balances Reaction Time, Accuracy
Carnegie Mellon University online
The new metric, called streaming perception accuracy, was developed by Li, together with Deva Ramanan, associate professor in the Robotics Institute and principal scientist at Argo AI, and Yu-Xiong Wang, assistant professor at the University of Illinois at Urbana-Champaign. They presented it last month at the virtual European Conference on Computer Vision, where it received a best paper honorable mention award.
Carnegie Mellon, Argo AI to Create Self-Driving Vehicle Research Center
Robotics Business Review online
Deva Ramanan, an associate professor in the Robotics Institute who also serves as machine learning lead at Argo AI, will be the center’s principal investigator. The center’s research will involve faculty members and students from across CMU. The center will give students access to the fleet-scale data sets, vehicles and large-scale infrastructure that are crucial for advancing self-driving technologies and that otherwise would be difficult to obtain.
Beyond deep fakes: Transforming video content into another video's style, automatically
Bansal will present the method today at ECCV 2018, the European Conference on Computer Vision, in Munich. His co-authors include Deva Ramanan, CMU associate professor of robotics.
IARPA Award for "Walk-Through Rendering From Images of Varying Altitudes" (professional)
University of California at Berkeley: Ph.D., Electrical Engineering and Computer Science
University of Delaware: B.S., Computer Engineering
- IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- International Journal of Computer Vision
- IEEE Transactions on Pattern Analysis and Machine Intelligence
Edge-based Privacy-Sensitive Live Learning for Discovery of Training Data
Proceedings of the 1st International Workshop on Networked AI Systems
2023
Finding true positives (TPs) to construct a training set for a new class of interest in machine learning (ML) is often a challenge. The novelty of the class suggests that cloud archives are unlikely to be helpful. We observe that most video data collected for surveillance and briefly stored at the edge before being overwritten is currently unused. To efficiently harness this untapped resource, we describe Delphi, a privacy-sensitive interactive labeling system that continuously improves labeling productivity through background learning. Our experimental results confirm the value of Delphi for training set construction from edge-sourced data.
Towards Long-Tailed 3D Detection
Conference on Robot Learning
2023
Contemporary autonomous vehicle (AV) benchmarks have advanced techniques for training 3D detectors, particularly on large-scale lidar data. Surprisingly, although semantic class labels naturally follow a long-tailed distribution, contemporary benchmarks focus on only a few common classes (e.g., pedestrian and car) and neglect many rare classes in the tail (e.g., debris and stroller). However, AVs must still detect rare classes to ensure safe operation. Moreover, semantic classes are often organized within a hierarchy; e.g., tail classes such as child and construction-worker are arguably subclasses of pedestrian. However, such hierarchical relationships are often ignored, which may lead to misleading estimates of performance and missed opportunities for algorithmic innovation.
WEDGE: A Multi-Weather Autonomous Driving Dataset Built from Generative Vision-Language Models
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
2023
The open road poses many challenges to autonomous perception, including poor visibility from extreme weather conditions. Models trained on good-weather datasets frequently fail at detection in these out-of-distribution settings. To aid adversarial robustness in perception, we introduce WEDGE (WEather images by DALL-E GEneration): a synthetic dataset generated with a vision-language generative model via prompting. WEDGE consists of 3360 images in 16 extreme weather conditions manually annotated with 16513 bounding boxes, supporting research in the tasks of weather classification and 2D object detection. We have analyzed WEDGE from research standpoints, verifying its effectiveness for extreme-weather autonomous perception. We establish baseline performance for classification and detection with 53.87% test accuracy and 45.41 mAP.
Reconstructing Animatable Categories from Videos
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
2023
Building animatable 3D models is challenging due to the need for 3D scans, laborious registration, and manual rigging. Recently, differentiable rendering has provided a pathway to obtain high-quality 3D models from monocular videos, but these are limited to rigid categories or single instances. We present RAC, a method to build category-level 3D models from monocular videos, disentangling variations over instances and motion over time. Three key ideas are introduced to solve this problem: (1) specializing a category-level skeleton to instances, (2) a method for latent space regularization that encourages shared structure across a category while maintaining instance details, and (3) using 3D background models to disentangle objects from the background. We build 3D models for humans, cats, and dogs given monocular videos.
Distilling Neural Fields for Real-Time Articulated Shape Reconstruction
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
2023
We present a method for reconstructing articulated 3D models from videos in real-time, without test-time optimization or manual 3D supervision at training time. Prior work often relies on pre-built deformable models (e.g., SMAL/SMPL) or slow per-scene optimization through differentiable rendering (e.g., dynamic NeRFs). Such methods fail to support arbitrary object categories or are unsuitable for real-time applications. To address the challenge of collecting large-scale 3D training data for arbitrary deformable object categories, our key insight is to use off-the-shelf video-based dynamic NeRFs as 3D supervision to train a fast feed-forward network, turning 3D shape and motion prediction into a supervised distillation task. Our temporal-aware network uses articulated bones and blend skinning to represent arbitrary deformations, and is self-supervised on video datasets without requiring 3D shapes or viewpoints as input.