Kris Kitani

Associate Research Professor, Carnegie Mellon University

Pittsburgh, PA

Biography

Kris M. Kitani is an associate research professor in the Robotics Institute at Carnegie Mellon University. He received his BS from the University of Southern California and his MS and PhD from the University of Tokyo. His research spans computer vision, machine learning and human-computer interaction. In particular, his research interests lie at the intersection of first-person vision, human activity modeling and inverse reinforcement learning. His work has received the Marr Prize honorable mention at ICCV 2017, best paper honorable mention at CHI 2017 and CHI 2020, best paper at W4A 2017 and 2019, best application paper at ACCV 2014 and best paper honorable mention at ECCV 2012.

The vision of Kris Kitani's lab is to realize robust autonomous systems built for real-world perception and interactive decision-making. The lab's key focus areas are perception, decision-making, and interaction. The focus is broad because he believes that innovating at the system level requires expertise and integration at the component level.

Kitani's lab innovates across the full spectrum of perception, including vision-based human pose estimation, action recognition, object detection/tracking/forecasting, and 3D scene understanding. They formulate models for decision-making, including approaches such as reinforcement learning, inverse reinforcement learning, imitation learning, and game-theoretic modeling. They develop real-world systems to enable cyber-physical interaction, including wearable camera systems, multi-modal sensors, portable navigational aids, and assistive mobile robots.

Areas of Expertise

Sensing & Perception

Sustainability

Digital Twins

Computer Vision

Artificial Intelligence

Media Appearances

Beep Beep! Suitcase Helps Visually Impaired People Navigate Airports

WESA radio

2019-05-14

“It’s basic physics,” said Kris Kitani, a head researcher at CMU’s Cognitive Assistance Laboratory and one of BBeep’s creators. “A person is a point on the ground plan. And they’re moving at a certain velocity. And then we can predict, based on that velocity, where they will be at in a few seconds.”
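The prediction described in the quote can be illustrated with a constant-velocity extrapolation. The following is a minimal sketch under stated assumptions, not BBeep's actual code: the names `Pedestrian`, `predict_position` and `comes_within`, the units, and the time-stepping scheme are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pedestrian:
    """A person modeled as a point on the ground (hypothetical type)."""
    x: float   # position, metres
    y: float
    vx: float  # velocity, metres per second
    vy: float


def predict_position(p: Pedestrian, seconds: float) -> tuple[float, float]:
    """Extrapolate the position, assuming the velocity stays constant."""
    return (p.x + p.vx * seconds, p.y + p.vy * seconds)


def comes_within(p: Pedestrian, point: tuple[float, float],
                 radius: float, horizon: float, step: float = 0.1) -> bool:
    """Check whether the predicted path passes within `radius` of `point`
    at any sampled time between 0 and `horizon` seconds."""
    for i in range(int(horizon / step) + 1):
        px, py = predict_position(p, i * step)
        if math.hypot(px - point[0], py - point[1]) <= radius:
            return True
    return False
```

For example, a pedestrian at the origin walking at 1 m/s along the x axis is predicted to be at (2.0, 0.0) after two seconds, and `comes_within` reports whether that straight-line path approaches a given point within the chosen horizon.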

Local Students Create Camera Inside Of A Football

CBS News Pittsburgh online

2013-02-27

"We stick this camera into a little hole in the football," says researcher Kris Kitani as he held a tiny camera in his hand. "And then, with that, we'd record it, and we'd throw the football. Once you put the camera in there, it's going to be rotating really quickly as the ball is spinning and flying in the sky."

Spotlight

Beyond the Pitch: CMU Experts on the 2026 World Cup

Jun 12, 2026

As the 2026 FIFA World Cup continues across North America, Carnegie Mellon University experts are available to help media examine the stories unfolding beyond the pitch, from geopolitics and global flashpoints to sports business, fan engagement, AI, robotics, biomechanics and athlete performance. CMU’s World Cup Experts Hub brings together faculty and specialists who can provide timely insight into the political, technological, commercial and human performance issues connected to one of the world’s largest sporting events.

Featured Topics

World Cup Geopolitics and Global Flashpoints

Diplomacy, national identity, international competition, Iran’s participation, regional tensions and how major tournaments can reflect wider global conflicts, alliances and cultural divides.

The Business of Soccer

Marketing impact, soccer’s growing presence in North America, fan access, audience development and how technology can expand the experience for people watching around the world.

AI, Robotics and Sports Technology

How 3D motion analysis, robotic systems, wearable innovation and performance technologies are changing athlete training, preparation, injury analysis and the way fans experience the game.

Performance, Motion and Split-Second Decisions

The biomechanics, motor control, hesitation and decision-making behind elite soccer, including the movements, injuries and officiating moments that can define a match.

Media can visit CMU’s World Cup Experts Hub to explore available experts and connect directly with the right source for their story.

Education

The University of Tokyo

Ph.D.

Information and Communication Engineering

The University of Tokyo

M.S.

Information and Communication Engineering

University of Southern California

B.S.

Electrical Engineering

Languages

Japanese
English

Patents

Method And System For Generating Pedestrian-Vehicle Interaction Data For Training An Autonomous Vehicle

18829746

2026-03-12

A method and system for generating virtual pedestrian-vehicle interaction data includes generating a virtual reality environment in a virtual reality device, generating a scenario in the virtual reality environment, the scenario comprising virtual vehicle movements, displaying the scenario in the virtual reality device, storing virtual reality movements relative to the scenario, the virtual reality movements comprising at least a yaw movement, communicating the virtual vehicle movements to a simulator controller, associating the virtual reality movements, the virtual vehicle movements and the scenario to form pedestrian-vehicle data, and training an autonomous vehicle system using the pedestrian-vehicle data.

Method for diverse sequential point cloud forecasting

19383503

2026-03-05

A method for sequential point cloud forecasting is described. The method includes training a vector-quantized conditional variational autoencoder (VQ-CVAE) framework to map an output to a closest vector in a discrete latent space to obtain a future latent space. The method also includes outputting, by a trained VQ-CVAE, a categorical distribution of a probability of V vectors in a discrete latent space in response to an input previously sampled latent space and past point cloud sequences. The method further includes sampling an inferred future latent space from the categorical distribution of the probability of the V vectors in the discrete latent space. The method also includes predicting a future point cloud sequence according to the inferred future latent space and the past point cloud sequences. The method further includes denoising, by a denoising diffusion probabilistic model (DDPM), the predicted future point cloud sequences according to an added noise.
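Two steps named in the abstract, mapping an output to the closest vector in a discrete latent space and sampling from a categorical distribution over V vectors, can be illustrated in isolation. This is a minimal sketch and not the patented method: the function names `nearest_code` and `sample_code` are hypothetical, the use of squared Euclidean distance is an assumption (the abstract does not name a metric), and the toy codebook stands in for learned latent vectors.

```python
import random
from typing import Sequence

Vector = Sequence[float]


def nearest_code(vector: Vector, codebook: Sequence[Vector]) -> tuple[int, Vector]:
    """Return the index and value of the codebook entry closest to `vector`.

    Closeness is measured by squared Euclidean distance (an assumption).
    """
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))
    return index, codebook[index]


def sample_code(probabilities: Sequence[float], codebook: Sequence[Vector],
                rng: random.Random) -> tuple[int, Vector]:
    """Draw one codebook entry from a categorical distribution over the V entries."""
    index = rng.choices(range(len(codebook)), weights=probabilities, k=1)[0]
    return index, codebook[index]


# Toy codebook with V = 3 two-dimensional vectors (illustrative values only).
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
quantized_index, quantized = nearest_code([0.9, 1.2], codebook)
sampled_index, sampled = sample_code([0.2, 0.5, 0.3], codebook, random.Random(0))
```

In a trained model the probabilities would come from the network conditioned on the past sequence; here they are fixed numbers so the sampling step can be read on its own.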

Articles

EgoMDM: Diffusion-Based Human Motion Synthesis from Sparse Egocentric Sensors

IEEE Xplore

2026-05-27

Accurate three-dimensional (3D) human motion tracking is essential for immersive augmented reality (AR) and virtual reality (VR) applications, allowing users to engage with virtual environments through realistic full-body avatars. Achieving this level of detail, however, is challenging when the driving signals are sparse, typically coming only from upper-body sensors, such as head-mounted devices and hand controllers. To address this challenge, we propose EgoMDM (Egocentric Motion Diffusion Model), an end-to-end diffusion-based framework designed to reconstruct full-body motion from sparse tracking signals. EgoMDM models human motion in a conditional autoregressive manner using a unidirectional recurrent neural network, making it well-suited for real-time applications.

Ground Reaction Inertial Poser: Physics-based Human Motion Capture from Sparse IMUs and Insole Pressure Sensors

arXiv preprint

2026-03-27

We propose Ground Reaction Inertial Poser (GRIP), a method that reconstructs physically plausible human motion using four wearable devices. Unlike conventional IMU-only approaches, GRIP combines IMU signals with foot pressure data to capture both body dynamics and ground interactions.

BodyContact4D: A Multi-view Video Dataset for Understanding Human and Environment Interactions

IEEE Xplore

2026-03-20

To improve vision-based methods for understanding how people interact with their physical environment, we introduce a multi-view video and body-contact sensing dataset designed to capture dynamic human activities that involve interactions with the physical environment. The dataset includes activities such as parkour, physical training, and gym exercises, characterized by frequent body-environment contact. The proposed dataset includes 780K images across 120K pose sequences from 7 subjects.

DuoMo: Dual Motion Diffusion for World-Space Human Reconstruction
