Xiaoming Liu earned his Ph.D. degree in Electrical and Computer Engineering from Carnegie Mellon University in 2004. He received a B.E. degree from Beijing Information Technology Institute, China, and an M.E. degree from Zhejiang University, China, in 1997 and 2000 respectively, both in Computer Science. Prior to joining MSU, he was a research scientist at the Computer Vision Laboratory of GE Global Research. His research interests include computer vision, pattern recognition, machine learning, biometrics, and human-computer interfaces.
Areas of Expertise (6)
Human Computer Interfaces
MSU Foundation Professor
Fellow of International Association for Pattern Recognition (IAPR)
2020, for contributions to face and video analysis
Finalist of the CVPR 2019 Best Paper Award
2019, for the students’ paper “Deep Tree Learning for Zero-shot Face Anti-Spoofing”
Best Oral Paper Award
2019, for the paper “UGLLI Face Alignment: Estimating Uncertainty with Gaussian Log-Likelihood Loss” at the First Workshop on Statistical Deep Learning in Computer Vision (SDLCV)
Withrow Distinguished Scholar–Junior Award
2018, established by the Withrow family to recognize faculty of the MSU College of Engineering who have demonstrated excellence in scholarly activities
Invited Participant, Microsoft Research Faculty Summit
Best Poster Award, 26th British Machine Vision Conference (BMVC)
2015, as co-author
Zhejiang University: M.S., Computer Science and Engineering 2000
Carnegie Mellon University: Ph.D., Electrical and Computer Engineering 2004
Beijing Information Technology Institute: B.E., Computer Science and Engineering 1997
- IEEE Transactions on Biometrics, Behavior, and Identity Science (T-BIOM) Special Issue on Trustworthy Biometrics : Guest Editor, 2020 - 2022
- Frontiers of Information Technology & Electronic Engineering : Corresponding Expert, 2019 - 2022
- Engineering Journal Special Issue on Artificial Intelligence 2021 : Guest Editor, 2021
- Pattern Recognition Letters Special Issue on Biometric Presentation Attacks: handcrafted features versus deep learning approaches : Guest Editor, 2019
- ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) Special Issue on Face Analysis for Applications : Guest Editor, 2018 - 2019
- Machine Vision and Applications Special Issue on 2018 IEEE Winter Conference on Applications of Computer Vision : Guest Editor, 2018
- International Journal of Computer Vision Special Issue on Deep Learning for Face Analysis : Guest Editor, 2017 - 2018
- IEEE Transactions on Image Processing : Associate Editor, 2019 - Present
- Pattern Recognition : Associate Editor, 2019 - Present
- Pattern Recognition Letters : Associate Editor, 2019 - Present
- Neurocomputing Journal : Associate Editor, 2016 - 2019
MSU, Facebook develop research model to fight deepfakes
MSU Today online
“Our method will facilitate deepfake detection and tracing in real-world settings where the deepfake image itself is often the only information detectors have to work with,” said Xiaoming Liu, MSU Foundation Professor of computer science. “It’s important to go beyond current methods of image attribution because a deepfake could be created using a generative model that the current detector has not seen during its training.”
Facebook scientists say they can now tell where deepfakes have come from
Schick questioned whether Facebook’s tool would work on the latter, adding that “there can never be a one size fits all detector.” But Xiaoming Liu, Facebook’s collaborator at Michigan State, said the work has “been evaluated and validated on both cases of deepfakes.” Liu added that the “performance might be lower” in cases where the manipulation only happens in a very small area.
Facebook says it’s made a big leap forward in detecting deepfakes
Hassner says the research took inspiration from prior work by a Michigan State computer scientist who collaborated on the project, Xiaoming Liu. Liu had studied the subtle differences between images taken with different brands and kinds of digital cameras. He built machine-learning systems that could analyze images and determine, with a high degree of accuracy, the type of camera used to take that particular picture.
Biometric smart cards and civic digital identity apps to redefine wallets
Biometric Update online
In an interview with Biometric Update, Michigan State University biometrics researcher Sixue Gong explains a method for de-biasing facial recognition described in a research paper written with Xiaoming Liu and Anil Jain. The idea is one of several promising attempts to move beyond improving training dataset balance to address the problem, which Gong says is necessary.
Method for facial recognition bias reduction with adversarial network shows promise
Biometric Update online
A paper jointly written by Sixue Gong, Xiaoming Liu and Anil K. Jain, all of Michigan State University, ‘Jointly de-biasing face recognition and demographic attribute estimation,’ was presented at the European Conference on Computer Vision (ECCV) 2020.
Xiaoming Liu named Fellow by IAPR
MSU College of Engineering online
“Cameras can see diverse scenes,” he explained, “so our research objects range from human faces and bodies, to urban scenes, plants, and medical imaging. Recent interests also include 3D perception in autonomous driving and defending against various digital image manipulations, such as deepfake.”
Event Appearances (5)
On the Accuracy, Vulnerability, and Biasness of Face Recognition
The 15th Chinese Conference on Biometrics Recognition (CCBR), Shanghai, China (Virtual)
Monocular Video-based 3D Perception for Autonomous Driving
7th Tech.AD USA Conference 2020, Detroit, MI (Virtual)
3D Perception for Autonomous Driving: Research and Education
Southern University of Science & Technology, Shenzhen, China (Virtual)
Autonomous Sensing: from 3D Object Detection to Biometric Recognition
Army Research Laboratory, Adelphi, MD (Virtual)
Monocular Vision-based 3D Perception for Autonomous Driving
General Motors Research and Development Center, Warren, MI (Virtual)
Research Focus (1)
Computer Vision, Pattern Recognition, Image and Video Processing, Machine Learning, Human-Computer Interfaces, Medical Image Analysis, Multimedia Retrieval.
Visual analytics system for convolutional neural network based classifiers
A visual analytics method and system is disclosed for visualizing the operation of an image classification model having at least one convolutional neural network layer. The image classification model classifies sample images into one of a predefined set of possible classes. The visual analytics method determines a unified ordering of the predefined set of possible classes based on a similarity hierarchy such that classes that are similar to one another are clustered together in the unified ordering. The visual analytics method displays various graphical depictions, including a class hierarchy viewer, a confusion matrix, and a response map. In each case, the elements of the graphical depictions are arranged in accordance with the unified ordering. Using the method, a user is better able to understand the training process of the model, diagnose the separation power of the different feature detectors of the model, and improve the architecture of the model.
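The idea of a unified ordering can be illustrated with a small sketch. This is not the patented hierarchy-based method itself, only a hypothetical stand-in: it greedily chains classes so that each newly placed class is the one most similar to the last, which tends to place similar classes adjacently. The `unified_ordering` function and the similarity matrix `S` are invented for illustration.

```python
import numpy as np

def unified_ordering(similarity: np.ndarray) -> list:
    """Greedy ordering that places similar classes next to each other.

    similarity: symmetric (n, n) matrix, higher = more similar.
    A toy stand-in for the similarity-hierarchy ordering in the patent.
    """
    n = similarity.shape[0]
    order = [0]
    remaining = set(range(1, n))
    while remaining:
        last = order[-1]
        # Append the unplaced class most similar to the last placed one.
        nxt = max(remaining, key=lambda j: similarity[last, j])
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Toy similarity (e.g. derived from a confusion matrix): classes 0 and 2
# are often confused with each other, as are classes 1 and 3.
S = np.array([
    [1.0, 0.1, 0.8, 0.1],
    [0.1, 1.0, 0.1, 0.7],
    [0.8, 0.1, 1.0, 0.2],
    [0.1, 0.7, 0.2, 1.0],
])
print(unified_ordering(S))  # → [0, 2, 3, 1]
```

With this ordering, the rows and columns of a confusion matrix would show confusable classes in adjacent blocks rather than scattered across the display.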
Disentangled representation learning generative adversarial network for pose-invariant face recognition
A system and method for identifying a subject using imaging are provided. In some aspects, the method includes receiving an image depicting a subject to be identified, and applying a trained Disentangled Representation learning-Generative Adversarial Network (DR-GAN) to the image to generate an identity representation of the subject, wherein the DR-GAN comprises a discriminator and a generator having at least one of an encoder and a decoder. The method also includes identifying the subject using the identity representation, and generating a report indicative of the subject identified.
Research Grants (5)
Intelligent Diagnosis for Machine and Human-Centric Adversaries, DARPA Reverse Engineering of Deceptions (RED) program
Northeastern University $1,000,000
Face manipulation detection
SCH: INT: Collaborative Research: Unobtrusive sensing and motivational feedback for family wellness
National Science Foundation $365,000
National Institute of Standards and Technology
National Institute of Standards and Technology $147,000
Computer Vision Research
Facebook Reality Lab $25,000
Journal Articles (5)
Radar-Camera Pixel Depth Association for Depth Completion, arXiv preprint
Yunfei Long, Daniel Morris, Xiaoming Liu, Marcos Castro, Punarjay Chakravarty, Praveen Narayanan
2021. While radar and video data can be readily fused at the detection level, fusing them at the pixel level is potentially more beneficial. It is also more challenging, partly because of the sparsity of radar, but also because automotive radar beams are much wider than a typical pixel; combined with the large baseline between camera and radar, this results in poor association between radar returns and color pixels. As a consequence, depth completion methods designed for LiDAR and video fare poorly for radar and video. Here we propose a radar-to-pixel association stage which learns a mapping from radar returns to pixels. This mapping also serves to densify radar returns. Using this as a first stage, followed by a more traditional depth completion method, we are able to achieve image-guided depth completion with radar and video. We demonstrate performance superior to camera and radar alone on the nuScenes dataset. Our source code is available at this https URL.
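For contrast with the learned association the paper proposes, a naive geometric radar-to-pixel association simply projects each radar return into the image with the camera intrinsics and rounds to the nearest pixel; it is this kind of direct projection that associates poorly under wide radar beams and a large camera-radar baseline. The sketch below shows only that naive baseline, with a made-up intrinsic matrix `K` and made-up points.

```python
import numpy as np

def project_radar_to_pixels(points_3d: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Naive pinhole projection of radar returns (N, 3, camera frame) to pixels.

    A geometric baseline only; the paper instead *learns* the radar-to-pixel
    mapping to cope with beam width and the camera-radar baseline.
    """
    uvw = (K @ points_3d.T).T        # homogeneous image coordinates
    uv = uvw[:, :2] / uvw[:, 2:3]    # perspective divide by depth
    return np.round(uv).astype(int)

# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
pts = np.array([[0.0, 0.0, 10.0],    # straight ahead -> principal point
                [1.0, 0.5,  5.0]])
print(project_radar_to_pixels(pts, K))  # → [[320 240] [420 290]]
```

Each radar return lands on exactly one pixel here, which is precisely the modeling gap the learned association and densification stage addresses.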
Riggable 3D Face Reconstruction via In-Network Optimization, arXiv preprint
Ziqian Bai, Zhaopeng Cui, Xiaoming Liu, Ping Tan
2021. This paper presents a method for riggable 3D face reconstruction from monocular images, which jointly estimates a personalized face rig and per-image parameters including expressions, poses, and illuminations. To achieve this goal, we design an end-to-end trainable network embedded with a differentiable in-network optimization. The network first parameterizes the face rig as a compact latent code with a neural decoder, and then estimates the latent code as well as per-image parameters via a learnable optimization. By estimating a personalized face rig, our method goes beyond static reconstructions and enables downstream applications such as video retargeting. In-network optimization explicitly enforces constraints derived from first principles, thus introducing additional priors beyond those of regression-based methods. Finally, data-driven priors from deep learning are utilized to constrain the ill-posed monocular setting and ease the optimization difficulty. Experiments demonstrate that our method achieves SOTA reconstruction accuracy, reasonable robustness and generalization ability, and supports standard face rig applications.
Depth Completion with Twin Surface Extrapolation at Occlusion Boundaries, arXiv preprint
Saif Imran, Xiaoming Liu, Daniel Morris
2021. Depth completion starts from a sparse set of known depth values and estimates the unknown depths for the remaining image pixels. Most methods model this as depth interpolation and erroneously interpolate depth pixels into the empty space between spatially distinct objects, resulting in depth-smearing across occlusion boundaries. Here we propose a multi-hypothesis depth representation that explicitly models both foreground and background depths in the difficult occlusion-boundary regions. Our method can be thought of as performing twin-surface extrapolation, rather than interpolation, in these regions. Next, our method fuses these extrapolated surfaces into a single depth image by leveraging the image data. Key to our method is the use of an asymmetric loss function that operates on a novel twin-surface representation. This enables us to train a network to simultaneously do surface extrapolation and surface fusion. We characterize our loss function and compare it with other common losses. Finally, we validate our method on three different datasets: KITTI, an outdoor real-world dataset; NYU2, an indoor real-world depth dataset; and Virtual KITTI, a photo-realistic synthetic dataset with dense ground truth. We demonstrate improvement over the state of the art.
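The role of an asymmetric loss can be illustrated in miniature. The sketch below is a simplification, not the paper's actual twin-surface loss: a one-sided weighted L1 whose weight `alpha` makes over-estimation cost more than under-estimation (or vice versa). This is the basic mechanism by which one depth hypothesis can be biased toward the nearer foreground surface and the other toward the background.

```python
import numpy as np

def asymmetric_l1(pred: np.ndarray, target: np.ndarray, alpha: float = 0.8) -> float:
    """One-sided weighted L1 (illustrative, not the paper's exact loss).

    Over-estimation (pred > target) is weighted by alpha, under-estimation
    by 1 - alpha. With alpha > 0.5 the minimizer is biased low (toward a
    nearer, foreground-like depth); using 1 - alpha biases it high instead.
    """
    diff = pred - target
    # Both branches are non-negative: alpha*diff for diff>0, (alpha-1)*diff for diff<=0.
    return float(np.mean(np.where(diff > 0, alpha * diff, (alpha - 1.0) * diff)))

pred = np.array([2.0, 1.0])
target = np.array([1.0, 2.0])
# One unit of over-estimation costs 0.8, one unit of under-estimation 0.2.
print(asymmetric_l1(pred, target, alpha=0.8))  # → 0.5
```

Training two hypotheses with mirrored weights (`alpha` and `1 - alpha`) yields the extrapolated foreground and background surfaces that a fusion stage can then combine.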
Unified Detection of Digital and Physical Face Attacks, arXiv preprint
Debayan Deb, Xiaoming Liu, Anil K Jain
2021. State-of-the-art defense mechanisms against face attacks achieve near-perfect accuracies within one of three attack categories, namely adversarial, digital manipulation, or physical spoofs; however, they fail to generalize well when tested across all three categories. Poor generalization can be attributed to learning incoherent attacks jointly. To overcome this shortcoming, we propose a unified attack detection framework, namely UniFAD, that can automatically cluster 25 coherent attack types belonging to the three categories. Using a multi-task learning framework along with k-means clustering, UniFAD learns joint representations for coherent attacks, while uncorrelated attack types are learned separately. The proposed UniFAD outperforms prevailing defense methods and their fusion, with an overall TDR = 94.73% @ 0.2% FDR on a large fake-face dataset consisting of 341K bona fide images and 448K attack images of 25 types across all 3 categories. The proposed method can detect an attack within 3 milliseconds on an Nvidia 2080Ti. UniFAD can also identify the attack types and categories with 75.81% and 97.37% accuracy, respectively.
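The clustering step mentioned above can be sketched in a few lines. This is a minimal k-means, assuming per-attack-type embedding vectors; the function, its deterministic initialization, and the toy data are illustrative, not UniFAD's implementation.

```python
import numpy as np

def kmeans(X: np.ndarray, k: int, iters: int = 50) -> np.ndarray:
    """Minimal k-means; a stand-in for the step that groups attack types
    into coherent branches of a multi-task detection network."""
    # Deterministic init: k evenly spaced samples as starting centers.
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # Assign each embedding to its nearest center (squared Euclidean).
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # Recompute each non-empty cluster's center.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Toy "attack-type embeddings": two well-separated groups standing in for,
# say, adversarial perturbations vs. physical spoofs.
X = np.vstack([np.zeros((5, 2)), np.full((5, 2), 10.0)])
print(kmeans(X, k=2))  # → [0 0 0 0 0 1 1 1 1 1]
```

Each resulting cluster could then be handled by its own task head, so that coherent attacks share representations while uncorrelated ones are learned separately, which is the design the abstract describes.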
Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction, arXiv preprint
Feng Liu, Luan Tran, Xiaoming Liu
2021. Inferring the 3D structure of a generic object from a 2D image is a long-standing objective of computer vision. Conventional approaches either learn completely from CAD-generated synthetic data, which have difficulty generalizing to real images, or generate a 2.5D depth image via intrinsic decomposition, which is limited compared to full 3D reconstruction. One fundamental challenge lies in how to leverage numerous real 2D images without any 3D ground truth. To address this issue, we take an alternative approach with semi-supervised learning. That is, for a 2D image of a generic object, we decompose it into latent representations of category, shape and albedo, lighting, and camera projection matrix, decode the representations to segmented 3D shape and albedo respectively, and fuse these components to render an image that well approximates the input image. Using a category-adaptive 3D joint occupancy field (JOF), we show that the complete shape and albedo modeling enables us to leverage real 2D images in both modeling and model fitting. The effectiveness of our approach is demonstrated through superior 3D reconstruction from a single image, whether synthetic or real, and through shape segmentation.