Zachary Lipton

Associate Professor, Carnegie Mellon University

Pittsburgh, PA

Zachary Lipton's research spans machine learning methods and their applications in healthcare and natural language processing.

Contact

Carnegie Mellon University

Biography

Zachary Lipton is the Chief Technology Officer and Chief Scientist at Abridge, where he oversees the builder organization responsible for all of product development and AI research. He is also the Raj Reddy Associate Professor of Machine Learning at Carnegie Mellon University, where he directs the Approximately Correct Machine Intelligence (ACMI) lab, whose research focuses include the theoretical and engineering foundations of robust and adaptive machine learning algorithms, applications to both prediction and decision-making problems in clinical medicine, natural language processing, and the impact of machine learning systems on society. He is the founder of the Approximately Correct blog (approximatelycorrect.com) and a co-author of Dive Into Deep Learning, an interactive open-source book drafted entirely through Jupyter notebooks that has reached millions of readers.

Areas of Expertise

Machine Learning

Machine Intelligence

Natural Language Processing (NLP)

Deep Learning

Media Appearances

For Shiv Rao, practicing doctor and founder of $2.75 billion AI startup Abridge, innovation is an art form

Fortune online

2025-02-17

This is a feature on Shiv Rao, a CMU alumnus and the founder and CEO of Abridge, which uses AI to turn doctor-patient conversations into clinical notes in real time. Zack Lipton (School of Computer Science) is Abridge's CTO, who put aside a life as a professional jazz saxophonist to join Rao.

OpenAI shakeup has rocked Silicon Valley, leaving some techies concerned about future of AI

CNBC online

2023-11-20

“I imagine Microsoft might ask for a board seat next time they decide to plow $15 billion into a startup,” said Zachary Lipton, a Carnegie Mellon University professor of machine learning and operations research.

What’s the Future for A.I.?

The New York Times online

2023-04-04

“This will affect tasks that are more repetitive, more formulaic, more generic,” said Zachary Lipton, a professor at Carnegie Mellon who specializes in artificial intelligence and its impact on society.

Education

UC San Diego

Ph.D.

Computer Science

2017

UC San Diego

M.S.

Computer Science

2015

Columbia University

B.A.

Mathematics - Economics

2007

Articles

Complementary benefits of contrastive learning and self-training under distribution shift

Advances in Neural Information Processing Systems

2024

Self-training and contrastive learning have emerged as leading techniques for incorporating unlabeled data, both under distribution shift (unsupervised domain adaptation) and when it is absent (semi-supervised learning). However, despite the popularity and compatibility of these techniques, their efficacy in combination remains surprisingly unexplored. In this paper, we first undertake a systematic empirical investigation of this combination, finding (i) that in domain adaptation settings, self-training and contrastive learning offer significant complementary gains; and (ii) that in semi-supervised learning settings, surprisingly, the benefits are not synergistic. Across eight distribution shift datasets (e.g., BREEDs, WILDS), we demonstrate that the combined method obtains 3-8% higher accuracy than either approach independently.

Online label shift: Optimal dynamic regret meets practical algorithms

Advances in Neural Information Processing Systems

2024

This paper focuses on supervised and unsupervised online label shift, where the class marginals vary but the class-conditionals remain invariant. In the unsupervised setting, our goal is to adapt a learner, trained on some offline labeled data, to changing label distributions given unlabeled online data. In the supervised setting, we must both learn a classifier and adapt to the dynamically evolving class marginals given only labeled online data. We develop novel algorithms that reduce the adaptation problem to online regression and guarantee optimal dynamic regret without any prior knowledge of the extent of drift in the label distribution. Our solution is based on bootstrapping the estimates of online regression oracles that track the drifting proportions. Experiments across numerous simulated and real-world online label shift scenarios demonstrate the superior performance of our proposed approaches, often achieving 1-3% improvement in accuracy while being sample and computationally efficient.

Resolving the Human-subjects Status of Machine Learning's Crowdworkers: What ethical framework should govern the interaction of ML researchers and crowdworkers?

Queue

2023

In recent years, machine learning (ML) has relied heavily on crowdworkers both for building datasets and for addressing research questions requiring human interaction or judgment. The diversity of both the tasks performed and the uses of the resulting data render it difficult to determine when crowdworkers are best thought of as workers versus human subjects. These difficulties are compounded by conflicting policies, with some institutions and researchers regarding all ML crowdworkers as human subjects and others holding that they rarely constitute human subjects. Notably few ML papers involving crowdwork mention IRB oversight, raising the prospect of non-compliance with ethical and regulatory requirements.

Media Appearances

What’s wrong with “explainable A.I.”

Is AI overhyped? Researchers weigh in on technology's promise and problems

Articles

Deep equilibrium based neural operators for steady-state PDEs

Identifying Game-Based Digital Biomarkers of Cognitive Risk for Adolescent Substance Misuse: Protocol for a Proof-of-Concept Study