Zachary Lipton

Associate Professor, Carnegie Mellon University

  • Pittsburgh, PA

Zachary Lipton's research spans machine learning methods and their applications in healthcare and natural language processing.

Contact

Carnegie Mellon University


Biography

Zachary Lipton is the Chief Technology Officer and Chief Scientist at Abridge, where he oversees the builder organization responsible for all product development and AI research. He is also the Raj Reddy Associate Professor of Machine Learning at Carnegie Mellon University, where he directs the Approximately Correct Machine Intelligence (ACMI) lab. The lab's research focuses on the theoretical and engineering foundations of robust and adaptive machine learning algorithms, applications to prediction and decision-making problems in clinical medicine and natural language processing, and the impact of machine learning systems on society. He is the founder of the Approximately Correct blog (approximatelycorrect.com) and a co-author of Dive Into Deep Learning, an interactive open-source book drafted entirely in Jupyter notebooks that has reached millions of readers.

Areas of Expertise

Machine Learning
Machine Intelligence
Natural Language Processing (NLP)
Deep Learning

Media Appearances

For Shiv Rao, practicing doctor and founder of $2.75 billion AI startup Abridge, innovation is an art form

Fortune  online

2025-02-17

This is a feature on Shiv Rao, a CMU alumnus and the founder and CEO of Abridge, which uses AI to turn doctor-patient conversations into clinical notes in real time. Zack Lipton (School of Computer Science) is Abridge's CTO, who set aside a life as a professional jazz saxophonist to join Rao.


OpenAI shakeup has rocked Silicon Valley, leaving some techies concerned about future of AI

CNBC  online

2023-11-20

“I imagine Microsoft might ask for a board seat next time they decide to plow $15 billion into a startup,” said Zachary Lipton, a Carnegie Mellon University professor of machine learning and operations research.


What’s the Future for A.I.?

The New York Times  online

2023-04-04

“This will affect tasks that are more repetitive, more formulaic, more generic,” said Zachary Lipton, a professor at Carnegie Mellon who specializes in artificial intelligence and its impact on society.



Education

UC San Diego

Ph.D.

Computer Science

2017

UC San Diego

M.S.

Computer Science

2015

Columbia University

B.A.

Mathematics - Economics

2007

Articles

Complementary benefits of contrastive learning and self-training under distribution shift

Advances in Neural Information Processing Systems

2024

Self-training and contrastive learning have emerged as leading techniques for incorporating unlabeled data, both under distribution shift (unsupervised domain adaptation) and when it is absent (semi-supervised learning). However, despite the popularity and compatibility of these techniques, their efficacy in combination remains surprisingly unexplored. In this paper, we first undertake a systematic empirical investigation of this combination, finding (i) that in domain adaptation settings, self-training and contrastive learning offer significant complementary gains; and (ii) that in semi-supervised learning settings, surprisingly, the benefits are not synergistic. Across eight distribution shift datasets (e.g., BREEDs, WILDS), we demonstrate that the combined method obtains 3-8% higher accuracy than either approach independently.


Online label shift: Optimal dynamic regret meets practical algorithms

Advances in Neural Information Processing Systems

2024

This paper focuses on supervised and unsupervised online label shift, where the class marginals vary but the class-conditionals remain invariant. In the unsupervised setting, our goal is to adapt a learner, trained on some offline labeled data, to changing label distributions given unlabeled online data. In the supervised setting, we must both learn a classifier and adapt to the dynamically evolving class marginals given only labeled online data. We develop novel algorithms that reduce the adaptation problem to online regression and guarantee optimal dynamic regret without any prior knowledge of the extent of drift in the label distribution. Our solution is based on bootstrapping the estimates of online regression oracles that track the drifting proportions. Experiments across numerous simulated and real-world online label shift scenarios demonstrate the superior performance of our proposed approaches, often achieving 1-3% improvement in accuracy while being sample- and computationally efficient.


Resolving the Human-subjects Status of Machine Learning's Crowdworkers: What ethical framework should govern the interaction of ML researchers and crowdworkers?

Queue

2023

In recent years, machine learning (ML) has relied heavily on crowdworkers both for building datasets and for addressing research questions requiring human interaction or judgment. The diversity of both the tasks performed and the uses of the resulting data render it difficult to determine when crowdworkers are best thought of as workers versus human subjects. These difficulties are compounded by conflicting policies, with some institutions and researchers regarding all ML crowdworkers as human subjects and others holding that they rarely constitute human subjects. Notably, few ML papers involving crowdwork mention IRB oversight, raising the prospect of non-compliance with ethical and regulatory requirements.

