Zico Kolter

Professor of Computer Science and Head of the Machine Learning Department, Carnegie Mellon University

Pittsburgh, PA

Zico Kolter researches how to make deep learning algorithms safer and more robust, and how training data shapes the way models function.

Contact

Carnegie Mellon University


Biography

Zico Kolter is a Professor of Computer Science and the head of the Machine Learning Department at Carnegie Mellon University, where he has been on the faculty since 2012. Zico completed his Ph.D. in computer science at Stanford University in 2010, followed by a postdoctoral fellowship at MIT from 2010 to 2012. Throughout his career, he has made significant contributions to the field of machine learning, authoring numerous award-winning papers at prestigious conferences such as NeurIPS, ICML, and AISTATS.

Zico's research includes developing the first methods for creating deep learning models with guaranteed robustness. He pioneered techniques for embedding hard constraints into AI models using classical optimization within neural network layers. More recently, in 2023, his team developed innovative methods for automatically assessing the safety of large language models (LLMs), demonstrating the potential to bypass existing model safeguards through automated optimization techniques. Alongside his academic pursuits, Zico has worked closely with industry throughout his career, formerly as Chief Data Scientist at C3.ai, and currently as Chief Expert at Bosch and Chief Technical Advisor at Gray Swan, a startup specializing in AI safety and security.
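To make the idea of optimization inside a network layer concrete, here is a minimal sketch using the open-source cvxpylayers library, which grew out of this line of work by Kolter and collaborators. It projects a vector of raw network outputs onto the probability simplex, a hard constraint the embedded convex solver satisfies exactly; the specific projection problem is an illustrative choice, not one taken from this profile.

```python
# Minimal sketch of a differentiable convex-optimization layer: project a
# vector onto the probability simplex (entries >= 0, summing to 1).
# Illustrative example only; not a method described in this profile.
import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n = 5
x = cp.Variable(n)   # the layer's output: the solver's solution
y = cp.Parameter(n)  # the layer's input, e.g. raw logits from a network
problem = cp.Problem(
    cp.Minimize(cp.sum_squares(x - y)),
    [cp.sum(x) == 1, x >= 0],  # hard constraints, enforced exactly
)
layer = CvxpyLayer(problem, parameters=[y], variables=[x])

logits = torch.randn(n, requires_grad=True)
(proj,) = layer(logits)      # forward pass solves the small QP
loss = (proj ** 2).sum()
loss.backward()              # gradients flow back through the solver
```

Because the solution map is differentiated implicitly, such a layer trains end to end like any other network module.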

Areas of Expertise

Artificial Intelligence
AI Models
Machine Learning
Deep Learning
Neural Networks
Large Language Models
Generative AI
Elections

Media Appearances

Power Shift: How CMU Is Leading America’s Energy Evolution

CMU News, online

2025-07-11

From reimagining AI data centers to modernizing and securing the electric grid, CMU researchers are working on practical solutions to pressing challenges in how the U.S. produces, moves and secures energy.

“As work across Carnegie Mellon shows, AI has the potential to drastically improve our energy consumption by assisting in developing more efficient techniques for grid operation, building better materials for batteries, and potentially even truly revolutionizing energy by accelerating the development of technologies like nuclear fusion,” said Zico Kolter, head of the Machine Learning Department in CMU’s School of Computer Science.


Small Language Models Are the New Rage, Researchers Say

Wired, online

2025-04-13

Small language models (SLMs) are capturing the attention of researchers. Using less power than LLMs, they are not general-purpose tools; instead, they focus on narrowly defined tasks such as summarizing conversations. “The reason [SLMs] get so good with such small models and such little data is that they use high-quality data instead of the messy stuff,” said Zico Kolter (School of Computer Science).


How researchers broke ChatGPT and what it could mean for future AI development

ZDNET, online

2023-07-27

"There is no obvious solution," Zico Kolter, a professor at Carnegie Mellon and author of the report, told the Times. "You can create as many of these attacks as you want in a short amount of time."



Spotlight


Power Shift: How CMU Is Leading America’s Energy Evolution

Carnegie Mellon University, long known for its prowess in computer science and engineering, is now emerging as a key innovator within America’s energy landscape. As AI models grow more powerful, so too does their appetite for energy, straining an aging and outdated grid and prompting urgent questions about infrastructure, security and access. From reimagining AI data centers to modernizing and securing the electric grid, CMU researchers are working on practical solutions to pressing challenges in how the U.S. produces, moves and secures energy. Learn what CMU experts have to say about their Work That Matters.

Zico Kolter, Dimitrios Skarlatos, Granger Morgan, Audrey Kurth Cronin, Vyas Sekar, Larry Pileggi


Pittsburgh’s AI-Powered Renaissance

Carnegie Mellon University’s artificial intelligence experts come from a wide range of backgrounds and perspectives, representing fields including computer science, sustainability, national security and entrepreneurship. Ahead of the AI Horizons Summit highlighting the city's commitment to responsible technology, CMU experts weighed in on why they see Pittsburgh as a hub for human-centered AI.

Zico Kolter, Valerie Karplus, Ameet Talwalkar, Ira Moskowitz, Michael Mattarock, Meredith Grelli


Education

Georgetown University
B.S., Computer Science, 2005

Stanford University
Ph.D., Computer Science, 2010

Event Appearances

Speaker: AI Horizons Keynote: AI for a Better World – Navigating Truth in the AI Era

AI Horizons Pittsburgh Summit, Pittsburgh, PA

2024-10-14

Moderator: AI in Financial Services: Transforming the Sector for a Better World

AI Horizons Pittsburgh Summit, Pittsburgh, PA

2024-10-14

Articles

Scaling Laws for Data Filtering – Data Curation cannot be Compute Agnostic

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2024

Vision-language models (VLMs) are trained for thousands of GPU hours on carefully curated web datasets. In recent times, data curation has gained prominence with several works developing strategies to retain 'high-quality' subsets of 'raw' scraped data. For instance, the LAION public dataset retained only 10% of the total crawled data. However, these strategies are typically developed agnostic of the available compute for training. In this paper, we first demonstrate that making filtering decisions independent of training compute is often suboptimal: the limited high-quality data rapidly loses its utility when repeated, eventually requiring the inclusion of 'unseen' but 'lower-quality' data. To address this quality-quantity tradeoff (QQT), we introduce neural scaling laws that account for the non-homogeneous nature of web data, an angle ignored in existing literature.
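As a back-of-the-envelope illustration of the quality-quantity tradeoff, the toy model below assumes repeated data loses utility geometrically with each pass over the pool. The functional form and all numbers are invented for illustration; they are not the scaling laws fitted in the paper.

```python
# Toy illustration of the quality-quantity tradeoff (QQT): under a small
# compute budget an aggressive filter wins, but as the budget grows the
# small high-quality pool gets repeated and loses utility, until a larger,
# lower-quality pool becomes the better choice. The geometric-decay form
# and all constants are invented for illustration only.
def effective_data(pool_size, budget, decay=0.5):
    """'Effective' samples seen: the k-th pass over a fixed pool is
    worth decay**(k-1) of the first pass."""
    full_epochs, remainder = divmod(budget, pool_size)
    seen = pool_size * (1 - decay**full_epochs) / (1 - decay)
    return seen + remainder * decay**full_epochs

crawl = 1_000_000
for budget in (100_000, 1_000_000, 10_000_000):
    strict = 1.0 * effective_data(0.1 * crawl, budget)  # keep 10%, quality 1.0
    loose = 0.6 * effective_data(0.5 * crawl, budget)   # keep 50%, quality 0.6
    winner = "strict filter" if strict > loose else "loose filter"
    print(f"budget {budget:>10,}: {winner}")
```

Under this toy model the strict filter wins at the smallest budget, but the loose filter overtakes it as compute grows, which is the qualitative behavior the paper's compute-aware curation argument describes.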


TOFU: A Task of Fictitious Unlearning for LLMs

arXiv preprint

2024

Large language models trained on massive corpora of data from the web can memorize and reproduce sensitive or private data, raising both legal and ethical concerns. Unlearning, or tuning models to forget information present in their training data, provides us with a way to protect private data after training. Although several methods exist for such unlearning, it is unclear to what extent they result in models equivalent to those where the data to be forgotten was never learned in the first place. To address this challenge, we present TOFU, a Task of Fictitious Unlearning, as a benchmark aimed at helping deepen our understanding of unlearning. We offer a dataset of 200 diverse synthetic author profiles, each consisting of 20 question-answer pairs, and a subset of these profiles called the forget set that serves as the target for unlearning.
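To make the unlearning setting concrete, the sketch below runs one step of a simple baseline from the unlearning literature, gradient ascent on a forget-set example, using a Hugging Face causal LM. The model name and the QA pair are placeholders, not TOFU's actual experimental setup.

```python
# One step of a simple unlearning baseline: gradient *ascent* on forget data.
# Placeholder model and QA pair; not TOFU's actual setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # placeholder; the benchmark fine-tunes larger chat models
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

# A fictitious-author QA pair standing in for one forget-set example.
text = "Question: Where was the author Jane Doe born? Answer: Altheim."
batch = tok(text, return_tensors="pt")

model.train()
loss = model(**batch, labels=batch["input_ids"]).loss
(-loss).backward()  # maximize loss on forget data to push it out of the model
opt.step()
opt.zero_grad()
```

Whether steps like this leave the model indistinguishable from one that never saw the data is exactly the question the benchmark is designed to probe.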


Massive Activations in Large Language Models

arXiv preprint

2024

We observe an empirical phenomenon in Large Language Models (LLMs) -- very few activations exhibit significantly larger values than others (e.g., 100,000 times larger). We call them massive activations. First, we demonstrate the widespread existence of massive activations across various LLMs and characterize their locations. Second, we find their values largely stay constant regardless of the input, and they function as indispensable bias terms in LLMs. Third, these massive activations lead to the concentration of attention probabilities to their corresponding tokens, and further, implicit bias terms in the self-attention output. Last, we also study massive activations in Vision Transformers.
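A quick way to look for the phenomenon in practice is to compare each layer's largest hidden-state magnitude against its median, as in the sketch below. The small model here is a placeholder for convenience; the paper reports the effect in larger LLMs, where the ratio is far more extreme.

```python
# Probe for "massive activations": per layer, compare the largest
# hidden-state magnitude to the median magnitude. Placeholder model;
# the paper studies larger LLMs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # placeholder model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, output_hidden_states=True)

batch = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).hidden_states  # embeddings + one tensor per layer

for i, h in enumerate(hidden):
    mags = h.abs()
    ratio = (mags.max() / mags.median()).item()
    print(f"layer {i:2d}: max/median activation magnitude = {ratio:9.1f}")
```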

