Zico Kolter

Professor of Computer Science and Head of the Machine Learning Department, Carnegie Mellon University

  • Pittsburgh, PA

Zico Kolter researches how to make deep learning algorithms more robust and safer, and how data impacts the way models function.

Contact

Carnegie Mellon University

Biography

Zico Kolter is a Professor of Computer Science and Head of the Machine Learning Department at Carnegie Mellon University, where he has been on the faculty for 12 years. Zico completed his Ph.D. in computer science at Stanford University in 2010, followed by a postdoctoral fellowship at MIT from 2010 to 2012. Throughout his career, he has made significant contributions to the field of machine learning, authoring numerous award-winning papers at prestigious conferences such as NeurIPS, ICML, and AISTATS.

Zico's research includes developing the first methods for creating deep learning models with guaranteed robustness. He pioneered techniques for embedding hard constraints into AI models using classical optimization within neural network layers. More recently, in 2023, his team developed innovative methods for automatically assessing the safety of large language models (LLMs), demonstrating the potential to bypass existing model safeguards through automated optimization techniques. Alongside his academic pursuits, Zico has worked closely with industry throughout his career, formerly as Chief Data Scientist at C3.ai, and currently as Chief Expert at Bosch and Chief Technical Advisor at Gray Swan, a startup specializing in AI safety and security.

Areas of Expertise

Elections
Large Language Models
Generative AI
Neural Networks
Deep Learning
Machine Learning
AI Models

Media Appearances

Small Language Models Are the New Rage, Researchers Say

Wired  online

2025-04-13

Small Language Models (SLMs) are capturing the attention of researchers. Using less power than LLMs, they are not general-purpose tools; instead, they focus on narrowly defined tasks such as summarizing conversations. “The reason [SLMs] get so good with such small models and such little data is that they use high-quality data instead of the messy stuff,” said Zico Kolter (School of Computer Science).

The AI Agent Era Requires a New Kind of Game Theory

Wired  online

2025-04-09

Zico Kolter, a Carnegie Mellon professor and board member at OpenAI, tells WIRED about the dangers of AI agents interacting with one another—and why models need to be more resistant to attacks.

Meet Alphalab’s 2025 cohort of innovative Pittsburgh startups

Technical.ly  online

2025-03-19

Zico Kolter (School of Computer Science) will join ex-Google chief Eric Schmidt's AI Safety Science program. Schmidt is spending $10M on fundamental research into safety problems in AI. Kolter's role will focus on AI attacks.

Spotlight

Pittsburgh’s AI-Powered Renaissance

Carnegie Mellon University’s artificial intelligence experts come from a wide range of backgrounds and perspectives, representing fields including computer science, sustainability, national security and entrepreneurship. Ahead of the AI Horizons Summit highlighting the city's commitment to responsible technology, CMU experts weighed in on why they see Pittsburgh as a hub for human-centered AI.

Zico Kolter, Valerie Karplus, Ameet Talwalkar, Ira Moskowitz, Michael Mattarock, Meredith Grelli

Education

Stanford University

Ph.D.

Computer Science

2010

Georgetown University

B.S.

Computer Science

2005

Event Appearances

Moderator: AI in Financial Services: Transforming the Sector for a Better World

AI Horizons Pittsburgh Summit  Pittsburgh, PA

2024-10-14

Speaker: AI Horizons Keynote: AI for a Better World – Navigating Truth in the AI Era

AI Horizons Pittsburgh Summit  Pittsburgh, PA

2024-10-14

Articles

Rethinking LLM Memorization through the Lens of Adversarial Compression

arXiv preprint

2024

Large language models (LLMs) trained on web-scale datasets raise substantial concerns regarding permissible data usage. One major question is whether these models "memorize" all their training data or whether they integrate many data sources in some way more akin to how a human would learn and synthesize information. The answer hinges, to a large degree, on how we define memorization. In this work, we propose the Adversarial Compression Ratio (ACR) as a metric for assessing memorization in LLMs. A given string from the training data is considered memorized if it can be elicited by a prompt (much) shorter than the string itself -- in other words, if these strings can be "compressed" with the model by computing adversarial prompts of fewer tokens. The ACR overcomes the limitations of existing notions of memorization by (i) offering an adversarial view of measuring memorization, especially for monitoring unlearning and compliance; and (ii) allowing for the flexibility to measure memorization for arbitrary strings at reasonably low compute.
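For intuition, the ratio itself is simple once an adversarial prompt has been found; the sketch below is a minimal illustration rather than the paper's implementation, and it assumes the prompt-search step, using a made-up passage, prompt, and tokenizer.

```python
# Illustrative sketch of the Adversarial Compression Ratio (ACR) described
# above: a training string counts as memorized if some prompt with fewer
# tokens than the string reliably elicits it. The short prompt here is a
# made-up stand-in for the output of an adversarial prompt search, which is
# the expensive part and is not shown.

from transformers import AutoTokenizer

def adversarial_compression_ratio(target: str, prompt: str, tokenizer) -> float:
    """Ratio of target length to eliciting-prompt length, measured in tokens."""
    target_tokens = tokenizer.encode(target, add_special_tokens=False)
    prompt_tokens = tokenizer.encode(prompt, add_special_tokens=False)
    return len(target_tokens) / max(len(prompt_tokens), 1)

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # any tokenizer works for the ratio

elicited_passage = "It was the best of times, it was the worst of times, it was the age of wisdom..."
short_prompt = "best of times, worst of"  # hypothetical adversarial prompt

acr = adversarial_compression_ratio(elicited_passage, short_prompt, tokenizer)
print(f"ACR = {acr:.2f}")  # under this notion, ACR > 1 flags the passage as memorized
```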

Forcing Diffuse Distributions out of Language Models

arXiv preprint

2024

Despite being trained specifically to follow user instructions, today's instruction-tuned language models perform poorly when instructed to produce random outputs. For example, when prompted to pick a number uniformly between one and ten, Llama-2-13B-chat disproportionately favors the number five, and when tasked with picking a first name at random, Mistral-7B-Instruct chooses Avery 40 times more often than we would expect based on the U.S. population. When these language models are used for real-world tasks where diversity of outputs is crucial, such as language model-assisted dataset construction, their inability to produce diffuse distributions over valid choices is a major hurdle. In this work, we propose a fine-tuning method that encourages language models to output distributions that are diffuse over valid outcomes. The methods we introduce generalize across a variety of tasks and distributions and make large language models practical for synthetic dataset generation with little human intervention.
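A rough way to see the phenomenon the abstract describes is to sample a chat model many times and measure how far its answers sit from a uniform distribution, for example with a KL divergence. The sketch below does this; the model choice, prompt wording, and sample count are assumptions, and the paper's actual fine-tuning method is not reproduced.

```python
# Rough reproduction of the evaluation described above: sample a chat model
# repeatedly for a "random" number between one and ten and compare the
# empirical answer distribution to uniform via KL divergence. The model name
# (gated, large), prompt wording, and sample count are assumptions; any
# instruction-tuned checkpoint can be substituted.

import math
import re
from collections import Counter

from transformers import pipeline

generator = pipeline("text-generation", model="meta-llama/Llama-2-13b-chat-hf")

prompt = "Pick a number uniformly at random between one and ten. Reply with the number only."
counts = Counter()

for _ in range(200):
    text = generator(prompt, max_new_tokens=5, do_sample=True)[0]["generated_text"]
    match = re.search(r"\b([1-9]|10)\b", text[len(prompt):])
    if match:
        counts[int(match.group(1))] += 1

total = sum(counts.values())
if total:
    kl = sum((c / total) * math.log((c / total) / 0.1) for c in counts.values())
    print(dict(counts))
    print(f"KL divergence from uniform: {kl:.3f} (0.0 would be perfectly diffuse)")
```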

Massive Activations in Large Language Models

arXiv preprint

2024

We observe an empirical phenomenon in Large Language Models (LLMs) -- very few activations exhibit significantly larger values than others (e.g., 100,000 times larger). We call them massive activations. First, we demonstrate the widespread existence of massive activations across various LLMs and characterize their locations. Second, we find their values largely stay constant regardless of the input, and they function as indispensable bias terms in LLMs. Third, these massive activations lead to the concentration of attention probabilities to their corresponding tokens, and further, implicit bias terms in the self-attention output. Last, we also study massive activations in Vision Transformers.
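One informal way to probe for this effect is to expose a model's hidden states and compare the largest activation magnitude in each layer against a typical (median) magnitude. The sketch below does so with GPT-2 as a small, ungated stand-in rather than the large LLMs analyzed in the paper.

```python
# Quick probe for the "massive activations" phenomenon described above: run a
# single forward pass with hidden states exposed and report, per layer, how far
# the largest activation magnitude sits above the median magnitude. GPT-2 is
# used only because it is small and ungated; the paper studies much larger LLMs
# and analyzes which tokens and dimensions carry these outliers.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # stand-in model
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, output_hidden_states=True)
model.eval()

inputs = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    hidden_states = model(**inputs).hidden_states  # tuple of (1, seq_len, hidden_dim) tensors

for layer_idx, h in enumerate(hidden_states):
    mags = h.abs().flatten()
    ratio = (mags.max() / mags.median()).item()
    print(f"layer {layer_idx:2d}: max/median activation magnitude ≈ {ratio:,.1f}")
```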
