Advancing the Fundamental Science of AI Safety
Opportunities for Funding
Safety in the Inference-Time Compute Paradigm: Expression of Interest
AI Safety Science
The Challenge
AI technology is becoming more consequential every day, and the potential harm from safety failures is growing accordingly.
We do not have a robust ecosystem of safety benchmarks and evaluations.
Robust, reliable benchmarks for assessing model performance are scarce, and existing benchmarks often correlate highly with one another, suggesting they may not measure distinct, independent capabilities; a minimal sketch of how such correlation can be checked appears at the end of this section. Established evaluation methods are also lacking for many emerging agentic and multi-modal capabilities.
Current philanthropic and government funding of AI safety research is insufficient.
One estimate puts total funding for AI safety research at only $80-130 million per year over the 2021-2024 period (LessWrong, 2024). This level of funding precludes the larger, longer-term research efforts that would require more talent, compute, and time.
Academics are underleveraged in AI safety research.
Today, safety research on the largest AI models is conducted primarily by leading AI labs. Given the pace and impact of advances in machine learning, we believe that, with adequate resources, university faculty and students can contribute to a more thorough understanding of large language models (LLMs) and develop fundamental evaluation methods. However, it is estimated that only a small fraction (1-3%) of AI publications focuses on safety (Toner et al., 2022; Emerging Tech Observatory, 2024), indicating a need for increased investment.
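The correlation concern above can be made concrete with a small, self-contained sketch. The benchmark scores below are hypothetical, invented purely for illustration; the point is simply that when two benchmarks rank a common set of models almost identically, the second benchmark adds little independent evidence about distinct capabilities.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical accuracy scores for the same six models on two benchmarks
# (a real analysis would use published leaderboard results instead).
benchmark_a = np.array([0.62, 0.71, 0.75, 0.81, 0.86, 0.90])
benchmark_b = np.array([0.55, 0.66, 0.70, 0.79, 0.83, 0.88])

# Spearman rank correlation: a value near 1.0 means both benchmarks order
# the models almost identically, so the second adds little independent signal.
rho, p_value = spearmanr(benchmark_a, benchmark_b)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```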
Program Goals
- Deepen our understanding of safety properties of AI systems
- Create principled methodologies for developing benchmarks
- Advance safety approaches resistant to obsolescence from fast-evolving technology
- Support the development of a global, technical AI safety community
Research Agenda
Supporting scientific advances that can be broadly applied to safety criteria and testing methodologies for large classes of models
We intend to support research in the following areas:
- Assurance: How can we give users of generative AI systems confidence that those systems are safe to use?
- Generalizability: How can we generalize the results of LLM testing, given the scale and complexity of LLMs and the breadth of models that exist?
- Testing and Evaluation Frameworks: How can testing be automated to comprehensively evaluate generative AI systems despite their vastness, diversity, and rapid evolution?
- Applied Research: We intend to support the creation of high-quality benchmarks that address existing safety challenges and inform theoretical work.
Featured Projects
- Dr. Sanjeev Arora, Princeton University
- Dr. Eugene Bagdasarian & Dr. Shlomo Zilberstein, University of Massachusetts Amherst
- Dr. Yoshua Bengio, Mila - Quebec Artificial Intelligence Institute
- Dr. Nicolas Flammarion, EPFL (Swiss Federal Institute of Technology Lausanne)
- Dr. Adam Gleave and Kellin Pelrine, FAR.AI, and Dr. Thomas Costello, American University and MIT
- Dr. Tatsu Hashimoto, Stanford University
- Dr. Matthias Hein, University of Tübingen and Dr. Jonas Geiping, ELLIS Institute Tübingen
- Dr. Daniel Kang, University of Illinois Urbana-Champaign
- Dr. Mykel Kochenderfer, Stanford University
- Dr. Zico Kolter, Carnegie Mellon University
- Dr. Sanmi Koyejo, Stanford University
- Dr. David Krueger, University of Cambridge
- Dr. Anna Leshinskaya, University of California, Irvine
- Dr. Bo Li, University of Illinois Urbana-Champaign
- Dr. Sharon Li, University of Wisconsin-Madison
- Dr. Evan Miyazono and Daniel Windham, Atlas Computing
- Dr. Karthik Narasimhan, Princeton University
- Dr. Arvind Narayanan, Princeton University
- Dr. Aditi Raghunathan, Dr. Aviral Kumar, and Dr. Andrea Bajcsy, Carnegie Mellon University
- Dr. Maarten Sap and Dr. Graham Neubig, Carnegie Mellon University
- Dr. Dawn Song, University of California, Berkeley
- Dr. Huan Sun, Dr. Yu Su, and Dr. Zhiqiang Lin, The Ohio State University
- Dr. Florian Tramèr, ETH Zurich
- Dr. Ziang Xiao, Johns Hopkins University and Dr. Susu Zhang, University of Illinois Urbana-Champaign