
AI Safety Science

Safety in the Inference-Time Compute Paradigm: Expression of Interest

Apr 8, 2025

Overview

Schmidt Sciences is a philanthropic organization that accelerates scientific knowledge and breakthroughs to support a thriving world. We are pleased to announce a new request for proposals to support research on technical AI safety, focused on the inference-time compute paradigm.

The goal of the AI Safety Science program is to deepen our understanding of the safety properties of systems built with large language models (LLMs) and to develop well-founded, concrete, implementable technical methods for testing, evaluating, and improving the safety of LLMs. The program aims to improve our understanding of testing methodologies, drive methods for designing safer AI systems, grow the technical AI safety academic community, and ensure that safety theory is informed by practice, and vice versa.

Schmidt Sciences seeks research proposals to advance the science of AI safety, focused specifically on the inference-time compute paradigm. Research proposals should anticipate a budget of up to USD $500,000 for projects lasting 12 to 18 months.

Context

Large language models have entered a new scaling paradigm. These models have historically been characterized by scaling laws relating model size, dataset size, and pre-training compute to model performance (Kaplan et al., 2020). However, recent breakthroughs across frontier AI labs demonstrate a different approach with apparently better performance (e.g., OpenAI’s o1 and o3, DeepSeek’s R1, Google DeepMind’s Flash Thinking, xAI’s Grok 3 (Think), Anthropic’s Claude 3.7 Sonnet). Through reinforcement learning and other techniques, AI labs have developed methods to efficiently allocate more compute at inference time and thereby dramatically improve LLM performance. This has created “reasoning models”: LLMs that leverage additional compute at inference time through RL-optimized chain-of-thought reasoning, recursive refinement, and other methods.

These reasoning models can break down complex problems into steps, verify intermediate conclusions, and explore multiple approaches before producing a final answer (e.g., Guo et al., 2025; Muennighoff et al., 2025; Snell et al., 2024; Geiping et al., 2025). This new inference-time compute paradigm (or test-time compute paradigm) demands rigorous scientific investigation to assure safety, because we expect novel failure modes, novel opportunities for safeguards, and emergent behaviors to arise when models engage in increasingly complex multi-step reasoning at inference time. In particular, in this Request for Proposals, we seek swiftly conducted novel research that addresses current unknowns and makes progress toward concrete outputs, such as tools, models, or other research artifacts that enable further progress.
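
As a concrete illustration of how extra compute is spent at inference time, here is a minimal sketch, in Python, of one such technique: self-consistency, which samples several independent reasoning chains and majority-votes over their final answers. The `sample_chain` stub is a hypothetical stand-in for a real model call, not any particular lab’s implementation.

```python
# Minimal sketch of one inference-time compute technique: self-consistency,
# i.e., majority voting over several independently sampled reasoning chains.
# `sample_chain` is a hypothetical stand-in for a reasoning-model API call.

import random
from collections import Counter
from typing import Callable

def sample_chain(prompt: str) -> str:
    """Hypothetical model call: returns the final answer extracted from one
    sampled chain of thought. Stubbed here with a toy answer distribution."""
    return random.choice(["42", "42", "41"])

def self_consistency(prompt: str,
                     sampler: Callable[[str], str],
                     n_samples: int = 16) -> str:
    """Spend more inference-time compute by sampling n_samples independent
    reasoning chains, then return the most common final answer."""
    answers = [sampler(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

if __name__ == "__main__":
    # More samples means more inference-time compute and, typically,
    # higher accuracy on reasoning benchmarks.
    print(self_consistency("What is 6 * 7?", sample_chain, n_samples=16))
```

The same budget can instead buy longer RL-optimized chains of thought or recursive refinement; the safety-relevant point is that model behavior now varies with how much compute is spent at inference time.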

Core question

We are interested in funding the crucial work needed both to understand the implications of this paradigm for model safety and to learn how inference-time compute can be used to actively make LLMs safer. (For a detailed discussion of how Schmidt Sciences thinks about safety, see our website and research agenda.)

Our core RFP question: What is the most critical technical AI safety challenge or opportunity that has emerged as a result of the inference-time compute paradigm? How would you address it?

Illustrative examples of project ideas

This section is not designed to direct or constrain your creative thinking about the hardest safety problems in inference-time compute. Rather, it provides illustrations of problems that might be considered both challenging and worthy of study. These ideas do not represent the full scope of the inference-time compute paradigm.

We encourage applications for research that discovers novel failure modes emerging from inference-time compute, demonstrates that recently surfaced problems replicate (confirming their validity), designs robust evaluations that quantify and measure the associated risks, or constructs targeted interventions that actively enhance model safety.

Projects should aim to produce tangible research outcomes that advance the scientific understanding of inference-time compute safety—such as theoretical analyses, rigorously validated evaluation designs, mitigation strategies, functional prototype implementations, or reproducible experimental results.
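
As one hypothetical illustration of such a tangible outcome, the sketch below measures how an unsafe-output rate changes as the inference-time compute budget grows. `query_model` and `is_unsafe` are placeholders introduced for illustration only; a real evaluation would substitute an actual model endpoint and a validated unsafe-content judge.

```python
# Hypothetical sketch of a safety evaluation for the inference-time compute
# paradigm: sweep a reasoning budget and track an unsafe-output rate, since
# failure modes may appear (or disappear) only at larger budgets.

from typing import Callable, Iterable

def unsafe_rate(prompts: Iterable[str],
                query_model: Callable[[str, int], str],
                is_unsafe: Callable[[str], bool],
                budget: int) -> float:
    """Fraction of prompts that yield an unsafe completion at a given
    reasoning budget (e.g., a maximum number of reasoning tokens)."""
    prompts = list(prompts)
    flagged = sum(is_unsafe(query_model(p, budget)) for p in prompts)
    return flagged / len(prompts)

def sweep_budgets(prompts: Iterable[str],
                  query_model: Callable[[str, int], str],
                  is_unsafe: Callable[[str], bool],
                  budgets=(256, 1024, 4096)) -> dict:
    """Evaluate the safety metric across several compute budgets."""
    prompts = list(prompts)  # allow multiple passes over the prompt set
    return {b: unsafe_rate(prompts, query_model, is_unsafe, b) for b in budgets}

if __name__ == "__main__":
    # Toy stubs so the sketch runs end to end; replace with a real model
    # endpoint and a validated unsafe-content classifier.
    demo_model = lambda prompt, budget: f"completion at budget {budget}"
    demo_judge = lambda completion: "4096" in completion  # toy high-budget failure
    print(sweep_budgets(["p1", "p2"], demo_model, demo_judge))
```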

Out of scope topics

Application process

This RFP will have two stages: an expression of interest (EoI) and a full proposal.

Stage 1: Expressions of Interest (EoIs)

At this stage, applicants submit up to 500 words addressing the core RFP question. We seek concisely described ideas for rigorous research that can, in a relatively short period of time, lead to progress on significant issues or opportunities in safety. The ideal project would focus on one critical technical safety problem or opportunity relevant to inference-time computation and propose concrete, tangible methods for addressing it. The response should contain a crisp statement of what you think is the most critical technical challenge or opportunity related to AI safety in the evolving inference-time compute paradigm, why it is the most critical, and brief highlights of how you would tackle that challenge.

We encourage efforts that focus on the full range of available reasoning models, research that aims to produce generalizable insights rather than isolated model evaluations, and efforts that seek to develop a rigorous scientific understanding of this new paradigm.

Submissions must include the following information:

All expressions of interest must be submitted in English. Researchers may submit more than one expression of interest. Each expression of interest may be up to 500 words; references do not count toward the word limit. Relevant diagrams or figures may be uploaded, but other types of uploads will not be reviewed.

The deadline for EoIs is Wednesday, April 30, 2025 at 11:59 PM, Anywhere on Earth. Late submissions will not be accepted.

Stage 2: Full proposals

After reviewing EoI submissions, we will invite a subset of applicants to submit full project proposals. Full proposals include more detail on goals, research plans, research outputs, and a detailed budget.

Submissions will be assessed on criteria that include:

All full proposals must be submitted in English.

Project duration

We are looking to fund 12- to 18-month projects. Not all teams are positioned to jump right into this work in Q3 2025 and deliver results within 12 to 18 months. With that in mind, applicants invited to submit a full proposal will have the opportunity to outline their team’s capabilities and existing infrastructure for starting this work quickly.

Project resources

Awards will be up to USD $500,000 per project. Some projects might require much less funding to execute.

In addition to funding, the Safety Science program provides:

Eligibility

We invite individual researchers, research teams, research institutions, and multi-institution collaborations in university, national laboratory, institute, non-profit research organization, or agency settings to submit research ideas.

We encourage collaborations across geographic boundaries, particularly outside North America and Western Europe. International applicants are welcome, and there is no requirement to include U.S.-based institutions.

Indirect costs of any project that we fund must be at or below 10% to comply with our policy.

Timeline

Office hours

We will hold virtual office hours on the following dates and times:

Please sign up for office hours here, and sign up for only one office hours slot.

Questions

Please send any questions to Ryan Gajarawala at aisafety@schmidtsciences.org
