AI Safety Science

Humility by Design: AI That Asks First

Dylan Hadfield-Menell

Program: AI2050
School: Massachusetts Institute of Technology
Field of Study: Artificial Intelligence

Through the development of GOOD (Goals from Open-Ended Dialogue), AI2050 Fellow Dylan Hadfield-Menell demonstrated that AI systems can track uncertainty about users’ goals and pause to ask clarifying questions before acting. His research aims to reduce premature actions taken by agents, making them safer and easier to correct in real-world use.

As AI assistants (also known as AI agents) begin handling our calendars, inboxes and finances, one misplaced assumption about our goals can cascade into mistakes that extend far beyond our chat windows: imagine an assistant emailing sensitive information, paying a fraudulent invoice or even shutting down a critical public service, all before you even notice. AI2050 Early Career Fellow Dr. Dylan Hadfield-Menell’s work aims to prevent these mistakes by designing AI agents that methodically check in with people about their goals instead of making assumptions about what they’re trying to achieve.

Hadfield-Menell, an associate professor of electrical engineering and computer science at the Massachusetts Institute of Technology, focuses his research on a deceptively simple idea: humility can make AI safer. Today’s AI models are trained to guess confidently when given a task, which often leads them to latch onto a single interpretation of what a user wants and run with it. But Hadfield-Menell believes these systems would make better decisions if they asked more questions about what you’re trying to accomplish.

“What we’re seeing a lot with AI systems now is that they jump to conclusions about what you want and then follow through,” says Hadfield-Menell. “It feels like the system is getting things done. But when you look at the details, it’s like, oh, this wasn’t really what I wanted.”

In experiments, Hadfield-Menell’s team found that even well-tuned agents can act prematurely on shaky inferences about a user’s request. His solution to this problem is a structured approach called GOOD, short for Goals from Open-Ended Dialogue. GOOD prompts AI agents to pause, list a range of plausible user goals, rank how likely each one is, ask the user clarifying questions and act only once uncertainty about those goals has been reduced.
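To make the idea concrete, here is a minimal sketch of such a decision step in Python. The names (CandidateGoal, good_step), the hand-written probabilities and the confidence threshold are all illustrative assumptions, not Hadfield-Menell’s implementation:

```python
# A minimal sketch of a GOOD-style decision step: ask while uncertain,
# act once confident. Names, probabilities and the threshold are
# illustrative assumptions, not the published implementation.
from dataclasses import dataclass


@dataclass
class CandidateGoal:
    description: str
    probability: float  # the agent's current belief in this interpretation


def good_step(goals: list[CandidateGoal], confidence_threshold: float = 0.8):
    """Return ("act", goal) if confident enough, else ("ask", question)."""
    goals = sorted(goals, key=lambda g: g.probability, reverse=True)
    top, runner_up = goals[0], goals[1]
    if top.probability >= confidence_threshold:
        return "act", top.description
    # Uncertainty is still high: ask about the most plausible alternatives.
    question = (f"Before I proceed, did you mean '{top.description}' "
                f"or '{runner_up.description}'?")
    return "ask", question


goals = [
    CandidateGoal("pay the March invoice", 0.55),
    CandidateGoal("flag the March invoice as suspicious", 0.35),
    CandidateGoal("draft an email questioning the invoice", 0.10),
]
print(good_step(goals))  # -> ('ask', "Before I proceed, did you mean ...")
```

In a real system, the candidate goals and their probabilities would come from the underlying model rather than being hand-written; the sketch only captures the control flow of asking while uncertain and acting once confident.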

GOOD is designed for the messy reality of human decision-making, where goals shift as new information arrives. In the team’s experiments, people rarely came in with a perfectly formed plan; instead, they refined their preferences mid-conversation, changed course after learning what the agent could do, or updated their goals as new constraints emerged. Systems built with GOOD track these shifts seamlessly. “You can’t hide from the fact that people’s goals change during an interaction,” Hadfield-Menell says. “Any realistic system has to deal with that.”
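One simple way to handle such mid-conversation shifts, again as an illustrative sketch rather than the published method, is to re-weight the candidate goals Bayes-style after each user message, with likelihood scores (how consistent the new message is with each goal) supplied by the model:

```python
# Re-weight candidate goals after a new user message (Bayes-style).
# The likelihood scores are hand-picked here to illustrate a goal shift;
# in practice they would come from the model itself.
def update_goal_beliefs(goals, likelihoods):
    """goals: list of (description, prior) pairs; returns posteriors."""
    weights = [prior * lik for (_, prior), lik in zip(goals, likelihoods)]
    total = sum(weights) or 1.0  # guard against an all-zero update
    return [(desc, w / total) for (desc, _), w in zip(goals, weights)]


goals = [("pay the invoice", 0.6), ("dispute the invoice", 0.4)]
# The user's reply fits the second goal far better, so the ranking flips
# and the agent's plan should change with it.
print(update_goal_beliefs(goals, [0.1, 0.9]))
# -> pay the invoice ~0.14, dispute the invoice ~0.86
```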

His team is now pushing GOOD’s approach into new domains, integrating it into real AI agent workflows and extending the method to vision-based and robotic systems. As AI agents take on more responsibility in everyday life, Hadfield-Menell hopes methods like GOOD will help ensure they act with the humility, deference and reliability that people and society require.

Dr. Hadfield-Menell says, “Schmidt Sciences’ AI2050 support helped accelerate this line of research at a critical moment. The grant funded graduate student support, provided the computing resources needed for large-scale experiments and gave me the freedom to pursue unconventional directions in AI safety.”