Aditi Raghunathan
| Program | AI2050 |
| School | Carnegie Mellon University |
| Field of Study | AI & Advanced Computing |
AI2050 Fellow Aditi Raghunathan’s research exposes how AI systems can become less reliable in practice, even as their benchmark performance improves. Through a series of recent papers, she identifies hidden failure modes in AI systems, as well as practical ways to monitor reliability as models and their working environments change.
Not long after a new artificial-intelligence system is unveiled, there is often a quiet moment of disappointment. In the laboratory, and under conditions that resemble a friendly dress rehearsal, the model performs impeccably. Released into a world where users are unpredictable and context is forever shifting, it begins to falter: a health-care algorithm trained on data from one hospital struggles when confronted with unfamiliar equipment or patient populations. A language model that breezes through standardized tests grows confused when a question is phrased a little differently, or when new information contradicts what it once “knew.”
The problem is that modern AI systems are trained on datasets that reward fluency within a specific frame while quietly cultivating blind spots just beyond it. Understanding and correcting this mismatch between apparent competence and real-world reliability has become the focus of Dr. Aditi Raghunathan, an AI2050 Early Career Fellow whose research examines what happens when machine-learning systems encounter change.
“You throw a lot of data at these models and they learn and do really impressive things,” says Raghunathan, an assistant professor of computer science at Carnegie Mellon University. “But at the same time, they also have a lot of failures, which are compounded by a tendency for the models to find and exploit ‘shortcuts’ to complete tasks…and currently we do not have a very good way of actually updating these models.” Humans, she notes, learn by integrating mistakes into a broader understanding of the world. Large language models, by contrast, often lack a reliable mechanism for doing the same.
In a recent series of papers, Raghunathan and her collaborators uncover a counterintuitive pattern. In one study, a widely used language model became less likely to rely on new information supplied directly in its prompt, even as its scores on standard benchmarks improved. In fields like medicine, law, and public policy, where context is not optional, this shortcoming can mean missing a critical constraint, applying an outdated rule, or proceeding on the basis of a faulty assumption.
Raghunathan’s research points toward practical ways to manage these risks by using signals that reveal when reliability is slipping. For example, developers can track whether a model continues to prioritize user-provided context when it conflicts with the model’s built-in knowledge. A steady decline in this behavior, despite improvements on standard evaluations, can serve as an early warning that the system is becoming less adaptable to change.
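As a rough illustration of what such a signal could look like in practice (a minimal sketch, not Raghunathan’s actual evaluation protocol), the snippet below probes a model with counterfactual facts that contradict what it likely memorized during training, then measures how often its answers follow the provided context. The probe set, the `ask` callable, and the string-matching check are all illustrative assumptions.

```python
from typing import Callable

# Hypothetical probe set: each entry pairs a counterfactual context with a
# question, the answer supported by that context, and the answer the model
# would likely give from parametric (memorized) knowledge alone.
PROBES = [
    ("Policy update: the filing deadline is now March 1.",
     "What is the filing deadline?", "March 1", "April 15"),
    ("Revised spec: the default timeout is 90 seconds.",
     "What is the default timeout?", "90 seconds", "30 seconds"),
]


def context_reliance_rate(ask: Callable[[str], str]) -> float:
    """Fraction of probes where the model's answer follows the in-context
    fact rather than its built-in knowledge."""
    followed = 0
    for context, question, ctx_answer, _memorized in PROBES:
        prompt = f"Context: {context}\nQuestion: {question}\nAnswer:"
        reply = ask(prompt)
        # Crude check: does the reply contain the context-supported answer?
        if ctx_answer.lower() in reply.lower():
            followed += 1
    return followed / len(PROBES)


if __name__ == "__main__":
    # Stand-in model for demonstration; a real probe would call an LLM API.
    def toy_model(prompt: str) -> str:
        return "March 1" if "deadline" in prompt else "30 seconds"

    print(f"context-reliance rate: {context_reliance_rate(toy_model):.2f}")
```

Run against successive versions of the same system, a steadily falling rate would be exactly the kind of early-warning signal described above: the model is drifting toward its memorized answers even when the context says otherwise.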
Looking ahead, Raghunathan is pushing toward AI systems that can absorb new information, adapt to change, and fail transparently rather than unexpectedly. As AI becomes embedded in everyday decisions, that kind of reliability may matter as much as raw capability.
Related publications
| AI & Advanced Computing | NeurIPS 2024 | Nov 7, 2024 |
| AI & Advanced Computing | ICLR 2025 | Oct 14, 2024 |
| AI & Advanced Computing | ICLR 2024 | Apr 14, 2024 |