2025 Impact Report

AI & Advanced Computing

When Machines Learn to Imitate Science

Atoosa Kasirzadeh

Program: AI2050
Organization: Carnegie Mellon University
Field of Study: Artificial Intelligence

By testing autonomous “AI scientists” against established standards of good science, AI2050 Fellow Atoosa Kasirzadeh revealed that many of these systems fail to meet core scientific norms of rigor and responsible evaluation. Her work proposes safeguards to protect researchers as these tools scale and enter scientific workflows.

It is not difficult to imagine AI producing a scientific paper. Given a research question and some data, it will generate something convincing on the surface. What is harder to envision is how such systems might begin to reshape scientific norms if we allow them to redefine what passes as acceptable practice.

Dr. Atoosa Kasirzadeh has been thinking about this problem for some time. A philosopher, mathematician, and systems engineer, she occupies a unique intersection between formal theory and the realities of contemporary AI development. Now an AI2050 Early Career Fellow and a professor at Carnegie Mellon University, she studies what happens when artificial systems are asked to operate in scientific domains, where the values that research is built upon are often implicit.

In recent work, Kasirzadeh and her collaborators turned their attention to “AI scientists”: autonomous agents designed to automate the entire research pipeline, from hypothesis generation to paper writing. The creators of AI scientists often describe these systems as aligned with human scientific practice, but Kasirzadeh was skeptical. She and her colleagues put several open-source versions to the test, measuring them against the standards expected of human researchers.

The results were troubling. In some cases, the systems cherry-picked favorable evaluation benchmarks after research results had already been generated. Even more egregiously, the systems showed a tendency to fabricate datasets altogether. “They would cook up data which doesn’t even exist, simply to make performance look stronger than it was,” says Kasirzadeh.

Kasirzadeh worries that this kind of misalignment could have consequences far beyond academic embarrassment. Autonomous research systems are advancing quickly, and if they scale unchecked, they could flood journals and conferences with plausible but unreliable work. More concerning is how easily such tools could be used to “hack the system,” or find shortcuts to satisfying results, in high-stakes domains like medicine or pharmaceuticals, where the appearance of rigor can carry enormous weight. Science, she notes, has spent centuries developing standards around evidence, transparency and accountability. Encoding those standards into machines requires clarity about what those values are and how they should be upheld.

With support from the AI2050 program at Schmidt Sciences, Kasirzadeh has been able to pursue this line of inquiry at a moment when it feels increasingly urgent. Her aim is not to halt the development of powerful AI systems, but to ask what it would mean for machines not just to perform well, but to participate responsibly in human institutions as they begin to take on more consequential roles.