Connor Coley
| Program | AI2050 |
| School | Massachusetts Institute of Technology |
| Field of Study | Artificial Intelligence |
To bridge the gap between computer-generated molecules and real-world chemistry, AI2050 Fellow Connor Coley developed SynFormer, a system that designs compounds by first figuring out how to build them.
A computer-designed drug molecule can look brilliant on a screen and still be useless in practice for a simple reason: it can’t be made in real life. Dr. Connor Coley, an AI2050 Fellow and professor at the Massachusetts Institute of Technology, describes the problem as a basic disconnect.
“One of the very practical issues that has impeded these AI models’ ability to be useful, is that they can design molecular structures on a computer screen that look okay, but you can’t easily adapt them into the laboratory to test them,” says Coley.
His response has been to make the AI behave less like an unbounded imagination and more like a synthetic chemist with a shopping list (and a budget). His group’s system, SynFormer, begins from a simple premise. Instead of conjuring a molecule and hoping a lab can reverse-engineer it, the model lays out the recipe first: the starting materials needed, the reaction steps, and their sequence.
As Coley puts it, “We don’t ask our models to generate structures… We ask our models to generate experimental procedures, and that procedure then results in molecules.” The system is trained on a simulated but tightly constrained chemical universe built from 223,244 commercially available building blocks and 115 reaction templates, which together define a combinatorial space estimated at well beyond trillions of possible molecules.
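To make the "procedure first, molecule second" idea concrete, here is a minimal toy sketch in Python. It is not SynFormer's code or data: the four building blocks, the two templates, and every function name are hypothetical, and real reaction templates operate on molecular graphs rather than on sets of labels. The point is only that a molecule here exists solely as the result of running a recipe, so anything the sketch outputs comes with a route attached.

```python
from dataclasses import dataclass

# Hypothetical toy building blocks: name -> reactive groups it carries.
BUILDING_BLOCKS = {
    "A": {"amine"},
    "B": {"acid"},
    "C": {"acid", "halide"},
    "D": {"boronate"},
}

# Hypothetical toy templates:
# name -> (group needed on the current molecule, group needed on the new block).
TEMPLATES = {
    "amide_coupling": ("amine", "acid"),
    "suzuki_coupling": ("halide", "boronate"),
}


@dataclass(frozen=True)
class Molecule:
    label: str            # human-readable record of how it was made
    groups: frozenset     # reactive groups still available


def apply_step(template, mol, block):
    """Apply one template; return the product, or None if the step is invalid."""
    need_mol, need_block = TEMPLATES[template]
    block_groups = BUILDING_BLOCKS[block]
    if need_mol not in mol.groups or need_block not in block_groups:
        return None
    # Both reacting groups are consumed; the rest carry over to the product.
    remaining = (mol.groups - {need_mol}) | (block_groups - {need_block})
    return Molecule(f"({mol.label}+{block} via {template})", frozenset(remaining))


def run_procedure(first_block, steps):
    """Execute a recipe: a starting block plus an ordered list of (template, block)."""
    mol = Molecule(first_block, frozenset(BUILDING_BLOCKS[first_block]))
    for template, block in steps:
        mol = apply_step(template, mol, block)
        if mol is None:
            return None  # the recipe cannot be carried out
    return mol


def naive_upper_bound(n_blocks, n_templates, n_steps):
    """Crude overcount of linear recipes, assuming every template accepts every block."""
    return n_blocks * (n_templates * n_blocks) ** n_steps


recipe = [("amide_coupling", "C"), ("suzuki_coupling", "D")]
product = run_procedure("A", recipe)
print(product.label if product else "not buildable")
```

Under the (unrealistic) assumption that every template accepts every block, `naive_upper_bound(223_244, 115, 2)` already far exceeds a trillion for two-step recipes. Most pairings are chemically incompatible, so this is an overcount, but it gives a feel for why the space cannot simply be enumerated.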
In addition to retracing routes to many “make-on-demand” compounds, SynFormer can do something more useful still: take a promising design that isn’t readily synthesizable and nudge it toward a close, buildable cousin. The practical constraints imposed on SynFormer could also shorten the “design-build-test” loop of chemical synthesis from a seasons-long project to something that might take “less than a week, or even a couple days,” says Coley.
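The "close, buildable cousin" idea can be illustrated with a deliberately simple stand-in: a nearest-neighbor lookup over a small set of molecules already known to be buildable. This is a swapped-in technique for illustration only and should not be read as how SynFormer performs the step. The feature sets, analog names, and function names below are hypothetical, and the similarity measure is plain Tanimoto (Jaccard) overlap on sets of labels.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def nearest_buildable(target, buildable):
    """Return (name, similarity) of the buildable molecule closest to the target."""
    name, features = max(buildable.items(), key=lambda kv: tanimoto(target, kv[1]))
    return name, tanimoto(target, features)


# Hypothetical design that has no known route, described by coarse features.
target = {"ring", "amide", "fluoro", "nitro"}

# Hypothetical molecules that each come with a known recipe.
buildable = {
    "analog_1": {"ring", "amide", "fluoro"},
    "analog_2": {"ring", "ester"},
}

print(nearest_buildable(target, buildable))  # ('analog_1', 0.75)
```

A lookup like this only works when the buildable set is small enough to list. The article's point is that the real space is far too large for that, which is why a generative model that proposes recipes directly is attractive.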
Some versions of the SynFormer approach are already circulating beyond academia, particularly through MIT’s Machine Learning for Pharmaceutical Discovery and Synthesis Consortium. Coley’s group is now scaling the approach further, training on billions of synthetic pathways in an effort to broaden its coverage. If that scaling holds, it could pave the way toward autonomous discovery: a system that can search immense chemical spaces without brute-force enumeration and propose compounds in a form a lab can act on immediately.
AI & Advanced Computing
arXiv | Oct 4, 2024