Science Systems

A Simpler Way to Represent the Evolutionary Past

Samir Bhatt

Program: Schmidt Science Polymaths
School: University of Copenhagen
Field of Study: Machine Learning and Public Health

Samir Bhatt developed an algorithm to dramatically simplify the mathematically complex challenge of charting the evolution of diseases and languages “for the joy of doing it.”

How many ways can you draw a family tree connecting 30 people? Far from a few lines on paper, the possibilities number more than you can count.

“The number of ways to connect just 30 people is larger than the number of grains of sand on Earth,” says Dr. Samir Bhatt, a Schmidt Science Polymath and professor at the University of Copenhagen.
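To get a feel for that scale, here is a minimal sketch of the arithmetic. It assumes "ways to connect" means distinct rooted binary trees with labeled tips, for which the standard count is the double factorial (2n − 3)!!. The function name and the sand figure are illustrative assumptions, not taken from Bhatt's work: 10^19 is used only as a generous round number, since back-of-envelope estimates of sand grains vary.

```python
def count_rooted_binary_trees(n_leaves: int) -> int:
    """Count rooted binary trees with n labeled leaves: (2n - 3)!!."""
    if n_leaves < 1:
        raise ValueError("need at least one leaf")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3.
    # For n = 1 or n = 2 the range is empty and the count stays 1.
    for k in range(3, 2 * n_leaves - 2, 2):
        count *= k
    return count


# Assumption: a rough, generous order-of-magnitude figure for grains of sand.
SAND_GRAINS_ROUGH = 10**19

trees_for_30 = count_rooted_binary_trees(30)
print(f"Trees for 30 tips: {trees_for_30:.3e}")
print("More than the sand estimate:", trees_for_30 > SAND_GRAINS_ROUGH)
```

Under these assumptions the count grows by a factor of 2n − 3 each time a tip is added, which is why even a modest number of tips produces more trees than anyone could enumerate.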

This is the mathematical reality facing scientists who build phylogenetic trees—diagrams that map the evolutionary history of species, populations, or strains of microbes. These trees make it possible to, for example, determine when mammals split from their ancestors or track the emergence of viral variants during the COVID-19 pandemic. But finding the most accurate tree can mean searching through an unfathomably large number of options.

A sample output from Phylo2Vec, which aims to compress phylogenetic trees into a small numeric format that helps researchers explore different tree structures more efficiently.

Bhatt and his team have developed an algorithm called Phylo2Vec to help researchers navigate this vast space. Rather than representing a tree in the conventional format, a string of nested parentheses, Phylo2Vec translates it into a compact string of integers. These sequences take up one-sixth to one-eighth of the storage space, and they simplify comparisons: two sequences that differ in a single number represent distinct but neighboring trees.
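The idea of an integer encoding with built-in neighbors can be sketched in a few lines. The sketch below is not the Phylo2Vec software or its API. It assumes, based on a reading of the published description, that a tree with n tips maps to a list of n − 1 integers in which the entry at position i (counting from 0) lies between 0 and 2i. All function names are hypothetical, and the step that turns a list back into an actual tree is left out.

```python
from typing import Iterator, List


def is_valid(v: List[int]) -> bool:
    """Check the assumed constraint 0 <= v[i] <= 2*i at every position."""
    return all(0 <= x <= 2 * i for i, x in enumerate(v))


def count_vectors(n_leaves: int) -> int:
    """Count valid vectors for n tips: position i has 2*i + 1 choices."""
    total = 1
    for i in range(n_leaves - 1):
        total *= 2 * i + 1
    return total


def neighbors(v: List[int]) -> Iterator[List[int]]:
    """Yield every valid vector that differs from v in exactly one entry."""
    for i, current in enumerate(v):
        for candidate in range(2 * i + 1):
            if candidate != current:
                yield v[:i] + [candidate] + v[i + 1:]


start = [0, 0, 0]  # one encoding for a hypothetical 4-tip tree
print(is_valid(start))               # True
print(count_vectors(4))              # 15 possible vectors for 4 tips
print(len(list(neighbors(start))))   # 6 one-number-away neighbors
```

Under this assumed constraint, the number of valid lists for n tips is 1 × 3 × 5 × … × (2n − 3), the same as the standard count of rooted binary trees with n labeled tips, so no list is wasted. Stepping from one tree to a nearby one amounts to changing a single integer, which is far simpler than editing nested parentheses.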

Bhatt developed the mathematics with his group, then partnered with engineers at the University of Washington’s Scientific Software Engineering Center, part of Schmidt Sciences’ Virtual Institute for Scientific Software. “They sped up our code and did many clever tricks under the hood,” he says. “When we got the software out, we felt reassured it was done properly.”

Mathematics moves slowly, but the method is already gaining traction. Since the paper’s publication last summer, other researchers have proposed variations on the integer system, along with a framework that unifies them.

Bhatt’s interest in phylogenetics grew out of his work tracking infectious diseases, particularly COVID-19. During an outbreak, trees help scientists assess how much a pathogen has mutated, how fast it’s spreading, and where it likely originated—factors that shape the public health response. “How do you know that an mpox strain found a month ago is new or not?” Bhatt says. “You’re going to have to build a tree.”

In Bhatt’s view, Phylo2Vec is a first step. His team is now building mathematical models for how trees develop over time, with the ultimate goal of reconstructing evolutionary history more accurately.

The Polymaths program has given him room to follow his curiosity. “It was the first bit of funding that really allowed me to be me,” Bhatt says. “We created Phylo2Vec for the joy of doing it, and for the promise it could have in this field.”

Dr. Samir Bhatt is supported through the Schmidt Science Polymaths program, which awards $500,000 per year for up to five years to recently tenured professors. The program helps these researchers explore new ideas across disciplines, using emerging technologies to test high-risk theories that might not otherwise receive funding or support.