For all the excitement surrounding AlphaFold and the surge of AI‑driven protein design tools, one fact often goes unexamined: nearly every model is trained on databases of known proteins. These databases feel vast, but they represent only a sliver of the astronomical number of sequences that could, in principle, form a functional protein. That raises a fundamental question for the field: how representative is the protein universe we currently know?

A new study published in PNAS, titled “Descent from a common ancestor restricts exploration of protein sequence space,” takes direct aim at that question, using large‑scale modeling of protein evolution to probe the limits of protein diversification. The work, led by researchers at the Okinawa Institute of Science and Technology (OIST), the Institute of Science and Technology Austria (ISTA), the University of Vienna, and the Centro de Astrobiología (CAB), suggests that the boundaries of today’s protein diversity were set remarkably early in life’s history.

“Modern AI methods are thought to be revolutionizing protein design,” said Fyodor Kondrashov, PhD, who heads OIST’s Evolutionary and Synthetic Biology Unit, in a press release. “Yet most of these AI design methods are typically trained on databases of known proteins. Without understanding how representative these known proteins are of sequence space, how confident can we be that such methods can generate truly diverse protein designs?”

Figure: An abstract, not-to-scale visual representation of different protein sequence spaces for a particular biological function. A large box represents all possible amino acid sequences (approximately 20^L…
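To give a sense of why the space of possible sequences is called astronomical, here is a minimal sketch of the arithmetic behind the figure's "20^L" box. It assumes only the 20 standard amino acids at each of L positions, so the count of distinct sequences is 20^L. The function names and the example lengths are illustrative choices, not anything taken from the study.

```python
import math

# Assumption: only the 20 standard amino acids can occupy each position.
ALPHABET_SIZE = 20


def sequence_space_size(length: int, alphabet: int = ALPHABET_SIZE) -> int:
    """Exact number of distinct sequences of the given length: alphabet ** length."""
    return alphabet ** length


def log10_sequence_space(length: int, alphabet: int = ALPHABET_SIZE) -> float:
    """Order of magnitude of that count, computed as length * log10(alphabet)."""
    return length * math.log10(alphabet)


if __name__ == "__main__":
    # Illustrative protein lengths (hypothetical examples, not from the study).
    for L in (50, 100, 300):
        print(f"L={L}: ~10^{log10_sequence_space(L):.1f} possible sequences")
```

Even a modest 100-residue protein corresponds to roughly 10^130 possible sequences, which is why any database of known proteins can cover only a vanishingly small fraction of sequence space. Working in log10 keeps the numbers readable, while Python's arbitrary-precision integers still allow the exact count when needed.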