GENEration Hope
Research Milestones

What Are Biological Foundation Models?

Biological foundation models are trained on the raw languages of life, and researchers are watching to see whether they can make biology more searchable, predictable, and useful.

Source lens: GENEration Hope editorial analysis

Biology has always had languages. DNA is one. Protein sequence is another. Cells speak through gene expression, signaling pathways, structures, images, and measurements that are too numerous for any one scientist to hold in mind at once. A biological foundation model is an attempt to train an AI system on those languages at scale.

The idea borrows from the broader foundation-model era. A model trained on vast amounts of text can learn grammar, context, style, and relationships between ideas. A model trained on biological data might learn patterns in proteins, genes, cells, tissues, or disease states. It may not understand biology the way a scientist does, but it can sometimes surface relationships that would be hard to see by reading papers one at a time.

For rare disease, this is especially interesting because the field is scattered by definition. Each condition may have too few patients, too little natural-history data, and too few labs working on the problem. A foundation model cannot create missing data out of thin air, but it may help researchers connect a rare disorder to a broader biological neighborhood: related pathways, structural motifs, cellular signatures, or disease mechanisms that have been studied elsewhere.

Think of it less like a magic oracle and more like a map. A good map does not tell you whether the bridge is safe, whether the road is flooded, or whether you should make the trip. But it can show you where the roads might be. In drug discovery, that may mean suggesting protein variants worth testing, helping predict whether a mutation changes function, prioritizing therapeutic targets, or finding compounds that deserve a closer look.

The hard part is validation. Biology is full of confident-looking patterns that fall apart in the lab. A model may make a plausible but wrong prediction because the training data contains hidden bias, because the assay is noisy, or because the disease mechanism is not represented well enough. For families, the distinction matters: a model-generated idea is not a treatment. It is a reason to do the next experiment.

Scientists should also be cautious about what kind of data a model has seen. A protein model, a single-cell model, a gene-expression model, and a multimodal clinical model may each answer different kinds of questions. The most useful systems may not be one giant model of all biology, but a coordinated set of tools that scientists use with judgment.

The promise is real because the need is real. Rare disease research often begins in a fog: a gene, a variant, a handful of symptoms, and a family asking what comes next. Biological foundation models may help clear a little of that fog. They will not replace careful experiments, clinical insight, or patient communities. But they may help the field ask better questions sooner, and in rare disease, sooner can matter.

Why it matters for rare disease families

These models may help scientists reason across genes, proteins, cells, and disease mechanisms, but they must be tested carefully in real biology.

What technology is driving it?

Technology lens: foundation models, a class of AI systems trained at scale, applied here to biological data such as DNA, protein sequence, and gene expression.

What still needs to be solved

The field still needs stronger evidence, careful validation, access planning, answers to cost questions, and clear communication for families.
