AlphaFold had the Protein Data Bank. The virtual cell has nothing comparable yet.

The models being built to simulate cellular biology are only as good as the data they train on. That data — large-scale, multimodal, designed from first principles for machine learning — does not yet exist.

Frontier labs are actively training biological foundation models today. The datasets they need don't exist at the required scale or quality — Perturb is building them.

How it works
Perturbations flow into cell systems, then through multimodal measurement into training data and biological foundation models, which feed back into experiment selection via the active learning loop.

Each iteration compounds. The model learns which experiments maximize information — and selects the next round accordingly. Biological experiment space is too large for brute force; active learning is the only scalable path.
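
As a rough illustration of the loop described above, here is a minimal toy sketch, not Perturb's actual pipeline. It assumes a one-dimensional "dose" as the perturbation, a simulated measurement in place of a wet-lab assay, and bootstrap-ensemble disagreement as a stand-in for expected information. All names (`run_experiment`, `ensemble_disagreement`, `active_learning_loop`) are hypothetical.

```python
import random
import statistics


def fit_line(points):
    """Least-squares fit y = a*x + b for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:
        # Degenerate resample (all x identical): fall back to a flat line.
        return 0.0, mean_y
    a = sum((x - mean_x) * (y - mean_y) for x, y in points) / var_x
    return a, mean_y - a * mean_x


def ensemble_disagreement(measured, candidate, rng, n_models=20):
    """Variance of predictions from models fit on bootstrap resamples.

    High variance means the models disagree, so measuring this candidate
    is assumed to be more informative.
    """
    preds = []
    for _ in range(n_models):
        sample = [rng.choice(measured) for _ in measured]
        a, b = fit_line(sample)
        preds.append(a * candidate + b)
    return statistics.pvariance(preds)


def run_experiment(dose, rng):
    """Hypothetical stand-in for a real measurement: a noisy toy response."""
    return 2.0 * dose + 1.0 + rng.gauss(0, 0.1)


def active_learning_loop(pool, seed_doses, rounds, batch_size, seed=0):
    """Repeatedly measure the candidates the ensemble disagrees on most."""
    rng = random.Random(seed)
    measured = [(d, run_experiment(d, rng)) for d in seed_doses]
    remaining = [d for d in pool if d not in seed_doses]
    for _ in range(rounds):
        # Rank unmeasured candidates by model disagreement, highest first.
        ranked = sorted(
            remaining,
            key=lambda d: ensemble_disagreement(measured, d, rng),
            reverse=True,
        )
        batch = ranked[:batch_size]
        measured.extend((d, run_experiment(d, rng)) for d in batch)
        remaining = [d for d in remaining if d not in batch]
    return measured, remaining


if __name__ == "__main__":
    pool = [float(i) for i in range(10)]
    measured, remaining = active_learning_loop(
        pool, seed_doses=[4.0, 5.0], rounds=2, batch_size=2
    )
    print("measured doses:", sorted(d for d, _ in measured))
    print("still unmeasured:", remaining)
```

Each round spends the experimental budget on the candidates where the current models are least certain, rather than sweeping the whole pool. A real system would replace the toy line fit with a biological foundation model and the simulated response with actual multimodal measurements.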

Get in touch
daniel@perturb.bio