AlphaFold had the Protein Data Bank. The virtual cell has nothing comparable yet.

The models being built to simulate cellular biology are only as good as the data they train on. That data — large-scale, multimodal, designed from first principles for machine learning — does not yet exist.

Frontier labs are training biological foundation models today, but the datasets they need don't exist at the required scale or quality. Perturb is building them.

Multimodal, co-measured

Transcriptomics and morphology from identically perturbed cells in matched parallel arms. Not sequential. Not inferred.
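A minimal sketch of what one co-measured record might look like, assuming the two modalities are keyed to the same perturbation and batch so they can be paired directly rather than inferred. All field names here are illustrative assumptions, not Perturb's actual schema.

```python
from dataclasses import dataclass

@dataclass
class CoMeasuredRecord:
    # One perturbation, two modalities, from matched parallel arms.
    perturbation_id: str    # e.g. a CRISPR guide or compound (illustrative)
    batch_id: str           # matched arms share a batch, so modalities pair cleanly
    transcriptome: list     # expression vector from the sequencing arm
    morphology: list        # image-derived features from the imaging arm

record = CoMeasuredRecord(
    perturbation_id="KCNQ1_kd",
    batch_id="plate_07",
    transcriptome=[0.1, 2.3, 0.7],
    morphology=[0.8, 0.4],
)
```

The point of the shared `batch_id` is that both modalities come from identically perturbed cells measured in parallel, so no cross-modal alignment step has to be inferred after the fact.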

Physiologically relevant

iPSC-derived human cardiomyocytes. Not cancer cell lines. The cell type where drug attrition is highest and data is absent.

Closed-loop engine

Each dataset round trains a model that selects the next experiments. Active learning compounds information density with every iteration.

How it works
Perturbations flow into cell systems; multimodal measurement yields training data for biological foundation models, which feed back into experiment selection via the active learning loop.

Each iteration compounds. The model learns which experiments maximize information and selects the next round accordingly. Biological experiment space is too large for brute force; active learning is the only scalable path.
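The loop described above can be sketched as a standard active-learning cycle: train on what has been measured, score untested perturbations by model uncertainty, run the most informative batch, retrain. This is a toy illustration under stated assumptions — the assay stub, the ensemble-disagreement uncertainty heuristic, and all names are invented for the sketch, not Perturb's pipeline.

```python
import random

def run_assay(perturbation):
    """Stand-in for one wet-lab measurement round (deterministic toy)."""
    rng = random.Random(sum(map(ord, perturbation)))
    return rng.random()

class ToyEnsemble:
    """Disagreement across ensemble members serves as an uncertainty score."""

    def __init__(self, n_members=5):
        self.n_members = n_members
        self.data = {}

    def fit(self, data):
        self.data = dict(data)

    def _predict(self, member, perturbation):
        if perturbation in self.data:
            # Measured points: members agree, so uncertainty is zero.
            return self.data[perturbation]
        # Unmeasured points: each member guesses differently.
        rng = random.Random(member * 1000 + sum(map(ord, perturbation)))
        return rng.random()

    def uncertainty(self, perturbation):
        preds = [self._predict(m, perturbation) for m in range(self.n_members)]
        mean = sum(preds) / len(preds)
        return sum((p - mean) ** 2 for p in preds) / len(preds)

def active_learning_loop(candidates, rounds=3, batch_size=4):
    model = ToyEnsemble()
    measured = {}
    for _ in range(rounds):
        # Score the untested pool and pick the batch the model is
        # least certain about -- the most informative experiments.
        pool = [c for c in candidates if c not in measured]
        pool.sort(key=model.uncertainty, reverse=True)
        for perturbation in pool[:batch_size]:
            measured[perturbation] = run_assay(perturbation)
        # Retrain so the next round's selection reflects what was learned.
        model.fit(measured)
    return measured

results = active_learning_loop([f"gene_{i}" for i in range(20)])
```

Each pass through the loop adds `batch_size` measurements chosen by the model rather than at random, which is why information density compounds: brute-force enumeration of the candidate space is replaced by targeted selection.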

Founded by

Daniel Reda — two prior exits in life science data: CureTogether (acquired by 23andMe) and Redasoft (acquired by Hitachi). Background in Molecular Genetics.

Scientific Advisory Board forming.

Get in touch
daniel@perturb.bio