Breaking Eroom's Law with Software
Using software to accelerate medical discovery • Can two trends cancel each other out? • Protein visualization and other things they don't teach you in Intro to HTML/CSS
While software developers have enjoyed an exponential increase in compute power through Moore’s Law, biologists working in drug discovery have suffered the opposite effect, dubbed “Eroom’s Law.” Low-hanging fruit has been plucked, and the search space is massive (20 raised to the power of the protein’s length, since each position can hold any of 20 amino acids), so specific areas are over-explored while others are neglected. Thus the task of designing proteins to fight disease gets exponentially more difficult as the space of proteins grows larger.
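To get a feel for the size of that search space: a protein is a chain in which each position holds one of the 20 standard amino acids, so a chain of length L has 20^L possible sequences. The short Python sketch below just does that arithmetic. The function names are made up for illustration and are not part of any Sphinx tooling.

```python
import math

AMINO_ACIDS = 20  # the 20 standard amino acids


def sequence_space_size(length: int) -> int:
    """Exact number of distinct amino acid sequences of the given length."""
    return AMINO_ACIDS ** length


def orders_of_magnitude(length: int) -> float:
    """Base-10 exponent of the sequence space size, i.e. log10(20 ** length)."""
    return length * math.log10(AMINO_ACIDS)


for length in (10, 50, 100):
    print(f"length {length}: about 10^{orders_of_magnitude(length):.0f} sequences")
```

Running it shows roughly 10^13 sequences at length 10, 10^65 at length 50, and 10^130 at length 100, which is why exhaustive search is off the table and why guided, model-driven exploration matters.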
Sphinx Bio, a biotech software startup from San Francisco, asks this question: can algorithmic advances, made possible by the exponential increase in the number of transistors, combat the exponential increase in drug discovery complexity? We’ll dive into exactly how they’re doing it using ML infrastructure and good design.
The Background - What Sphinx Does
I met Nicholas, the founder of Sphinx, because of a shared interest in applying software to solve difficult problems. This interest led him to create and run the Bits in Bio group, an online community for software/biotech professionals. Nicholas studied both biology and computer science in college but always enjoyed the computational side of biology more than wet lab work.
Nicholas worked at a couple of biotech/pharmaceutical firms after college, including Octant. His work focused on building software tools that were used internally for a variety of research applications. Because the same problems kept coming up, Nicholas set out to build a product that multiple companies could reuse, and that became the inspiration behind Sphinx.
Sphinx is planning to build a whole suite of ML-backed software tools to help accelerate drug discovery, but they are starting by focusing specifically on tools to design proteins. The particular problem that Sphinx addresses today is called “de novo binder design.” This is a process that starts with a target protein for which a biologist wants to manufacture a binder. Then, given the physical shape of the target, the protein designer needs to find a protein such that:
its physical shape “fits” or binds with the target protein
it is compatible with the human body (or in whatever environment it needs to be deployed)
it is manufacturable at a reasonable cost
This is a complex, time-consuming process that involves both computational and wet lab work, and its complexity only increases as new drugs are discovered. Sphinx’s software simplifies this process by leveraging physical and ML models to find a set of “backbones,” or 3D structures, that would fit the target protein. It then uses more ML to “transcribe” each backbone into amino acid sequences, using AlphaFold to check for accuracy. Finally, it produces a set of candidate binders. Off the platform, scientists test these candidates in the lab and compare the experimental results for further iteration. The ML models Sphinx uses come from the open source community, but customers can also deploy their own custom models if they choose to.
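The computational part of that workflow has a generate-then-filter shape: propose backbones, design a sequence for each, score the result, and keep the best. The Python sketch below shows only that control flow. Every function in it is a hypothetical stand-in (the "models" return random values), and nothing here reflects Sphinx's actual code or any real model's API.

```python
import random
from dataclasses import dataclass

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 standard amino acids


@dataclass
class Candidate:
    backbone_id: int
    sequence: str
    confidence: float


def propose_backbones(target: str, n: int) -> list[int]:
    """Hypothetical stand-in for backbone generation; returns opaque ids."""
    return list(range(n))


def design_sequence(backbone_id: int, length: int, rng: random.Random) -> str:
    """Hypothetical stand-in for sequence design; returns a random sequence."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def predict_confidence(sequence: str, rng: random.Random) -> float:
    """Hypothetical stand-in for a structure-prediction check; fake score in [0, 1)."""
    return rng.random()


def design_binders(target: str, n_backbones: int = 20, length: int = 60,
                   threshold: float = 0.8, seed: int = 0) -> list[Candidate]:
    """Generate one candidate per backbone, keep those at or above the threshold."""
    rng = random.Random(seed)
    kept = []
    for backbone_id in propose_backbones(target, n_backbones):
        sequence = design_sequence(backbone_id, length, rng)
        score = predict_confidence(sequence, rng)
        if score >= threshold:
            kept.append(Candidate(backbone_id, sequence, score))
    # Highest-confidence candidates first, ready to hand off to the lab.
    return sorted(kept, key=lambda c: c.confidence, reverse=True)


for candidate in design_binders("example-target"):
    print(candidate.backbone_id, round(candidate.confidence, 2), candidate.sequence[:12])
```

In a real system each stand-in would be a call to a heavyweight model, which is where the infrastructure challenges described in the next section come from.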
The Software Challenges and Stack
Software developers at Sphinx face two primary challenges:
Building scalable ML infrastructure to enable scientists to run models, and potentially even collaborate securely, and
Designing data-intensive interfaces that quickly surface relevant insights out of complex datasets and enable quick and natural exploratory data analysis
If you’re interested in solving these challenges, consider applying, as Sphinx is hiring software engineers.
Sphinx uses Modal Labs to deploy ML models, which are generally PyTorch or Hugging Face models. Because of the heavy ML work, the backend is written in Python, while the frontend is a standard TypeScript/React app, with Postgres for persistence.
The visual layer uses a number of interesting libraries: besides Material UI for general “standard” GUI elements, Sphinx uses Mol* (pronounced Mol-star) to visualize protein structures and builds some of its own components for biotech-specific visualizations.
All of this means that Sphinx is building a lot of interesting software on both the backend and the frontend, which makes it a great opportunity for any software engineer who is seeking a new challenge and wants to learn about biology and drug discovery. At the same time, most of this software work requires no knowledge of biology. Sphinx is another in a list of companies we have covered where software engineers can have a great impact in deep tech without needing to be subject matter experts.
Advice for Biotech Founders
As a final question, I asked Nicholas if he had any advice for founders in the biotech space, particularly to those who want to focus on software.
The first piece of advice is that there’s a massive need here. Machine learning is accelerating all aspects of biotech, including drug discovery, and the tooling, visualization, data layers, and integration solutions are in their infancy. As more of biology is conducted with computers, the demand will only grow, so there’s vast space for many more startups to operate here.
The second is that a biotech software startup is a software startup. Sphinx’s major challenges are shared by many other companies in other spaces: building a scalable backend that delivers powerful computations and designing a frontend that imposes clarity on increasingly complex data. So if you can found a web app SaaS startup, you can found a biotech software startup (though maybe pair with a biologist).