Protein Dojo
Protein Dojo is a platform for learning protein design by practicing with realistic drug targets and modern computational molecular design tools. I built it over the last few weeks of the winter holidays, during family nap times and around holiday celebrations.
My goals for this project were twofold. First, I wanted to make something useful for people who are just getting into computational molecule design. If you’re like me, you learned about the protein folding problem a long time ago and you heard about AlphaFold 2 (“the ChatGPT moment for protein folding”). You know this technology is important and want to learn more about it, but where do you start? If you’re not a professional biochemist, what proteins should you actually try to design?
The first reason I built Protein Dojo was to answer that question. The second was that I wanted to push LLM code-writing tools to their limits - to see how far I could get with a fairly ambitious project and a short timeframe. I barely wrote any code for this project myself - probably fewer than ten lines. Claude Code and OpenAI Codex wrote everything else. I still spent a bunch of time debugging by reading code, asking Claude to try different things, thinking about the architecture, and iterating on the product experience. But I didn’t do the actual code-typing bit.
So what can you actually do on Protein Dojo? Well, there’s a whole set of computational molecular design challenges built around realistic drug targets that are tied to real diseases: Interleukin-6 (arthritis and more), VEGF-A (cancer), EGFR (cancer), and BDNF (Alzheimer’s and other neurodegenerative diseases), among others. If you’re not a biologist, you’ll learn a lot just by reading about these proteins and understanding why scientists care about binding to them. You can also see some of the existing molecules that scientists have designed against these targets - real FDA-approved drugs like Humira and Herceptin.
Besides reading, you can also practice actually designing protein binders against those targets. I’ve added real computational design tools that are used by professional molecular designers.
The design tools supported right now are RFdiffusion3, made by the Baker Lab, and BoltzGen, made by Boltz Bio. Both are open-source, commercial-friendly models that are popular with industry pros. You can also use your own tools, either on your own computer or on model aggregation sites like Ariax Bio and Tamarind Bio, and submit the sequences you designed externally.
There’s a leaderboard so that you can compete to have the best design. It ranks submissions by a computational score, which isn’t perfect - you’d have to test the designs in a real lab to get better rankings!
Building this has been a lot of fun. I’d be thrilled if you gave it a try, and please get in touch if you have any feedback.