Synthetic Biology
We are employing engineering principles to model, design and build synthetic gene circuits and programmable cells, in order to create novel classes of diagnostics & therapeutics. We are also using deep learning approaches to discover new genetic parts and enhance the synthetic biology design process.
Antibiotics & AI
As part of the Antibiotics-AI Project, we are harnessing the power of artificial intelligence (AI) to discover novel classes of antibiotics and rapidly understand how they work. We are also using deep learning approaches for the de novo design of new antibiotics and the development of combination treatments.
The Collins Lab is part of the Institute for Medical Engineering and Science (IMES) and the Department of Biological Engineering at MIT, the Harvard-MIT Program in Health Sciences and Technology (HST), the Broad Institute of MIT and Harvard, and the Wyss Institute for Biologically Inspired Engineering at Harvard. At MIT, our lab is part of the Synthetic Biology Center, the Computational and Systems Biology Initiative, and the Microbiology Graduate Program.
RECENT PUBLICATIONS
Machine learning for synthetic gene circuit engineering
Sebastian Palacios, James J. Collins, and Domitilla Del Vecchio
Current Opinion in Biotechnology (2025)
Synthetic biology leverages engineering principles to program biology with new functions for applications in medicine, energy, food, and the environment. A central aspect of synthetic biology is the creation of synthetic gene circuits — engineered biological circuits capable of performing operations, detecting signals, and regulating cellular functions. Their development involves large design spaces with intricate interactions among circuit components and the host cellular machinery. Here, we discuss the emerging role of machine learning in addressing these challenges. We articulate how machine learning may enhance synthetic gene circuit engineering, from individual components to circuit-level aspects, while highlighting associated challenges. We discuss potential hybrid approaches that combine machine learning with mechanistic modeling to leverage the advantages of data-driven models with the prescriptive ability of mechanism-based models. Machine learning and its integration with mechanistic modeling are poised to advance synthetic biology, but challenges need to be overcome for such efforts to realize their potential.
Engineering synthetic phosphorylation signaling networks in human cells
Xiaoyu Yang, Jason W. Rocks, Kaiyi Jiang, Andrew J. Walters, Kshitij Rai, Jing Liu, Jason Nguyen, Scott D. Olson, Pankaj Mehta, James J. Collins, Nichole M. Daringer, and Caleb J. Bashor
Science (2025)
Protein phosphorylation signaling networks have a central role in how cells sense and respond to their environment. We engineered artificial phosphorylation networks in which reversible enzymatic phosphorylation cycles were assembled from modular protein domain parts and wired together to create synthetic phosphorylation circuits in human cells. Our design scheme enabled model-guided tuning of circuit function and the ability to make diverse network connections; synthetic phosphorylation circuits can be coupled to upstream cell surface receptors to enable fast-timescale sensing of extracellular ligands, and downstream connections can regulate gene expression. We engineered cell-based cytokine controllers that dynamically sense and suppress activated T cells. Our work introduces a generalizable approach that allows the design of signaling circuits that enable user-defined sense-and-respond function for diverse biosensing and therapeutic applications.
An explainable deep learning platform for molecular discovery
Felix Wong, Satotaka Omori, Alicia Li, Aarti Krishnan, Ryan S. Lach, Joseph Rufo, Maxwell Z. Wilson, and James J. Collins
Nature Protocols (2024)
Deep learning approaches have been increasingly applied to the discovery of novel chemical compounds. These predictive approaches can accurately model compounds and increase true discovery rates, but they are typically black box in nature and do not generate specific chemical insights. Explainable deep learning aims to ‘open up’ the black box by providing generalizable and human-understandable reasoning for model predictions. These explanations can augment molecular discovery by identifying structural classes of compounds with desired activity in lieu of lone compounds. Additionally, these explanations can guide hypothesis generation and make searching large chemical spaces more efficient. Here we present an explainable deep learning platform that enables vast chemical spaces to be mined and the chemical substructures underlying predicted activity to be identified. The platform relies on Chemprop, a software package implementing graph neural networks as a deep learning model architecture. In contrast to similar approaches, graph neural networks have been shown to be state of the art for molecular property prediction. Focusing on discovering structural classes of antibiotics, this protocol provides guidelines for experimental data generation, model implementation and model explainability and evaluation. This protocol does not require coding proficiency or specialized hardware, and it can be executed in as little as 1–2 weeks, starting from data generation and ending in the testing of model predictions. The platform can be broadly applied to discover structural classes of other small molecules, including anticancer, antiviral and senolytic drugs, as well as to discover structural classes of inorganic molecules with desired physical and chemical properties.
Accurate RNA 3D structure prediction using a language model-based deep learning approach
Tao Shen, Zhihang Hu, Siqi Sun, Di Liu, Felix Wong, Jiuming Wang, Jiayang Chen, Yixuan Wang, Liang Hong, Jin Xiao, Liangzhen Zheng, Tejas Krishnamoorthi, Irwin King, Sheng Wang, Peng Yin, James J. Collins, and Yu Li
Nature Methods (2024)
Accurate prediction of RNA three-dimensional (3D) structures remains an unsolved challenge. Determining RNA 3D structures is crucial for understanding their functions and informing RNA-targeting drug development and synthetic biology design. The structural flexibility of RNA, which leads to the scarcity of experimentally determined data, complicates computational prediction efforts. Here we present RhoFold+, an RNA language model-based deep learning method that accurately predicts 3D structures of single-chain RNAs from sequences. By integrating an RNA language model pretrained on ~23.7 million RNA sequences and leveraging techniques to address data scarcity, RhoFold+ offers a fully automated end-to-end pipeline for RNA 3D structure prediction. Retrospective evaluations on RNA-Puzzles and CASP15 natural RNA targets demonstrate the superiority of RhoFold+ over existing methods, including human expert groups. Its efficacy and generalizability are further validated through cross-family and cross-type assessments, as well as time-censored benchmarks. Additionally, RhoFold+ predicts RNA secondary structures and interhelical angles, providing empirically verifiable features that broaden its applicability to RNA structure and function studies.
Catalase activity deficiency sensitizes multidrug-resistant Mycobacterium tuberculosis to the ATP synthase inhibitor bedaquiline
Boatema Ofori-Anyinam, Meagan Hamblin, Miranda L. Coldren, Barry Li, Gautam Mereddy, Mustafa Shaikh, Avi Shah, Courtney Grady, Navpreet Ranu, Sean Lu, Paul C. Blainey, Shuyi Ma, James J. Collins, and Jason H. Yang
Nature Communications (2024)
Multidrug-resistant tuberculosis (MDR-TB), defined as resistance to the first-line drugs isoniazid and rifampin, is a growing source of global mortality and threatens global control of tuberculosis disease. The diarylquinoline bedaquiline has recently emerged as a highly efficacious drug against MDR-TB and kills Mycobacterium tuberculosis by inhibiting mycobacterial ATP synthase. However, the mechanisms underlying bedaquiline’s efficacy against MDR-TB remain unknown. Here we investigate bedaquiline hyper-susceptibility in drug-resistant Mycobacterium tuberculosis using systems biology approaches. We discovered that MDR clinical isolates are commonly sensitized to bedaquiline. This hypersensitization is caused by several physiological changes induced by deficient catalase activity. These include enhanced accumulation of reactive oxygen species, increased susceptibility to DNA damage, induction of sensitizing transcriptional programs, and metabolic repression of several biosynthetic pathways. In this work we demonstrate how resistance-associated changes in bacterial physiology can mechanistically induce collateral antimicrobial drug sensitivity and reveal druggable vulnerabilities in antimicrobial resistant pathogens.