Synthetic Biology
We are employing engineering principles to model, design and build synthetic gene circuits and programmable cells, in order to create novel classes of diagnostics & therapeutics. We are also using deep learning approaches to discover new genetic parts and enhance the synthetic biology design process.
Antibiotics & AI
As part of the Antibiotics-AI Project, we are harnessing the power of artificial intelligence (AI) to discover novel classes of antibiotics and rapidly understand how they work. We are also using deep learning approaches for the de novo design of new antibiotics and the development of combination treatments.
The Collins Lab is part of the Institute for Medical Engineering and Science (IMES) and the Department of Biological Engineering at MIT, the Harvard-MIT Program in Health Sciences and Technology (HST), the Broad Institute of MIT and Harvard, and the Wyss Institute for Biologically Inspired Engineering at Harvard. At MIT, our lab is part of the Synthetic Biology Center, the Computational and Systems Biology Initiative, and the Microbiology Graduate Program.
RECENT PUBLICATIONS
An explainable deep learning platform for molecular discovery
Felix Wong, Satotaka Omori, Alicia Li, Aarti Krishnan, Ryan S. Lach, Joseph Rufo, Maxwell Z. Wilson, and James J. Collins
Nature Protocols (2024)
Deep learning approaches have been increasingly applied to the discovery of novel chemical compounds. These predictive approaches can accurately model compounds and increase true discovery rates, but they are typically black box in nature and do not generate specific chemical insights. Explainable deep learning aims to ‘open up’ the black box by providing generalizable and human-understandable reasoning for model predictions. These explanations can augment molecular discovery by identifying structural classes of compounds with desired activity in lieu of lone compounds. Additionally, these explanations can guide hypothesis generation and make searching large chemical spaces more efficient. Here we present an explainable deep learning platform that enables vast chemical spaces to be mined and the chemical substructures underlying predicted activity to be identified. The platform relies on Chemprop, a software package implementing graph neural networks as a deep learning model architecture. In contrast to similar approaches, graph neural networks have been shown to be state of the art for molecular property prediction. Focusing on discovering structural classes of antibiotics, this protocol provides guidelines for experimental data generation, model implementation and model explainability and evaluation. This protocol does not require coding proficiency or specialized hardware, and it can be executed in as little as 1–2 weeks, starting from data generation and ending in the testing of model predictions. The platform can be broadly applied to discover structural classes of other small molecules, including anticancer, antiviral and senolytic drugs, as well as to discover structural classes of inorganic molecules with desired physical and chemical properties.
Accurate RNA 3D structure prediction using a language model-based deep learning approach
Tao Shen, Zhihang Hu, Siqi Sun, Di Liu, Felix Wong, Jiuming Wang, Jiayang Chen, Yixuan Wang, Liang Hong, Jin Xiao, Liangzhen Zheng, Tejas Krishnamoorthi, Irwin King, Sheng Wang, Peng Yin, James J. Collins, and Yu Li
Nature Methods (2024)
Accurate prediction of RNA three-dimensional (3D) structures remains an unsolved challenge. Determining RNA 3D structures is crucial for understanding their functions and informing RNA-targeting drug development and synthetic biology design. The structural flexibility of RNA, which leads to the scarcity of experimentally determined data, complicates computational prediction efforts. Here we present RhoFold+, an RNA language model-based deep learning method that accurately predicts 3D structures of single-chain RNAs from sequences. By integrating an RNA language model pretrained on ~23.7 million RNA sequences and leveraging techniques to address data scarcity, RhoFold+ offers a fully automated end-to-end pipeline for RNA 3D structure prediction. Retrospective evaluations on RNA-Puzzles and CASP15 natural RNA targets demonstrate the superiority of RhoFold+ over existing methods, including human expert groups. Its efficacy and generalizability are further validated through cross-family and cross-type assessments, as well as time-censored benchmarks. Additionally, RhoFold+ predicts RNA secondary structures and interhelical angles, providing empirically verifiable features that broaden its applicability to RNA structure and function studies.
Catalase activity deficiency sensitizes multidrug-resistant Mycobacterium tuberculosis to the ATP synthase inhibitor bedaquiline
Boatema Ofori-Anyinam, Meagan Hamblin, Miranda L. Coldren, Barry Li, Gautam Mereddy, Mustafa Shaikh, Avi Shah, Courtney Grady, Navpreet Ranu, Sean Lu, Paul C. Blainey, Shuyi Ma, James J. Collins, and Jason H. Yang
Nature Communications (2024)
Multidrug-resistant tuberculosis (MDR-TB), defined as resistance to the first-line drugs isoniazid and rifampin, is a growing source of global mortality and threatens global control of tuberculosis disease. The diarylquinoline bedaquiline has recently emerged as a highly efficacious drug against MDR-TB and kills Mycobacterium tuberculosis by inhibiting mycobacterial ATP synthase. However, the mechanisms underlying bedaquiline’s efficacy against MDR-TB remain unknown. Here we investigate bedaquiline hyper-susceptibility in drug-resistant Mycobacterium tuberculosis using systems biology approaches. We discovered that MDR clinical isolates are commonly sensitized to bedaquiline. This hypersensitization is caused by several physiological changes induced by deficient catalase activity. These include enhanced accumulation of reactive oxygen species, increased susceptibility to DNA damage, induction of sensitizing transcriptional programs, and metabolic repression of several biosynthetic pathways. In this work we demonstrate how resistance-associated changes in bacterial physiology can mechanistically induce collateral antimicrobial drug sensitivity and reveal druggable vulnerabilities in antimicrobial resistant pathogens.
Deep generative design of RNA aptamers using structural predictions
Felix Wong, Dongchen He, Aarti Krishnan, Liang Hong, Alexander Z. Wang, Jiuming Wang, Zhihang Hu, Satotaka Omori, Alicia Li, Jiahua Rao, Qinze Yu, Wengong Jin, Tianqing Zhang, Katherine Ilia, Jack X. Chen, Shuangjia Zheng, Irwin King, Yu Li, and James J. Collins
Nature Computational Science (2024)
RNAs represent a class of programmable biomolecules capable of performing diverse biological functions. Recent studies have developed accurate RNA three-dimensional structure prediction methods, which may enable new RNAs to be designed in a structure-guided manner. Here, we develop a structure-to-sequence deep learning platform for the de novo generative design of RNA aptamers. We show that our approach can design RNA aptamers that are predicted to be structurally similar, yet sequence dissimilar, to known light-up aptamers that fluoresce in the presence of small molecules. We experimentally validate several generated RNA aptamers to have fluorescent activity, show that these aptamers can be optimized for activity in silico, and find that they exhibit a mechanism of fluorescence similar to that of known light-up aptamers. Our results demonstrate how structural predictions can guide the targeted and resource-efficient design of new RNA sequences.
Customizable gene sensing and response without altering endogenous coding sequences
Fabio Caliendo, Elvira Vitu, Junmin Wang, Shuo-Hsiu Kuo, Hayden Sandt, Casper Nørskov Enghuus, Jesse Tordoff, Neslly Estrada, James J. Collins, and Ron Weiss
Nature Chemical Biology (2024)
Synthetic biology aims to modify cellular behaviors by implementing genetic circuits that respond to changes in cell state. Integrating genetic biosensors into endogenous gene coding sequences using clustered regularly interspaced short palindromic repeats and Cas9 enables interrogation of gene expression dynamics in the appropriate chromosomal context. However, embedding a biosensor into a gene coding sequence may unpredictably alter endogenous gene regulation. To address this challenge, we developed an approach to integrate genetic biosensors into endogenous genes without modifying their coding sequence by inserting into their terminator region single-guide RNAs that activate downstream circuits. Sensor dosage responses can be fine-tuned and predicted through a mathematical model. We engineered a cell stress sensor and actuator in CHO-K1 cells that conditionally activates antiapoptotic protein BCL-2 through a downstream circuit, thereby increasing cell survival under stress conditions. Our gene sensor and actuator platform has potential use for a wide range of applications that include biomanufacturing, cell fate control and cell-based therapeutics.