The Sequence Landscape of Bacterial Genes is Shaped by Long-Range mRNA Folding
Author(s)
Gill, Manraj Singh
Advisor
Li, Gene-Wei
Abstract
Evolutionary selection for optimal expression of genes in regulatory networks has led to discernible sequence patterns in bacterial genomes observed in nature. Such patterns result from gene regulatory strategies that leverage sequence-dependent interactions with key cellular machineries and regulatory molecules. While numerous regulatory strategies that shape bacterial gene sequences have been characterized, predicting functional consequences from sequence alone remains challenging because the space of possible sequences is so vast. Moreover, the primary gene sequence encodes information on the secondary and tertiary topologies that the molecules of the central dogma can fold into. Specifically, though local messenger RNA (mRNA) structures are known to regulate bacterial gene expression, the role of long-range mRNA folding remains unclear despite the predicted prevalence of such interactions across mRNAs. In bacteria, a major regulator of mRNA decay and translation rates is the accessibility of the ribosome binding site (RBS) to the ribosome. Sequences in the mRNA’s 5´ untranslated region (UTR) that are complementary to the RBS can decrease gene expression by base pairing with the RBS and blocking ribosome binding. To determine whether such antagonistic sequences are also the primary determinants of sequence choice along the rest of the mRNA transcript, we measured the effect of all possible 8-nucleotide substitutions (65,536 variants) on mRNA levels when placed in multiple positions along a bacterial transcript. We find that, while the vast majority of substitutions in the middle of genes have a negligible effect on RNA levels, 8-mers with complementarity to parts of the RBS exhibit the strongest effects, increasing RNA degradation rates up to 4-fold. RBS-complementary sequences also decrease translation initiation rates when placed in a coding sequence, and are able to occlude ribosome binding even when they are located hundreds of nucleotides away from the start codon.
The inhibitory effect of such secondary structures on gene expression likely explains a strong selection against sequences complementary to conserved parts of RBSs throughout coding sequences of genes from diverse bacterial genomes, which we uncover through computational analysis. Together, this thesis reveals the widespread impact of RNA intramolecular interactions in vivo on both mRNA stability and translation and uncovers a key constraint on gene sequences.
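As a rough illustration of the sequence space described above, the following Python sketch enumerates every 8-nucleotide RNA sequence (4^8 = 65,536) and flags those that could base pair with an RBS element. It is a minimal sketch under stated assumptions, not the analysis used in the thesis: the RBS core `AGGAGG` (the canonical Shine-Dalgarno consensus) is an assumed stand-in for the RBS of the transcript studied, the helper names are hypothetical, and only contiguous Watson-Crick pairs are counted (no G-U wobble pairs, no folding energies).

```python
from itertools import product

BASES = "ACGU"
COMPLEMENT = str.maketrans("ACGU", "UGCA")

# Assumption: canonical Shine-Dalgarno consensus, used here only as an
# illustrative RBS core; the thesis abstract does not give the actual sequence.
RBS_CORE = "AGGAGG"


def reverse_complement(rna: str) -> str:
    """Return the sequence that pairs antiparallel with `rna` (Watson-Crick only)."""
    return rna.translate(COMPLEMENT)[::-1]


def all_kmers(k: int):
    """Yield every RNA sequence of length k."""
    return ("".join(p) for p in product(BASES, repeat=k))


def longest_complementary_run(kmer: str, target: str) -> int:
    """Length of the longest contiguous stretch of `kmer` that can pair with `target`.

    Computed as the longest common substring between `kmer` and the
    reverse complement of `target`.
    """
    rc = reverse_complement(target)
    best = 0
    for i in range(len(kmer)):
        for j in range(len(rc)):
            n = 0
            while i + n < len(kmer) and j + n < len(rc) and kmer[i + n] == rc[j + n]:
                n += 1
            best = max(best, n)
    return best


kmers = list(all_kmers(8))
n_variants = len(kmers)
print(n_variants)  # 65536

# 8-mers containing a full-length complement of the assumed RBS core.
fully_complementary = [
    k for k in kmers if longest_complementary_run(k, RBS_CORE) == len(RBS_CORE)
]
print(len(fully_complementary), fully_complementary[:3])
```

A contiguous-match score like this is only a crude proxy for base-pairing potential. A thermodynamic folding model would be needed to estimate whether a given 8-mer actually occludes the RBS in the context of a full transcript.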
Date issued
2025-09
Department
Massachusetts Institute of Technology. Department of Biology
Publisher
Massachusetts Institute of Technology