Endoribonuclease toxin-antitoxin systems in bacteria: targets and growth inhibition

by

Peter Holmes Culviner

B.S. Biochemistry
University of Wisconsin–Madison, 2011

SUBMITTED TO THE DEPARTMENT OF BIOLOGY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY IN BIOLOGY
AT THE
MASSACHUSETTS INSTITUTE OF TECHNOLOGY

SEPTEMBER 2019

© 2019 Peter Holmes Culviner. All rights reserved.

The author hereby grants MIT permission to reproduce and distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

Signature of Author: Peter H. Culviner, Department of Biology, September 3rd, 2019
Certified by: Michael T. Laub, Professor of Biology, Thesis supervisor
Accepted by: Stephen P. Bell, Professor of Biology

Submitted to the Graduate Program in Biology on September 3rd, 2019 in partial fulfillment of the requirement for the degree of Doctor of Philosophy in Biology at the Massachusetts Institute of Technology

ABSTRACT

Toxin-antitoxin (TA) systems are widely distributed genetic modules that can reversibly inhibit the host bacterium’s growth. Both toxin and antitoxin are encoded together in an operon and the antitoxin directly binds the toxin, preventing its activity. Under stressful conditions, the antitoxin may be degraded, allowing the toxin to inhibit growth. Bacteria often encode many copies of these mysterious systems and they have been suggested to play a role in a myriad of processes including plasmid maintenance, survival through antibiotic stress, growth regulation, and defense against bacteriophage. However, how TA systems might accomplish these diverse feats is not well understood.

The toxic element of many of these systems is an endoribonuclease. In this work, I characterize the RNA targets of 9 endoribonuclease toxins encoded by the bacterium Escherichia coli. Previous studies had shown that the toxin MazF created a pool of leaderless mRNAs that were preferentially translated by specialized ribosomes created through MazF cleavage of the mature 16S rRNA. In my first project, I developed an RNA-sequencing-based pipeline to identify and quantify MazF cleavage across the transcriptome. I found that, in vivo, MazF does not generate appreciable quantities of specialized ribosomes or leaderless transcripts. Instead, it degrades a large portion of E. coli transcripts, preventing their proper translation. Further, I found that MazF strongly inhibits the biogenesis of new ribosomes through both cleavage of nascent rRNA and inhibition of ribosomal protein synthesis.

In my second project, I expanded this work to 8 other endoribonuclease toxins. I found that, like MazF, these toxins degrade a significant portion of E. coli transcripts, leading to a global inhibition of translation. Of particular interest, a number of these toxins are incapable of cleaving untranslated RNA such as rRNA but are still able to inhibit ribosome biosynthesis, likely through degrading ribosomal protein transcripts. I conclude that endoribonuclease toxins are efficient inhibitors of synthesis of macromolecular complexes.

Thesis Supervisor: Michael T. Laub
Title: Professor of Biology

ACKNOWLEDGEMENTS

To my advisor Michael Laub: thank you for allowing me to go down all these odd (and probably sometimes foolish) scientific paths. I’ve learned so much over the course of my graduate work. I have you to thank for reminding me to figure out what is reasonable and interesting from a mess of data.

To my committee members Alan Grossman and David Bartel: thank you for serving on my committee over the course of my work. You’ve both truly helped me ground my work and toe the line between bacteriology and RNA biology. And Alan, you were the first to introduce me to TA systems back in my first year. Thank you to Sophie Helaine for serving on my thesis committee.

To my family: thank you for listening to me ramble on about science these last few years (and the years before that). You’ve always encouraged my curiosity. To my brother, it’s been a joy discussing the overlap between programming and biology with you.

To my classmates, both years: thank you for supporting me through it all. Audra and Holly, the dinners venting about grad school were definitely a calming influence.

To the Laub lab: you’ve all been amazing to work with. You have kept my science sharp, but make coming to work every day fun. My two baymates, Anjana and Isabel, you’ve both been great to discuss science, podcasts, and life with. Denizens of the original room 2, I’m sorry I failed to name houseplants after you all as promised. Denizens of new room 2, Mary and Ben: please keep the weirdness alive. Room 3, is it still quiet time? Monica, thank you for always being up for fielding my microbiology questions and nerding out about games and sequencing data. Room 4, you are all great, but it’s simply a long walk. Team TACo, thanks for being there through all of the TA system stress (and stress response). And Chantal, thanks for all the discussion about science and not science and all the support, I’m so glad you joined the lab.

To everyone: I’m sorry about the puns. (I’m not).

TABLE OF CONTENTS

CHAPTER 1
Introduction: Bacterial Toxin-Antitoxin Systems
  I. Introduction
  II. Biology and classification of toxin-antitoxin systems
    Discovery of toxin-antitoxin systems
    Classification of toxin-antitoxin systems
      Type I: RNA antitoxin prevents toxin translation
      Type II: Protein antitoxin inactivates toxin by direct binding
      Type III: RNA antitoxin that prevents toxin activity
      Type IV: Protein antitoxin that counteracts the toxin’s activity
      Type V: Protein antitoxin that degrades toxin mRNA
      Type VI: Protein antitoxin that targets the toxin protein for degradation
    Distribution of toxin-antitoxin systems
    Toxin targets
      DNA gyrase
      Membrane integrity
      Protein synthesis
    Biological functions of toxin-antitoxin systems
      Post-segregational killing
      Abortive infection
  III. Endoribonuclease activity of toxin-antitoxin systems
    Discovery of MazF and programmed cell death
    MazF: non-specific mRNA interferase
    MazF, RNase toxins, and persistence
    MazF: specific global regulator
    E. coli’s chromosomal RNase toxins
      RelE
      MqsR
      HicA
      YoeB
      YafQ
      YafO
      ChpB
      YhaV
      HigB
      RnlA
  IV. Conclusions
  V. References

CHAPTER 2
Global Analysis of the E. coli Toxin MazF Reveals Widespread Cleavage of mRNA and the Inhibition of rRNA Maturation and Ribosome Biogenesis
  Summary
  Introduction
  Results
    High-throughput mapping and quantification of MazF-dependent cleavages
    Sequencing of 5'-OH-terminated fragments is not a quantitative measure of RNA cleavage
    MazF has an extended recognition element, including nucleotides flanking a central ACA
    Leaderless mRNAs are rare and not protected from MazF cleavage in their coding regions
    Leaderless transcripts are not preferentially translated
    MazF directly inhibits the complete translation of its mRNA targets
    MazF does not generate specialized ribosomes, but does efficiently inhibit rRNA maturation
  Discussion
    Mapping the specificity and global cleavage patterns of endoribonuclease toxins
    A global analysis of RNA cleavage by MazF reveals its specificity and targets
    MazF rapidly blocks rRNA maturation and ribosome biogenesis
    Concluding remarks
  Methods
    Experimental Model and Subject Details
      Growth conditions
      Strain construction
      Plasmid construction
    Experimental Method Details
      MazF induction
      RNA extraction
      Paired-end library preparation
      5'-OH library preparation
      Ribosome profiling library preparation
      qRT-PCR acquisition and analysis
      6S RNA reporter
      YFP* translation reporter construct
      Northern blotting
      Quantification of 43 nt fragment and rRNA cleavage estimate
      Isotopic labeling of mature and nascent rRNA
    Data Analysis Details
      Sequencing read mapping and normalization
      Comparison of qRT-PCR data to sequencing data
      Assessing the reproducibility of the cleavage ratio
      Identifying cleaved regions and the MazF motif
      Verification of the motif
      Identification of leaderless transcripts
      Observation of ribosome footprints near MazF motifs
      Calculation of changes in 3' ribosome footprints
    Data and Software Availability
      Sequencing data
      Northern blot data
      Tables
  Acknowledgements
  Author Contributions
  Declaration of Interests
  References

CHAPTER 3
Specificity and Growth Inhibition in E. coli’s Endoribonuclease Toxins
  Summary
  Introduction
  Results
    Most toxins inhibit cell growth and can be antagonized by a cognate antitoxin
    Toxin expression results in widespread degradation of E. coli transcripts
    Toxins have limited but variable cleavage specificity
    Toxins cause a buildup of rRNA precursors and cleave ribosomal protein transcripts
    Toxin expression inhibits translation and collapses polysomes
    Toxins inhibit ribosome biogenesis and do not degrade mature ribosomes
  Discussion
    Toxin expression causes widespread degradation of E. coli transcripts
    Toxins inhibit ribosome biogenesis through cleavage of ribosome protein transcripts
    Concluding remarks
  Methods
    Experimental Model Details
      Growth conditions
      Strain construction
      Plasmid construction
    Experimental Method Details
      Induction of toxicity and rescue
      Preparation of RNA-Seq libraries
      Isotopic labeling of mature and nascent rRNA
    Data Analysis Details
  References

CHAPTER 4
Conclusions and Future Directions
  I. Conclusions
  II. Future Directions
    Better characterize nucleotide and non-nucleotide predictors of toxin RNase cleavage
    Compare toxin cleavage sites across bacterial and phage genomes
    Use single-cell reporters of toxin activity to screen for toxin activation
  III. References

TABLE OF FIGURES

CHAPTER 1
Figure 1.1: Regulation of toxicity in toxin-antitoxin systems.
Figure 1.2: Targets of toxicity.
Figure 1.3: Functions of TA systems.

CHAPTER 2
Figure 2.1: An RNA-seq-based approach for mapping the cleavage targets of MazF.
Figure 2.2: Growth with uninduced MazF; replicability of cleavage ratios; gene traces.
Figure 2.3: 5'-OH enrichment sequencing methodology.
Figure 2.4: Mapping of 5'-OH termini does not accurately quantify MazF cleavage events.
Figure 2.5: MazF has an extended recognition motif with a central ACA trinucleotide.
Figure 2.6: Normalized enrichment; robustness of motif finding.
Figure 2.7: MazF does not produce a large pool of leaderless transcripts.
Figure 2.8: Leaderless transcripts are degraded in their coding regions.
Figure 2.9: MazF does not produce preferentially translated leaderless transcripts, but inhibits translation of targets.
Figure 2.10: Leaderless transcripts are not preferentially translated; MazF causes increased footprints on ssrA; MazF sites lead to decreased translation.
Figure 2.11: MazF does not target mature ribosomes, but does inhibit ribosome maturation.
Figure 2.12: Ribosomal protein transcripts, but not ribosomes are targets of cleavage.

CHAPTER 3
Figure 3.1: RNase toxins rescuably inhibit cell growth.
Figure 3.2: Example cleavage ratio profiles.
Figure 3.3: Toxins target a wide variety of genes.
Figure 3.4: Toxin cleavage specificity.
Figure 3.5: RNase toxins inhibit rRNA maturation.
Figure 3.6: Toxins alter global translation and ribosome profiles.
Figure 3.7: Toxins inhibit ribosome biogenesis.
Figure 3.8: Mature ribosomes are not degraded by toxins.

CHAPTER 4
Figure 4.1: Shared toxin targets; alternative methods to find motifs.
Figure 4.2: Identification of toxin activation on a single-cell level.

Chapter 1
Introduction: Bacterial Toxin-Antitoxin Systems

I. Introduction

Fundamental to all domains of life is the ability to both sense and respond to the surrounding environment. These processes occur at every scale, from complex, multi-cellular organisms down to relatively simple single-celled bacteria. At the molecular scale, this integration of environmental stimuli often alters the flow of information through the central ‘dogma’ of biology (Bernstein et al., 2002; Brenner et al., 1961; Naville and Gautheret, 2009; Stock et al., 2000). The DNA of the genome is transcribed into short-lived messenger RNA (mRNA) that the ribosome translates into proteins, the cell’s molecular machines. However, cells do not simply constitutively express all of their genes. Since production of unneeded proteins has a fitness cost, gene expression is controlled and specific to the context of the cell (Shachrai et al., 2010; Watson et al., 2014).

Studies of bacteria have been vital to our understanding of this process: one of the first well-understood genetic pathways was E. coli’s lac operon, which produces the cellular machinery to import and metabolize the sugar lactose (Jacob and Monod, 1961). Specifically, LacI normally binds and represses the operon encoding the enzyme that hydrolyzes lactose and the lactose transporter. However, when LacI binds allolactose, a byproduct of lactose metabolism, it no longer binds the DNA, thus relieving repression and allowing transcription and translation of the lac operon.
Through the action of LacI, the cell is able to increase expression of its lactose-specific genes when lactose is present. However, implicit in discussion of the lac operon’s activation is its eventual deactivation once lactose is no longer present. In this context, continued expression of the operon is a waste of cellular resources (Novick and Weiner, 1957). To regulate gene expression, life takes advantage of the relative instability of RNA. While the genome’s DNA is stable on the order of an organism’s lifetime and passed to subsequent generations, RNA is constantly degraded into its individual nucleotides and recycled into new molecules. This instability ensures that all genes that are not constitutively expressed, including lac, have a fundamental off-switch in that their RNA is eventually degraded.

My core interest during my thesis work has been exploring the molecular mechanisms that bacteria use to alter this off-switch. What makes one RNA stable and another rapidly degraded? Do bacteria actively alter the degradation rates of individual RNAs or is RNA degradation passively controlled at the scale of the entire transcriptome?

Though RNA is chemically more unstable than DNA, the degradation and recycling of RNA is controlled by enzymes called RNases. The activity of various RNases allows a wide variety of RNA half-lives, from ~1 to 15 minutes for the majority of mRNA in E. coli (Bernstein et al., 2002). A longer-lived RNA might undergo more rounds of translation than a less stable one. The aggregate specificity of a cell’s RNases is therefore another layer of control over the expression of genes.

Given their central role in gene expression, the activity of the primary RNA recycling (‘housekeeping’) RNases must be tightly controlled.
One of the best-studied bacterial housekeeping RNases, the essential enzyme RNase E, tightly couples its expression to its own activity by cleaving its own mRNA at a high-affinity site in the 5'-UTR (Jain and Belasco, 1995). Though their levels must be tightly regulated, degradation of mRNA by housekeeping RNases is not thought to be contextual and is instead an ongoing part of cell metabolism (Deutscher, 2006). However, there is growing evidence that bacteriophage, bacteria’s viral predators, encode proteins that alter the specificity of these housekeeping RNases to nefarious ends, reprogramming RNase E to degrade host RNA and protect phage RNA, clearing the way for expression of viral genes (Marchand et al., 2001; Qi et al., 2015). Given this, it is surprising to me that there is not more evidence of cells tailoring their RNase activity to their context. Just as the lac operon turns on in response to lactose, why do cells not activate specialized RNases to turn genes off and rapidly adapt to their changing environment?

The focus of my thesis work is a widespread but poorly characterized class of genes that may exhibit this behavior. Toxin-antitoxin (TA) systems are made up of protein toxins that inhibit vital cellular processes and protein or RNA antitoxins that either directly or indirectly inhibit the toxin’s activity. Originally discovered as elements that can increase the stability of the plasmid that encodes them (Ogura and Hiraga, 1983), TA systems are also widespread on bacterial chromosomes, including in pathogenic bacteria. For example, the M. tuberculosis chromosome encodes ~80 identified type II TA systems (Zaychikova et al., 2015). In many TA systems, the toxin has been shown or inferred by homology to be an RNase. In lab-strain E. coli alone, there are 11 TA systems with proposed RNase activity (Harms et al., 2018).
Despite their abundance, the conditions under which these RNases are activated and their RNA targets are either poorly understood or the subject of debate. Where this information is available, it sheds little light on why bacteria have so many copies of these systems.

In this chapter, I review what is known about TA systems, with a particular focus on those whose toxins are RNases. In Chapter 2, I describe my work on characterizing the E. coli toxin MazF, which challenged a paradigm of how toxin RNases regulate bacterial gene expression. Chapter 3 focuses on the other RNase toxins of E. coli. Finally, in Chapter 4, I sum up my work and suggest future directions for the study of RNase TA systems.

II. Biology and classification of toxin-antitoxin systems

Discovery of toxin-antitoxin systems

Toxin-antitoxin systems were originally discovered as genetic elements that increased the stability of the plasmids that encode them. Many natural plasmids are kept at a low copy number within bacterial cells and can be lost without mechanisms to maintain them (Sengupta and Austin, 2011). One such mechanism was found in the ccd (coupled cell division) region of the E. coli mini-F plasmid (Ogura and Hiraga, 1983). The authors of this study found that mini-F plasmids containing this region were more stably maintained than those without it. Further, when they inhibited replication of the plasmids encoding ccd, they observed cell elongation and inhibition of cell division. By dissecting this region, they were able to determine that it included two sub-regions: one, ccdB, imparted the growth inhibitory function and the other, ccdA, counteracted growth inhibition. The authors argued that ccdA’s inhibitory effects on ccdB could not be maintained at the gene dosage present when there were fewer than 2 copies of the plasmid in the cell, thus coupling cell division to maintenance of the plasmid.

Though the term ‘toxin-antitoxin system’ had not yet been coined, over the next few years ccdAB would begin to define many of the key characteristics of these systems. The ccdB gene encodes a protein that inhibits DNA gyrase by direct binding, and was eventually shown to kill cells when not counteracted by ccdA (Bernard and Couturier, 1992). The ccdA gene was found to also encode a protein, CcdA, that directly binds CcdB to inhibit its action on gyrase (Bernard and Couturier, 1991; Tam and Kline, 1989a). Intriguingly, the CcdA-CcdB complex was found to bind to the promoter region of the ccdAB operon, negatively autoregulating transcription of both genes (Tam and Kline, 1989b). This implied a molecular mechanism for how plasmids might be maintained by CcdB: when copies of both genes were present on a plasmid within a host cell, CcdA protein was produced, counteracting CcdB’s toxicity. However, if a daughter cell segregated without a copy of the plasmid, the operon was no longer present and, without continued CcdA production, the CcdB inherited in the daughter cell’s cytoplasm became active, killing the cell. This activity was termed ‘post-segregational killing’. But why was residual CcdA in this daughter cell not sufficient to inhibit the toxicity?

The key to CcdB activation is the instability of the CcdA protein. Pulse-labeling showed that the CcdA protein is a target of the Lon protease and significantly more unstable than CcdB (Van Melderen et al., 1994). This, combined with later observations that antitoxins, in general, are much more efficiently translated (Li et al., 2014), suggests a complete mechanism for control of post-segregational killing. When the host cell has a copy of the plasmid containing the TA system, the unstable antitoxin is translated in excess of the more stable toxin, preventing toxin activation by direct binding.
When the plasmid is lost, the antitoxin’s instability is no longer compensated for by continued antitoxin expression and the remaining toxin is eventually liberated, killing the cell.

However, just as the ccdAB system was beginning to define the characteristics of a TA system, the universe of known systems expanded. Soon after the discovery of ccd, another plasmid maintenance locus, hok-sok (host killing and suppressor of killing), was discovered (Gerdes et al., 1986a). In this system, the sok gene product was an RNA that inhibited the translation of the toxic protein encoded by the hok mRNA (Gerdes et al., 1988). Despite this difference in antitoxin identity between hok and ccd, this system’s activation after plasmid loss was proposed to stem from the same mechanism: lower stability of the antitoxin compared to the toxin mRNA. The diversity of mechanisms extended beyond the identity of the antitoxin; the short, hydrophobic Hok protein was proposed to kill cells by disrupting the membrane (Gerdes et al., 1986b). This illustrated that very different systems could be co-opted in plasmids to achieve the same goal: a ‘selfish’ maintenance of the TA locus itself and the plasmid around it.

However, the hypothesis that these systems exist primarily as plasmid maintenance systems would prove to be short-lived. Two homologs of another plasmid maintenance locus, pem, were discovered on the E. coli chromosome (Masuda et al., 1993; Tsuchimoto et al., 1988). Once whole bacterial genomes began to be sequenced, they were found to be dotted with many TA systems, sometimes with multiple homologs of the same system in a genome (Pandey and Gerdes, 2005). If toxin-antitoxin systems exist on chromosomes, do they really just maintain mobile elements or might they play other roles in bacterial physiology? If they do have other purposes, why would multiple homologs with the same mechanism of toxicity be maintained?
To begin to address these questions, I will first catalog the major classes of TA systems that have been identified before discussing RNase TA systems in particular.

Classification of toxin-antitoxin systems

The diversity of identified TA systems necessitates a loose definition. Broadly, a TA system encodes a toxin that inhibits a vital cellular process and an antitoxin that counteracts the toxin’s activity (Harms et al., 2018; Yamaguchi et al., 2011). However, behind this expansive definition a number of assumptions have grown, built on the common themes of known TA systems. In the majority of cases, the toxin and antitoxin are encoded at the same site, though they are not necessarily co-operonic. The toxin is always a protein. The antitoxin is a protein or an RNA that relieves toxicity, either directly or indirectly. In most cases, the proposed mechanism for toxin activation relies on the inherent instability of the antitoxin relative to the toxin.

The antitoxin’s chemistry and its mechanism of counteracting the toxin form the highest level of toxin-antitoxin classification. Below I briefly describe the antitoxin’s mechanism for each of the 6 types of TA systems, with extra focus on type II TA systems, as these are the focus of Chapters 2 and 3. Common mechanisms of antitoxin activity are summarized in Figure 1.1A.

Figure 1.1: Regulation of toxicity in toxin-antitoxin systems. (A) Molecular basis of control of toxicity for TA types I-IV. (B) Transcriptional control of type II TA systems. Low toxin:antitoxin ratios (left) favor complexes with low toxin stoichiometry; these bind tightly to the promoter, repressing it. High toxin:antitoxin ratios favor formation of different complexes that do not bind tightly to the promoter, relieving repression. Combined with the antitoxin’s higher translation efficiency, this conditional cooperativity favors low toxin:antitoxin ratios.

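The transcriptional logic summarized in the Figure 1.1B caption can be illustrated with a deliberately simple toy model. The sketch below is not taken from any of the cited studies: the function names, the one-to-one pairing rule, the rule that each excess toxin disrupts one repressing complex, and the binding constant k_d are all hypothetical choices made only to reproduce the qualitative behavior described (tight repression at low toxin:antitoxin ratios, relief of repression at high ratios). It also ignores any repression by free antitoxin.

```python
def repressing_complexes(toxin: float, antitoxin: float) -> float:
    """Toy estimate of promoter-binding (low-toxin-stoichiometry) complexes.

    Assumption (hypothetical): each antitoxin pairs with one toxin to form a
    repressing complex, and every toxin in excess of antitoxin converts one
    repressing complex into a high-toxin-stoichiometry complex that no longer
    binds the promoter.
    """
    if toxin <= antitoxin:
        return toxin
    return max(0.0, 2.0 * antitoxin - toxin)


def promoter_activity(toxin: float, antitoxin: float, k_d: float = 1.0) -> float:
    """Fraction of promoters left unrepressed, using simple one-site binding.

    k_d is an arbitrary illustrative binding constant, not a measured value.
    """
    bound = repressing_complexes(toxin, antitoxin)
    return 1.0 / (1.0 + bound / k_d)


if __name__ == "__main__":
    antitoxin = 10.0  # arbitrary units
    for toxin in (2.0, 5.0, 10.0, 15.0, 20.0):
        ratio = toxin / antitoxin
        activity = promoter_activity(toxin, antitoxin)
        print(f"toxin:antitoxin = {ratio:.1f} -> promoter activity = {activity:.3f}")
```

Under these assumptions, promoter activity is lowest when toxin and antitoxin are balanced and returns to its maximum once toxin greatly exceeds antitoxin, which is the qualitative pattern the caption describes. Any quantitative treatment would need measured affinities and the actual complex stoichiometries of a specific system.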
Type I: RNA antitoxin prevents toxin translation

These TA systems are defined by their use of a short RNA antitoxin to antagonize the toxin’s translation. The antitoxin RNA is encoded on the opposite strand from the toxin mRNA, facilitating base pairing between the antitoxin RNA and the 5'-end of the toxin mRNA. The precise mechanism of translation inhibition varies between different systems. In the tisB-istR-1 system, the antitoxin IstR-1 binds the 5' untranslated region (5'-UTR) of tisB mRNA, blocking ribosome binding to a standby site required for translation and facilitating RNase III cleavage of the transcript (Darfeuille et al., 2007; Vogel et al., 2004). In the hok-sok system, a secondary open reading frame, mok, needs to be translated for proper initiation of hok translation (Thisted and Gerdes, 1992). The antitoxin, sok, binds to mok’s ORF, preventing translation of mok (and thus hok) as well as facilitating RNase III cleavage of the transcript (Gerdes et al., 1992). The majority of type I toxins are short hydrophobic peptides that likely induce toxicity by disrupting the cell’s membrane.

Type II: Protein antitoxin inactivates toxin by direct binding

Type II TA systems are controlled by a protein antitoxin that prevents the toxin’s interaction with its target by binding the toxin. The toxin and antitoxin are co-transcribed as an operon, typically with the antitoxin first. The antitoxin has been shown to act either by binding directly to the toxin’s active site, competing with substrate, or by binding to a secondary site and generating allosteric changes in the toxin’s structure that prevent target binding. For example, in E. coli mazEF, the C-terminus of MazE binds the active site, displacing a key catalytic residue (Kamada et al., 2003). In contrast, the CcdA antitoxin binds to a secondary site on the CcdB toxin, inducing a conformational change in CcdB that prevents binding and inhibition of gyrase.
In addition, CcdA is able to bind to the CcdB-gyrase complex to release gyrase-bound toxin (De Jonge et al., 2009).

The levels of toxin and antitoxin are controlled at multiple levels. First, the toxin-antitoxin complex itself negatively autoregulates its expression, ensuring relatively low levels of both toxin and antitoxin (Page and Peti, 2016). To enable activation of the system, the antitoxin is intrinsically more unstable than the toxin; to account for this difference, the antitoxin is more efficiently translated. However, many type II systems have been shown to have an additional level of transcriptional regulation (Figure 1.1B). As the ratio of toxin:antitoxin rises, the formation of TA complexes with higher toxin stoichiometry is favored. These complexes do not efficiently bind the operon’s operator site, de-repressing the operon and generating more mRNA; this process is known as conditional cooperativity (Garcia-Pino et al., 2010; Overgaard et al., 2008). Due to the antitoxin’s higher translation efficiency, this should lead to the synthesis of more antitoxin, restoring the repressive stoichiometry and inactivating any liberated toxin. Conversely, if antitoxin is actively being degraded, this regulation scheme could lead to a spike in free toxin.

Type III: RNA antitoxin that prevents toxin activity

Unlike type I systems, in which the RNA antitoxin antagonizes the toxin mRNA, type III antitoxin RNAs antagonize the toxin protein; all characterized type III toxins are endoribonucleases that cleave their antitoxin. These systems are co-operonic with tandem repeats of antitoxin RNA followed by the toxin; the toxin and antitoxin are separated by a rho-independent terminator that limits toxin mRNA abundance (Goeders et al., 2016). The first system, ToxIN, was discovered as an anti-phage element in the plant pathogen Erwinia carotovora (Blower et al., 2011).
It has since been shown that the antitoxin, ToxI, is a cleavage target of ToxN and binds the toxin after cleavage, though it is unclear if the antitoxin truly sequesters the toxin or if it is simply a preferred substrate (Blower et al., 2011; Short et al., 2018).

Type IV: Protein antitoxin that counteracts the toxin’s activity

Unlike type II systems, in which the protein antitoxin directly interacts with the toxin, in type IV systems the antitoxin counteracts the toxin’s activity indirectly. The mechanism of the antitoxin’s antagonism is, therefore, specific to the toxin’s activity. In the yeeUV system, the toxin, YeeV, inhibits the polymerization of the cytoskeletal polymers MreB and FtsZ, which are critical for cell shape and division (Masuda et al., 2012). However, YeeU reverses not only YeeV’s toxicity, but also the toxicity of other unrelated cytoskeletal inhibitors. It was shown to bind MreB and FtsZ directly, enhancing their polymerization.

Type V: Protein antitoxin that degrades toxin mRNA

There is only a single example of a type V system, ghoST. In this system, the antitoxin, GhoS, is an RNase that degrades the downstream toxin gene’s mRNA (Wang et al., 2012). The GhoT protein is a short, hydrophobic protein that disrupts the cell membrane and kills cells.

Type VI: Protein antitoxin that targets the toxin protein for degradation

There is also only one example of a type VI system. In the socAB system, the antitoxin, SocA, is a proteolytic adapter that targets the toxin, SocB, for degradation by the protease ClpXP (Aakre et al., 2013). SocB binds the DNA sliding clamp and inhibits replication.

Distribution of toxin-antitoxin systems

An early study quantified the distribution of TA systems across bacteria using BLAST searches against known type II toxins and antitoxins (Pandey and Gerdes, 2005). Starting with 126 prokaryotic genomes, the authors identified 671 complete TA loci, with the majority of genomes containing 1-5 loci.
Since then, improvements in algorithms and statistical power (arising from more known systems) have continued to increase this number (Leplae et al., 2011; Makarova et al., 2009). Most sequenced bacterial genomes, like E. coli, have multiple TA systems found by these homology-based approaches.

Beyond simply counting them, genomics studies of TA systems have focused primarily on answering two key questions: why does a given bacterium have the number of systems it does, and where do new systems come from? One route to answering these questions is to compare the traits of host bacteria to the number of TA systems and see if there are any correlations. A simple possibility is that the number of systems scales with the size of the genome. However, depending on how the analysis was conducted, it has been argued that there is either a strong correlation or no significant correlation (Leplae et al., 2011; Makarova et al., 2009). Another possibility is that species that are actively losing many genes (e.g. during adaptation to become an obligate intracellular pathogen) may rapidly lose many TA systems since their deletion rarely has a strong phenotype (Sevin and Barloy-Hubler, 2007). Finally, due to their proposed roles in stress response and adaptation (see below), it has been suggested that TA systems may be lost in bacteria that live in less variable environments (Pandey and Gerdes, 2005).

It is also important to consider how TA systems are gained as well as lost. Type II TA systems have been shown to be associated with mobile elements, and thus are likely to be horizontally transferred. V. cholerae’s 13 TA systems provide a striking example of this: all 13 TA loci are located in the integron region of Chromosome II and associated with attC sites for gene cassette integration (Pandey and Gerdes, 2005).
Strong associations of TA systems with particular sections of the genome have also led to the argument that they may be responsible for avoiding large-scale deletions in these regions (Szekeres et al., 2007).

Toxin targets

Toxins have been shown to target a wide array of cellular processes to inhibit growth. Below I describe a few of the better-characterized mechanisms of growth inhibition. These are also summarized in Figure 1.2.

Figure 1.2: Targets of toxicity. A non-exhaustive list of targets of characterized toxins. Toxins, like antibiotics, have been found to target many of the processes vital to bacterial growth. Note that toxins targeting cellular RNA are reviewed in a later section.

DNA gyrase

The first identified TA system, ccd, targets DNA gyrase to control plasmid inheritance (Bernard and Couturier, 1992). The toxin, CcdB, binds gyrase and causes DNA cleavage in an ATP-dependent manner. Subsequently, mutations in gyrase were found that inhibited this DNA cleavage and the cytotoxic activity of CcdB (Bernard et al., 1993). Another family of toxins, ParE, also binds gyrase and leads to DNA cleavage and the SOS response (Jiang et al., 2002). Despite the potentially irreversible toxicity of these systems, they are found on bacterial chromosomes as well as plasmids. These toxins have a similar mechanism of action to quinolone antibiotics, which also bind gyrase and lead to DNA cleavage. Notably, gyrase mutations that confer resistance to one of the toxins or to quinolones do not necessarily confer resistance to other gyrase inhibitors (Bernard et al., 1993; Yuan et al., 2010), indicating that these inhibitors likely bind to different surfaces on gyrase. Toxin inhibition of gyrase highlights that even for a single protein target, toxins are able to evolve distinct mechanisms of inhibition.

Membrane integrity

Type I systems commonly encode short, hydrophobic peptides that lead to the formation of ‘ghost cells,’ a common phenotype for cells whose membranes have been compromised (Gerdes et al., 1986b). Two of these toxins, Hok and TisB, have since been shown to form pores, reduce membrane potential, and cause the leakage of small molecules from the cell (Gurnev et al., 2012; Unoson and Wagner, 2008; Wilmaerts et al., 2018). However, this mechanism of action is not isolated to type I systems: the type V ghoST system likely also encodes a pore-forming toxin (Wang et al., 2012). Other components vital to proper membrane integrity are also targeted. For example, the zeta toxin phosphorylates the peptidoglycan precursor uridine diphosphate-N-acetylglucosamine (Mutschler et al., 2011). This modification leads to inhibition of MurA, which catalyzes the first step of peptidoglycan synthesis. Finally, cell morphology is also a target. Expression of the E. coli toxin YeeV inhibits the polymerization of FtsZ and MreB, leading to the formation of elongated, lemon-shaped cells and eventual cell lysis (Tan et al., 2011).

Protein synthesis

Protein synthesis is also the target of many distinct toxin mechanisms. One method of inhibiting proper translation is by RNase cleavage of the RNA translation machinery, tRNA and ribosomes, or the mRNA template; I review the action of RNase TA systems in a later section. Toxins also inhibit translation through modification of the translation machinery. For example, the charging of tRNA is targeted: HipA toxin inhibits glutamyl-tRNA synthetase by phosphorylation (Germain et al., 2013). After it is charged, tRNA itself can also be a target. The TacT acetyltransferase blocks the amine group of amino-acid charged tRNA, stalling translation in elongating ribosomes (Cheverton et al., 2016). Finally, translation elongation is a target. The Doc toxin phosphorylates and inactivates the elongation factor EF-Tu (Castro-Roa et al., 2013).
The wide diversity of targets of TA systems highlights that, like antibiotics, they have evolved to target many cellular processes important for growth.

Biological functions of toxin-antitoxin systems

But why do cells encode systems that target their own growth? Particularly in a single-celled organism, this is counterintuitive. Perhaps due to this mystery and their wide array of cellular targets, TA systems have been proposed to have many different biological functions. There is no reason why all TA systems, even those that have the same molecular target, must have the same function. Below, I summarize two well-supported models for the function of TA systems. I delay discussion of programmed cell death and persistence, which have generally been associated with RNase TA systems, to the section on RNases; schematics of all models are shown in Figure 1.3.

Figure 1.3: Functions of TA systems. (A) Low copy number plasmids are not always properly segregated into both daughter cells. If the plasmid encodes a TA system, eventually the unstable antitoxin will degrade and the toxin will inhibit growth. Thus, plasmid-containing cells overtake the population. (B) Phage infection causes massive changes to the cellular environment, typically including degradation of the host’s chromosome. Perhaps due to the rapid decrease in their expression, some TA systems activate during phage infection, inhibiting production of new phage particles. Though the infected cell dies during abortive infection, the population is protected. (C) Many TA systems, particularly type II, have proposed roles in the host’s regulation of stress response. A key theme in many of these hypotheses is improved survival of stress through the slowing of cell growth.

Post-segregational killing

In nature, plasmids often exist at relatively low copy numbers and often require active maintenance systems to ensure they are inherited by daughter cells (Sengupta and Austin, 2011). If a TA system is present on a plasmid that is lost in a daughter cell, eventually all of the unstable antitoxin will be degraded and the stable toxin will be free to kill the cell; this process is known as post-segregational killing. Therefore, TA systems may be maintained within hosts without directly benefiting them. However, stabilization of the surrounding plasmid or chromosomal region might provide context-specific benefits to the host (e.g. virulence genes, phage defense elements, and antibiotic resistance genes). In plasmids, both type I hok-sok systems, which encode membrane porins, and type II ccdAB systems, which encode gyrase inhibitors, have been shown to stabilize plasmids, illustrating that systems with diverse regulation and targets can operate as plasmid-borne selfish elements (Bernard and Couturier, 1992; Gerdes et al., 1986a). In the chromosome, introduction of a relE or parD system, which encodes an endoribonuclease or a gyrase inhibitor respectively, has been shown to stabilize a dispensable region of the E. coli chromosome (Szekeres et al., 2007). However, this increased stability may be dependent on the expression level of the TA loci. Since it is typically possible to make deletions of chromosomal TA loci at their native locations, it is unclear what role these systems play in genome stabilization.

Abortive infection

Abortive infection is a form of bacterial innate immunity that protects populations of cells from the propagation of phage. When a cell is infected by a phage particle, it undergoes an altruistic suicide prior to replication of the phage, thus preventing the formation of new phage particles that would infect the rest of the population. Anti-phage activity has been observed in a number of different TA families.
The plasmid-borne type III TA modules toxIN and tenpIN are widely distributed and can generate abortive infection in multiple species of bacteria (Goeders et al., 2016). There is also evidence that some type II systems may be anti-phage elements. The RNase toxin RnlA has also been shown to inhibit E. coli infection by T4 lacking the dmd gene (Koga et al., 2011; Otsuka and Yonesaki, 2005). Dmd acts as a promiscuous antitoxin and inhibits both RnlA and a homologous toxin, LsoA (Otsuka and Yonesaki, 2012). T4 also antagonizes the RNase toxin MazF by inactivating it with the ADP-ribosyltransferase Alt (Alawneh et al., 2016). The fact that phage encode antitoxins suggests that many toxins may play a still poorly-understood role in phage defense.

III. Endoribonuclease activity of toxin-antitoxin systems

Some of the most well-characterized TA systems encode endoribonucleases as their toxins. This may be because E. coli encodes 11 of these systems. Despite this, we still have an incomplete understanding of which RNAs these systems target, much less their biological function. Below I review the many proposed functions of these systems. I frame this discussion around one of the first discovered chromosomal TA systems, MazF. I summarize some of these models in Figure 1.3C.

Discovery of MazF and programmed cell death

Since its discovery, E. coli’s MazF has been both defining and divisive. MazF, first known as chpA, was identified as a homolog of the pem plasmid maintenance system (Masuda et al., 1993; Tsuchimoto et al., 1988). Like the R100 plasmid’s PemK system, MazF was toxic when expressed and this toxicity was rescued by co-expression of MazE. Intriguingly, mazEF is co-operonic with (or at least directly downstream of) the relA gene, a central regulator of amino acid starvation in E. coli.
Briefly, the RelA protein senses the presence of uncharged tRNA on the ribosome and produces the alarmone (p)ppGpp that activates a global starvation stress response program including downregulation of rRNA transcription and upregulation of amino acid biosynthesis genes (Potrykus and Cashel, 2008). Based on its proximity to relA, the authors proposed the physiological role of MazF and chromosomal TA systems might be downregulation of growth under conditions where rapid growth might be harmful. Thus, MazF was explored as a stress response regulator controlled by amino acid starvation. Intriguingly, the combination of heat and production of large quantities of (p)ppGpp (using a constitutively active form of RelA) was found to induce cell death; this lethality was reduced in a ΔmazEF strain (Aizenman et al., 1996). The authors proposed that mazEF controlled a form of ‘altruistic cell death’, where some cells lyse to provide nutrients to ensure survival of a portion of the population during extreme starvation. Their model echoed key components of how TA systems promote plasmid maintenance. Amino acid starvation, via (p)ppGpp, shuts off transcription of the mazEF operon; as the unstable MazE antitoxin is degraded, MazF is activated, killing the cell. In subsequent work MazF’s regulatory footprint continued to broaden, with suggested roles for programmed cell death during nutrient starvation, environmental stress, and antibiotic stress (Hazan et al., 2004; Sat et al., 2001, 2003). However, to my knowledge, there is no conclusive evidence that the presence of genomic MazF leads to improved survival of E. coli populations under stress. This link between decreased individual fitness and increased population fitness is critical to supporting this model. In fact, the central claim that MazF causes significant cell death in physiological conditions has long been disputed.
Another group showed that delayed expression of the antitoxin MazE rescued lost CFUs from expression of MazF, suggesting that MazF is bacteriostatic, not bactericidal (Pedersen et al., 2002). Further, unannotated mutations and deletions have been identified in MazF strains used for key experiments supporting the programmed cell death model (Ramisetty et al., 2016; Tsilibaris et al., 2007). Most worryingly, mutations in the upstream relA gene cause variation in the levels of (p)ppGpp between WT and ΔmazEF strains, meaning that many of MazF’s proposed roles in cell death could instead arise from stresses associated with the cell’s ability to respond to amino acid starvation. Regardless, the programmed cell death model continues to guide and inspire hypotheses for the roles of chromosomal TA systems.

MazF: non-specific mRNA interferase

The discovery of the molecular mechanism of MazF toxicity led to new hypotheses about its function. It had long been known that MazF blocked either transcription or translation, but it was eventually shown in a combination of in vitro experiments and in vivo expression that MazF cleaved mRNAs specifically at ACA sites, producing a 5'-OH and a 2',3'-cyclic phosphate (Zhang et al., 2003, 2005a). Cleavage was not processive and the majority of ACA sites were cleaved without subsequent degradation of the RNA. Critically, MazF also seemed to be largely specific to single-stranded RNA, thus protecting tRNA and ribosomes from cleavage despite their ACA sites. Taken together, these observations suggested that MazF acts as a general ‘mRNA interferase’ (Inouye, 2006). Since the majority of transcripts contain one or more ACA sites, translation of full-length proteins should be rare. Other cellular processes, until a shortage of nascent protein components occurs, should be largely unaffected. If MazF is subsequently sequestered by MazE, this model holds that ribosomes and tRNA should be capable of resuming normal translation and cell growth.
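
The claim that most transcripts contain at least one ACA site follows from how short the motif is. As a minimal sketch (not the analysis pipeline used in this thesis), the Python below locates ACA sites in a sequence and computes how many would be expected in a random sequence. The function names are hypothetical, and equal frequencies for the four bases are an illustrative assumption rather than a property of E. coli transcripts.

```python
def find_motif_sites(rna: str, motif: str = "ACA") -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) motif match.

    Hypothetical helper for illustration; DNA-style input (T) is converted to U.
    """
    rna = rna.upper().replace("T", "U")
    motif = motif.upper().replace("T", "U")
    sites = []
    start = rna.find(motif)
    while start != -1:
        sites.append(start)
        # Advance by one so that overlapping matches (e.g. ACACA) are counted.
        start = rna.find(motif, start + 1)
    return sites


def expected_sites_uniform(length: int, motif_length: int = 3) -> float:
    """Expected motif count in a random sequence, ASSUMING equal base frequencies."""
    windows = max(length - motif_length + 1, 0)
    return windows / 4 ** motif_length


if __name__ == "__main__":
    print(find_motif_sites("GGACAUUACAG"))
    print(expected_sites_uniform(1000))
```

Under the equal-frequency assumption, a trinucleotide motif occurs about once every 64 positions, so a transcript of 1,000 nucleotides would be expected to carry roughly 15 sites. Whether a given site is actually cleaved in vivo is a separate question that sequence alone does not answer.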
Key predictions of the mRNA interferase model hold. First, in opposition to the programmed cell death model, delayed expression of MazE can rescue cells expressing MazF (Pedersen et al., 2002). Second, even during MazF expression, incorporation of 35S-methionine continues, albeit at a decreased rate. In fact, ectopically expressed proteins engineered to lack ACA sites can still be produced, and at a higher purity as competing translation is inhibited by MazF cleavage (Suzuki et al., 2005). These results indicate that activation of an mRNA interferase may be a reversible mechanism for cells to temporarily slow their own growth. A relevant situation that bacteria often encounter is exposure to antibiotics. Slow growing or dormant cells are often tolerant to even bactericidal concentrations of antibiotics (Page and Peti, 2016).

MazF, RNase toxins, and persistence

Including MazF, MG1655 E. coli encodes 11 proposed RNase TA systems. Where known, the RNA motifs these toxins recognize are short (~3 nucleotides) and therefore low-complexity, though some toxins require the ribosome for cleavage and some do not (see below for detailed descriptions of each toxin). Like MazF, most characterized systems are negatively auto-regulated by the toxin-antitoxin complex (Harms et al., 2018). However, all of these similarities simply deepen the mystery of why bacteria have so many of them. One proposed model is that each system behaves as a ‘switch’ that has some probability of stochastically turning ‘on’ and pushing cells into a temporary non-growing, antibiotic-tolerant state known as persistence. Persistence is the ability of a small population of cells to survive antibiotic treatment after the initial death of most of the population. If surviving cells are regrown and then exposed to the same antibiotic, they are still susceptible to the antibiotic on the second exposure, which separates this phenomenon from inherited resistance.
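
The definition of persistence above can be made concrete with a simple two-population picture: a majority that dies quickly and a small tolerant subpopulation that dies slowly. The Python sketch below is an illustrative toy model only, not a model taken from the studies cited here. The function name, the persister fraction, and both kill rates are hypothetical values chosen to show the shape of the curve.

```python
import math


def surviving_fraction(
    t_hours: float,
    persister_fraction: float,
    kill_rate_normal: float,
    kill_rate_persister: float,
) -> float:
    """Fraction of the starting population alive after t_hours of antibiotic.

    Toy model: each subpopulation dies exponentially at its own rate (per hour).
    All parameter values passed in are illustrative assumptions.
    """
    normal = (1.0 - persister_fraction) * math.exp(-kill_rate_normal * t_hours)
    persister = persister_fraction * math.exp(-kill_rate_persister * t_hours)
    return normal + persister


if __name__ == "__main__":
    # Hypothetical parameters: 1 in 10,000 cells is a persister.
    for t in (0, 1, 2, 5, 10):
        print(t, surviving_fraction(t, 1e-4, 5.0, 0.1))
```

In this picture, survival at late time points is set almost entirely by the persister fraction. Because regrown survivors re-establish the same mixture of fast-dying and slow-dying cells, a second exposure produces the same curve, which is the behavior that distinguishes persistence from inherited resistance.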
The idea that TA systems can generate persisters is not new. Before the term ‘toxin-antitoxin system’ was coined, E. coli mutations at an unknown locus, termed hip for high persistence, were identified in an ampicillin screen (Moyed and Bertrand, 1983). It was later shown that the mutations fell into the toxin of a toxin-antitoxin pair (Korch and Hill, 2006). Though the toxin HipA is not an RNase, ectopic expression of RNase toxins has been shown to artificially induce a persister state (Keren et al., 2004; Mok et al., 2015). Induction of persistence became a major model of the biological role of RNase toxins when it was shown that sequential deletion of 10 of MG1655 E. coli’s RNase TA systems (the Δ10 strain) caused a coincident decrease in the ability to form persisters (Maisonneuve et al., 2011). Given that ectopic expression of toxins produces a persister-like state, toxins must somehow be separated from their antitoxins. Like plasmid-borne TA systems, chromosomal antitoxins are known to be more unstable than toxins and many have been shown to be degraded by the Lon protease. As expected, a Δlon strain, like the Δ10 strain, had decreased levels of persistence compared to WT, indicating that activation of the TA systems was through Lon. In a subsequent paper, the authors suggested that Lon’s degradation of antitoxins and activation of persistence is triggered by stochastic variation in the levels of the starvation alarmone (p)ppGpp, which leads to formation of polyphosphate, a known activator of Lon (Maisonneuve et al., 2013). However, it was eventually determined that key phenotypes in these studies arose from infection with prophage during passaging (Harms et al., 2017). This included the coincident drop in persister frequency as TA systems are deleted. A subsequent study confirmed this result: an independently-constructed Δ10 strain did not have decreased persister frequency in unstressed cells (Goormaghtigh et al., 2018).
However, it is still possible that persisters may be generated under unknown conditions. For example, it has been shown that TA systems in Salmonella influence the formation of persisters upon uptake by macrophages (Helaine et al., 2014). However, a clear phenotype in a clean deletion of a single RNase TA system in E. coli remains elusive, which leaves the biological role of TA systems mysterious.

MazF: specific global regulator

Improvements in sequencing technology since the discovery of TA systems have opened up another avenue for exploring their function: determining their targets. Though it eventually became clear that MazF’s central motif was ACA, the observation that not all ACA sites were cleaved led to initial confusion over its precise specificity (Hazan et al., 2004; Muñoz-Gómez et al., 2004; Zhang et al., 2003). This leaves open the possibility that appreciable quantities of proteins may still be produced in the presence of MazF even if their transcripts contain ACA sites. MazF’s newly characterized cleavage specificity led to an evolution in the programmed cell death model: MazF, through its cleavage specificity, leads to the continued expression of a set of proteins that constitute a regulatory pathway leading to either cell survival or cell death (Amitai et al., 2009). This decision between survival and death was further controlled by the presence of a short peptide termed ‘extracellular death factor’ that directly activates MazF (Kolodkin-Gal et al., 2007). Puzzlingly, some of the proteins that continued to be expressed contained MazF cleavage sites, so how, if not through negative regulation of MazF targets, does MazF alter the translational program of cells? In one proposed model, MazF, through cleavage of the 16S rRNA, generates a sub-population of functional, specialized ribosomes (Vesper et al., 2011).
The authors showed that during serine starvation, induced by addition of the serine analog serine hydroxamate (SHX), an ACA site in the relatively unstructured 3'-end of 16S rRNA is cleaved by MazF, creating ribosomes incapable of using their anti-Shine-Dalgarno regions to recognize ribosome binding sites on canonically translated mRNAs. Instead, these specialized ribosomes were shown to preferentially translate mRNAs lacking their full 5' untranslated regions (5'-UTRs). This ribosome cleavage was later suggested to be reversible via the ligase RtcB (Temmel et al., 2017). They proposed, and later identified, a ‘MazF-regulon’ of ~300 coding regions with ACA sites upstream of their start codons (Sauert et al., 2016). These messages were more ribosome-associated when MazF was ectopically expressed, implying that they are more efficiently translated when leaderless. It has since been suggested that variability in MazF activation across a population of cells during stress may result in heterogeneity in the expression of MazF-regulon genes and in cell growth rates (Nikolic et al., 2017, 2018). However, this regulon is not enriched in known stress responsive genes, so what role its translation may play in stress response is unclear. In addition, though Vesper et al. do show evidence that cleavage of the rRNA by MazF is possible, the extent and ramifications of this cleavage on cellular translation are under debate. One source of concern is that the relA mutations outlined in the ‘programmed cell death’ section were not addressed in this work. This is of particular concern since the primary physiological stressor used to activate MazF is amino acid starvation via SHX. It is unclear what confounding effects the variability in (p)ppGpp metabolism from these mutations would have on ribosomes and translation.
Subsequent studies have directly questioned MazF’s ability to generate a large pool of specialized ribosomes that selectively translate a pool of MazF-processed mRNA, and instead favor a model where MazF prevents rRNA maturation and inhibits complete translation of target mRNA (Culviner and Laub, 2018; Mets et al., 2017, 2019). I discuss this at length in Chapter 2.

Even if MazF does not generate significant ribosome specialization in vivo, this model set important precedents for the study of endoribonuclease TA systems. First, it was one of the first studies to show evidence of cleavage of stable RNA by these toxins. Now, there are many examples of MazF and VapC¹ homologs that target rRNA and tRNA, particularly in Mycobacterium tuberculosis (Barth et al., 2019; Schifano et al., 2013, 2014; Winther and Gerdes, 2011; Winther et al., 2013), though many of these studies suggest that toxins target translation as a key cellular process to inhibit growth, rather than remodel gene expression. Second, this MazF model linked RNase cleavage specificity directly to gene expression. However, a specific role in gene expression is complicated by the fact that TA systems are thought to be frequently horizontally transferred. For an RNase toxin, a horizontal transfer event would mean a loss of target specificity. Are newly acquired toxins capable of tailoring their cleavage specificities to a new host? Or, conversely, does having a broad sequence specificity and targeting many mRNAs suit the biological function of TA systems? One route to finding out may be through better characterization of endonuclease toxin cleavage specificity in E. coli and across bacteria. I describe my efforts towards this in Chapter 3.

¹ VapC is another family of RNase toxins. Though there are no copies in MG1655 E. coli, it is particularly well-represented in the Mycobacterium tuberculosis genome.

Below I review the literature for MG1655 E.
coli’s 10 other known type II endoribonuclease TA systems besides MazF.

E. coli’s chromosomal RNase toxins

MG1655 E. coli’s (referred to as E. coli for brevity) chromosome encodes 11 proposed RNase toxins. All are thought to be endoribonucleases. Some of these toxins, like MazF, cleave accessible single-stranded RNA; these are often referred to as ‘translation independent’. Others, which are ‘translation dependent’, require an actively-translating ribosome for cleavage. Broadly, activation of these toxins is thought to rely on the same principles as all type II systems: antitoxins are generally more unstable than toxins, so inhibition of bulk translation may cause toxin activation. Below I review E. coli’s known endoribonuclease TA systems.

RelE

RelE’s antitoxin, RelB, is the site of a mutation conferring a phenotype known as the ‘delayed relaxed response.’ In normal cells, rRNA production is halted during amino acid starvation due to accumulation of the alarmone (p)ppGpp. In cells with the delayed relaxed response, rRNA production is halted for ~10 minutes after amino acid starvation before resuming (Diderichsen et al., 1977). This phenotype was shown to result from destabilization of the RelB antitoxin and dysregulation of transcription of the relBE operon, resulting in hyperactivation of RelE (Christensen and Gerdes, 2004; Overgaard et al., 2009). In this model, starvation causes an initial increase in (p)ppGpp that hyperactivates RelE in a RelB mutant background. This hyperactivation leads to cleavage of translated RNAs and ribosome stalling, relieving starvation and restoring (p)ppGpp, and thus stable RNA synthesis, to pre-starvation levels. In some strain backgrounds, the presence of relBE was shown to inhibit translation during nutritional stress (Christensen et al., 2001).
However, even if it is activated, deletion of relBE and other TA systems has been shown to have no clear phenotype in normal growth conditions (Goormaghtigh et al., 2018; Tsilibaris et al., 2007).

RelE is the best-characterized translation-dependent RNase toxin. Specifically, RelE was shown to cleave only actively translated mRNA by incubation, cleavage and rescue experiments conducted in vitro with ribosomes and RelB antitoxin (Pedersen et al., 2003). RelE in complex with the ribosome was eventually crystallized, showing that the ribosome activates RelE by reorienting catalytic residues for acid-base catalysis (Dunican et al., 2015; Griffin et al., 2013; Neubauer et al., 2009). Though it was originally suggested to have specificity for certain codons, larger scale in vivo studies have since shown that RelE cleaves translated mRNAs at their 5'-ends with little to no codon specificity (Hurley et al., 2011; Hwang and Buskirk, 2017). This 5'-end specificity likely arises from the inability of ribosomes to elongate far into transcripts, making the 3'-ends of mRNA infrequent targets.

MqsR

MqsR was originally discovered as an inducer of biofilm formation via the interspecies quorum sensor autoinducer 2, though this activity may be strain dependent (Gonzalez Barrios et al., 2006; Kasari et al., 2010). Like other RNase TA systems, MqsR’s role in persistence has been a subject of debate. An early persister transcriptome study identified MqsR as the most upregulated gene in slow growing cells enriched for persisters (Shah et al., 2006). Following this observation, deletion of MqsR was shown to reduce persister formation in some strains, though the strength of this effect was dependent on precise experimental parameters and was not observed in MG1655 (Kim and Wood, 2010; Luidalepp et al., 2011). The mqsRA TA system may also be protective during bile acid stress (Kwan et al., 2014).

There are some apparent differences in regulation of mqsRA from other TA systems.
It is one of the few to have its toxin, MqsR, upstream of the antitoxin, MqsA. Despite this, ribosome profiling indicates that MqsA is still more efficiently translated than the toxin, presumably by direct binding of ribosomes to MqsA’s ribosome binding site (Li et al., 2014). MqsA itself is more structured than other antitoxins and acts as a transcription factor at locations besides the mqsRA promoter (Brown et al., 2009). Interestingly, mqsRA is not regulated by conditional cooperativity (Figure 1.2B). Instead, binding of MqsR to MqsA destabilizes the MqsA-DNA complex (Brown et al., 2013). The ramifications of these differences in regulation for MqsR activation in vivo are unclear. Regardless, MqsR has been characterized as a ribosome-independent, GCU-specific endoribonuclease (Chowdhury et al., 2016; Christensen-Dalsgaard et al., 2010; Yamaguchi et al., 2009). Since MqsR is a member of the RelE family, the fact that it is ribosome-independent is unexpected.

HicA

The hicAB TA system is a member of the hicA family of toxins (Makarova et al., 2006). HicA was classified as a translation-independent endoribonuclease toxin by identification of HicA-dependent cleavage sites on a few model RNAs regardless of their translation status (Jørgensen et al., 2009). There is no clear consensus sequence for cleavage. Transcriptional regulation of the hicAB operon shares some of the non-canonical features of the mqsRA operon. Like mqsRA, the toxin is upstream of the antitoxin. In addition, HicA has been shown to destabilize binding of HicB to one of the two promoters controlling the operon, relieving autorepression (Turnbull and Gerdes, 2017; Winter et al., 2018). Intriguingly, expression from the promoter controlled by HicB was also shown to produce very little HicA toxin compared to expression from the upstream promoter.
This implies a negative autoregulation of HicA toxicity: when HicA levels increase, HicB repression of the operon is relieved, but the mRNA produced by this promoter preferentially translates the antitoxin HicB, rescuing potential HicA toxicity. HicA and MqsR highlight the many different mechanisms cells use to prevent activation of toxins.

YoeB

YoeB belongs to the RelE super-family of toxins. The system was originally characterized because it partially explains the toxicity of overexpression of the lon protease in E. coli (Christensen et al., 2004). Lon is thought to be the protease which degrades many antitoxins. Interestingly, none of the 4 other endonuclease TA systems known at that time were found to play a role in this lethality, implying that overexpression of lon may appreciably activate only YoeB. YoeB has also been shown to be activated by heat shock, and was proposed to help cells deal with the stress of heat shock (Janssen et al., 2015). Notably, the authors did not detect any negative (or positive) effect on growth from YoeB activation during heat shock, making it distinct from previously proposed roles for TA systems in growth arrest or persistence. Though it is now clear that YoeB is translation dependent (Feng et al., 2013), its specificity is still under debate. YoeB has been suggested to cleave preferentially during initiation (Zhang and Inouye, 2009), primarily at stop codons (Christensen-Dalsgaard and Gerdes, 2008; Winther and Gerdes, 2009), or at stalled ribosomes (Janssen et al., 2015). These results are difficult to compare as they focus on different model RNAs and activate YoeB differently.

YafQ

The YafQ toxin belongs to the RelE super-family of toxins. Like RelE, it is a translation-dependent toxin, but it also appears to have specificity for AAA codons (Maehigashi et al., 2015; Prysak et al., 2009).
YafQ differs from TA systems that exhibit conditional cooperativity in that the ratio of toxin:antitoxin does not appear to alter repression from the promoter (Ruangprasert et al., 2014). The YafQ gene has been hypothesized to play a role in biofilm generation as well as tolerance of biofilms to antimicrobials (Harrison et al., 2009; Kolodkin-Gal et al., 2009).

YafO

YafO belongs to the RelE super-family of toxins. YafO cleaves translated regions of model RNAs in vivo, and in vitro cleavage required the presence of ribosomes (Christensen-Dalsgaard et al., 2010; Zhang et al., 2009). There is no clear consensus sequence at cleaved sites. Interestingly, the yafNO TA system is also transcriptionally upregulated by DNA damage, due to the upstream SOS-controlled dinB gene, which encodes DNA polymerase IV (Singletary et al., 2009). However, yafNO also has its own upstream promoter that is repressed by the antitoxin YafN (Christensen-Dalsgaard et al., 2010).

ChpB

ChpB is the closest relative of MazF encoded on the E. coli chromosome; the two share 35% amino acid identity. ChpB was shown to cleave at the motif ACY (where Y is U, A, or G) and was shown to decrease translation, though to a lesser degree than MazF (Zhang et al., 2005b). Intriguingly, both MazF and ChpB’s RNase activities were shown to be activated by the ‘extracellular death factor’ peptide of the programmed cell death model (Belitsky et al., 2011). To my knowledge, transcriptional control of the chpSB operon has not been explicitly studied.

YhaV

YhaV belongs to the RelE super-family of toxins. Interestingly, the antitoxin PrlF bears homology to MazE (Schmidt et al., 2007). YhaV was shown to be ribosome-associated, and its cleavage is translation-dependent both in vitro and in vivo (Choi et al., 2017). Though the majority of identified cleavage sites were between codons, it had no clear cleavage specificity.

HigB

HigB belongs to the RelE super-family of toxins.
The toxin was shown to cleave translated regions of model RNAs in vivo, though without an apparent cleavage specificity (Christensen-Dalsgaard et al., 2010).

RnlA

RnlA is a relatively newly discovered TA system that was originally characterized as an anti-phage element protecting against T4 infection (Koga et al., 2011; Otsuka and Yonesaki, 2005). Intriguingly, T4 has been shown to encode a promiscuous antitoxin antagonizing RnlA and a plasmid-borne homolog LsoA (Otsuka and Yonesaki, 2012). The rnlAB system has a number of differences from other endoribonuclease TA systems. Though RnlA has been shown to cleave RNA, its specificity is unclear and may require RNase H for full activity (Naka et al., 2014). RNase H may also play a role in recruiting the antitoxin RnlB to inhibit toxicity (Naka et al., 2017). Finally, to my knowledge, there is no evidence of negative autoregulation of the rnlAB operon by either the antitoxin or toxin-antitoxin complex; instead, it was shown that IscR, a transcription factor controlling Fe-S cluster biosynthesis, represses rnlAB transcription (Otsuka et al., 2010).

IV. Conclusions

At the start of my graduate career, E. coli’s RNase TA systems had just been directly implicated in the formation of persisters (Maisonneuve et al., 2011). This activity seemed to be a direct extension of the relatively non-specific RNase activity of toxins, which prevents complete translation of essentially all mRNA. In the same year, MazF was proposed to generate specialized ribosomes and activate a global stress program (Vesper et al., 2011). I could not understand how MazF, an ACA-specific nuclease, avoided clear cleavage sites in the coding regions of so many messages. But I also wondered if these two models were mutually exclusive; could individual toxins use their specificity to generate a persistence-like stress program? Might all toxins simply target genes critical for rapid growth?
In Chapter 2, I address this for MazF by developing an RNA-Seq pipeline to map and quantify its cleavage targets. I found that MazF does not generate specialized ribosomes nor does it lead to increased translation of cleaved messages. Instead, it broadly acts as an mRNA interferase, preventing complete translation of the majority of messages. Through this activity, and possibly through direct cleavage of precursor rRNA, MazF expression causes a severe defect in ribosome maturation. We propose this is likely the major contributor to its inhibition of cell growth. I extend this analysis to other proposed E. coli endoribonucleases in Chapter 3. After verifying that these systems cause rescuable growth arrest by co-expression with their cognate antitoxins, I conducted a similar RNA-Seq-based mapping of global cleavage sites. I find that E. coli’s other RNase toxins, like MazF, cause an inhibition of ribosome maturation and cleave a wide variety of E. coli transcripts. Finally, in Chapter 4, I summarize future directions for my work and the TA system field in general.

V. References

Aakre, C.D., Phung, T.N., Huang, D., and Laub, M.T. (2013). A bacterial toxin inhibits DNA replication elongation through a direct interaction with the β sliding clamp. Mol. Cell 52, 617–628.
Aizenman, E., Engelberg-Kulka, H., and Glaser, G. (1996). An Escherichia coli chromosomal “addiction module” regulated by guanosine [corrected] 3’,5’-bispyrophosphate: a model for programmed bacterial cell death. Proc. Natl. Acad. Sci. 93, 6059–6063.
Alawneh, A.M., Qi, D., Yonesaki, T., and Otsuka, Y. (2016). An ADP-ribosyltransferase Alt of bacteriophage T4 negatively regulates the Escherichia coli MazF toxin of a toxin-antitoxin module. Mol. Microbiol. 99, 188–198.
Amitai, S., Kolodkin-Gal, I., Hananya-Meltabashi, M., Sacher, A., and Engelberg-Kulka, H. (2009). Escherichia coli MazF leads to the simultaneous selective synthesis of both “death proteins” and “survival proteins”. PLoS Genet.
5, e1000390.
Barth, V.C., Zeng, J.-M., Vvedenskaya, I.O., Ouyang, M., Husson, R.N., and Woychik, N.A. (2019). Toxin-mediated ribosome stalling reprograms the Mycobacterium tuberculosis proteome. Nat. Commun. 10, 3035.
Belitsky, M., Avshalom, H., Erental, A., Yelin, I., Kumar, S., London, N., Sperber, M., Schueler-Furman, O., and Engelberg-Kulka, H. (2011). The Escherichia coli Extracellular Death Factor EDF Induces the Endoribonucleolytic Activities of the Toxins MazF and ChpBK. Mol. Cell 41, 625–635.
Bernard, P., and Couturier, M. (1991). The 41 carboxy-terminal residues of the miniF plasmid CcdA protein are sufficient to antagonize the killer activity of the CcdB protein. MGG Mol. Gen. Genet. 226, 297–304.
Bernard, P., and Couturier, M. (1992). Cell killing by the F plasmid CcdB protein involves poisoning of DNA-topoisomerase II complexes. J. Mol. Biol. 226, 735–745.
Bernard, P., Kézdy, K.E., Van Melderen, L., Steyaert, J., Wyns, L., Pato, M.L., Higgins, P.N., and Couturier, M. (1993). The F Plasmid CcdB Protein Induces Efficient ATP-dependent DNA Cleavage by Gyrase. J. Mol. Biol. 234, 534–541.
Bernstein, J.A., Khodursky, A.B., Lin, P.-H., Lin-Chao, S., and Cohen, S.N. (2002). Global analysis of mRNA decay and abundance in Escherichia coli at single-gene resolution using two-color fluorescent DNA microarrays. Proc. Natl. Acad. Sci. 99, 9697–9702.
Blower, T.R., Pei, X.Y., Short, F.L., Fineran, P.C., Humphreys, D.P., Luisi, B.F., and Salmond, G.P.C. (2011). A processed noncoding RNA regulates an altruistic bacterial antiviral system. Nat. Struct. Mol. Biol. 18, 185–190.
Brenner, S., Jacob, F., and Meselson, M. (1961). An Unstable Intermediate Carrying Information from Genes to Ribosomes for Protein Synthesis. Nature 190, 576–581.
Brown, B.L., Grigoriu, S., Kim, Y., Arruda, J.M., Davenport, A., Wood, T.K., Peti, W., and Page, R. (2009).
Three dimensional structure of the MqsR:MqsA complex: A novel TA pair comprised of a toxin homologous to RelE and an antitoxin with unique properties. PLoS Pathog. 5.
Brown, B.L., Lord, D.M., Grigoriu, S., Peti, W., and Page, R. (2013). The Escherichia coli toxin MqsR destabilizes the transcriptional repression complex formed between the antitoxin MqsA and the mqsRA operon promoter. J. Biol. Chem. 288, 1286–1294.
Castro-Roa, D., Garcia-Pino, A., De Gieter, S., van Nuland, N.A.J., Loris, R., and Zenkin, N. (2013). The Fic protein Doc uses an inverted substrate to phosphorylate and inactivate EF-Tu. Nat. Chem. Biol. 9, 811–817.
Cheverton, A.M., Gollan, B., Przydacz, M., Wong, C.T., Mylona, A., Hare, S.A., and Helaine, S. (2016). A Salmonella Toxin Promotes Persister Formation through Acetylation of tRNA. Mol. Cell 63, 86–96.
Choi, W., Yamaguchi, Y., Lee, J.-W., Jang, K.-M., Inouye, M., Kim, S.-G., Yoon, M.-H., and Park, J.-H. (2017). Translation-dependent mRNA cleavage by YhaV in Escherichia coli. FEBS Lett. 591, 1853–1861.
Chowdhury, N., Kwan, B.W., McGibbon, L.C., Babitzke, P., and Wood, T.K. (2016). Toxin MqsR cleaves single-stranded mRNA with various 5’ ends. Microbiologyopen 5, 370–377.
Christensen-Dalsgaard, M., and Gerdes, K. (2008). Translation affects YoeB and MazF messenger RNA interferase activities by different mechanisms. Nucleic Acids Res. 36, 6472–6481.
Christensen-Dalsgaard, M., Jørgensen, M.G., and Gerdes, K. (2010). Three new RelE-homologous mRNA interferases of Escherichia coli differentially induced by environmental stresses. Mol. Microbiol. 75, 333–348.
Christensen, S.K., and Gerdes, K. (2004). Delayed-relaxed response explained by hyperactivation of RelE. Mol. Microbiol. 53, 587–597.
Christensen, S.K., Mikkelsen, M., Pedersen, K., and Gerdes, K. (2001). RelE, a global inhibitor of translation, is activated during nutritional stress. Proc. Natl. Acad. Sci. U. S. A. 98, 14328–14333.
Christensen, S.K., Maenhaut-Michel, G., Mine, N., Gottesman, S., Gerdes, K., and Van Melderen, L. (2004). Overproduction of the Lon protease triggers inhibition of translation in Escherichia coli: Involvement of the yefM-yoeB toxin-antitoxin system. Mol. Microbiol. 51, 1705–1717.
Culviner, P.H., and Laub, M.T. (2018). Global Analysis of the E. coli Toxin MazF Reveals Widespread Cleavage of mRNA and the Inhibition of rRNA Maturation and Ribosome Biogenesis. Mol. Cell 70, 868-880.e10.
Darfeuille, F., Unoson, C., Vogel, J., and Wagner, E.G.H. (2007). An Antisense RNA Inhibits Translation by Competing with Standby Ribosomes. Mol. Cell 26, 381–392.
Deutscher, M.P. (2006). Degradation of RNA in bacteria: Comparison of mRNA and stable RNA. Nucleic Acids Res. 34, 659–666.
Diderichsen, B., Fiil, N.P., and Lavallé, R. (1977). Genetics of the relB locus in Escherichia coli. J. Bacteriol. 131, 30–33.
Dunican, B.F., Hiller, D.A., and Strobel, S.A. (2015). Transition State Charge Stabilization and Acid-Base Catalysis of mRNA Cleavage by the Endoribonuclease RelE. Biochemistry 54, 7048–7057.
Feng, S., Chen, Y., Kamada, K., Wang, H., Tang, K., Wang, M., and Gao, Y.G. (2013). YoeB-ribosome structure: A canonical RNase that requires the ribosome for its specific activity. Nucleic Acids Res. 41, 9549–9556.
Garcia-Pino, A., Balasubramanian, S., Wyns, L., Gazit, E., De Greve, H., Magnuson, R.D., Charlier, D., van Nuland, N.A.J., and Loris, R. (2010). Allostery and Intrinsic Disorder Mediate Transcription Regulation by Conditional Cooperativity. Cell 142, 101–111.
Gerdes, K., Rasmussen, P.B., and Molin, S. (1986a). Unique type of plasmid maintenance function: postsegregational killing of plasmid-free cells. Proc. Natl. Acad. Sci. 83, 3116–3120.
Gerdes, K., Bech, F.W., Jorgensen, S.T., Lobner-olesen, A., Rasmussen, P.B., Atlung, T., Boe, L., Karlstrom, O., Molin, S., and Von, K. (1986b).
Mechanism of postsegregational killing by the hok gene product of the parB system of plasmid R1 and its homology with the relF gene product of the E. coli relB operon. EMBO J. 5, 2023–2029.
Gerdes, K., Helin, K., Christensen, O.W., and Løbner-Olesen, A. (1988). Translational control and differential RNA decay are key elements regulating postsegregational expression of the killer protein encoded by the parB locus of plasmid R1. J. Mol. Biol. 203, 119–129.
Gerdes, K., Nielsen, A., Thorsted, P., and Wagner, E.G.H. (1992). Mechanism of killer gene activation. Antisense RNA-dependent RNase III cleavage ensures rapid turn-over of the stable Hok, SrnB and PndA effector messenger RNAs. J. Mol. Biol. 226, 637–649.
Germain, E., Castro-Roa, D., Zenkin, N., and Gerdes, K. (2013). Molecular Mechanism of Bacterial Persistence by HipA. Mol. Cell 52, 248–254.
Goeders, N., Chai, R., Chen, B., Day, A., and Salmond, G. (2016). Structure, Evolution, and Functions of Bacterial Type III Toxin-Antitoxin Systems. Toxins (Basel). 8, 282.
Gonzalez Barrios, A.F., Zuo, R., Hashimoto, Y., Yang, L., Bentley, W.E., and Wood, T.K. (2006). Autoinducer 2 Controls Biofilm Formation in Escherichia coli through a Novel Motility Quorum-Sensing Regulator (MqsR, B3022). J. Bacteriol. 188, 305–316.
Goormaghtigh, F., Fraikin, N., Putrinš, M., Hallaert, T., Hauryliuk, V., Garcia-Pino, A., Sjödin, A., Kasvandik, S., Udekwu, K., Tenson, T., et al. (2018). Reassessing the Role of Type II Toxin-Antitoxin Systems in Formation of Escherichia coli Type II Persister Cells. MBio 9, 1–14.
Griffin, M.A., Davis, J.H., and Strobel, S.A. (2013). Bacterial toxin RelE: A highly efficient ribonuclease with exquisite substrate specificity using atypical catalytic residues. Biochemistry 52, 8633–8642.
Gurnev, P.A., Ortenberg, R., Dörr, T., Lewis, K., and Bezrukov, S.M. (2012). Persister-promoting bacterial toxin TisB produces anion-selective pores in planar lipid bilayers. FEBS Lett. 586, 2529–2534.
Harms, A., Fino, C., Sørensen, M.A., Semsey, S., and Gerdes, K. (2017). Prophages and Growth Dynamics Confound Experimental Results with Antibiotic-Tolerant Persister Cells. MBio 8, 1–18.
Harms, A., Brodersen, D.E., Mitarai, N., and Gerdes, K. (2018). Toxins, Targets, and Triggers: An Overview of Toxin-Antitoxin Biology. Mol. Cell 70, 768–784.
Harrison, J.J., Wade, W.D., Akierman, S., Vacchi-Suzzi, C., Stremick, C.A., Turner, R.J., and Ceri, H. (2009). The chromosomal toxin gene yafQ is a determinant of multidrug tolerance for Escherichia coli growing in a biofilm. Antimicrob. Agents Chemother. 53, 2253–2258.
Hazan, R., Sat, B., and Engelberg-Kulka, H. (2004). Escherichia coli mazEF-Mediated Cell Death Is Triggered by Various Stressful Conditions. J. Bacteriol. 186, 3663–3669.
Helaine, S., Cheverton, A.M., Watson, K.G., Faure, L.M., Matthews, S.A., and Holden, D.W. (2014). Internalization of Salmonella by Macrophages Induces Formation of Nonreplicating Persisters. Science 343, 204–208.
Hurley, J.M., Cruz, J.W., Ouyang, M., and Woychik, N.A. (2011). Bacterial toxin RelE mediates frequent codon-independent mRNA cleavage from the 5’ end of coding regions in vivo. J. Biol. Chem. 286, 14770–14778.
Hwang, J.Y., and Buskirk, A.R. (2017). A ribosome profiling study of mRNA cleavage by the endonuclease RelE. Nucleic Acids Res. 45, D327–D336.
Inouye, M. (2006). The discovery of mRNA interferases: Implication in bacterial physiology and application to biotechnology. J. Cell. Physiol. 209, 670–676.
Jacob, F., and Monod, J. (1961). Genetic regulatory mechanisms in the synthesis of proteins. J. Mol. Biol. 3, 318–356.
Jain, C., and Belasco, J.G. (1995). RNase E autoregulates its synthesis by controlling the degradation rate of its own mRNA in Escherichia coli: unusual sensitivity of the rne transcript to RNase E activity. Genes Dev. 9, 84–96.
Janssen, B.D., Garza-Sánchez, F., and Hayes, C.S. (2015). YoeB toxin is activated during thermal stress.
Microbiologyopen 4, 682–697.
Jiang, Y., Pogliano, J., Helinski, D.R., and Konieczny, I. (2002). ParE toxin encoded by the broad-host-range plasmid RK2 is an inhibitor of Escherichia coli gyrase. Mol. Microbiol. 44, 971–979.
De Jonge, N., Garcia-Pino, A., Buts, L., Haesaerts, S., Charlier, D., Zangger, K., Wyns, L., De Greve, H., and Loris, R. (2009). Rejuvenation of CcdB-Poisoned Gyrase by an Intrinsically Disordered Protein Domain. Mol. Cell 35, 154–163.
Jørgensen, M.G., Pandey, D.P., Jaskolska, M., and Gerdes, K. (2009). HicA of Escherichia coli defines a novel family of translation-independent mRNA interferases in bacteria and archaea. J. Bacteriol. 191, 1191–1199.
Kamada, K., Hanaoka, F., and Burley, S.K. (2003). Crystal structure of the MazE/MazF complex: Molecular bases of antidote-toxin recognition. Mol. Cell 11, 875–884.
Kasari, V., Kurg, K., Margus, T., Tenson, T., and Kaldalu, N. (2010). The Escherichia coli mqsR and ygiT genes encode a new toxin-antitoxin pair. J. Bacteriol. 192, 2908–2919.
Keren, I., Shah, D., Spoering, A., Kaldalu, N., and Lewis, K. (2004). Specialized persister cells and the mechanism of multidrug tolerance in Escherichia coli. J. Bacteriol. 186, 8172–8180.
Kim, Y., and Wood, T.K. (2010). Toxins Hha and CspD and small RNA regulator Hfq are involved in persister cell formation through MqsR in Escherichia coli. Biochem. Biophys. Res. Commun. 391, 209–213.
Koga, M., Otsuka, Y., Lemire, S., and Yonesaki, T. (2011). Escherichia coli rnlA and rnlB compose a novel toxin-antitoxin system. Genetics 187, 123–130.
Kolodkin-Gal, I., Hazan, R., Gaathon, A., Carmeli, S., and Engelberg-Kulka, H. (2007). A Linear Pentapeptide Is a Quorum-Sensing Factor Required for mazEF-Mediated Cell Death in Escherichia coli. Science 318, 652–655.
Kolodkin-Gal, I., Verdiger, R., Shlosberg-Fedida, A., and Engelberg-Kulka, H. (2009). A differential effect of E. coli toxin-antitoxin systems on cell death in liquid media and biofilm formation.
PLoS One 4, e6785.
Korch, S.B., and Hill, T.M. (2006). Ectopic Overexpression of Wild-Type and Mutant hipA Genes in Escherichia coli: Effects on Macromolecular Synthesis and Persister Formation. J. Bacteriol. 188, 3826–3836.
Kwan, B.W., Lord, D.M., Peti, W., Page, R., Benedik, M.J., and Wood, T.K. (2014). The MqsR/MqsA Toxin/Antitoxin System Protects Escherichia coli During Bile Acid Stress. Environ. Microbiol. 1–29.
Leplae, R., Geeraerts, D., Hallez, R., Guglielmini, J., Drèze, P., and Van Melderen, L. (2011). Diversity of bacterial type II toxin–antitoxin systems: a comprehensive search and functional analysis of novel families. Nucleic Acids Res. 39, 5513–5525.
Li, G.-W., Burkhardt, D., Gross, C., and Weissman, J.S. (2014). Quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources. Cell 157, 624–635.
Luidalepp, H., Joers, A., Kaldalu, N., and Tenson, T. (2011). Age of Inoculum Strongly Influences Persister Frequency and Can Mask Effects of Mutations Implicated in Altered Persistence. J. Bacteriol. 193, 3598–3605.
Maehigashi, T., Ruangprasert, A., Miles, S.J., and Dunham, C.M. (2015). Molecular basis of ribosome recognition and mRNA hydrolysis by the E. coli YafQ toxin. Nucleic Acids Res. 43, 8002–8012.
Maisonneuve, E., Shakespeare, L.J., Jorgensen, M.G., and Gerdes, K. (2011). Bacterial persistence by RNA endonucleases. Proc. Natl. Acad. Sci. 108, 13206–13211.
Maisonneuve, E., Castro-Camargo, M., and Gerdes, K. (2013). (p)ppGpp controls bacterial persistence by stochastic induction of toxin-antitoxin activity. Cell 154, 1140–1150.
Makarova, K.S., Grishin, N. V., and Koonin, E. V. (2006). The HicAB cassette, a putative novel, RNA-targeting toxin-antitoxin system in archaea and bacteria. Bioinformatics 22, 2581–2584.
Makarova, K.S., Wolf, Y.I., and Koonin, E. V. (2009). Comprehensive comparative-genomic analysis of Type 2 toxin-antitoxin systems and related mobile stress response systems in prokaryotes. Biol.
Direct 4, 19.
Marchand, I., Nicholson, A.W., and Dreyfus, M. (2001). Bacteriophage T7 protein kinase phosphorylates RNase E and stabilizes mRNAs synthesized by T7 RNA polymerase. Mol. Microbiol. 42, 767–776.
Masuda, H., Tan, Q., Awano, N., Wu, K.P., and Inouye, M. (2012). YeeU enhances the bundling of cytoskeletal polymers of MreB and FtsZ, antagonizing the CbtA (YeeV) toxicity in Escherichia coli. Mol. Microbiol. 84, 979–989.
Masuda, Y., Miyakawa, K., Nishimura, Y., and Ohtsubo, E. (1993). chpA and chpB, Escherichia coli chromosomal homologs of the pem locus responsible for stable maintenance of plasmid R100. J. Bacteriol. 175, 6850–6856.
Van Melderen, L., Bernard, P., and Couturier, M. (1994). Lon-dependent proteolysis of CcdA is the key control for activation of CcdB in plasmid-free segregant bacteria. Mol. Microbiol. 11, 1151–1157.
Mets, T., Lippus, M., Schryer, D., Liiv, A., Kasari, V., Paier, A., Maiväli, Ü., Remme, J., Tenson, T., and Kaldalu, N. (2017). Toxins MazF and MqsR cleave Escherichia coli rRNA precursors at multiple sites. RNA Biol. 14, 124–135.
Mets, T., Kasvandik, S., Saarma, M., Maiväli, Ü., Tenson, T., and Kaldalu, N. (2019). Fragmentation of Escherichia coli mRNA by MazF and MqsR. Biochimie 156, 79–91.
Mok, W.W.K., Park, J.O., Rabinowitz, J.D., and Brynildsen, M.P. (2015). RNA Futile Cycling in Model Persisters Derived from MazF Accumulation. MBio 6, 1–13.
Moyed, H.S., and Bertrand, K.P. (1983). hipA, a newly recognized gene of Escherichia coli K-12 that affects frequency of persistence after inhibition of murein synthesis. J. Bacteriol. 155, 768–775.
Muñoz-Gómez, A.J., Santos-Sierra, S., Berzal-Herranz, A., Lemonnier, M., and Díaz-Orejas, R. (2004). Insights into the specificity of RNA cleavage by the Escherichia coli MazF toxin. FEBS Lett. 567, 316–320.
Mutschler, H., Gebhardt, M., Shoeman, R.L., and Meinhart, A. (2011). A Novel Mechanism of Programmed Cell Death in Bacteria by Toxin–Antitoxin Systems Corrupts Peptidoglycan Synthesis.
PLoS Biol. 9, e1001033.
Naka, K., Koga, M., Yonesaki, T., and Otsuka, Y. (2014). RNase HI stimulates the activity of RnlA toxin in Escherichia coli. Mol. Microbiol. 91, 596–605.
Naka, K., Qi, D., Yonesaki, T., and Otsuka, Y. (2017). RnlB antitoxin of the Escherichia coli RnlA-RnlB toxin-antitoxin module requires RNase HI for inhibition of RnlA Toxin activity. Toxins (Basel). 9.
Naville, M., and Gautheret, D. (2009). Transcription attenuation in bacteria: Theme and variations. Briefings Funct. Genomics Proteomics 8, 482–492.
Neubauer, C., Gao, Y.G., Andersen, K.R., Dunham, C.M., Kelley, A.C., Hentschel, J., Gerdes, K., Ramakrishnan, V., and Brodersen, D.E. (2009). The Structural Basis for mRNA Recognition and Cleavage by the Ribosome-Dependent Endonuclease RelE. Cell 139, 1084–1095.
Nikolic, N., Didara, Z., and Moll, I. (2017). MazF activation promotes translational heterogeneity of the grcA mRNA in Escherichia coli populations. PeerJ 5, e3830.
Nikolic, N., Bergmiller, T., Vandervelde, A., Albanese, T.G., Gelens, L., and Moll, I. (2018). Autoregulation of mazEF expression underlies growth heterogeneity in bacterial populations. Nucleic Acids Res. 46, 2918–2931.
Novick, A., and Weiner, M. (1957). Enzyme induction as an all-or-none phenomenon. Proc. Natl. Acad. Sci. 43, 553–566.
Ogura, T., and Hiraga, S. (1983). Mini-F plasmid genes that couple host cell division to plasmid proliferation. Proc. Natl. Acad. Sci. 80, 4784–4788.
Otsuka, Y., and Yonesaki, T. (2005). A novel endoribonuclease, RNase LS, in Escherichia coli. Genetics 169, 13–20.
Otsuka, Y., and Yonesaki, T. (2012). Dmd of bacteriophage T4 functions as an antitoxin against Escherichia coli LsoA and RnlA toxins. Mol. Microbiol. 83, 669–681.
Otsuka, Y., Miki, K., Koga, M., Katayama, N., Morimoto, W., Takahashi, Y., and Yonesaki, T. (2010). IscR regulates RNase LS activity by repressing rnlA transcription. Genetics 185, 823–830.
Overgaard, M., Borch, J., Jørgensen, M.G., and Gerdes, K. (2008).
Messenger RNA interferase RelE controls relBE transcription by conditional cooperativity. Mol. Microbiol. 69, 841–857.
Overgaard, M., Borch, J., and Gerdes, K. (2009). RelB and RelE of Escherichia coli Form a Tight Complex That Represses Transcription via the Ribbon–Helix–Helix Motif in RelB. J. Mol. Biol. 394, 183–196.
Page, R., and Peti, W. (2016). Toxin-antitoxin systems in bacterial growth arrest and persistence. Nat. Chem. Biol. 12, 208–214.
Pandey, D.P., and Gerdes, K. (2005). Toxin-antitoxin loci are highly abundant in free-living but lost from host-associated prokaryotes. Nucleic Acids Res. 33, 966–976.
Pedersen, K., Christensen, S.K., and Gerdes, K. (2002). Rapid induction and reversal of a bacteriostatic condition by controlled expression of toxins and antitoxins. Mol. Microbiol. 45, 501–510.
Pedersen, K., Zavialov, A. V., Pavlov, M.Y., Elf, J., Gerdes, K., and Ehrenberg, M. (2003). The Bacterial Toxin RelE Displays Codon-Specific Cleavage of mRNAs in the Ribosomal A Site. Cell 112, 131–140.
Potrykus, K., and Cashel, M. (2008). (p)ppGpp: still magical? Annu. Rev. Microbiol. 62, 35–51.
Prysak, M.H., Mozdzierz, C.J., Cook, A.M., Zhu, L., Zhang, Y., Inouye, M., and Woychik, N.A. (2009). Bacterial toxin YafQ is an endoribonuclease that associates with the ribosome and blocks translation elongation through sequence-specific and frame-dependent mRNA cleavage. Mol. Microbiol. 71, 1071–1087.
Qi, D., Alawneh, A.M., Yonesaki, T., and Otsuka, Y. (2015). Rapid Degradation of Host mRNAs by Stimulation of RNase E Activity by Srd of Bacteriophage T4. Genetics 201, 977–987.
Ramisetty, B.C.M., Raj, S., and Ghosh, D. (2016). Escherichia coli MazEF toxin-antitoxin system does not mediate programmed cell death. J. Basic Microbiol. 56, 1398–1402.
Ruangprasert, A., Maehigashi, T., Miles, S.J., Giridharan, N., Liu, J.X., and Dunham, C.M. (2014). Mechanisms of toxin inhibition and transcriptional repression by Escherichia coli DinJ-YafQ. J. Biol. Chem. 289, 20559–20569.
Sat, B., Hazan, R., Fisher, T., Khaner, H., Engelberg-Kulka, H., and Glaser, G.A.D. (2001). Programmed cell death in Escherichia coli: some antibiotics can trigger mazEF lethality. J. Bacteriol. 183, 2041–2045.
Sat, B., Reches, M., and Engelberg-Kulka, H. (2003). The Escherichia coli mazEF Suicide Module Mediates Thymineless Death. J. Bacteriol. 185, 1803–1807.
Sauert, M., Wolfinger, M.T., Vesper, O., Christian, M., Byrgazov, K., and Moll, I. (2016). The MazF-regulon: a toolbox for the post-transcriptional stress response in Escherichia coli. 1–16.
Schifano, J.M., Edifor, R., Sharp, J.D., Ouyang, M., Konkimalla, A., Husson, R.N., and Woychik, N.A. (2013). Mycobacterial toxin MazF-mt6 inhibits translation through cleavage of 23S rRNA at the ribosomal A site. Proc. Natl. Acad. Sci. U. S. A. 110, 8501–8506.
Schifano, J.M., Vvedenskaya, I.O., Knoblauch, J.G., Ouyang, M., Nickels, B.E., and Woychik, N.A. (2014). An RNA-seq method for defining endoribonuclease cleavage specificity identifies dual rRNA substrates for toxin MazF-mt3. Nat. Commun. 5, 3538.
Schmidt, O., Schuenemann, V.J., Hand, N.J., Silhavy, T.J., Martin, J., Lupas, A.N., and Djuranovic, S. (2007). prlF and yhaV Encode a New Toxin–Antitoxin System in Escherichia coli. J. Mol. Biol. 372, 894–905.
Sengupta, M., and Austin, S. (2011). Prevalence and Significance of Plasmid Maintenance Functions in the Virulence Plasmids of Pathogenic Bacteria. Infect. Immun. 79, 2502–2509.
Sevin, E.W., and Barloy-Hubler, F. (2007). RASTA-Bacteria: a web-based tool for identifying toxin-antitoxin loci in prokaryotes. Genome Biol. 8, R155.
Shachrai, I., Zaslaver, A., Alon, U., and Dekel, E. (2010). Cost of Unneeded Proteins in E. coli Is Reduced after Several Generations in Exponential Growth. Mol. Cell 38, 758–767.
Shah, D., Zhang, Z., Khodursky, A., Kaldalu, N., Kurg, K., and Lewis, K. (2006). Persisters: A distinct physiological state of E. coli. BMC Microbiol. 6, 1–9.
Short, F.L., Akusobi, C., Broadhurst, W.R., and Salmond, G.P.C. (2018). The bacterial Type III toxin-antitoxin system, ToxIN, is a dynamic protein-RNA complex with stability-dependent antiviral abortive infection activity. Sci. Rep. 8, 1–10.
Singletary, L.A., Gibson, J.L., Tanner, E.J., McKenzie, G.J., Lee, P.L., Gonzalez, C., and Rosenberg, S.M. (2009). An SOS-Regulated Type 2 Toxin-Antitoxin System. J. Bacteriol. 191, 7456–7465.
Stock, A.M., Robinson, V.L., and Goudreau, P.N. (2000). Two-Component Signal Transduction. Annu. Rev. Biochem. 69, 183–215.
Suzuki, M., Zhang, J., Liu, M., Woychik, N.A., and Inouye, M. (2005). Single Protein Production in Living Cells Facilitated by an mRNA Interferase. Mol. Cell 18, 253–261.
Szekeres, S., Dauti, M., Wilde, C., Mazel, D., and Rowe-Magnus, D.A. (2007). Chromosomal toxin-antitoxin loci can diminish large-scale genome reductions in the absence of selection. Mol. Microbiol. 63, 1588–1605.
Tam, J.E., and Kline, B.C. (1989a). The F plasmid ccd autorepressor is a complex of CcdA and CcdB proteins. MGG Mol. Gen. Genet. 219, 26–32.
Tam, J.E., and Kline, B.C. (1989b). Control of the ccd operon in plasmid F. J. Bacteriol. 171, 2353–2360.
Tan, Q., Awano, N., and Inouye, M. (2011). YeeV is an Escherichia coli toxin that inhibits cell division by targeting the cytoskeleton proteins, FtsZ and MreB. Mol. Microbiol. 79, 109–118.
Temmel, H., Müller, C., Sauert, M., Vesper, O., Reiss, A., Popow, J., Martinez, J., and Moll, I. (2017). The RNA ligase RtcB reverses MazF-induced ribosome heterogeneity in Escherichia coli. Nucleic Acids Res. 45, 4708–4721.
Thisted, T., and Gerdes, K. (1992). Mechanism of post-segregational killing by the hok/sok system of plasmid R1. Sok antisense RNA regulates hok gene expression indirectly through the overlapping mok gene. J. Mol. Biol. 223, 41–54.
Tsilibaris, V., Maenhaut-Michel, G., Mine, N., and Van Melderen, L. (2007).
What Is the Benefit to Escherichia coli of Having Multiple Toxin-Antitoxin Systems in Its Genome? J. Bacteriol. 189, 6101–6108. Tsuchimoto, S., Ohtsubo, H., and Ohtsubo, E. (1988). Two genes, pemK and pemI, responsible for stable maintenance of resistance plasmid R100. J. Bacteriol. 170, 1461–1466. Turnbull, K.J., and Gerdes, K. (2017). HicA toxin of E scherichia coli derepresses hic AB transcription to selectively produce HicB antitoxin. Mol. Microbiol. 104, 781–792. Unoson, C., and Wagner, E.G.H. (2008). A small SOS-induced toxin is targeted against the inner membrane in Escherichia coli. Mol. Microbiol. 70, 258–270. Vesper, O., Amitai, S., Belitsky, M., Byrgazov, K., Kaberdina, A.C., Engelberg-Kulka, H., and Moll, I. (2011). Selective Translation of Leaderless mRNAs by Specialized Ribosomes Generated by MazF in Escherichia coli. Cell 147, 147–157. Vogel, J., Argaman, L., Wagner, E.G.H., and Altuvia, S. (2004). The Small RNA IstR Inhibits Synthesis of an SOS-Induced Toxic Peptide. Curr. Biol. 14, 2271–2276. Wang, X., Lord, D.M., Cheng, H.-Y., Osbourne, D.O., Hong, S.H., Sanchez-Torres, V., Quiroga, C., Zheng, K., Herrmann, T., Peti, W., et al. (2012). A new type V toxin-antitoxin system where 52 mRNA for toxin GhoT is cleaved by antitoxin GhoS. Nat. Chem. Biol. 8, 855–861. Watson, J.D., Baker, T.A., Bell, S.P., Gann, A., Levine, M., and Losick, R. (2014). Molecular Biology of the Gene (Cold Spring Harbor, New York: Cold Spring Harbor Laboratory Press). Wilmaerts, D., Bayoumi, M., Dewachter, L., Knapen, W., Mika, J.T., Hofkens, J., Dedecker, P., Maglia, G., Verstraeten, N., and Michiels, J. (2018). The Persistence-Inducing Toxin HokB Forms Dynamic Pores That Cause ATP Leakage. MBio 9, 1–12. Winter, A.J., Williams, C., Isupov, M.N., Crocker, H., Gromova, M., Marsh, P., Wilkinson, O.J., Dillingham, M.S., Harmer, N.J., Titball, R.W., et al. (2018). The molecular basis of protein toxin HicA–dependent binding of the protein antitoxin HicB to DNA. J. Biol. Chem. 
293, 19429–19440. Winther, K.S., and Gerdes, K. (2009). Ectopic production of VapCs from Enterobacteria inhibits translation and trans-activates YoeB mRNA interferase. Mol. Microbiol. 72, 918–930. Winther, K.S., and Gerdes, K. (2011). Enteric virulence associated protein VapC inhibits translation by cleavage of initiator tRNA. Proc. Natl. Acad. Sci. 108, 7403–7407. Winther, K.S., Brodersen, D.E., Brown, A.K., and Gerdes, K. (2013). VapC20 of mycobacterium tuberculosis cleaves the sarcin-ricin loop of 23S rRNA. Nat. Commun. 4, 1–9. Yamaguchi, Y., Park, J.-H., and Inouye, M. (2009). MqsR, a Crucial Regulator for Quorum Sensing and Biofilm Formation, Is a GCU-specific mRNA Interferase in Escherichia coli. J. Biol. Chem. 284, 28746–28753. Yamaguchi, Y., Park, J.-H., and Inouye, M. (2011). Toxin-Antitoxin Systems in Bacteria and Archaea. Annu. Rev. Genet. 45, 61–79. Yuan, J., Sterckx, Y., Mitchenall, L.A., Maxwell, A., Loris, R., and Waldor, M.K. (2010). Vibrio cholerae ParE2 Poisons DNA Gyrase via a Mechanism Distinct from Other Gyrase Inhibitors. J. Biol. Chem. 285, 40397–40408. Zaychikova, M. V., Zakharevich, N. V., Sagaidak, M.O., Bogolubova, N.A., Smirnova, T.G., Andreevskaya, S.N., Larionova, E.E., Alekseeva, M.G., Chernousova, L.N., and Danilenko, V.N. (2015). Mycobacterium tuberculosis Type II Toxin-Antitoxin Systems: Genetic Polymorphisms and Functional Properties and the Possibility of Their Use for Genotyping. PLoS One 10, e0143682. Zhang, Y., and Inouye, M. (2009). The inhibitory mechanism of protein synthesis by YoeB, an escherichia coli toxin. J. Biol. Chem. 284, 6627–6638. Zhang, Y., Zhang, J., Hoeflich, K.P., Ikura, M., Qing, G., and Inouye, M. (2003). MazF cleaves cellular mRNAs specifically at ACA to block protein synthesis in Escherichia coli. Mol. Cell 12, 913–923. Zhang, Y., Zhang, J., Hara, H., Kato, I., and Inouye, M. (2005a). Insights into the mRNA Cleavage Mechanism by MazF, an mRNA Interferase. J. Biol. Chem. 280, 3143–3150. 
Zhang, Y., Zhu, L., Zhang, J., and Inouye, M. (2005b). Characterization of ChpBK, an mRNA interferase from Escherichia coli. J. Biol. Chem. 280, 26080–26088.
Zhang, Y., Yamaguchi, Y., and Inouye, M. (2009). Characterization of YafO, an Escherichia coli toxin. J. Biol. Chem. 284, 25522–25531.

Chapter 2

Global Analysis of the E. coli Toxin MazF Reveals Widespread Cleavage of mRNA and the Inhibition of rRNA Maturation and Ribosome Biogenesis

This work was published as Culviner, P.H., and Laub, M.T. (2018). Mol. Cell 70, 868–880.e10.

Summary

Toxin-antitoxin systems are widely distributed genetic modules that regulate growth and persistence in bacteria. Many systems, including E. coli MazEF, include toxins that are endoribonucleases, but the full set of targets for these toxins remains poorly defined. Previous studies on a limited set of transcripts suggested that MazF creates a pool of leaderless mRNAs that are preferentially translated by specialized ribosomes created through MazF cleavage of mature 16S rRNA. Here, using paired-end RNA-Seq and ribosome profiling, we provide a comprehensive, global analysis of MazF cleavage specificity and its targets. We find that MazF cleaves most transcripts at multiple sites within their coding regions, with very few full-length, leaderless mRNAs created. Additionally, our results demonstrate that MazF does not create a large pool of specialized ribosomes, but instead rapidly disrupts ribosome biogenesis by targeting both ribosomal protein transcripts and rRNA precursors, helping to inhibit cell growth.

Introduction

Toxin-antitoxin systems are genetic modules widely found in bacteria and archaea that play important, but incompletely understood, roles in regulating cellular growth and proliferation.
Originally discovered as modules that promote plasmid inheritance, toxin-antitoxin systems are also commonly found on bacterial chromosomes, with many species encoding dozens of systems that have been categorized into a variety of types (Yamaguchi et al., 2011). The so-called type II systems involve a protein toxin that is bound and inactivated by a cognate antitoxin protein encoded in the same operon. In ways not well understood, antitoxins can be cleared from cells, thereby liberating their cognate toxins to suppress growth. Toxin-antitoxin systems have been suggested to regulate cell growth following a multitude of stresses and have been implicated, at least in some Gram-negative bacteria, in the formation of persisters, cells that survive antibiotic treatment by being in a non-proliferative, dormant state (Balaban et al., 2004; Helaine et al., 2014).

Many type II toxins are endoribonucleases that cleave a variety of RNAs. The RelE family of toxins are ribosome-associated, cleaving mRNAs in the A-site of the ribosome (Pedersen et al., 2003). The VapC and MazF endoribonucleases are not ribosome-associated, and members of these families have been suggested to cleave a variety of mRNAs, rRNAs, and tRNAs (Schifano et al., 2014, 2016; Yamaguchi et al., 2011). However, the precise targets of most endoribonuclease toxins remain poorly defined and how the activities of these toxins ultimately block cell growth remains unclear.

One of the best studied endoribonuclease toxins is Escherichia coli MazF. Early studies, using primarily primer extension assays, demonstrated that MazF cleaves mRNAs at some, though not all, ACA sites (Zhang et al., 2003). Because ACA sites are present in most transcripts, MazF was suggested to suppress the growth of E. coli cells by acting as a general mRNA 'interferase' (Zhang et al., 2003). However, a subsequent report argued that E.
coli MazF specifically cleaves ACA sites overlapping, or just 5' of, translational start sites, to produce a pool of leaderless transcripts (Vesper et al., 2011). Further, it was suggested that MazF cleaves an ACA near the 3' end of the 16S rRNA to eliminate a 43 nt fragment containing the anti-Shine-Dalgarno (anti-SD) region. These 'specialized' ribosomes lacking anti-SD regions were postulated to preferentially translate the leaderless messages also created by MazF, including genes that help cells cope with the inducing stress. Promoting translation through MazF cleavage of leader regions was proposed to be widespread in E. coli, with the MazF regulon suggested to include >300 transcripts (Sauert et al., 2016). Work by a different group also found that MazF triggered production of a 43 nt fragment corresponding to the 3' end of 16S rRNA, but additionally provided evidence that MazF could be cleaving rRNA precursors (Mets et al., 2017). However, that study did not formally separate or quantify MazF’s effects on rRNA precursors and mature rRNA.

The global cleavage patterns induced by E. coli MazF have not been studied to date. Recent efforts to map the global cleavage patterns of other MazF homologs have revealed a diversity of potential targets. For instance, MazF-mt3 from Mycobacterium tuberculosis was suggested to cleave within a conserved helix/loop of 23S rRNA and to cleave off the anti-SD sequence at the 3' end of 16S rRNA, as proposed in E. coli, leading to the suggestion that the creation of specialized ribosomes for leaderless messages is a conserved mechanism (Schifano et al., 2014). The method used to map MazF-mt3 targets relied on the sequencing of RNAs having a 5'-OH terminus, which is created by many endoribonuclease toxins (Zhang et al., 2005). However, the cleavage products created by an endoribonuclease can vary widely in stability, complicating analysis and the quantification of cleavage events.

To address these limitations and globally map how E.
coli MazF affects cellular RNA levels, we developed a quantitative method that measures changes in paired-end reads across the transcriptome upon induction of MazF. Using this approach, we systematically mapped MazF-dependent cleavages in E. coli and quantified the extent of cleavage at each site. Our results demonstrate that MazF has extended sequence specificity beyond the requisite ACA. Importantly, we also find no evidence that MazF creates a large pool of intact leaderless transcripts. We also performed ribosome profiling and found that MazF inhibits the complete translation of its mRNA targets, with no apparent preferential translation of stress response genes. Additionally, we find that MazF does not produce a substantial pool of specialized ribosomes specifically lacking the anti-SD region of the 16S rRNA. However, quite strikingly, we find that MazF cleaves several sites within rRNA precursors and within the transcripts of many ribosomal proteins. Pulse-chase analyses demonstrate unequivocally that MazF induction rapidly and almost completely inhibits ribosome biogenesis, without significantly affecting the pool of mature ribosomes. Thus, our work supports a model in which the MazF endoribonuclease toxin cleaves a wide range of cellular mRNAs and rRNA precursors to strongly block both translation and the synthesis of new ribosomes.

Results

High-throughput mapping and quantification of MazF-dependent cleavages

To determine the locations and extent of MazF-dependent cleavages in E. coli transcripts, we developed a technique based on the strand-specific dUTP method of paired-end RNA-sequencing (Levin et al., 2010). Briefly, we quantified RNA cleavage by comparing fragment counts at each nucleotide in cells expressing mazF to a control sample (Figure 2.1A).
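The per-nucleotide comparison can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in this work: the function names, the representation of each mapped read pair as an inclusive (start, end) interval, and the pseudocount are all hypothetical, and library-depth normalization is omitted.

```python
import math

def fragment_coverage(fragments, length):
    """Count the fragments spanning each nucleotide of a region.

    fragments: iterable of (start, end) pairs, 0-based and inclusive,
    giving the outer coordinates of each mapped read pair
    (a hypothetical input format chosen for this sketch).
    """
    diff = [0] * (length + 1)
    for start, end in fragments:
        diff[start] += 1        # fragment begins covering here
        diff[end + 1] -= 1      # and stops covering after its end
    coverage = []
    running = 0
    for delta in diff[:length]:
        running += delta
        coverage.append(running)
    return coverage

def cleavage_ratio(mazf_cov, control_cov, pseudocount=1.0):
    """log2(+MazF / empty vector) per nucleotide.

    The pseudocount is an assumption added to avoid division by zero.
    """
    return [math.log2((m + pseudocount) / (c + pseudocount))
            for m, c in zip(mazf_cov, control_cov)]
```

A position spanned by fewer fragments after MazF induction than in the control yields a negative value, which is the signature of cleavage described in the text.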
Cleavage of an mRNA will lead to fewer paired-end fragments spanning a cleavage site, resulting in lower fragment counts at and near the site.

Figure 2.1: An RNA-seq-based approach for mapping the cleavage targets of MazF. (A) Schematic overview of the approach for a hypothetical gene of interest with a single cleavage site (red arrowhead). Paired-end RNA-sequencing fragments (top) mapped across each nucleotide position are summed (middle) for cells expressing MazF or carrying an empty vector, and then divided to yield a cleavage ratio profile (bottom). (B) A representative growth curve for cells expressing MazF or carrying an empty vector. Cells were diluted into media lacking glucose before arabinose was added. (C) Summed read counts (top) and cleavage ratio profile (bottom) for rplJ and 5' UTR of rplL. ACA sites are indicated (red arrowheads). Position of primer pairs used in (D) are shown below. (D-E) Change in RNA abundance, as measured by qRT-PCR and the RNA-Seq-based cleavage ratio, for (D) the two regions of rplJ shown in panel (C) and (E) the six genes indicated, chosen to span a range of cleavage ratios. Bars show mean ± S.D., n=3. The cleavage ratios plotted are the minimum cleavage ratios within the region amplified by each primer pair, using data from two independent replicates of + MazF and empty vector. (F) Comparison of the average changes in abundance measured by qRT-PCR and the RNA-Seq-based cleavage ratios for the regions reported in (D-E). Red line is the linear best fit. Data points are mean ± S.D., n=3. (G) Distribution of the minimum cleavage ratios within expressed coding regions (n=1083) in E. coli when comparing cells expressing MazF to those carrying an empty vector (top) or comparing two independent replicates of cells carrying an empty vector (bottom).

To regulate mazF expression, we placed it under the control of an arabinose-inducible promoter on a low-copy vector in a ΔmazF strain.
In the absence of arabinose and presence of glucose, this strain grew at a rate indistinguishable from that of a control strain harboring an empty vector (Figure 2.2A). However, upon addition of arabinose, cells harboring Para-mazF showed substantially reduced growth within 10 minutes (Figure 2.1B). To maximize the detection of cleavage events driven directly by MazF, rather than any secondary effects, we extracted RNA after inducing MazF for 5 minutes. We treated control cells harboring an empty vector identically. After generating and mapping RNA-sequencing libraries, we counted the number of fragments crossing each nucleotide in the genome to yield fragment counts per nucleotide and then computed the log2 ratio of these counts (log2 MazF:empty vector), hereafter referred to as the cleavage ratio. This ratio measures RNA cleavage while controlling for gene expression differences and region-specific biases in the library generation protocol.

An example of a MazF cleavage profile is shown in Figure 2.1C for rplJ, which encodes a ribosomal protein. After MazF induction, the lowest ratios were observed in an ~50 nucleotide region overlapping the start codon, with a 2-4-fold decrease in fragment counts across the 5'-UTR and the first 30 nucleotides of the coding region of the gene. This region of the rplJ transcript contained three ACA sites, one or all of which may be cleaved by MazF. The middle of the coding region had a cleavage ratio near 0, implying less, or limited, cleavage of the three other ACA sites within rplJ. The cleavage ratio decreased toward the end of the coding region, likely reflecting cleavage at the ACA site in the intergenic region preceding the co-operonic rplL gene, followed by 3'→5' exonuclease-driven degradation into the rplJ coding region. The overall pattern of cleavage ratios for rplJ suggests that MazF does not cleave all ACA sites equally.

Figure 2.2: Growth with uninduced MazF; replicability of cleavage ratios; gene traces. (A) Representative growth curve for cells with a vector carrying MazF or an empty vector, but grown without induction. To prevent induction, cells were diluted into medium with glucose and grown to late log phase, measuring OD600 at the time points indicated. (B) Scatterplot correlating the minimum cleavage ratio in windows across two independent replicates of the paired-end RNA-Seq data. Windows of 100 nucleotides were generated only from coding regions of the genome; 2,500 randomly chosen windows of 15,926 total windows were plotted. The linear best fit line was calculated using all windows. (C) Scatterplot correlating the minimum cleavage ratio in windows for RNA extracted at 5, 30, and 60 minutes. 1,868 windows of 100 nucleotides were generated only from coding regions of the genome. The red lines show the linear best fit for each correlation. These samples were not rRNA-subtracted. (D) Representative cleavage ratio profiles for six genomic regions. Cleavage ratio profiles are plotted as in Figure 2.1C. Positions of qRT-PCR primer pairs used in Figure 2.1E-F are shown.

To corroborate our RNA-sequencing results, we conducted qRT-PCR using probes specific for either the 5' region of rplJ or the middle of the transcript (primer pairs 1 and 2, respectively; Figure 2.1C). Consistent with our cleavage ratios, the qRT-PCR analysis also indicated that the 5' end of rplJ was more strongly cleaved, with an ~4-fold decrease in its abundance compared to the middle region of the transcript (Figure 2.1D). We then extended our qRT-PCR analysis to specific regions of six other transcripts, finding a strong correlation between cleavage ratios and qRT-PCR-based ratios (R² = 0.92; Figure 2.1E-F). To assess the reproducibility of our method, we compared two independent replicates of the cleavage ratios.
We split each transcript's coding region into windows of 100 nucleotides and then compared the minimum cleavage ratio in each window between the replicates. For this and all subsequent analyses, we chose a threshold of 64 counts in the empty vector sample to eliminate noise associated with low read counts. Using this threshold, the two replicates exhibited a high correlation (Figure 2.2B; R² = 0.91). We also compared cleavage ratios calculated for RNA harvested at 5 min of MazF expression to that harvested at 30 or 60 min. We found a strong, but reduced correlation in each case (Figure 2.2C; R² = 0.74 and R² = 0.66, respectively), likely a result of indirect effects that arise at later time points. To focus on the primary, direct effects of MazF, we performed our subsequent analyses on the 5 min time point.

Having established the paired-end RNA-Seq method, we probed the global effects of MazF on the E. coli transcriptome by recording the lowest cleavage ratio within each coding region. Of the 1,083 genes above our expression threshold at every nucleotide, we found that 887, or 82%, had a minimum cleavage ratio < -1, i.e. a more than 2-fold reduction, after 5 minutes of MazF induction, compared to just 4 genes, or 0.4%, in a negative control generated by calculating the cleavage ratio of two empty vector replicates (Figure 2.1G). Although the majority of transcripts were cleaved after 5 min, MazF had a wide range of effects on relative mRNA levels. The average cleavage ratio minimum was -2.1, with the lowest being -5.6. Examples of cleavage ratio profiles are shown in Figure 2.2D with all cleavage ratio minima listed in Table S1. There were 196 genes with a cleavage ratio minimum > -1, including 58 of the 64 transcripts lacking coding ACA sites and hence not expected to be direct MazF targets. Our results indicate that MazF cleavage affects the vast majority of transcripts in E. coli.
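The windowing and thresholding steps can be illustrated with a short Python sketch. It is not the analysis code used here: the function names are hypothetical, and we assume a window is retained only if every position reaches 64 counts in the empty vector sample, mirroring the gene-level criterion stated above.

```python
def window_minima(ratios, control_counts, window=100, min_count=64):
    """Minimum cleavage ratio in consecutive windows of a coding region.

    A window is scored only if every position has at least `min_count`
    fragments in the empty vector sample (an assumed reading of the
    expression threshold); otherwise None is recorded.
    """
    minima = []
    for start in range(0, len(ratios), window):
        counts = control_counts[start:start + window]
        if min(counts) < min_count:
            minima.append(None)
        else:
            minima.append(min(ratios[start:start + window]))
    return minima

def fraction_below(minima, cutoff=-1.0):
    """Fraction of scored regions whose minimum ratio is below `cutoff`."""
    scored = [m for m in minima if m is not None]
    if not scored:
        raise ValueError("no regions passed the expression threshold")
    return sum(1 for m in scored if m < cutoff) / len(scored)
```

Applied to per-gene minima, `fraction_below` corresponds to the proportion reported in the text (887 of 1,083 genes, or about 0.82, with a minimum cleavage ratio below -1).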
However, the transcripts with the lowest cleavage ratios had a wide variety of functions, suggesting that MazF does not specifically target a particular set of genes.

Sequencing of 5'-OH-terminated fragments is not a quantitative measure of RNA cleavage

An alternative strategy for mapping the targets of MazF and related endoribonucleases involves the enrichment and sequencing of RNAs with 5'-OH termini (Schifano et al., 2014). To compare our method with this enrichment method, we measured the log2 ratio of 5'-OH ends at single-nucleotide resolution in cells producing MazF relative to cells harboring an empty vector, hereafter referred to as the 5'-OH ratio (Figure 2.3A).

Figure 2.3: 5'-OH enrichment sequencing methodology. (A) Schematic overview of the 5'-OH mapping approach for a hypothetical gene with a single cleavage site (red arrowhead). Reads generated from 5'-OH terminated RNA (top) mapped across each nucleotide position are summed at their 5' ends (middle) for cells expressing MazF or carrying an empty vector, and then divided to yield a 5'-OH ratio (bottom). A higher 5'-OH ratio corresponds to more 5'-OH ends following MazF treatment. (B) The average 5'-OH signal at all 64 motifs of length 3. The 5'-OH ratios were calculated for RNA collected after 20 minutes of induction and the ratios occurring at all instances of a motif were averaged.

As expected, the average 5'-OH ratio at ACAs genome-wide was significantly higher than for any other three-nucleotide motif (Figure 2.3B). However, 5'-OH ratios and cleavage ratios were not correlated (Figure 2.4A; R² = 0.01). For example, within rplJ (Figures 2.1C, 2.4B), our cleavage ratios and qRT-PCR analysis had indicated that cleavage near the start codon is much stronger than within the coding region, but the highest 5'-OH ratios were within the coding region.
Similarly, for all regions queried by qRT-PCR (Figure 2.1D-F), we found that the 5'-OH ratio did not correlate with changes in RNA abundance (Figure 2.4C; R² = 0.09). This discrepancy likely arises because some RNA fragments with 5'-OH ends may be very stable, leading to elevated 5'-OH ratios, even if only a small percentage of a transcript is cleaved at a given ACA. Conversely, some 5'-OH ends may be very unstable or short, leading to a low 5'-OH ratio, even if the majority of the parent molecule is cleaved. We conclude that the 5'-OH ratio is not a quantitative measure of cleavage.

Figure 2.4: Mapping of 5'-OH termini does not accurately quantify MazF cleavage events. (A) Scatterplot of the 5'-OH ratio measured at individual ACA sites and the cleavage ratio at the same positions. Red line is the linear best fit. ACA sites in low expressed regions were not included. (B) Comparison of 5'-OH and cleavage ratio profiles for rplJ. The ratio of 5'-OH signal (+ MazF:empty vector) for cells expressing MazF for 20 minutes or carrying an empty vector is shown (top) and compared to the cleavage ratio profile for rplJ (bottom) also shown in Figure 2.1C. (C) Comparison of the change in abundance measured by qRT-PCR (Figure 2.1D-F) and the summed 5'-OH ratios at ACA sites within each amplified region. Red line is the linear best fit. Data points are mean ± S.D., n=3.

MazF has an extended recognition element, including nucleotides flanking a central ACA

Although low cleavage ratios in our method were typically associated with at least one ACA site, not all ACAs produced low cleavage ratios, indicating that additional specificity exists. To better define the primary sequence specificity of MazF, we searched for single MazF cleavage events that likely had minimal additional degradation of the fragments produced. These cleavage events manifest as deep, narrow valleys in cleavage ratio profiles.
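One way to locate such valleys is sketched below in Python. It is an illustrative reading of the criterion used in this section (a local minimum flanked on each side by a rise of at least 1 in the cleavage ratio within 50 nucleotides), not the actual implementation; the function name, parameter names, and tie-handling rule are assumptions.

```python
def find_valleys(ratios, search=50, min_rise=1.0):
    """Indices of local minima flanked by a rise of >= min_rise on both sides.

    ratios: cleavage ratio (log2) at each nucleotide of a transcript.
    search: number of nucleotides inspected on each side of a candidate.
    """
    valleys = []
    for i, value in enumerate(ratios):
        left = ratios[max(0, i - search):i]
        right = ratios[i + 1:i + 1 + search]
        if not left or not right:
            continue  # positions at the very ends cannot be assessed
        # Require a local minimum; on a flat-bottomed valley only the
        # leftmost position is reported (a choice made for this sketch).
        if value >= min(left) or value > min(right):
            continue
        if max(left) - value >= min_rise and max(right) - value >= min_rise:
            valleys.append(i)
    return valleys
```

Shallow dips that never recover by the required amount on both sides are ignored, so broad regions of gradual decline are not reported as single cleavage events.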
We identified such regions by finding local minima in the cleavage ratio profiles, and then determined if the cleavage ratio increased by at least 1 (i.e. a relative 2-fold change in fragment density) within 50 nucleotides on each side of the minima. As an example, dnaK has two such local minima (Figure 2.5A, regions 1 and 2). We then measured the frequency of all 64 trinucleotide motifs within 195 of these regions (Figure 2.5B, 2.6A). ACA was the only motif present in 100% of these cleaved regions and, after normalizing to frequency in the genome, was substantially more enriched than all other trinucleotides.

To determine if additional nucleotides surrounding ACA affected cleavage by MazF, we examined ACAs in regions where the cleavage ratio increased by at least 1 within 100 nucleotides of a local minimum. As some regions had multiple ACA motifs, which may not all be cleaved, we only used those with a single ACA site. Using qualifying ACA sites (n = 239), we generated a sequence logo and log-likelihood position weight matrix of the nucleotides flanking the ACA (Figure 2.5C-D). There was a clear under-enrichment of C at the two positions preceding the ACA, a near complete lack of G at the position immediately after the ACA, and a modest under-enrichment of A and C two positions after the central ACA. We conclude that MazF likely recognizes an ~7 nucleotide region.

Figure 2.5: MazF has an extended recognition motif with a central ACA trinucleotide. (A) Cleavage ratio profile for dnaK, indicating two regions identified as cleaved regions, with boundaries defined as the sites where cleavage ratios have increased ≥ 2-fold relative to the local minimum. Local minima are indicated by red dots. The motif scores of each ACA in the transcript, calculated using the scoring matrix in (D), are shown below the cleavage profile. (B) Plot showing the % of cleaved regions of a maximum width of 100 nucleotides with at least one instance of each possible trinucleotide motif. For % normalized to motif frequency, see Figure 2.6. (C-D) Sequence logo (C) and position weight matrix (D) for sites associated with MazF cleaved regions of a maximum width of 200 nucleotides. RNA sequences surrounding ACA sites within cleaved regions were aligned and information content in bits calculated compared to the background frequency of nucleotides surrounding ACA sites. Scores were calculated as log likelihoods of observing the nucleotide compared to the background frequency. (E) (Top) Schematic of 6S RNA reporter, indicating single-stranded region where ACA sites of varying scores were inserted and the locations of primers used to measure cleavage, and the uncleaved region used as a control. Positions in the 6S RNA mutated to eliminate native ACA sites are shown in red. (Bottom) Ratios of + MazF to empty vector, as measured by qRT-PCR of the cleaved region, normalized to abundance of the uncleaved region, 10 minutes after induction of MazF, for reporters harboring the sequence indicated. Data points are mean ± S.D., n=3. (F) Distribution of scores for all ACAs in coding regions (n= 47,059). (G) (top) Distribution of the maximum ACA score within regions defined as cleaved by MazF but not used to generate the motif (n=650) compared to (bottom) the distribution of maximum ACA scores associated with randomly sampled coding regions (50 independent samples of n=650) of the same size that had at least one ACA.

To directly probe how the nucleotide sequence flanking an ACA impacts MazF-dependent cleavage, we developed a reporter in which different cleavage sites were inserted within the 5' single-stranded region of B. subtilis 6S-1 RNA (Figure 2.5E, top). To measure cleavage, we co-expressed MazF, or an empty vector, with the reporter molecule and then measured, using qRT-PCR, the abundance of the region containing a given cleavage site relative to a 3' region lacking an ACA (Figure 2.5E, bottom).
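The motif scores assigned to each reporter site derive from the log-likelihood position weight matrix (Figure 2.5D). A minimal Python sketch of this kind of scoring is given below. Several details are assumptions rather than the settings used here: the logarithm base (2), the pseudocount, the use of a single background frequency table for all positions, and the function names.

```python
import math

def build_pwm(sites, background, pseudocount=0.5):
    """Log-likelihood position weight matrix from aligned flanking sequences.

    sites: equal-length RNA strings (flanks of cleaved ACA sites).
    background: dict of nucleotide -> background frequency (assumed to be
    the same at every position in this sketch).
    """
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({
            nt: math.log2(((column.count(nt) + pseudocount) / total)
                          / background[nt])
            for nt in "ACGU"
        })
    return pwm

def score_site(pwm, flank):
    """Sum the per-position log-likelihood scores for one candidate site."""
    return sum(column[nt] for column, nt in zip(pwm, flank))
```

Under this scheme a site whose flanking nucleotides are over-represented among cleaved regions receives a positive score, and one carrying depleted nucleotides (for example, G immediately after the ACA) receives a negative score.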
Using cleavage sites with a range of scores (generated using the position weight matrix in Figure 2.5D) from -8 up to 4, we found that scores were positively correlated with the extent of cleavage, confirming a role for the nucleotides flanking an ACA in MazF cleavage specificity.

To test if our inferred MazF recognition motif globally affects RNA cleavage, we scored all ACA sites in E. coli coding regions; these scores ranged from -8 to +4 (Figure 2.5F). We then recorded the highest scoring ACA in each region where the cleavage ratio increased by at least 1 (i.e. a 2-fold change in fragment density) within 50 nucleotides on each side of the minima (excluding those used initially to define our motif) (Figure 2.5G, top) and, for comparison, in randomly selected coding regions having the same lengths as the set of cleaved regions (Figure 2.5G, bottom). The cleaved regions were significantly enriched in high scoring ACA sites (K-S test, p < 10⁻⁴⁰). This analysis supports the conclusion that MazF requires more than just an ACA to efficiently cleave RNA. However, 10% of high (i.e. > 1) scoring sites occurred in coding regions with a cleavage ratio ≥ 0. Thus, a high-scoring site is not always sufficient for cleavage, possibly due to secondary structure effects.

Figure 2.6: Normalized enrichment; robustness of motif finding. (A) Plot showing the frequency of each possible trinucleotide motif in cleaved regions with a maximum width of 100 nucleotides normalized to the motif’s overall frequency in coding regions. (B) Plot showing the robustness of the MazF motif to different motif-finding algorithm settings. Each dot (colored based on nucleotide identity) corresponds to a nucleotide’s score within a position weight matrix. Varied parameter sets were the minima search distance (10, 25, 50, or 100), the minimum change in cleavage ratio (0.75, 1, 1.25, or 1.5), and the maximum minima region half-width (50, 100, 150, or 200), resulting in the 64 motifs plotted. Default values used throughout the rest of the paper are underlined. (C) Plot recording the percentage of high (score ≥ 1), medium (1 > score ≥ -1.5), and low (score < -1.5) scoring MazF motifs present in cleaved regions of all transcripts. Regions were defined as cleaved if their cleavage ratio was lower than the coding region’s median cleavage ratio by 1, which flags 7.5% of positions. High and medium scoring motifs were strongly enriched in these regions, while low scoring motifs and trinucleotide motifs, excluding ACA, were not enriched. The error bar shows the mean for the trinucleotide motifs ± S.D., n=63.

Leaderless mRNAs are rare and not protected from MazF cleavage in their coding regions

We next searched for MazF cleavages that produce leaderless mRNAs as prior studies argued that MazF cleaves many ACA sites just upstream of or overlapping translational start sites to produce a large (> 300) pool of leaderless transcripts (Sauert et al., 2016; Vesper et al., 2011). We defined a transcript as leaderless if the cleavage ratio increased ≥ 1 when transitioning from the 15 nucleotides preceding the start codon to the 5' end of the coding region. In cells producing MazF for 5 minutes, we found only 41 leaderless transcripts among 941 transcripts above our expression threshold of 64 counts across their entire length in the empty vector sample (Figure 2.7A, top). We found no leaderless transcripts in a negative control generated by comparing two empty vector samples (Figure 2.7A, bottom). Leaderless transcripts can only be translated into full-length protein if they are not also cleaved within their coding region. However, of the 41 leaderless transcripts identified, 31 had additional MazF-dependent cleavages within their coding regions, i.e. at least one region with a cleavage ratio < -1 (Figure 2.8A).
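The two criteria applied here, a rise in cleavage ratio across the start codon and the absence of further cleavage in the coding region, can be expressed as a short Python sketch. This is an illustration under assumptions, not the analysis code: how the leader and the 5' end of the coding region are each summarized (here, a mean over a window) is not specified in the text, and the function names and the `cds_window` parameter are hypothetical. The 20-nucleotide exclusion follows Figure 2.8A.

```python
def is_leaderless(ratios, cds_start, leader=15, cds_window=15, min_rise=1.0):
    """True if the cleavage ratio rises by >= min_rise across the start codon.

    ratios: cleavage ratio (log2) per nucleotide of the transcript.
    cds_start: 0-based index of the first nucleotide of the start codon.
    Both regions are summarized by their mean (an assumption).
    """
    if cds_start < leader:
        raise ValueError("leader region is shorter than required")
    upstream = ratios[cds_start - leader:cds_start]
    downstream = ratios[cds_start:cds_start + cds_window]
    rise = sum(downstream) / len(downstream) - sum(upstream) / len(upstream)
    return rise >= min_rise

def has_intact_coding_region(ratios, cds_start, cds_end, skip=20, cutoff=-1.0):
    """True if no position beyond the first `skip` nt of the coding region
    has a cleavage ratio below `cutoff` (cds_end is exclusive)."""
    return min(ratios[cds_start + skip:cds_end]) >= cutoff
```

A transcript would count as leaderless and full-length only if both functions return True, which in our data applies to a small subset of the 41 leaderless transcripts.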
For instance, groL was cleaved to become leaderless, but also had an extended region of low cleavage ratios within its coding region, with a high-scoring ACA site near the local minimum (Figure 2.7B). To verify that leaderless groL is also cleaved in the coding region, we performed Northern blotting with a probe to the 5' end of the groL coding region (Figure 2.7C). Full-length, leaderless groL transcript (~1700 nt) was undetectable after inducing MazF, with several smaller fragments accumulating. In total, only 10 leaderless transcripts were identified that also have no other region more than 2-fold downregulated, a finding in sharp contrast to the previously proposed MazF regulon of 330 transcripts (Sauert et al., 2016). A complete list of transcripts and the values we used to determine if they were leaderless and full-length is available in Table S2.

The set of 10 leaderless transcripts identified did not include rpsU or grcA (formerly yfiD) (Figure 2.8B), both suggested previously to be leaderless following MazF induction (Vesper et al., 2011). In each case, the ACA just upstream of the start codon had a motif score < -1.5 and there was no valley in the cleavage ratio near the start codon, consistent with MazF not targeting these leaders. To verify that MazF does not create a substantial pool of leaderless grcA, we again used Northern blotting, this time with a probe to the 5' end of the grcA coding region (Figure 2.8C). After 5 minutes of MazF induction, full-length grcA mRNA (~450 nt) was substantially reduced and two major degradation products were detected between 150 and 300 nucleotides in length, likely corresponding to cleavage at positions 152 and 213, respectively. A band of ~400 nt that may represent leaderless grcA was visible, but very faint. Taken all together, our results indicate that MazF does not produce a large pool of abundant, leaderless mRNAs that have complete and intact coding regions.

Figure 2.7: MazF does not produce a large pool of leaderless transcripts. (A) Distribution of the change in cleavage ratio between the 5' end of the coding region and the leader region. The distribution calculated for cells expressing mazF compared to cells carrying an empty vector (top) is compared to that for a comparison of two independent replicates of cells carrying an empty vector (bottom), n=941. For cells producing MazF, there are only 41 genes with large (≥ 2-fold) increases in the cleavage ratio across the start codon. (B) Cleavage ratio profile (top) and ribosome footprint profile (bottom) for groL in cells producing MazF or carrying an empty vector. Summed ribosome footprints are shown in a rolling 30 nt window with Gaussian smoothing. A diagram of the groL transcript, including ACA sites and their scores, and the location of the Northern blot probe shown in panel C, is shown at the bottom. (C) Northern blot using the probe indicated in (B) at the 5' end of the coding region of groL. RNA was extracted from cells containing an empty vector or expressing MazF for 5 minutes. RNA lengths were estimated by comparison to a ssRNA ladder run together with RNA samples and visualized prior to membrane transfer.

Figure 2.8: Leaderless transcripts are degraded in their coding regions. (A) Distribution of minimum cleavage ratios in coding regions, excluding the first 20 nucleotides. The distributions for all considered genes (top, n=941) and leaderless genes (bottom, n=41) are shown. (B) Profiles of the cleavage ratio (top) and ribosome footprints (bottom) for rpsU and grcA in cells producing MazF compared to those carrying an empty vector. Plots were generated as in Figure 2.7B. The position of the Northern probe used for grcA is shown below the schematic of the grcA gene. (C) Northern blot using a probe to the 5' end of the coding region of grcA. RNA lengths were estimated by comparison to a ssRNA ladder run together with RNA samples and visualized prior to transfer to a membrane. A faint band within the size range possible for leaderless grcA is identified with '*'. Cleavage products likely corresponding to the 3rd and 4th ACA sites within the coding region are marked with '<'.

Leaderless transcripts are not preferentially translated

To test whether any of the leaderless transcripts produced by MazF were preferentially translated and, more generally, to globally profile the effects of MazF on translation, we performed ribosome profiling after inducing MazF for 5 and 25 min. For the 41 leaderless transcripts identified (full-length or not), we found no evidence that MazF induction increased translation. For example, with rsuA, which was leaderless and had little additional cleavage in its coding region, we observed similar or slightly decreased ribosome footprints across the entire coding region following MazF induction (Figure 2.9A). For groL, which was leaderless and had additional cleavages within its coding region, we found that MazF induction produced an increase in ribosome footprints within the first ~500 nucleotides of the coding region (Figure 2.7B). However, there was a significant decrease in ribosome footprints extending from a cleavage site within the groL coding region to the end of the transcript. Thus, MazF likely does not increase the translation of full-length, functional GroL.

We then expanded our ribosome profiling analysis to all transcripts. To systematically assess changes in ribosome recruitment to mRNAs, we compared the density of ribosome footprints at the 5' ends of coding regions (nucleotides 10-60) in cells producing MazF for 5 minutes to cells carrying an empty vector (Figure 2.9B). For coding regions that met our expression threshold (n=941), there was no trend toward increased or decreased ribosome footprints at their 5' ends.
The full set of leaderless transcripts (n=41) showed no significant increase in 5' ribosome footprints compared to all considered regions (one-sided t-test, p=0.43). To test if full-length, leaderless transcripts (n=10) were preferentially translated, we compared them to the set of all full-length transcripts (n=168) and found no significant increase in 5' ribosome footprints on full-length, leaderless messages (one-sided t-test, p=0.22). Similar observations were made using ribosome profiling data for cells that produced MazF for 25 minutes (Figure 2.10A). Finally, to confirm that our definition of leaderless mRNAs did not affect these analyses, we also verified that there was no significant increase in 5' ribosome footprints for mRNAs with a MazF site (score > -1.5) upstream of the start codon (Figure 2.10B). These analyses, taken together with the inspection of individual leaderless transcripts, indicate that MazF cleavage does not generally promote ribosome recruitment to the 5' end of its targets.

Figure 2.9: MazF does not produce preferentially translated leaderless transcripts, but inhibits translation of targets. (A) RNA cleavage ratio and ribosome footprinting profiles for rsuA, shown as in Figure 2.7B. (B) Boxplots of the change in ribosome footprints on the 5' end (nucleotides 10-60) of coding regions between an empty vector sample and a sample expressing mazF for 5 minutes. Full-length transcripts are those < 2-fold downregulated throughout their coding region and leaderless transcripts are defined in the text. Boxplots show the median, lower and upper quartiles, and quartile ± 1.5 interquartile range. 'n.s.' indicates no statistically significant difference (p > 0.05, one-sided t-test). (C) Plot of fluorescence normalized to OD600 in + MazF:empty vector for cells growing in exponential phase. A YFP construct lacking strong coding MazF sites (YFP*) containing a short 5' UTR either with (red) or without (blue) ACA sites between the Shine-Dalgarno site and the start codon was induced with MazF at t = 0 min. Data points are the mean of three biological replicates ± S.D. (D) Plot of the distribution of ribosome footprints across coding regions in MazF and empty vector samples. Expressed coding regions were divided into 10 bins in the 5'→3' direction and the number of ribosome footprints across all genes were summed in each. (E) Plot of the mean change in 3' ribosome footprints between a sample expressing mazF for 5 minutes and an empty vector sample, shown at single-nucleotide resolution for the region surrounding coding-region ACAs with scores ≥ 1 (n=6,234). For comparison, the same region is shown for randomly chosen nucleotides (n>80,000). The distances between observed local maxima in ribosome density are shown above the plot. (F) Plot of the change in ribosome density across uncleaved (blue, n=44) and cleaved (red, n=897) mRNAs after MazF induction. Coding regions were divided into 10 bins running in a 5'→3' direction and the mean change in ribosome footprints was calculated between a sample expressing mazF for 5 minutes and an empty vector sample. The mean for each bin is shown (line) ± S.D. (shaded region). (G) Distributions of summed ACA scores for transcripts that increased in 3' ribosome footprints (top, n=100) or transcripts that decreased in 3' ribosome footprints (bottom, n=666). The change in ribosome footprints was measured at positions 3' of the last ACA with a score ≥ -1.5 in the coding region. To prevent summing negative scores, only motifs with scores ≥ -1.5 were included and 1.5 was added to all scores.

Figure 2.10: Leaderless transcripts are not preferentially translated; MazF causes increased footprints on ssrA; MazF sites lead to decreased translation. (A) Boxplots of the change in ribosome footprints on the 5' end (nucleotides 10-60) of coding regions between a sample expressing mazF for 25 minutes and an empty vector sample. Plots were generated and statistical comparisons were made as in Figure 2.9B. (B) Boxplots of the change in ribosome footprints on the 5' end (nucleotides 10-60) of coding regions between a sample expressing mazF for 5 minutes (top) or 25 minutes (bottom) and an empty vector sample. Transcripts with a '5' MazF motif' (n=180) were defined as those with a MazF motif of score ≥ -1.5 in the 15 nt upstream of the start site. (C) Bar graph of the percentage of ribosome footprints in the 5' and 3' halves of genes as well as in the ssrA gene responsible for rescuing stalled ribosomes. (D) Polysome profiles for cells with an empty vector or expressing MazF for 5 or 25 minutes. Polysomes were separated on a 10-55% linear sucrose gradient and A260 was measured across the fractions. The baseline discontinuity in the empty vector sample overlaying the 70S and first polysome peaks is a technical artifact. (E) Distributions of maximum ACA score for transcripts that increased in 3' ribosome footprints (top, n=100) or transcripts that decreased in 3' ribosome footprints (bottom, n=666). The change in ribosome footprints was measured at positions 3' of the last ACA with a score ≥ -1.5 in the coding region. For regions with no ACA site, the score was set to the minimum motif score.

To directly test whether MazF cleavages just 5' of a start codon increase translation, we also measured production of a YFP reporter engineered to lack MazF cleavage sites and produced from a transcript containing a short 5'-UTR either with or without MazF sites between the ribosome binding site and the start codon. We found that there was no increase, and if anything a slight decrease, in fluorescence per OD for the construct with MazF sites (Figure 2.9C). These results suggest that even if MazF produces leaderless mRNAs, they are not preferentially translated.
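The leaderless and full-length criteria used in the two preceding sections can be written down compactly. The sketch below is illustrative only and is not the analysis code used in this work. It assumes that cleavage ratios are log2 values (the text equates a change of 1 with a 2-fold change), that each window is summarized by its mean, and that the "5' end of the coding region" is a 15 nt window. The function name, the window width, and the toy profiles are hypothetical.

```python
from statistics import mean

LEADER_NT = 15         # leader window: 15 nt preceding the start codon (from the text)
CODING_5P_NT = 15      # ASSUMPTION: width of the "5' end of the coding region" window
SKIP_NT = 20           # coding nucleotides skipped when scanning for internal cleavage
LEADERLESS_STEP = 1.0  # required rise in cleavage ratio across the start codon
CLEAVED_BELOW = -1.0   # a coding-region ratio below this counts as a cleavage


def classify_transcript(ratios, start, stop):
    """Classify one transcript from per-nucleotide cleavage ratios.

    ratios: per-nucleotide cleavage ratios (assumed log2) for the transcript
    start:  index of the first nucleotide of the start codon
    stop:   index one past the last nucleotide of the coding region
    """
    leader = ratios[max(0, start - LEADER_NT):start]
    coding_5p = ratios[start:min(stop, start + CODING_5P_NT)]
    if not leader or not coding_5p:
        return {"leaderless": False, "full_length": False}

    # ASSUMPTION: each window is summarized by its mean.
    step = mean(coding_5p) - mean(leader)
    leaderless = step >= LEADERLESS_STEP

    # Full-length: no position past the first 20 coding nt falls below -1.
    body = ratios[start + SKIP_NT:stop]
    full_length = bool(body) and min(body) >= CLEAVED_BELOW
    return {"leaderless": leaderless, "full_length": full_length}


# Toy profiles (hypothetical values, not data from this work).
clean = [-2.0] * 15 + [0.0] * 100
cut_inside = [-2.0] * 15 + [0.0] * 50 + [-3.0] * 10 + [0.0] * 40
print(classify_transcript(clean, 15, 115))
print(classify_transcript(cut_inside, 15, 115))
```

In this toy case both profiles pass the leaderless test, but only the first would also count as full-length, which mirrors the distinction drawn between the 41 leaderless transcripts and the 10 that are both leaderless and full-length.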
MazF directly inhibits the complete translation of its mRNA targets

Although MazF did not, on average, significantly change the density of ribosomes at the 5' ends of transcript coding regions (Figure 2.9B), it did prevent ribosomes from completing the translation of cleaved transcripts. Ribosome footprints progressively decreased across transcripts following MazF induction, with an ~4-fold decrease in ribosome footprints at the 3' end of transcripts compared to the 5' end (Figure 2.9D), consistent with a prior study indicating a decrease in 35S incorporation following MazF induction (Zhang et al., 2003) and with a later study showing a 5' increase in ribosome footprints following expression of another RNase toxin, RelE (Hwang and Buskirk, 2017). There was also an ~8-fold increase in ribosome footprints on the mRNA portion of tmRNA, indicating that MazF produces messages without stop codons that are rescued by the ssrA-tagging system (Figure 2.10C). To determine if the drop in ribosome density across transcripts results directly from MazF cleavage, we examined changes in ribosome footprints near all high-scoring MazF motifs regardless of their position within a given transcript (Figure 2.9E), finding an ~8-fold enrichment in ribosomes at the -1 position relative to the ACA. Three additional peaks upstream of the ACA, at successive distances of 12, 28, and 26 nucleotides, likely correspond to a ‘traffic-jam’ of ribosomes awaiting rescue on MazF-cleaved transcripts. There was an ~2-fold decrease in ribosome footprints immediately after ACA sites (Figure 2.9E), supporting the conclusion that MazF is the direct cause of decreased translation in cleaved mRNAs.

Although MazF induction led to a significant reduction in polysomes, they were not completely eliminated (Figure 2.10D), suggesting that some translation still occurs. We surmised that the mRNAs that continue to be translated would lack MazF sites and hence be uncleaved.
To test this idea, we measured the change in ribosome density following MazF induction for uncleaved transcripts (n=44), i.e. those that had no region with a cleavage ratio below 0. For these uncleaved transcripts, there was little decrease in ribosome footprints across the coding region, in clear contrast to cleaved transcripts (n=897), which dropped significantly (Figure 2.9F). These results suggest that MazF prevents the complete translation of cleaved mRNAs and helps drive a redistribution of ribosomes to uncleaved mRNAs.

To generate a list of preferentially translated genes following MazF induction, we calculated the change in ribosome footprint counts at the 3' ends of all transcripts (Table S3). There were 100 transcripts with increased ribosome footprints at their 3' ends. As expected, this set of 100 transcripts had fewer high-scoring MazF sites compared to transcripts with lower 3' ribosome footprints (Figure 2.9G) as well as a lower maximum motif score (Figure 2.10E). There was, however, no obvious enrichment in the set of transcripts with increased 3' ribosome footprints for stress-response genes, or any other particular functional process. In sum, our results support the conclusion that MazF does not promote translation of any specific set of messages and instead inhibits the translation of targeted mRNAs.

MazF does not generate specialized ribosomes, but does efficiently inhibit rRNA maturation

The analyses thus far focused on the mRNA targets of MazF. However, a previous model also suggested that MazF cleaves off the 3' end of the 16S rRNA within mature ribosomes, thereby removing the anti-Shine-Dalgarno sequence to create a pool of specialized ribosomes. To assess cleavage of rRNA, we conducted paired-end RNA-Seq, as above, but without rRNA subtraction, 5, 30, and 60 minutes after inducing MazF. We saw no significant decreases in cleavage ratios anywhere along the 16S rRNA, including near the 3' end (Figure 2.11A).
To ensure that our observations were not biased by the inhibition of reverse transcriptase crossing modified m⁶₂A bases at positions 1518 and 1519, we also counted the number of fragments crossing the proposed MazF cleavage site, but terminating before position 1518. We found that the difference between + MazF and empty vector samples still did not meet our significance threshold of a 2-fold decrease. In addition, we were able to selectively enrich for fragments which arose from mature, modified ribosomes by counting only fragments with sequencing errors at modified bases, which arise from misincorporation of nucleotides during reverse transcription. Again, we found no significant cleavage of the 3' end of the 16S rRNA.

Although we found no evidence that mature rRNA is a major MazF target, we did observe a striking increase in fragments arising from immature rRNA in cells expressing MazF. An early step in 16S rRNA maturation is the cleavage of the nascent RNA by RNase III at a hairpin formed between the regions immediately upstream and downstream of the mature ends (Figure 2.12A). Near the downstream RNase III cleavage site, we observed an ~64-fold increase in reads after expressing MazF for just 5 min and > 100-fold after 30 min (Figure 2.11A, right). We observed a similarly large increase at the 3' end of the 23S rRNA (Figure 2.12B) and a significant increase in reads starting 90 nucleotides upstream of the 5' end of the 23S rRNA. These results suggest that MazF induction leads to the rapid accumulation of rRNA precursors, which may reflect a disruption in rRNA maturation and lead to an inhibition of ribosome biogenesis.

Figure 2.11: MazF does not target mature ribosomes, but does inhibit ribosome maturation. (A) (Left) Paired-end fragment counts summed across the seven 16S rRNA genes in E. coli. The green arrow outline denotes the extent of the mature rRNA.
Below are the RNase III maturation sites (blue arrowheads) and the ACA sites (red arrowheads) that exist in all seven 16S loci. The maximum motif score across 16S loci is plotted above the gene diagram. (Right) Detail of the 3' end of the 16S rRNA and precursor region indicating the MazF-dependent accumulation of rRNA precursor. (B) Northern blots of total RNA using probes sensitive to the mature 3' end of the 16S (1), the precursor region that is 5' of the RNase III site (2), and the precursor region that is 3' of the RNase III site (3). The blot for probe 3 was split to enable visualization of two regions requiring different exposures. RNA lengths were estimated by comparison to a ssRNA ladder run together with RNA samples and visualized prior to membrane transfer, except the 43 nt length, which was determined by comparison to a synthesized RNA oligo. (C) RNA cleavage ratio and ribosome footprinting profiles for the S10 ribosomal protein operon, shown as in Figure 2.7B. Regions with lower footprints counts in the MazF + sample are marked with a purple line. (D) (Left) Timeline of pulse-chase experiment to measure effects of MazF induction on nascent rRNA synthesis indicating MazF induction, pulse, chase, and sampling times. (Right) Measurements of A254 to assess total RNA and 3H c.p.m. to measure nascent RNA across a sucrose gradient. Early and late timepoints were collected for both empty vector and + MazF samples. We also used Northern blotting to probe changes in the 3' region of the 16S rRNA. First, we used a probe specific to the anti-Shine-Dalgarno region of the mature 16S rRNA (Figure 2.11B). If a substantial pool of specialized ribosomes is produced, this probe should result in both the appearance of a major band at 43 nt and a significant loss in signal of the full-length 16S rRNA band. However, we observed very little, if any, signal at 43 nt and only a modest decrease in full- length 16S rRNA, even after 60 min of MazF induction. 
We directly quantified the 43 nt band by comparison to a synthetic standard (Figure 2.12C). There was no detectable 43-nt fragment after 5 min of MazF induction, and only ~50-100 molecules per cell after 60 min, which would represent only ~1% of the initial pool of ribosomes. This calculation assumes that all of the 43-nt fragment arises from the processing of mature ribosomes, rather than rRNA precursors, so the number of specialized ribosomes is likely even lower. This lack of specialized ribosomes is not the result of insufficient MazF, as the levels of expression used here were sufficient to stop growth within ~10 min (Figure 2.1B) and to drive substantial amounts of RNA cleavage within 5 min (Figure 2.1G).

Figure 2.12: Ribosomal protein transcripts, but not ribosomes, are targets of cleavage. (A) Cartoon of RNA processing steps in rRNA maturation in one of the 7 rRNA loci. Known RNase sites involved in processing are indicated in red. Regions showing significant increases in our RNA-seq data following MazF induction are shaded in blue. (B) Paired-end fragment counts summed across the seven 23S rRNA genes in E. coli. The middle region of the 23S rRNA was removed to highlight the major changes at the 5' and 3' ends of the gene. The green arrow outline below denotes the extent of the mature 23S rRNA. ACA sites are marked by red arrowheads in cases where the ACA exists in all seven 23S loci. Maximum motif score across the 23S loci is plotted for each ACA above the gene diagram. The discontinuity in the data just prior to the 0 position corresponds to a region which was removed as the sequence length at this position varies between the 23S alignments. RNase III maturation sites are shown with blue arrows at the bottom. (C) Northern blots of total RNA from cells expressing MazF alongside a 43 nt synthetic RNA standard with the same sequence as the region proposed to be cleaved from ribosomes by MazF.
A probe sensitive to the mature 3' end of the 16S (probe 1 in Figure 2.11B) was used. (D) RNA cleavage ratio and ribosome footprinting profiles for the spc (left) and alpha (right) r-protein operons, shown as in Figure 2.7B. (E) (Left) Timeline of pulse-chase experiment for measuring the effects of MazF on mature rRNA indicating pulse, chase, MazF induction, and sampling times. (Right) Measurements of A254 to measure all RNA and c.p.m. to measure mature rRNA synthesized before MazF induction across fractions of a 5-30% sucrose gradient. Early and late timepoints were collected for both empty vector and + MazF samples. Although no significant level of 43 nt fragment arose following MazF induction, we did observe significant cleavage products ~70 and ~160 nucleotides in length after 30 min of MazF induction, with additional bands between 150 and 500 nucleotides appearing after 60 minutes. The ~70 and ~160 nucleotide cleavage fragments were also seen using a probe specific to the region between the RNase III site and the mature 3' end, a region normally degraded during rRNA maturation. The sizes of these fragments are consistent with MazF cleavage of a 16S precursor at ACA sites near nucleotides 1400 and 1500. We also used a probe specific to RNA that is 3' of the RNase III processing site, using agarose instead of polyacrylamide gels to visualize larger RNA species. In this case, MazF induction resulted in a significant increase in signal for an RNA species between ~1500-2000 nucleotides, including a band larger than the mature 16S rRNA, as well as a species <500 nucleotides. Taken all together, these results contradict a model in which large pools of specialized ribosomes are produced, and instead strongly support the conclusion that MazF (i) directly cleaves rRNA precursors and (ii) prevents the proper maturation of rRNA precursors. 
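The fragment-counting check described earlier in this section, which sidesteps reverse-transcriptase stalling at positions 1518 and 1519, can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used here: fragments are taken to be 1-based inclusive (start, end) pairs in 16S coordinates, the cut is placed at the ACA near nucleotide 1500, counts are assumed to be already normalized for sequencing depth, and the pseudocount and all function names are hypothetical.

```python
from math import log2

CUT_SITE = 1500       # ASSUMPTION: first nucleotide 3' of the proposed cut
MODIFIED_BASE = 1518  # first modified base that can stall reverse transcriptase


def count_crossing(fragments, site=CUT_SITE, stop_before=MODIFIED_BASE):
    """Count fragments that span `site` and end before `stop_before`.

    fragments: iterable of (start, end) pairs, 1-based and inclusive.
    """
    return sum(1 for start, end in fragments if start < site <= end < stop_before)


def log2_change(mazf_count, control_count, pseudocount=1.0):
    """log2(+MazF / empty vector); the pseudocount (an assumption) avoids log(0)."""
    return log2((mazf_count + pseudocount) / (control_count + pseudocount))


def meets_threshold(change, threshold=-1.0):
    """A 2-fold decrease corresponds to a log2 change of -1 or lower."""
    return change <= threshold


# Toy fragments (hypothetical coordinates, not data from this work).
empty_vector = [(1400, 1510), (1450, 1505), (1480, 1530), (1300, 1490)]
plus_mazf = [(1410, 1512), (1460, 1508), (1200, 1495)]
change = log2_change(count_crossing(plus_mazf), count_crossing(empty_vector))
print(change, meets_threshold(change))
```

Restricting the count to fragments that end before position 1518 means that only intact molecules spanning the site contribute, independent of how efficiently reverse transcriptase reads through the modified bases.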
The disruption of rRNA processing may arise from MazF directly cleaving rRNA precursors; we observed several high-scoring MazF sites in precursor regions of the rRNA, including the regions immediately 5' of both the 16S and 23S rRNAs (Figures 2.11A, 2.12B). Alternatively, or in addition, MazF may indirectly affect rRNA maturation by cleaving the transcripts of ribosomal proteins. Insufficiencies in ribosomal protein levels can prevent the proper maturation of rRNA precursors (Siibak et al., 2011), which are normally bound almost immediately upon transcription by ribosomal proteins. Thus, we inspected the transcripts of ribosomal protein operons for cleavage. For most (11 of 16) operons, we found evidence of strong MazF cleavage and reduced translation, with cleavage ratio minima followed by regions showing significantly reduced ribosome occupancy (Figures 2.11C, 2.12D).

Taken together, our results suggest that a key activity of MazF may be inhibition of rRNA maturation and, consequently, ribosome biogenesis. To directly test this model, we induced the expression of MazF for 5 min, pulse-labeled cells with 3H-uridine for 5 min, and then chased with cold uridine. We took samples 10 and 25 min after chasing, and measured both the 3H-labeled RNA and the total RNA across a sucrose gradient (Figure 2.11D). In empty-vector control samples, both the radiolabeled RNA and total RNA (A254) had clear peaks corresponding to 30S, 50S, 70S and polysomes, indicating that the radiolabeled 3H-uridine was incorporated into mature ribosomes. MazF-expressing cells had a similar pattern as the control sample for total RNA. However, in striking contrast to the control, cells expressing MazF had no clear peaks for radiolabeled RNA, with the majority of the signal running as a smear above the position of 30S subunits. These results demonstrate that, upon induction, MazF rapidly and almost completely blocks the production of new ribosomes.
To test if mature ribosomes are also a target of MazF, we again pulse-labeled cells with 3H-uridine (but before producing MazF), chased with cold uridine, and then induced MazF expression. In this case, A254 measurements and radiolabel incorporation were essentially identical for the MazF producing cells and the empty vector control, with clear peaks in each case corresponding to the 70S, 50S, and 30S ribosome subunits (Figure 2.12E). Taken all together, our results strongly support a model in which MazF cleaves precursor rRNAs and ribosomal protein transcripts to inhibit new rRNA synthesis and ribosome biogenesis, with no evidence for significant alteration or specialization of mature ribosomes.

Discussion

Mapping the specificity and global cleavage patterns of endoribonuclease toxins

Toxin-antitoxin systems are abundant genetic modules in bacteria and archaea that play critical roles in regulating cell growth, antibiotic persistence, and phage immunity. Many of the toxins are endoribonucleases, but their global patterns of cleavage and target specificity remain poorly characterized. Here, we described the development of a paired-end RNA-Seq-based method for systematically mapping and quantifying the cleavage of endoribonucleases. These results, along with ribosome profiling data, offer a global and comprehensive view of how the E. coli toxin MazF inhibits cell growth.

Previous efforts to map the cleavage targets and specificity of endoribonuclease toxins have relied on a method involving enrichment and sequencing of the 5'-OH termini created by these toxins (Schifano et al., 2014). We implemented this approach for MazF for comparison to the paired-end RNA-Seq method developed here.
Although both methods can identify recognition motifs, the 5'-OH method often produced large peaks where there was minimal, if any, cleavage detected by paired-end RNA-Seq (or by qRT-PCR), and there were some clear cleavages detected by RNA-Seq that had no corresponding peak in the 5'-OH data. These differences likely arise from the effects of 5'-OH termini on RNA stability. For instance, a large peak in the 5'-OH method may not reflect extensive cleavage of the parent RNA molecule, but rather high stability of the product containing the 5'-OH terminus. Thus, the 5'-OH method cannot provide reliable, quantitative assessments of the extent of cleavage genome-wide.

A global analysis of RNA cleavage by MazF reveals its specificity and targets

MazEF is one of the most studied toxin-antitoxin systems to date and has, in particular, been a paradigm for understanding the many systems that encode endoribonucleases. MazF is often referred to as specific for ACA sites. Although an ACA is necessary for cleavage, this trinucleotide motif is by no means sufficient. An extended sequence specificity for MazF is consistent with the structure of E. coli MazF in complex with the substrate d(AUACAUA) where the 5 central nucleotides show clear electron density (Zorzini et al., 2016).

Our global study also provides important new insights into the RNA targets of MazF, which have remained poorly defined and controversial. Early studies using model transcripts suggested that MazF cleaves ACA sites that occur in most transcripts to interfere with translation, leading to MazF being dubbed an 'mRNA interferase' that inhibits translation by bulk degradation of transcripts (Zhang et al., 2003). However, subsequent studies argued that E.
coli MazF reprograms translation by specifically cleaving transcripts with an ACA site near their translational start sites to generate a large pool of leaderless mRNAs that were preferentially translated by specialized ribosomes arising from MazF cleaving a single ACA near the 3' end of mature 16S rRNA. This model was attractive as it represented a potentially powerful mechanism for cells to precisely tailor their translational program during stress. There has also been significant interest recently in how eukaryotes generate heterogeneous pools of ribosomes with different translational capacities, and MazF-derived ribosomes were a potential prokaryotic instance of such heterogeneity.

Our combined RNA-Seq and ribosome profiling data now delineate the global patterns of cleavage by MazF and the consequences for the cell's translational program. We find that MazF cleaves a wide range of mRNAs, usually at multiple ACA sites matching the extended specificity noted above. These cleavage events, along with subsequent processing by other nucleases, lead to significant decreases in the abundance of most full-length mRNAs. We found no evidence for a substantial pool of leaderless transcripts capable of being translated into functional proteins. MazF does cleave ~40 transcripts at or just upstream of their translational start sites. However, these transcripts typically had additional, strong cleavage sites within their coding regions. Notably, the previous studies of MazF effectively ignored these additional cleavage events by (i) using primer extension with primers that hybridized upstream of any coding region cleavage sites (Vesper et al., 2011) and (ii) limiting RNA-Seq analysis to the 5' ends of transcripts (Sauert et al., 2016).

Our ribosome profiling supports a model in which MazF generally inhibits translation by cleaving most mRNAs.
For nearly all transcripts, MazF induction led to a significant decrease in ribosome footprints toward their 3' ends (Figure 2.9D, F), indicating that MazF leads to decreased translation of full-length proteins. This trend held even when considering just leaderless transcripts, such as groL (Figure 2.7B-C). Additionally, the set of transcripts that did show increased ribosome footprints (Figure 2.9F-G) were not obviously enriched for any particular function. In sum, we conclude that MazF does not produce a large pool of leaderless mRNAs to drive a translational reprogramming of cells to cope with stress as previously proposed (Sauert et al., 2016; Vesper et al., 2011).

MazF rapidly blocks rRNA maturation and ribosome biogenesis

Our RNA-Seq data revealed rRNA as a major target of MazF. However, we did not find evidence of MazF cleaving off the anti-SD region of the 16S rRNA in mature ribosomes by cutting an ACA site at nucleotide 1500. We did not observe an abundant 43 nucleotide cleavage product corresponding to cleavage at nucleotide 1500 or a robust decrease in signal of the full-length 16S rRNA using a probe specific to the anti-Shine-Dalgarno region (see Figure 2.11B, probe 1). In fact, our quantification of the 43 nucleotide fragment indicated no evidence of ribosome specialization after 5 min of MazF induction, despite the extensive cleavage of mRNAs at this time point and the onset of cell growth inhibition (Figure 2.1B). Even after 60 min, the abundance of the 43 nt fragment indicates that at most ~1% of ribosomes have been truncated (Figure 2.12C). This estimate may be high, as it assumes that the 43 nt fragment arises only from mature ribosomes, a dangerous assumption given MazF’s effects on rRNA maturation (Figure 2.11D). Finally, even if a very small fraction of ribosomes are being cleaved by MazF to eliminate the anti-SD region, our ribosome profiling demonstrates that it does not lead to any substantial reprogramming of translation.
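The arithmetic behind the ~1% upper bound can be made explicit. The sketch below uses only the figures quoted in the Results (~50-100 copies of the 43 nt fragment per cell, corresponding to ~1% of ribosomes) and back-calculates the ribosome pool that these numbers imply. The function names and the `from_mature` parameter are hypothetical, and the value 0.5 used for it is purely illustrative, not a measurement.

```python
def implied_pool(fragments_per_cell, fraction):
    """Ribosome pool size implied by a fragment count and the fraction it represents."""
    return fragments_per_cell / fraction


def truncated_fraction(fragments_per_cell, ribosomes_per_cell, from_mature=1.0):
    """Fraction of ribosomes truncated, given the share of fragments that
    derive from mature ribosomes rather than rRNA precursors."""
    return fragments_per_cell * from_mature / ribosomes_per_cell


# Pool sizes implied by the quoted range if each bound corresponds to ~1%.
low_pool = implied_pool(50, 0.01)
high_pool = implied_pool(100, 0.01)
print(low_pool, high_pool)

# Upper bound: every fragment comes from a mature ribosome.
upper = truncated_fraction(100, high_pool)
# Illustrative only: if half of the fragments came from precursors instead.
halved = truncated_fraction(100, high_pool, from_mature=0.5)
print(upper, halved)
```

Any contribution of precursor-derived fragments lowers the estimate proportionally, which is why the ~1% figure is best read as a ceiling.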
Our results indicate that although MazF does not significantly impact mature rRNA, it does have a major effect on rRNA precursors. One recent study of E. coli MazF had suggested that it may cleave precursor rRNAs (Mets et al., 2017) based on 5'-OH mapping that produced a peak in the 16S precursor and several peaks within structured regions of 16S and 23S rRNA. However, these experiments were conducted after MazF was expressed for 2 hours using the 5'-OH method, so whether these peaks indicate common products or rare but stable products is unclear. The prior study also reported the formation of aberrant, MazF-dependent rRNA peaks on a sucrose gradient. However, such experiments cannot discern whether those rRNA species arise from MazF targeting mature or immature rRNAs. Additionally, a precursor-specific probe did not yield signal in a Northern blot of RNA from cells expressing MazF. Finally, although this prior study did suggest that rRNA precursors are a target of MazF, they also reported the 43 nt fragment identified by Moll and colleagues (Vesper et al., 2011), leaving unresolved how frequently MazF cleaves mature versus immature rRNA and what impact these cleavage events have on ribosome biogenesis.

Our results now conclusively demonstrate that MazF disrupts rRNA maturation. We observed 50-100-fold increases in RNA past the mature 3' ends of both the 16S and 23S rRNA, regions that are normally degraded during rRNA maturation and ribosome biogenesis. In E. coli, as in most bacteria, rRNA transcription produces a long rRNA precursor that adopts a complex secondary structure in which the regions immediately upstream and downstream of the 16S and 23S rRNAs hybridize to form dsRNA stems, with the nascent 16S and 23S rRNAs emanating as loops from these stems (Figure 2.12A). rRNA maturation initiates through RNase III cleavage of these dsRNA stems followed by a series of additional RNase-based processing and RNA modification steps.
The disruption of this maturation process by MazF likely results from both direct cleavage of rRNA transcripts and the cleavage of transcripts encoding ribosomal proteins, which normally bind rRNA to promote its proper maturation. The net result is that MazF almost completely blocks the biogenesis of ribosomes (Figure 2.11D). Our pulse-chase analysis of ribosome biogenesis indicated a rapid and nearly complete loss of 3H-uridine incorporation into 30S, 50S, and 70S ribosomes after inducing MazF. An inability to synthesize new ribosomes may be a key mechanism by which MazF inhibits cell growth (Figure 2.1B). The cleavage of many different cellular mRNAs will also generally slow translation, but our findings suggest that the massive disruption of rRNA maturation and ribosome biogenesis is likely a major facet of the growth inhibition following MazF induction.

Concluding remarks

In sum, our work indicates that E. coli MazF does not create a large pool of leaderless mRNAs, nor does it create specialized ribosomes lacking the anti-SD region that preferentially translate leaderless mRNAs. Instead, our results demonstrate that MazF efficiently blocks ribosome biogenesis, probably through the cleavage of rRNA precursors and ribosomal protein transcripts, which together will contribute significantly to a suppression of cell growth. Disrupting rRNA biogenesis is a well-known and powerful mechanism for blocking translation and cell growth in E. coli. One of the best studied growth regulators in E. coli is guanosine tetraphosphate (ppGpp), which accumulates following amino acid starvation (Potrykus and Cashel, 2008). By binding directly to RNA polymerase, ppGpp shuts down rRNA transcription, thereby slowing ribosome biogenesis, translation, and cell growth.
The inhibition of ribosome synthesis by MazF without degrading or altering mature ribosomes may represent a similar strategy for reversibly blocking cell growth following the onset of stressful conditions, or potentially a mechanism of inducing growth rate heterogeneity in a clonal population. The RNA-Seq-based method developed here enabled a global and quantitative assessment of the RNA cleavages triggered by MazF in E. coli and can now be applied to other ribonucleases. MazF homologs are found in a wide range of bacteria, with some species encoding several paralogs that may have diverse targets or that may target different regions of rRNA precursors. Additionally, there are other families of toxins, including the RelE, HigB, and VapC toxins, that cleave RNAs. The systematic mapping of their cleavage patterns and specificities promises to provide new insight into the biological roles and mechanisms of toxin-antitoxin systems as regulators of bacterial growth and persistence.

Methods

Experimental Model and Subject Details

Growth conditions

Escherichia coli was grown in M9 (10x stock made with 64 g/L Na2HPO4-7H2O, 15 g/L KH2PO4, 2.5 g/L NaCl, 5.0 g/L NH4Cl) medium supplemented with 0.1% casamino acids, 0.4% glycerol, 2 mM MgSO4, and 0.1 mM CaCl2. Glucose at 0.4% was used to prevent leaky expression from the arabinose promoter and 0.2% arabinose was used to induce expression. Cells were grown at 37 °C and 200 rpm in an orbital shaker. Prior to liquid growth, individual colonies were selected by growth overnight on LB (10 g/L NaCl, 10 g/L tryptone, 5 g/L yeast extract) agar plates. Antibiotics were used at the following concentrations (liquid/plates): carbenicillin (50 µg mL-1 / 100 µg mL-1), chloramphenicol (20 µg mL-1 / 30 µg mL-1).

Strain construction

The MazF deletion strain of MG1655 was courtesy of Kenn Gerdes (Maisonneuve et al., 2011). MazF induction, empty vector, and promoter strains were constructed by transformation of plasmids into this strain.
The pKVS45 vector was courtesy of Kristala Prather. For a list of strains and plasmids used in this work see Tables S4 and S5.

Plasmid construction

Modified pBAD30 and pBAD33 plasmids were used for expression of MazF (Guzman et al., 1995). A sequence containing a ribosome binding site (AGGAGGGATT) was added between the EcoRI and SacI sites in the MCS of the pBAD plasmids. MazF was amplified from MG1655 genomic DNA and inserted using the SacI and HindIII sites to yield pBAD30:mazF and pBAD33:mazF. Bacillus subtilis 6S-1 RNA with mutated ACA sites (see Figure 2.5E) was purchased as a gBlock (Integrated DNA Technologies) and was inserted into pKVS45 (modified to include an sc101 origin from pSB4K5) using Gibson assembly. MazF cleavage sites of varying scores were then inserted using site-directed mutagenesis. The YFP* reporter constructs were constructed using Gibson assembly of a YFP* gBlock (Integrated DNA Technologies), primers generating the 5'-UTR, and the pKVS45 plasmid. The YFP* gene itself is a translational fusion of the first 28 amino acids of the rne gene and a codon-optimized YFP gene with MazF cleavage sites removed. For a list of primers used for strain construction, see Table S6.

Experimental Method Details

MazF induction

For MazF induction, E. coli cells were grown with glucose until just prior to induction to prevent loss of toxicity. Single colonies were grown overnight in glucose. Overnight cultures were back-diluted and grown to ~0.35 OD600 in fresh media with 0.4% glucose at 37 °C in an orbital shaker at 200 rpm. Cells were pelleted by centrifugation at 4 °C and 4000 g for 5 minutes. Pellets were washed, centrifuged, and resuspended in fresh medium without glucose. OD600 was normalized across all samples by dilution into fresh medium to an OD600 of ~0.15, and cells were allowed to recover at 37 °C for at least 30 minutes, after which they were induced with 0.2% arabinose at OD600 ~0.2.
RNA extraction times vary and are as noted for each experiment.

RNA extraction

Cells were harvested by mixing 1 mL of cell culture with 110 µL of an ice-cold stop solution (95% ethanol, 5% acid-buffered phenol) and spinning at 13000 rpm for 30 seconds in a bench-top centrifuge. After removing supernatant, cells were flash frozen in liquid nitrogen and stored at -80 °C. Trizol (Invitrogen) was pre-heated to 65 °C and added directly to cell pellets. The mixture was shaken for 10 minutes at 65 °C and 2000 rpm in a thermomixer (Eppendorf) to lyse cells. Lysed cells were frozen at -80 °C for at least 10 minutes, thawed, and centrifuged for 5 minutes at max speed at 4 °C. Trizol supernatant was mixed with 400 µL of ethanol, avoiding the pellet of cell debris. Samples were passed over a Direct-zol miniprep column (Zymo). Columns were pre-washed twice with 400 µL of provided pre-wash buffer, washed once with 700 µL of provided wash buffer, and dried for 2 minutes by centrifuging at 13000 rpm. Samples were eluted in 90 µL DEPC water. RNA was then treated with 2 µL Turbo DNase (Invitrogen) in 100 µL using provided 10x buffer. After incubating for 20 minutes at 37 °C, an additional 2 µL of DNase was added, followed by another 20 minutes at 37 °C. Reaction volume was brought to 200 µL with DEPC water and vortexed with 200 µL of acid-phenol:chloroform IAA, pH 4.5 (Invitrogen). Samples were centrifuged 10 minutes at 4 °C and the top layer was extracted and ethanol precipitated with 20 µL of 3M NaOAc, 2 µL GlycoBlue (Invitrogen), and 600 µL of ice-cold ethanol. Samples were incubated at -80 °C for at least 4 hours and then spun at max speed at 4 °C for 30 minutes. Samples were washed with 500 µL of ice-cold 70% ethanol, re-centrifuged for 5 minutes, air dried, and resuspended in 30 µL of DEPC water. Chemical purity and yield were quantified by NanoDrop spectrophotometer and RNA integrity was verified by running out on a Novex 6% TBE-Urea gel (Invitrogen).
Paired-end library preparation

The library generation protocol was a modified version of the paired-end strand-specific dUTP method using random hexamer priming (Levin et al., 2010). For libraries without rRNA removal, 500 ng of total RNA was used in the fragmentation step, skipping rRNA removal. For libraries with rRNA removal, 2-3 µg of input RNA was used in the rRNA removal step.

rRNA removal: rRNA removal was conducted using the Ribo-Zero rRNA Removal Kit for Bacteria (Illumina). Provided magnetic beads were prepared individually by adding 225 µL of beads to a 1.5 mL tube, left to stand on a magnetic rack for 1 minute, washed twice with 225 µL of water, and resuspended in 65 µL of provided resuspension solution with 1 µL of provided RNase inhibitor. Samples were prepared using provided reagents with 4 µL of reaction buffer, 2-3 µg of total RNA, 10 µL of rRNA removal solution in a total reaction volume of 40 µL. Samples were incubated at 68 °C for 10 minutes and at room temperature for 5 minutes. Samples were added directly to the resuspended magnetic beads, mixed by pipetting, incubated for 5 minutes at room temperature, and then incubated for 5 minutes at 50 °C. After incubation, samples were placed on a magnetic rack and the supernatant was transferred to a new tube, discarding the beads. Samples were ethanol precipitated as above with a 1 hour incubation at -20 °C and resuspended in 9 µL of water.

Fragmentation: RNA fragmentation was conducted using stop solution and fragmentation reagent provided with RNA Fragmentation Reagents (Invitrogen). Samples were mixed with 1 µL of 10x fragmentation reagent in a final volume of 10 µL, incubated at 70 °C for 8 minutes, placed on an ice block, and mixed with 1 µL of stop solution. Reactions were brought to 20 µL with DEPC water and ethanol precipitated using 2 µL of 3M NaOAc, 2 µL of GlycoBlue (Invitrogen), and 60 µL of ice-cold ethanol.
Samples were ethanol precipitated as above with a 1 hour incubation at -20 °C and resuspended in 6 µL of DEPC water.

cDNA synthesis: 1 µL of random primers at 3 µg/µL (Invitrogen) was added to fragmented RNA. Samples were incubated at 65 °C for 5 minutes and placed on ice for at least 1 minute. First strand synthesis was conducted by adding 4 µL of 5x first-strand buffer (Invitrogen), 2 µL of 100 mM DTT, 1 µL of 10 mM dNTPs, 1 µL of SUPERase-In (Invitrogen), and 4 µL of DEPC water. Reactions were incubated at room temperature for 2 minutes and 1 µL of Superscript III (Invitrogen) was then added. Reactions were placed on a thermocycler and incubated for 10 minutes at 25 °C, 1 hour at 50 °C, and 15 minutes at 70 °C. Reactions were brought to 200 µL with 180 µL of DEPC water and vortexed with neutral phenol-chloroform isoamyl alcohol. Layers were separated by centrifugation for 10 minutes at maximum speed at 4 °C. The aqueous top layer was extracted and ethanol precipitated by adding 18.5 µL 3M NaOAc, 2 µL GlycoBlue, and 600 µL of ice-cold ethanol. Samples were incubated for 1 hour at -20 °C, centrifuged at maximum speed for 30 minutes at 4 °C, washed twice with ice-cold 70% ethanol with 5 minute centrifugations, air-dried, and resuspended in 104 µL of DEPC water. Second strand synthesis was conducted by adding 30 µL 5x second strand buffer (Invitrogen), 4 µL 10 mM dNTPs (using dUTP instead of dTTP), 4 µL 5x first strand buffer (Invitrogen), and 2 µL 100 mM DTT. Samples were mixed by pipetting and placed on ice for 5 minutes. Reactions were started by adding 1 µL RNase H (NEB), 1 µL E. coli DNA Ligase (NEB), and 4 µL E. coli DNA Polymerase I (NEB), mixing by pipetting, and incubating at 16 °C for 2.5 hours. Reactions could be frozen at -20 °C at this stage.

End-repair and adapter ligation: Cleanup for subsequent reactions was conducted with Agencourt AMPure XP magnetic beads (Beckman Coulter).
Note that unless otherwise stated, beads were left in the reaction to be reused in future reactions. For each sample, 100 µL of AMPure beads were added to 1.5 mL tubes and placed on the magnetic rack for ~5 minutes. Supernatant was removed and replaced with 450 µL of 20% (w/v) PEG 8000 in 2.5 M NaCl. Second strand synthesis reactions were added directly to resuspended beads and allowed to incubate at room temperature for 10 minutes. Samples were placed on the magnetic rack and left until beads formed a diffuse pellet (about 10 minutes). Beads were washed twice with 80% ethanol, leaving the tubes on the magnetic rack during washes. Residual ethanol was removed and beads were allowed to dry for 5 minutes. Beads were resuspended in 50 µL of elution buffer (Qiagen). End repair was conducted for each sample by adding 10 µL of 10x T4 DNA ligase reaction buffer (NEB), 4 µL 10 mM dNTPs, 25 µL water, 5 µL T4 DNA polymerase (NEB), 1 µL Klenow DNA polymerase (NEB), and 5 µL T4 PNK. Reactions were incubated for 30 minutes at 25 °C. To clean up reactions, 300 µL of PEG-NaCl solution was added to each reaction. After incubation at room temperature for 10 minutes, samples were placed on the magnetic rack until the beads pelleted, about 5 minutes. Beads were washed twice with 80% ethanol, dried for 5 minutes, and resuspended in 32 µL of elution buffer (Qiagen). 3'-adenylation was conducted by adding 5 µL NEB buffer 2 (NEB), 9 µL of water, 1 µL 10 mM dATP, and 3 µL Klenow Fragment (3'→5' exo-) (NEB) and incubating for 30 minutes at 37 °C. To clean up reactions, 150 µL of PEG-NaCl solution was added to each reaction. After incubation at room temperature for 10 minutes, samples were placed on the magnetic rack until the beads pelleted, about 5 minutes. Beads were washed twice with 80% ethanol, dried for 5 minutes, and resuspended in 23 µL of elution buffer (Qiagen).
After 5 minutes of incubation, tubes were returned to the magnetic rack and eluted DNA was moved to a new tube and beads were discarded. To ligate adapters, 1 µL annealed adapter mix, 25 µL 2x quick ligase buffer (NEB), and 1 µL quick DNA ligase (NEB) were added to each sample and incubated at 25 °C for 15 minutes. Annealed adapter mix was made by mixing 25 µL of a 200 µM solution of each paired-end adapter together, heating to 90 °C for 2 minutes, cooling at 2 °C/minute for 30 minutes on a thermocycler, placing on ice, adding 50 µL of water, and storing aliquots at -20 °C. Ligation reactions were cleaned up by adding 75 µL resuspended AMPure beads (made by resuspending 100 µL of AMPure beads in 75 µL of PEG-NaCl solution). After incubation at room temperature for 10 minutes, samples were placed on the magnetic rack until the beads pelleted, about 5 minutes. Beads were washed twice with 80% ethanol, dried for 5 minutes, and resuspended in 23 µL of elution buffer (Qiagen). After 5 minutes of incubation, tubes were returned to the magnetic rack and eluted DNA was moved to a new tube and beads were discarded. The dUTP-containing second strand was digested by adding 6 µL of Phusion 5x high fidelity buffer (NEB) and 1 µL of USER enzyme (NEB) and incubating at 37 °C for 15 minutes, followed by 95 °C for 5 minutes to inactivate the enzyme.

Library amplification: Full PCR reactions were prepared by adding 13.3 µL 3M betaine, 3 µL 10 mM dNTPs, 14 µL 5x high fidelity buffer (NEB), 2 µL 25 µM global primer, 2 µL 25 µM barcoded primer, 34.7 µL water, 1 µL Phusion (NEB) in a final volume of 100 µL. The following thermocycler program was used: 98 °C/30 seconds, 98 °C/10 seconds, 65 °C/30 seconds, 72 °C/30 seconds, 72 °C/5 minutes. Steps 2-4 were repeated for 9-12 cycles. The cycle number was optimized prior to final amplification using 10 µL reactions.
Primers were removed by adding 300 µL resuspended AMPure beads (made by resuspending 100 µL of AMPure beads in 300 µL of PEG-NaCl solution). After incubation at room temperature for 10 minutes, samples were placed in the magnetic rack until the beads were pelleted, about 5 minutes. Beads were washed twice with 80% ethanol, dried for 5 minutes, and resuspended in 20 µL of elution buffer (Qiagen). Elutions were run on an 8% TBE polyacrylamide gel (Invitrogen) for 30 minutes at 180 V. The region from 200 to 350 bp was excised, crushed, soaked in 500 µL 10 mM Tris 8.0, and extracted using a Spin-X 0.22 µm cellulose acetate column (Costar). Samples were precipitated by adding 32 µL 5M NaCl, 2 µL of GlycoBlue (Invitrogen), and 550 µL of ice-cold isopropanol. Samples were incubated at -20 °C for 1 hour, centrifuged at 4 °C at maximum speed for 30 minutes, washed twice with 500 µL of ice-cold 70% ethanol and 5 minutes of re-centrifugation, air dried, and resuspended in 11 µL of water. Paired-end sequencing was performed on an Illumina NextSeq500 at the MIT BioMicroCenter.

5'-OH library preparation

The protocol for generating 5'-OH libraries was a modified version of the MORE RNA-seq protocol for sequencing of 5'-OH ends (Schifano et al., 2014). RNA was extracted from wild type MG1655 cells containing either pBAD33-empty or pBAD33-mazF (ML2902 and ML2903). 5'-P terminated RNA was degraded by adding 2 μL 10x Terminator exonuclease buffer A, 1 μL SUPERase-In (Invitrogen), and 1 μL Terminator exonuclease (Epicentre) to a 20 μL reaction with 1-2 μg of total RNA. Reactions were incubated 1 hour at 30 °C. Reactions were brought to 200 µL of acid-phenol:chloroform, pH 4.5 (Invitrogen). Samples were spun 10 minutes at 4 °C and the top layer was extracted and ethanol precipitated with a 4 hour incubation at -80 °C and resuspended in 34 µL of DEPC water.
5'-OH ends were phosphorylated by adding 5 μL 10x Optikinase buffer (USB), 5 μL 10 mM ATP, 1 μL SUPERase-In (Invitrogen), and 5 μL Optikinase (USB). Reactions were incubated 1 hour at 37 °C. Reactions were phenol extracted and ethanol precipitated as above, resuspending in 14.5 μL of DEPC water. 5' RNA adapters were ligated by adding 7.5 μL of 10 pmol/μL 5' RNA adapter, 3 μL of 10x ligase buffer (NEB), 3 μL of 10 mM ATP, 1 μL SUPERase-In (Invitrogen), and 1 μL of T4 RNA Ligase 1 (NEB). Reactions were incubated for 16 hours at 16 °C. Reactions were then purified on a 6% TBE-Urea (Invitrogen) gel using 2x Novex TBE-Urea sample buffer (Invitrogen). The gel was pre-run for at least 30 minutes, and samples were run 20 minutes at 200 V. The entire region above the free-adapter band was excised from the gel. Gel slices were split into four 0.5 mL tubes, centrifuged through a hole pierced in the bottom, and eluted in 600 μL 1x TE/0.3M NaCl per tube. The gel was filtered using Spin-X 0.22 µm cellulose acetate columns (Costar) and ethanol precipitations were conducted. The multiple precipitations for each sample were mixed after pelleting by resuspending in a total of 21 μL of water per sample. 10 μL from each sample was set aside in case downstream steps failed. First strand synthesis was conducted with a primer with both a random region and adapter region. 1 μL of 30 pmol/μL RT primer and 1 μL of 10 mM dNTPs were added to 10 μL of gel elutions and incubated for 5 minutes at 65 °C. On ice, 4 μL of 5x first strand buffer (Invitrogen), 1 μL of 100 mM DTT, 1 μL SUPERase-In (Invitrogen), and 1 μL of Superscript II were added to each reaction. Reactions were placed on a thermocycler and incubated for 10 minutes at 25 °C, 50 minutes at 42 °C, and 15 minutes at 70 °C. 1 μL of RNase H (NEB) was added and reactions were incubated at 37 °C for 20 minutes. Reactions were then purified on a 6% TBE-Urea gel using the 2x loading buffer as above.
The region from 110-550 nucleotides was excised by comparison with an ssRNA ladder. Each excised region was split into two 0.5 mL tubes, centrifuged through a hole pierced in the bottom, and eluted in 400 μL of 1x TE/0.3M NaCl per tube. The gel was filtered using Spin-X 0.22 µm cellulose acetate columns (Costar) and isopropanol precipitations were conducted. The multiple precipitations for each sample were mixed after pelleting by resuspending in a total of 10 μL of water per sample. Libraries were amplified by adding 18 μL of 5x Phusion HF buffer (NEB), 10 μL of 2.5 μM global 5'-OH primer, 10 μL of 2.5 μM indexed 5'-OH primer, 2 μL of 10 mM dNTP mix, and 1 μL of Phusion polymerase (NEB) to 10 μL of cDNA sample and then running the following amplification protocol: 98 °C/30 seconds, 98 °C/10 seconds, 60 °C/30 seconds, 72 °C/15 seconds, 72 °C/10 minutes. Steps 2-4 were repeated for 12 cycles. The entire PCR reaction was loaded onto a 2% agarose gel and the library smear above the primer bands was extracted using the MinElute kit (Qiagen). Single-end sequencing was performed on an Illumina HiSeq2000 at the MIT BioMicroCenter.

Ribosome profiling library preparation

The ribosome profiling protocol was adapted from Oh et al. (2011). Cell growth outlined in the MazF induction section was scaled up to 250 mL final culture size in 1 L flasks. Cells were harvested by filtration over a 90 mm 0.2 µm filter attached to a vacuum flask pre-heated to 37 °C. After filtration, cells were scraped off the filter and immediately frozen in liquid nitrogen. To lyse cells, 650 µL of lysis buffer was prepared and flash frozen: 20 µM Tris pH 8.0, 100 mM NH4Cl, 10 mM MgCl2, 0.4% Triton X-100, 0.1% NP-40, 1 mM chloramphenicol, 15 µL of Turbo DNase (Invitrogen).
Frozen lysis buffer and cells were added to liquid nitrogen cooled stainless steel grinding jars (Qiagen) and lysed on a TissueLyser II (Qiagen) instrument 5 times at 15 Hz for 3 minutes, re-cooling jars in between. Lysate was thawed and centrifuged at 20,000 rpm at 4 °C on a tabletop centrifuge. A portion of the lysate from this stage was saved and run on sucrose gradients (see below) without MNase treatment to verify that polysomes were present. To separate monosomes, ~0.5 mg of RNA was mixed with 750 U of MNase, 100 U of SUPERase-In (Invitrogen), and 5 mM CaCl2, adding additional lysis buffer to achieve a final volume of 200 μL. Reactions were incubated for 1 hour at 25 °C and then were quenched by adding 2.4 μL of 500 mM EGTA. Monosomes were isolated using a 10-55% sucrose gradient generated on a Gradient Master (BioComp) in a buffer of 20 mM Tris 8.0, 100 mM NH4Cl, 10 mM MgCl2 and 1 mM chloramphenicol. Samples were centrifuged in an SW41 rotor at 35000 rpm for 2.5 hours. Gradients were fractionated and the monosome fraction was collected. To isolate monosome RNA, SDS was added to 1% and acid phenol chloroform IAA (Invitrogen) pre-warmed to 65 °C was added 1:1. Samples were shaken at 1400 rpm at 65 °C on a thermomixer for 5 minutes and then chilled on ice for 5 minutes. The aqueous layer was extracted and mixed with acid phenol chloroform a second time. After the second extraction, samples were isopropanol precipitated and resuspended in 11 μL of 10 mM Tris 7. Then, 20 μg of RNA was loaded onto a Novex 15% TBE-Urea (Invitrogen) gel using 2x TBE-urea sample buffer (Invitrogen) and run for 65 minutes at 200 V. The region between 15-45 bases was excised by comparison with a 10 bp DNA ladder. Each excised region was centrifuged through a hole pierced in the bottom of a 0.5 mL tube, and eluted in 500 μL of 10 mM Tris 7. The gel was filtered using Spin-X 0.22 µm cellulose acetate columns (Costar) and isopropanol precipitations were conducted.
RNA was resuspended in 15 μL of 10 mM Tris 7. To prepare for ligation of the linker, RNA was dephosphorylated by adding 2 μL of T4 PNK buffer, 1 μL of SUPERase-In (Invitrogen), and 2 μL of T4 PNK (NEB) and incubating for 1 hour at 37 °C. The enzyme was heat inactivated by incubating for 10 minutes at 75 °C. Samples were purified by isopropanol precipitation and resuspended in 11 μL of 10 mM Tris 7. Linker was ligated by diluting 30 ng of RNA in 5 μL of 10 mM Tris 7 and adding 10 μL of 50% PEG 8000 (NEB), 2 μL of 10x T4 RNA Ligase 2 buffer (NEB), 1 μL of water, 1 μL of 100 μM linker, and 1 μL of T4 ligase 2, truncated (NEB). Reactions were incubated at 25 °C for 2.5 hours, purified by isopropanol precipitation, and resuspended in 6 μL of 10 mM Tris 7. For size selection, samples were run on a Novex 10% TBE-Urea gel (Invitrogen) for 50 minutes at 200 V using 2x sample buffer (Invitrogen). The region between 35-65 bases was excised, eluted and purified as above. Samples were resuspended in 10 μL of 10 mM Tris 7. Reverse transcription was conducted by adding 1 μL of 10 mM dNTPs, 0.5 μL of 25 μM RT oligo oCJ485, and 1.5 μL of DEPC water. Mixture was denatured at 65 °C for 5 minutes and then placed on ice before adding 4 μL of 5x first strand buffer (Invitrogen), 1 μL of SUPERase-In (Invitrogen), 1 μL of 100 mM DTT, and 1 μL of Superscript III (Invitrogen). Reactions were incubated for 30 minutes at 50 °C and quenched by adding 2.3 μL of 1 M NaOH. RNA was degraded by incubating at 95 °C for 15 minutes. Samples were then mixed with 2x TBE-Urea loading buffer (Invitrogen) and run (2 lanes per sample) on a Novex 10% TBE-Urea gel at 200V for 80 minutes. The cDNA region excluding the free RT primer was excised, eluted and purified as above except that 500 μL of 10 mM Tris 8 was used for elution rather than Tris 7. The sample was resuspended in 15 μL of 10 mM Tris 8. 
The cDNA was circularized by adding 1 μL of 1 mM ATP, 2 μL of 10x CircLigase buffer, 1 μL of 50 mM MnCl2, and 1 μL of CircLigase (Epicentre). The reaction was incubated at 60 °C for 1 hour, after which an additional 1 μL of CircLigase was added before incubating for another hour. The enzyme was deactivated by incubating the reaction for 10 minutes at 80 °C. A 5 μL aliquot of circularized cDNA was used for rRNA subtraction. A subtraction oligo mix was prepared by mixing 77 μL of o1055, 4 μL of o1056, 17 μL of o1057, and 2 μL of o1058 using 100 μM stocks. Next, 5 μL of cDNA was mixed with 1 μL of subtraction oligo mix, 1 μL of 20x SSC (Invitrogen), and 3 μL of DEPC water. Using a thermocycler, reactions were incubated at 98 °C for 75 seconds. Then the temperature was linearly decreased from 98 °C to 37 °C over 1 hour and finally the temperature was held at 37 °C for 20 minutes for hybridization. In parallel, 25 μL of MyOne streptavidin C1 Dynabeads (Invitrogen) were prepared by washing 3 times with 1x B&W buffer and resuspending in 10 μL of 2x B&W buffer; 2x B&W buffer was 10 mM Tris 7.5, 1 mM EDTA, 2 M NaCl, and 0.01% Tween. Beads were heated to 37 °C and mixed 1:1 with the hybridization reaction and incubated at 37 °C for 15 minutes. The supernatant was recovered using a magnetic rack and was isopropanol precipitated and resuspended in 10 μL of 10 mM Tris 8. Libraries were PCR amplified in a reaction with 1x high fidelity buffer (NEB), 200 μM dNTPs, 500 nM o231, 500 nM indexing primer, 0.6 μL of Phusion (NEB) in a 60 μL final reaction volume. The following amplification protocol was used: 98 °C/30 seconds, 98 °C/10 seconds, 60 °C/10 seconds, 72 °C/5 seconds. Steps 2-4 were repeated for 12 cycles. PCR reactions were purified on a Novex 8% TBE (Invitrogen) gel run at 180 V for 50 minutes. The library region was excised, eluted and purified as above using 500 μL of 10 mM Tris 8.
Samples were isopropanol precipitated, resuspended in 11 μL of 10 mM Tris 8, and submitted for sequencing. Single-end sequencing was performed on an Illumina NextSeq500 at the MIT BioMicroCenter.

qRT-PCR acquisition and analysis

Reverse transcriptase reactions were conducted by mixing 100 ng Random Primers (Invitrogen), 250 ng total RNA, and 10 nmol dNTPs in 13 µL nuclease-free water. Reactions were incubated at 65 °C for 5 minutes and placed on ice. After cooling, reactions were mixed with first strand buffer (Invitrogen) at 1X, 100 nmol DTT, and 20 U SUPERase-In (Invitrogen). After allowing the reactions to come to room temperature, 200 U of SuperScript III (Invitrogen) was added to yield a final reaction volume of 20 µL. Reactions were incubated with the following thermocycler program: 25 °C/5 minutes, 50 °C/1 hour, 70 °C/15 minutes. 5 U of RNase H (NEB) was then added and reactions were incubated for 20 minutes at 37 °C. qPCR reactions were prepared with 2x SYBR FAST Master Mix (Roche) and 300 nM of each qPCR primer in a 10 µL final volume. cDNA from reverse transcription reactions was diluted with nuclease-free water. All experimental samples and standard curves were loaded onto a 384-well plate in triplicate for qPCR. qPCR was conducted in a LightCycler 480 system (Roche) using the following thermocycler program: 95 °C/10 minutes, 95 °C/15 seconds, 60 °C/30 seconds, 72 °C/30 seconds with 40 cycles of steps 2-4. Cp values were calculated using the LightCycler 480 software at the second derivative maximum. Technical replicates were averaged to yield a final Cp value for each sample and standard curve point. On each plate, relative quantities of cDNA in a given sample were calculated by comparison to a least-squares fit on a 2-fold dilution standard curve (Cp vs. log-transformed standard fold dilution). Target region quantities were normalized to control region quantities and normalized change from + MazF to empty vector was calculated.
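The standard-curve quantification described above can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code used for this work: the function names are hypothetical, and the choice to express the curve against log2 of the relative standard concentration (0 for undiluted, -1 for a 2-fold dilution, and so on) is an assumption consistent with, but not specified by, the description.

```python
def fit_standard_curve(log2_dilutions, cp_values):
    """Least-squares line Cp = slope * log2(relative concentration) + intercept."""
    n = len(cp_values)
    mean_x = sum(log2_dilutions) / n
    mean_y = sum(cp_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log2_dilutions)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log2_dilutions, cp_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def relative_quantity(cp, slope, intercept):
    """Invert the fitted line: quantity relative to the undiluted standard."""
    return 2 ** ((cp - intercept) / slope)


def normalized_change(target_mazf, control_mazf, target_empty, control_empty):
    """(target/control) in + MazF divided by (target/control) in empty vector.

    All four arguments are relative quantities, not Cp values.
    """
    return (target_mazf / control_mazf) / (target_empty / control_empty)


# Hypothetical, idealized standard curve: Cp rises by 1 per 2-fold dilution.
slope, intercept = fit_standard_curve([0, -1, -2, -3], [20.0, 21.0, 22.0, 23.0])
quantity = relative_quantity(22.0, slope, intercept)
```

Fitting Cp against the log-transformed dilution, rather than assuming perfect doubling per cycle, lets the primer efficiency measured on each plate set the conversion from Cp to relative quantity.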
Error across three biological replicates was propagated and reported in the final value. For a complete list of control and target region primers, see Table S5.

6S RNA reporter

To use the B. subtilis 6S RNA reporter, + MazF and empty vector plasmid containing strains (ML2883 and ML2884) were transformed with reporter expression plasmids (purified from ML2885-ML2891). Cells were then grown as in the ‘MazF induction’ section with the addition of 100 ng/mL anhydrotetracycline in the wash and final MazF induction media to induce expression of the reporter construct. qRT-PCR was performed as described above using the 5' end containing the MazF site as the target region and the 3' end of the reporter as the control region. RNA was extracted 10 minutes after induction of MazF.

YFP* translation reporter construct

To use the YFP* translation reporter, + MazF and empty vector plasmid containing strains (ML2902 and ML2903) were transformed with reporter YFP* expression systems with or without ACA sites between the Shine-Dalgarno site and the start codon (purified from ML2902 and ML2903). Cultures were grown as in the ‘MazF induction’ section, but 100 ng/mL of anhydrotetracycline was added at the same time as arabinose induction of MazF or the empty vector. Cultures were plated in triplicate on a Synergy H1 plate reader (BioTek) at 37 °C with orbital shaking at 237 rpm. OD600 and YFP fluorescence were recorded every 5 minutes for 2 hours and triplicate samples were averaged. YFP signal was normalized to blanked OD600 and the ratio of the + MazF to empty vector sample was calculated. In Figure 2.9C, we report the average and S.D. of 3 biological replicates.

Northern blotting

For Northern blots, induction and RNA extraction were conducted as above. For all blots except Figure 2.11B probe 3, 200-1000 ng of total RNA and low range ssRNA ladder were loaded onto Novex 6% TBE-Urea gels (Invitrogen) using 2x sample buffer (Invitrogen) and run for 40-60 minutes at 100 V.
Ladder positions relative to rRNA bands were recorded using SYBR Gold (Invitrogen). RNA was transferred onto Amersham Hybond-N+ nylon membrane (GE) using a Trans-Blot Turbo semi-dry transfer apparatus (BioRad) for 90 minutes at 0.4 A. For Figure 2.11B probe 3, 800 ng of RNA was loaded onto a denaturing 1% agarose gel with formaldehyde load dye (Invitrogen) and run for 3 hours at 100 V. ssRNA ladder (NEB) was used for size comparison. RNA was transferred onto Amersham Hybond-N+ nylon membrane (GE) using capillary transfer overnight. RNA was crosslinked to the membrane using the autocrosslink setting on a UV Stratalinker 1800 (Stratagene). Oligonucleotide probes were radiolabeled in a 25 μL reaction by mixing 1 μL of 10 μM oligo, 2.5 μL of 10x T4 PNK buffer (NEB), 7.5 μL of [γ-32P] ATP (PerkinElmer), and 1 μL of T4 PNK (NEB). Reactions were incubated at 37 °C for 30 minutes and the enzyme was inactivated by incubating at 65 °C for 20 minutes. Free ATP was removed using NucAway spin columns (Invitrogen). Membranes were pre-hybridized by adding 10 mL of pre-heated ULTRAhyb-Oligo (Invitrogen) to the membrane and incubating at 42 °C for 30 minutes with rotation in a hybridization oven. 5-20 μL of radiolabeled probe was added and hybridization was allowed to proceed overnight. After hybridization, membranes were washed twice with 2x SSC (Invitrogen) / 0.5% SDS, sealed in plastic bags and incubated at room temperature with a storage phosphor screen for 4-16 hours. Images were recorded with a Typhoon FLA 9500 (GE) instrument. ImageJ was used to crop images and lower the upper limit of the display range to make bands visible for figures.

Quantification of 43 nt fragment and rRNA cleavage estimate

Northern blots were run as above. The concentration of synthetic 43 nt marker (IDT) and total RNA loaded were measured by Qubit hsRNA assay (Invitrogen) before loading on gel.
After imaging, ImageJ was used to quantify signal from the marker standard curve and the total RNA samples. Amount of RNA at the 43 nt band was quantified by linear interpolation from the standard curve. We estimated the fraction of cleaved ribosomes by first calculating the fraction of the ribosome which would be cleaved off (~0.01). Then, we determined the expected ng of 43 nt fragment we would observe if 100% of ribosomes were cleaved using the amount of total RNA we loaded in each lane, assuming that 85% of total RNA is rRNA. Finally, we compared our observed 43 nt fragment in each lane to the expected value to estimate the percentage of ribosomes that may be cleaved. The molecules of 43 nt fragment per cell calculation is based on an estimate of ~10,000 ribosomes per cell.

Isotopic labeling of mature and nascent rRNA

Cells were grown as above in 40 mL final culture volume with a pulse of 5 μCi of [5, 6-3H] uridine (PerkinElmer) and 1000-fold excess chase of cold uridine at times indicated. Cells were harvested by centrifugation at 10,000 g for 1 minute at 4 °C. Cell pellets were placed on ice and resuspended in 300 μL of lysis buffer: 20 mM Tris (pH 8), 100 mM NH4Cl, 10 mM MgCl2, 0.5 mM EDTA, and 6 mM β-mercaptoethanol with 1 μL of Ready-Lyse (Epicentre), 5 μL of SUPERase-In (Invitrogen), and 2 μL of TURBO DNase (Invitrogen) added directly to each resuspension. Lysis reactions were incubated on ice for 5 minutes. Reactions were then incubated at -80 °C for 10 minutes and incubated at 4 °C for 30 minutes. This freeze-thaw cycle was repeated for a total of 3 freeze-thaws to lyse the cells. Cell debris was then removed by centrifugation for 20 minutes at 4 °C at maximum speed on a tabletop centrifuge. The supernatant was loaded onto a 5-30% linear sucrose gradient generated on a Gradient Master (BioComp) instrument in a buffer of 20 mM Tris (pH 8.0), 100 mM NH4Cl, and 10 mM MgCl2. Samples were centrifuged in an SW41 rotor at 35,000 rpm for 4 hours.
Gradients were fractionated by poking a hole in the bottom of the centrifuge tube and collecting 40-50 ~200 μL fractions in a 96-well plate. A portion of each fraction was back-diluted in water and A254 was measured. 100 μL of each fraction was added to 4 mL of Ecoscint H (National Diagnostics) and 3H cpm was measured on a TRI-CARB 4910 TR liquid scintillation counter (PerkinElmer).

Data Analysis Details

Sequencing read mapping and normalization

FASTQ files for each barcode were mapped to the MG1655 genome (NC_000913.2) using bowtie2 (version 2.1.0) with the following arguments: -D 20 -R 3 -N 0 -L 20 -i S,1,0.50 -p 6 -I 40 -X 300 (Langmead and Salzberg, 2012). The samtools (version 0.1.19) suite (Li et al., 2009) was used via the pysam library (version 0.9.1.4) for interconversion of BAM and SAM file formats and for indexing. Adapter sequences were trimmed from ribosome profiling reads. Gene names, coding region positions, and gene ontology categories were extracted from ecocyc.org.

Paired-end sequencing, coding regions: For each uniquely mapped paired fragment, one count was added for all positions between and including the 5' and 3' ends of the forward and reverse strand reads. To correct for variability in sequencing depth, counts at each position were divided by a sample size factor. Briefly, counts recorded in each coding region were summed for all samples and then the geometric mean was taken across samples to yield a reference sample. The size factor for a given sample was the median counts in coding regions after normalizing counts to the reference sample. Except in figures where replicates were compared, reported counts were the average of the log2 of two replicates after adding a pseudocount to all positions and normalizing to the sample size factor. The cleavage ratio at each nucleotide was then calculated as the log2 transformed + MazF:empty vector ratio.
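The size-factor normalization and cleavage ratio described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the analyses in this chapter: the function names, the pseudocount value of 1, and the decision to skip coding regions with zero counts in any sample (so that the geometric mean is defined) are all choices made for the example.

```python
import math
from statistics import median


def size_factors(gene_counts):
    """Median-of-ratios size factor for each sample.

    gene_counts: one list per sample, giving summed counts per coding region.
    """
    n_samples = len(gene_counts)
    n_genes = len(gene_counts[0])
    # Assumption: only genes with nonzero counts in every sample are used.
    keep = [g for g in range(n_genes) if all(s[g] > 0 for s in gene_counts)]
    # Reference sample: geometric mean of each gene across samples.
    ref = {g: math.exp(sum(math.log(s[g]) for s in gene_counts) / n_samples)
           for g in keep}
    # Size factor: median of the per-gene ratios to the reference sample.
    return [median(s[g] / ref[g] for g in keep) for s in gene_counts]


def cleavage_ratio(toxin_reps, empty_reps, toxin_sf, empty_sf, pseudocount=1.0):
    """Per-position log2(+ toxin / empty vector), averaged over replicates.

    toxin_reps, empty_reps: one list of per-position counts per replicate.
    toxin_sf, empty_sf: size factor for each replicate.
    pseudocount: hypothetical value; the text does not state the one used.
    """
    def mean_log2(reps, sfs):
        n_pos = len(reps[0])
        return [
            sum(math.log2((rep[i] + pseudocount) / sf)
                for rep, sf in zip(reps, sfs)) / len(reps)
            for i in range(n_pos)
        ]

    tox = mean_log2(toxin_reps, toxin_sf)
    emp = mean_log2(empty_reps, empty_sf)
    return [t - e for t, e in zip(tox, emp)]
```

In this sketch a sample sequenced at twice the depth of another receives twice the size factor, and a position whose normalized coverage falls 4-fold after toxin expression receives a cleavage ratio of -2.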
Paired-end sequencing, rRNA loci: For all paired fragments, one count was added for all positions between and including the 5' and 3' ends of the forward and reverse strand reads. To correct for variability in sequencing depth, counts at each position were normalized by calculating counts per million counts. Prior to normalization, a pseudocount was added at each position. To determine the number of counts mapping across conserved regions of the different rRNA loci, we made alignments of the 16S and 23S genes including surrounding immature rRNA regions using Clustal Omega. Using the alignment, we made a consensus map of the mature regions of the 16S and 23S genes. For each position in the consensus map, the counts occurring at the corresponding positions in the genome were summed. Consensus positions which did not exist in all loci (i.e. insertions) were left undefined and appear as blank regions on plots. Consensus positions with mismatches at one or more loci were allowed. The summed counts for each sample were log2 transformed and two replicates were averaged before plotting.

5'-OH sequencing: For all reads, a single count was added at the position corresponding to the 5' end. To correct for variability in sequencing depth, counts at each position were normalized by calculating counts per million reads uniquely mapped. Prior to normalization, a pseudocount was added at each position. The 5'-OH ratio was the log2 transformed + MazF:empty vector ratio.

Ribosome profiling: For uniquely mapped reads, a single count was added at the position corresponding to the 5' or 3' end of the read, depending on the analysis. To correct for variability in sequencing depth, reads at each position were normalized by calculating reads per million reads uniquely mapped. For visualization of ribosome position on gene plots, a rolling 30 nucleotide window was used to sum read counts from the 3' ends of reads, adding a single pseudocount to enable log2 transformation.
Gaussian smoothing (σ=20 nucleotides, filter truncation at 4σ) was also conducted on the rolling sum to enable easy visual comparison of + MazF and empty vector samples.

Comparison of qRT-PCR data to sequencing data

To compare RNA-Seq and qRT-PCR data, we compared the RNA abundance measured by qRT-PCR to the set of RNA-Seq values across the entire region assayed by qRT-PCR. In Figure 2.1D-E, the minimum cleavage ratio value in the region was compared to the measured RNA abundance. The four dots correspond to cleavage ratios calculated using all possible pairings of two + MazF replicates and two empty vector replicates. In Figure 2.1F, the minimum cleavage ratio was calculated by averaging all of the replicate cleavage ratios. In Figure 2.4C, the sum of 5'-OH values at ACA sites within the amplified region was compared to the measured RNA abundance.

Assessing the reproducibility of the cleavage ratio

Most analyses with the cleavage ratio involved taking the minimum cleavage ratio in a gene or region. Thus, if any position in this region had few reads, it would reduce the certainty in measurements of the entire region. To simulate measurements of this type, we split coding regions into non-overlapping windows of 100 nucleotides. Windows were generated by combining all contiguous coding regions and stepping the maximum number of 100 nucleotide windows in a left to right direction across each contiguous region. To assess the reproducibility of the cleavage ratio at 5 minutes (Figure 2.2B), any regions which had, at any nucleotide position, fewer counts in the empty vector sample than the expression cutoff of 64 counts were discarded, leaving 15,926 regions. The minimum cleavage ratios in these regions were compared. This protocol was also used to compare 5 minute, 30 minute, and 60 minute cleavage ratios in Figure 2.2C, leaving 1,868 regions. Note that these cleavage ratios were calculated using 30 minute empty vector data for all samples in Figure 2.2C.
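The windowing and filtering used for the reproducibility analysis can be sketched as below. This is an illustrative outline only: the function name and the representation of contiguous coding sequence as half-open (start, end) intervals are assumptions, and the regions are assumed to have already been merged where coding regions abut.

```python
def window_minima(cleavage_ratio, empty_counts, regions, width=100, cutoff=64):
    """Minimum cleavage ratio in each well-expressed, non-overlapping window.

    cleavage_ratio, empty_counts: per-nucleotide sequences for one strand.
    regions: (start, end) half-open intervals of contiguous coding sequence
             (hypothetical input format).
    Returns a list of (window_start, minimum_cleavage_ratio).
    """
    minima = []
    for start, end in regions:
        # Step the maximum number of full windows left to right;
        # any remainder shorter than `width` is dropped.
        n_windows = (end - start) // width
        for w in range(n_windows):
            lo = start + w * width
            hi = lo + width
            # Discard the window if any position falls below the cutoff
            # in the empty vector sample.
            if min(empty_counts[lo:hi]) < cutoff:
                continue
            minima.append((lo, min(cleavage_ratio[lo:hi])))
    return minima
```

Running this on two replicates (or two time points) and pairing the results by window start would give the values compared in a reproducibility plot.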
Identifying cleaved regions and the MazF motif

To determine additional sequence specificity beyond ACA, we set out to find a set of ACAs that were very likely to be MazF targets to compare against the general set of ACAs. To do this, we looked for deep and narrow valleys in the cleavage ratio that had single ACA sites in them, as these were likely to be the recognition site for MazF. To find local minima, we conducted Gaussian smoothing of the cleavage ratio using σ = 40 nucleotides, truncating the filter at 4σ. Next, we found local minima on the smoothed cleavage ratio and searched the region ±25 nucleotides from each identified local minimum for the minimum value on the unsmoothed cleavage ratio. To ensure that the valley was relatively narrow, we next looked 100 nucleotides upstream and downstream and found the first position on both sides that was ≥1 larger in the cleavage ratio, equivalent to a 2-fold increase in relative RNA abundance; these left and right nucleotides defined the cleaved region. If the cleavage ratio did not increase by 1 within 100 nucleotides on either side, the region was removed from consideration. To ensure that all cleaved regions were transcribed and of high certainty, any region that had any position that was not in a coding region or was below the expression cutoff of 64 counts was removed. The above algorithm was also used to define regions to test the frequency of tri-nucleotide motifs, with the exception that 50 nucleotides, rather than 100, was the maximum distance allowed for an increase of 1 in the cleavage ratio. To determine the motif, we recorded the nucleotides ±3 from ACAs that were the only ACA in the cleaved region. The surrounding nucleotides were also recorded for all ACA sites occurring in coding regions.
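The valley-finding steps described above (smoothing, local minima, refinement on the unsmoothed signal, and the width criterion) can be sketched as follows. This is a simplified illustration rather than the original implementation: the function names are hypothetical, the handling of sequence edges during smoothing (renormalizing the truncated kernel) is an assumption, and the coding-region, expression, and single-ACA filters are omitted.

```python
import math


def gaussian_smooth(values, sigma=40, truncate=4):
    """Gaussian filter truncated at truncate * sigma.

    Assumption: near the ends the kernel is renormalized over the
    positions that exist, rather than padding the signal.
    """
    radius = int(truncate * sigma)
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    n = len(values)
    smoothed = []
    for i in range(n):
        total = weight = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < n:
                total += w * values[j]
                weight += w
        smoothed.append(total / weight)
    return smoothed


def find_valleys(ratio, sigma=40, search=25, max_width=100, min_rise=1.0):
    """Return (left, minimum, right) positions of narrow valleys."""
    smoothed = gaussian_smooth(ratio, sigma)
    n = len(ratio)
    valleys = []
    for i in range(1, n - 1):
        # Local minimum on the smoothed cleavage ratio.
        if not (smoothed[i] < smoothed[i - 1] and smoothed[i] <= smoothed[i + 1]):
            continue
        # Refine: lowest unsmoothed value within +/- `search` nucleotides.
        lo, hi = max(0, i - search), min(n, i + search + 1)
        m = min(range(lo, hi), key=lambda j: ratio[j])
        # First position on each side that is >= min_rise above the minimum.
        left = next((j for j in range(m - 1, max(-1, m - max_width - 1), -1)
                     if ratio[j] - ratio[m] >= min_rise), None)
        right = next((j for j in range(m + 1, min(n, m + max_width + 1))
                      if ratio[j] - ratio[m] >= min_rise), None)
        # Keep only valleys that recover on both sides within max_width.
        if left is not None and right is not None:
            valleys.append((left, m, right))
    return valleys
```

Setting `max_width=50` would correspond to the variant used for the tri-nucleotide motif frequency test.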
The frequency of nucleotides at cleaved ACAs and background (all coding) ACAs were compared to generate a sequence logo (Figure 2.5C) and position weight matrix (Figure 2.5D) for the three upstream/downstream positions relative to the ACA.

Verification of the motif

To determine if high scoring motifs were enriched in cleaved regions (Figure 2.5G), we used the same algorithm and settings to find regions as we used to find motifs, with the exception that we did not filter the regions based on how many ACA sites they had. To ensure that the motif does not change significantly with different algorithm settings, we varied the minima search distance (10, 25, 50, or 100), the minimum change in cleavage ratio (0.75, 1, 1.25, or 1.5), and the maximum region half-width (50, 100, 150, or 200), resulting in 64 different motifs (Figure 2.6B). The preferences for particular nucleotides were stable and all calculated position weight matrices correlated well (mean R² = 0.92, minimum R² = 0.74). The motif finding algorithm used above relies on the formation of stable MazF cleavage products (the edges of the ‘valley’). To ensure our motif was not strongly affected by this, we verified that high scoring MazF sites were enriched in cleaved regions regardless of the stability of the fragments generated by MazF cleavage. To do this, we classified as cleaved a region of any size with a cleavage ratio value ≤ -1 after median-normalizing the cleavage ratio across each coding region. By this metric, 7.5% of all positions were cleaved. Then, for each score class of motif and all 3-nt motifs apart from ACA, we recorded the fraction of these sites which occurred in cleaved or uncleaved regions of the genome (Figure 2.6C). Cleaved regions accounted for 31.5, 16.5 and 9.4% of high (score ≥ 1), medium (1 > score ≥ -1.5) and low (-1.5 > score) scoring sites, respectively.
The 3-nt motifs had a mean of 7.5% occurring in the background region, the same as the percent of positions classified as cleaved in this analysis. Based on this analysis, we conclude that higher scoring motifs are more enriched in cleaved regions regardless of the stability of the degradation products.

Identification of leaderless transcripts

To determine if genes were leaderless and/or cleaved in their coding region, we only considered genes that met our expression threshold both across their entire coding region and in the 15 nucleotides just upstream of the translation start site (n=941). The amount of possible leaderless transcript was estimated by comparing each of the 15 positions upstream of the translation start site to the position 15 nucleotides ahead (for a total of 15 comparisons). We define the maximum of these 15 values as an upper-bound estimate on the amount of leaderless transcript relative to unprocessed transcript.

Observation of ribosome footprints near MazF motifs

To generate single-nucleotide-level plots of the change in 3' ribosome footprints near MazF motifs, we collected the regions surrounding a category of ACA sites for which the ±100 nucleotides were all in coding regions. Random regions were also selected with the same requirement. Then, for each nucleotide position within all regions, the log2 change in footprints was calculated after expression of MazF for 5 minutes. Finally, the average was calculated across the regions to generate the plotted data. Any nucleotide positions which were undefined due to having no footprints in either the empty vector or MazF treatment were ignored.

Calculation of changes in 3' ribosome footprints

The number of 3' ribosome footprints in a given gene was calculated by summing ribosome footprints in the nucleotides following the last MazF motif (score ≥ -1.5). This avoided inflation of ribosome counts in MazF-cleaved samples by ribosome ‘traffic-jams’ upstream of cleaved sites.
To be included in this analysis, genes had to be expressed (as defined above) in their leader region and across their entire length (n=941) and also had to have at least 30 nucleotides following the last MazF motif (n=766). If there was no motif, footprints in the entire coding region were summed. These values were then used to calculate the change in ribosome footprints after MazF induction (Figure 2.9F, Table S3). Though not included in the above analyses, in genes where there were fewer than 30 nucleotides after the last MazF motif, the summed footprints in the last 50 nucleotides were included in Table S3 for reference.

Data and Software Availability

Sequencing data

Processed data used in analyses are available in Table S1 (minimum cleavage ratios in expressed genes), Table S2 (identification of leaderless and full-length, leaderless genes), and Table S3 (changes in ribosome footprint counts at the 5' and 3' ends of genes). The raw sequencing files and nucleotide-resolution counts and cleavage ratios have been deposited in the GEO database under the ID code GSE107330.

Northern blot data

Raw images of all Northern blots are provided on Mendeley Data at: doi:10.17632/msk9pcd3mm.1

Tables

Key resources table and supplemental tables can be downloaded from the online version of this document.

Acknowledgements

We thank M. LeRoux, M. Guo, and J. Davis for comments on the manuscript. M.T.L. is an Investigator of the Howard Hughes Medical Institute (HHMI). This work was also supported by an NSF predoctoral graduate research fellowship to P.H.C.

Author Contributions

P.H.C. performed all experiments. P.H.C. and M.T.L. designed the experiments, performed analyses, and wrote the paper.

Declaration of Interests

The authors declare no competing interests.

References

Balaban, N.Q., Merrin, J., Chait, R., Kowalik, L., and Leibler, S. (2004). Bacterial persistence as a phenotypic switch. Science 305, 1622–1625.
Guzman, L.M., Belin, D., Carson, M.J., and Beckwith, J. (1995).
Tight regulation, modulation, and high-level expression by vectors containing the arabinose PBAD promoter. J. Bacteriol. 177, 4121–4130.
Helaine, S., Cheverton, A.M., Watson, K.G., Faure, L.M., Matthews, S.A., and Holden, D.W. (2014). Internalization of Salmonella by Macrophages Induces Formation of Nonreplicating Persisters. Science 343, 204–208.
Hwang, J.Y., and Buskirk, A.R. (2017). A ribosome profiling study of mRNA cleavage by the endonuclease RelE. Nucleic Acids Res. 45, D327–D336.
Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359.
Levin, J.Z., Yassour, M., Adiconis, X., Nusbaum, C., Thompson, D.A., Friedman, N., Gnirke, A., and Regev, A. (2010). Comprehensive comparative analysis of strand-specific RNA sequencing methods. Nat. Methods 7, 709–715.
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079.
Maisonneuve, E., Shakespeare, L.J., Jorgensen, M.G., and Gerdes, K. (2011). Bacterial persistence by RNA endonucleases. Proc. Natl. Acad. Sci. 108, 13206–13211.
Mets, T., Lippus, M., Schryer, D., Liiv, A., Kasari, V., Paier, A., Maiväli, Ü., Remme, J., Tenson, T., and Kaldalu, N. (2017). Toxins MazF and MqsR cleave Escherichia coli rRNA precursors at multiple sites. RNA Biol. 14, 124–135.
Oh, E., Becker, A.H., Sandikci, A., Huber, D., Chaba, R., Gloge, F., Nichols, R.J., Typas, A., Gross, C.A., Kramer, G., et al. (2011). Selective Ribosome Profiling Reveals the Cotranslational Chaperone Action of Trigger Factor In Vivo. Cell 147, 1295–1308.
Pedersen, K., Zavialov, A. V., Pavlov, M.Y., Elf, J., Gerdes, K., and Ehrenberg, M. (2003). The Bacterial Toxin RelE Displays Codon-Specific Cleavage of mRNAs in the Ribosomal A Site. Cell 112, 131–140.
Potrykus, K., and Cashel, M. (2008). (p)ppGpp: still magical? Annu. Rev. Microbiol. 62, 35–51.
Sauert, M., Wolfinger, M.T., Vesper, O., Müller, C., Byrgazov, K., and Moll, I. (2016). The MazF-regulon: A toolbox for the post-transcriptional stress response in Escherichia coli. Nucleic Acids Res. 44, 6660–6675.
Schifano, J.M., Vvedenskaya, I.O., Knoblauch, J.G., Ouyang, M., Nickels, B.E., and Woychik, N.A. (2014). An RNA-seq method for defining endoribonuclease cleavage specificity identifies dual rRNA substrates for toxin MazF-mt3. Nat. Commun. 5, 3538.
Schifano, J.M., Cruz, J.W., Vvedenskaya, I.O., Edifor, R., Ouyang, M., Husson, R.N., Nickels, B.E., and Woychik, N.A. (2016). tRNA is a new target for cleavage by a MazF toxin. Nucleic Acids Res. 44, 1256–1270.
Siibak, T., Peil, L., Dönhöfer, A., Tats, A., Remm, M., Wilson, D.N., Tenson, T., and Remme, J. (2011). Antibiotic-induced ribosomal assembly defects result from changes in the synthesis of ribosomal proteins. Mol. Microbiol. 80, 54–67.
Vesper, O., Amitai, S., Belitsky, M., Byrgazov, K., Kaberdina, A.C., Engelberg-Kulka, H., and Moll, I. (2011). Selective Translation of Leaderless mRNAs by Specialized Ribosomes Generated by MazF in Escherichia coli. Cell 147, 147–157.
Yamaguchi, Y., Park, J.-H., and Inouye, M. (2011). Toxin-Antitoxin Systems in Bacteria and Archaea. Annu. Rev. Genet. 45, 61–79.
Zhang, Y., Zhang, J., Hoeflich, K.P., Ikura, M., Qing, G., and Inouye, M. (2003). MazF cleaves cellular mRNAs specifically at ACA to block protein synthesis in Escherichia coli. Mol. Cell 12, 913–923.
Zhang, Y., Zhang, J., Hara, H., Kato, I., and Inouye, M. (2005). Insights into the mRNA Cleavage Mechanism by MazF, an mRNA Interferase. J. Biol. Chem. 280, 3143–3150.
Zorzini, V., Mernik, A., Lah, J., Sterckx, Y.G.J., De Jonge, N., Garcia-Pino, A., De Greve, H., Verse, W., and Loris, R. (2016). Substrate recognition and activity regulation of the Escherichia coli mRNA endonuclease MazF. J. Biol. Chem. 291, 10950–10960.

Chapter 3 Specificity and Growth Inhibition in E.
coli’s Endoribonuclease Toxins

This work is under preparation for submission as Culviner, P.H. and Laub, M.T.

Summary

Toxin-antitoxin systems are widely distributed genetic modules that can reversibly inhibit the host bacteria’s growth. Many systems encode toxins that are endoribonucleases. In a previous study, we found that the toxin MazF rapidly inhibits ribosome biogenesis through a combination of targeting both ribosomal protein transcripts and rRNA precursors. However, E. coli encodes 10 additional endoribonuclease TA systems. Notably, 6 of these require active translation of their target RNA and thus cannot directly cleave rRNA. Here, using an RNA-Seq-based approach we developed for MazF, we identify the cleavage targets of 8 of these toxins. We show that toxins cleave a significant proportion of E. coli transcripts with limited nucleotide sequence specificity. As with MazF, we find no evidence of degradation or targeting of mature ribosomes. Instead, we observed that each toxin inhibits rRNA processing and ribosome biogenesis. We propose that toxin cleavage of ribosomal protein transcripts disrupts the proper protein stoichiometry of nascent ribosomes, leading to the formation of a variety of stalled ribosome precursors.

Introduction

Toxin-antitoxin systems (TA systems) are genetic modules distributed across bacteria and archaea that are capable of regulating the growth of their host cell. Though they were first characterized as plasmid maintenance systems, many bacterial chromosomes also encode a diverse set of TA systems from a number of evolutionary families with different methods of inhibiting their host cell’s growth (Pandey and Gerdes, 2005). Type II TA systems are composed of a co-operonic toxin protein that inhibits growth and an antitoxin protein that binds the toxin, preventing its activity.
Under still poorly-characterized conditions, the antitoxins, which are relatively unstable proteins, may be degraded or liberated from their cognate toxins, thereby freeing the toxins to inhibit growth (Harms et al., 2018). This growth inhibition has been argued to provide multiple benefits to cells. Some TA systems appear to be activated during phage infection, with toxin activation resulting in abortive infection, an altruistic suicide that prevents the phage from hijacking the cellular machinery to propagate itself (Blower et al., 2011; Koga et al., 2011). TA systems have also frequently been suggested to play a role in adaptation to various stresses, particularly those where slow growth might improve survival (Page and Peti, 2016). In support of this role, the ectopic expression of multiple toxins causes cells to enter a persister-like state in which they are non-growing and antibiotic tolerant (Bokinsky et al., 2013; Keren et al., 2004; Mok et al., 2015). Additionally, the transcription of many TA systems increases in response to a range of stresses. However, clear deletion phenotypes for TA systems remain elusive (Goormaghtigh et al., 2018). Although the functions of TA systems remain unclear, their abundance and prevalence in bacteria strongly suggest that they are important under some conditions. Many bacteria encode dozens of TA systems, often with many paralogous systems in which the toxins have a common mechanism of action. For instance, E. coli MG1655 encodes 11 type II TA systems where the toxin is known or predicted to have RNase activity. The RNase toxins of type II TA systems have been found to target a variety of RNAs, including mRNAs, tRNAs, and rRNAs (Barth et al., 2019; Hwang and Buskirk, 2017; Mets et al., 2019; Schifano et al., 2014). However, the specificities and RNA targets of most RNase toxins are still poorly characterized, in part because most prior studies have examined cleavage of a few model transcripts.
In a prior study, we developed and used a quantitative RNA-Seq-based approach to systematically map the RNA targets of E. coli MazF. We found that MazF leads to rapid, widespread cleavage of mRNAs, leading to a global disruption in translation (Culviner and Laub, 2018). Additionally, we found that MazF drives the accumulation of rRNA precursors, likely by a combination of direct cleavage of nascent rRNA and cleavage of some ribosomal protein transcripts to prevent their translation, leading to consequent defects in rRNA processing and ribosomal assembly. These results suggested that MazF inhibits cell growth by targeting mRNAs and by inhibiting the synthesis of new ribosomes. This mechanism of growth inhibition is enabled in part by MazF’s relatively low-complexity, and thus highly abundant, cleavage site, with a core requirement of the trinucleotide ACA and some additional specificity provided by additional nucleotides on either side of the ACA. Notably, our prior study of MazF also strongly refuted an earlier model suggesting that MazF cleaves mature rRNA to create ‘specialized’ ribosomes that translate leaderless mRNAs also created by MazF (Vesper et al., 2011). Where characterized, other RNase toxins also have relatively low-complexity cleavage motifs (Yamaguchi and Inouye, 2011). Some RNase toxins, such as E. coli RelE, have almost no sequence specificity, but are ribosome-associated and cleave mRNAs in a translation-dependent manner with a clear reading-frame bias (Hwang and Buskirk, 2017). There is evidence that six of the RNase toxins in E. coli specifically target translated RNAs; crystal structures of some of these indicate that the ribosome repositions key catalytic amino acids, resulting in cleavage only at specific positions relative to the reading frame (Neubauer et al., 2009). Whether the ribosome-associated toxins impact rRNAs and ribosome biogenesis, like MazF does, is not clear.
Here, we used the same RNA-Seq-based methodology used to study MazF to examine each of 9 different RNase toxins from E. coli. We find that each toxin cleaves numerous mRNAs, though the precise locations and extent of cleavage within each transcript varied for each toxin. Most toxins had limited sequence specificity, though several toxins, particularly the translation-dependent toxins, had a bias toward cleavage near the 5′-ends of coding regions. Analysis of rRNAs indicated little direct cleavage of the 16S and 23S rRNAs, but every toxin led to increased levels of rRNA precursors, indicative of defects in RNase III processing. Using high-resolution sucrose gradients, we demonstrated that these defects lead to substantial defects in ribosome biogenesis, along with the appearance of abnormal rRNA peaks unique to each toxin. As with MazF, we find no evidence that mature ribosomes are degraded or modified. Thus, our data indicate that although E. coli endoribonuclease toxins cleave different sets of RNAs to different extents, they each inhibit the biogenesis of new ribosomes. Critically, this inhibition of ribosome biogenesis occurs even for the translation-dependent toxins that are unable to cut rRNA directly, strongly favoring a model in which ribosome biogenesis is disrupted indirectly through the decreased synthesis of ribosomal proteins whose transcripts are cleaved by the toxins.

Figure 3.1: RNase toxins rescuably inhibit cell growth. (A) Sample data from expression of empty / empty and toxin (MazF) / antitoxin (MazE) containing vectors. Cells were grown to mid-log then back-diluted onto plates containing arabinose (red, toxin induction) or arabinose and aTc (blue, toxin and antitoxin induction). Three replicates (faded lines) were used to calculate the mean (solid lines). (B) Data from growth curves between 30 minutes and 2 hours (dotted lines in 3.1A) were used to calculate doubling times for toxin (red) and toxin and antitoxin (blue) expression.
Error bars show the standard deviation of 3 replicates.

Results

Most toxins inhibit cell growth and can be antagonized by a cognate antitoxin

Before determining the effects of toxin expression on E. coli transcripts, we first sought to verify that each toxin is, in fact, toxic and that toxicity could be rescued by co-expression of its cognate antitoxin. To this end, we placed each toxin under the control of an arabinose-inducible promoter and each antitoxin under the control of a tetracycline-inducible promoter. Cognate pairs of toxins and antitoxins were then combined and expressed from compatible, low-copy plasmids in an otherwise wild-type E. coli MG1655 background. Individual strains harboring each cognate toxin-antitoxin pair were grown to mid-exponential phase (OD600 ~ 0.25-0.3) in M9 glycerol, followed by back dilution to OD600 ~ 0.1 with the addition of either arabinose or arabinose and anhydrotetracycline to induce the toxin alone or the toxin and antitoxin, respectively (Figure 3.1A). Growth from each experiment was monitored for up to 3 hr and growth between 30 min and 2 hr was used to calculate doubling times (Figure 3.1B). For 8 of 11 of the TA systems we observed robust toxicity upon induction of toxin alone, with doubling times increasing by > 2-fold compared to the empty vector strain. In each of these 8 cases, coexpression of the cognate antitoxin restored growth, with doubling times comparable to those seen for a strain harboring empty vectors. For the other 3 cases: HicA was toxic, though not to the same degree as other toxins, and was rescued by coexpression with HicB, while for YafQ and RnlA, toxicity was not replicable (data not shown). The variable toxicity for YafQ and RnlA may indicate that the endogenous copy of the cognate antitoxin was sufficient to inhibit induced toxicity or that another, uncontrolled factor influences toxicity.
In subsequent analyses, we focused on the 9 systems for which the toxin exhibited toxicity that could be rescued by coexpression of the cognate antitoxin.

Toxin expression results in widespread degradation of E. coli transcripts

To determine how toxins affect E. coli mRNAs, we conducted paired-end, strand-specific RNA-Seq after expressing each toxin for 10 minutes, and then mapped the full nucleotide coverage of all sequenced fragments. As before with MazF (Culviner and Laub, 2018), we calculated a cleavage ratio (log2 of the toxin:empty-vector ratio of fragment densities) at each nucleotide in each transcript (Figure 2.1A). A negative cleavage ratio indicates that a region is either cleaved or somehow destabilized following expression of an RNase toxin. We chose a 10 minute time-point to minimize secondary, indirect effects of toxin expression while allowing for expression of the toxin. For comparison to the toxins, we also performed the same paired-end, strand-specific RNA-Seq on cells treated for 5 minutes with either rifampicin, which blocks transcription, or chloramphenicol, which blocks translation. To determine the effects of expressing individual toxins, we first examined the cleavage ratios across a given transcript, hereafter called a cleavage profile, for several highly-expressed genes (atpA-I, dnaKJ, and groL) (Figure 3.2A-C). If toxins act as mRNases, they should generate cleavage ratio valleys within transcripts as seen previously with MazF (Culviner and Laub, 2018). In contrast, if a toxin indirectly affects the transcription of a gene, it should produce an increase or decrease in the cleavage ratios across an entire transcript while maintaining the general shape of the profile. The rifampicin and chloramphenicol data provide a reference to help assess changes that arise from the effects of housekeeping RNases when either transcription or translation is disrupted.
Two patterns were immediately apparent in the selected transcripts shown in Figure 3.2. First, toxin expression generally resulted in more variability in cleavage ratios across coding regions than either rifampicin or chloramphenicol treatment, supporting the notion that these RNase toxins are likely cleaving and destabilizing a wide variety of sites. Second, the different toxins have different specificities: though all four toxins shown destabilized parts of the operons shown, the sites of minimum cleavage ratio were different for each toxin. Notably, for both MqsR and RelE there was an apparent 5′-bias in where cleavage ratio minima were found. For RelE, the regions with the lowest cleavage ratios were toward the 5′-ends of dnaK (Figure 3.2A, red star), atpE, and atpD (Figure 3.2C, red stars), whereas MqsR expression produced a cleavage ratio minimum near the 5′ end of atpA (Figure 3.2C, blue star). By contrast, MazF and ChpB did not appear to have similar 5′ biases in terms of cleavage ratio minima. A 5′ bias for cleavage ratio minima was expected for RelE, whose activity is ribosome-dependent such that the majority of cleavage occurs near the translational start site (Hwang and Buskirk, 2017). It was unclear why MqsR, which was suggested to be a ribosome-independent RNase (Christensen-Dalsgaard et al., 2010), would prefer the 5′-ends of coding regions. Importantly, the cleavage profiles for each transcript following rifampicin or chloramphenicol treatment were significantly smoother throughout the length of the transcript, indicating that the patterns arising from toxin expression reflect their endoribonuclease activity rather than a global cessation of gene expression and housekeeping RNase activity.

Figure 3.2: Example cleavage ratio profiles. Cleavage ratios are plotted across three genomic regions. Top plot shows 5 minutes of rifampicin (orange) or chloramphenicol (blue) treatment. Bottom 4 plots compare expression of individual toxins (black); all toxins were expressed for 10 minutes with the exception of MazF, which was expressed for 5 minutes. Red and blue stars show genes with cleavage near their 5'-ends for RelE and MqsR, respectively.

To more systematically assess the cleavage patterns of each toxin, we examined each transcript's minimum cleavage ratio and its normalized minimum, calculated as the difference between the minimum and median cleavage ratio. Again, for comparison, we did the same analysis on wild-type cells treated with rifampicin or chloramphenicol. For all of these analyses, we examined only transcripts with at least 64 normalized counts across the entire coding region in the untreated sample (hereafter referred to as ‘well-expressed genes’). Low values for the raw and normalized minima will arise from transcripts with a significant cleavage ratio valley relative to the rest of the gene, likely due to activity of the expressed toxin. Boxplots summarizing the distribution of cleavage ratio minima for all well-expressed transcripts are shown in Figure 3.3A-B. We found that all toxins except YafO exhibited distributions of minima values that differed significantly from the empty vector control and the rifampicin and chloramphenicol controls, indicating widespread cleavage of mRNAs. For each toxin, the distributions had long negative tails indicating the presence of many extreme cleavage ratio valleys (Figure 3.3A-B). There was no obvious bimodality in these distributions indicative of a set of ‘targeted’ and ‘spared’ mRNAs. Most toxins, again with YafO as an exception, led to a minimum cleavage ratio < −1, i.e. > 2-fold down, in more than 50% of all expressed coding regions (Figure 3.3C). In contrast, the empty vector control had only 0.6% of genes meeting this threshold, with rifampicin and chloramphenicol treatment each yielding only ~30% of genes with cleavage ratio minima < −1.
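The two per-transcript summary statistics used here (the minimum cleavage ratio and the normalized minimum, i.e. minimum minus median) and the percentage of transcripts falling below −1 can be sketched as follows. The function names and the three toy profiles are hypothetical, and the sketch assumes the well-expressed filter has already been applied.

```python
from statistics import median

def valley_depth(profile):
    """Return (minimum, minimum - median) for one cleavage profile."""
    lowest = min(profile)
    return lowest, lowest - median(profile)

def percent_below(profiles, threshold=-1.0):
    """Percent of profiles whose minimum cleavage ratio is below threshold."""
    hits = sum(1 for p in profiles if min(p) < threshold)
    return 100.0 * hits / len(profiles)

# Hypothetical cleavage profiles for three well-expressed coding regions.
profiles = {
    "geneA": [0.0, -0.2, -2.5, -0.3, 0.1],    # one deep, narrow valley
    "geneB": [-1.2, -1.3, -1.1, -1.2, -1.3],  # uniformly reduced, no valley
    "geneC": [0.1, 0.0, -0.1, 0.0, 0.2],      # essentially unaffected
}

depth_a = valley_depth(profiles["geneA"])
depth_b = valley_depth(profiles["geneB"])
pct = percent_below(list(profiles.values()))
```

The toy data show why both statistics are reported: geneA and geneB both have a raw minimum below −1, but only geneA has a strongly negative normalized minimum, which separates a local valley from a transcript-wide decrease.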
Taken together, these results indicate that the expression of most endoribonuclease toxins results in a major remodeling of the E. coli transcriptome and that toxins are cleaving mRNA in vivo, not simply causing their destabilization and degradation indirectly via transcription or translation inhibition.

Figure 3.3: Toxins target a wide variety of genes. (A) To identify the depth of valleys relative to the rest of the gene, the difference between the minimum and median cleavage ratio was calculated for each well-expressed coding region. Low values indicate deep valleys. Boxplots show the median, lower and upper quartiles, and quartile ± 1.5 interquartile range. (B) The minimum cleavage ratio in each well-expressed coding region. Boxplots show the median, lower and upper quartiles, and quartile ± 1.5 interquartile range. (C) The percent of well-expressed coding regions with at least one position with a cleavage ratio below -1, or a relative 2-fold downregulation of RNA abundance at the site.

Toxins have limited but variable cleavage specificity

Inspection of the set of transcripts targeted by each toxin, i.e. those with the lowest minima, did not reveal or suggest any clear functional groups of genes, with the set of cleaved transcripts including genes with a wide variety of cellular functions. Nevertheless, and despite the number of genes affected, each toxin may still have some target specificity at the sequence level. MazF requires the trinucleotide motif ACA, with our prior work showing that the two flanking nucleotides on either side of the ACA provide some additional specificity. To assess the sequence specificity of each toxin, we focused on well-defined cleavage sites that manifest as deep, narrow valleys.
We identified such cases by determining local minima within cleavage profiles and then selecting those minima where the cleavage ratio increased by at least 0.75 within 50 nucleotides on both sides; valleys were required to at least partially overlap coding regions. With these stringent parameters, each toxin generated between 3 (YafO) and 507 (MazF) cleavage valleys (Figure 3.4A, left); YafO was excluded from subsequent analyses of specificity due to the low statistical power associated with so few clear valleys. We measured the occurrence of all 64 possible sequences of length 3 in the defined valleys for each toxin (Figure 3.4A, right). MazF showed the strongest signal in this analysis with 98.6% of regions containing an ACA site, as expected. Analysis of ChpB also showed a strong sequence preference, with the trinucleotide TAC appearing in 92.0% of regions. For the other toxins, no trinucleotide sequence was found in > 90% of cleavage valleys. The previously proposed cleavage site for MqsR, GCU, was among the top three sequences (Yamaguchi et al., 2009). The top hit for all of the known translation-dependent toxins (RelE, YoeB, YhaV, and HigB) was AUG, likely due to the proximity of many cleavage regions to the translational start site. Notably, MqsR also featured AUG as one of the most frequent trinucleotides even though prior work has suggested that it cuts mRNA in a ribosome-independent manner (Christensen-Dalsgaard et al., 2010). In general, the relative lack of sequence specificity seen here, combined with our observation that a large fraction of transcripts is cleaved by each toxin (Figure 3.3C), supports a model in which toxins act primarily as general mRNA interferases, not specific regulators of gene expression.

We also examined whether each toxin preferentially cleaves near the 5′- or 3′-ends of targeted transcripts.
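The valley-selection rule and the trinucleotide tally described above can be sketched as below. This is only an illustration under stated assumptions: the function names are hypothetical, the handling of plateaus is a choice made here, sequences are assumed to use the RNA alphabet, and the requirement that valleys overlap coding regions is omitted. The peak-finding algorithm actually used is the one described in Chapter 2.

```python
from itertools import product

def find_valleys(profile, min_rise=0.75, window=50):
    """Indices of local minima flanked on both sides by a rise >= min_rise
    within `window` nucleotides."""
    valleys = []
    for i in range(1, len(profile) - 1):
        # Local minimum; the strict left comparison keeps one index per plateau.
        if not (profile[i] < profile[i - 1] and profile[i] <= profile[i + 1]):
            continue
        left = profile[max(0, i - window):i]
        right = profile[i + 1:i + 1 + window]
        if (max(left) - profile[i] >= min_rise
                and max(right) - profile[i] >= min_rise):
            valleys.append(i)
    return valleys

def trinucleotide_presence(regions):
    """Fraction of regions containing each of the 64 trinucleotides."""
    kmers = ["".join(p) for p in product("ACGU", repeat=3)]
    counts = dict.fromkeys(kmers, 0)
    for seq in regions:
        present = {seq[j:j + 3] for j in range(len(seq) - 2)}
        for kmer in present:
            if kmer in counts:
                counts[kmer] += 1
    return {k: counts[k] / len(regions) for k in kmers}

# Toy profile: one deep valley at index 3 and one shallow dip at index 7.
toy_profile = [0.0, -0.1, -0.2, -1.5, -0.3, 0.0, -0.1, -0.3, -0.2, 0.0]
valleys = find_valleys(toy_profile, min_rise=0.75, window=3)

# Hypothetical valley sequences; two of the three contain ACA.
presence = trinucleotide_presence(["GGACAUU", "ACACA", "UUUGGG"])
```

Counting presence per region, rather than total occurrences, matches the way the results are reported in the text (the percentage of regions containing a given trinucleotide).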
For this analysis, each valley was categorized by its location relative to coding regions: upstream, near the 5′-end, in the middle region, near the 3′-end, or downstream. Apart from the middle region, which varied in length, each of these categories was 50 nucleotides in length for each gene. We plotted the number of cleavage sites observed in each region after normalizing to the total size in nucleotides of each region type (Figure 3.4B). To account for effects from upstream or downstream genes, we also plotted the normalized number of cleavage valleys after removing transcripts with nearby upstream and downstream genes (Figure 3.4C, left). Finally, to summarize the cleavage location biases of each toxin, we plotted the ratio of the 5′ or 3′ cleavages to middle region cleavages (Figure 3.4C, right).

Figure 3.4: Toxin cleavage specificity. (A) Frequency of all 64 sequences of length 3 in cleaved regions. For each toxin, selected sequences described in text are highlighted (black outline). (B) Region-size normalized occurrence of cleavage sites upstream, at the 5'-end, in the middle region, at the 3'-end, or downstream of coding regions. (C) Analysis as in (A), except only for coding regions without nearby upstream or downstream genes. (D) Summary of 5'- and 3'-biases data calculated from (B): the ratio of upstream and 5'-end or downstream and 3'-end cleavage sites to middle cleavage sites.

As noted in the examples above (Figure 3.2), MazF did not exhibit a strong bias toward any particular region of mRNAs, consistent with it being a translation-independent RNase. In contrast, the most extreme biases towards cleavage of 5′-ends were in the translation-dependent toxins (RelE, YoeB and HigB), as expected; if translation is required for cleavage, mRNAs that get cleaved near their 5′-ends would have few, if any, downstream ribosomes on their 3′-ends, thereby producing the 5′ bias observed.
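A minimal sketch of the positional categorization and region-size normalization described above follows. The coordinate convention (half-open intervals, 5′ to 3′ on the coding strand), the function names, and the requirement that each gene be at least 100 nucleotides long are assumptions made for this illustration, not details taken from the thesis.

```python
FLANK = 50
REGIONS = ["upstream", "5'-end", "middle", "3'-end", "downstream"]

def classify(position, cds_start, cds_end, flank=FLANK):
    """Assign a valley position to a region relative to one coding region."""
    if cds_start - flank <= position < cds_start:
        return "upstream"
    if cds_start <= position < cds_start + flank:
        return "5'-end"
    if cds_end - flank <= position < cds_end:
        return "3'-end"
    if cds_end <= position < cds_end + flank:
        return "downstream"
    if cds_start + flank <= position < cds_end - flank:
        return "middle"
    return None  # outside every window for this gene

def normalized_counts(valleys, genes, flank=FLANK):
    """Cleavage sites per nucleotide for each region type.

    valleys: list of (gene_name, position); genes: {gene_name: (start, end)}.
    """
    counts = dict.fromkeys(REGIONS, 0)
    sizes = dict.fromkeys(REGIONS, 0)
    for start, end in genes.values():
        if end - start < 2 * flank:
            raise ValueError("sketch assumes genes of at least 2 * flank nt")
        for region in REGIONS:
            sizes[region] += (end - start - 2 * flank) if region == "middle" else flank
    for gene, position in valleys:
        region = classify(position, *genes[gene], flank=flank)
        if region is not None:
            counts[region] += 1
    return {r: (counts[r] / sizes[r] if sizes[r] else 0.0) for r in REGIONS}

# Hypothetical coding regions and valley positions.
genes = {"g1": (100, 400), "g2": (1000, 1200)}
valleys = [("g1", 110), ("g1", 120), ("g1", 250),
           ("g2", 1005), ("g2", 1190), ("g1", 90)]
density = normalized_counts(valleys, genes)
```

Dividing by the summed size of each region type is what keeps the variable-length middle regions comparable with the fixed 50-nucleotide windows; in the toy data three 5′-end sites over 100 nucleotides give a density nine times that of the single middle site over 300 nucleotides.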
Intriguingly, MqsR exhibited a substantial 5′ bias and, as noted, shows an enrichment for AUG near cleavage valleys. Additionally, we noted that MqsR is a member of the RelE superfamily of toxins, with RelE an established and prototypical translation-dependent RNase. Thus, although MqsR has been shown to cleave untranslated RNA in vitro, our results suggest that the ribosome, though not required, may potentiate cleavage by MqsR. The translation-independent toxins ChpB and HicA also exhibited a modest 5′-bias. Because these toxins are thought to cleave only single-stranded, unstructured RNA, this slight tendency toward cleavage near the 5′-ends of mRNAs may be driven by a relative lack of RNA structure surrounding start codons (Burkhardt et al., 2017). We also noticed that for the ribosome-dependent toxins except HigB, there was a modest 3′ bias as well, even after removing cases with nearby genes (Figure 3.4C). YoeB in particular had an equal propensity for cleaving near the 5'- and 3'-ends of target genes. This result is in agreement with earlier studies that showed on a model transcript that YoeB cleaved at the start codon and at the stop codon with limited cleavage in between (Christensen-Dalsgaard and Gerdes, 2008; Christensen et al., 2004). The mechanistic basis of a 3'-bias in cleavage for YoeB, or the other translation-dependent RNases, is not clear.

Toxins cause a buildup of rRNA precursors and cleave ribosomal protein transcripts

While each RNase toxin resulted in widespread cleavage of mRNA, we also wanted to see whether they affected rRNAs, as MazF does. For the translation-independent toxins ChpB, HicA and MqsR, the toxins could directly target rRNAs. For all toxins, rRNAs could be indirectly affected as the cleavage of ribosomal protein transcripts and consequent disruption of ribosomal protein translation is known to disrupt rRNA processing and ribosome biogenesis (Siibak et al., 2009).
To look for rRNA cleavage and effects on rRNA biogenesis, we plotted reads aligning to any of the seven 16S or 23S loci in E. coli MG1655 after expression of a toxin (Figure 3.5A). The most pronounced changes, relative to the empty vector control, were in the 16S profiles. The expression of each toxin generated > 4-fold more fragment density in the region immediately downstream of the 3′-end of the mature 16S rRNA, and all toxins except MazF generated ~2-fold more signal immediately upstream of the 5′-end of the mature 16S (Figure 3.5B). These regions with increased read density correspond to the regions of the rRNA precursor that normally form a hairpin required for processing by RNase III during rRNA maturation, suggesting that this early step of ribosome biogenesis is somehow inhibited by RNase toxin expression (Figure 2.12A). The 23S profiles also showed increases in read density near the RNase III processing sites, though these differences were smaller in magnitude (Figure 3.5C). We saw no evidence of cleavage within the regions that correspond to the mature 16S and 23S rRNAs apart from a local minimum near position 750 in the 23S rRNA following induction of some of the toxins (Figure 3.5A, starred). However, we suspect that this decrease in read density is likely not due to rRNA cleavage as positions 745, 746 and 747 are the sites of m1G, Ψ, and m5U modifications, respectively. Such RNA modifications can lead to a significant decrease in the ability of reverse transcriptase to produce cDNA during RNA-Seq library preparation; notably, this region is a local minimum in the empty vector control. Thus, some of the toxins may indirectly affect the extent of modification at these sites, thereby altering the efficiency of library generation for these regions of rRNAs without directly cleaving these sites. In support of this hypothesis, we observed significantly decreased signal at this site even with the translation-dependent toxins RelE, HigB, and YafO, which should be unable to cleave untranslated RNAs.

Figure 3.5: RNase toxins inhibit rRNA maturation. (A) Sequencing depth normalized number of fragments mapping to the 16S (left) or the 23S (right) rRNA across the 7 rRNA operons. Expression of toxin (red) was compared to the relevant empty vector (blue). RNase III cleavage sites are shown as blue arrows (top). rRNA modification site described in text is starred (top). All RNA was harvested after 10 minutes, except MazF which was at 30 minutes. (B) Maximum observed difference in 16S fragment counts + and – toxin from the mature end of the rRNA to the RNase III site. (C) As in (B) but for the 23S. (D) Minimum cleavage ratio in all (n = 54) ribosomal protein genes.

Taken together, we conclude that each RNase toxin causes a defect in proper rRNA maturation leading to an accumulation of precursor rRNA fragments, particularly those near RNase III processing sites. Further, because translation-dependent toxins also inhibit rRNA maturation, we infer that direct cleavage of rRNA by a toxin is not required for a defect in rRNA maturation to manifest. Instead, we hypothesized that the defect in rRNA biogenesis arises from the cleavage of ribosomal protein transcripts and a consequent deficiency in one or more ribosomal proteins; deficiencies in ribosomal proteins are well known to disrupt rRNA processing in E. coli (Siibak et al., 2009). To determine if toxins are disrupting ribosomal protein translation, we calculated the minimum cleavage ratio for each ribosomal protein transcript (Figure 3.5D). Though there was a wide range in minimum cleavage ratios, each toxin resulted in cleavage of at least some ribosomal protein transcripts relative to the empty vector sample. Taken together with our rRNA cleavage profiles, we concluded that all endoribonuclease toxins in E. coli inhibit ribosome biogenesis.
Toxin expression inhibits translation and collapses polysomes

To directly measure the effects of toxin expression on ribosome biogenesis, we used sucrose density gradients to separate polysomes, monosomes, and ribosomal subunits extracted from cells expressing a given toxin for 1 hour. We measured A260 continuously across the gradients and also collected sequential fractions for additional analyses. With the sucrose concentrations used here, polysomes were in the final fraction and the pellet (Figure 3.6A).

Figure 3.6: Toxins alter global translation and ribosome profiles. (A) Measurements of A260 of cell lysate across a 5-20% sucrose gradient after toxin expression. The left of the gradient is the top and the final fraction includes the resuspended polysome pellet. Thick lines are the average of 8 (empty) or 4 (toxins) replicates (thin transparent lines). Aberrant features arising from toxin expression are highlighted. (B) Fraction of the volume-normalized mature ribosome signal coming from the final polysome fraction. (C) Correlation of the mean of the minimum cleavage ratio for highly expressed regions (see Figure 3.3B) and the percent of ribosomes as polysomes from (B). Pearson r of the correlation is shown.

Our prior work demonstrated that MazF inhibits global translation, leading to a substantial decrease in the abundance of polysomes. To test whether a similar phenomenon occurred with the other toxins, we measured the fraction of A260 signal in assembled ribosomes that is found in polysomes, i.e. by calculating the A260 signal in polysomes relative to the total signal in all ribosomes. In the empty vector samples, this value was 49%. For every toxin except YafO, this value was less than 49%, ranging from 9% for MazF to 42% for YhaV (Figure 3.6B).
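The polysome percentage described above is a simple ratio of summed A260 signal, sketched below. The fraction indices and signal values are hypothetical, and treating "all ribosomes" as the monosome plus polysome fractions is an assumption of this sketch.

```python
def percent_in_polysomes(signal_by_fraction, monosome_idx, polysome_idx):
    """Percent of assembled-ribosome A260 signal found in polysome fractions."""
    mono = sum(signal_by_fraction[i] for i in monosome_idx)
    poly = sum(signal_by_fraction[i] for i in polysome_idx)
    total = mono + poly
    if total == 0:
        raise ValueError("no signal in the ribosome fractions")
    return 100.0 * poly / total

# Hypothetical volume-normalized A260 per fraction, top of gradient first.
# Fractions 4-5 are taken as monosomes; fraction 6 is the final fraction,
# which includes the resuspended polysome pellet.
signal = [9.0, 1.0, 2.0, 3.0, 3.0, 2.0, 5.0]
pct_poly = percent_in_polysomes(signal, monosome_idx=[4, 5], polysome_idx=[6])
```

Signal at the top of the gradient and in the subunit region is deliberately left out of the denominator, so the value reflects how assembled ribosomes are distributed rather than how much total RNA was loaded.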
These ratios roughly correlated with the mean of the cleavage ratio minima across all transcripts observed by RNA-Seq (r = 0.79; Figure 3.6C), suggesting a link between the extent of RNA cleavage and the effect on global translation. In sum, these results indicate that most endoribonuclease toxins in E. coli lead to a decrease in global translation and a collapse of polysomes.

Another common feature of these sucrose gradient profiles was the appearance of smears (e.g. with MazF), new peaks (e.g. with RelE), or additional signal between the top of the gradient and the monosome fractions (e.g. with HicA) (Figure 3.6A). These effects were present even for toxins like HicA that had relatively minor effects on the polysome fraction. These new features in the sucrose gradient profiles were highly reproducible, but different for each toxin, and likely reflect the accumulation of aberrant products during ribosome biogenesis. The different, but reproducible nature of these aberrant products likely reflects, or in fact is driven by, the different sets of ribosomal protein transcripts targeted by each toxin.

Toxins inhibit ribosome biogenesis and do not degrade mature ribosomes

The results presented thus far indicate that nearly every endoribonuclease toxin in E. coli leads to (i) increases in rRNA precursors, (ii) cleavage of ribosomal protein transcripts, and (iii) significant decreases in polysomes with corresponding increases in various ribosome fragments. Thus, we hypothesized that each toxin inhibits ribosome biogenesis, as we had previously shown for MazF. To directly test this hypothesis, we tracked the incorporation of radiolabeled uridine into ribosomes following toxin expression. To do this, we grew and harvested cells as above (Figure 3.6), but added 3H-uridine 10 minutes after toxin induction (Figure 3.7A).
For the empty vector control, this 50 minute time period was sufficient to incorporate ~30% of the radioactive signal into mature ribosomes and the polysome pellet. The expression of each toxin reduced the incorporation of radiolabel into mature ribosomes (Figure 3.7B-C), although the extent of inhibition varied between the toxins. Importantly, the translation-dependent toxin YoeB showed a level of inhibition similar to the translation-independent MazF, supporting the notion that an inhibition of translation alone is sufficient to inhibit ribosome biogenesis and that the direct cleavage of rRNA is not necessary.

Figure 3.7: Toxins inhibit ribosome biogenesis. (A) Timeline of experiment to measure effects of MazF induction on nascent rRNA synthesis indicating toxin induction, pulse, and sampling times. (B) Empty vector and toxin sucrose gradients as in Figure 3.5A. Average A260 values are plotted in blue. Normalized CPMs are plotted on the right axis. Replicate values for each fraction are shown as dots, averages as lines. The 3H signal axis is broken to show top point as well as shape of signal in the ribosomal region. (C) Summed 3H signal from the ribosome and polysome fractions. (D) Summed 3H signal between the top (left) fractions and ribosome (right) fractions.

The labeling experiment in Figure 3.7 also suggested that the new peaks and smears observed in the polysome profiles following toxin induction were, in fact, unassembled or aberrant ribosome precursors. While toxin expression decreased signal in the mature ribosome fractions (monosomes + polysomes), it increased signal in the region between the top of the gradient and the monosome fractions, which is typically where the 30S, 50S, and immature subunits appear (Figure 3.7D).
To verify that these peaks were precursors and not mature ribosomes that were degraded or disassembled following toxin expression, we also conducted a pulse-chase experiment in which the 3H-uridine was added before toxin expression (Figure 3.8A). In this case, tritium signal was maintained (or, for some toxins, increased) in the mature ribosome fractions (Figure 3.8B-C) and there was no increase in signal in the precursor regions (Figure 3.8D). Thus, these findings strongly indicate that mature ribosomes are not degraded or disassembled upon toxin induction. Taken all together, our observations suggest that each toxin creates a unique set of ribosome precursors incapable of efficiently or properly maturing into normal, 70S particles. Because translation-dependent toxins also produce such precursors, we favor a model where an improper ratio of ribosomal proteins unique to each toxin’s cleavage specificity leads to a defect in ribosome biogenesis, likely as a primary means of suppressing cell growth, as seen with MazF.

Figure 3.8: Mature ribosomes are not degraded by toxins. (A) Timeline of experiment to measure effects of MazF induction on mature rRNA indicating pulse, chase, toxin induction, and sampling times. (B) Empty vector and toxin sucrose gradients as in Figure 3.6B. (C) Summed 3H signal from the ribosome and polysome fractions. (D) Summed 3H signal between the top (left) fractions and ribosome (right) fractions.

Discussion

Toxin expression causes widespread degradation of E. coli transcripts

A central question of toxin-antitoxin system biology is why a single bacterium encodes so many different systems (Goormaghtigh et al., 2018). For RNase toxins, this question might be answered in part by exploring if individual toxins target the same transcripts to inhibit growth or if individual toxins have functional regulons. However, to date there has been no systematic, global characterization of E. coli’s RNase TA systems.
Using an RNA-Seq-based approach we developed to characterize the toxin MazF (Culviner and Laub, 2018), we quantitatively mapped the cleavage specificity of 9 of E. coli’s RNase toxins. For each toxin, though the targets and extent of cleavage varied, expression led to cleavage of a significant proportion of the E. coli transcriptome (Figure 3.3C). However, in contrast to our study of MazF, we found less evidence of clear nucleotide specificity in the other toxins (Figure 3.3A). For many toxins, we observed an increased tendency to form cleavage valleys in the 5′-ends of messages (Figure 3.4B-D), indicative of either a toxin’s dependency on translation or possibly decreased structure near message start codons. Regardless, most toxins showed evidence of decreased translation that roughly correlated with the extent of cleavage across the transcriptome (Figure 3.6C). Due to the broad cleavage specificity of the toxins and the number of transcripts degraded, we were curious if they may inhibit ribosome biosynthesis.

Toxins inhibit ribosome biogenesis through cleavage of ribosome protein transcripts

Ribosome biogenesis involves the processing of a single transcript into 3 separate RNA molecules through the concerted action of multiple RNases. Ribosomal proteins bind the nascent rRNA co-transcriptionally, and are required in the correct stoichiometry for efficient folding and maturation of ribosomes. We had previously shown that the toxin MazF disrupts this process, likely through a combination of cleaving ACA sites in the nascent rRNA and inhibition of ribosomal protein translation. This study shows that all toxins assayed inhibited proper ribosome biogenesis to some extent and led to the formation of aberrant precursors (Figure 3.7B-C).
Intriguingly, MazF was the only toxin that did not generate a clearly defined new peak in the precursor region and instead resulted in a decrease in the 50S and 30S subunit peaks and appearance of a shoulder on the typical mRNA/unincorporated nucleotide peak at the top of the gradient. These results imply that MazF may, indeed, cleave nascent rRNAs to an extent that even other translation-independent toxins do not. Regardless, 5 of the toxins included have been shown to be translation-dependent. Therefore, these toxins’ degradation of translated RNA alone is sufficient to inhibit proper ribosome biogenesis. For example, YoeB inhibited incorporation of radiolabel to a similar extent to the translation-independent toxins MazF and ChpB (Figure 3.7C). Since all toxins inhibited synthesis of ribosomal proteins (Figure 3.5D), we conclude that inhibition of ribosomal protein synthesis is either the sole or a key contributor to decreased ribosome biogenesis, depending on the toxin. Additional characterization of aberrant precursor peaks to determine protein and RNA content may provide further evidence of this conclusion.

Concluding remarks

If the purpose of TA systems is to reversibly inhibit growth, they do this effectively. By inhibiting ribosome biogenesis without degrading mature ribosomes, they ensure that growth stops rapidly, but can be restored without the energy investment of synthesizing a new pool of ribosomes. More broadly, since we found only limited nucleotide specificity in our toxins, our results support a model where RNase toxins have cleavage specificities arising less from RNA sequence and more from general RNA properties, leading to the degradation of wide swaths of the transcriptome. Though all of these systems are able to inhibit growth by degrading ribosomal protein transcripts to some degree, it is possible they have evolved to degrade other complex cellular processes.
For example, the proper synthesis of phage particles in infected bacterial cells requires the timed synthesis of numerous proteins; the type II RNase RnlA and a number of type III RNase toxins have been shown to possess anti-phage activity (Blower et al., 2011; Koga et al., 2011). Maintenance of this broad cleavage specificity may enable RNase TA systems to target these complex processes even after transfer to new hosts and genetic drift of individual RNA target sites.

Methods

Experimental Model Details

Growth conditions

Escherichia coli was grown in M9 (10x stock made with 64 g/L Na2HPO4-7H2O, 15 g/L KH2PO4, 2.5 g/L NaCl, 5.0 g/L NH4Cl) medium supplemented with 0.1% casamino acids, 0.4% glycerol, 2 mM MgSO4, and 0.1 mM CaCl2. For strains with only ectopic toxins and no ectopic antitoxins present, glucose at 0.4% was used to prevent leaky expression from the arabinose promoter and 0.2% arabinose was used to induce expression. When both toxin and antitoxin were present, leaky expression of antitoxin was sufficient to prevent loss of toxicity. Unless otherwise noted, cells were grown at 37 °C and 200 rpm in an orbital shaker. Prior to liquid growth, individual colonies were selected by growth overnight on LB (10 g/L NaCl, 10 g/L tryptone, 5 g/L yeast extract) agar plates. Antibiotics were used at the following concentrations (liquid/plates): carbenicillin (50 µg mL-1 / 100 µg mL-1), chloramphenicol (20 µg mL-1 / 30 µg mL-1).

Strain construction

Strains used were either the MazF deletion strain courtesy of Kenn Gerdes (see Chapter 2) for MazF RNA-Seq experiments or WT MG1655 for all other experiments.

Plasmid construction

Modified pBAD30 plasmids were used for expression of toxins (Guzman et al., 1995). A sequence containing a ribosome binding site was added between the EcoRI and SacI sites in the MCS of the pBAD plasmids. Toxins were amplified from the MG1655 E. coli chromosome. Antitoxins were inserted into pKVS45 (see Chapter 2) using Gibson assembly.
For some TA systems (HicAB, YhaV, MqsR, HigB), expression levels of toxin and/or antitoxin were altered by changing the 5'-UTRs of the encoding plasmids to enable rescuable toxicity.

Experimental Method Details

Induction of toxicity and rescue

Individual strains harboring each cognate toxin-antitoxin pair were grown from overnights to mid-exponential phase (OD600 ~ 0.25-0.3) in M9 glycerol followed by back dilution to OD600 ~ 0.1 onto a 24-well plate with wells containing either arabinose (0.2% final concentration) or arabinose and anhydrotetracycline (100 ng/mL final concentration) to induce the toxin alone or the toxin and antitoxin, respectively. Cells were grown in a Synergy H1 plate reader with double-orbital shaking at 37°C. Growth was monitored every 5 minutes with path-length-corrected OD600.

Preparation of RNA-Seq libraries

RNA was extracted and libraries were prepared as described in Chapter 2 (paired-end library preparation). All RNA-Seq libraries were prepared by ectopic expression of toxin alone using 0.2% arabinose in M9 glycerol with cells grown to mid-exponential phase. For MazF, the data from Chapter 2 was used; mRNA sequencing was from 5 minutes post-induction and rRNA sequencing was from 30 minutes post-induction. For all other toxins, both mRNA and rRNA sequencing were 10 minute time points. For all toxins, rRNA sequencing samples were not ribozero treated.

Isotopic labeling of mature and nascent rRNA

Cells containing both toxin and antitoxin vectors were grown in M9 media with a 40 mL final culture volume with a pulse of 5 μCi of [5, 6-3H] uridine and 1000-fold excess chase of cold uridine at times indicated. Cells were harvested by centrifugation at 10000 g for 1 minute at 4 °C. Cell pellets were washed once with lysis buffer then respun. Lysis was conducted as described in Chapter 2.
The supernatant was loaded onto a 5-20% linear sucrose gradient generated on a Gradient Master (BioComp) instrument in a buffer of 20 mM Tris 8.0, 100 mM NH4Cl, and 10 mM MgCl. Samples were centrifuged in an SW41 rotor at 35000 rpm for 4 hours. Gradients were 141 fractionated using the Gradient Master instrument with continuous monitoring of A260. 100 μL of each fraction was added to 4 mL of Ecoscint H (National Diagnostics) and 3H cpm was measured on a TRI-CARB 4910 TR liquid scintillation counter (PerkinElmer). Measured CPM values were normalized to the average volumes of each fraction and to a measurement of CPM quenching across a sucrose gradient standard. Data Analysis Details Mapping of RNA-Seq data and identification of cleavage valleys was as described in Chapter 2. When specific parameters of the peak finding algorithm vary (e.g. allowed width of cleavage valley, minimum change in cleavage ratio), these variations are described in the text. 142 References Barth, V.C., Zeng, J.-M., Vvedenskaya, I.O., Ouyang, M., Husson, R.N., and Woychik, N.A. (2019). Toxin-mediated ribosome stalling reprograms the Mycobacterium tuberculosis proteome. Nat. Commun. 10, 3035. Blower, T.R., Pei, X.Y., Short, F.L., Fineran, P.C., Humphreys, D.P., Luisi, B.F., and Salmond, G.P.C. (2011). A processed noncoding RNA regulates an altruistic bacterial antiviral system. Nat. Struct. Mol. Biol. 18, 185–190. Bokinsky, G., Baidoo, E.E.K., Akella, S., Burd, H., Weaver, D., Alonso-Gutierrez, J., Garcia- Martin, H., Lee, T.S., and Keasling, J.D. (2013). HipA-Triggered Growth Arrest and -Lactam Tolerance in Escherichia coli Are Mediated by RelA-Dependent ppGpp Synthesis. J. Bacteriol. 195, 3173–3182. Burkhardt, D.H., Rouskin, S., Zhang, Y., Li, G.W., Weissman, J.S., and Gross, C.A. (2017). Operon mRNAs are organized into ORF-centric structures that predict translation efficiency. Elife 6, 1–23. Christensen-Dalsgaard, M., and Gerdes, K. (2008). 
Translation affects YoeB and MazF messenger RNA interferase activities by different mechanisms. Nucleic Acids Res. 36, 6472–6481. Christensen-Dalsgaard, M., Jørgensen, M.G., and Gerdes, K. (2010). Three new RelE-homologous mRNA interferases of Escherichia coli differentially induced by environmental stresses. Mol. Microbiol. 75, 333–348. Christensen, S.K., Maenhaut-Michel, G., Mine, N., Gottesman, S., Gerdes, K., and Van Melderen, L. (2004). Overproduction of the Lon protease triggers inhibition of translation in Escherichia coli: Involvement of the yefM-yoeB toxin-antitoxin system. Mol. Microbiol. 51, 1705–1717. Culviner, P.H., and Laub, M.T. (2018). Global Analysis of the E. coli Toxin MazF Reveals Widespread Cleavage of mRNA and the Inhibition of rRNA Maturation and Ribosome Biogenesis. Mol. Cell 70, 868-880.e10. Goormaghtigh, F., Fraikin, N., Putrinš, M., Hallaert, T., Hauryliuk, V., Garcia-Pino, A., Sjödin, A., Kasvandik, S., Udekwu, K., Tenson, T., et al. (2018). Reassessing the Role of Type II Toxin- Antitoxin Systems in Formation of Escherichia coli Type II Persister Cells. MBio 9, 1–14. Guzman, L.M., Belin, D., Carson, M.J., and Beckwith, J. (1995). Tight regulation, modulation, and high-level expression by vectors containing the arabinose PBAD promoter. J. Bacteriol. 177, 4121–4130. Harms, A., Brodersen, D.E., Mitarai, N., and Gerdes, K. (2018). Toxins, Targets, and Triggers: An Overview of Toxin-Antitoxin Biology. Mol. Cell 70, 768–784. Hwang, J.Y., and Buskirk, A.R. (2017). A ribosome profiling study of mRNA cleavage by the endonuclease RelE. Nucleic Acids Res. 45, D327–D336. Keren, I., Shah, D., Spoering, A., Kaldalu, N., and Lewis, K. (2004). Specialized persister cells and the mechanism of multidrug tolerance in Escherichia coli. J. Bacteriol. 186, 8172–8180. Koga, M., Otsuka, Y., Lemire, S., and Yonesaki, T. (2011). Escherichia coli rnlA and rnlB compose a novel toxin-antitoxin system. Genetics 187, 123–130. 
Mets, T., Kasvandik, S., Saarma, M., Maiväli, Ü., Tenson, T., and Kaldalu, N. (2019). Fragmentation of Escherichia coli mRNA by MazF and MqsR. Biochimie 156, 79–91.
Mok, W.W.K., Park, J.O., Rabinowitz, J.D., and Brynildsen, M.P. (2015). RNA Futile Cycling in Model Persisters Derived from MazF Accumulation. MBio 6, 1–13.
Neubauer, C., Gao, Y.-G., Andersen, K.R., Dunham, C.M., Kelley, A.C., Hentschel, J., Gerdes, K., Ramakrishnan, V., and Brodersen, D.E. (2009). The structural basis for mRNA recognition and cleavage by the ribosome-dependent endonuclease RelE. Cell 139, 1084–1095.
Page, R., and Peti, W. (2016). Toxin-antitoxin systems in bacterial growth arrest and persistence. Nat. Chem. Biol. 12, 208–214.
Pandey, D.P., and Gerdes, K. (2005). Toxin-antitoxin loci are highly abundant in free-living but lost from host-associated prokaryotes. Nucleic Acids Res. 33, 966–976.
Schifano, J.M., Vvedenskaya, I.O., Knoblauch, J.G., Ouyang, M., Nickels, B.E., and Woychik, N.A. (2014). An RNA-seq method for defining endoribonuclease cleavage specificity identifies dual rRNA substrates for toxin MazF-mt3. Nat. Commun. 5, 3538.
Siibak, T., Peil, L., Xiong, L., Mankin, A., Remme, J., and Tenson, T. (2009). Erythromycin- and chloramphenicol-induced ribosomal assembly defects are secondary effects of protein synthesis inhibition. Antimicrob. Agents Chemother. 53, 563–571.
Vesper, O., Amitai, S., Belitsky, M., Byrgazov, K., Kaberdina, A.C., Engelberg-Kulka, H., and Moll, I. (2011). Selective Translation of Leaderless mRNAs by Specialized Ribosomes Generated by MazF in Escherichia coli. Cell 147, 147–157.
Yamaguchi, Y., and Inouye, M. (2011). Regulation of growth and death in Escherichia coli by toxin-antitoxin systems. Nat. Rev. Microbiol. 9, 779–790.
Yamaguchi, Y., Park, J.-H., and Inouye, M. (2009). MqsR, a Crucial Regulator for Quorum Sensing and Biofilm Formation, Is a GCU-specific mRNA Interferase in Escherichia coli. J. Biol. Chem. 284, 28746–28753.
Chapter 4

Conclusions and Future Directions

This work is unpublished.

I. Conclusions

My graduate work has focused on the intersection of toxin-antitoxin (TA) biology and RNA biology. Where they are known, the cleavage motifs of RNase toxins seem to be short and low-complexity and provide few clues to their possible functions. We know that expressing toxins ectopically rapidly inhibits cell growth. But why, then, does a single bacterium encode so many systems to accomplish a simple goal that seems so at odds with survival?

Much recent work has focused on toxins as mediators of persistence or tolerance to antibiotics (Page and Peti, 2016). In support of this notion, TA systems are transcriptionally upregulated in persisters (Keren et al., 2004). Further, populations of slow-growing cells produced by ectopic expression of toxins contain numerous induced persisters (Bokinsky et al., 2013; Keren et al., 2004; Mok et al., 2015). Despite this, deletion of 10 of E. coli’s RNase TA systems does not decrease the formation of persisters during normal growth (Goormaghtigh et al., 2018).

Instead, might RNase toxins have functional regulons? Could their limited sequence specificity lead to targeted degradation of some transcripts to yield an adaptive phenotype? Many phenotypes and biological functions have been suggested for individual toxins: MazF mediating programmed cell death and general stress response (Vesper et al., 2011), MqsR promoting biofilm formation (Gonzalez Barrios et al., 2006), YoeB protecting cells from heat shock (Janssen et al., 2015), and RnlA protecting populations from T4 infection (Koga et al., 2011). In my graduate work, I set out to see what the RNA cleavage activity of each toxin could tell us about the functional role of toxins and the consequences of their activation.

MazF, one of the first RNase toxins characterized, has long been known to cleave RNA at the motif ACA (Zhang et al., 2003).
More recently, MazF was suggested to globally regulate translation by generating specialized ribosomes through a single MazF cleavage that removes the anti-Shine-Dalgarno sequence from the 16S rRNA (Vesper et al., 2011). These ribosomes were shown to selectively translate leaderless mRNAs generated by MazF cleavage of the 5'-UTR. I wondered how MazF avoided cleaving the many ACAs in the coding regions of these messages. To reconcile these two models, we developed an RNA-Seq-based pipeline for quantitatively measuring the effects of RNase expression on the E. coli transcriptome (Figure 2.1A). Though we found MazF has additional nucleotide specificity beyond ACA (Figure 2.5C-D), the majority of E. coli transcripts are degraded following just 5 minutes of MazF expression (Figure 2.1G). In disagreement with the specialized ribosome model, we found that MazF generated few leaderless mRNAs, and many of these were cleaved by MazF in their coding regions (Figures 2.7A, 2.8A). Further, these leaderless transcripts were not more efficiently translated during MazF expression. Instead, MazF represses translation and sequesters ribosomes in traffic jams on the 5'-ends of messages (Figure 2.9E). We also observed that instead of cleaving mature ribosomes, MazF severely impairs ribosome biogenesis (Figures 2.11D, 2.12E). We conclude that MazF inhibits ribosome biogenesis through a combination of cleavage of ribosomal protein mRNA and direct cleavage of rRNA precursors.

With this approach in hand, I set out to see if E. coli’s other RNase toxins were non-specific RNases that also inhibit ribosome biogenesis. Of particular interest were the translation-dependent toxins, as these would be incapable of directly cleaving rRNA. Is an RNase toxin’s cleavage of ribosomal protein transcripts sufficient to inhibit ribosome biogenesis? To address this question, we analyzed the 8 other RNase toxins that reliably inhibited growth.
Using the same RNA-Seq approach we used for MazF, we found that these toxins caused significant degradation of the transcriptome (Figure 3.3B-C). Intriguingly, we found less evidence of nucleotide specificity in these toxins than in MazF; instead, many toxins showed biases towards cleaving near the start codons of messages (Figure 3.4B-D). Using high resolution fractionation of ribosomes, we observed that all other toxins also inhibited ribosome biogenesis (Figure 3.7C). Combined with the observation that toxins target ribosomal protein transcripts, we infer that degradation of ribosomal protein transcripts by an RNase toxin is sufficient to inhibit ribosome biogenesis. Intriguingly, each toxin generated a unique pattern of aberrant precursor peaks. We propose that each toxin generates a unique but improper stoichiometry of ribosomal proteins, leading to blocks at different stages of ribosome biogenesis.

However, even with our description of their cleavage targets, the functional role of E. coli’s RNase TA systems is still unclear. Toxins cleave a wide variety of coding regions that produce proteins with many different functions. We found no clear support for toxins specifically targeting one class of genes. However, TA systems are often encoded on plasmids and are commonly horizontally transferred; for an RNase, a new host would mean an entirely different set of targeted mRNAs. Perhaps these RNases tend to have broad specificities to facilitate activity after horizontal transfer.

What, then, is the biological function of RNase toxins? In the above work, I show that toxins are able to inhibit ribosome biosynthesis and new protein translation without degrading mature ribosomes. Therefore, toxins provide a mechanism to rapidly inhibit growth and division without trashing the cell’s energy investment in existing ribosomes. If activated under the correct circumstances, RNase TA systems might aid survival of some cells by reversibly slowing their growth.
However, it is also possible that ribosomes are not the intended target. Type III TA systems (all of which encode RNases) have been shown to have anti-phage activity leading to abortive infection. Assembling new phage particles, like ribosomes, requires sequential formation of protein complexes to form the capsid and tail (Aksyuk and Rossmann, 2011). My data on toxins and ribosome biogenesis show that RNase toxins are well-suited to disrupt the synthesis and thus assembly of complex macromolecular structures.

II. Future Directions

Better characterize nucleotide and non-nucleotide predictors of toxin RNase cleavage

For some of the toxins (MazF, ChpB, MqsR), a preference for a particular set of nucleotides was clear (Figure 3.4A). However, particularly for translation-dependent toxins, I was unable to find a nucleotide preference apart from the enrichment of AUG likely arising from their propensity to cleave near the 5'-ends of coding regions. Further, my attempts to find cleavage sites using commonly-used sequence specificity algorithms such as MEME also did not identify any clear motifs (Bailey et al., 2009). My inability to identify even MazF’s known cleavage site of ACA using MEME and other existing algorithms is why I turned to the 3-mer seeded approach used in Chapters 2 and 3 to find specificity. However, despite this apparent lack of nucleotide specificity, each toxin had a set of preferred cleavage targets. Expression of these RNases generates clear cleavage valleys at defined locations (Figure 3.2). Evidently, there is still more to learn about how each toxin chooses its cleavage sites.

A possible explanation is that toxins have preferences for non-nucleotide characteristics of RNA. A simple hypothesis might be that translation-independent toxins prefer transcripts with limited RNA structure while translation-dependent toxins cleave coding regions that are more efficiently translated.
Preferences for broad categories of messages like these would likely lead to correlations between the cleavage targets of individual toxins. To determine if these correlations exist, I measured the Spearman correlation of cleavage targets between toxins by comparing the minimum cleavage ratios for all well-expressed coding regions (Figure 4.1A). Toxins had a range of overlap in their gene targets; HicA and YafO had a correlation of 0.90, while HicA and MqsR had a correlation of 0.57 (Figure 4.1B). Another fascinating feature was that MqsR and three of the translation-dependent toxins formed a cluster of high correlations (Figure 4.1A, red box). That toxins overlap in their targets argues that they may target some common features.

Figure 4.1: Shared toxin targets; alternative methods to find motifs. (A) Spearman correlations of the minimum cleavage ratios calculated per well-expressed gene. (B) Sample data used to generate part (A); YafO and HicA or MqsR and HicA minimum cleavage ratios per well-expressed gene, respectively. (C) Top-performing filters from a convolutional neural network trained on ChpB or MqsR cleavage data from Chapter 3. Filter values were converted to position weight matrices. High-scoring bases indicate toxin preference and identify UAC and GCU as probable core motifs for ChpB and MqsR, respectively.

To determine if toxins target common non-nucleotide features of RNAs, I propose comparing my cleavage data set with a few existing data sets. First, genes that are common targets of multiple RNases in my data set can be identified. Next, existing global data sets for RNA structure (Burkhardt et al., 2017), translation efficiency (Li et al., 2014), translational pausing (Li et al., 2012; Mohammad et al., 2016), etc. can be compared to determine if there is enrichment for particular non-nucleotide characteristics in these sets of genes.
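As an illustration only, the pairwise Spearman comparison of minimum cleavage ratios described in this section can be sketched in a few lines of Python. This is not the analysis code behind Figure 4.1; the toxin names (toxinA, toxinB, toxinC), gene names, and ratio values below are invented placeholders, and the function names are hypothetical. The sketch ranks each toxin's per-gene values (averaging tied ranks) and takes the Pearson correlation of the ranks, which is the standard definition of Spearman's coefficient.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def correlation_matrix(min_ratios):
    """min_ratios: dict of toxin -> dict of gene -> minimum cleavage ratio.
    Only genes measured for every toxin are compared."""
    toxins = sorted(min_ratios)
    genes = sorted(set.intersection(*(set(min_ratios[t]) for t in toxins)))
    return {
        (a, b): spearman([min_ratios[a][g] for g in genes],
                         [min_ratios[b][g] for g in genes])
        for a in toxins for b in toxins
    }

# Invented placeholder values, not data from this thesis.
toy = {
    "toxinA": {"gene1": 0.1, "gene2": 0.5, "gene3": 0.9},
    "toxinB": {"gene1": 0.2, "gene2": 0.4, "gene3": 0.8},
    "toxinC": {"gene1": 0.9, "gene2": 0.3, "gene3": 0.5},
}
matrix = correlation_matrix(toy)
```

Because the comparison is rank-based, it is insensitive to differences in the overall strength of cleavage between toxins and reports only whether the same genes tend to be the most strongly cleaved.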
Alternatively, correlations between these characteristics and the cleavage specificities of individual toxins may also be calculated.

I also suspect that toxins may have nucleotide specificity that I was unable to resolve with the techniques described in Chapter 3. With my RNA-Seq method, I am able to quantify changes in gene expression due to the expression of toxins. However, I am unable to determine the precise location of cleavage and instead have to look for the enrichment of possible cleavage motifs in regions of ~50-200 nt. To address this, I propose combining my approach with existing approaches that detect the appearance of new 5'-ends from cleavage events at single-nucleotide resolution (Lalanne et al., 2018; Schifano et al., 2014). Combining these methods, I may be able to more accurately discern the nucleotide specificity of the RNase toxins other than MazF, ChpB and MqsR.

Finally, both the nucleotide and non-nucleotide information from cleavage sites could be combined into an integrated model of cleavage specificity. In Chapter 2, I show MazF’s cleavage specificity using a position weight matrix built by overlapping all ACA sites in cleaved regions (Figure 2.5C-D). However, recent work has shown that convolutional neural networks (CNNs), a tool borrowed from machine learning, can identify the sequence specificity of both RNA- and DNA-binding proteins (Alipanahi et al., 2015). Briefly, a simple CNN-based approach can be imagined as a compendium of randomly-initialized position weight matrices, here called filters, that are trained on positive and negative pieces of training data (for example, regions that bind a particular protein and regions that do not). These filters are combined into a final model that also accounts for interactions between the filters themselves; for example, a model might detect that two distinct sequences are required for binding. These filters can later be extracted from the model and converted into a position weight matrix.
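To make the idea of a filter concrete, the following minimal Python sketch shows how a single convolutional filter scores an RNA sequence and how its weights might be turned into a position weight matrix. It is a sketch under stated assumptions, not the pilot model behind Figure 4.1C and not the architecture of Alipanahi et al.: the filter here is set by hand rather than trained, its values are hypothetical, and the per-position softmax used for the conversion is one possible choice that the text does not specify.

```python
import math

BASES = "ACGU"

def one_hot(seq):
    """Encode an RNA sequence as a list of 4-element indicator vectors."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]

def scan(seq, filt):
    """Slide the filter along seq and return its score at every offset.
    This is the convolution step; a CNN would typically keep the maximum."""
    x = one_hot(seq)
    width = len(filt)
    return [
        sum(filt[j][k] * x[i + j][k] for j in range(width) for k in range(4))
        for i in range(len(seq) - width + 1)
    ]

def filter_to_pwm(filt):
    """Convert raw filter weights to per-position base probabilities.
    Assumption: a softmax over the four bases at each position."""
    pwm = []
    for row in filt:
        exps = [math.exp(w) for w in row]
        total = sum(exps)
        pwm.append([e / total for e in exps])
    return pwm

def consensus(pwm):
    """Highest-probability base at each position of the matrix."""
    return "".join(BASES[row.index(max(row))] for row in pwm)

# Hypothetical hand-set filter (columns in A, C, G, U order), not a trained one.
toy_filter = [
    [-1.0, -1.0, 2.0, -1.0],
    [-1.0, 2.0, -1.0, -1.0],
    [-1.0, -1.0, -1.0, 2.0],
]
scores = scan("AAGCUAA", toy_filter)
best_offset = scores.index(max(scores))
pwm = filter_to_pwm(toy_filter)
```

In a trained network the filter weights would be learned from cleaved and uncleaved regions; the scanning and conversion steps would otherwise look much the same.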
I conducted a pilot study using cleaved regions as positive training examples and uncleaved regions as negative training examples and identified cleavage sites for ChpB and MqsR (Figure 4.1C). The high-scoring cores of these filters’ cleavage motifs (UAC and GCU for ChpB and MqsR, respectively) match my results in Chapter 3 and previously published results for these toxins’ core motifs (Figure 3.4A), indicating that this approach may be appropriate for identifying cleavage motifs. An additional advantage of neural network approaches is that they can be modified to take into account additional data beyond nucleotide sequence. For example, integration of RNA structure data into neural network-based models of RNA-binding can improve prediction of binding sites (Zhang et al., 2015). As I suggested above, I suspect including information about RNA structure and reading frame may be useful for predicting cleavage sites for translation-independent and translation-dependent toxins, respectively.

Compare toxin cleavage sites across bacterial and phage genomes

A straightforward pipeline to identify toxin cleavage specificity might open up a few avenues of research. If a given RNase toxin has a set of RNA targets that are important for its biological function, these targets might be maintained across multiple host species. Using the wealth of sequenced genomes, I propose identifying a set of homologous RNase toxins that have been inherited vertically into their hosts. These toxins could then be expressed in E. coli to build a model of their cleavage specificity using the RNA-Seq-based methodologies described throughout this thesis. Each model could be used to computationally predict the cleavage specificity of the toxin in the original host genome. As a control, one toxin might be expressed both in E. coli and in the native host to measure the accuracy of predictions.
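As a rough illustration of what predicting targets in another genome could look like, the sketch below uses the simplest possible stand-in for a specificity model: a bare core motif counted per kilobase of each transcript. This is an assumption made for illustration; a real model would likely weigh flanking sequence and the non-nucleotide features discussed above. The transcript names and sequences are invented placeholders, and the function names are hypothetical.

```python
def count_motif(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count

def sites_per_kb(transcripts, motif):
    """Predicted cleavage-site density (sites per 1000 nt) for each transcript."""
    return {
        name: 1000.0 * count_motif(seq, motif) / len(seq)
        for name, seq in transcripts.items()
        if seq  # skip empty sequences to avoid dividing by zero
    }

def rank_targets(transcripts, motif):
    """Transcript names ordered from highest to lowest predicted site density."""
    density = sites_per_kb(transcripts, motif)
    return sorted(density, key=density.get, reverse=True)

# Invented placeholder sequences, not real genes.
toy_transcripts = {
    "geneX": "ACAUUUACAU",
    "geneY": "GGGGCCCCUU",
    "geneZ": "UUACAUUUUU",
}
density = sites_per_kb(toy_transcripts, "ACA")
ranking = rank_targets(toy_transcripts, "ACA")
```

Rankings produced this way for the same toxin family in different hosts could then be compared gene class by gene class.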
If their cleavage specificity serves some biological role, key targets or gene classes might be conserved through evolutionary time. Further, if particular gene classes should not be targeted, we might observe them never entering the set of targets. Even if no classes of conserved or non-conserved genes are observed, such a study would still provide information about the evolution of nucleotide cleavage specificity across a family of RNase toxins. This analysis would not have to be limited to bacterial genomes. Since some TA systems have demonstrated anti-phage activity, the cleavage specificity of RNase toxins could be compared to the genome sequences of their host’s viral predators to determine if viral transcripts are undergoing selection to avoid cleavage by RNases.

Figure 4.2: Identification of toxin activation on a single-cell level. (A) Schematic of construct. A YFP gene engineered to have few MazF and ChpB cleavage sites is fused to the 5'-UTR and N-terminal end of the rne gene. Under normal conditions RNase E processively degrades this transcript by recognizing a structure in the 5'-UTR. When MazF is present, it cleaves the structure from the 5'-UTR, preventing the intramolecular degradation of the YFP by RNase E. (B) Flow cytometry measurement of single-cell fluorescence from increasing concentrations of arabinose controlling expression of MazF and ChpB.

Use single-cell reporters of toxin activity to screen for toxin activation

Cleavage specificities of toxins might also be used to identify activation of toxins at a single-cell level. In Chapter 2, I described a qPCR-based assay to identify MazF cleavage in a bulk population (Figure 2.5E). If cleavage of an RNA could activate a fluorescent signal, this would tie toxin activation to a signal that can be read out on a single-cell level. As a proof of concept, I built a construct that consisted of the 5'-UTR of the rne gene and an engineered YFP lacking RNase cleavage sites (Figure 4.2A).
The rne gene encodes RNase E, which autoregulates itself by degrading its own mRNA through first binding to a structured region of its 5'-UTR (Jain and Belasco, 1995). In my construct, this destabilizing 5'-UTR was modified to include an inactivating MazF cleavage site. Thus, when active MazF was present, the construct was stabilized, leading to higher YFP expression levels, shown here by flow cytometry (Figure 4.2B). As a positive control, I show the bimodal expression observed when MazF is expressed from the arabinose promoter at varying levels of arabinose (Figure 4.2B, middle). This behavior is caused by on/off behavior of the arabinose importer (Megerle et al., 2008). As a negative control, I show that ChpB, which has a different cleavage site, does not activate the system (Figures 3.4A; 4.2B, bottom). Combined with a flow cytometer, a reporter gene such as this might be used for a high-throughput screen of various stressful conditions for activation of toxins in small sub-populations of bacteria.

III. References

Aksyuk, A.A., and Rossmann, M.G. (2011). Bacteriophage Assembly. Viruses 3, 172–203.
Alipanahi, B., Delong, A., Weirauch, M.T., and Frey, B.J. (2015). Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 1–9.
Bailey, T.L., Boden, M., Buske, F.A., Frith, M., Grant, C.E., Clementi, L., Ren, J., Li, W.W., and Noble, W.S. (2009). MEME Suite: Tools for motif discovery and searching. Nucleic Acids Res. 37, 202–208.
Bokinsky, G., Baidoo, E.E.K., Akella, S., Burd, H., Weaver, D., Alonso-Gutierrez, J., Garcia-Martin, H., Lee, T.S., and Keasling, J.D. (2013). HipA-Triggered Growth Arrest and β-Lactam Tolerance in Escherichia coli Are Mediated by RelA-Dependent ppGpp Synthesis. J. Bacteriol. 195, 3173–3182.
Burkhardt, D.H., Rouskin, S., Zhang, Y., Li, G.W., Weissman, J.S., and Gross, C.A. (2017). Operon mRNAs are organized into ORF-centric structures that predict translation efficiency. Elife 6, 1–23.
Gonzalez Barrios, A.F., Zuo, R., Hashimoto, Y., Yang, L., Bentley, W.E., and Wood, T.K. (2006). Autoinducer 2 Controls Biofilm Formation in Escherichia coli through a Novel Motility Quorum-Sensing Regulator (MqsR, B3022). J. Bacteriol. 188, 305–316.
Goormaghtigh, F., Fraikin, N., Putrinš, M., Hallaert, T., Hauryliuk, V., Garcia-Pino, A., Sjödin, A., Kasvandik, S., Udekwu, K., Tenson, T., et al. (2018). Reassessing the Role of Type II Toxin-Antitoxin Systems in Formation of Escherichia coli Type II Persister Cells. MBio 9, 1–14.
Jain, C., and Belasco, J.G. (1995). RNase E autoregulates its synthesis by controlling the degradation rate of its own mRNA in Escherichia coli: unusual sensitivity of the rne transcript to RNase E activity. Genes Dev. 9, 84–96.
Janssen, B.D., Garza-Sánchez, F., and Hayes, C.S. (2015). YoeB toxin is activated during thermal stress. Microbiologyopen 4, 682–697.
Keren, I., Shah, D., Spoering, A., Kaldalu, N., and Lewis, K. (2004). Specialized persister cells and the mechanism of multidrug tolerance in Escherichia coli. J. Bacteriol. 186, 8172–8180.
Koga, M., Otsuka, Y., Lemire, S., and Yonesaki, T. (2011). Escherichia coli rnlA and rnlB compose a novel toxin-antitoxin system. Genetics 187, 123–130.
Lalanne, J.-B., Taggart, J.C., Guo, M.S., Herzel, L., Schieler, A., and Li, G.-W. (2018). Evolutionary Convergence of Pathway-Specific Enzyme Expression Stoichiometry. Cell 173, 749-761.e38.
Li, G.-W., Oh, E., and Weissman, J.S. (2012). The anti-Shine-Dalgarno sequence drives translational pausing and codon choice in bacteria. Nature 484, 538–541.
Li, G.-W., Burkhardt, D., Gross, C., and Weissman, J.S. (2014). Quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources. Cell 157, 624–635.
Megerle, J.A., Fritz, G., Gerland, U., Jung, K., and Rädler, J.O. (2008). Timing and dynamics of single cell gene expression in the arabinose utilization system. Biophys. J. 95, 2103–2115.
Mohammad, F., Woolstenhulme, C.J., Green, R., and Buskirk, A.R. (2016). Clarifying the Translational Pausing Landscape in Bacteria by Ribosome Profiling. Cell Rep. 14, 686–694.
Mok, W.W.K., Park, J.O., Rabinowitz, J.D., and Brynildsen, M.P. (2015). RNA Futile Cycling in Model Persisters Derived from MazF Accumulation. MBio 6, 1–13.
Page, R., and Peti, W. (2016). Toxin-antitoxin systems in bacterial growth arrest and persistence. Nat. Chem. Biol. 12, 208–214.
Schifano, J.M., Vvedenskaya, I.O., Knoblauch, J.G., Ouyang, M., Nickels, B.E., and Woychik, N.A. (2014). An RNA-seq method for defining endoribonuclease cleavage specificity identifies dual rRNA substrates for toxin MazF-mt3. Nat. Commun. 5, 3538.
Vesper, O., Amitai, S., Belitsky, M., Byrgazov, K., Kaberdina, A.C., Engelberg-Kulka, H., and Moll, I. (2011). Selective Translation of Leaderless mRNAs by Specialized Ribosomes Generated by MazF in Escherichia coli. Cell 147, 147–157.
Zhang, S., Zhou, J., Hu, H., Gong, H., Chen, L., Cheng, C., and Zeng, J. (2015). A deep learning framework for modeling structural features of RNA-binding protein targets. Nucleic Acids Res. 44, 1–14.
Zhang, Y., Zhang, J., Hoeflich, K.P., Ikura, M., Qing, G., and Inouye, M. (2003). MazF cleaves cellular mRNAs specifically at ACA to block protein synthesis in Escherichia coli. Mol. Cell 12, 913–923.