High-throughput sequencing of RNA 5'- and 3'-termini yields insights into viral and vertebrate gene expression
Author(s)
Koppstein, David N. P. (David Neal Pira)
Other Contributors
Massachusetts Institute of Technology. Department of Biology.
Advisor
David P. Bartel.
Abstract
Next-generation sequencing techniques are unparalleled in their resolution and dynamic range, but they are limited by read depletion at transcript ends. Protocols that specifically target these ends overcome this limitation and enable the study of biological phenomena that would otherwise prove refractory to RNA-Seq. Here, we use two such techniques to study heterogeneous sequences at the 5' ends of influenza transcripts and alternative polyadenylation at the 3' ends of vertebrate transcripts.

The 5' ends of influenza mRNAs include heterogeneous sequences derived from host RNAs. In a process termed cap snatching, the viral polymerase cleaves host RNAs ~10-13 nucleotides downstream of their caps and uses the resulting fragments to prime viral transcription. High-throughput 5' rapid amplification of cDNA ends yielded 54 million chimeric reads containing host-derived leaders. These sequences provided evidence for stuttering during transcription initiation and for an influence of the viral template on the extent of realignment. Accounting for realignment pointed to a common preference by the polymerase, irrespective of the viral template, and suggested that a single base pair is sufficient to prime transcription. Mapping trimmed leaders to annotated transcription start sites (TSSs) revealed that the most abundant leaders correspond to small nuclear RNAs, consistent with cap snatching of nascent transcripts.

The 3' ends of most mRNAs carry a poly(A) tail, but the site at which polyadenylation occurs can vary with cellular context. 3P-Seq specifically captures polyadenylation sites, including alternative ones, without relying on oligo(dT) priming, which can introduce artifacts. Applying 3P-Seq to eukaryotic model organisms improved their gene annotations and provided insight into targeting by microRNAs, a class of ~21-23-nucleotide RNAs that mediate mRNA destabilization.
The isoform ratios of transcripts containing miR-155 sites were predictive of the extent to which these transcripts would respond to miR-155 transfection. Conversely, knocking out miR-22 in mice specifically upregulated isoforms containing miR-22 sites, suggesting that microRNAs reciprocally affect the 3'-UTR landscape. Lastly, analysis of other datasets derived from zebrafish embryos revealed broad lengthening of 3'-UTR isoforms during development and noncanonical polyadenylation during the maternal-to-zygotic transition.
Description
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Biology, 2015. Cataloged from PDF version of thesis. Includes bibliographical references.
Date issued
2015
Department
Massachusetts Institute of Technology. Department of Biology
Publisher
Massachusetts Institute of Technology
Keywords
Biology.