MicroRNAs: Principles of Target Recognition and Developmental Roles 
by 
 
Vikram Agarwal 
 
B.S. Biology (2009) 
University of Texas at Austin 
 
 
SUBMITTED TO THE COMPUTATIONAL AND SYSTEMS BIOLOGY 
GRADUATE PROGRAM IN PARTIAL FULFILLMENT OF 
THE REQUIREMENTS FOR THE DEGREE OF 
 
DOCTOR OF PHILOSOPHY 
AT THE 
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 
 
September 2015 
 
© 2015 Vikram Agarwal. All rights reserved. 
 
The author hereby grants to MIT permission to reproduce and to distribute publicly paper 
and electronic copies of this thesis document in whole or in part in any medium now 
known or hereafter created. 
 
 
Signature of author: 
Vikram Agarwal 
Computational and Systems Biology Program 
August 28, 2015 
 
 
Certified by: 
David P. Bartel 
Professor of Biology 
Thesis Supervisor 
 
 
Accepted by: 
Christopher Burge 
Professor of Biology and Biological Engineering 
Director, Computational and Systems Biology Graduate Program 
  
MicroRNAs: Principles of Target Recognition and Developmental Roles 
 
by 
 
Vikram Agarwal 
 
Submitted to the Computational and Systems Biology Program on August 28, 2015,  
In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy 
 
Abstract 
 
MicroRNAs (miRNAs) are ~21–24 nt non-coding RNAs that mediate the degradation 
and translational repression of target mRNAs. The genomes of vertebrate organisms 
encode hundreds of miRNAs, each of which may regulate hundreds of mRNA targets. 
Thus, miRNAs are crucial post-transcriptional regulators engaged in vast regulatory 
networks. To date, the characteristics of these networks remain mysterious due to the 
difficulty of identifying miRNA targets through either experimental or computational 
means. To understand the physiological roles of miRNAs in animal species, it is of 
fundamental importance to elucidate the structure of the targeting networks in which they 
participate. 
 
The recognition of a miRNA target is guided largely by perfect Watson-Crick base 
pairing interactions between nucleotides 2–7 from the 5′ end of the miRNA (i.e., the 
“seed” region) and complementary motifs embedded in the 3′ UTRs of the target 
mRNAs. The prevalence of these motifs throughout the transcriptome poses a challenge 
to our understanding of how specificity emerges: since the presence of a motif is not 
sufficient to mediate target repression, what contextual features discriminate effective 
target sites from ineffective ones? Further complicating this is the proposition that “non-
canonical” sites lacking perfect seed pairing might mediate repression, which would 
expand the potential number of functional target sites by orders of magnitude. In the 
second chapter of this work, we define the features that predict effective miRNA target 
sites, incorporating their relative influence into a quantitative model that can 
outperform existing computational models and experimental approaches in target 
identification. 
 
Though the molecular roles of miRNAs in gene regulation have long been appreciated, 
the functions of most miRNAs in living organisms have remained elusive. In the third 
chapter of this work, we discuss the consequences of genetic ablation of miR-196, a 
deeply conserved miRNA that is predicted to simultaneously repress many HOX genes, 
in the mouse. We propose a role for miR-196 in the spatial patterning of the vertebrate 
axial skeleton. Isolating the cell populations that express the miRNA during early 
mammalian development, we attempt to characterize the direct in vivo targets of miR-196 
and dissect the molecular underpinnings of the phenotypes observed. 
 
Thesis Advisor: David P. Bartel 
Title: Professor of Biology
Acknowledgments 
 
I am indebted to my professor, David Bartel, for being an outstanding mentor and role 
model during the course of my graduate work. His level of scientific rigor, enduring 
patience, attention to detail, and ready willingness to offer his help and extensive feedback 
have made a lasting impression on me and will undoubtedly influence my style of scientific 
inquiry throughout the course of my life. 
 
I thank my thesis committee members, Phil Sharp and Chris Burge, for providing me 
extensive feedback on my work throughout the years, and for their helpful advice on 
career opportunities. I also thank Gary Ruvkun for serving as my outside committee 
member. There are also many professors at MIT who taught their courses with great 
passion, and their inspiring methods of teaching have greatly impacted my interests in 
biology and computer science. 
 
I am grateful to the graduate students and postdocs who mentored me throughout these 
years. Robin Friedman in particular was instrumental in patiently explaining the 
statistical methods in phylogenomics that he developed. I have also had countless 
discussions with David Garcia, Jin-Wu Nam, Alex Subtelny, Igor Ulitsky, Olivia 
Rissland, and Junjie Guo that have broadened the scope of my thinking and heavily 
impacted the work presented in this thesis. 
 
I thank my scientific collaborators over the years, particularly Rémy Denzler and Markus 
Stoffel, with whom I had the opportunity to explore interesting questions concerning 
physiology. Rémy has also been a great friend and I have been lucky to have great fun in 
our travels together. My work with Eddy McGlinn reignited my interests in exploring 
developmental questions, and I thank her for giving me the opportunity to work with her 
group and for helping me understand the biology and improve my communication of the 
work in presentations. 
 
The Bartel lab has been an incredible place to work and I couldn’t have asked for a more 
welcoming home. Beyond colleagues, the people in the lab have been close friends and I 
appreciate every member of the past and present. Inside the lab, they’ve made it a great 
environment to discuss ideas openly together, and outside the lab, they’ve made 
Cambridge and Boston a great town to explore together. 
 
I thank everyone in the Computational and Systems Biology (CSB) class of 2009 (Chris, 
Anna, Adrian, Zi, and Xuebing) for their continued friendship, as well as friends in the 
Microbiology (Mark, Nicole, and Chris) and Biology (Josh and Brian) programs for 
making my experiences in Cambridge tremendously enjoyable and memorable. I also 
thank Bonnielee Whang and Jacquie Carota for their support of the CSB program. 
 
Lastly, I thank my family for supporting me throughout my life and inspiring an interest 
in exploring scientific questions early on. My brother has been greatly influential in my 
work and it has been a pleasure learning from his life experiences, many of which I’ve 
paralleled in mine.
Table of Contents
Abstract
Acknowledgements
Chapter 1. Introduction
  The many layers of gene regulation
  MicroRNAs: Discovery and biological roles
  Biogenesis of microRNAs and mechanisms of targeting
  Computational approaches to microRNA target prediction
  Experimental approaches to microRNA target identification
  References
Chapter 2. Predicting effective microRNA target sites in mammalian mRNAs
  Abstract
  Introduction
  Results
    Inefficacy of recently reported non-canonical binding sites
    Confirmation that miRNAs bind to non-canonical sites despite their inefficacy
    Improving dataset quality for model development
    Selecting features and building a regression model for target prediction
    Improvement over previous methods
    Similar response of targets predicted from the model and the most informative CLIP experiments
    The TargetScan database (v7.0)
  Discussion
  Materials and Methods
    Microarray, RNA-seq, and RPF dataset processing
    Crosslinking and other interactome datasets
    Motif discovery for non-canonical binding sites
    Microarray dataset normalization
    RNA structure prediction
    Calculation of PCT scores
    Selection of mRNAs for regression modeling
    Scaling the scores of each feature
    Stepwise regression and multiple linear regression models
    Collection and processing of previous predictions
    3′-UTR profiles for TargetScan7 predictions
    MicroRNA sets for TargetScan7
    TargetScan7 predictions
  Acknowledgements
  References
  Figures and figure legends
  Tables
Chapter 3. Independent regulation of vertebral number and vertebral identity by microRNA-196 paralogs
  Abstract
  Introduction
  Results
    Differential transcription of miR-196a1 and miR-196a2 in the developing embryo
    Genetic deletion of miR-196 leads to altered vertebral identity
    Genetic deletion of miR-196 leads to an increase in vertebral number
    Transcriptome alterations are detected following allelic removal of miR-196 activity
    Hox cluster expression dynamics are altered in miR-196 mutant embryos
    Identification of additional direct targets of miR-196
    miR-196 activity is required for signaling pathways associated with axis elongation, segmentation and the trunk-to-tail transition
    miR-196 has the potential to modulate Wnt signaling by multiple mechanisms
  Discussion
    miR-196 activity is essential for vertebral identity
    miR-196 activity constrains total vertebral number
  Materials and Methods
    miR-196a1GFP and miR-196a2GFP knock-in construction
    miR-196a1–/– and miR-196a2–/– and miR-196b–/– generation
    Mouse skeletal preparation and analysis
    In situ hybridization
    FACS sorting and RNA-seq sample preparation
    RNA-seq and category enrichment analysis
    miRNA target analysis
    Permutation test for significance testing
    In vitro luciferase assay
    Chick electroporation and in vivo BatLuc reporter analysis
  Acknowledgements
  References
  Figures and figure legends
  Tables
Chapter 4. Future Directions
  Quantitative models of miRNA targeting in Drosophila
  Conservation of miRNA targeting networks among bilaterians
  References
Appendix 1. Global analysis of the effect of different cellular contexts on microRNA targeting
Appendix 2. Assessing the ceRNA hypothesis with quantitative measurements of miRNA and target abundance
Appendix 3. Expanded identification and characterization of mammalian circular RNAs
Curriculum Vitae
Chapter 1. Introduction 
 
The many layers of gene regulation 
It is a remarkable experience to marvel at the diversity of forms among the organisms 
inhabiting our planet. Plants and animals exhibit a wide range of shapes, sizes, and 
behaviors; they have adapted to most habitats, conquering the seas, lands, and skies. It is 
likely that the morphological diversity that is observed throughout life is largely a result 
of two evolutionary processes: the birth of genes and acquisition of novel gene function 
(Kaessmann, 2010; Tautz and Domazet-Loso, 2011; Carvunis et al., 2012) as well as 
gene regulatory innovation (Wray, 2007; Carroll, 2008). While gene innovation may have 
played a greater role early in evolutionary time (i.e., between 3 and 3.5 billion years ago) 
(David and Alm, 2011), organismal complexity in higher eukaryotes may have instead 
arisen from the sophisticated regulation of gene expression (Levine and Tjian, 2003). 
 The central dogma of molecular biology details the predominant mode of 
information flow in cells: genes are encoded in DNA, transcribed into messenger RNAs 
(mRNAs), and these mRNAs are translated into proteins (Crick, 1970). A large body of 
evidence suggests that every step of this process appears to be intricately regulated, and 
the cell has exploited a variety of modes of regulation to exponentiate the range of 
cellular behaviors possible with a limited set of protein-coding genes. A paradigm in 
molecular biology has become that the genome does not just passively encode genes, but 
rather that it carries a set of instructions to coordinate the expression of those genes in 
time and space (Jacob and Monod, 1961). With stunning foresight, Jacob and Monod 
postulated that proteins may recognize cis-regulatory DNA or RNA sequences and 
thereby modulate the expression or translation of an mRNA (1961). Subsequent work has 
reinforced this model of transcriptional control by unraveling the genome-wide 
architecture of protein binding events to cis-regulatory DNA elements (Ren et al., 2000; 
Harbison et al., 2004). Similarly, it has been demonstrated that cis-regulatory sequences 
within mRNA can orchestrate mRNA splicing, export from the nucleus to the cytoplasm, 
localization, translation rate, and degradation rate (Glisovic et al., 2008). 
Global measurements of transcription rate, mRNA degradation rate, translation 
rate, and protein degradation rate among mRNAs confirm that each process is amenable 
to regulation. The variability in the distributions of these rates cannot be accounted for as 
a trivial result of measurement error, and in many scenarios the precise molecular 
mechanisms explaining a proportion of the variability are known. Recent studies have 
attempted to dissect the relative contributions of each form of regulation in explaining 
steady state protein abundance. While initial estimates arrived at a conclusion that 
variability in translational regulation was the predominant force determining protein 
levels (Schwanhausser et al., 2011), revised estimates propose a predominant role of 
transcriptional regulation, with about 73% contribution, relative to an 11% contribution 
of mRNA decay, 8% contribution of translation rate, and 8% contribution of protein 
decay (Li et al., 2014). However, these estimates ignore the fact that throughout 
development, protein abundances are not at steady state, but rather change dynamically 
with time in response to environmental and cellular signals. So far, it appears that 
changes in mRNA levels (i.e., a combination of mRNA synthesis and degradation rates) 
also explain ~90% of protein fold changes in a dynamic response to an environmental cue, 
although protein translation and degradation rates together explain ~60% of absolute 
protein changes in this context (Jovanovic et al., 2015). Taken together, these studies 
highlight the crucial importance of understanding each node of gene regulation in order 
to acquire a comprehensive portrait of the gene regulatory networks that govern cellular 
behavior and organismal development. 
 
MicroRNAs: Discovery and biological roles 
MicroRNAs (miRNAs) are ~21–24 nt non-coding RNAs involved in post-transcriptional 
gene regulation (Bartel, 2004). The first known miRNA was discovered as a non-protein-
coding product of the gene lin-4, a regulator of developmental timing in C. elegans (Lee 
et al., 1993). Interestingly, it was found that the miRNA possesses a sequence with 
antisense complementarity to multiple stretches of nucleotides in the lin-14 3′ UTR (Lee 
et al., 1993; Wightman et al., 1993), a region known to govern the regulation of LIN-14 
protein production and consequently the timing of larval development (Wightman et al., 
1991). This observation strongly implied that miRNAs could associate directly with target 
mRNAs and repress the production of the encoded protein. It soon became clear that this 
phenomenon was not a peculiarity of the worm, but rather that there also exist other 
miRNAs such as let-7 (Reinhart et al., 2000) that are deeply conserved across animal life 
(Pasquinelli et al., 2000). Moreover, these two miRNA genes were not sporadic 
examples, but rather comprise an abundant and diverse class of small regulatory RNAs 
(Lagos-Quintana et al., 2001; Lau et al., 2001; Lee and Ambros, 2001), prevalent across 
multiple kingdoms of eukaryotic life, including both plants and animals (Reinhart et al., 
2002; Lim et al., 2003). 
The miRNAs of these species repress a diverse suite of targets, although the 
mechanisms of targeting differ between plants and animals. While plants require 
extensive complementarity with most of the miRNA (Rhoades et al., 2002), pairing 
between a 7 nt region in the miRNA and a complementary motif in the target mRNA is 
necessary to mediate target repression in animals (Doench and Sharp, 2004) and 
sufficient to predict conserved mRNA targets above the noise of false-positive 
predictions (Brennecke et al., 2005; Krek et al., 2005; Lewis et al., 2005). Animal 
miRNAs can thereby be broadly classified into miRNA families depending on the 
identity of their 7 nt region sequence, as different miRNAs sharing this sequence tend to 
share a similar repertoire of targets (Lewis et al., 2005; Anderson et al., 2008). 
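
The family classification described above can be illustrated with a minimal sketch. It assumes that the 7 nt region corresponds to nucleotides 2–8 counted from the miRNA 5′ end; the sequences, the miRNA names, and the function names are hypothetical and serve only as an illustration, not as the procedure used in the cited studies.

```python
from collections import defaultdict

def seed_region(mirna, start=2, end=8):
    """Return nucleotides start..end (1-based, inclusive) from the miRNA 5' end.

    The default of positions 2-8 is an assumption made for this sketch.
    """
    return mirna[start - 1:end]

def group_into_families(mirnas):
    """Group miRNA names by their shared 7 nt region."""
    families = defaultdict(list)
    for name, sequence in mirnas.items():
        families[seed_region(sequence)].append(name)
    return dict(families)

# Hypothetical toy sequences; the names do not refer to annotated miRNAs.
toy_mirnas = {
    "toy-miR-1a": "UGAGGUAGUAGGUUGUAUAGUU",
    "toy-miR-1b": "UGAGGUAGUAGGUUGUGUGGUU",
    "toy-miR-2": "UAGGUAGUUUCAUGUUGUUGGG",
}

families = group_into_families(toy_mirnas)
for region, members in families.items():
    print(region, members)
```

In this toy set, the first two sequences differ near their 3′ ends but share the same 7 nt region, so they fall into one family, whereas the third forms a family of its own.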
An interesting property that arose from the widespread characterization of 
miRNAs in many animal lineages was the appreciation that many of them have ancient 
origins and are deeply conserved across the animal phylogeny (Grimson et al., 2008; 
Wheeler et al., 2009). Animal miRNAs appear to have evolved concomitantly with the 
beginnings of multicellularity, as they cannot be detected in choanoflagellates, 
single-celled organisms considered to be an outgroup to the metazoans (Grimson et al., 2008). 
Following the classification of animal miRNAs into families, it was soon realized that 
many miRNA families have persisted for ~580–670 million years since the emergence of 
bilaterian and metazoan life (Figure 1). Even more miRNA families have emerged among 
the vertebrate and invertebrate clades more recently in evolutionary time (Figure 1), with 
thousands of additional species-specific miRNAs being continually annotated in 
databases (Griffiths-Jones et al., 2008; Kozomara and Griffiths-Jones, 2014). 
Given the preponderance of miRNAs among animal species and their deep 
conservation, it is only natural to wonder about their biological functions. The generation 
of miRNA knockouts in animals has provided a powerful framework by which to 
evaluate the in vivo functions of individual miRNAs. While lin-4 and let-7 were 
discovered to play roles in cell-fate decisions during early C. elegans larval development 
(Lee et al., 1993; Wightman et al., 1993; Reinhart et al., 2000), little was known about 
the functions of other worm miRNAs. Through the genetic dissection of components 
governing the left-right asymmetry of chemoreception, the lsy-6 miRNA was discovered 
to repress cog-1, a transcription factor that mediates this cell-fate decision in the worm 
(Johnston and Hobert, 2003). Strikingly, a systematic knockout of ~90 additional worm 
miRNAs revealed that most miRNAs and their families are essential neither for 
development nor viability (Miska et al., 2007; Alvarez-Saavedra and Horvitz, 2010), 
making it challenging to address why they have been conserved. Potentially explaining 
this is the finding that many phenotypes can be observed when such knockouts are 
instead profiled in genetically sensitized backgrounds (Brenner et al., 2010). 
Figure 1. Deep conservation of miRNA families across animal life. Phylogeny of 
animal life, with single-celled choanoflagellates serving as an outgroup species. 
Each number on a node of the tree represents the number of shared miRNA families 
that likely existed in the common ancestor of all of the extant species branching from 
it. Numbers are derived from the latest annotation of conserved miRNA families 
released at targetscan.org.  
 
Furthermore, a large-scale miRNA knockout study in the fly revealed that nearly 80% of 
miRNAs exhibit a phenotype, often related to survival and lifespan (Chen et al., 2014). 
Parallel work has revealed functions for miRNAs in vertebrate species. 
MicroRNAs are collectively crucial for early vertebrate development. Losing the ability 
to produce miRNAs in the mouse results in severe abnormalities during day 7.5 of 
embryonic development (E7.5), ultimately resulting in lethality (Bernstein et al., 2003). 
Similarly, in early zebrafish development, loss of miRNAs compromises brain 
morphogenesis (Giraldez et al., 2005), potentially due to the role of miR-430 in the 
timely clearance of maternally deposited mRNA, a process that is crucial during the 
maternal-to-zygotic transition (Giraldez et al., 2006). MicroRNAs also have diverse 
physiological roles in mammals, impacting limb and axial skeletal development [e.g., 
miR-196 (Hornstein et al., 2005; McGlinn et al., 2009)], muscle development and cardiac 
function [e.g., miR-1 (Zhao et al., 2007) and miR-208 (van Rooij et al., 2007)], immune 
system T cell and B cell development [e.g., miR-150 (Xiao et al., 2007) and miR-155 
(Rodriguez et al., 2007; Thai et al., 2007)], immune system granulocyte differentiation 
[e.g., miR-223 (Johnnidis et al., 2008)], the control of the cell cycle and cancer [e.g., miR-
17~92 cluster (Ventura et al., 2008)], and skeletal system osteoclast growth [e.g., miR-34 
(Krzeszinski et al., 2014)]. Mirroring the initial findings of the worm, a systematic 
knockout of ~50 conserved miRNAs in the mouse identified few miRNAs that impact 
viability (Park et al., 2012), potentially due to functional redundancy among different 
members of the same miRNA family. 
Given the deep conservation of miRNAs and the complex spatiotemporal 
expression dynamics that they exhibit during animal development, it is counterintuitive to 
observe that the loss of many individual miRNAs generates only subtle phenotypes. 
Collectively, these findings have led many to suggest that most animal miRNAs may 
have evolved to tune the expression of their targets (Bartel and Chen, 2004), 
preferentially targeting lowly abundant mRNAs (Farh et al., 2005) to reduce their 
expression noise (i.e., in conjunction with increased transcription), thereby enhancing the 
precision of protein output during development (Schmiedel et al., 2015). 
 
Biogenesis of microRNAs and mechanisms of targeting 
Encoded from genes, animal miRNAs arise as the final consequence of a multistage 
biogenesis pathway (Figure 2). Like mRNAs, most miRNAs are transcribed by RNA 
Polymerase II, as constituents of precursors termed primary miRNAs (or “pri-miRNAs”) 
(Lee et al., 2004). A unique property of pri-miRNAs is that they encode a ~60–70 nt 
region that folds into a hairpin RNA secondary structure, which is found in either introns 
of protein-coding genes or the exons or introns of non-coding RNA transcripts. The 
hairpin is recognized by the RNase III enzyme Drosha (Lee et al., 2003), which along 
with the co-factor DGCR8 (Denli et al., 2004; Gregory et al., 2004), cleaves the pri-
miRNA about 11 nt from the base of the hairpin, thus liberating a miRNA precursor (or 
“pre-miRNA”). While many hairpins exist in the genome, those recognized by Drosha 
tend to have distinguishing structural features (Lim et al., 2003), and frequently possess 
primary sequence motifs, such as a CNNC motif downstream of the basal stem, a UG at 
the base of the stem, and a UGUG motif in the apical loop (Auyeung et al., 2013). After 
the pre-miRNA is exported to the cytoplasm by Exportin-5 (Yi et al., 2003; Lund et al., 
2004), the RNase III enzyme Dicer cleaves it to yield a double-stranded duplex (Grishok 
et al., 2001; Hutvagner et al., 2001; Ketting et al., 2001; Knight and Bass, 2001). Finally, 
one strand of this duplex, termed the mature miRNA, is loaded into Argonaute based 
upon the thermodynamic asymmetry of the duplex (Khvorova et al., 2003; Schwarz et al., 
2003). 
Argonaute (Ago) proteins are the effectors of miRNA-mediated repression and 
are central to the mechanism of target recognition (Grishok et al., 2001; Meister et al., 
2004; Vaucheret et al., 2004). It is within the context of this ribonucleoprotein complex 
that Argonaute provides a molecular scaffold for the miRNA to nucleate pairing to a 
target RNA through its 5′ end (Schirle et al., 2014). It is common for plant miRNAs to 
have near-perfect complementarity to their targets, resulting in the Ago-mediated 
cleavage and ultimate degradation of the target (Rhoades et al., 2002). Although this 
mechanism also exists in animals, in practice it is rare among most species, with HOXB8 
being one of the few known cleavage targets of an animal miRNA (Yekta et al., 2004). 

Figure 2. Biogenesis of animal miRNAs and targeting mechanisms. 
Transcriptional activity within the genome gives rise to a hairpin-forming primary 
transcript. Drosha recognizes and cleaves this substrate, which is exported to the 
cytosol and cleaved by Dicer into an RNA duplex. One strand of this duplex is loaded 
into Argonaute, and upon recognition of a target mRNA through interactions in the 
miRNA seed region this mature miRNA is competent to modestly repress translation 
(grey lines) or destabilize the target mRNA (black lines). As an alternate mechanism, 
if the miRNA pairs very extensively to a target, it can mediate mRNA cleavage. 
 
Instead, the recognition of animal miRNA targets is thought to be guided predominately 
by perfect Watson-Crick base pairing interactions between nucleotides 2–7 from the 5′ 
end of the miRNA (i.e., the “seed” region) and complementary motifs embedded in the 3′ 
UTRs of the target mRNAs (Lewis et al., 2003; Doench and Sharp, 2004; Brennecke et 
al., 2005; Lewis et al., 2005; Lim et al., 2005; Bartel, 2009). Numerous studies have 
reported that functional regions that pair to the seed (i.e., seed matches) are enriched in 
the 3′ UTRs of transcripts relative to the 5′ UTRs and ORFs (Lewis et al., 2005; Grimson 
et al., 2007; Baek et al., 2008). This effect has been attributed to the fact that 5′ 
UTR and ORF sites lie in the path of scanning and translating ribosomes, 
respectively (Grimson et al., 2007; Gu et al., 2009). Rather than directing cleavage, Ago 
binding to a seed match results in the recruitment of the CCR4-NOT deadenylase 
complex through an intermediate scaffold protein known as GW182 (Behm-Ansmant et 
al., 2006; Eulalio et al., 2008; Braun et al., 2011; Chekulaeva et al., 2011; Fabian et al., 
2011). Although deadenylation leads to a brief period of translational repression (Bazzini 
et al., 2012; Eichhorn et al., 2014), the predominant effect of a miRNA is to orchestrate 
the degradation of a target mRNA (Baek et al., 2008; Guo et al., 2010; Eichhorn et al., 
2014). 
 
 
Computational approaches to microRNA target prediction 
Many of the principles of miRNA target recognition were discovered through 
computational means, either through evolutionary analyses investigating the signals of 
selection, or through analyses of miRNA perturbation datasets to uncover determinants of 
targeting. Although the high complementarity of plant miRNA targets made it 
straightforward to derive simple rules to predict such targets (Rhoades et al., 2002; Jones-
Rhoades and Bartel, 2004; Allen et al., 2005), it was quickly realized that the prediction 
of animal miRNA targets was more challenging. An analysis of preferentially conserved 
miRNA-pairing motifs among three mammalian genomes revealed a signature of 
enriched pairing to the miRNA 5′ end relative to the sequences of shuffled miRNAs 
(Lewis et al., 2003). Soon thereafter, it was realized that animal miRNAs recognize 
several classes of target sites (also known as “miRNA recognition elements”) that 
typically range from 6–8 nt in length (Figure 3A). These are called “canonical site types” 
because they each maintain perfect Watson–Crick pairing to the seed region of the 
miRNA (Bartel, 2009). The five canonical site types, each having a signature of 
conservation among vertebrate genomes, are the 8mer site [match to miRNA positions 2–
8 with an A opposite position 1 (Lewis et al., 2005)], 7mer-m8 site [position 2–8 match 
(Lewis et al., 2003; Brennecke et al., 2005; Krek et al., 2005; Lewis et al., 2005)], 7mer-
A1 site [position 2–7 match with an A opposite position 1 (Lewis et al., 2005)], 6mer site 
[position 2–7 match (Lewis et al., 2005)], and offset 6mer site [position 3–8 match 
(Friedman et al., 2009)]. It was discovered that the preference for the conservation of an 
adenosine opposite position 1 is independent of the miRNA nucleotide identity (Lewis et 
al., 2005). Collectively, these rules of pairing have been among the most sensitive signals 
in detecting animal miRNA targets, and many algorithms search for canonical sites in 3′ 
UTRs as an initial step towards the identification of miRNA targets (Lewis et al., 2003; 
Lewis et al., 2005; Gaidatzis et al., 2007; Grimson et al., 2007; Nielsen et al., 2007; 
Wang and El Naqa, 2008; Garcia et al., 2011; Anders et al., 2012; Reczko et al., 2012). 
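
The five canonical site types defined above can be turned into a simple sequence scan. The following is a minimal sketch, not the procedure of any of the cited algorithms: the function names (`reverse_complement`, `canonical_sites`) are hypothetical, both sequences are assumed to be written 5′ to 3′ as RNA, and each match is labeled with the most specific site type it satisfies.

```python
# Illustrative sketch only: names and structure are hypothetical,
# not taken from any published target-prediction tool.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A/C/G/U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def canonical_sites(mirna, utr):
    """List (start, site_type) for canonical sites of `mirna` in `utr`.

    Both sequences are written 5'->3'; `start` is a 0-based index into `utr`.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    seed_match = reverse_complement(mirna[1:7])    # pairs to miRNA positions 2-7
    m8 = COMPLEMENT[mirna[7]]                      # pairs to miRNA position 8
    offset_match = reverse_complement(mirna[2:8])  # pairs to miRNA positions 3-8
    sites = []

    # Every position 2-7 match is a 6mer, 7mer-A1, 7mer-m8, or 8mer,
    # depending on the nucleotides that flank it in the UTR.
    start = utr.find(seed_match)
    while start != -1:
        end = start + 6
        has_m8 = start > 0 and utr[start - 1] == m8
        has_a1 = end < len(utr) and utr[end] == "A"
        if has_m8 and has_a1:
            sites.append((start - 1, "8mer"))
        elif has_m8:
            sites.append((start - 1, "7mer-m8"))
        elif has_a1:
            sites.append((start, "7mer-A1"))
        else:
            sites.append((start, "6mer"))
        start = utr.find(seed_match, start + 1)

    # Offset 6mers match positions 3-8 but not position 2; matches that
    # also pair to position 2 were already reported as 7mer-m8 or 8mer.
    start = utr.find(offset_match)
    while start != -1:
        end = start + 6
        pairs_position_2 = end < len(utr) and utr[end] == COMPLEMENT[mirna[1]]
        if not pairs_position_2:
            sites.append((start, "offset 6mer"))
        start = utr.find(offset_match, start + 1)

    return sorted(sites)


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"  # let-7a, written 5'->3'
print(canonical_sites(LET7A, "GGGCUACCUCAGGG"))  # one 8mer starting at index 3
```

Reporting only the most specific label per match keeps an 8mer from also being counted as a 7mer-m8, a 7mer-A1, and a 6mer, which is a design choice of this sketch rather than a requirement of the site definitions.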
Despite extensive efforts, other site types have not been identified that exhibit a 
genome-wide signal for preferential conservation, including those possessing only a 
single mismatch or G:U wobble position to the seed region (Friedman et al., 2009). 
However, these findings do not preclude the possibility that such functional binding sites 
exist, or even that some are truly conserved. Indeed, there are a few confirmed instances 
in which effective sites have been observed to lack canonical seed pairing (called 
“non-canonical” sites, Figure 3B). For example, very extensive pairing to the 3′ region of 
the miRNA can compensate for a wobble or mismatch to one of the seed positions 
(Brennecke et al., 2005; Bartel, 2009), as exemplified by the two let-7 sites within the 3′ 
UTR of C. elegans lin-41 (Reinhart et al., 2000). These 3′-supplementary sites are 
exceedingly rare, with conserved miRNA families in mammals and nematodes each 
averaging <1 preferentially conserved 3′-supplementary site (Friedman et al., 2009; Jan et 
al., 2011). Other relatively rare, yet effective sites include centered sites, which have 11–12 
contiguous Watson–Crick pairs to the center of the miRNA (Shin et al., 2010). Many 
computational techniques have attempted to identify additional non-canonical sites 
(Miranda et al., 2006; Kertesz et al., 2007; Griffiths-Jones et al., 2008; Betel et al., 2010; 
Liu et al., 2010; Sturm et al., 2010; Wen et al., 2011; Vejnar and Zdobnov, 2012; Marin 
et al., 2013; Bandyopadhyay et al., 2015; Gumienny and Zavolan, 2015), though the 
utility of these predictions remains unclear given that these sites show no evidence for 
preferential conservation. 

Figure 3. Site types recognized by a miRNA. A) Five canonical site types, often located in 
3′ UTRs, which can be recognized by miRNAs. The sites pair perfectly to the miRNA seed 
region through Watson-Crick base pairing interactions (vertical black lines), aside from an 
unpaired adenosine that is favored in 8mer or 7mer-A1 sites. B) Two non-canonical site 
types, characterized by a mismatch or bulge in the seed–target interface, which can be 
recognized by miRNAs. 

The length and information content of the motifs that miRNAs recognize 
influence the frequency of finding such motifs in genomic sequences. Because plant 
miRNAs require extensive complementarity to repress targets, they tend to have a small 
number of targets, which are often important developmental regulators such as 
transcription factors and hormone signaling proteins (Rhoades et al., 2002; Jones-
Rhoades and Bartel, 2004; Allen et al., 2005). In contrast, the small size of animal 
miRNA target sites endows them with the property that they occur frequently in 3′ UTRs, 
which opens the possibility that the network of miRNA targets is much larger in animals. 
One method of assessing the scope of miRNA targeting in animals has been to quantify 
the signal for enrichment of predicted miRNA target sites relative to control k-mer 
sequences with the same length and similar nucleotide composition. These estimates have 
evolved with time depending upon: i) the availability of sequenced genomes for 
comparative analysis among species, ii) the quality of genome-wide multiple sequence 
alignments, and iii) the sophistication of evolutionary genomic techniques to detect 
signals for selection. 
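
The logic of comparing a site to control k-mers can be illustrated with a toy signal-to-background calculation. This is a deliberately crude sketch under stated assumptions and not the method of the cited studies: controls are taken to be all rearrangements of the site (identical length and nucleotide composition), a site counts as "conserved" when it appears anywhere in the orthologous UTR of every species (no alignment, no phylogeny), and all function names are hypothetical.

```python
from itertools import permutations
from statistics import mean


def composition_matched_controls(kmer):
    """All distinct rearrangements of `kmer`, excluding `kmer` itself."""
    return sorted({"".join(p) for p in permutations(kmer)} - {kmer})


def count_conserved(kmer, orthologous_utrs):
    """Number of genes whose UTR contains `kmer` in every species.

    `orthologous_utrs` holds one tuple of UTR sequences per gene, with
    one sequence per species (a toy stand-in for an alignment).
    """
    return sum(all(kmer in seq for seq in orthologs)
               for orthologs in orthologous_utrs)


def signal_to_background(site, orthologous_utrs):
    """Return (signal, background, excess, ratio) for `site`."""
    controls = composition_matched_controls(site)
    if not controls:
        raise ValueError("site has no composition-matched controls")
    signal = count_conserved(site, orthologous_utrs)
    background = mean(count_conserved(c, orthologous_utrs) for c in controls)
    excess = signal - background  # sites conserved above chance expectation
    ratio = signal / background if background > 0 else float("inf")
    return signal, background, excess, ratio
```

In this framing, the excess (signal minus background) plays the role of the estimated number of sites conserved above chance, which is the kind of quantity that the estimates discussed next attempt to measure with far more careful controls.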
The first attempts to estimate this number suggested that miRNAs conserved 
among vertebrates target at least 400 mRNAs, or 1–2% of mRNAs (Lewis et al., 2003). 
As more mammalian genomes became available, this estimate expanded to 20–30% of 
mRNAs (Lewis et al., 2005; Xie et al., 2005). Finally, a method that accounted for the 
relatedness of species among a phylogeny and controlled for both dinucleotide and 3′ 
UTR conservation rates significantly expanded this estimate, implicating >60% of 
mRNAs as having undergone selective pressure to maintain pairing to miRNAs 
(Friedman et al., 2009). This finding illustrates the widespread connectivity of the 
miRNA targeting network (Figure 4), with >400 conserved targeting interactions on 
average per conserved miRNA family, and 4–5 conserved sites on average per mRNA 
(Friedman et al., 2009). In reality, the number of functional miRNA target sites is likely 
much higher as most sites are non-conserved yet can still function to reduce mRNA 
levels and protein output (Farh et al., 2005; Krutzfeldt et al., 2005; Lim et al., 2005; 
Grimson et al., 2007; Baek et al., 2008; Selbach et al., 2008). 

Figure 4. Widespread connectivity of miRNAs in gene-regulatory networks. Graph of the 
proposed connectivity structure of a typical vertebrate-conserved miRNA in its network 
(above). The length of each edge emerging from the miRNA represents the amount of 
repression conferred upon that individual target. Most edges are short, indicating a large 
number of targets are weakly repressed. View from the perspective of a conserved mRNA 
(below). On average, each mRNA has 4–5 conserved sites in its 3′ UTR for different 
miRNAs, and more non-conserved sites (not shown). 

Despite the immensity of the miRNA–target regulatory network, the vast majority 
of target sites confer little to no repression (Figure 4). This implies that the mere presence 
of a target site is not always sufficient to mediate repression, and that other determinants 
can influence site efficacy. Over the years, computational work re-analyzing 
transcriptome-wide data in the context of a miRNA perturbation has revealed a number of 
such determinants. The earliest determinants were discovered as features that simply 
display a correlation to increased site efficacy, and could thereby be utilized to generate 
predictive models of target site efficacy. Factors that have been shown to 
influence site efficacy include A/U composition in the site’s 3′ UTR (Robins and Press, 
2005; Hausser et al., 2009), site conservation (Nielsen et al., 2007; Friedman et al., 2009), 
A/U composition in vicinity of the target site (Grimson et al., 2007; Nielsen et al., 2007), 
proximity of the site to the stop codon or poly(A) tail (Grimson et al., 2007), 3′ UTR 
length (Hausser et al., 2009), target sites in the ORF (Grimson et al., 2007; Reczko et al., 
2012), RNA secondary structure in vicinity of the target site (Kertesz et al., 2007), 
thermodynamic stability of base-pairing (Garcia et al., 2011), and target site abundance in 
the transcriptome (Arvey et al., 2010; Garcia et al., 2011). The very best targets of a 
miRNA often have multiple 3′ UTR binding sites, as these sites typically behave either 
independently (Grimson et al., 2007; Nielsen et al., 2007) or cooperatively (Grimson et 
al., 2007) depending upon their distance from each other. 
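
As an illustration of how one such determinant might be computed, the sketch below scores the A/U composition in the vicinity of a target site. It is a simplified, unweighted version written for this example: the function name `local_au_content` and the default window of 30 nt on each side are assumptions, and the cited studies may define and weight this feature differently.

```python
def local_au_content(utr, site_start, site_length, flank=30):
    """Fraction of A/U nucleotides flanking a site (unweighted sketch).

    `site_start` is the 0-based index of the site in `utr`; the site
    itself is excluded, and windows are truncated at the UTR ends.
    Returns None if there is no flanking sequence at all.
    """
    utr = utr.upper().replace("T", "U")
    site_end = site_start + site_length
    upstream = utr[max(0, site_start - flank):site_start]
    downstream = utr[site_end:site_end + flank]
    window = upstream + downstream
    if not window:
        return None
    return sum(base in "AU" for base in window) / len(window)


# An 8mer site at index 4, scored with a 4-nt window on each side.
print(local_au_content("AAAA" + "CUACCUCA" + "AAGC", 4, 8, flank=4))  # 0.75
```

A feature of this kind would typically be one of several inputs to a predictive model, alongside site type and the other determinants listed above.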
In building quantitative models of miRNA target prediction, different groups have 
each evaluated only a subset of these features. Early work trained the parameters of a 
regression model on experimental data after using hand-selected features (Grimson et al., 
2007; Kertesz et al., 2007; Nielsen et al., 2007; Garcia et al., 2011). In principle, a better 
approach would be to automate the selection of features using techniques from machine 
learning, which would avoid the potential pitfalls of having preconceptions of which 
features are useful. Many algorithms have attempted this (Wang and El Naqa, 2008; Betel 
et al., 2010; Liu et al., 2010; Wen et al., 2011; Reczko et al., 2012; Vejnar and Zdobnov, 
2012), but in practice their empirical performance remains unclear because there have 
been few comprehensive comparisons to evaluate their predictive accuracy. Furthermore, 
the quality of such a model depends heavily upon the nature of the training set used. The 
prediction of miRNA targets is crucial in assessing our understanding of the features 
influencing miRNA targeting, generating predictions of targets that the experimental 
community can prioritize as candidates of interest, and interpreting the functions of 
individual miRNAs in the context of the gene-regulatory networks to which they belong. 
 
 
Experimental approaches to microRNA target identification 
In the quest to characterize miRNA targets, experimentation has proven crucial for both 
assessing site efficacy and for directly probing miRNA–target interactions. The earliest 
experiments were low-throughput, validating the effects of single miRNA–target 
interactions using luciferase reporter assays (Doench and Sharp, 2004). In such an 
experiment, a miRNA was transfected into cultured cells and relative luciferase activity 
was measured for reporters fused to a 3′ UTR harboring a wild type or mutated target site 
for the miRNA. To parallelize such measurements and quantify the effects of a miRNA 
on endogenous genes, improved high-throughput methods were developed (Figure 5A), 
using microarrays to assess the effects of a transfected miRNA on the entire 
transcriptome (Lim et al., 2005). This approach, which obtains global mRNA fold change 
information in the context of a miRNA transfection, has provided a valuable resource for 
inquiry into the determinants that influence mRNA repression (Lim et al., 2005; 
Birmingham et al., 2006; Jackson et al., 2006a; Jackson et al., 2006b; Schwarz et al., 
2006; Grimson et al., 2007; Linsley et al., 2007; Anderson et al., 2008). It has uncovered 
the key principle that the site type is the most important determinant in predicting the 
amount of repression an mRNA will experience (Grimson et al., 2007; Nielsen et al., 
2007) (Figure 5B). It has also confirmed that miRNAs can regulate the expression levels 
of hundreds of mRNAs simultaneously (Lim et al., 2005), reinforcing the enormity of 
animal miRNA–target regulatory networks (Figure 4). 
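
The comparison of fold-change distributions described above can be sketched as an empirical cumulative distribution. This is a minimal illustration in which the function names are hypothetical and the log2 fold-change values are invented for demonstration; they are not data from any cited experiment.

```python
from bisect import bisect_right
from statistics import median


def cumulative_fraction(fold_changes, threshold):
    """Proportion of mRNAs with fold change <= threshold."""
    ordered = sorted(fold_changes)
    return bisect_right(ordered, threshold) / len(ordered)


def median_shift(with_site, without_site):
    """Median fold change of site-containing mRNAs minus that of controls.

    A negative value indicates a left-shifted (repressed) distribution.
    """
    return median(with_site) - median(without_site)


# Invented log2 fold changes (miRNA transfection vs. mock), for illustration only.
no_site = [-0.2, -0.1, 0.0, 0.1, 0.2]
with_8mer = [-0.9, -0.6, -0.4, -0.1, 0.1]

print(cumulative_fraction(no_site, -0.3))    # 0.0
print(cumulative_fraction(with_8mer, -0.3))  # 0.6
print(median_shift(with_8mer, no_site))      # -0.4
```

Evaluating `cumulative_fraction` over a grid of thresholds yields one curve per group of mRNAs; a curve lying to the left of the no-site curve reflects down-regulation of that group.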

Figure 5. Measuring the effect of miRNAs on the transcriptome. A) Outline of a typical 
miRNA transfection experiment performed in mammalian cell culture, in which relative mRNA 
expression levels are measured using microarrays, and compared to each other in miRNA 
transfection (the experimental group) vs mock transfection (the control group) conditions. B) A 
plot of cumulative distributions of mRNA fold changes from the experiment devised in part (A) 
can be generated, comparing mRNAs lacking a canonical site in their 3′ UTR (black line) to 
those possessing a single instance of the indicated canonical site type in their 3′ UTR (colored 
lines). Each point of the plot represents the proportion of mRNAs with fold changes less than or 
equal to the corresponding fold change value on the x-axis. The distribution of fold changes is 
left-shifted in mRNAs possessing sites, indicating a global pattern of down-regulation of these 
mRNAs. Furthermore, the magnitude of this shift is indicative of relative strength of the site 
type. Figure 5B is reproduced from Friedman et al. (2009). 

The crosslinking and immunoprecipitation (CLIP) approach has emerged as a 
powerful means of interrogating RNA–protein interactions (Ule et al., 2003). Such an 
approach depends upon the property that ultraviolet light can induce covalent crosslinks 
between amino acids and nucleic acids within short distances. Immunoprecipitation of the 
RNA-binding protein of interest following this crosslinking step can help isolate the 
fragments of RNA that it interacts with (Ule et al., 2003). CLIP combined with high-
throughput sequencing [i.e. “HITS-CLIP” (Chi et al., 2009; Loeb et al., 2012)] and 
photoactivatable-ribonucleoside-enhanced variants of the technique [i.e. “PAR-CLIP” 
(Hafner et al., 2010; Lipchina et al., 2011)] have thus become important orthogonal 
approaches to identify regions of RNA bound by Argonaute in vivo. These approaches all 
observe significant enrichment for seed-matched sites that are cognate to highly abundant 
miRNAs in the vicinity of the crosslinks, validating their ability to detect authentic sites. 
However, they also suffer from the possibility of identifying many false positives, due in 
part to non-specificity of the IP (Friedersdorf and Keene, 2014), cross-linking bias 
(Lambert et al., 2014), and the difficulty of controlling for spurious background signals 
arising from highly abundant mRNAs (Jaskiewicz et al., 2012). The interpretation of 
CLIP datasets is further complicated by the fact that cells express a diversity of 
miRNAs, and information regarding which footprint corresponds to which miRNA has 
been lost. Therefore, a great deal of effort has been devoted to inferring the specific miRNA 
associated with each region sequenced (Chi et al., 2009; Hafner et al., 2010; Kishore et 
al., 2011; Jaskiewicz et al., 2012; Khorshid et al., 2013; Majoros et al., 2013). In attempts 
to circumvent these problems, other biochemical strategies have been devised. One such 
technique is called IMPACT-seq (identification of miRNA-responsive elements by pull-
down and alignment of captive transcripts—sequencing), which sequences mRNA 
fragments that co-purify with a biotinylated miRNA without the need for crosslinking 
(Tan et al., 2014). Another is called CLASH (crosslinking, ligation, and sequencing of 
hybrids), a high-throughput technique that generates miRNA–mRNA chimeras, which 
each identify a miRNA and the mRNA region that it binds (Helwak et al., 2013). A re-
analysis found that many miRNA–mRNA chimeras also exist in Ago CLIP datasets, 
likely due to the activity of an endogenous RNA ligase (Grosswendt et al., 2014). 
Although chimeras unambiguously identify miRNA–mRNA interactions, they too have 
limitations in that: i) chimeras are rare and comprise only a small subpopulation of the 
sequencing data, and ii) certain types of interactions may be favored in the miRNA–
mRNA ligation over others, giving a potentially biased representation of targeting 
interactions (Helwak et al., 2013; Grosswendt et al., 2014). 
Many of the RNA–protein interactions recovered from CLIP do not contain 
canonical sites that are cognate to any miRNA known to be expressed in the 
corresponding cell line (Hafner et al., 2010; Chi et al., 2012). This observation has led to 
the proposition that novel types of non-canonical sites might explain the detection of 
these “orphan clusters”. The first novel non-canonical site to be discovered as enriched in 
CLIP data was the “nucleation-bulge” site, which is characterized by a single-nucleotide 
bulge in the target site between nucleotides 6–7 of the miRNA (Chi et al., 2012). Another 
study identified ~30 non-canonical miR-155 sites—each with heterogeneous styles of 
pairing in the seed region—in wild-type but not miR-155 knockout T cells (Loeb et al., 
2012). Data derived from CLASH further extended the types of non-canonical sites, 
implicating sites with stronger pairing in the center or 3′ end of a miRNA as governing 
binding (Helwak et al., 2013). Chimeras detected in CLIP, in contrast, have suggested 
that a variety of non-canonical sites exist in worms and mammals, although these sites 
tend to maintain pairing to the seed region of the miRNA (Grosswendt et al., 2014). 
Finally, methods not relying on crosslinking have proposed that weak pairing in the 5′ or 
3′ end of the miRNA is sufficient for binding (Tan et al., 2014). Because each study 
converges on different styles of non-canonical pairing, there does not appear to exist a 
unifying theme to explain the types of non-canonical sites that have been observed. The 
studies only agree that novel non-canonical sites that can mediate mRNA repression are 
more widespread than previously imagined. They all propose to expand the definition of 
functional target sites to incorporate non-canonical sites, a move poised to at least double 
the number of functional sites. Collectively, experiments focused on collecting 
transcriptome-wide data have provided foundational information for unraveling the 
structure of miRNA regulatory networks.  
References 
Allen, E., Xie, Z., Gustafson, A.M., and Carrington, J.C. (2005). microRNA-directed 
phasing during trans-acting siRNA biogenesis in plants. Cell 121, 207-221. 
Alvarez-Saavedra, E., and Horvitz, H.R. (2010). Many Families of C. elegans 
MicroRNAs Are Not Essential for Development or Viability. Current Biology 20, 
367-373. 
Anders, G., Mackowiak, S.D., Jens, M., Maaskola, J., Kuntzagk, A., Rajewsky, N., 
Landthaler, M., and Dieterich, C. (2012). doRiNA: a database of RNA 
interactions in post-transcriptional regulation. Nucleic Acids Res 40, D180-D186. 
Anderson, E.M., Birmingham, A., Baskerville, S., Reynolds, A., Maksimova, E., Leake, 
D., Fedorov, Y., Karpilow, J., and Khvorova, A. (2008). Experimental validation 
of the importance of seed complement frequency to siRNA specificity. RNA 14, 
853-861. 
Arvey, A., Larsson, E., Sander, C., Leslie, C.S., and Marks, D.S. (2010). Target mRNA 
abundance dilutes microRNA and siRNA activity. Mol Syst Biol 6, 363. 
Auyeung, V.C., Ulitsky, I., McGeary, S.E., and Bartel, D.P. (2013). Beyond secondary 
structure: primary-sequence determinants license pri-miRNA hairpins for 
processing. Cell 152, 844-858. 
Baek, D., Villen, J., Shin, C., Camargo, F.D., Gygi, S.P., and Bartel, D.P. (2008). The 
impact of microRNAs on protein output. Nature 455, 64-71. 
Bandyopadhyay, S., Ghosh, D., Mitra, R., and Zhao, Z. (2015). MBSTAR: multiple 
instance learning for predicting specific functional binding sites in microRNA 
targets. Sci Rep 5, 8004. 
Bartel, D.P. (2004). MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 
116, 281-297. 
Bartel, D.P. (2009). MicroRNAs: target recognition and regulatory functions. Cell 136, 
215-233. 
Bartel, D.P., and Chen, C.Z. (2004). Micromanagers of gene expression: the potentially 
widespread influence of metazoan microRNAs. Nature Reviews Genetics 5, 396-
400. 
Bazzini, A.A., Lee, M.T., and Giraldez, A.J. (2012). Ribosome Profiling Shows That 
miR-430 Reduces Translation Before Causing mRNA Decay in Zebrafish. 
Science 336, 233-237. 
Behm-Ansmant, I., Rehwinkel, J., Doerks, T., Stark, A., Bork, P., and Izaurralde, E. 
(2006). mRNA degradation by miRNAs and GW182 requires both CCR4:NOT 
deadenylase and DCP1:DCP2 decapping complexes. Genes & Development 20, 
1885-1898. 
Bernstein, E., Kim, S.Y., Carmell, M.A., Murchison, E.P., Alcorn, H., Li, M.Z., Mills, 
A.A., Elledge, S.J., Anderson, K.V., and Hannon, G.J. (2003). Dicer is essential 
for mouse development. Nature Genetics 35, 215-217. 
Betel, D., Koppal, A., Agius, P., Sander, C., and Leslie, C. (2010). Comprehensive 
modeling of microRNA targets predicts functional non-conserved and non-
canonical sites. Genome Biol 11, R90. 
Birmingham, A., Anderson, E.M., Reynolds, A., Ilsley-Tyree, D., Leake, D., Fedorov, Y., 
Baskerville, S., Maksimova, E., Robinson, K., Karpilow, J., et al. (2006). 3' UTR 
seed matches, but not overall identity, are associated with RNAi off-targets. Nat 
Methods 3, 199-204. 
Braun, J.E., Huntzinger, E., Fauser, M., and Izaurralde, E. (2011). GW182 proteins 
directly recruit cytoplasmic deadenylase complexes to miRNA targets. Mol Cell 
44, 120-133. 
Brennecke, J., Stark, A., Russell, R.B., and Cohen, S.M. (2005). Principles of 
microRNA-target recognition. PLoS Biol 3, e85. 
Brenner, J.L., Jasiewicz, K.L., Fahley, A.F., Kemp, B.J., and Abbott, A.L. (2010). Loss 
of individual microRNAs causes mutant phenotypes in sensitized genetic 
backgrounds in C. elegans. Curr Biol 20, 1321-1325. 
Carroll, S.B. (2008). Evo-devo and an expanding evolutionary synthesis: a genetic theory 
of morphological evolution. Cell 134, 25-36. 
Carvunis, A.R., Rolland, T., Wapinski, I., Calderwood, M.A., Yildirim, M.A., Simonis, 
N., Charloteaux, B., Hidalgo, C.A., Barbette, J., Santhanam, B., et al. (2012). 
Proto-genes and de novo gene birth. Nature 487, 370-374. 
Chekulaeva, M., Mathys, H., Zipprich, J.T., Attig, J., Colic, M., Parker, R., and 
Filipowicz, W. (2011). miRNA repression involves GW182-mediated recruitment 
of CCR4-NOT through conserved W-containing motifs. Nat Struct Mol Biol 18, 
1218-1226. 
Chen, Y.W., Song, S.L., Weng, R.F., Verma, P., Kugler, J.M., Buescher, M., Rouam, S., 
and Cohen, S.M. (2014). Systematic Study of Drosophila MicroRNA Functions 
Using a Collection of Targeted Knockout Mutations. Developmental Cell 31, 784-
800. 
Chi, S.W., Hannon, G.J., and Darnell, R.B. (2012). An alternative mode of microRNA 
target recognition. Nat Struct Mol Biol 19, 321-327. 
Chi, S.W., Zang, J.B., Mele, A., and Darnell, R.B. (2009). Argonaute HITS-CLIP 
decodes microRNA-mRNA interaction maps. Nature 460, 479-486. 
Crick, F. (1970). Central dogma of molecular biology. Nature 227, 561-563. 
David, L.A., and Alm, E.J. (2011). Rapid evolutionary innovation during an Archaean 
genetic expansion. Nature 469, 93-96. 
Denli, A.M., Tops, B.B.J., Plasterk, R.H.A., Ketting, R.F., and Hannon, G.J. (2004). 
Processing of primary microRNAs by the Microprocessor complex. Nature 432, 
231-235. 
Doench, J.G., and Sharp, P.A. (2004). Specificity of microRNA target selection in 
translational repression. Genes Dev 18, 504-511. 
Eichhorn, S.W., Guo, H.L., McGeary, S.E., Rodriguez-Mias, R.A., Shin, C., Baek, D., 
Hsu, S.H., Ghoshal, K., Villen, J., and Bartel, D.P. (2014). mRNA Destabilization 
Is the Dominant Effect of Mammalian MicroRNAs by the Time Substantial 
Repression Ensues. Molecular Cell 56, 104-115. 
Eulalio, A., Huntzinger, E., and Izaurralde, E. (2008). GW182 interaction with Argonaute 
is essential for miRNA-mediated translational repression and mRNA decay. Nat 
Struct Mol Biol 15, 346-353. 
Fabian, M.R., Cieplak, M.K., Frank, F., Morita, M., Green, J., Srikumar, T., Nagar, B., 
Yamamoto, T., Raught, B., Duchaine, T.F., et al. (2011). miRNA-mediated 
deadenylation is orchestrated by GW182 through two conserved motifs that 
interact with CCR4-NOT. Nature Structural & Molecular Biology 18, 1211-
U1252. 
Farh, K.K., Grimson, A., Jan, C., Lewis, B.P., Johnston, W.K., Lim, L.P., Burge, C.B., 
and Bartel, D.P. (2005). The widespread impact of mammalian MicroRNAs on 
mRNA repression and evolution. Science 310, 1817-1821. 
Friedersdorf, M.B., and Keene, J.D. (2014). Advancing the functional utility of PAR-
CLIP by quantifying background binding to mRNAs and lncRNAs. Genome Biol 
15, R2. 
Friedman, R.C., Farh, K.K., Burge, C.B., and Bartel, D.P. (2009). Most mammalian 
mRNAs are conserved targets of microRNAs. Genome Research 19, 92-105. 
Gaidatzis, D., Nimwegen, E., Hausser, J., and Zavolan, M. (2007). Inference of miRNA 
targets using evolutionary conservation and pathway analysis. BMC 
Bioinformatics 8, 248. 
Garcia, D.M., Baek, D., Shin, C., Bell, G.W., Grimson, A., and Bartel, D.P. (2011). 
Weak seed-pairing stability and high target-site abundance decrease the 
proficiency of lsy-6 and other microRNAs. Nat Struct Mol Biol 18, 1139-1146. 
Giraldez, A.J., Cinalli, R.M., Glasner, M.E., Enright, A.J., Thomson, J.M., Baskerville, 
S., Hammond, S.M., Bartel, D.P., and Schier, A.F. (2005). MicroRNAs regulate 
brain morphogenesis in zebrafish. Science 308, 833-838. 
Giraldez, A.J., Mishima, Y., Rihel, J., Grocock, R.J., Van Dongen, S., Inoue, K., Enright, 
A.J., and Schier, A.F. (2006). Zebrafish MiR-430 promotes deadenylation and 
clearance of maternal mRNAs. Science 312, 75-79. 
Glisovic, T., Bachorik, J.L., Yong, J., and Dreyfuss, G. (2008). RNA-binding proteins 
and post-transcriptional gene regulation. FEBS Lett 582, 1977-1986. 
Gregory, R.I., Yan, K.P., Amuthan, G., Chendrimada, T., Doratotaj, B., Cooch, N., and 
Shiekhattar, R. (2004). The Microprocessor complex mediates the genesis of 
microRNAs. Nature 432, 235-240. 
Griffiths-Jones, S., Saini, H.K., van Dongen, S., and Enright, A.J. (2008). miRBase: tools 
for microRNA genomics. Nucleic Acids Res 36, D154-158. 
Grimson, A., Farh, K.K., Johnston, W.K., Garrett-Engele, P., Lim, L.P., and Bartel, D.P. 
(2007). MicroRNA targeting specificity in mammals: determinants beyond seed 
pairing. Molecular Cell 27, 91-105. 
Grimson, A., Srivastava, M., Fahey, B., Woodcroft, B.J., Chiang, H.R., King, N., 
Degnan, B.M., Rokhsar, D.S., and Bartel, D.P. (2008). Early origins and 
evolution of microRNAs and Piwi-interacting RNAs in animals. Nature 455, 
1193-1197. 
Grishok, A., Pasquinelli, A.E., Conte, D., Li, N., Parrish, S., Ha, I., Baillie, D.L., Fire, A., 
Ruvkun, G., and Mello, C.C. (2001). Genes and mechanisms related to RNA 
interference regulate expression of the small temporal RNAs that control 
C. elegans developmental timing. Cell 106, 23-34. 
Grosswendt, S., Filipchyk, A., Manzano, M., Klironomos, F., Schilling, M., Herzog, M., 
Gottwein, E., and Rajewsky, N. (2014). Unambiguous Identification of 
miRNA:Target Site Interactions by Different Types of Ligation Reactions. 
Molecular Cell. 
Gu, S., Jin, L., Zhang, F.J., Sarnow, P., and Kay, M.A. (2009). Biological basis for 
restriction of microRNA targets to the 3′ untranslated region in mammalian 
mRNAs. Nat Struct Mol Biol 16, 144-150. 
Gumienny, R., and Zavolan, M. (2015). Accurate transcriptome-wide prediction of 
microRNA targets and small interfering RNA off-targets with MIRZA-G. Nucleic 
Acids Res. 
Guo, H., Ingolia, N.T., Weissman, J.S., and Bartel, D.P. (2010). Mammalian microRNAs 
predominantly act to decrease target mRNA levels. Nature 466, 835-840. 
Hafner, M., Landthaler, M., Burger, L., Khorshid, M., Hausser, J., Berninger, P., 
Rothballer, A., Ascano, M., Jungkamp, A.C., Munschauer, M., et al. (2010). 
Transcriptome-wide Identification of RNA-Binding Protein and MicroRNA 
Target Sites by PAR-CLIP. Cell 141, 129-141. 
Harbison, C.T., Gordon, D.B., Lee, T.I., Rinaldi, N.J., Macisaac, K.D., Danford, T.W., 
Hannett, N.M., Tagne, J.B., Reynolds, D.B., Yoo, J., et al. (2004). Transcriptional 
regulatory code of a eukaryotic genome. Nature 431, 99-104. 
Hausser, J., Landthaler, M., Jaskiewicz, L., Gaidatzis, D., and Zavolan, M. (2009). 
Relative contribution of sequence and structure features to the mRNA binding of 
Argonaute/EIF2C-miRNA complexes and the degradation of miRNA targets. 
Genome Research 19, 2009-2020. 
Helwak, A., Kudla, G., Dudnakova, T., and Tollervey, D. (2013). Mapping the human 
miRNA interactome by CLASH reveals frequent noncanonical binding. Cell 153, 
654-665. 
Hornstein, E., Mansfield, J.H., Yekta, S., Hu, J.K.H., Harfe, B.D., McManus, M.T., 
Baskerville, S., Bartel, D.P., and Tabin, C.J. (2005). The microRNA miR-196 acts 
upstream of Hoxb8 and Shh in limb development. Nature 438, 671-674. 
Hutvagner, G., McLachlan, J., Pasquinelli, A.E., Balint, E., Tuschl, T., and Zamore, P.D. 
(2001). A cellular function for the RNA-interference enzyme Dicer in the 
maturation of the let-7 small temporal RNA. Science 293, 834-838. 
Jackson, A.L., Burchard, J., Leake, D., Reynolds, A., Schelter, J., Guo, J., Johnson, J.M., 
Lim, L., Karpilow, J., Nichols, K., et al. (2006a). Position-specific chemical 
modification of siRNAs reduces "off-target" transcript silencing. RNA 12, 1197-
1205. 
Jackson, A.L., Burchard, J., Schelter, J., Chau, B.N., Cleary, M., Lim, L., and Linsley, 
P.S. (2006b). Widespread siRNA "off-target" transcript silencing mediated by 
seed region sequence complementarity. RNA 12, 1179-1187. 
Jacob, F., and Monod, J. (1961). Genetic regulatory mechanisms in the synthesis of 
proteins. J Mol Biol 3, 318-356. 
Jan, C.H., Friedman, R.C., Ruby, J.G., and Bartel, D.P. (2011). Formation, regulation and 
evolution of Caenorhabditis elegans 3'UTRs. Nature. 
Jaskiewicz, L., Bilen, B., Hausser, J., and Zavolan, M. (2012). Argonaute CLIP--a 
method to identify in vivo targets of miRNAs. Methods 58, 106-112. 
Johnnidis, J.B., Harris, M.H., Wheeler, R.T., Stehling-Sun, S., Lam, M.H., Kirak, O., 
Brummelkamp, T.R., Fleming, M.D., and Camargo, F.D. (2008). Regulation of 
progenitor cell proliferation and granulocyte function by microRNA-223. Nature 
451, 1125-1129. 
Johnston, R.J., and Hobert, O. (2003). A microRNA controlling left/right neuronal 
asymmetry in Caenorhabditis elegans. Nature 426, 845-849. 
Jones-Rhoades, M.W., and Bartel, D.P. (2004). Computational identification of plant 
MicroRNAs and their targets, including a stress-induced miRNA. Molecular Cell 
14, 787-799. 
Jovanovic, M., Rooney, M.S., Mertins, P., Przybylski, D., Chevrier, N., Satija, R., 
Rodriguez, E.H., Fields, A.P., Schwartz, S., Raychowdhury, R., et al. (2015). 
Immunogenetics. Dynamic profiling of the protein life cycle in response to 
pathogens. Science 347, 1259038. 
Kaessmann, H. (2010). Origins, evolution, and phenotypic impact of new genes. Genome 
Res 20, 1313-1326. 
Kertesz, M., Iovino, N., Unnerstall, U., Gaul, U., and Segal, E. (2007). The role of site 
accessibility in microRNA target recognition. Nat Genet 39, 1278-1284. 
Ketting, R.F., Fischer, S.E.J., Bernstein, E., Sijen, T., Hannon, G.J., and Plasterk, R.H.A. 
(2001). Dicer functions in RNA interference and in synthesis of small RNA 
involved in developmental timing in C. elegans. Genes & Development 15, 2654-
2659. 
Khorshid, M., Hausser, J., Zavolan, M., and van Nimwegen, E. (2013). A biophysical 
miRNA-mRNA interaction model infers canonical and noncanonical targets. Nat 
Methods 10, 253-255. 
Khvorova, A., Reynolds, A., and Jayasena, S.D. (2003). Functional siRNAs and miRNAs 
exhibit strand bias. Cell 115, 505-505. 
Kishore, S., Jaskiewicz, L., Burger, L., Hausser, J., Khorshid, M., and Zavolan, M. 
(2011). A quantitative analysis of CLIP methods for identifying binding sites of 
RNA-binding proteins. Nat Methods 8, 559-564. 
Knight, S.W., and Bass, B.L. (2001). A role for the RNase III enzyme DCR-1 in RNA 
interference and germ line development in Caenorhabditis elegans. Science 293, 
2269-2271. 
Kozomara, A., and Griffiths-Jones, S. (2014). miRBase: annotating high confidence 
microRNAs using deep sequencing data. Nucleic Acids Research 42, D68-D73. 
Krek, A., Grun, D., Poy, M.N., Wolf, R., Rosenberg, L., Epstein, E.J., MacMenamin, P., 
da Piedade, I., Gunsalus, K.C., Stoffel, M., et al. (2005). Combinatorial 
microRNA target predictions. Nat Genet 37, 495-500. 
Krutzfeldt, J., Rajewsky, N., Braich, R., Rajeev, K.G., Tuschl, T., Manoharan, M., and 
Stoffel, M. (2005). Silencing of microRNAs in vivo with 'antagomirs'. Nature 
438, 685-689. 
Krzeszinski, J.Y., Wei, W., Huynh, H., Jin, Z., Wang, X., Chang, T.C., Xie, X.J., He, L., 
Mangala, L.S., Lopez-Berestein, G., et al. (2014). miR-34a blocks osteoporosis 
and bone metastasis by inhibiting osteoclastogenesis and Tgif2. Nature 512, 431-
435. 
Lagos-Quintana, M., Rauhut, R., Lendeckel, W., and Tuschl, T. (2001). Identification of 
novel genes coding for small expressed RNAs. Science 294, 853-858. 
Lambert, N., Robertson, A., Jangi, M., McGeary, S., Sharp, P.A., and Burge, C.B. 
(2014). RNA Bind-n-Seq: quantitative assessment of the sequence and structural 
binding specificity of RNA binding proteins. Mol Cell 54, 887-900. 
Lau, N.C., Lim, L.P., Weinstein, E.G., and Bartel, D.P. (2001). An abundant class of tiny 
RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 
858-862. 
Lee, R.C., and Ambros, V. (2001). An extensive class of small RNAs in Caenorhabditis 
elegans. Science 294, 862-864. 
Lee, R.C., Feinbaum, R.L., and Ambros, V. (1993). The C. elegans heterochronic gene 
lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75, 843-
854. 
Lee, Y., Ahn, C., Han, J.J., Choi, H., Kim, J., Yim, J., Lee, J., Provost, P., Radmark, O., 
Kim, S., et al. (2003). The nuclear RNase III Drosha initiates microRNA 
processing. Nature 425, 415-419. 
Lee, Y., Kim, M., Han, J.J., Yeom, K.H., Lee, S., Baek, S.H., and Kim, V.N. (2004). 
MicroRNA genes are transcribed by RNA polymerase II. EMBO Journal 23, 4051-
4060. 
Levine, M., and Tjian, R. (2003). Transcription regulation and animal diversity. Nature 
424, 147-151. 
Lewis, B.P., Burge, C.B., and Bartel, D.P. (2005). Conserved seed pairing, often flanked 
by adenosines, indicates that thousands of human genes are microRNA targets. 
Cell 120, 15-20. 
Lewis, B.P., Shih, I.H., Jones-Rhoades, M.W., Bartel, D.P., and Burge, C.B. (2003). 
Prediction of mammalian microRNA targets. Cell 115, 787-798. 
Li, J.J., Bickel, P.J., and Biggin, M.D. (2014). System wide analyses have underestimated 
protein abundances and the importance of transcription in mammals. PeerJ 2. 
Lim, L.P., Glasner, M.E., Yekta, S., Burge, C.B., and Bartel, D.P. (2003). Vertebrate 
microRNA genes. Science 299, 1540. 
Lim, L.P., Lau, N.C., Garrett-Engele, P., Grimson, A., Schelter, J.M., Castle, J., Bartel, 
D.P., Linsley, P.S., and Johnson, J.M. (2005). Microarray analysis shows that 
some microRNAs downregulate large numbers of target mRNAs. Nature 433, 
769-773. 
Linsley, P.S., Schelter, J., Burchard, J., Kibukawa, M., Martin, M.M., Bartz, S.R., 
Johnson, J.M., Cummins, J.M., Raymond, C.K., Dai, H., et al. (2007). Transcripts 
targeted by the microRNA-16 family cooperatively regulate cell cycle 
progression. Mol Cell Biol 27, 2240-2252. 
Lipchina, I., Elkabetz, Y., Hafner, M., Sheridan, R., Mihailovic, A., Tuschl, T., Sander, 
C., Studer, L., and Betel, D. (2011). Genome-wide identification of microRNA 
targets in human ES cells reveals a role for miR-302 in modulating BMP 
response. Genes & Development 25, 2173-2186. 
Liu, H., Yue, D., Chen, Y., Gao, S.J., and Huang, Y. (2010). Improving performance of 
mammalian microRNA target prediction. BMC Bioinformatics 11, 476. 
Loeb, G.B., Khan, A.A., Canner, D., Hiatt, J.B., Shendure, J., Darnell, R.B., Leslie, C.S., 
and Rudensky, A.Y. (2012). Transcriptome-wide miR-155 Binding Map Reveals 
Widespread Noncanonical MicroRNA Targeting. Molecular Cell 48, 760-770. 
Lund, E., Guttinger, S., Calado, A., Dahlberg, J.E., and Kutay, U. (2004). Nuclear export 
of microRNA precursors. Science 303, 95-98. 
Majoros, W.H., Lekprasert, P., Mukherjee, N., Skalsky, R.L., Corcoran, D.L., Cullen, 
B.R., and Ohler, U. (2013). MicroRNA target site identification by integrating 
sequence and binding information. Nat Methods 10, 630-633. 
Marin, R.M., Sulc, M., and Vanicek, J. (2013). Searching the coding region for 
microRNA targets. RNA 19, 467-474. 
McGlinn, E., Yekta, S., Mansfield, J.H., Soutschek, J., Bartel, D.P., and Tabin, C.J. 
(2009). In ovo application of antagomiRs indicates a role for miR-196 in 
patterning the chick axial skeleton through Hox gene regulation. Proceedings of 
the National Academy of Sciences of the United States of America 106, 18610-
18615. 
Meister, G., Landthaler, M., Patkaniowska, A., Dorsett, Y., Teng, G., and Tuschl, T. 
(2004). Human Argonaute2 mediates RNA cleavage targeted by miRNAs and 
siRNAs. Mol Cell 15, 185-197. 
Miranda, K.C., Huynh, T., Tay, Y., Ang, Y.S., Tam, W.L., Thomson, A.M., Lim, B., and 
Rigoutsos, I. (2006). A pattern-based method for the identification of microRNA 
binding sites and their corresponding heteroduplexes. Cell 126, 1203-1217. 
Miska, E.A., Alvarez-Saavedra, E., Abbott, A.L., Lau, N.C., Hellman, A.B., McGonagle, 
S.M., Bartel, D.P., Ambros, V.R., and Horvitz, H.R. (2007). Most Caenorhabditis 
elegans microRNAs are individually not essential for development or viability. 
PLoS Genetics 3, 2395-2403. 
Nielsen, C.B., Shomron, N., Sandberg, R., Hornstein, E., Kitzman, J., and Burge, C.B. 
(2007). Determinants of targeting by endogenous and exogenous microRNAs and 
siRNAs. RNA 13, 1894-1910. 
Park, C.Y., Jeker, L.T., Carver-Moore, K., Oh, A., Liu, H.J., Cameron, R., Richards, H., 
Li, Z.M., Adler, D., Yoshinaga, Y., et al. (2012). A Resource for the Conditional 
Ablation of microRNAs in the Mouse. Cell Reports 1, 385-391. 
Pasquinelli, A.E., Reinhart, B.J., Slack, F., Martindale, M.Q., Kuroda, M.I., Maller, B., 
Hayward, D.C., Ball, E.E., Degnan, B., Muller, P., et al. (2000). Conservation of 
the sequence and temporal expression of let-7 heterochronic regulatory RNA. 
Nature 408, 86-89. 
Reczko, M., Maragkakis, M., Alexiou, P., Grosse, I., and Hatzigeorgiou, A.G. (2012). 
Functional microRNA targets in protein coding sequences. Bioinformatics 28, 
771-776. 
Reinhart, B.J., Slack, F.J., Basson, M., Pasquinelli, A.E., Bettinger, J.C., Rougvie, A.E., 
Horvitz, H.R., and Ruvkun, G. (2000). The 21-nucleotide let-7 RNA regulates 
developmental timing in Caenorhabditis elegans. Nature 403, 901-906. 
Reinhart, B.J., Weinstein, E.G., Rhoades, M.W., Bartel, B., and Bartel, D.P. (2002). 
MicroRNAs in plants. Genes Dev 16, 1616-1626. 
Ren, B., Robert, F., Wyrick, J.J., Aparicio, O., Jennings, E.G., Simon, I., Zeitlinger, J., 
Schreiber, J., Hannett, N., Kanin, E., et al. (2000). Genome-wide location and 
function of DNA binding proteins. Science 290, 2306-2309. 
Rhoades, M.W., Reinhart, B.J., Lim, L.P., Burge, C.B., Bartel, B., and Bartel, D.P. 
(2002). Prediction of plant microRNA targets. Cell 110, 513-520. 
Robins, H., and Press, W.H. (2005). Human microRNAs target a functionally distinct 
population of genes with AT-rich 3' UTRs. Proc Natl Acad Sci USA 102, 15557-
15562. 
Rodriguez, A., Vigorito, E., Clare, S., Warren, M.V., Couttet, P., Soond, D.R., van 
Dongen, S., Grocock, R.J., Das, P.P., Miska, E.A., et al. (2007). Requirement of 
bic/microRNA-155 for normal immune function. Science 316, 608-611. 
Schirle, N.T., Sheu-Gruttadauria, J., and MacRae, I.J. (2014). Structural basis for 
microRNA targeting. Science 346, 608-613. 
Schmiedel, J.M., Klemm, S.L., Zheng, Y., Sahay, A., Bluthgen, N., Marks, D.S., and van 
Oudenaarden, A. (2015). Gene expression. MicroRNA control of protein 
expression noise. Science 348, 128-132. 
Schwanhausser, B., Busse, D., Li, N., Dittmar, G., Schuchhardt, J., Wolf, J., Chen, W., 
and Selbach, M. (2011). Global quantification of mammalian gene expression 
control. Nature 473, 337-342. 
Schwarz, D.S., Ding, H.L., Kennington, L., Moore, J.T., Schelter, J., Burchard, J., 
Linsley, P.S., Aronin, N., Xu, Z.S., and Zamore, P.D. (2006). Designing siRNA 
that distinguish between genes that differ by a single nucleotide. PLoS Genetics 2, 
1307-1318. 
Schwarz, D.S., Hutvagner, G., Du, T., Xu, Z.S., Aronin, N., and Zamore, P.D. (2003). 
Asymmetry in the assembly of the RNAi enzyme complex. Cell 115, 199-208. 
Selbach, M., Schwanhausser, B., Thierfelder, N., Fang, Z., Khanin, R., and Rajewsky, N. 
(2008). Widespread changes in protein synthesis induced by microRNAs. Nature 
455, 58-63. 
Shin, C., Nam, J.W., Farh, K.K.H., Chiang, H.R., Shkumatava, A., and Bartel, D.P. 
(2010). Expanding the MicroRNA Targeting Code: Functional Sites with 
Centered Pairing. Molecular Cell 38, 789-802. 
Sturm, M., Hackenberg, M., Langenberger, D., and Frishman, D. (2010). TargetSpy: a 
supervised machine learning approach for microRNA target prediction. BMC 
Bioinformatics 11. 
Tan, S.M., Kirchner, R., Jin, J., Hofmann, O., McReynolds, L., Hide, W., and Lieberman, 
J. (2014). Sequencing of Captive Target Transcripts Identifies the Network of 
Regulated Genes and Functions of Primate-Specific miR-522. Cell Reports 8, 
1225-1239. 
Tautz, D., and Domazet-Loso, T. (2011). The evolutionary origin of orphan genes. Nat 
Rev Genet 12, 692-702. 
Thai, T.H., Calado, D.P., Casola, S., Ansel, K.M., Xiao, C., Xue, Y., Murphy, A., 
Frendewey, D., Valenzuela, D., Kutok, J.L., et al. (2007). Regulation of the 
germinal center response by microRNA-155. Science 316, 604-608. 
Ule, J., Jensen, K.B., Ruggiu, M., Mele, A., Ule, A., and Darnell, R.B. (2003). CLIP 
identifies Nova-regulated RNA networks in the brain. Science 302, 1212-1215. 
van Rooij, E., Sutherland, L.B., Qi, X., Richardson, J.A., Hill, J., and Olson, E.N. (2007). 
Control of stress-dependent cardiac growth and gene expression by a microRNA. 
Science 316, 575-579. 
Vaucheret, H., Vazquez, F., Crete, P., and Bartel, D.P. (2004). The action of 
ARGONAUTE1 in the miRNA pathway and its regulation by the miRNA 
pathway are crucial for plant development. Genes Dev 18, 1187-1197. 
Vejnar, C.E., and Zdobnov, E.M. (2012). MiRmap: comprehensive prediction of 
microRNA target repression strength. Nucleic Acids Res 40, 11673-11683. 
Ventura, A., Young, A.G., Winslow, M.M., Lintault, L., Meissner, A., Erkeland, S.J., 
Newman, J., Bronson, R.T., Crowley, D., Stone, J.R., et al. (2008). Targeted 
deletion reveals essential and overlapping functions of the miR-17 through 92 
family of miRNA clusters. Cell 132, 875-886. 
Wang, X.W., and El Naqa, I.M. (2008). Prediction of both conserved and nonconserved 
microRNA targets in animals. Bioinformatics 24, 325-332. 
Wen, J., Parker, B.J., Jacobsen, A., and Krogh, A. (2011). MicroRNA transfection and 
AGO-bound CLIP-seq data sets reveal distinct determinants of miRNA action. 
RNA 17, 820-834. 
Wheeler, B.M., Heimberg, A.M., Moy, V.N., Sperling, E.A., Holstein, T.W., Heber, S., 
and Peterson, K.J. (2009). The deep evolution of metazoan microRNAs. Evol Dev 
11, 50-68. 
Wightman, B., Burglin, T.R., Gatto, J., Arasu, P., and Ruvkun, G. (1991). Negative 
regulatory sequences in the lin-14 3'-untranslated region are necessary to generate 
a temporal switch during Caenorhabditis elegans development. Genes Dev 5, 
1813-1824. 
Wightman, B., Ha, I., and Ruvkun, G. (1993). Posttranscriptional regulation of the 
heterochronic gene lin-14 by lin-4 mediates temporal pattern formation in C. 
elegans. Cell 75, 855-862. 
Wray, G.A. (2007). The evolutionary significance of cis-regulatory mutations. Nat Rev 
Genet 8, 206-216. 
Xiao, C., Calado, D.P., Galler, G., Thai, T.H., Patterson, H.C., Wang, J., Rajewsky, N., 
Bender, T.P., and Rajewsky, K. (2007). MiR-150 controls B cell differentiation 
by targeting the transcription factor c-Myb. Cell 131, 146-159. 
Xie, X., Lu, J., Kulbokas, E.J., Golub, T.R., Mootha, V., Lindblad-Toh, K., Lander, E.S., 
and Kellis, M. (2005). Systematic discovery of regulatory motifs in human 
promoters and 3' UTRs by comparison of several mammals. Nature 434, 338-345. 
Yekta, S., Shih, I.H., and Bartel, D.P. (2004). MicroRNA-directed cleavage of HOXB8 
mRNA. Science 304, 594-596. 
Yi, R., Qin, Y., Macara, I.G., and Cullen, B.R. (2003). Exportin-5 mediates the nuclear 
export of pre-microRNAs and short hairpin RNAs. Genes & Development 17, 
3011-3016. 
Zhao, Y., Ransom, J.F., Li, A., Vedantham, V., von Drehle, M., Muth, A.N., Tsuchihashi, 
T., McManus, M.T., Schwartz, R.J., and Srivastava, D. (2007). Dysregulation of 
cardiogenesis, cardiac conduction, and cell cycle in mice lacking miRNA-1-2. 
Cell 129, 303-317. 
 
Chapter 2. Predicting effective microRNA target sites in mammalian mRNAs 
 
Vikram Agarwal1,2,3, George W. Bell1, Jin-Wu Nam1,2,4, David P. Bartel1,2 
 
1Howard Hughes Medical Institute and Whitehead Institute for Biomedical Research, 
Cambridge, Massachusetts 02142, USA 
2Department of Biology, Massachusetts Institute of Technology, Cambridge, 
Massachusetts 02139, USA 
3Computational and Systems Biology Program, Massachusetts Institute of Technology, 
Cambridge, Massachusetts 02139, USA 
4Department of Life Science, College of Natural Sciences, Hanyang University, Seoul 
133-791, Korea 
 
V.A. carried out computational analysis. G.W.B. overhauled the TargetScan website. J-
W.N. helped process 3P-seq data. V.A. and D.P.B. conceived of the project, designed the 
analyses, and wrote the paper. 
 
Published as: 
Agarwal V, Bell GW, Nam J-W, Bartel DP. "Predicting effective microRNA target sites 
in mammalian mRNAs". eLife 4:e05005.   
Abstract 
MicroRNA targets are often recognized through pairing between the miRNA seed region 
and complementary sites within target mRNAs, but not all of these canonical sites are 
equally effective, and both computational and in vivo UV-crosslinking approaches 
suggest that many mRNAs are targeted through non-canonical interactions. Here, we 
show that recently reported non-canonical sites do not mediate repression despite binding 
the miRNA, which indicates that the vast majority of functional sites are canonical. 
Accordingly, we developed an improved quantitative model of canonical targeting, using 
a compendium of experimental datasets that we pre-processed to minimize confounding 
biases. This model, which considers site type and another 14 features to predict the most 
effectively targeted mRNAs, performed significantly better than existing models and was 
as informative as the best high-throughput in vivo crosslinking approaches. It drives the 
latest version of TargetScan (v7.0; targetscan.org), thereby providing a valuable resource 
for placing miRNAs into gene-regulatory networks. 
 
Introduction 
MicroRNAs (miRNAs) are ~22-nt RNAs that mediate post-transcriptional gene 
repression (Bartel, 2004). Bound with an Argonaute protein to form a silencing complex, 
miRNAs function as sequence-specific guides, directing the silencing complex to 
transcripts, primarily through Watson–Crick pairing between the miRNA seed (miRNA 
nucleotides 2–7) and complementary sites within the 3′ untranslated regions (3′ UTRs) of 
target RNAs (Lewis et al., 2005; Bartel, 2009). The miRNAs conserved to fish have been 
grouped into 87 families, each with a unique seed region. On average, each of these 
families has >400 conserved targeting interactions, and together these interactions 
involve most mammalian mRNAs (Friedman et al., 2009). In addition, many 
nonconserved interactions also function to reduce mRNA levels and protein output (Farh 
et al., 2005; Krutzfeldt et al., 2005; Lim et al., 2005; Baek et al., 2008; Selbach et al., 
2008). Accordingly, miRNAs have been implicated in a wide range of biological 
processes in worms, flies, and mammals (Kloosterman and Plasterk, 2006; Bushati and 
Cohen, 2007; Stefani and Slack, 2008). Critical for understanding miRNA biology is the 
accurate prediction of miRNA–target interactions. Although numerous advances have 
been made, accurate and specific target predictions remain a challenge. 
Analysis of preferentially conserved miRNA-pairing motifs within 3′ UTRs has 
led to the identification of several classes of target sites (Bartel, 2009). The most effective 
canonical site types, listed in order of decreasing preferential conservation and efficacy, 
are the 8mer site [Watson–Crick match to miRNA positions 2–8 with an A opposite 
position 1 (Lewis et al., 2005)], 7mer-m8 site [position 2–8 match (Lewis et al., 2003; 
Brennecke et al., 2005; Krek et al., 2005; Lewis et al., 2005)],  and 7mer-A1 site 
[position 2–7 match with an A opposite position 1 (Lewis et al., 2005)]. Experiments 
have confirmed that the preference for an adenosine opposite position 1 is independent of 
the miRNA nucleotide identity (Grimson et al., 2007; Nielsen et al., 2007; Baek et al., 
2008) and due to the specific recognition of the target adenosine within a binding pocket 
of Argonaute (Schirle et al., 2014). Two other canonical site types, each associated with 
weaker preferential conservation and much lower efficacy (Friedman et al., 2009), are the 
6mer [position 2–7 match (Lewis et al., 2005)] and offset 6mer [position 3–8 match 
(Friedman et al., 2009)]. Pairing to the 3′ end of the miRNA can supplement canonical 
sites, although evidence for the use of this 3′-supplementary pairing is observed for no 
more than 5% of the seed-matched sites (Brennecke et al., 2005; Lewis et al., 2005; 
Grimson et al., 2007; Friedman et al., 2009). 
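The site-type definitions above can be sketched as a small scanner. This is only an illustrative sketch, not the TargetScan implementation: the function names are hypothetical, sequences are assumed to be written 5′ to 3′ as RNA, and each seed match is reported at the strongest site type it satisfies.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def revcomp(rna):
    """Reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def find_canonical_sites(mirna, utr):
    """Return (index, site_type) pairs; index is 0-based within the UTR.

    For 8mer, 7mer-m8, 7mer-A1, and 6mer sites the index marks the first
    nucleotide of the 6-nt match to miRNA positions 2-7. For offset 6mer
    sites it marks the first nucleotide of the match to positions 3-8.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    six = revcomp(mirna[1:7])       # pairs with miRNA positions 2-7
    offset = revcomp(mirna[2:8])    # pairs with miRNA positions 3-8
    match8 = COMPLEMENT[mirna[7]]   # pairs with miRNA position 8
    match2 = COMPLEMENT[mirna[1]]   # pairs with miRNA position 2
    sites = []
    for i in range(len(utr) - 5):
        window = utr[i:i + 6]
        if window == six:
            # The position-8 match lies 5' of the seed match in the target;
            # the A opposite miRNA position 1 lies 3' of it.
            has_m8 = i > 0 and utr[i - 1] == match8
            has_a1 = i + 6 < len(utr) and utr[i + 6] == "A"
            if has_m8 and has_a1:
                kind = "8mer"
            elif has_m8:
                kind = "7mer-m8"
            elif has_a1:
                kind = "7mer-A1"
            else:
                kind = "6mer"
            sites.append((i, kind))
        elif window == offset:
            # If the next nucleotide also pairs with position 2, this is a
            # 7mer-m8 or 8mer and is reported by the branch above instead.
            extends_to_2 = i + 6 < len(utr) and utr[i + 6] == match2
            if not extends_to_2:
                sites.append((i, "offset 6mer"))
    return sites
```

For example, with the miR-1 sequence `UGGAAUGUAAAGAAGUAUGUAU`, the 6-nt match to positions 2–7 is `CAUUCC`, so `ACAUUCCA` would be scored as an 8mer and `ACAUUCC` followed by a non-A nucleotide as a 7mer-m8.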
Some effective sites lack canonical seed pairing. For example, very extensive 
pairing to the 3′ region of the miRNA can compensate for a wobble or mismatch to one 
of the seed positions (Brennecke et al., 2005; Bartel, 2009), as exemplified by the two let-
7 sites within the 3′ UTR of C. elegans lin-41 (Reinhart et al., 2000). Although these 3′-
supplementary sites can be detected above background when searching for preferentially 
conserved pairing configurations, they are exceedingly rare, with conserved miRNA 
families in mammals and nematodes each averaging <1 preferentially conserved 3′-
supplementary site (Friedman et al., 2009; Jan et al., 2011). Other relatively rare, yet 
effective sites include centered sites, which have 11–12 contiguous Watson–Crick pairs 
to the center of the miRNA (Shin et al., 2010), and cleavage sites, which have the very 
extensive pairing required for Argonaute-catalyzed slicing of the mRNA (Yekta et al., 
2004; Davis et al., 2005; Karginov et al., 2010; Shin et al., 2010). The existence of 
additional, still-to-be-characterized types of non-canonical sites is suggested by the large 
number of mRNA regions that crosslink to the silencing complex in vivo yet lack known 
site types matching the cognate miRNA (Chi et al., 2012; Loeb et al., 2012; Helwak et 
al., 2013; Khorshid et al., 2013; Grosswendt et al., 2014). 
With the prediction of hundreds of conserved targets for most mammalian 
miRNAs (and even more nonconserved targets), knowing which targets are expected to 
be most responsive to each miRNA provides important information for both large-scale 
network analyses and detailed experimental follow-up. As previously mentioned, the type 
of site (e.g., whether the site is an 8mer or a 7mer-A1) strongly influences the efficacy of 
repression. The number of sites also influences efficacy, with each additional site 
typically acting independently to impart additional repression (Grimson et al., 2007; 
Nielsen et al., 2007), although sites spaced 8–40 nt apart tend to act 
cooperatively, and those spaced < 8 nt apart act competitively (Grimson et al., 2007). 
Additional features of site context help explain why a given site (e.g., a 7mer-m8 site to 
miR-1) can be more effective in one 3′ UTR than it is in another. These features include 
the positioning of the site outside of the path of the ribosome [which includes the first 15 
nt of the 3′ UTR (Grimson et al., 2007)] and the positioning of the site within 3′-UTR 
segments that are more accessible to the silencing complex, as measured by either high 
local AU content (Grimson et al., 2007; Nielsen et al., 2007), high AU content of the 
entire 3′ UTR (Robins and Press, 2005; Hausser et al., 2009), shorter distance from a 3′-
UTR terminus (Grimson et al., 2007), shorter 3′-UTR length (Hausser et al., 2009; Betel 
et al., 2010; Wen et al., 2011; Reczko et al., 2012), or less stable predicted competing 
secondary structure (Robins et al., 2005; Ameres et al., 2007; Kertesz et al., 2007; Long 
et al., 2007; Tafer et al., 2008). Conserved sites are also more effective, in part because 
they tend to reside in more favorable contexts (Grimson et al., 2007; Nielsen et al., 2007). 
Features of the miRNA can also influence site efficacy, with sites being more effective if 
the miRNA has lower target-site abundance (TA) (Arvey et al., 2010; Garcia et al., 2011) 
and stronger predicted seed-pairing stability (SPS) (Garcia et al., 2011). 
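Several of the context features described above reduce to simple arithmetic on site coordinates. The sketch below is a simplified illustration under stated assumptions: all function names are hypothetical, spacing is measured between site start coordinates (the cited studies may measure it differently), and local AU content is computed as an unweighted fraction over a 30-nt flank on each side, which may differ from the weighting used in the published scores.

```python
def spacing_class(start_a, start_b):
    """Classify a pair of sites by the distance between their starts.

    Thresholds follow the text: < 8 nt competitive, 8-40 nt cooperative.
    Measuring from start to start is an assumption of this sketch.
    """
    gap = abs(start_a - start_b)
    if gap < 8:
        return "competitive"
    if gap <= 40:
        return "cooperative"
    return "independent"

def in_ribosome_path(site_start, shadow=15):
    """True if the site begins within the first `shadow` nt of the 3' UTR."""
    return site_start < shadow

def local_au_fraction(utr, site_start, site_end, flank=30):
    """Unweighted fraction of A/U in the flanks around utr[site_start:site_end]."""
    upstream = utr[max(0, site_start - flank):site_start]
    downstream = utr[site_end:site_end + flank]
    flanks = (upstream + downstream).upper().replace("T", "U")
    if not flanks:
        return 0.0
    return sum(base in "AU" for base in flanks) / len(flanks)

def distance_to_nearest_end(utr_length, site_start, site_end):
    """Distance in nt from the site to the closer 3'-UTR terminus."""
    return min(site_start, utr_length - site_end)
```

Coordinates are 0-based with an exclusive end, so a site occupying `utr[10:18]` in a 100-nt UTR lies 10 nt from the nearest terminus and would be flagged as within the assumed 15-nt ribosome path.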
 Multiple features can be considered together to build quantitative models of 
targeting efficacy (Grimson et al., 2007; Nielsen et al., 2007; Wang and El Naqa, 2008; 
Betel et al., 2010; Liu et al., 2010; Garcia et al., 2011; Wen et al., 2011; Reczko et al., 
2012; Vejnar and Zdobnov, 2012; Marin et al., 2013; Gumienny and Zavolan, 2015). Our 
recent model, called the context-plus (context+) model, considers the features of our 
original context scores [i.e., site type, 3′-supplementary pairing, local AU content, and 
distance from the closest 3′-UTR end (Grimson et al., 2007)], plus two miRNA features 
[TA and SPS (Garcia et al., 2011)]. Although the context+ model was trained using 
multiple regression on 74 high-throughput datasets, the features used to distinguish 
effective sites (the three features of the original context scores) were identified using only 
11 datasets, implying that additional features might be identified through analysis of the 
additional datasets. 
 Here, we examined the function of non-canonical binding sites identified in recent 
studies and found that mRNAs with these sites are not more repressed than mRNAs 
without sites, despite finding compelling evidence that many of these non-canonical sites 
bind the silencing complex in vivo. This finding justified a focus on the statistical 
modeling of canonical, seed-matched sites within 3′ UTRs, which mediate the vast 
majority of repression that can be predicted with current methods. To this end, we pre-
processed the 74 datasets to minimize confounding biases and then used stepwise 
regression to identify the most informative features from a large set of potential targeting 
features. This approach unbiasedly selected 14 features, which were combined to develop 
the context++ model of miRNA targeting efficacy. The context++ model was more 
predictive than any published model and at least as predictive as the most informative in 
vivo crosslinking approaches. As the engine powering the latest version of TargetScan 
(v7.0; targetscan.org), this model provides a valuable resource for placing the miRNAs of 
human, mouse, zebrafish, and other vertebrate species into their respective gene-
regulatory networks. 
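To make the idea of stepwise feature selection concrete, the following is a minimal sketch of forward selection by AIC using ordinary least squares. It is not the procedure used to build the context++ model: the choice of AIC as the criterion, the forward-only search, and every function name here are assumptions made for illustration, and the solver assumes the candidate features are not collinear.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def rss(columns, y):
    """Residual sum of squares for y regressed on an intercept plus columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)]
           for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * v for bj, v in zip(beta, row))) ** 2
               for row, yi in zip(design, y))

def aic(columns, y):
    """AIC up to an additive constant; the floor avoids log(0) on exact fits."""
    n = len(y)
    return n * math.log(max(rss(columns, y), 1e-12) / n) + 2 * (len(columns) + 1)

def forward_select(features, y):
    """Greedily add the feature that lowers AIC most; stop when none does."""
    selected = []
    best = aic([], y)
    while True:
        candidates = [
            (aic([features[s] for s in selected] + [features[name]], y), name)
            for name in features if name not in selected
        ]
        if not candidates:
            break
        score, name = min(candidates)
        if score >= best:
            break
        selected.append(name)
        best = score
    return selected
```

Here `features` maps a feature name to its per-site values and `y` holds the measured fold changes; the function returns feature names in the order in which they were added.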
 
 
Results 
Inefficacy of recently reported non-canonical binding sites 
Several high-throughput crosslinking-immunoprecipitation (CLIP) approaches have been 
applied to identify sites that bind Argonaute in vivo (Chi et al., 2009; Hafner et al., 2010; 
Helwak et al., 2013; Grosswendt et al., 2014). These experiments all observe significant 
enrichment for cognate seed-matched sites in the vicinity of the crosslinks, which 
validates their ability to detect authentic sites. Despite this enrichment, some crosslinks 
do not correspond to canonical sites to the relevant miRNAs, raising the prospect that 
these results might reveal novel types of non-canonical binding that could mediate 
repression. Indeed, five studies have reported crosslinking to non-canonical binding sites 
proposed to mediate repression (Chi et al., 2012; Loeb et al., 2012; Helwak et al., 2013; 
Khorshid et al., 2013; Grosswendt et al., 2014). In addition, another biochemical study 
has reported the identification of non-canonical sites without using any crosslinking (Tan 
et al., 2014). Reasoning that these experimental datasets might provide a resource for 
defining novel types of sites to be used in target prediction, we re-examined the 
functionality of these sites in mediating target mRNA repression. 
We first examined the efficacy of “nucleation-bulge” sites (Chi et al., 2012), 
which were identified from analysis of differential CLIP (dCLIP) results reporting the 
clusters that appear in the presence of miR-124 (Chi et al., 2009). Nucleation-bulge sites 
consist of 8 nt motifs paired to positions 2–8 of their cognate miRNA seed, with the 
nucleotide opposing position 6 protruding as a bulge but sharing Watson-Crick 
complementarity to miRNA position 6. Meta-analysis of miRNA and small-RNA 
transfection datasets revealed significant repression of mRNAs with the canonical site 
types but found no evidence for repression of mRNAs that contain nucleation-bulge sites 
but lack perfectly paired seed-matched sites in their 3′ UTRs (Figure 1–figure 
supplements 1A–B). Reasoning that the nucleation-bulge site might be only marginally 
effective, we examined the early zebrafish embryo with and without Dicer, analyzing the 
targeting by miR-430, the most highly expressed miRNA of the early embryo. Even in 
this system, one of the most sensitive systems for detecting the effects of targeting (where 
robust repression is observed for mRNAs with only a single 6mer or offset 6mer site to 
miR-430), we observed no evidence for repression of mRNAs with nucleation-bulge sites 
to miR-430 (Figure 1A, Figure 1–figure supplement 1C, and Figure 1–figure supplement 
4A). Because the nucleation-bulge sites were originally identified and characterized as 
sites to miR-124, we next tried focusing on only miR-124–mediated repression. 
However, even in this more limited context, the mRNAs with nucleation-bulge sites were 
no more repressed than mRNAs without sites (Figure 1–figure supplements 1D–F). 
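One reading of the nucleation-bulge definition given above is that the 8-nt target motif equals the 7-nt match to miRNA positions 2–8 with one extra nucleotide, complementary to position 6, inserted between the nucleotides pairing with positions 5 and 6. The sketch below encodes that reading; the function names are hypothetical, and treating "lacks a perfectly paired seed-matched site" as "lacks a match to positions 2–7" is a simplifying assumption.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def nucleation_bulge_motif(mirna):
    """8-nt target motif (5' to 3') for a nucleation-bulge site.

    Layout, written as complements of miRNA positions:
    c8 c7 c6 [bulge = c6] c5 c4 c3 c2
    """
    mirna = mirna.upper().replace("T", "U")
    c2, c3, c4, c5, c6, c7, c8 = (COMPLEMENT[base] for base in mirna[1:8])
    return "".join([c8, c7, c6, c6, c5, c4, c3, c2])

def seed_match_2_7(mirna):
    """6-nt target motif (5' to 3') pairing with miRNA positions 2-7."""
    mirna = mirna.upper().replace("T", "U")
    return "".join(COMPLEMENT[base] for base in reversed(mirna[1:7]))

def has_bulge_only(mirna, utr):
    """True if the UTR has a nucleation-bulge motif but no 2-7 seed match."""
    utr = utr.upper().replace("T", "U")
    return (nucleation_bulge_motif(mirna) in utr
            and seed_match_2_7(mirna) not in utr)
```

Under this reading, the miR-124 sequence `UAAGGCACGCGGUGAAUGCC` yields the motif `GUGGCCUU`, and a 3′ UTR that also contains `UGCCUU` is excluded from the bulge-only set.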
 
Another study examined the response of 32 mRNAs that lack canonical miR-155 sites yet 
crosslink to Argonaute in wild-type T cells but not T cells isolated from miR-155 
knockout mice (Loeb et al., 2012). As previously observed, we found that the levels of 
these mRNAs tended to increase in T cells lacking miR-155 (Figure 1B). However, a 
closer look at the distribution of mRNA fold changes between wild-type and knockout 
cells revealed a pattern not normally observed for mRNAs with a functional site type. As 
illustrated for the mRNAs with canonical sites (including those supported by CLIP), 
when a miRNA is knocked out, the cumulative distribution of fold changes for mRNAs 
with functional site types diverges most from the no-site distribution at the top of the 
curve, which represents the most strongly derepressed mRNAs (Figure 1B). However, for 
the mRNAs harboring non-canonical miR-155 sites, the distribution of fold changes 
converged with the no-site distribution at the top of the curve (Figure 1B), raising doubt 
as to whether non-canonical binding of these mRNAs mediates repression. To investigate 
these mRNAs further, we examined their response to miR-155 loss in helper T cell 
subtypes 1 and 2 (Th1 and Th2, respectively) and B cells, which are other lymphocytic 
cells in which significant derepression of miR-155 targets is observed in cells lacking 
miR-155 (Rodriguez et al., 2007; Eichhorn et al., 2014). In contrast to mRNAs with 
canonical sites, the mRNAs with non-canonical sites showed no evidence of derepression 
in the knockout cells of each of these cell types, which reinforced the conclusion that 
non-canonical binding of miR-155 does not lead to repression of these mRNAs (Figure 
1C and Figure 1–figure supplement 2). 
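The cumulative-distribution comparison described above can be illustrated with a short sketch that evaluates the empirical CDFs of two sets of log fold changes and reports the gap between them at chosen thresholds. The function names and the example values are hypothetical; the sketch only shows how divergence at the top of the curve could be inspected, not how the figures were produced.

```python
from bisect import bisect_right

def ecdf(values):
    """Return a function giving the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n

def cdf_gap(site_fc, nosite_fc, thresholds):
    """Gap (no-site CDF minus site CDF) at each fold-change threshold.

    After a miRNA knockout, derepressed targets shift to the right, so a
    functional site type gives positive gaps. The text notes that for
    functional sites the gap is largest toward the top of the curve.
    """
    f_site, f_nosite = ecdf(site_fc), ecdf(nosite_fc)
    return [(t, f_nosite(t) - f_site(t)) for t in thresholds]
```

For instance, with hypothetical values `site_fc = [0.1, 0.3, 0.5, 0.8]` and `nosite_fc = [-0.2, 0.0, 0.1, 0.2]`, the gap at a threshold of 0.2 is 0.75, because all no-site mRNAs but only one quarter of site-containing mRNAs fall at or below that fold change.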
 We next probed the functionality of non-canonical interactions identified by 
CLASH (crosslinking, ligation, and sequencing of hybrids), a high-throughput technique 
that generates miRNA–mRNA chimeras, which each identify a miRNA and the mRNA 
region that it binds (Helwak et al., 2013). As previously observed, mRNAs with CLASH-
identified non-canonical interactions involving miR-92 tended to be slightly up-regulated 
upon knockdown of miR-92 in HEK293 cells (Figure 1D). However, a closer look at the 
mRNA fold-change distributions again revealed a pattern not typically observed for 
mRNAs with a functional site type, with convergence with the no-site distribution in the 
region expected to be most divergent. Therefore, we examined a second dataset 
monitoring mRNA changes after knocking down miR-92 and other miRNAs in HEK293 
cells (Hafner et al., 2010). As reported recently (Wang, 2014), the slight up-regulation 
observed for mRNAs with CLASH-identified non-canonical interactions in the original 
dataset was not reproducible in the second dataset (Figure 1E). Moreover, mRNAs with 
non-canonical interactions to other miRNAs showed no sign of derepression when the 
cognate miRNAs were knocked down (Figure 1–figure supplement 3A–B). To mirror the 
original analysis of CLASH-identified interactions (Helwak et al., 2013), our analysis 
included sites located in any region of the mature mRNA (Figures 1D–E and Figure 1–
figure supplement 3A). No significant difference from the no-site control distribution was 
observed when restricting our analysis to mRNAs with CLASH-identified non-canonical 
sites in their 3′ UTRs (Figure 1–figure supplement 3B).  
 Many miRNA–mRNA chimeras can also be found in standard AGO CLIP 
datasets, presumably generated by an endogenous ligase acting in cell lysates during 
workup (Grosswendt et al., 2014). Global experiments examining the function of these 
interactions grouped the mRNAs with non-canonical interactions together with those with 
canonical interactions (Grosswendt et al., 2014), and thus the signal for function might 
arise from only canonical interactions. Indeed, when we re-examined the response of 
these mRNAs to miRNA knockdown, those with chimera-identified canonical sites 
tended to be derepressed, whereas those with only chimera-identified non-canonical sites 
did not (Figure 1F and Figure 1–figure supplements 3C–E). Although at first glance this 
finding might seem at odds with the elevated evolutionary conservation of chimera-
identified non-canonical sites (Grosswendt et al., 2014), we found that this conservation 
signal was not smaller for the sites of less conserved miRNAs and therefore was not 
indicative of functional miRNA binding (Figure 1–figure supplement 5). Instead, this 
signal might occur for the same reason that artificial sRNAs tend to target conserved 
regions of 3′ UTRs (Nielsen et al., 2007). 
 Next, we evaluated the response of non-canonical sites modeled by MIRZA, an 
algorithm that utilizes CLIP data in conjunction with a biophysical model to predict 
target sites (Khorshid et al., 2013). As noted by others (Majoros et al., 2013), the 
definition of non-canonical MIRZA sites was more expansive than that used elsewhere 
and did not exclude sites with canonical 6mer or offset 6mer seed matches. Indeed, when 
focusing on only targets without 6mer or offset 6mer seed matches, the top 100 non-
canonical MIRZA targets showed no sign of efficacy (Figure 1G). 
 Finally, we examined non-canonical clusters identified by IMPACT-seq 
(identification of miRNA-responsive elements by pull-down and alignment of captive 
transcripts—sequencing), a method that sequences mRNA fragments that co-purify with 
a biotinylated miRNA without crosslinking (Tan et al., 2014). Although the mRNAs with 
an IMPACT-seq-supported canonical site were down-regulated upon the transfection of 
the cognate miRNA, those with an IMPACT-seq-supported non-canonical site responded 
no differently than mRNAs lacking a site (Figure 1H). 
 Collectively, the novel non-canonical sites recently identified in high-throughput 
CLIP and other biochemical studies imparted no detectable repression when monitoring 
mRNA changes. The same was true when examining ribosome-profiling or proteomic 
datasets to capture repression also occurring at the level of translation (Figure 1–figure 
supplement 4).  
 All of our analyses of experimentally identified non-canonical sites examined the 
ability of the sites to act in mRNAs that had no seed-matched site to the same miRNA in 
their 3′ UTRs. Any non-canonical site found in a 3′ UTR that also had a seed-matched 
site to the same miRNA was not considered because any response could be attributed to 
the canonical site. At first glance, excluding these co-occurring sites might seem to allow 
for the possibility that the experimentally identified non-canonical sites could contribute 
to repression when in the same 3′ UTR as a canonical site, even though they are 
ineffective in 3′ UTRs without canonical sites. However, in mammals, canonical sites to 
the same miRNA typically act independently (Grimson et al., 2007; Nielsen et al., 2007), 
and we have no reason to think that non-canonical sites would behave differently. More 
importantly, although the non-canonical sites examined were in mRNAs that had no 
seed-matched 3′-UTR site to the same miRNA, most were in mRNAs that had seed-
matched 3′-UTR sites to other miRNAs that were highly expressed in the cells. 
Therefore, even if the non-canonical sites could only function when coupled to a 
canonical site, we would have observed a signal for their function in our analyses. 
 
Confirmation that miRNAs bind to non-canonical sites despite their inefficacy 
The inefficacy of recently reported non-canonical sites was surprising when considering 
evidence that the dCLIP clusters without cognate seed matches are nonetheless enriched 
for imperfect pairing to the miRNA, which would not be expected if those clusters were 
merely non-specific background (Chi et al., 2012; Loeb et al., 2012). Indeed, our analysis 
of motifs within the dCLIP clusters for miR-124 and miR-155 confirmed that those 
without a canonical site to the miRNA were enriched for miRNA pairing (Figure 2A). 
Although one of the motifs identified within CLIP clusters that appeared after 
transfection of miR-124 into HeLa cells yet lacked a canonical miR-124 site did not 
match the miRNA (Figure 2–figure supplement 1C), the top motif, as identified by 
MEME (Bailey and Elkan, 1994), had striking complementarity to the miR-124 seed 
region (Figure 2A). This human miR-124 non-canonical motif matched the “nucleation-
bulge” motif originally found for miR-124 in the mouse brain (Chi et al., 2012). 
Although the top motif identified within the subset of miR-155 dCLIP clusters that 
lacked a canonical site to miR-155 was not identified with confidence, it had only a 
single mismatch to the miR-155 seed, which would not have been expected for a motif 
identified by chance. 
 Previous analysis of CLASH-identified interactions shows that the top MEME-
identified motifs usually pair to the miRNA, although for many miRNAs this pairing falls 
outside of the seed region (Helwak et al., 2013). Repeating this analysis, but focusing on 
only interactions without canonical sites, confirmed this result (Figure 2B) (Helwak et al., 
2013). Applying this type of analysis to non-canonical interactions identified from 
miRNA–mRNA chimeras in standard AGO CLIP datasets confirmed that these 
interactions are also enriched for pairing to the miRNA (Grosswendt et al., 2014). As 
previously shown (Grosswendt et al., 2014), these interactions were more specific to the 
seed region than were the CLASH-identified interactions (Figure 2B). Comparison of all 
the chimera data with all the CLASH data showed that a higher fraction of the chimeras 
captured canonical interactions and that a higher fraction captured interactions within 3′ 
UTRs (Figure 2–figure supplement 1A). These results, implying that the chimera 
approach is more effective than CLASH at capturing functional sites that mediate 
repression, motivated a closer look at the chimera-identified interactions that lacked a 
canonical site, despite our finding that these interactions do not mediate repression. In the 
human and nematode datasets (and less so in the mouse dataset), these interactions were 
enriched for motifs that corresponded to non-canonical sites that paired to the miRNA 
seed region (Figure 2B and Figure 2–figure supplement 2). Inspection of these motifs 
revealed that the most enriched nucleotides typically preserved Watson–Crick pairing in 
a core 4–5 nts within the seed region, with tolerance to mismatches or G:U wobbles 
observed at varied positions, depending on the miRNA, potentially reflecting seed-
specific structural or energetic features, or perhaps context-dependent biases in 
crosslinking or ligation (Figure 2C and Figure 2–figure supplement 1B).  
Motifs for only a few miRNAs had a bulged nucleotide, and if a bulge was 
observed it was in the mRNA strand and not in the miRNA strand, as expected if the 
Argonaute protein imposed geometric constraints in the seed of the miRNA. The miR-
124 nucleation-bulge site was enriched in mouse chimera interactions (Figure 2–figure 
supplement 2A), as it had been in the human and mouse dCLIP clusters (Figure 2A) (Chi 
et al., 2012). However, despite identification of this miR-124 interaction in datasets from 
two methods and two species, this style of bulged pairing was not detected for any other 
miRNA. Interestingly, for all other cases in which a bulge in the recognition motif was 
observed (human miR-33 and miR-374, and C. elegans miR-50 and miR-58), the bulge 
was between the nucleotides that paired to miRNA nucleotides 4 and 5 (Figure 2–figure 
supplement 1B and Figure 2–figure supplement 2B). A bulge is observed between the 
analogous nucleotides of validated targets of Arabidopsis miR398 (Jones-Rhoades and 
Bartel, 2004), whereas single-nucleotide bulges between other seed-pairing positions 
have not been reported in other validated plant targets. A bulge between these nucleotides 
is also observed in the first let-7 site in the C. elegans lin-41 3′ UTR, one of the 
archetypal 3′-compensatory sites (Reinhart et al., 2000; Bartel, 2009). Taken together, 
these observations suggest that the most tolerated bulge in miRNA seed pairing is 
between the target nucleotides that pair to miRNA nucleotides 4 and 5. 
 Some motifs, particularly the more degenerate ones, were found in most of the 
interactions, whereas other motifs were found in only a small minority (Figure 2C and 
Figure 2–figure supplement 1B). We suspect that many of the interactions lacking the 
top-scoring motifs also involve non-canonical binding sites, some of which might 
function through degenerate versions of the motif that happened to have scored highest in 
the MEME analysis. Nonetheless, some interactions or CLIP clusters lacking the top-
scoring motifs might represent background (Friedersdorf and Keene, 2014), and indeed a 
few with the motif or even with a canonical site might represent background. 
 In sum, our analyses of the CLIP datasets confirmed that many of the CLIP 
clusters and CLASH/chimera interactions lacking a seed match nonetheless capture 
authentic miRNA-binding sites—otherwise the top enriched motifs would not pair so 
often to the cognate miRNA. Despite this ability to bind the miRNA in vivo and to 
function in the sense that they contribute to cellular target-site abundance (Denzler et al., 
2014), we classify the CLIP-identified non-canonical sites as non-functional with respect 
to repression because they showed no sign of mediating repression (Figure 1 and Figure 
1–figure supplements 1–4). Thus, the only known non-canonical site types that mediate 
repression are the 3′-supplementary, centered, and cleavage site types, which together 
comprise <1% of the effective sites that currently can be predicted (Friedman et al., 2009; 
Shin et al., 2010). Although we cannot exclude the possibility that additional types of 
functional non-canonical sites might exist but have not yet been characterized to the point 
that they can be used for miRNA target prediction (Lal et al., 2009), our analysis of the 
CLIP results justified a focus on the abundant site types that are predictive of targeting 
and at least marginally functional, i.e., the canonical seed-matched sites, including 6mer 
and offset 6mer sites. 
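 The canonical site types referred to above can be made concrete with a minimal sketch. It assumes the standard definitions (8mer: match to miRNA nucleotides 2–8 plus an A opposite nucleotide 1; 7mer-m8: match to nucleotides 2–8; 7mer-A1: match to nucleotides 2–7 plus the A; 6mer: match to nucleotides 2–7; offset 6mer: match to nucleotides 3–8). The function name, the return format, and the example UTR fragments are hypothetical and serve only as illustration.

```python
# Illustrative sketch; function name and return format are assumptions.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def revcomp(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[n] for n in reversed(rna))

def canonical_sites(mirna, utr):
    """Return (start, site_type) for each canonical site in a 3' UTR.

    Both sequences are written 5'->3'; start is the 0-based index of the
    first nucleotide of the 6-nt match that was found.
    """
    core = revcomp(mirna[1:7])      # pairs to miRNA nucleotides 2-7
    offset = revcomp(mirna[2:8])    # pairs to miRNA nucleotides 3-8
    m8 = COMPLEMENT[mirna[7]]       # target nt pairing to miRNA nucleotide 8
    m2 = COMPLEMENT[mirna[1]]       # target nt pairing to miRNA nucleotide 2
    sites = []
    for i in range(len(utr) - 5):
        window = utr[i:i + 6]
        if window == core:
            has_m8 = i > 0 and utr[i - 1] == m8
            has_a1 = i + 6 < len(utr) and utr[i + 6] == "A"
            if has_m8 and has_a1:
                kind = "8mer"
            elif has_m8:
                kind = "7mer-m8"
            elif has_a1:
                kind = "7mer-A1"
            else:
                kind = "6mer"
            sites.append((i, kind))
        elif window == offset:
            # If nucleotide 2 also pairs, the site is a 7mer-m8 or 8mer and
            # is reported when the scan reaches the core match one step later.
            if not (i + 6 < len(utr) and utr[i + 6] == m2):
                sites.append((i, "offset 6mer"))
    return sites

MIR124 = "UAAGGCACGCGGUGAAUGCC"    # miR-124 sequence, used here as an example
eight_mer = canonical_sites(MIR124, "CCGUGCCUUACC")
six_mer = canonical_sites(MIR124, "CCAUGCCUUCCC")
offset_six = canonical_sites(MIR124, "CCGUGCCUGCC")
```

Non-canonical sites, such as the nucleation-bulge motif, are by construction not reported by a scan of this kind.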
 
Improving dataset quality for model development 
To identify features involved in mammalian miRNA targeting, we analyzed the results of 
microarray datasets reporting the mRNA changes after transfecting either a miRNA or 
siRNA (together referred to as small RNAs, abbreviated as sRNAs) into HeLa cells. 
From the published datasets, we used the set of 74 experiments that had previously been 
selected because each 1) had a clear signal for sRNA-based repression, 2) was acquired 
using the same Agilent array platform, and 3) reported on the effects of a unique seed 
sequence (Garcia et al., 2011). 
 Despite the differences among the 74 transfected sRNAs, mRNA fold changes of 
some arrays were highly correlated with those of others, which indicated that sRNA-
independent effects dominated (Figure 3A). When all 74 datasets were compared against 
each other, those from either the same group of experiments (Anderson et al., 2008) or 
the same transfection protocol (Jackson et al., 2006a; Jackson et al., 2006b; Grimson et 
al., 2007) tended to cluster strongly together based on their common transcriptome-wide 
responses to different transfected sRNAs (Figure 3B), indicating the likely presence of 
batch effects (Leek et al., 2010) that could obscure detection of features associated with 
miRNA targeting. 
A parameter known to confound the accurate measurement of mRNA responses 
on microarrays is the relative AU content within 3′ UTRs (Elkon and Agami, 2008). 
Indeed, when considering mRNAs without a canonical site to the transfected sRNA, we 
found that 3′-UTR AU content often correlated with mRNA fold changes. Moreover, the 
extent and direction of the correlation was similar for different datasets from the same 
publication but differed when comparing to datasets from other publications (Figure 3C). 
A second parameter that helped explain the correlated sRNA-independent effects for 
related datasets was 3′-UTR length (Saito and Satrom, 2012), which exhibited patterns of 
correlation similar to those observed for 3′-UTR AU content (Figure 3C). Our 
observation that AU content and 3′-UTR length correlated so differently with global 
expression changes when comparing results from different publications helps explain 
why different 3′-UTR features previously seemed to have such variable predictive power 
in different experimental contexts (Hausser et al., 2009; Wen et al., 2011; Gumienny and 
Zavolan, 2015). 
 Another phenomenon known to systematically perturb the levels of mRNAs 
without sites to the transfected sRNA is the derepression of mRNAs with sites for 
endogenous miRNAs, presumably through competition between the transfected sRNA 
and the endogenous miRNAs for limiting components of the silencing pathway (Khan et 
al., 2009; Saito and Satrom, 2012). Statistically significant derepression was indeed 
observed for mRNAs with sites to eight of the 10 miRNA families most frequently 
sequenced in HeLa cells (Figure 3–figure supplements 1A–B). 
 To correct for biases that were independent of the sequence of the introduced 
sRNA, we used partial least-squares regression (PLSR) to estimate—for each transfection 
experiment—the component of the transcriptome response that was similar in other 
highly correlated experiments, and we then subtracted this estimate from the observed 
response (Supplementary file 1). Applying our technique to all the mRNAs in each of the 
74 datasets largely eliminated the correlations observed between datasets (Figures 3D–E), 
as well as the correlations observed between mRNA fold changes and either AU content 
or 3′-UTR length (Figure 3F), which lowered the risk that these effects that are 
independent of the sRNA sequence would confound subsequent analyses of sRNA 
targeting efficacy. Moreover, our technique eliminated the signal for derepression of 
endogenous miRNA targets (Figure 3–figure supplement 1C), suggesting that it did the 
same for any other biases unrelated to the sequence of the transfected sRNA that have yet 
to be identified. Reducing these biases substantially reduced the variance in the response 
for mRNAs without sites to the sRNA, which substantially enhanced the net signal for 
sRNA-mediated repression of site-containing mRNAs observed in individual arrays 
(Figure 3G) and all arrays in aggregate (Figure 3H). 
 Previous studies of miRNA targeting have relied on 3′-UTR annotations from 
databases such as RefSeq, without accounting for abundant alternative 3′-UTR isoforms 
present in the tissue or cell line of interest (Tian et al., 2005). The presence of more than 
one abundant 3′-UTR isoform for a gene would confound interpretation of 3′-UTR-
related features, such as 3′-UTR length, or distance from the closest 3′-UTR end (Nam et 
al., 2014). Moreover, the shorter 3′-UTR isoforms might not include some target sites, 
which would cause these sites to appear ineffective when in fact they are not present 
(Sandberg et al., 2008; Mayr and Bartel, 2009; Lianoglou et al., 2013; Nam et al., 2014). 
To avoid these complications, we examined 3′-UTR isoform quantifications previously 
generated for HeLa cells (Nam et al., 2014) using poly(A)-position profiling by 
sequencing (3P-seq) (Jan et al., 2011), and developed our model using the dominant 
mRNA from the subset of genes for which ≥90% of the 3P-seq tags corresponded to a 
single 3′-UTR isoform. To isolate the effects of single sites, we also used the subset of 
these mRNAs for which the 3′ UTR possessed a single seed match to the transfected 
sRNA (Supplementary file 1). 
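 The isoform filter described above can be sketched as follows. The input layout (a dictionary of tag counts per gene and isoform), the function name, and the example counts are assumptions made for illustration and do not reflect the format of the actual 3P-seq data.

```python
# Illustrative sketch; the data layout and names are hypothetical.
def dominant_isoforms(tag_counts, min_fraction=0.9):
    """Map each gene to its dominant 3' UTR isoform when that isoform
    accounts for at least min_fraction of the gene's tags; genes without
    such an isoform are dropped."""
    kept = {}
    for gene, isoforms in tag_counts.items():
        total = sum(isoforms.values())
        if total == 0:
            continue  # no tags, so no isoform can be called dominant
        best = max(isoforms, key=isoforms.get)
        if isoforms[best] / total >= min_fraction:
            kept[gene] = best
    return kept

# Hypothetical tag counts per gene and isoform
example = {
    "geneA": {"short": 95, "long": 5},   # 95% in one isoform: kept
    "geneB": {"short": 60, "long": 40},  # no dominant isoform: dropped
    "geneC": {"only": 12},               # single isoform: kept
}
selected = dominant_isoforms(example)
```

A second pass that keeps only 3′ UTRs with exactly one seed match to the transfected sRNA would complete the selection described in the text.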
 
Selecting features and building a regression model for target prediction 
To improve our model of mammalian target-site efficacy, we considered 26 features as 
potentially informative of efficacy. These included features of the sRNAs, features of the 
sites (including their contexts and positions within the mRNAs), and features of the 
mRNAs, many of which had been used or at least considered in previous efforts (Table 
1). 
 One of the 26 features was site PCT (probability of conserved targeting), which 
estimates the probability of the site being preferentially conserved because it is targeted 
by the cognate miRNA (Friedman et al., 2009). Prior to use, our PCT scores were updated 
to take advantage of improvements in both mouse and human 3′-UTR annotations 
(Harrow et al., 2012; Flicek et al., 2014), the additional sequenced vertebrate genomes 
aligned to the mouse and human genomes (Karolchik et al., 2014), and our expanded set 
of miRNA families broadly conserved among vertebrate species, which increased from 
87 to 111 families. Using these updates increased sensitivity, with our estimate for the 
number of human 3′-UTR sites conserved above background increasing from ~46,400 
(Friedman et al., 2009) to ~62,300. The PCT score on its own correlates with site efficacy, 
and when using the same set of 3′ UTRs this correlation increased only modestly for the 
new scores (data not shown), consistent with the notion that the evolutionary signal was 
already nearly saturated in the previous analysis of 23 species spanning the vertebrate 
tetrapods (Friedman et al., 2009). Nonetheless, we used our updated PCT score as a 
feature for sites of broadly conserved miRNAs within our training set. 
 A second feature that we re-evaluated was the predicted structural accessibility of 
the site. As scored previously, the degree to which the site nucleotides were predicted to 
be free of pairing to flanking 3′-UTR regions was not informative after controlling for the 
contribution of local AU content (Grimson et al., 2007). However, analysis inspired by 
work on siRNA site accessibility (Tafer et al., 2008) suggested an improved scoring 
scheme for this feature. For this analysis we used RNAplfold (Bernhart et al., 2006) to 
predict the unpaired probabilities for variable-sized windows in the proximity of the site 
and then examined the relationship between these probabilities and the repression 
associated with sites in our compendium of normalized datasets, while controlling for 
local AU content and other features of the context+ model (Figure 4A). Based on these 
results, which resembled those reported previously (Tafer et al., 2008), we scored 
predicted structural accessibility (SA) as proportional to the log10 value of the unpaired 
probability for a 14-nt region centered on the match to miRNA nucleotides 7 and 8. 
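 As a rough sketch of this scoring, the snippet below takes precomputed unpaired probabilities and returns the log10 value for the 14-nt window. It assumes, for illustration only, that the probabilities are stored in a dictionary keyed by (window end position, window width), which loosely mirrors the way RNAplfold reports the probability that a stretch of a given width ending at a given position is unpaired. The function name, the dictionary layout, and the example value are hypothetical.

```python
import math

# Illustrative sketch; the lookup structure is an assumed, simplified
# stand-in for parsed RNAplfold output.
def structural_accessibility(unpaired_prob, pos_m8, width=14):
    """log10 of the probability that a width-nt window is unpaired.

    pos_m8 is the 1-based 3' UTR position that pairs to miRNA nucleotide 8;
    the position pairing to nucleotide 7 is pos_m8 + 1. A 14-nt window
    centered on these two positions spans pos_m8 - 6 .. pos_m8 + 7.
    """
    window_end = pos_m8 + width // 2
    return math.log10(unpaired_prob[(window_end, width)])

# Hypothetical probability for the window ending at position 27
probs = {(27, 14): 0.01}
sa = structural_accessibility(probs, pos_m8=20)
```

Because the feature is described only as proportional to this value, any scaling is left to the regression coefficient.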
 Having assembled a set of candidate features, we used the stepAIC function from 
the “MASS” R package (Venables and Ripley, 2002) to determine which features were 
most useful for modeling site efficacy. This function uses stepwise regression to build 
models with increasing numbers of features until it reaches the optimal Akaike 
Information Criterion (AIC) value. The AIC evaluates the tradeoff between the benefit of 
increasing the likelihood of the regression fit and the cost of increasing the complexity of 
the model by adding more variables. For each of the four seed-matched site types, models 
were built for 1000 samples of the dataset. Each sample included 70% of the mRNAs 
with single sites to the transfected sRNA from each experiment (randomly selected 
without replacement), reserving the remaining 30% as a test set. Compared to our 
context-only and context+ models (Grimson et al., 2007; Garcia et al., 2011), the new 
stepwise regression models were significantly better at predicting site efficacy when 
evaluated using their corresponding held-out test sets, as illustrated for each of the four 
site types (Figure 4B). 
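 For reference, the criterion minimized during stepwise selection can be written in its standard form. The symbols below are introduced here for illustration and do not appear in the original text: k is the number of fitted parameters, \hat{L} the maximized likelihood, n the number of observations, and RSS the residual sum of squares.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L},
\qquad
\text{which for a least-squares fit with Gaussian errors becomes}
\qquad
\mathrm{AIC} = n\,\ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k + \text{const}.
```

Adding a feature lowers the AIC only when the resulting drop in n ln(RSS/n) exceeds the penalty of 2 for the extra parameter, which is the tradeoff described above.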
 Reasoning that features most predictive would be robustly selected, we focused 
on 14 features selected in nearly all 1000 bootstrap samples for at least two site types 
(Table 1). These included all three features considered in our original context-only model 
(minimum distance from 3′-UTR ends, local AU composition and 3′-supplementary 
pairing), the two added in our context+ model (SPS and TA), as well as nine additional 
features (3′-UTR length, ORF length, predicted structural accessibility, the number of 
offset 6mer sites in the 3′ UTR and 8mer sites in the ORF, the nucleotide identity of 
position 8 of the target, the nucleotide identity of positions 1 and 8 of the sRNA, and site 
conservation). Other features were frequently selected for only one site type (e.g., ORF 
7mer-A1 sites, ORF 7mer-m8 sites, and 5′-UTR length; Table 1). Presumably these and 
other features were not robustly selected because either their correlation with targeting 
efficacy was very weak (e.g., the 7 nt ORF sites) or they were strongly correlated to a 
more informative feature, such that they provided little additional value beyond that of 
the more informative feature (e.g., 3′-UTR AU content compared to the more informative 
feature, local AU content). 
 Using the 14 robustly selected features, we trained multiple linear regression 
models on all of the data. The resulting models, one for each of the four site types, were 
collectively called the context++ model (Figure 4C and Figure 4–Source data 1). For 
each feature, the sign of the coefficient indicated the nature of the relationship. For 
example, mRNAs with either longer ORFs or longer 3′ UTRs tended to be more resistant 
to repression (indicated by a positive coefficient), whereas mRNAs with either 
structurally accessible target sites or ORF 8mer sites tended to be more prone to 
repression (indicated by a negative coefficient). Based on the relative magnitudes of the 
regression coefficients, some newly incorporated features such as 3′-UTR length and 
ORF length contributed similarly to features previously incorporated in the context+ 
model, such as SPS, TA, and local AU (Figure 4C). New features with an intermediate 
level of influence included the number of ORF 8mer sites and site conservation as well as 
the presence of a 5′ G in the sRNA (Figure 4C), the latter perhaps a consequence of 
differential sRNA loading efficiency. The weakest features included the sRNA and target 
position 8 identities as well as the number of offset 6mer sites. The identity of sRNA 
nucleotide 8 exhibited a complex pattern that was site-type dependent. Relative to a 
position-8 U in the sRNA, a position-8 C further decreased efficacy of sites with a 
mismatch at this position (6mer or 7mer-A1 sites), whereas a position-8 A had the 
opposite effect (Figure 4C). Similarly, a position-8 C in the site also conferred decreased 
efficacy of 6mer and 7mer-A1 sites relative to a position-8 U in the target (Figure 4C). 
Allowing interaction terms when developing the model, including a term that captured 
the potential interplay between these positions, did not provide sufficient benefit to 
justify the more complex model. 
 
Improvement over previous methods 
We compared the predictive performance of our context++ model to that of the most 
recent versions of seventeen in silico tools for predicting miRNA targets, including 
AnTar (Wen et al., 2011), DIANA-microT-CDS (Reczko et al., 2012), ElMMo 
(Gaidatzis et al., 2007), MBSTAR (Bandyopadhyay et al., 2015), miRanda-MicroCosm 
(Griffiths-Jones et al., 2008), miRmap (Vejnar and Zdobnov, 2012), mirSVR (Betel et al., 
2010), miRTarget2 (Wang and El Naqa, 2008), MIRZA-G (Gumienny and Zavolan, 
2015), PACCMIT-CDS (Marin et al., 2013), PicTar2 implemented for predictions 
conserved through mammals, chicken, or fish (PicTarM, PicTarC, and PicTarF, 
respectively) (Anders et al., 2012), PITA (Kertesz et al., 2007), RNA22 (Miranda et al., 
2006), SVMicrO (Liu et al., 2010), TargetRank (Nielsen et al., 2007), and TargetSpy 
(Sturm et al., 2010); as well as successive versions of TargetScan, which offer context 
scores (Grimson et al., 2007), PCT scores (Friedman et al., 2009), or context+ scores 
(Garcia et al., 2011) as options for ranking predictions (TargetScan5, TargetScan.PCT, or 
TargetScan6, respectively) for either all mRNAs with a canonical 7–8 nt 3′-UTR site 
(TargetScan.All) or those with only broadly conserved sites (TargetScan.Cons). To the 
best of our knowledge, algorithms excluded from the comparison either were not de novo 
prediction algorithms (i.e., relying purely on consensus techniques or experimental data), 
did not provide a pre-computed database of results, or lacked a numerical value (or 
ranking) of either target-prediction confidence or mRNA responsiveness. To test the 
performance of the included methods, we used the results of seven microarray datasets 
that each monitor mRNA changes after transfection of a conserved miRNA into HCT116 
cells containing a hypomorphic mutant for Dicer (Linsley et al., 2007). These datasets 
differ from those used during development and training of our model with respect to both 
the cell type and the identities of the sRNAs. To prevent our model from gaining an 
advantage over methods that used standard 3′-UTR annotations, we used RefSeq-
annotated 3′ UTRs (rather than 3P-seq–supported annotations) to generate the context++ 
test-set predictions, choosing the longest 3′ UTR to represent genes with multiple 
annotated 3′ UTRs. For each 3′ UTR containing multiple sites to the cognate miRNA, the 
context++ scores of individual sites were summed to generate the total context++ score to 
be used to rank that predicted target. 
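 The summation and ranking step can be sketched in a few lines. The input format (one tuple per site), the function name, and the example scores are hypothetical; the sketch assumes that more negative scores indicate stronger predicted repression.

```python
from collections import defaultdict

# Illustrative sketch; names and scores are hypothetical.
def total_context_scores(site_scores):
    """Sum per-site scores for each mRNA and rank mRNAs so that the most
    negative total (strongest predicted repression) comes first."""
    totals = defaultdict(float)
    for mrna, score in site_scores:
        totals[mrna] += score
    return sorted(totals.items(), key=lambda item: item[1])

# Hypothetical per-site scores for one miRNA
sites = [("mRNA1", -0.30), ("mRNA2", -0.45), ("mRNA1", -0.25)]
ranked = total_context_scores(sites)
```

In this toy example, mRNA1 outranks mRNA2 because its two sites sum to a more negative total than the single, stronger site in mRNA2.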
 The number of potential miRNA–mRNA interactions considered by the different 
methods varied greatly (Figure 5A), which reflected the varied strategies and priorities of 
these prediction efforts. Out of a concern for prediction specificity, many efforts only 
consider interactions involving 7–8 nt seed-matched sites. Accordingly, we first tested 
how well each of the methods predicted the repression of mRNAs with at least one 
canonical 7–8 nt 3′-UTR site (Figure 5B). The context++ model performed substantially 
better than the most predictive published model, which was TargetScan6.All. Of 
algorithms derived from other groups, DIANA-microT-CDS, miRTarget2, miRanda-
miRSVR, MIRZA-G (and its derivatives), and TargetRank were the most predictive, with 
performance within range of TargetScan5.All (Figure 5B). 
 Part of the reason that some algorithms performed more poorly is that they 
consider relatively few potential miRNA–target interactions (Figure 5A).  For example, 
the drop in performance observed between TargetScan.All and TargetScan.Cons 
illustrates the effect of limiting analysis to the more highly conserved sites.  Nonetheless, 
the performance of TargetScan.Cons relative to other methods that consider relatively 
few sites shows that a signal can be observed in this assay even when a very limited 
number of interactions are scored (Figures 5A–B), presumably because much of the 
functional targeting is through conserved interactions. Indeed, the performance of 
ElMMo and TargetScan.PCT illustrates what can be achieved by scoring just the extent of 
site conservation and no other parameter. 
 In an attempt to maximize prediction sensitivity, some efforts consider many 
interactions that lack a canonical 7–8 nt 3′-UTR site (Figure 5A). However, all of these 
algorithms performed poorly in predicting the response of mRNAs lacking such sites 
(Figure 5C). The two algorithms achieving any semblance of prediction accuracy did so 
by predicting some of the canonical interactions with known marginal efficacy.  These 
were DIANA-microT-CDS, which captured modest effects of canonical sites in ORFs 
(Reczko et al., 2012; Marin et al., 2013), and the context++ model, which captured the 
modest effects of canonical 6mers in 3′ UTRs (as modified by the 14 features, which 
included offset 6mers and 8mer ORF sites) (Figure 5C). The algorithms designed to 
identify many non-canonical sites performed much more poorly in this test (r² < 0.004), 
consistent with the idea that the vast majority of mRNAs without canonical sites either do 
not change in response to the miRNA or change in an unpredictable fashion as a 
secondary effect of introducing the miRNA. 
 Another way to evaluate the performance of targeting algorithms is to examine 
the repression of the top predicted targets. Compared to the r² test, this approach does 
not penalize efforts that either impose more stringent cutoffs to achieve higher prediction 
specificity or implement scoring schemes that are not designed to correlate directly with 
site efficacy. Perhaps most importantly, this approach aligns with the goals of a biologist 
considering the top-ranked predictions in an attempt to focus on those most likely to 
undergo substantial repression. When choosing an average of 16 predicted targets for 
each of the seven test-set miRNAs, we found that these top 112 predictions of the 
context++ model were significantly more repressed than the top predictions from earlier 
versions of TargetScan (Figure 5D) and the top predictions of the other algorithms 
(Figure 5–figure supplement 1A). 
 Despite the success of the context++ model, not all of the fold changes for its top 
predicted targets were negative; for the test set, the distribution of these fold changes 
intersected 0.0 at a cumulative fraction of 0.92, indicating that mRNAs for 8% of the top 
predictions increased rather than decreased with transfection of the cognate miRNA 
(Figure 5D). In principle, these mRNAs could still be authentic targets that are repressed 
in these cells but nonetheless had increased expression values because of either 
experimental noise or secondary effects of introducing the miRNA overwhelming the 
signal for miRNA-mediated repression. Alternatively, some or all of these mRNAs could 
be false-positive predictions. Because only half of the false-positive predictions would be 
expected to have positive fold changes in the presence of the miRNA, our best estimate 
of the upper limit on the false-positive predictions was 2 × 8%, or 16%, at this cutoff (for 
which an average of 16 top predictions per miRNA are considered). At the same cutoff, 
the distribution of fold changes for each of the previous algorithms intersected 0.0 at 
cumulative fractions ranging from 0.58 to 0.88 (Figure 5–figure supplement 1A), which 
implied lower prediction specificity than that observed for the context++ model, with 
correspondingly higher estimates for the upper limits of false positives among their top 
predictions, ranging from 24–84%. 
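 The arithmetic behind these upper limits can be reproduced with a short sketch. It encodes the stated reasoning that only half of the false positives are expected to show positive fold changes, so the upper limit is twice the fraction of predictions that increased. The function names are hypothetical.

```python
# Illustrative sketch; function names are hypothetical.
def fraction_below_zero(fold_changes):
    """Cumulative fraction at which a fold-change distribution crosses 0."""
    return sum(1 for fc in fold_changes if fc < 0) / len(fold_changes)

def false_positive_upper_limit(cumulative_fraction_at_zero):
    """Upper limit on false positives: 2 x (fraction with positive change)."""
    return 2 * (1 - cumulative_fraction_at_zero)

# Values quoted in the text: 0.92 for the context++ model and
# 0.58 to 0.88 for the previous algorithms.
limits = {cf: round(false_positive_upper_limit(cf), 2)
          for cf in (0.92, 0.88, 0.58)}
```

The three entries of `limits` correspond to the 16%, 24%, and 84% figures given above.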
 To evaluate the performance of top-ranked predictions more systematically, we 
examined median repression of the predicted targets over a broad spectrum of cutoffs, 
ranging from an average of 4–4096 predictions per miRNA (Figure 5E). Regardless of 
the cutoff, the top context++ predictions were the most repressed. The top predictions of 
most other algorithms were repressed significantly more than expected by chance, 
although the median repression of some (MBSTAR, RNA22, PACCMIT-CDS, and 
AnTarCLIP) did not exceed the median repression of all mRNAs with a canonical 7–8 nt 
3′-UTR site (Figure 5E). Plotting average fold changes rather than median fold changes 
resulted in very similar relative performances (Figure 5–figure supplements 1B–C). 
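 A minimal sketch of this cutoff sweep follows. It assumes that predictions for all miRNAs are pooled and ranked together, so that a cutoff of k corresponds to the top k × (number of miRNAs) predictions overall; this pooling is an interpretation of "an average of" and is not confirmed by the text. The function name and the example data are hypothetical.

```python
from statistics import median

# Illustrative sketch; pooling across miRNAs is an assumption.
def median_repression_by_cutoff(predictions, n_mirnas, cutoffs):
    """predictions: (score, log2 fold change) pairs pooled across miRNAs.

    Returns {cutoff: median fold change of the top cutoff * n_mirnas
    predictions}, ranking the most negative scores first.
    """
    ranked = sorted(predictions, key=lambda p: p[0])
    result = {}
    for k in cutoffs:
        top = ranked[: k * n_mirnas]
        if top:
            result[k] = median(fc for _, fc in top)
    return result

# Hypothetical scores and fold changes for a single miRNA
toy = [(-0.9, -1.2), (-0.8, -0.6), (-0.5, -0.4),
       (-0.2, 0.1), (-0.1, -0.1), (0.0, 0.2)]
medians = median_repression_by_cutoff(toy, n_mirnas=1, cutoffs=(1, 3))
```

Replacing `median` with `statistics.mean` gives the corresponding average fold changes.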
 After eliminating interactions that could involve canonical 7–8 nt 3′-UTR sites, 
the remaining top predictions were modestly repressed at best (Figure 5F). The most 
repressed predicted targets without canonical 7–8 nt 3′-UTR sites were those of the 
context++ model, which scored predictions with canonical 6mer 3′-UTR sites. For 
algorithms designed to identify many non-canonical sites, the top predictions without 7–8 
nt 3′-UTR sites were essentially unresponsive to the transfected miRNA, which indicated 
that if effective non-canonical sites for these seven miRNAs exist, they are not enriched 
among the predictions of these algorithms. 
 
Similar response of targets predicted from the model and the most informative 
CLIP experiments 
We used our context++ model to overhaul the TargetScan predictions (as described in the 
next section), and as a third way of testing this model, we compared the performance of 
these TargetScan7 predictions with that of in vivo CLIP experiments. When doing this 
comparison we took care to evaluate sets of predictions that each were the same size as 
the cognate set of CLIP-supported targets, whereas some previous analyses compare 
expansive sets of computational predictions (e.g., all mRNAs with a 6mer site) to 
relatively small sets of biochemically supported predictions (Chi et al., 2009; Lipchina et 
al., 2011; Loeb et al., 2012; Grosswendt et al., 2014; Tan et al., 2014). mRNAs with 
expression signals approaching the array background were not considered. This exclusion 
was particularly important when comparing to CLIP results; CLIP can only evaluate 
mRNAs expressed in the cells, which would impart a trivial relative advantage if the 
computational predictions included targets that appeared unresponsive because they were 
expressed below the array background. The non-canonical CLIP-supported targets were 
also not considered, as we had already shown that they do not respond to the miRNA 
(Figure 1 and Figure 1–figure supplements 1–4) and we did not want the inclusion of 
these easily recognized false positives to impart a disadvantage to CLIP. Regardless of 
the set of canonical CLIP-supported targets examined, we did not find a setting in which 
they responded significantly better than did the cohort of TargetScan7 predictions, and in 
some cases, the TargetScan7 predictions performed significantly better (Figures 6A–J). 
Similar results were observed when comparing the repression of our predictions to that of 
mRNAs identified biochemically without crosslinking, using either pulldown-seq or 
IMPACT-seq (Tan et al., 2014), again focusing on only mRNAs with canonical sites 
(Figures 6K–L). Thus, for identifying consequential miRNA–target interactions, the 
TargetScan7 model is not only more convenient than experimental determination of 
binding sites, it is also at least as effective. The analogous conclusion was reached from 
analyses using the context++ model without the use of improved annotation and 
quantitation of 3′-UTR isoforms (data not shown). 
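
The size-matched comparison described in this section can be illustrated with a minimal Python sketch. It is not the analysis code behind Figure 6; the function name, the dictionary inputs, and the convention that more negative scores denote stronger predicted repression are assumptions made only for illustration.

```python
from statistics import median

def size_matched_medians(scores, fold_changes, clip_targets):
    """Compare CLIP-supported targets with an equal-sized cohort of top predictions.

    scores       : dict mRNA -> predicted score (assumed: more negative = stronger repression)
    fold_changes : dict mRNA -> log2 fold change, containing expressed mRNAs only
    clip_targets : set of mRNAs with canonical CLIP-supported sites
    """
    # Keep only CLIP-supported targets that passed the expression filter.
    clip = [m for m in clip_targets if m in fold_changes]
    # Rank expressed mRNAs from strongest to weakest predicted repression.
    ranked = sorted((m for m in scores if m in fold_changes), key=lambda m: scores[m])
    # Take exactly as many top predictions as there are CLIP-supported targets.
    cohort = ranked[:len(clip)]
    return (median(fold_changes[m] for m in clip),
            median(fold_changes[m] for m in cohort))
```

Restricting both sets to the keys of `fold_changes` mirrors the exclusion of mRNAs near the array background, so that neither approach is penalized for targets that could not be measured.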
 As mentioned earlier, mRNAs that increase rather than decrease in the presence 
of the miRNA can indicate the presence of false positives in a set of candidate targets.
Examination of the mRNA fold-change distributions from the perspective of false 
positives revealed no advantage of the experimental approaches over our predictions. 
When compared to the less informative CLIP datasets, the TargetScan7 predictions 
included fewer mRNAs that increased, and when compared to the CLIP datasets that 
performed as well as the predictions, the TargetScan7 predictions included a comparable 
number of mRNAs that increased, implying that the TargetScan7 predictions had no 
more false-positive predictions than did the best experimental datasets. 
 Because some sets of canonical biochemically supported targets performed as 
well as their cohort of top TargetScan7 predictions, we considered the utility of focusing 
on mRNAs identified by both approaches. In each comparison, the set of mRNAs that 
were both canonical biochemically supported targets and within the cohort of top 
TargetScan7 predictions tended to be more responsive. However, these intersecting 
subsets included far fewer mRNAs than the original sets, and when compared to an
equivalent number of top TargetScan7 predictions, each intersecting set performed no 
better than did its cohort of top TargetScan7 predictions (Figure 6). Therefore, 
considering the CLIP results to restrict the top predictions to a higher-confidence set is 
useful but not more useful than simply implementing a more stringent computational 
cutoff. Likewise, taking the union of the CLIP-supported targets and the cohort of 
predictions, rather than the intersection, did not generate a set of targets that was more 
responsive than an equivalent number of top TargetScan7 predictions (data not shown). 
 
The TargetScan database (v7.0) 
As already mentioned, we used the context++ model to rank miRNA target predictions 
presented in the most recent version (v7.0) of the TargetScan database (targetscan.org), 
thereby making our results accessible to others working on miRNAs. For simplicity, we 
developed the context++ model using mRNAs without abundant alternative 3′-UTR 
isoforms, and to make fair comparisons with the output of previous models, we tested the 
context++ model using only the longest RefSeq-annotated isoform. Nevertheless, 
considering the usage of alternative 3′-UTR isoforms, which can influence both the 
presence and scoring of target sites, significantly improves the performance of miRNA 
targeting models (Nam et al., 2014). Thus, our overhaul of the TargetScan predictions 
incorporated both the context++ scores and current isoform information when ranking 
mRNAs with canonical 7–8 nt miRNA sites in their 3′ UTRs. The resulting 
improvements applied to the predictions centered on human, mouse, and zebrafish 3′ 
UTRs (TargetScanHuman, TargetScanMouse, and TargetScanFish, respectively); and by 
3′-UTR homology, to the conserved and nonconserved predictions in chimp, rhesus, rat, 
cow, dog, opossum, chicken, and frog; as well as to the conserved predictions in 74 other 
sequenced vertebrate species, thereby providing a valuable resource for placing miRNAs 
into gene-regulatory networks. 
 Because the main gene-annotation databases (e.g., RefSeq and Ensembl/Gencode) 
are still in the process of incorporating the information available on 3′-UTR isoforms, the 
first step in the TargetScan overhaul was to compile a set of reference 3′ UTRs that 
represented the longest 3′-UTR isoforms for representative ORFs of human, mouse, and 
zebrafish. These representative ORFs were chosen among the set of transcript annotations 
sharing the same stop codon, with alternative last exons generating multiple 
representative ORFs per gene. The human and mouse databases started with Gencode 
annotations (Harrow et al., 2012), for which 3′ UTRs were extended, when possible, 
using RefSeq annotations (Pruitt et al., 2012), recently identified long 3′-UTR isoforms 
(Miura et al., 2013), and 3P-seq clusters marking more distal cleavage and 
polyadenylation sites (Nam et al., 2014). Zebrafish reference 3′ UTRs were similarly 
derived in a recent 3P-seq study (Ulitsky et al., 2012). 
 For each of these reference 3′-UTR isoforms, 3P-seq datasets were used to 
quantify the relative abundance of tandem isoforms, thereby generating the isoform 
profiles needed to score features that vary with 3′-UTR length (len_3UTR, min_dist, and 
off6m) and assign a weight to the context++ score of each site, which accounted for the 
fraction of 3′-UTR molecules containing the site (Nam et al., 2014). For each 
representative ORF, our new web interface depicts the 3′-UTR isoform profile and 
indicates how the isoforms differ from the longest Gencode annotation (Figure 7). 
 3P-seq data were available for seven developmental stages or tissues of zebrafish, 
enabling isoform profiles to be generated and predictions to be tailored for each of these. 
For human and mouse, however, 3P-seq data were available for only a small fraction of 
tissues/cell types that might be most relevant for end users, and thus results from all 3P-
seq datasets available for each species were combined to generate a meta 3′-UTR isoform 
profile for each representative ORF. Although this approach reduces accuracy of 
predictions involving differentially expressed tandem isoforms, it nonetheless 
outperforms the previous approach of not considering isoform abundance at all, 
presumably because isoform profiles for many genes are highly correlated in diverse cell 
types (Nam et al., 2014). 
 For each 6–8mer site, we used the corresponding 3′-UTR profile to compute the 
context++ score and to weight this score based on the relative abundance of tandem 3′-
UTR isoforms that contained the site (Nam et al., 2014). Scores for the same miRNA 
family were also combined to generate cumulative weighted context++ scores for the 3′-
UTR profile of each representative ORF, which provided the default approach for 
ranking targets with at least one 7–8 nt site to that miRNA family. Effective non-
canonical site types, i.e., 3′-compensatory and centered sites, were also predicted. Using 
either the human or mouse as a reference, predictions were also made for orthologous 3′ 
UTRs of other vertebrate species. 
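
The isoform weighting and the combination of scores per miRNA family can be sketched as follows. This is a simplified illustration under stated assumptions: the function names and inputs are hypothetical, the combination rule is assumed to be a simple sum, and the sketch ignores the rescoring of length-dependent features for each isoform.

```python
def site_weight(site_end, isoform_ends):
    """Fraction of 3'-UTR molecules long enough to contain a site.

    site_end     : 3'-most position of the site within the 3' UTR
    isoform_ends : dict mapping the end position of each tandem isoform
                   to its relative abundance (any non-negative units)
    """
    total = sum(isoform_ends.values())
    containing = sum(ab for end, ab in isoform_ends.items() if end >= site_end)
    return containing / total

def cumulative_weighted_score(sites, isoform_ends):
    """Sum the per-site scores of one miRNA family, each scaled by its weight.

    sites : iterable of (site_end, score) pairs for a single miRNA family
    """
    return sum(score * site_weight(end, isoform_ends) for end, score in sites)
```

For example, with 60% of molecules ending at position 500 and 40% ending at position 1500, a site ending at position 1000 receives a weight of 0.4, whereas a site ending at position 300 receives a weight of 1.0.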
 As an option for tetrapod species, the user can also request predicted targets of 
broadly conserved miRNAs to be ranked based on their aggregate PCT scores (Friedman 
et al., 2009), as updated in this study. The user can also obtain predictions from the 
perspective of each protein-coding gene, viewed either as a table of miRNAs (ranked by 
either total context++ score or aggregate PCT score) or as the mapping of 7–8 nt sites (as 
well as non-canonical sites) shown beneath the 3′-UTR profile and above the 3′-UTR 
sequence alignment (Figure 7). A flowchart summarizing the TargetScan overhaul is 
provided (Figure 7–figure supplement 1). 
 
  
Discussion 
Starting with an expanded and improved compendium of sRNA transfection datasets, we 
identified 14 features that each correlate with target repression and add predictive value 
when incorporated into a quantitative model of miRNA targeting efficacy. This model 
performed better than previous models and at least as well as the best high-throughput 
CLIP approaches. 
 Because our model was trained on data derived from a single cell type, a potential 
concern was its generalizability to other cell types. Heightening this concern is the recent 
report of widespread dependency of miRNA-mediated repression on cellular context 
(Erhard et al., 2014). However, other work addressing this question shows that after 
accounting for the different cellular repertoires of expressed mRNAs, the target response 
is remarkably consistent between different cell types, with alternative usage of 3′-UTR 
isoforms being the predominant mechanism shaping cell-type-specific differences in 
miRNA targeting (Nam et al., 2014). Testing the model across diverse cell types 
confirmed its generalizability; it performed at least as well as the best high-throughput 
CLIP approaches in each of the contexts examined (Figure 6). Of course, this testing was 
restricted to only those targets that were expressed in each cellular context. Likewise, to 
achieve this highest level of performance, any future use of our model or its predictions 
would also require filtering of the predictions to focus on only the miRNAs and mRNAs 
co-expressed in the cells of interest. 
 One of the more interesting features incorporated into the context++ model is SA 
(the predicted structural accessibility of the site). Freedom from occlusive mRNA 
structure has long been considered a site-efficacy determinant (Robins et al., 2005; 
Ameres et al., 2007; Kertesz et al., 2007; Long et al., 2007; Tafer et al., 2008) and 
proposed as the underlying mechanistic explanation for the utility of other features, 
including global 3′-UTR AU content (Robins and Press, 2005; Hausser et al., 2009), local 
AU content (Grimson et al., 2007; Nielsen et al., 2007), minimum distance of the site 
(Grimson et al., 2007), and 3′-UTR length (Hausser et al., 2009; Betel et al., 2010; Wen 
et al., 2011; Reczko et al., 2012). The challenge has been to predict and score site 
accessibility in a way that is informative after controlling for local AU content, which is
necessary for attributing efficacy to less occlusive secondary structure rather than
to the involvement of some AU-binding activity (Grimson et al., 2007). The selection of the
SA feature in all 1000 bootstrap samples of all four site types showed that it provided 
discriminatory power apart from that provided by local AU content and other correlated 
features, which reinforced the idea that occlusive RNA structure does indeed limit site
efficacy. This being said, local AU content, minimum distance of the site, and 3′-UTR 
length were each also selected in nearly all 1000 bootstrap samples for most site types 
(Table 1), which suggests that either these features were selected for reasons other than 
their correlation with site accessibility or the definition and scoring of our SA feature has 
additional room for improvement. 
 Our ability to confidently identify additional features that each contribute to 
improved prediction of targeting efficacy was enhanced by our pre-processing of the 
experimental datasets, which minimized variation from biases unrelated to the sRNA 
sequence. Yet despite applying this same normalization procedure to our test set, the 
observed r² value of 0.14 implied that our model explained only 14% of the variability
observed among mRNAs with canonical 7–8 nt 3′-UTR sites (Figure 4B). The r² value
increased to 0.15 when considering the usage of alternative 3′-UTR isoforms, but 85% of 
the variability remained unexplained. Error in the microarray measurements, different 
sRNA transfection efficiencies, variable incorporation of sRNAs into the silencing 
complex, and secondary effects of introducing the sRNA presumably made major 
contributions to the unexplained variability. Nonetheless, imperfections of the context++ 
model also contributed, raising the question of how much the model might be improved 
by identifying additional features or developing better methods for scoring and 
combining existing features. In analyses not described here, we evaluated the utility of other
types of regression (e.g., linear regression models with interaction terms, lasso/elastic 
net-regularized regression, multivariate adaptive regression splines, random forest, 
boosted regression trees, and iterative Bayesian model averaging) and found their 
performance to be comparable to that of stepwise regression but their resulting models to 
be considerably more complex and thus less interpretable. 
 One way to evaluate the extent to which the context++ model might be improved 
is to consider the degree to which its performance depends on the site-conservation 
feature. Because sites under selective pressure preferentially possess molecular features 
required for efficacy, inclusion of the site-conservation feature indirectly recovers some 
of the information that would otherwise be lost when informative molecular features are 
missing or imperfectly scored. As more informative molecular features are identified and 
included in a model, less information remains to be captured, and thus the site-
conservation feature cannot contribute as much to the performance of the model. The 
site-conservation feature (PCT) was chosen in all 1000 bootstrap samples of each of the 
three major site types, which showed that the molecular features of our model still do not 
fully capture all the determinants under selective pressure. However, PCT was not one of 
the most informative features (Figure 4C). Moreover, when tested as in Figure 5B, a 
model trained on only site type and the other 13 molecular features performed nearly as 
well as the full context++ model (r² of 0.126, compared to 0.139 for the full model). This
drop in r² of only 0.013 was substantially less than the r² of 0.044 observed for the site-
conservation feature on its own (Figure 5B, TargetScan.PCT), which suggested that when
predicting the response of the test-set mRNAs with the major canonical site types, the 
context++ model captured 70% (calculated as [0.044 − 0.013]/0.044) of the information
potentially imparted by molecular features. 
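
The arithmetic behind the 70% estimate can be written out explicitly. The following Python sketch only restates the calculation given in the text; the function and argument names are hypothetical.

```python
def fraction_captured(r2_full, r2_without_conservation, r2_conservation_alone):
    """Share of conservation-derived information already captured by molecular features.

    The drop in r2 upon removing the site-conservation feature is the part of its
    information that the molecular features do not yet capture; the remainder of
    the r2 achieved by conservation alone is taken to be captured.
    """
    drop = r2_full - r2_without_conservation
    return (r2_conservation_alone - drop) / r2_conservation_alone

# Values quoted in the text: full model 0.139, without conservation 0.126,
# site conservation alone 0.044.
captured = fraction_captured(0.139, 0.126, 0.044)
print(f"{captured:.0%}")
```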
 The relatively minor contribution of site conservation highlights the ability of the 
context++ model to predict the efficacy of nonconserved sites. Although, everything else 
being equal, its score for a conserved site is slightly better than that for a nonconserved 
site, this difference does not prevent inclusion of nonconserved sites among the top
predictions. Its general applicability to all canonical sites is useful for evaluating not only 
nonconserved sites to conserved miRNAs but also all sites for nonconserved miRNAs 
(e.g., Figures 6K–L), including viral miRNAs, as well as the off-targets of synthetic 
siRNAs and shRNAs. 
 Our analyses show that recent computational and experimental approaches, 
including the different types of CLIP, all fail to identify non-canonical targets that are 
repressed more than control transcripts (Figure 1, Figure 5C, and Figure 5F), which 
reopens the question of whether more than a minuscule fraction of miRNA-mediated
repression is mediated through non-canonical sites. Although CLIP approaches can 
identify non-canonical sites that bind the miRNA with some degree of specificity (Figure 
2), these non-canonical binding sites do not function to mediate detectable repression. 
Thus far, the only functional non-canonical sites that can be predicted are 3′-
compensatory sites, cleavage sites, and centered sites, which together comprise only a 
very small fraction (<1%) of the functional sites that can be predicted with comparable 
accuracy (Bartel, 2009; Shin et al., 2010). The failure of computational methods to find 
many functional non-canonical sites cannot rule out the possibility that many of these 
sites might still exist; if such sites are recognized through unimagined determinants, 
computational efforts might have missed them. CLIP approaches, on the other hand, 
provide information that is independent of proposed pairing rules or other hypothesized 
recognition determinants. Therefore, our analysis of the CLIP results, which detected no 
residual repression after accounting for canonical interactions, provides the most
compelling evidence to date on this issue. Unless there is a substantial technical bias in 
the CLIP approach (such as a large unanticipated disparity in the propensity of non-
canonical interactions to crosslink), the inability of current CLIP approaches to identify 
non-canonical targets that are repressed more than control transcripts argues strongly 
against the existence of many functional non-canonical targets. 
 Why might the CLIP-identified non-canonical sites fail to mediate repression 
(Figure 1) despite binding the miRNA in vivo (Figure 2)? Perhaps these sites are 
ineffective because perfect seed pairing is required for repression. For example, perfect 
seed pairing might favor binding of a downstream effector, either directly by contributing 
to its binding site or indirectly through an Argonaute conformational change that favors 
its binding. However, this explanation is difficult to reconcile with the activity of 3′-
compensatory and centered sites, which can mediate repression despite their lack of 
perfect seed pairing (Bartel, 2009; Shin et al., 2010), and the activity of Argonaute 
artificially tethered to an mRNA, which can mediate repression without any pairing to the 
miRNA (Pillai et al., 2004; Eulalio et al., 2008). Therefore, a more plausible explanation 
is that the CLIP-identified non-canonical sites bind the miRNA too transiently to mediate 
repression. This explanation for the inefficacy of the recently identified non-canonical 
sites in the 3′ UTRs resembles that previously proposed for the inefficacy of most 
canonical sites in ORFs: In both cases the ineffective sites bind to the miRNA very 
transiently—the canonical sites in ORFs dissociating quickly because of displacement by 
the ribosome (Grimson et al., 2007; Gu et al., 2009), and the CLIP-identified non-
canonical sites in 3′ UTRs dissociating quickly because they lack both seed pairing and 
the extensive pairing outside the seed characteristic of effective non-canonical sites (3′-
compensatory and centered sites) and thus have intrinsically fast dissociation rates. 
 The idea that newly identified non-canonical sites bind the miRNA too transiently 
to mediate repression raises the question of how CLIP could have identified so many of 
these sites in the first place; shouldn’t crosslinking be a function of site occupancy, and 
shouldn’t occupancy be a function of dissociation rates? Part of the answer to these 
questions comes with the realization that the transcriptome has many more non-canonical 
binding sites than canonical ones. The motifs identified in the non-canonical interactions 
have information contents as low as 5.6 bits, and thus are much more common in 3′ 
UTRs than canonical 6mer or 7mer sites (12 bits and 14 bits, respectively). This high 
abundance of the non-canonical binding sites would help offset the low occupancy of 
individual non-canonical sites, such that at any moment more than half of the bound 
miRNA might reside at non-canonical sites, yielding more non-canonical than canonical 
sites when using experimental approaches with such high specificity that they can 
identify a site with only a single read (Figure 2–figure supplement 1A).  
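
The relationship between information content and motif abundance can be made concrete with a short Python sketch. It assumes a random sequence of independent, equiprobable nucleotides, so that each fully specified position contributes 2 bits; real 3′ UTRs deviate from this assumption, and the function names are hypothetical.

```python
def bits_for_exact_match(length):
    """Information content of a fully specified motif (2 bits per position)."""
    return 2.0 * length

def expected_matches_per_kb(bits):
    """Expected motif occurrences per 1000 start positions of random sequence."""
    return 1000.0 * 2.0 ** (-bits)

def fold_more_common(bits_low, bits_high):
    """How many times more often the lower-information motif is expected to occur."""
    return 2.0 ** (bits_high - bits_low)

# A 6mer carries 12 bits and a 7mer 14 bits, as stated in the text.
print(bits_for_exact_match(6), bits_for_exact_match(7))
# Under these assumptions, a 5.6-bit motif is expected roughly 80-fold more
# often than a 12-bit 6mer site.
print(fold_more_common(5.6, 12.0))
```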
 Although the high abundance of non-canonical sites partly explains why CLIP 
identifies these sites in such high numbers, it cannot provide the complete answer. Some 
non-canonical sites in the CLASH and chimera datasets are supported by multiple reads, 
and all the dCLIP-identified non-canonical sites of the miR-155 study (Loeb et al., 2012) 
are supported by multiple reads. How could some CLIP clusters with ineffective, non-
canonical sites have as much read support as some with effective, canonical sites? Our 
answer to this question rests on the recognition that cluster read density does not perfectly 
correspond to site occupancy (Friedersdorf and Keene, 2014), with the other key factors 
being mRNA expression levels and crosslinking efficiency. In principle, normalizing the 
CLIP tag numbers to the mRNA levels minimizes the first factor, preventing a low-
occupancy site in a highly expressed mRNA from appearing as well supported as a high-
occupancy site in a lowly expressed mRNA (Chi et al., 2009; Jaskiewicz et al., 2012). 
Accounting for differential crosslinking efficiencies is a far greater challenge. RNA–
protein UV crosslinking is expected to be highly sensitive to the identity, geometry, and 
environment of the crosslinking constituents, leading to the possibility that the 
crosslinking efficiency of some sites is orders of magnitude greater than that of others. 
When considered together with the high abundance of non-canonical sites, variable 
crosslinking efficiency might explain why so many ineffective non-canonical sites are 
identified. Overlaying a wide distribution of crosslinking efficiencies onto the many 
thousands of ineffective, non-canonical sites could yield a substantial number of sites at 
the high-efficiency tail of the distribution for which the tag support matches that of 
effective canonical sites. Similar conclusions are drawn for other types of RNA-binding 
interactions when comparing CLIP results with binding results (Lambert et al., 2014). 
 Variable crosslinking efficiency also explains why many top predictions of the 
context++ model are missed by the CLIP methods, as indicated by the modest overlap in 
the CLIP-identified targets and the top predictions (Figure 6). The crosslinking results are
not only variable from site to site, which generates false negatives for perfectly functional 
sites, but they are also variable between biological replicates (Loeb et al., 2012), which 
imposes a challenge for assigning dCLIP clusters to a miRNA. Although this challenge is 
mitigated in the CLASH and chimera approaches, which provide unambiguous 
assignment of the miRNAs to the sites, the ligation step of these approaches occurs at low 
frequency and presumably introduces additional biases, as suggested by the different 
profile of non-canonical sites identified by the two approaches (Figure 2B and Figure 2–
figure supplement 1A). For example, CLASH identifies non-canonical pairing to the 3′ 
region of miR-92 (Helwak et al., 2013), whereas the chimera approach identifies non-
canonical pairing to the 5′ region of this same miRNA (Figure 2C). Because of the false 
negatives and biases of the CLIP approaches, the context++ model, which has its own 
flaws, performs as well as or better than the published CLIP studies.
 Our observation that CLIP-identified non-canonical sites fail to mediate 
repression reasserts the primacy of canonical seed pairing for miRNA-mediated gene 
regulation. Compared to canonical sites, effective non-canonical sites (i.e., 3′-
compensatory sites and centered sites) are rare because they require many more base 
pairs to the miRNA (Bartel, 2009; Shin et al., 2010) and thus together make up <1% of 
the effective target sites predicted to date. The requirement of so much additional pairing 
to make up for a single mismatch to the seed is proposed to arise from several sources. 
The advantage of propagating continuous pairing past miRNA nucleotide 8 (as occurs for 
centered sites) might be largely offset by the cost of an unfavorable conformational 
change (Bartel, 2009; Schirle et al., 2014). Likewise, the advantage of resuming pairing 
at the miRNA 3′ region (as occurs for 3′-compensatory sites) might be partially offset by 
either the relative disorder of these nucleotides (Bartel, 2009) or their unfavorable 
arrangement prior to seed pairing (Schirle et al., 2014). In contrast, the seed backbone is 
pre-organized to favor A-form pairing, with bases of nucleotides 2–5 accessible to 
nucleate pairing (Nakanishi et al., 2012; Schirle and MacRae, 2012). Moreover, perfect 
pairing propagated through miRNA nucleotide 7 creates the opportunity for favorable 
contacts to the minor groove of the seed:target duplex (Schirle et al., 2014). 
 Our overhaul of the TargetScan website integrated the output of the context++ 
model with the most current 3′-UTR-isoform data to provide any biologist with an 
interest in either a miRNA or a potential miRNA target convenient access to the 
predictions, with an option of downloading code or bulk output suitable for more global 
analyses. In our continuing efforts to improve the website, several additional 
functionalities will also soon be provided. To facilitate the exploration of co-targeting 
networks involving multiple miRNAs (Tsang et al., 2010; Hausser and Zavolan, 2014), 
we will provide the option of ranking predictions based on the simultaneous action of 
several independent miRNA families, to which relative weights (e.g., accounting for 
relative miRNA expression levels or differential miRNA activity in a cell type of interest) 
can be optionally assigned. To offer predictions for transcripts not already in the 
TargetScan database (e.g., novel 3′ UTRs or long non-coding RNAs, including circular 
RNAs), we will provide a mechanism to compute context++ scores interactively for a 
user-specified transcript. Likewise, to offer predictions for a novel sRNA sequence (e.g., 
off-target predictions for an siRNA), we will provide a mechanism to retrieve context++ 
scores interactively for a user-specified sRNA. To visualize the expression signature that 
results from perturbing a miRNA, we will provide a tool for the user to input 
mRNA/protein fold changes from high-throughput experiments and obtain a cumulative 
distribution plot showing the response of predicted targets relative to that of mRNAs 
without sites. Thus, with the current and future improvements to TargetScan, we hope to 
enhance the productivity of miRNA research and the understanding of this intriguing 
class of regulatory RNAs. 
 
Materials and Methods 
Microarray, RNA-seq, and RPF dataset processing 
A list of microarray, ribosome profiling, and proteomic datasets used for analyses, as well 
as the corresponding figures in which they were used, is provided (Table 2). We
considered developing the model using RNA-seq data rather than microarray data, but 
microarray datasets were still much more plentiful and were equally suitable for 
measuring the effects of sRNAs. Unless pre-processed microarray data were provided by 
previous studies (as indicated in Table 2), raw data were processed using Bioconductor 
release 2.14 in the R programming language v3.1.1 (Gentleman et al., 2004; R Core Team,
2014). Affymetrix data were first background-corrected with the “gcrma” R package (Wu
et al., 2004), whereas Illumina BeadArray data from the miR-302 knockdown and miR-
522 transfection datasets (Lipchina et al., 2011; Tan et al., 2014) were processed and 
background-corrected using the “lumiR” and “lumiExpresso” functions in the “lumi” R 
package (Du et al., 2008). A robust linear regression model was then fit to the
probe intensities using the “lmFit” function (parameter “method=’robust’”) in the 
“limma” R package v3.6.9 (Smyth, 2004; Smyth, 2005), computing differential 
expression information with the provided eBayes function. Probe IDs were then 
converted to RefSeq or Ensembl IDs (e.g., using the hgu133plus2ENSEMBL and 
IlluminaID2nuID/lumiHumanAllENSEMBL functions to convert Affymetrix and 
BeadArray probe IDs, respectively), and the fold change for each mRNA was computed 
as the median fold change for all probes corresponding to the mRNA. Finally, because 
about half of the genes in the genome were either not expressed in the cell type 
examined, or were expressed at a level that was so close to the background that they were 
prone to have noisy fold-change measurements, the following filters were applied: 
i) For microarray datasets examining the effect of knocking down either miR-92 or
25 miRNA families in HEK293 cells (Hafner et al., 2010; Helwak et al., 2013), 
transfecting miR-7 or miR-124 into HEK293 cells (Hausser et al., 2009), knocking out 
miR-155 in Th1 or Th2 cells (Rodriguez et al., 2007), or transfecting each of the 7 
miRNAs in HCT116 cells (Linsley et al., 2007), we computed the mean signal for each 
mRNA (averaging the signal with and without the miRNA), and retained mRNAs 
exceeding the median of this distribution. 
ii) For microarray datasets examining the effect of injecting miR-430 into MZDicer 
embryos (Giraldez et al., 2006) or knocking out miR-155 in T cells (Loeb et al., 2012), 
we required the mean signal intensity of an mRNA to exceed 3.0 and 2.5, respectively. 
iii) For Illumina BeadArray datasets examining the effect of either knocking down miR-
302/367 (Lipchina et al., 2011) or transfecting miR-522 (Tan et al., 2014), we required 
the mean signal intensity to exceed 7.5 and 7.0, respectively. 
iv) For all 74 small-RNA transfections, we required mRNA expression levels to exceed 
10 reads per million (RPM), as quantified by RNA-seq in mock-transfected HeLa cells 
(Guo et al., 2010). 
v) For analysis of RNA-seq or RPF datasets examining the effect of either losing Dicer in 
zebrafish embryos (Bazzini et al., 2012), transfecting miR-124 into HEK293, HeLa, or 
Huh7 cells (Nam et al., 2014), or knocking out miR-155 in B cells (Eichhorn et al., 
2014), we required mRNA expression levels to exceed 10 RPM, as quantified in the 
condition lacking the perturbed miRNA. 
vi) For analysis of proteomic results, we used the pre-computed data provided in the table 
of significantly detectable peptides (Selbach et al., 2008). 
 These thresholds were chosen based upon visual inspection of plots evaluating the 
relationship between mean expression level and fold change (commonly known as “MA 
plots” in the context of microarrays), attempting to balance the tradeoff between maximal 
sample size and reduced noise. The overall conclusions were robust to the choice of the 
threshold. After imposing the threshold, all fold-change values were centered by 
subtracting the median fold-change value of the “no-site” mRNAs in each sRNA 
perturbation experiment, except in the case of Figure 5–figure supplements 1B–C, in 
which data were mean-centered. 
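
The steps described above (collapsing probes to one value per mRNA, filtering on expression, and centering on the “no-site” mRNAs) can be sketched in Python as follows. This is an illustrative outline rather than the R/Bioconductor pipeline that was actually used; all function names and data structures are hypothetical.

```python
from collections import defaultdict
from statistics import median

def collapse_probes(probe_fc, probe_to_mrna):
    """Assign each mRNA the median fold change of all probes mapping to it."""
    by_mrna = defaultdict(list)
    for probe, fc in probe_fc.items():
        mrna = probe_to_mrna.get(probe)
        if mrna is not None:  # skip probes without an mRNA annotation
            by_mrna[mrna].append(fc)
    return {mrna: median(values) for mrna, values in by_mrna.items()}

def filter_and_center(fold_changes, expression, threshold, no_site_mrnas):
    """Drop lowly expressed mRNAs, then subtract the median of the no-site mRNAs."""
    kept = {m: fc for m, fc in fold_changes.items()
            if expression.get(m, float("-inf")) > threshold}
    # Center on mRNAs lacking a site to the perturbed sRNA.
    offset = median(kept[m] for m in no_site_mrnas if m in kept)
    return {m: fc - offset for m, fc in kept.items()}
```

The `threshold` argument stands in for whichever dataset-specific cutoff applies (for example, a signal-intensity cutoff or 10 RPM); the sketch does not choose the cutoff itself.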
 
Crosslinking and other interactome datasets 
When available, target genes identified using high-throughput CLIP data were collected 
from the supplemental materials of the corresponding studies (Lipchina et al., 2011; Loeb 
et al., 2012; Helwak et al., 2013; Grosswendt et al., 2014). For the original PAR-CLIP 
study (Hafner et al., 2010), targets were inferred from an online resource of all 
endogenous HEK293 clusters 
(http://www.mirz.unibas.ch/restricted/clipdata/RESULTS/CLIP_microArray/Antago_mir_vs_ALL_AGO.txt) 
as well as clusters observed after transfection of either miR-7 
(http://www.mirz.unibas.ch/restricted/clipdata/RESULTS/miR7_TRANSFECTION/miR7_TRANSFECTION.html) 
or miR-124 
(http://www.mirz.unibas.ch/restricted/clipdata/RESULTS/miR124_TRANSFECTION/miR124_TRANSFECTION.html). 
For dCLIP-supported miR-124 sites identified in the 
original high-throughput CLIP study (Chi et al., 2009), we used clusters whose genomic 
coordinates were provided by S.-W. Chi (Supplementary file 3), extracting the 
corresponding sequences using the “getfasta” utility in BEDTools v2.20.1 (parameters “-s 
-name -tab”) (Quinlan and Hall, 2010). When evaluating the function of non-canonical 
sites supported by CLIP or IMPACT-seq (Figures 1A–H and Figure 1–figure 
supplements 1–4), a cluster (or CLASH/chimera interaction) with a 6–8mer site (but not 
only an offset 6mer site, unless otherwise indicated in the figure legends) corresponding 
to the cognate miRNA was classified as harboring a canonical site. Otherwise, the cluster 
(or CLASH/chimera interaction) was classified as containing a non-canonical site, and 
the corresponding mRNA was carried forward for functional evaluation as a non-
canonical CLIP-supported target if it also had no cognate 6–8mer sites (but allowing 
offset 6mer sites) in its 3′ UTR (using either RefSeq or Ensembl 3′-UTR annotations as 
appropriate for the gene IDs published by the CLIP study). When comparing the response 
of canonical CLIP-supported targets to that of TargetScan7 predictions (Figure 6), the 
canonical CLIP-supported sites were additionally required to fall within (and on the same 
DNA strand as) annotated 3′ UTRs, as evaluated by the intersectBED utility in BEDTools 
v2.20.1 (parameter “-s”) (Quinlan and Hall, 2010). 
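
The canonical/non-canonical classification of a cluster can be sketched as follows. This is a minimal illustration, not the code used for the analysis: it assumes RNA-alphabet sequences written 5′ to 3′, uses the standard site definitions (6mer: match to miRNA positions 2–7; 7mer-m8: positions 2–8; 7mer-A1: positions 2–7 followed by an A; 8mer: positions 2–8 followed by an A; offset 6mer: positions 3–8), and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def revcomp(rna):
    """Reverse complement of an RNA sequence (A/C/G/U only)."""
    return rna.translate(COMPLEMENT)[::-1]

def canonical_sites(mirna):
    """Target-side sequence of each canonical site type for a miRNA."""
    match_2_7 = revcomp(mirna[1:7])   # pairs with miRNA positions 2-7
    match_2_8 = revcomp(mirna[1:8])   # pairs with miRNA positions 2-8
    return {
        "8mer": match_2_8 + "A",
        "7mer-m8": match_2_8,
        "7mer-A1": match_2_7 + "A",
        "6mer": match_2_7,
        "offset 6mer": revcomp(mirna[2:8]),  # pairs with positions 3-8
    }

def classify_cluster(cluster_seq, mirna, allow_offset_6mer=False):
    """Return ('canonical', site_type) or ('non-canonical', None).

    By default a cluster containing only an offset 6mer is not treated
    as canonical, mirroring the default described in the text.
    """
    sites = canonical_sites(mirna)
    order = ["8mer", "7mer-m8", "7mer-A1", "6mer"]
    if allow_offset_6mer:
        order.append("offset 6mer")
    for site_type in order:
        if sites[site_type] in cluster_seq:
            return "canonical", site_type
    return "non-canonical", None

# 5' end (positions 1-8) of miR-124; only these nucleotides are used here.
MIR124_5P_END = "UAAGGCAC"
result = classify_cluster("CCGUGCCUUACC", MIR124_5P_END)
```

Checking site types from longest to shortest reports the most extensive match when shorter sites are nested within longer ones.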
 
Motif discovery for non-canonical binding sites 
To identify non-canonical modes of binding, all CLASH interactions assigned to a 
particular miRNA family (defined as all mature miRNA sequences sharing a common 
sequence in nucleotide positions 2–8) were collected. Interactions containing the cognate 
canonical site type (offset 6mer, 6mer, 7mer-m8, 7mer-A1, or 8mer) were removed. For 
all miRNA families with at least 50 unique CLASH interactions remaining, enriched 
motifs were evaluated using MEME version 4.9.0 (parameters “-p 100 -dna -mod zoops 
-nmotifs 10 -minw 4 -maxw 8 -maxsize 1000000000”) (Bailey and Elkan, 1994). All 
motifs with an E-value < $10^{-3}$ are reported along with their E-values rounded to the 
nearest log-unit. Instances in which a top-ranked motif exceeded this E-value were also 
reported if the motif was an approximate complementary match to the miRNA. For each 
miRNA family, the top motif identified by MEME was aligned to a representative mature 
miRNA using FIMO (parameters “--norc --motif 1 --thresh 0.01”) (Grant et al., 2011), 
considering the reverse complement of the mature miRNA with the last nucleotide of this 
reverse complement changed to an A (to capture the enrichment of an adenosine across 
from the 5′ nucleotide of a miRNA, as occurs in 8mer and 7mer-A1 sites). Logos were 
also manually examined to determine if any mapped to the mature miRNA with a bulged 
nucleotide. The same procedure was performed for chimera interactions, for dCLIP 
clusters reported for miR-124 and miR-155, and for IMPACT-seq clusters reported for 
miR-522. 
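
Two small sequence manipulations used in this procedure can be sketched in Python: grouping mature miRNAs into families by nucleotides 2–8, and building the sequence against which the top motif was aligned (the reverse complement of the mature miRNA with its last nucleotide set to A). The function names and the toy sequences are hypothetical, and RNA-alphabet input written 5′ to 3′ is assumed.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGU", "UGCA")

def family_key(mirna):
    """Nucleotides 2-8 of a mature miRNA, which define its family."""
    return mirna[1:8]

def group_into_families(mirnas):
    """Group a {name: sequence} mapping by shared nucleotides 2-8."""
    families = defaultdict(list)
    for name, seq in mirnas.items():
        families[family_key(seq)].append(name)
    return dict(families)

def alignment_query(mirna):
    """Reverse complement of the miRNA with the final nucleotide set to A,
    so that the A opposite miRNA position 1 can be matched by the motif."""
    rc = mirna.translate(COMPLEMENT)[::-1]
    return rc[:-1] + "A"

# Toy sequences (not real miRNAs) for illustration only.
toy = {"toy-1a": "CAAGGCACGC", "toy-1b": "UAAGGCACUU", "toy-2": "CGGAUUCAGC"}
families = group_into_families(toy)
query = alignment_query("CAAGGCACGC")
```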
 
Microarray dataset normalization 
For each of the 74 transfection experiments of the compendium (Table 2), data were first 
partitioned into the mRNA fold changes (log2) measured in the given experiment (the 
response variable) as well as a matrix of the corresponding mRNA fold changes for the 
remaining 73 datasets (the predictor variables). A PLSR model was then trained to 
predict the response using information from the predictor variables. When training the 
model, PLSR took into account the correlated structure of the predictor matrix, 
decomposing it into a low-dimensional representation that maximally explained the 
response variable. 
 
Stating the procedure more formally, let $Z$ be an $n \times m$ matrix consisting of log2(mRNA 
fold change) measurements of $n$ mRNAs in response to the sRNA transfected in each of 
$m$ experiments. Let $y_i$ represent measurements for all mRNAs in the $i$th experiment of $Z$, 
and $X_{\bar{i}}$ represent measurements for all mRNAs from all experiments except for the $i$th 
experiment in $Z$. Finally, let $T_{\bar{i}}$ be a matrix with the same dimensions as $X_{\bar{i}}$, with entries 
$t_{j,k} = 1$ if the 3′ UTR of mRNA $j$ in $X_{\bar{i}}$ contains a canonical 7–8 nt match to the small RNA 
transfected in experiment $k$ in $X_{\bar{i}}$, and $t_{j,k} = 0$ otherwise. Missing values in $Z$ represent 
cases in which the mRNA signal in the microarray was too low to be reliably measured. 
The following algorithm was used to normalize each $y_i$ for $i \in \{1, \ldots, 74\}$: 
i)  For values in $T_{\bar{i}}$ in which $t_{j,k} = 1$, the corresponding value $x_{j,k}$ in $X_{\bar{i}}$ was removed, which 
prevented the loss of signal in $y_{i,j}$ due to sRNA-mediated regulation of the mRNA in 
two independent experiments. 
ii)  mRNAs in $y_i$, $X_{\bar{i}}$, and $T_{\bar{i}}$ were removed if the log2(mRNA fold change) was either 
undefined in $y_i$ or undefined in greater than 50% of experiments in $X_{\bar{i}}$. 
iii) For the remaining missing values in $X_{\bar{i}}$, values were imputed using the k-nearest 
neighbors algorithm, using k = 20, as implemented in the impute.knn function in the 
“impute” R package (Troyanskaya et al., 2001). Results were robust to the choice of 
imputation algorithm (data not shown). 
iv) To remove biases afflicting $y_i$, $y_i$ was predicted from $X_{\bar{i}}$ using partial least squares 
regression, as implemented in the plsr function in the “pls” R package (Mevik and 
Wehrens, 2007). Ten-fold cross-validation was used to choose an appropriate number 
of components in the regression. Values of $y_i$ were then adjusted to their residuals as 
such: $y_i \leftarrow y_i - \hat{y}_i$, where $\hat{y}_i$ was the vector of predicted values of $y_i$ from the regression. 
An analogous normalization procedure was performed for each of the seven transfection 
experiments of the test set (Supplementary file 2). 
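
Steps (i) and (ii) of the algorithm can be sketched in Python as shown below. This is only an illustration of the masking and filtering logic under assumed data structures (lists of lists, with `None` marking a missing value); the k-nearest-neighbors imputation and the PLSR fit of steps (iii) and (iv) were performed with the R packages cited above and are not reproduced here. The function name is hypothetical.

```python
def mask_and_filter(y, X, T, max_missing_frac=0.5):
    """Apply steps (i) and (ii) for one response experiment.

    y: list of length n (float or None), the response experiment.
    X: n rows of predictor values (float or None), one per mRNA.
    T: n rows of 0/1 flags with the same shape as X; 1 marks an mRNA
       with a canonical 7-8 nt match to the sRNA of that experiment.
    Returns (kept_row_indices, y_kept, X_kept).
    """
    kept, y_out, X_out = [], [], []
    for j, (y_j, x_row, t_row) in enumerate(zip(y, X, T)):
        # Step (i): remove predictor values for mRNAs that are
        # themselves predicted targets in the predictor experiment.
        masked = [None if t == 1 else x for x, t in zip(x_row, t_row)]
        # Step (ii): drop the mRNA if the response is undefined or if
        # more than half of its predictor values are undefined.
        n_missing = sum(v is None for v in masked)
        if y_j is None or n_missing > max_missing_frac * len(masked):
            continue
        kept.append(j)
        y_out.append(y_j)
        X_out.append(masked)
    return kept, y_out, X_out

y = [0.1, None, -0.5, 0.2]
X = [[0.0, 0.1, 0.2, 0.3],
     [0.1, 0.1, 0.1, 0.1],
     [-0.4, None, None, 0.1],
     [None, None, 0.3, 0.2]]
T = [[0, 1, 0, 0],
     [0, 0, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 0]]
kept, y_kept, X_kept = mask_and_filter(y, X, T)
```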
 
RNA structure prediction 
3′ UTRs were folded locally using RNAplfold (Bernhart et al., 2006), allowing the 
maximal span of a base pair to be 40 nucleotides, and averaging pair probabilities over an 
80 nt window (parameters -L 40 -W 80), parameters found to be optimal when evaluating 
siRNA efficacy (Tafer et al., 2008). For each position 15 nt upstream and downstream of 
a target site, and for 1–15 nt windows beginning at each position, the partial correlation 
of the log10(unpaired probability) to the log2(mRNA fold change) associated with the site 
was plotted, controlling for known determinants of targeting used in the context+ model, 
which include min_dist, local_AU, 3P_score, SPS, and TA (Garcia et al., 2011). For the 
final predicted structural accessibility score used as a feature, we computed the log10 of 
the probability that a 14-nt segment centered on the match to sRNA positions 7 and 8 was 
unpaired. 
 
Calculation of PCT scores 
We updated human PCT scores using the following datasets: i) 3′ UTRs derived from 
19,800 human protein-coding genes annotated in Gencode version 19 (Harrow et al., 
2012), and ii) 3′-UTR multiple sequence alignments (MSAs) across 84 vertebrate species 
derived from the 100-way multiz alignments in the UCSC genome browser, which used 
the human genome release hg19 as a reference species (Kent et al., 2002; Karolchik et al., 
2014). We did not use all 100 species: with the exception of coelacanth (a lobe-finned 
fish more closely related to the tetrapods), the fish species were excluded because they 
aligned poorly within 3′ UTRs. Likewise, we updated the mouse scores using: i) 3′ 
UTRs derived from 19,699 mouse protein-coding genes annotated in Ensembl 77 (Flicek 
et al., 2014), and ii) 3′-UTR MSAs across 52 vertebrate species derived from the 60-way 
multiz alignments in the UCSC genome browser, which used the mouse genome release 
mm10 as a reference species (Kent et al., 2002; Karolchik et al., 2014). As before, we 
partitioned 3′ UTRs into ten conservation bins based upon the median branch-length 
score (BLS) of the reference-species nucleotides (Friedman et al., 2009). However, to 
estimate branch lengths of the phylogenetic trees for each bin, we concatenated 
alignments within each bin using the “msa_view” utility in the PHAST package v1.1 
(parameters “--unordered-ss --in-format SS --out-format SS --aggregate $species_list 
--seqs $species_subset”, where $species_list contains the entire species tree topology and 
$species_subset contains the topology of the subtree spanning the placental mammals) 
(Siepel and Haussler, 2004). We then fit trees for each bin using the “phyloFit” utility in 
the PHAST package v1.1, utilizing the generalized time-reversible substitution model and 
a fixed-tree topology provided by UCSC (parameters “-i SS --subst-mod REV --tree 
$tree”, where $tree is the Newick format tree of the placental mammals) (Siepel and 
Haussler, 2004). PCT parameters and scores were then calculated as described, estimating 
the signal of conservation for each seed family relative to that of its corresponding 50 
control k-mers, matched for k-mer length and rate of dinucleotide conservation at varying 
branch-length windows (Friedman et al., 2009). All phylogenetic trees and PCT 
parameters are available for download at the TargetScan website (targetscan.org). 
 
Selection of mRNAs for regression modeling 
The mRNAs were selected to avoid those from genes with multiple highly expressed 
alternative 3′-UTR isoforms, which would have otherwise obscured the accurate 
measurement of features such as len_3UTR or min_dist, and also created situations in 
which the response was diminished because some isoforms lacked the target site. HeLa 
3P-seq results (Nam et al., 2014) were used to identify genes in which a dominant 3′-UTR 
isoform comprised ≥90% of the transcripts (Supplementary file 1). For each of 
these genes, the mRNA with the dominant 3′-UTR isoform was carried forward, together 
with the ORF and 5′-UTR annotations previously chosen from RefSeq (Garcia et al., 
2011). Sequences of these mRNA models are provided as Supplemental Material at 
http://bartellab.wi.mit.edu/publication.html. To prevent the presence of multiple 3′-UTR 
sites to the transfected sRNA from confounding attribution of an mRNA change to an 
individual site, these mRNAs were further filtered within each dataset to consider only 
mRNAs that contained a single 3′-UTR site (either an 8mer, 7mer-m8, 7mer-A1, or 
6mer) to the cognate sRNA. 
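
The two filters described in this section can be sketched in Python. The data structures (a mapping of isoform names to 3P-seq tag counts, and a list of site types found in a 3′ UTR) and the function names are hypothetical; only the ≥90% cutoff and the single-site requirement come from the text.

```python
CANONICAL_TYPES = {"8mer", "7mer-m8", "7mer-A1", "6mer"}

def dominant_isoform(isoform_tags, min_fraction=0.9):
    """Return the name of the 3'-UTR isoform that accounts for at least
    min_fraction of the tags for a gene, or None if there is none."""
    total = sum(isoform_tags.values())
    if total == 0:
        return None
    name, tags = max(isoform_tags.items(), key=lambda item: item[1])
    return name if tags / total >= min_fraction else None

def has_single_site(site_types):
    """True if the 3' UTR contains exactly one canonical site
    (8mer, 7mer-m8, 7mer-A1, or 6mer) to the cognate sRNA."""
    return sum(1 for s in site_types if s in CANONICAL_TYPES) == 1

selected = dominant_isoform({"short": 95, "long": 5})
```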
 
Scaling the scores of each feature 
Features that exhibited skewed distributions, such as len_5UTR, len_ORF, and 
len_3UTR were log10 transformed (Table 1), which made their distributions 
approximately normal. These and other continuous features were then normalized to the 
[0, 1] interval as described [e.g., see Supplementary Figure 5 in (Garcia et al., 2011)], 
except a trimmed normalization was implemented to prevent outlier values from 
distorting the normalized distributions. For each value, the 5th percentile of the feature 
was subtracted from the value, and the resulting quantity was divided by the difference 
between the 95th and 5th percentiles of the feature. Percentile values are provided for the 
subset of continuous features that were scaled (Table 3). The trimmed normalization 
facilitated comparison of the contributions of different features to the model, with 
absolute values of the coefficients serving as a rough indication of their relative 
importance. 
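
The trimmed normalization can be sketched in Python as follows. The percentile interpolation rule is an assumption made for this illustration (the “inclusive” method of `statistics.quantiles`); the percentile values actually used are those reported in Table 3.

```python
import math
from statistics import quantiles

def trimmed_scale(values):
    """Scale values so that the 5th percentile maps to 0 and the 95th
    percentile maps to 1. Values outside that range fall below 0 or
    above 1; they are not clipped."""
    cut_points = quantiles(values, n=100, method="inclusive")
    p5, p95 = cut_points[4], cut_points[94]
    return [(v - p5) / (p95 - p5) for v in values]

def log10_lengths(lengths):
    """log10-transform a skewed, strictly positive feature such as a
    UTR or ORF length before scaling."""
    return [math.log10(x) for x in lengths]

scaled = trimmed_scale(list(range(101)))
```

Because the scaling uses the 5th and 95th percentiles rather than the minimum and maximum, a few extreme values cannot compress the bulk of the distribution into a narrow part of the interval.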
 
  
Stepwise regression and multiple linear regression models 
We generated 1000 bootstrap samples, each including 70% of the data from each 
transfection experiment of the compendium of 74 datasets (Supplementary file 1), with 
the remaining data reserved as a held-out test set. For each bootstrap sample, stepwise 
regression, as implemented in the stepAIC function from the “MASS” R package 
(Venables and Ripley, 2002), was used to both select the most informative combination 
of features and train a model. Feature selection minimized the Akaike information 
criterion (AIC), defined as −2 ln(L) + 2k, where L was the likelihood of the data given the 
linear regression model and k was the number of features or parameters selected. The 
1000 resulting models were each evaluated based on their r² to the corresponding test set. 
To illustrate the utility of adding features not included in our previous models, these r² 
values were compared to those obtained when re-training the multiple linear regression 
coefficients on each bootstrap sample using only the features of either the context-only or 
the context+ model, and computing r² values on the corresponding test sets. The stepwise 
regression was implemented independently for each of the site types, and a final set of 
features was chosen as those that were selected for at least 99% of the bootstrap samples 
of at least two site types. Using this group of features and the entire compendium of 74 
datasets as a training set, we trained a multiple linear regression model for each site type 
(Figure 4–Source data 1). As done previously for TargetScan6 predictions, scores for 
8mer, 7mer-m8, 7mer-A1, and 6mer sites were bounded to be no greater than –0.03, –
0.02, –0.01, and 0, respectively, thereby creating a piece-wise linear function for each site 
type. 
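
Two of the quantities in this section are simple enough to sketch in Python: the AIC used to compare candidate models, and the bounding of scores for each site type. The function names are hypothetical, and the sketch does not reproduce the stepwise search itself, which was performed with stepAIC in R.

```python
# Upper bounds on the score for each site type, as stated in the text.
UPPER_BOUND = {"8mer": -0.03, "7mer-m8": -0.02, "7mer-A1": -0.01, "6mer": 0.0}

def aic(log_likelihood, k):
    """Akaike information criterion, -2 ln(L) + 2k; lower is better."""
    return -2.0 * log_likelihood + 2.0 * k

def bounded_score(site_type, raw_score):
    """Cap the regression output at the bound for its site type, which
    makes the score a piece-wise linear function of the features."""
    return min(raw_score, UPPER_BOUND[site_type])

# Hypothetical comparison: the larger model fits slightly better but
# pays a penalty for three additional parameters.
small_model = aic(log_likelihood=-100.0, k=5)
large_model = aic(log_likelihood=-98.5, k=8)
```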
 
Collection and processing of previous predictions 
To compare predictions from different miRNA target prediction tools, we collected the 
following freely downloadable predictions: AnTar (predictions from either miRNA-
transfection or CLIP-seq models) (Wen et al., 2011), DIANA-microT-CDS (September 
2013) (Reczko et al., 2012), ElMMo v5 (January 2011) (Gaidatzis et al., 2007), 
MBSTAR (all predictions) (Bandyopadhyay et al., 2015), miRanda-MicroCosm v5 
(Griffiths-Jones et al., 2008), miRmap v1.1 (September 2013) (Vejnar and Zdobnov, 
2012), mirSVR (August 2010) (Betel et al., 2010), miRTarget2 (from miRDB v4.0, 
January 2012) (Wang, 2008; Wang and El Naqa, 2008), MIRZA-G (sets predicted either 
with or without conservation features and either with or without more stringent seed-
match requirements, March 2015) (Gumienny and Zavolan, 2015), PACCMIT-CDS (sets 
predicted either with or without conservation features) (Marin et al., 2013), PicTar2 
(from the doRina web resource; sets conserved to either fish, chicken, or mammals) 
(Krek et al., 2005; Anders et al., 2012), PITA Catalog v6 (3/15 flank for either “All” or 
“Top” predictions, August 2008) (Kertesz et al., 2007), RNA22 (May 2011) (Miranda et 
al., 2006), SVMicrO (Feb 2011) (Liu et al., 2010), TargetRank (all scores from web 
server) (Nielsen et al., 2007), TargetSpy (all predictions) (Sturm et al., 2010), TargetScan 
v5.2 (either conserved or all predictions, June 2011) (Grimson et al., 2007), and 
TargetScan v6.2 (either conserved predictions ranked by the context+ model or all 
predictions ranked by either the context+ model or PCT scores, June 2012) (Friedman et 
al., 2009; Garcia et al., 2011). For algorithms providing site-level predictions (i.e., 
ElMMo, MBSTAR, mirSVR, PITA, RNA22, and TargetScan), scores were summed 
within genes or transcripts (if available) to acquire an aggregate score. For algorithms 
providing multiple transcript-level predictions (i.e., miRanda-MicroCosm, PACCMIT-
CDS, and TargetSpy), the transcript with the best score was selected as the representative 
transcript isoform.  In all cases, predictions with gene symbol or Ensembl ID formats 
were translated into RefSeq format. When computing r² to the test sets, mRNAs that were 
not predicted by the algorithm to be a target were assigned the worst score in the range of 
all scores generated by the algorithm. 
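
The aggregation of site-level scores and the handling of unpredicted mRNAs can be sketched in Python. The function names and toy identifiers are hypothetical, and the `lower_is_better` flag is an assumption introduced here because the direction of "worst" differs between tools.

```python
from collections import defaultdict

def aggregate_site_scores(site_predictions):
    """Sum site-level scores within each gene or transcript.

    site_predictions: iterable of (identifier, score) pairs."""
    totals = defaultdict(float)
    for identifier, score in site_predictions:
        totals[identifier] += score
    return dict(totals)

def fill_unpredicted(scores, all_ids, lower_is_better=True):
    """Assign the worst observed score to every mRNA that the
    algorithm did not predict to be a target."""
    worst = max(scores.values()) if lower_is_better else min(scores.values())
    return {identifier: scores.get(identifier, worst) for identifier in all_ids}

sites = [("G1", -0.2), ("G1", -0.1), ("G2", -0.05)]
per_gene = aggregate_site_scores(sites)
complete = fill_unpredicted(per_gene, ["G1", "G2", "G3"])
```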
 
3′-UTR profiles for TargetScan7 predictions 
To build databases of human and mouse 3′-UTR profiles, we began with the “basic” set 
of protein-coding gene models deposited in Gencode v19 (human hg19 assembly) and 
Gencode vM3 (mouse mm10 assembly), respectively (Harrow et al., 2012). For each 
unique stop codon in each set of gene models, we selected the transcript with the longest 
3′ UTR as its representative transcript. If other datasets indicated that the 3′ UTRs of 
these representative transcripts have longer tandem isoforms, we extended them 
accordingly, using additional annotations provided by i) the “comprehensive” set of 
Gencode gene models (Harrow et al., 2012), ii) all mRNAs in the RefSeq database (Pruitt 
et al., 2012), downloaded from the refGene database through the UCSC table browser 
(Kent et al., 2002), and iii) 3′-UTR extensions supported by RNA-seq evidence (Miura et 
al., 2013), after transforming mm9 to mm10 coordinates using liftOver (Hinrichs et al., 
2006). We then used 3P-seq clusters from human and mouse (Nam et al., 2014) (again 
after transforming coordinates with liftOver) to further extend 3′ UTRs when possible, 
searching within a 5400 nt region downstream of the stop codon (excluding the regions 
containing annotated introns) for a cleavage and polyadenylation site supported by at 
least one 3P-seq cluster, prohibiting the search from extending beyond the start position of any 
annotated downstream exon. The 5400 nt window was chosen because the 99th percentile 
of the lengths of previously annotated mouse and human 3′ UTRs was ~5400 nt. 
Zebrafish 3′ UTRs for TargetScanFish were identical to those annotated previously 
(Ulitsky et al., 2012). For each representative transcript, 3P-seq clusters mapping within 
the extended 3′ UTR were used to quantify the relative levels of alternative tandem 
isoforms, thereby generating a 3′-UTR profile. For human and mouse transcripts, all 3P-
seq datasets for cell lines/tissues of each species were combined, after normalizing for the 
sequencing depth (i.e., number of uniquely mapping tags) of each dataset, to generate 
meta profiles. To perform this normalization, the number of tags overlapping the 3′ UTR 
of each annotated transcript was first summed. A matrix of summed tag counts for each 
cell line/tissue and for each transcript was then compiled, removing transcripts with no 
tags in any cell type. This matrix was then upper-quartile normalized by calculating the 
75th percentile of counts in each cell type, using the calcNormFactors function (parameter 
“method='upperquartile'”) in the “edgeR” R package (Robinson et al., 2010). These 
scaling factors were then applied to all tags, and the normalized tag counts corresponding 
to each 3P-seq cluster from different cell lines/tissues were summed. To accommodate 
cases in which the longest annotated 3′ UTR did not have tag support, a one-tag 
pseudocount was added to the longest tandem 3′-UTR isoform. The 3′-UTR profiles were 
then generated and used to compute the affected isoform ratio (AIR) and weighted 
context++ score for each predicted target site as depicted in Figures 2A and 3A, 
respectively, of Nam et al. (2014). For zebrafish transcripts, profiles were generated for 
each developmental stage with a 3P-seq dataset. All input and output annotation files as 
well as scripts are available for download at TargetScan (targetscan.org). 
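
How a 3′-UTR profile yields an AIR can be illustrated with a minimal Python sketch. It assumes that each tandem isoform is represented by its 3′-end position (in nucleotides downstream of the stop codon) and a normalized tag count, that a site is contained in every isoform ending at or beyond the site, and that the one-tag pseudocount is added to the longest isoform; the function name and data layout are hypothetical, and the released scripts at targetscan.org remain the reference implementation.

```python
def affected_isoform_ratio(site_end, isoform_ends, pseudocount=1.0):
    """Fraction of transcripts whose 3' UTR includes the site.

    site_end: position of the 3' end of the site, relative to the
        stop codon.
    isoform_ends: {3'-end position: normalized tag count} for the
        tandem isoforms of one representative transcript.
    """
    counts = dict(isoform_ends)
    longest = max(counts)
    counts[longest] += pseudocount  # ensure the longest isoform has support
    total = sum(counts.values())
    containing = sum(c for end, c in counts.items() if end >= site_end)
    return containing / total

profile = {500: 60.0, 1500: 39.0}
air_proximal = affected_isoform_ratio(300, profile)
air_distal = affected_isoform_ratio(1000, profile)
```

A site upstream of every cleavage and polyadenylation site is present in all isoforms and receives an AIR of 1, whereas a site found only in the longer isoform receives an AIR equal to that isoform's share of the tags.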
 
MicroRNA sets for TargetScan7 
When partitioning miRNA families according to their conservation level, we began with 
a high-confidence set of human miRNAs supported by small-RNA sequencing (T. 
Tuschl, personal communication) that shared nucleotides 2–8 with a mouse miRNA 
supported by small-RNA sequencing (Chiang et al., 2010).  We then extracted 100-way 
multiz alignments of each mature miRNA from the UCSC Genome Browser and counted 
the number of species for which nucleotides 2–8 of the miRNA did not change. As an 
initial pass, those conserved among ≥40 species were classified as mammalian conserved, 
and those conserved among >60 species were classified as more broadly conserved 
among vertebrate species. Due to poorer quality alignments for more distantly related 
species, this procedure misclassified several more broadly conserved miRNAs as 
mammalian conserved. Therefore, mammalian conserved miRNAs that aligned with 
>90% sequence identity to a mature miRNA from chicken, frog, or zebrafish, as annotated in 
miRBase release 21 (Kozomara and Griffiths-Jones, 2014), were re-classified as more 
broadly conserved. In addition, miR-489 was included in the broadly conserved set of 
TargetScanHuman (but not TargetScanMouse) despite having a seed substitution in 
mouse. 
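
The initial classification pass can be sketched in Python. The sketch assumes gap-free aligned mature miRNA sequences of equal length and counts every species in the supplied alignment; whether the reference species is included in the count is an assumption of the caller. It omits the subsequent re-classification against miRBase and the miR-489 exception, and the function names are hypothetical.

```python
def count_seed_conserved(reference, aligned):
    """Number of species whose aligned sequence matches the reference
    miRNA at nucleotides 2-8.

    reference: mature miRNA sequence of the reference species.
    aligned: {species: aligned sequence} with the same coordinates.
    """
    ref_seed = reference[1:8]
    return sum(1 for seq in aligned.values() if seq[1:8] == ref_seed)

def initial_conservation_class(n_species):
    """Initial-pass classification using the thresholds in the text."""
    if n_species > 60:
        return "broadly conserved"
    if n_species >= 40:
        return "mammalian conserved"
    return "poorly conserved"

# Toy alignment (not real data): two species keep the seed, one does not.
toy_alignment = {"sp1": "UAAGGCACGC", "sp2": "CAAGGCACGU", "sp3": "UAAGGUACGC"}
n_conserved = count_seed_conserved("UAAGGCACGC", toy_alignment)
```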
 Some mammalian pri-miRNAs give rise to two or three abundant miRNA 
isoforms that have different seeds, either because both strands of the miRNA duplex load 
into Argonaute with near-equal efficiencies or because processing heterogeneity gives 
rise to alternative 5′ termini (Azuma-Mukai et al., 2008; Morin et al., 2008; Wu et al., 
2009; Chiang et al., 2010). To annotate these abundant isoforms, we identified all 
isoforms whose start position on the precursor hairpin was supported by at least 33% as 
many reads as the most abundantly mapped start position. These isoforms 
were carried forward as mammalian conserved isoforms if they also satisfied this 
property in the mouse small-RNA sequencing data (Chiang et al., 2010), and as broadly 
conserved isoforms if they satisfied this property in zebrafish small-RNA sequencing 
data available in miRBase. Adhering to the miRNA naming convention, if two isoforms 
mapped to the 5′ and 3′ arms of the hairpin they were named “–5p” and “–3p”, 
respectively, and if two isoforms were processed from the same arm they were named 
“.1” and “.2” in decreasing order of their abundance, as detected in the human. 
 All mature miRNAs were downloaded from miRBase release 21 (Kozomara and 
Griffiths-Jones, 2014).  Those that matched a conserved miRNA at nucleotides 2–8 were 
considered part of that miRNA family.  All miRNAs and miRNA isoforms annotated in 
miRBase but not meeting our criteria for conservation in mammals or beyond were also 
grouped into families based on the identity of nucleotides 2–8 and were classified as 
poorly conserved miRNAs (which included many small RNAs misclassified as 
miRNAs). All mammalian-conserved, broadly conserved, and poorly conserved miRNA seed 
families are available for download at TargetScan (targetscan.org). 
 
TargetScan7 predictions 
TargetScan (v7.0) provides the option of ranking predicted targets of mammalian 
miRNAs according to either cumulative weighted context++ score (CWCS), which ranks 
based upon the predicted repression, or aggregate PCT score of the longest 3′-UTR 
isoform, which ranks based upon the confidence that targeting is evolutionarily conserved 
(Figure 7–figure supplement 1). For each predicted target, the CWCS estimated the total 
repression expected from multiple sites to the same miRNA. This score was calculated 
using the 3′-UTR profiles to weight the marginal effect of each additional site to the 
miRNA while also taking into account the predicted mRNA depletion resulting from any 
downstream sites to the same miRNA. This approach improved on the one we used 
previously to calculate total wContext+ scores (Nam et al., 2014), in that it did not 
over-estimate the aggregate effect of multiple sites in distal isoforms. For each miRNA 
family, 8mer, 7mer-m8, 7mer-A1, and 6mer sites were first filtered to remove overlapping 
sites, and for each reference 3′ UTR, nonoverlapping sites to the same miRNA were 
numbered from 1 to $n$, starting at the distal end of the 3′ UTR. For each site $i$, from 1 
to $n$, the cumulative predicted repression at that site ($C_i$) was calculated as 

$$C_i = C_{i-1} + (1 - 2^{CS_i})(AIR_i - C_{i-1}), \qquad C_0 = 0,$$

in which $CS_i$ and $AIR_i$ were the context++ score and AIR of site $i$, and the 
$(1 - 2^{CS_i})(AIR_i - C_{i-1})$ term predicted the marginal repression of site $i$: the 
predicted repression at the site, $1 - 2^{CS_i}$, was modified based on the fraction of 
mRNAs containing that site ($AIR_i$) as reduced by the mRNA depletion predicted to occur 
from the action of any more distal sites ($C_{i-1}$). The CWCS was then calculated as 

$$\mathrm{CWCS} = \log_2(1 - C_n),$$

in which $C_n$ was the $C_i$ at the most proximal site of the reference 3′ UTR. For each 
reference 3′ UTR, CWCSs were calculated for each member of a miRNA family, and the 
score from the member with the greatest predicted repression was chosen to represent 
that family. The reference 3′ UTR with the most 3P-seq tags was chosen to represent 
the gene. 
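
The CWCS recursion can be written as a short Python function. This is a minimal sketch of the calculation described above rather than the released Perl implementation; the function name and input layout (a list of context++ score and AIR pairs ordered from the most distal to the most proximal site) are assumptions made for illustration.

```python
import math

def cumulative_weighted_context_score(sites):
    """Compute the CWCS for one miRNA and one reference 3' UTR.

    sites: list of (context_score, air) pairs for nonoverlapping sites,
        ordered from the distal end of the 3' UTR (site 1) to the most
        proximal site (site n). Context scores are log2 fold changes.
    """
    c = 0.0  # C_0
    for context_score, air in sites:
        # Marginal repression of this site: its predicted repression
        # (1 - 2^CS) applied to the transcripts that contain the site
        # and have not already been depleted by more distal sites.
        c += (1.0 - 2.0 ** context_score) * (air - c)
    return math.log2(1.0 - c)

single = cumulative_weighted_context_score([(-0.5, 1.0)])
double = cumulative_weighted_context_score([(-0.5, 1.0), (-0.3, 1.0)])
partial = cumulative_weighted_context_score([(-0.5, 0.5)])
```

When every site is present in all isoforms (AIR = 1), the recursion reduces to summing the context++ scores, so a single site returns its own score; when a site is present in only a fraction of isoforms, its contribution shrinks accordingly.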
When scoring features that can vary with 3′-UTR length (Min_dist, Len_3UTR, and Off6m), a weighted score was used that accounted for the abundance of each 3′-UTR tandem isoform in which the site existed, as estimated from a compendium of 3P-seq datasets from the same species (Nam et al., 2014). Although 6mer sites are used to calculate cumulative weighted context++ scores, and 6mer sites are tallied in the tables, the locations of these 6mer sites are not displayed, and targets with only 6mer sites are not listed. When calculating PCT scores, the most abundant 3′-UTR isoform as defined by 3P-seq was used to determine the conservation bin to which the 3′ UTR belonged. Sites corresponding to poorly conserved and mammalian-
conserved miRNA seed families or sites overlapping annotated ORF regions were 
assigned PCT scores of zero. For TargetScanFish, genome-wide alignments of zebrafish 
3′ UTRs were not of sufficient quality to compute PCT scores, so a PCT value of zero was 
assigned to all sites when computing context++ scores. All PCT parameters and parameters 
for tree branch lengths and regression models, along with pre-computed context++ scores 
for human, mouse, zebrafish, and other vertebrate species, are available for download 
(targetscan.org). Perl scripts using these parameters to compute context++ scores, 
weighted context++ scores, CWCSs, and aggregate PCT scores are also provided 
(targetscan.org). Predictions are also 
made for homologous 3′ UTRs of other vertebrate species, using either human-centric or mouse-centric 3′-UTR definitions and corresponding MSAs. 
 
  
Acknowledgements 
We thank the Bioinformatics and Research Computing group at the Whitehead Institute 
(I. Barrasa, B. Yuan, Y. Huang, and P. Thiru) for help implementing improvements to the 
TargetScan website, A. Subtelny for providing insight into positional effects of the 
miRNA seed, I. Ulitsky for initial help with 3P-seq analysis, R. Friedman for discussions 
regarding the computation of PCT parameters, T. Tuschl for sharing an unpublished list 
of the most frequently sequenced human miRNA isoforms, G. Agarwal for discussions 
regarding normalization techniques, G. Kudla for help processing the microarray data 
from the CLASH study, S.-W. Chi and R. B. Darnell for confirmation of the mRNAs 
identified as miR-124 targets in their dCLIP study, O. Rissland and J. Guo for critical 
reading of the manuscript, and members of the Bartel lab for helpful discussions. This 
work was supported by a National Science Foundation Graduate Research Fellowship (to 
V.A.) and NIH grant GM067031 (to D.P.B.). D.P.B. is an investigator of the Howard 
Hughes Medical Institute.  
References 
Ameres, S.L., Martinez, J., and Schroeder, R. (2007). Molecular basis for target RNA 
recognition and cleavage by human RISC. Cell 130, 101-112. 
Anders, G., Mackowiak, S.D., Jens, M., Maaskola, J., Kuntzagk, A., Rajewsky, N., 
Landthaler, M., and Dieterich, C. (2012). doRiNA: a database of RNA 
interactions in post-transcriptional regulation. Nucleic Acids Res 40, D180-D186. 
Anderson, E.M., Birmingham, A., Baskerville, S., Reynolds, A., Maksimova, E., Leake, 
D., Fedorov, Y., Karpilow, J., and Khvorova, A. (2008). Experimental validation 
of the importance of seed complement frequency to siRNA specificity. RNA 14, 
853-861. 
Arvey, A., Larsson, E., Sander, C., Leslie, C.S., and Marks, D.S. (2010). Target mRNA 
abundance dilutes microRNA and siRNA activity. Mol Syst Biol 6, 363. 
Azuma-Mukai, A., Oguri, H., Mituyama, T., Qian, Z.R., Asai, K., Siomi, H., and Siomi, 
M.C. (2008). Characterization of endogenous human Argonautes and their 
miRNA partners in RNA silencing. Proceedings of the National Academy of 
Sciences of the United States of America 105, 7964-7969. 
Baek, D., Villen, J., Shin, C., Camargo, F.D., Gygi, S.P., and Bartel, D.P. (2008). The 
impact of microRNAs on protein output. Nature 455, 64-71. 
Bailey, T.L., and Elkan, C. (1994). Fitting a mixture model by expectation maximization 
to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol 2, 28-36. 
Bandyopadhyay, S., Ghosh, D., Mitra, R., and Zhao, Z. (2015). MBSTAR: multiple 
instance learning for predicting specific functional binding sites in microRNA 
targets. Sci Rep 5, 8004. 
Bartel, D.P. (2004). MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 
116, 281-297. 
Bartel, D.P. (2009). MicroRNAs: target recognition and regulatory functions. Cell 136, 
215-233. 
Bazzini, A.A., Lee, M.T., and Giraldez, A.J. (2012). Ribosome Profiling Shows That 
miR-430 Reduces Translation Before Causing mRNA Decay in Zebrafish. 
Science 336, 233-237. 
Bernhart, S.H., Hofacker, I.L., and Stadler, P.F. (2006). Local RNA base pairing 
probabilities in large sequences. Bioinformatics 22, 614-615. 
Betel, D., Koppal, A., Agius, P., Sander, C., and Leslie, C. (2010). Comprehensive 
modeling of microRNA targets predicts functional non-conserved and non-
canonical sites. Genome Biol 11, R90. 
Birmingham, A., Anderson, E.M., Reynolds, A., Ilsley-Tyree, D., Leake, D., Fedorov, Y., 
Baskerville, S., Maksimova, E., Robinson, K., Karpilow, J., et al. (2006). 3' UTR 
seed matches, but not overall identity, are associated with RNAi off-targets. Nat 
Methods 3, 199-204. 
Brennecke, J., Stark, A., Russell, R.B., and Cohen, S.M. (2005). Principles of 
microRNA-target recognition. PLoS Biol 3, e85. 
Bushati, N., and Cohen, S.M. (2007). MicroRNA functions. Annual Review of Cell and 
Developmental Biology 23, 175-205. 
Chi, S.W., Hannon, G.J., and Darnell, R.B. (2012). An alternative mode of microRNA 
target recognition. Nat Struct Mol Biol 19, 321-327. 
Chi, S.W., Zang, J.B., Mele, A., and Darnell, R.B. (2009). Argonaute HITS-CLIP 
decodes microRNA-mRNA interaction maps. Nature 460, 479-486. 
Chiang, H.R., Schoenfeld, L.W., Ruby, J.G., Auyeung, V.C., Spies, N., Baek, D., 
Johnston, W.K., Russ, C., Luo, S., Babiarz, J.E., et al. (2010). Mammalian 
microRNAs: experimental evaluation of novel and previously annotated genes. 
Genes & Development 24, 992-1009. 
Davis, E., Caiment, F., Tordoir, X., Cavaille, J., Ferguson-Smith, A., Cockett, N., 
Georges, M., and Charlier, C. (2005). RNAi-mediated allelic trans-interaction at 
the imprinted Rtl1/Peg11 locus. Current Biology 15, 743-749. 
Denzler, R., Agarwal, V., Stefano, J., Bartel, D.P., and Stoffel, M. (2014). Assessing the 
ceRNA Hypothesis with Quantitative Measurements of miRNA and Target 
Abundance. Molecular Cell 54, 766-776. 
Du, P., Kibbe, W.A., and Lin, S.M. (2008). lumi: a pipeline for processing Illumina 
microarray. Bioinformatics 24, 1547-1548. 
Eichhorn, S.W., Guo, H.L., McGeary, S.E., Rodriguez-Mias, R.A., Shin, C., Baek, D., 
Hsu, S.H., Ghoshal, K., Villen, J., and Bartel, D.P. (2014). mRNA Destabilization 
Is the Dominant Effect of Mammalian MicroRNAs by the Time Substantial 
Repression Ensues. Molecular Cell 56, 104-115. 
Elkon, R., and Agami, R. (2008). Removal of AU bias from microarray mRNA 
expression data enhances computational identification of active microRNAs. 
PLoS Comput Biol 4, e1000189. 
Erhard, F., Haas, J., Lieber, D., Malterer, G., Jaskiewicz, L., Zavolan, M., Dolken, L., 
and Zimmer, R. (2014). Widespread context dependency of microRNA-mediated 
regulation. Genome Res 24, 906-919. 
Eulalio, A., Huntzinger, E., and Izaurralde, E. (2008). GW182 interaction with Argonaute 
is essential for miRNA-mediated translational repression and mRNA decay. Nat 
Struct Mol Biol 15, 346-353. 
Farh, K.K., Grimson, A., Jan, C., Lewis, B.P., Johnston, W.K., Lim, L.P., Burge, C.B., 
and Bartel, D.P. (2005). The widespread impact of mammalian MicroRNAs on 
mRNA repression and evolution. Science 310, 1817-1821. 
Flicek, P., Amode, M.R., Barrell, D., Beal, K., Billis, K., Brent, S., Carvalho-Silva, D., 
Clapham, P., Coates, G., Fitzgerald, S., et al. (2014). Ensembl 2014. Nucleic 
Acids Res 42, D749-755. 
Friedersdorf, M.B., and Keene, J.D. (2014). Advancing the functional utility of PAR-
CLIP by quantifying background binding to mRNAs and lncRNAs. Genome Biol 
15, R2. 
Friedman, R.C., Farh, K.K., Burge, C.B., and Bartel, D.P. (2009). Most mammalian 
mRNAs are conserved targets of microRNAs. Genome Research 19, 92-105. 
Gaidatzis, D., Nimwegen, E., Hausser, J., and Zavolan, M. (2007). Inference of miRNA 
targets using evolutionary conservation and pathway analysis. BMC 
Bioinformatics 8, 248. 
Garcia, D.M., Baek, D., Shin, C., Bell, G.W., Grimson, A., and Bartel, D.P. (2011). 
Weak seed-pairing stability and high target-site abundance decrease the 
proficiency of lsy-6 and other microRNAs. Nat Struct Mol Biol 18, 1139-1146. 
Gentleman, R.C., Carey, V.J., Bates, D.M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, 
B., Gautier, L., Ge, Y., Gentry, J., et al. (2004). Bioconductor: open software 
development for computational biology and bioinformatics. Genome Biol 5, R80. 
Giraldez, A.J., Mishima, Y., Rihel, J., Grocock, R.J., Van Dongen, S., Inoue, K., Enright, 
A.J., and Schier, A.F. (2006). Zebrafish MiR-430 promotes deadenylation and 
clearance of maternal mRNAs. Science 312, 75-79. 
Grant, C.E., Bailey, T.L., and Noble, W.S. (2011). FIMO: scanning for occurrences of a 
given motif. Bioinformatics 27, 1017-1018. 
Griffiths-Jones, S., Saini, H.K., van Dongen, S., and Enright, A.J. (2008). miRBase: tools 
for microRNA genomics. Nucleic Acids Res 36, D154-158. 
Grimson, A., Farh, K.K., Johnston, W.K., Garrett-Engele, P., Lim, L.P., and Bartel, D.P. 
(2007). MicroRNA targeting specificity in mammals: determinants beyond seed 
pairing. Molecular Cell 27, 91-105. 
Grosswendt, S., Filipchyk, A., Manzano, M., Klironomos, F., Schilling, M., Herzog, M., 
Gottwein, E., and Rajewsky, N. (2014). Unambiguous Identification of 
miRNA:Target Site Interactions by Different Types of Ligation Reactions. 
Molecular Cell. 
Gu, S., Jin, L., Zhang, F.J., Sarnow, P., and Kay, M.A. (2009). Biological basis for 
restriction of microRNA targets to the 3' untranslated region in mammalian 
mRNAs. Nat Struct Mol Biol 16, 144-150. 
Gumienny, R., and Zavolan, M. (2015). Accurate transcriptome-wide prediction of 
microRNA targets and small interfering RNA off-targets with MIRZA-G. Nucleic 
Acids Res. 
Guo, H., Ingolia, N.T., Weissman, J.S., and Bartel, D.P. (2010). Mammalian microRNAs 
predominantly act to decrease target mRNA levels. Nature 466, 835-840. 
Hafner, M., Landthaler, M., Burger, L., Khorshid, M., Hausser, J., Berninger, P., 
Rothballer, A., Ascano, M., Jungkamp, A.C., Munschauer, M., et al. (2010). 
Transcriptome-wide Identification of RNA-Binding Protein and MicroRNA 
Target Sites by PAR-CLIP. Cell 141, 129-141. 
Harrow, J., Frankish, A., Gonzalez, J.M., Tapanari, E., Diekhans, M., Kokocinski, F., 
Aken, B.L., Barrell, D., Zadissa, A., Searle, S., et al. (2012). GENCODE: the 
reference human genome annotation for The ENCODE Project. Genome Research 
22, 1760-1774. 
Hausser, J., Landthaler, M., Jaskiewicz, L., Gaidatzis, D., and Zavolan, M. (2009). 
Relative contribution of sequence and structure features to the mRNA binding of 
Argonaute/EIF2C-miRNA complexes and the degradation of miRNA targets. 
Genome Research 19, 2009-2020. 
Hausser, J., and Zavolan, M. (2014). Identification and consequences of miRNA-target 
interactions--beyond repression of gene expression. Nat Rev Genet 15, 599-612. 
Helwak, A., Kudla, G., Dudnakova, T., and Tollervey, D. (2013). Mapping the human 
miRNA interactome by CLASH reveals frequent noncanonical binding. Cell 153, 
654-665. 
Hinrichs, A.S., Karolchik, D., Baertsch, R., Barber, G.P., Bejerano, G., Clawson, H., 
Diekhans, M., Furey, T.S., Harte, R.A., Hsu, F., et al. (2006). The UCSC Genome 
Browser Database: update 2006. Nucleic Acids Res 34, D590-598. 
Jackson, A.L., Burchard, J., Leake, D., Reynolds, A., Schelter, J., Guo, J., Johnson, J.M., 
Lim, L., Karpilow, J., Nichols, K., et al. (2006a). Position-specific chemical 
modification of siRNAs reduces "off-target" transcript silencing. RNA 12, 1197-1205. 
Jackson, A.L., Burchard, J., Schelter, J., Chau, B.N., Cleary, M., Lim, L., and Linsley, 
P.S. (2006b). Widespread siRNA "off-target" transcript silencing mediated by 
seed region sequence complementarity. RNA 12, 1179-1187. 
Jan, C.H., Friedman, R.C., Ruby, J.G., and Bartel, D.P. (2011). Formation, regulation and 
evolution of Caenorhabditis elegans 3'UTRs. Nature. 
Jaskiewicz, L., Bilen, B., Hausser, J., and Zavolan, M. (2012). Argonaute CLIP--a 
method to identify in vivo targets of miRNAs. Methods 58, 106-112. 
Jones-Rhoades, M.W., and Bartel, D.P. (2004). Computational identification of plant 
MicroRNAs and their targets, including a stress-induced miRNA. Molecular Cell 
14, 787-799. 
Karginov, F.V., Cheloufi, S., Chong, M.M.W., Stark, A., Smith, A.D., and Hannon, G.J. 
(2010). Diverse Endonucleolytic Cleavage Sites in the Mammalian Transcriptome 
Depend upon MicroRNAs, Drosha, and Additional Nucleases. Molecular Cell 38, 
781-788. 
Karolchik, D., Barber, G.P., Casper, J., Clawson, H., Cline, M.S., Diekhans, M., Dreszer, 
T.R., Fujita, P.A., Guruvadoo, L., Haeussler, M., et al. (2014). The UCSC 
Genome Browser database: 2014 update. Nucleic Acids Research 42, D764-
D770. 
Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M., and 
Haussler, D. (2002). The human genome browser at UCSC. Genome Research 12, 
996-1006. 
Kertesz, M., Iovino, N., Unnerstall, U., Gaul, U., and Segal, E. (2007). The role of site 
accessibility in microRNA target recognition. Nat Genet 39, 1278-1284. 
Khan, A.A., Betel, D., Miller, M.L., Sander, C., Leslie, C.S., and Marks, D.S. (2009). 
Transfection of small RNAs globally perturbs gene regulation by endogenous 
microRNAs. Nature Biotechnology 27, 549-555. 
Khorshid, M., Hausser, J., Zavolan, M., and van Nimwegen, E. (2013). A biophysical 
miRNA-mRNA interaction model infers canonical and noncanonical targets. Nat 
Methods 10, 253-255. 
Kishore, S., Jaskiewicz, L., Burger, L., Hausser, J., Khorshid, M., and Zavolan, M. 
(2011). A quantitative analysis of CLIP methods for identifying binding sites of 
RNA-binding proteins. Nat Methods 8, 559-564. 
Kloosterman, W.P., and Plasterk, R.H.A. (2006). The diverse functions of MicroRNAs in 
animal development and disease. Developmental Cell 11, 441-450. 
Kozomara, A., and Griffiths-Jones, S. (2014). miRBase: annotating high confidence 
microRNAs using deep sequencing data. Nucleic Acids Research 42, D68-D73. 
Krek, A., Grun, D., Poy, M.N., Wolf, R., Rosenberg, L., Epstein, E.J., MacMenamin, P., 
da Piedade, I., Gunsalus, K.C., Stoffel, M., et al. (2005). Combinatorial 
microRNA target predictions. Nat Genet 37, 495-500. 
Krutzfeldt, J., Rajewsky, N., Braich, R., Rajeev, K.G., Tuschl, T., Manoharan, M., and 
Stoffel, M. (2005). Silencing of microRNAs in vivo with 'antagomirs'. Nature 
438, 685-689. 
Lal, A., Navarro, F., Maher, C.A., Maliszewski, L.E., Yan, N., O'Day, E., Chowdhury, 
D., Dykxhoorn, D.M., Tsai, P., Hofmann, O., et al. (2009). miR-24 Inhibits cell 
proliferation by targeting E2F2, MYC, and other cell-cycle genes via binding to 
"seedless" 3'UTR microRNA recognition elements. Molecular Cell 35, 610-625. 
Lambert, N., Robertson, A., Jangi, M., McGeary, S., Sharp, P.A., and Burge, C.B. 
(2014). RNA Bind-n-Seq: quantitative assessment of the sequence and structural 
binding specificity of RNA binding proteins. Mol Cell 54, 887-900. 
Leek, J.T., Scharpf, R.B., Bravo, H.C., Simcha, D., Langmead, B., Johnson, W.E., 
Geman, D., Baggerly, K., and Irizarry, R.A. (2010). Tackling the widespread and 
critical impact of batch effects in high-throughput data. Nature Reviews Genetics 
11, 733-739. 
Lewis, B.P., Burge, C.B., and Bartel, D.P. (2005). Conserved seed pairing, often flanked 
by adenosines, indicates that thousands of human genes are microRNA targets. 
Cell 120, 15-20. 
Lewis, B.P., Shih, I.H., Jones-Rhoades, M.W., Bartel, D.P., and Burge, C.B. (2003). 
Prediction of mammalian microRNA targets. Cell 115, 787-798. 
Lianoglou, S., Garg, V., Yang, J.L., Leslie, C.S., and Mayr, C. (2013). Ubiquitously 
transcribed genes use alternative polyadenylation to achieve tissue-specific 
expression. Genes & Development 27, 2380-2396. 
Lim, L.P., Lau, N.C., Garrett-Engele, P., Grimson, A., Schelter, J.M., Castle, J., Bartel, 
D.P., Linsley, P.S., and Johnson, J.M. (2005). Microarray analysis shows that 
some microRNAs downregulate large numbers of target mRNAs. Nature 433, 
769-773. 
Linsley, P.S., Schelter, J., Burchard, J., Kibukawa, M., Martin, M.M., Bartz, S.R., 
Johnson, J.M., Cummins, J.M., Raymond, C.K., Dai, H., et al. (2007). Transcripts 
targeted by the microRNA-16 family cooperatively regulate cell cycle 
progression. Mol Cell Biol 27, 2240-2252. 
Lipchina, I., Elkabetz, Y., Hafner, M., Sheridan, R., Mihailovic, A., Tuschl, T., Sander, 
C., Studer, L., and Betel, D. (2011). Genome-wide identification of microRNA 
targets in human ES cells reveals a role for miR-302 in modulating BMP 
response. Genes & Development 25, 2173-2186. 
Liu, H., Yue, D., Chen, Y., Gao, S.J., and Huang, Y. (2010). Improving performance of 
mammalian microRNA target prediction. BMC Bioinformatics 11, 476. 
Loeb, G.B., Khan, A.A., Canner, D., Hiatt, J.B., Shendure, J., Darnell, R.B., Leslie, C.S., 
and Rudensky, A.Y. (2012). Transcriptome-wide miR-155 Binding Map Reveals 
Widespread Noncanonical MicroRNA Targeting. Molecular Cell 48, 760-770. 
Long, D., Lee, R., Williams, P., Chan, C.Y., Ambros, V., and Ding, Y. (2007). Potent 
effect of target structure on microRNA function. Nat Struct Mol Biol 14, 287-294. 
Majoros, W.H., Lekprasert, P., Mukherjee, N., Skalsky, R.L., Corcoran, D.L., Cullen, 
B.R., and Ohler, U. (2013). MicroRNA target site identification by integrating 
sequence and binding information. Nat Methods 10, 630-633. 
Marin, R.M., Sulc, M., and Vanicek, J. (2013). Searching the coding region for 
microRNA targets. RNA 19, 467-474. 
Mayr, C., and Bartel, D.P. (2009). Widespread shortening of 3'UTRs by alternative 
cleavage and polyadenylation activates oncogenes in cancer cells. Cell 138, 673-
684. 
Mevik, B.H., and Wehrens, R. (2007). The pls package: Principal component and partial 
least squares regression in R. Journal of Statistical Software 18. 
Miranda, K.C., Huynh, T., Tay, Y., Ang, Y.S., Tam, W.L., Thomson, A.M., Lim, B., and 
Rigoutsos, I. (2006). A pattern-based method for the identification of microRNA 
binding sites and their corresponding heteroduplexes. Cell 126, 1203-1217. 
Miura, P., Shenker, S., Andreu-Agullo, C., Westholm, J.O., and Lai, E.C. (2013). 
Widespread and extensive lengthening of 3' UTRs in the mammalian brain. 
Genome Res 23, 812-825. 
Morin, R.D., O'Connor, M.D., Griffith, M., Kuchenbauer, F., Delaney, A., Prabhu, A.L., 
Zhao, Y., McDonald, H., Zeng, T., Hirst, M., et al. (2008). Application of 
massively parallel sequencing to microRNA profiling and discovery in human 
embryonic stem cells. Genome Res 18, 610-621. 
Nakanishi, K., Weinberg, D.E., Bartel, D.P., and Patel, D.J. (2012). Structure of yeast 
Argonaute with guide RNA. Nature 486, 368-374. 
Nam, J.W., Rissland, O.S., Koppstein, D., Abreu-Goodger, C., Jan, C.H., Agarwal, V., 
Yildirim, M.A., Rodriguez, A., and Bartel, D.P. (2014). Global Analyses of the 
Effect of Different Cellular Contexts on MicroRNA Targeting. Molecular Cell 53, 
1031-1043. 
Nielsen, C.B., Shomron, N., Sandberg, R., Hornstein, E., Kitzman, J., and Burge, C.B. 
(2007). Determinants of targeting by endogenous and exogenous microRNAs and 
siRNAs. RNA 13, 1894-1910. 
Pillai, R.S., Artus, C.G., and Filipowicz, W. (2004). Tethering of human Ago proteins to 
mRNA mimics the miRNA-mediated repression of protein synthesis. RNA 10, 
1518-1525. 
Pruitt, K.D., Tatusova, T., Brown, G.R., and Maglott, D.R. (2012). NCBI Reference 
Sequences (RefSeq): current status, new features and genome annotation policy. 
Nucleic Acids Res 40, D130-135. 
Quinlan, A.R., and Hall, I.M. (2010). BEDTools: a flexible suite of utilities for 
comparing genomic features. Bioinformatics 26, 841-842. 
Reczko, M., Maragkakis, M., Alexiou, P., Grosse, I., and Hatzigeorgiou, A.G. (2012). 
Functional microRNA targets in protein coding sequences. Bioinformatics 28, 
771-776. 
Reinhart, B.J., Slack, F.J., Basson, M., Pasquinelli, A.E., Bettinger, J.C., Rougvie, A.E., 
Horvitz, H.R., and Ruvkun, G. (2000). The 21-nucleotide let-7 RNA regulates 
developmental timing in Caenorhabditis elegans. Nature 403, 901-906. 
Robins, H., Li, Y., and Padgett, R.W. (2005). Incorporating structure to predict 
microRNA targets. Proc Natl Acad Sci USA 102, 4006-4009. 
Robins, H., and Press, W.H. (2005). Human microRNAs target a functionally distinct 
population of genes with AT-rich 3' UTRs. Proc Natl Acad Sci USA 102, 15557-
15562. 
Robinson, M.D., McCarthy, D.J., and Smyth, G.K. (2010). edgeR: a Bioconductor 
package for differential expression analysis of digital gene expression data. 
Bioinformatics 26, 139-140. 
Rodriguez, A., Vigorito, E., Clare, S., Warren, M.V., Couttet, P., Soond, D.R., van 
Dongen, S., Grocock, R.J., Das, P.P., Miska, E.A., et al. (2007). Requirement of 
bic/microRNA-155 for normal immune function. Science 316, 608-611. 
Saito, T., and Satrom, P. (2012). Target gene expression levels and competition between 
transfected and endogenous microRNAs are strong confounding factors in 
microRNA high-throughput experiments. Silence 3, 3. 
Sandberg, R., Neilson, J.R., Sarma, A., Sharp, P.A., and Burge, C.B. (2008). Proliferating 
cells express mRNAs with shortened 3' untranslated regions and fewer microRNA 
target sites. Science 320, 1643-1647. 
Schirle, N.T., and MacRae, I.J. (2012). The crystal structure of human Argonaute2. 
Science 336, 1037-1040. 
Schirle, N.T., Sheu-Gruttadauria, J., and MacRae, I.J. (2014). Structural basis for 
microRNA targeting. Science 346, 608-613. 
Schwarz, D.S., Ding, H.L., Kennington, L., Moore, J.T., Schelter, J., Burchard, J., 
Linsley, P.S., Aronin, N., Xu, Z.S., and Zamore, P.D. (2006). Designing siRNA 
that distinguish between genes that differ by a single nucleotide. PLoS Genetics 2, 
1307-1318. 
Selbach, M., Schwanhausser, B., Thierfelder, N., Fang, Z., Khanin, R., and Rajewsky, N. 
(2008). Widespread changes in protein synthesis induced by microRNAs. Nature 
455, 58-63. 
Shin, C., Nam, J.W., Farh, K.K.H., Chiang, H.R., Shkumatava, A., and Bartel, D.P. 
(2010). Expanding the MicroRNA Targeting Code: Functional Sites with 
Centered Pairing. Molecular Cell 38, 789-802. 
Siepel, A., and Haussler, D. (2004). Phylogenetic estimation of context-dependent 
substitution rates by maximum likelihood. Mol Biol Evol 21, 468-488. 
Smyth, G.K. (2004). Linear models and empirical bayes methods for assessing 
differential expression in microarray experiments. Stat Appl Genet Mol Biol 3, 
Article3. 
Smyth, G.K. (2005). Limma: linear models for microarray data. In Bioinformatics and 
computational biology solutions using R and Bioconductor (Springer), pp. 397-
420. 
Stefani, G., and Slack, F.J. (2008). Small non-coding RNAs in animal development. 
Nature Reviews Molecular Cell Biology 9, 219-230. 
Sturm, M., Hackenberg, M., Langenberger, D., and Frishman, D. (2010). TargetSpy: a 
supervised machine learning approach for microRNA target prediction. BMC 
Bioinformatics 11. 
Tafer, H., Ameres, S.L., Obernosterer, G., Gebeshuber, C.A., Schroeder, R., Martinez, J., 
and Hofacker, I.L. (2008). The impact of target site accessibility on the design of 
effective siRNAs. Nature Biotechnology 26, 578-583. 
Tan, S.M., Kirchner, R., Jin, J., Hofmann, O., McReynolds, L., Hide, W., and Lieberman, 
J. (2014). Sequencing of Captive Target Transcripts Identifies the Network of 
Regulated Genes and Functions of Primate-Specific miR-522. Cell Reports 8, 
1225-1239. 
R Core Team (2014). R: A language and environment for statistical computing (Vienna, 
Austria: R Foundation for Statistical Computing). 
Tian, B., Hu, J., Zhang, H., and Lutz, C.S. (2005). A large-scale analysis of mRNA 
polyadenylation of human and mouse genes. Nucleic Acids Res 33, 201-212. 
Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., 
Botstein, D., and Altman, R.B. (2001). Missing value estimation methods for 
DNA microarrays. Bioinformatics 17, 520-525. 
Tsang, J.S., Ebert, M.S., and van Oudenaarden, A. (2010). Genome-wide Dissection of 
MicroRNA Functions and Cotargeting Networks Using Gene Set Signatures. 
Molecular Cell 38, 140-153. 
Ulitsky, I., Shkumatava, A., Jan, C.H., Subtelny, A.O., Koppstein, D., Bell, G.W., Sive, 
H., and Bartel, D.P. (2012). Extensive alternative polyadenylation during 
zebrafish development. Genome Res 22, 2054-2066. 
Vejnar, C.E., and Zdobnov, E.M. (2012). MiRmap: comprehensive prediction of 
microRNA target repression strength. Nucleic Acids Res 40, 11673-11683. 
Venables, W.N., and Ripley, B.D. (2002). Modern applied statistics with S, 4th edn (New 
York: Springer). 
Wang, X. (2014). Composition of seed sequence is a major determinant of microRNA 
targeting patterns. Bioinformatics 30, 1377-1383. 
Wang, X.W. (2008). miRDB: A microRNA target prediction and functional annotation 
database with a wiki interface. RNA 14, 1012-1017. 
Wang, X.W., and El Naqa, I.M. (2008). Prediction of both conserved and nonconserved 
microRNA targets in animals. Bioinformatics 24, 325-332. 
Wen, J., Parker, B.J., Jacobsen, A., and Krogh, A. (2011). MicroRNA transfection and 
AGO-bound CLIP-seq data sets reveal distinct determinants of miRNA action. 
RNA 17, 820-834. 
Wu, H., Ye, C., Ramirez, D., and Manjunath, N. (2009). Alternative processing of 
primary microRNA transcripts by Drosha generates 5' end variation of mature 
microRNA. PLoS One 4, e7566. 
Wu, Z.J., Irizarry, R.A., Gentleman, R., Martinez-Murillo, F., and Spencer, F. (2004). A 
model-based background adjustment for oligonucleotide expression arrays. 
Journal of the American Statistical Association 99, 909-917. 
Yekta, S., Shih, I.H., and Bartel, D.P. (2004). MicroRNA-directed cleavage of HOXB8 
mRNA. Science 304, 594-596. 
  
Figures and figure legends 
Figure 1. Inefficacy of recently reported non-canonical sites. 
(A) Response of mRNAs to the loss of miRNAs, comparing mRNAs that contain either a 
canonical or nucleation-bulge site to miR-430 to those that do not contain a miR-430 site. 
Plotted are cumulative distributions of mRNA fold changes observed when comparing 
embryos that lack miRNAs (MZDicer) to those that have miRNAs (WT), focusing on 
mRNAs possessing a single site of the indicated type in their 3′ UTR. Similarity of site-
containing distributions to the no-site distribution was tested [one-sided Kolmogorov–
Smirnov (K–S) test, P values]; the number of mRNAs analyzed in each category is listed 
in parentheses. See also Figure 1–figure supplement 1C and Figure 1–figure supplement 
4A. 
(B and C) Response of mRNAs to the loss of miR-155, focusing on mRNAs that contain 
either a single canonical or ≥1 CLIP-supported non-canonical site to miR-155. These 
panels are as in (A), but compare fold changes for mRNAs with the indicated site type 
following genetic ablation of mir-155 in either T cells (B) or Th1 cells (C). See also 
Figure 1–figure supplement 2 and Figure 1–figure supplement 4B. 
(D and E) Response of mRNAs to the knockdown of miR-92, focusing on mRNAs that 
contain either a single canonical or ≥1 CLASH-identified non-canonical site to miR-92. 
These panels are as in (A), except CLASH-supported non-canonical sites were the same 
as those defined previously (Helwak et al., 2013) and thus were permitted to reside in any 
region of the mature mRNA, and these panels compare fold changes for mRNAs with the 
indicated site type following either knockdown of miR-92 (D) or combined knockdown 
of miR-92 and 24 other miRNAs (E) in HEK293 cells. See also Figure 1–figure 
supplements 3A–B. 
(F) As in (D), but focusing on mRNAs that contain ≥1 chimera-identified site. See also 
Figure 1–figure supplements 3C–E. 
(G) Response of mRNAs to the transfection of 16 miRNAs, focusing on mRNAs that 
contain either a canonical or MIRZA-predicted non-canonical site. This panel is as in 
(A), but compares the fold changes for mRNAs with the indicated site type after 
introducing miRNAs, aggregating results from 16 individual transfection datasets. Fold 
changes are plotted for the top 100 non-canonical predictions for each of 16 miRNAs 
compiled either before (MIRZA, top 100) or after (MIRZA, no 6mers) removing mRNAs 
containing 6mer or offset 6mer 3′-UTR sites. 
(H) Response of mRNAs to a transfection of miR-522, focusing on mRNAs that contain 
either a single canonical or ≥1 IMPACT-seq-supported non-canonical site to miR-522. 
This panel is as in (A), except IMPACT-seq-supported non-canonical sites were the 
same as those defined previously (Tan et al., 2014) and thus were permitted in any region 
of the mature mRNA. 
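
The cumulative-distribution comparisons in this figure rely on a one-sided two-sample K–S test. The following is an illustrative sketch only, not the code used to generate the figure. The function name, the toy inputs, and the asymptotic P-value approximation exp(−2D²nm/(n+m)) are assumptions introduced here. The direction tested (site-containing mRNAs shifted toward larger fold changes than no-site mRNAs, as expected for derepression after miRNA loss) is also an assumption, and would be reversed for transfection datasets.

```python
import math
from bisect import bisect_right


def one_sided_ks(site_fc, no_site_fc):
    """One-sided two-sample K-S statistic and approximate P value.

    Tests whether the site-containing fold changes are shifted toward
    larger values than the no-site fold changes. The P value uses the
    large-sample approximation exp(-2 * D^2 * n * m / (n + m)), which is
    an assumption of this sketch and is rough for small samples.
    """
    if not site_fc or not no_site_fc:
        raise ValueError("both samples must be non-empty")
    a = sorted(site_fc)
    b = sorted(no_site_fc)
    n, m = len(a), len(b)
    d_plus = 0.0
    # Evaluate both empirical CDFs at every observed value.
    for x in sorted(set(a) | set(b)):
        cdf_site = bisect_right(a, x) / n
        cdf_no_site = bisect_right(b, x) / m
        # A right-shifted site distribution has a lower CDF at x.
        d_plus = max(d_plus, cdf_no_site - cdf_site)
    p_value = math.exp(-2.0 * d_plus ** 2 * n * m / (n + m))
    return d_plus, min(1.0, p_value)


# Hypothetical log2 fold changes (no-miRNA condition relative to WT).
statistic, p = one_sided_ks([0.5, 0.6, 0.7, 0.8], [-0.1, 0.0, 0.1, 0.2])
```

In practice an established statistics library would typically be preferred over this hand-written version, particularly for exact P values with small cohorts.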
 
Figure 1–figure supplement 1. Inefficacy of nucleation-bulge sites. 
(A and B) These panels are as in Figure 1A but compare the response of cognate site-
containing mRNAs in a compendium of either 11 miRNA transfection datasets (A) or 74 
sRNA transfection datasets (B). The datasets were pre-processed (Figure 3) and are 
provided in Supplementary file 1. 
(C) This panel is as in Figure 1A but compares the response of mRNAs in MZDicer 
embryos in which miR-430 has been injected. 
(D–F) These panels are as in Figure 1A but compare the response of mRNAs with the 
indicated miR-124 site types after transfecting miR-124 into either HEK293 cells (D), 
HeLa cells (E), or Huh7 cells (F). 
 
Figure 1–figure supplement 2. Inefficacy of CLIP-supported non-canonical miR-155 
sites. 
(A and B) These panels are as in Figure 1B but compare the response of mRNAs after 
genetic ablation of miR-155 in Type 2 helper T cells (Th2, A) or B cells (B). 
 
Figure 1–figure supplement 3. Inefficacy of CLASH- and chimera-supported non-
canonical sites. 
(A–D) These panels are as in Figure 1D but compare the response of mRNAs with sites 
cognate to any one of four miRNA families (miR-15/16, miR-19, miR-17/20/93/106, or 
miR-25/92), for either all CLASH-supported targets (A), mRNAs with CLASH-
supported 3′-UTR sites (B), all chimera-supported targets (C), or mRNAs with chimera-
supported 3′-UTR sites (D). These four miRNA families were chosen because their 
predicted targets were the most responsive to knockdown of the 25 miRNAs. P values 
reflect the median P value (as evaluated by a K–S test) across 100 trials in which a no-
site control cohort with matched 3′-UTR lengths was chosen for each site-containing 
distribution. Length-matched no-site controls were required for this analysis because 
longer 3′ UTRs had a greater chance of containing additional sites to at least one of the 
many miRNAs that were knocked down, and thus had a greater chance of being 
derepressed as a result of interactions otherwise not considered in the analysis. To 
populate each control cohort, 500 different no-site mRNAs were chosen, considering the 
3′-UTR length of each site-containing mRNA and selecting (without replacement) control 
mRNAs from among the 10 no-site mRNAs with the most similar 3′-UTR lengths. 
Shown is the response of a control cohort for mRNAs containing non-canonical sites. 
mRNAs with 3′ UTRs >2000 nt were excluded from the analysis because so many of the 
3′ UTRs >2000 nt had a site to at least one of the four miRNA families, making it 
impossible to select appropriate length-matched controls. 
(E) This panel is as in Figure 1F but compares the response of mRNAs with the indicated 
miR-302 site types after knocking down miR-302/367 in hESCs. 
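
The selection of length-matched no-site controls described for panels (A–D) can be sketched as follows. This is one possible reading of the procedure, not the original implementation: the function name, the data structures, the order in which site-containing mRNAs are visited, and the random seed are all assumptions made for illustration. The cohort size of 500, the pool of the 10 most similar 3′-UTR lengths, sampling without replacement, and the 2000 nt cutoff are taken from the legend.

```python
import random


def length_matched_cohort(site_lengths, no_site_lengths, cohort_size=500,
                          n_nearest=10, max_length=2000, seed=0):
    """Pick no-site control mRNAs whose 3'-UTR lengths match site-containing mRNAs.

    site_lengths: list of 3'-UTR lengths of site-containing mRNAs.
    no_site_lengths: dict mapping a no-site mRNA identifier to its 3'-UTR length.
    Controls are drawn without replacement from the n_nearest unused no-site
    mRNAs closest in length to each site-containing mRNA in turn (assumed order).
    """
    rng = random.Random(seed)
    # Exclude long 3' UTRs, for which suitable controls are scarce.
    targets = [length for length in site_lengths if length <= max_length]
    available = {g: n for g, n in no_site_lengths.items() if n <= max_length}
    if not targets:
        raise ValueError("no site-containing mRNAs within the length cutoff")
    cohort = []
    i = 0
    while len(cohort) < cohort_size and available:
        target = targets[i % len(targets)]
        # Rank the remaining controls by length difference (ties broken by id).
        nearest = sorted(available,
                         key=lambda g: (abs(available[g] - target), g))[:n_nearest]
        pick = rng.choice(nearest)
        cohort.append(pick)
        del available[pick]  # enforce sampling without replacement
        i += 1
    return cohort
```

Repeating this selection with 100 different seeds and taking the median K–S P value across trials would mirror the summary statistic reported in the panels.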
 
Figure 1–figure supplement 4. Inefficacy of non-canonical sites in mediating 
translational repression. 
(A) This panel is as in Figure 1A but compares the response of mRNAs using ribosome 
footprint profiling (Bazzini et al., 2012), which captures changes in both mRNA stability 
and translational efficiency through the high-throughput sequencing of ribosome-
protected mRNA fragments (RPFs). 
(B) This panel is as in Figure 1–figure supplement 2B but compares fold changes in RPFs 
after genetic ablation of miR-155 in B cells. 
(C) This panel is as in (B) but compares protein fold changes for chimera-supported 
targets, as evaluated by pulsed SILAC (Selbach et al., 2008) after transfection of miR-
155 in HeLa cells. 
 
Figure 1–figure supplement 5. Re-evaluating conservation of chimera-supported non-
canonical sites. 
(A) Conservation of chimera-supported non-canonical sites detected in an analysis 
modeled after that of Grosswendt et al. (2014) but modified to control for background 
conservation. Plotted for the indicated miRNAs is the average conservation of chimera-
supported non-canonical sites, as measured by branch-length score (BLS), compared to 
the average conservation of 100 equally sized cohorts of controls; error bars, standard 
deviation of cohort averages; **, P < 0.01; *, P < 0.05, one-sided Z test. We considered 
chimera-supported non-canonical sites that mapped within 3′ UTRs and contained a 
single mismatch to the 6 nt seed of the miRNA. This set of sites mirrored that analyzed 
previously (Grosswendt et al., 2014), and excluded offset 6mers, which as a class was 
already known to mediate repression and exhibit preferential conservation (Friedman et 
al., 2009). Cohorts of control sites were generated such that for each chimera-supported 
site, each control cohort contained a single example of the identical 6 nt motif that was 
present in the indicated region (either an AGO cluster or 3′ UTR) but not supported by 
chimeric reads. To control for local background conservation and thereby avoid treating 
sites within slowly evolving 3′ UTRs the same as those within rapidly evolving 3′ UTRs, 
we used the binning procedure developed for calculating PCT scores (Friedman et al., 
2009); 3′ UTRs were partitioned into 10 conservation bins (based on the median BLS of 
the nucleotides of the human sequence), and control sites were randomly selected (with 
replacement) from 3′ UTRs in the same bin as the actual site. Control AGO clusters were 
collected as was done previously (Grosswendt et al., 2014), using genome-wide data 
downloaded from clipz.unibas.ch and derived from multiple AGO PAR-CLIP 
experiments performed in HEK293 cells (Kishore et al., 2011). The union of AGO 
clusters for all experiments was computed and filtered for overlap with Ensembl-
annotated 3′ UTRs, using the “merge” and “intersectBED” utilities, respectively, found in 
BEDTools v2.20.1 (parameter “-s”) (Quinlan and Hall, 2010). 
(B) Attribution of the conservation signal to the confounding effects of conserved 
regions. Considered are 1443 non-canonical chimera-supported sites selected as in (A) 
but including sites of all miRNA families. For each chimera-supported site, a z score was 
generated using the distribution of BLSs for 100 control sites chosen as in panel (A) from 
either AGO clusters or 3′ UTRs, as indicated. Each z score reflected how the conservation 
of the actual site differed from that of its controls. Compared are cumulative distributions 
of the z scores for sites of broadly conserved miRNAs and those of less conserved 
miRNAs. If the chimera-supported non-canonical sites were preferentially conserved 
because of their function in mediating repression, then sites of broadly conserved 
miRNAs would be expected to have a right-shifted distribution compared to sites of less 
conserved miRNAs, as explained in the next paragraph.  However, no significant 
difference was discerned between each pair of z-score distributions. 
 
One way to reconcile the conservation signal observed in panel (A) with our conclusion 
that a large majority, if not all, of these sites bind miRNA but do not mediate repression is 
to consider the potentially confounding biochemical properties of conserved regions, 
which are illustrated by the observation that artificial siRNAs preferentially target sites 
that are evolutionarily conserved over those that are not (Nielsen et al., 2007). Because 
these siRNAs are not natural (and do not share a seed with conserved miRNAs), the 
evolutionary conservation of these preferred sites could not have arisen because they 
function to mediate sRNA-guided repression. Instead, some other function of these 3′-
UTR regions, such as greater accessibility to RNA-binding factors, must explain their 
preferential conservation and also endow them with properties that favor sRNA binding 
(Nielsen et al., 2007). To examine whether confounding properties of conserved 3′-UTR 
regions might similarly explain the elevated conservation of chimera-supported sites, we 
compared the z scores for sites bound by broadly conserved miRNAs (miRNAs in 
families conserved beyond mammals, as listed in TargetScan7) with those bound by less 
conserved miRNAs. MicroRNAs conserved among mammals but not more broadly were 
grouped with the less conserved miRNAs because canonical 6mer and 7mer sites to these 
miRNAs have no conservation signal above background, presumably because these 
miRNAs have not been present long enough for the number of preferentially conserved 
6mer and 7mer sites to rise above the background (Friedman et al., 2009). We reasoned 
that the same would be true of non-canonical sites, to the extent that any are 
preferentially conserved. If the conservation signal observed in panel (A) were related to 
miRNA binding, we would have expected a difference between the scores for the sites of 
broadly and less conserved miRNAs. The lack of a significant difference supports the 
idea that chimera-supported non-canonical sites tend to be conserved for the same reason 
that functional sites to artificial siRNAs tend to be conserved. 
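
The per-site z score described in panel (B) can be sketched as follows, assuming that the BLS of every candidate control site has already been computed and grouped by the conservation bin of its 3′ UTR. The function name, the dictionary layout, the use of the sample standard deviation, and the random seed are assumptions made for illustration. The 100 controls drawn with replacement from the same conservation bin follow the legend.

```python
import random
import statistics


def conservation_z_score(site_bls, site_bin, control_bls_by_bin,
                         n_controls=100, seed=0):
    """Z score of a site's BLS relative to bin-matched control sites.

    site_bls: branch-length score of the chimera-supported site.
    site_bin: conservation bin (for example 0-9) of the 3' UTR containing the site.
    control_bls_by_bin: dict mapping a bin to the BLS values of candidate
        control sites (same 6 nt motif, not supported by chimeric reads).
    """
    rng = random.Random(seed)
    candidates = control_bls_by_bin[site_bin]
    if not candidates:
        raise ValueError("no control sites available in this conservation bin")
    # Controls are sampled with replacement from the matching bin.
    controls = [rng.choice(candidates) for _ in range(n_controls)]
    mean = statistics.mean(controls)
    sd = statistics.stdev(controls)  # sample standard deviation (assumed)
    if sd == 0:
        raise ValueError("control BLS values have zero variance")
    return (site_bls - mean) / sd


# Hypothetical control pool for a single conservation bin.
pool = {3: [0.1, 0.2, 0.3, 0.4, 0.5]}
z = conservation_z_score(0.9, 3, pool)
```

The resulting z scores for sites of broadly conserved and less conserved miRNAs could then be compared as two cumulative distributions.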
 
Figure 2. Confirmation of experimentally identified non-canonical miRNA binding sites. 
(A) Sequence logos corresponding to motifs enriched in dCLIP clusters that either appear 
following transfection of miR-124 into HeLa cells (Chi et al., 2009) (top) or disappear 
following knockout of miR-155 in T cells (Loeb et al., 2012) (bottom). Shown to the 
right of each logo is its E-value among clusters lacking a seed-matched or offset 6mer 
canonical site and the fraction of these clusters that matched the logo. Shown below each 
logo are the complementary regions of the cognate miRNA family, highlighting 
nucleotides 2–8 in capital letters. 
(B) Position of the top-ranked motif corresponding to non-canonical sites enriched in 
CLASH (Helwak et al., 2013) (left) or chimera (Grosswendt et al., 2014) (right) data for 
each human miRNA family supported by at least 50 interactions without a seed-matched 
or offset 6mer canonical site. For each family the most enriched logo was aligned to the 
reverse complement of the miRNA. In cases in which a logo mapped to multiple 
positions along the miRNA, the positions with the best and second best scores are 
indicated (red and blue, respectively). 
(C) Sequence logos of motifs enriched in chimera interactions that lack canonical sites. 
As in (A), but displaying sequence logos identified in the chimera data of part (B) for a 
sample of nine human miRNAs. Logos identified for the other human miRNAs are also 
provided (Figure 2–figure supplement 1B). A nucleotide that differs between miRNA 
family members is indicated as a black “n”. 
 
Figure 2–figure supplement 1. Comparison of CLASH and chimera data and 
identification of motifs enriched in human chimera interactions that lack canonical sites. 
(A) Comparison of CLASH (left) and chimera (right) reads from human cells, showing 
the proportion possessing a canonical site (blue) and overlapping 3′ UTRs (red). In total, 
18,514 CLASH and 10,567 chimera interactions were analyzed. 
(B) Sequence logos of motifs enriched in chimera interactions that lack canonical sites. 
This panel is as in Figure 2C but displays the remaining motifs identified from the 
chimera data analyzed in Figure 2B. In cases of alignment ambiguity, both alignments are 
shown below the logo. For some miRNA families, multiple motifs were significantly 
enriched (E ≤ 0.001) and are shown separately. Significantly enriched motifs (or a top-
ranked motif matching the miRNA) were not found for miR-21, and miR-3168 was 
excluded from the analysis due to poor support for its authenticity as a miRNA. 
(C) Sequence logos of motifs that do not match the cognate miRNA but are nonetheless 
enriched in miR-124 dCLIP (Chi et al., 2009) and miR-522 IMPACT-seq (Tan et al., 
2014) clusters that lack canonical sites to the miRNA. The miR-124 logo was nearly 
identical to a non-specific motif previously identified as enriched in CLIP data from the 
mouse brain (Chi et al., 2012). The miR-522 logo was found instead of the previously 
reported miRNA-matching logo (Tan et al., 2014). 
 
Figure 2–figure supplement 2. Identification of motifs enriched in mouse and nematode 
chimera interactions that lack canonical sites. 
(A) Sequence logos of motifs enriched in M. musculus chimera interactions that lack 
canonical sites; otherwise as in Figure 2C. Significantly enriched motifs (or a top-ranked 
motif matching the miRNA) were not found for let-7 and miR-142-3p. 
(B) Sequence logos of motifs enriched in C. elegans chimera interactions that lack 
canonical sites; otherwise as in Figure 2C. Significantly enriched motifs (or a top-ranked 
motif matching the miRNA) were not found for miR-1. 
 
Figure 3. Pre-processing the microarray datasets to minimize nonspecific effects and 
technical biases. 
(A) Example of the correlated response of mRNAs after transfecting two unrelated 
sRNAs (sRNA 1 and 2, respectively). Results for mRNAs containing at least one 
canonical 7–8 nt 3′-UTR site for either sRNA 1, sRNA 2, or both sRNAs are highlighted 
in red, blue, and green, respectively. Values for mRNAs without such sites are in grey. 
All mRNAs were used to calculate the Spearman correlation (rs). 
(B) Correlated responses observed in a compendium of 74 transfection experiments from 
six studies (colored as indicated in the publications list). For each pair of experiments, the 
rs value was calculated as in panel (A), colored as indicated in the key, and used for 
hierarchical clustering. 
(C) Study-dependent relationships between the responses of mRNAs to the transfected 
sRNA and either 3′-UTR length or 3′-UTR AU content, focusing on mRNAs without a 
canonical 7–8 nt 3′-UTR site to the sRNA. Boxplots indicate the median rs (bar), 25th and 
75th percentiles (box), and the minimum of either 1.5 times the interquartile range or the 
most extreme data point (whiskers), with the width of the box proportional to the number 
of datasets used from each study. The studies are colored as in panel (B), abbreviating the 
first author and year. 
(D) Reduced correlation between the responses of mRNAs to unrelated sRNAs after 
applying the PLSR technique. This panel is as in (A) but plots the normalized mRNA 
fold changes. 
(E) Reduced correlations in results of the compendium experiments after applying the 
PLSR technique. This panel is as in (B) but plots the correlations after normalizing the 
mRNA fold changes. 
(F) Reduced study-dependent relationships between mRNA responses and either 3′-UTR 
length or 3′-UTR AU content. This panel is as in (C) but plots the correlations after 
normalizing the mRNA fold changes. 
(G and H) Cumulative distributions of fold changes for mRNAs containing at least one 
canonical 7–8 nt 3′-UTR site or no site either before normalization (raw) or after 
normalization (normalized). Panel (G) plots the results from experiments shown in (A) 
and (D), and (H) plots results from all 74 datasets. 
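
The Spearman correlation (rs) used in panels (A), (B), (D), and (E) can be illustrated with a minimal sketch. The fold-change values below are hypothetical and the helper names are illustrative assumptions, not the code used in this study; the sketch only shows how rs is obtained by ranking the mRNA fold changes from two transfections and correlating the ranks.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical log2 fold changes of five mRNAs after two unrelated transfections.
fc_srna1 = [-0.40, -0.10, 0.05, 0.20, 0.35]
fc_srna2 = [-0.30, 0.10, -0.20, 0.15, 0.50]
rs = spearman(fc_srna1, fc_srna2)
```

A high rs between unrelated sRNAs, as in this toy example, is the kind of shared nonspecific response that the normalization is intended to reduce.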
 
Figure 3–figure supplement 1. Reduced biases from derepression of endogenous 
miRNA targets. 
(A) Pie chart reflecting the relative proportions of reads for the indicated miRNA families 
observed when sequencing small RNAs from HeLa cells. Relative miRNA levels were 
quantified as described previously (Denzler et al., 2014). 
(B and C) Cumulative distributions of fold changes for mRNAs with at least one 
canonical 7–8 nt 3′-UTR site to the indicated miRNA family in the compendium of 74 
sRNA transfection datasets, either before (B) or after (C) normalization. P values were 
computed using a one-sided Wilcoxon rank-sum test, comparing each of the site-
containing distributions to the no-site distribution. This test was a more stringent 
alternative to the K–S test, which led to highly significant P values for very slight 
differences, due to the large number of mRNAs in each distribution. To account for 
multiple hypotheses, an appropriate Bonferroni-corrected significance threshold would be 
P < 0.005, which was not achieved for any comparison in panel (C). 
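
The Bonferroni threshold quoted above follows from simple arithmetic, sketched here under two assumptions that are not stated in the legend: a family-wise significance level of 0.05 and ten comparisons (one per miRNA family plotted). The P values in the example are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that controls the family-wise error at alpha."""
    return alpha / n_tests

# Assumed: alpha = 0.05 and ten site-containing distributions compared to "no site".
threshold = bonferroni_threshold(0.05, 10)

# Hypothetical P values, for illustration only.
p_values = {"family_a": 0.20, "family_b": 0.004}
significant = [name for name, p in p_values.items() if p < threshold]
```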
 
Figure 4. Developing a regression model to predict miRNA targeting efficacy. 
(A) Optimizing the scoring of predicted structural accessibility. Predicted RNA structural 
accessibility scores were computed for variable-length windows within the region 
centered on each canonical 7–8 nt 3′-UTR site. The heatmap displays the partial 
correlations between these values and the repression associated with the corresponding 
sites, determined while controlling for local AU content and other features of the 
context+ model (Garcia et al., 2011). 
(B) Performance of the models generated using stepwise regression compared to that of 
either the context-only or context+ models. Shown are boxplots of r² values for each of 
the models across all 1000 sampled test sets, for mRNAs possessing a single site of the 
indicated type. For each site type, all groups significantly differ (P < 10⁻¹⁵, paired 
Wilcoxon signed-rank test). Boxplots are as in Figure 3C. 
(C) The contributions of site type and each of the 14 features of the context++ model. For 
each site type, the coefficients for the multiple linear regression are plotted for each 
feature. Because features are each scored on a similar scale, the relative contribution of 
each feature in discriminating between more or less effective sites is roughly proportional 
to the absolute value of its coefficient. Also plotted are the intercepts, which roughly 
indicate the discriminatory power of site type. Dashed bars indicate the 95% confidence 
intervals of each coefficient.  
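
The partial correlations of panel (A) can be illustrated with a simplified sketch. It controls for a single covariate using the standard first-order formula, whereas the analysis described above controls for local AU content together with the other context+ features; the data and function names are hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5

# Hypothetical values: both variables track the covariate, with unrelated residuals.
local_au = [1, 2, 3, 4]        # covariate to control for (z)
accessibility = [2, 1, 2, 5]   # local_au plus residual [1, -1, -1, 1]
repression = [2, -1, 6, 3]     # local_au plus residual [1, -3, 3, -1]

raw = pearson(accessibility, repression)
partial = partial_correlation(accessibility, repression, local_au)
```

In this toy example the raw correlation is positive only because both variables follow the covariate; once the covariate is controlled for, the partial correlation vanishes.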
Figure 4–Source data 1. Coefficients of the trained context++ model corresponding to 
each site type. Using these coefficients and corresponding scaling factors (Table 3), 
context++ scores can be computed essentially as illustrated in Supplementary Figure 5 of 
Garcia et al. (2011). 
 
 
Feature        8mer      7mer-m8   7mer-A1   6mer
(Intercept)   –0.589    –0.224    –0.195    –0.079
TA_3UTR        0.222     0.139     0.117     0.058
SPS            0.210     0.135     0.095     0.035
sRNA1A        –0.018     0.010    –0.025    –0.002
sRNA1C        –0.021     0.014    –0.021     0.004
sRNA1G         0.060     0.062     0.030     0.018
sRNA8A         0.022     0.004    –0.049    –0.015
sRNA8C         0.012    –0.031     0.033     0.016
sRNA8G         0.015    –0.008    –0.017     0.006
Site8A         N/A       N/A       0.000    –0.002
Site8C         N/A       N/A       0.036     0.015
Site8G         N/A       N/A       0.015     0.012
Local_AU      –0.254    –0.177    –0.075    –0.040
3P_score      –0.040    –0.055    –0.060    –0.024
SA            –0.115    –0.134    –0.077    –0.028
Min_dist       0.118     0.056     0.045     0.036
Len_ORF        0.205     0.100     0.063     0.029
Len_3UTR       0.310     0.154     0.129     0.045
Off6m         –0.020    –0.011    –0.020    –0.010
ORF8m         –0.118    –0.044    –0.058    –0.060
PCT           –0.103    –0.048    –0.048     0.005
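
As a rough illustration of how such coefficients enter a score, the sketch below evaluates the linear model for an 8mer site as the intercept plus the sum of coefficient × scaled feature value. The coefficients are copied from the table above; the feature values are hypothetical and are assumed to be already scaled, because the scaling factors (Table 3) are not reproduced here. Features omitted from the example are treated as zero purely for illustration, and the function name is an assumption rather than part of TargetScan.

```python
# Coefficients for 8mer sites, copied from the table above
# (the Site8A/C/G terms are N/A for 8mer sites and are therefore absent).
INTERCEPT_8MER = -0.589
COEF_8MER = {
    "TA_3UTR": 0.222, "SPS": 0.210,
    "sRNA1A": -0.018, "sRNA1C": -0.021, "sRNA1G": 0.060,
    "sRNA8A": 0.022, "sRNA8C": 0.012, "sRNA8G": 0.015,
    "Local_AU": -0.254, "3P_score": -0.040, "SA": -0.115,
    "Min_dist": 0.118, "Len_ORF": 0.205, "Len_3UTR": 0.310,
    "Off6m": -0.020, "ORF8m": -0.118, "PCT": -0.103,
}

def linear_score(intercept, coefficients, scaled_features):
    """Intercept plus the sum of coefficient * scaled feature value."""
    unknown = set(scaled_features) - set(coefficients)
    if unknown:
        raise KeyError(f"no coefficient for: {sorted(unknown)}")
    return intercept + sum(
        coefficients[name] * value for name, value in scaled_features.items()
    )

# Hypothetical, already-scaled feature values for a single 8mer site.
example_site = {"Local_AU": 0.5, "Len_3UTR": 0.5}
score = linear_score(INTERCEPT_8MER, COEF_8MER, example_site)
```

More negative scores correspond to stronger predicted repression, so in this example the Local_AU term strengthens the prediction and the Len_3UTR term weakens it.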
Figure 5. Performance of target prediction algorithms on a test set of seven experiments 
in which miRNAs were individually transfected into HCT116 cells. 
(A) Average number of targets predicted by the indicated algorithm for each of the seven 
miRNAs in the test set. The numbers of predictions with at least one canonical 7–8 nt 3′-
UTR site to the transfected miRNA (dark blue) are distinguished from the remaining 
predictions (light blue). Names of algorithms are colored according to whether they 
consider only sequence or thermodynamic features of site pairing (grey), only site 
conservation (orange), pairing and contextual features of a site (red), or pairing, 
contextual features, and site conservation (purple). The most recently updated predictions 
were downloaded, with year that those predictions were released indicated in parentheses. 
(B and C) Extent to which the predictions explain the mRNA fold changes observed in 
the test set. For predictions tallied in panel (A), the explanatory power, as evaluated by 
the r² value for the relationship between the scores of the predictions and the observed 
mRNA fold changes in the test set, is plotted for either mRNAs with 3′ UTRs containing 
at least one canonical 7–8 nt 3′-UTR site (B) or other mRNAs (C). Algorithms designed 
to evaluate only targets with seed-matched 7–8 nt 3′-UTR sites are labeled “N/A” in (C). 
(D) Repression of the top predictions of the context++ model and of our previous two 
models, focusing on an average of 16 top predicted targets per miRNA in the test set. The 
dotted lines indicate the median fold-change value for each distribution; otherwise this 
panel is as in Figure 1A. 
(E and F) Median mRNA fold changes observed in the test set for top-ranked predicted 
targets, considering either all predictions (E) or only those with 3′ UTRs lacking at least 
one canonical 7–8 nt site (F). For each algorithm listed in panel (A), all reported 
predictions for the seven miRNAs were ranked according to their scores, and the 
indicated sliding threshold of top predictions was implemented. For example, at the 
threshold of 4, the 28 predictions with the top scores were identified (an average of 4 
predictions per miRNA, allowing miRNAs with more top scores to contribute more 
predictions), mRNA fold-change values from the cognate transfections were collected, 
and the median value was plotted. When the threshold exceeded the number of reported 
predictions, no value was plotted. Also plotted is the median mRNA fold change for all 
mRNAs with at least one cognate canonical 7–8 nt site in their 3′ UTR (dashed line; an 
average of 1366 mRNAs per miRNA), the median fold change for all mRNAs with at 
least one conserved cognate canonical 7–8 nt site in their 3′ UTR (dotted line; an average 
of 461 mRNAs per miRNA), and the 95% interval for the median fold change of 
randomly selected mRNAs, determined using 1000 resamplings (without replacement) at 
each cutoff (shading). Conserved sites were defined as in TargetScan6, with conservation 
cutoffs for each site type set at different branch-length scores (cutoffs of 0.8, 1.3, and 1.6 
for 8mer, 7mer-m8, and 7mer-A1 sites, respectively). 
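
The sliding-threshold procedure of panels (E) and (F) can be sketched as follows. The data are hypothetical, and the sketch assumes that lower scores indicate stronger predicted repression (as for context++ scores); for an algorithm that scores in the opposite direction, the sort order would need to be reversed. Function and variable names are illustrative.

```python
from statistics import median

def median_fc_at_threshold(predictions, fold_changes, n_mirnas, per_mirna):
    """Median fold change of the pooled top predictions at one threshold.

    predictions: list of (miRNA, gene, score) pooled over all miRNAs.
    fold_changes: dict mapping (miRNA, gene) to the log2 fold change
        observed in the cognate transfection.
    Returns None when the threshold exceeds the number of reported predictions.
    """
    n_top = per_mirna * n_mirnas  # e.g. threshold 4 with 7 miRNAs gives 28
    if n_top > len(predictions):
        return None
    # Rank all predictions together, so that miRNAs with more top scores
    # contribute more predictions.
    ranked = sorted(predictions, key=lambda p: p[2])
    return median(fold_changes[(m, g)] for m, g, _ in ranked[:n_top])

# Hypothetical predictions and fold changes for two miRNAs.
predictions = [
    ("miR-a", "g1", -0.9), ("miR-a", "g2", -0.5),
    ("miR-b", "g3", -0.7), ("miR-b", "g4", -0.1),
]
fold_changes = {
    ("miR-a", "g1"): -1.0, ("miR-a", "g2"): -0.2,
    ("miR-b", "g3"): -0.6, ("miR-b", "g4"): 0.1,
}
top1 = median_fc_at_threshold(predictions, fold_changes, n_mirnas=2, per_mirna=1)
top2 = median_fc_at_threshold(predictions, fold_changes, n_mirnas=2, per_mirna=2)
top3 = median_fc_at_threshold(predictions, fold_changes, n_mirnas=2, per_mirna=3)
```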
 
Figure 5–figure supplement 1. Performance of miRNA prediction algorithms on the test 
set. 
(A) This panel is as in Figure 5D, but shows the results for all algorithms evaluated in 
Figure 5A. Algorithm names are listed in the order of the median fold change for their 
top predictions, with each name colored using the color used for its cumulative 
distribution. 
(B and C) These panels are as in Figures 5E–F, respectively, but compare mean fold 
changes instead of median fold changes.  
 
Figure 6. Response of predictions and mRNAs with experimentally supported canonical 
binding sites. 
(A–E) Comparison of the top TargetScan7 predicted targets to mRNAs with canonical 
sites identified from dCLIP in either HeLa cells with and without transfected miR-124 
(Chi et al., 2009) or T cells with and without miR-155 (Loeb et al., 2012). Plotted are 
cumulative distributions of mRNA fold changes after transfection of miR-124 in HeLa 
cells (A), or after genetic ablation of miR-155 in T cells (B), Th1 cells (C), Th2 cells 
(D), or B cells (E) (one-sided K–S test, P values). For genes with alternative last exons, 
the analysis considered the score of the most abundant alternative last exon, as assessed 
by 3P-seq tags (as is the default for TargetScan7 when ranking predictions). Each dCLIP-
identified mRNA was required to have a 3′-UTR CLIP cluster with at least one canonical 
site to the cognate miRNA (including 6mers but not offset 6mers). Each intersection 
mRNA (red) was found in both the dCLIP set and top TargetScan7 set. Similarity 
between performance of the TargetScan7 and dCLIP sets (purple and green, respectively) 
and TargetScan7 and intersection sets (blue and red, respectively) was tested (two-sided 
K–S test, P values); the number of mRNAs analyzed in each category is in parentheses. 
TargetScan7 scores for mouse mRNAs were generated using human parameters for all 
features. 
(F–H) Comparison of top TargetScan7 predicted targets to mRNAs with canonical 
binding sites identified using photoactivatable-ribonucleoside-enhanced CLIP (PAR-
CLIP) (Hafner et al., 2010; Lipchina et al., 2011). Plotted are cumulative distributions of 
mRNA fold changes after either transfecting miR-7 (F) or miR-124 (G) into HEK293 
cells, or knocking down miR-302/367 in hESCs (H). Otherwise these panels are as in (A–
E). 
(I) Comparison of top TargetScan7 predicted targets to mRNAs with canonical sites 
identified using CLASH (Helwak et al., 2013). Plotted are cumulative distributions of 
mRNA fold changes after knockdown of 25 miRNAs from 14 miRNA families in 
HEK293 cells. For each of these miRNA families, a cohort of top TargetScan7 
predictions was chosen to match the number of mRNAs with CLASH-identified 
canonical sites, and the union of these TargetScan7 cohorts was analyzed. The total 
number of TargetScan7 predictions did not match the number of CLASH-identified 
targets due to slightly different overlap between mRNAs targeted by different miRNAs. 
Otherwise this panel is as in (A–E). 
(J) Comparison of top TargetScan7 predicted targets to mRNAs with chimera-identified 
canonical sites (Grosswendt et al., 2014). Otherwise this panel is as in (I). 
(K) Comparison of top TargetScan7 predicted targets to mRNAs with canonical binding 
sites identified using pulldown-seq (Tan et al., 2014). Plotted are cumulative distributions 
of mRNA fold changes after transfecting miR-522 into triple-negative breast cancer 
(TNBC) cells. Otherwise this panel is as in (A–E). 
(L) Comparison of top TargetScan7 predicted targets to mRNAs with canonical sites 
identified using IMPACT-seq (Tan et al., 2014). Otherwise this panel is as in (K). 
 
Figure 7. Example display of TargetScan7 predictions. 
The example shows a TargetScanHuman page for the 3′ UTR of the LRRC1 gene. At the 
top is the 3′-UTR profile, showing the relative expression of tandem 3′-UTR isoforms, as 
measured using 3P-seq (Nam et al., 2014). Shown on this profile is the end of the longest 
Gencode annotation (blue vertical line) and the total number of 3P-seq reads (339) used 
to generate the profile (labeled on the y-axis). Below the profile are predicted conserved 
sites for miRNAs broadly conserved among vertebrates (colored according to the key), 
with options to display conserved sites for mammalian conserved miRNAs, or poorly 
conserved sites for any set of miRNAs. Boxed are the predicted miR-124 sites, with the 
site selected by the user indicated with a darker box. The multiple sequence alignment 
shows the species in which an orthologous site can be detected (white highlighting) 
among representative vertebrate species, with options to display site conservation among 
all 84 vertebrate species. Below the alignment is the predicted consequential pairing 
between the selected miRNA and its sites, showing also for each site its position, site 
type, context++ score, context++ score percentile, weighted context++ score, branch-
length score, and PCT score. 
 
Figure 7–figure supplement 1. Flowchart of the computational pipeline used to build 
the TargetScan7 database. 
  
[Figure 1 image (Agarwal et al., Fig 1), panels A–H. Each panel plots cumulative fraction against mRNA fold change (log2). Panel titles recoverable from the extracted text: 6 hr zebrafish embryo, MZDicer vs WT, miR-430 targets; HeLa cells, 16 miRNA transfections; HEK293 cells, miR-92a knockdown; HEK293 cells, knockdown of 25 miRNAs, miR-92a targets; Th1 cells, miR-155 knockout; T cells, miR-155 knockout; TNBC cells, miR-522 transfection. Legend categories include 8mer, 7mer-m8, 7mer-A1, 6mer, offset 6mer, nucleation bulge, no site, MIRZA (top 100; no 6mers), and canonical versus non-canonical sites supported by CLASH, chimera, dCLIP, or IMPACT-seq data, each with its P value and number of mRNAs.]
[Figure 1–figure supplement 1 image (Agarwal et al.), panels A–F. Each panel plots cumulative fraction against mRNA fold change (log2). Panel titles: HEK293 cells, miR-124 transfection; HeLa cells, miR-124 transfection; Huh7 cells, miR-124 transfection; HeLa cells, 74 sRNA transfections; 9 hr zebrafish embryo, MZDicer+miR430 vs MZDicer, miR-430 targets; HeLa cells, 11 miRNA transfections. Legend categories: offset 6mer, nucleation bulge, 8mer, 7mer-m8, 7mer-A1, 6mer, and no site, each with its P value and number of mRNAs.]
[Figure 1–figure supplement 2 image (Agarwal et al.), panels A and B. Each panel plots cumulative fraction against mRNA fold change (log2). Panel titles: Th2 cells, miR-155 knockout; B cells, miR-155 knockout. Legend categories: canonical dCLIP-supported, non-canonical, 8mer, 7mer-m8, 7mer-A1, 6mer, and no site, each with its P value and number of mRNAs.]
[Figure 1–figure supplement 3 image (Agarwal et al.), panels A–E. Each panel plots cumulative fraction against mRNA fold change (log2). Panel titles: HEK293 cells, knockdown of 25 miRNAs, targets of 4 families, all CLASH-supported sites; the same with all chimera-supported sites; the same with 3′ UTR CLASH-supported sites; the same with 3′ UTR chimera-supported sites; hESC cells, miR-302/367 knockdown, miR-302 targets. Legend categories: canonical CLASH- or chimera-supported, non-canonical, 8mer, 7mer-m8, 7mer-A1, 6mer, and no site, each with its P value and number of mRNAs.]
[Figure 1–figure supplement 4 image (Agarwal et al.), panels A–C. Each panel plots cumulative fraction against either RPF fold change (log2) or protein fold change (log2). Panel titles: 6 hr zebrafish embryo, MZDicer vs WT, miR-430 targets; B cells, knockout of miR-155; HeLa, miR-155 transfection. Legend categories: offset 6mer, nucleation bulge, canonical dCLIP- or chimera-supported, non-canonical, 8mer, 7mer-m8, 7mer-A1, 6mer, and no site, each with its P value and number of mRNAs.]
[Figure 1–figure supplement 5 image (Agarwal et al.), panels A and B. Panel titles: Conservation of 2–7 match with one mismatch; Conservation of non-canonical binding sites of broadly conserved and less conserved miRNAs. Axes: mean branch-length score for human miRNA families (miR-25, miR-19, let-7, miR-15, miR-374, miR-10, miR-181, miR-196, miR-130, miR-17), comparing chimera-supported sites with control sites in AGO clusters and control sites in 3′ UTRs; and cumulative fraction against conservation z scores for sites of broadly conserved (1215) and less conserved (228) miRNAs, with 3′ UTR controls (P = 0.86) and AGO cluster controls (P = 0.88).]
[Figure 2 image (Agarwal et al., Fig 2), panels A–C. Sequence logos (y-axis: bits, 0–2), each shown with its E-value, the fraction of matching clusters or interactions, and the complementary region of the cognate miRNA. Panel (B) titles: Non-canonical CLASH motifs; Non-canonical chimera motifs (x-axis: position from miRNA 5′ end). miRNA families labeled in the logos: miR-124, miR-155, miR-196, miR-320, miR-7, miR-25/32/92, miR-10, miR-19, miR-130/301, miR-30.]
[Figure 2–figure supplement 1 image (Agarwal et al.), panels A–C. Panel (A) categories: CLASH and chimera interactions with or without a canonical site, and overlapping 3′ UTRs or other regions. Panels (B) and (C) show sequence logos (y-axis: bits, 0–2), each with its E-value, fraction of matching interactions, and alignment to the miRNA. miRNA families labeled in the logos: miR-98/let-7, miR-18a, miR-29, miR-33, miR-378/422, miR-423-3p, miR-423-5p, miR-148/152, miR-181, miR-374, miR-221/222, miR-17/20/93/106, miR-15/16, miR-101. Panel (C) titles: miR-522 transfection in TNBC cells, IMPACT-seq; miR-124 transfection in HeLa cells, dCLIP.]
[Figure 2–figure supplement 2 image (Agarwal et al.), panels A and B. Sequence logos (y-axis: bits, 0–2), each with its E-value, fraction of matching interactions, and alignment to the miRNA. miRNA families labeled in the logos: miR-17/20/93/106, miR-23, miR-27, miR-142-5p, miR-124, miR-48/84/let-7, miR-50/62/90, miR-51/52/53/54/55/56, miR-58, miR-63/64/65/229, miR-238/239, miR-80/81/82, miR-72/73/74, miR-71.]
[Figure 3 image (Agarwal et al., Fig 3), panels A–H. Publication key: Birmingham 2006, Anderson 2008, Grimson 2007, Lim 2005, Jackson 2006a & b, Schwartz 2006. Scatter plots: mRNA fold change (log2) for sRNA 1 against mRNA fold change (log2) for sRNA 2, with points classed as sRNA 1 ≥1 site, sRNA 2 ≥1 site, sRNA 1 & 2 ≥1 site, or no site (rs = 0.55 and rs = −0.10 are reported on the two plots). Heatmap key: rs. Boxplots: correlation (rs) to mRNA fold change for the features 3′ UTR length and 3′ UTR AU content, by study (B’06, A’08, G’07, L’05, J’06, S’06). Cumulative plots: cumulative fraction against mRNA fold change (log2) for raw and normalized data, with ≥1 site or no site.]
[Figure 3–figure supplement 1 image (Agarwal et al.), panels A–C. Panel (A): pie chart of miRNA families (let-7, miR-21, miR-17, miR-24, miR-27, miR-30, miR-15, miR-26, miR-25, miR-29, other). Panels (B) and (C): cumulative fraction against mRNA fold change (log2) for mRNAs with sites to each of these ten families and for mRNAs with no site, with a P value reported for each family.]
[Agarwal et al., Fig 4. Panels A–C. Recoverable labels: partial correlation (>0.00 to −0.10) as a function of window size (1–20) and position relative to seed match (−15 to +15); r2 to held-out data for the Context only, Context+ and Stepwise models, by site type (6mer, 7mer-A1, 7mer-m8, 8mer); regression coefficients (−0.8 to 0.4) by site type for (Intercept), TA_3UTR, SPS, sRNA1 (A, C, G), Local_AU, 3P_score, SA, Len_ORF, Len_3UTR, Min_dist, Off6m, ORF8m, PCT, sRNA8 (A, C, G) and Site8 (A, C, G).]
[Agarwal et al., Fig 5. Panels A–F. Recoverable labels: predicted miRNA–target interactions (average per miRNA); r2 to test set for mRNAs with 7–8 nt site, mRNAs with conserved 7–8 nt site, and all mRNAs; median mRNA fold change (log2) versus average top predictions considered per miRNA (4 to 4096); cumulative fraction versus mRNA fold change (log2), HCT116 cells, 7 miRNA transfections. Methods compared: context++ (2015), TargetScan6.All (2012), TargetScan5.All (2008), TargetScan6.Cons (2012), TargetScan5.Cons (2008), TargetScan.PCT (2012), TargetSpy (2010), RNA22 (2011), PITA.Top (2008), PITA.All (2008), PicTarF (2012), PicTarC (2012), PicTarM (2012), MirTarget2 (2012), miRSVR (2010), miRanda-MicroCosm (2008), ElMMO2 (2011), DIANA.microT.CDS (2012), AnTarTsfxn (2011), AnTarCLIP (2011), PACCMIT-CDS.Cons (2013), PACCMIT-CDS.All (2013), TargetRank (2007), miRmap (2013), seed-MIRZA-G-C (2015), MIRZA-G-C (2015), MBSTAR (2015), seed-MIRZA-G (2015), MIRZA-G (2015), SVMicrO (2011).]
[Agarwal et al., Fig 5-figure supplement 1. Panels A–C. Recoverable labels: mean mRNA fold change (log2) versus average top predictions considered per miRNA (4 to 4096), for mRNAs with 7–8 nt site and for mRNAs with conserved 7–8 nt site; cumulative fraction versus mRNA fold change (log2), HCT116 cells, 7 miRNA transfections, all mRNAs. The methods compared are the same set of prediction tools named in Fig 5.]
[Agarwal et al., Fig 6. Panels A–L. Each panel plots cumulative fraction versus mRNA fold change (log2), comparing canonical sites supported by an experimental method, an equal number of top TargetScan7 predictions, the intersection of the two, and mRNAs with no site, with P values for each comparison. Panel titles: TNBC cells, miR-522 transfection (IMPACT-seq; pulldown-seq); hESC cells, miR-302/367 knockdown, miR-302 targets (PAR-CLIP); HeLa cells, miR-124 transfection (dCLIP); HEK293 cells, miR-124 transfection (PAR-CLIP); HEK293 cells, miR-7 transfection (PAR-CLIP); HEK293 cells, knockdown of 25 miRNAs, targets for all 25 (chimera; CLASH); Th1 cells, Th2 cells, B cells and T cells, miR-155 knockout (dCLIP).]
[Agarwal et al., Fig 7. No text from this figure survived extraction.]
[Agarwal et al., Fig 7-figure supplement 1. Flowchart; box titles and steps as recovered from the figure. Steps whose box title did not survive extraction are listed under "(title not recovered)".]

(title not recovered)
    - Get 3′ UTR coordinates of protein-coding Gencode transcripts
    - Compare to other gene model resources
    - Link 3P-seq clusters to gene models
    - Infer longest 3′ UTR for each stop codon
(title not recovered)
    - Aggregate normalized 3P-seq clusters for each reference 3′ UTR
    - Calculate 3′ UTR isoform ratios along UTR length
Collect aligned 3′ UTRs
    - Get coordinates of reference 3′ UTRs
    - Mask regions overlapping ORFs in other transcripts
    - Extract multiz alignments
Partition 3′ UTRs by conservation
    - Calculate median branch length score (BLS) of each 3′ UTR alignment
    - Partition 3′ UTRs into 10 conservation bins
Calculate site conservation metrics
    - Calculate BLS of each site for sites to broadly conserved miRNAs
    - Assign conservation status using BLS thresholds
    - Calculate PCT from BLS
Find seed-matched sites
    - Find 6mer, 7mer-A1, 7mer-m8, and 8mer sites in all reference 3′ UTRs and their orthologs
Collect ORFs
    - Identify set of representative ORF coordinates corresponding to each reference 3′ UTR
    - Extract ORF sequences from multiz alignments
(title not recovered)
    - Acquire miRNA annotations for key vertebrate species
    - Modify annotation of conserved miRNAs based on miRNA catalogs
Group miRNAs into families
    - Group miRNAs with the same sequence at positions 2–8 into families
    - Identify miRNA families that are conserved among mammals or are more broadly conserved among vertebrates
    - Curate alternative isoforms of conserved families
Calculate context++ score for each site
    - Score features of miRNA families
    - Score features of mRNAs
    - Score features of sites
Summarize target predictions
    - Calculate total weighted context++ scores
    - Calculate aggregate PCTs (for sites to broadly conserved miRNA families) for reference 3′ UTRs
    - For each miRNA family, tally the number of sites of each type per target
    - Load all data into MySQL database
Create web interface
    - Design scripts to access database and display results by miRNA family or gene/transcript ID for each organism
    - Provide options to rank targets for each miRNA and miRNAs targeting each mRNA
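The site-finding and family-grouping steps named in the flowchart can be illustrated with a minimal Python sketch. It assumes the standard definitions of the four canonical site types (6mer: match to miRNA positions 2–7; 7mer-A1: that match followed by an A opposite miRNA position 1; 7mer-m8: match to positions 2–8; 8mer: both), upper-case RNA input, and hypothetical function names (seed_family, find_sites). The miR-124 sequence is used only for illustration, and this is not the pipeline's actual code.

```python
# Minimal sketch (assumed helper names; not the TargetScan implementation).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an upper-case RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_family(mirna):
    """miRNAs sharing the same nucleotides at positions 2-8 belong to one family."""
    return mirna[1:8]


def find_sites(mirna, utr):
    """Return (index, site_type) for each match to miRNA positions 2-7.

    index is the 0-based position in the UTR of the 6-nt core match.
    """
    core = reverse_complement(mirna[1:7])   # pairs to miRNA positions 2-7
    m8 = reverse_complement(mirna[7])       # pairs to miRNA position 8
    sites = []
    start = utr.find(core)
    while start != -1:
        end = start + 6
        has_m8 = start > 0 and utr[start - 1] == m8
        has_a1 = end < len(utr) and utr[end] == "A"
        if has_m8 and has_a1:
            kind = "8mer"
        elif has_m8:
            kind = "7mer-m8"
        elif has_a1:
            kind = "7mer-A1"
        else:
            kind = "6mer"
        sites.append((start, kind))
        start = utr.find(core, start + 1)
    return sites


MIR124 = "UAAGGCACGCGGUGAAUGCC"  # illustrative miRNA sequence (5' to 3')
print(seed_family(MIR124))
print(find_sites(MIR124, "CCGUGCCUUACC"))
```

Classifying each core match by its flanking nucleotides, rather than searching for four separate motifs, ensures that every site is reported once under its most extensive type.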
Tables 
Table 1. The 26 features considered in the models, highlighting the 14 robustly selected 
through stepwise regression (bold). The feature description does not include the scaling 
performed (Table 3) to generate more comparable regression coefficients. 
 
Feature | Abbreviation | Description | Frequency chosen: 8mer | 7mer-m8 | 7mer-A1 | 6mer
miRNA
3′-UTR target-site abundance | TA_3UTR | Number of sites in all annotated 3′ UTRs (Arvey et al., 2010; Garcia et al., 2011) | 100% | 100% | 100% | 100%
ORF target-site abundance | TA_ORF | Number of sites in all annotated ORFs (Garcia et al., 2011) | 9.4% | 0.7% | 68.1% | 93.4%
Predicted seed-pairing stability | SPS | Predicted thermodynamic stability of seed pairing (Garcia et al., 2011) | 100% | 100% | 100% | 100%
sRNA position 1 | sRNA1 | Identity of nucleotide at position 1 of the sRNA | 68% | 100% | 99.7% | 97.7%
sRNA position 8 | sRNA8 | Identity of nucleotide at position 8 of the sRNA | 0% | 0.8% | 100% | 100%
site
Site position 1 | site1 | Identity of nucleotide at position 1 of the site | N/A | 57.1% | N/A | 2%
Site position 8 | site8 | Identity of nucleotide at position 8 of the site | 0.8% | 95.1% | 99.4% | 100%
Site position 9 | site9 | Identity of nucleotide at position 9 of the site (Lewis et al., 2005; Nielsen et al., 2007) | 15.4% | 7.1% | 0.9% | 93.7%
Site position 10 | site10 | Identity of nucleotide at position 10 of the site (Nielsen et al., 2007) | 0.1% | 100% | 8.5% | 26.3%
Local AU content | local_AU | AU content near the site (Grimson et al., 2007; Nielsen et al., 2007) | 100% | 100% | 100% | 100%
3′ supplementary pairing | 3P_score | Supplementary pairing at the miRNA 3′ end (Grimson et al., 2007) | 42.5% | 100% | 100% | 100%
Distance from stop codon | dist_stop | log10(Distance of site from stop codon) | 62.4% | 10.8% | 8.7% | 25.7%
Predicted structural accessibility | SA | log10(Probability that a 14-nt segment centered on the match to sRNA positions 7 and 8 is unpaired) | 100% | 100% | 100% | 100%
Minimum distance | min_dist | log10(Minimum distance of site from stop codon or polyadenylation site) (Grimson et al., 2007) | 99.9% | 100% | 87.4% | 100%
Probability of conserved targeting | PCT | Probability of site conservation, controlling for dinucleotide evolution and site context (Friedman et al., 2009) | 100% | 100% | 100% | 20.8%
mRNA
5′-UTR length | len_5UTR | log10(Length of the 5′ UTR) | 98.2% | 8.2% | 4.6% | 17.2%
ORF length | len_ORF | log10(Length of the ORF) | 100% | 100% | 100% | 100%
3′-UTR length | len_3UTR | log10(Length of the 3′ UTR) (Hausser et al., 2009) | 100% | 100% | 100% | 100%
5′-UTR AU content | AU_5UTR | Fraction of AU nucleotides in the 5′ UTR | 13% | 38.9% | 91.1% | 31.3%
ORF AU content | AU_ORF | Fraction of AU nucleotides in the ORF | 1.2% | 72.4% | 28.4% | 35.8%
3′-UTR AU content | AU_3UTR | Fraction of AU nucleotides in the 3′ UTR (Robins and Press, 2005; Hausser et al., 2009) | 5.4% | 73.3% | 65.3% | 80.6%
3′-UTR offset 6mer sites | off6m | Number of offset 6mer sites in the 3′ UTR (Friedman et al., 2009) | 65.9% | 89.6% | 99.8% | 100%
ORF 8mer sites | ORF8m | Number of 8mer sites in the ORF (Lewis et al., 2005; Reczko et al., 2012) | 99.5% | 99.1% | 100% | 100%
ORF 7mer-m8 sites | ORF7m8 | Number of 7mer-m8 sites in the ORF (Reczko et al., 2012) | 4.7% | 4.3% | 85.3% | 100%
ORF 7mer-A1 sites | ORF7A1 | Number of 7mer-A1 sites in the ORF (Reczko et al., 2012) | 68.4% | 34.2% | 97.8% | 98.4%
ORF 6mer sites | ORF6m | Number of 6mer sites in the ORF (Reczko et al., 2012) | 91% | 13.3% | 0.7% | 36.7%
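Several of the simpler features in Table 1 follow directly from their descriptions. The Python sketch below computes len_3UTR, AU_3UTR, dist_stop and min_dist for one site. It assumes that the 3′ UTR string begins immediately after the stop codon and ends at the polyadenylation site, that distances are counted in nucleotides from the nearest edge of the site, and that a distance of zero is floored at 1 before taking the logarithm; the function names and these conventions are illustrative assumptions, not the definitions used in the study.

```python
import math


def au_fraction(seq):
    """Fraction of A or U (or T) nucleotides in a sequence."""
    seq = seq.upper().replace("T", "U")
    return sum(base in "AU" for base in seq) / len(seq)


def site_and_mrna_features(utr3, site_start, site_end):
    """Compute a few Table 1 features for a site at utr3[site_start:site_end].

    Assumed conventions: index 0 is the first nucleotide after the stop codon,
    and the end of utr3 is the polyadenylation site.
    """
    dist_stop = max(1, site_start)                  # nt between stop codon and site
    dist_polya = max(1, len(utr3) - site_end)       # nt between site and poly(A) site
    return {
        "len_3UTR": math.log10(len(utr3)),
        "AU_3UTR": au_fraction(utr3),
        "dist_stop": math.log10(dist_stop),
        "min_dist": math.log10(min(dist_stop, dist_polya)),
    }


# Hypothetical 1000-nt 3' UTR with a 7-nt site starting 50 nt after the stop codon.
example_utr = "A" * 50 + "GUGCCUU" + "C" * 943
print(site_and_mrna_features(example_utr, 50, 57))
```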
Table 2. Summary of datasets analyzed in this study, and corresponding figures using the 
datasets. Supplemental figures are abbreviated (e.g., “Figure 1–figure supplement 2A” is 
shortened to “1–FS2A”). 
 
Figure | Gene Expression Omnibus (GEO) ID, ArrayExpress ID, or data source | Reference
1A, 1–FS4A | GSM854425, GSM854430, GSM854431, GSM854436, GSM854437, GSM854442, GSM854443 | (Bazzini et al., 2012)
1B, 6B | GSM1012118, GSM1012119, GSM1012120, GSM1012121, GSM1012122, GSM1012123 | (Loeb et al., 2012)
1C, 1–FS2A, 6C-D | E-TABM-232 | (Rodriguez et al., 2007)
1D, 1F | GSM1122217, GSM1122218, GSM1122219, GSM1122220, GSM1122221, GSM1122222, GSM1122223, GSM1122224, GSM1122225, GSM1122226 | (Helwak et al., 2013)
1E, 1–FS3A-D, 6I-J | GSM538818, GSM538819, GSM538820, GSM538821 | (Hafner et al., 2010)
1G | GSM156524, GSM156532, GSM210897, GSM210898, GSM210901, GSM210903, GSM210904, GSM210907, GSM210909, GSM210911, GSM210913, GSM37599, http://psilac.mdc-berlin.de/download/ (let7b_32h, miR-30_32h, miR-155_32h, miR-16_32h) | (Lim et al., 2005; Grimson et al., 2007; Linsley et al., 2007; Selbach et al., 2008)
1H, 6K-L | E-MTAB-2110 | (Tan et al., 2014)
1–FS1A | GSM210897, GSM210898, GSM210901, GSM210903, GSM210904, GSM210907, GSM210909, GSM210911, GSM210913, GSM37599, GSM37601 | (Lim et al., 2005; Grimson et al., 2007)
1–FS1B, 3, 3–FS1B-C, 4 | 74 datasets compiled in Supplementary data 4 of Garcia et al. (2011), used as is or after normalization (Supplementary file 1); GSM119707, GSM119708, GSM119710, GSM119743, GSM119745, GSM119746, GSM119747, GSM119749, GSM119750, GSM119759, GSM119761, GSM119762, GSM119763, GSM133685, GSM133689, GSM133699, GSM133700, GSM134325, GSM134327, GSM134466, GSM134480, GSM134483, GSM134485, GSM134511, GSM134512, GSM134551, GSM210897, GSM210898, GSM210901, GSM210903, GSM210904, GSM210907, GSM210909, GSM210911, GSM210913, GSM37599, GSM37601; E-MEXP-1402 (1595297366, 1595297383, 1595297389, 1595297394, 1595297399, 1595297422, 1595297427, 1595297432, 1595297491, 1595297496, 1595297501, 1595297507, 1595297513, 1595297518, 1595297524, 1595297530, 1595297535, 1595297564, 1595297588, 1595297595, 1595297605, 1595297614, 1595297621, 1595297627, 1595297644, 1595297650, 1595297662); E-MEXP-668 (16012097016666, 16012097016667, 16012097016668, 16012097016669, 16012097017938, 16012097017939, 16012097017952, 16012097017953, 16012097018568, 251209725411) | (Lim et al., 2005; Birmingham et al., 2006; Jackson et al., 2006a; Jackson et al., 2006b; Schwarz et al., 2006; Grimson et al., 2007; Anderson et al., 2008)
1–FS1C | GSM95614, GSM95615, GSM95616, GSM95617, GSM95618, GSM95619 | (Giraldez et al., 2006)
1–FS1D-F | GSM1269344, GSM1269345, GSM1269348, GSM1269349, GSM1269350, GSM1269351, GSM1269354, GSM1269355, GSM1269356, GSM1269357, GSM1269360, GSM1269361, GSM1269362, GSM1269363 | (Nam et al., 2014)
1–FS2B, 1–FS4B, 6E | GSM1479572, GSM1479576, GSM1479580, GSM1479584 | (Eichhorn et al., 2014)
1–FS3E, 6H, S3E | http://icb.med.cornell.edu/faculty/betel/lab/betelab_v1/Data.html | (Lipchina et al., 2011)
1–FS4C | http://psilac.mdc-berlin.de/download/pSILAC_all_protein_ratios_OE.txt (miR155) | (Selbach et al., 2008)
3–FS1A | GSM416753 | (Mayr and Bartel, 2009)
5, 5–FS1 | GSM156522, GSM156580, GSM156557, GSM156548, GSM156533, GSM156532, GSM156524, processed and normalized (Supplementary file 2) | (Linsley et al., 2007)
6A | GSM37601 | (Lim et al., 2005)
6F-G | GSM363763, GSM363766, GSM363769, GSM363772, GSM363775, GSM363778 | (Hausser et al., 2009)
Table 3. Scaling parameters used to normalize data to the [0, 1] interval. Provided are the 
5th and 95th percentile values for continuous features that were scaled, after the values of 
the feature were appropriately transformed as indicated (Table 1). 
 
Feature | 8mer 5th % | 8mer 95th % | 7mer-m8 5th % | 7mer-m8 95th % | 7mer-A1 5th % | 7mer-A1 95th % | 6mer 5th % | 6mer 95th %
3P_score | 1.000 | 3.500 | 1.000 | 3.500 | 1.000 | 3.500 | 1.000 | 3.500
SPS | –11.130 | –5.520 | –11.130 | –5.490 | –8.410 | –3.330 | –8.570 | –3.330
TA_3UTR | 3.113 | 3.865 | 3.067 | 3.887 | 3.145 | 3.887 | 3.113 | 3.887
Len_3UTR | 2.392 | 3.637 | 2.409 | 3.615 | 2.413 | 3.630 | 2.405 | 3.620
Len_ORF | 2.788 | 3.753 | 2.773 | 3.729 | 2.773 | 3.730 | 2.775 | 3.731
Min_dist | 1.415 | 3.113 | 1.491 | 3.096 | 1.431 | 3.117 | 1.477 | 3.106
Local_AU | 0.308 | 0.814 | 0.277 | 0.782 | 0.342 | 0.801 | 0.295 | 0.772
SA | –4.356 | –0.661 | –5.218 | –0.725 | –4.230 | –0.588 | –5.082 | –0.666
PCT | 0.000 | 0.816 | 0.000 | 0.364 | 0.000 | 0.449 | 0.000 | 0.193

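The scaling described in the Table 3 legend can be sketched in a few lines of Python. The sketch assumes a linear rescaling that maps the 5th percentile to 0 and the 95th percentile to 1, applied after the transformation listed in Table 1 (for example, log10 of the 3′ UTR length). Whether values beyond these percentiles are clipped is not stated here, so the sketch leaves them unclipped. The dictionary and function names are hypothetical.

```python
import math

# 5th and 95th percentile values copied from Table 3 (8mer columns).
SCALING_8MER = {
    "Len_3UTR": (2.392, 3.637),
    "Len_ORF": (2.788, 3.753),
    "Local_AU": (0.308, 0.814),
}


def scale_feature(value, p5, p95):
    """Linearly rescale so that p5 maps to 0 and p95 maps to 1 (no clipping)."""
    return (value - p5) / (p95 - p5)


# Example: an 8mer site in a hypothetical 1000-nt 3' UTR.
utr_length_nt = 1000
transformed = math.log10(utr_length_nt)            # transformation from Table 1
scaled = scale_feature(transformed, *SCALING_8MER["Len_3UTR"])
print(round(scaled, 3))
```

Placing the features on a shared scale in this way is what makes the regression coefficients more directly comparable across features, as noted in the Table 1 legend.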
Chapter 3. Independent regulation of vertebral number and vertebral identity by 
microRNA-196 paralogs 
 
Siew Fen Lisa Wong1*, Vikram Agarwal2,3,4*, Jennifer H. Mansfield5,6, Nicolas Denans7, 
Matthew G. Schwartz5, Haydn M. Prosser8, Olivier Pourquié5,9, David P. Bartel2,3, 
Clifford J. Tabin5 and Edwina McGlinn1,5  
 
1EMBL Australia, Australian Regenerative Medicine Institute, Monash University, 
Clayton, Vic, 3800, Australia. 
2Howard Hughes Medical Institute and Whitehead Institute for Biomedical Research, 
Cambridge, MA 02142, USA. 
3Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, 
USA. 
4Computational and Systems Biology Program, Massachusetts Institute of Technology, 
Cambridge, MA 02139, USA. 
5Department of Genetics, Harvard Medical School, Boston, MA 02115, USA. 
6Barnard College, Department of Biological Sciences, 1306 Altschul Hall, 
3009 Broadway, New York, NY, 10027. 
7Stanford School of Medicine, Department of Developmental Biology and Genetics, 
Stanford, CA 94305. 
8The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, 
Cambridge, UK. 
9Department of Pathology, Brigham and Women’s Hospital, Boston, MA 02115, USA. 
 
* These authors contributed equally to this work 
 
V.A. performed computational and statistical analysis with D.P.B.’s guidance. S.F.L.W., 
J.H.M, M.G.S., and E.M. performed mouse experiments. N.D. performed chick 
experiments with O.P.’s guidance. H.M.P. helped generate RNA sequencing data. C.J.T. 
and E.M. designed the study. V.A. and E.M. produced figures and wrote the manuscript. 
 
Published as: 
Wong SFL*, Agarwal V*, Mansfield JH, Denans N, Schwartz MG, Prosser HM, 
Pourquié O, Bartel DP, Tabin CJ, McGlinn E. "Independent regulation of vertebral 
number and vertebral identity by microRNA-196 paralogs". 2015. Proceedings of the 
National Academy of Sciences USA. doi: 10.1073/pnas.1512655112.  
Abstract 
The Hox genes play a central role in patterning the anterior-to-posterior axis. An 
important function of Hox activity in vertebrates is the specification of different vertebral 
morphologies, with an additional role in axis elongation emerging. The miR-196 family 
of microRNAs is predicted to extensively target Hox 3′ UTRs, although the full extent 
to which miR-196 regulates Hox expression dynamics and influences mammalian 
development remains to be elucidated. Here we used an extensive allelic series of mouse 
knockouts to show that the miR-196 family of microRNAs is essential both for properly 
patterning vertebral identity at different axial levels and for modulating the total number 
of vertebrae. All three miR-196 paralogs, 196a1, 196a2 and 196b, act redundantly to 
pattern the mid-thoracic region, whereas 196a2 and 196b have an additive role in 
controlling the number of rib-bearing vertebrae and positioning of the sacrum. 
Independent of this, 196a1, 196a2 and 196b act redundantly to constrain total vertebral 
number. Loss of miR-196 leads to a collective upregulation of numerous trunk Hox target 
genes, with a concomitant delay in activation of caudal Hox genes, which are proposed to 
signal the end of axis extension. Additionally, we identified altered molecular signatures 
associated with the Wnt, Fgf and Notch/segmentation pathways, and demonstrate that 
miR-196 has the potential to regulate Wnt activity by multiple mechanisms. By feeding 
into, and thereby integrating, multiple genetic networks controlling vertebral number and 
identity, miR-196 is a critical player defining axial formulae.  
 
Introduction 
A defining feature of the vertebrate animals is the presence of a segmented vertebral 
column. Species are uniquely characterized by the total number of vertebrae that form, and 
by the regionalization of these vertebrae along the anterior-to-posterior axis into groups 
with distinct morphologies (e.g. cervical, thoracic, lumbar and sacral). The genetic 
determinants of vertebral number and vertebral identity have largely been considered as 
separate; thus how, or even whether, these processes are molecularly integrated remains 
to be clearly elucidated.  
 Vertebral precursors, known as somites, arise by continued expansion and 
segmentation of a region of the caudal embryo, the presomitic mesoderm (PSM) 
(Benazeraf and Pourquie, 2013). Expansion of the PSM requires a self-renewing axial 
progenitor population which initially resides in the node-streak border of the epiblast, and 
subsequently repositions to the tailbud (Psychoyos and Stern, 1996; Cambray and 
Wilson, 2002, 2007; Iimura et al., 2007; McGrew et al., 2008). These progenitors provide 
a source of cells that, following ingression through the primitive streak, populate the 
PSM and other derivatives to drive posterior elongation. Key players in this process 
include genes involved in Wnt and Fgf signaling, in addition to the Cdx transcription 
factors, as evidenced by severe axis truncations when each is mutated (Neijts et al., 
2014). Balancing the expansion of this cell population, cells of the anterior PSM bud off 
to form somites with a rhythmic periodicity inherent to each species.  The eventual 
exhaustion of progenitor self-renewal capacity is thought to halt axis elongation, the 
timing of which is a critical factor in establishing species-specific vertebral number 
(Gomez et al., 2008).  
 Within vertebral precursors, specific combinations of Hox transcription factors 
impart positional information that governs vertebral identity (Wellik, 2007). In mammals, 
the 39 Hox genes are clustered at four separate genomic loci (HoxA, HoxB, HoxC and 
HoxD), with each gene classified into one of 13 paralogous groups dependent on 
sequence similarities and relative positions within the respective clusters (Fig. 1A). These 
genes are expressed in partially overlapping domains during embryonic development, 
with a spatio-temporal collinearity that reflects genomic ordering (Duboule and Dolle, 
1989; Graham et al., 1989). Exhaustive analysis of Hox mouse mutants over more than 
20 years has revealed individual and cumulative Hox function in conferring specific 
positional identities to the forming vertebral column (Wellik, 2007). For instance, the 
central/trunk Hox genes (paralogs 5–8) primarily pattern thoracic vertebrae, whereas Hox11 
paralogs pattern sacral and caudal vertebrae (Wellik and Capecchi, 2003) and position 
the sacrum (Favier et al., 1995; Favier et al., 1996).  
 In addition to transcripts encoding the Hox proteins, transcription within the 
genomic Hox clusters produces non-coding regulatory RNAs, including several 
microRNAs (Fig. 1A) (Heimberg and McGlinn, 2012). In mouse, these include the miR-10 
family, which is found throughout most bilaterian animals, miR-615, which is found 
in eutherian mammals, and the miR-196 family, which is found in vertebrates and 
tunicates. Three murine miR-196 paralogs exist (referred to as 196a1, 196a2 and 196b), 
each with essentially identical targeting potential (Yekta et al., 2004; Bartel, 2009). The 
three miR-196 paralogs exhibit deep conservation across all vertebrate lineages analyzed 
to date, both in terms of their genomic positioning upstream of Hox9 paralogs, and in 
their extensive predicted targeting of Hox 3′ UTRs primarily of the trunk region (Fig. 1A) 
(Yekta et al., 2004; Yekta et al., 2008; Vonk et al., 2013). In an early developmental 
context, in vivo validation of these interactions has focused primarily on a single Hox 
target, Hoxb8 (Mansfield et al., 2004; Yekta et al., 2004; Hornstein et al., 2005; McGlinn 
et al., 2009; Asli and Kessel, 2010; He et al., 2011), with no evidence for additional Hox 
target regulation observed in miR-196 knockdown studies in zebrafish (He et al., 2011). 
Thus, the extent to which collective Hox output is regulated by miR-196, either in terms 
of the number of genes affected or the relative levels of regulation, is unknown.  
  The extent to which the developmental modules that define total vertebral number 
are integrated with those that impart positional information has not been well established, 
although these processes can be uncoupled (Dubrulle et al., 2001; Schroter and Oates, 
2010; Harima et al., 2013). A function for Hox genes in establishing total vertebral 
number has been largely dismissed because, with the exception of Hoxb13–/– 
(Economides et al., 2003), Hox knockouts do not phenotypically support such a role 
(Wellik, 2007). However, ectopic trunk Hox activity can, under certain conditions, drive 
axis elongation (Young et al., 2009). Conversely, posterior Hox activity slows axis 
elongation and terminates the main body axis (Young et al., 2009; Denans et al., 2015), 
suggesting an alternative view of Hox activity in this context. In this light, phenotypic 
observations following reduced activity of miR-196, a repressor of Hox activity, are quite 
remarkable.  Knockdown studies in chick and zebrafish support a role for miR-196 in 
regulating vertebral identity (McGlinn et al., 2009; He et al., 2011). Additionally, miR-
196 morphant zebrafish exhibit an extended vertebral column, with what appears to be an 
“insertion” of a rib-bearing precaudal element (He et al., 2011). How this latter 
phenotype arises developmentally is not known, and is difficult to reconcile with de-
repression of trunk Hox target genes alone (Pollock et al., 1992; Pollock et al., 1995). 
These knockdown approaches could not shed light on individual paralog contributions for 
this highly related miRNA family, and importantly, the molecular networks downstream 
of miR-196, which have the potential to drive phenotypic alterations, remain 
uncharacterized.  
 Here, we have generated individual knockout alleles for each of the three miR-
196 family members in mouse. This has allowed us to build an entire allelic deletion 
series to reveal the individual and additive roles of miR-196 paralogs in patterning 
vertebral identity at many axial levels and in controlling the total number of vertebrae. 
We have characterized the detailed molecular landscape controlled by miR-196 activity 
in the early embryo to show that miR-196 regulates, and therefore has the ability to 
integrate, multiple key signaling pathways to drive developmental processes. 
 
Results 
Differential transcription of miR-196a1 and miR-196a2 in the developing embryo.  
To reveal the individual expression patterns, and therefore potential for functional 
redundancy, of identical miRNAs 196a1 and 196a2, we generated eGFP knock-in alleles 
termed 196a1GFP and 196a2GFP (Fig. S1). Expression of reporter mRNA reflects sites of 
active transcription, though it does not reveal additional post-transcriptional regulation that 
endogenous miRNAs may undergo. Whole mount in situ hybridization analysis of 
reporter mRNA indicated that both miRNAs were expressed specifically in the posterior 
embryonic derivatives of all three germ layers, and revealed striking differences in their 
spatio-temporal kinetics that have not previously been delineated (Fig. 1B-J). miR-196a1 
is expressed at the onset of somitogenesis [embryonic day (E) 8.0; data not shown] and 
throughout the posterior growth zone at E8.5 (Fig. 1B). Strong expression is maintained 
in the PSM until the end of axis elongation, with a discrete band of low expression in the 
anterior PSM from E10.5 (see inset Fig. 1E and F). The anterior boundary of somitic and 
neural expression extends to approximately somite 13/14 [prevertebra (pv) 9, thoracic (T) 
2] at E9.5 with a caudal shift in somitic tissue and a rostral shift in neural tissue as 
development proceeds (Fig. 1C and D). This expression profile indicates that miR-196a1 
exhibits a classic collinear profile relative to the adjacent Hox gene, Hoxb9 (anterior limit 
at E9.5, pv3) (Chen and Capecchi, 1997). miR-196a2 expression is temporally delayed 
relative to miR-196a1, with faint expression ventral to the PSM at E8.5-9.0 (arrows in 
Fig. 1F and G). Strong expression is then observed throughout the PSM and neural plate 
at E9.5 (Fig. 1H). A stable anterior somitic limit at approximately somite 21/22 (pv17, 
T10) and neural limit 2 somites rostral to this is established soon after, consistent with its 
positioning between Hoxc9 and Hoxc10 (Burke et al., 1995). This analysis revealed both 
unique and overlapping expression patterns of miR-196a1 and miR-196a2, suggesting 
these identical miRNAs might have both unique functions where individually expressed 
and either redundant or additive functions at sites of co-expression.  
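The somite, prevertebra and vertebra designations used above are related by simple arithmetic, sketched here in Python. The sketch assumes a mouse axial formula of 7 cervical, 13 thoracic, 6 lumbar and 4 sacral vertebrae (T13 and L6 are named as the last thoracic and lumbar elements in this chapter; the sacral count is an assumption). It also assumes that prevertebra n derives from somites n + 4 and n + 5, an offset inferred only from the two correspondences quoted in the text (somite 13/14 with pv9, somite 21/22 with pv17). The function names are hypothetical.

```python
# Assumed mouse axial formula: (region label, number of vertebrae).
REGIONS = [("C", 7), ("T", 13), ("L", 6), ("S", 4)]


def prevertebra_label(pv):
    """Convert a prevertebra number (1-based) to a regional label such as 'T2'."""
    remaining = pv
    for region, count in REGIONS:
        if remaining <= count:
            return f"{region}{remaining}"
        remaining -= count
    return f"Ca{remaining}"  # caudal vertebrae follow the sacrum


def somite_pair(pv, offset=4):
    """Somites contributing to a prevertebra, using the offset inferred from the text."""
    return (pv + offset, pv + offset + 1)


for pv in (3, 9, 17):
    print(pv, prevertebra_label(pv), somite_pair(pv))
```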
 
Genetic deletion of miR-196 leads to altered vertebral identity 
The collective function of miR-196 family members had yet to be assessed in mammals. 
Moreover, the dissection of paralog contributions to overall miR-196 activity had not 
been achieved in any system. To address this, we generated straight knockout alleles at 
each of the three murine miR-196 loci (Fig. S2), allowing us to create the complete allelic 
series of single, double and triple miR-196 knockout embryos. This allowed us to 
demonstrate an essential requirement for miR-196 activity in patterning the mid-thoracic, 
the thoraco-lumbar transition and lumbo-sacral regions, with both paralog-specific and 
additive effects revealed.  
 Removal of individual miR-196 paralogs alone revealed partially penetrant 
homeotic patterning defects (Fig. 2A, Table S1). In 196a2 or 196b single-mutant 
embryos, the presence of an ectopic rudimentary rib nubbin on the first lumbar vertebra 
indicated an anterior homeotic transformation of this element (Fig. 2A). Additionally, in 
approximately one quarter of cases, we observed anterior homeotic transformations 
encompassing all subsequent lumbar and sacral elements, resulting in a posterior 
displacement of the sacrum (schematized in Fig. 2D). Although this latter phenotype 
could be interpreted as an “insertion” of a thoracic element, the repositioned last lumbar 
vertebra (L6* in Fig. 1l) was often asymmetric, with both lumbar and sacral 
characteristics (Table S1), which supports the interpretation of serial identity changes, 
beginning at L1 and encompassing all subsequent elements. We did not observe a similar 
L1-to-T anterior homeotic transformation in 196a1 single-mutant embryos, which for the 
most part, exhibited no overt vertebral alterations (Fig 2A). However, at very low 
penetrance (Table S1), 196a1 single-mutant embryos displayed an anterior displacement 
of the sacrum, with or without a reduction in rib length of the last thoracic element (T13), 
suggesting these paralogs may have an opposing role at this axial level. 
 We hypothesized that the penetrance and severity of the phenotypes observed 
after mutating single miR-196 paralogs could be enhanced by combining these mutations. 
Indeed, 196a2–/–;196b–/– double-mutant skeletons exhibited a fully penetrant phenotype, 
with two pairs of supernumerary ribs and anterior homeotic transformation of all 
subsequent elements (Fig. 2A,B,D). Relative to this double mutant phenotype, triple 
knockout embryos, 196a1–/–;196a2–/–;196b–/–, displayed no additional patterning defects 
(Fig. 2A,D).  
 We also hypothesized that combining these mutations might reveal additional 
defects not observed in single mutants. Indeed, all double-mutant skeletons, or skeletons 
with a triple-knockout combination of 4 or more alleles removed, exhibited a partially 
penetrant increase in the number of ribs attached to the sternum (Table S1, Fig. 2C) 
indicating a transformation of the 8th thoracic element to a more anterior identity. 
Together, our analysis has shown that 1) 196a2 and 196b have single and additive effects 
in patterning the thoraco-lumbar transition and in positioning the sacrum, with a possible 
opposing role for miR-196a1 at this axial level; 2) miR-196a1, miR-196a2 and miR-196b 
act redundantly to pattern the mid-thoracic region, with phenotypic alterations observed 
only when two or more paralogs are removed. As such, our work has provided the first 
genetic proof for miR-196 as a homeotic family of genes and revealed identity changes at 
multiple axial levels.  
 
Genetic deletion of miR-196 leads to an increase in vertebral number 
Homeotic transformations do not alter the number of vertebrae, simply their identity. It 
was therefore surprising that, in zebrafish, miR-196 has been shown to constrain total 
vertebral number (He et al., 2011). We assessed whether this was an evolutionarily 
conserved function of miR-196, and found that the three murine miR-196 paralogs 
constrain total vertebral number in a redundant fashion. Wildtype C57BL6J/N mice 
exhibit small variations in the total number of vertebrae (Fig. 2E). Compared with the 
wildtype mean, we observed a statistically significant increase of approximately one 
vertebral element in various allelic combinations, including 196a1–/–;196a2–/–, 196a2–/–
;196b–/– and triple-knockout combinations with four or more alleles deleted (Fig. 2E). 
Depending on the exact allelic combination, this additional element was patterned as a 
thoracic (e.g. in 196a2–/–;196b–/– mice) or a post-sacral (e.g. in 196a1–/–;196a2–/– mice) 
element. Together, these results indicate that miR-196-mediated control of vertebral 
number and patterning of segment identity are separable processes. All three miR-196 
paralogs contribute additively to establishing vertebral number within mouse.  Layered 
on top of this, individual miR-196 paralogs have a differential impact on positional 
identity and ultimate axial formulae, likely as a result of their differential spatio-temporal 
kinetics (Fig. 1B-K) relative to target mRNAs.  
 
Transcriptome alterations are detected following allelic removal of miR-196 activity 
To elucidate the molecular mechanism and targets downstream of miR-196, we examined 
the response of mRNAs to the loss of mir-196 alleles in E9.5 embryos. To focus these 
molecular analyses on the relevant cells, i.e., those cells that normally express miR-196, 
we used only embryos with at least one eGFP knock-in allele and performed RNA-seq on 
RNA isolated from cells that were GFP positive (Fig. 3A). With mRNA profiled across 
ten genotypes (Table S2), we then compared mRNA changes as increasing numbers and 
differing combinations of alleles were deleted (Table S3). We first examined the effect of 
allelic miR-196 deletion on predicted miR-196 target genes. Utilizing the total context+ 
score from TargetScan 6.2, which considers the number and type of miRNA binding sites 
as well as additional features to predict the genes most effectively targeted by each 
miRNA (Garcia et al., 2011), we observed that the top predicted targets of miR-196 
exhibited significant de-repression upon the loss of additional miR-196 alleles (Fig. 3B, 
Figure S4). The de-repression of these predicted targets increased with the number of 
additional alleles deleted (Fig. 3B), revealing miR-196 dosage sensitivity. The direct 
interaction between miR-196 and its target transcripts could occur in any of the three 
germ layer derivatives in which miR-196 was expressed, and indeed, an unbiased analysis 
of all differentially expressed genes revealed statistically altered molecular signatures 
reflecting this (Fig. S5). Of particular interest, we observe statistical enrichment in genes 
controlling skeletal morphology (Fig. 3C,D and Fig. S5), indicating the presence of a 
molecular signature consistent with the vertebral abnormalities observed at the 
phenotypic level.  
 
Hox cluster expression dynamics are altered in miR-196 mutant embryos 
It was not known exactly how many of the ten predicted murine miR-196 Hox target 
genes are in fact bona fide targets in an in vivo developmental context, nor was the 
relative level of regulation of these predicted targets known. When specifically 
interrogating our transcriptome datasets to assess effects on Hox gene expression, a 
significant and dose-dependent upregulation of predicted miR-196 Hox targets was 
observed (Fig. 4A), which paralleled the dose-dependent patterning defects (Fig. 2A). 
Comparison of 196a2–/–;196b–/– versus 196a2+/– profiles identified 7/10 predicted miR-
196 Hox targets as significantly de-repressed in double-mutant cells at this 
developmental stage. Those predicted Hox targets exhibiting no significant de-repression 
in our analysis included Hoxb1, Hoxa4 and Hoxa5. The most highly de-repressed Hox 
targets were Hoxc8 and Hoxa7, both of which harbor multiple predicted miR-196 binding 
sites in their 3′ UTRs, and Hoxb8, which exhibits unusually extensive complementarity to 
miR-196 (Mansfield et al., 2004; Yekta et al., 2004). Further, the measurements of 
differential expression (Fig. 4A) were almost certain to be underestimates, since our 
strategy utilized eGFP-positive control samples in which at least one miR-196 allele had 
been removed. Whole-mount in situ hybridization (WISH) further revealed that the de-
repression of Hoxb8 and Hoxc8 target transcripts in 196a2–/–;196b–/– E9.5 embryos 
relative to wildtype manifested as a posterior expansion of endogenous expression 
domains in both the PSM and neural tube (Figure 4B,C; n = 3/3 per genotype, 
respectively). In light of previous reports (Pollock et al., 1992; Pollock et al., 1995), this 
failure in timely clearance of the trunk Hox program from more posterior locations is 
likely to drive supernumerary rib formation observed in miR-196 mutant embryos.  
 Importantly, we also identified a dose-dependent downregulation of posterior Hox 
genes following progressive removal of miR-196 alleles (Fig. 4A). This was particularly 
evident for Hoxd10-d13 genes, and was also significant for posterior genes of the HoxA 
and HoxC clusters. Although the absence of predicted miR-196 sites within these 
mRNAs, together with the direction of the regulation (down instead of up with 
diminished miRNA) indicated that this regulation was indirect, it was nonetheless notable 
for three reasons. First, given the potential for phenotypic dominance of posterior over 
anterior Hox gene function [e.g. rib-suppression role of Hox10 paralogs (Wellik and 
Capecchi, 2003; Carapuco et al., 2005)], a timely activation of a posterior developmental 
program in miR-196 mutants would be expected to suppress supernumerary rib 
formation. Second, these posterior Hox proteins, particularly Hoxd11 and Hoxa11, are 
known to position the lumbo-sacral junction (Davis and Capecchi, 1994; Favier et al., 
1995; Spitz et al., 2001), providing a molecular explanation for how the sacrum was re-
positioned in miR-196 mutants. Finally, in addition to understanding vertebral identity 
defects, these molecular alterations may provide important experimental support for a 
proposed model whereby maintenance of tailbud cell divisions, and therefore total 
vertebral number, is promoted by trunk Hox genes and antagonized by caudal Hox genes 
(Economides et al., 2003; Young et al., 2009; Denans et al., 2015). Our results place 
miR-196 activity at this critical junction, coordinating a reproducible trunk-to-tail Hox code 
transition. We suggest that such a delay in Hox-code transition could contribute to the 
formation of an additional vertebral element observed following genetic removal of miR-
196 activity in mouse. This is likely to be a broadly conserved role for miR-196 across 
vertebrate species, as supported by regionalized vertebral expansion observed in 
zebrafish (He et al., 2011).  
 
Identification of additional direct targets of miR-196 
The statistical enrichment of Hox genes amongst all miR-196 predicted targets (Yekta et 
al., 2008) prioritized these mRNAs for immediate analysis. However, microRNAs can 
simultaneously repress extensive suites of target genes (Bartel, 2009). To provide 
experimental support for additional direct targets of miR-196 that have the potential to 
function in this developmental context, we identified the most highly up-regulated genes 
in our RNA-seq dataset that either contained a conserved binding site or were predicted 
to respond strongly to the miRNA (i.e., context+ score ≤ –0.2) (Fig. 5A). For the top three 
evolutionarily conserved miR-196 target genes identified, we assessed whether regulation 
of their expression by miR-196 required direct binding to sites within their 3′ UTR. Using 
a luciferase-based reporter assay system in cell culture, miR-196 was shown to repress 
each of the target genes in a sequence-specific manner (Fig. 5B). Of particular interest 
within this set were a cell-adhesion molecule (Prtg) involved in the ingression of PSM 
progenitors (Ito et al., 2011), and an orphan nuclear receptor (Nr6a1) that is essential for 
somitogenesis in mouse (Chung et al., 2001) and is one of the very few genes that have been 
associated with variation of vertebral number (Mikawa et al., 2007). These 
experimentally supported miR-196 targets highlight important avenues for future 
investigation, not only with respect to axial patterning and elongation but also the many 
other developmental processes (Hornstein et al., 2005; Asli and Kessel, 2010; He et al., 
2011) and pathological conditions (Li et al., 2012; Velu et al., 2014) involving miR-196. 
 
miR-196 activity is required for signaling pathways associated with axis elongation, 
segmentation and the trunk-to-tail transition. 
miR-196 activity has been shown to negatively regulate retinoic acid pathway activity in 
the context of pectoral fin formation (He et al., 2011), but regulation of additional 
developmental signaling pathways in the early embryo, either directly or indirectly, has 
not been systematically assessed. Upon further interrogation of our RNA-seq data, we 
found altered molecular signatures of both axis elongation and somite segmentation 
across many allelic comparisons (Fig. 6). We observed a clear upregulation of the Wnt 
negative feed-back inhibitor Dkk1 (Chamorro et al., 2005). In addition, the collective 
down-regulation of numerous direct and indirect downstream targets of Wnt signaling 
(Takahashi et al., 2002; Buttitta et al., 2003; Lickert et al., 2005; Weidinger et al., 2005; 
Dequeant et al., 2006) (Fig. 6), and the prediction of diminished β-catenin/CTNNB1 
activity following global pathway analysis (Fig. S6), indicated an overall reduction in 
Wnt activity in mutant embryos. Wnt and Fgf signaling positively reinforce one another 
in the mouse tailbud (Aulehla et al., 2008; Dunty et al., 2008; Naiche et al., 2011), and 
consistent with diminished Wnt activity in miR-196 mutants, we also observed a 
downregulation of the Fgf8 ligand and numerous Fgf downstream effectors (Fig. 6). We 
observed a robust down-regulation of Notch signaling components and anterior PSM 
genes Mesp2, Epha4 and Ripply2, likely as a consequence of diminished Wnt activity 
acting via the Notch ligand Dll1 (Galceran et al., 2004; Hofmann et al., 2004; Dunty et 
al., 2008). Interestingly, these molecular alterations described for miR-196 mutant 
embryos resembled alterations observed following removal of all mature miRNAs in the 
mesoderm lineage (Zhang et al., 2011), which in the latter case resulted in a caudal 
displacement of the hindlimb by 3 somites.  
 Finally, a coordinated temporal delay in the trunk-to-tail Hox code transition has 
been observed in mice null for Gdf11 (McPherron et al., 1999), which as heterozygotes, 
bear striking phenotypic resemblance to 196a2–/–;196b–/– or miR-196 triple knockout 
mouse embryos. We therefore specifically interrogated our RNA-seq data to assess the 
levels of Gdf11 and its direct downstream effector Isl1 (Jurberg et al., 2013). In 196a2–/–
;196b–/– embryos, which exhibit 100% penetrant L-to-T transformation and sacral 
displacement, we observed a statistically significant reduction in Gdf11 and Isl1 levels by 
15% (Table S3). As mentioned, this is likely to be an underestimate of the level of 
regulation, given the experimental strategy employed. The requirement for Gdf11 in 
defining presacral vertebral number is dose-dependent (McPherron et al., 1999). The 
exact threshold requirement for Gdf11 signaling is not known, and it remains to be 
determined whether subtle down-regulation of Gdf11 contributes to phenotypic 
alterations observed in miR-196 mutant mice. Together, our transcriptome analysis 
revealed multiple developmental networks that require miR-196 activity for appropriate 
control of gene expression and suggest intriguing avenues for future experimental 
exploration.  
 
miR-196 has the potential to modulate Wnt signaling by multiple mechanisms 
Vertebral progenitors in the epiblast and tailbud are sensitive to the levels of Wnt 
signaling. Genetic removal of the Wnt3a ligand (Takada et al., 1994), or conversely, 
ectopic activation of Wnt3a in the epiblast (Jurberg et al., 2014), result in severe axis 
truncation posterior to the forelimb. Wnt3a expression has been shown to decrease as 
progenitor cells commit to a paraxial mesoderm fate (Takemoto et al., 2011; Nowotschin 
et al., 2012), and sustained Wnt activity disrupts somite formation (Aulehla et al., 2008) 
and somite polarity (Jurberg et al., 2014), dependent on timing and method of activation. 
These observations indicate that careful titration of Wnt levels is essential throughout the 
process of somite formation. Our data suggest that miR-196 activity is required for 
maintaining precise levels of Wnt activity (Fig. 6). Mechanistically, this could be 
achieved in at least two ways. First, miR-196 could directly target genes in the Wnt 
pathway. Specifically, the potent Wnt antagonist Dkk1 harbors a single predicted miR-
196 site within its 3′ UTR, and Dkk1 expression was upregulated following removal of 
miR-196 activity (Fig. 6). Using WISH, we confirmed increased expression of Dkk1 in 
196a1–/–;196a2–/– embryos relative to 196a1+/–;196a2+/– (Fig 7A; n = 2/2 per genotype). 
To test whether miR-196 can act directly to repress Dkk1, we utilized a luciferase-based 
reporter assay system in cell culture to show that, indeed, miR-196 negatively regulates 
the Dkk1 3′ UTR in a sequence-specific manner (Fig. 7B). However, the repression in the 
reporter assay was more modest than that observed in vivo using RNA-seq (Fig. 6), and 
Dkk1 is not a conserved target of miR-196, suggesting that indirect regulation by 
miR-196 also plays a role. Second, miR-196 control over Wnt activity might work in part via 
Hox intermediates, which have the potential to either activate or repress Wnt signaling 
(Young et al., 2009; Denans et al., 2015). We have recently shown using chick in vivo 
electroporation and imaging that the collinear activation of a subset of Hox9-13 posterior 
Hox genes within paraxial mesoderm progenitors translates into a graded increase in Wnt 
repression and a slowing down of axis elongation (Denans et al., 2015). One Hox gene 
that was found to significantly repress Wnt activity using this in vivo luciferase-based 
Wnt reporter assay was the miR-196 target Hoxa9.  We therefore went on to test whether 
additional miR-196 Hox targets have the ability to repress Wnt activity in this context. 
We co-electroporated a Wnt/β-catenin reporter (BATLuc) and a CMV-Renilla construct 
in paraxial mesoderm progenitors together with an expression vector containing either 
Venus or Hoxb1, Hoxa5, Hoxa7, Hoxb7, Hoxb8 and Hoxc8.  Of these six Hox genes 
tested, four (Hoxa7, Hoxb7, Hoxb8 and Hoxc8) showed strong repression of luciferase 
activity, while two (Hoxb1 and Hoxa5) did not (Fig. 7C).  Interestingly, the two Hox 
genes that do not influence Wnt/β-catenin reporter activity in early chick paraxial mesoderm 
progenitors are the same Hox genes that show no indication of direct regulation by 
miR-196 in E9.5 mouse tissue (Fig. 4A). Together, these data demonstrate that miR-196 
has the potential to directly and indirectly regulate the precise levels of Wnt activity in 
the developing embryo.  
Discussion 
Our work demonstrates the essential role for murine miR-196 in regulating vertebral 
identity across different levels of the body axis, and reveals evolutionary conservation in 
the role of miR-196 in constraining total vertebral number. Importantly, our strategy has 
allowed us to comprehensively dissect paralog contribution to resultant phenotypes, 
allowing us to distinguish a patterning role for miR-196 from its role in modulating 
vertebral number. Moreover, we have characterized the detailed molecular landscape 
controlled by miR-196 activity in the early embryo to show that miR-196 regulates, and 
therefore has the ability to integrate, multiple key signaling pathways to drive 
developmental processes.  
 
miR-196 activity is essential for vertebral identity 
Despite the clear potential for functional redundancy between miR-196 paralogs (Yekta 
et al., 2004), homeotic transformation of vertebral elements could be observed at low 
penetrance following removal of an individual miR-196 paralog (e.g. 196a2–/– or 196b–/– 
single mutants). With increasing loss of miR-196 family members (e.g. 196a2–/–;196b–/– 
double mutants), fully penetrant vertebral phenotypes were observed that were equivalent 
in severity to many single and compound Hox mutants (Favier et al., 1996; van den 
Akker et al., 2001). Vertebral identity changes were observed at sites where loss-of-
function phenotypes have previously been described for numerous direct targets of Hox 
genes (van den Akker et al., 2001), reinforcing the view that miR-196 acts within 
endogenous Hox domains rather than simply as a fail-safe mechanism to clear an anterior 
developmental program at more posterior locations (McGlinn et al., 2009). Paradoxically, 
the 196a2–/–;196b–/– or triple knockout phenotypes are remarkably similar to either 
Hoxc8–/– or Hoxc8–/–;Hoxd8–/– skeletons, with 8 ribs attached to the sternum, L1-to-T 
transformation and a posterior displacement of the sacrum (van den Akker et al., 2001). 
However, with respect to number of sternal rib attachments and L1-to-T transformation, 
Hoxc8 loss-of-function and gain-of-function mutant mice exhibit identical phenotypes 
(Pollock et al., 1992; van den Akker et al., 2001). These data indicate that exquisite 
regulation of a quantitative Hox code is essential in defining vertebral identity at this 
axial location. Interestingly, deletion of Hoxb8 rescues many defects observed in Hoxc8 
null mice, highlighting that there are aspects of a qualitative Hox code that we are yet to 
understand. Nonetheless, similar to Hoxc8, ectopic Hoxb8 expression results in 
supernumerary rib formation throughout the lumbar region (Pollock et al., 1995), 
supporting the view that a collective up-regulation of direct targets of Hox genes drives 
homeotic alterations of the mid-thoracic to upper lumbar region in miR-196 mutant mice.  
 A shift in the position of the sacrum observed in miR-196 mutant embryos was 
not easily reconcilable with the function of miR-196 in directly repressing trunk Hox 
target genes (Pollock et al., 1992; Pollock et al., 1995). However, we show that in 
addition to direct Hox gene regulation, miR-196 indirectly regulates the expression levels 
or temporal activation of many caudal Hox genes, including those that are known to 
control positioning of the sacrum, such as Hoxa10, Hoxd10 and Hoxd11 (Davis et al., 
1995; Favier et al., 1995; Favier et al., 1996; Zakany et al., 1997). The mechanisms 
leading to a delay in posterior Hox gene activation in miR-196 mutant mice are currently 
unknown.  A similar coordinated temporal shift in the trunk-to-tail Hox code has been 
demonstrated in Gdf11–/– mice (McPherron et al., 1999), which show conservation in the 
types of vertebral transformations we observe here in miR-196 mutant embryos.  In this 
context, Gdf11 appears to work via retinoic acid signaling (Lee et al., 2010; Jurberg et al., 
2013), and whether altered Gdf11 and retinoic acid signaling contribute to miR-196 
phenotypic alterations remains to be tested.  
 
miR-196 activity constrains total vertebral number  
Total vertebral number of a given species is highly reproducible, and mutations that 
extend the vertebral column of model organisms are very rare. Amongst vertebrate 
species, however, great diversity in vertebral number has arisen. Cross-species 
comparison (Gomez et al., 2008) or direct genetic perturbation (Dubrulle et al., 2001; 
Schroter and Oates, 2010) demonstrate that the periodicity of segmentation clock 
oscillation relative to the rate of PSM growth is the central parameter in defining 
vertebral number. It remains to be determined how an additional vertebral element seen 
here in miR-196 mutant mice, or in miR-196 morphant zebrafish (He et al., 2011), is 
generated at a cellular level (i.e., does the clock tick faster, or does it tick at the same rate 
for longer). Our analysis does however reveal molecular alterations in miR-196 mutant 
embryos which have the potential to affect vertebral number. 
 First, altered expression of Notch, Wnt and Fgf pathways could alter the 
periodicity of segment formation (Benazeraf and Pourquie, 2013). However, diminished 
Wnt and Fgf would be predicted to increase somite size (Dubrulle et al., 2001; Sawada et 
al., 2001; Aulehla et al., 2003; Bajard et al., 2014), which if axis elongation was 
unaltered, would lead to a reduction in vertebral number. Further work is required to 
clarify any functional role for miR-196 in the molecular networks coordinating 
segmentation.  
 Second, we have shown that miR-196 activity can modulate the expression levels 
of many Hox genes, either directly or indirectly. It is well documented that Hox genes 
control mesodermal ingression, thus regulating cell injection into the PSM (Iimura and 
Pourquie, 2006; Denans et al., 2015). The rate of PSM growth is not uniform along the 
A-P axis (Gomez et al., 2008), with a switch to PSM shortening occurring at about the 
trunk-to-tail transition in most amniotes. This switch correlates with activation of a 
posterior Hox code (Hox9 onwards), and a subset of posterior Hox genes slow axis 
elongation by controlling the ingression of PM progenitors via Wnt repression (Denans et 
al., 2015). We show here that the ability to repress Wnt signaling is not exclusive to 
posterior Hox genes, but that Hox7/8 paralogs also downregulate Wnt signaling in a 
collinear manner in the chick epiblast. This fits well with previous observations that 
Hoxb7 and Hoxb9 have a collinear effect on ingression (Iimura et al., 2007). The role of 
this repression might be to help maintain cells with progressively more posterior identity 
in the epiblast, in order to get a progressive deposition of collinear Hox domains. A delay 
in posterior Hox gene activation would result in delayed commencement of axis 
elongation slow-down, potentially allowing the formation of additional vertebral 
elements. The repression of Wnt by posterior Hox genes as a means to slow down and 
terminate axis elongation (Young et al., 2009; Denans et al., 2015) is consistent with the 
known function of Wnt3a in driving axis elongation (Takada et al., 1994). The repression 
of Wnt by trunk Hox genes is less intuitive, and not consistent with a study in mouse 
(Young et al., 2009). However, the importance of precise Wnt levels in the early steps of 
axis formation, and of cellular context, are beginning to be appreciated. In vitro analysis 
of epiblast stem cells demonstrates that low levels of Wnt induce a primitive streak-like 
pluripotent state, whereas higher levels of Wnt activity promote lineage commitment 
(Tsakiridis et al., 2014). Additionally, high levels of Wnt3a in the mouse epiblast appear 
to exhaust the progenitor pool (Jurberg et al., 2014). It is therefore possible that, in vivo, 
as the axis is rapidly elongating and Wnt activity is already high, the trunk Hox7/8 genes 
feed back negatively on Wnt activity to avoid the immediate depletion of the pool of 
progenitors by ingression and hence to regulate the progressive formation of the axis. 
 Although a heterochronic shift in the trunk-to-tail Hox code transition could be 
predicted to vary vertebral number, morphological evidence for this has been scarce. 
Analysis of total vertebral number in Gdf11–/– mice, which exhibit a dramatic 
heterochronic shift in Hox code, is hampered by caudal truncation (McPherron et al., 
1999). Although ectopic trunk Hox gene expression (Hoxa5 and Hoxb8) has the ability to 
rescue axis truncation defects of a genetically engineered mutant (Young et al., 2009), 
it does not appear to increase vertebral number on a wildtype background (Pollock et al., 
1992; Young et al., 2009). This is possibly due to the fact that posterior prevalence still 
holds; caudal Hox genes and miR-196 would be expressed at the usual time and place to 
regulate and terminate axis elongation. In the case of miR-196 knockouts, the cumulative 
effect on both trunk and caudal Hox gene expression could permit continued maintenance 
of progenitor divisions whilst delaying commencement of axis elongation slow-down, 
resulting in increased vertebral number.  
 Together, our results highlight an essential requirement for miR-196 activity in 
reinforcing a timely trunk-to-tail transition and reproducibility of axial formulae. Given 
the ancestral role of Hox activity in species that utilize a posterior growth zone (Ryan 
and Baxevanis, 2007), and the recurrent acquisition of miRNAs within the Hox clusters 
across metazoan taxa (Lagos-Quintana et al., 2003; Yekta et al., 2004; Heimberg and 
McGlinn, 2012; Moran et al., 2014), variation in Hox-miRNA interactions may represent 
an important mechanism for the evolution of animal body plans.  
 
Materials and Methods 
miR-196a1GFP and miR-196a2GFP knock-in construction 
A 72bp (miR-196a1) or 52bp (miR-196a2) genomic fragment encompassing each mature 
miRNA sequence was replaced with a cassette containing eGFP fused to the rabbit β-
Globin 3′ UTR followed by FRT-flanked PGKem7-Neomycin. A Kozak sequence was 
inserted upstream of the eGFP start codon.  Targeting constructs were generated using 
129/Sv sequence and electroporated into J1 embryonic stem cells. Correctly targeted ES 
cells were identified and used to generate germline transmitting knock-in lines. Prior to 
analysis, the Neomycin selection cassette was removed by crossing to a ubiquitous FLPe-
deleter mouse line.  Resulting lines were bred onto a C57Bl/6J background and 
confirmed as isogenic by SNP genotyping. 
 
miR-196a1–/–, miR-196a2–/– and miR-196b–/– generation 
ES cells targeted at each of the three miR-196 loci have been generated previously 
(Prosser et al., 2011). Correctly targeted JM8A3 ES cells were reconfirmed by Southern 
blot and used to generate germline transmitting knockout lines. Prior to analysis, the 
puDeltaTK selection cassette was removed by crossing to a ubiquitous Cre-deleter 
mouse.  Resulting lines are on a mixed C57Bl/6J and C57Bl/6N background.  
Mouse skeletal preparation and analysis 
Skeletal preparation was performed on E18.5 embryos or P0 postnatal pups as previously 
described (McLeod, 1980). 
 
In situ hybridization 
Whole mount in situ hybridization was performed as previously described (McGlinn and 
Mansfield, 2011).  
 
FACS sorting and RNA-seq sample preparation 
Freshly dissected E9.5 embryos were dissociated in 0.25% trypsin/2% chick serum, 
neutralized in DMEM+10% FBS and washed into PBS+2% FBS for FACS sorting. GFP-
positive cells were FACS sorted directly into RLT buffer (Qiagen) and RNA was isolated 
using RNeasy with added on-column DNase treatment (Qiagen). RNA quality was 
assessed using a Bioanalyzer and 200 ng per individual embryo was used as input for 
RNA-seq library generation (unstranded Illumina TruSeq Kit). Libraries were 
multiplexed and sequenced using an Illumina HiSeq 2000 instrument, generating 50bp 
single end reads. 
 
RNA-seq and category enrichment analysis 
Quantification of the transcriptome using RNA-seq data was performed as previously 
described (Denzler et al., 2014). Raw reads were aligned to the latest build of the mouse 
genome (mm10) using STAR v. 2.3.1n (options --outFilterType BySJout --
outFilterMultimapScoreRange 0 --readMatesLengthsIn Equal --outFilterIntronMotifs 
RemoveNoncanonicalUnannotated --clip3pAdapterSeq 
TCGTATGCCGTCTTCTGCTTG --outSAMstrandField intronMotif --outStd SAM) 
(Dobin et al., 2013). Considering all replicates of a particular genotype, differential 
expression statistics were computed between genotypes of interest using cuffdiff v. 2.1.1 
(options --library-type fr-unstranded -c 100 -b mm10.fa -u --max-bundle-frags 
100000000) (Trapnell et al., 2013), using mouse transcript models of protein-coding 
genes annotated in Ensembl release 72. Before all subsequent analysis, we filtered out 
genes annotated by cuffdiff as “NOTEST” in all genotypes, indicating that the genes were 
expressed at levels too low to accurately quantify their abundances. To evaluate functional gene 
categories that were statistically enriched, we loaded differentially expressed genes (i.e. 
genes with a Q value < 0.05) into the Core Analysis function of IPA software (Ingenuity 
Systems), testing gene categories related to development and function. All P values 
reported from this analysis were adjusted using the Benjamini-Hochberg method to 
control the false discovery rate. 
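
The Benjamini-Hochberg adjustment mentioned above can be illustrated with a minimal Python sketch. This is the standard step-up procedure, not the IPA implementation, which is not described here; the function name `benjamini_hochberg` and the example P values are assumptions introduced purely for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values, in the input order.

    Hypothetical helper for illustration; not the code used in the study.
    """
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up P values, for demonstration only.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each adjusted value is the smallest false discovery rate at which the corresponding category would be called significant, so the adjusted values can be compared directly against a fixed threshold.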
 
miRNA target analysis 
To identify predicted miRNA targets, the 3′ UTR sequences of protein-coding genes were 
searched to identify 6mer, 7mer-A1, 7mer-m8 and 8mer miRNA binding sites cognate to 
the miR-196 seed (Grimson et al., 2007; Garcia et al., 2011). A context+ score was 
computed for each target site within a given 3′ UTR, and scores were summed to produce 
a total context+ score for each gene, which was used for all miRNA-related analyses 
(Garcia et al., 2011). TargetScanMouse 6.2 was further utilized to assess target site 
conservation, or to include predicted miR-196 targets containing non-canonical 3′ 
compensatory sites, such as in the case of Hoxb8 (Friedman et al., 2009). 
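
The search for canonical site types described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the TargetScan code: the function names are hypothetical, the miR-196a sequence in `MIR_196A` is quoted from memory and should be verified against miRBase, and the context+ scoring and non-canonical 3′ compensatory sites are not reproduced.

```python
# Assumed mature miR-196a sequence (5'->3'); verify against miRBase before use.
MIR_196A = "UAGGUAGUUUCAUGUUGUUGGG"

COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}


def reverse_complement(rna):
    """Reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_seed_sites(utr, mirna):
    """Return (position, site_type) for each canonical site in a 3' UTR.

    position is the 0-based index of the 6-nt match to miRNA nucleotides 2-7.
    """
    utr = utr.upper().replace("T", "U")
    mirna = mirna.upper().replace("T", "U")
    core = reverse_complement(mirna[1:7])  # pairs with miRNA nt 2-7
    m8_base = COMPLEMENT[mirna[7]]         # UTR base pairing with miRNA nt 8
    sites = []
    start = utr.find(core)
    while start != -1:
        end = start + len(core)
        has_m8 = start > 0 and utr[start - 1] == m8_base
        has_a1 = end < len(utr) and utr[end] == "A"  # A opposite miRNA nt 1
        if has_m8 and has_a1:
            kind = "8mer"
        elif has_m8:
            kind = "7mer-m8"
        elif has_a1:
            kind = "7mer-A1"
        else:
            kind = "6mer"
        sites.append((start, kind))
        start = utr.find(core, start + 1)
    return sites
```

Each 6-nt core match is reported once and labeled with the most extended site type it supports, so an 8mer is not additionally counted as a 7mer or a 6mer.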
 
Permutation test for significance testing 
A permutation test was devised to evaluate the significance of differences in vertebral 
number.  Briefly, given two groups of count-based data of size n and m, we randomly 
partitioned the counts (without replacement) from the union of the two groups to generate 
100,000 pairs of data, again of size n and m. To compute an empirical one-sided P value, 
we then computed the proportion of pairs that satisfied the condition that the difference in 
the means of each pair exceeded the difference in means of the original two groups. 
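
The procedure described above can be sketched in Python as follows. This is an illustrative sketch, not the code used in the study; the function name `permutation_test`, its parameters and the fixed random seed are assumptions introduced here.

```python
import random


def permutation_test(group_a, group_b, n_perm=100_000, seed=0):
    """One-sided empirical P value for mean(group_a) - mean(group_b).

    Returns the proportion of random partitions whose difference in means
    strictly exceeds the observed difference, following the text above.
    """
    rng = random.Random(seed)
    n = len(group_a)
    m = len(group_b)
    observed = sum(group_a) / n - sum(group_b) / m
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        # Shuffling and splitting partitions the pooled counts without
        # replacement into groups of the original sizes n and m.
        rng.shuffle(pooled)
        diff = sum(pooled[:n]) / n - sum(pooled[n:]) / m
        if diff > observed:
            hits += 1
    return hits / n_perm
```

Because count data produce many tied partitions, the strict inequality used here can return a P value of exactly 0; a common, more conservative variant counts ties and adds 1 to both the numerator and the denominator.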
 
In vitro luciferase assay  
3′ UTR sequences (300-700 nucleotides) of protein-coding genes of interest were 
commercially synthesized and cloned into the psiCheck2 vector. For each, a mutant version 
containing 4 nucleotide substitutions within the miR-196 seed sequence was generated. 
Constructs were transfected into NIH3T3 cells with or without 25 pmol mmu-miR-196b 
duplex. Transfection (Lipofectamine2000, Life Technologies) and luciferase analysis 
(Dual Luciferase Reporter Assay System, Promega) were performed as per the 
manufacturer’s instructions.  
 
Chick electroporation and in vivo BatLuc reporter analysis 
Chicken embryos were harvested at stage 5 HH (Hamburger and Hamilton, 1951) and 
electroporated ex ovo as described (Denans et al., 2015) with a DNA mix containing 
BATLuc (1 µg/µL final), CMV-Renilla (Promega; 0.2 µg/µL final; used as a control to 
normalize differences in electroporation intensity between embryos), and either a control 
pCAGGS-Venus vector (gift from K. Hadjantonakis) or a Hox gene of interest (Hoxb1, 
a5, a7, b7, b8 or c8) cloned in pCAGGS-IRES2-Venus (5 µg/µL final). Electroporated 
embryos were cultured in a humidified incubator at 38°C for 20 h. Embryos were 
analyzed using a fluorescence microscope and only embryos showing restricted expression 
of Venus in the paraxial mesoderm were selected (90 to 100% of the electroporated 
embryos) for luciferase assay (3 to 5 embryos for each condition). The posterior 
region (from somite 1 to tail-bud) of the selected embryos was dissected and lysed in 
passive lysis buffer (Promega) for 15 minutes at room temperature. Lysates were then 
distributed in a 96 well plate and luciferase assays were performed using a Centro LB 
960 luminometer (Berthold Technology) and the dual luciferase kit (Promega) following 
manufacturer’s instructions. Raw intensity values for Firefly luciferase signal were 
normalized with corresponding Renilla luciferase values (RLU) and the control 
experiment was set to 1. 
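
The normalization described above can be illustrated with a minimal Python sketch. It assumes that "set to 1" means dividing every ratio by the mean control ratio; the function name `relative_luciferase` and its arguments are hypothetical and are not taken from the study.

```python
def relative_luciferase(reporter, normalizer, control_reporter, control_normalizer):
    """Normalize reporter signal to a co-transfected normalizer, control = 1.

    reporter / normalizer: paired raw intensities for the test condition
    (e.g. Firefly and Renilla values from the same lysate).
    control_reporter / control_normalizer: the same for the control condition.
    """
    ratios = [r / n for r, n in zip(reporter, normalizer)]
    control_ratios = [r / n for r, n in zip(control_reporter, control_normalizer)]
    control_mean = sum(control_ratios) / len(control_ratios)
    # Express each sample relative to the mean control ratio.
    return [ratio / control_mean for ratio in ratios]
```

The same function covers both reporter designs in this chapter by swapping which luciferase is passed as the reporter and which as the normalizer.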
 
Acknowledgements 
We thank Xin Sun and Denis Duboule for supplying in situ probes and A. Dobin for help 
in understanding RNA-seq mapping parameters. We thank Allan Bradley for providing 
three knockout mouse ES cell lines used in these studies. We thank Christophe Marcelle, 
Eran Hornstein and Jan Manent for critical reading of the manuscript. This work was 
supported by an NSF Graduate Research Fellowship (V.A.), NIH grant GM067031 
(D.B.), NIH grant R37HD032443-19 (C.J.T.) and NH&MRC Project Grant APP1051792 
(E.M.). E.M. thanks Bioplatforms Australia for support. D.B. is a Howard Hughes 
Medical Institute Investigator. The Australian Regenerative Medicine Institute is 
supported by grants from the State Government of Victoria and the Australian 
Government. The data reported in this paper are compiled in Supplementary Information; 
all raw and processed RNA-seq data are deposited in the NCBI Gene Expression 
Omnibus (GEO) under accession number GSE53018. The authors declare no competing 
financial interests. Correspondence and requests for materials should be addressed to 
edwina.mcglinn@emblaustralia.org.  
References 
Asli, N.S., and Kessel, M. (2010). Spatiotemporally restricted regulation of generic motor 
neuron programs by miR-196-mediated repression of Hoxb8. Developmental 
biology 344, 857-868. 
Aulehla, A., Wehrle, C., Brand-Saberi, B., Kemler, R., Gossler, A., Kanzler, B., and 
Herrmann, B.G. (2003). Wnt3a plays a major role in the segmentation clock 
controlling somitogenesis. Developmental cell 4, 395-406. 
Aulehla, A., Wiegraebe, W., Baubet, V., Wahl, M.B., Deng, C., Taketo, M., Lewandoski, 
M., and Pourquie, O. (2008). A beta-catenin gradient links the clock and 
wavefront systems in mouse embryo segmentation. Nature cell biology 10, 186-
193. 
Bajard, L., Morelli, L.G., Ares, S., Pecreaux, J., Julicher, F., and Oates, A.C. (2014). 
Wnt-regulated dynamics of positional information in zebrafish somitogenesis. 
Development 141, 1381-1391. 
Bartel, D.P. (2009). MicroRNAs: target recognition and regulatory functions. Cell 136, 
215-233. 
Benazeraf, B., and Pourquie, O. (2013). Formation and segmentation of the vertebrate 
body axis. Annual review of cell and developmental biology 29, 1-26. 
Burke, A.C., Nelson, C.E., Morgan, B.A., and Tabin, C. (1995). Hox genes and the 
evolution of vertebrate axial morphology. Development 121, 333-346. 
Buttitta, L., Tanaka, T.S., Chen, A.E., Ko, M.S., and Fan, C.M. (2003). Microarray 
analysis of somitogenesis reveals novel targets of different WNT signaling 
pathways in the somitic mesoderm. Developmental biology 258, 91-104. 
Cambray, N., and Wilson, V. (2002). Axial progenitors with extensive potency are 
localised to the mouse chordoneural hinge. Development 129, 4855-4866. 
Cambray, N., and Wilson, V. (2007). Two distinct sources for a population of maturing 
axial progenitors. Development 134, 2829-2840. 
Carapuco, M., Novoa, A., Bobola, N., and Mallo, M. (2005). Hox genes specify vertebral 
types in the presomitic mesoderm. Genes & development 19, 2116-2121. 
Chamorro, M.N., Schwartz, D.R., Vonica, A., Brivanlou, A.H., Cho, K.R., and Varmus, 
H.E. (2005). FGF-20 and DKK1 are transcriptional targets of beta-catenin and 
FGF-20 is implicated in cancer and development. EMBO J 24, 73-84. 
Chen, F., and Capecchi, M.R. (1997). Targeted mutations in hoxa-9 and hoxb-9 reveal 
synergistic interactions. Developmental biology 181, 186-196. 
Chung, A.C., Katz, D., Pereira, F.A., Jackson, K.J., DeMayo, F.J., Cooney, A.J., and 
O'Malley, B.W. (2001). Loss of orphan receptor germ cell nuclear factor function 
results in ectopic development of the tail bud and a novel posterior truncation. 
Molecular and cellular biology 21, 663-677. 
Davis, A.P., and Capecchi, M.R. (1994). Axial homeosis and appendicular skeleton 
defects in mice with a targeted disruption of hoxd-11. Development 120, 2187-
2198. 
Davis, A.P., Witte, D.P., Hsieh-Li, H.M., Potter, S.S., and Capecchi, M.R. (1995). 
Absence of radius and ulna in mice lacking hoxa-11 and hoxd-11. Nature 375, 
791-795. 
Denans, N., Iimura, T., and Pourquie, O. (2015). Hox genes control vertebrate body 
elongation by collinear Wnt repression. Elife 4. 
Denzler, R., Agarwal, V., Stefano, J., Bartel, D.P., and Stoffel, M. (2014). Assessing the 
ceRNA hypothesis with quantitative measurements of miRNA and target 
abundance. Mol Cell 54, 766-776. 
Dequeant, M.L., Glynn, E., Gaudenz, K., Wahl, M., Chen, J., Mushegian, A., and 
Pourquie, O. (2006). A complex oscillating network of signaling genes underlies 
the mouse segmentation clock. Science 314, 1595-1598. 
Dobin, A., Davis, C.A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., 
Chaisson, M., and Gingeras, T.R. (2013). STAR: ultrafast universal RNA-seq 
aligner. Bioinformatics 29, 15-21. 
Duboule, D., and Dolle, P. (1989). The structural and functional organization of the 
murine HOX gene family resembles that of Drosophila homeotic genes. EMBO J 
8, 1497-1505. 
Dubrulle, J., McGrew, M.J., and Pourquie, O. (2001). FGF signaling controls somite 
boundary position and regulates segmentation clock control of spatiotemporal 
Hox gene activation. Cell 106, 219-232. 
Dunty, W.C., Jr., Biris, K.K., Chalamalasetty, R.B., Taketo, M.M., Lewandoski, M., and 
Yamaguchi, T.P. (2008). Wnt3a/beta-catenin signaling controls posterior body 
development by coordinating mesoderm formation and segmentation. 
Development 135, 85-94. 
Economides, K.D., Zeltser, L., and Capecchi, M.R. (2003). Hoxb13 mutations cause 
overgrowth of caudal spinal cord and tail vertebrae. Developmental biology 256, 
317-330. 
Favier, B., Le Meur, M., Chambon, P., and Dolle, P. (1995). Axial skeleton homeosis and 
forelimb malformations in Hoxd-11 mutant mice. Proceedings of the National 
Academy of Sciences of the United States of America 92, 310-314. 
Favier, B., Rijli, F.M., Fromental-Ramain, C., Fraulob, V., Chambon, P., and Dolle, P. 
(1996). Functional cooperation between the non-paralogous genes Hoxa-10 and 
Hoxd-11 in the developing forelimb and axial skeleton. Development 122, 449-
460. 
Friedman, R.C., Farh, K.K., Burge, C.B., and Bartel, D.P. (2009). Most mammalian 
mRNAs are conserved targets of microRNAs. Genome Research 19, 92-105. 
Galceran, J., Sustmann, C., Hsu, S.C., Folberth, S., and Grosschedl, R. (2004). LEF1-
mediated regulation of Delta-like1 links Wnt and Notch signaling in 
somitogenesis. Genes & development 18, 2718-2723. 
Garcia, D.M., Baek, D., Shin, C., Bell, G.W., Grimson, A., and Bartel, D.P. (2011). 
Weak seed-pairing stability and high target-site abundance decrease the 
proficiency of lsy-6 and other microRNAs. Nat Struct Mol Biol 18, 1139-1146. 
Gomez, C., Ozbudak, E.M., Wunderlich, J., Baumann, D., Lewis, J., and Pourquie, O. 
(2008). Control of segment number in vertebrate embryos. Nature 454, 335-339. 
Graham, A., Papalopulu, N., and Krumlauf, R. (1989). The murine and Drosophila 
homeobox gene complexes have common features of organization and expression. 
Cell 57, 367-378. 
Grimson, A., Farh, K.K., Johnston, W.K., Garrett-Engele, P., Lim, L.P., and Bartel, D.P. 
(2007). MicroRNA targeting specificity in mammals: determinants beyond seed 
pairing. Molecular Cell 27, 91-105. 
Hamburger, V., and Hamilton, H.L. (1951). A series of normal stages in the development 
of the chick embryo. J Morphol 88, 49-92. 
Harima, Y., Takashima, Y., Ueda, Y., Ohtsuka, T., and Kageyama, R. (2013). 
Accelerating the tempo of the segmentation clock by reducing the number of 
introns in the Hes7 gene. Cell Rep 3, 1-7. 
He, X., Yan, Y.L., Eberhart, J.K., Herpin, A., Wagner, T.U., Schartl, M., and 
Postlethwait, J.H. (2011). miR-196 regulates axial patterning and pectoral 
appendage initiation. Developmental biology 357, 463-477. 
Heimberg, A., and McGlinn, E. (2012). Building a robust a-p axis. Current genomics 13, 
278-288. 
Hofmann, M., Schuster-Gossler, K., Watabe-Rudolph, M., Aulehla, A., Herrmann, B.G., 
and Gossler, A. (2004). WNT signaling, in synergy with T/TBX6, controls Notch 
signaling by regulating Dll1 expression in the presomitic mesoderm of mouse 
embryos. Genes & development 18, 2712-2717. 
Hornstein, E., Mansfield, J.H., Yekta, S., Hu, J.K., Harfe, B.D., McManus, M.T., 
Baskerville, S., Bartel, D.P., and Tabin, C.J. (2005). The microRNA miR-196 acts 
upstream of Hoxb8 and Shh in limb development. Nature 438, 671-674. 
Iimura, T., and Pourquie, O. (2006). Collinear activation of Hoxb genes during 
gastrulation is linked to mesoderm cell ingression. Nature 442, 568-571. 
Iimura, T., Yang, X., Weijer, C.J., and Pourquie, O. (2007). Dual mode of paraxial 
mesoderm formation during chick gastrulation. Proc Natl Acad Sci U S A 104, 
2744-2749. 
Ito, K., Nakamura, H., and Watanabe, Y. (2011). Protogenin mediates cell adhesion for 
ingression and re-epithelialization of paraxial mesodermal cells. Developmental 
biology 351, 13-24. 
Jurberg, A.D., Aires, R., Novoa, A., Rowland, J.E., and Mallo, M. (2014). Compartment-
dependent activities of Wnt3a/beta-catenin signaling during vertebrate axial 
extension. Dev Biol 394, 253-263. 
Jurberg, A.D., Aires, R., Varela-Lasheras, I., Novoa, A., and Mallo, M. (2013). Switching 
axial progenitors from producing trunk to tail tissues in vertebrate embryos. Dev 
Cell 25, 451-462. 
Kramer, A., Green, J., Pollard, J., Jr., and Tugendreich, S. (2014). Causal analysis 
approaches in Ingenuity Pathway Analysis. Bioinformatics 30, 523-530. 
Lagos-Quintana, M., Rauhut, R., Meyer, J., Borkhardt, A., and Tuschl, T. (2003). New 
microRNAs from mouse and human. RNA 9, 175-179. 
Lee, Y.J., McPherron, A., Choe, S., Sakai, Y., Chandraratna, R.A., Lee, S.J., and Oh, S.P. 
(2010). Growth differentiation factor 11 signaling controls retinoic acid activity 
for axial vertebral development. Dev Biol 347, 195-203. 
Li, Z., Huang, H., Chen, P., He, M., Li, Y., Arnovitz, S., Jiang, X., He, C., Hyjek, E., 
Zhang, J., et al. (2012). miR-196b directly targets both HOXA9/MEIS1 
oncogenes and FAS tumour suppressor in MLL-rearranged leukaemia. Nature 
communications 3, 688. 
Lickert, H., Cox, B., Wehrle, C., Taketo, M.M., Kemler, R., and Rossant, J. (2005). 
Dissecting Wnt/beta-catenin signaling during gastrulation using RNA interference 
in mouse embryos. Development 132, 2599-2609. 
Mansfield, J.H., Harfe, B.D., Nissen, R., Obenauer, J., Srineel, J., Chaudhuri, A., Farzan-
Kashani, R., Zuker, M., Pasquinelli, A.E., Ruvkun, G., et al. (2004). MicroRNA-
responsive 'sensor' transgenes uncover Hox-like and other developmentally 
regulated patterns of vertebrate microRNA expression. Nature genetics 36, 1079-
1083. 
McGlinn, E., and Mansfield, J.H. (2011). Detection of gene expression in mouse embryos 
and tissue sections. Methods in molecular biology 770, 259-292. 
McGlinn, E., Yekta, S., Mansfield, J.H., Soutschek, J., Bartel, D.P., and Tabin, C.J. 
(2009). In ovo application of antagomiRs indicates a role for miR-196 in 
patterning the chick axial skeleton through Hox gene regulation. Proceedings of 
the National Academy of Sciences of the United States of America 106, 18610-
18615. 
McGrew, M.J., Sherman, A., Lillico, S.G., Ellard, F.M., Radcliffe, P.A., Gilhooley, H.J., 
Mitrophanous, K.A., Cambray, N., Wilson, V., and Sang, H. (2008). Localised 
axial progenitor cell populations in the avian tail bud are not committed to a 
posterior Hox identity. Development 135, 2289-2299. 
McLeod, M.J. (1980). Differential staining of cartilage and bone in whole mouse fetuses 
by alcian blue and alizarin red S. Teratology 22, 299-301. 
McPherron, A.C., Lawler, A.M., and Lee, S.J. (1999). Regulation of anterior/posterior 
patterning of the axial skeleton by growth/differentiation factor 11. Nat Genet 22, 
260-264. 
Mikawa, S., Morozumi, T., Shimanuki, S., Hayashi, T., Uenishi, H., Domukai, M., 
Okumura, N., and Awata, T. (2007). Fine mapping of a swine quantitative trait 
locus for number of vertebrae and analysis of an orphan nuclear receptor, germ 
cell nuclear factor (NR6A1). Genome research 17, 586-593. 
Moran, Y., Fredman, D., Praher, D., Li, X.Z., Wee, L.M., Rentzsch, F., Zamore, P.D., 
Technau, U., and Seitz, H. (2014). Cnidarian microRNAs frequently regulate 
targets by cleavage. Genome Res 24, 651-663. 
Naiche, L.A., Holder, N., and Lewandoski, M. (2011). FGF4 and FGF8 comprise the 
wavefront activity that controls somitogenesis. Proc Natl Acad Sci U S A 108, 
4018-4023. 
Neijts, R., Simmini, S., Giuliani, F., van Rooijen, C., and Deschamps, J. (2014). Region-
specific regulation of posterior axial elongation during vertebrate embryogenesis. 
Dev Dyn 243, 88-98. 
Nowotschin, S., Ferrer-Vaquer, A., Concepcion, D., Papaioannou, V.E., and 
Hadjantonakis, A.K. (2012). Interaction of Wnt3a, Msgn1 and Tbx6 in neural 
versus paraxial mesoderm lineage commitment and paraxial mesoderm 
differentiation in the mouse embryo. Dev Biol 367, 1-14. 
Pollock, R.A., Jay, G., and Bieberich, C.J. (1992). Altering the boundaries of Hox3.1 
expression: evidence for antipodal gene regulation. Cell 71, 911-923. 
Pollock, R.A., Sreenath, T., Ngo, L., and Bieberich, C.J. (1995). Gain of function 
mutations for paralogous Hox genes: implications for the evolution of Hox gene 
function. Proceedings of the National Academy of Sciences of the United States 
of America 92, 4492-4496. 
Prosser, H.M., Koike-Yusa, H., Cooper, J.D., Law, F.C., and Bradley, A. (2011). A 
resource of vectors and ES cells for targeted deletion of microRNAs in mice. 
Nature biotechnology 29, 840-845. 
Psychoyos, D., and Stern, C.D. (1996). Fates and migratory routes of primitive streak 
cells in the chick embryo. Development 122, 1523-1534. 
Ryan, J.F., and Baxevanis, A.D. (2007). Hox, Wnt, and the evolution of the primary body 
axis: insights from the early-divergent phyla. Biology direct 2, 37. 
Sawada, A., Shinya, M., Jiang, Y.J., Kawakami, A., Kuroiwa, A., and Takeda, H. (2001). 
Fgf/MAPK signalling is a crucial positional cue in somite boundary formation. 
Development 128, 4873-4880. 
Schroter, C., and Oates, A.C. (2010). Segment number and axial identity in a 
segmentation clock period mutant. Curr Biol 20, 1254-1258. 
Spitz, F., Gonzalez, F., Peichel, C., Vogt, T.F., Duboule, D., and Zakany, J. (2001). Large 
scale transgenic and cluster deletion analysis of the HoxD complex separate an 
ancestral regulatory module from evolutionary innovations. Genes & 
development 15, 2209-2214. 
Takada, S., Stark, K.L., Shea, M.J., Vassileva, G., McMahon, J.A., and McMahon, A.P. 
(1994). Wnt-3a regulates somite and tailbud formation in the mouse embryo. 
Genes & development 8, 174-189. 
Takahashi, M., Fujita, M., Furukawa, Y., Hamamoto, R., Shimokawa, T., Miwa, N., 
Ogawa, M., and Nakamura, Y. (2002). Isolation of a novel human gene, 
APCDD1, as a direct target of the beta-Catenin/T-cell factor 4 complex with 
probable involvement in colorectal carcinogenesis. Cancer research 62, 5651-
5656. 
Takemoto, T., Uchikawa, M., Yoshida, M., Bell, D.M., Lovell-Badge, R., Papaioannou, 
V.E., and Kondoh, H. (2011). Tbx6-dependent Sox2 regulation determines neural 
or mesodermal fate in axial stem cells. Nature 470, 394-398. 
Trapnell, C., Hendrickson, D.G., Sauvageau, M., Goff, L., Rinn, J.L., and Pachter, L. 
(2013). Differential analysis of gene regulation at transcript resolution with RNA-
seq. Nature biotechnology 31, 46-53. 
Tsakiridis, A., Huang, Y., Blin, G., Skylaki, S., Wymeersch, F., Osorno, R., Economou, 
C., Karagianni, E., Zhao, S., Lowell, S., et al. (2014). Distinct Wnt-driven 
primitive streak-like populations reflect in vivo lineage precursors. Development 
141, 1209-1221. 
van den Akker, E., Fromental-Ramain, C., de Graaff, W., Le Mouellic, H., Brulet, P., 
Chambon, P., and Deschamps, J. (2001). Axial skeletal patterning in mice lacking 
all paralogous group 8 Hox genes. Development 128, 1911-1921. 
Velu, C.S., Chaubey, A., Phelan, J.D., Horman, S.R., Wunderlich, M., Guzman, M.L., 
Jegga, A.G., Zeleznik-Le, N.J., Chen, J., Mulloy, J.C., et al. (2014). Therapeutic 
antagonists of microRNAs deplete leukemia-initiating cell activity. The Journal of 
clinical investigation 124, 222-236. 
Vonk, F.J., Casewell, N.R., Henkel, C.V., Heimberg, A.M., Jansen, H.J., McCleary, R.J., 
Kerkkamp, H.M., Vos, R.A., Guerreiro, I., Calvete, J.J., et al. (2013). The king 
cobra genome reveals dynamic gene evolution and adaptation in the snake venom 
system. Proceedings of the National Academy of Sciences of the United States of 
America 110, 20651-20656. 
Weidinger, G., Thorpe, C.J., Wuennenberg-Stapleton, K., Ngai, J., and Moon, R.T. 
(2005). The Sp1-related transcription factors sp5 and sp5-like act downstream of 
Wnt/beta-catenin signaling in mesoderm and neuroectoderm patterning. Current 
biology : CB 15, 489-500. 
Wellik, D.M. (2007). Hox patterning of the vertebrate axial skeleton. Developmental 
dynamics : an official publication of the American Association of Anatomists 
236, 2454-2463. 
Wellik, D.M., and Capecchi, M.R. (2003). Hox10 and Hox11 genes are required to 
globally pattern the mammalian skeleton. Science 301, 363-367. 
Yekta, S., Shih, I.H., and Bartel, D.P. (2004). MicroRNA-directed cleavage of HOXB8 
mRNA. Science 304, 594-596. 
Yekta, S., Tabin, C.J., and Bartel, D.P. (2008). MicroRNAs in the Hox network: an 
apparent link to posterior prevalence. Nature reviews Genetics 9, 789-796. 
Young, T., Rowland, J.E., van de Ven, C., Bialecka, M., Novoa, A., Carapuco, M., van 
Nes, J., de Graaff, W., Duluc, I., Freund, J.N., et al. (2009). Cdx and Hox genes 
differentially regulate posterior axial growth in mammalian embryos. 
Developmental cell 17, 516-526. 
Zakany, J., Gerard, M., Favier, B., and Duboule, D. (1997). Deletion of a HoxD enhancer 
induces transcriptional heterochrony leading to transposition of the sacrum. 
EMBO J 16, 4393-4402. 
Zhang, Z., O'Rourke, J.R., McManus, M.T., Lewandoski, M., Harfe, B.D., and Sun, X. 
(2011). The microRNA-processing enzyme Dicer is dispensable for somite 
segmentation but essential for limb bud positioning. Developmental biology 351, 
254-265. 
  
Figures and figure legends 
Fig 1. Unique and overlapping expression patterns of miR-196 paralogs in mouse.   
(A) Mouse Hox clusters, with the position of Hox-embedded microRNAs depicted.  
Predicted Hox targets of the miR-196 family are indicated in blue.  (B-K) Detection of 
eGFP transcripts in miR-196a1GFP/+ (B-F) and miR-196a2GFP/+ (G-K) embryos 
demonstrates spatio-temporal expression differences for these identical miRNAs. 
Embryonic age indicated, red and white arrowheads indicate the anterior boundary of 
somitic and neural expression respectively. Arrows in (F,G) indicate weak ventral 
expression in miR-196a2GFP/+ embryos. Inset in (D,E) indicates reduced eGFP signal in 
the anterior PSM of 196a1GFP/+ embryos. 
 
Fig 2. miR-196 paralogs function in establishing vertebral identity and number in 
mouse.   
(A) Identification of vertebral patterning defects in individual and compound mir-196 
loss-of-function E18.5 embryos. Genotypes indicated. The positions of the 13th thoracic 
element (T13) and first sacral element (S1) are labelled. Inset displays the thoracic-
lumbar junction. (B) Individual vertebra analysis to demonstrate identity alterations at the 
thoraco-lumbar and lumbo-sacral junctions. Genotypes indicated. The position of a rib-
like nubbin on lumbar elements is marked with arrow. The position of sacral process is 
marked with an asterisk. (C) Rib fusion defects observed following loss of miR-196 
alleles, genotypes indicated. Fusion of the 8th rib to the sternum was unilateral or bilateral 
as indicated with arrows. (D) Summary of patterning defects identified across the miR-
196 allelic series. An asterisk indicates a homeotic transformation of that vertebral 
element. (E) Quantification of vertebral number in single and compound mir-196 loss-of-
function E18.5 embryos identifies a role for miR-196 in controlling axis length in mouse. 
Statistical comparisons of vertebral number relative to wildtype were performed using a 
permutation test, with P values corrected for multiple hypothesis testing using the 
Bonferroni method;  * P < 0.05, *** P < 0.001, **** P < 0.0001. 
 
Fig 3. Whole transcriptome analysis of miR-196 mutant cells reveals a dysregulation 
of miRNA targets and skeletal genes 
(A) Overview of the experimental and computational strategy used to identify global 
transcriptome alterations following loss of miR-196 function. (B) Mean fold changes of 
genes associated with predicted targets of miR-196, partitioned into four context+ 
intervals according to predicted miRNA targeting efficacy (-0.2 < context+ < 0, n=2112; 
-0.3 < context+ ≤ -0.2, n=145; -0.4 < context+ ≤ -0.3, n=50; context+ ≤ -0.4, n=37), 
across seven genotype comparisons. Statistical comparison of observed up-regulation of 
genes relative to genes with no miRNA target site, as evaluated by a one-sided 
Kolmogorov-Smirnov (K-S) test; * P < 0.05, **  P < 0.001. (C) Top 10 significant 
categories related to gene development and function associated with differentially 
expressed genes. (D) Top 15 categories related to “skeletal and muscular development” 
activated in the 196a2–/–;196b–/– vs 196a2-/+ comparison, with corresponding activation 
z-scores and P values. An activation z-score is a measurement of the consistency between 
the observed pattern of up- and down-regulation of genes in a category and the predicted 
activation or inhibition pattern in networks stored in the Ingenuity Knowledgebase 
relative to a random pattern (Kramer et al., 2014). P values in (C,D) are Benjamini-
Hochberg corrected P values, with dashed black lines indicating a significance threshold 
of 0.01. 
 
Fig 4. Loss of miR-196 function alters global Hox signatures.  
(A) Extensive Hox gene dysregulation is identified following loss of miR-196. 
Quantitative expression analysis of all 39 Hox genes in cells isolated from E9.5 mutant 
embryos, genotype comparisons are color-coded. Hox genes with one or more predicted 
miR-196 target binding sites are indicated in red. Filled circles at the tips of fold changes 
represent a statistically-significant change at q < 0.05. (B,C) WISH analysis of miR-
196a2GFP/GFP;miR-196b–/– E9.5 embryos relative to wildtype identifies a caudal 
expansion of Hoxb8 (B; n= 3/3) and Hoxc8 (C; n=3/3). Expression within the PSM is 
indicated with a red line/arrowhead, neural tube expression with a white arrowhead.  
 
Fig. 5. Identification of additional putative direct (non-Hox) miR-196 targets. (A) 
List of the most highly upregulated genes, and their associated fold changes in seven 
genotype comparisons, that either: i) contain a conserved miR-196 binding site, or ii) are 
predicted to respond strongly to the miRNA (i.e. have a context+ score ≤ -0.2). Genes 
with one or more conserved miR-196 target binding sites are indicated in green. (B) In 
vitro luciferase analysis confirms sequence-specific regulation of 3 experimentally 
supported target genes of miR-196. Renilla luciferase intensity values have been 
normalized to their respective Firefly values (RLU). Controls (wildtype 3′ UTR construct 
without miR-196b) were set to 1. MUT: mutated 3′ UTR construct destroying miR-196 
binding site. Error bars represent standard deviation. P values are from Student's t-test; * P < 0.05, 
*** P < 0.0005, **** P < 0.0001.  
 
Fig 6. Loss of miR-196 function alters signaling pathways known to control 
segmentation, axis elongation and the trunk-to-tail transition. (A) Quantitative 
expression analysis of pathways known to control segmentation and axial extension in 
cells isolated from E9.5 mutant embryos, genotype comparisons are color-coded. Filled 
circles at the tips of fold changes represent a statistically-significant change at Q < 0.05.  
 
Fig 7. miR-196 has the potential to regulate Wnt signaling by both direct and 
indirect mechanisms. 
(A) WISH analysis confirms increased Dkk1 in 196a1–/– ;196a2–/– E9.5 embryos relative 
to 196a1-/+;196a2-/+ (n = 2/2 for each genotype).  (B) In vitro luciferase assay confirms 
sequence-specific regulation of Dkk1 by miR-196. Renilla luciferase intensity values 
have been normalized to their respective Firefly values (RLU). Controls (wildtype 3′ 
UTR construct without miR-196b) were set to 1. MUT: mutated 3′ UTR construct 
destroying miR-196 binding site. (C) Luciferase assay measuring Wnt/β-catenin activity 
after over-expression of BATLuc together with CMV-Renilla and either control, Hoxb1, 
Hoxa5, Hoxa7, Hoxb7, Hoxb8, or Hoxc8; n= 4-9 samples per gene assessed. Firefly 
luciferase intensity values have been normalized to their respective Renilla values (RLU). 
Control values were set to 1. In (B) and (C), error bars represent standard deviation. 
Reported P values are from Student's t-test; * P < 0.05, ** P < 0.005, *** P < 0.0005, 
**** P < 0.0001. 
 
Supplementary Fig S1. Generation of miR-196a1GFP and miR-196a2GFP knock-in 
mouse lines.  
(A) miR-196a1GFP knock-in targeting strategy and (B) confirmation of correct targeting 
by Southern blot analysis of BstZ17I genomic digestion. The position of the Southern probe 
is indicated with a blue box in (A). (C) miR-196a2GFP knock-in targeting strategy and (D) 
confirmation of correct targeting by Southern blot analysis of SwaI genomic digestion. 
The position of Southern probe is indicated with a blue box in (C).  
 
Supplementary Fig S2. Generation of miR-196a1–/–, miR-196a2–/– and miR-196b–/– 
knockout mouse lines.  
(A) Generalized targeting strategy employed by the Wellcome Trust Sanger Institute to 
create miRNA knockout ES cells (Prosser et al., 2011). Prior to ES cell injection, correct 
targeting was confirmed in house by Southern blot analysis of the miR-196a1–/– (B),  
miR-196a2–/– (C) and miR-196b–/– (D) loci. The general Southern blot strategy is 
indicated in blue in (A).  
 
Supplementary Fig S3. Summary of vertebral patterning alterations observed in 
miR-196 single and compound mutant mice.  
A cartoon summary of the main patterning defects observed in miR-196 mutant mice; 
homeotic transformations of the wildtype axial formula are marked with an asterisk. 
The number of skeletons analyzed for each genotype and their phenotypic spectrum are 
indicated.  
 
Supplementary Fig S4. Predicted miRNA target genes are up-regulated upon the 
loss of miR-196. 
(A-G) Cumulative density plots of the fold changes of genes predicted as targets of miR-
196, partitioned into four context+ intervals according to increasing predicted miRNA 
targeting efficacy (-0.2 < context+ < 0, n=2112; -0.3 < context+ ≤ -0.2, n=145; -0.4 < 
context+ ≤ -0.3, n=50; context+ ≤ -0.4, n=37), and genes with no predicted target site 
(n=6924), across seven genotype comparisons. The P values indicate a statistical 
comparison of the observed de-repression of genes relative to genes with no miRNA 
target site, as evaluated by a one-sided Kolmogorov-Smirnov (K-S) test. 
 
Supplementary Fig S5. Significant functional categories associated with 
differentially expressed genes. 
(A-G) All significant categories related to gene development and function associated with 
differentially expressed genes, across seven genotype comparisons. All P values are 
Benjamini-Hochberg corrected, with dashed black lines indicating a significance 
threshold of 0.01.   
 
Supplementary Fig S6. Inference of upstream regulators reveals a downregulation 
of Wnt activity. 
(A) Upstream regulators inferred by Ingenuity Pathway Analysis as being dysregulated 
based upon the behavior of differentially-expressed genes in three genotype comparisons. 
Activation z-scores and P values are computed as described in Figure 3D. As a positive 
control, miR-196 is correctly inferred as the most significant miRNA to have diminished 
activity. β-catenin/CTNNB1 (Wnt) activity is predicted to also diminish with the loss of 
miR-196; in contrast, MYCN, MYC and SRF activities are predicted to increase. (B) 
Network of upstream and downstream interactions in the Ingenuity knowledgebase that 
were used to infer decreased Wnt activity in the 196a2–/–;196b–/– vs 196a2-/+ comparison. 
Genes are shaded according to their observed up- or down-regulation in this comparison. 
 
Supplementary Table 1. Removal of miR-196 family members causes axial 
patterning defects. 
Summary of vertebral malformations and vertebral transformations identified in single 
and compound miR-196 knockout mice.  
 
Supplementary Table 2. RNA-seq library statistics. 
Summary of read mapping statistics associated with the 44 RNA-seq samples generated 
in this study, including total number of reads sequenced per sample and total mapped to 
the mouse genome (mm10). 
[Wong et al., Figure 3. Recoverable labels: analysis workflow (FACS sort of GFP+ cells; 
RNA-seq of biological replicates per genotype; differential expression analysis with 
cuffdiff; pathway analysis with Ingenuity; miR target analysis with TargetScan); mean fold 
change (log scale) by context+ interval; functional categories and skeletal phenotype 
annotations plotted against -log10(P value) and activation z-score.] 
[Wong et al., Figure S1. Panels A-D: schematics of the miR-196a2 and miR-196a1 targeting 
strategies (wildtype locus, targeting construct, targeted allele, Neo removal), showing 
the eGFP and Neo cassettes, 5' and 3' homology arms, DipTox, FRT sites, and the Swa1 
(miR-196a2; Δ52bp) and BstZ17I (miR-196a1; Δ72bp) digest sites, with band sizes for 
wildtype and targeted alleles.] 
[Wong et al., Figure S2. Panels A-D: schematic of the miRNA targeting strategy (wildtype 
locus, targeting construct, targeted allele, selection removal; PGK, puDeltaTK, and BGHpA 
elements; LoxP, F3, and FRT sites; Δ≈200bp) and confirmation of miR-196a1, miR-196a2, and 
miR-196b targeting using Hpa1, HindIII, and BspH1 digests of wildtype and targeted 
alleles.] 
Wildtype               n=47           1              46 
Single mutants 
196a1-/-               n=28        7          19
196a2-/-               n=21                  4 12         5
196b-/-                n=26                  13            9         4 
Double mutants
196a1-/-;196a2-/-                n=18              0 13          5 
196a2-/-;196b-/-                 n=13                      0      13 
196a1-/-;196b-/-                n=5             1           2          2 
Triple knockout - Allelic series  
196a1+/-;196a2+/-;196b+/-   n=11                      4 1          6 
196a1+/-;196a2+/-;196b-/-    n=12              0 12 
196a1+/-;196a2-/-;196b+/-    n=7              0 7 
196a1+/-;196a2-/-;196b-/-     n=8              0 8 
196a1-/-;196a2+/-;196b+/-    n=7              2 1 4
196a1-/-;196a2+/-;196b-/-     n=6                  1 5 
196a1-/-;196a2-/-;196b+/-     n=3              0 3 
196a1-/-;196a2-/-;196b-/-      n=3                0 3 
[Wong et al., Figure S3. Cartoon of axial formulae (T13 through Ca1; vertebral positions 
20-32) for each phenotypic class: wildtype; L1-to-T; L1-to-T with posterior sacral 
displacement; L1-to-T and L2-to-T with posterior sacral displacement; anterior sacral 
displacement with or without T13 rib reduction. Asterisks mark transformed vertebrae.] 
[Wong et al., Figure S4. Panels A-G: cumulative fraction versus fold change (log scale) 
for genes in four context+ intervals (each annotated with a P value) and genes with no 
site. Genotype comparisons: a1-/- vs a1-/+; a2-/- vs a2-/+; a1-/-a2-/- vs a1-/+a2-/+; 
a1-/+a2-/- vs a1-/+a2-/+; a2-/+b-/- vs a2-/+; a2-/-b-/+ vs a2-/+; a2-/-b-/- vs a2-/+.] 
[Wong et al., Figure S5. Panels A-G: significant functional categories (e.g., Embryonic 
Dev., Organismal Dev., Tissue Dev., Skeletal & Muscular Sys. Dev.) plotted against 
-log10(P value), one panel per genotype comparison: a1-/- vs a1-/+; a2-/- vs a2-/+; 
a1-/-a2-/- vs a1-/+a2-/+; a1-/+a2-/- vs a1-/+a2-/+; a2-/+b-/- vs a2-/+; a2-/-b-/+ vs 
a2-/+; a2-/-b-/- vs a2-/+.] 
[Wong et al., Figure S6. Panel A: inferred upstream regulators (CTNNB1, miR-196, SRF, 
MYC, MYCN) with activation z-score and -log10(P value) for three genotype comparisons. 
Panel B: network of 38 genes.] 
Chapter 4. Future Directions 
 
Quantitative models of miRNA targeting in Drosophila 
Though much work has been done to understand the determinants of miRNA target 
recognition that enhance prediction in mammals, relatively little has been done in other 
clades of animal life, including important invertebrate model organisms such as the worm 
(C. elegans) and the fruit fly (D. melanogaster). Indeed, only a handful of miRNA target 
prediction models exist for these clades, and many are based purely upon evolutionary 
information, which captures little about the strength of repression conferred. 
Understanding the similarities and differences in these key model 
organisms relative to mammals would have several benefits: i) it would be interesting 
from an evolutionary perspective, providing a glimpse into the fundamental principles of 
miRNA targeting common to animals, and ii) it would give insight into the construction 
of gene regulatory networks in each of these species, which would aid in the 
interpretation of gene expression data and molecular pathways perturbed in different 
experimental conditions. 
 In elucidating the principles of miRNA target recognition, the mammalian miRNA field 
has benefited greatly from its ability to generate gene expression datasets derived from 
miRNA transfections in cell culture. While it is difficult 
to culture cell lines derived from the worm, such limitations do not exist for the fly due to 
the availability of cultured S2 cells derived from D. melanogaster (Schneider, 1972). To 
explore the features associated with effective miRNA targeting in the fly, we performed a 
series of six miRNA transfection experiments in cultured S2 cells, quantifying the 
abundance of all expressed transcripts by RNA sequencing relative to the corresponding 
abundance in mock-transfected cells (using a scheme emulating the one illustrated 
in Figure 5A, pg. 23). We then computed log2(fold changes) in mRNA levels between 
miRNA-transfected cells and day-matched mock transfections for all genes that were 
detectable. 
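
The fold-change computation described above can be sketched as follows. This is a minimal illustration rather than the pipeline used here: the function name, the expression units, and the detectability threshold (`min_expr`) are all assumptions.

```python
import math


def log2_fold_changes(transfected, mock, min_expr=1.0):
    """log2(miRNA-transfected / mock) for genes detectable in both samples.

    `transfected` and `mock` map gene names to normalized expression levels
    from day-matched samples; `min_expr` is a hypothetical cutoff below which
    a gene is treated as undetectable and skipped.
    """
    changes = {}
    for gene, mock_level in mock.items():
        tx_level = transfected.get(gene, 0.0)
        if mock_level >= min_expr and tx_level >= min_expr:
            changes[gene] = math.log2(tx_level / mock_level)
    return changes


# Toy expression values, purely illustrative
mock = {"geneA": 10.0, "geneB": 20.0, "geneC": 3.0}
transfected = {"geneA": 5.0, "geneB": 20.0, "geneC": 0.0}
print(log2_fold_changes(transfected, mock))  # {'geneA': -1.0, 'geneB': 0.0}
```

A repressed target such as the toy `geneA` has a negative value, and genes that fall below the cutoff in either sample are excluded rather than assigned an unstable ratio.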
Initial analyses of these data confirmed that Drosophila employs at least five 
canonical target site types (i.e., 8mer, 7mer-m8, 7mer-A1, 6mer, and offset 6mer sites) 
resembling those of mammals and that the hierarchy of effectiveness of these sites also 
parallels that of mammals (as illustrated in Figure 5B, pg. 23). Preliminary efforts to 
characterize features associated with repression reveal that determinants guiding site 
efficacy in Drosophila seem to be only a subset of those detected as being important in 
mammals, with RNA structural accessibility and 3′ UTR length being chosen most 
consistently as features useful for prediction. 
As an orthogonal means of evaluating the usage of these canonical sites in the 
transcriptome, I investigated site conservation, which required extension of our 
comparative sequence methods to the insect clade (which consists of 12 species of 
Drosophila as well as three other insect species). This analysis revealed that at least 
11,000 miRNA–target interactions have been selectively conserved in fly 3′ UTRs. It 
remains to be determined whether the computation of site conservation (PCT) values will 
improve the ability of a regression model to discern effective miRNA target sites. 
Collectively, I find that a core set of features informative for prediction is common to 
both flies and mammals, although the poor support in the fly for other features that are 
important for prediction in mammals suggests that several principles of miRNA targeting 
have diverged between the two clades. 
Conservation of miRNA targeting networks among bilaterians 
Several studies have identified 34 ancient miRNA families common to most bilaterian 
organisms (Figure 1, pg. 13) (Grimson et al., 2008; Wheeler et al., 2009). Despite 
evidence that these ancient miRNAs have conserved spatiotemporal dynamics in early 
animal development (Christodoulou et al., 2010), few efforts have attempted to determine 
whether they participate in similar regulatory networks across large evolutionary 
timespans encompassing bilaterian life. A previous study that attempted to uncover 
ancient miRNA–target relationships among the vertebrate, fly, and worm clades failed to 
detect many such examples (Chen and Rajewsky, 2006), but was potentially limited for 
the following reasons: i) poor annotation of ancient miRNAs, ii) poor annotation of the 
3′ UTRs of sequenced human, fly, and worm genomes, iii) limited methods for 
reconstructing orthologous/paralogous relationships across clades, and iv) a 
restricted ability to identify conserved miRNA target sites within clades due to 
lower-quality multiple sequence alignments. These limitations provided the motivation to revisit 
these questions using enhanced methods of defining orthologous relationships among 
bilaterian gene families (Wu et al., 2014). 
Having compiled a list of conserved miRNA target sites in the 3′ UTRs of the 
worm, fly, and vertebrate clades, I sought to estimate the number of ancient sites that 
have persisted in orthologous genes since the bilaterian ancestor arose ~600 million years 
ago. Using a bootstrapping technique to generate 1000 sampled ortholog lists matched for 
3′ UTR length, A/U content, and conservation rate, I have estimated there to be 
approximately 13 deeply conserved sites (p < 0.002) shared among the bilaterians and 19 
(p < 0.047) shared among the protostomes (fly and worm clades). The paucity of such 
sites reinforces the model suggested by Chen and Rajewsky (2006), that there has been 
extensive rewiring in the miRNA networks across these three major clades of bilaterians. 
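
The resampling logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure as implemented: the feature bins, function names, and toy counts are hypothetical, and the empirical P value uses the common (k + 1)/(N + 1) estimator.

```python
import random


def matched_sample(query_bins, pool_by_bin, rng):
    """Draw one control gene per query gene from the same feature bin.

    A bin label stands in for a combination of matched properties, for
    example 3' UTR length, A/U content, and conservation rate.
    """
    return [rng.choice(pool_by_bin[label]) for label in query_bins]


def excess_and_pvalue(observed, control_counts):
    """Signal above background and an empirical one-sided P value."""
    background = sum(control_counts) / len(control_counts)
    n_as_extreme = sum(1 for count in control_counts if count >= observed)
    p_value = (n_as_extreme + 1) / (len(control_counts) + 1)
    return observed - background, p_value


# Toy example: 1000 control counts of shared sites versus one observed count
rng = random.Random(0)
controls = matched_sample(["short", "long"], {"short": ["g1", "g2"], "long": ["g3"]}, rng)
control_counts = [2, 3, 4, 3] * 250
excess, p = excess_and_pvalue(10, control_counts)
print(controls[1], excess, p)
```

Each of the 1000 matched control lists would be scored for shared conserved sites in the same way as the real ortholog list; the estimated number of deeply conserved sites is then the observed count minus the mean control count.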
Although there appear to be few miRNA–target relationships preserved from the 
common ancestor of bilaterians, these results must be interpreted in the context of 
sampling bias among the species used in the analysis. In particular, the fly and worm 
clades may not be representative of the ancestral state as they have undergone massive 
gene loss and genome compaction, with as many as 10% of ancestral genes having been 
lost in these species (Raible and Arendt, 2004). A better approach may be to utilize 
species that are early-branching bilaterians with a slow rate of molecular evolution, such 
as the acoel (Hofstenia miamia) and planarian (Schmidtea mediterranea), which would be 
more representative of the ancestral state of bilaterians (Srivastava et al., 2014). Future 
work would thus aim to extend the search for such ultraconserved sites to deep-branching 
phyla within the bilaterians, improving the annotation of miRNAs and 3′ UTRs in these 
species as a prerequisite to further analysis. 
  
References 
 
Chen, K., and Rajewsky, N. (2006). Deep conservation of microRNA-target relationships 
and 3'UTR motifs in vertebrates, flies, and nematodes. Cold Spring Harb Symp 
Quant Biol 71, 149-156. 
Christodoulou, F., Raible, F., Tomer, R., Simakov, O., Trachana, K., Klaus, S., Snyman, 
H., Hannon, G.J., Bork, P., and Arendt, D. (2010). Ancient animal microRNAs 
and the evolution of tissue identity. Nature 463, 1084-1088. 
Grimson, A., Srivastava, M., Fahey, B., Woodcroft, B.J., Chiang, H.R., King, N., 
Degnan, B.M., Rokhsar, D.S., and Bartel, D.P. (2008). Early origins and 
evolution of microRNAs and Piwi-interacting RNAs in animals. Nature 455, 
1193-1197. 
Raible, F., and Arendt, D. (2004). Metazoan evolution: some animals are more equal than 
others. Curr Biol 14, R106-108. 
Schneider, I. (1972). Cell lines derived from late embryonic stages of Drosophila 
melanogaster. J Embryol Exp Morphol 27, 353-365. 
Srivastava, M., Mazza-Curll, K.L., van Wolfswinkel, J.C., and Reddien, P.W. (2014). 
Whole-body acoel regeneration is controlled by Wnt and Bmp-Admp signaling. 
Curr Biol 24, 1107-1113. 
Wheeler, B.M., Heimberg, A.M., Moy, V.N., Sperling, E.A., Holstein, T.W., Heber, S., 
and Peterson, K.J. (2009). The deep evolution of metazoan microRNAs. Evol Dev 
11, 50-68. 
Wu, Y.-C., Bansal, M.S., Rasmussen, M.D., Herrero, J., and Kellis, M. (2014). 
Phylogenetic Identification and Functional Characterization of Orthologs and 
Paralogs across Human, Mouse, Fly, and Worm. 
 
Appendix 1. Global analysis of the effect of different cellular contexts on microRNA 
targeting 
 
Jin-Wu Nam1,2,3,4,9, Olivia S. Rissland1,2,3,9, David Koppstein1,2,3, Cei Abreu-Goodger5,6, 
Calvin Jan1,2,3, Vikram Agarwal1,2,7, Muhammed A. Yildirim1,2,3, Antony Rodriguez5,8, 
and David P. Bartel1,2,3 
 
1Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA 
2Howard Hughes Medical Institute 
3Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, 
USA 
4Graduate School of Biomedical Science and Engineering, Hanyang University, Seoul, 
Korea. 
5Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, 
Texas, 77030 USA 
6Current address: Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), 
CINVESTAV, Irapuato, Guanajuato, México 
7Computational and Systems Biology Program, Massachusetts Institute of Technology, 
Cambridge, Massachusetts 02139, USA 
8Current Address: Department of Physical Therapy, University of Texas Medical Branch 
Galveston, 301 University Blvd, Galveston, Texas, 77555 
9These authors contributed equally to this work 
 
V.A. helped devise the weighted context+ model. J.W.N., C.A.G., and M.A.Y. performed 
computational analyses. O.S.R. performed microRNA transfections. D.K. and C.J. 
generated 3P-Seq libraries in mouse and human samples, respectively. A.R. created 
knockout mice. J.W.N., O.S.R., and D.P.B. designed the study. J.W.N., O.S.R., and 
D.P.B. wrote the manuscript. 
 
Published as: 
Nam J-W, Rissland OS, Koppstein D, Abreu-Goodger C, Jan CH, Agarwal V, Yildirim 
MA, Rodriguez A, Bartel DP. "Global analysis of the effect of different cellular contexts 
on microRNA targeting". 2014. Molecular Cell 53(6):1031-43.
Molecular Cell
Resource
Global Analyses of the Effect
of Different Cellular Contexts
on MicroRNA Targeting
Jin-Wu Nam,1,2,3,4,8 Olivia S. Rissland,1,2,3,8 David Koppstein,1,2,3 Cei Abreu-Goodger,5 Calvin H. Jan,1,2,3
Vikram Agarwal,1,2,6 Muhammed A. Yildirim,1,2,3 Antony Rodriguez,7,9 and David P. Bartel1,2,3,*
1Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
2Howard Hughes Medical Institute
3Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
4Department of Life Science, College of Natural Science and Graduate School of Biomedical Science and Engineering, Hanyang University,
Seoul 133-791, Korea
5Laboratorio Nacional de Genómica para la Biodiversidad (Langebio), CINVESTAV, Irapuato, Guanajuato 36824, México
6Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
7Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
8These authors contributed equally to this work
9Present address: Department of Physical Therapy, University of Texas Medical Branch Galveston, 301 University Boulevard, Galveston,
TX 77555, USA
*Correspondence: dbartel@wi.mit.edu
http://dx.doi.org/10.1016/j.molcel.2014.02.013
SUMMARY
MicroRNA (miRNA) regulation clearly impacts animal development, but the extent to which
development, with its resulting diversity of cellular contexts, impacts miRNA regulation
is unclear. Here, we compared cohorts of genes repressed by the same miRNAs in different
cell lines and tissues and found that target repertoires were largely unaffected, with
secondary effects explaining most of the differential responses detected. Outliers
resulting from differential direct targeting were often attributable to alternative
3′ UTR isoform usage that modulated the presence of miRNA sites. More inclusive
examination of alternative 3′ UTR isoforms revealed that they influence 10% of predicted
targets when comparing any two cell types. Indeed, considering alternative 3′ UTR isoform
usage improved prediction of targeting efficacy significantly beyond the improvements
observed when considering constitutive isoform usage. Thus, although miRNA targeting is
remarkably consistent in different cell types, considering the 3′ UTR landscape helps
predict targeting efficacy and explain differential regulation that is observed.
INTRODUCTION
The control of gene output can be complex, with opportunities for regulation at each step
of mRNA production, processing, localization, translation, and turnover. A widespread
type of posttranscriptional control is that mediated by microRNAs (miRNAs) (Bartel,
2009). By base-pairing with complementary sites in their targets, miRNAs direct the
repression of mRNAs, primarily through mRNA destabilization (Baek et al., 2008; Guo
et al., 2010; Hendrickson et al., 2009). With each family of miRNAs capable of targeting
messages from hundreds of genes, and over half of the human transcriptome containing
preferentially conserved miRNA sites (Friedman et al., 2009), miRNAs are expected to
impact essentially every mammalian developmental process and human disease.
Central for understanding this pervasive mode of genetic control is understanding
miRNA-target interactions. One factor affecting the efficacy of miRNA-target interactions
is the miRNA site type. Site types are primarily classified based on the extent to which
they match the 5′ region of the miRNA. 6mer sites perfectly pair to only the miRNA seed
(nucleotides 2–7 of the miRNA) and typically confer marginal repression, at best. Seed
pairing can be augmented with an adenosine opposite miRNA nucleotide 1 or a Watson-Crick
pair with miRNA nucleotide 8, giving a 7mer-A1 or 7mer-m8 site, respectively; sites
augmented with both the adenosine and the match to nucleotide 8 are 8mer sites (Grimson
et al., 2007; Lewis et al., 2005). On average, 8mer sites are more efficacious than
7mer-m8 sites, which are more efficacious than 7mer-A1 sites, with supplemental pairing to
the 3′ region of the miRNA marginally increasing efficacy of each site type (Grimson
et al., 2007). Two other site types are effective but so rare that together they are
thought to constitute less than 1% of all targeting; these are 3′ compensatory sites
(Bartel, 2009) and centered sites (Shin et al., 2010). Offset 6mer sites and each of the
more recently proposed noncanonical site types (Betel et al., 2010; Chi et al., 2012;
Helwak et al., 2013; Khorshid et al., 2013; Loeb et al., 2012; Majoros et al., 2013) are
either not effective or less effective than 6mer sites (Friedman et al., 2009) (V.A. and
D.P.B., unpublished data).
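
The four canonical site types defined above can be made concrete with a short sketch that scans a 3′ UTR for seed matches and labels each one. It is an illustration only, not TargetScan code: the function names are hypothetical, sequences are assumed to be uppercase RNA written 5′ to 3′, and offset 6mer, 3′ compensatory, and centered sites are ignored.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[nt] for nt in reversed(rna))


def find_sites(mirna, utr):
    """Return (position, site type) for every seed match in `utr`.

    The seed is miRNA nucleotides 2-7. In the mRNA, the nucleotide pairing
    with miRNA nucleotide 8 lies just 5' of the seed match, and the
    nucleotide opposite miRNA nucleotide 1 lies just 3' of it.
    """
    seed_match = reverse_complement(mirna[1:7])
    m8 = COMPLEMENT[mirna[7]]
    sites = []
    start = utr.find(seed_match)
    while start != -1:
        end = start + len(seed_match)
        has_m8 = start > 0 and utr[start - 1] == m8
        has_a1 = end < len(utr) and utr[end] == "A"
        if has_m8 and has_a1:
            kind = "8mer"
        elif has_m8:
            kind = "7mer-m8"
        elif has_a1:
            kind = "7mer-A1"
        else:
            kind = "6mer"
        sites.append((start, kind))
        start = utr.find(seed_match, start + 1)
    return sites


# let-7a as an example miRNA; the UTR is a toy sequence built for illustration
let7a = "UGAGGUAGUAGGUUGUAUAGUU"
toy_utr = "GG" + "CUACCUCA" + "GG" + "GUACCUCG" + "GG" + "CUACCUCG" + "GUACCUCA"
print(find_sites(let7a, toy_utr))
```

For let-7a the seed match is UACCUC, so the toy UTR contains one site of each type, reported in the order 8mer, 6mer, 7mer-m8, 7mer-A1.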
Molecular Cell 53, 1031–1043, March 20, 2014 ©2014 Elsevier Inc.
Early target predictions considered only the number and type of sites to rank predictions
and thus had to rely on site conservation to refine the rankings (Bartel, 2009). However,
the same site can be much more effective in the context of one mRNA than it is in the
context of another; identifying and considering these context features surrounding the
miRNA site can improve target predictions (Grimson et al., 2007; Gu et al., 2009; Kertesz
et al., 2007; Nielsen et al., 2007). As part of the context model, three context features
were originally used to improve the TargetScan algorithm: (1) the local AU content of the
sequence surrounding the site (presumably a measure of occlusive secondary structure),
(2) the distance between the site and the closest 3′ UTR end, and (3) whether or not the
site lies in the path of the ribosome (Grimson et al., 2007). With these features of UTR
context in the model, effective sites could be predicted above the false positives without
considering the evolutionary conservation of the site (Baek et al., 2008; Grimson et al.,
2007). Additional improvements came with development of the context+ model, which
incorporated two features of the miRNA seed region: (1) the predicted stability of matches
to the seed region, which correlated with efficacy, and (2) the number of matches to the
seed region within the 3′ UTRs of the transcriptome, which inversely correlated with
efficacy (Garcia et al., 2011).
Despite the advances of the past decade that have come from defining the site types and
building models of miRNA-targeting efficacy that consider (1) the influences of site type
and number, (2) the 3′ UTR context of the site, and (3) certain miRNA properties, the
accuracy of miRNA-target predictions still has substantial room for improvement. One
consideration currently ignored in miRNA-targeting models is the potential influence of
different biological and cellular contexts. Although predictions for miRNAs or mRNAs that
are not present in the cell can be easily disregarded, other influences of cellular
context are undoubtedly exerting effects in ways that compromise prediction utility.
One way that cellular context can exert its effect is through dif-
ferential expression of mRNA-binding proteins, which can either
increase or decrease the efficacy of miRNA sites. For instance,
binding of Pumilio increases miRNA-mediated repression in the
3′ UTRs of the p27 and E2F3 mRNAs (Kedde et al., 2010; Miles
et al., 2012), whereas Dnd1 binding occludes miRNA target sites
to relieve miRNA-mediated repression of nanos and tdrd7 mRNAs
(Kedde et al., 2007). These examples could represent just the tip
of the iceberg, as the extent to which differential expression of
such trans-acting factors affects miRNA targeting in different
cell types has not been investigated across the transcriptome.
Another consideration largely ignored in miRNA target predictions
is the impact of alternative 3′ UTR isoforms, which are
generated through alternative cleavage and polyadenylation
(APA). For example, mRNAs with the same open reading frame
(ORF) often have tandem UTR isoforms in which APA at proximal
or distal poly(A) sites generates shorter or longer 3′ UTRs,
respectively (Miyamoto et al., 1996; Tian et al., 2005). Regulatory
elements, such as miRNA sites, in the commonly included (or
‘‘constant’’) region are present in both short and long isoforms,
but those in the alternatively included (or ‘‘variable’’) region are
present only in the long isoform, and thus a cell-type-specific
shift in APA results in a corresponding shift in isoforms respond-
ing to the regulation (Ji et al., 2009; 2011; Mayr and Bartel, 2009;
Sandberg et al., 2008; Ulitsky et al., 2012). Development of high-
throughput poly(A)-site mapping techniques, such as 3P-seq
(poly[A]-position profiling by sequencing; Jan et al., 2011), has
allowed quantitative and precise detection of alternative 3′
UTR usage within a sample as well as differences over the
course of development (Derti et al., 2012; Hoque et al., 2013;
Jan et al., 2011; Lianoglou et al., 2013; Shepard et al., 2011; Ulit-
sky et al., 2012; Spies et al., 2013). Efforts to predict miRNA tar-
gets are only beginning to incorporate this information. For
example, when predicting mammalian targets, the most recent
version of TargetScan still considers only the longest annotated
3′ UTR isoform of each gene. When predicting nematode and zebrafish
targets, TargetScan predicts the targeting of each 3P-seq-annotated
UTR isoform but does not consider the relative
abundance of each isoform when ranking these predictions.
The studied examples of differential expression of RNA-binding
proteins and differential usage of 3′ UTR isoforms imply that
these, or perhaps other phenomena, might broadly influence the
impact of miRNAs, causing the targets of a miRNA to substan-
tially differ in two different cellular contexts, even when only
considering mRNAs expressed in both cell types. Genome-
wide studies of transcription factor binding show that cell type
can influence transcriptional regulation (Cooper et al., 2007;
Farnham, 2009), but global effects of cellular context on miRNA
regulation or other forms of posttranscriptional regulation have
not been reported. Understanding the frequency and magnitude
of these effects is important for understanding the degree to
which miRNA regulation itself is regulated. Knowing the extent
to which experimental observations from one cell type can be
extrapolated to another also has practical value for placing
miRNAs into gene regulatory networks. For example, the heterologous
reporter assay (in which the 3′ UTR of a suspected target
is appended to a reporter gene and tested for its response to the
miRNA, with and without mutation of the putative miRNA-bind-
ing sites) is a workhorse for testing the plausibility of proposed
miRNA-target interactions, but its utility would be diminished if
the sites that mediate repression in one cell type do not reliably
do so in other cell types.
To begin to explore the frequency and magnitude of cell-type-
specific effects on miRNA-mediated repression, we introduced
the same miRNAs into three different human cell lines and moni-
tored mRNA changes by RNA-seq. We also analyzed the effects
of miRNA loss in different mouse and zebrafish tissues and
stages. Most predicted targets responded similarly in different
cellular contexts, and for those that did differ, these differences
often resulted from secondary effects, not direct differences in
miRNA-mediated targeting. When direct differences in targeting
were detected, these differences often resulted from alternative
3′ UTR isoform usage. Experimental profiling of poly(A) sites
showed that APA affects approximately 10% of predicted targets when
comparing any pair of cell types. With this in mind, we incorporated
3′ UTR isoform usage as a parameter in miRNA target prediction
and found that it significantly improved performance.
RESULTS
Most miRNA-Target Interactions Are Not Detectably
Affected by Cell Type
To determine the extent to which cell type influences miRNA
targeting, we transfected two different miRNA duplexes (miR-
124 and miR-155) into three different cell lines (HeLa, human
Molecular Cell
Effects of Cellular Context on miRNA Repression
1032 Molecular Cell 53, 1031–1043, March 20, 2014 ©2014 Elsevier Inc.
embryonic kidney 293 [HEK293], and Huh7 cells) and monitored
mRNA changes using mRNA-seq. These cell lines were chosen
for two reasons: (1) they had large differences in their expression
of endogenous miRNAs (Landgraf et al., 2007; Mayr and Bartel,
2009), and (2) they could be transfected at high efficiency. For
each miRNA/cell line combination, we examined two biological
replicates, comparing the effects of the miRNA transfection rela-
tive to those of the mock-treated controls. Each of these trans-
fection data sets exhibited the expected global targeting effects,
as determined by analysis of fold changes for site-containing
mRNAs (Figure S1A available online) and by unbiased analysis
using the Sylamer tool (Figure S1B) (van Dongen et al., 2008).
After the data were globally normalized to correct for general
cell-type differences, as well as for experimental and technical
biases, we investigated if the differences observed between
the cell types were significant, given the variance between replicate
experiments.

[Figure 1 plots omitted. Panels A (genes with sites) and B (genes
without sites) each contain six scatter plots, three for miR-124 and
three for miR-155, comparing log2 mRNA changes between pairs of cell
lines (HEK293 versus HeLa, Huh7 versus HeLa, and Huh7 versus HEK293);
both axes run from –4 to 2. Panel annotations, in the order extracted
(Total; n, with the number of differentially repressed genes in
parentheses; FDR):
(A) miR-124: 2419; 1169 (4); 0.267. 1987; 1098 (13); 0.205. 2067; 1164 (2); 0.131.
(A) miR-155: 1714; 991 (137); 0.335. 1280; 921 (92); 0.377. 1361; 1037 (29); 0.241.
(B) miR-124: 1082; 238 (0); 0. 968; 218 (0); 0. 933; 236 (3); 0.313.
(B) miR-155: 1813; 328 (13); 0.381. 1693; 335 (2); 0.399. 1778; 308 (3); 0.354.]

Figure 1. Most miRNA-Target Interactions Are Unaffected by Cell Type
(A) Pairwise comparisons of mRNA changes after transfecting the same
miRNA into different cell lines. Shown are changes for genes with at
least one 7mer 3′ UTR site for the indicated miRNA, plotting the
results for genes expressed in both cell lines. The region
corresponding to a log2 change > –0.3 is shaded (gray); n, number of
genes outside the gray region. Genes significantly differentially
repressed are highlighted (blue) and tallied (number in parentheses).
In some cases, not all of the differentially repressed genes fit
within the plots.
(B) These panels are as in (A), but for control genes. For the
miR-124 transfections, mRNA changes are plotted for genes with
miR-155 sites (excluding any that contained sites to both miRNAs) and
vice versa.

To do so, an expected
difference was estimated using a permu-
tation test for each target mRNA (Tusher
et al., 2001). Then, a delta value (Δ), the
difference between these observed and
expected values, was calculated. This
Δ value thus combines both the magnitude
of the difference between the cell
lines and the variability associated with
each measurement (Figure S1C), and so
as it increases, the statistical confidence
in differential regulation also increases.
Importantly, for all pairs of cell lines that
we investigated, on average 1.1% (12)
and 5.8% (57) of predicted targets (with
a log2 change < –0.3 in either sample)
were differentially repressed with a Δ ≥
0.2 for miR-124 and miR-155, respectively
(Figures 1A and S1D; Table S1). In
contrast, on average, 0.1% and 0.3% of
genes with control sites were affected
differentially by miR-124 and miR-155,
respectively (Figure 1B). The lower frac-
tion of significantly differential targets for miR-124 targeting is
partly due to a higher variance between replicates observed
with miR-124 targeting (Figure S1E). In some miR-124 compari-
sons, hardly any predicted targets were differentially repressed
at these cutoffs. For example, when comparing the effects of
miR-124 in HeLa and HEK293 cells, only 4 of 1,169 coexpressed
predicted targets (with log2 change < –0.3 in either sample) were
significantly differentially repressed (false discovery rate [FDR] =
0.267; Figure 1A). In the miR-155 pairwise comparisons, more,
but still only a minority, of the predicted targets were differentially
affected. For instance, when comparing effects of miR-155 in
HeLa and HEK293 cells, 137 of the 991 coexpressed targets
were differentially regulated (Δ ≥ 0.2, FDR = 0.335; Figure 1A).
Similar results were obtained when we examined the effect of
miR-124 in IMR90 cells, a normal diploid fibroblast cell line (Fig-
ure S1F). Together, these data suggest that, although the
repression of some targets differs between cell lines, the miRNA-mediated
repression of most targets is not detectably affected
by the cellular environment.
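
The delta value (Δ) described above follows the permutation approach of Tusher et al. (2001). The sketch below is one minimal way to compute such a statistic, not the pipeline used in this study. It assumes a pooled-variance relative difference, a fixed stabilizing constant `s0`, and invented log2 fold changes; the function names are hypothetical. With two replicates per cell line, only six relabelings of the four samples exist, so the expected values are averaged over all of them.

```python
from itertools import combinations
from statistics import mean


def relative_difference(a, b, s0):
    """SAM-style statistic: difference of means over (pooled SD term + s0)."""
    mean_a, mean_b = mean(a), mean(b)
    ss = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    s = ((1 / len(a) + 1 / len(b)) * ss / (len(a) + len(b) - 2)) ** 0.5
    return (mean_a - mean_b) / (s + s0)


def delta_values(rows, n_a, s0=0.1):
    """Per-gene observed minus expected relative difference.

    rows: one list of log2 fold changes per gene; the first n_a values come
    from cell line A and the rest from cell line B.
    s0: stabilizing constant (an assumption; SAM tunes it from the data).
    """
    n_samples = len(rows[0])
    observed = [relative_difference(r[:n_a], r[n_a:], s0) for r in rows]

    # Expected order statistics: average the sorted statistics over every
    # way of relabeling the samples as "A" and "B".
    splits = list(combinations(range(n_samples), n_a))
    expected = [0.0] * len(rows)
    for idx_a in splits:
        idx_b = [j for j in range(n_samples) if j not in idx_a]
        permuted = sorted(
            relative_difference([r[j] for j in idx_a], [r[j] for j in idx_b], s0)
            for r in rows
        )
        for rank, d in enumerate(permuted):
            expected[rank] += d / len(splits)

    # Compare each gene's observed value with the expected value at its rank.
    ranked = sorted(range(len(rows)), key=lambda i: observed[i])
    deltas = [0.0] * len(rows)
    for rank, i in enumerate(ranked):
        deltas[i] = observed[i] - expected[rank]
    return deltas


# Invented fold changes: two replicates in cell line A, then two in B.
rows = [
    [-1.0, -1.1, -0.1, -0.2],    # repressed far more strongly in A
    [-0.5, -0.6, -0.5, -0.55],   # similar in both cell lines
    [-0.3, -0.2, -0.25, -0.3],   # similar in both cell lines
]
deltas = delta_values(rows, n_a=2)
```

In this toy input the first gene receives by far the largest absolute Δ, because its difference between cell lines is large relative to its replicate variability.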
3′ UTR Isoforms in Different Cell Types and Tissues
Because APA can affect the inclusion of regulatory sites in the 3′
UTR, we reasoned that some of the observed differential repression
was due to differential use of alternative 3′ UTRs. To identify
these cases, 3P-seq was used to quantify poly(A)-site usage in
the three human cell lines (HeLa, HEK293, and Huh7), as illus-
trated for LRRC1 (Figure 2A). The accuracy of 3P-seq for quanti-
fying alternative isoforms, previously inferred by its high accuracy
in quantifying mRNA levels (Spies et al., 2013; Ulitsky et al., 2012),
was further confirmed by comparison to the results of 3′-seq
(Lianoglou et al., 2013), which has been extensively validated with
RNA blots (Figures S2A–S2D). Although human 3′ UTRs are relatively
well annotated, our analysis improved these annotations: of
the mRNAs with poly(A) sites supported by at least ten 3P tags,
30% had major 3′ UTR isoforms that
were shorter than the RefSeq annotation,
and 10% had major isoforms that were
longer (Table S2C). Moreover, similar to
previous studies (Derti et al., 2012; Hoque
et al., 2013; Smibert et al., 2012; Ulitsky
et al., 2012), we found that in each cell
type, over half (51%–63%) of the genes
with 3P-seq-supported poly(A) sites had
multiple tandem isoforms that were each
supported by at least 1% of the tags (Figure S2E),
and 10,701 (70.1%) mRNAs displayed
APA in at least one cell type.

Figure 2. The 3′ UTR Landscape Affects miRNA Targeting
(A) Different AIRs for miR-124 sites in the LRRC1 gene in different
cell types. Shown is the RefSeq annotation track of LRRC1 (dark
blue), with the associated 3P tags from the three cell lines assayed
(above) and the corresponding AIRs (below).
(B and C) Extent to which APA affects miRNA site inclusion. Shown are
the number and percentage of sites for which AIRs for miR-124 (B) or
miR-155 (C) change by at least 0.3 in each pairwise cell-type
comparison. The arrows point to the cell line with the higher AIR,
and the width is proportional to the number of sites with
differential AIR.
(D–G) Relationship between AIR and miRNA-mediated repression. For
each site type, 8mer (D), 7mer-m8 (E), 7mer-A1 (F), and a
representative pair of control sites (G), predicted targets were
binned by their AIR. For each bin, the mean fold change mediated by
either miR-124 or miR-155 for each transfection of the various cell
lines (HEK293, HeLa, and Huh7) is plotted. The red line is the
least-squares best fit to the data (Pearson r², F test).
To confirm that this isoform heteroge-
neity resembled that found in other verte-
brates, we used our pipeline to analyze 3P-seq data sets from
two mouse cell lines (mouse embryonic stem cells [mESCs]
and NIH 3T3 cells; Tables S2D and S2F) and published data
sets from zebrafish tissues (brain, ovary, and testes) and devel-
opmental stages (2, 6, 24, and 72 hr postfertilization [hpf] and
adult) (Ulitsky et al., 2012). As with human poly(A)-site usage,
these data sets allowed further refinement of 3′ UTR ends from
those currently annotated in RefSeq (30% and 40% in mouse
and zebrafish, respectively; Tables S2G–S2I). Overall, the fraction
of mRNAs with multiple tandem 3′ UTR isoforms was similar
when comparing different cell lines, tissues, and vertebrate ani-
mals (Figures S2E–S2G).
Alternative Cleavage and Polyadenylation Affects
miRNA Targeting
By quantitatively measuring poly(A)-site usage, the 3P-seq
data sets allow examination of how APA varies in different
cellular contexts (Ulitsky et al., 2012). When comparing the 4
human cell lines, 1,708 (11.2%) of the mRNAs had different
dominant 3′ UTR ends (Figure S2H), and when comparing
weighted 3′ UTR lengths, each cell type had a unique 3′ UTR
length distribution (Figures S2I–S2K). Among the human cell
lines examined, Huh7 cells tended to have the shortest 3′
UTRs, and HEK293 cells the longest. Moreover, although the
percentage of genes with multiple UTR isoforms was relatively
constant between cell types, the identities of these genes and
the poly(A) sites used were more variable. Indeed, of the 7,563
mRNAs with multiple poly(A) sites in all 4 human cell lines,
51.2% had weighted 3′ UTR lengths that changed by more
than 100 nt (Figure S2L). As reported previously (Ulitsky et al.,
2012), weighted 3′ UTR length differences were especially
apparent during zebrafish development and in two mouse cell
lines (Figures S2M and S2N). Taken together, these results
confirmed that many transcripts have alternative 3′ UTR isoforms
and that 3′ UTR lengths change across different vertebrate
cell types and developmental stages.
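
The weighted 3′ UTR length is not defined in this section. The sketch below assumes it is the mean length of a gene's tandem isoforms, weighted by the 3P tags supporting each poly(A) site. The function name and the tag counts are hypothetical and serve only to show how a shift of more than 100 nt between two cell types might be detected.

```python
def weighted_utr_length(isoforms):
    """Tag-weighted mean 3' UTR length for one gene.

    isoforms: list of (utr_length_nt, tag_count) pairs, one per tandem isoform.
    Assumption: each isoform is weighted by its share of 3P tags.
    """
    total_tags = sum(tags for _, tags in isoforms)
    if total_tags == 0:
        raise ValueError("no 3P tags for this gene")
    return sum(length * tags for length, tags in isoforms) / total_tags


# Hypothetical gene with a short (400 nt) and a long (1,500 nt) isoform.
cell_type_1 = [(400, 80), (1500, 20)]
cell_type_2 = [(400, 95), (1500, 5)]

shift = weighted_utr_length(cell_type_1) - weighted_utr_length(cell_type_2)
changed = abs(shift) > 100  # the 100-nt cutoff quoted in the text
```

Under these invented counts the weighted lengths are 620 nt and 455 nt, so the gene would count among those whose weighted length changed by more than 100 nt.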
To determine the extent to which APA affects miRNA target-
ing, we developed a metric called the affected isoform ratio
(AIR), which, for each miRNA target site, indicates the fraction
of mRNA transcripts containing that site (Figure 2A). To calculate
AIRs, we first estimated the fraction of each tandem isoform
based on the fraction of 3P tags at its poly(A) site relative to all
the tags that mapped to the poly(A) sites contained within that
exon (Figure 2A). These isoform fractions were then used to
compute the 3′ UTR isoform ratio for different UTR regions in
which each constant region (present in all the tandem isoforms)
had an isoform ratio of 1.0, whereas each variable region had an
isoform ratio corresponding to the sum of the isoform fractions
spanning that region (Figure 2A). For each miRNA site, the AIR
was simply the isoform ratio at the region of the UTR containing
the site. Consistent with Huh7 cells generally expressing shorter
3′ UTR isoforms, of the 3′ UTR sites for miR-124, 154 and 191
had lower AIRs (AIR difference ≥ 0.3) in Huh7 cells than in
HeLa and HEK293 cells, respectively, but only 67 and 41 sites
had higher AIRs (Figure 2B). A similar result was observed with
miR-155 sites (Figure 2C).
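
The AIR calculation described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function names, the coordinate convention (nucleotides downstream of the stop codon), and the tag counts are assumptions made for the example.

```python
def isoform_fractions(tags_by_polya_site):
    """Fraction of 3P tags at each poly(A) site of one terminal exon.

    tags_by_polya_site: dict mapping poly(A)-site position (nt downstream of
    the stop codon; an assumed convention) to its 3P tag count.
    """
    total = sum(tags_by_polya_site.values())
    if total == 0:
        raise ValueError("no 3P tags mapped to this terminal exon")
    return {pos: tags / total for pos, tags in tags_by_polya_site.items()}


def affected_isoform_ratio(site_end, tags_by_polya_site):
    """Fraction of transcripts whose 3' UTR extends to or past the site.

    A site is contained in every isoform whose poly(A) site lies at or
    downstream of the site's 3' end, so a site in the constant region
    scores 1.0.
    """
    fractions = isoform_fractions(tags_by_polya_site)
    return sum(frac for pos, frac in fractions.items() if pos >= site_end)


# Hypothetical gene with three tandem poly(A) sites.
tags = {300: 60, 1200: 30, 2500: 10}
for site_end in (150, 800, 2000):
    print(site_end, round(affected_isoform_ratio(site_end, tags), 2))
# prints: 150 1.0, then 800 0.4, then 2000 0.1
```

A site upstream of the first poly(A) site lies in the constant region and receives an AIR of 1.0, whereas sites further downstream are weighted by only those isoforms long enough to contain them.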
To compare how miRNA targeting efficacy was affected by
APA within a cell type, genes with multiple 3′ UTR isoforms
were first partitioned by their site type; for genes containing
multiple sites, the best site type was chosen (with 8mer >
7mer-m8 > 7mer-A1). Within each site-type partition, genes
were binned by their AIRs, and the efficacies of sites within
each bin were compared. For each of the three site types,
mean repression correlated with AIR such that sites with higher
AIRs were more repressed than those with lower AIRs (Figures
2D–2G). Indeed, genes with sites having an AIR less than 0.25
were barely repressed by the corresponding miRNA. Similar results
were obtained with a large precompiled microarray data set
of miRNA/siRNA transfections (Garcia et al., 2011) (Figure S2O).
When the analysis was repeated 100 times, each time with a
different negative-control cohort in which genes lacking any
target sites (including 6mers) were selected and partitioned
based on a randomly selected pseudosite (e.g., Figure 2G),
repression and AIR never significantly correlated.
Sites near the middle of long 3′ UTRs mediate less repression
than those at the ends (Grimson et al., 2007). The distance between
the site and the nearest end of the 3′ UTR (referred to as
the minimum distance) is a feature incorporated into the model
of site efficacy used by TargetScan to rank target predictions
(Garcia et al., 2011; Grimson et al., 2007). Because this mini-
mum-distance feature depends on the poly(A) site, we reasoned
that APA might change this feature for some miRNA sites, with a
corresponding effect on site efficacy. When examining tran-
scripts with sites with minimum distances 25 nt shorter in
HEK293 cells than in HeLa cells, more repression was observed
in HEK293 cells than in HeLa cells (Figure S2Q); importantly,
these differences were not attributable to differential target-site
inclusion because the AIRs for these sites were unchanged
(<0.01). Correspondingly, genes with minimum distances that
were longer in HEK293 cells were more repressed in HeLa cells,
whereas genes not predicted to be targets were unaffected (Figure S2Q).
Together, these results indicate that APA, by shortening
and lengthening 3′ UTRs, affects both the inclusion and
the efficacy of miRNA sites.
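
To illustrate why the minimum-distance feature depends on the poly(A) site, the sketch below computes the distance from a site to the nearest 3′ UTR end for two isoforms that contain the same site. The function name, coordinates, and lengths are hypothetical, and the exact definition used by TargetScan may differ in detail.

```python
def min_distance(site_start, site_end, utr_length):
    """Distance (nt) from a site to the nearest end of the 3' UTR.

    site_start and site_end are measured from the start of the 3' UTR
    (just after the stop codon); utr_length is set by the poly(A) site.
    """
    if site_end > utr_length:
        raise ValueError("site is not contained in this isoform")
    return min(site_start, utr_length - site_end)


# The same site, nt 700-707, in a long and a short tandem isoform.
long_isoform = min_distance(700, 707, utr_length=2000)   # 700 nt
short_isoform = min_distance(700, 707, utr_length=900)   # 193 nt
```

The site itself is unchanged and is included in both isoforms, yet use of the proximal poly(A) site moves it much closer to a UTR end, which the model associates with greater efficacy.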
Incorporating Poly(A)-Site Usage Improves miRNA
Target Prediction
With the insights gained on the effects of APA on miRNA target-
ing (Figure S3A), we developed a revised prediction model,
called the ‘‘weighted context+’’ (or wContext+) model. This
model produced a cell-type-specific score for each site by
calculating its context+ score using TargetScan linear regression
models for each of its context and miRNA features (Garcia et al.,
2011) and then weighting this score by the AIR of the site in each
cell type (Figure 3A). For each miRNA, the wContext+ scores of
multiple sites were summed (disregarding positive scores) to
generate the total wContext+ score for each gene, in which the
scores with lower negative values indicated greater predicted
repression. To assess the advantage of weighting the scores
based on the AIRs, and thereby considering the isoform hetero-
geneity of each cell type, we compared the performance of the
wContext+ model with that of the current context+ model (Garcia
et al., 2011) applied to a single 3′ UTR isoform for each gene,
choosing either (1) the longest isoform annotated by RefSeq, (2)
the longest isoform determined by 3P-seq, or (3) the major 3′
UTR isoform determined by 3P-seq. On average, the wContext+
model outperformed the previous model by approximately 50%, and although
some of this improvement was attributable to more accurate
identification of the major 3′ UTR isoforms, most was attributable
to utilizing AIRs (Figure 3B). The wContext+ model also dis-
played better sensitivity and specificity when evaluating area
under the curve in receiver operating characteristic (ROC) plots
(Figure S3B).
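
The scoring scheme described above can be sketched as follows, assuming that the context+ score of each site has already been computed elsewhere. The function name and the context+ scores are hypothetical; the two AIR values (0.66 and 0.03) are those reported in the text for the CHURC1 miR-155 sites in HeLa and HEK293 cells.

```python
def total_wcontext_plus(sites):
    """Total wContext+ score for one gene, miRNA, and cell type.

    sites: list of (context_plus_score, air) pairs, one per site.
    Each score is weighted by the site's AIR; positive scores are
    disregarded, so more negative totals predict more repression.
    """
    total = 0.0
    for context_plus, air in sites:
        if context_plus < 0:
            total += context_plus * air
    return total


# Hypothetical context+ scores for one 8mer and two 7mer-m8 sites that
# all lie in the variable region of the 3' UTR.
scores = [-0.40, -0.20, -0.20]
hela = total_wcontext_plus([(s, 0.66) for s in scores])    # about -0.53
hek293 = total_wcontext_plus([(s, 0.03) for s in scores])  # about -0.02
```

Because the same sites receive very different weights in the two cell types, the gene is predicted to be repressed in one cell type and essentially unaffected in the other.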
Alternative Cleavage and Polyadenylation Is a Major
Cause of Differential miRNA Targeting
We next examined the extent to which differential poly(A)-site
usage caused differential miRNA targeting. Between any pair
of the human cell lines, the AIRs of 7%–10% of miR-124 sites
and 7%–12% of miR-155 sites changed by >30% (Figures 2B
and 2C). Similarly, 5%–9% of predicted miR-124 targets and
5%–10% of predicted miR-155 targets had wContext+ scores
differing by ≥0.1 (Figures S4A and S4B; Table S3). When we
repeated this analysis in mouse (with predicted miR-155 and
miR-223 sites in mESCs and NIH 3T3 cells) and in zebrafish (with
predicted miR-430 sites across the four developmental stages),
similar ranges were observed, indicating that in diverse vertebrate
species, APA affects approximately 10% of predicted miRNA target
sites when comparing two cell types (Figures S4C and S4D).
Of the 126 predicted targets that were differentially repressed
by miR-155, 11.1% had wContext+ scores with differences
≥0.03, a significant enrichment compared to that in nondifferential
miRNA targets (p = 0.004, hypergeometric test; Figure 4A).
For example, the CHURC1 gene had one 8mer and two 7mer-m8 sites
for miR-155, but these sites were only present in the longer of its
two major isoforms (Figure 4B). Because the longer isoform was
more prevalent in HeLa cells, 66% of CHURC1 transcripts con-
tained miR-155 target sites in HeLa cells, whereas only 3% con-
tained the sites in HEK293 cells (Figure 4B). The consequently
large difference in wContext+ scores explained why this gene
was repressed more strongly in HeLa than HEK293 cells (Fig-
ure 4C). Reciprocally, the longer isoforms of the ATAD2B gene
contained one 8mer and one 7mer-m8 site and were predomi-
nately expressed in HEK293 cells, whereas the short isoform
that lacked these regulatory sites was expressed in HeLa cells
(Figure 4D), and this gene was repressed more strongly in
HEK293 cells than in HeLa cells (Figure 4E). Similar examples
illustrating cases in which APA explained differential miRNA tar-
geting were found in all pairs of cell types examined (Figures 4F–
4I and S4E–S4Q).
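
The enrichment reported above relies on a hypergeometric test. The sketch below shows how an upper-tail p value of this kind can be computed with the standard library. The function name is hypothetical, and the population sizes in the example are invented, because the background counts are not given in this section; only the 126 differentially repressed targets, of which 11.1% (14) had differing wContext+ scores, come from the text.

```python
from math import comb


def hypergeom_upper_tail(k, n_marked, n_drawn, n_total):
    """P(X >= k) for a hypergeometric draw without replacement.

    n_total:  all predicted targets considered
    n_marked: targets whose wContext+ scores differ between cell types
    n_drawn:  differentially repressed targets
    k:        differentially repressed targets that are also marked
    """
    denominator = comb(n_total, n_drawn)
    numerator = sum(
        comb(n_marked, i) * comb(n_total - n_marked, n_drawn - i)
        for i in range(k, min(n_marked, n_drawn) + 1)
    )
    return numerator / denominator


# 126 and 14 follow the text; the totals (1,000 targets, 50 marked) are
# illustrative assumptions, so this p value is not the published one.
p_value = hypergeom_upper_tail(k=14, n_marked=50, n_drawn=126, n_total=1000)
```
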
APA, however, did not explain most differentially repressed
predicted targets (with Δ > 0.3; Table S3; Figure 4). These
mRNAs might have responded differently because other cell-
type-specific factors, such as RNA-binding proteins, differen-
tially modulated site efficacy in the two cell types. Alternatively,
these mRNAs might have had a similar direct response to the
miRNA and only appeared to be differentially repressed because
of differential secondary effects of transfecting the miRNA. For
example, in one cell type, the miRNA might have repressed a
transcriptional repressor, causing increased transcription of
the predicted target. Indeed, we observed that for many of these
cases, mRNAs were in fact upregulated in one of the two cell
lines (Figure S4P), supporting the idea that the differences
were mediated by secondary effects rather than differential site
efficacy. To distinguish between these possibilities, we used re-
porter assays to determine the extent to which the miRNA sites
themselves mediated differential repression. For 9 candidates,
we placed either wild-type or mutated sites, embedded in
500 nucleotides of the surrounding 3′ UTR, downstream of
Renilla luciferase and compared the repression mediated by
miR-155 in HEK293 and HeLa cells. Although six were signifi-
cantly repressed by miR-155 in both cell lines, only two (LPIN1
and LMBRD2) were significantly differentially repressed (Figure 4J;
p = 0.0004 and 1.11 × 10⁻⁵, respectively, Mann-Whitney
U test). Both were more repressed in HEK293 cells than in HeLa
cells, consistent with the RNA-seq results. Although these two
mRNAs are good candidates for APA-independent differential
repression, the paucity of such candidates suggests that most
instances of apparent differential repression are due to differen-
tial secondary effects rather than to modulations of miRNA tar-
geting efficacy.
AIR Correlates with Site Efficacy for Targets of
Endogenous miRNAs
To extend our results to the effects of miRNAs in their endoge-
nous contexts, we profiled both mRNA changes (by microarray)
and poly(A)-site usage (by 3P-seq) in six different tissues (heart,
kidney, liver, lung, muscle, and white adipose tissue [WAT]) from
wild-type and miR-22 knockout mice (Table S4) (Gurha et al.,
2012). As expected, predicted miR-22 targets were generally
upregulated in the knockout tissues (Figure S5A). Although
modest, this effect was significant in five of the six tissues
(muscle, heart, kidney, liver, and WAT) and most pronounced
for mRNAs with 8mer sites (Figure S5A).
Using the 3P-seq data sets, we generated tissue-specific 3′
UTR annotations. Interestingly, lung tissue had 1.5–2 times
more poly(A) sites than did the other tissues and mouse cell lines
(NIH 3T3 and mESCs), perhaps because of the more heteroge-
neous nature of this tissue. As observed with exogenously
delivered miRNAs, miRNA-mediated repression significantly
correlated with the AIR for 8mer and 7mer-m8 sites, but not for
negative-control sites (Figure 5A; p = 0.00056, 0.0012, and
0.880, respectively). An insignificant correlation for 7mer-A1
sites (p = 0.487) was attributed to the weak derepression
observed overall in the miR-22 data sets, which made it difficult
for a signal from this weaker site type to appear.
With these tissue-specific 3′ UTR annotations in mouse and
published ones from zebrafish, we developed and evaluated
wContext+ models for miR-22 targeting in mice and miR-430 targeting
in zebrafish embryos. Although the overall repression
differed in magnitude from that observed for the exogenous
miRNAs in human cells, with the magnitude of endogenous
miR-22 repression being much lower, and that of endogenous
miR-430 being much higher, the results resembled those
observed for targeting by exogenous miRNAs, with the
wContext+ model outperforming the context+ model for all tissues
except the kidney (Figure 5B). The greatest difference
was observed in the zebrafish embryo, where the wContext+
model outperformed the context+ model by more than 70% (Figure 5C,
r² = 0.194 and 0.112, respectively). As in human cell lines,
some of the improvement was attributable to more accurate
identification of the major 3′ UTR isoforms, but most was attributable
to considering the AIRs, which capture the heterogeneity
of the 3′ UTR landscape.

Figure 3. The Weighted Context+ Model Improves Target Prediction
(A) Calculation of wContext+ scores. For each site, the context+
score, calculated using the TargetScan linear regression model, is
weighted by a cell-type-specific AIR. For genes with multiple sites,
the scores for each individual site are added to yield the total
wContext+ score.
(B) Improved performance of the wContext+ model. Plotted are r²
values calculated from the correlation (Pearson r) between score and
observed change in the indicated transfection data set. For the
previous model (context+), three different 3′ UTR annotations were
used: the RefSeq annotation (dark blue); the longest isoform, as
determined by 3P-seq (light blue); and the major isoform, as
determined by 3P-seq (purple).

Figure 4. Differential miRNA-Mediated Repression Is Often Due to Alternative 3′ UTR Isoform Usage
(A) Genes with differential AIRs are enriched in genes that are
differentially repressed. This panel is as in Figure 1A, but
highlighting genes with significantly different repression that also
have wContext+ score differences ≥0.03 (orange).
(B) Higher AIR of CHURC1 miR-155 sites in HeLa compared to HEK293
cells. Otherwise, this panel is as in Figure 2A.
(C) Greater miR-155 repression of CHURC1 in HeLa cells. Plotted are
the wContext+ and expression change for CHURC1 in HeLa (pink) and
HEK293 (blue) cells.
(D) This panel is as in (B), except for ATAD2B, a gene with higher
AIR and greater miR-155 repression in HEK293 cells.
(E) This panel is as in (C), except for ATAD2B, a gene with higher
AIR and greater miR-155 repression in HEK293 cells.
(F) This panel is as in (A), except comparing changes mediated by
miR-124 in HeLa and HEK293 cells.
(G) This panel is as in (C), except for ANTXR2, a gene with higher
AIR and greater miR-124 repression in HeLa cells.
(H) This panel is as in (A), except comparing changes mediated by
miR-124 in HEK293 and HeLa cells.
(I) This panel is as in (C), except for CLDN1, a gene with higher AIR
and greater miR-124 repression in HeLa cells.
(J) Direct measurements of miR-155-mediated repression of 3′ UTR
segments from nine genes initially classified as differentially
regulated, despite having similar AIRs. Renilla luciferase reporters
followed by 3′ UTR segments (with either wild-type or mutated miR-155
sites) from the indicated genes were transfected into either HeLa or
HEK293 cells in the presence of the cognate (miR-155) or a noncognate
(miR-1) miRNA. Five genes were originally repressed more in HeLa
cells in the genome-wide analyses (highlighted in pink), and four
were originally repressed more in HEK293 cells (highlighted in blue).
Plotted are the normalized repression values, with error bars
representing the third largest and third smallest values.
Significance was calculated with the Mann-Whitney U test (*p < 0.05,
**p < 0.01, ***p < 0.001).
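
The model comparison above rests on the squared Pearson correlation between predicted scores and observed mRNA changes. The sketch below shows that calculation and the relative improvement implied by the two zebrafish values quoted in the text (0.194 and 0.112). The function names are hypothetical, and relative improvement is assumed to mean the fractional gain in r².

```python
def pearson_r2(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def relative_improvement(r2_new, r2_old):
    """Fractional gain of one model's r^2 over another's (assumed metric)."""
    return (r2_new - r2_old) / r2_old


# r^2 values quoted for the zebrafish embryo (wContext+ vs. context+).
gain = relative_improvement(0.194, 0.112)  # about 0.73, i.e. more than 70%
```
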
Alternative Cleavage and Polyadenylation Causes
Differential Repression by Endogenous miRNAs
To determine the extent to which repression by miRNAs in their
endogenous contexts varies between different tissues, we
applied the Δ value score to the miR-22 data sets, focusing on
the five tissues with significant repression. Although fold-change
signals were more variable and weaker than those observed in
the human cell lines, as judged by a higher Δ value cutoff, a
similar fraction of predicted targets showed differential repres-
sion in any pairwise comparison (7.7%, on average; Figures
S5B–S5F and Table S5). For instance, in comparing repression
mediated by miR-22 in liver and heart cells (Figure S5C), 74 of
545 genes with 7mer or 8mer sites in their 3′ UTRs were differentially
repressed (13.6%).
For each pair of cell types, APA affected a significant fraction
of differentially repressed predicted targets (Figures S5G–S5K,
p = 1.0 × 10⁻¹⁶ to 0.027). For instance, when comparing muscle
and heart cells, APA explained 12.3% of differentially repressed
targets (Figure S5G, p = 0.027). Mycbp, an example of such a
target, was effectively targeted in muscle cells, where its longer
isoform was more expressed, but not in the heart, where a
shorter isoform predominated (Figure S5G). Reciprocally,
Ctnnal1 was more effectively targeted in heart cells, where its
longer isoform was more expressed, than in the muscle (Fig-
ure S5G). Thus, as with exogenously delivered miRNAs, APA ex-
plained some of the observed differential repression.
3′ UTR Heterogeneity Measured in One Cell Type
Improves the Targeting Model for Other Cell Types
Despite clear examples of cell-type-specific 3′ UTR heterogeneity
(Figures 2 and 4), AIRs were often similar in diverse cells or
tissues, suggesting that for cells in which AIRs cannot be calcu-
lated (due to the lack of 3P-seq data), AIRs from other cell types
of the same species might still improve the targeting model. To
test this idea, we evaluated wContext+ models that were based
on noncognate human and mouse cell types with expression
changes by miRNAs observed in the cognate cells. Importantly,
wContext+ models based on the other cell types still outper-
formed the previous model (Figures 6A and 6B), presumably
because the advantage of considering constitutive isoform ratios
more than offset any disadvantage of training on noncognate
alternative ratios.
We then developed a murine wContext+ model, using AIRs
calculated from 3P-seq analysis of mESCs and NIH 3T3 cells,
and evaluated this model using data reporting mRNA changes
after deleting either miR-223 or miR-155 (Guo et al., 2010; John-
nidis et al., 2008; Rodriguez et al., 2007). As observed for
cognate cells, AIR and targeting efficacy were correlated such
that sites with higher AIRs in mESCs or 3T3 cells were more
derepressed in the knockout data sets (data not shown). More-
over, despite being based on noncognate AIRs from mESCs
Figure 5. Alternative 3′ UTR Isoform Usage Affects Targeting by Endogenous miRNAs
(A) Relationship between AIR and endogenous repression by miR-22. This panel is as in Figures 2D–2G, but comparing mRNA changes in mouse tissues (muscle,
heart, liver, kidney, white adipose tissue [WAT], and lung) with and without miR-22.
(B) Improved performance of the wContext+ model for predicting endogenous miR-22 targeting in mice. Otherwise, this panel is as in Figure 3B.
(C) Improved performance of the wContext+ model for predicting endogenous miR-430 targeting in zebrafish embryos. This panel is as in Figure 3B, except
analyzing predicted miR-430 targets in wild-type embryos and embryos that lack miR-430 (MZ-Dicer) at 9 hr postfertilization (hpf).
and NIH 3T3 cells, the wContext+ model outperformed context+
models for miR-155 and miR-223 targeting in different cell types
(Figure 6C). These results extended our conclusions to additional
instances of endogenous miRNA targeting. More importantly,
they extended the practical utility of considering isoform hetero-
geneity, showing that by exploiting similarities of isoform ratios
between different cell types, this approach can improve predic-
tions of targeting efficacy, even in cell types for which detailed
information on isoform heterogeneity has not yet been acquired
(which is the vast majority of cell types).
This being said, wContext+ models performed best when
tested on the cell type for which the isoform data had been
acquired (Figures 6A and 6B), presumably because extrapola-
tion of isoform information from one cell type to another fails to
capture key instances in which differential APA causes cell-
type-specific targeting. Indeed, when we repeated this compar-
ison, but this time excluding all genes initially classified as
differential targets, the cognate model still outperformed that
based on other cell types (Figure S6). Thus, differential APA
broadly underlies cell-type-specific targeting, affecting even
those genes that were not identified in our initial analysis as being
differentially regulated because the differences did not exceed
our threshold for statistical significance.
miRNA Targeting Can Affect the 3′ UTR Landscape
Having found that alternative isoform usage influenced miRNA
targeting, we tested whether the reciprocal relationship could
also be detected: does miRNA-mediated repression influence
isoform usage? To examine the effects of miR-22 on the
3′ UTR landscape, we compared 3P-seq data sets generated
from wild-type and miR-22 knockout mice for the five tissues
Figure 6. Considering Isoform Ratios Improves the Model of miRNA Targeting in Noncognate Cell Types
(A) The performance of non-cell-type-specific wContext+ models for exogenous miRNAs. A comparison of performance of the original context+ model (dark blue),
the cell-type-specific wContext+ model (pink), and the wContext+ model based on 3P-seq from other cell types (gray; error bars, SD). Otherwise, this panel is as in
Figure 3B.
(B) This panel is as in (A), but for endogenous targeting by murine miR-22.
(C) Non-cell-type-specific wContext+ model improves prediction of endogenous targeting mediated by miR-223 in neutrophils and miR-155 in B
and Th1 cells. Otherwise, this panel is as in (A).
in which significant miR-22 repression was observed (heart, kidney,
liver, muscle, and WAT). For all of these tissues, predicted targets
with sites in the variable region had longer weighted 3′ UTRs in the
miR-22 knockout mice. This lengthening was significant in comparison
to control sites (Figure 7; p = 0.0001–0.0096),
consistent with a model in which the
longer isoform(s) are specifically targeted and repressed in
wild-type, but not mutant, cells. We obtained similar results
when using 3P tags to quantify the preferential targeting of the
longer isoform of genes containing a site in their variable region
(Figures S7A and S7B).
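The comparison described above can be sketched as follows, under stated assumptions: the weighted 3′ UTR length of a gene is taken to be the mean isoform length weighted by 3P tag counts, and the distributions of knockout-minus-wild-type differences for site-containing and control genes are compared with a two-sample Kolmogorov-Smirnov statistic. The helper names and all numbers are hypothetical, and only the D statistic (not a p value) is computed here.

```python
from bisect import bisect_right


def weighted_utr_length(isoforms):
    """Mean 3' UTR length, weighting each isoform by its tag count.

    isoforms: list of (utr_length_nt, tag_count) pairs.
    """
    total = sum(count for _, count in isoforms)
    return sum(length * count for length, count in isoforms) / total


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical gene with a short (400 nt) and a long (1600 nt) isoform.
wild_type = [(400, 60), (1600, 40)]  # long isoform repressed by the miRNA
knockout = [(400, 50), (1600, 50)]   # long isoform derepressed
delta = weighted_utr_length(knockout) - weighted_utr_length(wild_type)

# Hypothetical per-gene length differences (knockout minus wild type, nt).
site_genes = [120.0, 80.0, 45.0, 150.0, 10.0]
control_genes = [5.0, -20.0, 15.0, -40.0, 30.0]
d_stat = ks_statistic(site_genes, control_genes)
```

A positive `delta` corresponds to lengthening in the knockout, and a large `d_stat` indicates that the site-containing and control distributions are well separated.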
We also examined the effects of miR-430 in zebrafish
embryos, which robustly represses its targets during the
maternal-to-zygotic transition (Giraldez et al., 2006). Similar to
that observed with murine miR-22, the 3′ UTR landscape was
shaped by miR-430 (Figures S7C–S7E). Consistent with a model
in which isoform usage has already been shaped by miR-430
repression by 6 hpf, wContext+ scores calculated with 2 hpf
3P-seq data were more predictive of miRNA-dependent expression
changes than those calculated with 6 hpf 3P-seq data
(Figure S7F). Together, these results demonstrate that repression by
miRNAs in the cytoplasm helps shape the relative expression of
UTR isoforms and highlight the interplay between these two
processes.
DISCUSSION
Differential expression of miRNAs and their mRNA targets clearly
provides an important mechanism to influence the target reper-
toire of the miRNAs. Less clear has been the extent to which
different cellular contexts additionally influence the targeting of
coexpressed mRNAs by coexpressed miRNAs. For both endog-
enously and exogenously expressed miRNAs, we found rela-
tively few site-containing, coexpressed genes with detectable
cell-type-specific differences in their responses. When identi-
fying a target as responding differently in two cellular contexts,
we considered the variance as well as the magnitude of the
difference in repression. One implication of this approach is that
as the number or accuracy of those measurements increases,
the lowered experimental uncertainty will enable additional dif-
ferential targets to be identified. However, our result of an overall
uniformity of target repression will not change, as most magni-
tudes of the newly detected differences will be smaller than
those currently detected.
For those targets that responded differentially, one important
mechanistic explanation is differential 3′ UTR isoform usage that
influences either the inclusion of sites or their placement within
more or less favorable contexts. Site-containing genes that
were affected by differential 3′ UTR isoform usage were significantly
enriched in the differentially repressed set. Furthermore,
differential isoform usage presumably affects many additional
genes that have differences too modest to be confidently iden-
tified in our initial analysis of differentially expressed genes.
Indeed, when comparing 3′ UTR isoforms observed in any
two cell types, approximately 10% of predicted targets are
likely to be affected by differential usage. Moreover, cognate
wContext+ models outperformed models that considered
constitutive isoform ratios (but not the cognate cell-type-
specific ratios), which demonstrated the importance of cell-
type-specific APA events on miRNA targeting, even for targets
that were not originally identified as responding differentially
(Figure S6).
More generally, despite known inter- and intracellular heterogeneity
in the 3′ UTR landscape and the corresponding effects
on regulatory site inclusion (Derti et al., 2012; Hoque et al.,
2013; Mayr and Bartel, 2009; Sandberg et al., 2008; Smibert
et al., 2012; Ulitsky et al., 2012), miRNA-target prediction has,
until this study, largely ignored the effects of alternative isoform
usage. With transcriptome-wide cell-type-specific 3′ UTR annotation
becoming more common, wContext+ models might eventually
be generated for each tissue or cell line of interest. In the
meantime, for the many cell types for which such annotations
are not yet available, predicting targets using isoform data
from noncognate cell types still improves performance over
previous algorithms because it enables consideration of consti-
tutive isoform ratios. Accordingly, the next version of TargetScan
will implement a non-cell-type-specific wContext+ model for
human, mouse, and fish predictions.
Studies to understand the mechanisms underlying the definition
of the 3′ UTR landscape have focused primarily on nuclear
events (i.e., cleavage and polyadenylation), since these are
the prime contributors in determining 3′ UTR isoform usage
(Berg et al., 2012; Bhattacharjee and Bag, 2012; Lee et al.,
2007). Nevertheless, we show that cytoplasmic events also
shape this landscape by differentially modulating the stability
of short and long isoforms. Repression mediated by miR-22
had statistically significant effects on the 3′ UTR landscape in
somatic tissues, but the effect of miRNA targeting was most
apparent in zebrafish embryos, where targeting by miR-430 is
especially robust. Perhaps the interplay between miRNA targeting
and 3′ UTR isoform usage has the greatest biological impact
during tightly regulated spatiotemporal processes, such as early
embryonic development.
Figure 7. Repression by miR-22 Shapes the 3′ UTR Landscape
(A–E) Influence of miR-22 targeting on 3′ UTR isoform usage. Weighted 3′ UTR lengths were determined using 3P-seq data from heart (A), liver (B), muscle (C),
kidney (D), and WAT (E). Plotted are the cumulative distributions of the differences in lengths (subtracting that of the wild-type tissue from that of the miR-22
knockout tissue) for genes with control sites in the variable region (gray) and those with miR-22 sites in the variable region (red). Significance was determined using
the Kolmogorov-Smirnov test.
The other mechanisms that might account for cell-type-spe-
cific effects of the miRNA can be grouped into two categories,
those involving actual differences in targeting itself and those
mediated through secondary effects of introducing the miRNA.
To distinguish between these two possibilities, we used
luciferase assays to isolate miRNA-mediated repression from
secondary effects, focusing on nine predicted targets that
responded differently to the miRNA despite uniform AIRs in
the two cell types. Only two of the nine retained differential tar-
geting in the luciferase assay, suggesting that most differential
effects not explained by alternative isoform ratios were the
result of secondary effects. These two genes, LPIN1 and
LMBRD2, are interesting candidates for future work in under-
standing, at the molecular level, how differences in cellular
context mediate differences in miRNA-target interactions.
Nonetheless, our observation of so few instances in which dif-
ferential targeting explained differential effects suggests that
miRNA targeting is remarkably uniform between cell types and
that a miRNA-target interaction identified in one cellular context
will generally hold in other contexts in which the target site is
present (i.e., has a high AIR) and the miRNA is expressed at a
level sufficient to guide repression.
Perhaps some miRNAs have target repertoires more substan-
tially affected by different cellular contexts, but we were unable
to identify any in our study, although we examined exogenously
and endogenously expressed miRNAs in a variety of tissues in
three different vertebrates. Indeed, in light of our results, the
initial example of differential targeting, that of Dnd1 modulating
miR-430 repression (Kedde et al., 2007), is now all the more
striking, as it appears to represent the exception rather than
the rule. Perhaps cellular contexts affect other types of posttran-
scriptional pathways to a greater extent. Are other regulatory
programs (such as that mediated by AU-rich elements) primarily
modulated by APA, or are these primarily influenced by the
expression of other 3′ UTR-binding proteins? These remain
important and unanswered questions for our understanding
and prediction of posttranscriptional regulation.
EXPERIMENTAL PROCEDURES
Cell Culture
HEK293 (ATCC), HeLa (ATCC), and Huh7 (Health Science Research Resource
Bank) cells were cultured as recommended by the manufacturer in Dulbecco’s
modified Eagle’s medium (DMEM) supplemented with 10% fetal bovine serum
(Clontech) and penicillin/streptomycin.
Plasmids
Plasmids were constructed as described (Supplemental Information).
miRNA Transfections
Cells were transfected with Lipofectamine 2000 (Invitrogen) and 100 nM
miRNA duplex or pUC19, as recommended by the manufacturer. After 24 hr,
cells were harvested, and RNA was extracted using TRI Reagent (Life
Technologies).
RNA-Seq Library Preparation
After RNA isolation, poly(A)+ RNA was selected using oligo(dT) beads
(Invitrogen). Strand-specific RNA-seq libraries were prepared as previously
described (Guo et al., 2010) or using a dUTP-based approach (Bioo Scientific)
according to the manufacturer’s directions.
3P-Seq Sample Preparation
RNA from wild-type and miR-22 knockout (Gurha et al., 2012) mouse tissues
was isolated by adding a steel bead and 1 ml of TRI Reagent to tissues and
then vortexing for 2 min in a TissueLyser II (QIAGEN) at 30 Hz twice. The homogenate
was centrifuged for 8 min at 12,000 × g, and the supernatant was
purified according to the manufacturer’s protocol, with an additional phenol/
chloroform extraction after phase separation. 3P-seq libraries were prepared
from 75 µg of isolated RNA (mouse tissues, mESC, NIH 3T3, HeLa, HEK293,
Huh7, IMR90 cells) as described previously (Jan et al., 2011) with modifica-
tions (see Supplemental Information).
Luciferase Assays
HEK293 and HeLa cells were plated in 24-well plates 24 hr prior to transfection.
Cells were transfected using Lipofectamine 2000 and Opti-MEM with 100 ng
of Renilla luciferase reporter plasmid and 20 ng of firefly luciferase control
reporter plasmid pIS0 (Grimson et al., 2007) per well. Cells were harvested
after 24 hr. Luciferase activities were measured using dual-luciferase assays,
as described by the manufacturer (Promega). Three or four biological
replicates, each with three technical replicates (i.e., three different wells
transfected on the same day), were performed. Renilla activity was first
normalized to firefly activity to control for transfection efficiency. As described
previously (Grimson et al., 2007), repression of the reporter with wild-type sites
was then additionally normalized to that of a reporter in which the sites were
mutated. Fold repression was calculated relative to that of the noncognate
miRNA.
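The normalization steps described above can be sketched as follows. This is a minimal illustration under assumptions: replicate wells are averaged with an arithmetic mean, and fold repression is expressed as the wild-type/mutant reporter ratio with the noncognate miRNA divided by the same ratio with the cognate miRNA, so that values above 1 indicate repression. The function names and readings are hypothetical.

```python
from statistics import mean


def normalized_signal(renilla, firefly):
    """Mean Renilla/firefly ratio across replicate wells."""
    return mean(r / f for r, f in zip(renilla, firefly))


def fold_repression(wt_cognate, mut_cognate, wt_noncognate, mut_noncognate):
    """Site-dependent repression by the cognate miRNA (assumed form).

    Each argument is a firefly-normalized Renilla signal. The wild-type
    reporter is first normalized to the mutant-site reporter, and the
    result is then expressed relative to the noncognate miRNA.
    """
    cognate_ratio = wt_cognate / mut_cognate
    noncognate_ratio = wt_noncognate / mut_noncognate
    return noncognate_ratio / cognate_ratio


# Hypothetical readings from three wells per condition.
firefly = [100.0, 100.0, 100.0]
wt_cog = normalized_signal([40.0, 50.0, 60.0], firefly)
mut_cog = normalized_signal([100.0, 100.0, 100.0], firefly)
wt_non = normalized_signal([95.0, 100.0, 105.0], firefly)
mut_non = normalized_signal([100.0, 100.0, 100.0], firefly)
fold = fold_repression(wt_cog, mut_cog, wt_non, mut_non)
```

With these toy readings, the wild-type reporter is halved only in the presence of the cognate miRNA, giving a fold repression of about 2.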
Mice
The mice harboring the null miR-22 mutant allele were described previously
(Gurha et al., 2012). All animal procedures were approved by the Baylor
College of Medicine Institutional Animal Care and Use Committee (Animal
Protocol 4930). Microarrays were carried out using Illumina Mouse WG-6
v1.1 Whole-Genome Expression BeadChips on 9-week-old miR-22 null and
wild-type mice as described previously (Gurha et al., 2012).
ACCESSION NUMBERS
The NCBI GEO accession number for the microarray data from wild-type and
miR-155 knockout B cells reported in this paper is GSE52940. Transcript
profiling by microarray for wild-type and miR-22 knockout mouse tissues is
deposited in EBI ArrayExpress as E-MTAB-2038. The NCBI GEO accession
number for the RNA-seq and 3P-seq data sets reported in this paper is
GSE52531.
SUPPLEMENTAL INFORMATION
Supplemental Information includes Supplemental Experimental Procedures,
seven figures, and five tables and can be found with this article online at
http://dx.doi.org/10.1016/j.molcel.2014.02.013.
ACKNOWLEDGMENTS
We thank the WI genome technology core for sequencing and members of the
Bartel and Nam labs for helpful comments and discussions. We also thank
C. Shin and D. Baek for providing B cell microarray data. This work was sup-
ported by the KRIBB Research Initiative Program and the Basic Science
Research Program through NRF, funded by the Ministry of Science, ICT &
Future Planning, awarded to J.-W.N. (NRF-2013R1A1A1010185), grants
from the NIH to D.P.B. and O.S.R. (RO1 GM067031 and K99 GM102319),
and an NSF Graduate Research Fellowship to V.A. D.P.B. is an investigator
of the Howard Hughes Medical Institute.
Received: November 4, 2013
Revised: January 27, 2014
Accepted: February 6, 2014
Published: March 13, 2014
REFERENCES
Baek, D., Villén, J., Shin, C., Camargo, F.D., Gygi, S.P., and Bartel, D.P. (2008).
The impact of microRNAs on protein output. Nature 455, 64–71.
Bartel, D.P. (2009). MicroRNAs: target recognition and regulatory functions.
Cell 136, 215–233.
Berg, M.G., Singh, L.N., Younis, I., Liu, Q., Pinto, A.M., Kaida, D., Zhang, Z.,
Cho, S., Sherrill-Mix, S., Wan, L., and Dreyfuss, G. (2012). U1 snRNP deter-
mines mRNA length and regulates isoform expression. Cell 150, 53–64.
Betel, D., Koppal, A., Agius, P., Sander, C., and Leslie, C. (2010).
Comprehensive modeling of microRNA targets predicts functional non-
conserved and non-canonical sites. Genome Biol. 11, R90.
Bhattacharjee, R.B., and Bag, J. (2012). Depletion of nuclear poly(A) bind-
ing protein PABPN1 produces a compensatory response by cytoplasmic
PABP4 and PABP5 in cultured human cells. PLoS ONE 7, e53036.
Chi, S.W., Hannon, G.J., and Darnell, R.B. (2012). An alternative mode of
microRNA target recognition. Nat. Struct. Mol. Biol. 19, 321–327.
Cooper, S.J., Trinklein, N.D., Nguyen, L., and Myers, R.M. (2007). Serum
response factor binding sites differ in three human cell types. Genome Res.
17, 136–144.
Derti, A., Garrett-Engele, P., Macisaac, K.D., Stevens, R.C., Sriram, S., Chen,
R., Rohl, C.A., Johnson, J.M., and Babak, T. (2012). A quantitative atlas of
polyadenylation in five mammals. Genome Res. 22, 1173–1183.
Farnham, P.J. (2009). Insights from genomic profiling of transcription factors.
Nat. Rev. Genet. 10, 605–616.
Friedman, R.C., Farh, K.K.-H., Burge, C.B., and Bartel, D.P. (2009). Most
mammalian mRNAs are conserved targets of microRNAs. Genome Res. 19,
92–105.
Garcia, D.M., Baek, D., Shin, C., Bell, G.W., Grimson, A., and Bartel, D.P.
(2011). Weak seed-pairing stability and high target-site abundance decrease
the proficiency of lsy-6 and other microRNAs. Nat. Struct. Mol. Biol. 18,
1139–1146.
Giraldez, A.J., Mishima, Y., Rihel, J., Grocock, R.J., Van Dongen, S., Inoue, K.,
Enright, A.J., and Schier, A.F. (2006). Zebrafish MiR-430 promotes deadenyla-
tion and clearance of maternal mRNAs. Science 312, 75–79.
Grimson, A., Farh, K.K.-H., Johnston, W.K., Garrett-Engele, P., Lim, L.P., and
Bartel, D.P. (2007). MicroRNA targeting specificity in mammals: determinants
beyond seed pairing. Mol. Cell 27, 91–105.
Gu, S., Jin, L., Zhang, F., Sarnow, P., and Kay, M.A. (2009). Biological basis for
restriction of microRNA targets to the 3′ untranslated region in mammalian
mRNAs. Nat. Struct. Mol. Biol. 16, 144–150.
Guo, H., Ingolia, N.T., Weissman, J.S., and Bartel, D.P. (2010). Mammalian
microRNAs predominantly act to decrease target mRNA levels. Nature 466,
835–840.
Gurha, P., Abreu-Goodger, C., Wang, T., Ramirez, M.O., Drumond, A.L., van
Dongen, S., Chen, Y., Bartonicek, N., Enright, A.J., Lee, B., et al. (2012).
Targeted deletion of microRNA-22 promotes stress-induced cardiac dilation
and contractile dysfunction. Circulation 125, 2751–2761.
Helwak, A., Kudla, G., Dudnakova, T., and Tollervey, D. (2013). Mapping the
human miRNA interactome by CLASH reveals frequent noncanonical binding.
Cell 153, 654–665.
Hendrickson, D.G., Hogan, D.J., McCullough, H.L., Myers, J.W., Herschlag,
D., Ferrell, J.E., and Brown, P.O. (2009). Concordant regulation of translation
and mRNA abundance for hundreds of targets of a human microRNA. PLoS
Biol. 7, e1000238.
Hoque, M., Ji, Z., Zheng, D., Luo, W., Li, W., You, B., Park, J.Y., Yehia, G., and
Tian, B. (2013). Analysis of alternative cleavage and polyadenylation by 3′ region
extraction and deep sequencing. Nat. Methods 10, 133–139.
Jan, C.H., Friedman, R.C., Ruby, J.G., and Bartel, D.P. (2011). Formation,
regulation and evolution of Caenorhabditis elegans 3’UTRs. Nature 469,
97–101.
Ji, Z., Lee, J.Y., Pan, Z., Jiang, B., and Tian, B. (2009). Progressive lengthening
of 3′ untranslated regions of mRNAs by alternative polyadenylation during
mouse embryonic development. Proc. Natl. Acad. Sci. USA 106, 7028–7033.
Ji, Z., Luo, W., Li, W., Hoque, M., Pan, Z., Zhao, Y., and Tian, B. (2011).
Transcriptional activity regulates alternative cleavage and polyadenylation.
Mol. Syst. Biol. 7, 534.
Johnnidis, J.B., Harris, M.H., Wheeler, R.T., Stehling-Sun, S., Lam, M.H.,
Kirak, O., Brummelkamp, T.R., Fleming, M.D., and Camargo, F.D. (2008).
Regulation of progenitor cell proliferation and granulocyte function by
microRNA-223. Nature 451, 1125–1129.
Kedde, M., Strasser, M.J., Boldajipour, B., Oude Vrielink, J.A.F., Slanchev, K.,
le Sage, C., Nagel, R., Voorhoeve, P.M., van Duijse, J., Ørom, U.A., et al.
(2007). RNA-binding protein Dnd1 inhibits microRNA access to target
mRNA. Cell 131, 1273–1286.
Kedde, M., van Kouwenhove, M., Zwart, W., Oude Vrielink, J.A.F., Elkon, R.,
and Agami, R. (2010). A Pumilio-induced RNA structure switch in p27-3′
UTR controls miR-221 and miR-222 accessibility. Nat. Cell Biol. 12, 1014–1020.
Kertesz, M., Iovino, N., Unnerstall, U., Gaul, U., and Segal, E. (2007). The role
of site accessibility in microRNA target recognition. Nat. Genet. 39, 1278–
1284.
Khorshid, M., Hausser, J., Zavolan, M., and van Nimwegen, E. (2013). A bio-
physical miRNA-mRNA interaction model infers canonical and noncanonical
targets. Nat. Methods 10, 253–255.
Landgraf, P., Rusu, M., Sheridan, R., Sewer, A., Iovino, N., Aravin, A., Pfeffer,
S., Rice, A., Kamphorst, A.O., Landthaler, M., et al. (2007). A mammalian
microRNA expression atlas based on small RNA library sequencing. Cell
129, 1401–1414.
Lee, J.Y., Yeh, I., Park, J.Y., and Tian, B. (2007). PolyA_DB 2: mRNA polyade-
nylation sites in vertebrate genes. Nucleic Acids Res. 35 (Database issue),
D165–D168.
Lewis, B.P., Burge, C.B., and Bartel, D.P. (2005). Conserved seed pairing,
often flanked by adenosines, indicates that thousands of human genes are
microRNA targets. Cell 120, 15–20.
Lianoglou, S., Garg, V., Yang, J.L., Leslie, C.S., and Mayr, C. (2013).
Ubiquitously transcribed genes use alternative polyadenylation to achieve
tissue-specific expression. Genes Dev. 27, 2380–2396.
Loeb, G.B., Khan, A.A., Canner, D., Hiatt, J.B., Shendure, J., Darnell, R.B.,
Leslie, C.S., and Rudensky, A.Y. (2012). Transcriptome-wide miR-155 binding
map reveals widespread noncanonical microRNA targeting. Mol. Cell 48,
760–770.
Majoros, W.H., Lekprasert, P., Mukherjee, N., Skalsky, R.L., Corcoran, D.L.,
Cullen, B.R., and Ohler, U. (2013). MicroRNA target site identification by inte-
grating sequence and binding information. Nat. Methods 10, 630–633.
Mayr, C., and Bartel, D.P. (2009). Widespread shortening of 3’UTRs by alterna-
tive cleavage and polyadenylation activates oncogenes in cancer cells. Cell
138, 673–684.
Miles, W.O., Tschöp, K., Herr, A., Ji, J.-Y., and Dyson, N.J. (2012). Pumilio
facilitates miRNA regulation of the E2F3 oncogene. Genes Dev. 26, 356–368.
Miyamoto, S., Chiorini, J.A., Urcelay, E., and Safer, B. (1996). Regulation of
gene expression for translation initiation factor eIF-2 alpha: importance of
the 3′ untranslated region. Biochem. J. 315, 791–798.
Nielsen, C.B., Shomron, N., Sandberg, R., Hornstein, E., Kitzman, J., and
Burge, C.B. (2007). Determinants of targeting by endogenous and exogenous
microRNAs and siRNAs. RNA 13, 1894–1910.
Rodriguez, A., Vigorito, E., Clare, S., Warren, M.V., Couttet, P., Soond, D.R.,
van Dongen, S., Grocock, R.J., Das, P.P., Miska, E.A., et al. (2007).
Requirement of bic/microRNA-155 for normal immune function. Science
316, 608–611.
Sandberg, R., Neilson, J.R., Sarma, A., Sharp, P.A., and Burge, C.B. (2008).
Proliferating cells express mRNAs with shortened 3′ untranslated regions
and fewer microRNA target sites. Science 320, 1643–1647.
Shepard, P.J., Choi, E.-A., Lu, J., Flanagan, L.A., Hertel, K.J., and Shi, Y.
(2011). Complex and dynamic landscape of RNA polyadenylation revealed
by PAS-Seq. RNA 17, 761–772.
Shin, C., Nam, J.-W., Farh, K.K.-H., Chiang, H.R., Shkumatava, A., and Bartel,
D.P. (2010). Expanding the microRNA targeting code: functional sites with
centered pairing. Mol. Cell 38, 789–802.
Smibert, P., Miura, P., Westholm, J.O., Shenker, S., May, G., Duff, M.O., Zhang,
D., Eads, B.D., Carlson, J., Brown, J.B., et al. (2012). Global patterns of tissue-
specific alternative polyadenylation in Drosophila. Cell Rep 1, 277–289.
Spies, N., Burge, C.B., and Bartel, D.P. (2013). 3′ UTR-isoform choice has
limited influence on the stability and translational efficiency of most mRNAs
in mouse fibroblasts. Genome Res. 23, 2078–2090.
Tian, B., Hu, J., Zhang, H., and Lutz, C.S. (2005). A large-scale analysis of
mRNA polyadenylation of human and mouse genes. Nucleic Acids Res. 33,
201–212.
Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance analysis of micro-
arrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. USA
98, 5116–5121.
Ulitsky, I., Shkumatava, A., Jan, C.H., Subtelny, A.O., Koppstein, D., Bell,
G.W., Sive, H., and Bartel, D.P. (2012). Extensive alternative polyadenylation
during zebrafish development. Genome Res. 22, 2054–2066.
van Dongen, S., Abreu-Goodger, C., and Enright, A.J. (2008). Detecting
microRNA binding and siRNA off-target effects from expression data. Nat.
Methods 5, 1023–1025.
Appendix 2. Assessing the ceRNA hypothesis with quantitative measurements of 
miRNA and target abundance 
Rémy Denzler1,2, Vikram Agarwal3,4,5, Joanna Stefano3,4, David P Bartel3,4, and Markus 
Stoffel1,2 
1Institute of Molecular Health Sciences, ETH Zurich, Otto-Stern-Weg 7, HPL H36, 8093 
Zurich, Switzerland 
2Competence Center of Systems Physiology and Metabolic Disease, ETH Zurich, Otto-
Stern-Weg 7, 8093 Zurich, Switzerland 
3Howard Hughes Medical Institute and Whitehead Institute for Biomedical Research, 
Cambridge, MA 02142, USA 
4Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, 
USA 
5Computational and Systems Biology Program, Massachusetts Institute of Technology, 
Cambridge, MA 02139, USA 
V.A. performed computational analysis. R.D. performed experiments. J.S. generated RNA 
sequencing data. R.D. and M.S. designed the study. R.D., V.A., D.P.B., and M.S. wrote 
the manuscript. 
Published as: 
Denzler R, Agarwal V, Stefano J, Bartel DP, Stoffel M. "Assessing the ceRNA 
hypothesis with quantitative measurements of miRNA and target abundance". 2014. 
Molecular Cell 54(5):766-776. 
Molecular Cell
Article
Assessing the ceRNA Hypothesis with Quantitative
Measurements of miRNA and Target Abundance
Rémy Denzler,1,2 Vikram Agarwal,3,4,5 Joanna Stefano,3,4 David P. Bartel,3,4,* and Markus Stoffel1,2,*
1Institute of Molecular Health Sciences, ETH Zurich, Otto-Stern-Weg 7, HPL H36, 8093 Zurich, Switzerland
2Competence Center of Systems Physiology and Metabolic Disease, ETH Zurich, Otto-Stern-Weg 7, 8093 Zurich, Switzerland
3Howard Hughes Medical Institute and Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
4Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
5Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
*Correspondence: dbartel@wi.mit.edu (D.P.B.), stoffel@biol.ethz.ch (M.S.)
http://dx.doi.org/10.1016/j.molcel.2014.03.045
SUMMARY
Recent studies have reported that competitive
endogenous RNAs (ceRNAs) can act as sponges
for a microRNA (miRNA) through their binding sites
and that changes in ceRNA abundances from indi-
vidual genes can modulate the activity of miRNAs.
Consideration of this hypothesis would benefit from
knowing the quantitative relationship between a
miRNA and its endogenous target sites. Here, we
altered intracellular target site abundance through
expression of an miR-122 target in hepatocytes and
livers and analyzed the effects on miR-122 target
genes. Target repression was released in a
threshold-like manner at high target site abundance
(≥1.5 × 10⁵ added target sites per cell), and this
threshold was insensitive to the effective levels of
the miRNA. Furthermore, in response to extreme
metabolic liver disease models, global target site
abundance of hepatocytes did not change suffi-
ciently to affect miRNA-mediated repression. Thus,
modulation of miRNA target abundance is unlikely
to cause significant effects on gene expression and
metabolism through a ceRNA effect.
INTRODUCTION
MicroRNAs (miRNAs) are an abundant class of small noncoding
RNAs that regulate gene expression at the levels of mRNA stabil-
ity and translation (Pillai et al., 2005; Eulalio et al., 2008; Guo
et al., 2010). They pair to target sites (referred to as miRNA
response elements [MREs]) within mRNAs to direct the posttran-
scriptional downregulation of these mRNA targets. The human
genome has more than 500 miRNA genes, and miRNAs from in-
dividual gene families are able to target hundreds of different
messenger RNAs (Baek et al., 2008; Friedman et al., 2009). Given
that more than half of all human mRNAs are estimated to be
conserved miRNA targets, miRNAs are thought to have wide-
spread effects on gene regulation (Friedman et al., 2009). Even
though many miRNA knockout models show no apparent defect
under normal conditions, they frequently exhibit miRNA-depen-
dent phenotypes when specific stresses are applied (Li et al.,
2009; Brenner et al., 2010). Therefore, miRNAs are proposed
to be critical regulators in stress signal mediation and modula-
tion, where inadequate miRNA levels and responses can cause
or exacerbate disease (Mendell and Olson, 2012).
Highly expressed site-containing RNAs, either found naturally
or delivered as research reagents, can act as ‘‘sponges’’ to
titrate miRNAs away from other normal targets (Ebert et al.,
2007; Franco-Zorrilla et al., 2007; Mukherji et al., 2011; Hansen
et al., 2013; Memczak et al., 2013). Theoretical and experimental
reports have claimed that crosstalk between site-containing
RNAs extends far beyond a few highly expressed sponges.
Analyses of high-throughput data sets indicate that the activity
of a miRNA is not just dependent on its levels but also its relative
target site abundance (TA), defined as the relative number of
sites within the transcriptome for that miRNA (Arvey et al.,
2010; Garcia et al., 2011). One hypothesis suggests that this
crosstalk has a widespread regulatory function, with the act of
titrating miRNAs away from their other targets somehow explaining
why so many target sites have been conserved in evolution
(Seitz, 2009). This idea is extended to the notion that many
miRNA targets act as competitive endogenous RNAs (ceRNAs)
that modulate the repression of other targets as their expression
increases or decreases (Salmena et al., 2011; Tay et al., 2011).
Experimental evidence for such a ceRNA crosstalk was initially
described for the tumor-suppressor gene PTEN, which appears
to be regulated by the abundance of its pseudogene (PTENP1) in
a DICER-dependent manner (Poliseno et al., 2010). Recent
studies have reported the potential physiological relevance of
other ceRNAs, including a long noncoding RNA that regulates
muscle differentiation (Cesana et al., 2011), an overexpressed
3′ untranslated region (3′ UTR) inducing cancer in transgenic
mice (Fang et al., 2013), and a circular RNA (circRNA) regulating
miR-7 activity in the CNS (Hansen et al., 2013; Memczak et al.,
2013). However, such studies have used cancer cell lines with
abnormal miRNA and ceRNA expression (Poliseno et al., 2010;
Karreth et al., 2011), leaving their physiological relevance in pri-
mary cells unclear.
The ceRNA hypothesis is controversial because it is difficult to
imagine how the change in expression of individual miRNA targets,
which each typically contribute a minuscule fraction of the
TA, could possibly influence enough miRNA molecules to affect
Molecular Cell 54, 766–776, June 5, 2014 ©2014 Elsevier Inc.
regulation of other targets. Consideration of the ceRNA hypoth-
esis would clearly benefit from quantitative knowledge of the
intracellular relationship of miRNAs and their corresponding
target sites. Although some attempts have been undertaken to
evaluate this relationship, the data were typically acquired
in silico (Ala et al., 2013; Figliuzzi et al., 2013), in vitro with purified
components (Wee et al., 2012), or in experimental setups in
which rapidly dividing cells were transfected with synthetic
miRNAs, which complicate any interpretations more quantitative
than relative comparisons (Arvey et al., 2010; Garcia et al., 2011;
Tay et al., 2011). A more recent study not subject to these limita-
tions reported that miRNA efficacy tended to be higher for
miRNAs with lower predicted target:miRNA ratios but did not
address the question of how much change in ceRNA might be
required to detectably influence miRNA efficacy (Mullokandov
et al., 2012).
In this study, we analyzed the stoichiometric relationship of
miR-122 and its target sites by manipulating TA through
controlled expression of a validated target of miR-122 in primary
hepatocytes and livers. miR-122 has been linked to important
human diseases, such as hepatitis C, liver cancer, and hypercholesterolemia,
and its target genes have been well characterized
(Jopling et al., 2005; Krützfeldt et al., 2005; Esau et al., 2006;
Tsai et al., 2009). Our absolute quantification of relevant entities
in primary cells and disease states provided insights on the rela-
tionship between miR-122 TA and miR-122 activity. These
results will facilitate future studies predicting the biologically
relevant range of TAs of other miRNAs and the magnitude of
change in target abundance required to influence gene expres-
sion through a ceRNA mechanism.
RESULTS
miRNA Target Derepression Is Detected at a High
Threshold of Added MREs
To assess the relationship between a miRNA and its MREs and
the effect of this relationship on target gene regulation, we chose
the highly expressed liver-specific miR-122 as a model system.
We manipulated endogenous MREs in a controlled manner by
overexpressing a full-length AldolaseA (AldoA) mRNA, a strong
and validated target of miR-122 (Krützfeldt et al., 2005), using recombinant
adenoviruses (Ad-AldoA) carrying either a mutated
(Mut), one (1s), or three (3s) miR-122 binding site(s) (Figures 1A
and S1A). To eliminate potential off-target effects mediated by
the AldoA protein, we introduced a premature stop codon that
prevented translation of AldoA protein (Figure S1B).
To assess the stoichiometric relationship of miR-122 and the
added MREs in primary hepatocytes, we measured the absolute
number of these entities per cell. Quantitative RT-PCR measure-
ments calibrated with an internal standard curve of synthetic
miRNA revealed that miR-122 was expressed at 1.2 × 10^5 molecules
per cell (Figure 1B), which was comparable to levels previously
reported (Bissels et al., 2009). As expected, miR-16 and
miR-33 were each expressed at fewer copies per cell (1.1 × 10^4
and 1.2 × 10^3, respectively). Next, we measured the increased
miR-122 target abundance after infecting hepatocytes with Ad-
AldoA at three different multiplicities of infection (MOI; 2, 20,
and 200) with our constructs that introduced zero, one, or three
miR-122 MREs per AldoA transcript. Adenovirus constructs
showed very high transduction efficiencies (Figure S2A), and a
linear correlation was observed between viral dose and green
fluorescent protein (GFP) mRNA, which was expressed from
an independent promoter in the Ad-AldoA vector (Figure 1C).
Similar results were observed when monitoring GFP protein
levels (Figures S2B and S2C). At MOI 200, AldoA transcripts
increased from 3.3 × 10^3 (endogenous levels) to 0.8–1.1 × 10^6
molecules per cell (Figure 1D), introducing up to 2.6 × 10^6 AldoA
MREs per cell (Figure 1E). The ratio of AldoA to GFP mRNA
showed that the AldoA transcripts were repressed in an MRE-
dependent manner at MOI 2 and 20, which confirmed that
miR-122 was functionally engaging the MREs within these tran-
scripts (Figure 1F). This regulation disappeared at MOI 200,
suggesting that, at this very high MOI, AldoA transcript over-
whelmed the regulatory capacity of miR-122 (Figure 1F). Quanti-
fication of miR-122 confirmed that the loss of regulation was not
due to a loss in miR-122; even at very high levels, Ad-AldoA did
not influence the levels of either miR-122 or two control miRNAs,
although it did reduce miR-33 by 2-fold (Figure 1G).
Having observed a loss in AldoA repression at a high MOI, we
reasoned that high levels of AldoA transcript could act as a
sponge to also derepress cellular miR-122 targets. Indeed,
known miR-122 targets, but not a control transcript ApoM,
increased at a high MOI (Figures 1H and S2D). Interestingly,
this derepression was confidently detected only when AldoA
MREs exceeded 1.5–2.7 × 10^5 per cell. This threshold corresponded
to 1.25–2.25 MREs per miR-122 molecule. Once this
threshold was exceeded, additional AldoA MREs led to greater
miR-122 target derepression, and the magnitude correlated
with the number of miR-122 sites introduced by AldoA tran-
scripts. Altogether, these data demonstrate that derepression
mediated through increased expression of a miR-122 target
can occur but can be detected only after exceeding a high
threshold of added MREs.
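The stoichiometry quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical and not part of the original analysis, and the inputs are the values reported in the text (the miR-122 copy number from Figure 1B and the range of added AldoA MREs at which derepression was first detected).

```python
# Illustrative arithmetic only; the function name is hypothetical.
MIR122_COPIES_PER_CELL = 1.2e5  # miR-122 molecules per cell, as reported for Figure 1B


def mres_per_mirna(added_mres_per_cell, mirna_copies_per_cell=MIR122_COPIES_PER_CELL):
    """Return the number of added MREs per miRNA molecule."""
    return added_mres_per_cell / mirna_copies_per_cell


low = mres_per_mirna(1.5e5)   # lower bound of the reported threshold range
high = mres_per_mirna(2.7e5)  # upper bound of the reported threshold range
print(f"{low:.2f}-{high:.2f} added MREs per miR-122 molecule")
```

Under these inputs the ratio reproduces the 1.25–2.25 range stated in the text.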
The High Threshold Persists after Lowering miR-122
Activity
Two scenarios might explain the high threshold of added MREs
required to observe endogenous target derepression. The
‘‘excess miRNA’’ scenario posits that very abundant miRNAs
are present in excess over their targets, and thus competing
MREs would need to titrate this excess binding capacity before
they could exert an observable effect on endogenous target
repression. Our case of miR-122 in hepatocytes would be one
of the more attractive candidates for this scenario, given that
miR-122 is the most abundant miRNA in hepatocytes (Landgraf
et al., 2007). Indeed, its abundance of 1.2 × 10^5 molecules per
cell is among the highest reported for a miRNA in any mammalian
system. The second scenario is the ‘‘high TA’’ scenario. In this
scenario, the effective number of miRNA binding sites within
cellular transcripts is so high that even highly expressed miRNAs
are mostly bound to a site at any moment in time, and thus the
number of competing MREs would need to approach this high
effective number of sites before the competing MREs could exert
an observable impact on endogenous target repression. The
idea of many miRNA binding sites within cellular transcripts is
supported by reports that many miRNAs have hundreds of
Molecular Cell
Quantitative Evaluation of the ceRNA Hypothesis
conserved MREs (Friedman et al., 2009), miRNAs also repress
many additional mRNAs with nonconserved MREs (Farh et al.,
2005; Krützfeldt et al., 2005; Giraldez et al., 2006; Baek et al.,
2008), and high-throughput crosslinking identifies many addi-
tional binding sites that would not be classified as MREs
because they don’t mediate detectable repression (including
many sites within open reading frames and marginally effective
sites elsewhere) but would nonetheless add to the effective num-
ber of binding sites (Hafner et al., 2010). These two scenarios
predict two very different responses to miRNA reduction. In the
excess miRNA scenario, miRNA reduction would lower the
excess miRNA capacity and thereby lower the threshold of
added MREs required to observe endogenous target derepres-
sion. In the high TA scenario, the effective number of sites
already exceeds the miRNA abundance, and, more importantly,
the threshold relates to the effective number of binding sites and
[Figure 1 graphic omitted. Panels: (A) construct schematic (CMV promoter, AldoA, pA; Mut, 1s, and 3s miR-122 sites; 8 nt sites with 17 nt spacers); (B) detected miRNA versus synthetic miRNA added (copies/cell) for miR-122, miR-16, and miR-33; (C) fold change (GFP/36b4) versus MOI; (D) copies per cell (AldoA) versus MOI; (E) AldoA MRE per cell versus MOI; (F) fold change (AldoA/GFP) versus MOI; (G) fold change (miRNA/sno202) versus MOI for miR-122, miR-16, miR-33, and miR-107; (H) fold change (Gene/36b4) versus AldoA MRE per cell for Gys1, Ndrg3, P4ha1, Slc7a1, Tmed3, Ccng1, and ApoM.]
Figure 1. miRNA Target Derepression Is Detected at a High Threshold of Added MREs
(A) Schematic overview of the different AldoA-expressing adenovirus constructs (Ad-AldoA) harboring either one (1s, blue) or three (3s, green) miR-122 binding
sites or a mutated site (Mut, red). Ad-AldoA 3s contained three 8 nt seed matches of miR-122 separated by 17 nt spacers. See also Figure S1.
(B) Absolute miRNA quantification of primary hepatocyte cell lysates spiked with different amounts of synthetic miRNA. Solid lines represent linear regression
data with respective 95% confidence intervals.
(C–H) Primary hepatocytes infected with different multiplicities of infection (MOI) of the Ad-AldoA constructs. Relative gene expression of GFP (C) and AldoA (F)
and absolute copy numbers per cell of AldoA (D) and AldoA MRE (E). Relative expression of miRNAs (G) or miR-122 target genes and a control nontarget gene
(ApoM) (H). See also Figure S2.
GFP and miRNA expression are relative to Ad-AldoA Mut at MOI 2; AldoA, miR-122 target genes and the control gene are relative to the respective Ad-AldoA Mut
at given MOI. Data represent mean ± SEM (n = 3) for all panels.
not the number of miRNA molecules. Thus, in this scenario,
miRNA reduction would lower the degree to which targets are
repressed, but it would not lower the threshold of added MREs
required to observe derepression.
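The contrast between the two scenarios can be illustrated with a toy titration model. This is a minimal sketch under strong assumptions and is not the model used in this study: every miRNA molecule is assumed to bind a site whenever a free site exists, repression is assumed to track per-site occupancy, and the endogenous site pools (3.0 × 10^4 for the "excess miRNA" case, 4.4 × 10^5 for the "high TA" case) are hypothetical values chosen only to place the cell in one regime or the other.

```python
# Toy titration model; NOT the authors' model. Endogenous site pools are hypothetical.

def occupancy(mirna, sites):
    """Fraction of sites bound, assuming each miRNA binds one site whenever one is free."""
    return min(1.0, mirna / sites)


def half_derepression_threshold(mirna, endogenous_sites):
    """Added sites needed to halve per-site occupancy relative to the unperturbed cell."""
    baseline = occupancy(mirna, endogenous_sites)
    # Occupancy drops to baseline / 2 once the total site pool reaches this size.
    total_sites_needed = mirna / (baseline / 2.0)
    return total_sites_needed - endogenous_sites


# "Excess miRNA" scenario: miRNA copies exceed a small pool of sites.
excess_full = half_derepression_threshold(1.2e5, 3.0e4)
excess_reduced = half_derepression_threshold(4.0e4, 3.0e4)   # 3-fold less miRNA

# "High TA" scenario: a large pool of sites exceeds the miRNA copies.
high_ta_full = half_derepression_threshold(1.2e5, 4.4e5)
high_ta_reduced = half_derepression_threshold(4.0e4, 4.4e5)  # 3-fold less miRNA

print(f"excess miRNA: {excess_full:.3g} -> {excess_reduced:.3g} added sites")
print(f"high TA:      {high_ta_full:.3g} -> {high_ta_reduced:.3g} added sites")
```

In this simplified picture, lowering the miRNA level shrinks the threshold in the excess miRNA scenario but leaves it unchanged in the high TA scenario, where the threshold equals the endogenous site pool.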
By analyzing whether a change in miRNA levels influences the
threshold for the number of added AldoA MREs needed for derepression,
we sought to experimentally evaluate which scenario
applies. We injected three different amounts (low, intermediate,
and high) of Antagomir-122 (Ant-122) into mice and found that
miR-122 levels detected in the primary hepatocytes were
reduced to 0.3, 0.08, and 0.01 of that observed in hepatocytes
from mice injected with the mismatch Ant-122 control (Ant-
122mm; Figure 2A) (Krützfeldt et al., 2005). Target gene derepression
correlated with decreased miR-122 levels, which
confirmed that our miR-122 quantification reflected miR-122
activity (Figure 2B). Next, we studied the effect of controlled
overexpression of AldoA MRE on target gene derepression in
hepatocytes with a modest 3-fold decrease in miR-122 levels.
Interestingly, derepression was detected only when exceeding
the threshold of 2 × 10^5 AldoA MREs per cell (Figure 2C). This
[Figure 2 graphic omitted. Panels: (A) copies per cell (miRNA) for miR-122 and miR-16 after Ant-122 low, intermediate, or high, or Ant-122mm high; (B) fold change (Gene/36b4) versus miR-122 (copies/cell) for Slc7a1, P4ha1, Gys1, AldoA, Ndrg3, Dyrk2, and ApoM; (C) fold change (Gene/36b4) versus AldoA MRE per cell for Gys1, Slc7a1, Ndrg3, Tmed3, Ccng1, and Snrk; (D) copies per cell (AldoA) versus miR-122 (copies/cell); (E) AldoA MRE per cell versus miR-122 (copies/cell); (F) fold change (Gene/36b4) versus miR-122 (copies/cell) for Gys1, Slc7a1, Ndrg3, Tmed3, Ccng1, and Snrk.]
Figure 2. The High Threshold Persists after Lowering miR-122 Activity
(A) Absolute miRNA copy numbers per cell or (B) relative expression of miR-122 target genes and control nontarget genes (Dyrk2 and ApoM) in primary
hepatocytes from mice treated with Ant-122mm or different concentrations of Ant-122. Values for miR-122 target and control genes are normalized to that of the
lowest miR-122 concentration.
(C) Relative expression of miR-122 target genes and a nontarget gene (Snrk) in primary hepatocytes with 3-fold decreased miR-122 levels shown in (A), infected
with MOI 20 and 200 of Ad-AldoA Mut (red), 1s (blue), or 3s (green).
(D–F) Primary hepatocytes shown in (A) infected with MOI 200 of the three Ad-AldoA constructs. Absolute copy numbers per cell of AldoA (D) and AldoA MRE (E) in
relation to miR-122 copy numbers.
(F) Relative expression of miR-122 target genes and control nontarget gene (Snrk) normalized to Ad-AldoA Mut of the respective miR-122 condition.
Absolute miRNA copy numbers were calculated by multiplying the relative abundance (miRNA/snoRNA202), normalized to Ant-122mm, by the copy
number evaluated in Figure 1B. Data represent mean ± SEM (n = 4) for all panels.
threshold was comparable to that observed in cells without
reduced miR-122 levels, which indicated that the reason for
the threshold was not excess miR-122 binding capacity. Instead,
high TA is the more likely reason that the amount of added MREs
must exceed a very high level before exerting an observable
effect.
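The Figure 2 legend states that absolute copy numbers in antagomir-treated cells were obtained by multiplying the relative miR-122 abundance by the copy number from Figure 1B. A minimal sketch of that conversion follows. It assumes that the control (Ant-122mm) hepatocytes contain the 1.2 × 10^5 copies per cell measured in Figure 1B and that the reported fractions (0.3, 0.08, and 0.01) correspond to the low, intermediate, and high doses in that order; the variable names are illustrative.

```python
# Sketch of the relative-to-absolute conversion described in the Figure 2 legend.
MIR122_COPIES_CONTROL = 1.2e5  # copies per cell, assumed equal to the Figure 1B value

# Fraction of miR-122 remaining relative to the Ant-122mm control (values from the text;
# the assignment to dose labels assumes the order given in the text).
relative_levels = {"low": 0.3, "intermediate": 0.08, "high": 0.01}

copies_per_cell = {
    dose: fraction * MIR122_COPIES_CONTROL
    for dose, fraction in relative_levels.items()
}

for dose, copies in copies_per_cell.items():
    print(f"Ant-122 {dose}: about {copies:.2g} miR-122 copies per cell")
```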
Some studies claiming ceRNA-mediated gene regulation
focus on the number of sites to miRNA families that are shared
between the ceRNAs without differentiating between those
miRNAs that are expressed at a level sufficient to repress target
genes and those that are not (Jeyapalan et al., 2011; Fang et al.,
2013). To demonstrate that derepression can only occur in con-
ditions in which target gene repression is happening, we infected
hepatocytes harboring different miR-122 levels with Ad-AldoA at
MOI 200 and measured target derepression. Levels of AldoA and
respective AldoA MRE copy number per cell were comparable in
all Ant-treated samples (Figures 2D and 2E). miR-122 target gene
derepression was between 1.5- and 2.5-fold in hepatocytes with
high miR-122 levels and below 1.5-fold in cells with intermediate
miR-122 activity (Figure 2F). No target gene derepression was
observed in hepatocytes with the lowest miR-122 levels. Alto-
gether, these data demonstrate that miRNAs need to exceed
an expression level sufficient to repress their targets in order
for targets to be derepressed in a ceRNA-dependent manner.
The Magnitude of Derepression Correlates with
Predicted Site Efficacy and Number of Added
AldoA MREs
Previous ceRNA studies have focused on only one or a few tar-
gets of a miRNA even though a ceRNA change that influences
miRNA activity would be expected to affect more than a few tar-
gets. Because any perturbation of a cell might result in spurious
expression changes in a few predicted targets, a transcriptome-
wide analysis examining the preferential effect on predicted tar-
gets would more confidently detect the influence of a competing
RNA. Therefore, we extended our quantitative analysis to the
transcriptome and performed RNA sequencing (RNA-seq) on
primary hepatocytes infected with different Ad-AldoA constructs
at MOI 2, 20, and 200. Then, we analyzed the relationship
between the derepression of predicted targets and their site
number, site type (6, 7, and 8 nt sites), site position, and other
determinants used by TargetScan to calculate total context+
scores of predicted miRNA targets (Lewis et al., 2005; Grimson
et al., 2007; Garcia et al., 2011). When predicted targets of
miR-122, miR-33, miR-16, or abundant miRNA families in liver
(either let-7, miR-192, or a combination of the next four most
abundant families) were distributed into ten context+ score
bins and plotted against their median fold change, the effect of
target derepression was evident for predicted targets of miR-
122 but not for those of any of the other miRNA families (Figures
3A, S3A, S3B, and Table S1). As expected, the extent of target
derepression correlated with the magnitude of the context+
score as well as with the number of added AldoA MREs. These
correlations were also observed in the fold change distributions
of miR-122 predicted targets (Figure 3B), and analogous results
were obtained when stratifying predicted targets by site type
(Figure S3C). Regardless of how we grouped the predicted targets,
the same threshold of ≥1.5 × 10^5 added MREs per cell
was required in order to observe miR-122 derepression. We
also studied target gene derepression in primary hepatocytes
treated with Ant-122 or the mismatch control Ant-122mm and
found that the strongest predicted targets (e.g., those with a
context+ score below –0.2) were significantly derepressed in
the Ant-122-treated conditions (Figures S3D–S3G and Table S1).
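The binning analysis described above can be sketched as follows. This is an illustration rather than the pipeline used in the study: the bin width of 0.1, the function names, and the example values are assumptions, and normalization to the no-site bin is done by subtraction because the fold changes are on a log2 scale.

```python
from collections import defaultdict
from statistics import median


def bin_label(score, width=0.1):
    """Map a context+ score (zero or negative) to its bin centre; None means no site."""
    if score is None:
        return "no site"
    index = int(-score / width)  # 0 for the weakest scores, larger for stronger sites
    return round(-(index + 0.5) * width, 2)


def median_fold_change_by_bin(genes):
    """genes: iterable of (context+ score or None, log2 fold change) pairs."""
    bins = defaultdict(list)
    for score, log2fc in genes:
        bins[bin_label(score)].append(log2fc)
    medians = {label: median(values) for label, values in bins.items()}
    baseline = medians.get("no site", 0.0)
    # Normalise each bin to the median of genes without sites.
    return {label: value - baseline for label, value in medians.items()}


# Hypothetical example data: (context+ score, log2 fold change).
example = [
    (None, 0.0), (None, 0.1), (None, -0.1),
    (-0.05, 0.05), (-0.05, 0.15),
    (-0.25, 0.3), (-0.25, 0.5),
    (-0.45, 0.8),
]
result = median_fold_change_by_bin(example)
for label, value in result.items():
    print(label, round(value, 3))
```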
Modest Changes in Target Abundance Are Induced by
Metabolic Stress and Disease
Next, we sought to investigate the quantitative relationship between
MREs added upon Ad-AldoA infection and those normally
contributed by mRNAs of primary hepatocytes. First, we tested
how transcript abundances, measured by RNA-seq in fragments
per kilobase of transcript per million fragments mapped (FPKM),
correlated with the absolute copy numbers determined by quan-
titative PCR. To this end, we compared the expression levels of
four genes that are differentially expressed in primary hepato-
cyte and liver samples and found a linear relationship between
FPKM and absolute copy numbers over several orders of magni-
tude (Figure 4A), which allowed us to transform RNA-seq data to
absolute mRNA copies per cell. Then, we compared how AldoA
transcript abundance corresponded to genome mRNA abun-
dance at different MOIs of Ad-AldoA-infected hepatocytes.
The AldoA contribution was 0.3%–0.8% at MOI 2,
6%–12% at MOI 20, and >50% of all mRNA at MOI 200
(Figure 4B). In contrast, the largest endogenous contributor to the
transcriptome of primary hepatocytes was Transferrin (Trf),
which made up only 1.6% of the mRNA (30,000 molecules per
cell). Thus, the level of AldoA at the MOIs for which derepression
was observed (MOI 20 and 200) was substantially higher than
that of transcripts from any single cellular gene.
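The FPKM-to-copy-number conversion can be sketched as below. The slope of 3.83 copies per cell per FPKM is taken from the regression line annotated in Figure 4A (y = 3.83·x); the function names and the example FPKM values are hypothetical and serve only to show how a transcript's share of the mRNA pool would be computed.

```python
# Sketch of the linear FPKM-to-copies conversion; example inputs are hypothetical.
COPIES_PER_FPKM = 3.83  # slope of the regression line annotated in Figure 4A


def fpkm_to_copies(fpkm):
    """Convert an RNA-seq FPKM value to estimated mRNA copies per cell."""
    return COPIES_PER_FPKM * fpkm


def fraction_of_transcriptome(gene_fpkm, all_fpkm):
    """Share of total mRNA copies contributed by one gene."""
    total_copies = sum(fpkm_to_copies(value) for value in all_fpkm)
    return fpkm_to_copies(gene_fpkm) / total_copies


# Hypothetical transcriptome of three genes.
example_fpkm = [100.0, 300.0, 600.0]
share = fraction_of_transcriptome(example_fpkm[0], example_fpkm)
print(f"{fpkm_to_copies(100.0):.0f} copies per cell, {share:.0%} of mRNA")
```

Because the conversion is a pure proportionality, fractional contributions are the same whether computed from FPKM or from copies per cell; the slope matters only for absolute numbers.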
We also attempted to place the AldoA abundance within the
context of the miR-122 TA within the hepatocyte transcriptome.
A previous estimate of miRNA TA considers all of the 7 and 8 nt
sites for that miRNA within expressed 3′ UTRs (Garcia et al.,
2011). This TA might over- or underestimate the effective number
of binding sites of the transcriptome, depending on the extent to
which some of these sites are inaccessible (e.g., because they
are occluded by mRNA secondary structure or RNA binding proteins)
and the extent to which intracellular binding capacity is
augmented by additional sites (e.g., 6 nt sites, other marginal
sites, and nonconventional sites as well as sites in ORFs,
5′ UTRs, or noncoding RNAs), many of which might add to the
effective number of binding sites without mediating repression.
Despite these uncertainties, relative TA estimates for different
miRNAs provide a useful basis for distinguishing the more effec-
tive miRNAs from the less effective ones (Garcia et al., 2011).
Our conclusion that competing MREs begin to exert their
effects as they approach the miRNA binding capacity of the transcriptome
provided the means to evaluate the relationship
between the previous TA estimate and the apparent number of
binding sites. When calculated as before (summing 7 and 8 nt
sites in transcriptome 3′ UTRs), the miR-122 TA in hepatocytes
at Ad-AldoA MOI 2 was 1.8 × 10^5 sites per cell, which essentially
matched the threshold of added MREs required to begin to
observe derepression. The addition of 6 nt sites in the analysis
increased the number to 4.4 × 10^5 miR-122 sites per cell. Given
that this was still below the number of added MREs required to
observe half-maximal derepression, for all additional analyses,
we considered this revised TA estimate (all 6, 7, and 8 nt sites
within the transcriptome 3′ UTRs), which we define as the
apparent TA (or TAapp), as a conservative estimate of the effec-
tive number of miRNA sites.
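The distinction between the earlier TA estimate (7 and 8 nt sites) and TAapp (6, 7, and 8 nt sites) amounts to a copy-number-weighted sum of 3′ UTR sites. The sketch below illustrates that sum under stated assumptions: the data layout, function name, and transcript values are hypothetical, and site counts are taken as given rather than predicted from sequence.

```python
# Copy-number-weighted site count; transcript entries below are hypothetical.

def target_abundance(transcripts, site_types):
    """Sum copies per cell multiplied by the number of 3' UTR sites of the given types."""
    total = 0.0
    for transcript in transcripts:
        n_sites = sum(transcript["sites"].get(kind, 0) for kind in site_types)
        total += transcript["copies_per_cell"] * n_sites
    return total


transcripts = [
    {"name": "geneA", "copies_per_cell": 1000, "sites": {"8mer": 1, "6mer": 2}},
    {"name": "geneB", "copies_per_cell": 500, "sites": {"7mer": 2}},
    {"name": "geneC", "copies_per_cell": 2000, "sites": {}},  # no sites, contributes nothing
]

ta = target_abundance(transcripts, site_types=("7mer", "8mer"))
ta_app = target_abundance(transcripts, site_types=("6mer", "7mer", "8mer"))
print(f"TA (7 and 8 nt sites): {ta:.0f} sites per cell")
print(f"TAapp (6, 7, and 8 nt sites): {ta_app:.0f} sites per cell")
```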
Next, we calculated how AldoA MREs influenced the miR-122
TAapp (Figure 4C) and what fraction of the TAapp AldoA MREs
contributed (Figure 4D). Because only very highly expressed
genes could reach the levels required to affect TA, we searched
for endogenous transcripts that quantitatively contributed the
largest percentage to transcriptome TAapp. Actin β (Actb), which
contributed 5.5% of the TAapp, was the largest potential contrib-
utor to miR-122 site abundance in primary hepatocytes (Fig-
ure 4E), although this contribution was less than the 30%
contribution required for AldoA to detectably modulate miR-
122 repression (Figure 4D). When using the same approach to
estimate TAapp for let-7, miR-16, miR-33, miR-192, or each of
the next four most abundant miRNA families, the transcript
with the largest contribution to any TAapp was Albumin (Alb),
which contributed 3% of the miR-103 TAapp (Figure 4E).
As a major metabolic integrator of physiological processes,
the liver exhibits profound changes of gene regulation in
response to insulin signaling and cholesterol metabolism. To
examine whether these changes might affect miRNA TAapp, we
analyzed two models with severe pathological changes in
cholesterol metabolism (LDLR-deficient mice, Ldlr–/–) (Ishibashi
et al., 1993) and hepatic steatosis (high-fat diet [HFD] mice; Fig-
ures 4F, S4A, and Table S2) (Channon and Wilkinson, 1936). We
also examined livers that were perfused in the absence and pres-
ence of insulin, representing fasted and fed states, respectively
(Figure S4B and Table S2). In all livers studied, Alb and Trans-
thyretin (Ttr) contributed 10%–20% to TAapp. The only strong
contributor that was differentially regulated in any model was
major urinary protein 7 (Mup7), which essentially disappeared
[Figure 3 graphic omitted. Panels: (A) median fold change (log2) versus bin center (context+ score; bins from −0.45 to −0.05, plus no site) for predicted targets of miR-122, miR-16, miR-33, let-7, miR-192, and the next four most abundant liver miRNA families, at MOI 200, 20, and 2 for the 3s/Mut, 1s/Mut, and 3s/1s comparisons; (B) cumulative fraction versus fold change (log2) for miR-122 predicted targets grouped by context+ score (0.00 to −0.15, −0.15 to −0.30, −0.30 to −0.45, below −0.45, and no site).]
Figure 3. The Magnitude of Derepression Correlates with Predicted Site Efficacy and Number of Added AldoA MREs
(A and B) RNA-seq results showing derepression of predicted targets from primary hepatocytes infected with MOI 200, 20, and 2 of Ad-AldoA Mut, 1s, or 3s
shown in Figures 1C–1H.
(A) Predicted targets of miR-122 (red), miR-16 (blue), miR-33 (orange), let-7 (green), miR-192 (purple), or a combination of the next four most abundant liver miRNA
families (black) were grouped into ten bins based on their context+ scores. For each miRNA family, the median log2 fold change is plotted for the predicted targets
in each bin. Medians were normalized to that of the bin with genes without sites. Bins each had at least ten genes; see Figure S3B for group sizes.
(B) Cumulative distributions of mRNA changes for genes with no miR-122 site (black) or predicted target genes with the indicated context+ score bins (color).
Number of genes per bin: black, 6,629; green, 1,693; orange, 434; red, 120; purple, 33. *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001, one-sided Kolmogorov-
Smirnov (K-S) test.
See also Figure S3 and Table S1.
in livers of HFD mice, causing its contribution to TAapp to
decrease from 11.6% in normal livers to 0.01% in HFD
livers. Alb, the most highly expressed mRNA and the largest
potential contributor of sites for the miR-103 family, had the
potential to reduce TAapp by a maximum of only 20% when
fully silenced. Conversely, a 30% increase in target abundance
would require the most abundant liver transcript to increase
2.5-fold.
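The two figures quoted above follow from the same first-order relationship: if one transcript contributes a fraction f of TAapp and its abundance changes by a factor k while everything else stays constant, TAapp changes by f × (k − 1). The sketch below works through this with the approximately 20% contribution stated for the most abundant liver transcript; the function name is illustrative, and holding all other transcripts constant is an assumption of the sketch.

```python
# First-order estimate of how one transcript's abundance change shifts TAapp.

def relative_ta_change(contribution, fold_change):
    """Fractional change in TAapp when a transcript contributing `contribution`
    of TAapp changes in abundance by `fold_change` (all other transcripts constant)."""
    return contribution * (fold_change - 1.0)


silenced = relative_ta_change(0.20, 0.0)   # transcript fully silenced
induced = relative_ta_change(0.20, 2.5)    # transcript increased 2.5-fold
print(f"fully silenced: {silenced:+.0%} TAapp")
print(f"2.5-fold induced: {induced:+.0%} TAapp")
```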
Because none of the small number of genes that alone could
alter TAapp in a consequential way appeared to do so, we tested
whether a substantial change could be achieved through collec-
tive changes of all mRNAs. Evaluation of TAapp changes for
[Figure 4 graphic omitted. Panels: (A) copy number per cell versus FPKM for AldoA, Crot, Chka, and ApoM, with regression line y = 3.83·x; (B) AldoA/transcriptome mRNA versus MOI; (C) (AldoA + transcriptome)/transcriptome miR-122 TAapp versus MOI; (D) AldoA TA/transcriptome TAapp versus MOI; (E) largest contributors to transcriptome TAapp versus bin center (context+ score) in primary hepatocytes at MOI 2 of Ad-AldoA 1s (AldoA, Alb, Ttr, Actb labeled); (F) the same for chow and HFD livers (Alb, Ttr, Mup7, Apoa2 labeled); (G) relative transcriptome TAapp (6, 7, and 8 nt sites) for liver Ldlr–/–/WT, HFD/chow, and insulin/PBS across miR-122, let-7, miR-192, miR-103, miR-29, miR-21, miR-101, miR-320, miR-26, miR-423, miR-1839, miR-16, and miR-33.]
Figure 4. Modest Changes in Target Abundance Are Induced by Metabolic Stress and Disease
(A) Relationship between FPKM from RNA-seq data and absolute quantification with qPCR. Represented are four genes quantified in all 11 primary hepatocyte
samples plus wild-type and Ldlr–/– liver samples. Line represents linear regression of data points. Data represent mean ± 95% confidence intervals.
(B–D) RNA-seq data from primary hepatocytes infected with MOI 200, 20, and 2 of Ad-AldoA Mut, 1s, or 3s shown in Figures 1C–1H. Data represent mean ± SEM. (B)
Contribution of AldoA mRNA to the sum of genome mRNA. Increase of transcriptome miR-122 TAapp (C) and the respective contribution of AldoA MRE (D) to total
transcriptome miR-122 TAapp mediated by the different Ad-AldoA constructs and viral concentrations.
(E and F) Fractional contribution of the largest potential contributors to transcriptome TAapp in primary hepatocytes infected with MOI 2 of Ad-AldoA 1s (E) or in
wild-type livers (F) originated from mice either fed normal chow or high-fat diet (HFD). Potential contributors were binned by their context+ scores, and the top
potential contributors are plotted within each bin. See also Figure S4 and Table S2.
(G) Relative target abundance of livers from models of physiological (insulin) or disease/stress states (Ldlr–/– and HFD).
miR-122, the next ten most abundant miRNA families in liver,
miR-33, and miR-16 revealed that TAapp values for these
miRNAs were not altered more than 25% in any physiological
or disease model, and most changes were below 10% (Fig-
ure 4G). We also calculated TAapp values for the liver samples
and primary hepatocytes infected with Ad-AldoA at MOI 20, in
which derepression was observed. Transcriptome TAapp values
ranged between 2.5–7.5 × 10^5 sites per cell in liver models,
and between 3.6–13 × 10^5 in primary hepatocytes (Figure S4C).
No ceRNA Effect Is Detected In Vivo
To examine the influence of AldoA MREs on target gene dere-
pression and relevant physiological endpoints in vivo, we
injected wild-type mice with 3 × 10^9 plaque-forming units of
Ad-AldoA and examined livers 5 days postinfection. Virally expressed
GFP, and therefore adenovirus expression, was comparable
in all conditions (Figure 5A). Ad-AldoA increased AldoA
transcripts from 2.2 × 10^2 (endogenous levels) to 4.7 × 10^3
copies per cell (Figure 5B), introducing between 2.6 × 10^3 and
5.1 × 10^3 miR-122 MREs per cell with Ad-AldoA 1s or 3s,
respectively (Figure 5C). Overexpression of the Ad-AldoA con-
structs did not change levels of miR-122 or a control miRNA (Fig-
ure 5D). No derepression of any miR-122 target or control gene
(Snrk and Dyrk2) was observed (Figure 5E). Furthermore, we did
not detect changes in serum cholesterol levels (Figure 5F), which
decrease upon miR-122 inhibition by Ant-122 (Krützfeldt et al.,
2005). As predicted from our studies of primary hepatocytes,
these results showed that introduction of 5.1 3 103 miR-122
MREs per cell was insufficient to induce either target derepres-
sion or downstream physiological responses.
DISCUSSION
Our results support a model in which the changes in ceRNAs
must begin to approach the TA of an miRNA before they can exert
a consequential effect on the repression of targets for that
miRNA. For miR-122 in hepatocytes, derepression began to be
observed at a threshold of 1.5 × 10⁵ added sites per cell, a value
exceeding the physiological levels of any endogenous target as
well as the aggregate change of all predicted targets in different
disease states. Altogether, our data imply that a ceRNA effect
mediated through a single miRNA family in a physiological or disease
setting of the liver is unlikely. However, we cannot exclude
the possibility that unidentified highly abundant and regulated
noncoding RNAs (including circRNAs) might substantially contribute
to the pool of transcriptome binding sites.
In stating that changes in endogenous targets are unlikely to
mediate a ceRNA effect that is detectable, we do not mean to
imply that there is absolutely no molecular consequence of
changing the level of an endogenous target. Large changes in
each of several dozen target genes could alter TA by 1% or
sometimes more, which would influence the repression of other
targets but not to an extent that would be detectable by our
methods. For example, an increase in TA by 5% is expected to
decrease repression of other targets by approximately 5%,
causing a target that was previously repressed by 30% to now be
repressed by approximately 28.5%, a change too small to
be detected and presumably too small to be of biological
consequence.
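The arithmetic of this example can be sketched in a few lines. The sketch below assumes, purely for illustration, that repression scales inversely with total TA; the function name and this simple inverse form are assumptions introduced here and are not taken from the study's methods.

```python
def diluted_repression(repression, ta_increase):
    """Repression remaining after TA grows by a fractional amount.

    Assumption (illustrative only): repression scales inversely with total
    target abundance, so adding sites dilutes the miRNA across more sites.
    """
    return repression / (1.0 + ta_increase)


# Values from the text: a target repressed by 30%, and a 5% increase in TA.
before = 0.30
after = diluted_repression(before, 0.05)
print(f"repression before: {before:.1%}, after: {after:.1%}")
```

Under this assumed form the result lands within a fraction of a percentage point of the approximately 28.5% quoted above, which is why such a shift would be hard to detect experimentally.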
Studying the stoichiometric relationship of an miRNA and its
TA and assessing the effect of this relationship on target gene
regulation has been challenging. Estimates of TA have proven
to be particularly difficult, given that the extent to which ineffective
or marginally effective binding sites contribute to TA has
been unclear, and no experimentally determined TA values had
been obtained. Our experiments indicate that the TAapp for
miR-122 in the hepatocyte transcriptome is 4.4 × 10⁵ sites
per cell. Although this estimate corresponds to the number
of ≥6 nt seed-matched sites for miR-122 in the 3′ UTRs, we
do not presume that all UTR sites mediate repression. Indeed,
the TAapp is expected to exceed the number of miR-122
MREs, given that sites that bind the miRNA too transiently to
exert repression (including most sites in ORFs) would nonetheless
contribute to TAapp.
We qualify our TA estimate as an “apparent TA” for two reasons.
First, our miR-122 TAapp is expected to be a function of
the strength of the miR-122 site that was used in its determination.
The AldoA site is relatively strong (context+ score of 0.4;
Figure 4E). Had we empirically estimated the TA with a weaker
miR-122 site, more of the added sites would have been required
to approach half derepression, and thus the TAapp value would
have been correspondingly higher. Second, the endogenous
sites contribute to TAapp in proportion to their ability to sequester
the miRNA, and thus because many weak sites (ranging from
those typically classified as nonspecific sites to those that
might be more specific yet nonetheless ineffective or marginally
[Figure 5, panels A–F. Axes: fold change (GFP/36b4); copies per cell (AldoA); AldoA MRE per cell; fold change (miRNA/snoRNA202) for miR-16 and miR-122; fold change (gene/36b4) for Gys1, Slc7a1, P4ha1, Ndrg3, Snrk, and Dyrk2; cholesterol (mg/dl) versus days after injection. Conditions: Ad-Ctrl, Ad-AldoA Mut, Ad-AldoA 1s, Ad-AldoA 3s.]
Figure 5. No ceRNA Effect Is Detected In Vivo
(A–E) Mice were injected with Ad-AldoA Mut (red, n = 6), 1s (blue, n = 6), or 3s
(green, n = 5), and gene expression analysis was performed 5 days postinfection.
Relative gene expression of GFP (A), absolute copy numbers per cell
of AldoA (B), and added AldoA MREs (C). Relative expression of miRNAs (D)
and miR-122 target genes or control nontarget genes (Snrk and Dyrk2) (E).
(F) Plasma cholesterol levels of Ad-AldoA-treated mice at days 1, 3, and 5.
The Ad-AldoA used in this experiment expressed the full-length protein. Data
represent mean ± SEM.
effective) each make partial contributions to the TAapp, the actual
number of sites that contributed is expected to greatly exceed
the TAapp. When considering this second point, estimating a
TAapp is of greater practical value than knowing the total number
of endogenous sites that helped sequester the miRNA.
Our miR-122 TAapp was empirically derived on the premise
that using Ad-AldoA to double the effective miR-122 TA and
thereby decrease the number of encounters between miR-122
and its endogenous targets by half would lead to a correspond-
ing decrease in endogenous target repression. If the amount of
miR-122-mediated repression is not a simple linear function of
the number of encounters with its targets, then TAapp would
need to be corrected accordingly. For other miRNAs, TAapp
values were estimated starting with the miR-122 TAapp and
assuming that relative values for different miRNAs would scale
in proportion to their numbers of UTR sites—an assumption sup-
ported by studies showing that miRNA efficacy negatively corre-
lates with the relative numbers of UTR sites (Arvey et al., 2010;
Garcia et al., 2011; Mullokandov et al., 2012). Despite any uncer-
tainty arising from these simplifying assumptions, our TAapp es-
timates have the unique benefit of being founded on intracellular
experimental observations.
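The proportional scaling described above can be written out as a short sketch. Only the miR-122 TAapp of 4.4 × 10⁵ sites per cell comes from the text; the function name and the site counts in the usage lines are hypothetical placeholders chosen for illustration.

```python
# miR-122 TAapp in hepatocytes, as reported in the text (sites per cell).
MIR122_TA_APP = 4.4e5


def scaled_ta_app(utr_sites_other, utr_sites_mir122, ta_app_mir122=MIR122_TA_APP):
    """Estimate TAapp for another miRNA by proportional scaling.

    Assumption stated in the text: relative TAapp values scale with the
    relative numbers of UTR sites for each miRNA.
    """
    return ta_app_mir122 * utr_sites_other / utr_sites_mir122


# Hypothetical site counts, for illustration only (not measured values).
example = scaled_ta_app(utr_sites_other=15_000, utr_sites_mir122=10_000)
print(f"scaled TAapp: {example:.2e} sites per cell")
```

If repression turns out not to be a simple linear function of encounters, as the paragraph cautions, this proportional estimate would need a corresponding correction.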
This experimental grounding produced TAapp values much
higher than those previously assumed. For example, previous
modeling of the quantitative relationships between miRNAs
and their targets assumed that a typical miRNA had 500 target
sites per cell (Wee et al., 2012). Modeling based on this low number
of targets suggests that for moderately expressed miRNAs,
adding only 500 sites through increased ceRNA expression
could double the expression of a repressed mRNA, whereas
for more highly expressed miRNAs, many more sites would be
required to exert an effect (Mullokandov et al., 2012; Wee
et al., 2012). Our results in hepatocytes indicate that TAapp values
for the eleven most abundant miRNA families ranged from 2.5 ×
10⁵ to 7.5 × 10⁵ sites, about 1,000 times greater than the value
previously assumed. This substantially revised estimate of effective
TA leads to a different and somewhat simplified picture of the
potential for regulation through ceRNAs. In our model, miRNA
levels matter only insofar as the miRNA must reach a level
sufficient to repress a target mRNA. For any miRNA exceeding
this level, the potential for ceRNAs to influence repression is
simply a matter of whether the ceRNAs add or subtract enough
sites to meaningfully influence the TAapp. Because TAapp is a
function of the number of seed-matched sites in the transcriptome
and substantially exceeds the level of even the most highly
expressed miRNA, the ceRNA difference required to achieve
half-maximal effects is independent of the miRNA level. Thus,
our insights and results indicate that repression by even moderately
expressed miRNAs would be difficult to detectably change
through a ceRNA effect.
Under extreme physiological and disease conditions, target
abundances were not changed more than 10% for most miRNA
families. The maximum change of 25% was observed for
the let-7 miRNA family in mice fed an HFD versus a chow
diet. Interestingly, in this condition, a single highly expressed
gene (Mup7) accounted for 50% of the total decrease in
let-7 target abundance. A recent phase I trial for RNAi therapy
of Ttr amyloidosis reduced human TTR levels by >80% (Coelho
et al., 2013). Such a strong reduction of the TTR transcript,
which contributes 10% of the miR-192 TAapp in mouse livers,
would account for a decrease in miR-192 target abundance
analogous to that observed for Mup7 and let-7 in the HFD
versus chow diet, a change not expected to detectably affect
miRNA activity.
The conclusion that only large contributors to TAapp can detectably
influence the miRNA activity agrees with our in vivo experiments;
in normal liver, AldoA is expressed at 2.4 × 10²
copies per cell and is among the thousand most highly expressed
genes. Still, a 9-fold increase in transcript levels after
Ad-AldoA 3s infection, which added 5 × 10³ MREs, increased
miR-122 TAapp by only 2% and therefore imparted no detectable
influence on target gene expression. Mup7 and Ttr are among
the thirty genes expressed in liver at copy numbers above 10⁴
copies per cell, and therefore approaching within an order of
magnitude the estimated miRNA TAapp values. Hence, only
these 30 genes have potential on their own to perceptibly influence
a TAapp.
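To make the scale of these contributions concrete, the following sketch expresses a number of added sites as a fraction of TAapp. The TAapp values used are the liver range (2.5–7.5 × 10⁵) and the hepatocyte miR-122 estimate (4.4 × 10⁵) quoted in the text; the function name is an assumption introduced here.

```python
def fractional_ta_increase(added_sites, ta_app):
    """Added sites expressed as a fraction of the apparent target abundance."""
    return added_sites / ta_app


# About 5 x 10^3 MREs were added by Ad-AldoA 3s in vivo (value from the text).
added = 5e3
for ta_app in (2.5e5, 4.4e5, 7.5e5):
    increase = fractional_ta_increase(added, ta_app)
    print(f"TAapp = {ta_app:.1e} sites/cell -> TA increase of {increase:.1%}")
```

Across this range the added sites amount to roughly 1–2% of TAapp, far below the doubling that the text associates with half derepression.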
Our study focused on miR-122, an unusually highly expressed
miRNA. Nonetheless, the same high threshold for detectable
target derepression was observed when miR-122 activity was
reduced, which indicated that our conclusions apply also to
more moderately expressed miRNAs. A study reporting loss of
miR-20 repression when adding high levels of target mRNA
also observed a threshold at high target expression (Mukherji
et al., 2011). As expected, their threshold disappeared when a
miR-20 sponge was used to lower miRNA activity below detec-
tion. More interestingly, they found that transfecting an miR-20
mimic increased the threshold for derepression. A possible
reason that they observed a change in threshold with a change
in miRNA, whereas we did not, is that their miR-20 mimic might
have added enough miRNA to exceed the miR-20 TAapp of their
cells. Another difference between their experiments and ours is
that their target contained bulged sites of a type that can induce
miRNA degradation (Ameres et al., 2010), which might produce
an apparent shift in the threshold.
Gene expression in the liver is profoundly regulated by circadian,
hormonal, and nutritional states. Using livers of mice
exposed to insulin signaling and to pathological conditions of
cholesterol metabolism, we did not observe large changes in
target abundance, raising the possibility that our findings can
be generalized to other organs and disease states. Nonetheless,
during cell differentiation and in the context of malignant transformation,
expression of coding and noncoding RNA can change
dramatically (Rhodes and Chinnaiyan, 2005; Lujambio and
Lowe, 2012). In such biological settings, conditions might arise
in which TAapp is lower than in physiological settings and/or a
single mRNA substantially contributes to target abundance.
In principle, such alterations could make the system more
amenable to ceRNA-mediated gene regulation.
EXPERIMENTAL PROCEDURES
Animal Experiments
Animals were maintained on a 12 hr light/12 hr dark cycle under a controlled
environment in a pathogen-free facility at the Institute for Molecular Systems
Biology, ETH Zürich. Mice were administered adenovirus through a single
tail-vein injection of 3 × 10⁹ plaque-forming units in a final volume of 0.2 ml
diluted in PBS and killed 5 days postinjection. Antagomir was administered
through tail-vein injections on three consecutive days, and primary hepatocytes
were isolated on day four. For high, intermediate, and low miR-122
inhibition, mice received 3 × 80, 40, and 20 mg/kg Ant-122, respectively.
Ant-122mm (control) was used at the highest concentration. All animal experiments
were approved by the ethics committee of the Kantonale Veterinäramt
Zürich.
Primary Hepatocytes Isolation and Viral Infections
Primary hepatocytes of 8- to 12-week-old male C57BL/6N mice were
isolated on the basis of the method described by Zhang et al. (2012). Hepatocytes
were counted and plated at 300,000 cells per well in Dulbecco’s
modified Eagle’s medium low-glucose media. Adenoviruses were
added in Hepatozyme media 4–6 hr after plating, and cells were harvested 24 hr
postinfection. All cells were incubated at 37°C in a humidified atmosphere
containing 5% CO2.
Adenoviruses
Recombinant adenoviruses were generated as described in the Supple-
mental Experimental Procedures. All adenoviruses expressed GFP from an
independent promoter. Ad-Ctrl was based on the same vector backbone
(including GFP) but lacked the AldoA transgene.
Gene Expression Analysis
2 μg of total RNA was treated with the DNA-free Kit (Life Technologies) and
reverse transcribed with the High Capacity cDNA Reverse Transcription Kit
(Life Technologies). Quantitative PCR reactions were performed with the
LightCycler 480 (Roche) employing KAPA SYBR FAST qPCR Master Mix
(2×) for LightCycler 480 (Kapa Biosystems) and gene-specific primer pairs
(Table S3). Relative gene expression was calculated with the ddCT method
and mouse 36b4 (Rplp0) for normalization.
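For readers unfamiliar with the ddCT calculation, a minimal sketch follows. It assumes the standard form of the method with perfect doubling per PCR cycle; the function name and the Ct values in the usage lines are hypothetical and are not data from this study.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the ddCT method.

    Assumes 100% amplification efficiency (a factor of 2 per cycle).
    The reference gene would be 36b4 (Rplp0) in the setup described above.
    """
    d_ct_sample = ct_target_sample - ct_ref_sample      # normalize the sample
    d_ct_control = ct_target_control - ct_ref_control   # normalize the control
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values, for illustration only.
fold = relative_expression(ct_target_sample=24.0, ct_ref_sample=18.0,
                           ct_target_control=25.0, ct_ref_control=18.0)
print(f"fold change relative to control: {fold:.2f}")
```

A target that crosses threshold one cycle earlier than in the control, at equal reference Ct, comes out as a twofold increase under this assumption.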
miRNA Expression Analysis
150 ng of total RNA was reverse-transcribed with the TaqMan MicroRNA
Assays and Reverse Transcription Kits (Life Technologies). Quantitative PCR
reactions were performed with the LightCycler 480 employing TaqMan
Universal PCR Master Mix, No AmpErase UNG (Life Technologies), and
TaqMan MicroRNA Assays (Life Technologies). Relative miRNA expres-
sion was calculated with the ddCT method and mouse snoRNA202 for
normalization.
RNA-Seq
For single-end library construction, total RNA was depleted of rRNA with the
Ribo-Zero rRNA Removal Kit (Epicenter). RNA libraries were prepared with
the dUTP-based, Illumina-compatible NEXTflex Directional RNA-Seq Kit
(Bioo Scientific). For paired-end library construction (performed by BGI),
total RNA was enriched for poly(A) mRNA with oligo(dT) beads and treated
with buffer in order to yield 200–700 nt fragments. First-strand cDNA was
synthesized with random hexamer primers, and second-strand cDNA was
synthesized with buffer, dNTPs, RNase H, and DNA polymerase I. cDNA
was run on an agarose gel for suitable fragment size selection followed by
purification, adaptor ligation, and PCR amplification. All libraries (both single-
and paired-end) were sequenced with an Illumina HiSeq 2000 sequencing
machine.
ACCESSION NUMBERS
The NCBI Gene Expression Omnibus accession number for the data reported
in this paper is GSE52801.
SUPPLEMENTAL INFORMATION
Supplemental Information contains Supplemental Experimental Procedures,
four figures, and three tables and can be found with this article online at
http://dx.doi.org/10.1016/j.molcel.2014.03.045.
ACKNOWLEDGMENTS
We would like to thank M. Ravichandran and W. Johnston for technical assis-
tance as well as D. Koppstein, V. Auyeung, M. Latreille, and members of the
D.P.B. and M.S. labs for critically reviewing this manuscript. This material is
based upon work supported under a National Science Foundation Graduate
Research Fellowship (to V.A.), an ERC grant (Metabolomirs) and the NCCR
(RNA and Biology; to M.S.), and NIH grant GM067031 (to D.P.B.). D.P.B. is a
Howard Hughes Medical Institute Investigator. D.P.B. and M.S. are members
of the scientific advisory boards of Alnylam Pharmaceuticals and Regulus
Therapeutics.
Received: January 7, 2014
Revised: March 4, 2014
Accepted: March 19, 2014
Published: May 1, 2014
REFERENCES
Ala, U., Karreth, F.A., Bosia, C., Pagnani, A., Taulli, R., Léopold, V., Tay, Y.,
Provero, P., Zecchina, R., and Pandolfi, P.P. (2013). Integrated transcriptional
and competitive endogenous RNA networks are cross-regulated in permissive
molecular environments. Proc. Natl. Acad. Sci. USA 110, 7154–7159.
Ameres, S.L., Horwich, M.D., Hung, J.H., Xu, J., Ghildiyal, M., Weng, Z., and
Zamore, P.D. (2010). Target RNA-directed trimming and tailing of small
silencing RNAs. Science 328, 1534–1539.
Arvey, A., Larsson, E., Sander, C., Leslie, C.S., and Marks, D.S. (2010). Target
mRNA abundance dilutes microRNA and siRNA activity. Mol. Syst. Biol. 6, 363.
Baek, D., Villén, J., Shin, C., Camargo, F.D., Gygi, S.P., and Bartel, D.P. (2008).
The impact of microRNAs on protein output. Nature 455, 64–71.
Bissels, U., Wild, S., Tomiuk, S., Holste, A., Hafner, M., Tuschl, T., and Bosio,
A. (2009). Absolute quantification of microRNAs by using a universal reference.
RNA 15, 2375–2384.
Brenner, J.L., Jasiewicz, K.L., Fahley, A.F., Kemp, B.J., and Abbott, A.L.
(2010). Loss of individual microRNAs causes mutant phenotypes in sensitized
genetic backgrounds in C. elegans. Curr. Biol. 20, 1321–1325.
Cesana, M., Cacchiarelli, D., Legnini, I., Santini, T., Sthandier, O., Chinappi, M.,
Tramontano, A., and Bozzoni, I. (2011). A long noncoding RNA controls muscle
differentiation by functioning as a competing endogenous RNA. Cell 147,
358–369.
Channon, H.J., and Wilkinson, H. (1936). The effect of various fats in the pro-
duction of dietary fatty livers. Biochem. J. 30, 1033–1039.
Coelho, T., Adams, D., Silva, A., Lozeron, P., Hawkins, P.N., Mant, T., Perez, J.,
Chiesa, J., Warrington, S., Tranter, E., et al. (2013). Safety and efficacy of RNAi
therapy for transthyretin amyloidosis. N. Engl. J. Med. 369, 819–829.
Ebert, M.S., Neilson, J.R., and Sharp, P.A. (2007). MicroRNA sponges:
competitive inhibitors of small RNAs in mammalian cells. Nat. Methods 4,
721–726.
Esau, C., Davis, S., Murray, S.F., Yu, X.X., Pandey, S.K., Pear, M., Watts, L.,
Booten, S.L., Graham, M., McKay, R., et al. (2006). miR-122 regulation of lipid
metabolism revealed by in vivo antisense targeting. Cell Metab. 3, 87–98.
Eulalio, A., Huntzinger, E., and Izaurralde, E. (2008). Getting to the root of
miRNA-mediated gene silencing. Cell 132, 9–14.
Fang, L., Du, W.W., Yang, X., Chen, K., Ghanekar, A., Levy, G., Yang, W., Yee,
A.J., Lu, W.Y., Xuan, J.W., et al. (2013). Versican 3′-untranslated region
(3′-UTR) functions as a ceRNA in inducing the development of hepatocellular
carcinoma by regulating miRNA activity. FASEB J. 27, 907–919.
Farh, K.K., Grimson, A., Jan, C., Lewis, B.P., Johnston, W.K., Lim, L.P., Burge,
C.B., and Bartel, D.P. (2005). The widespread impact of mammalian
MicroRNAs on mRNA repression and evolution. Science 310, 1817–1821.
Figliuzzi, M., Marinari, E., and De Martino, A. (2013). MicroRNAs as a selective
channel of communication between competing RNAs: a steady-state theory.
Biophys. J. 104, 1203–1213.
Franco-Zorrilla, J.M., Valli, A., Todesco, M., Mateos, I., Puga, M.I., Rubio-Somoza,
I., Leyva, A., Weigel, D., García, J.A., and Paz-Ares, J. (2007).
Target mimicry provides a new mechanism for regulation of microRNA activity.
Nat. Genet. 39, 1033–1037.
Friedman, R.C., Farh, K.K.H., Burge, C.B., and Bartel, D.P. (2009). Most
mammalian mRNAs are conserved targets of microRNAs. Genome Res. 19,
92–105.
Garcia, D.M., Baek, D., Shin, C., Bell, G.W., Grimson, A., and Bartel, D.P.
(2011). Weak seed-pairing stability and high target-site abundance decrease
the proficiency of lsy-6 and other microRNAs. Nat. Struct. Mol. Biol. 18,
1139–1146.
Giraldez, A.J., Mishima, Y., Rihel, J., Grocock, R.J., Van Dongen, S., Inoue, K.,
Enright, A.J., and Schier, A.F. (2006). Zebrafish MiR-430 promotes deadenyla-
tion and clearance of maternal mRNAs. Science 312, 75–79.
Grimson, A., Farh, K.K., Johnston, W.K., Garrett-Engele, P., Lim, L.P., and
Bartel, D.P. (2007). MicroRNA targeting specificity in mammals: determinants
beyond seed pairing. Mol. Cell 27, 91–105.
Guo, H., Ingolia, N.T., Weissman, J.S., and Bartel, D.P. (2010). Mammalian
microRNAs predominantly act to decrease target mRNA levels. Nature 466,
835–840.
Hafner, M., Landthaler, M., Burger, L., Khorshid, M., Hausser, J., Berninger, P.,
Rothballer, A., Ascano, M., Jr., Jungkamp, A.C., Munschauer, M., et al. (2010).
Transcriptome-wide identification of RNA-binding protein and microRNA
target sites by PAR-CLIP. Cell 141, 129–141.
Hansen, T.B., Jensen, T.I., Clausen, B.H., Bramsen, J.B., Finsen, B.,
Damgaard, C.K., and Kjems, J. (2013). Natural RNA circles function as efficient
microRNA sponges. Nature 495, 384–388.
Ishibashi, S., Brown, M.S., Goldstein, J.L., Gerard, R.D., Hammer, R.E., and
Herz, J. (1993). Hypercholesterolemia in low density lipoprotein receptor
knockout mice and its reversal by adenovirus-mediated gene delivery.
J. Clin. Invest. 92, 883–893.
Jeyapalan, Z., Deng, Z., Shatseva, T., Fang, L., He, C., and Yang, B.B. (2011).
Expression of CD44 3′-untranslated region regulates endogenous microRNA
functions in tumorigenesis and angiogenesis. Nucleic Acids Res. 39, 3026–
3041.
Jopling, C.L., Yi, M., Lancaster, A.M., Lemon, S.M., and Sarnow, P. (2005).
Modulation of hepatitis C virus RNA abundance by a liver-specific
MicroRNA. Science 309, 1577–1581.
Karreth, F.A., Tay, Y., Perna, D., Ala, U., Tan, S.M., Rust, A.G., DeNicola, G.,
Webster, K.A., Weiss, D., Perez-Mancera, P.A., et al. (2011). In vivo identification
of tumor-suppressive PTEN ceRNAs in an oncogenic BRAF-induced
mouse model of melanoma. Cell 147, 382–395.
Krützfeldt, J., Rajewsky, N., Braich, R., Rajeev, K.G., Tuschl, T., Manoharan,
M., and Stoffel, M. (2005). Silencing of microRNAs in vivo with ‘antagomirs’.
Nature 438, 685–689.
Landgraf, P., Rusu, M., Sheridan, R., Sewer, A., Iovino, N., Aravin, A., Pfeffer,
S., Rice, A., Kamphorst, A.O., Landthaler, M., et al. (2007). A mammalian
microRNA expression atlas based on small RNA library sequencing. Cell
129, 1401–1414.
Lewis, B.P., Burge, C.B., and Bartel, D.P. (2005). Conserved seed pairing,
often flanked by adenosines, indicates that thousands of human genes are
microRNA targets. Cell 120, 15–20.
Li, X., Cassidy, J.J., Reinke, C.A., Fischboeck, S., and Carthew, R.W. (2009). A
microRNA imparts robustness against environmental fluctuation during devel-
opment. Cell 137, 273–282.
Lujambio, A., and Lowe, S.W. (2012). The microcosmos of cancer. Nature 482,
347–355.
Memczak, S., Jens, M., Elefsinioti, A., Torti, F., Krueger, J., Rybak, A., Maier,
L., Mackowiak, S.D., Gregersen, L.H., Munschauer, M., et al. (2013). Circular
RNAs are a large class of animal RNAs with regulatory potency. Nature 495,
333–338.
Mendell, J.T., and Olson, E.N. (2012). MicroRNAs in stress signaling and
human disease. Cell 148, 1172–1187.
Mukherji, S., Ebert, M.S., Zheng, G.X., Tsang, J.S., Sharp, P.A., and van
Oudenaarden, A. (2011). MicroRNAs can generate thresholds in target gene
expression. Nat. Genet. 43, 854–859.
Mullokandov, G., Baccarini, A., Ruzo, A., Jayaprakash, A.D., Tung, N.,
Israelow, B., Evans, M.J., Sachidanandam, R., and Brown, B.D. (2012).
High-throughput assessment of microRNA activity and function using
microRNA sensor and decoy libraries. Nat. Methods 9, 840–846.
Pillai, R.S., Bhattacharyya, S.N., Artus, C.G., Zoller, T., Cougot, N., Basyuk, E.,
Bertrand, E., and Filipowicz, W. (2005). Inhibition of translational initiation by
Let-7 MicroRNA in human cells. Science 309, 1573–1576.
Poliseno, L., Salmena, L., Zhang, J., Carver, B., Haveman, W.J., and Pandolfi,
P.P. (2010). A coding-independent function of gene and pseudogene mRNAs
regulates tumour biology. Nature 465, 1033–1038.
Rhodes, D.R., and Chinnaiyan, A.M. (2005). Integrative analysis of the cancer
transcriptome. Nat. Genet. Suppl. 37, S31–S37.
Salmena, L., Poliseno, L., Tay, Y., Kats, L., and Pandolfi, P.P. (2011). A ceRNA
hypothesis: the Rosetta Stone of a hidden RNA language? Cell 146, 353–358.
Seitz, H. (2009). Redefining microRNA targets. Curr. Biol. 19, 870–873.
Tay, Y., Kats, L., Salmena, L., Weiss, D., Tan, S.M., Ala, U., Karreth, F.,
Poliseno, L., Provero, P., Di Cunto, F., et al. (2011). Coding-independent regu-
lation of the tumor suppressor PTEN by competing endogenous mRNAs. Cell
147, 344–357.
Tsai, W.C., Hsu, P.W., Lai, T.C., Chau, G.Y., Lin, C.W., Chen, C.M., Lin, C.D.,
Liao, Y.L., Wang, J.L., Chau, Y.P., et al. (2009). MicroRNA-122, a tumor sup-
pressor microRNA that regulates intrahepatic metastasis of hepatocellular
carcinoma. Hepatology 49, 1571–1582.
Wee, L.M., Flores-Jasso, C.F., Salomon, W.E., and Zamore, P.D. (2012).
Argonaute divides its RNA guide into domains with distinct functions and
RNA-binding properties. Cell 151, 1055–1067.
Zhang, W., Sargis, R.M., Volden, P.A., Carmean, C.M., Sun, X.J., and Brady,
M.J. (2012). PCB 126 and other dioxin-like PCBs specifically suppress hepatic
PEPCK expression via the aryl hydrocarbon receptor. PLoS ONE 7, e37103.
Appendix 3. Expanded identification and characterization of mammalian circular 
RNAs 
Junjie U. Guo1,2,3, Vikram Agarwal1,2,3,4, Huili Guo1,2,3,5 and David P. Bartel1,2,3,6 
1Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA 
2Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA 
3Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, 
USA 
4Computational and Systems Biology Program, Massachusetts Institute of Technology, 
Cambridge, MA 02139, USA 
5Present address: Institute of Molecular and Cell Biology, 61 Biopolis Drive, Proteos, 
Singapore 
V.A. performed computational analysis related to miRNA target site enrichment and 
helped devise pipeline for circRNA identification. J.U.G. performed all other 
computational analyses. H.G. generated ribosomal footprinting data. J.U.G. and D.P.B. 
designed the study and wrote the manuscript. 
Published as: 
Guo JU, Agarwal V, Guo H, Bartel DP. "Expanded identification and characterization of 
mammalian circular RNAs". 2014. Genome Biology 15(7):409. 1-14. 
RESEARCH Open Access
Expanded identification and characterization of
mammalian circular RNAs
Junjie U Guo1,2,3, Vikram Agarwal1,2,3,4, Huili Guo1,2,3,5,6,7 and David P Bartel1,2,3*
Abstract
Background: The recent reports of two circular RNAs (circRNAs) with strong potential to act as microRNA (miRNA)
sponges suggest that circRNAs might play important roles in regulating gene expression. However, the global
properties of circRNAs are not well understood.
Results: We developed a computational pipeline to identify circRNAs and quantify their relative abundance from
RNA-seq data. Applying this pipeline to a large set of non-poly(A)-selected RNA-seq data from the ENCODE project,
we annotated 7,112 human circRNAs that were estimated to comprise at least 10% of the transcripts accumulating
from their loci. Most circRNAs are expressed in only a few cell types and at low abundance, but they are no more
cell-type-specific than are mRNAs with similar overall expression levels. Although most circRNAs overlap protein-coding
sequences, ribosome profiling provides no evidence for their translation. We also annotated 635 mouse circRNAs, and
although 20% of them are orthologous to human circRNAs, the sequence conservation of these circRNA orthologs is
no higher than that of their neighboring linear exons. The previously proposed miR-7 sponge, CDR1as, is one of only
two circRNAs with more miRNA sites than expected by chance, with the next best miRNA-sponge candidate deriving
from a gene encoding a primate-specific zinc-finger protein, ZNF91.
Conclusions: Our results provide a new framework for future investigation of this intriguing topological isoform while
raising doubts regarding a biological function of most circRNAs.
Background
Many classes of non-protein-coding RNAs (ncRNAs) exist
in cells [1,2], and members of each class play important
roles in regulating gene expression or in other biological
processes [3-6]. For example, microRNAs (miRNAs)
pair to sites within messenger RNAs (mRNAs) to target
the mRNAs for translational repression and/or mRNA
destabilization [7]. In an intriguing elaboration of this
regulatory pathway, the activity of the mammalian miR-7
miRNA can be inhibited by CDR1as/ciRS-7, which is in
turn targeted by another miRNA, miR-671, which shows
near-perfect complementarity and triggers endonucleo-
lytic cleavage of CDR1as [8-10]. CDR1as is a circular RNA
(circRNA) deriving from an antisense transcript of the
CDR1 protein-coding gene [10]. With >60 conserved sites
for miR-7, CDR1as is thought to act as a sponge to titrate
miR-7 from its other targets [8,9]. A second circRNA
proposed to act as a sponge is the testis-specific transcript
of the male sex-determining gene Sry, which contains
16 sites for miR-138 [9]. Because circRNAs lack poly(A)
tails and 5′ termini, they would escape the deadenylation,
decapping and degradation normally caused by miRNA
association [11], an obvious advantage for an RNA acting
as a miRNA sponge [8,9].
Thousands of additional circRNAs with unknown func-
tions have been identified in various species [8,12-15].
These circRNAs are generated primarily through a type of
alternative RNA splicing called ‘back-splicing’, in which a
splice donor splices to an upstream acceptor rather than a
downstream acceptor (Figure 1A) [8,12,14,16,17]. Based
on several criteria, including their intriguing expression
patterns, their apparently elevated sequence conservation
and the compelling hypothesis that CDR1as acts as a miR-7
sponge, these circRNAs have been proposed to comprise
a large class of post-transcriptional regulators. However,
the number of additional circRNAs acting as natural
miRNA sponges is currently unclear. Indeed, the extent to
* Correspondence: dbartel@wi.mit.edu
1Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
2Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
Full list of author information is available at the end of the article
© 2014 Guo et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative
Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain
Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article,
unless otherwise stated.
Guo et al. Genome Biology 2014, 15:409
http://genomebiology.com/2014/15/7/409
Figure 1 (See legend on next page.)
which these circular isoforms might act in any biological
capacity is not known.
To begin to consider potential roles of circRNAs in
post-transcriptional regulation, we developed a computational
pipeline that identifies circRNAs from long-read
RNA-seq data without relying on gene annotations.
The pipeline resembles that reported previously [8], except
that it quantifies and considers the abundance of each
circular isoform with respect to its alternative linear
isoforms. Applying this pipeline to the non-poly(A)-selected
RNA-seq data from the ENCODE project, we
catalogued >7,000 human circRNAs and characterized
their global properties, acquiring new insights regarding
their biogenesis, the cell-type specificity of their expression,
the extent to which they are conserved, the extent
to which they are translated and their potential to act as
miRNA sponges.
Results
Properties of human circRNAs
To identify circRNAs from RNA-seq data, we developed
the following computational pipeline (Figure 1B). We
first mapped all the RNA-seq reads to the genome using
Bowtie in single-end mode, allowing ≤2 mismatches.
Then we used BLAT to find partial alignment of the un-
mapped reads. Dual alignments of two read segments
mapping to the genome in the reversed order were indi-
cative of circRNAs. The circular fraction (that is, the
fraction of the circular isoform relative to all transcripts
from the same locus) was quantified for each circRNA
candidate by counting relevant reads from the same
sample. We performed circRNA identification and quan-
tification using all the currently available whole-cell
non-poly(A)-selected RNA-seq data from the ENCODE
project [1], which included a large variety of cultured
cell types (Table S1A in Additional file 1). As in some
previous studies [8,14], our pipeline used the assembled
genome for sequence alignment but disregarded its an-
notations, and thus it was not affected by incomplete or
inaccurate genome annotations and was not biased in
favor of alternative isoforms of pre-mRNAs.
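The detection and quantification steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Segment` record, the strand handling, and the way linear reads at the donor and acceptor are averaged in `circular_fraction` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One aligned piece of a read (hypothetical record, not a BLAT structure)."""
    read_start: int  # 0-based start of the segment within the read
    read_end: int    # end (exclusive) within the read
    chrom: str
    strand: str      # alignment strand, '+' or '-'
    gen_start: int   # 0-based genomic start
    gen_end: int     # genomic end (exclusive)


def is_back_splice_candidate(a: Segment, b: Segment) -> bool:
    """True if two segments of one read map to the genome in reversed order."""
    if a.chrom != b.chrom or a.strand != b.strand:
        return False
    first, second = sorted((a, b), key=lambda s: s.read_start)
    if a.strand == "+":
        # Collinear reads have the first segment upstream; reversed order
        # places it downstream of the second segment.
        return first.gen_start >= second.gen_end
    # For minus-strand alignments the genomic order is mirrored.
    return first.gen_end <= second.gen_start


def circular_fraction(circ_reads: int, linear_donor_reads: int,
                      linear_acceptor_reads: int) -> float:
    """Assumed estimator: circular junction reads over circular plus the
    mean number of linear junction reads at the donor and acceptor."""
    linear = (linear_donor_reads + linear_acceptor_reads) / 2
    total = circ_reads + linear
    return circ_reads / total if total > 0 else 0.0
```

A production pipeline would also need to tolerate small overlaps or gaps between the two segments and to resolve ambiguous breakpoints, which this sketch ignores.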
circRNAs produced from back-splicing would be ex-
pected to have splicing signals at their junctions. Introns
spliced by the major spliceosome usually contain the
GU dinucleotide at their 5′ end (the splice donor) and
the AG dinucleotide at their 3′ end (the splice acceptor)
[18]. Indeed, when we analyzed all the dinucleotide fre-
quencies in 10-nucleotide genomic windows mapping
to each observed circular junction, a vast majority of
candidate circular junctions contained the GT dinucleo-
tide within 5 nucleotides of the putative donor end and
the AG dinucleotide within 5 nucleotides of the putative
acceptor end (Figure 1C; Figure S1A in Additional file 2).
Moreover, a search for motifs within 10-nucleotide gen-
omic windows flanking the circular junctions recap-
tured the canonical sequence motifs of splice donors
and acceptors (Figure S1B in Additional file 2). When
considering the minority of candidates without GT-AG-
flanking junctions, no pronounced dinucleotide enrich-
ment or significant motif was observed (Figure S1A,B in
Additional file 2).
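The splice-signal check described above can be illustrated with a short sketch. It assumes a plus-strand locus, a genome held as a Python string, and breakpoints given as 0-based indices; the function names are hypothetical and the 10-nucleotide window is simply 5 nucleotides on either side of each breakpoint.

```python
def has_signal(genome: str, breakpoint: int, dinucleotide: str,
               flank: int = 5) -> bool:
    """True if `dinucleotide` occurs within `flank` nt of the breakpoint."""
    start = max(0, breakpoint - flank)
    window = genome[start:breakpoint + flank]
    return dinucleotide in window


def is_gt_ag_flanked(genome: str, donor_breakpoint: int,
                     acceptor_breakpoint: int) -> bool:
    """True if GT lies near the donor end and AG near the acceptor end.

    Assumes the circRNA is on the plus strand; a minus-strand locus would
    need the reverse-complemented signals (AC and CT) instead.
    """
    return (has_signal(genome, donor_breakpoint, "GT")
            and has_signal(genome, acceptor_breakpoint, "AG"))
```

Searching a small window rather than requiring the dinucleotide at an exact offset accommodates the uncertainty in where a partial alignment places the breakpoint.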
Reasoning that for biological circRNAs a higher frac-
tion of the transcript isoforms might be circular, as is
the case for CDR1as, for which almost no linear isoform
could be detected [8,9], we calculated for each candidate
the fraction of its transcript isoforms that were circular
and compared the circular fractions of groups of circRNA
candidates with different flanking dinucleotide signatures.
The circular fractions of GT-AG-flanking candidates
tended to be greater than those of the remaining candidates,
with the circular fractions of most non-GT-AG-flanking
candidates falling below 1% (Figure 1D). To test the ex-
tent to which the minor spliceosome might contribute
to circRNA formation, we examined the distribution of
circular fractions for AT-AC-flanking candidates, but
observed no difference from the other non-GT-AG-
flanking candidates (Figure 1D).
Figure 1 Global identification of human circRNAs. (A) Schematic illustration of the alternative-splicing isoforms generated from linear splicing
(left) and back splicing (right). Two-part alignments identified junction-spanning reads indicative of circRNAs (bottom left). Exons are colored, and
donor (GU) and acceptor (AG) signals at splice sites are indicated. (B) The computational pipeline developed to identify and quantify circRNAs
from long-read RNA-seq data. (C) Enrichment of donor GT and acceptor AG splicing signals in genomic windows flanking candidate circular
junctions supported by ≥5 junction-spanning reads in the CD34 sample. Similar results were obtained from all other cell types. (D) Distribution of
circular fractions for circRNA candidates in (C), grouped based on whether their circular junctions were flanked by splicing signals of the major or
minor spliceosome (GT-AG- and AT-AC-flanking, respectively). (E) Distributions of exon numbers for circRNAs, mRNAs, and other annotated
ncRNAs. (F) Annotations of genomic regions mapping to inferred circRNA exons. CDS, coding sequence; lincRNA, long intervening ncRNA; UTR,
untranslated region. (G) Splicing within circRNAs of the CD34 sample. Mapped locations of the mates of junction-spanning reads were compared
to the genomic annotations 200 nucleotides downstream and upstream of back-spliced acceptors and donors, respectively. Because the fragment
size for the paired-end sequencing averaged 200 nucleotides, these genomic annotations resembled those expected if the introns within the
circRNAs were retained.
Collectively, these results indicated that back-splicing
by the major spliceosome generates most, if not all, cellular
circRNAs. Candidates without these splicing signals
were more likely to have arisen from sequencing artifacts
(such as chimeric RNA-seq reads resulting from template
switching during reverse transcription or PCR), which
justified the filter for GT-AG splicing signals imposed in
previous pipelines [8]. To maximize the specificity of
our pipeline, we carried forward only those candidates
flanked by the GT-AG splicing signals, recognizing the
possibility that a few candidates discarded by this filter
might be authentic circRNAs generated by mechanisms
that do not involve the spliceosome, as shown in Archaea
[13]. As a second quality filter, we also required that each
circRNA have a circular fraction ≥10% in two or more
samples. This requirement filtered out about two-thirds of
the circRNAs in each sample. With these filters, we an-
notated 7,112 circRNAs from 39 biological samples
representing a large variety of human cell lines (Table S2A
in Additional file 3).
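The second quality filter can be written as a one-line test. The sketch below assumes that circular fractions are available as a mapping from sample name to fraction for each candidate; the function name and data layout are hypothetical, while the thresholds (≥10% in two or more samples) come from the text.

```python
def passes_fraction_filter(fractions_by_sample: dict,
                           min_fraction: float = 0.10,
                           min_samples: int = 2) -> bool:
    """True if the circular fraction reaches `min_fraction` in at least
    `min_samples` samples."""
    supporting = sum(1 for f in fractions_by_sample.values()
                     if f >= min_fraction)
    return supporting >= min_samples
```

For example, a candidate with fractions of 0.12, 0.30 and 0.01 in three samples passes, whereas one with 0.12 and 0.05 does not.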
Assuming that each circRNA had the same exon struc-
ture as the current GENCODE annotation at its locus, we
found that most circRNAs spanned <5 exons (Figure 1E),
with the distribution of exon abundance resembling that
reported for the other GENCODE-annotated ncRNA
genes in the human genome [2]. The distribution of
circRNA exonic sequence lengths also resembled that
of ncRNAs, with a median length of 547 nucleotides,
compared with 566 and 2,149 nucleotides for ncRNAs
and mRNAs, respectively (Additional file 4). More than
half of the circRNAs consisted of only protein-coding
exons (Figure 1F), whereas smaller fractions also con-
tained 5′ untranslated regions (UTRs), 3′ UTRs, or both.
CDR1as was among the 68 circRNAs that mapped anti-
sense to annotated protein-coding genes. Another 67 cir-
cRNAs mapped to annotated long intervening ncRNAs
(lincRNAs) [19], and 342 mapped between annotated
genes, with no sense or antisense overlap.
Because many circRNAs contained multiple exons
(Figure 1E) and previous studies have noticed retained
introns in a few circRNAs [10,15], we more systematically
examined whether introns within circRNAs were effi-
ciently removed. We started by mapping all the mate
reads of the circular junction-spanning reads in the CD34+
hematopoietic progenitor cells sample. If intra-circular spli-
cing did not occur, most of the mate reads would be
expected to map to the first upstream or downstream in-
tron from the back-spliced donor or acceptor, respectively
(Figure 1G). We found that approximately 80% of the
mate reads that did not map to the same exons as the circular
junctions mapped to their neighboring exons, indicating
that introns within circRNAs were usually spliced out,
although a substantial fraction (approximately 20%) were
retained (Figure 1G).
Comparison with previous circRNA catalogs
When comparing our circRNA catalog with those of pre-
vious studies, we found that most annotated circRNAs
were present in only one catalog (Additional file 5),
presumably because of differences in cell types, cutoffs
and computational pipelines. A key difference between
our catalog and those of others was our requirement
that the circRNAs have a circular fraction ≥10%, which
prompted us to examine the extent to which this filter ex-
plained the differences between our catalog and those of
others. For each catalog, we randomly selected one cell
type used to build the catalog and quantified the circular
fraction of the circRNAs identified in that cell type by the
corresponding study, using non-poly(A)-selected RNA-
seq data of that cell type. Due to our circular-fraction fil-
ter, all the circRNAs from our study had circular fractions
of ≥10% (Additional file 5). About half of the circRNAs
identified by the Memczak et al. study [8] had circular
fractions of ≥10%, whereas less than 10% of the circRNAs
from the other two studies, which used either RNase R-
treated [14] or poly(A)-depleted RNA-seq data [15] to en-
rich for circRNAs, had circular fractions ≥10%.
Trans-splicing rarely contributed to back-spliced junctions
Trans-splicing between pre-mRNAs can also give rise to
the appearance of shuffled exons [20,21], many of which
would produce sequencing reads indistinguishable from
those that we and others [8] attributed to back-spliced
products (Figure 2A). To distinguish between back-splicing
and trans-splicing, we applied the approach used
previously on a smaller set of circRNAs [12]. This approach
took advantage of the paired-end RNA-seq data
and examined the mate reads of the junction-spanning
reads, which for some trans-spliced products would
map beyond the genomic regions spanning the acceptors
and donors of the junction-spanning reads (Figure 2A).
Out of >6,000 mates of junction-spanning reads mapped
in the CD34+ hematopoietic progenitor cells sample, only
four (all from the ANKRD28 locus) mapped upstream of
the back-spliced acceptors, and only one (from the
ATF7IP locus) mapped downstream of the back-spliced
donors (Figure 2B,C).
Although analysis of mate reads would have identified
more trans-spliced products if many members of our
catalog were in fact trans-spliced and not circular, this
analysis presumably missed evidence of trans-splicing in
cases for which the exonic distance between the trans-
spliced acceptor and donor was too large to exclude
any mate reads, which was the case for most circRNAs
(Additional file 4). As an orthogonal approach for discriminating
between back-spliced and trans-spliced products,
we considered their polyadenylation status [12]. Poly(A)
selection should deplete circRNAs but not trans-spliced
products, which are linear and thus expected to have poly
(A) tails (Figure 2A). Indeed, using data from U2OS cells,
which were independent of the data we used for circRNA
discovery, we found that of the 598 members of our cata-
log detected through junction-spanning reads in non-poly
(A)-selected RNA-seq data, only 20 were detected in poly
(A)-selected RNA-seq data, as indicated by circular frac-
tions exceeding zero for only these 20 members in the
poly(A)-selected data (Figure 2D). Moreover, only six
members of our catalog were detected in the poly(A)-
selected data but not the non-poly(A)-selected data.
The 20 detected in both datasets presumably include
both trans-spliced products and circRNAs from loci that
also produce trans-spliced isoforms. These observations,
in conjunction with the lack of translation across the cir-
cular junctions (see below), indicated that trans-splicing
contributed very few (<5%) false positives in our cir-
cRNA catalog, despite a previous study reporting that
shuffled splice isoforms are predominantly trans-spliced
products [20]. We attribute our high specificity to our
use of non-poly(A)-selected samples for circRNA identi-
fication (whereas the previous report started with poly
(A)-selected samples) and our requirement that the cir-
cular fraction exceeded 10% in at least two samples.
These results are consistent with previous studies showing
that circRNAs are non-polyadenylated [12] or RNase
R-resistant [8,14].
Figure 2 Trans-splicing rarely contributed to back-spliced junctions. (A) Schematic illustration of the analysis of paired-end reads used to distinguish
trans-spliced products from circRNAs. Depending on the insert size, mate reads of trans-spliced but not back-spliced junction-spanning reads could
potentially map to adjacent linear exons. Based on the insert sizes of the ENCODE paired-end RNA-seq libraries, we only considered circRNAs that were
<400 nucleotides. (B) Distances of all mapped mate reads from the acceptors (left) and donors (right). Two possible trans-spliced events are indicated.
(C) The identified trans-spliced event from the ANKRD28 locus. (D) Circular fractions of 598 circRNAs detected in non-poly(A)-selected RNA-seq data
from U2OS cells, analyzed using non-poly(A)-selected RNA-seq data (Ribo-Zero) and poly(A)-selected RNA-seq data (poly(A)+).
Expression of circRNAs
To act as miRNA sponges or perform other non-catalytic
cellular functions, the circRNAs would need to be
expressed at consequential levels within the cell. To
infer the abundance of each circRNA, we multiplied its
circular fraction by the density of RNA-seq reads arising
from the cognate gene locus (measured in fragments per
kilobase of transcript per million fragments sequenced, or
FPKM). As observed for all protein-coding genes with
FPKM ≥0.1, approximately 40% of all circRNAs annotated
from each cell type had an inferred FPKM ≥1, as illus-
trated for the CD34+ hematopoietic progenitor cells sam-
ple (Figure 3A). However, the abundances of circRNAs
tailed off much more quickly than did those of mRNAs.
For example, when considering the 562 circRNAs with in-
ferred FPKM ≥1.0, only 37 had FPKM ≥10 and none had
FPKM ≥100. As a result, our circRNAs comprised a small
fraction of the transcriptome of each sample, accounting
for an estimated 0.2 to 0.9% of all the exon-mapping reads
(Figure 3B). This range is slightly lower than a recent esti-
mate of 1% [15], presumably because most low circular-
fraction circRNAs were discarded in our analysis.
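The abundance estimate described above is a simple product, sketched below. The function names are hypothetical; the second function follows the description in the Figure 3B legend (circular fraction × gene FPKM × circRNA exonic length) and returns an unnormalized weight, because the normalization used to obtain the fraction of exon-mapping reads is not restated here.

```python
def inferred_circ_fpkm(circular_fraction: float, gene_fpkm: float) -> float:
    """Inferred circRNA level: circular fraction times the FPKM of the locus."""
    return circular_fraction * gene_fpkm


def relative_read_weight(circular_fraction: float, gene_fpkm: float,
                         circ_exonic_length: int) -> float:
    """Unnormalized estimate of reads deriving from a circRNA."""
    return circular_fraction * gene_fpkm * circ_exonic_length
```

Under this scheme, a circRNA with a circular fraction of 0.25 at a locus with FPKM 8 has an inferred level of FPKM 2.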
We next examined the cell type specificity of circRNA
expression. The 39 biological samples varied in the num-
ber of detectable circRNAs (Figure 3C). Although 1,500
to 3,000 circRNAs passed our cutoffs in most cell types,
some cell types (for example, HFDPCs (follicle dermal
papilla cells)) had approximately three times more cir-
cRNAs in the final catalog than others (for example,
HAoECs (thoracic aortic endothelial cells)) (Figure 3C).
This variation could not be explained by the differences
in sequencing depths (Additional file 6).
Although some circRNAs (including CDR1as) were more
ubiquitously expressed, most were found in only a few cell
types (Figure 3D). To assess whether circRNAs were any
more cell type specific than their linear counterparts, we
compared the Jensen-Shannon specificity scores [19] of
circRNAs with those of a cohort of linearly spliced exon
pairs with the same distribution of expression levels (that
is, the same distribution of total junction-spanning reads)
as the circRNA set. The expression of circular junctions
was not more cell type-specific than that of the control
cohort of linear junctions (Figure 3E), and the expression
of both was less cell type-specific than that of lincRNAs
[19]. To test whether the efficiency of circularization
might be regulated in a cell-type-specific manner, we examined
the circular fractions of 1,299 circRNAs for which
the availability of the donor site and that of the acceptor site
were each supported by ≥5 reads in all 39 samples. The
circular fractions of these circRNAs were nearly as corre-
lated between cell types (median Spearman’s ρ = 0.60 to
0.75) (Figure 3F) as they were between biological repli-
cates (median Spearman’s ρ = 0.75). Taken together, our
results suggested that circRNA expression is not any more
regulated than expected from the availability of the pri-
mary transcripts. We compiled a list of 57 circRNAs,
including CDR1as, for which the circular fraction was ≥50%
in most cell types in which transcript isoforms were
detected (Table S2B in Additional file 3).
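For readers unfamiliar with the Jensen-Shannon specificity score used above, the following sketch shows one common formulation, as we understand the definition in [19]: the expression vector is normalized to a distribution, compared with the idealized distribution in which expression is confined to a single cell type, and the score is 1 minus the square root of the Jensen-Shannon divergence, maximized over cell types. The function names and the use of base-2 logarithms are assumptions of this example.

```python
import math


def _entropy(p):
    """Shannon entropy in bits, ignoring zero entries."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return _entropy(m) - (_entropy(p) + _entropy(q)) / 2


def js_specificity(expression):
    """Maximum over cell types of 1 - sqrt(JSD(profile, single-type profile))."""
    total = sum(expression)
    if total <= 0:
        raise ValueError("expression vector must have a positive sum")
    p = [x / total for x in expression]
    best = 0.0
    for t in range(len(p)):
        unit = [1.0 if i == t else 0.0 for i in range(len(p))]
        distance = math.sqrt(max(js_divergence(p, unit), 0.0))
        best = max(best, 1.0 - distance)
    return best
```

A transcript detected in only one cell type scores 1, and scores fall toward lower values as expression becomes more evenly distributed.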
To examine their subcellular localization, we quanti-
fied the circular fractions of circRNAs in each of the
subcellularly fractionated K562 samples, focusing on
the 514 circRNAs detected in the K562 whole-cell sam-
ples (Additional file 7). Consistent with previous results
on a few circRNAs [12,14], most of these circRNAs
were predominantly in the poly(A)-depleted cytoplas-
mic samples.
Conservation of circRNAs between human and mouse
Using the non-poly(A)-selected RNA-seq data from
mouse ENCODE cell lines and some other available
non-poly(A)-selected RNA-seq datasets (Table S1B in
Additional file 1), we also identified and quantified 635 ro-
bustly detectable mouse circRNAs (Additional file 8).
When analyzing human and mouse genes with clear one-
to-one orthologs, we observed that if the mouse gene had
a circRNA in our dataset, its human ortholog was likely to
also have one (66%), whereas if the mouse gene did not
have a circRNA in our dataset, the human gene was less
likely to have one (19%) (Figure 4A). The overlap of human
and mouse circRNA genes was not simply due to
similarity in exon numbers between orthologs because
the enrichment was still observed within subsets of
mouse genes grouped by exon numbers (Additional
file 9). To test whether human and mouse circRNAs arose
from orthologous exons, we used whole-genome align-
ments to identify the regions of the mouse genome that
corresponded to the human circRNAs (no longer limiting
the analysis to one-to-one orthologs) and quantified the
degree to which our mouse circRNAs overlapped these re-
gions. Among the 350 mouse circRNAs for which the
aligned human gene orthologs also had circRNAs, about a
third used the orthologous splice sites of human cir-
cRNAs (a higher rate than that previously reported
[14]), whereas the remaining two-thirds either partially
overlapped (32%) or did not overlap (31%) with aligned
human circRNA loci (Figure 4B,C). These results indi-
cated that human and mouse circRNAs were often gen-
erated not only from orthologous genes but also from
orthologous exons. The circular fractions of mouse
circRNAs (averaged across all cell types in which the
transcript was represented by both donor- and acceptor-
matching reads) were weakly yet significantly correlated
with those of their human orthologs (Spearman’s ρ = 0.30;
Figure 4D), which was slightly lower than those between
any two human cell types (typically 0.60 to 0.75).
Figure 3 Expression of human circRNAs. (A) Levels of circRNAs in CD34+ hematopoietic progenitor cells. The expression level was estimated
for each circRNA (using its circular fraction and the FPKM of the corresponding gene, which included both circular and linear isoforms) and the
cumulative distribution of levels is plotted. For comparison, the levels of mRNAs with FPKM ≥0.1 are also plotted. (B) Fractions of mRNA-mapping
reads estimated to derive from circRNAs. Reads derived from each circRNA were estimated as the product of the circular fraction, the gene FPKM
and the length of the circRNA exonic sequence. The fraction was estimated for each sample, and the distribution of fractions is plotted. (C) Numbers
of circRNAs identified in each biological sample. The number of circRNAs was tallied for each sample, and the distribution of values is plotted.
(D) Numbers of samples in which ≥10% circular fraction was observed. The number of samples with ≥10% circular fraction was tallied for each
circRNA, and the distribution of values is plotted. (E) Cumulative distribution of cell-type-specificity scores of circRNAs compared to mRNAs with
similar overall expression levels (linear controls). (F) Unsupervised hierarchical clustering of the circular fractions of 1,299 circRNAs for which the
availability of the donor site and that of the acceptor site were each supported by ≥5 reads in all 39 samples.
The derivation of most circRNAs from coding exons
complicates analysis of sequence conservation that might
provide evidence for sequence-dependent biological function
of the circular isoforms. A previous analysis of 223 circRNAs
that both derive from coding exons and have
orthologous circular isoforms in mouse reported elevated
conservation levels in the third nucleotide positions of
codons when compared to a control cohort of linear coding
exons that were chosen to match the conservation
levels at the first and second codon positions [8]. We were
able to reproduce these results using the previous list of
circRNAs and found that the elevated conservation at the
third codon positions was robust when compared with
1,000 different control cohorts (Figure S7A in Additional
file 10). Applying this analysis to our list of 130 human
circRNAs with mouse orthologs also indicated elevated
conservation of the third codon positions (Figure S7A in
Additional file 10). Following up on this result, we com-
pared the nucleotide conservation of coding exons within
circRNAs to their neighboring linear coding exons, rea-
soning that the neighboring linear exons would better
control for transcript expression levels as well as other
unanticipated factors that might correlate with circRNA
identification. When using these alternative controls,
we did not detect significantly elevated conservation in
the third codon positions for either the previous list of
circRNAs (Figure S7B in Additional file 10) or our new
list (Figure 4E), which argued against the notion that
sequence-dependent noncoding functions are enriched
within circRNAs.
No evidence for translation of circRNAs
The observation that most circRNAs are cytosolic [12] and
originate from protein-coding sequences raised the ques-
tion of whether they could be loaded into the ribosome
and be translated into polypeptides. Although circRNAs
are devoid of the structures typically required for efficient
translation initiation, that is, a 5′ cap and 3′ poly(A) tail,
cap-independent translation has been reported for many
linear mRNAs [22], and translation can proceed on cir-
cRNAs once initiated from an internal ribosome entry site
[23]. A few abundant circRNAs have been previously
shown to be untranslated [14]. To search systematically for
evidence of circRNA translation, we examined both ribo-
some footprinting data and non-poly(A)-selected RNA-seq
data for human U2OS cells. Of the 717 circRNAs with
RNA-seq reads spanning their circular junctions, 236
had ribosome protected fragments (RPFs) spanning the
RefSeq-annotated linear junctions at both splice sites.
Figure 4 Conservation between human and mouse circRNAs. (A) Analysis of enrichment in circRNAs from human orthologs of mouse genes
for which circRNAs were found. Only the mouse genes that had one-to-one human orthologs were considered. (B) Extent to which mouse circRNAs
align with human circRNA loci. (C) An example of a conserved circRNA, which derives from the human PHF21A and mouse Phf21a loci. (D) Relationship
between average circular fractions observed for circRNAs conserved in human and mouse (n = 130). Spearman’s rank correlation coefficient is
shown. (E) Sequence conservation for the conserved circRNAs, compared with that of their neighboring exons. Distributions are of average
mammalian phyloP scores for each of the three codon positions in circular exons and their neighboring linear exons. No significant difference
was observed at any of the three positions (P > 0.1, paired Mann-Whitney test).
Strikingly, after excluding the false-positive junction-
spanning reads arising from adjacent paralogous genes
(12 instances), no RPF reads could be found spanning
any of the remaining 224 circRNA junctions (Figure 5A),
which led to uniformly zero circular fractions; that is,
every informative RPF corresponded to the linear isoforms
(Figure 5B). Making the reasonable assumption that trans-
lation in alternative frames (which might terminate prior
to reaching the circular junction) is rare, our results
showed that, compared with their linear isoforms, most
circular isoforms are translated far less efficiently if at all
in human U2OS cells. Moreover, because trans-splicing is
unlikely to affect translational initiation, the absence of
RPFs mapping across the junctions that we classified as
circular provided additional evidence that these junctions
were indeed circular and not generated by trans-splicing.
As more ribosome profiling data become available, it will
be interesting to re-visit the question of whether some cir-
cRNAs might be translated in other cell types or species.
The potential of other circRNAs to act as miRNA sponges
To search for additional miRNA sponges that resemble
CDR1as, we considered several expected properties of
strong miRNA sponges. First, miRNA sponges would be
expected to bind many miRNA-loaded Argonaute pro-
teins. Using data from high-throughput in vivo crosslink-
ing experiments, which identified clusters of AGO2-
crosslinking sites that indicated AGO2 binding [24-26],
we compared the density of AGO2-crosslinking clusters
within exons that can form circRNAs to the density within
their neighboring linear exons. Exons that can form cir-
cRNAs did not exhibit greater cluster densities for AGO2,
with results resembling those for another RNA-binding
protein, IGF2BP1 (insulin-like growth factor 2 mRNA-binding
protein 1) (Figure 6A). Similar analyses on 20 additional
RNA-binding proteins showed that circular exons gener-
ally had slightly higher cluster densities than their neigh-
boring exons (Additional file 11), which could be due to
either the circRNAs providing binding sites in addition to
those provided by the same exons in linear isoforms, or
the lack of translation of circular exons, which would pre-
vent proteins from being displaced by the translocating
ribosome. Strikingly, when counting the clusters of AGO2
crosslinks mapping to each circRNA [27], CDR1as had 26
clusters corresponding to miR-7 sites, which was by far
the most mapping to any circRNA for any miRNA family
(Figure 6B). No other circRNA stood out as a candidate to
act as a strong sponge for any of the other RNA-binding
proteins.
Figure 5 No evidence for translation of human circRNAs. (A) Numbers of RNA-seq and RPF reads that spanned the linear junction at the
donor end, the circular junction, and the linear junction at the acceptor end of 224 circRNAs that contained RPF reads corresponding to both
linear junctions in U2OS cells. (B) Circular fractions of 224 of the circRNAs of (A), calculated using either RNA-seq or RPF reads.
[Figure 5 plot labels: y-axes, RPM (A) and Circular fraction (B); panels, Linear junction (donor), Circular junction, Linear junction (acceptor); read types, RNA-seq and RPF.]
Figure 6 A search for additional circRNAs with the expected properties of miRNA sponges. (A) Frequency of AGO2-crosslinking clusters
observed in circRNAs compared with that of clusters observed in their neighboring exons (left). See Figure 4E for color keys. For comparison, the
analysis was repeated for a negative control, IGF2BP1 (right). No significant difference was observed between circular exons and their neighboring
exons (P > 0.1, paired Mann-Whitney test). (B) Numbers of AGO2-crosslinking clusters assigned to individual miRNA families. The number of
crosslinking clusters was tallied for each circRNA-miRNA pair, and the distribution of values is plotted. The outlying CDR1as-miR-7 pair is indicated.
(C) Numbers of 7- and 8-nucleotide sites for individual miRNA families found within each circRNA. The number of sites was tallied for each
circRNA-miRNA pair, and the distribution of values is plotted. The black curve indicates the averaged results when repeating the analysis 1,000
times using different permutations of the site sequences. The two outlying pairs are indicated. (D) Numbers of miRNA target sites in CDR1as and
top-ranking ZNF circRNAs. (E) Part of the ZNF91 locus containing the circRNA. miR-23 and miR-296 seed matches are indicated.
Because the AGO2-crosslinking sites were determined
in HEK293 cells, circRNAs and miRNAs not expressed
in HEK293 cells were missed by this analysis. We thus
concatenated the annotated exons within each circRNA,
and counted the number of canonical 7- and 8-nucleotide
target sites [7] for each of the 87 miRNA families con-
served across vertebrates. Again, CDR1as ranked on top,
containing 71 miR-7 sites (Figure 6C). CDR1as-miR-7 was
also the only circRNA-miRNA pair that exceeded the
upper limit of results from the negative control, in which
the analysis was repeated with permuted miRNA sequences
(Figure 6C). We conclude that among the human
circRNAs, CDR1as stands alone as the most compelling
miRNA sponge for any conserved miRNA seed family.
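The site-counting step can be sketched as follows, using the standard definitions of canonical sites: a match to miRNA nucleotides 2 to 7 that is extended by a match to nucleotide 8 (7mer-m8), by an A opposite nucleotide 1 (7mer-A1), or by both (8mer). This is an illustrative sketch with hypothetical function names, not the code used for the analysis; it scans a linear sequence only, so sites spanning the back-spliced junction of a circRNA would be missed.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def count_canonical_sites(mirna: str, target: str) -> dict:
    """Count 8mer, 7mer-m8 and 7mer-A1 sites for `mirna` within `target`.

    Both sequences are given 5' to 3'; U is treated as T.
    """
    mirna = mirna.upper().replace("U", "T")
    target = target.upper().replace("U", "T")
    core = reverse_complement(mirna[1:7])  # pairs with miRNA nucleotides 2-7
    m8 = COMPLEMENT[mirna[7]]              # pairs with miRNA nucleotide 8
    counts = {"8mer": 0, "7mer-m8": 0, "7mer-A1": 0}
    i = target.find(core)
    while i != -1:
        has_m8 = i > 0 and target[i - 1] == m8
        has_a1 = i + 6 < len(target) and target[i + 6] == "A"
        if has_m8 and has_a1:
            counts["8mer"] += 1
        elif has_m8:
            counts["7mer-m8"] += 1
        elif has_a1:
            counts["7mer-A1"] += 1
        # A bare 6-nt core match is not counted as a canonical site here.
        i = target.find(core, i + 1)
    return counts
```

For miR-7 (UGGAAGACUAGUGAUUUUGUUGU), the core is TCTTCC and the 8mer site reads GTCTTCCA in DNA; applying the function to the concatenated exons of each circRNA and summing the three classes gives a per-pair site count of the kind tallied above.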
Our analysis of miRNA site number also pointed to
circRNAs from the repeat-rich C2H2 zinc finger (ZNF)
gene family (Figure 6D). In particular, a circRNA gener-
ated from the ZNF91 locus (circRNA-ZNF91) contains 24
miR-23 sites (Figure 6E), 19 of which were 8-nucleotide
sites. These numbers exceeded that of the other proposed
miRNA sponge, mouse Sry, which has 16 miR-138 sites
[9]. ZNF91 belongs to a C2H2 zinc finger (ZNF) gene fam-
ily that is greatly expanded in the primate lineage and
known to contain exceptionally abundant target sites for
several miRNA families, including miR-23, miR-181 and
miR-199 [28]. The next nine ZNF circRNAs ranked by the
total number of sites for these three miRNA families had
7 to 15 sites to one of the 3 families (Figure 6D). Expand-
ing our miRNA site search beyond the 87 miRNA families
conserved beyond mammals to the 66 miRNA families
conserved only within the mammalian lineage (Figure S9A
in Additional file 12), we found that circRNA-ZNF91 had
39 additional sites for miR-296 (Figure 6E). CDR1as also
had 22 sites for the miR-876-5p/3167 family (Figure S9B
in Additional file 12), although they were not as conserved
as the miR-7 sites.
Discussion
Because molecular studies of eukaryotic RNA typically
begin with poly(A)-selection, circRNAs have often es-
caped detection and consideration. Our study adds to
previous circRNA annotation efforts [8,12,14,15] to yield
an expanded catalog of circRNAs robustly detected from
a large variety of human cell types. Our circRNA identi-
fication method resembles that previously used [8,14],
except we focused our analyses on the circRNA loci with
circular fractions ≥10%. Other recent studies take a
more targeted approach and search for back-spliced
junctions from annotated splice sites [12,15] and there-
fore miss the unannotated genes and exons, especially
those that have particularly high circular fractions and
are rarely found in the poly(A)+ RNA-seq data, such as
CDR1as. Moreover, unlike previous studies that identify
circRNAs from poly(A)-depleted RNA-seq data [14,15],
we applied our pipeline to non-poly(A)-selected RNA-
seq data, which were neither depleted nor enriched in
circRNAs or their linear isoforms. An advantage of using
these datasets is that we could directly estimate circular
fractions without experimental calibration [15].
With this catalog of 7,112 human circRNAs in hand,
the key question is whether they comprise an underap-
preciated class of molecules with cellular functions, or
whether they are largely inert side-products of imperfect
pre-mRNA splicing. The circRNA with the most com-
pelling evidence for a biological function is the miR-7
sponge, CDR1as. Although a biological context has not
yet been identified in which CDR1as loss-of-function influences
miR-7 activity, this circRNA has >60 conserved
sites for miR-7, and its ectopic delivery causes a
developmental phenotype [8,9]. The other circRNA proposed
to act as a miRNA sponge, mouse Sry [9], has only one
miR-138 site in its human homolog, which indicates
that the proposed sponge function is not conserved in
mammals.
What about the functional potential of the other 7,000-plus
circRNAs? By characterizing the molecular abundance
and translation of circRNAs and providing an updated
perspective on their sequence conservation and potential
to act as miRNA sponges, our analyses can speak to this
question. Although we found thousands of circRNAs in
each cell type, only approximately 2% (20 to 60, depending
on the cell type) had circular fractions exceeding 50%,
which indicates that most were minor alternative isoforms
of their respective primary transcripts. Moreover, fewer
than 10% had FPKMs ≥10 in any of the 39 samples exam-
ined. Considering that in homogeneous cell types one
molecule per cell usually corresponds to an FPKM of 1 to
4 [29], most circRNAs only accumulated to a few mole-
cules per cell. This generally low circular fraction and
weak accumulation was observed despite the expectation
that each circRNA, by virtue of its exonuclease insuscepti-
bility, might persist in the cell much longer than its linear
alternative isoforms. Such low accumulation would not be
expected of molecules that titrate miRNAs or other abun-
dant regulators away from their regulatory targets. Indeed,
we find few circRNAs with the properties expected of
miRNA sponges. When circRNAs are experimentally
enriched by either poly(A)-depletion [15] or RNase R digestion
[14], tens of thousands more circRNAs are
found, even when limiting the search to only those that
use annotated splice sites. Many of these low-abundance
circRNAs had zero junction-spanning reads when we
searched in the non-poly(A)-selected RNA-seq data, in
which circRNAs were neither enriched nor depleted
(Additional file 5). Perhaps it is not too far-fetched to
speculate that all multi-exon genes generate one or
more circular isoforms at low frequencies, whereas
circularization of CDR1as is specific and efficient in all
cell types in which it is expressed.
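The abundance argument in the preceding paragraph is a unit conversion, sketched here in Python. The function name and the choice to return a (low, high) range are illustrative assumptions; only the conversion of 1 to 4 FPKM per molecule per cell [29] and the FPKM threshold of 10 come from the text.

```python
def molecules_per_cell(fpkm, fpkm_per_molecule=(1.0, 4.0)):
    """Return (low, high) estimates of molecules per cell for a given FPKM.

    Assumes, as stated in the text, that one molecule per cell corresponds
    to an FPKM of 1 to 4 in a homogeneous cell population.
    """
    low_factor, high_factor = fpkm_per_molecule
    return fpkm / high_factor, fpkm / low_factor


# A circRNA sitting exactly at the FPKM >= 10 threshold:
low, high = molecules_per_cell(10.0)
print(f"FPKM 10 corresponds to roughly {low:g} to {high:g} molecules per cell")
```

Under these assumptions, the fewer than 10% of circRNAs that reached an FPKM of 10 would sit at about 2.5 to 10 molecules per cell, and the remainder below that.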
To have a physiological effect at such low levels, cir-
cRNAs would need to either participate in a catalytic
process or interact very specifically with other molecules
that have important functions when present at very low
cellular levels. For example, mRNAs have physiological
effects when present at only a few molecules per cell be-
cause they participate in the catalytic process of transla-
tion, which can produce many protein molecules from
each mRNA molecule. However, we found that circRNAs
are rarely translated. Some linear lincRNAs are proposed
to interact with and modulate the output of a single gen-
omic locus, which would explain their physiological effect
despite their relatively low cellular abundance [5]. Like-
wise, a rare circRNA could conceivably recognize and
regulate a rare mRNA. However, a specific, high-affinity
interaction with an mRNA or other rare cellular compo-
nent would presumably rely on the circRNA sequence,
which would need to be conserved to retain its function
over evolutionary time, yet we found no evidence for cir-
cRNA sequence conservation beyond that observed for
neighboring linear exons.
We suspect that CDR1as is not the only circRNA with
an evolutionarily conserved biological function. This being
said, our observations that most circRNAs 1) are ineffi-
ciently produced relative to their linear alternative iso-
forms, 2) accumulate to only low levels in the cell, and 3)
are no more conserved than their neighboring linear
exons, when considered together, suggest that most cir-
cRNAs may be inconsequential side-products of imperfect
pre-mRNA splicing. For linear alternative-spliced iso-
forms, preferential production of orthologous isoforms in
the same tissues of different species is considered evidence
of function [30,31]. For circular isoforms, this type of ana-
lysis would require non-poly(A)-selected datasets from
the same tissues of different species, which unfortunately
are not yet available. For now, the only observation con-
sistent with the idea that many circRNAs could be func-
tional is our finding that the loci that produce circRNAs
in mouse also tend to do so in humans. However, reten-
tion of circRNA production since the last common ances-
tor of mouse and human could have other causes apart
from selection for circRNA function. For example, slowed
splicing at the circRNA acceptor would presumably favor
circRNA production because it would allow for transcrip-
tion of the downstream donor, and if this slowed splicing
is conserved for reasons other than circRNA function,
then the production of circRNAs might nonetheless be
conserved. Therefore, considering the conserved produc-
tion of circRNAs as evidence against the idea that the vast
majority of circRNAs are inert splicing side-products
would require a more thorough understanding of the de-
terminants of circRNA biogenesis.
Conclusions
Mammalian cells produce a large number of circRNAs,
which have captured the interest of many biologists, par-
ticularly after the description of CDR1as and its many
conserved sites to miR-7. Our work identifies thousands
of additional circRNAs and focuses on those that have
circular fractions ≥10%. Unlike CDR1as, most of the pre-
viously and newly identified mammalian circRNAs rep-
resent alternatively spliced, low-abundance isoforms of
protein-coding genes. Expression of circRNAs is gener-
ally not more cell-type-specific than mRNAs with simi-
lar overall expression levels. Although orthologous
circRNAs were found between mouse and human, their
sequence conservation is no higher than that of their
neighboring linear exons, and no other identified cir-
cRNA is expected to function as a miRNA sponge nearly
as effectively as CDR1as. Although some circRNAs with
biological functions might exist, our results suggest that
a large majority of circRNAs are inconsequential side-
products of pre-mRNA splicing.
Materials and methods
circRNA identification and quantification
Human and mouse Ribo-Zero RNA-seq data were downloaded
from either the ENCODE project or the Gene Expression
Omnibus (GEO). For each sample, FASTQ reads were
first mapped to the hg19 or mm9 genome with Bowtie, allowing
2 mismatches. After removing PCR-duplicated reads with the
FASTX toolkit, all the unmapped reads were then aligned
with BLAT (no mismatch or gap allowed). Dual alignments
of two complementary segments within a single read mapping
to two regions on the same chromosome in the reverse
order and no more than 100 kb away from each
other were selected as circular-junction candidates. Next,
GT and AG dinucleotides were searched for within 10-nucleotide
genomic windows flanking the donor and acceptor
ends of each junction, respectively. Candidates with
GT-AG-flanking junctions were carried forward, and the
GT-AG dinucleotides were used to identify the precise
splice sites. For human circRNAs, each junction required
support from at least two independent reads within the
sample.
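The candidate-selection and splice-site steps described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in the study. The `Segment` record, the function names, and the shift-based GT-AG search are assumptions. The sketch also assumes a plus-strand transcript sequenced in the sense orientation, and it measures the 100-kb limit from the start of the upstream segment to the end of the downstream segment.

```python
from dataclasses import dataclass

MAX_SPAN = 100_000  # "no more than 100 kb away from each other"
WINDOW = 10         # search window (nt) for the GT and AG dinucleotides


@dataclass
class Segment:
    chrom: str
    read_start: int  # 0-based position within the read
    read_end: int    # exclusive
    ref_start: int   # 0-based genomic coordinate
    ref_end: int     # exclusive


def is_backsplice_candidate(five_prime, three_prime, read_len):
    """True if two segments tile the read but map in reverse genomic order."""
    if five_prime.chrom != three_prime.chrom:
        return False
    tiles_read = (five_prime.read_start == 0
                  and five_prime.read_end == three_prime.read_start
                  and three_prime.read_end == read_len)
    # In a back-splice, the 5' part of the read lies downstream in the genome.
    reverse_order = three_prime.ref_end <= five_prime.ref_start
    span = five_prime.ref_end - three_prime.ref_start
    return tiles_read and reverse_order and span <= MAX_SPAN


def refine_junction(genome, acceptor_start, donor_end, window=WINDOW):
    """Shift the junction until GT follows the donor and AG precedes the acceptor.

    Returns (acceptor_start, donor_end) for the GT-AG-flanked junction,
    or None if no shift within the window satisfies both conditions.
    """
    for shift in sorted(range(-window, window + 1), key=abs):
        a, d = acceptor_start + shift, donor_end + shift
        if a - 2 < 0 or d + 2 > len(genome):
            continue
        if genome[d:d + 2] == "GT" and genome[a - 2:a] == "AG":
            return a, d
    return None


# Toy locus: intron ending in AG, a 10-nt exon, intron starting with GT.
genome = "CCCCCCCCAG" + "ACTTCATTCA" + "GTCCCCCCCC"
print(refine_junction(genome, 11, 21))  # junction reported one base off -> (10, 20)
```

Both ends are shifted by the same amount because the two segments tile a single read, so moving the split point moves the donor and acceptor coordinates together. Minus-strand transcripts would need the mirror-image logic, which is omitted here.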
To quantify the relative ratio of circular and linear iso-
forms, we focused on the two segments (20 nucleotides
upstream from the donor and 20 nucleotides downstream
from the acceptor) flanking the circular junction. Because
many linear isoforms may exist for a given splice site, we
took an inclusive approach and simply counted the reads
that contained either of these two sequences and had
enough sequence space for the other sequence ($n_{donor}$ and
$n_{acceptor}$), and the reads that spanned the circular junction
and contained both sequences ($n_{junction}$). The circular
fraction is calculated as
$n_{junction} / (n_{donor} + n_{acceptor} - n_{junction} + 1)$.
To be accepted into the final circRNA
catalog, a circRNA candidate must have a circular frac-
tion ≥ 10% in at least two samples.
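As a worked illustration of the circular-fraction formula and the catalog criterion, a short Python sketch follows. The function and parameter names are hypothetical. The formula, the 10% threshold, and the two-sample requirement are taken from the text.

```python
def circular_fraction(n_junction, n_donor, n_acceptor):
    """Circular fraction as defined in the text.

    n_donor / n_acceptor: reads containing the donor- or acceptor-flanking
    20-nt segment; n_junction: reads spanning the circular junction.
    The +1 in the denominator is part of the published formula.
    """
    return n_junction / (n_donor + n_acceptor - n_junction + 1)


def passes_catalog_filter(fractions_by_sample, min_fraction=0.10, min_samples=2):
    """True if the circular fraction is >= 10% in at least two samples."""
    return sum(f >= min_fraction for f in fractions_by_sample) >= min_samples


# Example: 9 junction reads, 10 donor-side reads, 10 acceptor-side reads.
print(circular_fraction(9, 10, 10))                 # 9 / 12 = 0.75
print(passes_catalog_filter([0.12, 0.05, 0.30]))    # True
```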
Conservation analyses
One-to-one gene ortholog tables for gene-level analysis
were downloaded from Ensembl [32]. For exon-level ana-
lysis, human circRNA junction coordinates were con-
verted to mouse (mm9) genome coordinates using the
UCSC liftOver tool, then intersected with mouse circRNA
junctions using BEDTools. To calculate the correlation of
average circular fractions of circRNA orthologs, circular
fractions of each circRNA in all cell types wherein it was
expressed (≥1 read for each of the donor and acceptor
ends) were averaged. Spearman’s rank correlation test was
performed.
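The averaging and rank-correlation step can be sketched in plain Python as below. The helper names and the toy data are assumptions made for illustration. The sketch computes only the Spearman coefficient, not the associated significance test, and handles ties by average ranks.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_fraction(fractions):
    """Average circular fraction over the cell types expressing the locus."""
    return sum(fractions) / len(fractions)


# Hypothetical per-cell-type circular fractions for three orthologous loci.
human = {"locusA": [0.10, 0.30], "locusB": [0.50], "locusC": [0.15, 0.25, 0.20]}
mouse = {"locusA": [0.12], "locusB": [0.40, 0.60], "locusC": [0.30]}
loci = sorted(human)
rho = spearman_rho([mean_fraction(human[k]) for k in loci],
                   [mean_fraction(mouse[k]) for k in loci])
print(rho)
```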
Analysis of translation
Twenty-nucleotide sequences were taken from circular
junctions and each of the two linear junctions overlapping
the circular junctions (10 nucleotides from each side of
each junction). Numbers of reads containing each of these
sequences, as well as the circular fractions for each cir-
cRNA, were compared using RNA-seq and RPF data from
human U2OS cells.
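A minimal sketch of how the 20-nucleotide junction sequences could be built and counted is given below. The function names and toy sequences are hypothetical. The sketch assumes a single-exon circRNA whose reads are in the sense orientation, and it counts reads that contain the probe, not the number of occurrences.

```python
FLANK = 10  # nucleotides taken from each side of a junction


def junction_probe(upstream_seq, downstream_seq, flank=FLANK):
    """Join the last `flank` nt of one exon to the first `flank` nt of the next."""
    return upstream_seq[-flank:] + downstream_seq[:flank]


def count_reads_with(probe, reads):
    """Number of reads that contain the probe sequence."""
    return sum(probe in read for read in reads)


# Toy exons: the middle exon is the one that circularizes.
upstream_exon = "A" * 12
circ_exon = "C" * 10 + "G" * 10
downstream_exon = "T" * 12

circular = junction_probe(circ_exon, circ_exon)          # exon end joined to its own start
linear_5p = junction_probe(upstream_exon, circ_exon)     # upstream linear junction
linear_3p = junction_probe(circ_exon, downstream_exon)   # downstream linear junction

reads = ["TT" + circular + "AA", linear_5p, "ACGT"]
print(count_reads_with(circular, reads), count_reads_with(linear_5p, reads))
```

Running the same counts on RNA-seq reads and on RPF reads gives the two numbers that are compared for each junction.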
miRNA and protein binding sites
PAR-CLIP data were downloaded from the GEO. After
read alignment by Bowtie, binding clusters were identified
using PARalyzer with default settings [24]. Cluster dens-
ities of all circular exons were calculated and compared to
those of their linear neighboring exons. To avoid biases,
only coding exons were considered. To quantify miRNA
target sites, exonic segments within each circRNA were
concatenated using the transcript models built from all
ENCODE cytosolic RNA-seq data, and numbers of canon-
ical miRNA sites (7mer-A1, 7mer-m8, and 8mer sites) [7]
for the 87 miRNA families conserved across vertebrates
and 66 miRNA families conserved across mammals were
quantified for each circRNA. To estimate the distribution
of sites expected by chance, the procedure was repeated
using 1,000 cohorts consisting of 87 or 66 control k-mers.
To select a control k-mer, each 8mer site was randomly
permuted to preserve its mononucleotide composition.
Permuted sequences were chosen if they preserved the
CG dinucleotide number and possessed an A at the
3′-most nucleotide. Collectively, these constraints served
to select control k-mers with similar expected genome-
wide abundance.
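The site-counting and control k-mer procedure can be illustrated with the Python sketch below. It is a simplified illustration under stated assumptions, not the code used in the study. The function names are hypothetical, the target is treated as a linear concatenation of exons (sites spanning the back-splice junction are ignored), and the requirement that a control differ from the original site is an added assumption. The miR-7 5′ sequence UGGAAGAC used in the example is quoted from memory and should be checked against miRBase before reuse.

```python
import random

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA-alphabet sequence."""
    return seq.translate(_COMP)[::-1]


def count_canonical_sites(mirna_5p, target):
    """Count 8mer, 7mer-m8 and 7mer-A1 sites for one miRNA in a linear target.

    mirna_5p: at least the first 8 nt of the miRNA, 5'->3' (RNA or DNA letters).
    target:   concatenated exonic sequence, 5'->3'.
    """
    mirna = mirna_5p.upper().replace("U", "T")
    target = target.upper().replace("U", "T")
    core = revcomp(mirna[1:7])  # pairs with miRNA nucleotides 2-7
    m8 = revcomp(mirna[7])      # pairs with miRNA nucleotide 8
    counts = {"8mer": 0, "7mer-m8": 0, "7mer-A1": 0}
    j = target.find(core)
    while j != -1:
        has_m8 = j >= 1 and target[j - 1] == m8
        has_a1 = j + 6 < len(target) and target[j + 6] == "A"
        if has_m8 and has_a1:
            counts["8mer"] += 1
        elif has_m8:
            counts["7mer-m8"] += 1
        elif has_a1:
            counts["7mer-A1"] += 1
        j = target.find(core, j + 1)  # allow overlapping matches
    return counts


def control_kmer(site_8mer, rng, max_tries=10_000):
    """Shuffle an 8mer site until it keeps its CG count and ends in A.

    Shuffling preserves mononucleotide composition by construction.
    Returns None if no acceptable permutation is found.
    """
    site = site_8mer.upper().replace("U", "T")
    letters = list(site)
    for _ in range(max_tries):
        rng.shuffle(letters)
        cand = "".join(letters)
        if (cand != site and cand.endswith("A")
                and cand.count("CG") == site.count("CG")):
            return cand
    return None


print(count_canonical_sites("UGGAAGAC", "AAGTCTTCCAAA"))
print(control_kmer("GTCTTCCA", random.Random(0)))
```

Repeating `control_kmer` for every miRNA family yields one control cohort. The text describes 1,000 such cohorts, which this sketch does not reproduce.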
Data availability
RNA-seq and RPF data of human U2OS cells have been
deposited in GEO under accession number GSE51584.
Additional files
Additional file 1: Table S1. Non-poly(A)-selected RNA-seq data used in
this study.
Additional file 2: Figure S1. Sequence characteristics of circular
junctions.
Additional file 3: Table S2. Human circRNA catalog.
Additional file 4: Figure S2. Length distribution of circRNAs.
Additional file 5: Figure S3. Comparison between circRNA
annotations.
Additional file 6: Figure S4. Relationship between number of circRNAs
detected in each sample and sequencing depth.
Additional file 7: Figure S5. Subcellular localization of circRNAs in
K562 cells.
Additional file 8: Table S3. Mouse circRNA catalog.
Additional file 9: Figure S6. Enrichment in circRNAs from human
orthologs of mouse genes for which circRNAs were found.
Additional file 10: Figure S7. Protein-coding-independent
conservation of circRNAs.
Additional file 11: Figure S8. Frequency of crosslinking clusters
observed in circRNAs compared to that of clusters observed in their
neighboring exons.
Additional file 12: Figure S9. Sites for mammal-specific miRNA families
found within each circRNA.
Abbreviations
circRNA: circular RNA; FPKM: fragments per kilobase of transcript per million
fragments sequenced; GEO: Gene Expression Omnibus; lincRNA: long
intervening non-coding RNA; miRNA: microRNA; ncRNA: non-protein-coding
RNA; RPF: ribosome protected fragment; UTR: untranslated region;
ZNF: zinc finger.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
JUG led the project and performed most of the analyses. VA contributed to
project design and performed miRNA site analyses. HG collected the U2OS
RNA-seq and RPF data. DPB supervised the project. JUG, VA and DPB wrote
the manuscript. All authors read and approved the final manuscript.
Acknowledgements
We thank C. Burge, S. Eichhorn, I. Ulitsky and O. Rissland for helpful
discussions and suggestions. This work was supported by NIH grant
GM067031 (D.P.B.), and a National Science Foundation Graduate Research
Fellowship (V.A.). J.U.G. is a Damon Runyon Fellow supported by the Damon
Runyon Cancer Research Foundation (DRG-2152-13). H.G. was supported by
the Agency for Science, Technology and Research, Singapore. D.P.B. is an
investigator of the Howard Hughes Medical Institute.
Author details
1Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA.
2Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA.
3Department of Biology, Massachusetts Institute of Technology, Cambridge,
MA 02139, USA. 4Computational and Systems Biology Program,
Massachusetts Institute of Technology, Cambridge, MA 02139, USA. 5Current
address: Institute of Molecular and Cell Biology, Singapore 138673,
Singapore. 6Current address: Department of Biological Sciences, National
University of Singapore, Singapore 117543, Singapore. 7Current address: Lee
Kong Chian School of Medicine, Nanyang Technological University-Imperial
College, Singapore 639798, Singapore.
Received: 9 April 2014 Accepted: 29 July 2014
Published: 29 July 2014
References
1. Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A, Tanzer A,
Lagarde J, Lin W, Schlesinger F, Xue C, Marinov GK, Khatun J, Williams BA,
Zaleski C, Rozowsky J, Röder M, Kokocinski F, Abdelhamid RF, Alioto T,
Antoshechkin I, Baer MT, Bar NS, Batut P, Bell K, Bell I, Chakrabortty S,
Chen X, Chrast J, Curado J: Landscape of transcription in human cells.
Nature 2012, 489:101–108.
2. Derrien T, Johnson R, Bussotti G, Tanzer A, Djebali S, Tilgner H, Guernec G,
Martin D, Merkel A, Knowles DG, Lagarde J, Veeravalli L, Ruan X, Ruan Y,
Lassmann T, Carninci P, Brown JB, Lipovich L, Gonzalez JM, Thomas M,
Davis CA, Shiekhattar R, Gingeras TR, Hubbard TJ, Notredame C, Harrow J,
Guigó R: The GENCODE v7 catalog of human long noncoding RNAs:
analysis of their gene structure, evolution, and expression. Genome Res
2012, 22:1775–1789.
3. Sabin LR, Delas MJ, Hannon GJ: Dogma derailed: the many influences of
RNA on the genome. Mol Cell 2013, 49:783–794.
4. Guttman M, Rinn JL: Modular regulatory principles of large non-coding
RNAs. Nature 2012, 482:339–346.
5. Ulitsky I, Bartel DP: lincRNAs: genomics, evolution, and mechanisms.
Cell 2013, 154:26–46.
6. Batista PJ, Chang HY: Long noncoding RNAs: cellular address codes in
development and disease. Cell 2013, 152:1298–1307.
7. Bartel DP: MicroRNAs: target recognition and regulatory functions.
Cell 2009, 136:215–233.
8. Memczak S, Jens M, Elefsinioti A, Torti F, Krueger J, Rybak A, Maier L,
Mackowiak SD, Gregersen LH, Munschauer M, Loewer A, Ziebold U,
Landthaler M, Kocks C, le Noble F, Rajewsky N: Circular RNAs are a large
class of animal RNAs with regulatory potency. Nature 2013, 495:333–338.
9. Hansen TB, Jensen TI, Clausen BH, Bramsen JB, Finsen B, Damgaard CK,
Kjems J: Natural RNA circles function as efficient microRNA sponges.
Nature 2013, 495:384–388.
10. Hansen TB, Wiklund ED, Bramsen JB, Villadsen SB, Statham AL, Clark SJ,
Kjems J: miRNA-dependent gene silencing involving Ago2-mediated
cleavage of a circular antisense RNA. EMBO J 2011, 30:4414–4422.
11. Huntzinger E, Izaurralde E: Gene silencing by microRNAs: contributions of
translational repression and mRNA decay. Nat Rev Genet 2011, 12:99–110.
12. Salzman J, Gawad C, Wang PL, Lacayo N, Brown PO: Circular RNAs are the
predominant transcript isoform from hundreds of human genes in
diverse cell types. PLoS One 2012, 7:e30733.
13. Danan M, Schwartz S, Edelheit S, Sorek R: Transcriptome-wide discovery of
circular RNAs in Archaea. Nucleic Acids Res 2012, 40:3131–3142.
14. Jeck WR, Sorrentino JA, Wang K, Slevin MK, Burd CE, Liu J, Marzluff WF,
Sharpless NE: Circular RNAs are abundant, conserved, and associated
with ALU repeats. RNA 2013, 19:141–157.
15. Salzman J, Chen RE, Olsen MN, Wang PL, Brown PO: Cell-type specific
features of circular RNA expression. PLoS Genet 2013, 9:e1003777.
16. Capel B, Swain A, Nicolis S, Hacker A, Walter M, Koopman P, Goodfellow P,
Lovell-Badge R: Circular transcripts of the testis-determining gene Sry in
adult mouse testis. Cell 1993, 73:1019–1030.
17. Nigro JM, Cho KR, Fearon ER, Kern SE, Ruppert JM, Oliner JD, Kinzler KW,
Vogelstein B: Scrambled exons. Cell 1991, 64:607–613.
18. Sharp PA, Burge CB: Classification of introns: U2-type or U12-type.
Cell 1997, 91:875–879.
19. Cabili MN, Trapnell C, Goff L, Koziol M, Tazon-Vega B, Regev A, Rinn JL:
Integrative annotation of human large intergenic noncoding RNAs
reveals global properties and specific subclasses. Genes Dev 2011,
25:1915–1927.
20. Al-Balool HH, Weber D, Liu Y, Wade M, Guleria K, Nam PL, Clayton J, Rowe W,
Coxhead J, Irving J, Elliott DJ, Hall AG, Santibanez-Koref M, Jackson MS:
Post-transcriptional exon shuffling events in humans can be evolutionarily
conserved and abundant. Genome Res 2011, 21:1788–1799.
21. Caudevilla C, Serra D, Miliar A, Codony C, Asins G, Bach M, Hegardt FG:
Natural trans-splicing in carnitine octanoyltransferase pre-mRNAs in rat
liver. Proc Natl Acad Sci U S A 1998, 95:12185–12190.
22. Gilbert WV: Alternative ways to think about cellular internal ribosome
entry. J Biol Chem 2010, 285:29033–29038.
23. Chen CY, Sarnow P: Initiation of protein synthesis by the eukaryotic
translational apparatus on circular RNAs. Science 1995, 268:415–417.
24. Corcoran DL, Georgiev S, Mukherjee N, Gottwein E, Skalsky RL, Keene JD,
Ohler U: PARalyzer: definition of RNA binding sites from PAR-CLIP
short-read sequence data. Genome Biol 2011, 12:R79.
25. Hafner M, Landthaler M, Burger L, Khorshid M, Hausser J, Berninger P,
Rothballer A, Ascano M Jr, Jungkamp AC, Munschauer M, Ulrich A, Wardle GS,
Dewell S, Zavolan M, Tuschl T: Transcriptome-wide identification of
RNA-binding protein and microRNA target sites by PAR-CLIP.
Cell 2010, 141:129–141.
26. Kishore S, Jaskiewicz L, Burger L, Hausser J, Khorshid M, Zavolan M: A
quantitative analysis of CLIP methods for identifying binding sites of
RNA-binding proteins. Nat Methods 2011, 8:559–564.
27. Hafner M, Lianoglou S, Tuschl T, Betel D: Genome-wide identification of
miRNA targets by PAR-CLIP. Methods 2012, 58:94–105.
28. Schnall-Levin M, Rissland OS, Johnston WK, Perrimon N, Bartel DP, Berger B:
Unusually effective microRNA targeting within repeat-rich coding regions
of mammalian mRNAs. Genome Res 2011, 21:1395–1403.
29. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and
quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 2008,
5:621–628.
30. Merkin J, Russell C, Chen P, Burge CB: Evolutionary dynamics of gene and
isoform regulation in Mammalian tissues. Science 2012, 338:1593–1599.
31. Barbosa-Morais NL, Irimia M, Pan Q, Xiong HY, Gueroussov S, Lee LJ,
Slobodeniuc V, Kutter C, Watt S, Colak R, Kim T, Misguitta-Ali CM, Wilson
MD, Kim PM, Odom DT, Frey BJ, Blencowe BJ: The evolutionary landscape
of alternative splicing in vertebrate species. Science 2012, 338:1587–1593.
32. Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E: EnsemblCompara
GeneTrees: complete, duplication-aware phylogenetic trees in vertebrates.
Genome Res 2009, 19:327–335.
doi:10.1186/s13059-014-0409-z
Cite this article as: Guo et al.: Expanded identification and
characterization of mammalian circular RNAs. Genome Biology
2014 15:409.
Curriculum Vitae 
Vikram Agarwal 
Education: 
Massachusetts Institute of Technology, Cambridge, MA, 2009 – 2015 
Ph.D. in Computational and Systems Biology  
Advisor: David P Bartel 
University of Texas at Austin, Austin, TX, 2005 – 2009 
B.S. in Biology: Honors 
Research Experiences: 
University of Texas at Austin, Austin, TX, 2006 – 2009 
Advisor: Z. Jeffrey Chen 
Computational characterization of miRNAs and their targets in developing cotton fibers 
University of Texas at Austin, Austin, TX, 2007 – 2008 
Advisor: John Wallingford 
RNA structural elements guide mRNA localization in Xenopus laevis 
Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 2007 
Advisor: Lincoln Stein 
Characterizing coverage and chromosomal rearrangement in the Watson genome 
Biomedical Research Institute/LSU Health Sciences Center, Shreveport, LA, 2005 
Advisor: Steven Alexander 
Immune roles in Alzheimer’s and inflammatory bowel disease 
Biomedical Research Institute/LSU Health Sciences Center, Shreveport, LA, 2004 
Advisor: Anping Chen 
Mechanism of curcumin in sensitizing human colon cancer cells 
Biomedical Research Institute/LSU Health Sciences Center, Shreveport, LA, 2003 
Advisor: Adrian Dunn 
Impact of cytokines on motor activity and appetite in mice 
Teaching Experience: 
Teaching Assistant, Foundations of Computational & Systems Biology (7.91), Spring 
2012, Massachusetts Institute of Technology 
Teaching Assistant, MIT Quantitative Biology Workshop, Independent Activities Period 
(IAP), Jan 2012 and Jan 2013, Massachusetts Institute of Technology 
Publications: 
Agarwal V, Subtelny AO, Jan CH, Ulitsky I, Bell GW, Bartel DP. "Evolutionary and 
quantitative models of Drosophila microRNA targeting". (In preparation). 
Wong SFL*, Agarwal V*, Mansfield JH, Denans N, Schwartz MG, Prosser HM, 
Pourquié O, Bartel DP, Tabin CJ, McGlinn E. "Independent regulation of vertebral 
number and vertebral identity by microRNA-196 paralogs". 2015. Proceedings of the 
National Academy of Sciences USA. doi: 10.1073/pnas.1512655112. 
Agarwal V, Bell GW, Nam J-W, Bartel DP. "Predicting effective microRNA target 
sites in mammalian mRNAs". 2015. eLife 4:e05005. 1-38. 
Guo JU, Agarwal V, Guo H, Bartel DP. "Expanded identification and 
characterization of mammalian circular RNAs". 2014. Genome Biology 15(7):409. 1-14. 
Denzler R, Agarwal V, Stefano J, Bartel DP, Stoffel M. "Assessing the ceRNA 
hypothesis with quantitative measurements of miRNA and target abundance". 2014. 
Molecular Cell 54(5):766-776. 
Nam J-W, Rissland OS, Koppstein D, Abreu-Goodger C, Jan CH, Agarwal V, 
Yildirim MA, Rodriguez A, Bartel DP. "Global analyses of the effect of different cellular 
contexts on microRNA targeting". 2014. Molecular Cell 53(6):1031-43. 
Pang M*, Woodward AW*, Agarwal V*, Guan X, Ha M, Ramachandran V, Chen X, 
Triplett BA, Stelly DM, Chen ZJ. "Genome-wide analysis reveals rapid and dynamic 
changes in miRNA and siRNA sequence and expression during ovule and fiber 
development in allotetraploid cotton (Gossypium hirsutum L)". 2009. Genome Biology 
10(11):R122. 1-21. 
Ha M, Pang M, Agarwal V, Chen ZJ. "Interspecies regulation of microRNAs and 
their targets". 2008. Biochim Biophys Acta 1779(11):735-742. 
*These authors contributed equally to the work and are shared co-first authors
Selected Talks: 
"Independent Regulation of Vertebral Number and Vertebral Identity by microRNA-
196 Paralogs". Jul 2014. Society for Developmental Biology 73rd Annual Meeting, 
University of Washington. Seattle, WA. 
"Predicting effective microRNA target sites in mammalian mRNAs". May 2014. 9th 
Microsymposium on Small RNAs, Institute of Molecular Biotechnology. Vienna, 
Austria. 
"Quantitative Models of Vertebrate and Drosophila MicroRNA Targeting". Oct 2013.  
Institute of Molecular Health Sciences, Swiss Federal Institute of Technology (ETH 
Zürich). Zürich, Switzerland. 
Awards/Achievements/Memberships: 
2009 – NSF Graduate Research Fellowship (GRFP) 
2008 – Barry M. Goldwater Scholarship 
2008 – University of Texas Distinguished Scholar 
2008 – Unrestricted Endowed Presidential Scholarship, UT Austin 