Technology Development in Mouse Genetics and Epigenetics

By
Chikdu Shakti Shivalila
B.S. Biology
The University of Pittsburgh, 2009

SUBMITTED TO THE DEPARTMENT OF BIOLOGY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
AT THE
MASSACHUSETTS INSTITUTE OF TECHNOLOGY

February 2016

© 2016 Massachusetts Institute of Technology. All rights reserved.

Signature of author: Signature redacted
Chikdu Shakti Shivalila, Department of Biology, January 8, 2016

Certified by: Signature redacted
Rudolf Jaenisch, Professor of Biology, Founding Member, Whitehead Institute, Thesis Supervisor

Accepted by: Signature redacted
Michael T. Hemann, Associate Professor of Biology, Co-Chair, Biology Graduate Committee

Technology Development in Mouse Genetics and Epigenetics
By Chikdu Shakti Shivalila
Submitted to the Department of Biology in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Abstract

The importance and significance of a model organism in biological research cannot be overstated. The mouse in particular has been very useful in understanding questions in many areas of research such as developmental biology, cancer biology, neuroscience and genetics. However, even though the methods to make transgenic mice and gene knockin and knockouts have been successful, they are very inefficient, labor intensive and costly. Therefore, in this thesis we developed a novel methodology to rapidly and efficiently modify the mouse genome. Using CRISPR/Cas9, a novel genome-engineering technology developed from bacteria, we were able to genetically modify mouse embryonic stem cells and make mice that carried genetic modification by zygotic injections. Using CRISPR/Cas9 we were able to make mice in as little as three weeks that contained multiple gene knockouts, single nucleotide modifications, GFP and mCherry reporter alleles, epitope-tagged alleles, and conditional alleles.

Another interesting area of research in mouse genetics is epigenetic regulation, specifically how DNA methylation regulates development, gene expression, and cell state. Multiple studies have shown that this epigenetic modification plays an important regulatory role in these processes; however, the technology that has existed so far to investigate DNA methylation has only been able to look at snapshots of methylation patterns in fixed cell populations. In this thesis we have developed a novel technology named Reporter of Genomic Methylation (RGM), which allows for the investigation of methylation dynamics at single-cell resolution in vivo. The RGM technology was developed using a minimal synthetic secondary DMR promoter that drives the expression of a fluorescent protein. Using CRISPR/Cas9, the RGM reporter can be integrated into any genomic locus, where it can report on the methylation state of its surroundings. We further show that the RGM reporter activity reflects the methylation state of non-coding regulatory elements such as promoters and enhancers. Furthermore, we show that the RGM technology allows for the dynamics of methylation and demethylation to be observed at these non-coding loci as cells transition between a pluripotent and differentiated state.

Thesis Supervisor: Rudolf Jaenisch
Title: Professor of Biology and Member of the Whitehead

Dedication

I dedicate this thesis to my family.

Acknowledgments

Nothing that is done is done alone. I think that science, like all creative pursuits, is not something that stands alone in isolation, but is rather the result of a temporal and spatial dynamic network of people that share a common interest. For this reason, there are too many people to thank, but I will try my best.

MIT (2011-2015)

First of all, I would like to thank my mentor Rudolf Jaenisch for giving me the opportunity to work in his laboratory for my PhD studies.
Rudolf, thank you for always giving me the freedom to research what I was interested in, for always giving me insightful advice on my research projects, and for always being patient with me. Rudolf, you inspire in me the scientist I hope to one day become: your knowledge of so many vast areas of biology, your perpetual curiosity, your patience and understanding, your firm foundations in good experimentation, and your fortitude and flexibility in trying new things. I would also like to thank you for assembling a wonderful lab of great people, and for letting me be a part of it for my time at MIT. It has been an honor to work in your lab, and I will always have so much gratitude for the opportunities that you have given me.

I would also like to thank some amazing people I have had the chance to work with in the Jaenisch lab. I would like to thank Haoyi Wang, who, when I first joined the Jaenisch lab, spent a lot of time mentoring me in basic experimental procedures, such as ES cell culture and TALEN genome engineering. As I became a more self-sufficient scientist, we continued to work together. Thank you, Haoyi, for being an awesome mentor, friend, and colleague. I would also like to thank Hui Yang and Albert Wu Cheng for being good friends and also great collaborators. I would also like to thank Yonathan Stelzer for being a great friend and colleague; it has been really fun working with you. I have tremendously enjoyed all our talks about science and non-science related stuff. I would like to thank Frank Soldner for being a good friend, a cynical but thought-provoking science critic, and a great person to go out for coffee and talk to about whatever. I would also like to thank Meelad Dawlaty for being a good friend and for all the late-night lab conversations. And also thanks to everyone else who has been in the Jaenisch lab for the last few years.

I would also like to thank my committee, Dr. David Bartel and Dr. Piyush Gupta.
I really appreciate the time you have taken to give me advice on my research and for all the helpful feedback through my PhD studies. In addition, I really enjoyed your lectures that I attended during my first year here at MIT. Dr. Piyush, you made computational biology interesting and fun. Dr. Bartel, I learned so much about RNA in your class, and that was the first time I ever started to think about CRISPR/Cas.

University of Pittsburgh (2007-2011)

I would like to thank Dr. Vernon Twombly. Your lectures, lab courses and after-class discussions had a big impact on me when I was deciding what to do after undergrad. In addition, I am really thankful for your help in finding me an undergrad laboratory where I could do research while at the University of Pittsburgh. If it was not for your mentorship, I probably would not have had the opportunities that I have had so far.

I would also like to thank Dr. Maria Teresa Saenz Robles. Dr. Robles, I can't even express how much gratitude I have for your mentorship. I learned so much from you, both how to pursue a scientific question and the technical skill to do so. You were always so patient and kind while teaching me all kinds of things. You were such a joy to work for. I will always keep the great memories of working with you. And I will always be really thankful for the opportunities you gave me.

I would also like to thank Dr. James Pipas. Dr. Pipas, I appreciate so much the opportunity you gave me to work in your laboratory at the University of Pittsburgh. Working in your laboratory made me want to become a scientist, to do research and to discover new things. I think your excitement and fascination with biology was contagious and I got the bug... or virus. I will always have so much gratitude for everything you did for me.

University at Albany (2005-2007)

I would like to thank Keith Derbyshire for giving me my first opportunity to work in a biology lab.

Adirondack Community College (2004-2005)

I would like to thank professors Dave Hodgson (botany) and Joseph Eagan (zoology) for first introducing me to biology. Your lectures were amazing.

My family (1987-present)

I would like to thank my whole family; who I am today is because of every single one of you. Thank you to my Mom, Kamala, my Dad, Uli, my aunt Jammu, my uncle Nigel. My sisters: Roshi, Sodasi, Bumi and my Brothers: Osel, Pele, Hawazin, Ituri, Naya.

To my mom: Thank you for always being so kind, caring and loving. And thank you for raising me, teaching me, and always encouraging me to become whoever I wanted to become.

To my Dad: Thank you for teaching me to have independent thought, for all of the discussions about life, science, society etc. Thank you for always being there when I needed you. Thank you for your patience, caring advice, and guidance.

To my brothers and sisters: Thank you all, we are always there for each other.

To my aunt Jammu and uncle Nigel: Thank you for all your support and kindness.

Table of Contents

Abstract
Acknowledgments
Table of Contents
Chapter 1. Introduction
    Part 1. The mouse as a genetic model organism
        Early mouse development and pluripotent cells
        Methods for generating genetically modified mice
        Types of genetic modifications
        Zygotic injections vs. Blastocyst injections
    Part 2. The development of programmable nucleases for genome-engineering
        Zinc Finger Nucleases and TALENs
        CRISPR/Cas
        DNA repair machinery
    Part 3. Epigenetics: DNA methylation
        DNA methylation
        DNA methylation changes in mouse development
        Methylation and gene regulation
        DNA methylation and cancer
        Technology available to study DNA methylation
    Part 4. Thesis outline
        Locus-specific genome editing in the single-cell zygote
        A reporter for genomic methylation
Chapter 2. One-step generation of mice carrying mutations in multiple genes targeted by CRISPR-Cas mediated genome-engineering
Chapter 3. One-step generation of mice carrying reporter and conditional alleles by CRISPR/Cas-mediated genome-engineering
Chapter 4. Tracing dynamic changes of DNA methylation at single-cell resolution
Chapter 5. Future directions
    CRISPR/Cas9 genome-engineering in mice
    Further characterization and improvement of RGM
    Application for RGM in imprinting, screening, and cancer
Curriculum vitae

Chapter 1. Introduction

The mouse model has been an invaluable tool for understanding mammalian biology.
The mouse shares 99% of its genes with humans, which has allowed the mouse to be used not only to understand basic mammalian biology in research areas such as neuroscience, epigenetics, and development, but also to model human diseases such as cancer and neurological disorders in vivo (Waterston 2002). One of the main reasons why the mouse has been used so extensively in biological research is that it has been well established as a genetic model organism (Paigen 2003). Genetic research in mice has been facilitated by the development of methods and technology to manipulate the mouse genome. These methods consist of either transgene expression, mediated through random integration, or endogenous locus-specific genome editing, mediated through homologous recombination (Jaenisch et al., 1974; Smithies et al., 1985). However, the existing methods for mouse genome-engineering are still inefficient and can be improved by further technology development. Programmable nucleases, specifically CRISPR/Cas9, have the potential, when combined with existing methods and tools such as dsDNA-targeting vectors, ssDNA-oligos, and zygotic injection, to increase the efficiency of genome-engineering in the mouse model.

Epigenetics, specifically DNA methylation, has been shown to take part in many important processes in vertebrate biology, such as imprinting, gene regulation, development, and cancer (Holliday and Pugh, 1975; Riggs 1975). Multiple technologies have been developed to investigate DNA methylation. However, all of the current methods can only take a static snapshot of methylation patterns in a cell or tissue (Stelzer and Jaenisch, 2015). Methods to investigate DNA methylation are another area of research in mouse genetics that could be improved by further technology development.
A technology that could report on the dynamics of locus-specific DNA methylation, when combined with CRISPR/Cas9 genome engineering, would allow for a better analysis of how this important epigenetic modification helps regulate non-coding regions such as promoters, enhancers, and imprinted regions in vivo.

Part 1. The mouse as a genetic model organism

Early mouse development and pluripotent cells

Upon fertilization, the single-cell mouse zygote goes through a series of divisions, starting with the 2- and 4-cell-stage embryos, where each cell is believed to be uniform in nature (Zernicka-Goetz et al., 2009). Upon division to the 8- and 16-cell stages, the embryo starts to specialize, with cells in the center becoming the inner cell mass (ICM) and cells on the outside becoming the trophectoderm (TE). The compartmentalization of the ICM and TE is further completed in the 16-32-cell stage and early blastocyst (Zernicka-Goetz et al., 2009). Upon implantation of the late-stage blastocyst, the ICM will further differentiate to form the primitive endoderm (PE) and the epiblast (EPI). Later in the development of the embryo, the EPI will form the three germ layers: the endoderm, mesoderm, and ectoderm. These three germ layers will eventually form all tissues in the adult organism (Zernicka-Goetz et al., 2009).

Mouse embryonic stem cells (mESCs) are designated as pluripotent cells that can give rise to the three germ layers: endoderm, ectoderm, and mesoderm. Therefore, mESCs have the developmental potential to form every cell in the adult mouse but cannot give rise to the extra-embryonic tissue (Jaenisch and Young, 2008). Dr. Martin Evans showed in 1981 that mouse pluripotent cells could be extracted from the ICM, and that under the right culture conditions they could be propagated indefinitely in vitro (Martin et al., 1981).
Further work showed that mESCs could contribute to form chimeric mice when injected into blastocyst-stage embryos, and that these cells could also contribute to the germline (Bradley et al., 1984). The maintenance of mESCs in vitro is facilitated by the extracellular signaling molecules LIF, Wnt, and activin/nodal, which are thought to maintain the necessary microenvironment to keep mESCs in a pluripotent state (Smith et al., 1988; Ogawa et al., 2006). Furthermore, mESCs are characterized by the expression of a set of core transcription factors: Nanog, Sox2, and Oct4. These master transcription factors form an autoregulatory loop that maintains their own expression and represses other transcription factors necessary for differentiation (Boyer et al., 2005).

Although pluripotent mESCs are usually isolated from the ICM, work by Takahashi and Yamanaka showed that adult somatic cells could be reprogrammed into induced pluripotent stem cells (IPSCs) by viral transgenic expression of the transcription factors Oct4, Sox2, c-Myc, and Klf4 (Takahashi et al., 2006). Further work showed that when these IPSCs were selected for by endogenous Oct4 or Nanog expression, they had an almost identical gene-expression profile compared with mESCs (Wernig et al., 2007). In addition, it was shown that when IPSCs are injected into blastocyst-stage embryos they contribute to form chimeric mice, with chimerism occurring in both somatic and germline tissue (Okita et al., 2007; Maherali et al., 2007; Wernig et al., 2007). Since the initial work by Yamanaka, additional methods to induce reprogramming have been established, such as the use of doxycycline-inducible polycistronic vectors, in vitro transcribed mRNAs of the reprogramming factors, and small-molecule inhibitors (Warren et al., 2010; Carey et al., 2009; Vidal et al., 2014).

Methods for generating genetically modified mice

The mouse was first adapted as a model organism for genetic studies in 1902, when Lucien Cuenot showed that coat color in mice followed Mendelian ratios (Cuenot 1902). However, the discipline of mouse genetics was not fully recognized until 1909, when C. C. Little established the first true inbred strains of mice so as to have reproducibility in genetic crosses (Paigen 2003). Early mouse genetics relied on spontaneous, chemical-, or radiation-induced mutations to investigate how specific genetic mutations caused phenotypic differences and abnormalities (Paigen 2003; Van der Weyden et al., 2011). This forward genetics approach was successful at isolating and cloning many important genes, such as the c-kit receptor tyrosine kinase W gene and its ligand steel (Chabot et al., 1988; Brannan et al., 1991). However, the isolation of genes mutated by chemical mutagens such as ENU is cumbersome, and it was the development of transgenic technology that allowed for the isolation of genes that, when mutated, affected development.

In 1974 Rudolf Jaenisch and Beatrice Mintz developed the first transgenic mouse by injecting SV40 virus into mouse blastocyst-stage embryos, and then letting the blastocysts develop into pups by implanting them back into pseudo-pregnant female mice (Jaenisch et al., 1974). The integrated SV40 viral genome could be detected in multiple tissues from the adult mice that resulted from these injections. Jaenisch further showed that the Moloney murine leukemia virus (M-MuLV) could be transmitted to the germline of mice when injected into pre-implantation stage embryos, and that the adult mice that developed from these embryos could pass on M-MuLV to their progeny (Jaenisch 1976). This early work was significant for two important reasons. First, it showed that foreign DNA could be integrated into the mouse genome.
Second, this work showed that if a genetic modification occurred early enough in development, then it could contribute to the germline and be propagated to the next generation.

Following on Dr. Jaenisch's work, multiple groups developed a system of generating transgenic mice by injecting plasmid DNA into the pro-nucleus of the mouse zygote (Gordon et al., 1980; Brinster et al., 1981; Costantini and Lacy, 1981; Harbers et al., 1981; Wagner et al., 1981). The first transgene to be successfully expressed in a transgenic mouse model was the viral TK gene (Gordon et al., 1980). Follow-up work by Wagner showed that the complete rabbit beta-globin gene could be integrated into the mouse genome, and that this transgene was expressed in a tissue-specific manner (Wagner et al., 1981). This early work in mouse transgenics coincided with the development of plasmids that were able to express genes in mammalian cells. The synthesis of mammalian expression plasmids was facilitated by the isolation and cloning of mammalian promoters and polyA sequences that allowed for stable gene expression in mouse and human cells (Doyle et al., 2012).

Furthermore, this early work in mouse transgenesis led to the development of other techniques that are still often used in mouse genetics. Lentiviruses were developed to express genes in mammalian cells and to create transgenic mice by infection of early blastocyst or zygote-stage embryos (Lois et al., 2002). Furthermore, gene-knockout mice were made by gene-trap experiments using lentivirus or other retroviral elements such as the Sleeping Beauty or piggyBac transposons (Perry et al., 1995; Luo et al., 1998; Dupuy et al., 2001). In addition, lentiviruses can be used to infect and express Cre-recombinase to knock out conditional alleles in adult tissues (DuPage et al., 2009).

Although transgenic mice proved to be very useful, this method relied on the random integration of a transgene.
Random integration into the genome can lead to the disruption of a gene coding sequence, or can separate a gene from its endogenous cis-regulatory elements, which may result in epigenetic silencing. Furthermore, this method of making transgenic mice is not very efficient for knocking out an endogenous gene because the mutation cannot be targeted to a desired locus. This method instead relies on the isolation of knockout mutants by a forward genetics approach (Doyle et al., 2012).

The development of homologous recombination-mediated genome editing by Capecchi and Smithies represented a major breakthrough because it allowed the gene to be mutated to be chosen in advance. The method relies on targeting a specific genetic locus with an exogenous dsDNA-targeting vector that contains regions of homology to the locus of interest (Smithies et al., 1985; Thomas et al., 1986). The cell integrates the dsDNA-targeting vector, at a very low efficiency, into the designated locus through homologous recombination. The combination of homologous recombination and ES cell technology revolutionized our ability to generate mice with precise genetic modifications. To generate a mutant mESC clone, an exogenous dsDNA-targeting vector is introduced into the cells, and individual mESC colonies carrying the desired mutation are selected and verified, initially by Southern blot and later by PCR (Evans et al., 1981; Smithies et al., 1985; Thomas et al., 1986). Positive-selection antibiotic-resistance genes, such as those for puromycin and neomycin, and the negative-selection TK gene greatly increase the efficiency of isolating mESC clones that contain the correct genetic modification. The antibiotic-resistance cassette is added to the exogenous dsDNA-targeting vector and allows for selection against mESCs that did not become genetically modified (Doyle et al., 2012).
After correctly engineered mES cells have been isolated, they are injected into blastocyst-stage embryos to produce chimeric mice (Koller et al., 1989; Thompson et al., 1989). The mice are then mated to establish germline transmission of the modified ES cell clone. This strategy of genetically engineering mESCs by homologous recombination has been the basis for further technical development in mouse genetics, such as the generation of reporter alleles, specific gene knockouts, epitope-tagged alleles, and conditional alleles. However, one limitation of this method is that it is time consuming, requires multiple steps, and cannot be used in most other mammalian cell types where chimera-competent ES cells have not been isolated. Furthermore, this method is very inefficient, with an estimated rate of recombination ranging from one in 1 × 10^6 to one in 1 × 10^7 (Kim and Kim, 2014).

Types of genetic modifications

One of the first and most common genetic modifications made through homologous recombination in mESCs was locus-specific gene knockouts (Zijlstra et al., 1990; Donehower et al., 1992; Rudnicki et al., 1992). Knocking out a gene through homologous recombination can be accomplished through different mechanisms. The simplest way to cause a knockout is to introduce a stop codon or frameshift mutation into the coding region of a gene. This can be accomplished through the delivery of a dsDNA-targeting vector that carries a stop codon or frameshift mutation and that targets an exon of a gene (Sage et al., 2000). Another possible mechanism is to introduce a large fragment of DNA, such as a puromycin cassette, into the coding region of a gene, thereby terminating transcription (Xiong et al., 2012). Finally, a gene knockout can be made by deleting one or more entire exons, with the dsDNA-targeting vector's homology arms flanking the region to be removed (Xiong et al., 2012).
Because the efficiency of homologous recombination is so low, the sequential insertion of two different targeting vectors that express different resistance cassettes, such as hygromycin and neomycin, is required for deriving homozygous mutants by targeting both alleles (Hudson et al., 1998). Alternatively, heterozygous knockout mice can be made and then bred for homozygosity.

Single or multiple nucleotides can be exchanged through homologous recombination by using a dsDNA-targeting vector that contains the desired nucleotide differences (Wu et al., 1994). For example, this is done to change specific nucleotides that code for an amino acid that is important in the active site of an enzyme, or to introduce a mutation that activates an oncogene or deactivates a tumor suppressor. Because targeting efficiency is low, the targeting vector usually contains a selection cassette to allow for isolation of mESC colonies that have properly integrated the construct. The selection cassette can be flanked by two loxP sequences that allow it to be removed after the mutation has been made. However, even with the removal of the selection cassette, this will not be a seamless mutation because the modified locus will still contain one loxP sequence.

Another important genetic modification is the introduction of a reporter allele into an endogenous gene to monitor the gene's spatial and temporal expression (Bouabe et al., 2013). Reporter alleles can help address questions such as in what tissue a specific gene is expressed, and at what time during development a specific gene turns on or becomes repressed. The most common reporter allele is the green fluorescent protein (GFP) isolated from the jellyfish Aequorea victoria, or one of its many synthetic variants, which come in an array of fluorescent colors such as red (mCherry), yellow (YFP), and blue (BFP) (Abe et al., 2013; Prasher et al., 1992; Srinivas et al., 2001; Tsien 1998).
Reporter alleles can be introduced at the 5' or 3' terminus of a gene, so as not to disrupt its function, or they can be introduced in-frame within a coding exon, so as to both knock out the gene and report on its activity (Croxford et al., 2011).

Alternatively, instead of a fluorescent reporter, an epitope-tag such as V5, FLAG, Myc, or Strep/Biotin can be inserted into the 5' or 3' coding region of an endogenous gene. Epitope-tags are short amino acid sequences that are recognized very strongly and specifically by their respective antibodies (Evan et al., 1985; Brizzard et al., Schmidt et al., 2007). Epitope-tags allow for the isolation, or relative quantification, of a protein when an antibody to that specific protein does not exist. Isolation or quantification of an epitope-tagged protein is important in many experimental protocols, such as chromatin immunoprecipitation, mass spectrometry, ChIP-seq, western blot, biochemical assays, and immunofluorescence (Gavin et al., 2002; Ho et al., 2002; Kolodziej et al., 2009).

Sometimes it is impossible to make a knockout of a specific gene in a mouse because the mutation causes embryonic lethality. This prompted the development of conditional mutant alleles. This system takes advantage of the Cre-recombinase enzyme and its recognition sequence motif (loxP), isolated from the P1 bacteriophage (Sternberg et al., 1981). In this system two loxP sequences are inserted on either side of an exon. Upon expression of Cre-recombinase, the loxP-flanked (floxed) exon will recombine out, resulting in a gene knockout (Dawlaty et al., 2011). Cre-recombinase can be expressed under a tissue-specific promoter, so that the floxed gene will only become deleted in the desired tissue (Orban et al., 1992). Furthermore, the Cre-recombination system can allow for cell lineage tracing. In this method, the expression of GFP is dependent on Cre-mediated recombination.
When Cre is expressed under a tissue- or cell-specific promoter, only the progeny from cells that expressed Cre will be labeled by GFP (Mao et al., 2001).

Zygotic injections vs. Blastocyst injections

Mouse transgenics is a rapid method to genetically modify mice because direct zygotic injection of a DNA construct results in low chimerism and high germline contribution (Gordon et al., 1980). However, this method does not allow for locus-specific genetic modifications (Xiong et al., 2012). Homologous recombination-mediated gene targeting in mES cells allows for locus-specific genome editing. However, modified mES cells are injected into blastocyst-stage embryos, which results in high chimerism and low germline contribution (Paigen 2003). Not all mES cell injections into blastocysts will result in chimeric mice, and the efficiency of germline contribution can vary depending on the mES cell line that is used (Guo et al., 2014). Furthermore, this method requires a long time to make genetically modified mice because of the multiple steps involved in the process. Ideally, the most efficient way to genetically engineer a mouse would be locus-specific genome editing in the single-cell zygote-stage embryo, because it would combine the expediency obtained by transgenics with the specificity obtained by homologous recombination. The use of programmable nucleases has the potential to allow for locus-specific genome editing in the single-cell zygote.

Part 2. Programmable nucleases for gene targeting

Zinc Finger Nucleases and TALENs

Zinc Finger Nucleases (ZFNs) were the first programmable nucleases created for locus-specific genetic engineering. ZFNs are composed of two functional units, zinc finger proteins (ZFPs) and a FokI nuclease domain (Kim et al., 1996). ZFPs consist of a tandem array of C2H2 zinc fingers, where each zinc finger can recognize a specific 3-bp DNA sequence motif (Tupler et al., 2001; Wolfe et al., 2000).
To generate ZFNs that recognize and bind a specific DNA sequence, multiple zinc fingers are arranged in one construct with the goal of creating a ZFP array that can bind to a 9-18 bp target sequence (Kim and Kim, 2014). The FokI nuclease is a type II restriction enzyme from Flavobacterium okeanokoites, which contains two domains, a DNA-binding domain and a nuclease domain (Kim et al., 1996). To generate a programmable nuclease, the FokI nuclease domain is fused to the ZFP. Because the FokI nuclease domain has to dimerize to cleave dsDNA, two ZFNs need to be spaced 5-7 bp apart to cleave the target DNA (Kim and Kim, 2014). A complication of the ZFN technology is the difficulty of assembling ZFNs to target a specific sequence (Bae et al., 2003; Segal et al., 1999). A frequent problem is that the ZFN will not cause the desired dsDNA break because it cannot bind to and cut the target sequence, or that too many off-target effects cause cytotoxicity (Kim and Kim, 2014). Nevertheless, ZFNs have been very useful in genetically modifying hESCs and IPSCs (Soldner et al., 2011; Yusa et al., 2011). In addition, ZFNs have recently been used to genetically engineer hematopoietic stem and progenitor cells (Wang et al., 2015).

Transcription activator-like effector nucleases (TALENs) are similar to ZFNs in that both use the FokI nuclease domain as their functional unit to initiate a dsDNA break (Kim and Kim, 2014). However, unlike ZFNs, TALENs use a much more modular DNA-binding domain that consists of transcription activator-like effectors (TALEs) (Mak et al., 2012; Deng et al., 2009; Boch et al., 2009). TALEs were discovered in the Xanthomonas species of bacteria and are composed of an array of 33-35 amino acid repeats (Kim and Kim, 2014). Each repeat can vary in the amino acids at positions 12 and 13, which are called the repeat-variable diresidues (RVDs). Each repeat domain, depending on its RVD, can recognize one of the four nucleotides: C, G, T, and A (Kim and Kim, 2014).
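
The one-repeat-to-one-nucleotide logic described above can be illustrated with a short sketch. The specific RVD assignments used here (NI for A, HD for C, NN for G, NG for T) are the commonly cited code and are an assumption of this example, since the text does not list individual RVDs; the function names are hypothetical.

```python
# Commonly cited RVD-to-nucleotide code (an assumption of this sketch; the
# text above does not list specific RVDs). NN is also reported to bind A.
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}
BASE_FOR_RVD = {rvd: base for base, rvd in RVD_FOR_BASE.items()}


def design_rvd_array(target):
    """Return one RVD per nucleotide of the target, in 5'-to-3' order."""
    target = target.upper()
    unknown = set(target) - set(RVD_FOR_BASE)
    if unknown:
        raise ValueError(f"unsupported bases: {sorted(unknown)}")
    return [RVD_FOR_BASE[base] for base in target]


def read_rvd_array(rvds):
    """Translate an RVD array back into the DNA sequence it should bind."""
    return "".join(BASE_FOR_RVD[rvd] for rvd in rvds)


if __name__ == "__main__":
    rvds = design_rvd_array("GATTACAGATTACA")
    print(rvds)
    print(read_rvd_array(rvds))
```

The sketch covers only the recognition code; it does not model the FokI fusion, the pairing of two TALENs, or the cloning of the repeat array.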
Because of this single repeat to single nucleotide recognition, the repeats can be assembled to recognize any sequence of DNA. One TALEN is usually composed of 12-20 RVDs that are attached to a FokI nuclease domain (Kim and Kim, 2014). Like ZFNs, a pair of TALENs must be used to target a specific genomic locus to cause a dsDNA break. Furthermore, even though TALENs are much easier to design and construct than ZFNs, they still require a complex cloning strategy called Golden Gate, which is time-consuming and not very user-friendly (Ding et al., 2013). Despite the difficulty of assembly, TALENs have been used to efficiently edit the genome in human cells, zebrafish, Oryza sativa, Caenorhabditis elegans, and bovines (Carlson et al., 2012; Wood et al., 2011; Hockemeyer et al., 2011; Li et al., 2012).

CRISPR/Cas

In the 1990s, bioinformatic analysis discovered long repeats of short palindromic sequences in the genomes of some Bacteria and Archaea (Hermans et al., 1991; Bult et al., 1996; Hoe et al., 1999). The short hypervariable sequences between these palindromic repeats were shown to be homologous to sequences found in bacteriophage genomes and parasitic plasmids (Bolotin et al., 2005; Pourcel et al., 2005; Mojica et al., 2005). Further computational analysis led to the hypothesis that prokaryotes that carried these repetitive loci somehow used them as a way to protect themselves against bacteriophages or parasitic plasmids (Makarova et al., 2006). Experimental validation that this repetitive locus, named Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR), could act as an adaptive immune system came when it was shown that the CRISPR/Cas system in Streptococcus thermophilus provided resistance against phages that infect this species (Barrangou et al., 2007). Further research showed that there are three phases of this adaptive immune response in Bacteria and Archaea, named adaptation, expression, and interference (Figure 1)
(Bhaya et al., 2011). Adaptation, or spacer acquisition, occurs when a segment of DNA from a phage or parasitic plasmid is recognized as being foreign and is incorporated in between the palindromic repeats in the CRISPR locus (Garrett et al., 2010). These integrated segments of foreign DNA are called protospacers (Deveau et al., 2008). The second stage, expression, occurs when the pre-CRISPR RNA (pre-crRNA), which contains an array of multiple palindromic repeats and protospacers, is transcribed as a single long primary transcript. The pre-crRNA is then processed to form short CRISPR RNAs (crRNAs), with each crRNA containing only one protospacer sequence (Brouns et al., 2008). In the third and final stage, interference, the short crRNA guides a protein or protein complex, which contains a functional DNA nuclease, to the invading phage or parasitic plasmid. Upon recognition, it causes degradation of the phage genome or parasitic plasmid by targeting the sequence that is complementary to the crRNA protospacer sequence (Deveau et al., 2010). An additional requirement for successful targeting is the presence of a protospacer adjacent motif (PAM) directly 3' to the target protospacer sequence. Each CRISPR system recognizes a unique PAM motif, and even the same type of CRISPR system found in different species of bacteria can recognize different PAM motifs (Deveau et al., 2010). The CRISPR locus itself does not contain a PAM site directly 3' to the incorporated protospacers. Therefore, the CRISPR system cannot target its own locus, but can only recognize protospacers that are flanked 3' by a PAM motif in foreign DNA. The PAM motif is also important for protospacer acquisition (Mojica et al., 2009). There are three main subclasses of the CRISPR loci, Type-I, Type-II, and Type-III (Bhaya et al., 2011).
All three subtypes share a common repetitive palindromic repeat sequence where new protospacers are acquired and expressed in the long primary pre-crRNA transcript (Deveau et al., 2010). Additionally, the loci of all three subtypes contain two genes which code for the proteins Cas1 and Cas2. These two proteins are important for protospacer acquisition (Bhaya et al., 2011). The main difference between the three subtypes is the protein or protein complex that is used to degrade the invading phage genome. The Type-I CRISPR locus contains multiple genes that code for a large multisubunit complex named CASCADE, which binds the crRNA to target the phage genome (Jore et al., 2011). The Type-II system uses a single protein called Cas9 and an upstream non-coding RNA named the tracrRNA. Both of these components form a complex with the crRNA, which then functions to target and destroy the phage genome (Garneau et al., 2010). The Type-III locus is even more complex and can use either a complex called the CMR, to target phage RNA, or a complex called the CSM, to target phage DNA (Bhaya et al., 2011).

Figure 1. CRISPR/Cas adaptive immune response. Bacteria and Archaea use CRISPR as an adaptive immune response to fight against invading phages or parasitic plasmids. This process is orchestrated through three main phases. (1) Acquisition is when a new protospacer from the invading phage is acquired and inserted into the CRISPR array. This new protospacer needs to be flanked directly 3' by a PAM sequence, highlighted in red; the PAM sequence is not inserted into the CRISPR array. (2) Expression is when the pre-crRNA is expressed as one long primary transcript with multiple crRNAs in tandem; this pre-crRNA is then processed to form multiple functional crRNAs.
(3) Interference is when the functional Cas nuclease and crRNA ribonucleoprotein complex cleaves its target sequence on either the invading phage or parasitic plasmid. Figure adapted from (Bhaya et al., 2011)

The Type-II CRISPR locus is the simplest of the three subtypes. Along with the pre-crRNA and tracrRNA, the Type-II locus contains only four protein-coding genes: cas9, cas1, cas2, and either cas4 or csn2 (Bhaya et al., 2011). Work in bacteria showed that Cas9, along with a mature crRNA and tracrRNA, was necessary for resistance to phages (Garneau et al., 2010). Seminal in vitro work by Jennifer Doudna and Emmanuelle Charpentier showed that a purified Cas9 protein from Streptococcus pyogenes, along with a mature crRNA and tracrRNA, could initiate a dsDNA break in a plasmid that contains a PAM motif and a target sequence complementary to the crRNA protospacer sequence (Jinek et al., 2012). This in vitro work showed that Cas9, along with a mature crRNA and tracrRNA, is sufficient to target and cleave dsDNA. Furthermore, it was revealed that the tracrRNA and crRNA form a stem-loop complex that binds Cas9. The crRNA then directs this ribonucleoprotein complex to bind its protospacer target sequence through Watson-Crick base complementarity (Figure 2) (Jinek et al., 2012). Upon recognition of its target sequence and PAM motif, Cas9, which contains two nuclease domains, RuvC and HNH, will cleave dsDNA to form two blunt ends 3 nucleotides 5' to the PAM motif (Jinek et al., 2012). The PAM motif for S. pyogenes Cas9 is 5'-NGG-3' (Jinek et al., 2012). Furthermore, this work showed that the tracrRNA and crRNA could be combined to form a synthetic chimeric RNA designated as "gRNA". This chimeric gRNA forms the same stem-loop structure as the separate crRNA and tracrRNA, and along with Cas9 it can cleave dsDNA in vitro (Jinek et al., 2012).
This work by Doudna and Charpentier suggested that because the Type-II CRISPR/Cas system is simple, consisting of only Cas9, a tracrRNA, and a programmable crRNA or gRNA, it could be developed as a technology for genome-engineering in eukaryotic cells (Jinek et al., 2012). Research by Feng Zhang and George Church further solidified CRISPR/Cas9 as a technology for genome-engineering when both labs simultaneously showed that this system could be used to edit genes in vivo, in both human and mouse cells (Cong et al., 2013; Mali et al., 2013). Both labs expressed a human codon-optimized version of Streptococcus pyogenes Cas9, and either a crRNA and tracrRNA or a chimeric gRNA, in mammalian cells, and showed that Cas9 could be directed to cleave an endogenous genomic locus that was complementary to the co-expressed crRNA or gRNA. In addition, this research indicated that CRISPR/Cas9 worked as well as or even better than TALENs or Zinc Finger Nucleases when directed to the same locus (Cong et al., 2013). This in vivo work also showed that the dsDNA break caused by Cas9 could be repaired through homology-directed repair (HDR) by supplying either an ssDNA-oligo (to make point mutations) or a dsDNA-template (to make a GFP reporter allele) (Cong et al., 2013; Mali et al., 2013). Furthermore, the CRISPR/Cas9 system only requires the expression of Cas9 and a gRNA that recognizes a 20-nt genomic target sequence flanked directly 3' by the 5'-NGG-3' PAM motif for successful genomic targeting. This fact makes the CRISPR/Cas9 system a much easier technology to design, construct, and implement than either TALENs or ZFNs. In addition, because Cas9 is the functional nuclease, this system should be very easy to multiplex, making multiple genetic modifications simultaneously by co-expressing different chimeric gRNAs along with Cas9 (Cong et al., 2013). The only limitation for CRISPR/Cas9 genome targeting is the presence of a PAM motif.
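The targeting rule described above (a 20-nt target followed directly 3' by 5'-NGG-3', with a blunt cut 3 nt 5' of the PAM) can be expressed as a short search over a DNA string. The following is a minimal sketch, not a guide-design tool: the function name, the return format, and the example sequence are hypothetical, only the strand that is passed in is scanned, and off-target effects are not considered.

```python
import re


def find_cas9_sites(seq, guide_len=20):
    """Find guide-length targets followed directly 3' by an NGG PAM.

    Returns a list of (start, protospacer, pam, cut_index) tuples, where the
    blunt cut falls between seq[cut_index - 1] and seq[cut_index], i.e. 3 nt
    5' of the PAM. Only the given strand is scanned.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    sites = []
    for match in pattern.finditer(seq):
        start = match.start()
        pam_start = start + guide_len
        sites.append((start, match.group(1), match.group(2), pam_start - 3))
    return sites


# Hypothetical sequence: a 20-nt target, a TGG PAM, and two trailing bases.
example = "A" * 20 + "TGG" + "CC"
for start, protospacer, pam, cut_index in find_cas9_sites(example):
    print(start, protospacer, pam, cut_index)
```

To cover targets on the opposite strand, the same function could be run on the reverse complement of the sequence.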
Statistically, the 5'-NGG-3' PAM motif for S. pyogenes Cas9 should occur on average once every 16 nucleotides on a given strand. However, this is not always the case in a genomic context. A potential solution is the recent development of CRISPR/Cas9 systems from different species of bacteria with alternative PAM motifs, which allows for a greater set of possible genomic targets (Zetsche et al., 2015; Kleinstiver et al., 2015).

Figure 2. In vitro and in vivo Cas9 target recognition. Cas9 from the Type-II CRISPR system forms a complex with both a crRNA and tracrRNA. The 20-nt guide sequence on the crRNA recognizes its target sequence through base pairing to the 20-nt complementary strand directly 5' to the NGG PAM motif. The constant region of the crRNA, highlighted in orange, forms a stem loop with the tracrRNA, highlighted in green. The stem-loop conformation between the crRNA and tracrRNA and stem loops 1, 2, and 3 in the tracrRNA help these RNAs bind Cas9. Upon target recognition, Cas9 will cause a blunt-end dsDNA break 3 nt 5' of the PAM motif. Cas9 functions in vitro and in vivo when (a) the tracrRNA and crRNA are expressed separately or (b) the tracrRNA and crRNA are fused together to form a chimeric sgRNA or gRNA. Figure adapted from (Kim and Kim, 2014)

DNA repair machinery

Locus-specific nucleases like CRISPR/Cas9, ZFNs, and TALENs increase the efficiency of genome-engineering because they cause a dsDNA break at the region of DNA to be modified. The cell is then forced to repair the dsDNA break.
However, how the dsDNA break is repaired depends on various factors that are intrinsic to the cell, such as the time in the cell cycle when the dsDNA break is detected, and whether or not there is a segment of homologous DNA sequence present in the vicinity of the break (Figure 3) (Kim and Kim, 2014). The two main repair pathways the cell uses to repair a dsDNA break are non-homologous end joining (NHEJ) and homology-directed repair (HDR) (Kim and Kim, 2014). NHEJ is the most common form of dsDNA break repair because it does not require the presence of a homologous DNA sequence, and it can occur in all phases of the cell cycle (Rothkamm et al., 2003). During NHEJ repair, when a dsDNA break is detected, a protein called KU binds very strongly (Kd of approximately 10^-9 M) to each of the free DNA duplex ends (Falzon et al., 1993). KU then acts as a hub to recruit all of the different factors necessary for NHEJ repair. These include the Artemis:DNA-PKcs nuclease complex, the Pol μ and Pol λ DNA polymerases, and the XLF:XRCC4:DNA ligase IV complex (Ma et al., 2004). The Artemis:DNA-PKcs nuclease complex has both 5' and 3' endonuclease activity, hairpin-opening activity, and 5' exonuclease activity (Ma et al., 2002). These different functions allow the Artemis:DNA-PKcs nuclease complex to remove or modify any damaged single-stranded ends that might result from a dsDNA break (Ma et al., 2005; Yannone et al., 2008). The Pol μ and Pol λ DNA polymerases are used to fill in single-stranded DNA ends. However, Pol μ can also perform template-independent synthesis (Lee et al., 2003; Nick McElhinny et al., 2003). The XLF:XRCC4:DNA ligase IV complex is a very adaptable ligase complex that can ligate non-compatible ends, blunt ends, and ssDNA ends (Gu et al., 2007). There is no specific order of recruitment of these complexes to KU, and different complexes can be recruited at the same time to the two separate KU-bound DNA duplex ends (Ma et al., 2004).
This fact is what leads to the random insertion or deletion of nucleotides during NHEJ repair and usually results in a random mutation once the two ends are joined (Lieber 2010). Because NHEJ randomly inserts or deletes nucleotides during the repair process, CRISPR/Cas9 can be used to knock out genes. If Cas9 is targeted to the coding region of a gene, any random insertion or deletion whose length is not a multiple of three nucleotides will result in a frameshift mutation and subsequent gene knockout if the exon is incorporated into the mRNA transcript after splicing.

A cell can also repair a dsDNA break through homology-directed repair (HDR) when the cell is in S/G2 phase and there is an adequate homologous dsDNA template present (Jasin et al., 2013). HDR is initiated when there is resection of the 5' strand at each end of the dsDNA break, which creates two 3' ssDNA overhangs (Forget et al., 2010). In mammals, the resection of the 5' strand is accomplished by the MRN complex along with the CtIP protein, and possibly other factors such as BRCA1, EXO1, and BLM (Jasin et al., 2013). The protein RAD51, which is recruited by BRCA2, binds to the 3' strand and allows for strand invasion of the homologous dsDNA template (Sugawara et al., 2003). Upon invasion, the 3' strand can act as a primer for homologous template-dependent synthesis by DNA polymerase (Forget et al., 2010). The resulting Holliday junction complex that forms can then be resolved by either non-crossover mechanisms that require RecQ and topoisomerase III or crossover mechanisms that require Holliday junction resolvase and DNA ligase (Jasin et al., 2013). Furthermore, there are alternative pathways of HDR called alt-NHEJ, MMEJ, or Single Strand Annealing (SSA) that require a 3' ssDNA overhang and 5' strand resection. However, rather than invasion, there is annealing across the break point (Jasin et al., 2013).
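The knockout logic of NHEJ repair described earlier in this section rests on simple arithmetic: an indel shifts the reading frame unless its length is a multiple of three. A minimal Python sketch follows; the function names and the toy coding sequence are hypothetical and serve only to illustrate the rule.

```python
def causes_frameshift(indel_length):
    """True if an insertion (+n) or deletion (-n) of n nucleotides shifts the frame."""
    if indel_length == 0:
        raise ValueError("an indel must insert or delete at least one nucleotide")
    return abs(indel_length) % 3 != 0


def codons(cds):
    """Split a coding sequence into complete codons, dropping any trailing remainder."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


# Toy coding sequence (hypothetical): ATG GCT GAA ACC TGA
wild_type = "ATGGCTGAAACCTGA"
one_bp_deletion = wild_type[:4] + wild_type[5:]    # removes a single nucleotide
three_bp_deletion = wild_type[:3] + wild_type[6:]  # removes one whole codon

print(codons(wild_type))
print(codons(one_bp_deletion))    # every codon downstream of the deletion changes
print(codons(three_bp_deletion))  # downstream codons are preserved
```

An in-frame deletion can still disrupt a protein, so the multiple-of-three rule predicts frameshifts only, not loss of function in general.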
HDR is the mechanism by which large pieces of DNA (such as GFP reporter alleles or Cre) can be incorporated into the genome, and the alternative pathways are how ssDNA-oligos (such as epitope-tags or loxP sequences) are incorporated into the genome after a double-strand break is created through CRISPR/Cas9 targeting.

Figure 3. DNA repair pathways and genome-engineering. Site-specific nucleases, like CRISPR/Cas, can be used for genome-engineering because they cause a dsDNA break that can be repaired through either NHEJ or HDR. When no exogenous DNA template is supplied, the dsDNA break will be repaired through NHEJ, which usually results in the random insertion or deletion of nucleotides at the break junction; this is ideal for knocking out a gene by causing a frameshift mutation. If either an ssDNA or a dsDNA targeting vector is supplied that has homology to both sides of the break junction, this DNA targeting vector will be incorporated through HDR. This is ideal for making specific genetic modifications, such as specific point mutations, reporter alleles, conditional alleles, and epitope-tagged alleles. Figure adapted from (Kim and Kim, 2014)

Part 3. Epigenetics: DNA methylation.

DNA methylation

DNA methylation is a highly conserved epigenetic modification that attaches a methyl group to the 5-carbon position of cytosine in DNA (Feng et al., 2010). The most prevalent DNA methylation in the genome occurs at CpG dinucleotides, but methylation can also occur at CpA dinucleotides (Arand et al., 2012). DNA methylation is found in multiple species such as bacteria, flies, mice, and humans, where it is thought to play an essential role in gene regulation and development. One indication that CpG methylation plays an important regulatory role in the mammalian genome comes from the fact that globally the human genome is depleted of CpG dinucleotides (Smith and Meissner 2013).
Furthermore, 60-80% of the 25 million CpG dinucleotides remain methylated in almost all adult tissues (Smith and Meissner 2013). However, a subset of these CpG dinucleotides occur in dense CpG regions called CpG islands, most of which are not methylated, although a few are methylated depending on cell type or developmental state (Smith and Meissner 2013; Schubeler 2015). Furthermore, CpGs are also found in low-density CpG regions that are not CpG islands, but can become differentially methylated depending on cell or tissue type. These differentially methylated regions (DMRs) sometimes overlap with promoters and enhancer elements, which is a possible indication that methylation plays a direct role in regulating these important cis-regulatory regions (Schubeler 2015).

DNMTs are a family of conserved enzymes that function to methylate DNA. In mice and humans there are 3 members: DNMT1, DNMT3A, and DNMT3B (Okano et al., 1999; Li et al., 1992). In addition, there is another important non-enzymatic family member, DNMT3L, that is conserved between mice and humans (Chen et al., 2005). Each DNMT, except DNMT3L, can catalyze the methylation of cytosine in a CpG dinucleotide pair; however, each DNMT has a different function in this epigenetic process. DNMT1 is a maintenance methylase, which recognizes hemimethylated CpGs in dsDNA and then methylates the unmethylated cytosine to maintain the methylation pattern during DNA replication in mitosis (Avvakumov et al., 2008). DNMT1 interacts with PCNA and UHRF1, and this complex is recruited to and binds hemimethylated replicated DNA during S-phase of the cell cycle (Sharif et al., 2007; Chuang et al., 1997). DNMT3A and DNMT3B are de novo methylases and can methylate non-methylated DNA (Morgan et al., 2005).
It has been shown that DNMT3A/B can interact with nucleosome remodeling complexes and histone methylases, which is exemplified during mESC differentiation when DNMT3A/B is recruited in a complex with G9A and LSH to the promoters of important ES-specific genes, where it stably silences these promoters through methylation (Epsztejn-Litman et al., 2008; Myant et al., 2011). In addition, the factors TRIM28, SETDB1, and ZFP809 are known to be important for recruiting DNMT3A to silence LTRs in the genome (Wolf et al., 2009; Rowe et al., 2010; Wolf et al., 2007). DNMT3L cannot methylate DNA; however, it does play an important role in recruiting DNMT3A or DNMT3B to silence LINE1 and retroviral elements in the genome (Smith and Meissner 2013). The importance of these enzymes is highlighted by the fact that both Dnmt1 -/- and Dnmt3b -/- knockouts are embryonic lethal and Dnmt3a -/- mice die shortly after birth (Li et al., 1992; Okano et al., 1999).

DNA demethylation is thought to occur through two different mechanisms. The first is passive DNA demethylation, in which the maintenance methylase DNMT1 is blocked and the genomic locus becomes diluted of the methyl mark through cell division (Kohli and Zhang, 2013). The second mechanism is active DNA demethylation, in which the 5-methyl group on cytosine is removed by a series of enzymatic reactions without the need for cell division (Kohli and Zhang, 2013). Recently, a family of enzymes called the Ten-Eleven Translocation (TET) enzymes, which consists of three family members, TET1, TET2, and TET3, was shown to be able to convert 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC) (Tahiliani et al., 2009; Ito et al., 2010; Kriaucionis et al., 2009). The TET enzymes can further oxidize 5hmC to form 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) (Ito et al., 2011; He et al., 2011). TET enzymes could play an important function in either active or passive DNA demethylation.
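Passive demethylation, as described above, is a dilution process: if maintenance methylation is fully blocked, each round of replication pairs every methylated parental strand with an unmethylated new strand, so the methylated fraction halves per division. The sketch below illustrates this under a deliberately simplified model. The function name and the `maintenance_efficiency` parameter are assumptions introduced for illustration and do not correspond to a measured quantity in this thesis.

```python
def methylated_strand_fraction(divisions, maintenance_efficiency=0.0):
    """Fraction of DNA strands at a locus that still carry the methyl mark.

    Starts from a fully methylated locus. At each replication, parental
    strands keep their mark, and each new strand copied from a methylated
    template is methylated with probability maintenance_efficiency
    (0.0 = maintenance fully blocked, 1.0 = perfect maintenance).
    """
    fraction = 1.0
    for _ in range(divisions):
        # Half of the strands are parental, half are newly synthesized.
        fraction = (fraction + fraction * maintenance_efficiency) / 2
    return fraction


for n in range(5):
    print(n, methylated_strand_fraction(n))
```

With maintenance fully blocked, the mark falls to one eighth of its starting level after three divisions in this model, whereas active demethylation needs no division at all.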
Converting 5mC to 5hmC, 5fC, or 5caC could block DNMT1 from properly propagating the methyl mark during cell division. Alternatively, through an active process, an enzyme called TDG, which is part of the BER DNA repair pathway, could remove these modified cytosines, which would then be replaced with non-methylated cytosines (Dalton and Bellacosa 2012). Besides the role of TET enzymes in either active or passive DNA demethylation, the modification of 5mC to 5hmC, 5fC, and 5caC is thought to play a role in gene regulation and other epigenetic processes (Kohli and Zhang, 2013). Furthermore, besides the TET family of proteins, other factors have been reported to play a role in active DNA demethylation, such as the protein AID from the APOBEC family of enzymes (Bhutani et al., 2010; Popp et al., 2010).

Methylation changes in development

Methylation is a highly dynamic process during development, both globally and locally at specific genomic regions (Figure 4) (Kohli and Zhang, 2013). DNA methylation is thought to be important for committing cells to a specific lineage and making differentiation a unidirectional process (Messerschmidt et al., 2014). Upon fertilization, there is a global loss of methylation from both the paternal and maternal genomes. This epigenetic reprogramming event allows for the erasure of the germ cell-specific epigenetic signature (Santos and Dean 2004). The rate of methylation loss from the maternal and paternal genomes is not the same. Loss of methylation occurs more rapidly on the paternal genome, possibly due to active demethylation mediated by the TET enzymes (Oswald et al., 2000; Kohli and Zhang, 2013). The loss of methylation from the maternal genome is the result of passive demethylation during DNA replication (Messerschmidt et al., 2014). After implantation, a global wave of de novo methylation establishes the genomic methylation level that is characteristic of somatic cells (Smallwood and Kelsey, 2012).
During this period, low-density CpG regions that overlap with certain development-specific cis-regulatory elements become methylated or demethylated in a cell-type- and tissue-specific manner, allowing for specification and directionality of somatic differentiation (Messerschmidt et al., 2014; Schubeler 2015). However, around embryonic day 7.25, a small fraction of cells migrate out of the epiblast and colonize the genital ridge. These cells establish the primordial germ cells (PGCs), which will further develop to form the sperm and oocytes after sex specification of the embryo (Smallwood and Kelsey, 2012). During PGC development there is another wave of global genomic demethylation, in all regions of the genome except LINE1 and retrotransposable elements (Popp et al., 2010; Sasaki et al., 2008). This global demethylation during PGC development allows for the erasure of the somatic epigenetic signature and establishment of a new germ cell-specific signature (Messerschmidt et al., 2014).

There are a few very specific loci in the genome that do not become demethylated after fertilization. These regions are called imprinted germline differentially methylated regions (gDMRs). Maternal imprinted gDMRs are methylated in the oocyte and paternal imprinted gDMRs are methylated in the sperm (Smallwood and Kelsey, 2012). After fertilization these imprinted regions retain their parent-of-origin methylation pattern. These imprinted DMRs are CpG islands and are often imprint-control regions that control the imprinting of gene clusters next to them (Wutz et al. 1997; Thorvaldsen et al. 1998; Fitzpatrick et al. 2002; Lin et al. 2003). One example of this is the Prader-Willi/Angelman region, which is a maternally imprinted gDMR. This region has a primary imprinted gDMR termed PWS-SRO, which can regulate the imprinting of nearby secondary DMRs in the promoters of cis-proximal genes, such as SNRPN and UBE3A (Yang et al., 1998; Edwards and Ferguson-Smith 2007).
Figure 4. DNA methylation dynamics during development. DNA methylation is a highly dynamic process during development. During development of the embryo, primordial germ cells (PGCs) start to proliferate and migrate to form the genital ridge. As PGCs migrate, there is a progressive global loss of methylation from their genome. As the PGCs develop into either mature spermatogonia or oocytes, there is a wave of global genomic methylation. After fertilization of the egg by the sperm, and as the single-cell zygote develops to form the blastocyst, there is another loss of global genome methylation. However, a few loci in the genome do not become demethylated; these are called imprinted regions, and they retain their parent-of-origin methylation patterns. Finally, as the blastocyst develops to form the complete organism, there is further genomic methylation. This methylation event is tissue- and cell-type-specific, and it serves as an important mechanism in gene regulation, development, and unidirectional differentiation. Figure adapted from (Smallwood and Kelsey, 2012)

Methylation and gene regulation

Low CpG density promoters are regulated by methylation. In contrast, most CpG island-associated promoters are not regulated by DNA methylation; however, a subset of CpG island-associated promoters can be switched on and off by this epigenetic modification (Smith and Meissner, 2013). In addition, many repetitive elements, such as LINE1 and ERVs, are silenced through methylation of their respective promoters (Liang et al., 2002). Silencing of promoters through methylation coincides with changes in histone modifications; specifically, the H3K9 methylation mark is highly associated with silenced promoters (Ayyanathan et al., 2003).
The promoter for the Oct4 (Pou5f1) gene, which encodes a master transcription factor for the pluripotent cell state, is a good example of a promoter that is regulated by DNA methylation and H3K9 methylation (Athanasiadou et al., 2010). Upon differentiation and exit from pluripotency, the protein G9A is recruited to the Oct4 promoter, where it initiates H3K9 methylation. This histone modification is followed by recruitment of the heterochromatin protein 1 (HP1). Finally, DNMT3A or DNMT3B is recruited to de novo methylate the promoter, allowing for stable epigenetic silencing (Feldman et al., 2006; Athanasiadou et al., 2010). In addition to promoters, it is thought that the activity of other cis-regulatory regions, such as enhancers that overlap with low-density CpG regions, can be regulated through DNA methylation (Schubeler 2015). Hypomethylation of enhancers has been correlated with gene expression changes. This is especially true in some cancer cell lines, where it has been shown that enhancer methylation for certain genes is a better predictive indicator of a gene's activity than promoter methylation (Aran et al., 2013; Sandovici et al., 2011; Messerschmidt et al., 2014).

DNA methylation and cancer

Aberrant methylation is associated with many different forms of cancer. Hypermethylation and subsequent silencing of genes that are important in DNA repair, cell-cycle regulation, apoptosis, and tumor cell invasion occur in many different types of cancer (Laird et al., 1995; Robertson 2005). Furthermore, the loss of imprinting (LOI) of some genes can cause or be associated with different cancer types. One important example of this is the gene IGF2, which is an imprinted gene that is expressed from the paternal allele (Barlow et al. 1991). Loss of methylation on the maternal allele can cause overexpression of IGF2, and can lead to cancer in multiple tissue types such as the lung, liver, and colon (Moulton et al., 1994; Steenman et al., 1994).
In addition, LOI can also occur when the normally expressed imprinted allele of a tumor suppressor gene becomes silenced (Robertson 2005). Examples include loss of expression of kinase inhibitor 1C, CDKN1C, in Wilms' tumor, and the RAS-related gene DIRAS3 in colon cancer (Feinberg et al., 2002; Thompson et al., 1996). Besides the aberrant regulation of tumor suppressor genes and oncogenes by methylation, mutations in many of the core proteins that have a role in methylation dynamics are also found in cancer. A fusion of the MLL gene with TET1 is found in some patients with myeloid leukemia, and mutations in the gene TDG occur in various cancer types (Ono et al., 2002; Dalton and Bellacosa 2012).

Technology available to study DNA methylation

One of the first experiments to show that DNA methylation occurred in eukaryotes made use of methylation-sensitive restriction enzymes (Bird and Southern, 1978; Cedar et al., 1979). Certain restriction enzymes cannot cut methylated DNA. Therefore, these enzymes can be used to determine whether their respective sites are methylated in genomic DNA. This method was further improved when it was coupled with HPLC or mass spectrometry, which allowed for relative total levels of methylated and non-methylated cytosine to be compared across different tissue or cell types (Gama-Sosa et al., 1983; Bestor et al., 1984). This method works well if the aim is to compare the overall methylated cytosine content; however, it does not give any sequence-specific information. To investigate base-resolution methylation patterns, a method termed bisulfite sequencing was developed (Herman et al., 1996; Frommer et al., 1992; Oakeley et al., 1997). This method makes use of the chemical sodium bisulfite, which specifically deaminates non-methylated cytosine to form uracil; however, when the 5-carbon of cytosine is methylated, this reaction does not proceed.
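The chemistry just described can be mimicked in silico to show how a base-resolution methylation call is read out. The following is a toy sketch under simplifying assumptions: complete conversion, a single strand, and uracil read as thymine after PCR. The function names and the example sequence are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment followed by PCR on one DNA strand.

    Unmethylated C is deaminated to U and read as T after PCR;
    methylated C (0-based positions given) is protected and stays C.
    """
    methylated = set(methylated_positions)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference, converted):
    """Return positions where a reference C is still C after conversion."""
    pairs = zip(reference.upper(), converted)
    return [i for i, (ref, conv) in enumerate(pairs) if ref == "C" and conv == "C"]


reference = "ACGTCCGA"  # hypothetical locus; the C at position 1 is methylated
converted = bisulfite_convert(reference, methylated_positions=[1])
print(converted)
print(call_methylation(reference, converted))
```

Comparing the converted read with the untreated reference is what turns an epigenetic mark into a sequence difference that standard sequencing can detect.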
Therefore, treating total genomic DNA isolated from a cell or tissue in vitro with sodium bisulfite will convert all non-methylated cytosine into uracil, while all methylated cytosine will remain protected. Sodium bisulfite treatment can be followed by PCR to gain information on the base-resolution methylation pattern at any genomic locus that can be PCR amplified (Frommer et al., 1992). However, primer design for bisulfite sequencing can be difficult because the primers cannot contain CpG dinucleotides, which would bias the reaction. Finally, methods have been developed to investigate full-genome methylation patterns. Methylated-cytosine-specific antibodies were combined with microarray technology, providing whole-genome information on methylation, albeit only at low resolution (Gitan et al., 2002). New methods that couple bisulfite treatment with modern sequencing technologies have now allowed for the establishment of full-genome base-resolution methylation maps for many different species, including human (Cokus et al., 2008; Lister et al., 2009). Furthermore, a new initiative, the Roadmap Epigenomics Consortium, has recently published full-genome methylation maps for multiple different human and mouse cell types (Smith et al., 2014; Roadmap Epigenomics Consortium et al., 2015; Schulz et al., 2015).

Part 4: Thesis outline

Locus-specific genome editing in the single-cell zygote

While the method of making knockin and knockout mice through homologous recombination has been very successful, the technique is time-consuming, inefficient, and expensive. In the first aim of this thesis we investigate the possibility of developing a more efficient methodology for genetically engineering mice by combining the method of mouse zygotic injections, which was developed during the early period of transgenics, with that of programmable nucleases, specifically the CRISPR/Cas9 system.
Because of its ability to cause locus-specific dsDNA breaks, the CRISPR/Cas9 system has been shown to increase the efficiency of making gene knockouts through NHEJ and genetic modifications through HDR in human cells. The high efficiency of genome engineering mediated by CRISPR/Cas9 could be used to make all of the classical genetic modifications, such as multiple gene knockouts, reporter alleles, epitope-tagged alleles, and conditional alleles, directly through zygotic injections. Cas9-mediated locus-specific genome engineering through zygotic injections would allow mice to be made in as little as three weeks and at a much lower cost, because it bypasses the intermediate steps of making genetically engineered mESCs through homologous recombination, blastocyst injections, and breeding chimeric mice to make pure F1s.

A reporter for genomic methylation

DNA methylation is a dynamic epigenetic modification that not only plays an important role in regulating normal biological processes, such as gene transcription, cell identity, and development, but can also become highly deregulated in human diseases such as cancer. Although DNA methylation has been known to be a dynamic process for quite some time, present technology to capture DNA methylation patterns has been restricted to static snapshots in fixed cell populations. As yet, no technology exists to investigate the dynamics of DNA methylation in vivo. The second aim of this thesis is to establish a technology that can report on the DNA methylation state of an endogenous locus in vivo at single-cell resolution.

References

Abe T, Fujimori T. (2013) Reporter mouse lines for fluorescence imaging. Develop. Growth Differ. 55: 390-405
Aran D, Sabato S, Hellman A. (2013) DNA methylation of distal regulatory sites characterizes dysregulation of cancer genes. Genome Biol 14: R21.
Arand J, Spieler D, Karius T, Branco MR, Meilinger D, Meissner A, Jenuwein T, Xu G, Leonhardt H, Wolf V, Walter J.
(2012) In vivo control of CpG and non-CpG DNA methylation by DNA methyltransferases. PLoS Genet. 8(6)
Athanasiadou R, Sousa DD, Myant K, Merusi C, Stancheva I, Bird A. (2010) Targeting of de novo DNA methylation throughout the Oct-4 gene regulatory region in differentiating embryonic stem cells. PLoS ONE 5, e9937.
Avvakumov GV, et al. (2008) Structural basis for recognition of hemi-methylated DNA by the SRA domain of human UHRF1. Nature 455, 822-825
Ayyanathan K, et al. (2003) Regulated recruitment of HP1 to a euchromatic gene induces mitotically heritable, epigenetic gene silencing: a mammalian cell culture model of gene variegation. Genes Dev. 17, 1855-1869
Bae KH, Kwon YD, Shin HC, Hwang MS, Ryu EH, Park KS. (2003) Human zinc fingers as building blocks in the construction of artificial transcription factors. Nature Biotech. 21, 275-280
Barlow DP, Stoger R, Herrmann BG, Saito K, Schweifer N. (1991) The mouse insulin-like growth factor type-2 receptor is imprinted and closely linked to the Tme locus. Nature 349: 84-87.
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P. (2007) CRISPR provides acquired resistance against viruses in prokaryotes. Science 315:1709-12
Bestor TH, Hellewell SB, Ingram VM. (1984) Differentiation of two mouse cell lines is associated with hypomethylation of their genomes. Mol Cell Biol 4: 1800-1806.
Bird AP, Southern EM. (1978) Use of restriction enzymes to study eukaryotic DNA methylation: I. The methylation pattern in ribosomal DNA from Xenopus laevis. J Mol Biol 118: 27-47.
Bhaya D, Davidson M, Barrangou R. (2011) CRISPR-Cas systems in bacteria and archaea: versatile small RNAs for adaptive defense and regulation. Annu Rev Genet. 45:273-97
Bhutani N, Brady JJ, Damian M, Sacco A, Corbel SY, Blau HM. (2010) Reprogramming towards pluripotency requires AID-dependent DNA demethylation. Nature 463, 1042-1047
Boch J, Scholze H, Schornack S, Landgraf A, Hahn S, Kay S, Lahaye T, Nickstadt A, Bonas U.
(2009) Breaking the code of DNA binding specificity of TAL-type III effectors. Science 326, 1509-1512
Bolotin A, Quinquis B, Sorokin A, Ehrlich SD. (2005) Clustered regularly interspaced short palindrome repeats (CRISPRs) have spacers of extrachromosomal origin. Microbiology 151:2551-61
Bouabe H, Okkenhaug K. (2013) Gene targeting in mice: a review. Methods Mol Biol. 1064: 213-336
Boyer LA, Lee TI, Cole MF, Johnstone SE, Levine SS, Zucker JP, Guenther MG, Kumar RM, Melton DA, Jaenisch R, Young RA. (2005) Core transcriptional regulatory circuitry in human embryonic stem cells. Cell. 122(6): 947-56
Bradley A, Evans M, Kaufman MH, Robertson E. (1984) Formation of germ-line chimaeras from embryo-derived teratocarcinoma cell lines. Nature 309: 255-256.
Brannan CI, Lyman SD, Williams DE, Eisenman J, Anderson DM. (1991) Steel-Dickie mutation encodes a c-kit ligand lacking transmembrane and cytoplasmic domains. Proc. Natl. Acad. Sci. USA 88: 4671-4674.
Brinster RL, Chen Y, Trumbauer M, Senear AW, Warren R. (1981) Somatic expression of herpes thymidine kinase in mice following injection of a fusion gene into eggs. Cell 27: 223-231.
Brizzard BL, Chubet RG, Vizard DL. (1994) Immunoaffinity purification of FLAG epitope-tagged bacterial alkaline phosphatase using a novel monoclonal antibody and peptide elution. BioTechniques 16:730-735.
Brouns SJ, Jore MM, Lundgren M, Westra ER, Slijkhuis RJ. (2008) Small CRISPR RNAs guide antiviral defense in prokaryotes. Science 321:960-64
Bult CJ, White O, Olsen GJ, Zhou L, Fleischmann RD. (1996) Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science 273:1058-73
Carey BW, Markoulaki S, Hanna J, Saha K, Gao Q, Mitalipova M, Jaenisch R. (2009) Reprogramming of murine and human somatic cells using a single polycistronic vector. Proc Natl Acad Sci USA. 106:157-162.
Carlson DF, Tan WF, Lillico SG, Stverakova D, Proudfoot C, Christian M, Voytas DF, Long CR, Whitelaw CBA, Fahrenkrug SC.
(2012) Efficient TALEN-mediated gene knockout in livestock. Proc Natl Acad Sci U S A. 109:17382-17387.
Cedar H, Solage A, Glaser G, Razin A. (1979) Direct detection of methylated cytosine in DNA by use of the restriction enzyme MspI. Nucleic Acids Res 6: 2125-2132.
Chabot B, Stephenson DA, Chapman VM, Besmer P, Bernstein A. (1988) The proto-oncogene c-kit encoding a transmembrane tyrosine kinase receptor maps to the mouse W locus. Nature 335: 88-89.
Chen ZX, Mann JR, Hsieh CL, Riggs A, Chedin F. (2005) Physical and functional interaction between DNMT3L protein and members of the de novo methyltransferase family. Journal of Cellular Biochemistry. 95:902-917
Chuang LS, Ian HI, Koh TW, Ng HH, Xu G, Li BF. (1997) Human DNA-(cytosine-5) methyltransferase-PCNA complex as a target for p21WAF1. Science 277, 1996-2000
Cohen SN, Chang AC, Boyer HW, Helling RB. (1973) Construction of biologically functional bacterial plasmids in vitro. Proc Natl Acad Sci USA 70: 3240-3244.
Cokus SJ, Feng S, Zhang X, Chen Z, Merriman B, Haudenschild CD, Pradhan S, Nelson SF, Pellegrini M, Jacobsen SE. (2008) Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature 452: 215-219.
Cong L, Ran FA, Cox D, Lin S, Barretto R, Habib N, Hsu PD, Wu X, Jiang W, Marraffini LA, et al. (2013) Multiplex genome engineering using CRISPR/Cas systems. Science. 339:819-823.
Costantini F, Lacy E. (1981) Introduction of a rabbit beta globin gene into the mouse germ line. Nature 294: 92-94.
Croxford AL, Buch T. (2011) Cytokine reporter mice in immunological research: perspectives and lessons learned. Immunology. 132: 1-8
Cuénot L. (1902) Notes et revues. Arch. Zool. Exp. Gen., xxvii.
Dalton SR, Bellacosa A. (2012) DNA demethylation by TDG. Epigenomics 4, 459-467
Dawlaty MM, Ganz K, Powell BE, Hu YC, Markoulaki S, Cheng AW, Gao Q, Kim J, Choi SW, Page DC, Jaenisch R. (2011)
Tet1 is dispensable for maintaining pluripotency and its loss is compatible with embryonic and postnatal development. Cell Stem Cell 9, 166-175.
Deng D, Yan C, Pan X, Mahfouz M, Wang J, Zhu JK, Shi Y, Yan N. (2012) Structural basis for sequence-specific recognition of DNA by TAL effectors. Science 335, 720-723
Deveau H, Barrangou R, Garneau JE, Labonte J, Fremaux C. (2008) Phage response to CRISPR encoded resistance in Streptococcus thermophilus. J. Bacteriol. 190:1390-400
Deveau H, Garneau JE, Moineau S. (2010) CRISPR/Cas system and its role in phage-bacteria interactions. Annu. Rev. Microbiol. 64:475-93
Ding Q, et al. (2013) A TALEN genome-editing system for generating human stem cell-based disease models. Cell Stem Cell 12, 238-251
Doetschman T, Gregg RG, Maeda N, Hooper ML, Melton DW. (1987) Targeted correction of a mutant HPRT gene in mouse embryonic stem cells. Nature 330: 576-578
Donehower LA, Harvey M, Slagle BL, McArthur MJ, Montgomery CA. (1992) Mice deficient for p53 are developmentally normal but susceptible to spontaneous tumours. Nature 356: 215-221.
Doyle A, McGarry MP, Lee NA, Lee JJ. (2012) The construction of transgenic and gene knockout/knockin mouse models of human diseases. Transgenic Res. 21(2): 327-349
Dupage M, Dooley A, Jacks T. (2009) Conditional mouse lung cancer models using adenoviral or lentiviral delivery of Cre recombinase. Nat Protoc. 4(7): 1064-1072
Dupuy AJ, Fritz S, Largaespada DA. (2001) Transposition and gene disruption in the male germline of the mouse. Genesis. 30:82-88.
Edwards CA, Ferguson-Smith AC. (2007) Mechanisms regulating imprinted genes in clusters. Curr Opin Cell Biol 19: 281-289.
Epsztejn-Litman S, et al. (2008) De novo DNA methylation promoted by G9a prevents reprogramming of embryonically silenced genes. Nature Struct. Mol. Biol. 15, 1176-1183
Evan GI, Lewis GK, Ramsay G, Bishop JM. (1985) Isolation of monoclonal antibodies specific for human c-myc proto-oncogene product. Mol. Cell. Biol.
5:3610-3616
Falzon M, Fewell J, Kuff EL. (1993) EBP-80, a transcription factor closely resembling the human autoantigen Ku, recognizes single- to double-strand transitions in DNA. J. Biol. Chem. 268:10546-52
Feinberg AP, Cui H, Ohlsson R. (2002) DNA methylation and genomic imprinting: insights from cancer into epigenetic mechanisms. Sem. Canc. Biol. 12, 389-398
Feng S, Jacobsen SE, Reik W. (2010) Epigenetic reprogramming in plant and animal development. Science 330, 622-627
Feldman N, et al. (2006) G9A-mediated irreversible epigenetic inactivation of Oct-3/4 during early embryogenesis. Nature Cell Biol. 8, 188-194
Fitzpatrick GV, Soloway PD, Higgins MJ. (2002) Regional loss of imprinting and growth deficiency in mice with a targeted deletion of KvDMR1. Nat Genet 32: 426-431.
Frommer M, McDonald LE, Millar DS, Collis CM, Watt F, Grigg GW, Molloy PL, Paul CL. (1992) A genomic sequencing protocol that yields a positive display of 5-methylcytosine residues in individual DNA strands. Proc Natl Acad Sci 89: 1827-1831.
Gama-Sosa MA, Midgett RM, Slagel VA, Githens S, Kuo KC, Gehrke CW, Ehrlich M. (1983) Tissue-specific differences in DNA methylation in various mammals. Biochim Biophys Acta 740: 212-219.
Garneau JE, Dupuis ME, Villion M, Romero DA, Barrangou R. (2010) The CRISPR/Cas bacterial immune system cleaves bacteriophage and plasmid DNA. Nature 468:67-71
Gavin AC, Bosche M, Krause K, Grandi P, Marzioch M, Bauer A, Schultz J, Rick JM. (2002) Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 415:141-147.
Geurts AM, Cost GJ, Freyvert Y, Zeitler B, Miller JC, Choi VM, Jenkins SS, Wood A, Cui XX, Meng XD. (2009) Knockout rats via embryo microinjection of zinc-finger nucleases. Science. 325:433-433.
Gordon JW, Scangos GA, Plotkin DJ, Barbosa JA, Ruddle FH. (1980) Genetic transformation of mouse embryos by microinjection of purified DNA. Proc. Natl. Acad. Sci. USA 77: 7380-7384
Guo H, Zhu P, Wu X, Li X, Wen L, Tang F.
(2013) Single-cell methylome landscapes of mouse embryonic stem cells and early embryos analyzed using reduced representation bisulfite sequencing. Genome Res 23: 2126-2135.
Gou J, Wu B, Li S, Bao S, Zhao L, Hu S, Sun W, Sun W, Su J, Dai Y, Li X. (2014) Contribution of mouse embryonic stem cells and induced pluripotent stem cells to chimeras through injection and coculture of embryos. Stem Cell Int. 409021
Gu J, Lu H, Tsai AG, Schwarz K, Lieber MR. (2007) Single-stranded DNA ligation and XLF-stimulated incompatible DNA end ligation by the XRCC4-DNA ligase IV complex: influence of terminal DNA sequence. Nucleic Acids Res. 35:5755-62
Harbers K, Jahner D, Jaenisch R. (1981) Microinjection of cloned retroviral genomes into mouse zygotes: integration and expression in the animal. Nature 293: 540-542.
He YF, et al. (2011) Tet-mediated formation of 5-carboxylcytosine and its excision by TDG in mammalian DNA. Science 333, 1303-1307
Herman JG, Graff JR, Myohanen S, Nelkin BD, Baylin SB. (1996) Methylation-specific PCR: a novel PCR assay for methylation status of CpG islands. Proc Natl Acad Sci 93: 9821-9826.
Hermans PW, van Soolingen D, Bik EM, de Haas PE, Dale JW, van Embden JD. (1991) Insertion element IS987 from Mycobacterium bovis BCG is located in a hot-spot integration region for insertion elements in Mycobacterium tuberculosis complex strains. Infect. Immun. 59:2695-705
Hockemeyer D, et al. (2011) Genetic engineering of human pluripotent cells using TALE nucleases. Nat. Biotechnol. 29, 731-734
Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, Adams SL, Millar A, Taylor P. (2002) Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415:180-183.
Hoe N, Nakashima K, Grigsby D, Pan X, Dou SJ, et al. (1999) Rapid molecular genetic subtyping of serotype M1 group A Streptococcus strains. Emerg. Infect. Dis. 5:254-63
Holliday R, Pugh JE. (1975) DNA modification mechanisms and gene activity during development.
Science 187: 226-232.
Hudson DF, Fowler KJ, Earle E, Saffery R, Kalitsis P, Choo KHA. (1998) Centromere protein B null mice are mitotically and meiotically normal but have lower body and testis weights. J Cell Biol 141(2): 309
Ito S, D'Alessio AC, Taranova OV, Hong K, Sowers LC, Zhang Y. (2010) Role of Tet proteins in 5mC to 5hmC conversion, ES-cell self-renewal and inner cell mass specification. Nature 466, 1129-1133
Ito S, Shen L, Dai Q, Wu SC, Collins LB, Swenberg JA, He C, Zhang Y. (2011) Tet proteins can convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine. Science 333, 1300-1303
Jaenisch R. (1977) Germ line integration of Moloney leukemia virus: effect of homozygosity at the M-MuLV locus. Cell. 12(3): 691-6
Jaenisch R, Mintz B. (1974) Simian virus 40 DNA sequence in DNA of healthy adult mice derived from preimplantation blastocysts injected with viral DNA. Proc Natl Acad Sci USA. 71(4):1250-4
Jaenisch R, Young R. (2008) Stem cells, the molecular circuitry of pluripotency and nuclear reprogramming. Cell. 132: 567-582
Jinek M, Chylinski K, Fonfara I, Hauer M, Doudna JA, Charpentier E. (2012) A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816-821.
Jore MM, Lundgren M, van Duijn E, Bultema JB, Westra ER, et al. (2011) Structural basis for CRISPR RNA-guided DNA recognition by CASCADE. Nat. Struct. Mol. Biol. 18:529-36
Kim H, Kim J. (2014) A guide to genome engineering with programmable nucleases. Nature Reviews Genetics. 15: 321-334
Kim YG, Cha J, Chandrasegaran S. (1996) Hybrid restriction enzymes: zinc finger fusions to FokI cleavage domain. Proc. Natl. Acad. Sci. USA 93, 1156-1160
Kleinstiver BP, Prew MS, Tsai SQ, Nguyen NT, Topkar VV, Zheng Z, Joung JK. (2015) Broadening the targeting range of Staphylococcus aureus CRISPR-Cas9 by modifying PAM recognition. Nature Biotechnology. Advance online publication.
Koller BH, Hagemann LJ, Doetschman T, Hagaman JR, Huang S.
(1989) Germ-line transmission of a planned alteration made in a hypoxanthine phosphoribosyltransferase gene by homologous recombination in embryonic stem cells. Proc. Natl. Acad. Sci. USA 86: 8927-8931.
Kohli RM, Zhang Y. (2013) TET enzymes, TDG and the dynamics of DNA demethylation. Nature. 502: 472-479
Kolodziej KE, Pourfarzad F, Boer ED, Krpic S, Grosveld F, Strouboulis J. (2009) Optimal use of tandem biotin and V5 tags in ChIP assays. BMC Molecular Biology. 10:6
Kriaucionis S, Heintz N. (2009) The nuclear DNA base 5-hydroxymethylcytosine is present in Purkinje neurons and the brain. Science 324, 929-930
Laird PW, Jackson-Grusby L, Fazeli A, Dickinson SL, Jung WE, Weinberg RA, Jaenisch R. (1995) Suppression of intestinal neoplasia by DNA hypomethylation. Cell 81(2): 197-205
Lee JW, Blanco L, Zhou T, Garcia-Diaz M, Bebenek K. (2003) Implication of DNA polymerase lambda in alignment-based gap filling for nonhomologous DNA end joining in human nuclear extracts. J. Biol. Chem. 279:805-11.
Liang G, Chan MF, Tomigahara Y, Tsai YC, Gonzales FA, Li E, Laird PW, Jones PA. (2002) Cooperativity between DNA methyltransferases in the maintenance methylation of repetitive elements. Mol. Cell. Biol. 22, 480-491
Li E, Bestor TH, Jaenisch R. (1992) Targeted mutation of the DNA methyltransferase gene results in embryonic lethality. Cell 69, 915-926
Li T, Liu B, Spalding MH, Weeks DP, Yang B. (2012) High-efficiency TALEN-based gene editing produces disease-resistant rice. Nat. Biotechnol. 30, 390-392
Lieber M. (2010) The mechanism of double-strand DNA break repair by the nonhomologous DNA end joining pathway. Annu Rev Biochem. 79: 181-211
Lin SP, Youngson N, Takada S, Seitz H, Reik W, Paulsen M, Cavaille J, Ferguson-Smith AC. (2003) Asymmetric regulation of imprinting on the maternal and paternal chromosomes at the Dlk1-Gtl2 imprinted cluster on mouse chromosome 12. Nat Genet 35: 97-102.
Lois C, Hong EJ, Pease S, Brown EJ, Baltimore D.
(2002) Germline transmission and tissue-specific expression of transgenes delivered by lentiviral vectors. Science. 295:868-872.
Luo G, Ivics Z, Izsvak Z, Bradley A. (1998) Chromosomal transposition of a Tc1/mariner-like element in mouse embryonic stem cells. Proc Natl Acad Sci U S A. 95:10769-10773.
Oakeley EJ, Podestà A, Jost JP. (1997) Developmental changes in DNA methylation of the two tobacco pollen nuclei during maturation. Proc Natl Acad Sci 94: 11721-11725.
Ogawa K, Nishinakamura R, Iwamatsu Y, Shimosato D, Niwa H. (2006) Synergistic action of Wnt and LIF in maintaining pluripotency of mouse ES cells. Biochem. Biophys. Res. Commun. 343, 159-166.
Okano M, Bell DW, Haber DA, Li E. (1999) DNA methyltransferases Dnmt3a and Dnmt3b are essential for de novo methylation and mammalian development. Cell 99, 247-257
Okita K, Ichisaka T, Yamanaka S. (2007) Germline competency of mouse induced pluripotent stem cells selected for Nanog expression. Nature 448, 313-317
Ono R, et al. (2002) LCX, leukemia-associated protein with a CXXC domain, is fused to MLL in acute myeloid leukemia with trilineage dysplasia having t(10;11)(q22;q23). Cancer Res. 62, 4075-4080
Orban PC, Chui D, Marth JD. (1992) Tissue- and site-specific DNA recombination in transgenic mice. Proceedings of the National Academy of Sciences. 89: 6861-6865
Oswald J, Engemann S, Lane N, Mayer W, Olek A, Fundele R, Dean W, Reik W, Walter J. (2000) Active demethylation of the paternal genome in the mouse zygote. Curr Biol 10: 475-478.
Ma Y, Pannicke U, Schwarz K, Lieber MR. (2002) Hairpin opening and overhang processing by an Artemis:DNA-PKcs complex in V(D)J recombination and in nonhomologous end joining. Cell. 108:781-94.
Ma Y, Lu H, Tippin B, Goodman MF, Shimazaki N. (2004) A biochemically defined system for mammalian nonhomologous DNA end joining. Mol. Cell 16:701-13
Ma Y, Schwarz K, Lieber MR.
(2005) The Artemis:DNA-PKcs endonuclease can cleave gaps, flaps, and loops. DNA Repair. 4:845-51.
Maherali N, et al. (2007) Global epigenetic remodeling in directly reprogrammed fibroblasts. Cell Stem Cell 1, 55-70
Mak AN, Bradley P, Cernadas RA, Bogdanove AJ, Stoddard BL. (2012) The crystal structure of TAL effector PthXo1 bound to its DNA target. Science 335, 716-719
Makarova KS, Grishin NV, Shabalina SA, Wolf YI, Koonin EV. (2006) A putative RNA-interference-based immune system in prokaryotes: computational analysis of the predicted enzymatic machinery, functional analogies with eukaryotic RNAi, and hypothetical mechanisms of action. Biol. Direct 1:7
Mali P, Yang L, Esvelt KM, Aach J, Guell M, DiCarlo JE, Norville JE, Church GM. (2013) RNA-guided human genome engineering via Cas9. Science 339, 823-826
Mao X, Fujiwara Y, Chapdelaine A, et al. (2001) Activation of EGFP expression by Cre-mediated excision in a new ROSA26 reporter mouse strain. Blood 97:324-6
Martin GR. (1981) Isolation of a pluripotent cell line from early mouse embryos cultured in medium conditioned by teratocarcinoma stem cells. Proc. Natl. Acad. Sci. USA 78: 7634-7638.
Messerschmidt D, Knowles B, Solter D. (2014) DNA methylation dynamics during epigenetic reprogramming in the germline and preimplantation embryos. Genes and Development. 28: 812-828
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E. (2005) Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J. Mol. Evol. 60:174-82
Mojica FJ, Garcia-Martinez J, Almendros C. (2009) Short motif sequences determine the targets of the prokaryotic CRISPR defence system. Microbiology 155:733-40
Morgan HD, Santos F, Green K, Dean W, Reik W. (2005) Epigenetic reprogramming in mammals. Hum Mol Genet 14(Spec No 1): R47-R58.
Moulton T, et al.
(1994) Epigenetic lesions at the H19 locus in Wilms' tumor patients. Nature Genet. 7, 440-447
Myant K, et al. (2011) LSH and G9A/GLP complex are required for developmentally programmed DNA methylation. Genome Res. 21, 83-94
Nick McElhinny SA, Ramsden DA. (2003) Polymerase mu is a DNA-directed DNA/RNA polymerase. Mol. Cell. Biol. 23:2309-15
Paigen K. (2003) One hundred years of mouse genetics: an intellectual history. I. The classical period (1902-1980). Genetics. 163: 1-7
Perry WL, Vasicek TJ, Lee JJ, Rossi JM, Zeng L, Zhang T, Tilghman SM, Costantini F. (1995) Phenotypic and molecular analysis of a transgenic insertional allele of the mouse fused locus. Genetics. 141:321-332.
Popp C, et al. (2010) Genome-wide erasure of DNA methylation in mouse primordial germ cells is affected by AID deficiency. Nature 463, 1101-1105
Pourcel C, Salvignol G, Vergnaud G. (2005) CRISPR elements in Yersinia pestis acquire new repeats by preferential uptake of bacteriophage DNA, and provide additional tools for evolutionary studies. Microbiology 151:653-63
Prasher DC, Eckenrode VK, Ward WW, Prendergast FG, Cormier MJ. (1992) Primary structure of the Aequorea victoria green-fluorescent protein. Gene 111, 229-233.
Riggs AD. (1975) X inactivation, differentiation, and DNA methylation. Cytogenet Cell Genet 14: 9-25.
Rothkamm K, Kruger I, Thompson LH, Lobrich M. (2003) Pathways of DNA double-strand break repair during the mammalian cell cycle. Mol Cell Biol 23: 5706-5715.
Rouet P, Smih F, Jasin M. (1994) Introduction of double-strand breaks into the genome of mouse cells by expression of a rare-cutting endonuclease. Mol. Cell. Biol. 14, 8096-8106
Rowe HM, et al. (2010) KAP1 controls endogenous retroviruses in embryonic stem cells. Nature 463, 237-240
Roadmap Epigenomics Consortium, Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, Heravi-Moussavi A, Kheradpour P, Zhang Z, Wang J, et al. (2015) Integrative analysis of 111 reference human epigenomes. Nature 518: 317-330.
Robertson K. (2005) DNA methylation and human diseases. Nature Reviews Genetics. 6: 597-610
Rudnicki MA, Braun T, Hinuma S, Jaenisch R. (1992) Inactivation of MyoD in mice leads to up-regulation of the myogenic HLH gene Myf-5 and results in apparently normal muscle development. Cell 71: 383-390.
Sage J, Mulligan GJ, Attardi LD, Miller A, Chen S, William B, Theodorou E, Jacks T. (2000) Targeted disruption of the three Rb-related genes leads to loss of G1 control and immortalization. Genes Dev. 14(23): 3037-3050
Sandovici I, Smith NH, Nitert MD, Ackers-Johnson M, Uribe-Lewis S, Ito Y, Jones RH, Marquez VE, Cairns W, Tadayyon M. (2011) Maternal diet and aging alter the epigenetic control of a promoter-enhancer interaction at the Hnf4a gene in rat pancreatic islets. Proc Natl Acad Sci 108: 5449-5454.
Santos F, Dean W. (2004) Epigenetic reprogramming during early development in mammals. Reproduction 127: 643-651.
Sasaki H, Matsui Y. (2008) Epigenetic events in mammalian germ-cell development, reprogramming and beyond. Nat. Rev. Genet. 9, 129-140
Sharif J, et al. (2007) The SRA protein Np95 mediates epigenetic inheritance by recruiting DNMT1 to methylated DNA. Nature 450, 908-912
Schubeler D. (2015) Function and information content of DNA methylation. Nature 517: 321-326.
Schultz MD, He Y, Whitaker JW, Hariharan M, Mukamel EA, Leung D, Rajagopal N, Nery JR, Urich MA, Chen H, et al. (2015) Human body epigenome maps reveal noncanonical DNA methylation variation. Nature 523: 212-216.
Segal DJ, Dreier B, Beerli RR, Barbas CF. (1999) Toward controlling gene expression at will: selection and design of zinc finger domains recognizing each of the 5'-GNN-3' DNA target sequences. Proc. Natl. Acad. Sci. USA 96, 2758-2763
Smallwood SA, Kelsey G. (2012) De novo DNA methylation: a germ cell perspective. Trends in Genetics. 28(1): 33-42
Soldner F, et al. (2011) Generation of isogenic pluripotent stem cells differing exclusively at two early onset Parkinson point mutations.
Cell 146, 318-331
Smallwood SA, Lee HJ, Angermueller C, Krueger F, Saadeh H, Peat J, Andrews SR, Stegle O, Reik W, Kelsey G. (2014) Single-cell genome-wide bisulfite sequencing for assessing epigenetic heterogeneity. Nat Methods 11: 817-820.
Smith AG, Heath JK, Donaldson DD, Wong GG, Moreau J, Stahl M, Rogers D. (1988) Inhibition of pluripotential embryonic stem cell differentiation by purified polypeptides. Nature 336, 688-690.
Smith ZD, Chan MM, Humm KC, Karnik R, Mekhoubad S, Regev A, Eggan K, Meissner A. (2014) DNA methylation dynamics of the human preimplantation embryo. Nature 511: 611-615.
Smith ZD, Meissner A. (2013) DNA methylation: roles in mammalian development. Nat Rev Genet 14: 204-220.
Srinivas S, Watanabe T, Lin CS, William CM, Tanabe Y, Jessell TM, Costantini F. (2001) Cre reporter strains produced by targeted insertion of EYFP and ECFP into the ROSA26 locus. BMC Dev. Biol. 1, 4.
Steenman MJC, Rainier S, Dobry CJ, Feinberg AP. (1994) Loss of imprinting of IGF2 is linked to reduced expression and abnormal methylation of H19 in Wilms' tumor. Nature Genet. 7, 433-439
Stelzer Y, Jaenisch R. (2015) Monitoring dynamics of DNA methylation at single-cell resolution during development and disease. Cold Spring Harb Symp Quant Biol.
Sternberg N, Hamilton D. (1981) Bacteriophage P1 site-specific recombination. I. Recombination between loxP sites. Journal of Molecular Biology. 150: 467-486
Sugawara N, Wang X, Haber JE. (2003) In vivo roles of Rad52, Rad54, and Rad55 proteins in Rad51-mediated recombination. Mol Cell 12: 209-219.
Tahiliani M, et al. (2009) Conversion of 5-methylcytosine to 5-hydroxymethylcytosine in mammalian DNA by MLL partner TET1. Science 324, 930-935
Takahashi K, Yamanaka S. (2006) Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell 126, 663-676.
Thomas KR, Folger KR, Capecchi MR. (1986) High frequency targeting of genes to specific sites in the mammalian genome. Cell. 44:419-428.
Thomas KR, Capecchi MR. (1987) Site-directed mutagenesis by gene targeting in mouse embryo-derived stem cells. Cell 51: 503-512.
Thompson S, Clarke AR, Pow AM, Hooper ML, Melton DW. (1989) Germ line transmission and expression of a corrected HPRT gene produced by gene targeting in embryonic stem cells. Cell 56: 313-321.
Thompson JS, Reese KJ, DeBaun MR, Perlman EJ, Feinberg AP. (1996) Reduced expression of the cyclin-dependent kinase inhibitor gene p57KIP2 in Wilms' tumor. Cancer Res. 56, 5723-5727
Thorvaldsen JL, Duran KL, Bartolomei MS. (1998) Deletion of the H19 differentially methylated domain results in loss of imprinted expression of H19 and Igf2. Genes Dev 12: 3693-3702.
Tsien RY. (1998) The green fluorescent protein. Annu. Rev. Biochem. 67, 509-544.
Tupler R, Perini G, Green MR. (2001) Expressing the human genome. Nature 409, 832-833
Vidal SE, Amlani B, Chen T, Tsirigos A, Stadtfeld M. (2014) Combinatorial modulation of signaling pathways reveals cell-type-specific requirements for highly efficient and synchronous iPSC reprogramming. Stem Cell Reports 3, 574-584.
Van der Weyden L, White JK, Adams DJ, Logan DW. (2011) The mouse genetics toolkit: revealing function and mechanism. Genome Biology. 12: 224
Wagner TE, Hoppe PC, Jollick JD, Scholl DR, Hodinka RL. (1981) Microinjection of a rabbit beta-globin gene into zygotes and its subsequent expression in adult mice and their offspring. Proc. Natl. Acad. Sci. USA 78: 6376-6380.
Wang J, Exline CM, DeClercq J, Hayward SB, Li PW, Holmes C, Cannon PM. (2015) Homology-driven genome editing in hematopoietic stem and progenitor cells using ZFN mRNA and AAV6 donors. Nature Biotechnology.
Warren L, Manos PD, Ahfeldt T, Loh YH, Li H, Lau F, Ebina W, Mandal PK, Smith ZD, Meissner A.
(2010) Highly efficient reprogramming to pluripotency and directed differentiation of human cells with synthetic modified mRNA. Cell Stem Cell. 7:618-630.
Waterston RH, Lindblad-Toh K. (2002) Initial sequencing and comparative analysis of the mouse genome. Nature. 420:520-562.
Wernig M, Meissner A, Foreman R, Brambrink T, Ku M, Hochedlinger K, Bernstein BE, Jaenisch R. (2007) In vitro reprogramming of fibroblasts into a pluripotent ES-cell-like state. Nature 448, 318-324.
Wolf D, Goff SP. (2009) Embryonic stem cells use ZFP809 to silence retroviral DNAs. Nature 458, 1201-1204
Wolf D, Goff SP. (2007) TRIM28 mediates primer binding site-targeted silencing of murine leukemia virus in embryonic cells. Cell 131, 46-57
Wolfe SA, Nekludova L, Pabo CO. (2000) DNA recognition by Cys2His2 zinc finger proteins. Annu. Rev. Biophys. Biomol. Struct. 29, 183-212
Wood AJ, et al. (2011) Targeted genome editing across species using ZFNs and TALENs. Science 333, 307
Wu H, Liu X, Jaenisch R. (1994) Double replacement: strategy for efficient introduction of subtle mutations into the murine Col1a-1 gene by homologous recombination in embryonic stem cells. Proc Natl Acad Sci U S A. 91:2819-2823.
Wutz A, Smrzka OW, Schweifer N, Schellander K, Wagner EF, Barlow DP. (1997) Imprinted expression of the Igf2r gene depends on an intronic CpG island. Nature 389: 745-749.
Xiong S, Parker-Thornburgh J, Lozano G. (2012) Developing genetically engineered mouse models to study tumor suppression. Curr Protoc Mouse Biol. 2(1): 9-24
Yang T, Adamson TE, Resnick JL, Leff S, Wevrick R, Francke U, Jenkins NA, Copeland NG, Brannan CI. (1998) A mouse model for Prader-Willi syndrome imprinting center mutations. Nature Genet 19: 25-31.
Yannone SM, Khan IS, Zhou RZ, Zhou T, Valerie K, Povirk LF. (2008) Coordinate 5' and 3' endonucleolytic trimming of terminally blocked blunt DNA double-strand break ends by Artemis nuclease and DNA-dependent protein kinase. Nucleic Acids Res. 36:3354-65
Yusa K, et al.
(2011) Targeted gene correction of alpha1-antitrypsin deficiency in induced pluripotent stem cells. Nature 478, 391-394
Zijlstra M, Bix M, Simister NE, Loring JM, Raulet DH, Jaenisch R. (1990) Beta2-microglobulin deficient mice lack CD4-8+ cytolytic T cells. Nature 344:742-746
Zernicka-Goetz M, Morris SA, Bruce AW. (2009) Making a firm decision: multifaceted regulation of cell fate in early mouse embryo. Nature Rev Genetics. 10(7): 467-77
Zetsche B, Gootenberg JS, Abudayyeh OO, Regev A, Koonin EV, Zhang F. (2015) Cpf1 is a single RNA-guided endonuclease of a class 2 CRISPR-Cas system. Cell. 163: 1-13

Chapter 2. One-step generation of mice carrying mutations in multiple genes targeted by CRISPR-Cas mediated genome-engineering

Haoyi Wang(1,7), Hui Yang(1,7), Chikdu S. Shivalila(1,2,7), Meelad M. Dawlaty(1), Albert W. Cheng(1,3), Feng Zhang(5,6), Rudolf Jaenisch(1,3,8)

1. Whitehead Institute for Biomedical Research, Cambridge, Massachusetts, USA
2. Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
3. Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
5. Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02142, USA
6. McGovern Institute for Brain Research, Department of Brain and Cognitive Sciences, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
7. These authors contributed equally to this work.
8. Correspondence should be addressed to Rudolf Jaenisch [jaenisch@wi.mit.edu]

Published as: Wang H*, Yang H*, Shivalila CS*, Dawlaty MM, Cheng AW, Zhang F, Jaenisch R. One-step generation of mice carrying mutations in multiple genes targeted by CRISPR-Cas mediated genome-engineering. Cell. 153(4): 910-918. 2013. *indicates equal contribution

HW, HY and CSS shared all experiments and analyses equally, except HY did all zygotic injections. MMD helped with 5hmC analysis.
AWC helped with computational analysis of off targets. FZ provided CRISPR plasmids. HW, HY, CSS and RJ designed and conceived of experiments.

Mice carrying mutations in multiple genes are traditionally generated by sequential recombination in embryonic stem cells and/or time-consuming intercrossing of mice with a single mutation. The CRISPR/Cas system has been adapted as an efficient gene-targeting technology with the potential for multiplexed genome editing. We demonstrate that CRISPR/Cas-mediated gene editing allows the simultaneous disruption of five genes (Tet1, 2, 3, Sry, Uty - 8 alleles) in mouse embryonic stem (ES) cells with high efficiency. Coinjection of Cas9 mRNA and single-guide RNAs (sgRNAs) targeting Tet1 and Tet2 into zygotes generated mice with biallelic mutations in both genes with an efficiency of 80%. Finally, we show that coinjection of Cas9 mRNA/sgRNAs with mutant oligos generated precise point mutations simultaneously in two target genes. Thus, the CRISPR/Cas system allows the one-step generation of animals carrying mutations in multiple genes, an approach that will greatly accelerate the in vivo study of functionally redundant genes and of epistatic gene interactions.

Genetically modified mice represent a crucial tool for understanding gene function in development and disease. Mutant mice are conventionally generated by insertional mutagenesis (Copeland and Jenkins, 2010; Kool and Berns, 2009) or by gene-targeting methods (Capecchi, 2005). In conventional gene-targeting methods, mutations are introduced through homologous recombination in mouse embryonic stem (ES) cells. Targeted ES cells injected into wild-type (WT) blastocysts can contribute to the germline of chimeric animals, generating mice containing the targeted gene modification (Capecchi, 2005). It is costly and time consuming to produce single-gene knockout mice and even more so to make double-mutant mice.
Moreover, in most other mammalian species, no established ES cell lines are available that contribute efficiently to chimeric animals, which greatly limits genetic studies in many species.

Alternative methods have been developed to accelerate the process of genome modification by directly injecting DNA or mRNA of site-specific nucleases into the one-cell embryo to generate a DNA double-strand break (DSB) at a specified locus in various species (Bogdanove and Voytas, 2011; Carroll et al., 2008; Urnov et al., 2010). DSBs induced by these site-specific nucleases can then be repaired by error-prone nonhomologous end joining (NHEJ), resulting in mutant mice and rats carrying deletions or insertions at the cut site (Carbery et al., 2010; Geurts et al., 2009; Sung et al., 2013; Tesson et al., 2011). If a donor plasmid with homology to the ends flanking the DSB is coinjected, high-fidelity homologous recombination can produce animals with targeted integrations (Cui et al., 2011; Meyer et al., 2010). Because these methods require the complex design of zinc finger nucleases (ZFNs) or transcription activator-like effector nucleases (TALENs) for each target gene, and because the efficiency of targeting may vary substantially, no multiplexed gene targeting in animals has been reported to date. To dissect the functions of gene family members with redundant functions or to analyze epistatic relationships in genetic pathways, mice with two or more mutated genes are required, prompting the development of efficient technology for the generation of animals carrying multiple mutated genes.

Recently, the type II bacterial CRISPR/Cas system has been demonstrated as an efficient gene-targeting technology with the potential for multiplexed genome editing.
Bacteria and archaea have evolved an RNA-based adaptive immune system that uses CRISPR (clustered regularly interspaced short palindromic repeat) and Cas (CRISPR-associated) proteins to detect and destroy invading viruses and plasmids (Horvath and Barrangou, 2010; Wiedenheft et al., 2012). Cas proteins, CRISPR RNAs (crRNAs), and transactivating crRNA (tracrRNA) form ribonucleoprotein complexes, which target and degrade foreign nucleic acids, guided by crRNAs (Gasiunas et al., 2012; Jinek et al., 2012). It was shown that the Cas9 endonuclease from the Streptococcus pyogenes type II CRISPR/Cas system can be programmed to produce sequence-specific DSBs in vitro by providing a synthetic single-guide RNA (sgRNA) consisting of a fusion of crRNA and tracrRNA (Jinek et al., 2012). More intriguingly, Cas9 and sgRNA are the only components necessary and sufficient for induction of targeted DNA cleavage in cultured human cells (Cho et al., 2013; Cong et al., 2013; Mali et al., 2013) as well as in zebrafish (Chang et al., 2013; Hwang et al., 2013). A recent report also demonstrated disruption of a GFP transgene in mice using the CRISPR/Cas system (Shen et al., 2013).

The ease of design, construction, and delivery of multiple sgRNAs suggests the possibility of multiplexed genome editing in mammals. Indeed, one study demonstrated that two loci separated by 119 bp could be cleaved simultaneously in cultured human cells at a low efficiency (Cong et al., 2013). The extent of achievable multiplexed genome editing has yet to be demonstrated in stem cells as well as in animals. Here, we use the CRISPR/Cas system to drive both NHEJ-based gene disruption and homology directed repair (HDR)-based precise gene editing to achieve highly efficient and simultaneous targeting of multiple genes in stem cells and mice.
Results

Simultaneous targeting of up to five genes in ES cells

To test the possibility of targeting functionally redundant genes from the same gene family, we designed sgRNAs targeting the Ten-eleven translocation (Tet) family members, Tet1, Tet2, and Tet3 (Figure 1A). Tet proteins (Tet1/2/3) convert 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC) in various embryonic and adult tissues, and mutant mice for each of these three genes have been produced by homologous recombination in ES cells (Dawlaty et al., 2011; Gu et al., 2011; Li et al., 2011; Moran-Crusio et al., 2011). To test whether the CRISPR/Cas system could produce targeted cleavage in the mouse genome, we transfected plasmids expressing both the mammalian-codon-optimized Cas9 and a sgRNA targeting each gene (Cong et al., 2013; Mali et al., 2013) into mouse ES cells and determined the targeted cleavage efficiency by the Surveyor assay (Guschin et al., 2010). All three Cas9-sgRNA transfections produced cleavage at target loci with high efficiency: 36% at Tet1, 48% at Tet2, and 36% at Tet3 (Figure 1B). Because each target locus contains a restriction enzyme recognition site (Figure 1A), we PCR amplified a 500 bp fragment around each target site and digested the PCR products with the respective enzyme. A correctly targeted allele will lose the restriction site, which can be detected by failure to cleave upon enzyme treatment. Using this restriction fragment length polymorphism (RFLP) assay, we screened 48 ES cell clones from each single-targeting experiment. Consistent with the Surveyor analysis, a high percentage of ES cell clones were targeted, with a high probability of having both alleles mutated (Figure S1A available online).
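The RFLP readout described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis used in this work: the function names and the 500 bp toy amplicon are hypothetical, only the SacI recognition sequence (GAGCTC) is a known fact, and each cut is placed at the first base of the site, so fragment lengths are approximate.

```python
SACI_SITE = "GAGCTC"  # SacI recognition sequence


def digest_fragments(amplicon, site):
    """Approximate fragment lengths after cutting `amplicon` at every copy of `site`.

    Simplification (assumption): each cut is placed at the first base of the
    site rather than at the enzyme's true cut position.
    """
    cuts = []
    pos = amplicon.find(site)
    while pos != -1:
        cuts.append(pos)
        pos = amplicon.find(site, pos + 1)
    bounds = [0] + cuts + [len(amplicon)]
    # Drop zero-length pieces that would arise from a site at position 0.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def site_lost(amplicon, site):
    """True when the allele no longer carries the restriction site."""
    return site not in amplicon


# Hypothetical amplicons: a 500 bp wild-type allele with one SacI site, and
# an allele with a 9 bp deletion that removes the site plus 3 flanking bases.
wild_type = "A" * 200 + SACI_SITE + "T" * 294
deletion_allele = "A" * 200 + "T" * 291

print(digest_fragments(wild_type, SACI_SITE))        # [200, 300]
print(digest_fragments(deletion_allele, SACI_SITE))  # [491]
```

A clone whose digest shows only the uncut band corresponds to both alleles having lost the site, while a mix of cut and uncut bands indicates that at least one allele retains it.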
The results summarized in Table 1 demonstrate that between 65% and 81% of the tested ES cell clones carried mutations in the Tet genes, with up to 77% having mutations in both alleles.

The high efficiency of single-gene modification prompted us to test the possibility of targeting all three genes simultaneously. For this we cotransfected ES cells with the constructs expressing Cas9 and three sgRNAs targeting Tet1, 2, and 3. Of 96 clones screened using the RFLP assay, 20 clones were identified as having mutations in all six alleles of the three genes (Figures 1C and S1B and Table 1). To exclude the possibility that a PCR bias could give false positive results, we performed Southern blot analysis and confirmed complete agreement with the RFLP results (Figure 1C). We subcloned and sequenced the PCR products of Tet1-, Tet2-, and Tet3-targeted regions to verify that all eight tested clones carried biallelic mutations in all three genes, with most clones displaying two mutant alleles for each gene with small insertions or deletions (indels) at the target site (Figure 1D). To test whether these mutant alleles would abolish the function of Tet proteins, we compared the 5hmC level of targeted clones to WT ES cells. Previously, we reported a depletion of 5hmC in Tet1/Tet2 double-knockout ES cells derived using traditional gene-targeting methods (Dawlaty et al., 2013). As expected from loss of function alleles, we found a significant reduction of 5hmC levels in all clones carrying biallelic mutations in the three genes (Figure 1E).

To further test the potential of multiplexed gene targeting by the CRISPR/Cas system, we designed sgRNAs targeting two Y-linked genes, Sry and Uty (Figure S1C). Short PCR products encoding sgRNAs targeting all five genes (Tet1, Tet2, Tet3, Sry, and Uty) were pooled and cotransfected with a Cas9-expressing plasmid and the PGK puroR cassette into ES cells.
Of 96 clones that were screened using the RFLP assay, 10% carried mutations in all eight alleles of the five genes (Figure S1D and Table S1), demonstrating the capacity of the CRISPR/Cas9 system for highly efficient multiplexed gene targeting.

One-step generation of single-gene mutant mice by zygote injection

We tested whether mutant mice could be generated in vivo by direct embryo manipulation. Capped polyadenylated Cas9 mRNA was produced by in vitro transcription and coinjected with sgRNAs. Initially, to determine the optimal concentration of Cas9 mRNA for targeting in vivo, we microinjected varying amounts of Cas9-encoding mRNA with Tet1-targeting sgRNA at constant concentration (20 ng/μl) into pronuclear (PN) stage one-cell mouse embryos and assessed the frequency of altered alleles at the blastocyst stage using the RFLP assay. As expected, higher concentrations of Cas9 mRNA led to more efficient gene disruption (Figure S2A). Nevertheless, even embryos injected with the highest amount of Cas9 mRNA (200 ng/μl) showed normal blastocyst development, suggesting low toxicity.

To investigate whether postnatal mice carrying targeted mutations could be generated, we coinjected sgRNAs targeting Tet1 or Tet2 with different concentrations of Cas9 mRNA. Blastocysts derived from the injected embryos were transplanted into foster mothers and newborn pups were obtained. As summarized in Table 2, about 10% of the transferred blastocysts developed to birth independent of the RNA concentrations used for injection, suggesting low fetal toxicity of the Cas9 mRNA and sgRNA. RFLP, Southern blot, and sequencing analysis demonstrated that between 50 and 90% of the postnatal mice carried biallelic mutations in either target gene (Figures 2A, 2B, and 2C and Table 2). Surprisingly, specific Δ9 Tet1 and specific Δ8 and Δ15 Tet2 mutant alleles were repeatedly recovered in independently derived mice.
Preferential generation of these alleles is likely caused by a short sequence repeat flanking the DSB (Figure S2B), consistent with previous reports demonstrating that perfect microhomology sequences flanking the cleavage sites can generate precise deletions through a microhomology-mediated end-joining (MMEJ) repair mechanism (McVey and Lee, 2008; Symington and Gautier, 2011). A similar observation was also made when TALEN mRNA was injected into one-cell rat embryos (Tesson et al., 2011).

We also derived blastocysts from zygotes injected with Cas9 mRNA and Tet3 sgRNA. Genotyping of the blastocysts demonstrated that of eight embryos three were homozygous and three were heterozygous Tet3 mutants (two failed to amplify) (Figure S2C). Some blastocysts were implanted into foster mothers and, upon C-section, we readily identified multiple mice of smaller size (Figure S2D), many of which died soon after delivery. Genotyping shown in Figure S2E indicated that all pups with mutations in both Tet3 alleles died neonatally. Only 2 out of 15 mice survived, and these were either Tet3 heterozygous mutants or WT (Figure S2F). These results are consistent with the lethal neonatal phenotype of Tet3 knockout mice generated using traditional methods (Gu et al., 2011), although we have not yet established which of the Tet3 mutations produced loss of function rather than hypomorphic alleles.

One-step generation of double-gene mutant mice by zygote injection

To test whether Tet1/Tet2 double-mutant mice could be produced from single embryos, we coinjected Tet1 and Tet2 sgRNAs with 20 or 100 ng/μl Cas9 mRNA into zygotes. A total of 28 pups were born from 144 embryos transferred into foster mothers (21% live-birth rate) that had been injected at the zygote stage with high concentrations of RNA (Cas9 mRNA at 100 ng/μl, sgRNAs at 50 ng/μl), consistent with low or no toxicity of the Cas9 mRNA and sgRNAs (Table 3).
RFLP, Southern blot analysis, and sequencing identified 22 mice carrying targeted mutations at all four alleles of the Tet1 and Tet2 genes (Figures 2D and 2E), with the remaining mice carrying mutations in a subset of alleles (Table 3). Injection of zygotes with a low concentration of RNA (Cas9 mRNA at 20 ng/μl, sgRNAs at 20 ng/μl) yielded 19 pups from 75 transferred embryos (25% live-birth rate), which is a higher survival rate than from embryos injected with 100 ng/μl of Cas9 RNA. Nevertheless, more than 50% of the pups were biallelic Tet1/Tet2 double mutants (Table 3). These results demonstrate that postnatal mice carrying biallelic mutations in two different genes can be generated within one month with high efficiency (Figure 2F).

Although the high live-birth rate and normal development of mutant mice suggest low toxicity of the CRISPR/Cas9 system, we sought to determine the off-target effects in vivo. Previous work in vitro, in bacteria, and in cultured human cells suggested that the protospacer-adjacent motif (PAM) sequence NGG and the 8 to 12 base "seed sequence" at the 3' end of the sgRNA are most important for determining the DNA cleavage specificity (Cong et al., 2013; Jiang et al., 2013; Jinek et al., 2012). Based on this rule, only three and four potential off targets exist in the mouse genome for the Tet1 and Tet2 sgRNAs, respectively (Table S2 and Experimental Procedures), with each of them perfectly matching the 12 bp seed sequence at the 3' end and the NGG PAM sequence of the sgRNA (there is no potential off-target site for the Tet3 sgRNA using this prediction rule). From seven double-mutant mice produced from injection with high RNA concentration we PCR amplified 400 to 500 bp fragments from all seven potential off-target loci and found no cleavage in the Surveyor assay (Figure S3), suggesting a high specificity of the CRISPR/Cas system.
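The seed-plus-PAM prediction rule described above can be illustrated with a short Python sketch. It is a hedged illustration only: the function names, the 20-nt protospacer, and the toy genome string are hypothetical, and a naive exact string scan on both strands stands in for the genome-wide alignment used in this work (bowtie against mm9, see Experimental Procedures). The seed length is left as a parameter and defaults to the 12 bases quoted in the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def seed_queries(protospacer, seed_len=12):
    """Build the four 'seed + NGG' search sequences for a 20-nt protospacer."""
    protospacer = protospacer.upper()
    if len(protospacer) != 20 or set(protospacer) - set("ACGT"):
        raise ValueError("expected a 20-nt A/C/G/T protospacer")
    seed = protospacer[-seed_len:]  # PAM-proximal bases at the 3' end
    return [seed + base + "GG" for base in "ACGT"]


def find_candidates(genome, protospacer, seed_len=12):
    """Exact matches to any search sequence on either strand.

    Returns sorted (position, strand, query) tuples. The on-target site itself
    also matches and has to be excluded by the caller.
    """
    genome = genome.upper()
    hits = []
    for query in seed_queries(protospacer, seed_len):
        for strand, pattern in (("+", query), ("-", reverse_complement(query))):
            pos = genome.find(pattern)
            while pos != -1:
                hits.append((pos, strand, query))
                pos = genome.find(pattern, pos + 1)
    return sorted(hits)


# Hypothetical protospacer and toy genome with one plus-strand and one
# minus-strand match planted in it.
protospacer = "TTGACCTAGCATCGGATCCA"
queries = seed_queries(protospacer)
toy_genome = "ACGT" + queries[3] + "TTTT" + reverse_complement(queries[0]) + "AC"

for hit in find_candidates(toy_genome, protospacer):
    print(hit)
```

Requiring an exact match over the whole seed keeps the candidate list short, which is consistent with the handful of sites reported for the Tet1 and Tet2 sgRNAs, but it cannot reveal cleavage at sites that tolerate mismatches.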
Multiplexed precise HDR-mediated genome editing in vivo

The NHEJ-mediated gene mutations described above produced mutant alleles with different and unpredictable insertions and deletions of variable size. We explored the possibility of precise homology directed repair (HDR)-mediated genome editing by coinjecting Cas9 mRNA, sgRNAs, and single-stranded DNA oligos into one-cell embryos. For this we designed an oligo targeting Tet1 so as to change two base pairs of a SacI restriction site, creating an EcoRI site instead, and a second oligo targeting Tet2 with two base pair changes that would convert an EcoRV site into an EcoRI site (Figure 3A). Blastocysts were derived from zygotes injected with Cas9 mRNA and sgRNAs and oligos targeting Tet1 or Tet2, respectively. DNA was isolated, amplified, and digested with EcoRI to detect oligo-mediated HDR events. Six out of nine Tet1-targeted embryos and 9 out of 15 Tet2-targeted embryos incorporated an EcoRI site at the respective target locus, with several embryos having both alleles modified (Figure S4A). When Cas9 mRNA, sgRNAs, and single-stranded DNA oligos targeting both Tet1 and Tet2 were coinjected into zygotes, out of 14 embryos, four were identified that were targeted with the oligo at the Tet1 locus, seven that were targeted with the oligo at the Tet2 locus, and one embryo (2) that had one allele of each gene correctly modified (Figure S4B). All four alleles of embryo 2 were sequenced, confirming that one allele of each gene contained the 2 bp changes directed by the oligo, whereas the other alleles were disrupted by NHEJ-mediated deletion and insertion (Figure S4C). Blastocysts with double oligo injections were implanted into foster mothers and a total of 10 pups were born from 48 embryos transferred (21% live-birth rate).
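The two-base-pair site conversions used in the oligo design can be checked with a small Python sketch. It is illustrative only: the recognition sequences for SacI, EcoRV, and EcoRI are standard, but the helper names and the short toy locus are hypothetical and do not reproduce the donor oligos listed in Table S3.

```python
SITES = {"SacI": "GAGCTC", "EcoRV": "GATATC", "EcoRI": "GAATTC"}


def substitutions(old_site, new_site):
    """List (index, old_base, new_base) for every position that differs."""
    if len(old_site) != len(new_site):
        raise ValueError("sites must have the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(old_site, new_site)) if a != b]


def apply_hdr(locus, old_site, new_site):
    """Return the locus with its single copy of `old_site` replaced by `new_site`."""
    if locus.count(old_site) != 1:
        raise ValueError("expected exactly one copy of the original site")
    return locus.replace(old_site, new_site)


for enzyme in ("SacI", "EcoRV"):
    print(enzyme, substitutions(SITES[enzyme], SITES["EcoRI"]))
# SacI [(2, 'G', 'A'), (3, 'C', 'T')]
# EcoRV [(2, 'T', 'A'), (3, 'A', 'T')]

# Hypothetical short locus carrying one SacI site.
toy_locus = "CTGTCAGG" + SITES["SacI"] + "ATGGAGAC"
edited = apply_hdr(toy_locus, SITES["SacI"], SITES["EcoRI"])
print(SITES["EcoRI"] in edited, SITES["SacI"] in edited)  # True False
```

Because an HDR-edited allele gains an EcoRI site while losing the original site, digestion with EcoRI identifies precise edits, and digestion with the original enzyme reports on whether the remaining allele is still intact.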
Upon RFLP analysis using EcoRI, we identified seven mice containing EcoRI sites at the Tet1 locus and eight mice containing EcoRI sites at the Tet2 locus, with six mice containing EcoRI sites at both Tet1 and Tet2 loci (Figure 3B). We also applied RFLP analysis using SacI and EcoRV to the Tet1 and Tet2 loci, respectively, showing that all alleles not targeted by oligos contained disruptions, which is consistent with the high biallelic mutation rate obtained by Cas9 mRNA and sgRNA injection. These results were confirmed by sequencing, demonstrating mutations in all four alleles of mice 5 and 7 (Figure 3C). Our results demonstrate that mice with HDR-mediated precise mutations in multiple genes can be generated in one step by CRISPR/Cas-mediated genome editing.

Discussion

The genetic manipulation of mice is a crucial approach for the study of development and disease. However, the generation of mice with specific mutations is labor intensive and involves gene targeting by homologous recombination in ES cells, the production of chimeric mice, and, after germ line transmission of the targeted ES cells, the interbreeding of heterozygous mice to produce the homozygous experimental animals, a process that may take 6 to 12 months or longer (Capecchi, 2005). Producing mice carrying mutations in several genes requires time-consuming intercrossing of single-mutant mice. Similarly, the generation of ES cells carrying homozygous mutations in several genes is usually achieved by sequential targeting, a process that is labor intensive, necessitating multiple consecutive cloning steps to target the genes and to delete the selectable markers. As summarized in Figure 4, we have established three different approaches for the generation of mice carrying multiple genetic alterations.
We demonstrate that CRISPR/Cas-mediated genome editing in ES cells can generate simultaneous mutations in several genes with high efficiency, a single-step approach allowing the production of cells with mutations in five different genes (Figure 4A). We chose the three Tet genes as targets because the respective mutant phenotypes have been well defined previously (Dawlaty et al., 2011, 2013; Gu et al., 2011). Cells mutant for Tet1, 2 and 3 were depleted of 5hmC, as would be expected for loss of function mutations of the genes (Dawlaty et al., 2013). However, we have not yet established which of the Cas9-mediated gene mutations produced loss of function rather than hypomorphic alleles. We also show that mouse embryos can be directly modified by injection of Cas9 mRNA and sgRNA into the fertilized egg, resulting in the efficient production of mice carrying biallelic mutations in a given gene. More significantly, coinjection of Cas9 mRNA with Tet1 and Tet2 sgRNAs into zygotes produced mice that carried mutations in both genes (Figure 4B, upper). We found that up to 95% of newborn mice were biallelic mutants in the targeted gene when a single sgRNA was injected and, when two different sgRNAs were coinjected, up to 80% carried biallelic mutations in both targeted genes. Thus, mice carrying multiple mutations can be generated within 4 weeks, which is a much shorter time frame than can be achieved by conventional consecutive targeting of genes in ES cells and avoids time-consuming intercrossing of single-mutant mice.

The introduction of DSBs by CRISPR/Cas generates mutant alleles with varying deletions or insertions, in contrast to designed precise mutations created by homologous recombination. The introduction of point mutations into human ES cells, cancer cell lines, and mouse by ZFN or TALEN along with DNA oligo has been demonstrated previously (Chen et al., 2011; Soldner et al., 2011; Wefers et al., 2013).
We demonstrate that CRISPR/Cas-mediated targeting is useful to generate mutant alleles with predetermined alterations, and coinjection of single-stranded oligos can introduce designed point mutations into two target genes in one step, allowing for multiplexed gene editing in a strictly controlled manner (Figure 4B, lower). It will be of great interest to assess whether this targeting system allows for the production of conditional alleles, or precise insertion of larger DNA fragments such as GFP markers, so as to generate conditional knockout and reporter mice for specific genes.

There are several potential limitations of the CRISPR/Cas technology. First, the requirement for an NGG PAM sequence of S. pyogenes Cas9 limits the target space in the mouse genome. It has been shown that the Streptococcus thermophilus LMD-9 Cas9, which uses a different PAM sequence, can also induce targeted DNA cleavage in mammalian cells (Cong et al., 2013). Therefore, exploiting different Cas9 proteins may enable targeting of most of the mouse genome. Second, although the sgRNAs used here showed high targeting efficiency, much work is needed to elucidate the rules for designing sgRNAs with consistently high targeting efficiency, which is essential for multiplexed genome engineering. Third, although our off-target analysis for the seven most likely off targets of the Tet1 and Tet2 sgRNAs failed to detect mutations in these loci, it is possible that other mutations were induced following as yet unidentified rules. A more thorough sequencing analysis for a large number of sgRNAs will provide more information about the potential off-target cleavage of the CRISPR/Cas system and lead to a better prediction of potential off-target sites. Last, oligo-mediated repair allows for precise genome editing, but the other allele is often mutated through NHEJ (Figures 3B, 3C, and S4C). We have shown that using lower Cas9 mRNA concentration generates more mice with heterozygous mutations.
Therefore, it may be possible to optimize the system for more efficient generation of mice with only one oligo-modified allele. In addition, employment of a Cas9 nickase will likely avoid this complication because it mainly induces DNA single-strand breaks, which are typically repaired through HDR (Cong et al., 2013; Mali et al., 2013).

It is likely that a much larger number of genomic loci than targeted in the present work can be modified simultaneously when pooled sgRNAs are introduced. The methods presented here open up the possibility of systematic genome engineering in mice, facilitating the investigation of entire signaling pathways, of synthetic lethal phenotypes, or of genes that have redundant functions. A particularly interesting application is the possibility to produce mice carrying multiple alterations in candidate loci that have been identified in GWAS studies to play a role in the genesis of multigenic diseases. In summary, CRISPR/Cas-mediated genome editing makes possible the generation of ES cells and mice carrying multiple genetic alterations and will facilitate the genetic dissection of development and complex diseases.

Experimental Procedures

Procedures for generating sgRNA-expressing vector

A bicistronic expression vector expressing Cas9 and sgRNA (Cong et al., 2013) was digested with BbsI and treated with Antarctic Phosphatase, and the linearized vector was gel purified. A pair of oligos for each targeting site (Table S3) was annealed, phosphorylated, and ligated to the linearized vector.

Cell Culture and Transfection

V6.5 ES cells (on a 129/Sv x C57BL/6 F1 hybrid background) were cultured on gelatin-coated plates with standard ES cell culture conditions.
Cells were transfected with a plasmid expressing mammalian-codon-optimized Cas9 and sgRNA (single targeting), three plasmids expressing Cas9 and sgRNAs targeting Tet1, Tet2, and Tet3 (triple targeting), or five PCR products each coding for an sgRNA targeting Tet1, Tet2, Tet3, Sry, or Uty, along with a plasmid expressing PGK-puroR, using FuGENE HD reagent (Promega) according to the manufacturer's instructions. Twelve hours after transfection, ES cells were replated at a low density on DR4 MEF feeder layers. Puromycin (2 μg/ml) was added 1 day after replating and taken off after 48 hr. After recovering for 4 to 6 days, individual colonies were picked and genotyped by RFLP and Southern blot analysis, and the leftover ES cells on the plate were collected for the Surveyor assay.

Surveyor Assay and RFLP analysis for genome modification

The Surveyor assay was performed as described by Guschin et al. (2010). Genomic DNA from treated and control ES cells or targeted and control mice was extracted. Mouse genomic DNA samples were prepared from tail biopsies. PCR was performed using Tet1-, 2-, and 3-specific primers (Table S3) under the following conditions: 95°C for 5 min; 35x (95°C for 30 s, 60°C for 30 s, 68°C for 40 s); 68°C for 2 min; hold at 4°C. PCR products were then denatured, annealed, and treated with Surveyor nuclease (Transgenomic). DNA concentration of each band was measured on an ethidium bromide-stained 10% acrylamide Criterion TBE gel (BioRad) and quantified using ImageJ software. The same PCR products used for the Surveyor assay were used for RFLP analysis. Ten microliters of Tet1, Tet2, or Tet3 PCR product was digested with SacI, EcoRV, or XhoI, respectively. Digested DNA was separated on an ethidium bromide-stained agarose gel (2%). For sequencing, PCR products were cloned using the Original TA Cloning Kit (Invitrogen), and mutations were identified by Sanger sequencing.

Dot Blot

DNA was extracted from ES cells following standard procedures.
DNA was transferred to a nylon membrane using a BioRad slot blot vacuum manifold apparatus. Anti-5hmC (Active Motif; 1:10,000) was used to detect 5hmC following the manufacturer's protocol.

Production of Cas9 mRNA and sgRNA

The T7 promoter was added to the Cas9 coding region by PCR amplification using primers Cas9 F and R (Table S3). The T7-Cas9 PCR product was gel purified and used as the template for in vitro transcription (IVT) using the mMESSAGE mMACHINE T7 ULTRA kit (Life Technologies). The T7 promoter was added to the sgRNA templates by PCR amplification using primers Tet1 F and R, Tet2 F and R, and Tet3 F and R (Table S3). The T7-sgRNA PCR product was gel purified and used as the template for IVT using the MEGAshortscript T7 kit (Life Technologies). Both the Cas9 mRNA and the sgRNAs were purified using the MEGAclear kit (Life Technologies) and eluted in RNase-free water.

One-cell embryo injection

All animal procedures were performed according to NIH guidelines and approved by the Committee on Animal Care at MIT. B6D2F1 (C57BL/6 x DBA2) female mice and ICR mouse strains were used as embryo donors and foster mothers, respectively. Superovulated female B6D2F1 mice (7-8 weeks old) were mated to B6D2F1 stud males, and fertilized embryos were collected from oviducts. Cas9 mRNA (from 20 ng/μl to 200 ng/μl) and sgRNA (from 20 ng/μl to 50 ng/μl) were injected into the cytoplasm of fertilized eggs with well recognized pronuclei in M2 medium (Sigma). For oligo injection, Cas9 mRNA (100 ng/μl), sgRNA (50 ng/μl), and donor oligos (100 ng/μl) were mixed and injected into zygotes at the pronuclear stage. The injected zygotes were cultured in KSOM with amino acids at 37°C under 5% CO2 in air until the blastocyst stage at 3.5 days. Thereafter, 15-25 blastocysts were transferred into the uterus of pseudopregnant ICR females at 2.5 dpc.
Southern Blotting

Genomic DNA was separated on a 0.8% agarose gel after restriction digests with the appropriate enzymes, transferred to a nylon membrane (Amersham), and hybridized with 32P random primer (Stratagene)-labeled probes.

Prediction of Potential Off Targets

Potential targets of CRISPR sgRNAs were found using the rules outlined in Mali et al. (2013). For a 20 nt sgRNA targeting sequence of nnnnn nnMMM MMMMM MMMMM, where M are the seed bases preceding the PAM sequence NGG, four search sequences (MMM MMMMM MMMMM AGG; MMM MMMMM MMMMM CGG; MMM MMMMM MMMMM GGG; MMM MMMMM MMMMM TGG) were generated. Exact matches to these search sequences in the mouse genome (mm9) were found using bowtie and reported as potential targets of the CRISPR sgRNA.

Acknowledgments

We thank Ruth Flannery and Kibibi Ganz for support with animal care and experiments. We thank Jaenisch lab members for helpful discussions on the manuscript. We are also grateful to G. Grant Welstead and Daniel B. Dadon for help with editing the manuscript. M.M.D. is a Damon Runyon Postdoctoral Fellow. A.W.C. is supported by a Croucher scholarship. R.J. is an adviser to Stemgent and a cofounder of Fate Therapeutics. This work was supported by NIH grants R37-HD045022 and R01-CA084198 to RJ.

[Figure 1 image, panels A-E. Panel B lists NHEJ frequencies of 36% (Tet1), 48% (Tet2), and 36% (Tet3); panel C shows RFLP and Southern blot genotyping of triple-mutant clones; panel D shows mutant allele sequences from triple-mutant clones 14 and 41; panel E shows the 5hmC dot blot.]

Figure 1. Multiplexed Gene Targeting in Mouse ES Cells
(A) Schematic of the Cas9/sgRNA-targeting sites in Tet1, 2, and 3. The sgRNA-targeting sequence is underlined, and the protospacer-adjacent motif (PAM) sequence is labeled in green. The restriction sites at the target regions are bold and capitalized. Restriction enzymes used for RFLP and Southern blot analysis are shown, and the Southern blot probes are shown as orange boxes.
(B) Surveyor assay for Cas9-mediated cleavage at Tet1, 2, and 3 loci in ES cells.
(C) Genotyping of triple-targeted ES cells; clones 51, 52, and 53 are shown. Upper: RFLP analysis. Tet1 PCR products were digested with SacI, Tet2 PCR products were digested with EcoRV, and Tet3 PCR products were digested with XhoI.
Lower: Southern blot analysis. For the Tetl locus, Sac digested genomic DNA was hybridized with a 5' probe. Expected fragment size: WT = 5.8 kb, TM (targeted mutation) = 6.4 kb. For the Tet2 locus, Sac, and EcoRV double digested genomic DNA was hybridized with a 3' probe. Expected fragment size: WT = 4.3 kb, TM = 5.6 kb. For the Tet3 locus, BamHI and Xhol double-digested genomic DNA was hybridized with a 5' probe. Expected fragment size: WT = 3.2 kb, TM = 8.1 kb. (D) The sequence of six mutant alleles in triple-targeted ES cell clone 14 and 41. PAM sequence is labeled in red. (E) Analysis of 5hmC levels in DNA isolated from triple-targeted ES cell clones by dot blot assay using anti-5hmC antibody. A previously characterized DKO clone derived using traditional method is used as a control. See also Figure S1. 77 Table 1. CRISPRICas-Medlated Gene Targeting In V6.5 ES Cell. Mutant Alleles per Clone / Total Clones Tested Gene 6 5 4 3 2 1 0 Teti WA 27/48 4/48 17/48 Tet2 37/48 2/48 9/48 Tet3 32/48 3/48 13/48 Tet+ Tet2 + Te3 20/96 16/96 2/98 2/96 1/96 0/96 55/96 Table 1. Plasmids encoding Cas9 and sgRNAs targeting Teti, Tet2, and Tet3 were transfected separately (single targeting) or in a pool (triple targeting) into ES cells. The number of total alleles mutated in each ES cell clone is listed from 0 to 2 for single- targeting experiment, and 0 to 6 for triple-targeting experiment. The number of clones containing each specific number of mutated alleles is shown in relation to the total number of clones screened in each experiment. See also Table SI Table 2. 
CRISPR/Cas-Mediated Single-Gene Targeting in BDF2 Mice

                                                                               Mutant Alleles per Mouse / Total Mice Tested
Gene   Cas9/sgRNA   Blastocysts/        Transferred Embryos   Newborns
       (ng/μl)      Injected Zygotes    (Recipients)          (Dead)           2       1       0
Tet1   209/20       38/50               10 (1)                2 (0)            2/2     0/2     0/2
       100/20       50/80               25 (1)                3 (0)            2/3     0/3     1/3
       60/20        40/50               40 (2)                8 (3)            4/7     2/7     1/7
       100/0        167/198             80 (3)                12 (2)           9/11    1/11    1/11
Tet2   100/50       176/203             10 (5)                22 (3)           19/20   0/20    1/20
Tet3   100/50       8W112               64 (4)                15 (13)          9/13    2/13    2/13

Table 2. Cas9 mRNA and sgRNAs targeting Tet1, Tet2, or Tet3 were injected into fertilized eggs. The blastocysts derived from injected embryos were transplanted into foster mothers and newborn pups were obtained and genotyped. The number of total alleles mutated in each mouse is listed from 0 to 2. The number of mice containing each specific number of mutated alleles is shown in relation to the total number of mice screened in each experiment. See also Table S2. Some of the pups were cannibalized.

Table 3. CRISPR/Cas-Mediated Double-Gene Targeting in BDF2 Mice

                                                                                      Mutant Alleles per Mouse / Total Mice Tested
Gene          Cas9/sgRNA   Blastocysts/        Transferred Embryos   Newborns
              (ng/μl)      Injected Zygotes    (Recipients)          (Dead)           4       3       2       1       0
Tet1 + Tet2   100/50       194/229             144 (")               31 (8)           22/28   4/28    1/28    1/28    0/28
              20/20        92/109              75 (s)                19 (3)           11/19   1/19    2/19    3/19    2/19

Table 3. Cas9 mRNA and sgRNAs targeting Tet1 and Tet2 were coinjected into fertilized eggs. The blastocysts derived from the injected embryos were transplanted into foster mothers and newborn pups were obtained and genotyped. The number of total alleles mutated in each mouse is listed from 0 to 4 for Tet1 and Tet2. The number of mice containing each specific number of mutated alleles is shown in relation to the number of total mice screened in each experiment.
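The targeting efficiencies implied by the tables can be checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis: it takes the counts of ES cell clones (Table 1) and double-targeted mice (Table 3) in which every targeted allele was mutated and converts them to percentages. The dictionary layout, the row labels, and the function name `full_targeting_rate` are assumptions made for this illustration.

```python
# Counts copied from Table 1 (ES cell clones) and Table 3 (double-targeted mice):
# (number with every targeted allele mutated, total number screened).
# The labels are illustrative names chosen for this sketch.
FULLY_MUTATED = {
    "Tet1, ES cells (2 of 2 alleles)": (27, 48),
    "Tet2, ES cells (2 of 2 alleles)": (37, 48),
    "Tet3, ES cells (2 of 2 alleles)": (32, 48),
    "Tet1+Tet2+Tet3, ES cells (6 of 6 alleles)": (20, 96),
    "Tet1+Tet2, mice, first row (4 of 4 alleles)": (22, 28),
    "Tet1+Tet2, mice, second row (4 of 4 alleles)": (11, 19),
}


def full_targeting_rate(mutated: int, total: int) -> float:
    """Return the percentage of screened clones or mice with all targeted alleles mutated."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= mutated <= total:
        raise ValueError("mutated must lie between 0 and total")
    return 100.0 * mutated / total


for label, (mutated, total) in FULLY_MUTATED.items():
    rate = full_targeting_rate(mutated, total)
    print(f"{label}: {mutated}/{total} = {rate:.1f}%")
```

Only rows whose counts are internally consistent in the tables (the per-row fractions sum to the number screened) are used here.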
Table 3 note: Some of the pups were cannibalized.

[Figure image: RFLP and Southern blot genotyping (Tet1 and Tet2 probes) and allele sequence alignments for Tet1 mutant, Tet2 mutant, and Tet1/Tet2 double-mutant mice. Gel images, sequence alignments, and the figure caption are not recoverable from the extracted text.]

>95% of cells carrying the Gapdh reporter expressed GFP. In contrast, >30% of cells carrying the Dazl reporter were GFP-negative, corresponding to reporter silencing. Silencing of the Dazl reporter became more robust upon continued passage, with >80% of the cells silencing their reporter within 4 weeks (Figure 1B).
To assess the DNA methylation levels of the Gapdh and Dazl reporters following introduction into mESCs, we sorted Gapdh GFP-positive and Dazl GFP-negative cell populations (Figure 1C). The GFP expression state was stable upon continuous culture and passaging of the two sorted cell populations for over 7 weeks (Figure 1C). DNA was extracted from both Gapdh GFP-positive and Dazl GFP-negative cells and subjected to bisulfite conversion and PCR sequencing. Figure 1D shows that Gapdh GFP-positive cells maintained the hypomethylated state at both the Gapdh CGI and the Snrpn promoter regions, whereas Dazl GFP-negative cells became highly de novo methylated at the Dazl CGI region and its corresponding downstream Snrpn promoter (Figure 1E). These results are consistent with the hypothesis that DNA methylation can be propagated from the CGI into the Snrpn promoter region, resulting in repression of transcriptional activity.

RGM Is a Reporter for In Vivo Demethylation

The experiments described above showed that RGM reports on de novo methylation imposed in vivo on the unmethylated Dazl CGI donor test sequence. Conversely, we were interested to assess whether a methylated and silent donor Snrpn promoter can be reactivated by means of demethylation acquired in vivo. For this, we used the CpG methyltransferase M.SssI to in vitro methylate both Gapdh and Dazl reporter constructs. Treatment of the plasmids with M.SssI enzyme followed by bisulfite conversion, PCR amplification, and sequencing confirmed the complete hypermethylation of both the CGI and Snrpn promoter regions (Figures 2A, S1B, and S1C). ESCs were transfected with either the Gapdh or the Dazl reporter and selected for cells carrying stably integrated vectors. Following 1 week of culture, we identified robust activation of GFP in virtually all cells carrying the integrated Gapdh reporter, whereas cells carrying the Dazl reporter remained GFP-negative (Figures 2B-2D).
To assess the DNA methylation state of the Gapdh and Dazl CGI and the respective downstream Snrpn promoter regions, DNA was extracted from the two cell lines and subjected to bisulfite conversion, PCR amplification, and sequencing. Figure 2E demonstrates that, consistent with high GFP expression, the Gapdh CGI and its downstream Snrpn promoter had become fully demethylated. In contrast, the Dazl CGI and its downstream Snrpn promoter sequences maintained the hypermethylated state, in agreement with complete repression of the GFP signal (Figure 2F). Thus, our data support the hypothesis that a Snrpn promoter can report on in vivo demethylation of the CGI in its proximity.

Dnmt1, Dnmt3a, and Dnmt3b Mediate Methylation and Reporter Activity

We used ESCs deficient for the DNA methyltransferases Dnmt1, Dnmt3a, and Dnmt3b to gain mechanistic insights into demethylation and de novo methylation imposed on the Snrpn promoter in transfected ESCs. Figure 2G shows that introduction of an in vitro methylated Dazl Snrpn vector into Dnmt1 mutant cells resulted in ~80% GFP-positive cells by passage five, in contrast to no GFP-positive cells when inserted into wild-type (WT) cells. In agreement with the role of Dnmt1 as the maintenance DNA methyltransferase (Li et al., 1992), bisulfite sequencing analysis on the sorted GFP-positive cells confirmed that reactivation of the methylated Dazl reporter occurred by passive demethylation (Figure 2H). To clarify the mechanism of de novo methylation, we introduced an unmethylated version of both vectors into mESCs deficient for both de novo DNA methyltransferases Dnmt3a and Dnmt3b (Pawlak and Jaenisch, 2011). Figure 2I shows that the vast majority of cells carrying the Dazl or the Gapdh reporters were positive for GFP, unlike Dazl reporter expression in control V6.5 cells (Figure 2I), which is consistent with Dnmt3a/b mediating de novo methylation and reporter silencing.
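The bisulfite sequencing readout used throughout these experiments can be illustrated with a short calculation. After bisulfite conversion and PCR, an unmethylated cytosine reads as T while a methylated cytosine still reads as C, so the methylation level of a region is the fraction of CpG positions that retain a C across the sequenced clones. The sketch below is an illustration under stated assumptions, not the analysis pipeline used in this work: the reference and reads are invented toy sequences, reads are assumed to be already aligned and of the same length as the reference, and the function names are hypothetical.

```python
from typing import List


def cpg_positions(reference: str) -> List[int]:
    """Return the 0-based index of the C in every CpG dinucleotide of the reference."""
    ref = reference.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def methylation_fraction(reference: str, reads: List[str]) -> float:
    """Fraction of CpG calls that remained C (methylated) after bisulfite conversion."""
    sites = cpg_positions(reference)
    methylated = 0
    unmethylated = 0
    for read in reads:
        read = read.upper()
        if len(read) != len(reference):
            raise ValueError("reads must be aligned to the full reference length")
        for i in sites:
            if read[i] == "C":      # C retained: protected from conversion, methylated
                methylated += 1
            elif read[i] == "T":    # C converted to T: unmethylated
                unmethylated += 1
            # any other base is treated as an uncallable position and skipped
    called = methylated + unmethylated
    if called == 0:
        raise ValueError("no callable CpG positions")
    return methylated / called


# Toy data (hypothetical): three CpGs, three sequenced clones.
toy_reference = "ATCGGACGTTCGA"
toy_reads = [
    "ATCGGACGTTCGA",  # all three CpGs methylated
    "ATTGGATGTTTGA",  # all three CpGs unmethylated
    "ATCGGATGTTCGA",  # middle CpG unmethylated
]
print(f"{100 * methylation_fraction(toy_reference, toy_reads):.1f}% methylated")
```

In the lollipop diagrams described in the figure legends, each sequenced clone corresponds to one row and each CpG to one circle, so this fraction is the share of filled circles.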
Recent studies have shown that culturing mESCs in 2i medium (inhibitors of MEK and GSK3) and leukemia inhibitory factor (LIF) results in downregulation of Dnmt3a and Dnmt3b, consequently leading to global hypomethylation (Lee et al., 2014). To assess whether these culture conditions affect reporter activity, we transfected the unmethylated Gapdh and Dazl reporters into WT mESCs cultured in 2i and LIF. Figure 2I shows that the great majority of the stably transfected cells were GFP-positive, consistent with 2i-mediated downregulation of Dnmt3a and Dnmt3b.

RGM Can Report on Methylation Associated with Endogenous Gene Promoters

To test whether the Snrpn promoter could also report on DNA methylation levels associated with endogenous gene promoters, we utilized CRISPR/Cas-mediated gene editing to target the endogenous CGIs located at the promoter regions of Gapdh and Dazl (Figures 3A, S2A, and S2B). Figure 3B shows that 35/36 Dazl-vector-transfected clones were GFP-negative, indicating robust silencing of the Dazl reporter, whereas 20/21 Gapdh-vector-transfected clones were GFP-positive (Figure 3B). FACS analysis of correctly targeted clones confirmed that Gapdh reporter cells were all GFP-positive with the CGI and Snrpn promoter unmethylated (Figures 3C and 3D), in contrast to Dazl GFP-negative clones with the corresponding sequences methylated (Figures 3E and 3F). Our results demonstrate that Snrpn reporter activity reports on the methylation state of its surrounding sequences and does not alter their methylation state. Furthermore, the endogenous targeting results suggested that the partial repression of the Dazl reporter (Figure 1B), observed at early passages of the transgene experiment, may be due to multiple genome integration and position effects.

RGM Can Report on Methylation of Pluripotency-Specific Super-Enhancers

Methylation of super-enhancers (SEs) has been shown to change during differentiation.
We tested whether RGM would report on the active and hypomethylated state of the pluripotency-specific SEs associated with the miR290 and Sox2 genes in mESCs and their methylated and inactive state in somatic cells (Figures 4A and S3A). In contrast to the CGIs located at gene promoters (Gapdh and Dazl), the SE regions of both Sox2 and miR290 represent low-density CpG sequences. Utilizing CRISPR/Cas-mediated gene editing, we inserted a Snrpn tdTomato reporter into the endogenous miR290 and Sox2 enhancers (Figures 4B and S3B, respectively). As recipient cells, we used the previously established Oct4, Sox2, Klf4, and c-Myc (OSKM) polycistronic dox-inducible secondary reprogrammable mESCs (Carey et al., 2011), which also carried a GFP reporter knocked into the endogenous Nanog locus. Correct integration of the vector was validated by PCR and Southern analysis (Figure S3G). Figure 4C shows that both targeted ESC lines (miR290 #21 and Sox2 #2) expressed tdTomato as well as Nanog-GFP. To assess whether the tdTomato expression correlated with hypomethylation of the inserted RGM, DNA extracted from the bulk mESC population was bisulfite converted, amplified by PCR, and sequenced, with the PCR amplification including both the SE CpG region and the downstream Snrpn promoter. As predicted from the methylation maps (Figures 4A and S3A), both endogenous miR290 and Sox2 CpG regions were mostly hypomethylated (Figure 4D). Importantly, the Snrpn promoter was also hypomethylated, consistent with reporter expression. Of note, a few highly methylated alleles were detected (Figure 4D), possibly reflecting an inherent variation in the bulk population due to the presence of cells that carry an inactive reporter. To test this possibility, we analyzed the Sox2 SE region in the untargeted parental cells, which identified the presence of both methylated and unmethylated alleles at the same frequency as in the targeted reporter cell line (Figure S3D).
We conclude that RGM can report on the methylation state of distal genomic regulatory regions.

Dynamic De Novo DNA Methylation during Differentiation

To monitor real-time changes in genomic DNA methylation during in vitro differentiation, mESCs carrying the tdTomato reporters reflecting DNA methylation levels at the SE regions were exposed to retinoic acid (RA), which induces a rapid exit from pluripotency and cellular differentiation (Rhinn and Dolle, 2012). The presence of the Nanog-GFP reporter allowed monitoring exit from pluripotency by loss of GFP expression. Sorted double-positive (tdTomato+/GFP+) miR290 and Sox2 cells were plated on feeder-free gelatin-coated plates, treated with 0.25 μM RA the following day (Figure 5A), and analyzed at different times after addition of RA (Figures 5A and 5B). As expected, undifferentiated cells were double-positive (tdTomato+/GFP+). However, upon induction of differentiation a gradual reduction in the fraction of double-positive cells was observed, with most disappearing over the time course of 7 days, resulting in a largely double-negative cell population (Figures 5B and 5C). This is in contrast to control Gapdh reporter cells that, as expected, appeared completely GFP-positive following 7 days of RA differentiation (Figure S4A). tdTomato and Nanog-GFP-positive cells disappeared with different kinetics: while singly tdTomato-positive cells (tdTomato+/GFP-) appeared after 2 days, only a few single Nanog-GFP-positive cells (tdTomato-/GFP+) were detected during differentiation (Figures 5B and 5C), suggesting that Nanog was silenced prior to methylation and silencing of the miR290 and Sox2 SEs. To confirm that loss of the tdTomato signal correlated with accumulation of de novo methylation in both SE regions, we sorted the main populations at different time points during RA differentiation (Figure 5C).
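The four populations followed in this time course (tdTomato+/GFP+, tdTomato+/GFP-, tdTomato-/GFP+, and tdTomato-/GFP-) correspond to the quadrants of a two-color flow cytometry plot. As a rough sketch of how such quadrant fractions are tallied, and not a description of the gating actually applied here, the example below classifies each event against one threshold per channel. The intensity values, the thresholds, and the function names are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

QUADRANTS = ("tdTomato+/GFP+", "tdTomato+/GFP-", "tdTomato-/GFP+", "tdTomato-/GFP-")


def classify_cell(tdtomato: float, gfp: float,
                  tdtomato_gate: float, gfp_gate: float) -> str:
    """Assign one event to a quadrant using a single threshold per channel."""
    red = "tdTomato+" if tdtomato >= tdtomato_gate else "tdTomato-"
    green = "GFP+" if gfp >= gfp_gate else "GFP-"
    return f"{red}/{green}"


def quadrant_fractions(events: Iterable[Tuple[float, float]],
                       tdtomato_gate: float, gfp_gate: float) -> Dict[str, float]:
    """Return the fraction of events falling in each of the four quadrants."""
    counts = Counter(
        classify_cell(red, green, tdtomato_gate, gfp_gate) for red, green in events
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no events to classify")
    return {label: counts[label] / total for label in QUADRANTS}


# Hypothetical (tdTomato, GFP) intensities in arbitrary units.
toy_events = [(5000.0, 8000.0), (5000.0, 50.0), (40.0, 9000.0), (30.0, 20.0)]
for label, fraction in quadrant_fractions(toy_events, 1000.0, 1000.0).items():
    print(f"{label}: {100 * fraction:.0f}%")
```

In practice the gates would be set from unstained and single-color control samples rather than fixed by hand as in this toy example.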
DNA was extracted from the different cell populations and subjected to bisulfite sequencing, thus allowing a comprehensive analysis of the methylation state in both the endogenous miR290 and Sox2 SEs and their respective Snrpn promoter regions (Figures 5D, 5E, S4B, and S4C). In contrast to the bulk population of mESCs (Figure 4D), the sorted double-positive cells did not harbor completely methylated alleles, consistent with the notion that methylated alleles in the bulk population represent intrinsic variation. The methylation of both miR290 and Sox2 in single-positive cells (tdTomato+/GFP-) was low, consistent with tdTomato expression. The overall increased de novo methylation in the single-positive cells, compared with the double-positive cells, may suggest that DNA methylation-mediated silencing was already initiated in this intermediate cell population. Notably, our analysis identified completely methylated genomes in the Sox2 single-positive (tdTomato+/GFP-) cell population (Figure 5E). This suggests that during rapid changes of de novo methylation, the half-life of the fluorescent protein (FP) may lead to an overestimation of cells that are still hypomethylated during cell-state transitions. Finally, in agreement with the silencing of tdTomato expression, the double-negative cells (tdTomato-/GFP-) exhibited robust hypermethylation on both endogenous SE regions and their respective Snrpn promoters (Figures 5D, 5E, S4B, and S4C). To test whether the targeted reporter allele correlated with the methylation levels of the untargeted allele (WT), we analyzed the WT allele in Sox2 reporter cells at different time points during differentiation. Figure S4D shows that, similar to the reporter allele, the WT allele exhibited low levels of methylation in the sorted double-positive cells and high levels of methylation following 7 days of differentiation.
We conclude that RGM allows dynamic monitoring of de novo methylation events that are imposed on genomic sequences upon exit from pluripotency. Our data suggest that the differentiation of ESCs induces silencing of Nanog prior to de novo methylation of the miR290 and Sox2 SEs. To test whether in vivo differentiation resulted in silencing of the tdTomato reporter in both miR290 and Sox2 SE regions, we analyzed 13.5 dpi chimeric embryos. As a control, we injected ESCs harboring the Gapdh CGI reporter driving a GFP sequence, which had also been infected with lentiviruses resulting in constitutive expression of tdTomato. The robust expression of GFP in the Gapdh control embryos demonstrated the widespread expression signature of the Snrpn promoter throughout mouse tissues (Figure 6A). Unlike the Gapdh control, both miR290 and Sox2 embryos were completely negative for both GFP and tdTomato, demonstrating robust repression of Nanog and the Snrpn promoter during in vivo differentiation (Figure 6A).

DNA Demethylation during Cellular Reprogramming

Reprogramming of somatic cells to iPS cells involves demethylation and activation of the pluripotency SEs Sox2 and miR290 (see Figures 4A and S3A). We investigated whether RGM could be used to capture demethylation events that are gradually acquired during cellular reprogramming. For this, we used secondary Dox-inducible reprogrammable mouse embryonic fibroblasts (MEFs) isolated from 13.5 dpi chimeric embryos that had been injected at the blastocyst stage with the OSKM Dox-inducible ESCs (Carey et al., 2011) carrying Nanog-GFP and the tdTomato reporter reflecting DNA methylation levels at the Sox2 or miR290 SE alleles (see Figure 6B). Culture of these MEFs in Dox induces the reprogramming factors, while Nanog-GFP activation allows monitoring the course of reprogramming in the bulk somatic cell population (Buganim et al., 2012).
As expected, MEFs isolated from 13.5 dpi embryos were negative for both GFP and tdTomato expression, as measured by fluorescence microscopy and fluorescence-activated cell sorting (FACS) analysis (Figures 6C and S5A). Importantly, consistent with tdTomato repression, both endogenous miR290 and Sox2 SE regions as well as their corresponding downstream Snrpn promoter regions were hypermethylated (Figure 6D). Further analysis of the WT allele in Sox2 MEFs showed high correlation with the targeted reporter allele, demonstrating robust repression of the SE region in vivo (Figure S5B). To test whether reprogramming-induced demethylation can be visualized by RGM, we treated the secondary MEFs with serum and LIF medium supplemented with 2 μg/ml doxycycline (Dox). Both miR290 and Sox2 MEFs were successfully reprogrammed, resulting in double-positive cells (tdTomato+/GFP+; data not shown). It was recently shown that a combination of three chemicals, TGF-β antagonist ALK5 inhibitor II, GSK3β antagonist CHIR99021, and ascorbic acid, an enzymatic cofactor (from here on referred to as 3C), results in more efficient and synchronous reprogramming (Vidal et al., 2014). To achieve more synchronized and efficient reprogramming, both miR290 and Sox2 MEFs were subjected to 3C culture conditions and the dynamics of reporter activation were monitored by flow cytometry. While the first expression of tdTomato+ and GFP+ cells emerged at day 16 (Figure 6E), reporter activation of miR290 and Sox2 occurred with different kinetics. Figure 6E shows accumulation of miR290 reporter cells that activated both GFP and tdTomato (tdTomato+/GFP+) over time. A small population of single-positive GFP cells appeared in late stages of reprogramming, consistent with a stochastic sequence of events in the reprogramming of the miR290 SE region. Unlike miR290 reporter cells, however, Sox2 cells showed a more robust and defined dynamic of activation of both reporters.
By day 16, a population of single-positive GFP cells (tdTomato-/GFP+) had accumulated, which gradually shifted to become double-positive (tdTomato+/GFP+) over time (Figures 6E and S5C). To test whether the single-positive GFP cells give rise to double-positive cells, we sorted the single-positive GFP cells and replated them on feeders using Dox-independent culture conditions. Consistent with the repression of the tdTomato signal, bisulfite sequencing confirmed that the single-positive GFP cells exhibit high levels of methylation in the SE region, as well as in the downstream Snrpn promoter region (Figure S5D). Upon further culture, tdTomato-positive cells appeared, demonstrating that single-positive GFP cells give rise to double-positive cells (Figure S5E). Our results suggest that reprogramming of both miR290 and Sox2 SE regions is a late event, with the Sox2 SE region being reprogrammed subsequently to the activation of endogenous Nanog. miR290 and Sox2 double-positive (tdTomato+/GFP+) cells invariably proceed to a Dox-independent iPS cell state (Figure 6F). To assess the methylation state of the Sox2 and miR290 SEs, we performed bisulfite sequencing on DNA extracted from sorted double-positive (tdTomato+/GFP+) iPS cells. As shown in Figure 6G, both miR290 and Sox2 SE regions and their corresponding downstream Snrpn promoters were demethylated. These results confirmed that RGM can visualize demethylation of regulatory genomic regions during reprogramming with single-cell resolution.

DISCUSSION

In this work, we have generated a DNA methylation reporter (RGM) that allows imaging of DNA methylation with single-cell resolution. The design of the reporter system took advantage of the intrinsic characteristics of imprinted gene promoters, for which the transcriptional activity reflects the DNA methylation state of adjacent sequences.
Importantly, imprinted promoters are neutral to developmental or tissue-specific DNA methylation changes, with their activity strictly dependent on the methylation state of the adjacent regulatory elements. This is in contrast to CGI sequences such as Gapdh or tissue-specific elements such as the Dazl promoter-associated sequences, which become demethylated or de novo methylated, respectively, when inserted into the genome of ESCs (Brandeis et al., 1994; Sabag et al., 2014). This indicates that methylation of these elements, as opposed to imprinted promoters, is sequence-dependent and subject to trans-acting signals and cell state-dependent regulation. The RGM reporter system described here is based on the Snrpn minimal promoter, which is not subjected to methylation changes by itself, and therefore GFP expression is solely dependent on the methylation state of surrounding sequences. Consistent with this premise, ES cells appeared GFP-positive when stably transfected with the methylated or unmethylated Gapdh/Snrpn-GFP vector, but were GFP-negative when transfected with the methylated or unmethylated Dazl/Snrpn-GFP reporter. This indicates that the Snrpn promoter region can be used as a faithful sensor for regional methylation changes of adjacent sequences. To investigate whether RGM can report on the methylation state of endogenous loci, we targeted CGIs located at the Gapdh and Dazl promoter regions, resulting in differential methylation and activity of the Snrpn reporter. Thus, the Snrpn promoter effectively reflects local methylation patterns without affecting the endogenous epigenetic state. As most of the tissue-specific DNA methylation changes occur in low-density CpG regulatory regions, we asked whether RGM could report on the methylation state of non-coding low-density CpG regions.
We chose two pluripotency-specific SEs that are associated with the miR290 and Sox2 genes and are known to be active and unmethylated in ESCs but become methylated and inactive upon cellular differentiation. CRISPR/Cas-mediated insertion of the Snrpn-tdTomato reporter into ESCs resulted in tdTomato-positive clones, but tdTomato expression was silenced in mid-gestation chimeric embryos, which reflects the demethylated state of the SEs in pluripotent cells and their de novo methylation upon induction of differentiation. Consistent with this, MEFs isolated from chimeric embryos were tdTomato-negative with both elements highly methylated. Upon conversion of the MEFs into induced pluripotent stem cells (iPSCs), however, the cells became tdTomato-positive, reflecting demethylation of the SEs during reprogramming to pluripotency. Our results establish that RGM reporter activity mirrors the changes of DNA methylation imposed on endogenous CGI and low-density CpG genomic elements during development, upon cellular differentiation, and during reprogramming. Extensive epigenomic analyses of multiple tissues and cell types in both humans and mice suggest that embryonic development and cell-type specification are associated with massive epigenomic remodeling at discrete enhancers (Hon et al., 2013; Kundaje et al., 2015; Schultz et al., 2015; Ziller et al., 2013). It will thus be of interest to test whether RGM can be utilized to report on the DNA methylation state associated with more discrete regulatory regions. Applying the methylation reporter to tissue-specific DMRs holds the promise of further elucidating the link between DNA methylation, other epigenetic mechanisms, and cell-fate regulation.
Reprogramming of somatic cells into iPSCs involves extensive resetting of the epigenome (Buganim et al., 2013; Hanna et al., 2010), and consistent with this notion, recent studies identified a key role for epigenetic modifiers during this process (Mansour et al., 2012; Rais et al., 2013; Soufi et al., 2012). However, the exact kinetics of these epigenetic changes during the reprogramming process are difficult to define because of cell heterogeneity and the stochastic nature of the reprogramming process. Here, we followed the methylation changes of two SEs associated with Sox2 and miR290, demonstrating that demethylation of both regions is a late event in the reprogramming process. Simultaneous activation of endogenous Nanog and miR290 SE demethylation is consistent with Nanog directly regulating the expression of the miR290 cluster during reprogramming to iPS cells (Gingold et al., 2014). The gradual activation of the Sox2 tdTomato reporter followed expression of endogenous Nanog, consistent with demethylation of the Sox2 SE being a late event in the process (Buganim et al., 2012). Systematic deletion of the Sox2 upstream SE region was recently shown to dramatically affect Sox2 expression in ESCs (Li et al., 2014; Zhou et al., 2014). Thus, the Sox2 SE methylation reporter cells provide a rigorous experimental system to investigate how DNA methylation changes at distal regulatory regions influence the expression of downstream target genes. Changes in DNA methylation during development, lineage commitment, and disease are dynamic, and studies of epigenetic changes are hampered by two experimental constraints that limit mechanistic studies of methylation and gene regulation: (1) current methodology provides only a static "snapshot" view of the methylation state during cell state transitions, and (2) current methylation analyses require the examination of multiple cells, precluding assessment of epigenetic changes in single cells.
Given the overwhelming evidence of cell-cell heterogeneity in embryos, cultured cells, or disease states such as cancer (Junker and van Oudenaarden, 2014), this is a serious limitation for a mechanistic understanding of the epigenetic state and gene expression during these complex processes. For example, monitoring the course of differentiation in both miR290 and Sox2 reporter cells confirmed the co-existence of cell populations that harbor distinct epigenetic states. In contrast, commonly used bulk methodologies would not allow isolating and distinguishing the different cell populations. Thus, sorting and isolating different cell types according to their methylation states can be achieved only by using a readout for methylation state at single-cell resolution. The RGM reporter system overcomes some of the limitations of conventional methylation analyses by providing real-time visualization of DNA methylation at single-cell resolution. As with any fluorescent protein-based reporter system, the accuracy with which real-time changes can be traced depends on the half-life of the respective FP. Because the current version of the methylation reporter does not use a destabilized FP, silencing of the reporter after de novo methylation-induced repression of the Snrpn promoter is likely delayed. Generating a reporter that responds more rapidly to DNA methylation changes would require the use of a destabilized FP. Targeting additional loci in future studies will allow us to further elucidate other possible limitations of the RGM reporter system, such as inhibition of the Snrpn transcriptional activity by chromatin conformation. As RGM allows measuring dynamics of DNA methylation at single-cell resolution, it provides a framework for understanding epigenetic changes during cell state transitions in heterogeneous cell populations. For example, replacing the fluorescence-based reporter system with Cre-Lox will enable the generation of epigenetic lineage tracing maps.
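The dependence of reporter lag on FP half-life noted above can be made concrete with a simple decay estimate. The sketch below is a back-of-the-envelope model, not a measurement from this work: it assumes that transcription stops completely at the moment the promoter is silenced, that the existing protein then decays exponentially, and that dilution by cell division is ignored. The half-life values and the detection threshold are hypothetical placeholders, and the function names are invented for this illustration.

```python
import math


def fraction_remaining(hours_since_silencing: float, half_life_hours: float) -> float:
    """Fraction of the initial fluorescent signal left after exponential decay."""
    if half_life_hours <= 0:
        raise ValueError("half_life_hours must be positive")
    if hours_since_silencing < 0:
        raise ValueError("hours_since_silencing must be non-negative")
    return 0.5 ** (hours_since_silencing / half_life_hours)


def hours_until_below(detection_fraction: float, half_life_hours: float) -> float:
    """Time for the signal to fall to the given fraction of its starting level."""
    if not 0 < detection_fraction <= 1:
        raise ValueError("detection_fraction must be in (0, 1]")
    if half_life_hours <= 0:
        raise ValueError("half_life_hours must be positive")
    return half_life_hours * math.log2(1.0 / detection_fraction)


# Hypothetical comparison: a stable FP versus a destabilized variant.
for name, half_life in (("stable FP (assumed 24 h)", 24.0),
                        ("destabilized FP (assumed 2 h)", 2.0)):
    lag = hours_until_below(0.25, half_life)
    print(f"{name}: {lag:.0f} h to fall to 25% of the starting signal")
```

Under these assumptions the lag scales linearly with the half-life, which is the reasoning behind preferring a destabilized FP when rapid methylation changes need to be resolved.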
Furthermore, utilizing RGM together with conventional gene expression reporters may offer detailed insights into the interplay between epigenetic cues and the execution of tissue-specific gene expression programs. The use of fluorescent reporters as a readout for locus-specific methylation changes may also provide an effective screening platform for the isolation of small-molecule compounds that affect the methylation state of specific genomic regions.

EXPERIMENTAL PROCEDURES

mESCs Cell Culture

V6.5 mouse embryonic stem cells (mESCs) were cultured on irradiated mouse embryonic fibroblasts (MEFs) with standard ESC medium: DMEM (500 ml) supplemented with 10% FBS (Hyclone), 10 μg recombinant leukemia inhibitory factor (LIF), 0.1 mM beta-mercaptoethanol (Sigma-Aldrich), penicillin/streptomycin, 1 mM L-glutamine, and 1% nonessential amino acids (all from Invitrogen). For experiments in 2i culture conditions, mESCs were cultured on gelatin-coated plates with N2B27 + 2i + LIF medium containing (per 500 ml): 240 ml DMEM/F12 (Invitrogen; 11320), 240 ml Neurobasal media (Invitrogen; 21103), 5 ml N2 supplement (Invitrogen; 17502048), 10 ml B27 supplement (Invitrogen; 17504044), 10 μg recombinant LIF, 0.1 mM beta-mercaptoethanol (Sigma-Aldrich), penicillin/streptomycin, 1 mM L-glutamine, and 1% nonessential amino acids (all from Invitrogen), 50 μg/ml BSA (Sigma), PD0325901 (Stemgent, 1 μM), and CHIR99021 (Stemgent, 3 μM).

Reporter Cell Lines

To generate stably integrated Gapdh and Dazl transgene reporter cell lines, either the Gapdh- or the Dazl-modified PiggyBac transposon (see Supplemental Experimental Procedures) and a helper plasmid expressing transposase were transfected into mESCs using Xfect mESC Transfection Reagent (Clontech), according to the provider's protocol. Stably integrated reporter cells were selected with puromycin (2 μg/ml) for 4 days.
To generate Dazl, Gapdh, miR290, and Sox2 SE reporter cell lines, targeting vectors and CRISPR/Cas9 were transfected into mESCs using Xfect mESC Transfection Reagent (Clontech), according to the provider's protocol. Forty-eight hours following transfection, cells were FACS-sorted for GFP or tdTomato expression (respectively) and plated on MEF feeder plates. Single colonies were further analyzed for proper and single integration by Southern blot and PCR analysis.

Flow Cytometry

To assess the proportion of GFP and tdTomato in the established reporter cell lines, a single-cell suspension was filtered and assessed on the LSR II SORP, LSRFortessa SORP, or FACSCanto II.

Retinoic Acid-Induced Differentiation

mESCs carrying the reporter for both miR290 and Sox2 SE regions were sorted for double-positive GFP and tdTomato expression and plated on gelatin-coated plates in ES cell medium (+LIF). The next day, cells were washed with PBS, resuspended in basal N2B27 medium (2i medium without LIF, insulin, and the two inhibitors), and supplemented with 0.25 μM RA. Medium was replaced every other day.

Blastocyst Injections for the Generation of Chimeras and Secondary MEFs

Blastocyst injections were performed using (C57Bl/6 x DBA) B6D2F2 host embryos. In brief, B6D2F1 females were hormone primed by an intraperitoneal (i.p.) injection of pregnant mare serum gonadotropin (PMS, EMD Millipore) followed 46 hr later by an injection of human chorionic gonadotropin (hCG, VWR). Embryos were harvested at the morula stage and cultured in a CO2 incubator overnight. On the day of the injection, groups of embryos were placed in drops of M2 medium using a 16-μm diameter injection pipet (Origio). Approximately ten cells were injected into the blastocoel cavity of each embryo using a Piezo micromanipulator (Prime Tech). Approximately 20 blastocysts were subsequently transferred to each recipient female; the day of injection was considered as 2.5 days postcoitum (DPC).
Fetuses were collected at 13.5 DPC for the extraction of embryonic fibroblasts as described before (Buganim et al., 2012).

Southern Blots

Genomic DNA (10-15 µg) was digested with appropriate restriction enzymes overnight. Subsequently, genomic DNA was separated on a 0.7% agarose gel, transferred to a nylon membrane (Amersham), and hybridized with 32P random primer (Stratagene)-labeled probes.

Reprogramming to iPSCs

MEFs isolated from miR290 and Sox2 fetuses were plated at a density of 50,000 cells per well of a 6-well plate, on gelatin-coated plates, in standard MEF medium (mESC medium without LIF). The following day, MEF medium was replaced with mESC medium containing 2 µg/ml doxycycline (Sigma). Alternatively, cells were grown in mESC medium containing 2 µg/ml doxycycline and a combination of three compounds (TGF-β antagonist ALK5 inhibitor II, GSK3β antagonist CHIR99021, ascorbic acid) as described before (Vidal et al., 2014). Medium was replaced every other day during the course of reprogramming.

ACKNOWLEDGMENTS

We thank Thorold W. Theunissen, Patti Wisniewski, and Colin Zollo for FACS analyses and cell sorting, Denes Hnisz for providing ChIP-seq tracks, Kibibi Ganz for mouse injections, Huijing Yu for help in cloning, and Stefan Semrau for help with the RA differentiation and comments on the manuscript. This study was supported by NIH grant HD 045022. Y.S. is supported by a Human Frontier Postdoctoral Fellowship. R.J. is a co-founder of Fate Therapeutics and an adviser to Stemgent.
[Figure 1: image not recoverable from the extracted text.]

Figure 1. An Active Minimal Snrpn Promoter Can Be Repressed in cis by Means of Spreading of DNA Methylation into the Promoter Region
(A) Schematic representation of the sleeping-beauty-based vectors. Endogenous CpG islands (CGIs) of the Dazl and Gapdh genes were cloned upstream of a minimal Snrpn promoter region driving GFP. Open-circle lollipops schematically represent individual unmethylated CpGs.
(B) Flow cytometric analysis of V6.5 mESCs cultured in serum + LIF, following stable integration of unmethylated Gapdh and Dazl reporter vectors, demonstrating robust repression of the GFP signal in the Dazl reporter cells over time. Shown are the mean percentages of GFP-negative cells ± STD of two biological replicates.
(C) Phase and fluorescence images of the sorted V6.5 mESCs, comprising stable integration of the Gapdh (left) and Dazl (right) vectors following prolonged culturing for 7 weeks.
(D and E) Bisulfite sequencing analysis of the stably transfected Gapdh (D) and Dazl (E) reporter cell lines was performed on the gene promoter-associated CGI (left) and the downstream Snrpn promoter region (right). Open circles represent unmethylated CpGs; filled circles, methylated CpGs.
See also Figure S1.
[Figure 2: image not recoverable from the extracted text.]

Figure 2. An In Vitro Repressed Snrpn Promoter Can Be Reactivated in cis by Means of Spreading of DNA Demethylation into the Promoter Region
(A) Schematic representation of in vitro methylated sleeping-beauty-based vectors. Closed-circle lollipops schematically represent individual methylated CpGs.
(B) Phase and fluorescence images of the stably integrated V6.5 mESCs harboring Gapdh (left) and Dazl (right) in vitro methylated vectors, following 1 week of antibiotic selection.
(C and D) Flow cytometric analysis of the proportion of GFP-positive cells in mESCs, stably integrated with either Gapdh (C) or Dazl (D) in vitro methylated vectors, following 2 weeks in culture.
(E and F) Bisulfite sequencing analysis of the stably transfected Gapdh (E) and Dazl (F) reporter cell lines was performed on the gene promoter-associated CGI (left) and the downstream Snrpn promoter region (right).
(G) Flow cytometric analysis of the proportion of GFP-positive cells in V6.5 mESCs and Dnmt1 KO mESCs, stably integrated with in vitro methylated Dazl reporter vector.
(H) Bisulfite sequencing analysis of sorted GFP-positive Dnmt1 KO mESCs, stably integrated with in vitro methylated Dazl reporter vector.
(I) Flow cytometric analysis of the proportion of GFP-negative cells in control V6.5 mESCs, mESCs deficient for both Dnmt3a and Dnmt3b (Dnmt3ab KO), and V6.5 mESCs cultured in 2i + LIF, which were stably integrated with unmethylated Gapdh (top) and Dazl (bottom) reporter vectors.
See also Figure S1.

[Figure 3: image not recoverable from the extracted text.]

Figure 3. Generation of DNA Methylation Reporter Cell Lines for Endogenous Gene Promoters
(A) CRISPR/Cas-based strategy used to integrate the DNA methylation reporter into the endogenous promoter region of the Gapdh and Dazl genes. TSS, transcription start site; green sequence, endogenous CGI region; black sequence, targeting CRISPR; red sequence, PAM recognition site.
(B) Flow cytometric analysis depicting the mean GFP intensity of randomly picked clones following antibiotic selection of both (top) Gapdh- and (bottom) Dazl-reporter-transfected V6.5 mESCs.
(C) Flow cytometric analysis of the proportion of GFP-positive cells in two representative clones correctly targeted with the methylation reporter at the promoter region of Gapdh.
(D) Bisulfite sequencing analysis was performed on mESCs harboring the DNA methylation reporter in the Gapdh promoter region. For each cell line, the PCR amplicon (marked with dashed line) includes both the endogenous CGI (left) and the downstream integrated Snrpn promoter region (right).
(E) Flow cytometric analysis of the proportion of GFP-positive cells in two representative clones correctly targeted with the methylation reporter at the promoter region of Dazl.
(F) Bisulfite sequencing analysis was performed on mESCs harboring the DNA methylation reporter in the Dazl promoter region. For each cell line, the PCR amplicon (marked with dashed line) includes both the endogenous CGI (left) and the downstream integrated Snrpn promoter region (right).
See also Figure S2.

[Figure 4: image not recoverable from the extracted text.]

Figure 4. Generation of DNA Methylation Reporter Cell Lines for the Pluripotent-Specific miR290 and Sox2 SE Regions
(A) Regional view depicting the DNA methylation (top) and chromatin (bottom) landscape of the miR290 upstream pluripotent-specific SE. Shown are average methylation levels and enrichment of chromatin marks in mouse undifferentiated cells (green) and in adult tissues (gold), with respect to the genomic organization of the genes. DNA methylation ranges from 1 (hypermethylated) to 0 (hypomethylated). Characteristic clusters of typical enhancer marks and binding of tissue-specific TFs determine the SE region (light blue).
(B) CRISPR/Cas-based strategy used to integrate the DNA methylation reporter into the endogenous SE region. HR, homologous recombination; green sequence, endogenous miR290 CpG region; black sequence, targeting CRISPR; red sequence, PAM recognition site.
(C) Phase and fluorescence images of correctly integrated DNA methylation reporter cell lines for miR290 (upper panel) and Sox2 (lower panel) endogenous SE regions. GFP marks endogenous expression levels of Nanog, whereas tdTomato reflects the endogenous DNA methylation levels at both miR290 and Sox2 SE regions.
(D) Bisulfite sequencing analysis was performed on undifferentiated mESCs harboring the DNA methylation reporter in either the miR290 SE region (top) or the Sox2 SE region (bottom). For each cell line, the PCR amplicon (marked with dashed line) includes both the endogenous CGI (left) and the downstream integrated Snrpn promoter region (right).
See also Figure S3.

[Figure 5: image not recoverable from the extracted text.]

Figure 5. Dynamics of De Novo DNA Methylation of miR290 and Sox2 SE Regions upon In Vitro Differentiation
(A) Schematic representation of the RA-based differentiation protocol used on miR290 and Sox2 reporter cell lines. GFP marks endogenous expression levels of Nanog, whereas tdTomato reflects the endogenous DNA methylation levels at both miR290 and Sox2 SE regions.
(B) Flow cytometric analysis of the proportion of Nanog-GFP-positive cells (x axis) and tdTomato-positive cells (y axis) during 7 days of differentiation of miR290#21 (top) and Sox2#2 (bottom) reporter cell lines.
(C) Bar graph summarizing the proportion of the different cell populations during the course of 7 days of RA differentiation for both miR290#21 (top) and Sox2#2 (bottom) reporter cell lines. Data represent two biological replicates. R, tdTomato; G, GFP.
(D and E) Bisulfite sequencing analysis on the three main cell populations sorted at 48 hr following initial treatment with RA. For both miR290#21 (D) and Sox2#2 (E) cell lines, the PCR amplicon (marked with dashed line) includes the endogenous CGI (left) and the downstream integrated Snrpn promoter region (right). R, tdTomato; G, GFP.
See also Figure S4.

[Figure 6: image not recoverable from the extracted text.]

Figure 6. Dynamics of DNA Demethylation of miR290 and Sox2 SE Regions during Cellular Reprogramming
(A) miR290 (top) and Sox2 (bottom) reporter chimeric experimental embryos (right embryo in each panel). As controls, Gapdh CGI reporter mESCs driving GFP and constitutively expressing tdTomato (Control, Gapdh-GFP, and tdTomato, respectively) were injected into host blastocysts (left embryo in each panel).
(B) Schematic representation of the experimental procedure to monitor the dynamics of demethylation during reprogramming of miR290 and Sox2 reporter cell lines. GFP marks endogenous expression levels of Nanog, whereas tdTomato reflects the endogenous DNA methylation levels at both miR290 and Sox2 SE regions.
(C) Flow cytometric analysis of the proportion of GFP-positive cells (x axis) and tdTomato-positive cells (y axis) in P0 MEFs derived from miR290#21 (left) and Sox2#2 (right) chimeric embryos.
(D) Bisulfite sequencing analysis was performed on P0 MEFs derived from miR290#21 (top) and Sox2#2 (bottom) chimeras. For each cell line, the PCR amplicon (marked with dashed line) includes both the endogenous CGI (left) and the downstream integrated Snrpn promoter region (right).
(E) Analysis of the proportion of GFP-positive cells (x axis) and tdTomato-positive cells (y axis) during the course of reprogramming of MEFs derived from miR290#21 (upper panel) and Sox2#2 (lower panel) chimeras. Shown are flow cytometric data from different time points following the addition of dox together with the three-compound (3C) culture condition.
(F) Representative images of established miR290 and Sox2 iPSC lines, derived from sorted double-positive (tdTomato+/GFP+) colonies.
(G) Bisulfite sequencing analysis was performed on P2 iPSCs derived from miR290#21 (top) and Sox2#2 (bottom) MEFs. For each cell line, the PCR amplicon (marked with dashed line) includes both the endogenous CGI (left) and the downstream integrated Snrpn promoter region (right).
See also Figure S5.

REFERENCES

Bird, A. (2002). DNA methylation patterns and epigenetic memory. Genes Dev. 16, 6-21.
Brandeis, M., Frank, D., Keshet, I., Siegfried, Z., Mendelsohn, M., Nemes, A., Temper, V., Razin, A., and Cedar, H. (1994). Sp1 elements protect a CpG island from de novo methylation. Nature 371, 435-438.
Buganim, Y., Faddah, D.A., Cheng, A.W., Itskovich, E., Markoulaki, S., Ganz, K., Klemm, S.L., van Oudenaarden, A., and Jaenisch, R.
(2012). Single-cell expression analyses during cellular reprogramming reveal an early stochastic and a late hierarchic phase. Cell 150, 1209-1222.
Buganim, Y., Faddah, D.A., and Jaenisch, R. (2013). Mechanisms and models of somatic cell reprogramming. Nat. Rev. Genet. 14, 427-439.
Buiting, K., Saitoh, S., Gross, S., Dittrich, B., Schwartz, S., Nicholls, R.D., and Horsthemke, B. (1995). Inherited microdeletions in the Angelman and Prader-Willi syndromes define an imprinting centre on human chromosome 15. Nat. Genet. 9, 395-400.
Carey, B.W., Markoulaki, S., Hanna, J.H., Faddah, D.A., Buganim, Y., Kim, J., Ganz, K., Steine, E.J., Cassady, J.P., Creyghton, M.P., et al. (2011). Reprogramming factor stoichiometry influences the epigenetic state and biological properties of induced pluripotent stem cells. Cell Stem Cell 9, 588-598.
Cedar, H., and Bergman, Y. (2012). Programming of DNA methylation patterns. Annu. Rev. Biochem. 81, 97-117.
Deaton, A.M., and Bird, A. (2011). CpG islands and the regulation of transcription. Genes Dev. 25, 1010-1022.
Dowen, J.M., Fan, Z.P., Hnisz, D., Ren, G., Abraham, B.J., Zhang, L.N., Weintraub, A.S., Schuijers, J., Lee, T.I., Zhao, K., and Young, R.A. (2014). Control of cell identity genes occurs in insulated neighborhoods in mammalian chromosomes. Cell 159, 374-387.
Ferguson-Smith, A.C. (2011). Genomic imprinting: the emergence of an epigenetic paradigm. Nat. Rev. Genet. 12, 565-575.
Gingold, J.A., Fidalgo, M., Guallar, D., Lau, Z., Sun, Z., Zhou, H., Faiola, F., Huang, X., Lee, D.F., Waghray, A., et al. (2014). A genome-wide RNAi screen identifies opposing functions of Snai1 and Snai2 on the Nanog dependency in reprogramming. Mol. Cell 56, 140-152.
Hackett, J.A., Sengupta, R., Zylicz, J.J., Murakami, K., Lee, C., Down, T.A., and Surani, M.A. (2013). Germline DNA demethylation dynamics and imprint erasure through 5-hydroxymethylcytosine. Science 339, 448-452.
Hanna, J.H., Saha, K., and Jaenisch, R. (2010).
Pluripotency and cellular reprogramming: facts, hypotheses, unresolved issues. Cell 143, 508-525.
Hnisz, D., Abraham, B.J., Lee, T.I., Lau, A., Saint-Andre, V., Sigova, A.A., Hoke, H.A., and Young, R.A. (2013). Super-enhancers in the control of cell identity and disease. Cell 155, 934-947.
Hon, G.C., Rajagopal, N., Shen, Y., McCleary, D.F., Yue, F., Dang, M.D., and Ren, B. (2013). Epigenetic memory at embryonic enhancers identified in DNA methylation maps from adult mouse tissues. Nat. Genet. 45, 1198-1206.
Irizarry, R.A., Ladd-Acosta, C., Wen, B., Wu, Z., Montano, C., Onyango, P., Cui, H., Gabo, K., Rongione, M., Webster, M., et al. (2009). The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat. Genet. 41, 178-186.
Ivics, Z., Hackett, P.B., Plasterk, R.H., and Izsvak, Z. (1997). Molecular reconstruction of Sleeping Beauty, a Tc1-like transposon from fish, and its transposition in human cells. Cell 91, 501-510.
Jaenisch, R., and Bird, A. (2003). Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals. Nat. Genet. 33 (Suppl), 245-254.
Jones, P.A. (2012). Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat. Rev. Genet. 13, 484-492.
Junker, J.P., and van Oudenaarden, A. (2014). Every cell is special: genome-wide studies add a new dimension to single-cell biology. Cell 157, 8-11.
Kantor, B., Kaufman, Y., Makedonski, K., Razin, A., and Shemer, R. (2004). Establishing the epigenetic status of the Prader-Willi/Angelman imprinting center in the gametes and embryo. Hum. Mol. Genet. 13, 2767-2779.
Kundaje, A., Meuleman, W., Ernst, J., Bilenky, M., Yen, A., Heravi-Moussavi, A., Kheradpour, P., Zhang, Z., Wang, J., Ziller, M.J., et al.; Roadmap Epigenomics Consortium (2015). Integrative analysis of 111 reference human epigenomes. Nature 518, 317-330.
Lee, H.J., Hore, T.A., and Reik, W. (2014).
Reprogramming the methylome: erasing memory and creating diversity. Cell Stem Cell 14, 710-719.
Li, E., Bestor, T.H., and Jaenisch, R. (1992). Targeted mutation of the DNA methyltransferase gene results in embryonic lethality. Cell 69, 915-926.
Li, Y., Rivera, C.M., Ishii, H., Jin, F., Selvaraj, S., Lee, A.Y., Dixon, J.R., and Ren, B. (2014). CRISPR reveals a distal super-enhancer required for Sox2 expression in mouse embryonic stem cells. PLoS ONE 9, e114485.
Mansour, A.A., Gafni, O., Weinberger, L., Zviran, A., Ayyash, M., Rais, Y., Krupalnik, V., Zerbib, M., Amann-Zalcenstein, D., Maza, I., et al. (2012). The H3K27 demethylase Utx regulates somatic and germ cell epigenetic reprogramming. Nature 488, 409-413.
Mummaneni, P., Walker, K.A., Bishop, P.L., and Turker, M.S. (1995). Epigenetic gene inactivation induced by a cis-acting methylation center. J. Biol. Chem. 270, 788-792.
Pawlak, M., and Jaenisch, R. (2011). De novo DNA methylation by Dnmt3a and Dnmt3b is dispensable for nuclear reprogramming of somatic cells to a pluripotent state. Genes Dev. 25, 1035-1040.
Rais, Y., Zviran, A., Geula, S., Gafni, O., Chomsky, E., Viukov, S., Mansour, A.A., Caspi, I., Krupalnik, V., Zerbib, M., et al. (2013). Deterministic direct reprogramming of somatic cells to pluripotency. Nature 502, 65-70.
Reik, W., Dean, W., and Walter, J. (2001). Epigenetic reprogramming in mammalian development. Science 293, 1089-1093.
Rhinn, M., and Dollé, P. (2012). Retinoic acid signalling during development. Development 139, 843-858.
Rivera, C.M., and Ren, B. (2013). Mapping human epigenomes. Cell 155, 39-55.
Sabag, O., Zamir, A., Keshet, I., Hecht, M., Ludwig, G., Tabib, A., Moss, J., and Cedar, H. (2014). Establishment of methylation patterns in ES cells. Nat. Struct. Mol. Biol. 21, 110-112.
Schultz, M.D., He, Y., Whitaker, J.W., Hariharan, M., Mukamel, E.A., Leung, D., Rajagopal, N., Nery, J.R., Urich, M.A., Chen, H., et al. (2015).
Human body epigenome maps reveal noncanonical DNA methylation variation. Nature 523, 212-216.
Smith, Z.D., and Meissner, A. (2013). DNA methylation: roles in mammalian development. Nat. Rev. Genet. 14, 204-220.
Smith, Z.D., Chan, M.M., Humm, K.C., Karnik, R., Mekhoubad, S., Regev, A., Eggan, K., and Meissner, A. (2014). DNA methylation dynamics of the human preimplantation embryo. Nature 511, 611-615.
Soufi, A., Donahue, G., and Zaret, K.S. (2012). Facilitators and impediments of the pluripotency reprogramming factors' initial engagement with the genome. Cell 151, 994-1004.
Stadler, M.B., Murr, R., Burger, L., Ivanek, R., Lienert, F., Schöler, A., van Nimwegen, E., Wirbelauer, C., Oakeley, E.J., Gaidatzis, D., et al. (2011). DNA-binding factors shape the mouse methylome at distal regulatory regions. Nature 480, 490-495.
Turker, M.S. (2002). Gene silencing in mammalian cells and the spread of DNA methylation. Oncogene 21, 5388-5393.
Vidal, S.E., Amlani, B., Chen, T., Tsirigos, A., and Stadtfeld, M. (2014). Combinatorial modulation of signaling pathways reveals cell-type-specific requirements for highly efficient and synchronous iPSC reprogramming. Stem Cell Reports 3, 574-584.
Whyte, W.A., Orlando, D.A., Hnisz, D., Abraham, B.J., Lin, C.Y., Kagey, M.H., Rahl, P.B., Lee, T.I., and Young, R.A. (2013). Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell 153, 307-319.
Xie, W., Schultz, M.D., Lister, R., Hou, Z., Rajagopal, N., Ray, P., Whitaker, J.W., Tian, S., Hawkins, R.D., Leung, D., et al. (2013). Epigenomic analysis of multilineage differentiation of human embryonic stem cells. Cell 153, 1134-1148.
Zhou, H.Y., Katsman, Y., Dhaliwal, N.K., Davidson, S., Macpherson, N.N., Sakthidevi, M., Collura, F., and Mitchell, J.A. (2014). A Sox2 distal enhancer cluster regulates embryonic stem cell differentiation potential. Genes Dev. 28, 2699-2711.
Ziller, M.J., Gu, H., Müller, F., Donaghey, J., Tsai, L.T., Kohlbacher, O., De Jager, P.L., Rosen, E.D., Bennett, D.A., Bernstein, B.E., et al. (2013). Charting a dynamic DNA methylation landscape of the human genome. Nature 500, 477-481.

Supplemental Experimental Procedures

Plasmid Cloning

To clone the PiggyBac-Insulator-GapdhCGI-Snrpn-GFP-polyA-PGK-PURO-sv40PolyA-Insulator construct, the minimal Snrpn promoter was PCR amplified using primers A1 and A2 (see complete primer list below). The Snrpn PCR fragment was subsequently digested with MfeI and NheI restriction enzymes. The GapdhCGI sequence was PCR amplified using primers A3 and A4, followed by digestion with SbfI and MfeI. A pCR2.1-TOPO-TA cloning vector (Life Technologies) containing a GFP-PolyA-PGK-Puro cassette was digested with SbfI and NheI. Subsequently, these three DNA fragments were joined by three-way ligation. The resulting GapdhCGI-Snrpn-GFP-PolyA-PGK-Puro cassette was then cloned into a PiggyBac transposon using the restriction enzymes SbfI and SacII to generate the PiggyBac-Insulator-GapdhCGI-Snrpn-GFP-polyA-PGK-PURO-sv40PolyA-Insulator vector. For the PiggyBac-Insulator-DazlCGI-Snrpn-GFP-polyA-PGK-PURO-sv40PolyA-Insulator construct, the same method was used, except that the DazlCGI DNA fragment was PCR amplified using primers A5 and A6. To clone the miR290 super-enhancer (SE) targeting vector, the 5' homology arm was PCR amplified using primers B1 and B2, and this DNA fragment was then digested with SbfI and MfeI restriction enzymes. The 3' homology arm was PCR amplified using primers B3 and B4, followed by digestion with AscI and FseI restriction enzymes. Both homology arms were subsequently ligated with a Snrpn-tdTomato-PolyA-PGK-Puro fragment that had been digested with NheI and AscI restriction enzymes, and a pCR2.1-TOPO-TA cloning vector (Life Technologies) backbone that had been digested with SbfI and FseI.
To clone the Sox2 SE targeting vector, the same method was used, except that the 5' homology arm was amplified using primers C1 and C2, and the 3' homology arm was amplified using primers C3 and C4. CRISPR oligonucleotides were ligated into the px330 vector using the BbsI restriction site as previously described (Wang et al., 2013). For the miR290 SE region, oligonucleotides D3 and D4 were used, and for the Sox2 SE region, oligonucleotides D1 and D2 were used (see complete primer list below).

Bisulfite Conversion, PCR and Sequencing

Bisulfite conversion of DNA was performed using the EpiTect Bisulfite Kit (Qiagen) following the manufacturer's instructions. The resulting modified DNA was amplified by a first round of nested PCR, followed by a second round using locus-specific PCR primers (see complete list of primers below). The first round of nested PCR was done as follows: (1) 94 °C for 4 min; (2) 55 °C for 2 min; (3) 72 °C for 2 min; (4) repeat steps 1-3 1X; (5) 94 °C for 1 min; (6) 55 °C for 2 min; (7) 72 °C for 2 min; (8) repeat steps 5-7 35X; (9) 72 °C for 5 min; (10) hold at 12 °C. The second round of PCR was as follows: (1) 95 °C for 4 min; (2) 94 °C for 1 min; (3) 55 °C for 2 min; (4) 72 °C for 2 min; (5) repeat steps 2-4 35X; (6) 72 °C for 5 min; (7) hold at 12 °C. The resulting amplified products were gel-purified, subcloned into a pCR2.1-TOPO-TA cloning vector (Life Technologies), and sequenced.
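The methylation percentages reported alongside the bisulfite lollipop diagrams are tallies of CpG calls across sequenced clones. The following is a minimal sketch of that tally in Python, not the analysis pipeline used in this work. It assumes top-strand reads that are already aligned to the unconverted reference without gaps, and all function names and the toy sequence are hypothetical. After conversion and PCR, an unmethylated C reads as T, while a methylated C in a CpG is retained as C.

```python
def cpg_positions(reference):
    """Return 0-based indices of the C in every CpG dinucleotide."""
    return [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]


def bisulfite_convert(reference, methylated):
    """Simulate a converted top-strand clone: every C that is not in the
    `methylated` index set reads as T after conversion and PCR."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(reference)
    )


def call_cpgs(reference, read):
    """For each CpG in the reference, return True if the clone kept a C
    (methylated) and False if it reads T (unmethylated)."""
    if len(read) != len(reference):
        raise ValueError("this sketch assumes a gap-free, full-length read")
    calls = []
    for i in cpg_positions(reference):
        if read[i] not in "CT":
            raise ValueError(f"unexpected base {read[i]!r} at CpG position {i}")
        calls.append(read[i] == "C")
    return calls


def percent_methylated(reference, reads):
    """Percentage of methylated CpG calls pooled over all clones."""
    calls = [call for read in reads for call in call_cpgs(reference, read)]
    return 100.0 * sum(calls) / len(calls)


# Toy reference (hypothetical) with three CpGs and one non-CpG cytosine.
reference = "ACGTTCGACCGA"
clones = [
    bisulfite_convert(reference, {1, 5, 9}),  # fully methylated clone
    bisulfite_convert(reference, set()),      # fully unmethylated clone
    bisulfite_convert(reference, {1}),        # one methylated CpG
]
print(f"{percent_methylated(reference, clones):.1f}% of CpGs methylated")
```

Each clone corresponds to one row of a lollipop diagram, and pooling calls over rows gives the single percentage shown per region. Real reads would first need alignment and a check that non-CpG cytosines were fully converted.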
Primer List - cloning
A1 snrpnF-mfe AATTAACAATTGACGCTCAAATTTCCGCAGTAGG
A2 snrpnR-nhe AATTAAGCTAGCAGAATCCACAAGCCCAGCTG
A3 gapdhF-sbf AATTAACCTGCAGGAGCCGAGAGGAATGAGGTTAGTC
A4 gapdhR-mfe AATTAACAATTGGAGAGAGGCCCAGCTACTCG
A5 dazlF-sbf AATTAACCTGCAGGTTATGCCCTCTCCCCACTTCTC
A6 dazlR-mfe AATTAACAATTGCCAAGCACCCTACAGCTCG
B1 miR290-5'arm F AATAACCTGCAGGGATACTGTGTCTGGGGAGAAAGC
B2 miR290-5'arm R AATTAACAATTGATACGGGAAGGAGTGCCGGG
B3 miR290-3'arm F AATTAAGGCGCGCCCAGCTCTGAAATCTGCAGAGCTG
B4 miR290-3'arm R AATTAAGGCCGGCCGGCATTTGCCACTATGCCTGC
C1 Sox2-5'arm F AATTAACCTGCAGGCCGGGGTTTCCTGATCTCTTGC
C2 Sox2-5'arm R AATTAACAATTGTCTGGCTCGGAAAGCTGGG
C3 Sox2-3'arm F AATTAAGGCGCGCCGGAGGGGGCTGCATTCTCAG
C4 Sox2-3'arm R AATTAAGGCCGGCCGCTACGAAACAGGTTCGAGACC
D1 SOX2-SE Crispr CACCGCCAGCTTTCCGAGCCAGATG
D2 SOX2-SE Crispr AAACCATCTGGCTCGGAAAGCTGGC
D3 miR290-EN2 Crispr CACCGCAGATTTCAGAGCTGATAC
D4 miR290-EN2 Crispr AAACGTATCAGCTCTGAAATCTGC
Dazl F - 5'arm AATTAACCTGCAGGTTATGCCCTCTCCCCACTTCTC
Dazl R - 5'arm AATTAACAATTGCCAAGCACCCTACAGCTCG
Dazl F - 3'arm AAATTAGGCGCGCCTGGAGATAACCTTACGGCAGAACC
Dazl R - 3'arm AAATTAGGCCGGCCCGCCAAACTTGGAGAGCGC
Dazl F Crispr CACCGCCGAGCTGTAGGGTGCTTGG
Dazl R Crispr AAACCCAAGCACCCTACAGCTCGGC
Gapdh F - 3'arm AATTAAGGCGCGCCTTTTTGAAATGTGCACGCACCAAGC
Gapdh R - 3'arm AATTAAGGCCGGCCCTCTCAGGTTCCGAGGAGGG
Gapdh F - 5'arm AATTAACCTGCAGGAGCCGAGAGGAATGAGGTTAGTC
Gapdh R - 5'arm AATTAACAATTGGAGAGAGGCCCAGCTACTCG
Gapdh F Crispr CACCGCGTGCACATTTCAAAAATG
Gapdh R Crispr AAACCATTTTTGAAATGTGCACGC

Primer List - Bisulfite
GFP Nested R CTCGACCAAAATAAACACCACCCC
Dazl Nested F GAAG1TUTGTGAAATAAGTTTTGGGTAGG
Dazl F CGATTAGAGAGTAGG1TTTGTTTGG
Dazl R CGTCAATTACCAAACACCCTACAAC
Dazl-Snrpn F CGAGTTGTAGGGTGTTTGGTAATTG
Dazl-Snrpn R ACGTTACAAATCACTCCTCAAAACC
Gapdh Nested F GGTTGTAGGAGAAGAAAATGAGATTAG
Gapdh F GGTTGTAGGAGAAGAAAATGAGATTAG
Gapdh R ACGTCAATTAAAAAAAAACCCAACTAC
Gapdh-Snrpn F TAGTTTAAGGGCGTAGAGGTTTGAG
Gapdh-Snrpn R ACGTTACAAATCACTCCTCAAAACC
miR290 Nested F GAGGGGA 1TT1TGGGGTAGAG
miR290 Nested R CCCTTACTCACCATACTAACAAAATCC
miR290-Snrpn F GAT1T1TTGGGGTAGAGGTAGGTGTG
miR290-Snrpn R CCACAAACCCAACTAACCTTCCTC
Sox2 Nested F GTGGTTGTTGTGTTTAGTATGTGGG
Sox2 Nested R CCCTTACTCACCATACTAACAAAATCC
Sox2-Snrpn F GGTTGTTGTGTTTAGTATGTGGGTT
Sox2-Snrpn R CCACAAACCCAACTAACCTTCC

Chapter 5. Future directions

CRISPR-Cas9 genome-engineering in mice

In Chapters 2 and 3 of this thesis we presented work showing that CRISPR/Cas9 can be used to efficiently engineer the mouse genome. Since this work was completed, the technology has been substantially improved. One of the biggest questions that remained after the completion of this work was the potential for off-target effects when using CRISPR/Cas9 for genetic engineering. However, recent whole-genome sequencing of CRISPR/Cas9-targeted human iPSCs and of mice generated from Cas9-injected zygotes showed that there are very few off-target effects (Smith et al., 2014; Veres et al., 2014; Iyer et al., 2015). In addition, the CRISPR/Cas9 technology has been improved to further decrease off-target effects. A Cas9 nickase, dCas9 fused to FokI, and shorter gRNAs have each been shown to decrease unwanted off-target effects (Fujii et al., 2014; Ran et al., 2013; Tsai et al., 2014; Fu et al., 2014). Another area of our methodology that could be improved is the efficiency of HDR for larger dsDNA targeting vectors. The efficiency of HDR has been increased by two different methods. The first showed that HDR efficiency could be increased by injecting Cas9 protein into the zygote instead of mRNA (Aida et al., 2015). The second showed that injecting the inhibitor SCR7 into the zygote, along with Cas9 mRNA, gRNA, and a dsDNA targeting vector or ssDNA oligos, could increase HDR (Maruyama et al., 2015). SCR7 inhibits DNA ligase IV, a critical component of the NHEJ pathway.
This increases the probability that the cell repairs a dsDNA break through HDR instead of NHEJ (Maruyama et al., 2015). Finally, one of the most difficult parts of our methodology for genome-engineering the mouse by zygotic injection of CRISPR/Cas9 is the injection itself. It requires a very skilled research scientist or technician to inject Cas9 mRNA, gRNA, and DNA into the single-cell zygote. To make CRISPR/Cas9 genome-engineering in the mouse more accessible, a new delivery system was developed in which Cas9 mRNA, gRNA, and DNA are electroporated into the single-cell zygote (Qin et al., 2015).

Future directions for RGM

In Chapter 4 of this thesis we presented research on the creation of a new technology called Reporter of Genomic Methylation (RGM) that can be used to investigate the dynamics of methylation at an endogenous locus in vivo. We further showed that this reporter was able to accurately report on the dynamics of methylation at a few important cis-regulatory elements such as promoters and super-enhancers. However, there is still much work to be done in investigating how the tool works, in improving the technology, and in addressing the biological questions that the technology can help answer.

Further characterization and improvement of RGM

One important question that needs to be answered is the correct positional targeting of the RGM technology into an endogenous DMR. Is there a specific distance at which the RGM has to be integrated? Does it have to be integrated into the DMR or very close to the DMR, or can it be inserted farther away? This most likely depends on how far methylation/demethylation can spread from the DMR, and could be locus-specific or be affected by other proximal sequence elements such as CTCF or SP1 binding sites.
Research has shown that methylation can spread from DMRs, and the fact that the RGM technology works demonstrates that methylation and demethylation can spread from DMRs into cis-proximal promoters (Turker, 2002; Irizarry et al., 2009). However, the distance over which methylation and demethylation can spread is currently unknown. Furthermore, transcription factors such as SP1 have been shown to block the spread of methylation (Brandeis et al., 1994; Macleod et al., 1994). Factors such as CTCF, which act as insulators and are important for 3D chromatin architecture, could also impede the spreading of methylation and demethylation (Herold et al., 2012). These questions could be investigated by targeting an endogenous DMR, such as the Dazl DMR or the miR290 super-enhancer DMR, with the RGM technology. In each targeting event the RGM would be integrated progressively further from the DMR boundary, which would give a better understanding of the distance over which methylation and demethylation can spread. Furthermore, repeating this at multiple endogenous DMRs, and bioinformatically aligning the genomic target sites with binding motifs for SP1, CTCF, other structural components, and transcription factors, would allow a set of parameters to be defined for targeting endogenous CpG islands/DMRs. This would indicate whether any specific sequence motifs affect the spread of methylation/demethylation. Another important question that needs to be addressed is the on/off rate of our synthetic Snrpn promoter. Is the methylation/demethylation of the synthetic promoter rapid or slow? Does it vary at different loci? Does it vary when the distance between the synthetic promoter and the DMR it is reporting on changes? These are very complex questions to answer and control for, but they will have to be investigated in order to build a framework for what the technology can be used for.
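One reason the on/off rate is hard to read directly from fluorescence is that reporter protein made before silencing persists after the promoter switches off. The sketch below illustrates this in Python under strong simplifying assumptions: first-order protein decay only, no dilution by cell division, no mRNA lag, and instantaneous silencing. The function names and both half-life values are hypothetical and are not measurements from this work.

```python
import math


def remaining_fraction(hours_since_silencing, half_life_hours):
    """Fraction of the pre-silencing reporter signal still present,
    assuming simple first-order decay of the reporter protein."""
    return 0.5 ** (hours_since_silencing / half_life_hours)


def hours_to_fall_below(threshold_fraction, half_life_hours):
    """Time after silencing at which the signal drops to the given
    fraction of its starting value."""
    return half_life_hours * math.log2(1.0 / threshold_fraction)


# Illustrative half-lives only (assumptions, not measured values).
STABLE_REPORTER_HALF_LIFE_H = 24.0
DESTABILIZED_REPORTER_HALF_LIFE_H = 2.0

for label, half_life in [
    ("stable reporter", STABLE_REPORTER_HALF_LIFE_H),
    ("destabilized reporter", DESTABILIZED_REPORTER_HALF_LIFE_H),
]:
    lag = hours_to_fall_below(0.10, half_life)
    print(f"{label}: about {lag:.1f} h to fall below 10% of the starting signal")
```

Under these assumptions the apparent off-rate cannot be resolved faster than a few reporter half-lives, so a shorter half-life narrows the gap between promoter silencing and loss of signal.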
One approach that will help in the investigation of the on/off rate is to clone the synthetic SNRP promoter upstream of an unstable GFP or luciferase reporter. Because of the short half-life of these proteins, the expression signal will track promoter activity more closely (Solberg and Krauss, 2013; Li et al., 1998).

Another interesting question that was not resolved in the initial study is which CpGs in the synthetic promoter are important for controlling the on/off state when it is methylated/demethylated. The synthetic SNRP promoter is 284 bp long and contains 16 CpG dinucleotides. This question could therefore be addressed by site-directed mutagenesis of each of the CpGs, to see which are important for regulating the on/off state of this synthetic minimal promoter. Research on the HPRT promoter, which is also regulated by methylation, showed that only 3 CpGs were critical for turning off the promoter when methylated (Chen et al., 2001). This information could potentially be used to make a more sensitive variant of the synthetic SNRP promoter.

Finally, it will be interesting to investigate whether the synthetic SNRP promoter can be used in species other than the mouse. The synthetic SNRP promoter was initially designed by taking the core homologous sequence elements between the human and mouse SNRP promoters. Therefore, it is highly probable that the RGM technology will work just as efficiently in human cells, especially because the methylation machinery is also highly conserved between humans and mice (Smith and Meissner, 2013). However, it will be interesting to see if the synthetic SNRP promoter will work in more evolutionarily divergent model organisms, such as bacteria, zebrafish, fruit flies, and plants.
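The CpG-by-CpG mutagenesis proposed above for the synthetic promoter can be planned computationally before any cloning. The following is a minimal sketch, assuming that each CpG is disrupted by a C-to-T substitution (one possible design choice, not one specified in this thesis). The function names and the toy sequence are hypothetical; the real 284 bp promoter sequence is not reproduced here.

```python
def cpg_positions(seq):
    """Return the 0-based index of the C in every CpG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def single_cpg_mutants(seq, replacement="T"):
    """Yield (position, mutant_sequence) pairs with exactly one CpG disrupted.

    Assumption: the C of the CpG is replaced by `replacement` (default T),
    which removes the methylatable site without changing sequence length.
    """
    seq = seq.upper()
    for pos in cpg_positions(seq):
        yield pos, seq[:pos] + replacement + seq[pos + 1:]


# Toy placeholder sequence, NOT the real synthetic promoter.
toy_promoter = "ATCGGCGTTACGA"
mutants = list(single_cpg_mutants(toy_promoter))
for pos, mutant in mutants:
    print(pos, mutant, "remaining CpGs:", len(cpg_positions(mutant)))
```

Applied to the real promoter, this would list one mutant construct per CpG, each of which could then be tested for its on/off behavior.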
Furthermore, if the synthetic SNRP promoter does not work in such divergent model organisms, it will be interesting to investigate whether alternative synthetic promoters can be made for these species using similar design principles.

Applications for RGM in imprinting, screening and cancer

There are multiple biological questions to answer using the RGM technology. For brevity, this discussion will look at a few of the more discernible ones. These include questions in imprinting, locus-specific methylation and demethylation, and how methylation regulates tumor suppressor and oncogene expression.

For imprinting, there are multiple questions that can be addressed using the RGM technology. First of all, it will be interesting to see if the RGM technology can be used to faithfully report on parent-of-origin imprints. This can be studied by making a mouse model that has the RGM reporter integrated into an imprinted DMR such as H19, IGF2, or DLK. In these mice the RGM reporter activity should depend on whether the allele is inherited from the mother or the father (Smallwood and Kelsey, 2012). Furthermore, these mice could be used to investigate whether there is heterogeneous loss or gain of imprints in somatic tissues. Research on macaque monkeys has indicated that there might be heterogeneous loss of imprinting at IGF2 and DLK in some somatic tissues (Cheong et al., 2015). In addition, it has been reported that there is loss of imprinting in mice at the DLK locus in niche astrocytes (Ferron et al., 2011). Another interesting question that can be addressed by using the RGM technology to target imprinted regions is whether there is heterogeneous loss or gain of methylation at these loci in mESCs and IPSCs. Some research indicates that ESCs and IPSCs heterogeneously lose their imprint at some loci and that this aberrant loss of methylation affects their developmental potential (Stadtfeld et al., 2012; Dean et al., 1998; Sun et al., 2012).
Heterogeneity of methylation at imprinted loci in mouse ES or IPS cell lines can be investigated by targeting both alleles of an imprinted locus, such as DLK-DIO3, with RGM reporters that express two different fluorescent markers. The developmental potential of the ESCs or IPSCs can then be investigated by FACS sorting the negative, single-positive, and double-positive cells, and subjecting these different populations to tetraploid complementation or blastocyst injections.

Another interesting application for the RGM technology is to use it to genetically screen for factors that are important for locus-specific methylation during development. Although it is known that the DNMTs are the enzymes that methylate DNA, less is known about how these enzymes are recruited to specific loci such as super-enhancers or promoters during development. A search for the factors that recruit DNMTs to specific super-enhancers could be carried out using the two cell lines, SOX2#2 and mi290#21, that were reported in chapter 4 of this thesis. A genome-wide CRISPR/Cas9 lentivirus library could be used to infect a population of these cells in the ES cell state (Shalem et al., 2014; Wang et al., 2014). These cells could then be subjected to in vitro differentiation. As was reported in chapter 4, when these cells differentiate the mi290 and Sox2 super-enhancers become methylated and the RGM reporter shuts off. However, the RGM reporter should remain active in any cell that has lost a factor necessary for recruiting DNMTs to the mi290 or Sox2 super-enhancer. After differentiation, the remaining fluorescent cells can be isolated by FACS, and the gene knockout of interest can be identified by sequencing the integrated CRISPR gRNA.

As was noted in chapter 1 of this thesis, many genes in cancer become deregulated by changes that occur in the methylation state of their promoter elements.
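The readout of the screen proposed above, in which the gRNAs integrated in the cells that remain fluorescent are sequenced, could be analyzed along the following lines. This is a minimal sketch rather than an established pipeline: it assumes that sequencing reads have already been assigned to gRNA identifiers, and it ranks guides by the log2 ratio of their read fraction in the sorted fluorescent population to that in the unsorted population. The guide names, read counts, and the pseudocount are all hypothetical.

```python
import math
from collections import Counter


def log2_enrichment(sorted_reads, unsorted_reads, pseudocount=1.0):
    """Log2 ratio of each gRNA's read fraction in sorted vs. unsorted cells.

    A pseudocount is added to every guide so that guides absent from one
    population do not cause a division by zero.
    """
    sorted_counts = Counter(sorted_reads)
    unsorted_counts = Counter(unsorted_reads)
    guides = set(sorted_counts) | set(unsorted_counts)
    sorted_total = sum(sorted_counts.values()) + pseudocount * len(guides)
    unsorted_total = sum(unsorted_counts.values()) + pseudocount * len(guides)
    scores = {}
    for guide in guides:
        frac_sorted = (sorted_counts[guide] + pseudocount) / sorted_total
        frac_unsorted = (unsorted_counts[guide] + pseudocount) / unsorted_total
        scores[guide] = math.log2(frac_sorted / frac_unsorted)
    return scores


# Hypothetical read assignments: one gRNA identifier per sequencing read.
unsorted_reads = ["geneA_g1"] * 10 + ["geneB_g1"] * 10 + ["geneC_g1"] * 10
sorted_reads = ["geneA_g1"] * 18 + ["geneB_g1"] + ["geneC_g1"]

scores = log2_enrichment(sorted_reads, unsorted_reads)
top_guide = max(scores, key=scores.get)
print(top_guide, round(scores[top_guide], 2))
```

Guides with a strongly positive score would be candidates for targeting factors required to silence the reporter; in practice several independent guides per gene and replicate screens would be needed before drawing conclusions.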
These changes include loss of methylation at the promoters of oncogenes and gain of methylation at the promoters of tumor suppressor genes. Furthermore, it was described how loss of imprinting (LOI) at certain imprinted loci, such as the IGF2 locus, can also lead to cancer in multiple tissue types such as the lung and colon (Moulton et al., 1994; Steenman et al., 1994). It would be interesting to target the RGM technology to one of these important differentially methylated regions in a relevant cancer model/cell type. One locus of particular interest would be the imprinted DMR that regulates IGF2 in lung and colon cancer. If LOI were verified in this cancer model, it could be of therapeutic benefit to use this RGM-targeted cancer cell model to screen for small molecules that restore normal imprinting at this locus.

Concluding remarks

The mouse model organism has had a tremendous impact on biological research, allowing for the elucidation of many fundamental biological questions that are important not just for basic science but for human health. Future research using the mouse model organism will continue along this path. It is the hope of this author that the research presented in this thesis will help, even in a small way, future researchers in this endeavor.

References

Brandeis M, Frank D, Keshet I, Siegfried Z, Mendelsohn M, Nemes A, Temper V, Razin A, Cedar H. (1994) Sp1 elements protect a CpG island from de novo methylation. Nature 371: 435-438.

Cheong C, Chng K, Ng S, Chew SB, Chan L, Ferguson-Smith AC. (2015) Germline and somatic imprinting in the nonhuman primate highlights species differences in oocyte methylation. Genome Research. 25: 611-623

Dean W, Bowden L, Aitchison A. (1998) Altered imprinted gene methylation and expression in completely ES cell-derived mouse fetuses: Association with aberrant phenotypes. Development.
125:2273-2282

Ferron S, Charalambous M, Radford E, McEwen K, Wildner H, Hind E, Redolat JM, Laborda J, Guillemot F, Bauer S, Farinas I, Ferguson-Smith AC. (2011) Postnatal loss of Dlk1 imprinting in stem cells and niche astrocytes regulates neurogenesis. Nature. 475: 381-385

Fujii W, Onuma A, Sugiura K, Naito K. (2014) Efficient generation of genome-modified mice via offset-nicking by CRISPR/Cas system. Biochem. Biophys. Res. Commun. 445: 791-794.

Fu Y, Sander JD, Reyon D, Cascio VM, Joung JK. (2014) Improving CRISPR-Cas nuclease specificity using truncated guide RNAs. Nat. Biotechnol. 32:279-284.

Herold M, Bartkuhn M, Renkawitz R. (2012) CTCF: Insights into insulator function during development. Development. 139: 1045-1057

Irizarry RA, Ladd-Acosta C, Wen B, Wu Z, Montano C, Onyango P, Cui H, Gabo K, Rongione M, Webster M. (2009) The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat Genet 41: 178-186.

Iyer V, Shen B, Zhang W, Hodgkins A, Keane T, Huang X, Skarnes W. (2015) Off-target mutations are rare in Cas9-modified mice. Nature Methods. 12:479

Li X, Fang Y, Jiang X, Duong T, Fan C, Huang C, Kain S. (1998) Generation of destabilized green fluorescent protein as a transcription reporter. The Journal of Biological Chemistry. 273(52): 34970-34976

Macleod D, Charlton J, Mullins J, Bird AP. (1994) Sp1 sites in the mouse aprt gene promoter are required to prevent methylation of the CpG island. Genes Dev 8: 2282-2292.

Maruyama T, Dougan SK, Truttmann MC, Bilate AM, Ingram JR, Ploegh HL. (2015) Increasing the efficiency of precise genome editing with CRISPR-Cas9 by inhibition of nonhomologous end joining. Nat Biotechnol. 33: 538-542

Moulton, T. et al. (1994) Epigenetic lesions at the H19 locus in Wilms' tumor patients. Nature Genet. 7, 440-447

Qin W, Dion S, Kutny P, Zhang Y, Cheng A, Jillette NL, Malhotra A, Geurts AM, Chen YG, Wang H.
(2015) Efficient CRISPR/Cas9-mediated genome editing in mice by zygote electroporation of nuclease. Genetics. 200(2): 423-430

Ran FA, Hsu PD, Lin CY, Gootenberg JS, Konermann S, Zhang F. (2013) Double nicking by RNA-guided CRISPR Cas9 for enhanced genome editing specificity. Cell 154: 1380-1389.

Shalem O, Sanjana NE, Hartenian E, Shi X, Scott DA, Mikkelsen T, Heckl D, Ebert BL, Root DE, Doench JG, Zhang F. (2014) Genome-scale CRISPR-Cas9 knockout screening in human cells. Science. 343(6166): 84-87

Smallwood SA, Kelsey G. (2012) De novo DNA methylation: a germ cell perspective. Trends in Genetics. 28(1): 33-42

Smith C, Gore A, Yan W, Abalde-Atristain L, Li Z. (2014) Whole-genome sequencing analysis reveals high specificity of CRISPR/Cas9 and TALEN-based genome editing in human iPSCs. Cell Stem Cell 15: 12-13.

Smith ZD, Meissner A. (2013) DNA methylation: Roles in mammalian development. Nat Rev Genet 14: 204-220.

Solberg N, Krauss S. (2013) Luciferase assay to study the activity of a cloned promoter DNA fragment. Methods Mol Biol. 977:65-78

Stadtfeld M, Apostolou E, Ferrari F, Choi J, Walsh RM, Chen T, Ooi S, Kim S, Bestor TH, Shioda T, Park PJ, Hochedlinger K. (2012) Ascorbic acid prevents loss of Dlk1-Dio3 imprinting and facilitates generation of all-iPS cell mice from terminally differentiated B cells. Nature Genetics. 44(4): 398-405

Steenman, MJC. et al. (1994) Loss of imprinting of IGF2 is linked to reduced expression and abnormal methylation of H19 in Wilms' tumor. Nature Genet. 7, 433-439

Sun B, Ito M, Mendjan S, Ito Y, Brons GM, Murrell A, Vallier L, Ferguson-Smith AC, Pedersen RA. (2012) Status of genomic imprinting in epigenetically distinct pluripotent stem cells. Stem Cells. 30: 161-168

Tsai SQ, Wyvekens N, Khayter C, Foden JA, Thapar V, Reyon D, Goodwin MJ, Aryee MJ, Joung JK. (2014) Dimeric CRISPR RNA-guided FokI nucleases for highly specific genome editing. Nat. Biotechnol. 32:569-576.

Turker MS.
(2002) Gene silencing in mammalian cells and the spread of DNA methylation. Oncogene 21: 5388-5393.

Veres A, Gosis BS, Ding Q, Collins R, Ragavendran A. (2014) Low incidence of off-target mutations in individual CRISPR-Cas9 and TALEN targeted human stem cell clones detected by whole-genome sequencing. Cell Stem Cell 15: 27-30.

Wang T, Wei JJ, Sabatini DM, Lander ES. (2014) Genetic screens in human cells using the CRISPR-Cas9 system. Science. 343(6166): 80-84

Curriculum vitae

Chikdu Shakti Shivalila

Degree
* PhD, Biological Sciences: Massachusetts Institute of Technology. Expected graduation date: December 2015
* B.S., Biological Sciences: University of Pittsburgh. Graduation date: December 2009. Undergraduate GPA: 3.92

Current Occupation
* Fourth year graduate student in the Department of Biology at the Massachusetts Institute of Technology. August 2011-Present

Laboratory Experience
* Position: Graduate student in Dr. Rudolf Jaenisch's lab at MIT/Whitehead Institute of Biomedical Sciences (May 2012-Present)
* Research: Genome engineering, human and mouse stem cell biology, epigenetics, neuroscience

In Dr. Rudolf Jaenisch's lab I have focused my research on technology development and synthetic biology, specifically in the areas of genome engineering and epigenetic engineering. My earlier work consisted of developing a rapid way to genetically engineer the mouse model organism and mouse stem cells using TALENs and CRISPR/Cas. In addition, I have worked on building protocols to efficiently genetically engineer human stem cells using CRISPR/Cas. More recently, I have focused on developing a technology that can accurately report on DNA methylation/demethylation dynamics at single-cell resolution. Furthermore, I have used this technology to understand DNA methylation dynamics at interesting genomic loci such as super-enhancers, imprinted regions, and promoters.

* Position: Research specialist II/Lab manager in Dr.
James Pipas's lab at the University of Pittsburgh (January 2009 - August 2011)
* Research: Virology, cancer biology, cell biology, biochemistry, epigenetics

In Dr. James Pipas's laboratory my research was focused on the SV40 polyomavirus, specifically the SV40 large TAg and how this oncoprotein causes cellular transformation through epigenetic deregulation.

Publications
* Stelzer Y*, Shivalila CS* (co-first author), Soldner F, Markoulaki S, Jaenisch R. Tracing dynamic changes of DNA methylation at single cell resolution. Cell. 2015 Sep 24;163(1):218-229
* Wang H*, Yang H*, Shivalila CS* (co-first author), Dawlaty MM, Cheng AW, Zhang F, Jaenisch R. One-step generation of mice carrying mutations in multiple genes by CRISPR/Cas-mediated genome engineering. Cell. 2013 May 9;153(4):910-8
* Yang H*, Wang H*, Shivalila CS* (co-first author), Cheng AW, Shi L, Jaenisch R. One-step generation of mice carrying reporter and conditional alleles by CRISPR/Cas-mediated genome engineering. Cell. 2013 Sep 12;154(6):1370-9
* Cheng AW, Wang H, Yang H, Shi L, Katz Y, Theunissen TW, Rangarajan S, Shivalila CS, Dadon DB, Jaenisch R. Multiplexed activation of endogenous genes by CRISPR-on, an RNA-guided transcriptional activator system. Cell Res. 2013 Oct;23(10):1163-71.
* Wang H, Hu YC, Markoulaki S, Welstead GG, Cheng AW, Shivalila CS, Pyntikova T, Dadon DB, Voytas DF, Bogdanove AJ, Page DC, Jaenisch R. TALEN-mediated editing of the mouse Y chromosome. Nat Biotechnol. 2013 Jun;31(6):530-2.
* Sáenz Robles MT, Shivalila CS, Wano J, Sorrells S, Roos A, Pipas JM. Two independent regions of simian virus 40 T antigen increase CBP/p300 levels, alter patterns of cellular histone acetylation, and immortalize primary cells. J Virol. 2013 Dec;87(24):13499-5

Teaching Experience
* Teaching assistant: 7.01 (introductory biology), MIT. Fall 2012
* Teaching assistant: 7.02 (intro to Exp biology and comm), MIT. Spring 2015

References

Dr.
Rudolf Jaenisch, MIT/Whitehead Institute of Biomedical Sciences, 9 Cambridge Center, Cambridge, MA 02142. Lab phone: 617-258-5189

Dr. David Bartel, MIT/Whitehead Institute of Biomedical Sciences, 9 Cambridge Center, Cambridge, MA 02142. Lab phone: 617-258-5287

Dr. Piyush Gupta, MIT/Whitehead Institute of Biomedical Sciences, 9 Cambridge Center, Cambridge, MA 02142. Lab phone: 617-324-0086

Dr. James Pipas, University of Pittsburgh, Department of Biological Sciences, 559B Crawford Hall, 4249 Fifth Avenue, Pittsburgh, PA 15260. Lab phone: (412) 624-4691