dc.contributor.advisor | Bonnie Berger. | en_US |
dc.contributor.author | Hie, Brian. | en_US |
dc.contributor.other | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. | en_US |
dc.date.accessioned | 2019-07-17T20:59:03Z | |
dc.date.available | 2019-07-17T20:59:03Z | |
dc.date.copyright | 2019 | en_US |
dc.date.issued | 2019 | en_US |
dc.identifier.uri | https://hdl.handle.net/1721.1/121734 | |
dc.description | Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019 | en_US |
dc.description | Cataloged from PDF version of thesis. | en_US |
dc.description | Includes bibliographical references (pages 57-65). | en_US |
dc.description.abstract | Researchers are generating single-cell RNA sequencing (scRNA-seq) profiles of diverse biological systems [1]-[7] and every cell type in the human body [8] at an unprecedented scale, with scRNA-seq experiments regularly profiling gene expression in hundreds of thousands or even millions of cells [9]. Leveraging this data to gain new insight into biology and disease requires algorithms that can scale to the amount of data being generated and can integrate information across multiple experiments, laboratories, and technologies. Here, we present two algorithms that aim to help researchers gain better insight from scRNA-seq data sets. The first, Scanorama, is inspired by algorithms for panorama stitching and achieves accurate integration of heterogeneous scRNA-seq data sets; we use it to integrate a number of large and complex collections of data sets. The second, geometric sketching, is a sampling approach that aims to evenly cover the low-dimensional manifold spanned by the cells. Compared with uniform subsampling, in which each cell is selected with equal probability, it captures more of the rare transcriptional structure and yields sketches that better reflect the transcriptional heterogeneity of the original data. Moreover, geometric sketching can be used to improve the computational efficiency of algorithms for single-cell integration, including Scanorama. We anticipate that both algorithms will play an important role in the analysis and interpretation of large-scale single-cell transcriptomic data sets. | en_US |
dc.description.statementofresponsibility | by Brian Hie. | en_US |
dc.format.extent | 111 pages | en_US |
dc.language.iso | eng | en_US |
dc.publisher | Massachusetts Institute of Technology | en_US |
dc.rights | MIT theses are protected by copyright. They may be viewed, downloaded, or printed from this source but further reproduction or distribution in any format is prohibited without written permission. | en_US |
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US |
dc.subject | Electrical Engineering and Computer Science. | en_US |
dc.title | Stitching and sketching large-scale single-cell transcriptomic data | en_US |
dc.type | Thesis | en_US |
dc.description.degree | S.M. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | en_US |
dc.identifier.oclc | 1102049692 | en_US |
dc.description.collection | S.M. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science | en_US |
dspace.imported | 2019-07-17T20:59:01Z | en_US |
mit.thesis.degree | Master | en_US |
mit.thesis.department | EECS | en_US |