Dimensionality reduction in immunology : from viruses to cells

Karthik Shekhar

dc.contributor.advisor	Arup K. Chakraborty.	en_US
dc.contributor.author	Karthik Shekhar	en_US
dc.contributor.other	Massachusetts Institute of Technology. Department of Chemical Engineering.	en_US
dc.date.accessioned	2015-09-02T15:28:42Z
dc.date.available	2015-09-02T15:28:42Z
dc.date.copyright	2014	en_US
dc.date.issued	2015	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/98339
dc.description	Thesis: Ph. D., Massachusetts Institute of Technology, Department of Chemical Engineering, February 2015.	en_US
dc.description	Cataloged from PDF version of thesis.	en_US
dc.description	Includes bibliographical references (pages 301-318).	en_US
dc.description.abstract	Developing successful prophylactic and therapeutic strategies against infections of RNA viruses like HIV requires a combined understanding of the evolutionary constraints of the virus, as well as of the immunologic determinants associated with effective viremic control. Recent technologies enable viral and immune parameters to be measured at an unprecedented scale and resolution across multiple patients, and the resulting data could be harnessed towards these goals. Such datasets typically involve a large number of parameters; the goal of analysis is to infer underlying biological relationships that connect these parameters by examining the data. This dissertation combines principles and techniques from the physical and the computational sciences to "reduce the dimensionality" of such data in order to reveal novel biological relationships of relevance to vaccination and therapeutic strategies. Much of our work is concerned with HIV. 1. How can collective evolutionary constraints be inferred from viral sequences derived from infected patients? Using principles of Random Matrix Theory, we derive a low dimensional representation of HIV proteins based on circulating sequence data and identify independent groups of residues within viral proteins that are coordinately linked. One such group of residues within the polyprotein Gag exhibits statistical signatures indicative of strong constraints that limit the viability of a higher proportion of strains bearing multiple mutations in this group. We validate these predictions from independent experimental data, and based on our results, propose candidate immunogens for the Caucasian American population that target these vulnerabilities. 2. To what extent do mutational patterns observed in circulating viral strains accurately reflect intrinsic fitness constraints of viral proteins? Each strain is the result of evolution against an immune background, which is highly diverse across patients. 
Spin models constructed to reproduce the prevalence of sequences have tested positively against intrinsic fitness assays (where immune selection is absent). Why "prevalence" should correlate with "replicative fitness" in the case of such complex evolutionary dynamics is conceptually puzzling. We combine computer simulations and analytical theory to show that the prevalence can correctly reflect the fitness rank order of mutant viral strains that are proximal in sequence space. Our analysis suggests that incorporating a "phylogenetic correction" in the parameters might improve the predictive power of these models. 3. Can cellular phenotypes be discovered in an unbiased way from high-dimensional protein expression data in single cells? Mass cytometry, in which more than 40 protein parameters can be quantitated in single cells, affords a route, but analyzing such high-dimensional data can be challenging. Traditional "gating approaches" are unscalable, and computational methods that account for multivariate relationships among different proteins are needed. High-dimensional clustering and principal component analysis, two approaches that have been explored so far, suffer from important limitations. We propose a computational tool rooted in nonlinear dimensionality reduction that overcomes these limitations and automatically identifies phenotypes based on a two-dimensional distillation of the cellular data; the latter feature facilitates unbiased visualization of high-dimensional relationships. Our tool reveals a previously unappreciated phenotypic complexity within murine CD8+ T cells, and identifies a novel phenotype that is conflated by traditional approaches. 4. Antigen-specific immune cells that mediate efficacious antiviral responses in infections like HIV involve complex phenotypes and typically constitute a small fraction of the population. In such circumstances, seeking correlative features in bulk expression levels of key proteins can be misleading.
Using the approach introduced in 3., we analyze multiparameter flow cytometry data of CD4+ T-cell samples from 20 patients representing diverse clinical groups, and identify cellular phenotypes whose proportion in patients is strongly correlated with quantitative clinical parameters. Many of these correlations are inconsistent with bulk signals. Furthermore, a number of correlative phenotypes are characterized by the expression of multiple proteins at individually modest levels; such subsets are likely to be missed by conventional gating strategies. Using the in-patient proportions of different phenotypes as predictors, a cross-validated, sparse linear regression model explains 87% of the variance in the viral load across the 20 patients. Our approach is scalable to datasets involving dozens of parameters.	en_US
dc.description.statementofresponsibility	by Karthik Shekhar.	en_US
dc.format.extent	318 pages	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582	en_US
dc.subject	Chemical Engineering.	en_US
dc.title	Dimensionality reduction in immunology : from viruses to cells	en_US
dc.type	Thesis	en_US
dc.description.degree	Ph. D.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Chemical Engineering
dc.identifier.oclc	919277227	en_US

Files in this item

Name: 919277227-MIT.pdf
Size: 33.26Mb
Format: PDF
Description: Full printable version

This item appears in the following Collection(s)

Doctoral Theses