Show simple item record

dc.contributor.advisor: Fraenkel, Ernest
dc.contributor.advisor: Jegelka, Stefanie
dc.contributor.author: Murphy, Michael A.
dc.date.accessioned: 2024-04-02T14:56:52Z
dc.date.available: 2024-04-02T14:56:52Z
dc.date.issued: 2024-02
dc.date.submitted: 2024-03-21T19:56:11.095Z
dc.identifier.uri: https://hdl.handle.net/1721.1/154024
dc.description.abstract: Machine learning is becoming a pivotal tool in the analysis of datasets generated from high-throughput biological omics experiments. However, omics data introduces distinctive algorithmic challenges that set it apart from other domains where machine learning is applied. These challenges encompass issues such as limited data availability, complex noise, ambiguities in representation, and the absence of definitive ground truth for validation. In this thesis, I present three examples of machine learning applications to different omics modalities in which I address these challenges. In my first project, I develop an approach for contrastive representation learning with immunohistochemistry images, which suffer from complex technical and biological noise that renders generic approaches ineffective; and I demonstrate how this approach can be combined with noisy labels derived from transcriptomics to derive an effective classifier of cell-type specificity. In my second project, I consider the problem of predicting mass spectra of small molecules: previous methods suffer from a tradeoff between capturing high-resolution mass information and keeping the learning problem tractable, which I resolve by introducing a novel representation of the output space. In my third project, I perform gene regulatory network inference using a number of different single-cell sequencing platforms, and carry out a quantitative comparison of these technologies. In summary, this thesis showcases the difficulties that arise in applying modern machine learning approaches to high-throughput biological measurements, and empirical case studies of how these difficulties may be overcome.
dc.publisher: Massachusetts Institute of Technology
dc.rights: Attribution 4.0 International (CC BY 4.0)
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.title: Machine Learning Methods for High Throughput Biological Data
dc.type: Thesis
dc.description.degree: Ph.D.
dc.contributor.department: Massachusetts Institute of Technology. Computational and Systems Biology Program
dc.identifier.orcid: https://orcid.org/0000-0002-7343-8383
mit.thesis.degree: Doctoral
thesis.degree.name: Doctor of Philosophy


