DSpace@MIT

Investigating the Capacity of Generative AI to Learn Genotype-by-Environment Interactions in Brachypodium distachyon

Author(s)
Neufeldt, Charlie
Download: Thesis PDF (7.069 MB)
Advisor
Marais, Dave Des
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Climate change exacerbates environmental stressors such as drought, challenging the resilience of agricultural systems and highlighting the need to understand plant genomic architecture and its responses to such environmental variation. A key molecular mechanism underlying these responses is transcriptional plasticity: environment-induced changes in gene expression that vary among genotypes, representing one way that genotype-by-environment (GxE) interactions manifest at the molecular level. While transcriptomic data offers a unique and powerful view into these responses, traditional modeling approaches often rely on linear assumptions, limiting their ability to detect complex, nonlinear patterns of regulation. This thesis investigates whether generative machine learning modeling, specifically the use of transformers, can extract biologically meaningful representations of gene expression dynamics in plants. Inspired by the successes of the scGPT model for human genomics, I developed and trained a compact transformer architecture, the PlantGeneEncoder, on bulk RNA-seq data from two natural accessions of Brachypodium distachyon grown under drought and control conditions. The model was trained on binned expression values using both a baseline configuration and a set of regularized variants incorporating noise injection, co-expression preservation, entropy-based sample weighting, and masked gene modeling as a self-supervised objective. While baseline models achieved perfect reconstruction accuracy, they failed to preserve meaningful biological structure in the latent space. Regularized models achieved a better trade-off, maintaining high reconstruction fidelity while demonstrating improved genotype classification performance and modestly better alignment with the original expression structure. However, environmental condition signals remained difficult to capture across all configurations, with classification accuracies only marginally above random chance. 
These findings highlight the promise and limitations of transformer-based generative modeling for plant transcriptomics and provide a flexible framework for future efforts to model transcriptional plasticity and regulatory responses to environmental stress.
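The abstract says the PlantGeneEncoder was trained on binned expression values, with noise injection and masked gene modeling among its regularization strategies, but it does not give the binning rule, noise model, or masking ratio. The Python sketch below illustrates one plausible preprocessing pipeline for a single bulk RNA-seq sample under stated assumptions: scGPT-style per-sample quantile binning of nonzero values (zeros kept in bin 0), Gaussian noise added only to nonzero values, and uniformly random masking. The names `add_noise`, `bin_expression`, `mask_genes`, and `MASK_TOKEN`, as well as the bin count, noise scale, and mask ratio, are hypothetical choices for illustration, not the thesis implementation.

```python
import random
from bisect import bisect_right

MASK_TOKEN = -1  # hypothetical token id marking a hidden gene position


def add_noise(values, sigma=0.1, rng=None):
    """Noise injection (assumed form): add Gaussian noise to nonzero
    expression values, clip at zero, and leave zeros untouched."""
    rng = rng if rng is not None else random.Random()
    return [max(0.0, v + rng.gauss(0.0, sigma)) if v > 0 else 0.0 for v in values]


def bin_expression(values, n_bins=5):
    """Map one sample's expression values to integer tokens.

    Zeros stay in bin 0; nonzero values go to bins 1..n_bins using
    quantile edges computed from this sample's nonzero values only.
    """
    nonzero = sorted(v for v in values if v > 0)
    if not nonzero:
        return [0] * len(values)
    # n_bins - 1 cut points taken at evenly spaced ranks of the nonzero values
    edges = [nonzero[len(nonzero) * k // n_bins] for k in range(1, n_bins)]
    # bisect_right counts edges <= v, giving an offset in 0..n_bins-1
    return [0 if v <= 0 else 1 + bisect_right(edges, v) for v in values]


def mask_genes(tokens, mask_ratio=0.15, rng=None):
    """Masked gene modeling: hide a random subset of positions.

    Returns the corrupted token list and a dict {position: original token}
    holding the targets the model must reconstruct.
    """
    rng = rng if rng is not None else random.Random()
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_ratio)))
    positions = rng.sample(range(len(tokens)), n_mask)
    corrupted = list(tokens)
    for i in positions:
        corrupted[i] = MASK_TOKEN
    return corrupted, {i: tokens[i] for i in positions}


# Toy values for one sample (illustrative only, not thesis data)
sample = [0.0, 2.5, 7.1, 0.0, 1.2, 9.8, 3.3, 5.0]
rng = random.Random(0)
binned = bin_expression(sample, n_bins=4)
print(binned)  # [0, 2, 4, 0, 1, 4, 2, 3]
noisy_binned = bin_expression(add_noise(sample, sigma=0.1, rng=rng), n_bins=4)
corrupted, targets = mask_genes(binned, mask_ratio=0.25, rng=rng)
```

Because the bin edges come from each sample's own nonzero values, multiplying a sample by a positive constant (for example, a difference in library size) leaves its tokens essentially unchanged; the price is that absolute expression levels are discarded. The entries in `targets` are the positions a masked-gene-modeling loss would score.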
Date issued
2025-05
URI
https://hdl.handle.net/1721.1/162443
Department
Massachusetts Institute of Technology. Department of Civil and Environmental Engineering
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses

Content created by the MIT Libraries, CC BY-NC unless otherwise noted.