
dc.contributor.advisor: Marais, Dave Des
dc.contributor.author: Neufeldt, Charlie
dc.date.accessioned: 2025-08-21T17:02:07Z
dc.date.available: 2025-08-21T17:02:07Z
dc.date.issued: 2025-05
dc.date.submitted: 2025-06-19T19:09:35.563Z
dc.identifier.uri: https://hdl.handle.net/1721.1/162443
dc.description.abstract: Climate change exacerbates environmental stressors such as drought, challenging the resilience of agricultural systems and highlighting the need to understand plant genomic architecture and its responses to such environmental variation. A key molecular mechanism underlying these responses is transcriptional plasticity: environment-induced changes in gene expression that vary among genotypes, representing one way that genotype-by-environment (GxE) interactions manifest at the molecular level. While transcriptomic data offers a unique and powerful view into these responses, traditional modeling approaches often rely on linear assumptions, limiting their ability to detect complex, nonlinear patterns of regulation. This thesis investigates whether generative machine learning modeling, specifically the use of transformers, can extract biologically meaningful representations of gene expression dynamics in plants. Inspired by the successes of the scGPT model for human genomics, I developed and trained a compact transformer architecture, the PlantGeneEncoder, on bulk RNA-seq data from two natural accessions of Brachypodium distachyon grown under drought and control conditions. The model was trained on binned expression values using both a baseline configuration and a set of regularized variants incorporating noise injection, co-expression preservation, entropy-based sample weighting, and masked gene modeling as a self-supervised objective. While baseline models achieved perfect reconstruction accuracy, they failed to preserve meaningful biological structure in the latent space. Regularized models achieved a better trade-off, maintaining high reconstruction fidelity while demonstrating improved genotype classification performance and modestly better alignment with the original expression structure. However, environmental condition signals remained difficult to capture across all configurations, with classification accuracies only marginally above random chance. These findings highlight the promise and limitations of transformer-based generative modeling for plant transcriptomics and provide a flexible framework for future efforts to model transcriptional plasticity and regulatory responses to environmental stress.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Investigating the Capacity of Generative AI to Learn Genotype-by-Environment Interactions in Brachypodium distachyon
dc.type: Thesis
dc.description.degree: S.M.
dc.contributor.department: Massachusetts Institute of Technology. Department of Civil and Environmental Engineering
mit.thesis.degree: Master
thesis.degree.name: Master of Science in Civil and Environmental Engineering

