Investigating the Capacity of Generative AI to Learn Genotype-by-Environment Interactions in Brachypodium distachyon

Neufeldt, Charlie

dc.contributor.advisor	Marais, Dave Des
dc.contributor.author	Neufeldt, Charlie
dc.date.accessioned	2025-08-21T17:02:07Z
dc.date.available	2025-08-21T17:02:07Z
dc.date.issued	2025-05
dc.date.submitted	2025-06-19T19:09:35.563Z
dc.identifier.uri	https://hdl.handle.net/1721.1/162443
dc.description.abstract	Climate change exacerbates environmental stressors such as drought, challenging the resilience of agricultural systems and highlighting the need to understand plant genomic architecture and its responses to such environmental variation. A key molecular mechanism underlying these responses is transcriptional plasticity: environment-induced changes in gene expression that vary among genotypes, representing one way that genotype-by-environment (GxE) interactions manifest at the molecular level. While transcriptomic data offers a unique and powerful view into these responses, traditional modeling approaches often rely on linear assumptions, limiting their ability to detect complex, nonlinear patterns of regulation. This thesis investigates whether generative machine learning modeling, specifically the use of transformers, can extract biologically meaningful representations of gene expression dynamics in plants. Inspired by the successes of the scGPT model for human genomics, I developed and trained a compact transformer architecture, the PlantGeneEncoder, on bulk RNA-seq data from two natural accessions of Brachypodium distachyon grown under drought and control conditions. The model was trained on binned expression values using both a baseline configuration and a set of regularized variants incorporating noise injection, co-expression preservation, entropy-based sample weighting, and masked gene modeling as a self-supervised objective. While baseline models achieved perfect reconstruction accuracy, they failed to preserve meaningful biological structure in the latent space. Regularized models achieved a better trade-off, maintaining high reconstruction fidelity while demonstrating improved genotype classification performance and modestly better alignment with the original expression structure. However, environmental condition signals remained difficult to capture across all configurations, with classification accuracies only marginally above random chance. These findings highlight the promise and limitations of transformer-based generative modeling for plant transcriptomics and provide a flexible framework for future efforts to model transcriptional plasticity and regulatory responses to environmental stress.
dc.publisher	Massachusetts Institute of Technology
dc.rights	In Copyright - Educational Use Permitted
dc.rights	Copyright retained by author(s)
dc.rights.uri	https://rightsstatements.org/page/InC-EDU/1.0/
dc.title	Investigating the Capacity of Generative AI to Learn Genotype-by-Environment Interactions in Brachypodium distachyon
dc.type	Thesis
dc.description.degree	S.M.
dc.contributor.department	Massachusetts Institute of Technology. Department of Civil and Environmental Engineering
mit.thesis.degree	Master
thesis.degree.name	Master of Science in Civil and Environmental Engineering

Files in this item

Name: Neufeldt-charl14-MS-CEE-2025-t ...
Size: 7.069Mb
Format: PDF
Description: Thesis PDF

This item appears in the following Collection(s)

Graduate Theses
