Show simple item record

dc.contributor.advisor: Smidt, Tess E.
dc.contributor.author: Kim, Song Eun
dc.date.accessioned: 2025-09-18T14:27:30Z
dc.date.available: 2025-09-18T14:27:30Z
dc.date.issued: 2025-05
dc.date.submitted: 2025-06-23T14:02:35.711Z
dc.identifier.uri: https://hdl.handle.net/1721.1/162690
dc.description.abstract: In-silico generation of diverse molecular structures has emerged as a promising method for navigating the complex chemical landscape, with direct applications to inverse material design and drug discovery. However, 3D molecular structure generation comes with several unique challenges: generated structures must be invariant under rotations and translations in 3D space, and must satisfy basic chemical bonding rules. Recently, E(3)-equivariant neural networks that utilize higher-order rotationally-equivariant features have shown improved performance on a wide range of atomistic tasks, including structure generation. Previously, we developed Symphony, an E(3)-equivariant autoregressive generative model for 3D structures of small molecules. At each sampling iteration, a single focus atom is selected and used to decide the next atom's position within its neighborhood. Symphony built on previous autoregressive models by using message-passing with higher-order equivariant features, enabling a novel representation of probability distributions via spherical harmonic signals. Symphony's performance approached that of state-of-the-art diffusion models while remaining relatively lightweight. However, it continued to face challenges with error accumulation and with determining bond lengths, and it was evaluated only on small organic molecules. Here, we expand Symphony's capabilities and make it more compatible with larger atomic structures. We improve the embedders, split the radial and angular components when predicting atom positions, and increase the radial cutoff for atomic neighborhoods considered during prediction. We also increase Symphony's training and inference speeds through a new implementation in PyTorch, making inference nearly 4x faster than before. In addition, we demonstrate its effectiveness across a variety of tasks, including small molecule and protein backbone generation.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Equivariant Autoregressive Models for Molecular Generation
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Electrical Engineering and Computer Science

