seqgra: principled selection of neural network architectures for genomics prediction tasks
Author(s)
Krismer, Konstantin; Hammelman, Jennifer; Gifford, David K
Publisher with Creative Commons License
Creative Commons Attribution
Abstract
Motivation: Sequence models based on deep neural networks have achieved state-of-the-art performance on regulatory genomics prediction tasks, such as chromatin accessibility and transcription factor binding. But despite their high
accuracy, their contribution to a mechanistic understanding of the biology of regulatory elements is often hindered
by the complexity of the predictive model and thus poor interpretability of its decision boundaries. To address this, we
introduce seqgra, a deep learning pipeline that incorporates the rule-based simulation of biological sequence data and
the training and evaluation of models, whose decision boundaries mirror the rules from the simulation process.
Results: We show that seqgra can be used to (i) generate data under the assumption of a hypothesized model of
genome regulation, (ii) identify neural network architectures capable of recovering the rules of said model and (iii)
analyze a model’s predictive performance as a function of training set size and the complexity of the rules behind
the simulated data.
Availability and implementation: The source code of the seqgra package is hosted on GitHub (https://github.com/gifford-lab/seqgra).
seqgra is a pip-installable Python package. Extensive documentation can be found at https://kkrismer.github.io/seqgra.
Date issued
2022-04-28
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Biological Engineering; Massachusetts Institute of Technology. Computational and Systems Biology Program; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal
Bioinformatics
Publisher
Oxford University Press (OUP)
Citation
Krismer, Konstantin, Hammelman, Jennifer and Gifford, David K. 2022. "seqgra: principled selection of neural network architectures for genomics prediction tasks." Bioinformatics, 38 (9).
Version: Final published version