DSpace@MIT

seqgra: principled selection of neural network architectures for genomics prediction tasks

Author(s)
Krismer, Konstantin; Hammelman, Jennifer; Gifford, David K
Terms of use
Creative Commons Attribution 4.0 International license https://creativecommons.org/licenses/by/4.0/
Abstract
Motivation: Sequence models based on deep neural networks have achieved state-of-the-art performance on regulatory genomics prediction tasks, such as chromatin accessibility and transcription factor binding. But despite their high accuracy, their contribution to a mechanistic understanding of the biology of regulatory elements is often hindered by the complexity of the predictive model and thus poor interpretability of its decision boundaries. To address this, we introduce seqgra, a deep learning pipeline that incorporates the rule-based simulation of biological sequence data and the training and evaluation of models, whose decision boundaries mirror the rules from the simulation process.
Results: We show that seqgra can be used to (i) generate data under the assumption of a hypothesized model of genome regulation, (ii) identify neural network architectures capable of recovering the rules of said model and (iii) analyze a model's predictive performance as a function of training set size and the complexity of the rules behind the simulated data.
Availability and implementation: The source code of the seqgra package is hosted on GitHub (https://github.com/gifford-lab/seqgra). seqgra is a pip-installable Python package. Extensive documentation can be found at https://kkrismer.github.io/seqgra.
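Illustration: the idea of rule-based simulation mentioned in the abstract can be sketched in a few lines of Python. This is a toy example and not the seqgra API; the function names, the motif "TGACGTCA", the sequence length, and the single rule used here (positive examples carry one implanted motif, negative examples are pure background) are all assumptions made for illustration. It uses only the standard library.

```python
import random

ALPHABET = "ACGT"


def simulate_example(rng, length, motif, positive):
    """Draw a uniform random DNA sequence; implant `motif` if `positive`."""
    seq = [rng.choice(ALPHABET) for _ in range(length)]
    if positive:
        # Rule (assumed for this sketch): the motif appears once,
        # at a uniformly chosen position that keeps it inside the sequence.
        start = rng.randrange(length - len(motif) + 1)
        seq[start:start + len(motif)] = motif
    return "".join(seq)


def simulate_dataset(n_per_class, length=50, motif="TGACGTCA", seed=0):
    """Return a list of (sequence, label) pairs, positives first.

    Label 1: background plus implanted motif. Label 0: background only.
    Negatives are not filtered, so a motif can still occur by chance.
    """
    rng = random.Random(seed)  # seeded for reproducible simulations
    data = []
    for label in (1, 0):
        for _ in range(n_per_class):
            seq = simulate_example(rng, length, motif, positive=(label == 1))
            data.append((seq, label))
    return data


if __name__ == "__main__":
    for seq, label in simulate_dataset(2):
        print(label, seq)
```

Because the generating rule is known, a model trained on such data can be checked against it: a suitable architecture should separate the two classes by detecting the implanted motif, which is the kind of comparison the abstract describes.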
Date issued
2022-04-28
URI
https://hdl.handle.net/1721.1/143575
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Biological Engineering; Massachusetts Institute of Technology. Computational and Systems Biology Program; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal
Bioinformatics
Publisher
Oxford University Press (OUP)
Citation
Krismer, Konstantin, Hammelman, Jennifer and Gifford, David K. 2022. "seqgra: principled selection of neural network architectures for genomics prediction tasks." Bioinformatics, 38 (9).
Version: Final published version

Collections
  • MIT Open Access Articles
