Convolutional neural network architectures for predicting DNA–protein binding

Zeng, Haoyang; Edwards, Matthew D.; Liu, Ge; Gifford, David K.

dc.contributor.author	Zeng, Haoyang
dc.contributor.author	Edwards, Matthew D
dc.contributor.author	Liu, Ge
dc.contributor.author	Gifford, David K
dc.date.accessioned	2017-08-24T20:08:41Z
dc.date.available	2017-08-24T20:08:41Z
dc.date.issued	2016-06
dc.identifier.issn	1367-4803
dc.identifier.issn	1460-2059
dc.identifier.uri	http://hdl.handle.net/1721.1/111019
dc.description.abstract	Motivation: Convolutional neural networks (CNN) have outperformed conventional methods in modeling the sequence specificity of DNA–protein binding. Yet inappropriate CNN architectures can yield poorer performance than simpler models. Thus an in-depth understanding of how to match CNN architecture to a given task is needed to fully harness the power of CNNs for computational biology applications. Results: We present a systematic exploration of CNN architectures for predicting DNA sequence binding using a large compendium of transcription factor datasets. We identify the best-performing architectures by varying CNN width, depth and pooling designs. We find that adding convolutional kernels to a network is important for motif-based tasks. We show the benefits of CNNs in learning rich higher-order sequence features, such as secondary motifs and local sequence context, by comparing network performance on multiple modeling tasks ranging in difficulty. We also demonstrate how careful construction of sequence benchmark datasets, using approaches that control potentially confounding effects like positional or motif strength bias, is critical in making fair comparisons between competing methods. We explore how to establish the sufficiency of training data for these learning tasks, and we have created a flexible cloud-based framework that permits the rapid exploration of alternative neural network architectures for problems in computational biology.	en_US
dc.description.sponsorship	National Institutes of Health (U.S.) (Grant 1U01HG007037)	en_US
dc.description.sponsorship	National Institutes of Health (U.S.) (Grant 5P01NS055923)	en_US
dc.language.iso	en_US
dc.publisher	Oxford University Press	en_US
dc.relation.isversionof	http://dx.doi.org/10.1093/bioinformatics/btw255	en_US
dc.rights	Creative Commons Attribution-NonCommercial 4.0 International	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc/4.0/	en_US
dc.source	PMC	en_US
dc.title	Convolutional neural network architectures for predicting DNA–protein binding	en_US
dc.type	Article	en_US
dc.identifier.citation	Zeng, Haoyang et al. “Convolutional Neural Network Architectures for Predicting DNA–protein Binding.” Bioinformatics 32, 12 (June 2016): i121–i127 © 2016 The Authors	en_US
dc.contributor.department	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science	en_US
dc.contributor.mitauthor	Zeng, Haoyang
dc.contributor.mitauthor	Edwards, Matthew D
dc.contributor.mitauthor	Liu, Ge
dc.contributor.mitauthor	Gifford, David K
dc.relation.journal	Bioinformatics	en_US
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dspace.orderedauthors	Zeng, Haoyang; Edwards, Matthew D.; Liu, Ge; Gifford, David K.	en_US
dspace.embargo.terms	N	en_US
dc.identifier.orcid	https://orcid.org/0000-0003-1057-2865
dc.identifier.orcid	https://orcid.org/0000-0002-5845-748X
dc.identifier.orcid	https://orcid.org/0000-0001-9383-5186
dc.identifier.orcid	https://orcid.org/0000-0003-1709-4034
mit.license	PUBLISHER_CC	en_US

Files in this item

Name: Convolutional neural network ...
Size: 439.3Kb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles