dc.contributor.author: Guo, Yuchun
dc.contributor.author: Tian, Kevin J.
dc.contributor.author: Zeng, Haoyang
dc.contributor.author: Guo, Xiaoyun
dc.contributor.author: Gifford, David K
dc.date.accessioned: 2018-12-17T13:34:30Z
dc.date.available: 2018-12-17T13:34:30Z
dc.date.issued: 2018-04
dc.identifier.issn: 1088-9051
dc.identifier.issn: 1549-5469
dc.identifier.uri: http://hdl.handle.net/1721.1/119653
dc.description.abstract: The representation and discovery of transcription factor (TF) sequence binding specificities is critical for understanding gene regulatory networks and interpreting the impact of disease-associated noncoding genetic variants. We present a novel TF binding motif representation, the k-mer set memory (KSM), which consists of a set of aligned k-mers that are overrepresented at TF binding sites, and a new method called KMAC for de novo discovery of KSMs. We find that KSMs more accurately predict in vivo binding sites than position weight matrix (PWM) models and other more complex motif models across a large set of ChIP-seq experiments. Furthermore, KSMs outperform PWMs and more complex motif models in predicting in vitro binding sites. KMAC also identifies correct motifs in more experiments than five state-of-the-art motif discovery methods. In addition, KSM-derived features outperform both PWM and deep learning model derived sequence features in predicting differential regulatory activities of expression quantitative trait loci (eQTL) alleles. Finally, we have applied KMAC to 1600 ENCODE TF ChIP-seq data sets and created a public resource of KSM and PWM motifs. We expect that the KSM representation and KMAC method will be valuable in characterizing TF binding specificities and in interpreting the effects of noncoding genetic variations. [en_US]
dc.description.sponsorship: National Institutes of Health (U.S.) (grant 1U01HG007037) [en_US]
dc.description.sponsorship: National Institutes of Health (U.S.) (grant 1R01HG008363) [en_US]
dc.publisher: Cold Spring Harbor Laboratory [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1101/GR.226852.117 [en_US]
dc.rights: Creative Commons Attribution-NonCommercial 4.0 International [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/ [en_US]
dc.source: Cold Spring Harbor Laboratory Press [en_US]
dc.title: A novel k-mer set memory (KSM) motif representation improves regulatory variant prediction [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Guo, Yuchun, Kevin Tian, Haoyang Zeng, Xiaoyun Guo, and David Kenneth Gifford. “A Novel k-Mer Set Memory (KSM) Motif Representation Improves Regulatory Variant Prediction.” Genome Research 28, no. 6 (April 13, 2018): 891–900. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computational and Systems Biology Program [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Biological Engineering [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Biology [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Mathematics [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Research Laboratory of Electronics [en_US]
dc.contributor.mitauthor: Guo, Yuchun
dc.contributor.mitauthor: Tian, Kevin J.
dc.contributor.mitauthor: Zeng, Haoyang
dc.contributor.mitauthor: Guo, Xiaoyun
dc.contributor.mitauthor: Gifford, David K
dc.relation.journal: Genome Research [en_US]
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dc.date.updated: 2018-12-03T17:34:00Z
dspace.orderedauthors: Guo, Yuchun; Tian, Kevin; Zeng, Haoyang; Guo, Xiaoyun; Gifford, David Kenneth [en_US]
dspace.embargo.terms: N [en_US]
dc.identifier.orcid: https://orcid.org/0000-0003-2357-1546
dc.identifier.orcid: https://orcid.org/0000-0003-1057-2865
dc.identifier.orcid: https://orcid.org/0000-0003-1709-4034
mit.license: PUBLISHER_CC [en_US]

