MIT Libraries: DSpace@MIT

A novel k-mer set memory (KSM) motif representation improves regulatory variant prediction

Author(s)
Guo, Yuchun; Tian, Kevin J.; Zeng, Haoyang; Guo, Xiaoyun; Gifford, David K
Download: Genome Res.-2018-Guo-891-900.pdf (3.745Mb); embargoed until Oct 13, 2018
Terms of use
Creative Commons Attribution-NonCommercial 4.0 International http://creativecommons.org/licenses/by-nc/4.0/
Abstract
The representation and discovery of transcription factor (TF) sequence binding specificities is critical for understanding gene regulatory networks and interpreting the impact of disease-associated noncoding genetic variants. We present a novel TF binding motif representation, the k-mer set memory (KSM), which consists of a set of aligned k-mers that are overrepresented at TF binding sites, and a new method called KMAC for de novo discovery of KSMs. We find that KSMs more accurately predict in vivo binding sites than position weight matrix (PWM) models and other more complex motif models across a large set of ChIP-seq experiments. Furthermore, KSMs outperform PWMs and more complex motif models in predicting in vitro binding sites. KMAC also identifies correct motifs in more experiments than five state-of-the-art motif discovery methods. In addition, KSM-derived features outperform both PWM and deep learning model derived sequence features in predicting differential regulatory activities of expression quantitative trait loci (eQTL) alleles. Finally, we have applied KMAC to 1600 ENCODE TF ChIP-seq data sets and created a public resource of KSM and PWM motifs. We expect that the KSM representation and KMAC method will be valuable in characterizing TF binding specificities and in interpreting the effects of noncoding genetic variations.
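To make the idea of "a set of aligned k-mers" concrete, here is a minimal sketch in Python. It is not KMAC and it does not reproduce the paper's scoring: the k-mers, their offsets, the `TOY_KSM` name, and the `scan_with_kmer_set` function are all hypothetical, chosen only for illustration. The sketch assumes each k-mer carries an offset relative to a shared motif anchor, so that overlapping k-mer hits in a sequence can be grouped into one candidate site. It scans the forward strand only.

```python
from collections import defaultdict

# Hypothetical toy model (NOT taken from the paper): each k-mer maps to its
# offset from a shared motif anchor position. Values are made up.
TOY_KSM = {
    "TGACTC": 0,
    "GACTCA": 1,
    "ATGACT": -1,
}


def scan_with_kmer_set(sequence, ksm):
    """Group k-mer hits by the motif anchor they imply (forward strand only).

    Returns a dict mapping each implied anchor position to the sorted list
    of k-mers that support it.
    """
    support = defaultdict(set)
    for kmer, offset in ksm.items():
        start = sequence.find(kmer)
        while start != -1:
            # A k-mer found at `start` implies an anchor at `start - offset`.
            support[start - offset].add(kmer)
            start = sequence.find(kmer, start + 1)
    return {anchor: sorted(kmers) for anchor, kmers in support.items()}


hits = scan_with_kmer_set("CCATGACTCATGG", TOY_KSM)
print(hits)  # {3: ['ATGACT', 'GACTCA', 'TGACTC']}
```

In this toy input, all three k-mers overlap and agree on the same anchor, so they collapse into a single candidate site supported by three k-mers. A real model would also need reverse-complement handling and a statistical score for each k-mer, neither of which is attempted here.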
Date issued
2018-04
URI
http://hdl.handle.net/1721.1/119653
Department
Massachusetts Institute of Technology. Computational and Systems Biology Program; Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Biological Engineering; Massachusetts Institute of Technology. Department of Biology; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Department of Mathematics; Massachusetts Institute of Technology. Research Laboratory of Electronics
Journal
Genome Research
Publisher
Cold Spring Harbor Laboratory
Citation
Guo, Yuchun, Kevin Tian, Haoyang Zeng, Xiaoyun Guo, and David Kenneth Gifford. “A Novel k-Mer Set Memory (KSM) Motif Representation Improves Regulatory Variant Prediction.” Genome Research 28, no. 6 (April 13, 2018): 891–900.
Version: Final published version
ISSN
1088-9051
1549-5469

Collections
  • MIT Open Access Articles

Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.