dc.contributor.author | Puccinelli, Robert | |
dc.contributor.author | Kim, Ryan | |
dc.contributor.author | Fordyce, Polly | |
dc.contributor.author | Orenstein, Yaron | |
dc.contributor.author | Berger Leighton, Bonnie | |
dc.date.accessioned | 2018-05-16T13:17:47Z | |
dc.date.available | 2018-05-16T13:17:47Z | |
dc.date.issued | 2017-09 | |
dc.date.submitted | 2017-04 | |
dc.identifier.issn | 2405-4712 | |
dc.identifier.uri | http://hdl.handle.net/1721.1/115384 | |
dc.description.abstract | Sequence libraries that cover all k-mers enable universal, unbiased measurements of binding to both oligonucleotides and peptides. While the number of k-mers grows exponentially in k, space on all experimental platforms is limited. Here, we shrink k-mer library sizes by using joker characters, which represent all characters in the alphabet simultaneously. We present the JokerCAKE (joker covering all k-mers) algorithm for generating a short sequence such that each k-mer appears at least p times with at most one joker character per k-mer. By running our algorithm on a range of parameters and alphabets, we show that JokerCAKE produces near-optimal sequences. Moreover, through comparison with data from hundreds of DNA-protein binding experiments and with new experimental results for both standard and JokerCAKE libraries, we establish that accurate binding scores can be inferred for high-affinity k-mers using JokerCAKE libraries. JokerCAKE libraries allow researchers to search a significantly larger sequence space using the same number of experimental measurements and at the same cost. We present a new compact sequence design that covers all k-mers utilizing joker characters and develop an efficient algorithm to generate such designs. We show through simulations and experimental validation that these sequence designs are useful for identifying high-affinity binding sites at significantly reduced cost and space. Keywords: sequence libraries; microarray design; de Bruijn graph | en_US |
dc.description.sponsorship | National Institutes of Health (U.S.) (Grant R01GM081871) | en_US |
dc.publisher | Elsevier | en_US |
dc.relation.isversionof | http://dx.doi.org/10.1016/J.CELS.2017.07.006 | en_US |
dc.rights | Creative Commons Attribution-NonCommercial-NoDerivs License | en_US |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | en_US |
dc.source | Elsevier | en_US |
dc.title | Optimized Sequence Library Design for Efficient In Vitro Interaction Mapping | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Orenstein, Yaron et al. “Optimized Sequence Library Design for Efficient In Vitro Interaction Mapping.” Cell Systems 5, 3 (September 2017): 230–236 © 2017 The Authors | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Mathematics | en_US |
dc.contributor.mitauthor | Orenstein, Yaron | |
dc.contributor.mitauthor | Berger Leighton, Bonnie | |
dc.relation.journal | Cell Systems | en_US |
dc.eprint.version | Final published version | en_US |
dc.type.uri | http://purl.org/eprint/type/JournalArticle | en_US |
eprint.status | http://purl.org/eprint/status/PeerReviewed | en_US |
dc.date.updated | 2018-05-15T18:30:46Z | |
dspace.orderedauthors | Orenstein, Yaron; Puccinelli, Robert; Kim, Ryan; Fordyce, Polly; Berger, Bonnie | en_US |
dspace.embargo.terms | N | en_US |
dc.identifier.orcid | https://orcid.org/0000-0002-3583-3112 | |
dc.identifier.orcid | https://orcid.org/0000-0002-2724-7228 | |
mit.license | PUBLISHER_CC | en_US |