dc.contributor.authorAndoni, Alexandr
dc.contributor.authorDaskalakis, Constantinos
dc.contributor.authorHassidim, Avinatan
dc.contributor.authorRoch, Sebastien
dc.date.accessioned2012-10-16T14:04:37Z
dc.date.available2012-10-16T14:04:37Z
dc.date.issued2010
dc.identifier.urihttp://hdl.handle.net/1721.1/74010
dc.description.abstractMolecular phylogenetic techniques do not generally account for such common evolutionary events as site insertions and deletions (known as indels). Instead, tree-building algorithms and ancestral state inference procedures typically rely on substitution-only models of sequence evolution. In practice, these methods are extended beyond this simplified setting with the use of heuristics that produce global alignments of the input sequences, an important problem that has no rigorous model-based solution. In this paper we open a new direction on this topic by considering a version of multiple sequence alignment in the context of stochastic indel models. More precisely, we introduce the following trace reconstruction problem on a tree (TRPT): a binary sequence is broadcast through a tree channel where we allow substitutions, deletions, and insertions; we seek to reconstruct the original sequence from the sequences received at the leaves of the tree. We give a recursive procedure for this problem with strong reconstruction guarantees at low mutation rates, providing also an alignment of the sequences at the leaves of the tree. The TRPT problem without indels has been studied in previous work (Mossel 2004, Daskalakis et al. 2006) as a bootstrapping step towards obtaining information-theoretically optimal phylogenetic reconstruction methods. The present work sets up a framework for extending these works to evolutionary models with indels. In the TRPT problem we begin with a random sequence $x_1, \ldots, x_k$ at the root of a $d$-ary tree. If vertex $v$ has the sequence $y_1, \ldots, y_{k_v}$, then each of its $d$ children has a sequence generated from $y_1, \ldots, y_{k_v}$ by flipping three biased coins for each bit. The first coin has probability $p_s$ of Heads and determines whether the bit is substituted. The second coin has probability $p_d$ and determines whether the bit is deleted, and the third coin has probability $p_i$ and determines whether a new random bit is inserted. The input to the procedure is the sequences at the $n$ leaves of the tree, together with the tree structure (but not the sequences of the inner vertices), and the goal is to reconstruct an approximation to the sequence at the root (the DNA of the ancestral father). For every $\chi > 0$ we present an algorithm which outputs with probability $1 - \chi$ an approximation of $x_1, \ldots, x_k$ if $p_i + p_d < O(1/k^{2/3} \log n)$ and $(1 - 2p_s)^2 > C d^{-1} \log d$ for some constant $C > 0$ and every large enough $d$. To our knowledge, this is the first rigorous trace reconstruction result on a tree in the presence of indels.en_US
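
The three-coin channel described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' procedure: the function names `mutate` and `broadcast` are hypothetical, the tree is assumed to be a complete d-ary tree of a given depth, and an inserted bit is assumed to be placed immediately after the current position (the abstract does not specify where insertions land). It generates the leaf sequences only and does not attempt the reconstruction step.

```python
import random


def mutate(seq, p_s, p_d, p_i, rng):
    """Pass one binary sequence across a single tree edge (hypothetical helper)."""
    child = []
    for bit in seq:
        # Coin 1: substitute (flip) the bit with probability p_s.
        if rng.random() < p_s:
            bit ^= 1
        # Coin 2: delete the bit with probability p_d, otherwise keep it.
        if rng.random() >= p_d:
            child.append(bit)
        # Coin 3: insert a new uniformly random bit with probability p_i.
        # Assumption: the inserted bit goes right after the current position.
        if rng.random() < p_i:
            child.append(rng.randrange(2))
    return child


def broadcast(root, d, depth, p_s, p_d, p_i, rng):
    """Return the sequences at the leaves of a complete d-ary tree of the given depth."""
    level = [root]
    for _ in range(depth):
        # Every vertex on the current level produces d independently mutated children.
        level = [mutate(s, p_s, p_d, p_i, rng) for s in level for _ in range(d)]
    return level


if __name__ == "__main__":
    rng = random.Random(2010)
    k = 20
    root = [rng.randrange(2) for _ in range(k)]
    leaves = broadcast(root, d=3, depth=2, p_s=0.05, p_d=0.02, p_i=0.02, rng=rng)
    print("root:", "".join(map(str, root)))
    for leaf in leaves:
        print("leaf:", "".join(map(str, leaf)))
```

Running the script prints the root sequence followed by the 9 leaf sequences, whose lengths generally differ from k once deletions and insertions are enabled. That length variation is what makes aligning the leaves nontrivial compared with the substitution-only setting.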
dc.language.isoen_US
dc.publisherInstitute for Theoretical Computer Science (ITCS)en_US
dc.relation.isversionofhttp://conference.itcs.tsinghua.edu.cn/ICS2010/content/paper/Paper_28.pdfen_US
dc.rightsCreative Commons Attribution-Noncommercial-Share Alike 3.0en_US
dc.rights.urihttp://creativecommons.org/licenses/by-nc-sa/3.0/en_US
dc.sourcearXiven_US
dc.titleGlobal Alignment of Molecular Sequences via Ancestral State Reconstructionen_US
dc.typeArticleen_US
dc.identifier.citationAlexandr Andoni et al. "Global Alignment of Molecular Sequences via Ancestral State Reconstruction" Proceedings of the Innovations in Computer Science 2010, ITCS.en_US
dc.contributor.departmentMassachusetts Institute of Technology. Department of Electrical Engineering and Computer Scienceen_US
dc.contributor.mitauthorDaskalakis, Constantinos
dc.relation.journalProceedings of the Innovations in Computer Science 2010en_US
dc.eprint.versionAuthor's final manuscripten_US
dc.type.urihttp://purl.org/eprint/type/ConferencePaperen_US
dc.identifier.orcidhttps://orcid.org/0000-0002-5451-0490
mit.licenseOPEN_ACCESS_POLICYen_US
mit.metadata.statusComplete

