Signal processing for DNA sequencing
Author(s)
Boufounos, Petros T., 1977-
Alternative title
Signal processing for Deoxyribonucleic acid sequencing
Other Contributors
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Advisor
Alan V. Oppenheim.
Abstract
DNA sequencing is the process of determining the sequence of chemical bases in a particular DNA molecule, nature's blueprint of how life works. The advancement of biological science has created a vast demand for sequencing methods, which needs to be addressed by automated equipment. This thesis addresses one part of that process, known as base calling: the conversion of the electrical signal collected by the sequencing equipment (the electropherogram) to a sequence of letters drawn from {A, T, C, G} that corresponds to the sequence of bases in the molecule being sequenced. This work formulates base calling as a pattern recognition problem and observes its striking resemblance to the speech recognition problem. We therefore propose combining Hidden Markov Models and Artificial Neural Networks to solve it. In the formulation we derive an algorithm for training both models together. Furthermore, we devise a method to create very accurate training data, requiring minimal hand-labeling. We compare our method with the de facto standard, PHRED, and obtain comparable results. Finally, we propose alternative HMM topologies that have the potential to significantly improve the performance of the method.
Description
Thesis (M.Eng. and S.B.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002. Includes bibliographical references (p. 83-86).
Date issued
2002
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.