| dc.contributor.advisor | James R. Glass. | en_US |
| dc.contributor.author | Drexler, Jennifer Fox | en_US |
| dc.contributor.other | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. | en_US |
| dc.date.accessioned | 2016-12-05T19:58:26Z | |
| dc.date.available | 2016-12-05T19:58:26Z | |
| dc.date.copyright | 2016 | en_US |
| dc.date.issued | 2016 | en_US |
| dc.identifier.uri | http://hdl.handle.net/1721.1/105696 | |
| dc.description | Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. | en_US |
| dc.description | Cataloged from PDF version of thesis. | en_US |
| dc.description | Includes bibliographical references (pages 87-92). | en_US |
| dc.description.abstract | Automatic speech recognition (ASR) systems have become hugely successful in recent years; we have grown accustomed to speech interfaces across all kinds of devices. However, despite the huge impact ASR has had on the way we interact with technology, it remains out of reach for a significant portion of the world's population. This is because these systems rely on a variety of manually generated resources, like transcripts and pronunciation dictionaries, that can be both expensive and difficult to acquire. In this thesis, we explore techniques for learning about speech directly from speech, with no manually generated transcriptions. Such techniques have the potential to revolutionize speech technologies for the vast majority of the world's population. The cognitive science and computer science communities have both been investing increasing time and resources in exploring this problem. However, a full unsupervised speech recognition system is an enormously complex undertaking and is still a long way off. As in previous work, we focus on the lower-level tasks that will underlie an eventual unsupervised speech recognizer. We focus specifically on two tasks: developing linguistically meaningful representations of speech and segmenting speech into phonetic units. This thesis approaches these tasks from a new direction: deep learning. While modern deep learning methods have their roots in ideas from the 1960s and even earlier, these techniques have recently seen a resurgence thanks to huge increases in computational power and new, efficient learning algorithms. Deep learning algorithms have been instrumental in the recent progress of traditional supervised speech recognition; here, we extend that work to unsupervised learning from speech. | en_US |
| dc.description.statementofresponsibility | by Jennifer Fox Drexler. | en_US |
| dc.format.extent | 92 pages | en_US |
| dc.language.iso | eng | en_US |
| dc.publisher | Massachusetts Institute of Technology | en_US |
| dc.rights | M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. | en_US |
| dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US |
| dc.subject | Electrical Engineering and Computer Science. | en_US |
| dc.title | Deep unsupervised learning from speech | en_US |
| dc.type | Thesis | en_US |
| dc.description.degree | S.M. | en_US |
| dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
| dc.identifier.oclc | 964523141 | en_US |