DSpace@MIT (MIT Libraries)

Improving Impulse Audio Source Separation using Generative Adversarial Networks for Phase Generation

Author(s)
Piercy, Phoebe K.
Download: Thesis PDF (12.09 MB)
Advisor
Lang, Jeffrey H.
Terms of use
In Copyright - Educational Use Permitted. Copyright MIT. http://rightsstatements.org/page/InC-EDU/1.0/
Abstract
This thesis explored separating impulse noise from a desired signal for the purpose of hearing protection for soldiers and musicians. An evaluation of current source separation techniques, including matrix demixing methods (Independent Component Analysis, Independent Vector Analysis) and masking methods (Ideal Ratio Mask, Ideal Binary Mask), among others, concluded that Time-Frequency masking of the noisy signal spectrogram was the best candidate audio separation method for dynamic soundscapes such as tactical fields and music. We followed with an experimental investigation of the role of phase in Time-Frequency masking and found it to be of paramount importance to speech intelligibility. In particular, constructing a Complex Ideal Ratio Mask (cIRM), which alters both magnitude and phase information in the spectrogram, was identified as the most promising method of impulse source separation, yielding separated speech with intelligibility comparable to that of clean speech. This motivated us to develop a method for generating an approximation of the cIRM without prior source information. To that end, the growing use of neural networks as a tool in source separation and phase estimation was presented and evaluated. Experiments were conducted to evaluate the potential of Generative Adversarial Networks (GANs), often used in image transformation, for generating the phase of the cIRM, with human test subjects assessing whether the intelligibility of separated speech improved. The GAN showed promise in generating phase-like results, although imperfect transformation caused an audible decrease in quality, suggesting that the approach was unlikely to produce the natural sound required by musicians. However, for the tactical case, where intelligibility is valued over quality, consonant reconstruction and improved impulse attenuation were observed using our GAN-estimated cIRM. This improvement was reflected in an increase in the signal-to-noise ratio measured against clean speech and a decrease in the same metric measured against the impulse noise, demonstrating a larger clean speech contribution and a smaller impulse noise contribution in the separated output. These results show the potential, with better resources, for GAN-generated phase to improve intelligibility during audio source separation of impulse noise from speech, and they motivate further exploration of this topic.
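
To make the distinction between a magnitude-only mask and the cIRM concrete, the following is a minimal sketch and not the thesis's implementation. It assumes random complex arrays stand in for real STFT spectrograms, uses one common definition of the Ideal Ratio Mask, and all function names (`complex_ideal_ratio_mask`, `ideal_ratio_mask`, `snr_db`) are hypothetical. It covers only the ideal masks computed with known sources, not the GAN-based phase estimation.

```python
import numpy as np

rng = np.random.default_rng(0)

def complex_ideal_ratio_mask(clean_spec, noisy_spec, eps=1e-12):
    """Complex mask M with M * noisy_spec ~= clean_spec (corrects magnitude and phase)."""
    return clean_spec * np.conj(noisy_spec) / (np.abs(noisy_spec) ** 2 + eps)

def ideal_ratio_mask(clean_spec, noise_spec, eps=1e-12):
    """Real-valued mask (one common IRM definition); the noisy phase is left untouched."""
    clean_pow = np.abs(clean_spec) ** 2
    noise_pow = np.abs(noise_spec) ** 2
    return np.sqrt(clean_pow / (clean_pow + noise_pow + eps))

def snr_db(reference, estimate, eps=1e-12):
    """Signal-to-noise ratio of an estimate measured against a reference, in dB."""
    err_pow = np.sum(np.abs(reference - estimate) ** 2)
    ref_pow = np.sum(np.abs(reference) ** 2)
    return 10 * np.log10((ref_pow + eps) / (err_pow + eps))

# Toy spectrograms with shape (frequency bins, time frames). These are
# assumptions for illustration, not data from the thesis.
shape = (257, 100)
clean = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Impulse noise: loud, broadband, and confined to a few frames.
noise = np.zeros(shape, dtype=complex)
noise[:, 40:43] = 10 * (rng.standard_normal((257, 3))
                        + 1j * rng.standard_normal((257, 3)))
noisy = clean + noise

# Both masks are "ideal": they require the clean and noise sources.
irm_estimate = ideal_ratio_mask(clean, noise) * noisy
cirm_estimate = complex_ideal_ratio_mask(clean, noisy) * noisy

snr_noisy = snr_db(clean, noisy)
snr_irm = snr_db(clean, irm_estimate)
snr_cirm = snr_db(clean, cirm_estimate)
print(f"SNR vs clean: noisy={snr_noisy:.1f} dB, "
      f"IRM={snr_irm:.1f} dB, cIRM={snr_cirm:.1f} dB")
```

In this toy setting the cIRM recovers the clean spectrogram almost exactly because it is computed from the true sources, while the magnitude-only mask retains the noisy phase and leaves residual error in the impulse frames. Estimating the mask without access to the sources is the harder problem the abstract describes.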
Date issued
2021-06
URI
https://hdl.handle.net/1721.1/138956
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses