Towards co-channel speaker separation by 2-D demodulation of spectrograms
Author(s)
Wang, Tianyu Tom; Quatieri, Thomas F.
Publisher Policy
Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
Abstract
This paper explores a two-dimensional (2-D) processing approach for co-channel speaker separation of voiced speech. We analyze localized time-frequency regions of a narrowband spectrogram using 2-D Fourier transforms and propose a 2-D amplitude modulation model based on pitch information for single and multi-speaker content in each region. Our model maps harmonically-related speech content to concentrated entities in a transformed 2-D space, thereby motivating 2-D demodulation of the spectrogram for analysis/synthesis and speaker separation. Using a priori pitch estimates of individual speakers, we show through a quantitative evaluation: 1) the utility of the model for representing speech content of a single speaker and 2) its feasibility for speaker separation. For the separation task, we also illustrate benefits of the model's representation of pitch dynamics relative to a sinusoidal-based separation system.
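The idea that harmonically related content in a local spectrogram region maps to a concentrated point in a 2-D transformed space can be illustrated with a small NumPy sketch. This is not the authors' implementation: the synthetic constant-pitch signal, the sampling rate, the window and patch sizes, and the helper names `narrowband_spectrogram` and `patch_pitch_from_2d_fft` are all assumptions chosen for illustration. For a steady pitch, the harmonics form horizontal stripes in the patch, so the dominant 2-D Fourier peak should sit on the frequency-modulation axis at a distance set by the harmonic spacing.

```python
import numpy as np

def narrowband_spectrogram(x, win_len, hop, nfft):
    """Magnitude STFT with a long Hamming window; returns shape (freq bins, frames)."""
    win = np.hamming(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + win_len] * win for i in range(n_frames)], axis=1
    )
    return np.abs(np.fft.rfft(frames, n=nfft, axis=0))

def patch_pitch_from_2d_fft(f0, fs=8000, nfft=1024):
    """Locate the dominant 2-D Fourier peak of one spectrogram patch.

    All parameter values here are illustrative assumptions, not values
    taken from the paper.
    """
    # Synthetic "voiced" signal: equal-amplitude harmonics of a constant pitch f0.
    t = np.arange(int(0.5 * fs)) / fs
    n_harm = int((fs / 2 - 1) // f0)
    x = sum(np.cos(2 * np.pi * k * f0 * t) for k in range(1, n_harm + 1))

    # 32 ms window (256 samples at 8 kHz) resolves individual harmonics.
    spec = narrowband_spectrogram(x, win_len=256, hop=8, nfft=nfft)

    # Localized time-frequency region: 128 frequency bins by 32 frames.
    patch = spec[64:192, 100:132]

    # Remove the mean, taper with a 2-D Hamming window, take the centered 2-D FFT.
    w2d = np.outer(np.hamming(patch.shape[0]), np.hamming(patch.shape[1]))
    mag = np.abs(np.fft.fftshift(np.fft.fft2((patch - patch.mean()) * w2d)))

    # Offsets of the strongest peak from the centre (zero-frequency) bin.
    k, m = np.unravel_index(np.argmax(mag), mag.shape)
    k_off = int(k) - patch.shape[0] // 2   # along the frequency axis of the patch
    m_off = int(m) - patch.shape[1] // 2   # along the time axis of the patch

    # Stripe period in Hz = (patch height in Hz) / (cycles across the patch).
    df = fs / nfft
    f0_est = patch.shape[0] * df / abs(k_off)
    return k_off, m_off, f0_est

k_off, m_off, f0_est = patch_pitch_from_2d_fft(200.0)
print(k_off, m_off, f0_est)
```

Because the patch is real-valued, its 2-D spectrum is symmetric and the peak appears as a mirrored pair; only the magnitude of the offset matters. A time-varying pitch would tilt the stripes and move the peak off the vertical axis, and a second speaker with a different pitch would contribute a second peak at a different location, which is the intuition behind separating speakers in this transformed space.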
Date issued
2009-12
Department
Lincoln Laboratory
Journal
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2009.
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Citation
Wang, Tianyu T., and Thomas F. Quatieri. “Towards Co-channel Speaker Separation by 2-D Demodulation of Spectrograms.” IEEE, 2009. 65–68. © Copyright 2009 IEEE
Version: Final published version
ISBN
978-1-4244-3679-8
978-1-4244-3678-1
ISSN
1931-1168