Towards co-channel speaker separation by 2-D demodulation of spectrograms

Wang, Tianyu T.; Quatieri, Thomas F.

Author(s)

Wang, Tianyu Tom; Quatieri, Thomas F.

Terms of use

Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.

Abstract

This paper explores a two-dimensional (2-D) processing approach for co-channel speaker separation of voiced speech. We analyze localized time-frequency regions of a narrowband spectrogram using 2-D Fourier transforms and propose a 2-D amplitude modulation model, based on pitch information, for single- and multi-speaker content in each region. Our model maps harmonically related speech content to concentrated entities in a transformed 2-D space, thereby motivating 2-D demodulation of the spectrogram for analysis/synthesis and speaker separation. Using a priori pitch estimates of individual speakers, we show through a quantitative evaluation 1) the utility of the model for representing speech content of a single speaker and 2) its feasibility for speaker separation. For the separation task, we also illustrate benefits of the model's representation of pitch dynamics relative to a sinusoidal-based separation system.
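The mapping of harmonic content to a concentrated point in a 2-D transform domain can be illustrated with a small NumPy sketch. This is only an illustration of the general idea described in the abstract, not the authors' model or separation system: the synthetic constant-pitch signal, the window and patch sizes, and the helper names `narrowband_spectrogram` and `dominant_carrier` are all assumptions made for the example.

```python
import numpy as np


def narrowband_spectrogram(x, fs, win_ms=32.0, hop_ms=1.0, nfft=2048):
    """Magnitude STFT with a long analysis window so pitch harmonics are resolved.

    Returns an array of shape (frequency bins, frames).
    """
    win_len = int(fs * win_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    window = np.hamming(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + win_len] * window for i in range(n_frames)],
        axis=1,
    )
    return np.abs(np.fft.rfft(frames, n=nfft, axis=0))


def dominant_carrier(patch, df_hz, dt_s):
    """Locate the strongest component in the 2-D Fourier transform of a patch.

    Returns (modulation along frequency in cycles/Hz,
             modulation along time in Hz).
    """
    patch = patch - patch.mean()  # suppress the DC term
    taper = np.outer(np.hamming(patch.shape[0]), np.hamming(patch.shape[1]))
    mag = np.abs(np.fft.fft2(patch * taper))
    # The input is real, so the transform is conjugate-symmetric; search only
    # the half with non-negative modulation along the frequency axis.
    half = mag[: patch.shape[0] // 2]
    kf, kt = np.unravel_index(np.argmax(half), half.shape)
    freq_axis_mod = np.fft.fftfreq(patch.shape[0], d=df_hz)[kf]
    time_axis_mod = np.fft.fftfreq(patch.shape[1], d=dt_s)[kt]
    return freq_axis_mod, time_axis_mod


# Synthetic "voiced" signal (assumed for illustration): harmonics of a fixed pitch.
fs = 8000
f0 = 200.0
t = np.arange(int(0.5 * fs)) / fs
x = sum(np.cos(2 * np.pi * k * f0 * t) for k in range(1, 20))

nfft = 2048
hop_ms = 1.0
spec = narrowband_spectrogram(x, fs, hop_ms=hop_ms, nfft=nfft)

# Localized time-frequency region: about 500-1500 Hz, 64 frames.
patch = spec[128:384, 100:164]
carrier_cyc_per_hz, temporal_mod_hz = dominant_carrier(
    patch, df_hz=fs / nfft, dt_s=hop_ms / 1000
)

# Harmonics spaced f0 apart repeat every f0 Hz along the frequency axis,
# so the carrier sits near 1/f0 cycles per Hz.
pitch_estimate_hz = 1.0 / carrier_cyc_per_hz
print(pitch_estimate_hz, temporal_mod_hz)
```

In this sketch the pitch is constant, so the harmonic lines are horizontal in the patch and the peak has no modulation along the time axis; the pitch recovered from the peak location should land near the 200 Hz used for synthesis. A changing pitch would tilt the harmonic lines and move the peak off that axis, which is one way to picture how a 2-D representation can carry pitch dynamics.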

Date issued

2009-12

URI

http://hdl.handle.net/1721.1/71798

Department

Lincoln Laboratory

Journal

IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2009.

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Version

Final published version

ISBN

978-1-4244-3679-8

978-1-4244-3678-1

ISSN

1931-1168

Collections

MIT Open Access Articles

DSpace@MIT