Speaker diarization in a meeting scenario
Author(s): Oseni-Adegbite, Adedotun J.
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
James Glass and Hao Tang.
Given the large amount of time workers spend in meetings, statistics that can drive more effective meetings are desirable. Workplaces differ in the types of meetings they hold and in who attends them, so the more agnostic a system is to the content of a meeting and the people present, the more meeting scenarios its statistics can be applied to. We propose a system that provides these statistics as a summary of who speaks during the meeting and when, while respecting the participants' privacy. The system aims to run completely online and locally, so no audio needs to be stored on, or transmitted from, the device running it. This is accomplished by displaying where speech originates in the room and labeling the speaker. Timestamps are provided for every occurrence of a speaker's speech, allowing a breakdown of how each speaker contributed to the meeting. We created a dataset of emulated meeting-like recordings on which to run experiments. In an offline scenario, the system achieved a diarization error rate (DER) of 27.8% with no overlapping speech, 44.3% with small amounts of overlap, and 50.0% with large amounts of overlap. When run online, it achieved DERs of 16.9%, 37.2%, and 45.6% with no overlap, small amounts of overlap, and large amounts of overlap, respectively.
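For readers unfamiliar with the metric, DER is conventionally the sum of missed speech, false-alarm speech, and speaker confusion, divided by the total reference speech time. The sketch below is a minimal frame-level illustration of that definition and is not the scoring procedure used in the thesis, which this record does not describe. The function name `frame_der`, the per-frame set representation, and the brute-force search over label mappings are all assumptions made for illustration; no forgiveness collar is applied.

```python
from itertools import permutations


def frame_der(ref, hyp):
    """Frame-level diarization error rate (illustrative sketch).

    ref, hyp: equal-length lists with one set of speaker labels per frame.
    An empty set means silence; a set with two labels means overlap.
    Hypothesis labels are arbitrary, so the mapping onto reference labels
    that gives the fewest errors is used.
    """
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must cover the same number of frames")
    total_ref = sum(len(r) for r in ref)
    if total_ref == 0:
        raise ValueError("reference contains no speech")

    ref_labels = sorted(set().union(*ref))
    hyp_labels = sorted(set().union(*hyp))
    # Pad with None so that a hypothesis speaker may stay unmatched.
    candidates = ref_labels + [None] * len(hyp_labels)

    best = None
    # Brute force over one-to-one mappings: fine for a handful of speakers.
    for assignment in permutations(candidates, len(hyp_labels)):
        mapping = dict(zip(hyp_labels, assignment))
        errors = 0
        for r, h in zip(ref, hyp):
            mapped = {mapping[s] for s in h} - {None}
            correct = len(r & mapped)
            # miss + false alarm + confusion == max(|r|, |h|) - correct
            errors += max(len(r), len(h)) - correct
        if best is None or errors < best:
            best = errors
    return best / total_ref


# Hypothetical toy example: five frames, two reference speakers.
ref = [{"A"}, {"A"}, {"B"}, {"B"}, set()]
hyp = [{"x"}, {"x"}, {"y"}, {"x"}, {"y"}]
print(frame_der(ref, hyp))  # one confusion + one false alarm over 4 frames
```

Because false alarms count as errors but do not add to the denominator, DER computed this way can exceed 100%. Overlapping speech is scored per speaker, which is one reason overlap tends to raise the error rate.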
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, May, 2020. Cataloged from the official PDF of thesis. Includes bibliographical references (pages 81-84).
Department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology
Electrical Engineering and Computer Science.