Decoding the Depths: Developing a Click Separator for Predictive Speaker Recognition in Sperm Whale Conversations Using Machine Learning

Author(s)

Lee, Jason D.

Advisor

Andreas, Jacob

Terms of use

In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/

Abstract

This thesis describes the development of a tool that transcribes sperm whale communication, much as human speech is recorded and transcribed into written text. Sperm whales communicate using a sophisticated system of clicks and codas. Until now, annotators have produced annotations of these recordings entirely by hand, listening through hundreds of hours of sperm whale audio. This research aimed to build a whale-speaker identification model that can be paired with a predictive click-detection mechanism to automate the production of accurate annotations. I discuss three methodologies aimed at this objective. The first is a heuristic-based whale-identification separator model. The second trains the click-detection and whale-identification separator models simultaneously. I find that these two methodologies yield unsatisfactory results. The third is a standalone deep network model trained with a supervised contrastive learning objective, which demonstrates the best performance and the most potential for future applications.

Date issued

2024-05

URI

https://hdl.handle.net/1721.1/156776

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Collections

Graduate Theses