Decoding the Depths: Developing a Click Separator for Predictive Speaker Recognition in Sperm Whale Conversations Using Machine Learning
Author(s)
Lee, Jason D.
Advisor
Andreas, Jacob
Abstract
This thesis describes the development of a tool that transcribes sperm whale communication, much as human speech is recorded and transformed into written text. Sperm whales communicate using a sophisticated system of clicks and codas. Thus far, annotators have produced annotations entirely by hand, listening through hundreds of hours of sperm whale audio recordings. This research aimed to build a whale-speaker identification model that can be paired with a predictive click-detection mechanism to automate the production of accurate annotations. I discuss three methodologies that aim to achieve this objective. The first is a heuristic-based whale-identification separator model. The second trains the click-detection and whale-identification separator models simultaneously. I find that these two methodologies yield unsatisfactory results. The third is a standalone deep network model trained with a supervised contrastive learning objective; it demonstrates the best performance and ultimately the most potential for future applications.
Date issued
2024-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology