Audio Segmenting and Natural Language Processing in Oral History Archiving
Author(s)
Rieping, Holly Anne
DownloadThesis PDF (737.6Kb)
Advisor
Fendt, Kurt E.
Terms of use
Metadata
Show full item recordAbstract
Traditional archives preserve physical historical records, documents, artifacts, etc. and tell a story of some historical significance. As the digital age progresses, digital archives have become more commonplace and have given wider access to archival resources and knowledge to the general public. With wider access, historically marginalized groups now have the means to share stories that have typically been excluded from the dominant discourse. As a result, we are faced with both the challenge and the opportunity to tell and preserve stories from these groups and foreground diverse voices in these digital archives. Additionally, we are faced with the challenge of having an abundance of materials, both digitized and born digital, to use in an archive, and can utilize various computational methods to assist in the curatorial process of a digital archive by organizing the materials or finding connections between different materials that would otherwise take hundreds of hours for an archivist to do.
Using materials from the MIT Black Oral History Project, this thesis first explores ways to process digitized audio interviews through audio segmentation, using techniques including silence detection and speaker diarization, with the goal of creating a more flexible way to explore interviews in a digital oral history archive. Second, this thesis uses named entity recognition to experiment with metadata extraction for an archive. Next, this thesis explores ways to discover connections between segments of interviews by using topic modeling with LDA and LSI and topic classification using machine learning methods to identify topics, similarities, and dissimilarities across interviews. Finally, this thesis discusses how these computational methods may enhance the telling of diverse stories in digital oral history archives.
Date issued
2022-02Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer SciencePublisher
Massachusetts Institute of Technology