DSpace@MIT

Audio Segmenting and Natural Language Processing in Oral History Archiving

Author(s)
Rieping, Holly Anne
Advisor
Fendt, Kurt E.
Terms of use
In Copyright - Educational Use Permitted. Copyright MIT. http://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Traditional archives preserve physical historical records, documents, and artifacts, and tell a story of historical significance. As the digital age progresses, digital archives have become more commonplace and have given the general public wider access to archival resources and knowledge. With wider access, historically marginalized groups now have the means to share stories that have typically been excluded from the dominant discourse. As a result, we face both the challenge and the opportunity to tell and preserve stories from these groups and to foreground diverse voices in digital archives. We also face an abundance of materials, both digitized and born-digital. Computational methods can assist the curatorial process by organizing these materials or finding connections among them, work that would otherwise take an archivist hundreds of hours. Using materials from the MIT Black Oral History Project, this thesis first explores ways to process digitized audio interviews through audio segmentation, using techniques including silence detection and speaker diarization, with the goal of creating a more flexible way to explore interviews in a digital oral history archive. Second, it uses named entity recognition to experiment with metadata extraction for an archive. Third, it explores ways to discover connections between segments of interviews, using topic modeling with LDA and LSI and topic classification with machine learning methods to identify topics, similarities, and dissimilarities across interviews. Finally, it discusses how these computational methods may enhance the telling of diverse stories in digital oral history archives.
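For readers unfamiliar with silence detection as a segmentation technique, the following is a minimal illustrative sketch and not the thesis's implementation. It assumes audio has already been decoded into a list of amplitude samples in the range -1.0 to 1.0. The function name `split_on_silence` and the parameters `frame_len`, `threshold`, and `min_silent_frames` are hypothetical and chosen only for this example.

```python
from typing import List, Sequence, Tuple


def split_on_silence(
    samples: Sequence[float],
    frame_len: int,
    threshold: float,
    min_silent_frames: int,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample-index pairs for non-silent segments.

    Illustrative assumption: a frame counts as "silent" when its mean
    absolute amplitude is below `threshold`. A segment is closed once
    `min_silent_frames` consecutive silent frames have been seen.
    """
    segments: List[Tuple[int, int]] = []
    seg_start = None      # sample index where the current segment began
    last_loud_end = None  # end index of the most recent non-silent frame
    silent_run = 0        # consecutive silent frames inside a segment

    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(abs(s) for s in frame) / len(frame)

        if energy >= threshold:
            if seg_start is None:
                seg_start = start
            last_loud_end = start + len(frame)
            silent_run = 0
        elif seg_start is not None:
            silent_run += 1
            if silent_run >= min_silent_frames:
                # The pause is long enough: close the segment at the
                # end of the last non-silent frame.
                segments.append((seg_start, last_loud_end))
                seg_start = None
                silent_run = 0

    # Close a segment that runs to the end of the recording.
    if seg_start is not None:
        segments.append((seg_start, last_loud_end))
    return segments


# Toy signal: silence, speech, a long pause, then speech with a short pause.
toy = [0.0] * 4 + [0.5] * 8 + [0.0] * 8 + [0.4] * 4 + [0.0] * 2 + [0.4] * 4
toy_segments = split_on_silence(toy, frame_len=2, threshold=0.1, min_silent_frames=2)
```

In this sketch, short pauses (fewer than `min_silent_frames` frames) stay inside a segment, while longer pauses split the recording. Real interview audio would likely need a noise-aware threshold, and speaker diarization would require a separate model.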
Date issued
2022-02
URI
https://hdl.handle.net/1721.1/143185
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
