Exploiting E-mail structure to improve summarization
Author(s)
Lam, Derek Scott, 1979-
Other Contributors
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Advisor
Steven L. Rohall and Chris Schmandt.
Abstract
For this thesis, I designed and implemented a system to summarize e-mail messages. The system exploits two aspects of e-mail, thread reply chains and commonly found features, to generate summaries. The system uses existing software designed to summarize single text documents. Such software typically performs best on well-authored, formal documents. E-mail messages, however, are typically neither well-authored nor formal. As a result, existing summarization software typically gives a poor summary of e-mail messages. To remedy this poor performance, the system preprocesses e-mail messages to synthesize new input to this software, so that it will output more useful summaries of e-mail. This preprocessing involves a lightweight, heuristics-based approach to filtering e-mail to remove e-mail signatures, header fields, and quoted parent messages. I also present a heuristics-based approach to identifying and reporting names, dates, and companies found in e-mail messages. Lastly, I discuss conclusions from a pilot user study of my summarization system, and conclude with areas for further investigation.
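The filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's actual heuristics, so everything below is an assumption made for illustration: the function name `filter_email`, the regular expressions, and the reliance on the conventional `-- ` signature separator and `>` quote prefix are all hypothetical. A real message whose body opens with a line such as `Note: ...` would be mistaken for a header by this simple rule.

```python
import re

# Hypothetical patterns for illustration only; the abstract does not
# specify the heuristics the thesis actually uses.
HEADER_RE = re.compile(r"^[A-Za-z][A-Za-z0-9-]*:\s")  # e.g. "From: ...", "Subject: ..."
QUOTE_RE = re.compile(r"^\s*>")                       # quoted parent-message text
ATTRIBUTION_RE = re.compile(r"^On .+ wrote:\s*$")     # "On <date>, <name> wrote:"
SIGNATURE_DELIMITER = "--"                            # conventional "-- " separator line


def filter_email(raw: str) -> str:
    """Return the message body with headers, quoted text, and signature removed."""
    lines = raw.splitlines()

    # Skip the leading block of header-like lines, including folded
    # continuation lines that start with a space or tab.
    i = 0
    while i < len(lines) and (
        HEADER_RE.match(lines[i]) or (i > 0 and lines[i][:1] in (" ", "\t"))
    ):
        i += 1

    kept = []
    for line in lines[i:]:
        # Everything after the signature separator is treated as signature.
        if line.rstrip() == SIGNATURE_DELIMITER:
            break
        # Drop quoted parent text and the attribution line that introduces it.
        if QUOTE_RE.match(line) or ATTRIBUTION_RE.match(line):
            continue
        kept.append(line)

    return "\n".join(kept).strip()
```

The filtered text would then be passed to a single-document summarizer in place of the raw message, which is the role the abstract assigns to preprocessing.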
Description
Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002. Includes bibliographical references (p. 77-81). This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Date issued
2002
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.