Compression without a common prior: An information-theoretic justification for ambiguity in language
Author(s)
Juba, Brendan Andrew; Kalai, Adam Tauman; Khanna, Sanjeev; Sudan, Madhu
Terms of use
Open Access Policy
Creative Commons Attribution-Noncommercial-Share Alike
Abstract
Compression is a fundamental goal of both human language and digital communication, yet natural language is very different from the compression schemes employed by modern computers. We partly explain this difference using the fact that information theory generally assumes a common prior probability distribution shared by the encoder and decoder, whereas human communication must be robust to the possibility that the speaker and listener have different prior beliefs about what may be said. We model this information-theoretically by asking the following question: what type of compression scheme would be effective when the encoder and decoder have (boundedly) different prior probability distributions? The resulting compression scheme resembles natural language to a far greater extent than existing digital communication protocols do. We also use information theory to justify why ambiguity is necessary for compression.
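To make the setup concrete, the following is a minimal Python sketch of one natural scheme in this spirit, not necessarily the construction in the paper: the speaker and listener share random hash bits; the speaker transmits a hash prefix of its message whose length grows with log2(1/P(m)), padded by a term depending on an assumed bound alpha on the ratio between the two priors; the listener outputs the most probable message under its own prior Q that is consistent with the received bits. The function names, the exact length rule, and all numbers are illustrative assumptions.

import math
import random

# A minimal sketch, assuming: a finite message set, a speaker prior P and a
# listener prior Q that are "alpha-close" (P(m)/alpha <= Q(m) <= alpha*P(m)
# for all m), and shared randomness in the form of random hash bits.
# Names and the exact length rule are illustrative, not taken from the paper.

def make_shared_hashes(messages, depth, seed=0):
    # One shared random bit per (message, level); both parties could derive
    # these from a common random string.
    rng = random.Random(seed)
    return {m: [rng.randint(0, 1) for _ in range(depth)] for m in messages}

def encode(m, P, hashes, alpha):
    # Send a hash prefix of length roughly log2(1/P(m)) + 2*log2(alpha):
    # rarer messages get longer, less ambiguous codewords.
    k = math.ceil(math.log2(1 / P[m]) + 2 * math.log2(alpha)) + 1
    return hashes[m][:k]

def decode(code, Q, hashes):
    # Among all messages whose hash prefix matches the codeword (possibly
    # several -- the codeword is ambiguous), pick the one the listener
    # considers most probable. Decoding can still err with small probability.
    k = len(code)
    candidates = [m for m in Q if hashes[m][:k] == code]
    return max(candidates, key=lambda m: Q[m])

# Demo with 2-close priors (hypothetical numbers):
messages = ["rain", "sun", "snow", "hail"]
P = {"rain": 0.5, "sun": 0.3, "snow": 0.15, "hail": 0.05}  # speaker's prior
Q = {"rain": 0.4, "sun": 0.4, "snow": 0.1, "hail": 0.1}    # listener's prior
hashes = make_shared_hashes(messages, depth=16)
code = encode("rain", P, hashes, alpha=2)
print(len(code), decode(code, Q, hashes))  # e.g. 4 rain

Note that each short codeword would match several messages in a larger message set, and the listener's own prior does the disambiguation; this mirrors the abstract's claim that ambiguity is exactly what buys compression when no exact common prior exists.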
Date issued
2011-01
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal
Innovations in Computer Science (ICS 2011), Tsinghua University, Beijing, China
Publisher
Institute for Theoretical Computer Science
Citation
Juba, Brendan, et al. "Compression without a common prior: An information-theoretic justification for ambiguity in language." In Proceedings of Innovations in Computer Science (ICS 2011), Tsinghua University, Beijing, China, Jan. 6-9, 2011. http://conference.itcs.tsinghua.edu.cn/ICS2011/content/papers/23.html
Version: Author's final manuscript