Compression without a common prior: An information-theoretic justification for ambiguity in language
Author(s)
Juba, Brendan Andrew; Kalai, Adam Tauman; Khanna, Sanjeev; Sudan, Madhu
Terms of use
Open Access Policy
Creative Commons Attribution-Noncommercial-Share Alike
Abstract
Compression is a fundamental goal of both human language and digital communication, yet natural language is very different from the compression schemes employed by modern computers. We partly explain this difference using the fact that information theory generally assumes a common prior probability distribution shared by the encoder and decoder, whereas human communication must be robust to the possibility that the speaker and listener have different prior beliefs about what may be said. We model this information-theoretically by asking the following question: what type of compression scheme would be effective when the encoder and decoder have (boundedly) different prior probability distributions? The resulting compression scheme resembles natural language to a far greater extent than existing digital communication protocols do. We also use information theory to justify why ambiguity is necessary for compression.
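To make the setup concrete, the following is a minimal Python sketch of one natural scheme in this spirit, not necessarily the construction in the paper: the speaker and listener share random hash bits; the speaker transmits a hash prefix of its message whose length grows with log2(1/P(m)), padded by a term depending on an assumed bound alpha on the ratio between the two priors; the listener outputs the most probable message under its own prior Q that is consistent with the received bits. The function names, the exact length rule, and all numbers are illustrative assumptions.

import math
import random

# A minimal sketch, assuming: a finite message set, a speaker prior P and a
# listener prior Q that are "alpha-close" (P(m)/alpha <= Q(m) <= alpha*P(m)
# for all m), and shared randomness in the form of random hash bits.
# Names and the exact length rule are illustrative, not taken from the paper.

def make_shared_hashes(messages, depth, seed=0):
    # One shared random bit per (message, level); both parties could derive
    # these from a common random string.
    rng = random.Random(seed)
    return {m: [rng.randint(0, 1) for _ in range(depth)] for m in messages}

def encode(m, P, hashes, alpha):
    # Send a hash prefix of length roughly log2(1/P(m)) + 2*log2(alpha):
    # rarer messages get longer, less ambiguous codewords.
    k = math.ceil(math.log2(1 / P[m]) + 2 * math.log2(alpha)) + 1
    return hashes[m][:k]

def decode(code, Q, hashes):
    # Among all messages whose hash prefix matches the codeword (possibly
    # several -- the codeword is ambiguous), pick the one the listener
    # considers most probable. Decoding can still err with small probability.
    k = len(code)
    candidates = [m for m in Q if hashes[m][:k] == code]
    return max(candidates, key=lambda m: Q[m])

# Demo with 2-close priors (hypothetical numbers):
messages = ["rain", "sun", "snow", "hail"]
P = {"rain": 0.5, "sun": 0.3, "snow": 0.15, "hail": 0.05}  # speaker's prior
Q = {"rain": 0.4, "sun": 0.4, "snow": 0.1, "hail": 0.1}    # listener's prior
hashes = make_shared_hashes(messages, depth=16)
code = encode("rain", P, hashes, alpha=2)
print(len(code), decode(code, Q, hashes))  # e.g. 4 rain

Note that each short codeword would match several messages in a larger message set, and the listener's own prior does the disambiguation; this mirrors the abstract's claim that ambiguity is exactly what buys compression when no exact common prior exists.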
Date issued
2011-01
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal
Innovations in Computer Science (ICS 2011), Tsinghua University, Beijing, China
Publisher
Institute for Theoretical Computer Science
Citation
Juba, Brendan, et al. "Compression without a common prior: An information-theoretic justification for ambiguity in language." In Proceedings of Innovations in Computer Science (ICS 2011), Tsinghua University, Beijing, China, Jan. 6-9, 2011. http://conference.itcs.tsinghua.edu.cn/ICS2011/content/papers/23.html
Version: Author's final manuscript