Compression without a common prior: An information-theoretic justification for ambiguity in language

Juba, Brendan; Kalai, Adam Tauman; Khanna, Sanjeev; Sudan, Madhu

dc.contributor.author	Juba, Brendan Andrew
dc.contributor.author	Kalai, Adam Tauman
dc.contributor.author	Khanna, Sanjeev
dc.contributor.author	Sudan, Madhu
dc.date.accessioned	2011-05-11T20:08:31Z
dc.date.available	2011-05-11T20:08:31Z
dc.date.issued	2011-01
dc.identifier.uri	http://hdl.handle.net/1721.1/62817
dc.description.abstract	Compression is a fundamental goal of both human language and digital communication, yet natural language is very different from compression schemes employed by modern computers. We partly explain this difference using the fact that information theory generally assumes a common prior probability distribution shared by the encoder and decoder, whereas human communication has to be robust to the fact that a speaker and listener may have different prior beliefs about what a speaker may say. We model this information-theoretically using the following question: what type of compression scheme would be effective when the encoder and decoder have (boundedly) different prior probability distributions? The resulting compression scheme resembles natural language to a far greater extent than existing digital communication protocols. We also use information theory to justify why ambiguity is necessary for the purpose of compression.	en_US
dc.description.sponsorship	National Science Foundation (U.S.) (Award CCF-0939370)	en_US
dc.description.sponsorship	National Science Foundation (U.S.) (Award CCF-0635084)	en_US
dc.description.sponsorship	National Science Foundation (U.S.) (Award IIS-0904314)	en_US
dc.language.iso	en_US
dc.publisher	Institute for Theoretical Computer Science	en_US
dc.relation.isversionof	http://conference.itcs.tsinghua.edu.cn/ICS2011/content/paper/23.pdf	en_US
dc.rights	Creative Commons Attribution-Noncommercial-Share Alike 3.0	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/3.0/	en_US
dc.source	MIT web domain	en_US
dc.title	Compression without a common prior: An information-theoretic justification for ambiguity in language	en_US
dc.type	Article	en_US
dc.identifier.citation	Juba, Brendan et al. "Compression without a common prior: an information-theoretic justification for ambiguity in language" in Proceedings of the Innovations in Computer Science (Tsinghua University, Jan. 6-9, 2011) Website: http://conference.itcs.tsinghua.edu.cn/ICS2011/content/papers/23.html	en_US
dc.contributor.department	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science	en_US
dc.contributor.approver	Sudan, Madhu
dc.contributor.mitauthor	Juba, Brendan Andrew
dc.contributor.mitauthor	Sudan, Madhu
dc.relation.journal	Innovations in Computer Science (ICS 2011) Tsinghua University, Beijing, China	en_US
dc.eprint.version	Author's final manuscript	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
dspace.orderedauthors	Juba, Brendan; Kalai, Adam Tauman; Khanna, Sanjeev; Sudan, Madhu
mit.license	OPEN_ACCESS_POLICY	en_US
mit.metadata.status	Complete

Files in this item

Name: Sudan_Compression without.pdf
Size: 277.2Kb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
