Is the most likely model likely to be the correct model?

Yankama, Beracah

dc.contributor.advisor	Robert C. Berwick and Whitman A. Richards.	en_US
dc.contributor.author	Yankama, Beracah	en_US
dc.contributor.other	Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.	en_US
dc.date.accessioned	2010-04-28T17:15:25Z
dc.date.available	2010-04-28T17:15:25Z
dc.date.copyright	2009	en_US
dc.date.issued	2009	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/54654
dc.description	Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.	en_US
dc.description	Cataloged from PDF version of thesis.	en_US
dc.description	Includes bibliographical references (p. 89-93).	en_US
dc.description.abstract	In this work, I test the hypothesis that the 2-dimensional dependencies of a deterministic model can be correctly recovered via hypothesis-enumeration and Bayesian selection for a linear sequence, and I examine what degree of 'ignorance' or 'uncertainty' concerning the properties of the model and data Bayesian selection can tolerate. The experiment tests the data created by a number of rules of size 3 and compares the implied dependency map to the (correct) dependencies of the various generating rules, then extends it to a composition of 2 rules of total size 5. I found that 'causal' belief networks do not map directly to the dependencies of actual causal structures. For deterministic rules satisfying the condition of multiple involvement (two tails), the correct model is not likely to be retrieved without augmenting the model selection with a prior high enough to suggest that the desired dependency model is already known; simply restricting the class of models to trees and placing other restrictions (such as ordering) is not sufficient. Second, the identified-model to correct-model map is not one-to-one: in the rule cases where the correct model is identified, the identified model could just as easily have been produced by a different rule. Third, I discovered that uncertainty concerning identification of observations directly resulted in the loss of existing information and made model selection the product of pure chance (such as the last observation). How to read and identify observations had to be agreed upon a priori by both the rule and the learner to have any consistency in model identification.	en_US
dc.description.abstract	Finally, I discovered that it is not the rule-observations that discriminate between models; rather, the noise, or uncaptured observations, governs the identified model. In analysis, I found that in enumeration of hypotheses (as dependency graphs) the differentiating space is very small. With representations of conditional independence, the equivalent factorizations of the graphs make the differentiating space even smaller. As Bayesian model identification relies on convergence to the differentiating space, if those spaces are diminishing in size (if the model size is allowed to grow) relative to the observation sequence, then maximizing the likelihood of a particular hypothesis may fail to converge on the correct one. Overall I found that if a learning mechanism either does not know how to read observations or does not know a priori the dependencies it is looking for, then it is not likely to identify them probabilistically. I also confirmed existing results: model selection always prefers increasingly connected models over independent models, and several conditional-independence graphs have equivalent factorizations. Shannon's Asymptotic Equipartition Property was also confirmed to apply both for novel observations and for an increasing model/parameter space size. These results are applicable to a number of domains: natural language processing and language induction by statistical means, bioinformatics and statistical identification and merging of ontologies, and induction of real-world causal dependencies.	en_US
dc.description.statementofresponsibility	by Beracah Yankama.	en_US
dc.format.extent	94 p.	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582	en_US
dc.subject	Electrical Engineering and Computer Science.	en_US
dc.title	Is the most likely model likely to be the correct model?	en_US
dc.type	Thesis	en_US
dc.description.degree	S.M.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc	606601169	en_US

Files in this item

Name:: 606601169-MIT.pdf
Size:: 7.289Mb
Format:: PDF
Description:: Full printable version

This item appears in the following Collection(s)

Graduate Theses
