A Bayesian framework for concept learning

Tenenbaum, Joshua B. (Joshua Brett), 1972-

Author(s)

Tenenbaum, Joshua B. (Joshua Brett), 1972-

DownloadFull printable version (2.703Mb)

Advisor

Whitman A. Richards.

Terms of use

M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582

Metadata

Show full item record

Abstract

Human concept learning presents a version of the classic problem of induction, which is made particularly difficult by the combination of two requirements: the need to learn from a rich (i.e. nested and overlapping) vocabulary of possible concepts and the need to be able to generalize concepts reasonably from only a few positive examples. I begin this thesis by considering a simple number concept game as a concrete illustration of this ability. On this task, human learners can with reasonable confidence lock in on one out of a billion billion billion logically possible concepts, after seeing only four positive examples of the concept, and can generalize informatively after seeing just a single example. Neither of the two classic approaches to inductive inference hypothesis testing in a constrained space of possible rules and computing similarity to the observed examples can provide a complete picture of how people generalize concepts in even this simple setting. This thesis proposes a new computational framework for understanding how people learn concepts from examples, based on the principles of Bayesian inference. By imposing the constraints of a probabilistic model of the learning situation, the Bayesian learner can draw out much more information about a concept's extension from a given set of observed examples than either rule-based or similarity-based approaches do, and can use this information in a rational way to infer the probability that any new object is also an instance of the concept. There are three components of the Bayesian framework: a prior probability distribution over a hypothesis space of possible concepts; a likelihood function, which scores each hypothesis according to its probability of generating the observed examples; and the principle of hypothesis averaging, under which the learner computes the probability of generalizing a concept to new objects by averaging the predictions of all hypotheses weighted by their posterior probability (proportional to the product of their priors and likelihoods). The likelihood, under the assumption of randomly sampled positive examples, embodies the size principle for scoring hypotheses: smaller consistent hypotheses are more likely than larger hypotheses, and they become exponentially more likely as the number of observed examples increases. The principle of hypothesis averaging allows the Bayesian framework to accommodate both rule-like and similarity-like generalization behavior, depending on how peaked the posterior probability is. Together, the size principle plus hypothesis averaging predict a convergence from similarity-like generalization (due to a broad posterior distribution) after very few examples are observed to rule-like generalization (due to a sharply peaked posterior distribution) after sufficiently many examples have been observed. The main contributions of this thesis are as follows. First and foremost, I show how it is possible for people to learn and generalize concepts from just one or a few positive examples (Chapter 2). Building on that understanding, I then present a series of case studies of simple concept learning situations where the Bayesian framework yields both qualitative and quantitative insights into the real behavior of human learners (Chapters 3-5). These cases each focus on a different learning domain. Chapter 3 looks at generalization in continuous feature spaces, a typical representation of objects in psychology and machine learning with the virtues of being analytically tractable and empirically accessible, but the downside of being highly abstract and artificial. Chapter 4 moves to the more natural domain of learning words for categories of objects and shows the relevance of the same phenomena and explanatory principles introduced in the more abstract setting of Chapters 1-3 for real-world learning tasks like this one. In each of these domains, both similarity-like and rule-like generalization emerge as special cases of the Bayesian framework in the limits of very few or very many examples, respectively. However, the transition from similarity to rules occurs much faster in the word learning domain than in the continuous feature space domain. I propose a Bayesian explanation of this difference in learning curves that places crucial importance on the density or sparsity of overlapping hypotheses in the learner's hypothesis space. To test this proposal, a third case study (Chapter 5) returns to the domain of number concepts, in which human learners possess a more complex body of prior knowledge that leads to a hypothesis space with both sparse and densely overlapping components. Here, the Bayesian theory predicts and human learners produce either rule-based or similarity-based generalization from a few examples, depending on the precise examples observed. I also discusses how several classic reasoning heuristics may be used to approximate the much more elaborate computations of Bayesian inference that this domain requires. In each of these case studies, I confront some of the classic questions of concept learning and induction: Is the acquisition of concepts driven mainly by pre-existing knowledge or the statistical force of our observations? Is generalization based primarily on abstract rules or similarity to exemplars? I argue that in almost all instances, the only reasonable answer to such questions is, Both. More importantly, I show how the Bayesian framework allows us to answer much more penetrating versions of these questions: How does prior knowledge interact with the observed examples to guide generalization? Why does generalization appear rule-based in some cases and similarity-based in others? Finally, Chapter 6 summarizes the major contributions in more detailed form and discusses how this work ts into the larger picture of contemporary research on human learning, thinking, and reasoning.

Description

Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, 1999.

Includes bibliographical references (p. 297-314).

This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.

Date issued

1999

URI

http://hdl.handle.net/1721.1/16714

Department

Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences

Publisher

Massachusetts Institute of Technology

Keywords

Brain and Cognitive Sciences

Collections

Doctoral Theses