Coherence in natural language : data structures and applications

Author(s)

Wolf, Florian, 1975-

Other Contributors

Massachusetts Institute of Technology. Dept. of Brain and Cognitive Sciences.

Advisor

Edward Gibson.

Terms of use

M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582

Abstract

The general topic of this thesis is coherence in natural language, where coherence refers to informational relations that hold between segments of a discourse. More specifically, this thesis aims to (1) develop criteria for a descriptively adequate data structure for representing discourse coherence; (2) test the influence of coherence on psycholinguistic processes, in particular, pronoun processing; and (3) test the influence of coherence on the relative saliency of discourse segments in a text. To address the first aim, a method was developed for hand-annotating a database of naturally occurring texts for coherence structures. The resulting database of coherence structures was used to test assumptions about descriptively adequate data structures for representing discourse coherence. In particular, the assumption that discourse coherence can be represented in trees was tested, and results suggest that more powerful data structures than trees are needed: labeled chain graphs, where the labels represent types of coherence relations and an ordered array of nodes represents the temporal order of discourse segments in a text. The second aim was addressed in an on-line comprehension experiment and an off-line production experiment. Results from both experiments suggest that only a coherence-based account predicted the full range of observed data. In that account, the observed preferences in pronoun processing are not a result of pronoun-specific mechanisms, but a byproduct of more general cognitive mechanisms that operate when establishing coherence. To address the third aim, layout-, word-, and coherence-based approaches to discourse segment ranking were compared to human rankings. Results suggest that word-based accounts provide a strong baseline, and that some coherence-based approaches best predict the human data. However, coherence-based algorithms that operate on trees did not perform as well as coherence-based algorithms that operate on more general graphs. It is suggested that this might in part be because more general graphs are more descriptively adequate than trees for representing discourse coherence.
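
As an illustration of the kind of data structure the abstract describes, the following is a minimal sketch of a labeled graph over an ordered array of discourse segments, with a check for whether a given annotation still forms a tree. It is not the thesis's implementation: the class name `CoherenceGraph`, its methods, the relation labels, and the sample sentences are all hypothetical, and every relation is modeled as a directed edge for simplicity.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CoherenceGraph:
    """Hypothetical sketch: labeled directed graph over ordered segments."""

    # Discourse segments in text order; a segment's index is its position.
    segments: list[str]
    # Labeled edges stored as (source index, target index, relation label).
    edges: list[tuple[int, int, str]] = field(default_factory=list)

    def add_relation(self, source: int, target: int, label: str) -> None:
        """Add a labeled edge between two existing segments."""
        for index in (source, target):
            if not 0 <= index < len(self.segments):
                raise IndexError(f"no segment at position {index}")
        self.edges.append((source, target, label))

    def multi_parent_segments(self) -> list[int]:
        """Return segments with more than one distinct incoming edge source."""
        incoming = defaultdict(set)
        for source, target, _ in self.edges:
            incoming[target].add(source)
        return sorted(t for t, sources in incoming.items() if len(sources) > 1)

    def is_tree(self) -> bool:
        """True if the edges form a rooted tree spanning all segments."""
        n = len(self.segments)
        if n == 0 or len(self.edges) != n - 1:
            return False
        children = defaultdict(list)
        in_degree = [0] * n
        for source, target, _ in self.edges:
            children[source].append(target)
            in_degree[target] += 1
        roots = [i for i in range(n) if in_degree[i] == 0]
        if len(roots) != 1 or any(degree > 1 for degree in in_degree):
            return False
        # Every segment must be reachable from the single root.
        seen = set()
        stack = [roots[0]]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(children[node])
        return len(seen) == n


# Invented example text and labels, used only to exercise the sketch.
graph = CoherenceGraph([
    "It rained.",
    "The match was cancelled.",
    "Fans went home.",
    "They were disappointed.",
])
graph.add_relation(0, 1, "cause-effect")
graph.add_relation(1, 2, "cause-effect")
graph.add_relation(1, 3, "cause-effect")
tree_before = graph.is_tree()

# A second relation pointing at segment 3 gives it two parents.
graph.add_relation(2, 3, "elaboration")
tree_after = graph.is_tree()
```

In this sketch the first three relations form a tree, while the fourth gives one segment two incoming relations, which a tree cannot represent but a more general graph can. Keeping the segments in a list preserves their order in the text independently of the edges.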

Description

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Brain and Cognitive Sciences, February 2005.

Includes bibliographical references (leaves [143]-148).

Date issued

2005

URI

http://hdl.handle.net/1721.1/28854

Department

Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences

Publisher

Massachusetts Institute of Technology

Keywords

Brain and Cognitive Sciences.

Collections

Doctoral Theses