dc.contributor.advisor: Glass, James R.
dc.contributor.author: Chuang, Yung-Sung
dc.date.accessioned: 2024-03-15T19:23:06Z
dc.date.available: 2024-03-15T19:23:06Z
dc.date.issued: 2024-02
dc.date.submitted: 2024-02-21T17:10:06.811Z
dc.identifier.uri: https://hdl.handle.net/1721.1/153774
dc.description.abstract: Information retrieval, at the core of numerous applications such as search engines and open-domain question-answering systems, relies on effective textual representation and semantic matching. However, current approaches either lose nuanced lexical detail because of the information bottleneck in dense retrieval, or rely on exact lexical matching and thus overlook broader contextual relevance in sparse retrieval. This thesis improves both dense and sparse retrieval systems with advanced language models and training strategies. We first introduce DiffCSE, a difference-based contrastive learning framework for unsupervised sentence embedding and dense retrieval that effectively captures minor differences between sentences, improving performance on semantic tasks and on retrieval for open-domain question answering. We then address sparse retrieval's limitations with a query expansion and reranking pipeline built on pre-trained language models, which achieves superior retrieval results both in-domain and out-of-domain while retaining sparse retrieval's computational efficiency. In summary, this thesis provides a comprehensive exploration of advancing information retrieval in the era of large language models.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Information Retrieval with Dense and Sparse Representations
dc.type: Thesis
dc.description.degree: S.M.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.orcid: 0000-0002-1723-5063
mit.thesis.degree: Master
thesis.degree.name: Master of Science in Electrical Engineering and Computer Science

