Learning sentiment and semantic relatedness in user generated content using neural models

Author(s)

Nassif, Henry Michel

Other Contributors

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.

Advisor

James Glass and Mitra Mohtarami.

Terms of use

M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582

Abstract

Online review platforms and discussion forums are filled with insights that are critical to unlocking the value in user-generated content. In this thesis, we investigate two major Natural Language Processing (NLP) research areas, Aspect-Based Sentiment Analysis (ABSA) and Community Question Answering (cQA) ranking, in order to harness and understand the sentiment and semantics expressed in review platforms and discussion forums. Building on recent trends in deep learning, this work applies neural networks to these tasks. We design neural-based models, including Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks (LSTMs), to capture semantic and sentiment information.

Aspect-Based Sentiment Analysis is concerned with predicting the aspect categories mentioned in a sentence and the sentiment associated with each aspect category. We refer to these tasks as Aspect Category Detection and Aspect Category Sentiment Prediction, respectively. We present a neural-based model with convolutional layers and a Multi-Layer Perceptron (MLP) to address these tasks. The model takes word vector representations generated with word2vec and computes convolutional vectors for the user-generated reviews. These vectors are then used to predict the aspect categories and their corresponding sentiments. We evaluate our ABSA models on a restaurant review dataset and show that our results on both the aspect category detection task and the aspect category sentiment prediction task outperform the baselines.

Community Question Answering is concerned with automatically finding questions in an existing set that are related to a new question, and with finding the relevant answers to a new question. We refer to these ranking problems as Similar-Question Retrieval and Answer Selection, respectively. We present a neural-based model with stacked bidirectional LSTMs and an MLP to address these tasks. The model generates vector representations of the question-question or question-answer pairs and computes their semantic similarity scores. These scores are then used to rank candidates and predict relevance. Extensive experiments demonstrate that our cQA models for the question retrieval and answer selection tasks outperform the baselines when enough training data is available.
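
The ranking step described at the end of the abstract (vector representations, similarity scores, ranked list) can be illustrated with a minimal sketch. This is not the thesis's model: the vectors below are hand-written stand-ins for what a stacked bidirectional LSTM encoder would produce, cosine similarity is assumed as the scoring function because the abstract does not name one, and `cosine_similarity` and `rank_candidates` are hypothetical names chosen for this example.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 if either vector is all zeros, so the score is always defined.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(
    query_vec: List[float], candidates: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Score every candidate against the query and sort from most to least similar."""
    scored = [
        (cand_id, cosine_similarity(query_vec, vec))
        for cand_id, vec in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumption): in a real system these would come from a trained encoder.
new_question = [1.0, 0.0, 1.0]
existing_questions = {
    "q1": [1.0, 0.0, 1.0],  # same direction as the query, score close to 1
    "q2": [0.0, 1.0, 0.0],  # orthogonal to the query, score 0
    "q3": [1.0, 1.0, 0.0],  # partial overlap, score close to 0.5
}

ranking = rank_candidates(new_question, existing_questions)
for cand_id, score in ranking:
    print(f"{cand_id}: {score:.3f}")
```

The same function would serve for answer selection by passing answer vectors as the candidates. A learned scoring layer, such as the MLP mentioned in the abstract, could replace the fixed cosine score without changing the ranking logic.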

Description

Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.

This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.

Cataloged from student-submitted PDF version of thesis.

Includes bibliographical references (pages 113-124).

Date issued

2016

URI

http://hdl.handle.net/1721.1/105969

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Keywords

Electrical Engineering and Computer Science.

Collections

Graduate Theses