Sentiment analysis of textual content in social networks, from hand-crafted to deep learning-based models

  1. Hamood Abdullah Jabreel, Mohammed
Supervised by:
  1. Antonio Moreno Ribas (Director)

University of defense: Universitat Rovira i Virgili

Defense date: 18 May 2020

Examination committee:
  1. Diana Maynard (Chair)
  2. Aïda Valls Mateu (Secretary)
  3. Eugenio Martínez Cámara (Member)

Type: Thesis

Teseo: 625975

Abstract

MOTIVATION In the last decade, the World Wide Web (WWW) has become one of the essential sources of information. People share their views, opinions, feelings and experiences online by writing blogs, posting comments on a microblogging service (e.g., Twitter), using a social networking service (e.g., Facebook), publishing product reviews, or commenting in discussion forums and other types of social media. Such social media have changed the role of users, who are now not only content consumers but also content producers. Consequently, the web offers a large amount of public discourse and accessible content. The content shared by users comes in different modalities, i.e., textual content, images, and video. The analysis of user-generated content on the web has attracted researchers in different fields, including, among others, Natural Language Processing (NLP), computer vision, tourism, multi-criteria decision aid, smart health, complex networks, and location-based services analysis. One of the most active topics in NLP that targets the analysis of user-generated content is Sentiment Analysis (SA), also known as Opinion Mining (OM). It has a wide range of applications in commerce, public health, social welfare, etc. For instance, it can be used in public health for detecting depression, identifying cases of cyber-bullying and tracking well-being. SA can also be used to detect public opinion about political tendencies, and in brand management, stock market monitoring and education. Nowadays, several companies invest heavily in the automatic analysis of customers' opinions about their products on the web, seeking to detect trends that boost sales, consumer satisfaction and corporate profits. Governments also show interest in analysing public opinion on different social and economic issues on the web. The term SA can refer to many diverse but related problems.
Most commonly, it refers to the problem of automatically identifying the polarity of a piece of text, i.e., whether it is positive, negative, or neutral. More generally, however, it refers to determining one’s attitude towards a particular target or topic. Here, attitude can mean an evaluative judgment, such as positive or negative, or an emotional or affectual attitude such as frustration, joy, anger, sadness or excitement. Hence, in this thesis, we aim to develop methods to automatically analyse textual content shared on social networks and identify people's opinions, emotions and feelings at different levels of analysis and in different languages.

METHODOLOGY This thesis proposes several advanced methods to automatically analyse textual content shared on social networks and identify people's opinions, emotions and feelings at different levels of analysis and in different languages. We start by proposing a sentiment analysis system, called SentiRich, based on a set of rich features, including information extracted from sentiment lexicons and pre-trained word embedding models. Then, we propose an ensemble system based on Convolutional Neural Networks and XGBoost regressors to solve an array of sentiment and emotion analysis tasks on Twitter. These tasks range from the typical sentiment analysis tasks to automatically determining the intensity of an emotion (such as joy, fear, anger, etc.) and the intensity of sentiment (aka valence) of the authors from their tweets. We also propose a novel Deep Learning-based system to address the multiple emotion classification problem on Twitter. Moreover, we consider the problem of target-dependent sentiment analysis. For this purpose, we propose a Deep Learning-based system that identifies and extracts the targets of the tweets. While some languages, such as English, have a vast array of resources to enable sentiment analysis, most low-resource languages lack them.
So, we utilise the Cross-lingual Sentiment Analysis technique to develop a novel, multi-lingual and Deep Learning-based system for low-resource languages. We also propose to combine Multi-Criteria Decision Aid and sentiment analysis to develop a system that gives users the ability to exploit reviews alongside their preferences when ranking alternatives. Finally, we apply the developed systems to the field of communication of destination brands through social networks. To this end, we collected tweets of local people, visitors, and official brand destination offices from different tourist destinations and analysed the opinions and the emotions shared in these tweets.

CONCLUSIONS In this thesis, we used supervised Machine Learning methods based on hand-crafted features (i.e., traditional methods) as well as Deep Learning techniques. We tackled a wide array of problems related to sentiment analysis, including the typical sentiment analysis tasks, automatically determining the intensity of emotions and the intensity of sentiment (aka valence) of the users from their tweets, target-dependent sentiment analysis and cross-lingual sentiment analysis. Below, we summarize the main contributions of this thesis.

1. We have developed a sentiment analysis system, called SentiRich, based on a set of rich features, including information extracted from sentiment lexicons and pre-trained word embedding models. We have used the system to participate in the international challenge Sentiment Analysis in Twitter, a task of SemEval-2017. The system was ranked seventh out of 38 systems for the English language. We have developed another version of the SentiRich system for the Arabic language, which was ranked second out of eight systems.

2. We have proposed a system based on Convolutional Neural Networks and XGBoost regressors to solve an array of sentiment and emotion analysis tasks on Twitter.
These tasks range from the typical sentiment analysis tasks to automatically determining the intensity of an emotion (such as joy, fear, anger, etc.) and the intensity of sentiment (aka valence) of the authors from their tweets. We have used the system to participate in the international challenge SemEval-2018 Task 1: Affect in Tweets. In the analysis of emotions in tweets written in Arabic, the system obtained the first or second position (out of 14 participants) in four subtasks (numerical/categorical emotion intensity and numerical/categorical valence). In the analysis of English tweets, the proposed system was ranked between the fifth and the eleventh position (out of 39 participants) in the same subtasks.

3. We have proposed a novel Deep Learning-based system that addresses the multiple emotion classification problem on Twitter. We proposed a novel method to transform it into a binary classification problem and exploited a Deep Learning approach to solve the transformed problem.

4. We have proposed Deep Learning-based, target-dependent sentiment analysis systems that identify and extract the targets of the tweets. The proposed systems are composed of two main steps. First, the targets of the tweet to be analysed are extracted. Afterwards, the polarity of the tweet towards each extracted target is identified.

5. We have proposed a novel, multi-lingual and Deep Learning-based system called Universal Sentiment Analysis System for Low-Resource Languages (UniSent). The system was designed to address the problem of Cross-lingual Sentiment Analysis. It aims to transfer knowledge extracted from annotated sentiment resources in a rich-resource language (e.g., English) to low-resource languages.

6. As a novel contribution to the fields of Multi-Criteria Decision Aid (MCDA) and SA, we have developed a system called SentiRank.
It is a system based on MCDA techniques and sentiment analysis that gives users the ability to exploit reviews alongside their preferences when ranking alternatives. To achieve that, we first examine the task of aspect-based sentiment analysis to automatically detect and analyse all expressions of sentiment towards a set of aspects in users' reviews. After that, we exploit an MCDA method, named ELECTRE-III, to develop a ranking system based on the decision maker's preferences and the users' reviews.

7. As a case study, and by applying the developed systems, we have collected tweets of local people, visitors, and official brand destination offices from different tourist destinations and analysed the opinions and the emotions shared in these tweets. This work has been carried out in collaboration with national and international researchers specialised in destination brand communication.
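
Contribution 3 mentions transforming the multiple emotion classification problem into a binary classification problem, but the abstract does not say how. The sketch below shows one common formulation, given purely as an assumption and not as the thesis's actual method: every (tweet, emotion) pair becomes one binary example. The emotion inventory `EMOTIONS`, the function name `to_binary_examples` and the sample tweets are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical label inventory; the abstract does not list the emotions used.
EMOTIONS = ["anger", "fear", "joy", "sadness"]


def to_binary_examples(tweets: Dict[str, List[str]]) -> List[Tuple[str, str, int]]:
    """Turn multi-label data into (tweet, emotion, 0/1) binary examples.

    Assumed formulation: each tweet is paired with every candidate emotion,
    and the target is 1 if that emotion was annotated for the tweet, else 0.
    """
    examples = []
    for text, labels in tweets.items():
        present = set(labels)
        for emotion in EMOTIONS:
            examples.append((text, emotion, int(emotion in present)))
    return examples


# Made-up sample data for illustration only.
data = {
    "I can't wait for the concert tonight!": ["joy"],
    "The storm was terrifying and it ruined my garden.": ["fear", "sadness"],
}
pairs = to_binary_examples(data)
```

Under this assumed formulation, a single binary classifier can be trained on the resulting pairs, and the emotions predicted for a new tweet are those whose pair is classified as 1.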