March 23, 2018 at 08:31AM
5 Things You Need to Know about Sentiment Analysis and Classification
We take a look at the important things you need to know about sentiment analysis, including social media, classification, evaluation metrics and how to visualise the results.
By Symeon Symeonidis, Democritus University of Thrace
In recent years, Sentiment Analysis has become a hot topic of scientific and market research in the fields of Natural Language Processing (NLP) and Machine Learning. Below, you can find 5 useful things you need to know about Sentiment Analysis that are connected to Social Media, Datasets, Machine Learning, Visualizations, and Evaluation Methods applied by researchers and market experts. Let’s get started!
1. Social Media are the main resource
Sentiment Analysis studies texts, such as posts and reviews, that users upload to microblogging platforms, forums, and e-commerce sites, in order to determine the opinions they hold about a product, service, event, person or idea.
Figure 1. 3-Classes Sentiment Analysis [1]
The most common use of Sentiment Analysis is classifying a text into a class. Depending on the dataset and the purpose, Sentiment Classification can be a binary (positive or negative) or a multi-class (3 or more classes) problem.
In addition, researchers and stakeholders hold similar or completely different opinions about the relation between emotion detection and sentiment analysis, depending on their perspective. However, regardless of the result or approach, they all adopt the same techniques.
2. Before starting the Sentiment Analysis
Datasets
Many evaluation and labeled sentiment datasets have been created, especially for Twitter posts and Amazon product reviews.
The most popular and widespread are:
Also, anyone can crawl and collect data using the APIs provided by many platforms and forums. The most famous API is that of Twitter.
Pre-processing
An initial step in text and sentiment classification is pre-processing. A significant number of techniques are applied to the data in order to reduce the noise in the text, reduce dimensionality, and improve classification effectiveness. The most popular techniques include:
- Remove numbers
- Stemming
- Part of speech tagging
- Remove punctuation
- Lowercase
- Remove stopwords
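As a rough illustration, four of the techniques above (lowercasing, removing numbers, removing punctuation, and removing stopwords) can be sketched with the Python standard library alone. The function name `preprocess` and the tiny `STOPWORDS` set are assumptions made for this example; stemming and part-of-speech tagging are left out because they normally rely on an NLP library.

```python
import re
import string

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "the", "is", "it", "this", "to", "and", "of", "in", "i"}

# Translation table that turns every ASCII punctuation character into a space.
PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})

def preprocess(text):
    """Lowercase, strip numbers and punctuation, then drop stopwords."""
    text = text.lower()                      # lowercase
    text = re.sub(r"\d+", " ", text)         # remove numbers
    text = text.translate(PUNCT_TO_SPACE)    # remove punctuation
    tokens = text.split()                    # whitespace tokenisation
    return [t for t in tokens if t not in STOPWORDS]  # remove stopwords

print(preprocess("I bought this phone in 2018, and it is GREAT!!!"))
# ['bought', 'phone', 'great']
```

The order of the steps matters: lowercasing first lets a lowercase stopword list match words like "I" and "This".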
3. How to classify Sentiment?
Machine Learning
This approach employs a machine-learning technique and diverse features to construct a classifier that can identify text that expresses sentiment. Nowadays, deep-learning methods are popular because they learn feature representations directly from the data.
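To make the idea concrete, here is a minimal sketch of one classic machine-learning classifier, multinomial Naive Bayes with add-one smoothing, written in plain Python. It is only one of many possible techniques and is not the method of any particular paper; the function names and the toy training data are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """docs: list of (tokens, label). Returns the counts needed for prediction."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def predict(model, tokens):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)   # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1)
                                  / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data (made up for illustration).
train = [
    ("great phone love it".split(), "positive"),
    ("love the screen".split(), "positive"),
    ("terrible battery hate it".split(), "negative"),
    ("hate the slow screen".split(), "negative"),
]
model = train_naive_bayes(train)
print(predict(model, "love this great screen".split()))
```

In practice the word counts would be replaced by richer features (n-grams, TF-IDF weights, embeddings), and the training set would contain thousands of labeled texts.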
Lexicon-Based
This method uses a variety of words annotated by polarity score, to decide the general assessment score of a given content. The strongest asset of this technique is that it does not require any training data, while its weakest point is that a large number of words and expressions are not included in sentiment lexicons.
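A lexicon-based scorer can be sketched in a few lines. The `LEXICON` dictionary below is a hypothetical stand-in for a real sentiment lexicon, and the zero threshold used to separate the three classes is an assumption of this example.

```python
# Hypothetical polarity lexicon; real lexicons contain thousands of scored entries.
LEXICON = {"good": 1.0, "great": 2.0, "love": 2.0,
           "bad": -1.0, "terrible": -2.0, "hate": -2.0}

def lexicon_sentiment(tokens, lexicon=LEXICON):
    """Sum the polarity scores of known words and map the total to a 3-class label."""
    score = sum(lexicon.get(tok, 0.0) for tok in tokens)  # unknown words count as 0
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score

print(lexicon_sentiment("love it".split()))                            # ('positive', 2.0)
print(lexicon_sentiment("great camera but terrible battery".split()))  # ('neutral', 0.0)
print(lexicon_sentiment("meh overpriced".split()))                     # ('neutral', 0.0)
```

The last call shows the weakness described above: "meh" and "overpriced" carry sentiment, but because neither appears in the lexicon the text is scored as neutral.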
Hybrid
The combination of machine learning and lexicon-based approaches to address Sentiment Analysis is called Hybrid. Though not commonly used, this method usually produces more promising results than the approaches mentioned above.
Figure 2. Sentiment classification techniques [2]
4. Evaluation metrics
As a classification problem, Sentiment Analysis uses the evaluation metrics of Precision, Recall, F-score, and Accuracy. Also, average measures like macro, micro, and weighted F1-scores are useful for multi-class problems. The most appropriate metric depends on how balanced the classes of the dataset are; accuracy, for example, can be misleading when one class dominates.
Figure 3. Steps-to-Evaluate-Sentiment-Analysis [3]
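The averaged F1-scores mentioned above can be computed directly from per-class counts. The sketch below assumes single-label classification; the function names and the toy labels are invented for the example.

```python
from collections import Counter

def per_class_counts(y_true, y_pred):
    """Count true positives, false positives, and false negatives per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class gains a false positive
            fn[t] += 1   # true class gains a false negative
    return tp, fp, fn

def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there are no true positives."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def f1_report(y_true, y_pred):
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = per_class_counts(y_true, y_pred)
    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in labels}
    support = Counter(y_true)  # number of true examples per class
    return {
        "macro": sum(per_class.values()) / len(labels),
        "weighted": sum(per_class[c] * support[c] for c in labels) / len(y_true),
        "micro": f1(sum(tp.values()), sum(fp.values()), sum(fn.values())),
        "accuracy": sum(tp.values()) / len(y_true),
    }

# Toy, imbalanced example: four positive texts, one negative, one neutral.
y_true = ["pos", "pos", "pos", "pos", "neg", "neu"]
y_pred = ["pos", "pos", "pos", "neg", "neg", "pos"]
print(f1_report(y_true, y_pred))
```

On this toy data the macro F1 (about 0.47) is noticeably lower than the weighted F1 (about 0.61), because macro averaging gives the missed neutral class the same weight as the dominant positive class. In single-label problems the micro F1 equals accuracy.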
5. Visualise Results
To visualize the results of Sentiment Analysis, many people employ well-known techniques, such as graphs, histograms, and confusion matrices. Because data domains and tasks vary widely, visualization approaches like word clouds, interactive maps, and sparkline-style plots are also very popular.
Figure 4. Sentiment Word Cloud [4]
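Of the techniques listed above, the confusion matrix is the easiest to build by hand. The following sketch computes one and renders it as a plain-text table; the function names and the toy labels are assumptions made for this example, and a plotting library would normally be used for a graphical version.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Return rows of counts: rows are true classes, columns are predicted classes."""
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(t, p)] for p in labels] for t in labels]

def render(matrix, labels):
    """Format the matrix as a plain-text table with a header row."""
    width = max(len(label) for label in labels) + 2
    header = " " * width + "".join(label.rjust(width) for label in labels)
    rows = [t.ljust(width) + "".join(str(n).rjust(width) for n in row)
            for t, row in zip(labels, matrix)]
    return "\n".join([header] + rows)

labels = ["neg", "neu", "pos"]
y_true = ["pos", "pos", "pos", "pos", "neg", "neu"]
y_pred = ["pos", "pos", "pos", "neg", "neg", "pos"]
matrix = confusion_matrix(y_true, y_pred, labels)
print(render(matrix, labels))
```

Reading along a row shows where the texts of one true class ended up; here the single neutral text was predicted as positive.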
To dive deeper into the fascinating world of Sentiment Analysis, we recommend following some posts from KDnuggets:
[1] http://bit.ly/2HXFIS5
[2] http://bit.ly/2pCdvcT
[3] http://bit.ly/2HXwKUY
[4] http://bit.ly/2pz2OIa
Bio: Symeon Symeonidis is a PhD candidate in the area of intention and sentiment mining at Democritus University of Thrace.
Read more at KDnuggets http://bit.ly/2I4zCA8