5 Things You Need to Know about Sentiment Analysis and Classification

March 23, 2018 at 08:31AM

We take a look at the important things you need to know about sentiment analysis, including social media, classification, evaluation metrics and how to visualise the results.



By Symeon Symeonidis, Democritus University of Thrace

In recent years, Sentiment Analysis has become a hot topic in scientific and market research within the fields of Natural Language Processing (NLP) and Machine Learning. Below are 5 useful things you need to know about Sentiment Analysis, covering Social Media, Datasets, Machine Learning, Visualizations, and the Evaluation Methods applied by researchers and market experts. Let’s get started!

1. Social Media are the main resource

 
Sentiment Analysis studies texts, such as posts and reviews, that users upload to microblogging platforms, forums, and e-commerce sites, in order to determine the opinions they hold about a product, service, event, person or idea.

Figure 1. 3-Classes Sentiment Analysis [1]

The most common use of Sentiment Analysis is classifying a text into a class. Depending on the dataset and the purpose, Sentiment Classification can be a binary (positive or negative) or a multi-class (3 or more classes) problem.

In addition, researchers and stakeholders hold opinions that range from similar to completely different about the relation between emotion detection and sentiment analysis, depending on their perspective. However, regardless of the result or approach, they all adopt the same techniques.

2. Before starting the Sentiment Analysis

 
Datasets

Many evaluations and labeled sentiment datasets have been created, especially for Twitter posts and Amazon product reviews.

The most popular and widespread are:

Also, anyone can crawl and collect data using the APIs provided by many platforms and forums. The best known of these is the Twitter API.

Pre-processing

Pre-processing is an initial step in text and sentiment classification. A significant number of techniques are applied to the data in order to reduce noise in the text, reduce dimensionality, and improve classification effectiveness. The most popular techniques include:

  • Remove numbers
  • Stemming
  • Part of speech tagging
  • Remove punctuation
  • Lowercase
  • Remove stopwords
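
Several of the steps listed above can be chained in a few lines. The following is only a minimal sketch using the Python standard library: the stopword list and the suffix-stripping `crude_stem` helper are illustrative assumptions standing in for a full stopword list and a real stemmer, and part-of-speech tagging is left out because it needs a trained tagger.

```python
import re
import string

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "the", "is", "was", "it", "this", "and", "to", "of", "i"}

# Suffixes removed by the crude stemmer below (hypothetical stand-in for a
# real stemming algorithm).
SUFFIXES = ("ing", "ed", "ly", "s")


def crude_stem(word):
    """Strip one common suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, drop numbers and punctuation, remove stopwords, then stem."""
    text = text.lower()                       # lowercase
    text = re.sub(r"\d+", " ", text)          # remove numbers
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()
    tokens = [t for t in tokens if t not in STOPWORDS]   # remove stopwords
    return [crude_stem(t) for t in tokens]               # stemming


if __name__ == "__main__":
    print(preprocess("I LOVED this phone, it was working 100% of the time!"))
```

Note that a stem such as "lov" is not a dictionary word; stemming only aims to map related word forms onto the same token.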

 

3. How to classify Sentiment?

 
Machine Learning

This approach employs a machine-learning technique and diverse features to construct a classifier that can identify text that expresses sentiment. Nowadays, deep-learning methods are popular because they learn representations directly from the data.
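
As a rough illustration of the machine-learning approach, the sketch below trains a multinomial Naive Bayes classifier on bag-of-words counts. Naive Bayes is chosen here only because it is compact, not because the article prescribes it; the class name `NaiveBayesSentiment` and the four training sentences are made up for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)            # log prior
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                if token not in self.vocab:
                    continue  # words never seen in training are ignored
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data, for illustration only.
TRAIN_TEXTS = [
    "great phone love it",
    "excellent service very happy",
    "terrible battery hate it",
    "awful service very slow",
]
TRAIN_LABELS = ["positive", "positive", "negative", "negative"]

if __name__ == "__main__":
    model = NaiveBayesSentiment().fit(TRAIN_TEXTS, TRAIN_LABELS)
    print(model.predict("love the excellent battery"))
    print(model.predict("hate this awful phone"))
```

In practice the texts would first go through pre-processing, and the training set would be one of the labeled datasets rather than four sentences.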

Lexicon-Based

This method uses a variety of words annotated with a polarity score to decide the overall assessment score of a given piece of content. The strongest asset of this technique is that it does not require any training data, while its weakest point is that a large number of words and expressions are not included in sentiment lexicons.
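
A lexicon-based scorer can be sketched in a few lines. The `LEXICON` dictionary below is a hypothetical six-word lexicon with made-up scores, and summing the scores is just one simple way of combining them; real lexicons are far larger and often handle negation and intensifiers as well.

```python
# Hypothetical mini-lexicon: word -> polarity score (values are made up).
LEXICON = {
    "good": 1.0,
    "great": 2.0,
    "love": 2.0,
    "bad": -1.0,
    "terrible": -2.0,
    "hate": -2.0,
}


def lexicon_score(text):
    """Sum the polarity of every known word; unknown words contribute 0."""
    tokens = text.lower().split()
    return sum(LEXICON.get(token, 0.0) for token in tokens)


def lexicon_label(text, threshold=0.0):
    """Map the summed score to a class; no training data is involved."""
    score = lexicon_score(text)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sentence in ["i love it", "great screen but terrible battery", "meh overpriced"]:
        print(sentence, "->", lexicon_label(sentence))
```

The last sentence shows the weak point mentioned above: neither "meh" nor "overpriced" is in the lexicon, so the text is scored as neutral even though a reader would call it negative.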

Hybrid

The combination of machine learning and lexicon-based approaches to address Sentiment Analysis is called Hybrid. Though not commonly used, this method usually produces more promising results than the approaches mentioned above.

Figure 2. Sentiment classification techniques [2]

 

 

4. Evaluation metrics

 
As a classification problem, Sentiment Analysis uses the evaluation metrics of Precision, Recall, F-score, and Accuracy. Averaged measures such as macro, micro, and weighted F1-scores are also useful for multi-class problems. The most appropriate metric should be chosen according to how balanced the classes of the dataset are.

Figure 3. Steps-to-Evaluate-Sentiment-Analysis [3]
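
The averaged F1-scores named in this section differ only in how the per-class numbers are combined. The sketch below computes them from scratch so the difference is visible; the function names and the six example labels are assumptions made for illustration.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Return {label: (tp, fp, fn)} for every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    counts = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        counts[label] = (tp, fp, fn)
    return counts


def f1_from_counts(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall (0 if undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_averages(y_true, y_pred):
    counts = per_class_counts(y_true, y_pred)
    support = Counter(y_true)  # number of true examples per class
    per_class = {label: f1_from_counts(*c) for label, c in counts.items()}
    # Macro: every class counts equally, however rare it is.
    macro = sum(per_class.values()) / len(per_class)
    # Weighted: each class counts in proportion to its support.
    weighted = sum(per_class[label] * support[label] for label in per_class) / len(y_true)
    # Micro: pool tp/fp/fn over all classes before computing F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1_from_counts(tp, fp, fn)
    return {"macro": macro, "micro": micro, "weighted": weighted}


if __name__ == "__main__":
    # Made-up, imbalanced example: four "pos", one "neg", one "neu".
    y_true = ["pos", "pos", "pos", "pos", "neg", "neu"]
    y_pred = ["pos", "pos", "pos", "neg", "neg", "pos"]
    print(accuracy(y_true, y_pred))
    print(f1_averages(y_true, y_pred))
```

On this imbalanced example the macro average is pulled down by the single missed "neu" text, while the micro average equals plain accuracy, which is why the class balance matters when choosing a metric.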

 

5. Visualise Results

 
To visualize the results of Sentiment Analysis, many people employ well-known techniques, such as graphs, histograms, and confusion matrices. Because of the many data domains and tasks involved, visualization approaches like word clouds, interactive maps, and sparkline-style plots are also very popular.

Figure 4. Sentiment Word Cloud [4]
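
Two of the visualizations mentioned here start from simple counts. As a minimal sketch with made-up data, the code below builds a confusion matrix as plain text and computes the word frequencies that a word-cloud renderer would use to size its words; the function names are assumptions, and drawing the actual cloud is left to a plotting library.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(t, p)] for p in labels] for t in labels]


def format_matrix(matrix, labels):
    """Render the matrix as an aligned plain-text table."""
    width = max(len(label) for label in labels) + 2
    header = " " * width + "".join(label.rjust(width) for label in labels)
    rows = [
        true_label.ljust(width) + "".join(str(n).rjust(width) for n in row)
        for true_label, row in zip(labels, matrix)
    ]
    return "\n".join([header] + rows)


def word_frequencies(texts, top_n=5):
    """Most frequent words; a word cloud scales each word by its count."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return counts.most_common(top_n)


if __name__ == "__main__":
    labels = ["pos", "neg", "neu"]
    y_true = ["pos", "pos", "pos", "pos", "neg", "neu"]
    y_pred = ["pos", "pos", "pos", "neg", "neg", "pos"]
    print(format_matrix(confusion_matrix(y_true, y_pred, labels), labels))
    print(word_frequencies(["great phone", "great battery great screen"], top_n=3))
```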

 

To dive deeper into the fascinating world of Sentiment Analysis, we recommend following some posts from KDnuggets:

[1] http://bit.ly/2HXFIS5

[2] http://bit.ly/2pCdvcT

[3] http://bit.ly/2HXwKUY

[4] http://bit.ly/2pz2OIa

Bio: Symeon Symeonidis is a PhD candidate in the area of intention and sentiment mining, at Democritus University of Thrace.

Read more at KDnuggets http://bit.ly/2I4zCA8
