Transactions on Engineering and Computing Sciences - Vol. 11, No. 2

Publication Date: April 25, 2023

DOI:10.14738/tecs.112.14328.

Sargsyan, S., & Hovakimyan, A. (2023). Creating a Sentiment Analyzer for Text Messages. Transactions on Engineering and

Computing Sciences, 11(2), 53-60.

Services for Science and Education – United Kingdom

Creating a Sentiment Analyzer for Text Messages

Siranush Sargsyan

Yerevan State University, Yerevan, Armenia

Anna Hovakimyan

Yerevan State University, Yerevan, Armenia

ABSTRACT

The work is devoted to the use of Logistic Regression and Neural Network methods

to develop a methodology for creating and implementing an analyzer for

determining the sentiment of a text in Armenian. Based on a selection of data from

social networks, this analyzer determines the tone of the message entered by the

user. Note that such work with Armenian texts is carried out for the first time.

Sentiment analysis of the text helps to form an opinion about the content of the

message without reading the entire text, freeing the user from unnecessary work

and time. A sentiment analyzer based on various machine-learning methods for text

classification has been created. The paper presents the results of a comparative

analysis of the models underlying this analyzer. The created analyzer is used in

processing the results of a social survey of students both to determine the students’

opinion about the relevance of teaching a given subject and to assess the quality of

teaching a subject by a teacher. The analyzer is also intended to be used to determine

the psychological state of the student before the exam in order to adapt the exam to

the student.

Keywords: sentiment analysis, machine learning, logistic regression, neural network,

classifier, social networks.

INTRODUCTION

With the spread and ubiquitous use of the Internet, analyzing the sentiment of a text

has become a relevant task of wide interest. This task is known as Sentiment Analysis

[1]. The purpose of Sentiment Analysis is to identify sentiments expressed in textual content.

The Sentiment Analysis technique can be used to determine, for example, people's opinion of

the quality of a particular product, to analyze the results of a public opinion poll, to determine

the sentiment of a film or article review, to determine the quality of a subject's teaching, etc.

Sentiment Analysis allows researchers to solve a wide range of problems, such as forecasting

the stock market and election results, determining reactions to certain events or news, and so

on. Determining the tone of text messages makes it possible to identify and analyze the author's emotional

assessment of the objects referred to in the text, and thus to learn about

the author's attitude to a phenomenon, product, or service [1, 2].

Sentiment Analysis can be seen as an alternative to traditional surveys. These methods make it possible

not only to show the current attitude to the object of study, but also to predict changes in

attitude to the object. For example, in the USA there is a project called Pulse of the Nation, whose

goal is to determine the mood of the citizens of the country during the day by researching and

analyzing their posts on the popular social network Twitter [2].

Since the beginning of the 2010s, the main trends in sentiment analysis have been its application to social

media and multilingual tone analysis. Until now, such work has rarely addressed texts

outside the English language. The reason for this is most often the lack of available thesauri

in national languages for tonal analysis of texts.

Until 2014, there were only 12 publicly available non-English lexicons for tone analysis. In

2014, an attempt was made to create dictionaries for other languages. The agreement

of the results with the traditional approach was below 50%, and the proposed

dictionaries are rarely used in practice.

In 2016-2018, many works appeared in which texts in the main European languages are

analyzed. In general, however, the authors do not develop dictionaries for each of the languages, but

use machine translation of negative and positive lexemes from available English thesauri. This,

of course, reduces the quality of the analysis [3].

Despite its promise, this direction is not yet actively used in text information

processing systems. The reasons are the difficulty of identifying emotional vocabulary in texts,

the imperfection of existing text analyzers, and dependence on the subject area. Therefore, the

improvement and development of new methods for sentiment analysis based on machine

learning is an urgent task [4].

The purpose of this work was to create a sentiment analyzer that classifies Armenian texts from social

networks according to their emotional characteristics. Two text classification methods,

Logistic Regression and Neural Networks, were applied to create a hybrid analyzer model.

Various experiments were carried out to assess the quality of this analyzer.

We, the authors of this article, have been engaged in the educational process for many

years. During exams, students are in different psychological states. Before an exam,

students can be classified into categories with the help of a sentiment analyzer and special

questionnaires, which makes it possible to balance their condition and conduct the exam adaptively. Work

is underway to create special extended questionnaires to classify students according to their

emotional state.

STATEMENT OF THE PROBLEM

There are many effective methods for analyzing the sentiment of texts, each of

which has its own characteristics. To solve these problems using information technology,

Natural Language Processing (NLP) methods are used [1, 4].

For a long time, linear models such as support vector machines, logistic regression, etc. were

the mainstream NLP methods. Recently, other machine learning approaches have

taken over. In particular, neural network models are successfully applied in the field of natural

language processing [4, 6].

This paper discusses the tools and methodologies that underlie logistic regression and

recurrent neural networks [5, 6, 8]. It is assumed that the models under consideration are able

to learn how to classify texts. Many of them learn from labeled examples, that is, examples presented

together with the correct answers.

The classic method for determining the tone of a text is the dictionary method. It is based on

dictionaries of emotionally colored words. Each term from the dictionary is assigned its own

weight, either manually or with the help of third-party tools.

There are different methods for determining tonality using dictionaries. For example, each term

in the dictionary is evaluated as having a positive or negative sentiment. The positive coloring of the

text is formed based on the weight of the term in the text with the maximum positive

coloring. The negative coloring is determined in the same way. A term absent from the dictionary is

assigned a minimum positive weight [10, 11].
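
The dictionary scoring described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the lexicon entries, the weights, the value of `DEFAULT_WEIGHT`, and the rule used in `tone_label` to combine the two scores are all assumptions made for the example, and English words stand in for Armenian ones.

```python
# Hypothetical lexicon: words and weights are illustrative, not from the paper.
LEXICON = {
    "good": 0.6,
    "excellent": 0.9,
    "bad": -0.5,
    "terrible": -0.8,
}

# Assumed "minimum positive weight" for terms missing from the dictionary.
DEFAULT_WEIGHT = 0.01


def tone_scores(tokens):
    """Return (positive, negative) scores for a tokenized text.

    positive: largest positive weight among the terms (0.0 if none)
    negative: most negative weight among the terms (0.0 if none)
    """
    weights = [LEXICON.get(tok, DEFAULT_WEIGHT) for tok in tokens]
    positive = max((w for w in weights if w > 0), default=0.0)
    negative = min((w for w in weights if w < 0), default=0.0)
    return positive, negative


def tone_label(tokens):
    """Assumed combination rule: the score with the larger magnitude wins."""
    positive, negative = tone_scores(tokens)
    return "positive" if positive >= abs(negative) else "negative"


print(tone_scores(["the", "film", "was", "excellent", "not", "bad"]))
print(tone_label(["terrible", "plot"]))
```

Because only the extreme weights are used, a single strongly colored term determines the score, which is one reason the result depends so heavily on how the dictionary weights were chosen.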

The main drawback when using dictionary methods is the procedure for compiling dictionaries

of terms and specifying their weights, since almost always the value of the weight of a term

depends on the chosen subject area. These methods do not allow creating a universal dictionary

of terms, since their weight in different subject areas can differ significantly or be opposite. The

compilation of language corpora has its own characteristics and difficulties. This is especially true for

languages that are relatively rarely used on the Internet (for example,

Armenian).

For our implementation of the models, the compilation of the corpus of the Armenian language

was especially time-consuming. The source of information for the Armenian texts was the social

networks Twitter and Facebook. We have developed and implemented a scheme for compiling

dictionaries of terms and indicating their weights based on these texts.

At the next stage, features are selected, and for each feature it is determined whether it is present in

the text or not. For text classification, the words, pairs of words, whole phrases, etc. occurring in a text

are considered as features. In the constructed analyzer, words expressing emotions are used as

features.
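
As a rough illustration of presence features, the sketch below maps a text to a 0/1 vector over a small emotion vocabulary. The vocabulary `EMOTION_WORDS` and the simple whitespace tokenizer are assumptions made for the example; the paper's Armenian word list and preprocessing are not reproduced here.

```python
# Hypothetical emotion vocabulary (English stand-ins for the Armenian terms).
EMOTION_WORDS = ["happy", "love", "sad", "angry"]


def tokenize(text):
    """Lowercase, split on whitespace, strip basic punctuation (a simplification)."""
    return [tok.strip(".,!?;:\"'()") for tok in text.lower().split()]


def presence_features(text, vocabulary=EMOTION_WORDS):
    """Return a 0/1 vector: entry i is 1 if vocabulary[i] occurs in the text."""
    tokens = set(tokenize(text))
    return [1 if word in tokens else 0 for word in vocabulary]


print(presence_features("I love this, but I am sad!"))
```

Each text is thus reduced to a fixed-length vector whose length equals the vocabulary size, which is the form of input a classifier such as logistic regression expects.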

DESCRIPTION OF THE DOCUMENT CLASSIFICATION PROBLEM

The classification of texts, one of the main tasks of a search engine, assigns

a text to a particular class based on its content. In this case, the classification is

performed by the sentiment analyzer [4, 5].

Assume that we have a finite set of classes $C = \{c_1, \dots, c_{|C|}\}$, a finite set of documents $D = \{d_1, \dots, d_{|D|}\}$, and the

desired function $\Phi: D \times C \to \{0, 1\}$, which for an arbitrary pair $\langle d, c \rangle$, where $d \in D$, $c \in C$, determines whether

document $d$ is in class $c$ ($\Phi(d, c) = 1$) or not ($\Phi(d, c) = 0$).

The task of classification is to find a function $\Phi'$ that approximates the function $\Phi$. This function

$\Phi'$ is the classifier function.
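
One way to obtain such a classifier function is to fit a logistic regression model on labeled examples and threshold its output. The sketch below is an assumption-laden illustration rather than the authors' model: it fixes a single class ("positive"), so the classifier becomes a function of the document's feature vector alone, and the toy data, learning rate, and number of epochs are invented for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=1.0, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one example: p(x) - y.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def phi_prime(x, w, b, threshold=0.5):
    """Approximate classifier: 1 if x is assigned to the positive class, else 0."""
    p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if p >= threshold else 0


# Toy data (invented): features are [has_positive_word, has_negative_word].
X = [[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [0, 0]]
y = [1, 1, 0, 0, 1, 0]

w, b = train_logistic(X, y)
print([phi_prime(x, w, b) for x in X])
```

With more than two classes, the same idea can be applied once per class, giving one binary decision for each document-class pair.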