Transactions on Engineering and Computing Sciences - Vol. 11, No. 2
Publication Date: April 25, 2023
DOI:10.14738/tecs.112.14328.
Sargsyan, S., & Hovakimyan, A. (2023). Creating a Sentiment Analyzer for Text Messages. Transactions on Engineering and
Computing Sciences, 11(2). 53-60.
Services for Science and Education – United Kingdom
Creating a Sentiment Analyzer for Text Messages
Siranush Sargsyan
Yerevan State University, Yerevan, Armenia
Anna Hovakimyan
Yerevan State University, Yerevan, Armenia
ABSTRACT
The work is devoted to the use of Logistic Regression and Neural Network methods
to develop a methodology for creating and implementing an analyzer for
determining the sentiment of a text in Armenian. Based on a selection of data from
social networks, this analyzer determines the tone of the message entered by the
user. Note that such work with Armenian texts is carried out for the first time.
Sentiment analysis of a text helps to form an opinion about the content of the
message without reading the entire text, saving the user unnecessary work
and time. A sentiment analyzer based on various machine-learning methods for text
classification has been created. The paper presents the results of a comparative
analysis of the models underlying this analyzer. The created analyzer is used in
processing the results of a social survey of students, both to determine the students'
opinion about the relevance of teaching a given subject and to assess the quality of
a teacher's teaching of the subject. It is also planned to use the analyzer to determine
the psychological state of a student before an exam in order to adapt the exam to
the student.
Keywords: sentiment analysis, machine learning, logistic regression, neural network,
classifier, social networks.
INTRODUCTION
With the spread and ubiquitous use of the Internet, the task of analyzing the
sentiment of a text has become relevant and has attracted wide interest. This task is known as
Sentiment Analysis [1]. The purpose of Sentiment Analysis is to identify sentiments expressed in textual content.
The Sentiment Analysis technique can be used to determine, for example, people's opinion of
the quality of a particular product, to analyze the results of a public opinion poll, to determine
the sentiment of a film or article review, to determine the quality of a subject's teaching, etc.
Sentiment Analysis allows researchers to solve a wide range of problems, such as forecasting
the stock market and election results, determining reactions to certain events or news, and so
on. Determining the tone of a text message makes it possible to identify and analyze the author's
emotional assessment of the objects referred to in the text, and thus to learn about
the author's attitude to a phenomenon, product, or service [1, 2].
Sentiment Analysis can be seen as an alternative to traditional surveys. Its methods make it
possible not only to show the current attitude to the object of study, but also to predict changes in
that attitude. For example, in the USA there is a project called Pulse of the Nation, whose
goal is to determine the mood of the country's citizens over the course of a day by
analyzing their posts on the popular social network Twitter [2].
Since the beginning of the 2010s, the main trends in sentiment analysis have been its use for social
media and multilingual tone analysis. Until now, research on texts in languages other than English
has made up only a small part of such work. The reason for this is most often the lack of available thesauri
in national languages for tonal analysis of texts.
Until 2014, there were only 12 publicly available non-English lexicons for tone analysis. In
2014, an attempt was made to create dictionaries for other languages. The degree of
convergence of the result with the traditional approach was below 50%, and the proposed
dictionaries are practically not used.
In 2016-2018, many works appeared in which texts in the main European languages were
analyzed. In general, however, the authors did not develop dictionaries for each of the languages, but
used machine translation of negative and positive lexemes from available English thesauri. This
reduces the quality of the analysis [3].
Despite the promise of this direction, it is not yet widely used in text information
processing systems. The reasons are the difficulty of identifying emotional vocabulary in texts,
the imperfection of existing text analyzers, and dependence on the subject area. Therefore,
improving existing methods and developing new methods for sentiment analysis based on machine
learning is an urgent task [4].
The purpose of this work was to create a sentiment analyzer that classifies Armenian texts from social
networks according to their emotional characteristics. Two text classification methods,
Logistic Regression and Neural Networks, were applied to create a hybrid analyzer model.
Various experiments were carried out to assess the quality of this analyzer.
We, the authors of this article, have been engaged in the educational process for many
years. During exams, students are in different psychological states. Before an exam,
a sentiment analyzer can be used together with special questionnaires to classify students
into categories, which makes it possible to balance their condition and conduct the exam adaptively. Work
is underway to create special extended questionnaires to classify students according to their
emotional state.
STATEMENT OF THE PROBLEM
There are many effective methods for analyzing the sentiment of texts, each of
which has its own characteristics. To solve this task using information technology,
Natural Language Processing (NLP) methods are used [1, 4].
For a long time, linear models such as support vector machines and logistic regression
were the mainstream NLP methods. Recently, other machine learning approaches have
taken over. In particular, neural network models are successfully applied in the field of natural
language processing [4, 6].
This paper discusses the tools and methodologies that underlie logistic regression and
recurrent neural networks [5, 6, 8]. It is assumed that the models under consideration are able
to learn how to classify texts. Many of them learn from examples that are labeled with the
correct answers.
The classic method for determining the tone of a text is the dictionary method. It is based on
dictionaries of emotionally colored words. Each term from the dictionary is assigned its own
weight, either manually or with the help of third-party tools.
There are different methods for determining tonality using dictionaries. For example, each term
in the dictionary is rated as carrying positive or negative sentiment. The positive coloring of the
text is then determined by the weight of the term in the text that has the maximum positive
coloring. The negative coloring is determined in the same way. If a term is absent from the dictionary, it is
assigned a minimum positive weight [10, 11].
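The dictionary method described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the lexicon entries, their weights, the value of `MIN_POSITIVE_WEIGHT`, and the final rule that compares the two colorings are all assumptions made for the example, and English words stand in for Armenian terms.

```python
from typing import Tuple

# Hypothetical toy lexicon: term -> signed weight (positive > 0, negative < 0).
LEXICON = {
    "good": 0.6,
    "excellent": 0.9,
    "bad": -0.5,
    "terrible": -0.8,
}

# Assumed minimum positive weight for terms that are missing from the dictionary.
MIN_POSITIVE_WEIGHT = 0.01


def term_weight(term: str) -> float:
    """Look up a term's weight; unknown terms get the minimum positive weight."""
    return LEXICON.get(term.lower(), MIN_POSITIVE_WEIGHT)


def text_coloring(text: str) -> Tuple[float, float]:
    """Return the (positive, negative) coloring of a text.

    Positive coloring is the largest positive term weight in the text;
    negative coloring is the magnitude of the most negative term weight.
    Tokens are split on whitespace only (no punctuation handling).
    """
    weights = [term_weight(token) for token in text.split()]
    positive = max((w for w in weights if w > 0), default=0.0)
    negative = max((-w for w in weights if w < 0), default=0.0)
    return positive, negative


def dictionary_sentiment(text: str) -> str:
    """Assumed decision rule: the stronger coloring determines the label."""
    positive, negative = text_coloring(text)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"
```

For instance, `dictionary_sentiment("excellent film but terrible ending")` compares a positive coloring of 0.9 with a negative coloring of 0.8. The sketch also makes the domain dependence visible: every weight in `LEXICON` would have to be revised for a different subject area.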
The main drawback of dictionary methods is the procedure for compiling the dictionaries
of terms and specifying their weights, since the weight of a term almost always
depends on the chosen subject area. These methods do not allow the creation of a universal dictionary
of terms, since the weight of a term can differ significantly between subject areas or even be opposite. The
compilation of language corpora has its own characteristics and difficulties, which are typical of
languages that are relatively rarely used on the Internet (for example, the Armenian
language).
For our implementation of the models, the compilation of the corpus of the Armenian language
was especially time-consuming. The source of information for the Armenian texts was the social
networks Twitter and Facebook. We have developed and implemented a scheme for compiling
dictionaries of terms and indicating their weights based on these texts.
At the next stage, features are selected, for which it is determined whether they are present in
the text or not. For text classification, words, pairs of words, whole phrases, etc., occurring in it,
are considered as features. In the constructed analyzer, words expressing emotions are used as
features.
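Presence-based features of this kind can be illustrated with a short sketch. The vocabulary `EMOTION_WORDS` is hypothetical (English stand-ins for emotion words), and whitespace tokenization is a simplifying assumption; the paper does not specify either.

```python
from typing import List, Sequence

# Hypothetical feature vocabulary of emotion words (English stand-ins).
EMOTION_WORDS = ("happy", "love", "sad", "angry")


def presence_features(text: str, vocabulary: Sequence[str] = EMOTION_WORDS) -> List[int]:
    """Return a 0/1 vector: 1 if the vocabulary word occurs in the text, else 0."""
    tokens = set(text.lower().split())
    return [1 if word in tokens else 0 for word in vocabulary]
```

Each text is thus mapped to a fixed-length binary vector, which is the input form expected by classifiers such as logistic regression.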
DESCRIPTION OF THE DOCUMENT CLASSIFICATION PROBLEM
The classification of texts is one of the main tasks of a search engine. Its purpose is
to assign a text to a particular class based on its content. In this case, the classification is
performed by the sentiment analyzer [4, 5].
Assume that we have a finite set of classes $C = \{c_1, \ldots, c_{|C|}\}$, a finite set of documents
$D = \{d_1, \ldots, d_{|D|}\}$, and the desired function $\Phi$, which for an arbitrary pair
$\langle d, c \rangle$, where $d \in D$ and $c \in C$, determines whether document $d$ belongs to
class $c$ or not: $\Phi: D \times C \to \{0, 1\}$.
The task of classification is to find a function $\Phi'$ that approximates $\Phi$. This function
$\Phi'$ is the classifier.
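As an illustration of such an approximating function, the sketch below trains one small logistic regression model per class on labeled examples and returns a function `phi_hat(d, c)` with values in {0, 1}. Everything in it is an assumption made for the example: the vocabulary `VOCAB`, the toy training set `TRAIN`, the learning rate, the number of epochs, and the 0.5 decision threshold. It is not the analyzer built in this work.

```python
import math
from typing import Callable, List, Tuple

VOCAB = ["good", "love", "bad", "hate"]  # hypothetical feature words


def features(doc: str) -> List[float]:
    """Binary presence features followed by a constant 1.0 for the bias term."""
    tokens = set(doc.lower().split())
    return [1.0 if word in tokens else 0.0 for word in VOCAB] + [1.0]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_binary(examples: List[Tuple[str, int]],
                 epochs: int = 200, lr: float = 0.5) -> List[float]:
    """Fit logistic regression weights by stochastic gradient descent on log-loss."""
    w = [0.0] * (len(VOCAB) + 1)
    for _ in range(epochs):
        for doc, label in examples:
            x = features(doc)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            # Gradient of the log-loss with respect to w is (p - label) * x.
            w = [wi - lr * (p - label) * xi for wi, xi in zip(w, x)]
    return w


def make_classifier(labeled_docs: List[Tuple[str, str]],
                    classes: List[str]) -> Callable[[str, str], int]:
    """Train one binary model per class and return phi_hat(d, c) -> 0 or 1."""
    models = {
        c: train_binary([(d, 1 if y == c else 0) for d, y in labeled_docs])
        for c in classes
    }

    def phi_hat(doc: str, cls: str) -> int:
        z = sum(wi * xi for wi, xi in zip(models[cls], features(doc)))
        return 1 if sigmoid(z) >= 0.5 else 0

    return phi_hat


# Toy labeled examples (documents paired with their correct class).
TRAIN = [
    ("good film love it", "positive"),
    ("love this good day", "positive"),
    ("bad service hate it", "negative"),
    ("hate this bad day", "negative"),
]
phi_hat = make_classifier(TRAIN, ["positive", "negative"])
```

Calling `phi_hat("good film love it", "positive")` asks whether the document belongs to the class "positive". Training one binary model per class mirrors the signature of the function defined over document and class pairs; a single multiclass model would be an equally reasonable design.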