
Information Retrieval Techniques in Sentiment Analysis - A Review
Graduate Review Paper

Sunanda Bansal
Computer Science Department
Concordia University
Montreal, Canada
sunandabansal92@gmail.com

Abstract—We discuss five research papers from the field of sentiment analysis, selected from a range of about 15 years, with respect to their approaches and their use of information retrieval techniques.

I. INTRODUCTION

Sentiment analysis is the identification of the sentiment associated with a document unit or topic. Sentiment can be a very broad term, and thus the major focus of sentiment analysis is on processing document units that exhibit positive or negative sentiment. Research has focused mainly, though not exclusively, on two subproblems:
• Subjectivity Classification - subjective/objective opinion
• Polarity Detection - positive/negative
Sentiment analysis has been approached with various strategies and has been investigated at different levels of text - document, sentence, entity and aspect level [3]. The lexicon-based approach and the text classification approach are the two major approaches taken to extract sentiment automatically. The latter, i.e. supervised learning techniques, which can also be described as the statistical or machine learning approach [6], is where most of the work in sentiment analysis is focused. The papers by Pang et al. [1] and Pang and Lee [4] mainly focus on machine learning algorithms, while the paper by Taboada et al. [5] focuses on a lexicon-based approach built on dictionary compilation. We also discuss the approach of Maas et al. [7], who use a mix of supervised and unsupervised learning to learn word vectors and propose a model that predicts the sentiment annotations on the contexts of words from those vector representations. Paltoglou and Thelwall [6], on the other hand, examine whether term weighting functions from information retrieval ranking approaches can increase classification accuracy.

II. APPROACHES

A. Bag of words model

Among the various approaches used in [1], the authors employed the standard bag-of-features framework, which is essentially the bag-of-words model, to implement the machine learning algorithms. They compared various settings for the features given to the algorithms, such as bags of unigrams and bigrams. In the end, unigram presence information was found to be the most effective feature setting. Initially, a hypothesis that classification could be performed with simple bag-of-words indicator lists was tested; lists based on a preliminary examination of corpus frequency counts gave better results than lists based on presence intuitions and introspection alone. This led the authors to explore corpus-based techniques for selecting better sentiment indicators, which in turn leads to their machine learning approaches to sentiment analysis.

B. Machine Learning

One of the main considerations in supervised learning has been the representation of a document unit. Earlier work [1] used the bag-of-words model to represent the document, with the bag-of-unigrams setting, which records only the presence or absence of words, providing a good baseline classification accuracy compared with other settings such as bag-of-bigrams, adjectives, etc. We start from the research by Pang et al. (2002) [1], which focused on classifying documents on the basis of the occurrence of certain lexical features. Some later approaches applied text categorization to only the subjective portions of the documents [4].

The bag-of-features framework was used in [1] to implement three standard algorithms: Naive Bayes classification, maximum entropy classification, and Support Vector Machines (SVM). Naive Bayes classification can perform well even for problem classes with highly dependent features, but maximum entropy classification makes no assumption about the conditional independence of features, unlike Naive Bayes, and thus may potentially outperform Naive Bayes classification in some cases.

Naive Bayes classification is based on the idea of assigning the document unit to the class that is most likely to be the right one, given a training set, by applying Bayes' rule.
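As an illustration of the unigram-presence Naive Bayes baseline described above, the following is a minimal sketch in Python. The toy training data, tokenization, and Laplace smoothing are our own illustrative assumptions, not the exact setup of [1]:

```python
import math
from collections import defaultdict

# Hypothetical toy training data: (document, sentiment label) pairs.
train = [
    ("a great and enjoyable film", "pos"),
    ("truly wonderful acting", "pos"),
    ("a dull and boring plot", "neg"),
    ("terrible boring acting", "neg"),
]

def fit(docs):
    class_counts = defaultdict(int)                       # training docs per class
    word_counts = defaultdict(lambda: defaultdict(int))   # docs of a class containing a word
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for w in set(text.split()):                       # presence, not frequency
            word_counts[label][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def predict(text, class_counts, word_counts, vocab):
    total = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for c, n in class_counts.items():
        lp = math.log(n / total)                          # log prior
        for w in set(text.split()) & vocab:
            # Laplace-smoothed probability that a class-c document contains w
            lp += math.log((word_counts[c][w] + 1) / (n + 2))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

model = fit(train)
print(predict("wonderful enjoyable film", *model))  # → pos
print(predict("dull terrible plot", *model))        # → neg
```

Note that each word is counted at most once per document (presence rather than frequency), which is the feature setting [1] found most effective.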
The maximum entropy approach also selects the most likely class for a document, but on the basis of feature/class functions and feature-weight parameters, which are set so as to maximize the entropy of the induced distribution. The third algorithm, Support Vector Machines (SVM), is entirely different from the first two approaches. In the two-category case, such as this one, the basic idea behind training is to find a hyperplane that separates the document vectors of one class from those of the other with as large a margin as possible. Classification is then determined simply by which side of the hyperplane a document vector falls on.

Reference [4], on the other hand, applies machine learning only to the subjective portions of a document. Unlike [1], [4] discards the sentences that can be labeled as objective in the document unit. The approach used for extracting the subjective portions is finding minimum cuts in graphs: sentence-level subjectivity detection is integrated with document-level sentiment polarity classification. To account for proximity between sentences in terms of their subjectivity, the idea is to use a graph-based formulation relying on finding minimum cuts; the method works on an undirected graph and computes a partition cost from individual and association scores for the class of each item.

In [5], Taboada et al. argue that the performance of classifiers trained in this way is domain specific, and that the polarity such classifiers assign to words makes sense mostly within the context of that domain. In this respect, the lexicon-based approach seems more suitable for generalized analysis. Taboada et al. also argue that classifiers built by machine learning approaches do not handle negation and intensification well, whereas their lexicon-based model does, in a way that "generalizes to all the words that have a semantic orientation value" [5].

Another approach is to capture semantic term-document information and rich sentiment content by using a mix of supervised and unsupervised learning techniques, as proposed in [7]. Using vocabulary indices to represent words fails to capture the relational structure of the lexicon, which can be useful in determining continuous similarities between words. The vector-based model, on the other hand, does this much better by encoding similarities between words as the angular distance between word vectors. The model proposed by Maas et al. (2011) attempts to capture both semantic and sentiment similarities among words. The semantic component learns via an unsupervised model of documents and can thus determine which words are semantically close, but it does not capture the crucial sentiment information. The model is therefore extended with a supervised sentiment component in which the vector representation of a word predicts the sentiment annotations of the contexts in which the word appears.

The model presented in [7] uses a probabilistic model of documents, based on probabilistic topic models such as Latent Dirichlet Allocation (LDA) [8], to capture semantic similarities. It directly models the word probabilities conditioned on the topic mixture, instead of modeling individual topics. This component does not explicitly capture sentiment; rather, it produces similar representations for words that occur together. The unsupervised, semantic-similarity component of the model does not require labeled data, but the supervised sentiment component uses sentiment labels so that the word vectors predict the sentiment label.

C. Dictionary Compilation

Dictionary compilation is a lexicon-based approach, and most research on this approach has focused on using adjectives as indicators of the semantic orientation of text [5]. Reference [5] uses a dictionary of words annotated with each word's semantic orientation, or polarity; the orientation of a text is calculated by aggregating the semantic orientation scores of all the adjectives in the document into a single score. The sentiment orientation of an entire document is thus the combined effect of the adjectives, or other sentiment-bearing words, found within the document, based on a dictionary containing these words and their corresponding semantic orientation scores. The paper criticizes the machine learning based approach as highly domain specific and as failing to take negation and intensification into account. It also attempts to address the criticism of lexicon-based approaches that their dictionaries are unreliable, by presenting results intended to show that their dictionaries are robust and reliable.

D. Weighting Schemes

Reference [6] hypothesizes that, since tf-idf weighting schemes proved quite effective for information retrieval, they may appropriately model the discriminative power of terms in documents. The experiments are not restricted to tf-idf weighting schemes. The use of such weighting schemes is shown to significantly improve classification accuracy compared to other approaches.

To evaluate their model, [7] also perform document-level polarity classification using a bag-of-words vector v and the features Rv obtained from their model, with arbitrary tf-idf weights. Cosine normalization is not applied to v; instead, it is applied to the final feature vector Rv. In their experiments they found that 'bnn' weighting works best for v when document features are generated through the product Rv, and this setting was then used to obtain multi-word representations in all their experiments.

III. CONCLUSION

The simple model with bag-of-unigrams and supervised learning proposed by Pang et al. [1] gives performance that is not much outperformed by more complicated feature settings. Though most research in sentiment analysis focuses on supervised learning methods, these have their disadvantages: they are domain specific and do not account for the negation and intensification of sentiment. For these reasons, the lexicon-based model is more appropriate for general, domain-independent purposes, though it can in turn be criticized for the unreliability of its dictionaries; within the lexicon-based approach, most research focuses on using adjectives as indicators of sentiment orientation. No single approach can be assumed to be the best in this case. As for the use of information retrieval techniques, from term weighting to the bag-of-words and vector space models, these techniques are widely used throughout the research field of sentiment analysis.
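To make the term-weighting idea concrete, the following is a minimal tf-idf sketch over a hypothetical toy corpus; the SMART-style weighting variants actually evaluated in [6] (such as 'bnn') differ in their exact tf and idf components:

```python
import math

# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["good", "good", "movie"],
    ["bad", "movie"],
    ["good", "plot", "bad", "ending"],
]

def tf_idf(term, doc, corpus):
    tf = doc.count(term)                       # raw term frequency in this document
    df = sum(1 for d in corpus if term in d)   # number of documents containing the term
    idf = math.log(len(corpus) / df)           # inverse document frequency
    return tf * idf

# "movie" appears in 2 of 3 documents, so its weight is tf * log(3/2);
# "good" occurs twice in docs[0], so its weight there is 2 * log(3/2).
print(tf_idf("movie", docs[0], docs))
print(tf_idf("good", docs[0], docs))
```

Terms occurring in many documents receive lower weight, which is exactly the discriminative-power intuition behind the hypothesis of [6].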
REFERENCES
[1] Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan. ”Thumbs up?:
sentiment classification using machine learning techniques.” Proceedings
of the ACL-02 conference on Empirical methods in natural language
processing-Volume 10. Association for Computational Linguistics, 2002.
[2] Pang, Bo, and Lillian Lee. "Opinion mining and sentiment analysis."
Foundations and Trends in Information Retrieval 2.1-2 (2008): 1-135.
[3] Liu, Bing. ”Sentiment analysis and opinion mining.” Synthesis lectures
on human language technologies 5.1 (2012): 1-167.
[4] Pang, Bo, and Lillian Lee. ”A sentimental education: Sentiment analysis
using subjectivity summarization based on minimum cuts.” Proceedings
of the 42nd annual meeting on Association for Computational Linguis-
tics. Association for Computational Linguistics, 2004.
[5] Taboada, Maite, et al. "Lexicon-based methods for sentiment analysis."
Computational linguistics 37.2 (2011): 267-307.
[6] Paltoglou, Georgios, and Mike Thelwall. ”A study of information
retrieval weighting schemes for sentiment analysis.” Proceedings of the
48th annual meeting of the association for computational linguistics.
Association for Computational Linguistics, 2010.
[7] Maas, Andrew L., et al. ”Learning word vectors for sentiment analy-
sis.” Proceedings of the 49th Annual Meeting of the Association for
Computational Linguistics: Human Language Technologies-Volume 1.
Association for Computational Linguistics, 2011.
[8] Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent dirichlet
allocation." Journal of Machine Learning Research 3 (2003): 993-1022.
