Sentiment Analysis

BHARATI VIDYAPEETH’S COLLEGE OF ENGINEERING
A-4 PASCHIM VIHAR, DELHI
Sentiment Analysis Using Live Twitter Data
UNDER THE GUIDANCE OF
Akansha Tanwar
(ASSISTANT PROFESSOR, CSE DEPARTMENT)
SUBMITTED BY
Himesh S Kulshrestha (00751202713)
Chirag Bansal (01451202713)
Devesh Agrawal (01651202713)
Saurabh Chawla (02251202713)
November, 2016
Introduction
Sentiment Analysis, or Opinion Mining, involves extracting relevant
information from source material using techniques like Natural Language
Processing and Machine Learning. It usually aims to determine what the
speaker or writer meant while saying or writing a sentence. Microblogging
and social networking websites like Twitter and Facebook are all the rage
today and are slowly turning into a “forum” where people share their own
opinions and check other people’s opinions too. This makes them a haven
for businesses and other entities, which can learn what people think about
the different factors that might affect their products.
The Process
Natural Language Processing → Machine Learning (Training, Testing) → Sentiment Analysis
OBJECTIVE
Our work involves performing sentiment analysis on live Twitter data, i.e.
real-time data, which we gather from Twitter using Tweepy (a Python
library for the Twitter API). The tweets are first processed with Natural
Language Processing steps such as part-of-speech tagging and chunking,
and are then classified using Machine Learning algorithms such as Naïve
Bayes and its variants, Support Vector Classification, and Logistic
Regression.
NATURAL LANGUAGE PROCESSING TECHNIQUES
1. TOKENIZING
Tokenizing is the process of breaking a stream of text up into words, phrases, or symbols
called tokens.
2. STOP WORDS
A major preprocessing step is to filter out words that carry little meaning for the analysis;
in natural language processing these are referred to as stop words.
3. PART OF SPEECH TAGGING
Labeling the words in a sentence as nouns, adjectives, verbs, etc. Tags can also encode tense
and more.
4. CHUNKING
Chunking groups tagged words into meaningful phrases; one of its major goals is to extract
what are known as noun phrases.
5. NAMED ENTITY RECOGNITION
One of the most important forms of chunking in natural language processing is called named
entity recognition. The idea is to have the machine immediately pull out "entities" such as
people, places, organizations, monetary figures, and more.
6. LEMMATIZATION
Lemmatization, like stemming, reduces a word to a base form. Stemming can often create
non-existent words, whereas lemmas are actual words.
7. CORPORA WITH NLTK
We can use WordNet alongside the NLTK module to find the meanings of words,
synonyms, antonyms, and more.
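The first two techniques in the list can be illustrated with a minimal sketch. It uses only the Python standard library rather than NLTK so that it runs without downloading any corpora. The regular expression, the tiny stop-word set, and the function names are assumptions made for illustration, not the project's actual code.

```python
import re

# Hypothetical, deliberately tiny stop-word list; NLTK's English list is much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and", "in", "it", "this"}

# A token is a @mention, a #hashtag, a word (optionally with an apostrophe part),
# or a single punctuation mark. URLs are not handled in this sketch.
TOKEN_PATTERN = re.compile(r"[@#]?\w+(?:'\w+)?|[^\w\s]")


def tokenize(text):
    """Break raw text into lowercase tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def remove_stop_words(tokens):
    """Drop stop words and bare punctuation, keeping content-bearing tokens."""
    return [t for t in tokens if t not in STOP_WORDS and re.search(r"\w", t)]


tweet = "The new phone is great, and the camera is amazing! #happy"
tokens = tokenize(tweet)
filtered = remove_stop_words(tokens)
print(tokens)
print(filtered)
```

Part-of-speech tagging, chunking, named entity recognition, and lemmatization need trained models or lexical data, which is what a toolkit such as NLTK provides on top of steps like these.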
WHERE WE GO DIFFERENT: TRAINING & NLP
1. Use of another dataset for training
2. Perform NLP on this dataset
3. Get a classified and clean dataset for training
WHAT HAPPENS AFTER TRAINING
MACHINE LEARNING ALGORITHMS
• NAÏVE BAYES
The first algorithm we use is Naïve Bayes. It is a popular algorithm
for text classification, so it is only fitting that we try it out first.
Before we can train and test the algorithm, however, we need to
split the data into a training set and a testing set.
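A minimal sketch of such a split, assuming the labeled data is a list of (text, label) pairs. The function name, the 25% test fraction, the fixed seed, and the example sentences are illustrative assumptions, not values taken from the project.

```python
import random


def train_test_split(labeled_examples, test_fraction=0.25, seed=42):
    """Shuffle a copy of the data and return (training_set, testing_set)."""
    shuffled = list(labeled_examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical labeled examples.
data = [
    ("great phone", "pos"), ("love the camera", "pos"),
    ("amazing screen", "pos"), ("really happy", "pos"),
    ("terrible battery", "neg"), ("hate the screen", "neg"),
    ("awful service", "neg"), ("very disappointed", "neg"),
]
training_set, testing_set = train_test_split(data)
print(len(training_set), len(testing_set))  # 6 2
```

Shuffling before splitting matters when the dataset is stored grouped by label, since otherwise the testing set could contain only one class.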
MULTINOMIAL NAÏVE BAYES
With a multinomial event model, samples (feature vectors) represent the
frequencies with which certain events have been generated by a multinomial
$(p_1, \dots, p_n)$, where $p_i$ is the probability that event $i$ occurs
(or $K$ such multinomials in the multiclass case). A feature vector
$x = (x_1, \dots, x_n)$ is then a histogram, with $x_i$ counting the number
of times event $i$ was observed in a particular instance.
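For text, the "events" are words and the histogram is a bag of word counts. The sketch below is a small from-scratch multinomial Naïve Bayes with add-one (Laplace) smoothing. The class name, the smoothing choice, and the toy training data are assumptions made for illustration; the project itself would rely on a library implementation.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over token counts with add-one smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)  # class -> token -> count
        self.vocabulary = set()
        for tokens, label in zip(documents, labels):
            self.token_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def log_scores(self, tokens):
        """Return log P(class) + sum_i x_i * log p_i for every class."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_counts.items():
            total_tokens = sum(self.token_counts[label].values())
            score = math.log(doc_count / total_docs)
            for token, x_i in Counter(tokens).items():
                if token not in self.vocabulary:
                    continue  # tokens never seen in training are ignored
                p_i = (self.token_counts[label][token] + 1) / (total_tokens + vocab_size)
                score += x_i * math.log(p_i)
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Hypothetical training data: token lists with sentiment labels.
docs = [["great", "phone"], ["love", "great", "camera"],
        ["terrible", "battery"], ["hate", "terrible", "screen"]]
labels = ["pos", "pos", "neg", "neg"]
model = MultinomialNaiveBayes().fit(docs, labels)
print(model.predict(["great", "camera"]))
print(model.predict(["terrible", "screen", "terrible"]))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.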
LINEAR SUPPORT VECTOR CLASSIFICATION
Linear Support Vector Classification (LinearSVC in scikit-learn) is
similar to SVC with parameter kernel='linear', but implemented in
terms of liblinear rather than libsvm, so it has more flexibility in the
choice of penalties and loss functions and should scale better to large
numbers of samples.
This class supports both dense and sparse input, and multiclass
support is handled according to a one-vs-the-rest scheme.
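The one-vs-the-rest idea can be sketched without any library: train one linear scorer per class against all other classes, then pick the class whose scorer gives the highest value. The sketch below uses plain hinge-loss subgradient steps with no regularization and no bias term. It is a simplification for illustration, not liblinear's solver, and all function names and the toy data are assumptions.

```python
from collections import defaultdict


def dot(weights, features):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def train_binary_hinge(examples, targets, learning_rate=0.5, epochs=10):
    """Fit one linear scorer with hinge-loss subgradient steps (targets are +1 or -1)."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for features, y in zip(examples, targets):
            if y * dot(weights, features) < 1:  # example is inside the margin
                for name, value in features.items():
                    weights[name] += learning_rate * y * value
    return dict(weights)


def train_one_vs_rest(examples, labels):
    """Train one binary scorer per class: that class versus all the others."""
    models = {}
    for cls in sorted(set(labels)):
        targets = [1 if label == cls else -1 for label in labels]
        models[cls] = train_binary_hinge(examples, targets)
    return models


def predict(models, features):
    """Return the class whose scorer assigns the highest score."""
    return max(models, key=lambda cls: dot(models[cls], features))


# Hypothetical sparse bag-of-words examples.
examples = [{"great": 1, "love": 1}, {"terrible": 1, "hate": 1}, {"okay": 1, "average": 1}]
labels = ["pos", "neg", "neutral"]
models = train_one_vs_rest(examples, labels)
print(predict(models, {"great": 1, "phone": 1}))
print(predict(models, {"hate": 2}))
```

Storing features as dictionaries mirrors the sparse input mentioned above: only the words that actually occur in a tweet take up space.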
SOME RESULTS: TWEETS
LOGISTIC REGRESSION
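In general, when we build a machine-learning-based program,
we are trying to come up with a function that can predict outputs
for future inputs based on the experience it has gained from
past inputs and their outputs. Logistic regression is one such
function: it turns a weighted sum of the input features into a
probability that the input belongs to a class.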
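As a rough illustration of learning such a function from past inputs and outputs, the sketch below fits a binary logistic regression with stochastic gradient steps on the log-loss. The learning rate, epoch count, function names, and toy data are assumptions chosen for the example, not settings from the project.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability that the example belongs to the positive class."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return sigmoid(z)


def train_logistic_regression(examples, targets, learning_rate=0.5, epochs=200):
    """Fit weights and bias with per-example gradient steps (targets are 1 or 0)."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for features, y in zip(examples, targets):
            error = y - predict_proba(weights, bias, features)
            bias += learning_rate * error
            for name, value in features.items():
                weights[name] = weights.get(name, 0.0) + learning_rate * error * value
    return weights, bias


# Hypothetical past inputs (bag-of-words) and their outputs (1 = positive sentiment).
examples = [{"great": 1}, {"love": 1}, {"terrible": 1}, {"hate": 1}]
targets = [1, 1, 0, 0]
weights, bias = train_logistic_regression(examples, targets)
print(predict_proba(weights, bias, {"great": 1}) > 0.5)
print(predict_proba(weights, bias, {"hate": 1}) < 0.5)
```

Unlike the hard decision of a margin-based classifier, the output here is a probability, which is convenient when plotting how confident the sentiment estimate is over time.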
SOME RESULTS: LIVE PLOTTING
CONCLUSION
Sentiment Analysis is rapidly gaining momentum as one of the leading
technologies in the emerging world. This paper tried to find out how it would fare
in a real-life scenario, for example how people react to different points of view
during a live presidential election debate.
We used algorithms such as Support Vector Machines and Logistic Regression,
but after studying previous works [1-9] we found that they are not much better
than Naïve Bayes. Thus, it can be concluded that a better algorithm for
sentiment analysis is the need of the hour.
Future Scope
• Different data extraction techniques can be utilized.
• Apart from Twitter, multiple data sources and data filtering
techniques can be used to find relevant results.
REFERENCES
• Bo Pang, Lillian Lee, Shivakumar Vaithyanathan, “Thumbs up? Sentiment Classification using
Machine Learning Techniques”, Proceedings of EMNLP 2002, pp. 79–86.
• A. Esuli and F. Sebastiani, “SentiWordNet: A publicly available lexical resource for
opinion mining”, Proceedings of LREC, 2006.
• Efthymios Kouloumpis, Theresa Wilson, Johanna Moore, “Twitter Sentiment Analysis: The
Good the Bad and the OMG!”, Proceedings of the Fifth International AAAI Conference
on Weblogs and Social Media.
• L. Barbosa and J. Feng, “Robust sentiment detection on Twitter from biased and noisy
data”, Proceedings of COLING, 2010.
• P. D. Turney, “Thumbs up or thumbs down?: Semantic orientation applied to
unsupervised classification of reviews”, Proceedings of the 40th Annual Meeting of the
Association for Computational Linguistics, pp. 417–424, Association for Computational
Linguistics, 2002.
• Vishal A.K. and S.S. Sonawane, “Sentiment Analysis of Twitter Data: A Survey of
Techniques”, International Journal of Computer Applications (0975 – 8887), 2016.
• Yorick Wilks and Mark Stevenson, “The grammar of sense: Using part-of-speech tags
as a first step in semantic disambiguation”, Journal of Natural Language Engineering,
4(2):135–14, 1998.
• DongSung Kim and Jong Woo Kim, “Public Opinion Mining on Social Media: A Case
Study of Twitter Opinion on Nuclear Power”, Advanced Science and Technology
Letters, Vol. 51 (CESCUBE 2014), pp. 224–228.
• G. Vinodhini, RM. Chandrasekaran, “Sentiment Analysis and Opinion Mining: A
Survey”, International Journal of Advanced Research in Computer Science and
Software Engineering, Volume 2, Issue 6, June 2012.
• Asa Ben-Hur, David Horn, Hava T. Siegelmann, Vladimir Vapnik, “Support Vector
Clustering”, Journal of Machine Learning Research 2, pp. 125–137, 2001.
• B. Liu and L. Zhang, “A survey of opinion mining and sentiment analysis”, Mining Text
Data, Springer US, pp. 415–463, 2012.
• Maite Taboada, J. Brooke, M. Tofiloski, K. Voll, M. Stede, “Lexicon-based methods for
sentiment analysis”, Computational Linguistics 37.2, 2011.
Thanks!
