
BHARATI VIDYAPEETH’S COLLEGE OF ENGINEERING

A-4 PASCHIM VIHAR, DELHI

Sentiment Analysis Using Live Twitter Data

UNDER THE GUIDANCE OF
Akansha Tanwar
(ASSISTANT PROFESSOR, CSE DEPARTMENT)

SUBMITTED BY
Himesh S Kulshrestha (00751202713)
Chirag Bansal (01451202713)
Devesh Agrawal (01651202713)
Saurabh Chawla (02251202713)

November, 2016
Introduction

Sentiment Analysis, or Opinion Mining, involves extracting relevant
information from source material using techniques such as Natural Language
Processing and Machine Learning. It usually aims to determine what the
speaker or writer meant, in particular the attitude expressed, in a given
sentence. Microblogging and social networking websites such as Twitter and
Facebook are all the rage today and are gradually turning into a “forum”
where people share their opinions and read other people’s opinions too.
This makes them a haven for businesses and other entities, which can learn
what people think about the different factors that might affect their
products.
The Process

[Process diagram: Natural Language Processing, Machine Learning (Training and Testing), Sentiment Analysis]
OBJECTIVE

Our work performs sentiment analysis on live, i.e. real-time, Twitter data,
which we gather from Twitter using Tweepy (a Python library for the Twitter
API). After preprocessing the text with Natural Language Processing
(chunking and part-of-speech tagging, among other steps), we classify the
tweets with various Machine Learning algorithms: Naïve Bayes and its
variants, linear Support Vector Classification and Logistic Regression.
NATURAL LANGUAGE PROCESSING TECHNIQUES

1. TOKENIZING
It is the process of breaking a stream of text up into words, phrases or symbols called
tokens.

2. STOP WORDS
One of the major forms of preprocessing is filtering out useless data. In natural language
processing, words that carry little meaning for the task are referred to as stop words.

3. PART OF SPEECH TAGGING
Labeling words in a sentence as nouns, adjectives, verbs, etc. It can also label by tense and
more.

4. CHUNKING
One of the major goals of chunking is to group words into what are known as noun phrases.

5. NAMED ENTITY RECOGNITION
One of the most important forms of chunking in natural language processing is named
entity recognition. The idea is to have the machine immediately pull out "entities"
such as people, places, things, locations, monetary figures, and more.

6. LEMMATIZATION
Lemmatization reduces a word to its dictionary form, or lemma. Stemming can often create
non-existent words, whereas lemmas are actual words.

7. CORPORA WITH NLTK
We can use WordNet alongside the NLTK module to find the meanings of words,
synonyms, antonyms, and more.
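
The first two techniques above can be illustrated with a minimal sketch. It uses only the Python standard library as a simplified stand-in for NLTK's tokenizer and stop-word corpus; the regular expression, the tiny STOP_WORDS set and the function names are assumptions made for illustration, not the project's actual code.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real stop-word corpus is much larger.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "to", "and", "in", "it", "this"}


def tokenize(text):
    """Split text into lowercase word tokens, keeping #hashtags and @mentions whole."""
    return re.findall(r"[#@]?\w+(?:'\w+)?", text.lower())


def remove_stop_words(tokens):
    """Drop every token that appears in the stop-word list."""
    return [token for token in tokens if token not in STOP_WORDS]


tweet = "This is the best phone of the year, and it isn't expensive! #happy"
tokens = tokenize(tweet)
filtered = remove_stop_words(tokens)
print(tokens)
print(filtered)  # ['best', 'phone', 'year', "isn't", 'expensive', '#happy']
```

Keeping hashtags and mentions as single tokens is a design choice for tweets; a general-purpose tokenizer would usually split them.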
WHERE WE DIFFER: TRAINING & NLP

Use of another dataset for training → Perform NLP on this dataset → Get a classified and clean dataset for training
WHAT HAPPENS AFTER TRAINING
MACHINE LEARNING ALGORITHMS

• NAÏVE BAYES

The first algorithm we use is Naïve Bayes. It is a popular algorithm
for text classification, so it is only fitting that we try it first.
Before we can train and test the algorithm, however, we need to
split the data into a training set and a testing set.
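
The split into a training set and a testing set can be sketched as follows. This is a minimal illustration in plain Python; the function name, the 80/20 ratio, the fixed seed and the toy labelled tweets are all assumptions, since the slides do not state how the data was actually divided.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=42):
    """Shuffle a copy of the samples and return (training_set, testing_set)."""
    shuffled = list(samples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical labelled examples, invented for this sketch.
labelled = [
    ("great phone", "pos"), ("love the camera", "pos"),
    ("best purchase ever", "pos"), ("really happy with it", "pos"),
    ("works like a charm", "pos"), ("terrible battery", "neg"),
    ("hate the screen", "neg"), ("worst service", "neg"),
    ("totally disappointed", "neg"), ("broke after a week", "neg"),
]

training_set, testing_set = train_test_split(labelled)
print(len(training_set), len(testing_set))
```

Shuffling before slicing matters here because the toy data is ordered by label; without it the testing set would contain only one class.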

MULTINOMIAL NAÏVE BAYES

With a multinomial event model, samples (feature vectors) represent the
frequencies with which certain events have been generated by a multinomial
$(p_1, \dots, p_n)$, where $p_i$ is the probability that event $i$ occurs
(or $K$ such multinomials in the multiclass case). A feature vector
$\mathbf{x} = (x_1, \dots, x_n)$ is then a histogram, with $x_i$ counting
the number of times event $i$ was observed in a particular instance.
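
A minimal sketch of this event model for text is given below, where each event is the occurrence of a word and each tweet is a histogram of word counts. It is written from scratch in plain Python under stated assumptions: Laplace smoothing with alpha = 1, words unseen in training are ignored, and the function names and toy documents are hypothetical rather than taken from the project.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(documents, alpha=1.0):
    """Estimate log priors and smoothed per-class word log-probabilities.

    documents is a list of (token_list, label) pairs.
    """
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)  # label -> histogram of word counts
    vocabulary = set()
    for tokens, label in documents:
        class_doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)

    total_docs = sum(class_doc_counts.values())
    log_priors = {c: math.log(n / total_docs) for c, n in class_doc_counts.items()}
    log_likelihoods = {}
    for c in class_doc_counts:
        # Laplace smoothing: add alpha to every word count in the vocabulary.
        denominator = sum(word_counts[c].values()) + alpha * len(vocabulary)
        log_likelihoods[c] = {
            w: math.log((word_counts[c][w] + alpha) / denominator) for w in vocabulary
        }
    return log_priors, log_likelihoods, vocabulary


def classify(tokens, model):
    """Return the class maximising log p(c) + sum_i x_i * log p(word_i | c)."""
    log_priors, log_likelihoods, vocabulary = model
    histogram = Counter(t for t in tokens if t in vocabulary)
    scores = {
        c: log_priors[c] + sum(x_i * log_likelihoods[c][w] for w, x_i in histogram.items())
        for c in log_priors
    }
    return max(scores, key=scores.get)


# Hypothetical training documents, invented for this sketch.
docs = [
    (["great", "love", "phone"], "pos"),
    (["love", "great", "camera"], "pos"),
    (["terrible", "battery"], "neg"),
    (["hate", "terrible", "screen"], "neg"),
]
model = train_multinomial_nb(docs)
print(classify(["love", "great"], model))
print(classify(["terrible", "screen"], model))
```

Working with log-probabilities avoids numerical underflow when many word probabilities are multiplied together.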

LINEAR SUPPORT VECTOR CLASSIFICATION (LinearSVC)

scikit-learn's LinearSVC is similar to SVC with parameter kernel='linear',
but implemented in terms of liblinear rather than libsvm, so it has more
flexibility in the choice of penalties and loss functions and should scale
better to large numbers of samples.
This class supports both dense and sparse input, and multiclass support is
handled according to a one-vs-the-rest scheme.
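
To make the idea of a linear classifier with a penalty and a loss function concrete, here is a minimal sketch of a binary linear SVM trained by subgradient descent on the L2-regularised hinge loss. It is not liblinear's solver and not the project's code: the function names, hyperparameters and the toy feature vectors (counts of positive and negative words) are assumptions. A one-vs-the-rest scheme would train one such classifier per class.

```python
def train_linear_svm(samples, labels, epochs=50, learning_rate=0.1, reg=0.01):
    """Fit weights w and bias b for labels in {-1, +1} using the hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The L2 penalty shrinks the weights on every step.
            w = [wi - learning_rate * reg * wi for wi in w]
            if margin < 1:
                # Inside the margin or misclassified: move towards the sample.
                w = [wi + learning_rate * y * xi for wi, xi in zip(w, x)]
                b += learning_rate * y
    return w, b


def predict(x, w, b):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical features: [number of positive words, number of negative words].
X = [[2, 0], [3, 1], [0, 2], [1, 3]]
y = [1, 1, -1, -1]

w, b = train_linear_svm(X, y)
print([predict(x, w, b) for x in X])
```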
SOME RESULTS: TWEETS
MACHINE LEARNING ALGORITHMS

LOGISTIC REGRESSION

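In general, when we build a machine-learning-based program,
we are trying to come up with a function that can predict outputs
for future inputs based on the experience it has gained from
past inputs and their outputs. Logistic regression does this by
passing a weighted sum of the input features through the logistic
(sigmoid) function to obtain the probability of a class.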
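
Learning such a predictive function from past inputs and outputs can be sketched with a small logistic regression trained by stochastic gradient descent on the log loss. This is an illustrative plain-Python version under assumptions: the learning rate, epoch count, function names and toy features (counts of positive and negative words) are invented here and do not come from the project.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(samples, labels, epochs=200, learning_rate=0.1):
    """Fit weights w and bias b for labels in {0, 1} by minimising the log loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            error = p - y  # gradient of the log loss with respect to the score
            w = [wi - learning_rate * error * xi for wi, xi in zip(w, x)]
            b -= learning_rate * error
    return w, b


def predict_proba(x, w, b):
    """Estimated probability that x belongs to the positive class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical features: [number of positive words, number of negative words].
X = [[2, 0], [3, 1], [0, 2], [1, 3]]
y = [1, 1, 0, 0]

w, b = train_logistic_regression(X, y)
print([round(predict_proba(x, w, b), 2) for x in X])
```

Unlike the hard decision of a linear SVM, the output here is a probability, which can be thresholded at 0.5 or used as a confidence score.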
SOME RESULTS: LIVE PLOTTING
CONCLUSION

Sentiment Analysis is rapidly gaining momentum as one of the leading
technologies in the emerging world. This work tried to find out how it would fare
in a real-life scenario, for example how people react to different points of view
during a live presidential election debate.
We used algorithms such as Support Vector Machines and Logistic Regression,
but after studying previous works [1-9] we found that they are not much better
than Naïve Bayes. Thus, it can be concluded that a better algorithm for
sentiment analysis is the need of the hour.
Future Scope

• Different data extraction techniques can be utilized.

• Apart from Twitter, multiple data sources and data filtering
techniques can be used to find relevant results.
REFERENCES

[1] Bo Pang, Lillian Lee, Shivakumar Vaithyanathan, “Thumbs up? Sentiment Classification
using Machine Learning Techniques”, Proceedings of EMNLP 2002, pp. 79–86.
[2] A. Esuli and F. Sebastiani, “SentiWordNet: A publicly available lexical resource for
opinion mining”, Proceedings of LREC, 2006.
[3] Efthymios Kouloumpis, Theresa Wilson, Johanna Moore, “Twitter Sentiment Analysis: The
Good the Bad and the OMG!”, Proceedings of the Fifth International AAAI Conference
on Weblogs and Social Media.
[4] L. Barbosa and J. Feng, “Robust sentiment detection on Twitter from biased and noisy
data”, Proceedings of Coling, 2010.
[5] P. D. Turney, “Thumbs up or thumbs down?: Semantic orientation applied to
unsupervised classification of reviews”, Proceedings of the 40th Annual Meeting of the
Association for Computational Linguistics, pp. 417–424, Association for Computational
Linguistics, 2002.
[6] Vishal A. K. and S. S. Sonawane, “Sentiment Analysis of Twitter Data: A Survey of
Techniques”, International Journal of Computer Applications (0975 – 8887), 2016.
[7] Yorick Wilks and Mark Stevenson, “The grammar of sense: Using part-of-speech tags
as a first step in semantic disambiguation”, Journal of Natural Language Engineering,
4(2):135–14, 1998.
[8] DongSung Kim and Jong Woo Kim, “Public Opinion Mining on Social Media: A Case
Study of Twitter Opinion on Nuclear Power”, Advanced Science and Technology
Letters, Vol. 51 (CESCUBE 2014), pp. 224–228.
[9] G. Vinodhini and RM. Chandrasekaran, “Sentiment Analysis and Opinion Mining: A
Survey”, International Journal of Advanced Research in Computer Science and
Software Engineering, Volume 2, Issue 6, June 2012.
[10] Asa Ben-Hur, David Horn, Hava T. Siegelmann, Vladimir Vapnik, “Support Vector
Clustering”, Journal of Machine Learning Research 2, pp. 125–137, 2001.
[11] B. Liu and L. Zhang, “A survey of opinion mining and sentiment analysis”, Mining Text
Data, Springer US, pp. 415–463, 2012.
[12] Maite Taboada, J. Brooke, M. Tofiloski, K. Voll, M. Stede, “Lexicon-based methods for
sentiment analysis”, Computational Linguistics 37(2), 2011.
Thanks!
