
Step by Step - Build Your Own Sentiment Analysis Application


© 2018 KNIME AG. All Rights Reserved.


What is KNIME Analytics Platform?

• A tool for data analysis, manipulation, visualization, and reporting


• Based on the graphical programming paradigm
• Provides a diverse array of extensions:
– Text Mining
– Network Mining
– Cheminformatics
– Many integrations, such as Java, R, Python, Weka, H2O, etc.

Installation

1.) 2.)

Sentiment Analysis



Sentiment Analysis

Task: Determine the opinion expressed in a document, e.g. positive or negative.

Approach 1: Lexicon based


Approach 2: Supervised learning



Approach 1: Lexicon based
Idea: Rule-based classification with Dictionary Tagger
1. Use a custom dictionary to tag positive and negative words
2. Count the number of positive and negative words per document
3. Assign a class depending on the number of positive and negative words
#Positive – #Negative < 0 => Negative
#Positive – #Negative > 0 => Positive
Advantage: No labels needed
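
The three steps above can be sketched in plain Python. This is only an illustration of the rule, not the KNIME workflow itself: the word lists are hypothetical mini-lexicons, and returning None when the difference is exactly 0 is an assumption, since the rule above does not cover that case.

```python
import re

# Hypothetical mini-lexicons; a real dictionary would be much larger.
POSITIVE_WORDS = {"good", "great", "excellent", "love"}
NEGATIVE_WORDS = {"bad", "boring", "terrible", "hate"}


def classify_by_lexicon(text, positive=POSITIVE_WORDS, negative=NEGATIVE_WORDS):
    """Classify a text by #Positive - #Negative."""
    # Steps 1 and 2: find dictionary words and count them per document.
    words = re.findall(r"[a-z']+", text.lower())
    n_pos = sum(word in positive for word in words)
    n_neg = sum(word in negative for word in words)
    # Step 3: assign the class from the sign of the difference.
    score = n_pos - n_neg
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return None  # Assumption: a tie is left unclassified.


print(classify_by_lexicon("A great cast and an excellent script, but a boring ending."))
```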



Rule-based Classification with Dictionary Tagger



Approach 2: Supervised Learning
Idea: Train a model to make predictions
1. Collect labeled documents, or label your documents.
2. Extract a feature space from the documents, e.g. only keywords.
3. Train a supervised model, e.g. decision tree, logistic regression, LSTM models.

Advantage: Better performance, according to "A comparison study of sentiment analysis techniques" by S. M. Vohra and J. B. Teraiya



Approach 2: Supervised Learning



The KNIME Text Processing Extension in KNIME Analytics Platform



Installation

1.) 2.)

Tip

• Increase maximum memory for KNIME


• Edit knime.ini
• Useful additional extensions
– Palladian (community extension)
• Web crawling, text mining
– XML-Processing (KNIME extension)
• Parsing and processing of XML documents
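
For the memory tip above: the maximum memory is governed by the standard JVM -Xmx option in knime.ini. The sketch below shows which line changes, assuming the file already contains a line of the form -Xmx2048m. The helper name set_max_heap and the sample file content are hypothetical, and editing the file by hand works just as well.

```python
import re


def set_max_heap(ini_text, new_size):
    """Return ini_text with the JVM -Xmx line set to new_size (e.g. '4g')."""
    if not re.fullmatch(r"\d+[kKmMgG]", new_size):
        raise ValueError("size must look like 4096m or 4g")
    # Replace the whole -Xmx line; (?m) lets ^ and $ match at line boundaries.
    updated, count = re.subn(r"(?m)^-Xmx\S+$", "-Xmx" + new_size, ini_text)
    if count == 0:
        raise ValueError("no -Xmx line found")
    return updated


# Hypothetical excerpt of a knime.ini file.
example = "-vmargs\n-Xmx2048m\n-Dsome.option=true\n"
print(set_max_heap(example, "4g"))
```

KNIME has to be restarted for the new limit to take effect.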

Philosophy
• Reading/Parsing Data
• Enrichment, e.g. "… perhaps your name is Rumpelstiltskin[Person] ? …"
• Preprocessing
• Transformations / Frequencies
• Classification/Clustering/Visualization



Additional Data Types
• Document Cell
– Encapsulates a document
• Title, sentences, terms, words
• Authors, category, source
• Generic meta data (key, value pairs)
• Term Cell
– Encapsulates a term
• Words, tags
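
As a rough mental model of these two cell types, the sketch below uses Python dataclasses. The class and field names mirror the bullets above but are otherwise hypothetical; they are not the extension's actual Java classes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    """A term: one or more words plus the tags assigned to it."""
    words: tuple
    tags: tuple = ()


@dataclass
class Document:
    """A document: text content plus descriptive and generic meta data."""
    title: str
    sentences: list  # each sentence is a list of Term objects
    authors: list = field(default_factory=list)
    category: str = ""
    source: str = ""
    meta: dict = field(default_factory=dict)  # generic key/value pairs

    def terms(self):
        """Flatten all sentences into a single list of terms."""
        return [term for sentence in self.sentences for term in sentence]


doc = Document(
    title="Review 1",
    sentences=[[Term(("great",), ("POSITIVE",)), Term(("movie",))]],
    category="positive",
    meta={"rating": "9"},
)
print([term.words for term in doc.terms()])
```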



Data Table Structures

• Document table
– List of documents

• Bag of words
– Tuples of documents and terms

• Document vectors
– Numerical representations of documents



Importing Text (Reading and Parsing Data)



Data Access

• Node Repository: IO

External
Data
Connectors



Create a Document

Transform Strings to Documents



Parser Nodes

• Node Repository: Other Data Types/Text Processing/IO
• Available Parser Nodes
– Flat File Document Parser
– PDF Parser
– Word Parser
– Document Grabber
– …



Part 1: Reading and Transforming Strings to Documents

Read/Parse textual data

Other Reader nodes



Enrichment



Tagger Nodes
• Assignment of semantic information (tags) to terms
• Node Repository:
Other Data Types/Text Processing/Enrichment
• Available Tagger Nodes
– Stanford tagger
– Dictionary (& Wildcard) tagger
– OpenNLP tagger
– Abner tagger
– …



POS Tagger

• Assigns a part-of-speech (POS) tag to each term of a document
• Also called grammatical tagging
• Process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context



Dictionary Tagger

• Assigns selected tag to matching terms
– Matches terms in documents against terms in dictionary
– Tag to be assigned to matching terms is specified in the dialog
– Alternative node: Wildcard tagger
• Terms in dictionary may contain wildcards and regular expressions
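
The matching idea can be illustrated outside KNIME with a short Python sketch. The function tag_terms is hypothetical and only mimics the behaviour described above; the wildcard mode uses shell-style patterns from Python's fnmatch module, which may differ from the pattern syntax the Wildcard tagger accepts.

```python
from fnmatch import fnmatchcase


def tag_terms(terms, dictionary, tag, wildcards=False):
    """Return (term, tags) pairs; terms found in the dictionary receive `tag`."""
    tagged = []
    for term in terms:
        if wildcards:
            # Wildcard mode: dictionary entries are patterns such as "wonder*".
            hit = any(fnmatchcase(term, pattern) for pattern in dictionary)
        else:
            # Exact mode: the term must equal a dictionary entry.
            hit = term in dictionary
        tagged.append((term, [tag] if hit else []))
    return tagged


terms = ["a", "wonderful", "and", "wonderfully", "acted", "film"]
print(tag_terms(terms, {"wonderful"}, "POSITIVE"))
print(tag_terms(terms, {"wonder*"}, "POSITIVE", wildcards=True))
```

With the exact dictionary only "wonderful" is tagged, while the wildcard pattern also catches "wonderfully".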



Inspect Documents

Document Viewer node



Part 2: Enrichment

Enrich documents with semantic information



Preprocessing



Preprocessing

• Reduction of feature space (terms)


• Filtering of unnecessary terms
– Stop words, based on POS tags, dictionaries, regex, …
• Normalization of terms
– Stemming, case conversion
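
A minimal Python sketch of these preprocessing steps follows. The stop word list is a hypothetical short sample, and crude_stem is a deliberately simple suffix stripper standing in for a real stemming algorithm such as Porter's.

```python
# Hypothetical short stop word list; real lists contain hundreds of entries.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of"}


def crude_stem(word):
    """Strip a few common English suffixes (not a real stemming algorithm)."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(terms, stop_words=STOP_WORDS):
    """Case conversion, stop word filtering, then stemming."""
    lowered = [term.lower() for term in terms]
    kept = [term for term in lowered if term not in stop_words]
    return [crude_stem(term) for term in kept]


print(preprocess(["The", "acting", "was", "amazing", "and", "the", "actors", "shined"]))
```

Eight input terms shrink to four normalized ones, which is the feature space reduction the slide refers to.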



Part 3: Preprocessing
Preprocess documents and filter words



Transformation



Transformation Nodes
• Node Repository:
Other Data Types/Text Processing/Transformation
• Available Transformation Nodes
– Bag of Words Creator
– Document Vector
– Strings to Document
– Sentence Extractor
– Document Data Extractor
– …



Bag of Words
A Bag of Words represents a text (e.g. a sentence or document) as the bag
(multiset) of its words, disregarding grammar and even word order but
keeping multiplicity.
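
In Python, a multiset of words maps naturally onto collections.Counter. The sketch below assumes simple whitespace tokenization, which is cruder than what a text processing tool would do.

```python
from collections import Counter


def bag_of_words(text):
    """Count each word of the text; word order is discarded."""
    return Counter(text.lower().split())


bag = bag_of_words("good plot good cast bad ending")
print(bag["good"])
# The same words in a different order give an equal bag.
print(bag == bag_of_words("bad ending good cast good plot"))
```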



Frequency Nodes

• Node Repository:
KNIME Labs/Text Processing/Frequencies
• Available Frequency Nodes
– TF
– IDF
– Ngram creator
–…
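
To make these quantities concrete, here is a small Python sketch. It uses one common set of definitions (relative term frequency, and IDF as log(N / document frequency)); the KNIME nodes offer their own variants, so the exact numbers are not meant to match their output.

```python
import math
from collections import Counter


def relative_tf(terms):
    """Term frequency: occurrences of a term divided by the document length."""
    counts = Counter(terms)
    total = len(terms)
    return {term: count / total for term, count in counts.items()}


def idf(documents):
    """Inverse document frequency: log(N / number of documents containing the term)."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def ngrams(terms, n):
    """All runs of n consecutive terms."""
    return [tuple(terms[i:i + n]) for i in range(len(terms) - n + 1)]


docs = [["good", "movie"], ["bad", "movie"], ["good", "good", "cast"]]
print(relative_tf(docs[2]))
print(idf(docs))
print(ngrams(["not", "a", "good", "movie"], 2))
```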



Document Vector

• Transforms bag of words into document vectors
– Requires numerical (frequency) column
– Creates bit or numerical vectors
Bag of words with frequency column → Document vector
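
The transformation can be sketched in Python as follows. The input layout (document id, term, frequency) and the function name document_vectors are assumptions made for illustration, not the node's actual table format.

```python
def document_vectors(bag_rows, binary=False):
    """Turn (document_id, term, frequency) rows into one vector per document."""
    rows = list(bag_rows)
    # One vector position per distinct term, in alphabetical order.
    vocabulary = sorted({term for _, term, _ in rows})
    index = {term: i for i, term in enumerate(vocabulary)}
    vectors = {}
    for doc_id, term, freq in rows:
        vector = vectors.setdefault(doc_id, [0] * len(vocabulary))
        # Bit vector: 1 if the term occurs; numerical vector: the frequency.
        vector[index[term]] = int(freq > 0) if binary else freq
    return vocabulary, vectors


rows = [
    ("doc1", "good", 2), ("doc1", "movie", 1),
    ("doc2", "bad", 1), ("doc2", "movie", 1),
]
vocab, vectors = document_vectors(rows)
print(vocab)
print(vectors)
```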

Word Embeddings, e.g. Word2Vec

• Problem: Feature space can get really big


• Solution:
– Find a vector representation for each word
– Represent a document as sequence of vectors
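
The sketch below shows only the representation step. The three-dimensional vectors are made-up numbers for illustration; real Word2Vec embeddings are learned from a corpus and typically have many more dimensions. Mapping unknown words to a zero vector is likewise an assumption.

```python
# Hypothetical toy embeddings; real vectors are learned, not hand-written.
EMBEDDINGS = {
    "good": [0.9, 0.1, 0.0],
    "bad": [-0.8, 0.2, 0.1],
    "movie": [0.0, 0.7, 0.3],
}
UNKNOWN = [0.0, 0.0, 0.0]  # Assumption: out-of-vocabulary words map to zeros.


def embed_document(terms, embeddings=EMBEDDINGS, unknown=UNKNOWN):
    """Represent a document as the sequence of its word vectors."""
    return [embeddings.get(term, unknown) for term in terms]


sequence = embed_document(["good", "movie", "overall"])
print(sequence)
```

The vector length stays fixed however many distinct words the corpus contains, which is how embeddings avoid the very large feature space of one column per term.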



Part 4: Transformation and Frequencies
Preprocess documents



Classification



Data Mining: Process Overview

• Partition data: split the Original Data Set into a Training Set and a Test Set
• Train and apply models: Train Model on the Training Set, then Apply Model to the Test Set
• Evaluate performance: Score Model on the predictions
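
The whole process can be traced in a few lines of Python. Everything here is a simplified sketch under stated assumptions: the 70/30 split, the fixed seed, and the toy data are arbitrary, and the "model" is just a majority-class baseline used as a placeholder for a real learner.

```python
import random
from collections import Counter


def partition(rows, train_fraction=0.7, seed=42):
    """Partition data: shuffle, then split into training and test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def train_majority_model(train_rows):
    """Train model: placeholder baseline that remembers the most frequent label."""
    return Counter(label for _, label in train_rows).most_common(1)[0][0]


def apply_model(model, test_rows):
    """Apply model: the baseline predicts the same label for every row."""
    return [model for _ in test_rows]


def accuracy(true_labels, predicted_labels):
    """Score model: fraction of correct predictions."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


data = [(f"review {i}", "pos" if i % 2 == 0 else "neg") for i in range(10)]
train, test = partition(data)
model = train_majority_model(train)
predictions = apply_model(model, test)
print(accuracy([label for _, label in test], predictions))
```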

Learner-Predictor Motif

• Most data mining approaches in KNIME use a Learner-Predictor motif.
• The Learner node trains the model with its input data.
• The Predictor node applies the trained model to a different subset of data (new data).
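
The same split between training and applying can be mirrored in code as two functions: one that returns a trained model and one that takes the model plus new data. The sketch below uses a small multinomial naive Bayes classifier with add-one smoothing, chosen only because it fits in a few lines; it is not one of the models named in this deck, and the function names learn and predict are hypothetical.

```python
import math
from collections import Counter, defaultdict


def learn(documents, labels):
    """Learner: build a naive Bayes model from tokenized, labeled documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for terms, label in zip(documents, labels):
        word_counts[label].update(terms)
    vocabulary = {term for counts in word_counts.values() for term in counts}
    return {
        "class_counts": class_counts,
        "word_counts": dict(word_counts),
        "vocabulary": vocabulary,
    }


def predict(model, terms):
    """Predictor: return the most probable label for one tokenized document."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocabulary"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for term in terms:
            if term in model["vocabulary"]:  # skip words never seen in training
                score += math.log((counts[term] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


train_docs = [["great", "film"], ["loved", "it"], ["boring", "film"], ["awful", "plot"]]
train_labels = ["pos", "pos", "neg", "neg"]
model = learn(train_docs, train_labels)
print(predict(model, ["great", "film"]))
```

Because the model is an ordinary return value, it can be stored and later applied to data the learner never saw, which is the point of the motif.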

Part 5: Classification
Lexicon based approach Supervised learning approach



Today's Use Case

• Dataset: Subset of 2000 documents from the training set of the Large Movie Review Dataset v1.0.
– 1000 documents from the positive group
– 1000 documents from the negative group
• Goal: Assign the correct sentiment label to each document.



The Workflows



From Words to Wisdom Book

Course Book downloadable from KNIME Press
https://www.knime.com/knimepress

with code: SENTIMENT-ANALYSIS-0618



The KNIME® trademark and logo and OPEN FOR INNOVATION® trademark are used by
KNIME.com AG under license from KNIME GmbH, and are registered in the United States.
KNIME® is also registered in Germany.



Example: Classification with Deep Learning

