
Missing values

For simple imputation, the replacement statistic depends on the type of data:

- Numeric data with few outliers: impute with the mean
- Numeric data in a large dataset (or with outliers): impute with the median
- Categorical data: impute with the mode
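The mean / median / mode rules above can be sketched in Python; the column values below are made up for illustration:

```python
from statistics import mean, median, mode

# Numeric attribute with few outliers: replace missing values with the mean
ages = [25, 30, None, 28]
observed = [v for v in ages if v is not None]
ages_filled = [v if v is not None else mean(observed) for v in ages]

# Numeric attribute with outliers (or a large dataset): the median is more robust
incomes = [30, 32, None, 31, 500]
observed = [v for v in incomes if v is not None]
incomes_filled = [v if v is not None else median(observed) for v in incomes]

# Categorical attribute: replace missing values with the mode (most frequent value)
colors = ["red", "blue", None, "red"]
observed = [v for v in colors if v is not None]
colors_filled = [v if v is not None else mode(observed) for v in colors]
```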

Missing value through prediction

Missing variable | Associated variable | Type of technique                    | Remarks / Assumptions
Categorical      | Categorical         | Decision tree, Naïve Bayesian        | Decision tree needs no assumptions; Naïve Bayes assumes independent variables
Categorical      | Numeric             | Logistic regression, K-NN classifier | K-NN classifier needs no assumptions; regression assumes normality, homoscedasticity, etc.
Numeric          | Numeric             | Regression model, Clustering         | Clustering needs no assumptions; regression assumes normality, homoscedasticity, etc.
Numeric          | Categorical         | Clustering                           | No assumptions
Categorical      | Both                | Decision tree, Regression            | Decision tree needs no assumptions
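For the numeric-on-numeric case, a regression model fitted on the complete cases can predict the missing value. A minimal least-squares sketch (the data points are made up for illustration):

```python
# Complete cases: pairs of an associated variable x and the variable y
# that has a missing entry elsewhere (y is roughly 2 * x here)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.0, 8.1]

# Ordinary least squares for a simple linear regression y = a + b*x
n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

# Predict y for the record where it is missing (its x value is 5.0)
y_missing = intercept + slope * 5.0
```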

K-NN Classifier

[Figure: example of a 3-NN classifier]

- If k is too small, the classifier is sensitive to noise points.
- If k is too large, the neighborhood may include points from other classes.
- Attributes may have to be scaled to prevent the distance measure from being dominated by one of the attributes.
- K-NN is a lazy learner because it does not build a model explicitly.
- A suitable k is chosen by testing with different values of k.
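The points above can be sketched as a small Python implementation, assuming Euclidean distance and min-max scaling (common choices; the notes do not name a specific distance or scaling method):

```python
import math
from collections import Counter

def minmax_scale(rows):
    # Scale each attribute to [0, 1] so that no single attribute
    # dominates the distance measure
    cols = list(zip(*rows))
    mins = [min(c) for c in cols]
    ranges = [max(c) - mn or 1.0 for c, mn in zip(cols, mins)]
    return [[(v - mn) / r for v, mn, r in zip(row, mins, ranges)] for row in rows]

def knn_predict(train_x, train_y, query, k):
    # Lazy learner: no model is built in advance; all the work
    # (distance computation and voting) happens at query time
    neighbors = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(y for _, y in neighbors[:k])
    return votes.most_common(1)[0][0]
```

A tiny usage example: with training points `[[1, 0], [2, 0], [0, 1], [0, 2]]` labeled `A, A, B, B`, the query `[1.5, 0]` with k = 3 gets two `A` neighbors and one `B`, so the majority vote returns `A`.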

Naïve Bayesian Classifier

Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)

Applied to spam filtering:

P(Spam|free) = P(free|Spam) * P(Spam) / P(free)

Since P(Spam|free) > P(Ham|free), a message containing the word "free" is classified as spam.
Step 4: Applying the classifier

library(e1071)  # provides naiveBayes()
# Train a Naive Bayes model on the training data, using the message
# type (spam/ham) as the class label
sms_classifier <- naiveBayes(sms_train, sms_raw_train$type)

How it works: if the output of equation 1 (the posterior for spam) is greater than that of equation 2 (the posterior for ham), the message is classified as spam; otherwise it is classified as ham.
