[Figure: Macro-Level Classifier and Micro-Level Classifier, each with a Learning/Training stage. Labels shown: Unknown, Bad, CL_A, CL_B, CL_N. Example threat classes: Shellcode, Spambot_Proxy_Control_Channel, Exploit_Suspected_PHP_Injection_Attack]
Narus Company Confidential 15 Narus Company Confidential
Binary Classifier Results
Biased SVM performance comparison with different kernels (values in %)

                  Linear Kernel   RBF Kernel   Poly Kernel
Precision good    79.75           87.46        78.70
Recall good       87.07           90.42        97.79
F1 good           83.25           88.9347      87.2126
Precision bad     79.75           69.33        79.78
Recall bad        37.17           62.55        24.81
F1 bad            42.74           65.7657      34.8495
Accuracy          74.08           83.26        78.79
G-mean            56.89           75.21        49.25
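As a reading aid for the table, here is a minimal sketch of how the F1 and G-mean rows relate to precision and recall. It assumes F1 is the harmonic mean of precision and recall, and that G-mean is the geometric mean of the two per-class recalls. The slides do not state either definition, but both reproduce the RBF column. The function names are illustrative.

```python
import math

def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def g_mean(recall_good, recall_bad):
    """Geometric mean of the per-class recalls (assumed definition)."""
    return math.sqrt(recall_good * recall_bad)

# RBF-kernel column of the table, in percent.
rbf_f1_bad = f1_score(69.33, 62.55)   # compare with the "F1 bad" row
rbf_g_mean = g_mean(90.42, 62.55)     # compare with the "G-mean" row
```

G-mean is low whenever either class has poor recall, which is why it separates the kernels more sharply than accuracy does on imbalanced data.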
Kernel Learning
Binary Classifier Results
Parameter selection for Biased SVM with RBF Kernel
When gamma = 10 and C+/C- = 0.5: best F1_bad = 0.6494
When gamma = 10 and C+/C- = 0.55: best F1_bad = 0.657657
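To make the role of the C+/C- ratio concrete, the sketch below evaluates a cost-weighted (biased) soft-margin SVM objective for a linear model. It assumes the common formulation in which hinge-loss errors on the +1 class are weighted by C+ and errors on the -1 class by C-. The slides do not spell out the objective or say which class is "+", and the results above use an RBF kernel rather than this linear form. All names are illustrative.

```python
def biased_svm_objective(w, b, X, y, c_pos, c_neg):
    """0.5*||w||^2 + sum_i C_i * max(0, 1 - y_i*(w.x_i + b)),
    where C_i = c_pos for y_i = +1 and c_neg for y_i = -1."""
    reg = 0.5 * sum(wj * wj for wj in w)
    loss = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        cost = c_pos if yi > 0 else c_neg
        loss += cost * max(0.0, 1.0 - margin)
    return reg + loss

def costs_from_ratio(ratio, c_neg=1.0):
    """Turn a C+/C- ratio into a (c_pos, c_neg) pair; c_neg=1.0 is an arbitrary scale."""
    return ratio * c_neg, c_neg

# Toy data: three 2-D points, two labelled +1 and one labelled -1.
X = [[2.0, 0.0], [0.5, 0.0], [-0.5, 0.0]]
y = [+1, +1, -1]
c_pos, c_neg = costs_from_ratio(0.55)
objective = biased_svm_objective([1.0, 0.0], 0.0, X, y, c_pos, c_neg)
```

A parameter search like the one on this slide would train one model per (gamma, C+/C-) pair and keep the pair with the highest F1_bad on held-out data.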
Binary Classifier Results
F1 bad comparison of the methods for Binary classifier
F1 best performance with/without noise: 79.07/88.7 %

F1 bad comparison (%)
Method                  With noise   Without noise
Bagging SMO             45.57        51.7
Adaboost                45.57        51.7
SMO                     46.41        53.4
KNN                     63.74        79.43
Biased SVM              65.7657      67.55
Decision Tree           76.01        86.8
Bagging Decision Tree   79.07        88.7
Preprocessing (Multiclass)
Tree based generated features
For each class k, do
    Repeat c times (i = 1, ..., c)
        Collect samples from class k, label them +1.
        Collect samples from the complement of class k (all other classes), label them -1.
        Build a regression tree on the above binary data.
        Store the tree as T_ik.
    End
End
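The loop above can be sketched as follows. This is an illustration under stated assumptions, not the presenters' implementation: a depth-1 regression tree (a stump) stands in for the full regression tree, samples are drawn with replacement, features are numeric with no missing values, and the names `fit_stump`, `build_trees`, and `transform` are hypothetical.

```python
import random

def fit_stump(X, y):
    """Depth-1 regression tree: pick the (feature, threshold) with the lowest squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [yi for row, yi in zip(X, y) if row[j] <= thr]
            right = [yi for row, yi in zip(X, y) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    if best is None:  # no split possible: predict the mean label everywhere
        mean = sum(y) / len(y)
        return (0, float("inf"), mean, mean)
    return best[1:]

def predict_stump(stump, row):
    j, thr, left_mean, right_mean = stump
    return left_mean if row[j] <= thr else right_mean

def build_trees(X, labels, c, n_per_side, rng):
    """For each class k, repeat c times: sample class k as +1, the rest as -1, fit a tree T_ik."""
    trees = {}
    for k in sorted(set(labels)):
        pos = [x for x, l in zip(X, labels) if l == k]
        neg = [x for x, l in zip(X, labels) if l != k]
        for i in range(c):
            p = [rng.choice(pos) for _ in range(n_per_side)]
            n = [rng.choice(neg) for _ in range(n_per_side)]
            trees[(i, k)] = fit_stump(p + n, [1.0] * len(p) + [-1.0] * len(n))
    return trees

def transform(trees, row):
    """New feature vector: one tree output per stored tree, in a fixed order."""
    return [predict_stump(trees[key], row) for key in sorted(trees)]

X = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
labels = ["a", "a", "a", "b", "b", "b"]
trees = build_trees(X, labels, c=2, n_per_side=4, rng=random.Random(0))
features = transform(trees, [0.05])
```

Each tree leaf outputs the mean of the +1/-1 labels that fall into it, so the generated features lie in [-1, 1].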
Example:

Original features
Home owner   Marital status   Annual income   Number of children   Age
-            married          125K            -                    41
No           Not married      70K             N/A                  22
No           -                59K             1                    55
yes          Not married      -               N/A                  23
yes          married          100K            1                    -

Tree based features (after transformation)
Tree 1   Tree 2     Tree 3   Tree 4     Tree 5
-0.25    -1         -0.5     -1         -0.14286
-0.25    -1         -0.5     -0.33333   -0.14286
-1       0.2        1        1          0.142857
0.5      0.714286   0.5      0.25       -1
-0.25    -0.33333   -0.5     0.777778   -0.14286
Preprocessing
Multiclass results comparison: original features vs. tree based generated features

Performance of 6 majority classes (%)
           Original features             Tree based features
Class ID   Precision   Recall   F1       Precision   Recall   F1
24         77.65       78.30    77.97    86.12       88       87.05
25         63.62       70.02    66.67    79.3        82       80.63
28         99.36       99.70    99.53    100         100      100
48         82.16       73.95    77.84    79.68       77.9     78.78
68         69.05       71.38    70.20    67.7        76       71.61
76         66.58       71.23    68.83    68.45       66.6     67.51

Average performance of 6 majority classes (%)
                      Precision   Recall   F1
Original features     76.40       77.43    76.84
Tree based features   80.21       81.75    80.93
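The averages for the 6 majority classes are unweighted (macro) means of the per-class rows. A short sketch that recomputes them from the per-class numbers on this slide; the dictionary layout and the function name are illustrative.

```python
# (precision, recall, F1) per class ID, in percent, copied from the per-class results.
original = {
    24: (77.65, 78.30, 77.97), 25: (63.62, 70.02, 66.67), 28: (99.36, 99.70, 99.53),
    48: (82.16, 73.95, 77.84), 68: (69.05, 71.38, 70.20), 76: (66.58, 71.23, 68.83),
}
tree_based = {
    24: (86.12, 88.0, 87.05), 25: (79.3, 82.0, 80.63), 28: (100.0, 100.0, 100.0),
    48: (79.68, 77.9, 78.78), 68: (67.7, 76.0, 71.61), 76: (68.45, 66.6, 67.51),
}

def macro_average(per_class):
    """Unweighted mean of each metric over the classes, rounded to two decimals."""
    n = len(per_class)
    return tuple(round(sum(row[m] for row in per_class.values()) / n, 2) for m in range(3))

avg_original = macro_average(original)      # (precision, recall, F1)
avg_tree_based = macro_average(tree_based)
```

Because the mean is unweighted, a small class such as 76 counts as much as a large one.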
Multi-class Classification
Identify individual threats
Identify new classes and provide properties
Classifiers
    K-Nearest Neighbor
        No training involved
        Computationally intensive for testing
    Ensemble methods
        Fail to scale up to a huge number of classes
    Sphere-based SVM
        Encapsulate each class in a hypersphere.
        Transform data into an appropriate space such that each class clusters into a single cohesive unit
Building Kernel
Let (X_i, Y_i) be the data points, where Y_i ∈ {+1, -1}
Construct ground truth kernel K
    K_{ij} = Y_i Y_j
Now learn a parametric kernel as follows
    K_{ij} = f(X_i, X_j)

Home owner   Marital status   Annual income   Number of children   Age   Y
-            married          125K            -                    41    +1
No           Not married      70K             N/A                  22    +1
No           -                59K             1                    55    +1
yes          Not married      -               N/A                  23    -1
yes          married          100K            1                    -     -1
-            Married          -               2                    32    -1

K_{ij} ~ f(X_i, X_j)
Once f is learned, it can be applied to the test set.

K = y y^T

class    1    2    3    4    5    6
1       +1   +1   +1   -1   -1   -1
2       +1   +1   +1   -1   -1   -1
3       +1   +1   +1   -1   -1   -1
4       -1   -1   -1   +1   +1   +1
5       -1   -1   -1   +1   +1   +1
6       -1   -1   -1   +1   +1   +1
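The ground truth kernel is the outer product of the label vector with itself. The sketch below builds it for the six labels in the example and adds a helper for measuring how closely a learned kernel matches it. The mean-squared-error measure is an assumption used only to illustrate the "~" relation; the slides do not say which fitting criterion was used.

```python
def ground_truth_kernel(y):
    """K[i][j] = y[i] * y[j], i.e. K = y y^T."""
    return [[yi * yj for yj in y] for yi in y]

def kernel_fit_error(k_true, k_pred):
    """Mean squared entrywise difference between two kernel matrices (assumed criterion)."""
    n = len(k_true)
    return sum((k_true[i][j] - k_pred[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)

y = [+1, +1, +1, -1, -1, -1]   # the Y column of the example table
K = ground_truth_kernel(y)
```

Entries are +1 for pairs with the same label and -1 for pairs with different labels, so K encodes "same class" as maximal similarity.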
Kernel for Multi Class
For each class, do the following:
    Collect samples belonging to the class and label them +1
    Collect samples from the rest of the data and label them -1
Build a separate kernel for each class.

class    1    2    3    4    5    6
1       +1   +1   +1   -1   -1   -1
2       +1   +1   +1   -1   -1   -1
3       +1   +1   +1   -1   -1   -1
4       -1   -1   -1   +1   +1   +1
5       -1   -1   -1   +1   +1   +1
6       -1   -1   -1   +1   +1   +1

K_{ij} ~ f(X_i, X_j)
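The one-versus-rest relabeling and the per-class target kernels can be sketched as below. The class names and the six-sample toy data are made up for illustration.

```python
def one_vs_rest_labels(classes, target):
    """+1 for samples of the target class, -1 for the rest of the data."""
    return [+1 if c == target else -1 for c in classes]

def per_class_kernels(classes):
    """One ground-truth kernel y y^T per class, built from its one-vs-rest labels."""
    kernels = {}
    for k in sorted(set(classes)):
        y = one_vs_rest_labels(classes, k)
        kernels[k] = [[yi * yj for yj in y] for yi in y]
    return kernels

classes = ["a", "a", "b", "b", "c", "c"]   # hypothetical class of each sample
kernels = per_class_kernels(classes)
```

Note that in the kernel for class "b", two samples from "a" and "c" also get similarity +1, because both belong to the "rest" group for that class.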
Boosted Trees for Kernel Learning
Output of tree 1: y_1, a column vector of six +1/-1 entries (one per sample)
Kernel matrix for tree 1

         1    2    3    4    5    6
1       +1   -1   -1   +1   -1   +1
2       -1   +1   +1   -1   +1   -1
3       -1   +1   +1   -1   +1   -1
4       +1   -1   -1   +1   -1   +1
5       -1   +1   +1   -1   +1   -1
6       +1   -1   -1   +1   -1   +1

         1    2    3    4    5    6
1       +1   +1   +1   -1   -1   -1
2       +1   +1   +1   -1   -1   -1
3       +1   +1   +1   -1   -1   -1
4       -1   -1   -1   +1   +1   +1
5       -1   -1   -1   +1   +1   +1
6       -1   -1   -1   +1   +1   +1

Output of tree 2: y_2, a column vector of six +1/-1 entries (one per sample)
Kernel matrix for tree 2

         1    2    3    4    5    6
1       +1   -1   +1   +1   -1   +1
2       -1   +1   -1   +1   +1   -1
3       -1   +1   +1   -1   +1   -1
4       +1   -1   -1   +1   -1   +1
5       -1   +1   +1   -1   +1   -1
6       +1   -1   -1   +1   -1   +1
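A minimal sketch of how per-tree outputs can be combined into a kernel. It assumes each tree t contributes the outer product of its output vector, y_t y_t^T, and that the ensemble kernel is a weighted sum of these contributions. The additive combination is an assumption based on how boosting works, not something the slide states. The vector `y1` is chosen so that its outer product reproduces the first 6x6 grid on this slide; `y2` is hypothetical.

```python
def outer(y):
    """Per-tree kernel: K_t[i][j] = y[i] * y[j]."""
    return [[a * b for b in y] for a in y]

def ensemble_kernel(tree_outputs, weights=None):
    """Weighted sum of the per-tree kernels (uniform weights by default)."""
    n = len(tree_outputs[0])
    if weights is None:
        weights = [1.0 / len(tree_outputs)] * len(tree_outputs)
    K = [[0.0] * n for _ in range(n)]
    for w, y in zip(weights, tree_outputs):
        Kt = outer(y)
        for i in range(n):
            for j in range(n):
                K[i][j] += w * Kt[i][j]
    return K

y1 = [+1, -1, -1, +1, -1, +1]   # outer(y1) matches the first grid on the slide
y2 = [+1, +1, +1, -1, -1, -1]   # hypothetical second tree output
K = ensemble_kernel([y1, y2])
```

Where the two trees disagree about a pair of samples, their contributions cancel (here K[0][1] is 0), so later trees can correct the similarity estimates of earlier ones.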
Multi class Results
Spheres require only K = 6 comparisons per test sample (K = number of classes),
whereas KNN requires N comparisons (N = number of training samples).
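The difference in test-time cost can be illustrated with a simplified stand-in for the sphere model. Here each class is summarized by its centroid and the largest centroid-to-member distance, and a test point is compared against the K centres only. A real sphere-based SVM would fit the spheres in a kernel-induced space, so this sketch shows the comparison count, not the actual method. All names are illustrative.

```python
import math

def fit_spheres(X, labels):
    """One sphere per class: centroid plus the largest distance to a member."""
    spheres = {}
    for k in sorted(set(labels)):
        pts = [x for x, l in zip(X, labels) if l == k]
        center = [sum(col) / len(pts) for col in zip(*pts)]
        radius = max(math.dist(center, p) for p in pts)
        spheres[k] = (center, radius)
    return spheres

def classify(spheres, x):
    """Nearest sphere centre: one distance computation per class (K in total)."""
    return min(spheres, key=lambda k: math.dist(spheres[k][0], x))

X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
labels = ["a", "a", "b", "b"]
spheres = fit_spheres(X, labels)
predicted = classify(spheres, [0.2, 0.4])
```

KNN would instead compute a distance to every stored training sample, so its test cost grows with N rather than with K.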
Classification +New Class Detection
Find transformation to separate class + from rest of data
Find transformation to separate class x from rest of data
Find transformation to separate class -- from rest of data
Find transformation to separate class ^ from rest of data
Build a separate kernel for each class
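One way to combine classification with new class detection is sketched below. It assumes a sample that falls outside every class sphere is treated as a candidate for a new class. The slides do not state the exact decision rule, so this is an assumption, and the sphere parameters are hypothetical.

```python
import math

def assign_or_flag(spheres, x):
    """spheres maps class -> (center, radius).
    Returns the class whose sphere contains x most deeply, or None if x lies outside all spheres."""
    best, best_margin = None, None
    for k, (center, radius) in spheres.items():
        margin = math.dist(center, x) - radius   # <= 0 means x is inside sphere k
        if margin <= 0 and (best_margin is None or margin < best_margin):
            best, best_margin = k, margin
    return best

# Hypothetical spheres for two known classes.
spheres = {"a": ([0.0, 0.0], 1.0), "b": ([5.0, 5.0], 1.0)}
known = assign_or_flag(spheres, [0.5, 0.0])
unknown = assign_or_flag(spheres, [2.5, 2.5])
```

Samples flagged with `None` could then be grouped to propose a new class.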
New Class Generation
Conclusion
CyberEagle: An enhanced comprehensive security system
Bringing host and network security together to fight security threats
Identify threats that IDS/IPS fail to detect (encrypted, evolved)
Identify new threats at the earliest stage
Generate signatures for the new threats and alert the host security system in an automated way
Future Work
Improve classification accuracy
Scale up to a huge number of classes
Reduce computation during classification
Learn class hierarchy
Increase speed without sacrificing accuracy
Validate with diverse data
Reputation analysis of IP addresses
Online update of the classifier
MapReduce implementations
Summer 2011 Company Meeting
Thank You
Prakash, Lei, Saby