Usage
Usually used for grouping customers into clusters that have similar behaviour or attitudes.
Helps the marketer to create:
product differentiation
different offers of the same product to different segments, in light of their common needs and preferences
July 24, 2011 Prepared by Prof C Y Nimkar 2
Collect data
Specify method to compute distance between two respondents
Specify method to form clusters
Perform cluster analysis
Obtain clusters
Collect data
Collect data on any variable to be used for segmentation. It can be:
Customer needs
Customers' demographic data
Customers' opinions about product(s)
Customer No.   1    2    3    4    5
Age            42   35   45   40   50
Customer No.   Age   Annual Income (Rs. Lacs)   Area of house (Sq. ft)
1              42    4.5                        600
2              35    3.0                        550

Squared Euclidean distance = $(42-35)^2 + (4.5-3.0)^2 + (600-550)^2 = 2551.25$ units
Euclidean distance = $\sqrt{(42-35)^2 + (4.5-3.0)^2 + (600-550)^2} = \sqrt{2551.25} = 50.51$ units
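The calculation above can be checked with a minimal Python sketch. The function and variable names are illustrative choices, not part of the slides; the two customers are the ones in the table.

```python
import math

# Customers 1 and 2 from the table: (age, annual income in Rs. lacs, house area in sq. ft)
customer_1 = (42, 4.5, 600)
customer_2 = (35, 3.0, 550)

def squared_euclidean(a, b):
    """Sum of squared differences across all variables."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    """Square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean(a, b))

print(squared_euclidean(customer_1, customer_2))    # 2551.25
print(round(euclidean(customer_1, customer_2), 2))  # 50.51
```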
City-block distance

Customer No.   Age   Annual Income (Rs. Lacs)   Area of house (Sq. ft)
1              42    4.5                        600
2              35    3.0                        550

It is the sum of absolute (positive) differences.
In our example this distance = $|42-35| + |4.5-3.0| + |600-550| = 58.5$ units
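As a quick check of the sum of absolute differences, here is a small Python sketch (the function name `city_block` is an illustrative choice):

```python
def city_block(a, b):
    """Sum of absolute differences across all variables."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Customers 1 and 2: (age, annual income in Rs. lacs, house area in sq. ft)
print(city_block((42, 4.5, 600), (35, 3.0, 550)))  # 58.5
```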
Chebychev distance
Customer No.   Age   Annual Income (Rs. Lacs)   Area of house (Sq. ft)
1              42    4.5                        600
2              35    3.0                        550

In our example this distance = $\max\{|42-35|,\ |4.5-3.0|,\ |600-550|\} = 50$ units
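The same two customers can be used to sketch the Chebychev distance in Python (the function name is illustrative):

```python
def chebychev(a, b):
    """Largest absolute difference on any single variable."""
    return max(abs(x - y) for x, y in zip(a, b))

# Customers 1 and 2: (age, annual income in Rs. lacs, house area in sq. ft)
print(chebychev((42, 4.5, 600), (35, 3.0, 550)))  # 50
```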
Between-groups linkage: it is the average distance between all pairs of customers in two different clusters (one customer taken from each cluster).
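A minimal sketch of this average, assuming the squared Euclidean distances from the pairwise distance table in these slides. The grouping of customers 1 and 3 against customer 2 is only an illustration, and the helper names `d` and `between_groups` are hypothetical.

```python
from itertools import product

# Squared Euclidean distances between customer pairs (from the pairwise table).
DIST = {
    (1, 2): 2551.25, (1, 3): 2513.00, (1, 4): 32406.25, (1, 5): 122674.25,
    (2, 3): 10112.25, (2, 4): 52934.00, (2, 5): 160369.00,
    (3, 4): 16925.25, (3, 5): 90097.25, (4, 5): 29081.00,
}

def d(i, j):
    """Look up the distance regardless of the order of the pair."""
    return DIST[(min(i, j), max(i, j))]

def between_groups(cluster_a, cluster_b):
    """Average distance over all pairs with one customer from each cluster."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(d(i, j) for i, j in pairs) / len(pairs)

print(between_groups([1, 3], [2]))  # 6331.75
```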
The within-group method considers the distance between pairs of customers after combining two clusters.
For example, suppose there are 3 clusters: I, II and III.
Calculate the average distance between pairs of customers if clusters I and II, I and III, or II and III are combined.
Combine those clusters for which this average distance is least.
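The comparison of the three candidate merges can be sketched as follows. The distances are the squared Euclidean values from the pairwise table in these slides; the assignment of customers to clusters I, II and III is a hypothetical example, as are the helper names.

```python
from itertools import combinations

# Squared Euclidean distances between customer pairs (from the pairwise table).
DIST = {
    (1, 2): 2551.25, (1, 3): 2513.00, (1, 4): 32406.25, (1, 5): 122674.25,
    (2, 3): 10112.25, (2, 4): 52934.00, (2, 5): 160369.00,
    (3, 4): 16925.25, (3, 5): 90097.25, (4, 5): 29081.00,
}

def d(i, j):
    return DIST[(min(i, j), max(i, j))]

def within_group_average(cluster_a, cluster_b):
    """Average distance over all pairs inside the combined cluster."""
    merged = sorted(set(cluster_a) | set(cluster_b))
    pairs = list(combinations(merged, 2))
    return sum(d(i, j) for i, j in pairs) / len(pairs)

# Hypothetical clusters, for illustration only.
clusters = {"I": [1, 3], "II": [2], "III": [4, 5]}

candidates = {
    (a, b): within_group_average(clusters[a], clusters[b])
    for a, b in combinations(clusters, 2)
}
best = min(candidates, key=candidates.get)
print(best)  # ('I', 'II')
```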
Centroid rule
Customer No.   Age
1              42
2              35
3              45
Centroid       (42+35+45)/3 = 40.7
The centroid of a cluster is a virtual customer, here with age 40.7, annual income 4.7 lacs and house area 600 sq. ft.
The distance between two clusters is the distance between their centroids.
The two clusters whose centroids are closest are combined.
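A small sketch of the centroid calculation. Customers 1 and 2 and the age of customer 3 come from the tables; the income (6.5) and house area (650) of customer 3 are not shown there, so the values used here are assumptions, chosen because they agree with the centroid quoted in the text.

```python
# (age, annual income in Rs. lacs, house area in sq. ft)
cluster = [
    (42, 4.5, 600),   # customer 1
    (35, 3.0, 550),   # customer 2
    (45, 6.5, 650),   # customer 3: income and area are ASSUMED values
]

def centroid(points):
    """Mean of each variable across the customers in the cluster."""
    return tuple(sum(col) / len(points) for col in zip(*points))

print([round(v, 1) for v in centroid(cluster)])  # [40.7, 4.7, 600.0]
```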
Ward's method
The distance between each respondent and the cluster centroid is calculated by the squared Euclidean method.
These distances are added up for each cluster.
The same calculation is done after combining two clusters.
The two clusters whose joining results in the smallest increase in the sum are joined.
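The steps above can be sketched in Python. This is an illustration of the idea rather than a full clustering routine: it only computes the increase in the sum for one candidate merge, using customers 1 and 2 from the earlier tables as two single-member clusters. The function names are hypothetical.

```python
def centroid(points):
    return tuple(sum(col) / len(points) for col in zip(*points))

def error_sum_of_squares(points):
    """Sum of squared Euclidean distances from each respondent to the cluster centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)

def ward_increase(cluster_a, cluster_b):
    """Increase in the total sum caused by joining the two clusters."""
    merged = cluster_a + cluster_b
    return (error_sum_of_squares(merged)
            - error_sum_of_squares(cluster_a)
            - error_sum_of_squares(cluster_b))

customer_1 = (42, 4.5, 600)
customer_2 = (35, 3.0, 550)
print(ward_increase([customer_1], [customer_2]))  # 1275.625
```

For two single customers the increase is half their squared Euclidean distance (2551.25 / 2).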
Customer No.   1    2    3    4    5
Age            42   35   45   40   50

Pair      Distance (squared Euclidean)
(1, 2)      2551.25
(1, 3)      2513.00
(1, 4)     32406.25
(1, 5)    122674.25
(2, 3)     10112.25
(2, 4)     52934.00
(2, 5)    160369.00
(3, 4)     16925.25
(3, 5)     90097.25
(4, 5)     29081.00
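The first step of any agglomerative method is to find the closest pair in this table. A minimal sketch, with the distances copied from the table:

```python
# Squared Euclidean distances between customer pairs (from the table).
DIST = {
    (1, 2): 2551.25, (1, 3): 2513.00, (1, 4): 32406.25, (1, 5): 122674.25,
    (2, 3): 10112.25, (2, 4): 52934.00, (2, 5): 160369.00,
    (3, 4): 16925.25, (3, 5): 90097.25, (4, 5): 29081.00,
}

closest_pair = min(DIST, key=DIST.get)
print(closest_pair, DIST[closest_pair])  # (1, 3) 2513.0
```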
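Centroid method

Step 1: distances between all pairs of customers
Pair      Distance
(1, 2)      2551.25
(1, 3)      2513.00
(1, 4)     32406.25
(1, 5)    122674.25
(2, 3)     10112.25
(2, 4)     52934.00
(2, 5)    160369.00
(3, 4)     16925.25
(3, 5)     90097.25
(4, 5)     29081.00
The smallest distance is for pair (1, 3), so customers 1 and 3 are combined.

Step 2: distances after combining 1 and 3
Pair           Distance
((1, 3), 2)      5703.50
((1, 3), 4)     24037.50
((1, 3), 5)    105757.50
(2, 4)          52934.00
(2, 5)         160369.00
(4, 5)          29081.00
The smallest distance is for pair ((1, 3), 2), so customer 2 joins cluster (1, 3).

Step 3: distances after forming cluster (1, 2, 3)
Pair              Distance
((1, 2, 3), 4)     32402.22
((1, 2, 3), 5)    122693.76
(4, 5)             29081.00
The smallest distance is for pair (4, 5), so customers 4 and 5 are combined.

Step 4: distance between the two remaining clusters
Pair                     Distance
((1, 2, 3), (4, 5))      70277.74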
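The whole centroid-method sequence can be replayed with a short Python sketch. Customers 1 and 2 are taken from the earlier tables. The incomes and house areas of customers 3, 4 and 5 are not listed in these slides, so the values below are assumptions, chosen because they reproduce the ten pairwise distances quoted above. All function names are illustrative.

```python
from itertools import combinations

# (age, annual income in Rs. lacs, house area in sq. ft)
# Incomes and areas of customers 3-5 are ASSUMED (consistent with the quoted distances).
customers = {
    1: (42, 4.5, 600),
    2: (35, 3.0, 550),
    3: (45, 6.5, 650),
    4: (40, 6.0, 780),
    5: (50, 15.0, 950),
}

def centroid(members):
    rows = [customers[m] for m in members]
    return tuple(sum(col) / len(rows) for col in zip(*rows))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid_agglomeration():
    """Repeatedly combine the two clusters whose centroids are closest."""
    clusters = [(c,) for c in sorted(customers)]
    schedule = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: sq_dist(centroid(pair[0]), centroid(pair[1])),
        )
        coefficient = sq_dist(centroid(a), centroid(b))
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
        schedule.append((a, b, round(coefficient, 2)))
    return schedule

for stage, (a, b, coefficient) in enumerate(centroid_agglomeration(), start=1):
    print(stage, a, b, coefficient)
```

Under these assumptions the merges come out in the same order as above: 1 with 3, then 2, then 4 with 5, then the two remaining clusters. The last coefficient is computed from unrounded centroids, so it is 70277.81 rather than 70277.74.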
Centroid method

Agglomeration Schedule
         Cluster Combined         Stage Cluster First Appears
Stage    Cluster 1   Cluster 2    Cluster 1   Cluster 2
1        1           3            0           0
2        1           2            1           0
3        4           5            0           0
4        1           4            2           3

2 clusters possible: (1, 2, 3) and (4, 5), at distance 70277.74
Dendrogram

Agglomeration Schedule
         Cluster Combined                         Stage Cluster First Appears
Stage    Cluster 1   Cluster 2    Coefficients    Cluster 1   Cluster 2          Next Stage
1        1           3             2513.000       0           0                  2
2        1           2             5703.500       1           0                  4
3        4           5            29081.000       0           0                  4
4        1           4            70277.806       2           3                  0
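An agglomeration schedule can be replayed to see which customers belong together at any number of clusters. The sketch below assumes that each cluster is labelled by its lowest customer number, which matches how the schedule above reads (stage 4 combines "1" and "4", meaning clusters (1, 2, 3) and (4, 5)). The function name is hypothetical.

```python
# (stage, cluster 1, cluster 2, coefficient) as in the agglomeration schedule
SCHEDULE = [
    (1, 1, 3, 2513.000),
    (2, 1, 2, 5703.500),
    (3, 4, 5, 29081.000),
    (4, 1, 4, 70277.806),
]

def clusters_after(schedule, n_cases, k):
    """Replay merges until k clusters remain; clusters are keyed by lowest case number."""
    clusters = {case: {case} for case in range(1, n_cases + 1)}
    for _, c1, c2, _ in schedule:
        if len(clusters) <= k:
            break
        clusters[c1] |= clusters.pop(c2)
    return sorted(sorted(members) for members in clusters.values())

print(clusters_after(SCHEDULE, 5, 2))  # [[1, 2, 3], [4, 5]]
```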
Questionnaire
Que.:
(a) Imagine you want to buy a formal shirt for yourself. Here are some features of a shirt. Please rank them according to the importance that you would attach to them. The most important feature to you would get rank 1, the second most important will get rank 2, and so on.
(b) Now allocate a total of 50 points to them. The points would be allocated such that rank 1 will get the highest points, rank 2 will get the second highest, and so on. Please ensure that the sum of points is 50.

Feature        Rank   Points
Fabric
Brand image
Style
Colour
Fitting
Price
TOTAL                 50
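The rules in the questionnaire can be checked mechanically before a response is entered as data. The sketch below is one possible reading: it assumes ranks run from 1 to 6 without ties and that points must strictly decrease as rank increases. The response shown is hypothetical, not survey data.

```python
def is_valid_response(ranks, points, total=50):
    """Check ranks 1..n without ties, points summing to `total`, and points decreasing with rank."""
    if sorted(ranks.values()) != list(range(1, len(ranks) + 1)):
        return False
    if sum(points.values()) != total:
        return False
    by_rank = sorted(ranks, key=ranks.get)            # features from rank 1 downwards
    ordered_points = [points[f] for f in by_rank]
    return all(a > b for a, b in zip(ordered_points, ordered_points[1:]))

# Hypothetical respondent
ranks = {"Fabric": 2, "Brand image": 5, "Style": 4, "Colour": 6, "Fitting": 1, "Price": 3}
points = {"Fabric": 12, "Brand image": 4, "Style": 6, "Colour": 3, "Fitting": 15, "Price": 10}
print(is_valid_response(ranks, points))  # True
```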
Data
Hierarchical Cluster Analysis
Agglomeration schedule (Between-groups linkage / squared Euclidean distance)
Number of clusters = 2
K-Means Cluster Analysis

Iteration History (a)
             Change in Cluster Centers
Iteration    1        2
1            7. 03    7.347
2            .329     .258
3            . 0      .083
4            .078     .058
5            .081     .061
6            .000     .000

a. Convergence achieved due to no or small change in cluster centers. The maximum absolute coordinate change for any center is .000. The current iteration is 6. The minimum distance between initial centers is 15.634.
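The iteration history reflects the usual K-means loop: assign each case to its nearest center, recompute the centers, and stop when the centers no longer move. The sketch below is a bare-bones illustration of that loop on hypothetical one-variable data. It is not the SPSS procedure, and the data, starting centers and function names are all assumptions.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    return tuple(sum(col) / len(points) for col in zip(*points))

def k_means(points, centers, max_iter=10):
    """Return final centers, final groups and the iteration at which centers stopped moving."""
    for iteration in range(1, max_iter + 1):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [mean(g) if g else c for g, c in zip(groups, centers)]
        change = max(math.sqrt(sq_dist(c, n)) for c, n in zip(centers, new_centers))
        centers = new_centers
        if change == 0:
            break
    return centers, groups, iteration

# Hypothetical data: six cases measured on one variable, two starting centers
points = [(1,), (2,), (3,), (10,), (11,), (12,)]
centers, groups, iterations = k_means(points, centers=[(1,), (2,)])
print(centers, iterations)  # [(2.0,), (11.0,)] 3
```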
ANOVA

            Cluster                Error
            Mean Square    df      Mean Square    df
FABRIC        2.976        1       2.492          98
BRANDIMA    119.782        1       2.908          98
STYLE         4.775        1       1.971          98
COLOUR       18.688        1       2.972          98
FITTING      11.184        1       1.673          98
PRICE       363.735        1       2.621          98

The F tests should be used only for descriptive purposes because the clusters have been chosen to maximize the differences among cases in different clusters. The observed significance levels are not corrected for this and thus cannot be interpreted as tests of the hypothesis that the cluster means are equal.
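The F ratio for each variable is the cluster mean square divided by the error mean square, so the mean squares above are enough to see which variables separate the clusters most. In the sketch below the mean squares are copied from the ANOVA output; attaching them to the variable names in the order FABRIC to PRICE is an assumption based on the order used for the final cluster centers.

```python
# (cluster mean square, error mean square); the variable order is an ASSUMPTION
MEAN_SQUARES = {
    "FABRIC": (2.976, 2.492),
    "BRANDIMA": (119.782, 2.908),
    "STYLE": (4.775, 1.971),
    "COLOUR": (18.688, 2.972),
    "FITTING": (11.184, 1.673),
    "PRICE": (363.735, 2.621),
}

# F = cluster mean square / error mean square (descriptive use only)
f_ratios = {name: cluster / error for name, (cluster, error) in MEAN_SQUARES.items()}

for name in sorted(f_ratios, key=f_ratios.get, reverse=True):
    print(f"{name:10s} {f_ratios[name]:8.2f}")
```

On these figures PRICE and BRANDIMA have by far the largest ratios, in line with the caution above that the ratios are descriptive only.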
Final Cluster Centers
            Cluster
            1     2
FABRIC      9     9
BRANDIMA    6     8
STYLE       8     9
COLOUR      9     9
FITTING     9     10
PRICE       9     5
Cluster Sizes

Number of Cases in each Cluster
Cluster   1      43.000
          2      57.000
Valid           100.000
Missing            .000

Final Cluster Centers
            Cluster
            1     2
FABRIC      9     9
BRANDIMA    6     8
STYLE       8     9
COLOUR      9     9
FITTING     9     10
PRICE       9     5
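The company can consider marketing shirts under two brands:
One brand should be associated with "Best value for money" (cluster 1 gives price 9 points)
The second brand should be associated with "Status symbol" (cluster 2 gives price only 5 points and brand image 8)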
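One simple way to read the final cluster centers is to rank the features by how far apart the two clusters are. A minimal sketch, with the centers copied from the table and illustrative variable names:

```python
# Final cluster centers: feature -> (cluster 1, cluster 2)
FINAL_CENTERS = {
    "FABRIC": (9, 9),
    "BRANDIMA": (6, 8),
    "STYLE": (8, 9),
    "COLOUR": (9, 9),
    "FITTING": (9, 10),
    "PRICE": (9, 5),
}

# Features sorted by the absolute gap between the two cluster centers
ranked = sorted(
    FINAL_CENTERS,
    key=lambda f: abs(FINAL_CENTERS[f][0] - FINAL_CENTERS[f][1]),
    reverse=True,
)
print(ranked[:2])  # ['PRICE', 'BRANDIMA']
```

Price and brand image are the two features on which the clusters differ most, which is what the two suggested brand positions rest on.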