
Cluster Analysis

July 24, 2011

Prepared by Prof C Y Nimkar

Usage
Usually used for grouping customers into clusters that have similar behaviour / attitude
Helps the marketer to create product differentiation: different offers of the same product to different segments in light of their common needs / preferences

Steps in Cluster analysis


1. Collect data
2. Specify method to compute distance between two respondents
3. Specify method to form clusters
4. Perform cluster analysis
5. Obtain clusters


Step 1 Collect data


Collect data
Collect data on any variable to be used for segmentation. It can be:
Customer needs
Customers' demographic data
Customers' opinions about the product(s)

Data should be at least on an interval scale.



Sample data on demographics


Customer No.   Age   Annual Income (Rs. Lacs)   Area of house (Sq. ft)
1              42    4.5                        600
2              35    3.0                        550
3              45    6.5                        650
4              40    6.0                        780
5              50    15.0                       950


Step 2 Specify distance method


Distance method
The following distance methods are available:
Squared Euclidean distance method
Euclidean distance method
City-block (Manhattan) distance method
Chebychev distance method


Squared Euclidean/Euclidean distance

Customer No.   Age   Annual Income (Rs. Lacs)   Area of house (Sq. ft)
1              42    4.5                        600
2              35    3.0                        550

Squared Euclidean distance:
$$d^2_{12} = (42-35)^2 + (4.5-3.0)^2 + (600-550)^2 = 49 + 2.25 + 2500 = 2551.25 \text{ units}$$

Euclidean distance:
$$d_{12} = \sqrt{(42-35)^2 + (4.5-3.0)^2 + (600-550)^2} = \sqrt{2551.25} \approx 50.51 \text{ units}$$
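A minimal Python sketch of the two formulas, applied to customers 1 and 2 from the sample data. The function names are illustrative only and are not taken from any statistics package.

```python
import math

# Customers 1 and 2 from the sample data: (age, annual income in Rs. lacs, area of house in sq. ft)
customer_1 = (42, 4.5, 600)
customer_2 = (35, 3.0, 550)


def squared_euclidean(a, b):
    """Sum of the squared differences over all variables."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean(a, b):
    """Square root of the squared Euclidean distance."""
    return math.sqrt(squared_euclidean(a, b))


print(squared_euclidean(customer_1, customer_2))    # 2551.25
print(round(euclidean(customer_1, customer_2), 2))  # 50.51
```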

City Block (Manhattan) distance

Customer No.   Age   Annual Income (Rs. Lacs)   Area of house (Sq. ft)
1              42    4.5                        600
2              35    3.0                        550

It is the sum of absolute (positive) differences. In our example this distance is:
$$d_{12} = |42-35| + |4.5-3.0| + |600-550| = 7 + 1.5 + 50 = 58.5 \text{ units}$$
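The same calculation as a short Python sketch; `city_block` is an illustrative name, not a library function.

```python
# Customers 1 and 2 from the sample data: (age, annual income in Rs. lacs, area of house in sq. ft)
customer_1 = (42, 4.5, 600)
customer_2 = (35, 3.0, 550)


def city_block(a, b):
    """Manhattan distance: sum of the absolute differences over all variables."""
    return sum(abs(x - y) for x, y in zip(a, b))


print(city_block(customer_1, customer_2))  # 58.5
```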

Chebychev distance

Customer No.   Age   Annual Income (Rs. Lacs)   Area of house (Sq. ft)
1              42    4.5                        600
2              35    3.0                        550

It is the maximum absolute difference on any single variable. In our example this distance is:
$$d_{12} = \max\{|42-35|,\ |4.5-3.0|,\ |600-550|\} = \max\{7,\ 1.5,\ 50\} = 50 \text{ units}$$
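A one-function Python sketch of the Chebychev distance for the same two customers; the function name is illustrative.

```python
# Customers 1 and 2 from the sample data: (age, annual income in Rs. lacs, area of house in sq. ft)
customer_1 = (42, 4.5, 600)
customer_2 = (35, 3.0, 550)


def chebychev(a, b):
    """Largest absolute difference on any single variable."""
    return max(abs(x - y) for x, y in zip(a, b))


print(chebychev(customer_1, customer_2))  # 50
```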

Step 3 Specify method to form clusters


Methods to form clusters
The following methods are available:
Single linkage rule (nearest neighbour)
Complete linkage rule (farthest neighbour)
Between-groups linkage rule
Within-groups linkage rule
Centroid rule
Ward's method

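Single linkage rule (nearest neighbours)

[Figure: two clusters of customers joined at their nearest neighbours]

The distance between two clusters is the distance between their two nearest neighbours, i.e. the closest pair of customers, one from each cluster.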
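The points in the original figure are not recoverable, so this sketch applies the rule to the five customers in the sample data instead. The two clusters {1, 3} and {4, 5} are a hypothetical grouping chosen only for illustration, and squared Euclidean distance is assumed as the customer-to-customer distance.

```python
# Five customers from the sample data: (age, annual income in Rs. lacs, area of house in sq. ft)
customers = {
    1: (42, 4.5, 600),
    2: (35, 3.0, 550),
    3: (45, 6.5, 650),
    4: (40, 6.0, 780),
    5: (50, 15.0, 950),
}


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def single_linkage(cluster_a, cluster_b):
    """Distance between the nearest pair of customers, one taken from each cluster."""
    return min(
        squared_euclidean(customers[i], customers[j])
        for i in cluster_a
        for j in cluster_b
    )


# Hypothetical clusters {1, 3} and {4, 5}
print(single_linkage({1, 3}, {4, 5}))  # 16925.25 (customers 3 and 4)
```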



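Complete linkage rule (farthest neighbours)

[Figure: two clusters of customers joined at their farthest neighbours]

The distance between two clusters is the distance between their two farthest neighbours, i.e. the most distant pair of customers, one from each cluster.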
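A sketch of the farthest-neighbour rule on the same hypothetical clusters {1, 3} and {4, 5} from the five-customer sample data, again assuming squared Euclidean distance between customers. Only `min` changes to `max` compared with single linkage.

```python
# Five customers from the sample data: (age, annual income in Rs. lacs, area of house in sq. ft)
customers = {
    1: (42, 4.5, 600),
    2: (35, 3.0, 550),
    3: (45, 6.5, 650),
    4: (40, 6.0, 780),
    5: (50, 15.0, 950),
}


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def complete_linkage(cluster_a, cluster_b):
    """Distance between the farthest pair of customers, one taken from each cluster."""
    return max(
        squared_euclidean(customers[i], customers[j])
        for i in cluster_a
        for j in cluster_b
    )


# Hypothetical clusters {1, 3} and {4, 5}
print(complete_linkage({1, 3}, {4, 5}))  # 122674.25 (customers 1 and 5)
```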



Between-group linkage

[Figure: two clusters of customers with links between all pairs]

It is the average distance over all pairs of customers in two different clusters, one customer from each cluster.
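A sketch of between-group (average) linkage on the hypothetical clusters {1, 3} and {4, 5} from the sample data, assuming squared Euclidean distance between customers. With two customers in each cluster there are four cross-cluster pairs to average.

```python
# Five customers from the sample data: (age, annual income in Rs. lacs, area of house in sq. ft)
customers = {
    1: (42, 4.5, 600),
    2: (35, 3.0, 550),
    3: (45, 6.5, 650),
    4: (40, 6.0, 780),
    5: (50, 15.0, 950),
}


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def between_groups_linkage(cluster_a, cluster_b):
    """Average distance over all pairs with one customer in each cluster."""
    distances = [
        squared_euclidean(customers[i], customers[j])
        for i in cluster_a
        for j in cluster_b
    ]
    return sum(distances) / len(distances)


# Hypothetical clusters {1, 3} and {4, 5}: pairs (1,4), (1,5), (3,4), (3,5)
print(between_groups_linkage({1, 3}, {4, 5}))  # 65525.75
```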


Within-group linkage

[Figure: three clusters I, II and III]

The within-group method considers the distance between pairs of customers after combining two clusters.
For example, if there are 3 clusters I, II and III:
Calculate the average distance between all pairs of customers if clusters I and II, I and III, or II and III are combined.
Combine the two clusters for which this average distance is least.
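The procedure above, sketched in Python on the five-customer sample data. The assignment of customers to clusters I = {1, 3}, II = {2} and III = {4, 5} is hypothetical, and squared Euclidean distance is assumed between customers; the average is taken over every pair inside the combined cluster.

```python
from itertools import combinations

# Five customers from the sample data: (age, annual income in Rs. lacs, area of house in sq. ft)
customers = {
    1: (42, 4.5, 600),
    2: (35, 3.0, 550),
    3: (45, 6.5, 650),
    4: (40, 6.0, 780),
    5: (50, 15.0, 950),
}


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_group_average(members):
    """Average distance over all pairs of customers in the combined cluster."""
    pairs = list(combinations(sorted(members), 2))
    total = sum(squared_euclidean(customers[i], customers[j]) for i, j in pairs)
    return total / len(pairs)


# Hypothetical current clusters I, II and III
clusters = {"I": {1, 3}, "II": {2}, "III": {4, 5}}

candidates = {
    (a, b): within_group_average(clusters[a] | clusters[b])
    for a, b in combinations(clusters, 2)
}
for pair, value in candidates.items():
    print(pair, round(value, 2))

best = min(candidates, key=candidates.get)
print("Combine clusters", best)  # Combine clusters ('I', 'II')
```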

Centroid rule

Customer No.   Age                     Annual Income (Rs. Lacs)     Area of house (Sq. ft)
1              42                      4.5                          600
2              35                      3.0                          550
3              45                      6.5                          650
Centroid       (42+35+45)/3 = 40.7     (4.5+3.0+6.5)/3 = 4.7        (600+550+650)/3 = 600

The centroid of a cluster is a virtual customer, here with age 40.7, annual income 4.7 lacs and area of house 600 sq. ft.
The distance between two clusters is the distance between their centroids.
The two clusters whose centroids are closest are combined.
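A Python sketch of the centroid as a "virtual customer", using customers 1, 2 and 3 from the table. As a second illustration it also measures the squared Euclidean distance from that centroid to the centroid of customers 4 and 5; the choice of squared Euclidean distance here is an assumption for the example.

```python
# Five customers from the sample data: (age, annual income in Rs. lacs, area of house in sq. ft)
customers = {
    1: (42, 4.5, 600),
    2: (35, 3.0, 550),
    3: (45, 6.5, 650),
    4: (40, 6.0, 780),
    5: (50, 15.0, 950),
}


def centroid(members):
    """Virtual customer whose value on each variable is the cluster mean."""
    points = [customers[m] for m in members]
    return tuple(sum(values) / len(points) for values in zip(*points))


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


c_123 = centroid([1, 2, 3])
c_45 = centroid([4, 5])
print([round(v, 1) for v in c_123])             # [40.7, 4.7, 600.0]
print(round(squared_euclidean(c_123, c_45), 2))  # 70277.81
```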

Ward's method
The distance between each respondent and its cluster centroid is calculated by the squared Euclidean method.
These distances are added for each cluster.
The same calculation is done after combining two clusters.
The two clusters are joined whose combination results in the smallest increase in this sum.
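The four steps above can be sketched in Python on the five-customer sample data. The current clusters I = {1, 3}, II = {2} and III = {4, 5} are a hypothetical state chosen only to show one merge decision; the code computes, for each candidate merge, the increase in the summed squared distances to the centroid.

```python
from itertools import combinations

# Five customers from the sample data: (age, annual income in Rs. lacs, area of house in sq. ft)
customers = {
    1: (42, 4.5, 600),
    2: (35, 3.0, 550),
    3: (45, 6.5, 650),
    4: (40, 6.0, 780),
    5: (50, 15.0, 950),
}


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(members):
    points = [customers[m] for m in members]
    return tuple(sum(values) / len(points) for values in zip(*points))


def error_sum_of_squares(members):
    """Sum of squared Euclidean distances from each respondent to the cluster centroid."""
    c = centroid(members)
    return sum(squared_euclidean(customers[m], c) for m in members)


# Hypothetical current clusters
clusters = {"I": {1, 3}, "II": {2}, "III": {4, 5}}

increase = {}
for a, b in combinations(clusters, 2):
    merged = clusters[a] | clusters[b]
    increase[(a, b)] = (
        error_sum_of_squares(merged)
        - error_sum_of_squares(clusters[a])
        - error_sum_of_squares(clusters[b])
    )

for pair, value in increase.items():
    print(pair, round(value, 2))

best = min(increase, key=increase.get)
print(best)  # ('I', 'II')
```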


Step 4 Perform cluster analysis

Perform Hierarchical cluster analysis

Perform K-means cluster analysis


Perform Hierarchical cluster analysis

The hierarchical cluster analysis technique indicates the number of clusters that can be formed.

Customer No.   Age   Annual Income (Rs. Lacs)   Area of house (Sq. ft)
1              42    4.5                        600
2              35    3.0                        550
3              45    6.5                        650
4              40    6.0                        780
5              50    15.0                       950

Distance matrix by squared Euclidean method

Pair       Distance
(1, 2)     2551.25
(1, 3)     2513.00
(1, 4)     32406.25
(1, 5)     122674.25
(2, 3)     10112.25
(2, 4)     52934.00
(2, 5)     160369.00
(3, 4)     16925.25
(3, 5)     90097.25
(4, 5)     29081.00
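A short Python sketch that rebuilds this distance matrix from the customer table; it loops over all 10 unordered pairs of the five customers. The names are illustrative.

```python
from itertools import combinations

# Five customers from the sample data: (age, annual income in Rs. lacs, area of house in sq. ft)
customers = {
    1: (42, 4.5, 600),
    2: (35, 3.0, 550),
    3: (45, 6.5, 650),
    4: (40, 6.0, 780),
    5: (50, 15.0, 950),
}


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# One entry per unordered pair of customers
distance_matrix = {
    (i, j): squared_euclidean(customers[i], customers[j])
    for i, j in combinations(sorted(customers), 2)
}
for pair, d in distance_matrix.items():
    print(pair, f"{d:.2f}")

# The closest pair is the first to be combined
print(min(distance_matrix, key=distance_matrix.get))  # (1, 3)
```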


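Centroid method (squared Euclidean distance; a merged cluster is represented by its centroid)

Stage 1
Pair       (1, 2)    (1, 3)    (1, 4)     (1, 5)      (2, 3)     (2, 4)     (2, 5)      (3, 4)     (3, 5)     (4, 5)
Distance   2551.25   2513.00   32406.25   122674.25   10112.25   52934.00   160369.00   16925.25   90097.25   29081.00
Smallest distance: (1, 3), so customers 1 and 3 are combined.

Stage 2
Pair       ((1, 3), 2)   ((1, 3), 4)   ((1, 3), 5)   (2, 4)     (2, 5)      (4, 5)
Distance   5703.50       24037.50      105757.50     52934.00   160369.00   29081.00
Smallest distance: ((1, 3), 2), so customer 2 joins cluster (1, 3).

Stage 3
Pair       ((1, 2, 3), 4)   ((1, 2, 3), 5)   (4, 5)
Distance   32402.22         122693.89        29081.00
Smallest distance: (4, 5), so customers 4 and 5 are combined.

Stage 4
Pair       ((1, 2, 3), (4, 5))
Distance   70277.81
The two remaining clusters are combined.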
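The stage-by-stage merging can be sketched as a small agglomerative loop in Python: at each stage, merge the two clusters whose centroids are closest by squared Euclidean distance, and record that distance. This is an illustrative implementation on the five-customer data, not the procedure used inside any particular statistics package; it does not handle ties specially.

```python
from itertools import combinations

# Five customers from the sample data: (age, annual income in Rs. lacs, area of house in sq. ft)
customers = {
    1: (42, 4.5, 600),
    2: (35, 3.0, 550),
    3: (45, 6.5, 650),
    4: (40, 6.0, 780),
    5: (50, 15.0, 950),
}


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(members):
    points = [customers[m] for m in members]
    return tuple(sum(values) / len(points) for values in zip(*points))


def centroid_clustering(ids):
    """Repeatedly merge the two clusters with the closest centroids; return the merge history."""
    clusters = [(i,) for i in sorted(ids)]
    history = []
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: squared_euclidean(centroid(pair[0]), centroid(pair[1])),
        )
        distance = squared_euclidean(centroid(a), centroid(b))
        clusters = [c for c in clusters if c not in (a, b)] + [tuple(sorted(a + b))]
        history.append((a, b, distance))
    return history


history = centroid_clustering(customers)
for stage, (a, b, d) in enumerate(history, start=1):
    print(stage, a, b, round(d, 3))
```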


Perform Hierarchical cluster analysis


Perform Hierarchical cluster analysis

Obtain Agglomeration schedule and Dendrogram from software


Centroid method: agglomeration schedule

        Cluster Combined                      Stage Cluster First Appears
Stage   Cluster 1   Cluster 2   Coefficients  Cluster 1   Cluster 2       Next Stage
1       1           3           2513.000      0           0               2
2       1           2           5703.500      1           0               4
3       4           5           29081.000     0           0               4
4       1           4           70277.806     2           3               0

Each coefficient is the squared Euclidean distance between the centroids of the two clusters combined at that stage. A merged cluster is labelled by its lowest-numbered customer, so at stage 4 cluster 1 = {1, 2, 3} and cluster 4 = {4, 5}.

A jump in the coefficients is seen between stage 3 and stage 4, so 2 clusters are possible: {1, 2, 3} and {4, 5}.
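One simple way to automate the "look for a jump" rule is to find the largest increase between consecutive coefficients and stop merging just before it. Treating the largest absolute jump as the stopping point is an assumption made for this sketch; other stopping rules exist. After stage s of an agglomerative schedule on n cases, n - s clusters remain.

```python
# Coefficients from the agglomeration schedule above (stages 1 to 4)
coefficients = [2513.000, 5703.500, 29081.000, 70277.806]
n_customers = 5


def clusters_from_largest_jump(coefficients, n_cases):
    """Stop merging just before the largest increase in the coefficient."""
    jumps = [later - earlier for earlier, later in zip(coefficients, coefficients[1:])]
    stage_before_jump = jumps.index(max(jumps)) + 1  # stages are numbered from 1
    return n_cases - stage_before_jump


print(clusters_from_largest_jump(coefficients, n_customers))  # 2
```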

Dendrogram

[Figure: dendrogram of the centroid-method agglomeration schedule for customers 1 to 5]

Product: Men's Ready-made Formal Shirt


Short listing of shirt features

1. Fabric
2. Brand image
3. Style
4. Colour
5. Fitting
6. Price

25 respondents were contacted for a formal discussion.


Questionnaire

Que.:
(a) Imagine you want to buy a formal shirt for yourself. Here are some features of a shirt. Please rank them according to the importance that you would attach to them. The most important feature to you would get rank 1, the second most important will get rank 2, and so on. Then allocate a total of 50 points to them. The points would be allocated ( ) such that rank 1 will get the highest points, rank 2 will get the second highest, and so on. Please ensure that the sum of points is 50.

Feature        Rank   Points
Fabric
Brand image
Style
Colour
Fitting
Price
TOTAL                 50
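Before clustering, each completed questionnaire can be checked against the rules it states: ranks 1 to 6 used once each, points summing to 50, and higher-ranked features getting at least as many points. The sketch below is a hypothetical helper, and allowing tied points is an assumption about how the rule is applied; the respondent shown is invented only to exercise the checks.

```python
FEATURES = ["Fabric", "Brand image", "Style", "Colour", "Fitting", "Price"]
TOTAL_POINTS = 50


def check_response(ranks, points):
    """List the problems in one respondent's answers; an empty list means they are consistent."""
    problems = []
    if sorted(ranks[f] for f in FEATURES) != list(range(1, len(FEATURES) + 1)):
        problems.append("each rank from 1 to 6 must be used exactly once")
    if sum(points[f] for f in FEATURES) != TOTAL_POINTS:
        problems.append("points must add up to 50")
    # Walk the features from rank 1 downwards; a lower rank must not get more points.
    # Ties are allowed here (an assumption).
    by_rank = sorted(FEATURES, key=lambda f: ranks[f])
    if any(points[better] < points[worse] for better, worse in zip(by_rank, by_rank[1:])):
        problems.append("points must not increase from rank 1 to rank 6")
    return problems


# Hypothetical respondent
ranks = {"Price": 1, "Fitting": 2, "Fabric": 3, "Style": 4, "Colour": 5, "Brand image": 6}
points = {"Price": 14, "Fitting": 10, "Fabric": 9, "Style": 7, "Colour": 6, "Brand image": 4}
print(check_response(ranks, points))  # []
```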

Data


Hierarchical Cluster Analysis: agglomeration schedule (between-group linkage / squared Euclidean distance)

Number of clusters = 2

Perform K-means cluster analysis


K-Means Cluster Analysis
Iteration History (a)

            Change in Cluster Centers
Iteration   Cluster 1     Cluster 2
1           [illegible]   7.347
2           .329          .258
3           [illegible]   .083
4           .078          .058
5           .081          .061
6           .000          .000

a. Convergence achieved due to no or small change in cluster centers. The maximum absolute coordinate change for any center is .000. The current iteration is 6. The minimum distance between initial centers is 15.634.
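The survey responses behind this output are not reproduced here, so the sketch below runs a plain Lloyd-style K-means with k = 2 on the five-customer demographic data instead. The starting centres (customers 2 and 5) are a hypothetical choice, and the initial-centre selection and convergence details used by the software may differ; the sketch only shows the assign-then-update loop and the "no change in cluster centers" stopping rule from the footnote.

```python
# Five customers from the sample data: (age, annual income in Rs. lacs, area of house in sq. ft)
customers = {
    1: (42, 4.5, 600),
    2: (35, 3.0, 550),
    3: (45, 6.5, 650),
    4: (40, 6.0, 780),
    5: (50, 15.0, 950),
}


def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Mean of each variable over a list of cases."""
    return tuple(sum(values) / len(points) for values in zip(*points))


def k_means(data, initial_centers, max_iter=100):
    """Assign every case to its nearest centre, move each centre to the mean
    of its cases, and stop once no centre moves."""
    centers = [tuple(c) for c in initial_centers]
    for iteration in range(1, max_iter + 1):
        clusters = [[] for _ in centers]
        for case_id, point in data.items():
            nearest = min(range(len(centers)),
                          key=lambda k: squared_euclidean(point, centers[k]))
            clusters[nearest].append(case_id)
        # An empty cluster keeps its previous centre
        new_centers = [
            mean_point([data[m] for m in members]) if members else centers[k]
            for k, members in enumerate(clusters)
        ]
        if new_centers == centers:  # convergence: no change in cluster centres
            return clusters, centers, iteration
        centers = new_centers
    return clusters, centers, max_iter


# Hypothetical starting centres: customers 2 and 5
clusters, centers, iterations = k_means(customers, [customers[2], customers[5]])
print(clusters)    # [[1, 2, 3], [4, 5]]
print(iterations)  # 2
```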

ANOVA

Variable   Cluster Mean Square   df   Error Mean Square   df   F         Sig.
FABRIC     2.976                 1    2.492               98   1.194     .277
BRANDIMA   119.782               1    2.908               98   41.192    .000
STYLE      4.775                 1    1.971               98   2.422     .123
COLOUR     18.688                1    2.972               98   6.289     .014
FITTING    11.184                1    1.673               98   6.685     .011
PRICE      363.735               1    2.621               98   138.781   .000

The F tests should be used only for descriptive purposes because the clusters have been chosen to maximize the differences among cases in different clusters. The observed significance levels are not corrected for this and thus cannot be interpreted as tests of the hypothesis that the cluster means are equal.

Variation between the clusters is largest for PRICE and BRANDIMA (brand image), which have the highest F values.

Final Cluster Centers

Variable   Cluster 1   Cluster 2
FABRIC     9           9
BRANDIMA   6           8
STYLE      8           9
COLOUR     9           9
FITTING    9           10
PRICE      9           5

Clusters differ mainly on: Price, Brand image
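Each F value is the cluster mean square divided by the error mean square (here with 1 and 98 degrees of freedom: 2 clusters minus 1, and 100 cases minus 2 clusters). This sketch recomputes the ratios from the printed mean squares and ranks the variables; because the printed mean squares are rounded, the recomputed F values can differ from the table in the third decimal.

```python
# Mean squares from the ANOVA table: variable -> (cluster mean square, error mean square)
mean_squares = {
    "FABRIC": (2.976, 2.492),
    "BRANDIMA": (119.782, 2.908),
    "STYLE": (4.775, 1.971),
    "COLOUR": (18.688, 2.972),
    "FITTING": (11.184, 1.673),
    "PRICE": (363.735, 2.621),
}

# F = cluster mean square / error mean square
f_ratio = {var: cluster_ms / error_ms for var, (cluster_ms, error_ms) in mean_squares.items()}

ranked = sorted(f_ratio, key=f_ratio.get, reverse=True)
for var in ranked:
    print(f"{var:<9} F = {f_ratio[var]:.3f}")
print(ranked[:2])  # ['PRICE', 'BRANDIMA']
```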


Cluster Sizes
Number of Cases in each Cluster

Cluster 1   43.000
Cluster 2   57.000
Valid       100.000
Missing     .000

The sizes of both clusters are fairly similar.
Both segments are important to the marketer.

From the final cluster centers, cluster 1 gives more points to price (9 vs 5) and cluster 2 gives more points to brand image (8 vs 6):
Cluster 1: Price sensitive
Cluster 2: Brand image sensitive

The company can consider marketing shirts under two brands:
One brand should be associated with best value for money.
The second brand should be associated with status symbol.

