On running hierarchical clustering, the agglomeration schedule shows that the largest jump occurs after stage 18 (refer to the tab marked ‘Clustering’). Since the number of clusters plus the stage number equals the number of respondents, this would give 20 − 18 = 2 clusters. However, at this solution the distance of each respondent from its cluster centroid is very large.
Therefore, the next highest jump, which occurs after stage 14, is considered, giving 20 − 14 = 6 clusters. In this case, the Wilks’ lambda for one of the discriminant functions is 0.507 and its eigenvalue is 0.973. Even though these values fall slightly short of the preferred thresholds (eigenvalue > 1 and Wilks’ lambda < 0.5), the distances between respondents and the cluster centroids are much smaller for this solution. One of the resulting clusters contains only a single respondent (respondent number 20), because that respondent is an outlier relative to all the other clusters.
We also checked the next highest jump, occurring after stage 17, which gives 3 clusters. In this case too, the distances between the respondents and the cluster centroids were very large, so we discarded this solution.
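The largest-jump rule itself is mechanical and can be reproduced outside the workbook. As a minimal illustrative sketch, assuming the raw responses of the 20 respondents are available as a matrix X (a random placeholder is used here), the agglomeration schedule and the jumps can be computed in Python as follows:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Placeholder for the 20 respondents' actual survey answers
# (one row per respondent, one column per variable).
rng = np.random.default_rng(0)
X = rng.random((20, 10))

# Ward linkage produces an agglomeration schedule much like the one in
# the workbook: row i is stage i + 1, and column 2 is the fusion distance.
Z = linkage(X, method="ward")

# Largest-jump rule: a big increase in fusion distance after stage s
# suggests cutting there, leaving (no. of respondents - s) clusters.
jumps = np.diff(Z[:, 2])
n = X.shape[0]
for i in np.argsort(jumps)[::-1][:3]:  # three biggest jumps
    print(f"jump after stage {i + 1}: {jumps[i]:.3f} "
          f"-> {n - (i + 1)} clusters")
```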
Since hierarchical clustering does not assign final cluster membership precisely, we ran K-means clustering (with k = 6) to determine it; a sketch of this step follows the table. The output details are in the tab marked ‘Clustering’, in the table ‘Cluster membership’. The resulting membership is as follows:
Cluster  Respondents
C1       2, 8, 11
C2       3, 6, 16
C3       1, 4, 10, 12, 14, 17, 19
C4       20
C5       5, 9, 18
C6       7, 13, 15
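As a rough sketch of this step, again assuming the response matrix X from the previous snippet, a six-cluster K-means run could look like this (scikit-learn numbers its labels arbitrarily, so the printed C1–C6 names need not match the table above):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.random((20, 10))  # placeholder for the real responses

# k = 6 follows from the hierarchical solution chosen earlier;
# random_state is an arbitrary choice for reproducibility.
km = KMeans(n_clusters=6, n_init=10, random_state=42).fit(X)

# Print the membership, numbering respondents from 1.
for respondent, label in enumerate(km.labels_, start=1):
    print(f"Respondent {respondent}: C{label + 1}")
```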
We obtain the information about segment distinction from the ANOVA summary table. We consider the variables whose significance value is below 0.01, corresponding to a 99% confidence level. From the ANOVA table, the variables that distinguish the segments are as follows (a screening sketch appears after the list):
SalaryLuxury
HouseNecessity
CarNecessity
SaveRDay
SaveCFuture
BorrowFF
LoanOblig
EnjoyPresent
BorrowBad
FinNews
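This screening amounts to a one-way ANOVA of each variable across the six clusters, keeping variables with p < 0.01. A minimal sketch, assuming the responses and the saved cluster labels sit in a pandas DataFrame (placeholder data here; the column names follow the list above):

```python
import numpy as np
import pandas as pd
from scipy.stats import f_oneway

# Placeholder data: one row per respondent, one column per survey item,
# plus a 'cluster' column holding the saved K-means label.
rng = np.random.default_rng(0)
cols = ["SalaryLuxury", "HouseNecessity", "CarNecessity", "SaveRDay",
        "SaveCFuture", "BorrowFF", "LoanOblig", "EnjoyPresent",
        "BorrowBad", "FinNews"]
df = pd.DataFrame(rng.integers(1, 6, size=(20, len(cols))), columns=cols)
df["cluster"] = rng.integers(1, 7, size=20)

def distinguishing_variables(data, alpha=0.01):
    """Variables whose one-way ANOVA across clusters gives p < alpha."""
    keep = []
    for col in data.columns.drop("cluster"):
        groups = [g[col].to_numpy() for _, g in data.groupby("cluster")]
        _, p = f_oneway(*groups)
        if p < alpha:  # significance below 0.01, i.e. 99% confidence
            keep.append(col)
    return keep

print(distinguishing_variables(df))
```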
Q.4. Profiling
The profiling was done based on the respondents’ answers to the distinguishing variables. The details of the profiles and the labels are given in the tab marked ‘Segmentation’. For profiling, we considered the averages of the responses within each cluster; the results are similar to those obtained from the cluster centres.
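Continuing the previous sketch, the within-cluster averages used for profiling can be read straight off the same DataFrame:

```python
# df and cols as in the previous sketch; averaging the distinguishing
# variables within each cluster gives one profile row per segment,
# and these means should sit close to the final K-means cluster centres.
profile = df.groupby("cluster")[cols].mean().round(2)
print(profile)
```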
Q.5. Labelling:
a) Cluster 1 - Planner
b) Cluster 2 - Individualist
c) Cluster 3 - Orthodox
d) Cluster 4 - Strategist
e) Cluster 5 - Ambitious
f) Cluster 6 - Pessimist
To create a predictive model, we first ran K-means clustering and saved the cluster number as a variable.
To obtain the coefficients for the predictive model, we ran a multi-group discriminant analysis with Fisher’s classification coefficients.
We then built the predictive model from these coefficients and used the actual cluster numbers to verify the results.
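As an illustrative sketch of this verification step, continuing from the earlier snippets (the matrix X and the fitted km object), a multi-group discriminant model can be fitted and scored against the saved cluster numbers; scikit-learn’s linear discriminant analysis stands in here for SPSS’s Fisher classification functions:

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score

# X and km as in the earlier sketches. The discriminant model is fitted
# on the saved K-means labels, then its predictions are compared with
# those same actual cluster numbers to verify the model.
lda = LinearDiscriminantAnalysis().fit(X, km.labels_)
predicted = lda.predict(X)
print("agreement with actual clusters:",
      accuracy_score(km.labels_, predicted))
```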
The model, along with the classification coefficients, is available in the ‘Discriminant’ tab of the attached Excel output.