
Case Questions - HDFC CASA

Q.1. Number of Clusters/Segments from the data:

On running hierarchical clustering, we find from the agglomeration schedule that the largest jump in the
coefficients occurs after stage 18 (refer to the tab marked ‘Clustering’). Since the number of clusters plus
the stage number equals the number of respondents (20), this would give 2 clusters. However, in this case
the distance of each respondent from its cluster centroid is very large.

Therefore, we considered the next largest jump, which occurs after stage 14. This gives 6 clusters
(20 - 14 = 6). In this case, the Wilks’ Lambda for one of the discriminant functions is 0.507, while its
eigenvalue is 0.973. Even though these values are slightly outside the preferred thresholds (eigenvalue > 1
and Wilks’ Lambda < 0.5), the distances between respondents and their cluster centroids are much smaller
in this case. Also, one of the resulting clusters contains only one respondent (respondent number 20),
because this respondent is an outlier relative to all the other clusters.
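In discriminant analysis, Wilks’ Lambda for a set of functions is the product of 1/(1 + eigenvalue) over the functions in that set. Assuming the 0.507 above is the Lambda for that function tested on its own (the last row of a "functions k through m" Wilks’ Lambda table), it follows directly from its eigenvalue: 1/(1 + 0.973) ≈ 0.507. It also shows that, for a single function, the two preferred thresholds are one and the same condition, since an eigenvalue of exactly 1 gives a Lambda of exactly 0.5. A minimal sketch:

```python
import math


def wilks_lambda(eigenvalues):
    """Wilks' Lambda for jointly testing a set of discriminant functions:
    the product of 1 / (1 + eigenvalue) over the functions in the set."""
    return math.prod(1.0 / (1.0 + ev) for ev in eigenvalues)


# A single function with the eigenvalue reported for the 6-cluster solution.
print(round(wilks_lambda([0.973]), 3))  # 0.507

# For one function, eigenvalue = 1 gives Lambda = 0.5 exactly,
# so "eigenvalue > 1" and "Lambda < 0.5" are the same rule.
print(wilks_lambda([1.0]))  # 0.5
```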

We also checked the next largest jump, which occurs after stage 17 and gives 3 clusters. In this case too,
the distances between respondents and their cluster centroids were very large, so we discarded this
solution.
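The rule used above (number of clusters = number of respondents minus the stage number) and the search for the largest jumps in the agglomeration schedule can be sketched as follows. This is a minimal illustration under the assumption that the schedule's coefficients are listed stage by stage; the coefficient values in the example are hypothetical, and the real ones are in the ‘Clustering’ tab.

```python
def rank_jumps(coefficients):
    """Rank the jumps in an agglomeration schedule, largest first.

    coefficients[i] is the coefficient at stage i + 1. A schedule for
    n respondents has n - 1 stages, so n = len(coefficients) + 1.
    Returns (stage, jump, clusters) tuples, where `jump` is the increase
    from this stage to the next one and clusters = n - stage is the number
    of clusters left if merging stops after this stage.
    """
    n_respondents = len(coefficients) + 1
    jumps = []
    for stage in range(1, len(coefficients)):
        jump = coefficients[stage] - coefficients[stage - 1]
        jumps.append((stage, jump, n_respondents - stage))
    return sorted(jumps, key=lambda t: t[1], reverse=True)


# Hypothetical 4-stage schedule for 5 respondents.
for stage, jump, clusters in rank_jumps([1.0, 2.0, 2.5, 10.0]):
    print(f"after stage {stage}: jump {jump:.1f} -> {clusters} clusters")
```

With 20 respondents, stopping after stages 18, 17 and 14 leaves 2, 3 and 6 clusters respectively, which are the three candidate solutions compared here.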

Note: The prediction accuracy (percentage of respondents correctly classified) is 100% in all three cases
listed above.

Result: 6 clusters have been chosen.
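The choice between the 2-, 3- and 6-cluster solutions rested on how far respondents lie from their cluster centroids. One way to summarize this is sketched below, assuming Euclidean distance and using the mean distance over all respondents as a single figure per solution; both choices are assumptions, since the comparison above may have looked at the per-respondent distances directly. The scores in the example are hypothetical.

```python
import math
from collections import defaultdict


def mean_distance_to_centroid(points, labels):
    """Average Euclidean distance from each respondent to the centroid of
    the cluster it belongs to (smaller means tighter clusters)."""
    members = defaultdict(list)
    for point, label in zip(points, labels):
        members[label].append(point)
    total = 0.0
    for group in members.values():
        # Centroid = mean of each variable over the cluster's members.
        centroid = [sum(col) / len(group) for col in zip(*group)]
        total += sum(math.dist(p, centroid) for p in group)
    return total / len(points)


# Hypothetical 2-variable scores for four respondents.
points = [(0, 0), (2, 0), (10, 0), (12, 0)]
print(mean_distance_to_centroid(points, [1, 1, 1, 1]))  # one cluster: 5.0
print(mean_distance_to_centroid(points, [1, 1, 2, 2]))  # two clusters: 1.0
```

This figure tends to shrink as clusters are added (a single-respondent cluster, like the one holding respondent 20, contributes zero), so it is best read alongside the jumps in the agglomeration schedule rather than on its own.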

Q.2. Cluster membership:

Since hierarchical clustering does not report cluster membership accurately, we ran K-means clustering
with 6 clusters to determine it. The output is in the tab marked ‘Clustering’, in the table ‘Cluster
membership’. The resulting membership is as follows:

Cluster   Respondents
C1        2, 8, 11
C2        3, 6, 16
C3        1, 4, 10, 12, 14, 17, 19
C4        20
C5        5, 9, 18
C6        7, 13, 15
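K-means can be sketched with Lloyd's algorithm: assign each respondent to the nearest centre, move each centre to the mean of its members, and repeat until memberships stop changing. This is a minimal standard-library sketch, not the software's own implementation; the starting centres are passed in explicitly (an assumption, since the seeding used for the reported run is not stated), clusters are numbered from 0 rather than C1 to C6, and the data are hypothetical.

```python
import math


def kmeans(points, initial_centres, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centre-update steps."""
    centres = [list(c) for c in initial_centres]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre (Euclidean).
        new_labels = [min(range(len(centres)), key=lambda k: math.dist(p, centres[k]))
                      for p in points]
        if new_labels == labels:
            break  # memberships are stable, so the centres will not move either
        labels = new_labels
        # Update step: each centre becomes the mean of its members.
        for k in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centre if a cluster empties out
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


points = [(1, 1), (1, 2), (8, 8), (9, 8)]  # hypothetical 2-variable scores
labels, centres = kmeans(points, initial_centres=[(0, 0), (10, 10)])
print(labels)   # [0, 0, 1, 1]
print(centres)  # [[1.0, 1.5], [8.5, 8.0]]
```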

Q.3. Significant variable identification at the 99% confidence level

We obtain the information about which variables distinguish the segments from the ANOVA summary table.

Since the confidence level is 99%, we consider only the variables whose significance value (p-value) is
below 0.01.

From the ANOVA table, the variables that distinguish the segments are as follows:
SalaryLuxury
HouseNecessity
CarNecessity
SaveRDay
SaveCFuture
BorrowFF
LoanOblig
EnjoyPresent
BorrowBad
FinNews

Source: ‘Clustering’ worksheet
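The screening above amounts to a one-way ANOVA of each variable across the clusters, keeping a variable when its p-value is below 0.01. A minimal standard-library sketch follows; the F-distribution tail is computed through the regularized incomplete beta function (the continued-fraction method), and `scipy.stats.f_oneway(*groups)` returns the same F statistic and p-value if SciPy is available. The variable values below are hypothetical; only the variable names come from the list above.

```python
import math


def _betacf(a, b, x, max_iter=200, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """P(F > f) for F ~ F(d1, d2), i.e. the ANOVA significance value."""
    if f <= 0.0:
        return 1.0
    return _betai(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


def one_way_anova(groups):
    """One-way ANOVA across clusters; returns (F, p). Assumes some within-cluster spread."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, f_sf(f, k - 1, n - k)


def significant_variables(data, labels, alpha=0.01):
    """Names of variables whose ANOVA p-value across clusters is below alpha."""
    clusters = sorted(set(labels))
    keep = []
    for name, column in data.items():
        groups = [[x for x, lab in zip(column, labels) if lab == c] for c in clusters]
        if one_way_anova(groups)[1] < alpha:
            keep.append(name)
    return keep


# Hypothetical scores for six respondents in two clusters.
data = {"SaveRDay": [1, 2, 3, 11, 12, 13], "FinNews": [1, 5, 3, 2, 4, 3]}
print(significant_variables(data, [1, 1, 1, 2, 2, 2]))  # ['SaveRDay']
```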

Q.4. Profiling

The profiling was based on the respondents’ answers to the distinguishing variables. Details of the
profiles and labels are in the tab marked ‘Segmentation’.

For profiling, we used the average responses within each cluster; the results are similar to those
obtained from the cluster centres.
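The within-cluster averages used for profiling can be computed as below. This is a minimal sketch with hypothetical responses; the variable names are borrowed from the Q.3 list purely for illustration.

```python
from collections import defaultdict


def cluster_profiles(rows, labels):
    """Mean response on each variable within each cluster."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        counts[label] += 1
        for variable, value in row.items():
            sums[label][variable] += value
    return {label: {v: total / counts[label] for v, total in sums[label].items()}
            for label in sorted(counts)}


# Hypothetical responses from three respondents in two clusters.
rows = [{"SaveRDay": 5, "FinNews": 2},
        {"SaveRDay": 3, "FinNews": 4},
        {"SaveRDay": 1, "FinNews": 1}]
print(cluster_profiles(rows, [1, 1, 2]))
```

In a converged K-means run each final centre is the mean of its members, so close agreement between these averages and the cluster centres is expected.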

Q.5. Labelling:

Based on the profiling, we arrived at the following labels:

a) Cluster 1 - Planner
b) Cluster 2 - Individualist
c) Cluster 3 - Orthodox
d) Cluster 4 - Strategist
e) Cluster 5 - Ambitious
f) Cluster 6 - Pessimist

Q.6. Predictive model based on discriminant analysis

To create a predictive model, we first ran K-means clustering and saved each respondent’s cluster number
as a variable.

To get the coefficients for the predictive model, we ran a multi-group discriminant analysis with Fisher’s
classification function coefficients.

We built a predictive model from these coefficients and compared its predictions with the actual cluster
numbers to verify the results.

The model, along with the classification coefficients, is in the ‘Discriminant’ tab of the attached Excel
output.
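Fisher’s classification functions give each cluster a linear score (a constant plus a weighted sum of the respondent’s variable values), and a respondent is assigned to the cluster with the highest score. Comparing these assignments with the saved K-means cluster numbers gives the percentage of respondents correctly classified. A minimal sketch; the coefficients and cases below are hypothetical placeholders for the ones in the ‘Discriminant’ tab.

```python
def classify(x, coefficients, constants):
    """Assign a respondent to the cluster with the highest Fisher
    classification score: constant + sum(coefficient * value)."""
    scores = {k: constants[k] + sum(c * v for c, v in zip(coefficients[k], x))
              for k in coefficients}
    return max(scores, key=scores.get)


def hit_ratio(cases, actual, coefficients, constants):
    """Share of respondents whose predicted cluster matches the actual one."""
    predicted = [classify(x, coefficients, constants) for x in cases]
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical coefficients for two clusters and two variables.
coefficients = {1: [2.0, 0.5], 2: [0.5, 2.0]}
constants = {1: -1.0, 2: -1.0}
cases = [[3, 1], [1, 3], [4, 0]]
actual = [1, 2, 2]
print(hit_ratio(cases, actual, coefficients, constants))  # 2 of 3 cases correct
```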
