1. Introduction
This document has been prepared as a guide to creating a customer segmentation of overall shopping habits in terms of amount spent and frequency of shopping. A Shopping Habits Segmentation identifies consistent patterns of purchasing behaviour by classifying shoppers according to how often they shop with a retailer and how much they spend. This shows where opportunities exist to increase both the frequency and the value of specific shoppers' purchases, and therefore which shoppers offer the greatest potential return on investment.
The Shopping Habits Segmentation can also be used as a benchmark against which to measure the success of any marketing activity. Because the segmentation is built on each shopper's consistent behaviour over time, it already accounts for the regular ups and downs of a normal shopper's purchasing cycle. We can therefore test whether this pattern changes significantly during a promotional period, and so calculate the true impact on value.
2. Stable Periods
3. Spend Bands
4. Frequency Bands
The data preparation consists of choosing the relevant items and the available time period, then summarising sales to a weekly level per shopper. This weekly sales data is used to identify, with statistical techniques, the most stable period of time over which to evaluate shopper behaviour. Each shopper's number of transactions and average transaction value are calculated for the two most recent stable periods. These form the basis of the Spend Bands and Frequency Bands, which are consistent, regularly occurring levels of spend and frequency across time periods. The Shopping Habits Segments are based on the combination of these two sets of bands.
Before creating the stable periods, the data needs to be cleaned of weeks with extremely high or low sales, e.g. Christmas or Mother's Day. This is done by plotting the weekly sales and identifying outliers by eye.
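The visual check can be supported by a simple automatic screen. The sketch below is an assumption, not part of the method described here: it flags weeks whose total sales have a large modified z-score based on the median absolute deviation (MAD), using the commonly quoted 0.6745 scaling constant and a hypothetical cut-off of 3.5. Flagged weeks are only candidates to confirm on the plot, and `flag_outlier_weeks` is an illustrative name.

```python
import statistics


def flag_outlier_weeks(weekly_totals, threshold=3.5):
    """Return the indices of weeks whose total sales look extreme.

    Uses a modified z-score, 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation. The threshold of 3.5 is an assumed default.
    """
    median = statistics.median(weekly_totals)
    mad = statistics.median([abs(v - median) for v in weekly_totals])
    if mad == 0:
        return []  # no spread at all, so no week can be singled out
    flagged = []
    for week, total in enumerate(weekly_totals):
        score = 0.6745 * (total - median) / mad
        if abs(score) > threshold:
            flagged.append(week)
    return flagged


# Hypothetical weekly sales: week 5 is a Christmas-style peak, week 7 a trough.
print(flag_outlier_weeks([100, 102, 98, 101, 99, 250, 100, 40]))  # [5, 7]
```

A median-based score is used rather than a mean and standard deviation because a single Christmas peak would otherwise inflate the spread it is being compared against.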
To balance stability against responsiveness, we begin with a short, unstable period of 1 week either side of the mid-point of the data and calculate measures of stability. We then increase the period week by week and monitor the changes in the measures of stability. The optimal period, known as the stable period, is the point at which increasing the length of the period gives relatively little further improvement in stability.
Figure 2.1 Increasing the pre and post periods in order to find the stable period
A useful way of measuring stability is to fit a linear regression model to see how accurately a customer's spend in the pre period predicts their spend in the post period. The key measures to monitor for improvement in stability are the R-Square and the Beta of the model.
Below is an example from the grocery segmentation. It can be seen that there is very little improvement in stability after 6 weeks.
[Chart: stability measures by length of period in weeks, grocery example]
Recent transactions covering 6 months for a grocer, or 12 months for a high street retailer, should be sufficient to determine the stable period. The 3 month point within the data (the mid-point of a 6 month grocery extract) can then be taken as the mid-week, as shown in Figure 2.1. The starting point for the stability analysis is a summary of each customer's spend on a weekly basis. The weekly spends can then be aggregated step by step, as illustrated.
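The week-by-week search can be sketched in plain Python. This is a minimal illustration under stated assumptions: weekly spend per customer is held in memory as equal-length lists, post-period spend is regressed on pre-period spend across customers, and the stopping rule (stop once R-Square improves by less than a hypothetical `min_gain` of 0.01) simplifies the judgement described in the text, which also looks at the Beta. All function names are illustrative.

```python
def r_squared_and_beta(pre, post):
    """Fit post = alpha + beta * pre by ordinary least squares.

    Returns (R-squared, beta)."""
    n = len(pre)
    mean_pre = sum(pre) / n
    mean_post = sum(post) / n
    sxx = sum((x - mean_pre) ** 2 for x in pre)
    syy = sum((y - mean_post) ** 2 for y in post)
    sxy = sum((x - mean_pre) * (y - mean_post) for x, y in zip(pre, post))
    if sxx == 0 or syy == 0:
        raise ValueError("spend does not vary across customers")
    return sxy * sxy / (sxx * syy), sxy / sxx


def stability_by_length(weekly_spend, mid, max_weeks):
    """weekly_spend maps customer id -> list of weekly spends (same weeks for all).

    For k = 1..max_weeks the pre period is the k weeks just before index `mid`
    and the post period is the k weeks starting at `mid`.
    Returns a list of (k, R-squared, beta)."""
    results = []
    for k in range(1, max_weeks + 1):
        if mid - k < 0 or any(mid + k > len(w) for w in weekly_spend.values()):
            break  # not enough data either side of the mid-point
        pre = [sum(w[mid - k:mid]) for w in weekly_spend.values()]
        post = [sum(w[mid:mid + k]) for w in weekly_spend.values()]
        results.append((k, *r_squared_and_beta(pre, post)))
    return results


def choose_stable_period(results, min_gain=0.01):
    """Return the first length after which R-squared improves by less than min_gain."""
    for (k, r2, _), (_, next_r2, _) in zip(results, results[1:]):
        if next_r2 - r2 < min_gain:
            return k
    return results[-1][0]
```

In use, `stability_by_length` would be run from the mid-week with `max_weeks` set by the data available either side, and the returned Beta values inspected alongside the R-Square before fixing the stable period.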
For example, if we had determined a stable period of 6 weeks and had data running up to 30 April 2006, we would need to extract 12 weeks of data. Transactions in the 6 weeks from 20 March 2006 to 30 April 2006 form the post period. For each customer, we calculate the total spend and the number of shopping trips in this post period. From these we can calculate each customer's average transaction value in the post period, which is the post-period spend divided by the number of post-period transactions.
We also need to look at the preceding 6 weeks, i.e. the pre period. In this example, these are the transactions from 6 February 2006 to 19 March 2006. For each customer, we calculate the total spend and the number of shopping trips in this pre period, and from these each customer's Average Transaction Value in the pre period.
The summary will therefore contain the following variables:
- Customer identification
- Pre number of transactions
- Post number of transactions
- Pre average transaction value
- Post average transaction value
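The date arithmetic and the per-customer summary can be sketched as follows, assuming each transaction record is a (customer, date, spend) tuple representing one shopping trip. `period_windows` and `summarise_periods` are hypothetical names; with a 6 week stable period and data ending on 30 April 2006 they reproduce the pre period (6 February to 19 March 2006) and post period (20 March to 30 April 2006) given above.

```python
from collections import defaultdict
from datetime import date, timedelta


def period_windows(last_day, weeks):
    """Return ((pre_start, pre_end), (post_start, post_end)) for back-to-back
    periods of `weeks` weeks, with the post period ending on last_day."""
    post_start = last_day - timedelta(weeks=weeks) + timedelta(days=1)
    pre_end = post_start - timedelta(days=1)
    pre_start = pre_end - timedelta(weeks=weeks) + timedelta(days=1)
    return (pre_start, pre_end), (post_start, last_day)


def summarise_periods(transactions, pre, post):
    """transactions: iterable of (customer_id, transaction_date, spend),
    one row per shopping trip. pre and post are (start, end) date pairs.

    Returns customer_id -> (pre_txns, post_txns, pre_atv, post_atv); an
    ATV is None when the customer did not shop in that period."""
    counts = defaultdict(lambda: {"pre": [0, 0.0], "post": [0, 0.0]})
    for customer, day, spend in transactions:
        for name, (start, end) in (("pre", pre), ("post", post)):
            if start <= day <= end:
                counts[customer][name][0] += 1      # number of transactions
                counts[customer][name][1] += spend  # total spend
    summary = {}
    for customer, c in counts.items():
        (pre_n, pre_spend), (post_n, post_spend) = c["pre"], c["post"]
        summary[customer] = (
            pre_n,
            post_n,
            pre_spend / pre_n if pre_n else None,
            post_spend / post_n if post_n else None,
        )
    return summary


pre, post = period_windows(date(2006, 4, 30), 6)
# pre  -> (2006-02-06, 2006-03-19)
# post -> (2006-03-20, 2006-04-30)
```

Transactions outside both windows are simply ignored, so the same extract can be reused when the stable period length changes.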
[Table: customers ranked into Average Transaction Value groups 1, 2, 3, 4, ..., 100]
This can be achieved in many statistical software packages by ranking customers on their average transaction value. Using the data summary from Step 2, each customer can then be placed in one group for the pre period and one group for the post period based on their Average Transaction Value. Some customers will remain in the same group in both periods, whilst others will change group.
Based on the example above, a customer with an Average Transaction Value of 27 in the pre period and 28 in the post period will be in group 3 in both periods. However, a customer with an Average Transaction Value of 14 in the pre period and 24 in the post period will have moved from group 1 to group 2. In summary, some customers will remain in the same group, others will move up groups and others will move down. As explained in more detail in the appendix, these customer movements, or migrations, between groups are used to measure the relationship between the groups in terms of a statistical distance. The groups are then combined into larger groups by clustering on this distance. These larger groups form the stable spend bands.
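A minimal sketch of this step follows. It assumes the pooled pre- and post-period ATVs are ranked once to fix a single set of group boundaries, so both periods are grouped on the same scale; the text does not say how the boundaries are derived, so that choice, the equal-size groups and the function names are all assumptions. The result is the pre-by-post count table from which the appendix measures migrations.

```python
from bisect import bisect_right


def rank_boundaries(values, n_groups):
    """Cut points splitting the pooled ATVs into n_groups of roughly equal size by rank."""
    ordered = sorted(values)
    return [ordered[len(ordered) * g // n_groups] for g in range(1, n_groups)]


def assign_group(atv, boundaries):
    """1-based group number; a value equal to a boundary goes into the higher group."""
    return bisect_right(boundaries, atv) + 1


def migration_table(pre_atvs, post_atvs, boundaries):
    """Count customers by (pre-period group, post-period group)."""
    size = len(boundaries) + 1
    table = [[0] * size for _ in range(size)]
    for pre, post in zip(pre_atvs, post_atvs):
        table[assign_group(pre, boundaries) - 1][assign_group(post, boundaries) - 1] += 1
    return table


# First boundaries of the grocery example (groups 10-15, 15-25, 25-33, ...);
# the remaining groups are left out here for brevity.
example = [15, 25, 33]
print(assign_group(27, example), assign_group(28, example))  # 3 3
print(assign_group(14, example), assign_group(24, example))  # 1 2
```

The two printed lines reproduce the customers described above: one stays in group 3, the other moves from group 1 to group 2.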
[Figure: Migration Clustering of spend groups]
For example, on the grocery value segmentation this stage resulted in 5 stable average transaction value bands.
For example, in grocery, spend groups 10-15, 15-25, 25-33 and so on up to group 100-110 formed a cluster called Small Baskets. Spend groups above this but below 175 formed a cluster called Regular Baskets, and so on. These clusters define the bands.
[Figure: Migration Clustering of transaction groups]
For example, on a grocery value segmentation this stage resulted in 4 stable frequency bands.
Frequency band (increasing frequency)   Transactions in 6 weeks
Occasional Shoppers                     1-3
Regular Shoppers                        4-5
Frequent Shoppers                       6-11
Very Frequent Shoppers                  12 or more
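[Figure: grid of cells formed by the Stable Transaction Bands and the stable spend bands, labelled frequency band_spend band, starting at cell 1_1]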
Some customers will be in the same cell in both the pre and post periods, whilst others will move to other cells. For example, a customer who shopped 2 times with an average transaction value of 120 in the pre period would be placed in cell 1_2 (frequency band 1, spend band 2). If the same customer then shopped 5 times with an average transaction value of 90 in the post period, they would be placed in cell 2_1.
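The cell notation can be sketched as a lookup on the grocery band boundaries quoted earlier, reading the example as frequency band first and spend band second. Only the first two spend band boundaries (110 and 175) are given in the text, so the sketch stops there, and treating exactly 12 transactions as Very Frequent is likewise an assumption. `cell` is a hypothetical helper.

```python
from bisect import bisect_right

# Frequency bands from the grocery example, by transactions in 6 weeks:
# 1 = 1-3, 2 = 4-5, 3 = 6-11, 4 = 12 or more (12 itself is assumed to be band 4).
FREQUENCY_BOUNDARIES = [4, 6, 12]

# Spend bands from the grocery example: band 1 (Small Baskets) below 110,
# band 2 (Regular Baskets) from 110 up to 175. The limits of the higher bands
# are not given, so every ATV of 175 or more falls into band 3 in this sketch.
SPEND_BOUNDARIES = [110, 175]


def cell(n_transactions, atv):
    """Label a customer's cell as '<frequency band>_<spend band>'."""
    frequency_band = bisect_right(FREQUENCY_BOUNDARIES, n_transactions) + 1
    spend_band = bisect_right(SPEND_BOUNDARIES, atv) + 1
    return f"{frequency_band}_{spend_band}"


print(cell(2, 120))  # 1_2  (pre period in the example above)
print(cell(5, 90))   # 2_1  (post period)
```

Counting customers by (pre cell, post cell) then gives the cell-to-cell migrations that are clustered into segments.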
To summarise, some customers will remain in the same cell, others will move to adjacent cells and others will move to more distant cells. These customer movements, or migrations, between cells are used to measure the statistical relationship between the cells, which are then grouped into larger groupings by clustering on this distance. These larger groupings form the segments. This is explained in more detail in the appendix.
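[Figure: final Shopping Habits Segments on a grid of spend levels (Very Low, Low, Modest, Medium, Larger and High Spend) against frequency levels (Infrequent, Intermittent, Occasional, Regular, Frequent and Very Frequent); segment labels include Regular Shoppers and Everyday Shoppers]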
APPENDIX
Migration Clustering Methodology
This section provides an overview of migration clustering, the technique behind much of the segmentation. It is used to create the stable spend bands and stable frequency bands, and then to determine the spend-frequency segments. Migration clustering uses customers' spending patterns to define the optimal bands and segments, so the bands and segments are determined by actual customer behaviour. The relationships between groups are measured in terms of a statistical distance. This is illustrated below with a pictorial representation of individual shoppers' numbers of transactions and average transaction values for a stable period. Clustering looks at the distance between shoppers and groups customers based on these distances, i.e. shoppers with similar frequency and ATV will fall into the same band. For example, the distance between shopper 1 and shopper 2 is not deemed statistically significant, whereas the distance between shopper 1 and shopper 3 is large enough to be classified as a separate shopping behaviour.
[Figure: shoppers plotted by number of transactions against ATV, forming Groups 1 to 4, with shoppers 1 and 2 marked]
Figure A.1 demonstrates groupings of customers based on frequency of transactions and ATV
This gives a profile of the distance from each shopper to every other shopper, which can then be clustered in a statistical package. Statistical outputs from the clustering, together with some human judgement, should be used to decide on the final bands and segments. We use hierarchical clustering with the complete linkage method. In complete linkage, the distance between two groups is the maximum distance between a shopper in one group and a shopper in the other group. Changes in shopping behaviour are measured by comparing each customer's behaviour in the pre period with their behaviour in the post period, using the groups. Tracing the movements of customers between groups from the pre period to the post period allows us to define the statistical distance. Groups between which there are many customer movements are closer together in statistical space, such as groups 1 (10-15) and 2 (15-25) in Figure A.1, i.e. these groups represent similar spending behaviour. Groups between which there are relatively few movements are further apart, such as groups 1 and 3 in Figure A.1; these represent distinct types of shopping behaviour. The movements between groups can be quantified by counting the number of customers who move from each pre-period group to each post-period group. An example follows that calculates the statistical distances between all groups in a 3 group situation. A real Shopping Habits Segmentation will always have more groups than this, but the same calculations apply directly.
Figure A.2 demonstrates customer movements from 3 pre-period groups to 3 post-period groups.
Each customer is placed in a group based on spending behaviour in the pre period. The same customers are then placed in a group based on spending behaviour in the post period. Some customers will remain in the same group. Others will change groups based on a change in spending behaviour between the two periods. For example, some customers who are in group 1 in the pre-period, will remain in group 1 in the post-period, some customers will move to group 2 and some will move to group 3.
The distances between all the bands can be summarised as a table. If there were only 3 bands, this would look like the table shown below.
         Band 1   Band 2   Band 3
Band 1   d11      d12      d13
Band 2   d21      d22      d23
Band 3   d31      d32      d33
Figure A.3
We need the number of customers in each group in the pre period and in the post period, and we also need to measure the movements between groups. The movements shown in Figure A.3 are simple one-way movements. To get a better measure of how statistically close two groups are, it is important to also consider the reverse movements. For example, to look at the relationship between group 2 and group 3, we need to consider not only customers moving from group 2 to group 3 but also those moving from group 3 to group 2. These pairs of migrations can be represented as a cross-tabulation. The pair of frequencies associated with the group 2 - group 3 migration is highlighted.
Each pair of groups therefore has two frequencies associated with it. For customers who do not change groups there is only one frequency, i.e. the diagonal entries of the cross-tabulation. To calculate a total measure of the movements between two groups, we add the two frequencies together, so the appropriate measure of customer migrations between groups i and j is:
$$F_{ij} = n_{ij} + n_{ji} \qquad \text{(Equation 1)}$$
where $n_{ij}$ is the number of customers in group i in the pre period and group j in the post period.
The customer migration measures calculated for the example are shown in the table below.
Groups (i, j)                          1,1     1,2     1,3     2,2     2,3     3,3
Number of customers in pre period      5,000   5,000   5,000   6,000   6,000   7,000
Number of customers in post period     4,000   7,000   7,000   7,000   7,000   7,000
As well as the customer migration measure, the number of customers in each group in the pre and post periods is used in the statistical distance calculation. Groups with large numbers of customers are likely to have higher flows of customer migrations simply because of their size, and the distance needs to take this into account.
$$d_{ij} = \sqrt{\frac{T_i \, T_j}{F_{ij}}} \qquad \text{(Equation 2)}$$
where $T_i$ is the number of customers in group i in the pre period plus the number of customers in group i in the post period ($T_j$ is defined in the same way for group j), and $F_{ij}$ is the customer migration between groups i and j, as calculated in Equation 1.
For example, in the group 2 - group 3 migration, shown in Figure A.5, the customer migration measure is 6,000. In the pre-period, group 2 contains 6,000 customers and in the post-period it contains 7,000 customers. Group 3 contains 7,000 customers in both the pre and post periods. The statistical distance is calculated as:
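$$d_{23} = \sqrt{\frac{T_2 \, T_3}{F_{23}}} = \sqrt{\frac{(6{,}000 + 7{,}000)(7{,}000 + 7{,}000)}{6{,}000}} = \sqrt{\frac{13{,}000 \times 14{,}000}{6{,}000}} \approx 174.2$$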
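Equations 1 and 2 can be applied to a whole cross-tabulation at once. In this minimal sketch the rows of `crosstab` are pre-period groups and the columns post-period groups (an assumed orientation), and the counts are hypothetical, chosen only so that the group sizes match the worked example (pre 5,000 / 6,000 / 7,000, post 4,000 / 7,000 / 7,000) and F_23 = 6,000. Giving a pair with no migrations an infinite distance is also an assumption.

```python
import math


def migration_measure(crosstab, i, j):
    """Equation 1: customers moving from group i to j plus those moving from j to i."""
    return crosstab[i][j] + crosstab[j][i]


def distance_matrix(crosstab):
    """Equation 2: d_ij = sqrt(T_i * T_j / F_ij) for every pair of groups.

    crosstab[i][j] counts customers in group i in the pre period and group j
    in the post period (0-based indices)."""
    n = len(crosstab)
    pre = [sum(crosstab[i]) for i in range(n)]                        # row totals
    post = [sum(crosstab[r][j] for r in range(n)) for j in range(n)]  # column totals
    totals = [pre[g] + post[g] for g in range(n)]                     # T_i
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                f = migration_measure(crosstab, i, j)
                # No migrations at all: treat the groups as infinitely far apart.
                dist[i][j] = math.sqrt(totals[i] * totals[j] / f) if f else math.inf
    return dist


# Hypothetical counts (rows = pre-period groups 1-3, columns = post-period groups 1-3).
crosstab = [
    [2500, 1000, 1500],
    [1000, 2500, 2500],
    [500, 3500, 3000],
]
print(round(distance_matrix(crosstab)[1][2], 1))  # 174.2, i.e. d_23
```

Because F_ij is symmetric by construction, the resulting distance matrix is symmetric too, which is what the clustering step expects.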
These statistical distances should be calculated for each pair of groups. The groups can then be clustered based on these distances as explained at the start of the section.
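The clustering step can be sketched without a statistical package as a small agglomerative routine. This is a minimal illustration rather than the package output the text relies on: it repeatedly merges the closest pair of clusters under complete linkage until a chosen number of clusters remains, and returns the merge history so the merge distances can be inspected with the human judgement the text calls for. Fixing `n_clusters` in advance is an assumption; the same linkage is available in libraries such as SciPy (`scipy.cluster.hierarchy.linkage` with `method="complete"`).

```python
def complete_linkage(dist, n_clusters):
    """Agglomerative (hierarchical) clustering with complete linkage.

    dist is a symmetric matrix of group-to-group distances. Starting from one
    cluster per group, the two closest clusters are merged repeatedly, where the
    distance between clusters is the largest distance between any member of one
    and any member of the other. Stops when n_clusters remain.
    Returns (clusters, merges); each merge is (cluster_a, cluster_b, distance)."""
    clusters = [[g] for g in range(len(dist))]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
        clusters[a] = merged
    return clusters, merges


# Hypothetical distances for three groups (0-based), with d_23 = 174.2 as in
# the worked example; the other two values are illustrative only.
dist = [[0.0, 241.9, 251.0], [241.9, 0.0, 174.2], [251.0, 174.2, 0.0]]
clusters, merges = complete_linkage(dist, 2)
print(clusters)  # [[0], [1, 2]]
```

With these distances, groups 2 and 3 (indices 1 and 2) merge first because they have the most migrations relative to their size.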
The above migration clustering process is repeated for:
- defining the spend bands
- defining the frequency bands
- defining the final frequency-spend bands
N.B. Although two consecutive stable periods (pre and post) are used to define the stable bands and segments by looking at movements, the actual Average Transaction Values and numbers of transactions in each band correspond to a single stable period.