
Marketing Science (INFORMS)
Vol. 29, No. 6, November–December 2010, pp. 1086–1108
ISSN 0732-2399, EISSN 1526-548X, DOI 10.1287/mksc.1100.0580
© 2010 INFORMS

Customer-Base Analysis in a Discrete-Time Noncontractual Setting
Additional information, including rights and permission policies, is available at http://journals.informs.org/.
INFORMS holds copyright to this article and distributed this copy as a courtesy to the author(s).
Peter S. Fader
The Wharton School of the University of Pennsylvania, Philadelphia, Pennsylvania 19104,
faderp@wharton.upenn.edu
Bruce G. S. Hardie
London Business School, London NW1 4SA, United Kingdom, bhardie@london.edu
Jen Shang
School of Public and Environmental Affairs, Indiana University, Bloomington, Indiana 47405,
jenshang@indiana.edu
Many businesses track repeat transactions on a discrete-time basis. These include (1) companies for whom transactions can only occur at fixed regular intervals, (2) firms that frequently associate transactions with specific events (e.g., a charity that records whether supporters respond to a particular appeal), and (3) organizations that choose to utilize discrete reporting periods even though the transactions can occur at any time. Furthermore, many of these businesses operate in a noncontractual setting, so they have a difficult time differentiating between those customers who have ended their relationship with the firm versus those who are in the midst of a long hiatus between transactions. We develop a model to predict future purchasing patterns for a customer base that can be described by these structural characteristics. Our beta-geometric/beta-Bernoulli (BG/BB) model captures both of the underlying behavioral processes (i.e., customers' purchasing while "alive" and time until each customer permanently "dies"). The model is easy to implement in a standard spreadsheet environment and yields relatively simple closed-form expressions for the expected number of future transactions conditional on past observed behavior (and other quantities of managerial interest). We apply this discrete-time analog of the well-known Pareto/NBD model to a data set on donations made by the supporters of a nonprofit organization located in the midwestern United States. Our analysis demonstrates the excellent ability of the BG/BB model to describe and predict the future behavior of a customer base.
Key words: BG/BB; beta-geometric; beta-binomial; customer-base analysis; customer lifetime value; CLV; RFM;
Pareto/NBD
History: Received: March 24, 2009; accepted: March 31, 2010; accepted by Scott A. Neslin, acting
editor-in-chief. Published online in Articles in Advance August 11, 2010.
1. Introduction

Consider a major nonprofit organization located in the midwestern United States that is funded in large part by donations from individuals. In 1995 the organization acquired 11,104 first-time supporters; in each of the following six years, these individuals either did or did not support the organization. As shown in Table 1, donation behavior can be characterized by a binary string, where 1 indicates that a donation was made. (For the purposes of this analysis, similar to Netzer et al. 2008, we focus only on the annual incidence of the donations; we ignore the dollar values.) Given these data, management would like to know which individuals are most likely to be active donors in the future so that it can predict the level of transactions it can expect in future years from this cohort of donors (both individually and collectively).

Table 1   Annual Donation Behavior by the 1995 Cohort of First-Time Supporters

ID        1995  1996  1997  1998  1999  2000  2001
100001      1     0     0     0     0     0     0
100002      1     0     0     0     0     0     0
100003      1     0     0     0     0     0     0
100004      1     0     1     0     1     1     1
100005      1     0     1     1     1     0     1
100006      1     1     1     1     0     1     0
100007      1     1     0     1     0     1     0
100008      1     1     1     1     1     1     1
100009      1     1     1     1     1     1     0
100010      1     0     0     0     0     0     0
...
111102      1     1     1     1     1     1     1
111103      1     0     1     1     0     1     1
111104      1     0     0     0     0     0     0

Management has a five-year planning period and therefore would like to forecast the expected number of donations for the 1995 cohort as a whole, as well as for particular types of individuals, over the period 2002–2006. For instance,
• What should be expected from donor 100008, who has made a repeat donation in each of the six years since becoming a supporter of the organization: is he likely to go five-for-five in the future period? If not, how much "shrinkage" would we expect?
• How about comparing donor 100009, who had been a consistent supporter up until 2001, versus donor 100004, who has had a more irregular history, with one fewer donation overall but with one made in 2001?
• Likewise, how does donor 100004 compare to donor 111103? They have both made four repeat donations, including one in 2001, but their earlier histories differ somewhat from each other.
• Finally, how about the many donors (such as 100001) who have done nothing since their initial contributions? Should the nonprofit organization write them off, or is there still some meaningful future value in them, individually and collectively?

Recognizing that this is a noncontractual setting,¹ the marketing analyst may think, "Let's use the Pareto/NBD," a model developed by Schmittlein et al. (1987) to provide answers to the kinds of customer-base analysis questions listed above.

¹ In a contractual setting (e.g., gym membership, cable TV, theater subscription plan), we observe the time at which the customer "dies" (i.e., ends their formal relationship with the firm). In a noncontractual setting (e.g., traditional mail order, retail store patronage), however, the time at which a customer dies is unobserved by the firm; customers do not notify the firm "when they stop being a customer. Instead they just silently attrite" (Mason 2003, p. 55). The only potential evidence of this having happened is an unusually long hiatus since the last recorded purchase. The challenge facing the analyst is how to differentiate between those customers who have ended their relationship with the firm versus those who are simply in the midst of a long hiatus between transactions.

But is this an appropriate way to proceed? At the heart of the Pareto/NBD model is the assumption that customer purchasing while "alive" is characterized by a Poisson distribution and that cross-sectional heterogeneity in the mean purchase rates is characterized by a gamma distribution (resulting in the negative binomial distribution (NBD) model of repeat buying; Ehrenberg 1988, Morrison and Schmittlein 1988). The use of the Poisson distribution assumes that transactions can occur at any point in time; this may be an acceptable assumption for the purchasing of CDs from a website or for the purchasing of office products in a business-to-business (B2B) setting, which are the empirical settings considered by Fader et al. (2005) and Schmittlein and Peterson (1994), respectively. However, it is not a valid assumption in a number of other situations, including the nonprofit setting described above. Even Schmittlein et al. (1987) acknowledge that their model has limited applicability and that there is a need for an alternative modeling framework to accommodate business settings characterized by discrete-time purchasing (see pp. 16–17 and Table 3 in their paper), yet no one to date has presented such a model.

As another example, consider attendance at the INFORMS Marketing Science Conference. The conference occurs at a discrete point in time and an individual can either attend or not. Similarly, consider Sunday church attendance; an individual can either attend the Sunday morning service or not. In both cases, the opportunities for a transaction occur at discrete points in time, and there is an upper bound on the number of transactions that can occur in a fixed unit of time; an individual cannot attend the INFORMS Marketing Science Conference more than once a year or attend the Sunday morning church service more than 52 times a year. In such noncontractual settings, the behavior is "necessarily discrete," and it is clearly incorrect to model the number of transactions using a Poisson distribution. It would be more appropriate to model the number of transactions in a given time period using a Bernoulli process.

In other settings, the behavior of interest can occur in continuous time, but it is effectively discrete in the way firms view it. Consider the case of blood donations. A blood collection agency will send quarterly notices to its donor base, requesting that they give blood. Although an individual can give blood at any point in time during that quarter, there is still an upper bound on the number of times the agency is willing to accept blood from any donor, and the agency can therefore characterize a donor's behavior in terms of whether or not she gave blood in a fixed time interval. Similarly, a charity may send out letters every six months requesting money. Although an individual can send in a donation at any point in time, the charity is basically interested in whether or not he responded to a specific request for funds and will therefore characterize donation behavior simply in terms of whether or not the individual responds to a mailing (Piersma and Jonker 2004). A number of mail-order companies also think of their customer behavior in such a manner (e.g., did the customer place an order in response to the quarterly catalog mailing?). In these cases, it is convenient to think of there being a natural upper bound on the number of transactions that can occur in a fixed unit of time (e.g., year), and it is therefore more appropriate to model the number of transactions using a Bernoulli process rather than a Poisson distribution.

Finally, there are cases where the event of interest has no constraints on it at all (it is truly a continuous-time behavior), but it is so rare per unit of time that management will choose to "discretize" the purchasing data for analysis and reporting purposes. For example, a cruise-ship company may characterize customer
behavior in terms of whether or not each customer went on a cruise in 2000, 2001, 2002, etc. (Berger et al. 2003). Once again, purchasing behavior is more conveniently described as a Bernoulli process rather than as a Poisson process. An example of this in a consumer packaged goods setting is the work of Chatfield and Goodhardt (1970), who model the purchasing of a product not in terms of the number of purchases made by an individual in a 24-week period (using the NBD model) but rather in terms of the number of weeks in which an individual purchased the product (using the beta-binomial model of Skellam 1948, with n = 24). Similarly, Easton (1980) uses the beta-binomial model to characterize purchasing in an industrial setting, commenting that using a discrete purchase interval is a useful way of overcoming the problem of determining when exactly a purchase is deemed to have occurred in a B2B setting.

Figure 1 illustrates this continuum of settings in which it is either correct or simply makes more sense to model individual-level transaction behavior using a Bernoulli process rather than a Poisson distribution. In all of these settings, it is inappropriate to use the Pareto/NBD as the underlying model for a customer-base analysis exercise.

Figure 1   Classifying Discrete-Time Transaction Opportunities
  Necessarily discrete:               Church attendance; attendance at a periodic academic conference
  Generally discrete:                 Charity donations; blood donations
  Discretized by recording process:   Cruise-ship vacations

In this paper we develop a model that can be used to answer the critical customer-base analysis questions in discrete-time, noncontractual settings; in other words, we develop a discrete-time analog of the Pareto/NBD model. Although many aspects of the Pareto/NBD model (and the inferences frequently associated with it) carry over fairly smoothly to the discrete-time setting, there are a number of interesting issues that arise in the discrete-time setting that are quite unique and offer significant benefits for model implementation.

In the next section, we first outline the assumptions underpinning this model and then present expressions for a number of managerially relevant quantities. This is followed by an empirical analysis (for the aforementioned nonprofit organization) in which we carefully examine the performance of the model both in a six-year calibration sample and a five-year holdout period. We then examine the relative performance of the Pareto/NBD model when applied to this same data set. Next we present an extension to the basic model in which the consequences of relaxing one of the model assumptions are explored. We conclude with a discussion of several additional issues that arise from this work.

2. Model Development

Our objective is to develop a stochastic model of buyer behavior for discrete-time, noncontractual settings. To start, we define a transaction opportunity as either one of the following:
• A well-defined point in time at which a transaction either occurs or does not occur, or
• A well-defined time interval during which a transaction either occurs or does not occur.

The first type of transaction opportunity corresponds to the "necessarily discrete" case in Figure 1. The second type of transaction opportunity corresponds to the "generally discrete" and "discretized by recording process" cases in Figure 1. (The nonprofit example discussed in the introduction is an example of this second case.) In all three cases, a customer's transaction history can be expressed as a binary string, where $y_t = 1$ if a transaction occurred at or during the $t$th transaction opportunity, and 0 otherwise (for $t = 1, \dots, n$ transaction opportunities). Note that we are simply interested in modeling the transaction process (i.e., the pattern of 1s and 0s). We are not interested in modeling other behaviors associated with each transaction (e.g., the quantity purchased); this is discussed in §6.

Our model is based on the following six assumptions.

Assumption 1. A customer's relationship with the firm has two phases: he is "alive" (A) for some period of time, then becomes permanently inactive ("dies"; D).

Assumption 2. While alive, the customer buys at any given transaction opportunity with probability $p$:

$$P(Y_t = 1 \mid p, \text{alive at } t) = p, \quad 0 \le p \le 1.$$

(This implies that the number of transactions by a customer alive for $i$ transaction opportunities follows a binomial($i$, $p$) distribution.)

Assumption 3. A "living" customer dies at the beginning of a transaction opportunity with probability $\theta$. (This implies that the (unobserved) lifetime of a customer is characterized by a geometric distribution.)

Assumption 4. Heterogeneity in $p$ follows a beta distribution with probability distribution function (pdf)

$$f(p \mid \alpha, \beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{B(\alpha, \beta)}, \quad 0 < p < 1,\ \alpha, \beta > 0. \tag{1}$$
Assumption 5. Heterogeneity in $\theta$ follows a beta distribution with pdf

$$f(\theta \mid \gamma, \delta) = \frac{\theta^{\gamma-1}(1-\theta)^{\delta-1}}{B(\gamma, \delta)}, \quad 0 < \theta < 1,\ \gamma, \delta > 0. \tag{2}$$

Assumption 6. The transaction probability $p$ and the dropout probability $\theta$ vary independently across customers.

Assumptions (2) and (4) yield the beta-Bernoulli model (i.e., the beta-binomial model without the binomial coefficient, since we explicitly account for the ordering of the transactions). Similarly, Assumptions (3) and (5) yield the beta-geometric (BG) distribution. We therefore call this the beta-geometric/beta-Bernoulli (BG/BB) model of buyer behavior.

2.1. Derivation of Model Likelihood Function

Consider a customer with repeat purchase string 1 0 1 0 0. What is $P(Y_1 = 1, Y_2 = 0, Y_3 = 1, Y_4 = 0, Y_5 = 0 \mid p, \theta)$? The fact that the customer made a purchase at the third transaction opportunity means that he must have been alive for $t = 1, 2, 3$. However, $Y_4 = 0$, $Y_5 = 0$ could be the result of one of three scenarios: (i) he died at the beginning of the fourth transaction opportunity (AAADD), (ii) he was alive at the fourth transaction opportunity and died at the beginning of the fifth transaction opportunity (AAAAD), or (iii) he was alive at both the fourth and fifth transaction opportunities (AAAAA). We therefore compute $P(Y_1 = 1, Y_2 = 0, Y_3 = 1, Y_4 = 0, Y_5 = 0 \mid p, \theta)$ by computing the probability of the purchase string conditional on each scenario and multiplying it by the probability of that scenario:

$$\begin{aligned}
f(10100 \mid p, \theta) &= f(10100 \mid p, \text{AAADD})\,P(\text{AAADD} \mid \theta) \\
&\quad + f(10100 \mid p, \text{AAAAD})\,P(\text{AAAAD} \mid \theta) \\
&\quad + f(10100 \mid p, \text{AAAAA})\,P(\text{AAAAA} \mid \theta) \\
&= p(1-p)p\,\underbrace{\theta(1-\theta)^3}_{P(\text{AAADD})} + p(1-p)p(1-p)\,\underbrace{\theta(1-\theta)^4}_{P(\text{AAAAD})} \\
&\quad + \underbrace{p(1-p)p}_{P(Y_1=1,\,Y_2=0,\,Y_3=1)}(1-p)(1-p)\,\underbrace{(1-\theta)^5}_{P(\text{AAAAA})}.
\end{aligned} \tag{3}$$

Note that the zero-order nature of purchasing while the customer is alive means that the exact order of any given number of transactions prior to the last observed transaction does not matter. For example, it should be clear that $f(10100 \mid p, \theta) = f(01100 \mid p, \theta)$. Therefore, we do not need the complete binary-string representation of a customer's transaction history. Rather, all we need to know for $n$ transaction opportunities are "frequency" and "recency": the number of transactions across the calibration period ($x = \sum_{t=1}^{n} y_t$) and the transaction opportunity at which the last observed transaction occurred ($t_x$).² We therefore go from $2^n$ binary string representations of all the possible purchase patterns to $n(n+1)/2 + 1$ possible recency/frequency patterns.

² If $x = 0$, then $t_x = 0$. Note that this measure of recency differs from that normally used by the direct marketing community, who measure recency as the time from the last observed transaction to the end of the observation period (i.e., $n - t_x$).

This realization that recency and frequency are sufficient summary statistics offers significant benefits for model implementation, particularly as the number of transaction opportunities becomes sizeable. For instance, in the case of our nonprofit organization, we can compress the number of necessary binary strings from 64 down to 22 recency/frequency combinations, making it a bit easier to visualize and manipulate the data set. However, in another recent application with $n = 10$, we saw a reduction from 1,024 binary strings down to 56 recency/frequency combinations. Furthermore, these numbers are not affected by the size of the customer base being modeled; see Table 2 for a complete characterization of the nonprofit data set partially presented in Table 1. Whether we have 11,000 customers or 11 million customers, the data structure would be identical: the numbers in the "No. of donors" columns would grow, but the computational demands for data storage and manipulation would be unaffected.

Returning to the likelihood function, we generalize the logic behind the construction of (3), so it follows that

$$L(p, \theta \mid x, t_x, n) = p^x(1-p)^{n-x}(1-\theta)^n + \sum_{i=0}^{n-t_x-1} p^x(1-p)^{t_x-x+i}\,\theta(1-\theta)^{t_x+i}. \tag{4}$$

To arrive at the likelihood function for a randomly chosen customer with purchase history $(x, t_x, n)$, we remove the conditioning on $p$ and $\theta$ by taking the expectation of (4) over their respective mixing distributions:

$$\begin{aligned}
L(\alpha, \beta, \gamma, \delta \mid x, t_x, n) &= \int_0^1\!\!\int_0^1 L(p, \theta \mid x, t_x, n)\, f(p \mid \alpha, \beta)\, f(\theta \mid \gamma, \delta)\, dp\, d\theta \\
&= \frac{B(\alpha + x, \beta + n - x)}{B(\alpha, \beta)}\, \frac{B(\gamma, \delta + n)}{B(\gamma, \delta)} \\
&\quad + \sum_{i=0}^{n-t_x-1} \frac{B(\alpha + x, \beta + t_x - x + i)}{B(\alpha, \beta)}\, \frac{B(\gamma + 1, \delta + t_x + i)}{B(\gamma, \delta)}.
\end{aligned} \tag{5}$$

(The solution to the double integral follows naturally from the integral representation of the beta function.)
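The likelihood in (5) can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the authors' implementation: the function names (`log_beta`, `bgbb_likelihood`, `n_rf_patterns`) are hypothetical, and beta functions are evaluated on the log scale via `math.lgamma` as a numerical-stability choice.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Natural log of the beta function B(a, b), computed via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bgbb_likelihood(alpha, beta, gamma, delta, x, t_x, n):
    """Equation (5): likelihood of a recency/frequency history (x, t_x, n)."""
    log_b_ab = log_beta(alpha, beta)
    log_b_gd = log_beta(gamma, delta)
    # Scenario in which the customer is still alive after all n opportunities.
    total = exp(log_beta(alpha + x, beta + n - x) - log_b_ab
                + log_beta(gamma, delta + n) - log_b_gd)
    # Scenarios in which the customer died at the start of opportunity
    # t_x + i + 1, for i = 0, ..., n - t_x - 1.
    for i in range(n - t_x):
        total += exp(log_beta(alpha + x, beta + t_x - x + i) - log_b_ab
                     + log_beta(gamma + 1, delta + t_x + i) - log_b_gd)
    return total


def n_rf_patterns(n):
    """Number of possible (x, t_x) patterns for n transaction opportunities."""
    return n * (n + 1) // 2 + 1


if __name__ == "__main__":
    # Purchase string 1 0 1 0 0 corresponds to x = 2, t_x = 3, n = 5.
    # The parameter values here are arbitrary, chosen only for illustration.
    print(bgbb_likelihood(1.0, 1.0, 1.0, 1.0, x=2, t_x=3, n=5))
    print(n_rf_patterns(6), n_rf_patterns(10))
```

With $n = 6$ and $n = 10$, `n_rf_patterns` gives the 22 and 56 recency/frequency combinations quoted in the text.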
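Table 2   Recency/Frequency Summary of the Annual Donation Behavior by the 1995 Cohort of First-Time Supporters (n = 6)

x   t_x   No. of donors      x   t_x   No. of donors
6    6       1,203           4    4        240
5    6         728           3    4        181
4    6         512           2    4        155
3    6         357           1    4         78
2    6         234           3    3        322
1    6         129           2    3        255
5    5         335           1    3        129
4    5         284           2    2        613
3    5         225           1    2        277
2    5         173           1    1      1,091
1    5         119           0    0      3,464

The four BG/BB model parameters ($\alpha$, $\beta$, $\gamma$, $\delta$) can be estimated via the method of maximum likelihood in the following manner. For a calibration period with $n$ transaction opportunities, we have $J = n(n+1)/2 + 1$ possible recency/frequency patterns, each containing $f_j$ customers. The sample log-likelihood function is given by

$$LL(\alpha, \beta, \gamma, \delta) = \sum_{j=1}^{J} f_j \ln\!\big[L(\alpha, \beta, \gamma, \delta \mid x_j, t_{x_j}, n)\big], \tag{6}$$

where $x_j$ and $t_{x_j}$ are the frequency and recency, respectively, for each unique pattern. This can be maximized using standard numerical optimization routines. These calculations are easy to perform in a spreadsheet environment; in fact, the entire model implementation (from initial data setup through the calculation of the key results in the next section) rarely requires the analyst to use any software beyond a spreadsheet. This is a major benefit of the BG/BB model.

2.2. Key Results

We now present expressions for a set of quantities of interest to anyone wanting to apply this model of buyer behavior in a discrete-time, noncontractual setting. (The associated derivations can be found in Appendix A.)

• Let the random variable $X(n) = \sum_{t=1}^{n} Y_t$ denote the number of transactions occurring across the first $n$ transaction opportunities. The BG/BB probability mass function is

$$\begin{aligned}
P(X(n) = x \mid \alpha, \beta, \gamma, \delta) &= \binom{n}{x} \frac{B(\alpha + x, \beta + n - x)}{B(\alpha, \beta)}\, \frac{B(\gamma, \delta + n)}{B(\gamma, \delta)} \\
&\quad + \sum_{i=x}^{n-1} \binom{i}{x} \frac{B(\alpha + x, \beta + i - x)}{B(\alpha, \beta)}\, \frac{B(\gamma + 1, \delta + i)}{B(\gamma, \delta)},
\end{aligned} \tag{7}$$

with mean

$$E(X(n) \mid \alpha, \beta, \gamma, \delta) = \left(\frac{\alpha}{\alpha + \beta}\right)\left(\frac{\delta}{\gamma - 1}\right)\left\{1 - \frac{\Gamma(\gamma + \delta)}{\Gamma(\gamma + \delta + n)}\, \frac{\Gamma(1 + \delta + n)}{\Gamma(1 + \delta)}\right\}. \tag{8}$$

• More generally, let the random variable $X(n, n + n^*) = \sum_{t=n+1}^{n+n^*} Y_t$ denote the number of transactions in the interval $(n, n + n^*]$. The BG/BB probability of $x^*$ transactions occurring in this interval is given by

$$\begin{aligned}
P(X(n, n + n^*) = x^* \mid \alpha, \beta, \gamma, \delta) &= \delta_{x^*=0}\left[1 - \frac{B(\gamma, \delta + n)}{B(\gamma, \delta)}\right] \\
&\quad + \binom{n^*}{x^*} \frac{B(\alpha + x^*, \beta + n^* - x^*)}{B(\alpha, \beta)}\, \frac{B(\gamma, \delta + n + n^*)}{B(\gamma, \delta)} \\
&\quad + \sum_{i=x^*}^{n^*-1} \binom{i}{x^*} \frac{B(\alpha + x^*, \beta + i - x^*)}{B(\alpha, \beta)}\, \frac{B(\gamma + 1, \delta + n + i)}{B(\gamma, \delta)},
\end{aligned} \tag{9}$$

with mean

$$E(X(n, n + n^*) \mid \alpha, \beta, \gamma, \delta) = \left(\frac{\alpha}{\alpha + \beta}\right)\left(\frac{\delta}{\gamma - 1}\right)\frac{\Gamma(\gamma + \delta)}{\Gamma(1 + \delta)}\left\{\frac{\Gamma(1 + \delta + n)}{\Gamma(\gamma + \delta + n)} - \frac{\Gamma(1 + \delta + n + n^*)}{\Gamma(\gamma + \delta + n + n^*)}\right\}. \tag{10}$$

In most customer-base analysis settings, we are interested in making statements about customers conditional on their observed purchase history $(x, t_x, n)$.

• The probability that a customer with purchase history $(x, t_x, n)$ will be alive at the $(n+1)$th transaction opportunity is

$$P(\text{alive at } n + 1 \mid \alpha, \beta, \gamma, \delta, x, t_x, n) = \frac{B(\alpha + x, \beta + n - x)}{B(\alpha, \beta)}\, \frac{B(\gamma, \delta + n + 1)}{B(\gamma, \delta)} \Big/ L(\alpha, \beta, \gamma, \delta \mid x, t_x, n). \tag{11}$$

• The probability that a customer with purchase history $(x, t_x, n)$ makes $x^*$ transactions in the interval $(n, n + n^*]$ is

$$P(X(n, n + n^*) = x^* \mid \alpha, \beta, \gamma, \delta, x, t_x, n) = \delta_{x^*=0}\left\{1 - \frac{C_1}{L(\alpha, \beta, \gamma, \delta \mid x, t_x, n)}\right\} + \frac{C_2}{L(\alpha, \beta, \gamma, \delta \mid x, t_x, n)}, \tag{12}$$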
where

$$C_1 = \frac{B(\alpha + x, \beta + n - x)}{B(\alpha, \beta)}\, \frac{B(\gamma, \delta + n)}{B(\gamma, \delta)}$$

and

$$\begin{aligned}
C_2 &= \binom{n^*}{x^*} \frac{B(\alpha + x + x^*, \beta + n - x + n^* - x^*)}{B(\alpha, \beta)}\, \frac{B(\gamma, \delta + n + n^*)}{B(\gamma, \delta)} \\
&\quad + \sum_{i=x^*}^{n^*-1} \binom{i}{x^*} \frac{B(\alpha + x + x^*, \beta + n - x + i - x^*)}{B(\alpha, \beta)}\, \frac{B(\gamma + 1, \delta + n + i)}{B(\gamma, \delta)}.
\end{aligned}$$

• The expected number of future transactions across the next $n^*$ transaction opportunities by a customer with purchase history $(x, t_x, n)$ is

$$\begin{aligned}
E(X(n, n + n^*) \mid \alpha, \beta, \gamma, \delta, x, t_x, n) &= \frac{1}{L(\alpha, \beta, \gamma, \delta \mid x, t_x, n)}\, \frac{B(\alpha + x + 1, \beta + n - x)}{B(\alpha, \beta)} \\
&\quad \times \left(\frac{\delta}{\gamma - 1}\right)\frac{\Gamma(\gamma + \delta)}{\Gamma(1 + \delta)}\left\{\frac{\Gamma(1 + \delta + n)}{\Gamma(\gamma + \delta + n)} - \frac{\Gamma(1 + \delta + n + n^*)}{\Gamma(\gamma + \delta + n + n^*)}\right\}.
\end{aligned} \tag{13}$$

Many customer-base analysis exercises are motivated by a desire to compute customer lifetime value (CLV), which is "the present value of the future cash flows attributed to the customer relationship" (Pfeifer et al. 2005, p. 17). The general explicit formula for computing CLV is (Rosset et al. 2003)

$$E(CLV) = \int_0^\infty E[v(t)]\, S(t)\, d(t)\, dt,$$

where $E[v(t)]$ is the expected value of the customer at time $t$ (assuming he is alive), $S(t)$ is the survivor function, and $d(t)$ is a discount factor that reflects the present value of money received at time $t$. Following Fader et al. (2005), if we assume that the process describing the net cash flow per transaction for a given customer is both independent of the transaction process and stationary, we can express $v(t)$ as net cash flow/transaction $\times\, t(t)$, where $t(t)$ is the transaction rate at $t$.

In many cases we are interested in the expected residual lifetime value of a customer. Standing at time $T$,

$$E(RLV) = E(\text{net cash flow/transaction}) \times \underbrace{\int_T^\infty E[t(t)]\, S(t \mid t > T)\, d(t - T)\, dt}_{\text{discounted expected residual transactions}}.$$

• The number of discounted expected residual transactions (DERT) is the present value of the expected future transaction stream for a customer with purchase history $(x, t_x, T)$. Fader et al. (2005) derive the expression for this quantity when the transaction process can be described by the Pareto/NBD model. When the transaction process is described by the BG/BB model, the present value of the expected number of future transactions for a customer with purchase history $(x, t_x, n)$, with discount rate $d$, is

$$\begin{aligned}
DERT(d \mid \alpha, \beta, \gamma, \delta, x, t_x, n) &= \frac{B(\alpha + x + 1, \beta + n - x)}{B(\alpha, \beta)}\, \frac{B(\gamma, \delta + n + 1)}{B(\gamma, \delta)(1 + d)} \\
&\quad \times \frac{{}_2F_1\big(1, \delta + n + 1;\ \gamma + \delta + n + 1;\ 1/(1 + d)\big)}{L(\alpha, \beta, \gamma, \delta \mid x, t_x, n)},
\end{aligned} \tag{14}$$

where ${}_2F_1(\cdot)$ is the Gaussian hypergeometric function.³ This number of discounted expected residual transactions can then be rescaled by the customer's value "multiplier" to yield an overall estimate of $E(RLV)$. Although the presence of the Gaussian hypergeometric function makes this calculation a bit more complex than the others in this section, it is worth emphasizing that it only needs to be evaluated once for any given value of $n$ (i.e., only once per cohort, not for every recency/frequency pattern), and it is relatively straightforward to use a recursion formula to perform the calculations in a spreadsheet environment. Furthermore, this calculation for DERT is far simpler than the equivalent expression derived by Fader et al. (2005) for the Pareto/NBD model. In that case, the DERT expression required the evaluation of Gaussian hypergeometric functions for each recency/frequency combination, as well as the confluent hypergeometric function of the second kind, which is unfamiliar and fairly burdensome from a computational standpoint.

Finally, we may also be interested in making inferences about a customer's latent transaction and dropout probabilities.

• The marginal posterior distribution of $P$ is

$$f(p \mid \alpha, \beta, \gamma, \delta, x, t_x, n) = \frac{\mathcal{A}}{L(\alpha, \beta, \gamma, \delta \mid x, t_x, n)}, \tag{15}$$

where

$$\begin{aligned}
\mathcal{A} &= \frac{p^{\alpha+x-1}(1-p)^{\beta+n-x-1}}{B(\alpha, \beta)}\, \frac{B(\gamma, \delta + n)}{B(\gamma, \delta)} \\
&\quad + \sum_{i=0}^{n-t_x-1} \frac{p^{\alpha+x-1}(1-p)^{\beta+t_x-x+i-1}}{B(\alpha, \beta)}\, \frac{B(\gamma + 1, \delta + t_x + i)}{B(\gamma, \delta)}.
\end{aligned}$$

³ Assuming that there are $k$ transaction opportunities per year, an annual discount rate of $r$ maps to a discount rate of $d = (1 + r)^{1/k} - 1$.
• The marginal posterior distribution of $\Theta$ is

$$f(\theta \mid \alpha, \beta, \gamma, \delta, x, t_x, n) = \frac{\mathcal{B}}{L(\alpha, \beta, \gamma, \delta \mid x, t_x, n)}, \tag{16}$$

where

$$\begin{aligned}
\mathcal{B} &= \frac{B(\alpha + x, \beta + n - x)}{B(\alpha, \beta)}\, \frac{\theta^{\gamma-1}(1-\theta)^{\delta+n-1}}{B(\gamma, \delta)} \\
&\quad + \sum_{i=0}^{n-t_x-1} \frac{B(\alpha + x, \beta + t_x - x + i)}{B(\alpha, \beta)}\, \frac{\theta^{\gamma}(1-\theta)^{\delta+t_x+i-1}}{B(\gamma, \delta)}.
\end{aligned}$$

• For $l, m = 0, 1, 2, \dots$, the $(l, m)$th product moment of the joint posterior distribution of $P$ and $\Theta$ is

$$E(P^l \Theta^m \mid \alpha, \beta, \gamma, \delta, x, t_x, n) = \frac{B(\alpha + l, \beta)}{B(\alpha, \beta)}\, \frac{B(\gamma + m, \delta)}{B(\gamma, \delta)}\, \frac{L(\alpha + l, \beta, \gamma + m, \delta \mid x, t_x, n)}{L(\alpha, \beta, \gamma, \delta \mid x, t_x, n)}, \tag{17}$$

where $L(\alpha + l, \beta, \gamma + m, \delta \mid x, t_x, n)$ is simply (5) evaluated using $\alpha + l$ in place of $\alpha$ and $\gamma + m$ in place of $\gamma$.

3. Empirical Analysis

We examine the performance of the BG/BB model using data on the annual donation behavior by the supporters of a nonprofit organization located in the midwestern United States. The full data set contains information on the 56,847 people who made their first-ever annual donation between 1995 and 2000 (inclusive), from their first year up to and including 2006; the sizes of each annual cohort are given in Table 3.

Table 3   Number of New Supporters Each Year (1995–2000)

Cohort    Size
1995     11,104
1996     10,057
1997      9,043
1998      8,175
1999      8,977
2000      9,491

Our initial analysis focuses on the 11,104 members of the 1995 cohort. We fit the model using the data on whether or not these supporters made repeat donations across 1996–2001 and examine the model's predictive performance across a 2002–2006 holdout validation period. We follow up this analysis with one in which we pool the six cohorts, fitting the model to the repeat donation data up to and including 2001 and examining its predictive performance over 2002–2006. (For the sake of linguistic simplicity, we will refer to the act of making a repeat donation in any given year as making a repeat transaction or purchase.)

3.1. Analysis of the 1995 Cohort

The group of 11,104 people that became supporters of the organization for the first time in 1995 made a total of 24,615 repeat transactions over the next six years. Given the data in Table 2, we code up the log-likelihood function given in (6) in Excel (see Figure 2 for a screenshot of the complete spreadsheet used for parameter estimation) and maximize it using the Solver add-in. (A note on how to implement the model in Excel, along with a copy of the complete spreadsheet, can be found at http://brucehardie.com/notes/010/.) The resulting maximum-likelihood estimates of the model parameters are reported in Table 4. (We also report the model parameters and value of the log-likelihood function for the beta-Bernoulli model and note that the addition of the "death" component results in a major improvement in model fit.)

The expected number of people making 0, 1, …, 6 repeat transactions between 1996 and 2001 is computed using (7) and compared to the actual frequency distribution in Figure 3. We note that the model provides a very good fit to the data.

The performance of the model becomes more impressive when we see how well it tracks repeat transactions over time. Using the expression for the expected number of transactions across $n$ transaction opportunities as given in (8), we compute the expected number of repeat transactions made by the whole cohort of 11,104 people up to 2006. These are plotted along with the actual cumulative numbers in Figure 4(a). We note that the BG/BB model predictions accurately track the actual cumulative number of repeat transactions in both the six-year calibration period and the five-year forecast period, underforecasting at 2006 by a mere 0.65%.⁴ Further insight into the excellent tracking performance of the model is given in Figure 4(b), which reports these numbers on a year-by-year basis; we note that the BG/BB model clearly captures the underlying trend in repeat transactions over this fairly lengthy period of time.

To get a clearer idea of how well the model captures validation period purchasing, we compute the expected number of people making $x^* = 0, 1, \dots, 5$ transactions in 2002–2006 ($n^* = 5$) using (9) and compare it to the actual frequency distribution in Figure 5. We note that the model provides a very good prediction of the actual behavior.

3.1.1. Conditional Expectations. Perhaps a more important examination of the predictive performance of the model focuses on the quality of the predictions of future behavior conditional on past behavior.

⁴ As a point of comparison, the prediction associated with the BB model overforecasts cumulative repeat transactions at the end of 2006 by 20%.
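The cohort-level tracking calculation described in Section 3.1 (cohort size times the mean in (8)) can be sketched as follows. This is an illustrative sketch only: the function name `expected_transactions` is hypothetical, and the parameter values are the BG/BB estimates reported in Table 4. The closed form in (8) is undefined at $\gamma = 1$, which the code does not handle.

```python
from math import exp, lgamma


def expected_transactions(alpha, beta, gamma, delta, n):
    """Equation (8): E[X(n)] for a randomly chosen customer (gamma != 1)."""
    # Ratio of gamma functions, evaluated on the log scale for stability.
    ratio = exp(lgamma(gamma + delta) - lgamma(gamma + delta + n)
                + lgamma(1 + delta + n) - lgamma(1 + delta))
    return alpha / (alpha + beta) * delta / (gamma - 1) * (1 - ratio)


COHORT_SIZE = 11_104                   # 1995 cohort (Table 3)
PARAMS = (1.204, 0.750, 0.657, 2.783)  # BG/BB estimates (Table 4)

if __name__ == "__main__":
    # n = 1 corresponds to 1996, n = 6 to 2001, n = 11 to 2006.
    for n in range(1, 12):
        year = 1995 + n
        cumulative = COHORT_SIZE * expected_transactions(*PARAMS, n)
        print(year, round(cumulative))
```

At $n = 6$ the model-based cumulative total is approximately 24,650, close to the 24,615 repeat transactions actually observed over 1996–2001.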
Figure 2   Screenshot of Excel Worksheet Used for Parameter Estimation

Parameter cells:
  B1  alpha = 1.204        E1  B(alpha, beta)  = 1.146
  B2  beta  = 0.750
  B3  gamma = 0.657        E3  B(gamma, delta) = 0.729
  B4  delta = 2.783
  B6  LL    = −33,225.6

Cell formulas shown in the screenshot:
  E1   =EXP(GAMMALN(B1)+GAMMALN(B2)-GAMMALN(B1+B2))
  B6   =SUM(E9:E30)
  E19  =D19*LN(F19)
  F19  =SUM(H19:N19)
  G15  =C15-B15-1
  H9   =EXP(GAMMALN($B$1+A9)+GAMMALN($B$2+C9-A9)-GAMMALN($B$1+$B$2+C9))/$E$1*EXP(GAMMALN($B$3)+GAMMALN($B$4+C9)-GAMMALN($B$3+$B$4+C9))/$E$3
  I9   =IF(I$8<=$G9,EXP(GAMMALN($B$1+$A9)+GAMMALN($B$2+$B9-$A9+I$8)-GAMMALN($B$1+$B$2+$B9+I$8))/$E$1*EXP(GAMMALN($B$3+1)+GAMMALN($B$4+$B9+I$8)-GAMMALN($B$3+$B$4+$B9+I$8+1))/$E$3,0)

Worksheet rows 9–30 (columns A–G; columns H–N hold the individual terms of (5) for the "still alive" scenario and for i = 0, …, 5):

Row   x   t_x   n   # donors   Row LL      L(.|X = x, t_x, n)   n − t_x − 1
 9    6    6    6    1,203     −2,624.6        0.1129              −1
10    5    6    6      728     −3,126.7        0.0136              −1
11    4    6    6      512        …            0.0046              −1
12    3    6    6      357        …            0.0030              −1
13    2    6    6      234     −1,322.5        0.0035              −1
14    1    6    6      129       −630.0        0.0076              −1
15    5    5    6      335     −1,245.1        0.0243               0
16    4    5    6      284     −1,447.1        0.0061               0
17    3    5    6      225     −1,263.5        0.0036               0
18    2    5    6      173       −952.6        0.0041               0
19    1    5    6      119       −567.3        0.0085               0
20    4    4    6      240       −923.6        0.0213               1
21    3    4    6      181       −915.7        0.0063               1
22    2    4    6      155       −805.3        0.0055               1
23    1    4    6       78       −356.5        0.0104               1
24    3    3    6      322     −1,135.8        0.0294               2
25    2    3    6      255     −1,151.6        0.0109               2
26    1    3    6      129       −545.0        0.0146               2
27    2    2    6      613     −1,846.4        0.0492               3
28    1    2    6      277       −993.9        0.0276               3
29    1    1    6    1,091     −2,497.1        0.1014               4
30    0    0    6    3,464     −4,044.3        0.3111               5
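The worksheet logic in Figure 2 (one likelihood per recency/frequency row, weighted by the number of donors and summed as in (6)) can be mirrored in Python. The sketch below is an assumption-laden illustration rather than the authors' code: the names `RF_COUNTS`, `bgbb_likelihood`, and `log_likelihood` are hypothetical, the data are transcribed from Table 2, and the function is evaluated at the reported estimates rather than maximized.

```python
from math import exp, lgamma, log

# (x, t_x, number of donors) for the 1995 cohort with n = 6 (Table 2).
RF_COUNTS = [
    (6, 6, 1203), (5, 6, 728), (4, 6, 512), (3, 6, 357), (2, 6, 234),
    (1, 6, 129), (5, 5, 335), (4, 5, 284), (3, 5, 225), (2, 5, 173),
    (1, 5, 119), (4, 4, 240), (3, 4, 181), (2, 4, 155), (1, 4, 78),
    (3, 3, 322), (2, 3, 255), (1, 3, 129), (2, 2, 613), (1, 2, 277),
    (1, 1, 1091), (0, 0, 3464),
]


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def bgbb_likelihood(alpha, beta, gamma, delta, x, t_x, n):
    """Equation (5) for a single recency/frequency pattern."""
    norm = log_beta(alpha, beta) + log_beta(gamma, delta)
    # "Still alive after n opportunities" term (column H in Figure 2).
    total = exp(log_beta(alpha + x, beta + n - x)
                + log_beta(gamma, delta + n) - norm)
    # Death at the start of opportunity t_x + i + 1 (columns I to N).
    for i in range(n - t_x):
        total += exp(log_beta(alpha + x, beta + t_x - x + i)
                     + log_beta(gamma + 1, delta + t_x + i) - norm)
    return total


def log_likelihood(params, rf_counts, n):
    """Equation (6): sample log-likelihood over all observed patterns."""
    alpha, beta, gamma, delta = params
    return sum(f * log(bgbb_likelihood(alpha, beta, gamma, delta, x, t_x, n))
               for x, t_x, f in rf_counts)


if __name__ == "__main__":
    print(log_likelihood((1.204, 0.750, 0.657, 2.783), RF_COUNTS, 6))
```

Evaluated at the rounded estimates, the result should land close to the LL value shown in Figure 2. To estimate the parameters rather than evaluate them, the negative of `log_likelihood` could be passed to any numerical optimizer that enforces positivity of the four parameters, which plays the role of the Solver add-in in the spreadsheet.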
Figure 3   Predicted vs. Actual Frequency of Repeat Transactions
[Bar chart: no. of people (0 to 4,000) by no. of repeat transactions (0 to 6); series: Actual, Model.]

Table 4   Parameter Estimates, 1995 Cohort

           α       β       γ       δ         LL
BB       0.487   0.826                   −35,516.1
BG/BB    1.204   0.750   0.657   2.783   −33,225.6

We use (13) to compute the expected number of transactions in the 2002–2006 period ($n^* = 5$) conditional on each of the 22 $(x, t_x)$ patterns associated with $n = 6$. These conditional expectations are reported in Table 5 as a function of recency (the year of the individual's last transaction) and frequency (the number of repeat transactions).

In Figure 6(a) we report these conditional expectations, along with the average of the number of transactions that actually occurred in the 2002–2006 forecast period, broken down by the number of repeat transactions in 1996–2001. (For each $x$, we are averaging over customers with different values of $t_x$.) Similarly, Figure 6(b) reports these conditional expectations along with the average of the number of transactions that actually occurred in the 2002–2006 forecast period, broken down by the year of the individual's last transaction. (For each $t_x$, we are averaging over customers with different values of $x$.) We observe that the BG/BB model generates very good predictions of the expected behavior in the longitudinal holdout period, with the only real blemish being an underestimation of expected purchasing by those individuals whose last repeat purchase occurred before 1998.

Referring back to Table 5, we can now address the questions about different kinds of customers raised at the outset of the paper.
• A donor who has made a repeat transaction every year is expected to make only 3.75 transactions over the next five years. Of course, such donors are still extremely valuable, but the possibility of death plus the fact that they might have been somewhat "lucky" in the past make them a bit less valuable than they might
Figure 4 Predicted vs. Actual (a) Cumulative and (b) Annual Repeat Transactions
[Line charts by year, 1996 to 2006; series: Actual, Model. Panel (a): cumulative no. of repeat transactions (0 to 40,000). Panel (b): no. of repeat transactions (0 to 6,000).]

Table 5 Expected Number of Repeat Transactions in 2002-2006 as a Function of Recency and Frequency

No. of rpt transactions    Year of last transaction
(1996-2001)                1995   1996   1997   1998   1999   2000   2001
0                          0.07
1                                 0.09   0.31   0.59   0.84   1.02   1.15
2                                        0.12   0.54   1.06   1.44   1.67
3                                               0.22   1.03   1.80   2.19
4                                                      0.58   2.03   2.71
5                                                             1.81   3.23
6                                                                    3.75
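The entries of Table 5 can be reproduced approximately from the Table 4 estimates. The sketch below is illustrative only: it assumes the standard closed-form BG/BB expression for the conditional expectation (our reading of it, to be checked against (13)), uses parameter values rounded to three decimals, and introduces function names of its own.

```python
import math

def log_beta(a, b):
    """ln B(a, b), computed from log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def bgbb_likelihood(alpha, beta, gamma, delta, x, t_x, n):
    """Likelihood of one (x, t_x, n) pattern under the BG/BB model."""
    log_b_ab = log_beta(alpha, beta)
    log_b_gd = log_beta(gamma, delta)
    total = math.exp(log_beta(alpha + x, beta + n - x) - log_b_ab
                     + log_beta(gamma, delta + n) - log_b_gd)
    for i in range(n - t_x):
        total += math.exp(log_beta(alpha + x, beta + t_x - x + i) - log_b_ab
                          + log_beta(gamma + 1, delta + t_x + i) - log_b_gd)
    return total

def expected_transactions(alpha, beta, gamma, delta, x, t_x, n, n_star):
    """Expected transactions in periods n+1, ..., n+n_star given (x, t_x, n).

    Assumes gamma != 1, because the closed form divides by (gamma - 1).
    """
    buy_and_alive = math.exp(log_beta(alpha + x + 1, beta + n - x)
                             - log_beta(alpha, beta))
    scale = (delta / (gamma - 1.0)
             * math.exp(math.lgamma(gamma + delta) - math.lgamma(1.0 + delta)))
    bracket = (math.exp(math.lgamma(1.0 + delta + n)
                        - math.lgamma(gamma + delta + n))
               - math.exp(math.lgamma(1.0 + delta + n + n_star)
                          - math.lgamma(gamma + delta + n + n_star)))
    likelihood = bgbb_likelihood(alpha, beta, gamma, delta, x, t_x, n)
    return buy_and_alive * scale * bracket / likelihood

params = (1.204, 0.750, 0.657, 2.783)   # alpha, beta, gamma, delta (Table 4)
n, n_star = 6, 5                        # 1996-2001 calibration, 2002-2006 forecast

# All recency/frequency patterns: x = 0 forces t_x = 0; otherwise t_x >= x.
patterns = [(0, 0)] + [(x, t_x) for x in range(1, n + 1)
                       for t_x in range(x, n + 1)]

for x, t_x in patterns:
    value = expected_transactions(*params, x, t_x, n, n_star)
    print(f"x={x} last_year={1995 + t_x} expected={value:.2f}")
```

The enumeration also makes the pattern count explicit: with n = 6 there are 1 + 6 + 5 + 4 + 3 + 2 + 1 = 22 feasible (x, t_x) combinations.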
have otherwise seemed. (With reference to Figure 6(a), we see that this conditional expectation overestimates the actual mean (3.53) by only 6%.)

• Donor 100009, who had had a perfect record until the most recent year, is expected to make 1.81 transactions over the next five years. In contrast, donor 100004, with better recency but lower frequency, is expected to make 2.71 transactions over the same period, an increase of nearly 50%. This highlights the critically important role of recency, which can also be seen in the steep growth of the curve in Figure 6(b).

• Although donors 100004 and 111103 have different histories, their recency and frequency numbers are identical (x = 4, t_x = 6); thus, they have the same conditional expectation. Minor, remote differences in purchase histories are deemed to be irrelevant when making predictions using the BG/BB model.

• A donor who has been completely absent since making his or her initial transaction is expected to make only 0.07 repeat transactions over the next five years. However, although each such donor is not particularly valuable alone, it is important to note, as per Table 2, that over 30% of the entire cohort of donors is in this recency/frequency group. Taken together, these donors are expected to make over 240 transactions over the next five years, making them collectively more valuable than about half of the other recency/frequency groups.

Figure 5 Predicted vs. Actual Frequency of Repeat Transactions in 2002-2006
[Bar chart. Horizontal axis: no. of repeat transactions (0 to 5); vertical axis: no. of people (0 to 7,000); series: Actual, Model.]

Beyond these specific analyses, Table 5 offers additional insights about the broader interplay between recency and frequency. First, note that for any row (i.e., value of x), the expected number of transactions in the forecast period decreases as we move from right to left (i.e., the less recent the last observed transaction). This is as we would expect, because the longer the hiatus in making a purchase, the more likely it is that the customer is dead. Looking down the columns, however, we see a somewhat different pattern. We first look at 2001 and note that the conditional expectation is clearly an increasing function of the number of repeat transactions made in the six-year calibration period. Looking at the 1997-2000 columns, though, we note that the numbers first increase, then decrease as the number of repeat transactions made in the six-year calibration period decreases. (A similar pattern is observed in the DERT numbers under the Pareto/NBD model reported in Fader et al. 2005.)

To help understand why this is the case, we use (11) and (17) to compute P(alive in 2002) and the mean
Figure 6 Predicted vs. Actual Conditional Expectations of Repeat Transactions in 2002-2006 as a Function of (a) Frequency and (b) Recency
[Line charts; vertical axis: no. of repeat transactions (2002-2006), 0 to 4; series: Actual, Model. Panel (a): by no. of repeat transactions (1996-2001), 0 to 6. Panel (b): by year of last transaction, 1995 to 2001.]

Table 6 P(Alive in 2002) as a Function of Recency and Frequency

No. of rpt transactions    Year of last transaction
(1996-2001)                1995   1996   1997   1998   1999   2000   2001
0                          0.11
1                                 0.07   0.25   0.48   0.68   0.83   0.93
2                                        0.07   0.30   0.59   0.80   0.93
3                                               0.10   0.44   0.77   0.93
4                                                      0.20   0.70   0.93
5                                                             0.52   0.93
6                                                                    0.93
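Both P(alive in 2002) and the posterior mean of P can be written as ratios involving the pattern likelihood. The sketch below is a hedged illustration rather than the computation actually used for the tables: it assumes that P(alive) refers to being alive at opportunity n + 1, and it obtains the posterior mean of P by re-evaluating the likelihood with α replaced by α + 1. Both forms are our reading of the model and should be checked against (11) and (17). All function names are illustrative.

```python
import math

def log_beta(a, b):
    """ln B(a, b), computed from log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def bgbb_likelihood(alpha, beta, gamma, delta, x, t_x, n):
    """Likelihood of one (x, t_x, n) pattern under the BG/BB model."""
    log_b_ab = log_beta(alpha, beta)
    log_b_gd = log_beta(gamma, delta)
    total = math.exp(log_beta(alpha + x, beta + n - x) - log_b_ab
                     + log_beta(gamma, delta + n) - log_b_gd)
    for i in range(n - t_x):
        total += math.exp(log_beta(alpha + x, beta + t_x - x + i) - log_b_ab
                          + log_beta(gamma + 1, delta + t_x + i) - log_b_gd)
    return total

def prob_alive_next_period(alpha, beta, gamma, delta, x, t_x, n):
    """P(alive at opportunity n + 1 | x, t_x, n)."""
    # Joint probability of the observed pattern and surviving n + 1 death draws.
    alive = math.exp(log_beta(alpha + x, beta + n - x) - log_beta(alpha, beta)
                     + log_beta(gamma, delta + n + 1) - log_beta(gamma, delta))
    return alive / bgbb_likelihood(alpha, beta, gamma, delta, x, t_x, n)

def posterior_mean_p(alpha, beta, gamma, delta, x, t_x, n):
    """E(P | x, t_x, n), using E(P) * L(alpha + 1, ...) / L(alpha, ...)."""
    ratio = (bgbb_likelihood(alpha + 1.0, beta, gamma, delta, x, t_x, n)
             / bgbb_likelihood(alpha, beta, gamma, delta, x, t_x, n))
    return alpha / (alpha + beta) * ratio

params = (1.204, 0.750, 0.657, 2.783)   # alpha, beta, gamma, delta (Table 4)
for x, t_x in [(0, 0), (3, 3), (3, 6), (5, 5), (6, 6)]:
    print(x, 1995 + t_x,
          round(prob_alive_next_period(*params, x, t_x, 6), 2),
          round(posterior_mean_p(*params, x, t_x, 6), 2))
```

For any pattern with t_x = n the sum in the likelihood is empty, so P(alive) reduces to (δ + n)/(γ + δ + n) regardless of x, which is why the 2001 column of Table 6 is constant.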
of the marginal posterior distribution of P as a function of recency and frequency. The combination of the patterns we shall see in these two tables provides an explanation for this somewhat surprising pattern of conditional expectations.

Let us first consider the probability that a customer is alive in 2002; see Table 6. Looking across the columns for any value of x, the observed pattern is as would be expected, with a lower probability of being alive the longer the hiatus in making a donation. Taking a columnwise view, the first thing to note is that all customers who made a transaction in 2001 have the same probability of being alive the following year, regardless of the number of repeat transactions they had prior to that year; this is a natural consequence of the Bernoulli death process. Looking at the 1997-2000 columns, we note that the numbers increase as the number of repeat transactions made in the six-year calibration period decreases. The logic behind this is as follows: looking at the 2000 column, those customers who made only one repeat transaction will have a lower value of p than those who have made a repeat purchase in all five years, and therefore the fact that no transaction occurred in 2001 can be attributed more to their low probability of making a purchase in any given year than to the possibility of them being dead.

Table 7 reports the mean of the marginal posterior distribution of P. Looking at this table column by column, we see that the posterior mean increases as a function of the number of repeat transactions in the calibration period for any given value of recency. This is intuitive: a smaller number of repeat transactions reflects a lower underlying probability of purchasing at any given transaction opportunity (assuming one is alive). Perhaps less immediately intuitive is the within-row pattern: for a given level of frequency, the underlying probability of purchasing at any given transaction opportunity increases as recency decreases. The reason for this is that, other things being equal, the longer the hiatus since the last transaction, the more likely it is that the customer is dead, and therefore the individual must have had a higher p in order to have the realized number of transactions while alive.

Table 7 Posterior Mean of P as a Function of Recency and Frequency

No. of rpt transactions    Year of last transaction
(1996-2001)                1995   1996   1997   1998   1999   2000   2001
0                          0.49
1                                 0.66   0.44   0.34   0.30   0.28   0.28
2                                        0.75   0.54   0.44   0.41   0.40
3                                               0.80   0.61   0.54   0.53
4                                                      0.82   0.68   0.65
5                                                             0.83   0.78
6                                                                    0.91

Further insights can be obtained by looking at the marginal posterior distributions of P and Θ, (15) and (16). With reference to Figure 7(a), the prior is the plot of a beta distribution with parameters α = 1.204 and β = 0.750; the overall mean of P across the whole sample is 0.62. With reference to Figure 7(b),
Figure 7 Prior and Selected Posterior Distributions of (a) P and (b) Θ
[Density plots over the unit interval. Panel (a): f(p) against p, showing the prior (E(P) = 0.62), the posterior for x = 3, t_x = 3 (1998) (E(P) = 0.80), and the posterior for x = 3, t_x = 6 (2001) (E(P) = 0.53). Panel (b): f(θ) against θ for the same three cases.]

Table 8 Probability of Being Active in 2002-2006 as a Function of Recency and Frequency

No. of rpt transactions    Year of last transaction
(1996-2001)                1995   1996   1997   1998   1999   2000   2001
0                          0.05
1                                 0.05   0.17   0.32   0.46   0.56   0.62
2                                        0.05   0.24   0.48   0.66   0.76
3                                               0.09   0.40   0.69   0.84
4                                                      0.19   0.66   0.88
5                                                             0.51   0.91
6                                                                    0.92
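The probability of being active reported in Table 8 can be approximated by working directly from the model's assumptions. The sketch below does not transcribe (12). Instead, it derives the probability of zero transactions in the next n* opportunities by conditioning on when (if ever) an alive customer dies, which is an assumption about how the result can be organized rather than a statement of the original derivation. Function names are illustrative.

```python
import math

def log_beta(a, b):
    """ln B(a, b), computed from log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def bgbb_likelihood(alpha, beta, gamma, delta, x, t_x, n):
    """Likelihood of one (x, t_x, n) pattern under the BG/BB model."""
    log_b_ab = log_beta(alpha, beta)
    log_b_gd = log_beta(gamma, delta)
    total = math.exp(log_beta(alpha + x, beta + n - x) - log_b_ab
                     + log_beta(gamma, delta + n) - log_b_gd)
    for i in range(n - t_x):
        total += math.exp(log_beta(alpha + x, beta + t_x - x + i) - log_b_ab
                          + log_beta(gamma + 1, delta + t_x + i) - log_b_gd)
    return total

def prob_active(alpha, beta, gamma, delta, x, t_x, n, n_star):
    """P(at least one transaction in n+1, ..., n+n_star | x, t_x, n)."""
    log_b_ab = log_beta(alpha, beta)
    log_b_gd = log_beta(gamma, delta)
    # Observed pattern and alive through opportunity n.
    alive = math.exp(log_beta(alpha + x, beta + n - x) - log_b_ab
                     + log_beta(gamma, delta + n) - log_b_gd)
    # Alive through n, no purchase for j - 1 further periods, then death.
    silent = 0.0
    for j in range(1, n_star + 1):
        silent += math.exp(
            log_beta(alpha + x, beta + n - x + j - 1) - log_b_ab
            + log_beta(gamma + 1, delta + n + j - 1) - log_b_gd)
    # Alive throughout the forecast window but never purchasing.
    silent += math.exp(log_beta(alpha + x, beta + n - x + n_star) - log_b_ab
                       + log_beta(gamma, delta + n + n_star) - log_b_gd)
    return (alive - silent) / bgbb_likelihood(alpha, beta, gamma, delta, x, t_x, n)

params = (1.204, 0.750, 0.657, 2.783)   # alpha, beta, gamma, delta (Table 4)
for x, t_x in [(0, 0), (5, 5), (1, 6), (6, 6)]:
    print(x, 1995 + t_x, round(prob_active(*params, x, t_x, 6, 5), 2))
```

A customer who died before the end of the calibration period contributes nothing to the numerator, so the result is bounded above by the probability of being alive at n.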
the prior is the plot of a beta distribution with parameters γ = 0.657 and δ = 2.783; the overall mean of Θ across the whole sample is 0.19. The posterior distribution of P for an individual who made three consecutive repeat purchases with the last one in 1998 has most of its mass to the right; the observed sequence of purchases reflects the high mean of this distribution (E(P) = 0.80). At the same time, the three-year hiatus suggests that the supporter is dead as a result of their θ coming from a posterior distribution with an interior mode and with E(Θ) = 0.20. On the other hand, someone who made three repeat purchases with the last one in 2001 had to be alive over the whole period, which is a result of their θ coming from a beta distribution with most of its mass piled to the left, with E(Θ) = 0.07. The fact that transactions did not occur in three of the six years reflects the fact that their p comes from a distribution with a lower mean (E(P) = 0.53).

These relationships between P and Θ suggest that there may be some correlation in the joint posterior distribution (despite the fact we assume independent priors). This is indeed the case, and we explore it with two analyses in Appendix B. (We discuss a model with correlated priors in §5.)

3.1.2. Conditional Penetration. Ever since the publication of Schmittlein et al. (1987), researchers have shown interest in the P(alive) measure. Although we have reported this quantity as a means of understanding patterns of conditional expectations, we feel that the measure is of limited diagnostic value when viewed by itself. It is a prediction of something that is, by definition, unobservable (i.e., whether or not a customer is still alive at a particular point in time), and thus it is impossible to directly assess its validity. A useful companion measure is a prediction of whether or not the customer will be active in the future, that is, whether or not the customer undertakes any transactions in a specified future period of time.^5

^5 Many authors, including Schmittlein et al. (1987), have used the terms "alive" and "active" as synonyms. We feel that this should not be the case, with the term "alive" referring to an unobservable state and the term "active" referring to observable behavior.

The probability that a customer is active in the 2002-2006 period (n* = 5) is computed as 1 - P(X(n, n + n*) = 0 | x, t_x, n) using (12), conditional on each of the 22 (x, t_x) patterns associated with n = 6. This conditional penetration is reported in Table 8 as a function of recency (the year of the individual's last transaction) and frequency (the number of repeat transactions).

Comparing Tables 6 and 8, we note that the estimated probabilities of being alive in 2002 are strictly higher than the corresponding conditional 2002-2006 penetration numbers. This makes intuitive sense, but the differences between these measures reflect several factors. First, the P(alive) numbers are just for one year, whereas the penetration numbers are for a five-year period. Second, the mere fact that someone is alive does not mean she will be active, because the latter state depends on the person's underlying transaction probability p. This is very clear when we look at the rightmost column of both tables. Although those people who made a purchase in 2001 have the same probability of being alive, irrespective of frequency, their corresponding probabilities of making at least one transaction in the next five years clearly (and logically) increase as a function of frequency, reflecting
in part the associated probabilities of making a purchase at any given transaction opportunity given alive (Table 7). Third, the lower penetration numbers also reflect the fact that inactivity may be due to the person dying in 2003-2006, even if they had been alive in 2002.

In summary, we encourage researchers who might be attracted by the P(alive) measure to also utilize the conditional penetration numbers, because they reflect an observable quantity (i.e., whether or not the customer is active).

3.2. Pooled Analysis
The analyses presented above all focused on a single cohort, the group of individuals who made their first-ever donation during 1995. However, as noted earlier, we have data for a total of six cohorts. At first glance we may be tempted to apply the model cohort by cohort; unfortunately, we are not able to estimate a complete set of cohort-specific parameters. Consider, for instance, the 2000 cohort: we only have one observation per customer (whether or not each new donor made a repeat donation in 2001, i.e., n = 1) and as such cannot identify the model parameters. The obvious, albeit possibly restrictive, solution is to pool all six cohorts and estimate a single set of model parameters. We now turn our attention to such an analysis, examining how well the BG/BB model predicts the behavior of the complete group of the 56,847 people who made their first-ever donation to the organization between 1995 and 2000.

The maximum-likelihood estimates of the model parameters are reported in Table 9. (Comparing the fit of the BG/BB model with that of the beta-Bernoulli model, we once again note that the addition of the death component results in a major improvement in model fit.) We also note that the BG/BB parameters for the pooled model are remarkably similar to those of the 1995 cohort by itself (Table 4); this reflects both the high reliability of the model as well as the poolability of the cohorts. Figure 8, which compares the expected number of people making 0, 1, ..., 6 repeat transactions between 1996 and 2001 with the observed frequencies, confirms that the model provides a very good fit to the data.

Table 9 Parameter Estimates, Pooling the 1995-2000 Cohorts

          α       β       γ       δ       LL
BB        0.501   0.753                   -115,615.0
BG/BB     1.188   0.749   0.626   2.331   -110,521.0

Figure 8 Predicted vs. Actual Frequency of Repeat Transactions by the 1995-2000 Cohorts
[Bar chart. Horizontal axis: no. of repeat transactions (0 to 6); vertical axis: no. of people (0 to 25,000); series: Actual, Model.]

Figure 9 Predicted vs. Actual (a) Cumulative and (b) Annual Repeat Transactions
[Line charts by year, 1996 to 2006; series: Actual, Model. Panel (a): cumulative, 0 to 160,000. Panel (b): annual, 0 to 20,000.]

The pooled model continues to accurately track the actual number of repeat transactions over time. Viewing Figure 9(a), which shows the actual versus predicted cumulative number of repeat transactions, we see that the model overforecasts the holdout transactions by a mere 0.25%. Looking at Figure 9(b), which reports these numbers on a year-by-year basis, we note that the BG/BB model clearly captures the underlying trend in repeat transactions. (The repeat transaction numbers rise up to 2001 as new supporters continue to enter the combined pool of donors; after that point, we are focusing on a fixed group of
56,847 potential repeat supporters.) The conditional expectation plots, omitted in the interests of space, are similarly impressive.

This pooled analysis provides a further illustration of the remarkable ability of the BG/BB model to describe and predict the future behavior of a customer base. It is encouraging to see how one set of parameters can capture the behavior of different cohorts acquired across six consecutive years (1995-2000) and project their actions quite accurately into the future.

4. Comparison with the Pareto/NBD Model
Our empirical analysis has focused on the number of repeat transactions. The alert reader will have questioned our use of the term "transactions" because this is not a "necessarily discrete" setting (Figure 1). Strictly speaking, we have been modeling whether or not the supporter has made any donation to the organization each year; we have ignored the fact that some supporters may make more than one donation in a given year.

We feel that such an approach is perfectly appropriate for two reasons. First, the majority of the supporter base (71%) made only one donation for each of the years during which a transaction occurred. Second, this is the way the nonprofit organization thinks about its donor base; they focus more on whether or not each person has made a donation in any given year (0/1), not as much on the number of donations made. Thus, the 0/1 indicator is the primary behavioral measure recorded in the database provided to us (just as it was for Netzer et al. 2008).

Nevertheless, the fact that 29% of the supporter base made more than one donation in at least one of the years during which a transaction occurred may lead some to argue that we should be modeling the number of donations over time rather than annual incidence; the natural model to use for such an approach to the data would be the Pareto/NBD.

Returning to the 1995 cohort, we obtained data on the number of repeat donations made by each supporter within each year (i.e., the binary string characterization of behavior is replaced by a string of nonnegative integers). Given the interval-censored nature of these data, we estimate the parameters of the Pareto/NBD model using the likelihood function given in Fader and Hardie (2005).^6

^6 The parameter estimates are r = 11.419, α = 12.865, s = 0.129, and β = 0.013, with LL = -44,506.6.

The expected number of people making 0, 1, 2, ... repeat donations between 1996 and 2001 is compared to the actual frequency distribution in Figure 10. In contrast to the performance we normally expect from the Pareto/NBD model (e.g., Fader et al. 2005), we note that the Pareto/NBD provides a poor fit to the observed donation data.

Figure 10 Comparing the Number of Repeat Donations as Predicted by the Pareto/NBD Model with the Actual Numbers
[Bar chart. Horizontal axis: no. of repeat donations (0 to 10+); vertical axis: no. of people (0 to 5,000); series: Actual, Pareto/NBD.]

Another test of the Pareto/NBD as a model of the donation process is to estimate the implied flow of annual transactions (i.e., annual incidence) and then examine how well the model captures and predicts the observed transaction patterns. The expected number of people making 0, 1, ..., 6 repeat transactions between 1996 and 2001 is compared to the actual frequency distribution in Figure 11. In contrast to the fit observed for the BG/BB model in Figure 3, we see that the Pareto/NBD fails to capture the observed annual incidence of donations.

Figure 11 Comparing the Number of Repeat Transactions (i.e., Annual Incidence) as Predicted by the Pareto/NBD Model with the Actual Numbers
[Bar chart. Horizontal axis: no. of repeat transactions (0 to 6); vertical axis: no. of people (0 to 5,000); series: Actual, Pareto/NBD.]

We can also examine how well the model tracks repeat transactions over time, both cumulatively (Figure 12(a)) and year by year (Figure 12(b)). In contrast to the equivalent plots for the BG/BB model (Figures 4(a) and 4(b), respectively), we see that Pareto/NBD fails to track the actual data. The initial
underprediction follows naturally from the overestimation of the number of people making zero donations between 1996 and 2001. We also note that the Pareto/NBD fails to capture the overall rate of decline in transactions over time.

Figure 12 Predicted vs. Actual (a) Cumulative and (b) Annual Repeat Transactions
[Line charts by year, 1996 to 2006; series: Actual, Pareto/NBD. Panel (a): cumulative, 0 to 40,000. Panel (b): annual, 0 to 6,000.]

Finally, we examine how well the BG/BB and Pareto/NBD models track (and predict) the evolution of the number of cohort members that ever make a repeat transaction; see Figure 13. Once again we see a strong performance by the BG/BB model and a poor performance by the Pareto/NBD model.

Figure 13 Comparing the Number of Ever-Repeaters as Predicted by the BG/BB and Pareto/NBD Models with the Actual Number
[Line chart by year, 1996 to 2006; vertical axis: no. of ever-repeaters (0 to 8,000); series: Actual, BG/BB, Pareto/NBD.]

To summarize, this analysis has demonstrated that the Pareto/NBD model fails to capture the flow of donations. Treating the data as discrete (even though the underlying process is not "necessarily discrete") and modeling the flow of transactions (i.e., incidence, rather than the overall number within each discrete time interval) using the BG/BB model is clearly superior.

Why does the Pareto/NBD perform so poorly in this case? The assumption of exponential interpurchase times between donations (which yields the Poisson count model) is a dubious one in this setting. Donations are made too regularly (e.g., in December of each year) to be accommodated by the memorylessness of the exponential/Poisson. Consider, for example, the 1,203 customers who made a donation every year (Table 2). An individual-level Poisson model would take such a high donation rate and (because of its equi-dispersion property) would predict a fairly large number of years with multiple donations. However, each of these customers made, on average, a total of only 1.3 donations per year across the calibration period. The Pareto/NBD simply cannot cope with such a low level of persistent behavior. Schmittlein et al. (1987, p. 17) explicitly acknowledged this limitation as well: "For processes like church attendance and television viewing the opportunities for a transaction occur regularly, so our model is inappropriate." In contrast, directly modeling annual incidence (as opposed to continuous-time purchasing) as a memoryless process (while the customer is alive) is a much more reasonable approach.

5. Extending the Basic Model
Of all the assumptions associated with the BG/BB model, the one that many readers will have the most problem with is Assumption (6), that the transaction probability p and the dropout probability θ vary independently across customers. This is not nearly as restrictive as it may seem; more formally, we are assuming independent priors, which does not imply independence in the joint posterior distribution of P and Θ. (In fact, we can see some fairly strong correlations in the posterior distributions; see Appendix B.) Nevertheless, we now relax this assumption.

An extremely attractive consequence of Assumptions (4)-(6) (i.e., independent beta-mixing distributions) is that we arrive at simple analytical expressions for all the model quantities of interest, which greatly reduces the barriers to model implementation (e.g., being able to perform all the analysis in an Excel spreadsheet). Ideally, we would like to be able to relax
the independence assumption without losing the ability to derive simple analytical expressions.

The Sarmanov family of distributions, as introduced to marketing by Park and Fader (2004), is a natural starting point, because it allows us to create bivariate distributions with specified marginals. However, a problem with the Sarmanov approach is that the range of its correlation coefficients is narrower than (-1, 1) and is a function of the parameters of the marginal distributions. When we relax Assumption (6) via the bivariate beta distribution used by Danaher and Hardie (2005), we find that the distribution is too constraining (i.e., the estimate of the correlation reaches the limits imposed by the estimated parameters of the marginal beta distributions).

We therefore consider the more flexible S_BB distribution (Johnson 1949), also known as the logit-normal distribution; that is,

\[
\begin{pmatrix} \operatorname{logit}(p) \\ \operatorname{logit}(\theta) \end{pmatrix}
\sim \mathrm{MVN}\!\left(
\begin{pmatrix} \mu_P \\ \mu_\Theta \end{pmatrix},
\begin{pmatrix} \sigma_P^2 & \sigma_{P\Theta} \\ \sigma_{P\Theta} & \sigma_\Theta^2 \end{pmatrix}
\right).
\]

Because the individual-level process has not changed, the likelihood function for a randomly chosen customer is obtained by taking the expectation of (4) over the joint distribution of P and Θ:

\[
L(\mu_P, \mu_\Theta, \sigma_P^2, \sigma_\Theta^2, \sigma_{P\Theta} \mid x, t_x, n)
= \int_0^1 \!\! \int_0^1 L(p, \theta \mid x, t_x, n)\,
f(p, \theta \mid \mu_P, \mu_\Theta, \sigma_P^2, \sigma_\Theta^2, \sigma_{P\Theta})\, dp\, d\theta .
\]

The major downside of using this distribution is that there is no analytic solution to this double integral. We therefore evaluate the integrals using Monte Carlo simulation; that is, we estimate the model parameters using the method of maximum simulated likelihood (making use of MATLAB). We call this the S_BB-G/B model.

We first estimate a constrained version of the model in which p and θ are assumed to be uncorrelated. With reference to Table 10, we see that model fit is almost identical to that of the original BG/BB model. The associated moments in the (P, Θ) space are also very close to those associated with the BG/BB model. Allowing for a correlation results in a significant improvement in model fit: an increase of 15 log-likelihood points at the cost of one extra parameter. The estimated (prior) correlation between P and Θ is 0.361 (versus the limit of 0.042 associated with using a Sarmanov bivariate beta distribution).

Table 10 Results of the Model That Replaces Independent Beta-Mixing Distributions with an S_BB Distribution for Heterogeneity in P and Θ

                                      S_BB heterogeneity
                          BG/BB       Uncorr      Corr
Parameter estimates
  μ_P                                 0.720       1.119
  μ_Θ                                 -1.993      -2.145
  σ_P^2                               3.178       3.869
  σ_Θ^2                               2.219       4.020
  σ_PΘ                                            1.774
  LL                      -33,225.6   -33,225.7   -33,210.7
Moments in (P, Θ) space
  E(P)                    0.616       0.614       0.666
  var(P)                  0.080       0.082       0.084
  E(Θ)                    0.191       0.189       0.209
  var(Θ)                  0.035       0.037       0.058
  corr(P, Θ)                                      0.361

Figure 14 Comparing Predicted (a) Cumulative and (b) Annual Repeat Transactions from the BG/BB and S_BB-G/B Models vs. Actual
[Line charts by year, 1996 to 2006; series: Actual, BG/BB, S_BB-G/B. Panel (a): cumulative, 0 to 40,000. Panel (b): annual, 0 to 6,000.]

The big question is whether this improvement in model fit leads to any meaningful improvement in the associated predictions. We first consider how well it tracks aggregate repeat transactions over time. The cumulative and year-by-year numbers are plotted in Figure 14. We note that the differences in the predictions associated with the BG/BB and S_BB-G/B models are negligible. However, when we look at the distribution of holdout period transactions (Figure 15), it is clear that the S_BB-G/B model provides a better prediction of the distribution than the already
Figure 15 Predicted (from the BG/BB and S_BB-G/B Models) vs. Actual Frequency of Repeat Transactions in 2002-2006
[Bar chart. Horizontal axis: no. of repeat transactions (0 to 5); vertical axis: no. of people (0 to 7,000); series: Actual, BG/BB, S_BB-G/B.]

Table 11 Expected Number of Repeat Transactions in 2002-2006 as a Function of Recency and Frequency, as Predicted by the S_BB-G/B Model

No. of rpt transactions    Year of last transaction
(1996-2001)                1995   1996   1997   1998   1999   2000   2001
0                          0.10
1                                 0.10   0.44   0.75   0.93   1.04   1.11
2                                        0.12   0.66   1.21   1.52   1.68
3                                               0.22   1.15   1.93   2.24
4                                                      0.56   2.12   2.78
5                                                             1.78   3.26
6                                                                    3.59
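Because the S_BB-G/B likelihood has no closed form, every likelihood evaluation behind these results rests on simulation. The sketch below illustrates the idea in Python rather than MATLAB and is not the authors' implementation. It draws (p, θ) pairs whose logits are bivariate normal, using a hand-written 2 x 2 Cholesky factor, and averages the individual-level likelihood over the draws. The number of draws, the seed, and all function names are assumptions made for illustration; the parameter values are the "Corr" estimates from Table 10.

```python
import math
import random

def inv_logit(u):
    """Map a real number to the unit interval."""
    return 1.0 / (1.0 + math.exp(-u))

def draw_p_theta(mu_p, mu_t, var_p, var_t, cov_pt, draws, seed=0):
    """Draw (p, theta) pairs whose logits are bivariate normal."""
    rng = random.Random(seed)
    sd_p = math.sqrt(var_p)
    slope = cov_pt / sd_p                      # off-diagonal Cholesky entry
    resid_sd = math.sqrt(var_t - slope ** 2)   # second diagonal Cholesky entry
    pairs = []
    for _ in range(draws):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((inv_logit(mu_p + sd_p * z1),
                      inv_logit(mu_t + slope * z1 + resid_sd * z2)))
    return pairs

def individual_likelihood(p, theta, x, t_x, n):
    """Likelihood of (x, t_x, n) for a customer with known p and theta."""
    # Alive through all n opportunities.
    total = p ** x * (1 - p) ** (n - x) * (1 - theta) ** n
    # Survived t_x + i opportunities, then died.
    for i in range(n - t_x):
        total += (p ** x * (1 - p) ** (t_x - x + i)
                  * theta * (1 - theta) ** (t_x + i))
    return total

def simulated_likelihood(mu_p, mu_t, var_p, var_t, cov_pt, x, t_x, n,
                         draws=20000, seed=0):
    """Monte Carlo average of the individual-level likelihood."""
    pairs = draw_p_theta(mu_p, mu_t, var_p, var_t, cov_pt, draws, seed)
    return sum(individual_likelihood(p, th, x, t_x, n) for p, th in pairs) / draws

corr_params = (1.119, -2.145, 3.869, 4.020, 1.774)   # Table 10, "Corr" column
print(simulated_likelihood(*corr_params, 0, 0, 6))
pairs = draw_p_theta(*corr_params, draws=20000)
print(sum(p for p, _ in pairs) / len(pairs))          # simulated E(P)
```

In a maximum simulated likelihood routine the same random draws would normally be reused across parameter values (here via the fixed seed) so that the simulated objective is a smooth function of the parameters.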
excellent prediction associated with the BG/BB model.^7

^7 Assessing the relative fit using the chi-squared goodness-of-fit measure, we note that it reduces from 47.9 for the BG/BB model to 4.8 for the S_BB-G/B model.

Turning our attention to the conditional expectations, we first look at the expected number of transactions in the 2002-2006 period (n* = 5) conditional on each of the 22 (x, t_x) patterns associated with n = 6. These conditional expectations are reported in Table 11; they are the S_BB-G/B model equivalents of the numbers reported in Table 5. We note that these conditional expectations are highly correlated with those associated with the BG/BB model (r = 0.997).

To compare these predictions with those associated with the BG/BB model, we report in Figure 16(a) the two sets of conditional expectations along with the average number of transactions that actually occurred in the 2002-2006 forecast period, broken down by the number of repeat transactions in 1996-2001. (As in Figure 6(a), we are averaging over customers with different values of t_x for each x.) Similarly, Figure 16(b) reports the two sets of conditional expectations along with the average number of transactions broken down by the year of the individual's last transaction. (For each t_x, we are averaging over customers with different values of x.) For the most part, the predictions from the two models are very close. Nevertheless, there are some noticeable differences (e.g., a donor who made a repeat transaction every year in the calibration period is expected to make 3.59 transactions over the subsequent five years according to the S_BB-G/B model, versus 3.75 under the BG/BB).

Figure 16 Predicted (from the BG/BB and S_BB-G/B Models) vs. Actual Conditional Expectations of Repeat Transactions in 2002-2006 as a Function of (a) Frequency and (b) Recency
[Line charts; vertical axis 0 to 4; series: Actual, BG/BB, S_BB-G/B. Panel (a): by no. of repeat transactions (1996-2001), 0 to 6. Panel (b): by year of last transaction, 1995 to 2001.]

In conclusion, we find that, at least for this empirical setting, there is a significant (prior) correlation between the transaction and dropout probabilities; that is, Assumption (6) is violated. However, relaxing this assumption comes at a cost. Whereas the basic BG/BB model can be implemented in Excel, the S_BB-G/B model requires a less accessible computing environment (e.g., MATLAB). Although allowing for this correlation does lead to some improvements in the model's predictive performance, the numbers are sufficiently similar for us to conclude that the cost-benefit
trade-off is not immediately obvious. We will revisit this issue in the following section.

6. Discussion
We have developed a new model that can be used to answer standard customer-base analysis questions in noncontractual settings where opportunities for transactions occur at discrete intervals. Using a data set on annual donations made by the supporters of a nonprofit organization located in the midwestern United States, we have demonstrated how the model can be used to compute a number of managerially relevant quantities such as future purchasing patterns, both collectively and individually (conditional on past behavior). In examining these quantities we have observed some interesting effects of past behavior (as summarized by recency and frequency) on predictions about future behavior.

The contractual versus noncontractual distinction that lies at the heart of this work is very similar to Jackson's (1985a, b) "lost-for-good" versus "always-a-share" framework. Rust et al. (2004) observe that such a distinction is important, because the estimates of CLV generated by applying a lost-for-good model to data best characterized by the always-a-share assumption will systematically underestimate true CLV. In a discrete-time always-a-share setting, the BB is the natural benchmark model for purchasing from the firm. However, as shown earlier, it substantially overforecasts cumulative repeat transactions; it fails to capture the leakage of customers over time typically observed in an always-a-share setting, as also observed by East and Hammond (1996). By allowing for an unobserved death component, the BG/BB can be viewed as a "leaky" version of an always-a-share model.

Various benefits associated with the BG/BB have been mentioned throughout this paper, and we summarize them here.

• The BG/BB offers tremendous advantages in terms of the required data structures. The size of the data summary required for model estimation is purely a function of the number of transaction opportunities, not the number of customers, and therefore the model is highly scalable to customer bases of different sizes. Furthermore, in recognizing that recency and frequency are sufficient summary statistics, the relationship between the number of transaction opportunities and the size of the data set is on the order of n^2, which is a significant reduction compared to using the full binary strings (order 2^n).

• Besides the efficient data requirements, the calculations associated with the model are much simpler than those of the Pareto/NBD. No unconventional or computationally demanding functions are required for parameter estimation or for most of the diagnostic statistics that emerge from the model. Taken together with the aforementioned data advantages, this means that the model is easy to fully implement and utilize within a standard spreadsheet environment, as illustrated in Figure 2. This is very appealing to practitioners, because this reduction in space/effort can be accomplished at virtually no cost (i.e., without sacrificing anything in model performance, as shown in our empirical analyses).

• Pragmatic considerations aside, we see that the Pareto/NBD can fail to capture the flow of donations, be it the actual number or annual incidence. We suspect that there are many settings (particularly when periodic transactions tend to occur during a relatively limited range of time) when these shortcomings of the Pareto/NBD will be quite evident.

• The discrete nature of the data and the associated
As we mentioned from the outset of this paper, behavioral story lead to model diagnostics that are
the BG/BB is the direct analog of the Pareto/NBD convenient to display and are readily interpretable.
as one moves from a continuous-time setting to a For instance, it is very easy to see and appreciate the
discrete-time domain. We have brought up a number nonlinear pattern associated with high frequency and
of specic examples where this distinction is critically low recency, shown in Table 5. Likewise, a simple
important, as well as some situations (characterized examination of that table instantly answers the man-
as discretized by recording process in Figure 1) where agerial questions raised in the introduction.
the analyst might intentionally convert a continuous- Finally, it is relatively easy to build and ana-
time setting into a discrete-time one, primarily to lyze the BG/BB model across multiple cohorts of
be able to use the BG/BB model instead of the customerssomething that has been done rarely
Pareto/NBD. We are aware of several organizations (if ever) in the Pareto/NBD literature. Not only does
(including hotel chains, nancial services rms, and this make the model even more practical, but the
a variety of nonprots) that have chosen to focus multiyear empirical results shown here offer much
on discretized data, either on their own (such as stronger support for the models validity than a
the organization that provided the data used here) single-cohort analysis can provide.
or specically to utilize the BG/BB framework. The Although the BG/BB is an excellent starting point
fact that they have approached their data manage- for modeling discrete-time noncontractual data, there
ment/analysis in such a manner is an indication of are several natural extensions worth investigating
the direct applicability of this new model. in future research. First, as is the case with the
Pareto/NBD model, the BG/BB model will need to be augmented by a model of purchase amounts when we are interested in the overall monetary value of each customer. A natural candidate would be the gamma-gamma mixture (Colombo and Jiang 1999) that Fader et al. (2005) use in conjunction with the Pareto/NBD model. In situations (such as the data set used here) that are not necessarily discrete and where there is the possibility that more than one transaction could occur in each discrete-time interval, we should derive the monetary-value multiplier by first modeling the number of transactions (conditional on the fact that at least one transaction occurred) and then multiply this by the average value per transaction. A logical model would be the shifted beta-geometric distribution (as used by Morrison and Perry 1970 to model purchase quantity, conditional on purchase incidence).

Second, we may want to allow for a non-zero-order purchasing process at the individual level. A good historical starting point would be the Brand Loyal Model (Massy et al. 1970). This would effectively be an extension of the Markov chain model of retail customer behavior at Merrill Lynch by Morrison et al. (1982), an extension in which the exit parameter is allowed to be heterogeneous and is estimated directly from the data (as opposed to being derived from other data sources).

The research presented in this paper is clearly anchored in the "probability models for customer-base analysis" tradition, of which the Pareto/NBD is a central model. As Fader and Hardie (2009) note, this stream of research uses combinations of basic probability distributions to develop simple models of customer behavior that can be used to make predictions of future behavior conditional on customers' past behavior. It is perhaps useful to reflect on how this fits within the broader customer profitability/CLV/customer equity literature, as exemplified by a number of top managerially oriented books (e.g., Blattberg et al. 2001, Gupta and Lehmann 2005, Kumar 2008, Rust et al. 2000) and the large academic literature (e.g., as reviewed in Blattberg et al. 2008), especially in light of the fact that the effects of factors such as marketing activities are completely ignored.

If one takes an evolutionary model-building view of embedding analytics in an organization (Urban and Karash 1971), models such as the BG/BB represent a natural first step. These models can be implemented by an organization at very low cost. For example, no new software is required and the model can be coded up in a blank spreadsheet in a matter of minutes; furthermore, the data requirements are minimal and do not require the merging of databases, as is typically the case when wanting to incorporate the effects of marketing activities, assuming such data are readily available in the first case.⁸ If some of the underlying modeling assumptions are unappealing (e.g., the assumption of independence between the transaction and dropout probabilities), we can create a "version 2.0" of the model that comes at some increased computational cost.

⁸ In the nonprofit example considered in this paper, we know that marketing activities were undertaken but the data were not available. There was no indication that these activities were customized at the donor level.

Implicit in these basic models is the assumption that future marketing activities will be basically the same as past marketing activities. The impressive predictive performance of the BG/BB model suggests that this is not an overly restrictive assumption. If there has been some customization of marketing activities on the basis of outputs generated from this model (e.g., after scoring the customer database on the basis of P(alive) or the conditional expectations), then all we would need to do is reestimate the model on an updated data set when it is time to apply the model again in the future. (Given that this can be done in Excel, such reestimation comes at very low cost.) Furthermore, the forecasts generated by the model provide a natural (and low-cost) baseline for examining the performance of the customized marketing activities.

Beyond efforts to use the BG/BB for customized marketing activities, a similar iterative approach can be applied to better understand other kinds of time-varying marketing activities. In ongoing field applications of the model, we encourage organizations to rerun the model on a periodic basis to try to detect notable deviations from its baseline predictions, as well as to make inferences about the changing nature of the underlying "buy" and "die" processes. Likewise, we encourage organizations to run the model separately for different cohorts of customers, e.g., based on their date and/or channel of acquisition. It is often possible to detect systematic shifts across these incoming customer groups, which can help refine expectations and acquisition tactics for newly acquired customers. Although these efforts admittedly fall short of a full-blown optimization strategy, they help organizations gain a much better feel for the evolving patterns of their customer base and the effectiveness of their marketing efforts.

As this kind of analytics culture gets embedded into a marketing organization, we can expect managers to begin to ask deeper kinds of "what-if" and resource allocation questions tied to marketing variables. Assuming all the data are readily available in the organization, it is possible to develop models that incorporate these effects (e.g., Kumar et al. 2008; also see the review by Blattberg et al. 2009). As covariates
are incorporated, data structures and model estimation issues become more complex. To the extent that customers have been targeted with different marketing activities on the basis of their past behavior, we must also account for endogeneity. This is clearly a major step up the evolutionary ladder of marketing analytics in the organization. We feel that it is important that any organization embarking on such a journey should learn to walk before they can run, and the BG/BB seems to be a solid way to start the journey.

Acknowledgments
The authors thank the anonymous nonprofit organization for making the data set available, Paul Berger for his extensive input into an earlier version of this paper, and Katie Palusci for her capable research assistantship. The first author acknowledges the support of the Wharton Interactive Media Initiative. The second author acknowledges the support of the London Business School Centre for Marketing and the hospitality of the Department of Marketing at the University of Auckland Business School. The authors thank the acting editor-in-chief, the area editor, and both reviewers for their encouragement and insightful comments. A good paper has gotten even better as a result of their careful reading throughout the review process.

Appendix A. Derivations
In this appendix we present derivations of the key results presented in Section 2.2. Before starting, we first recall that for $0 < k < 1$, the sum of the first $n$ terms of a geometric series is

\[ a + ak + ak^2 + \cdots + ak^{n-1} = a\,\frac{1 - k^n}{1 - k}, \tag{A1} \]

the sum of an infinite geometric series is

\[ \sum_{n=0}^{\infty} a k^n = \frac{a}{1 - k}, \tag{A2} \]

and note the following transformation of Euler's integral representation of the Gaussian hypergeometric function (${}_2F_1(a, b; c; z)$):

\[ \int_0^1 t^{b-1} (1 - t)^{c-b-1} (1 - zt)^{-a}\, dt = B(b, c - b)\, {}_2F_1(a, b; c; z), \quad c > b. \tag{A3} \]

A.1. Derivation of (7)
An individual making $x$ purchases had to be alive for at least the first $x$ transaction opportunities. Conditional on $p$, the probability of observing $x$ transactions out of the $i$ (unobserved) transaction opportunities ($i = x, \ldots, n$) the customer is alive is

\[ \binom{i}{x} p^x (1 - p)^{i-x}. \]

Removing the conditioning on being alive for $i$ transaction opportunities by multiplying this by the probability that the individual is alive for that length of time gives us

\[ P(X(n) = x \mid p, \theta) = \binom{n}{x} p^x (1 - p)^{n-x} (1 - \theta)^n + \sum_{i=x}^{n-1} \binom{i}{x} p^x (1 - p)^{i-x}\, \theta (1 - \theta)^i. \tag{A4} \]

Taking the expectation of this over the mixing distributions for $P$ and $\Theta$ ((1) and (2), respectively) gives us (7).

A.2. Derivation of (8)
Conditional on $p$ and $\theta$, the expected number of transactions over $n$ transaction opportunities is computed as

\[
\begin{aligned}
E(X(n) \mid p, \theta) &= \sum_{t=1}^{n} P(Y_t = 1 \mid p, \text{alive at } t)\, P(\text{alive at } t \mid \theta) \\
&= p \sum_{t=1}^{n} (1 - \theta)^t \\
&= p (1 - \theta) \sum_{s=0}^{n-1} (1 - \theta)^s,
\end{aligned}
\]

which, recalling (A1) and performing some further algebra,

\[ = \frac{p (1 - \theta)}{\theta} - \frac{p (1 - \theta)^{n+1}}{\theta}. \tag{A5} \]

Taking the expectation of this over the mixing distributions for $P$ and $\Theta$ gives us

\[ E(X(n) \mid \alpha, \beta, \gamma, \delta) = \frac{\alpha}{\alpha + \beta} \left[ \frac{B(\gamma - 1, \delta + 1)}{B(\gamma, \delta)} - \frac{B(\gamma - 1, \delta + n + 1)}{B(\gamma, \delta)} \right]. \]

(Strictly speaking, the use of the integral representation of the beta function to solve the integral associated with taking the expectation over $\Theta$ only holds for $\gamma > 1$. However, it can be shown that we arrive at the same result when $0 < \gamma < 1$.) Representing the beta functions in terms of gamma functions and recalling the recursive property of gamma functions gives us (8). Reflecting on the bracketed term in (8) as $n \to \infty$, we note that $E(X(n))$ grows to a limit of

\[ \frac{\alpha}{\alpha + \beta} \cdot \frac{\delta}{\gamma - 1} \]

when $\gamma > 1$. When $\gamma < 1$, there is no limit on $E(X(n))$. (The Pareto/NBD model shares this property regarding the existence of a limit.)

A.3. Derivation of (9) and (10)
Recalling (A4), it follows from the memoryless nature of the death process that

\[ P(X(n, n + n^*) = x^* \mid p, \theta, \text{alive at } n) = \binom{n^*}{x^*} p^{x^*} (1 - p)^{n^* - x^*} (1 - \theta)^{n^*} + \sum_{i=x^*}^{n^*-1} \binom{i}{x^*} p^{x^*} (1 - p)^{i - x^*}\, \theta (1 - \theta)^i. \tag{A6} \]

Noting that the probability that someone is alive at $n$ is $(1 - \theta)^n$, we have

\[ P(X(n, n + n^*) = x^* \mid p, \theta) = \delta_{x^* = 0} \left[ 1 - (1 - \theta)^n \right] + \binom{n^*}{x^*} p^{x^*} (1 - p)^{n^* - x^*} (1 - \theta)^{n + n^*} + \sum_{i=x^*}^{n^*-1} \binom{i}{x^*} p^{x^*} (1 - p)^{i - x^*}\, \theta (1 - \theta)^{n + i}. \]
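As a quick sanity check on the individual-level expression for $P(X(n, n + n^*) = x^* \mid p, \theta)$, the following Python sketch evaluates its three terms and lets one confirm numerically that the probabilities sum to one over $x^* = 0, \ldots, n^*$. The function name and the parameter values are illustrative assumptions, not part of the paper.

```python
from math import comb


def pmf_future_transactions(x_star, p, theta, n, n_star):
    """P(X(n, n+n*) = x* | p, theta) for given latent p and theta.

    Hypothetical helper: the three terms mirror the expression in A.3.
    """
    # Term 1: the customer is already dead at n, so x* can only be 0.
    dead_by_n = (1 - (1 - theta) ** n) if x_star == 0 else 0.0
    # Term 2: the customer stays alive through all n + n* opportunities.
    alive_throughout = (
        comb(n_star, x_star)
        * p ** x_star
        * (1 - p) ** (n_star - x_star)
        * (1 - theta) ** (n + n_star)
    )
    # Term 3: the customer is alive at n but dies after i further opportunities.
    dies_in_window = sum(
        comb(i, x_star)
        * p ** x_star
        * (1 - p) ** (i - x_star)
        * theta
        * (1 - theta) ** (n + i)
        for i in range(x_star, n_star)
    )
    return dead_by_n + alive_throughout + dies_in_window


# Illustrative (assumed) values: p = 0.3, theta = 0.15, n = 6, n* = 5.
total = sum(pmf_future_transactions(k, 0.3, 0.15, 6, 5) for k in range(6))
print(total)
```

The same function can be used to verify that the implied mean matches $p \sum_{s=1}^{n^*} (1 - \theta)^{n+s}$, which is the unconditional counterpart of the sum that appears in the derivation of (13).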
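(The first term of the expression for $P(X(n, n + n^*) = x^* \mid p, \theta)$ accounts for the fact that anyone not alive at $n$ will, by definition, not make any purchases in the interval $(n, n + n^*]$.) Taking the expectation of this probability over the mixing distributions for $P$ and $\Theta$ gives us (9).

By definition, $X(n, n + n^*) = X(n + n^*) - X(n)$; it follows that $E(X(n, n + n^*) \mid \alpha, \beta, \gamma, \delta) = E(X(n + n^*) \mid \alpha, \beta, \gamma, \delta) - E(X(n) \mid \alpha, \beta, \gamma, \delta)$. Substituting (8) in this gives us (10).

A.4. Derivation of (11)
Reflecting on (4), the first term is the likelihood of $x$ purchases out of $n$ transaction opportunities under the assumption that the customer was alive for all $n$ transaction opportunities. (The other terms account for the possibility that the individual "died" before $n$.) Using Bayes' theorem, it follows that the probability that a customer with purchase history $(x, t_x, n)$ is alive at $n$ is

\[ P(\text{alive at } n \mid p, \theta, x, t_x, n) = \frac{p^x (1 - p)^{n-x} (1 - \theta)^n}{L(p, \theta \mid x, t_x, n)}. \tag{A7} \]

It follows that

\[ P(\text{alive at } n + 1 \mid p, \theta, x, t_x, n) = \frac{p^x (1 - p)^{n-x} (1 - \theta)^{n+1}}{L(p, \theta \mid x, t_x, n)}. \tag{A8} \]

By Bayes' theorem, the joint posterior distribution of $P$ and $\Theta$ is given by

\[ f(p, \theta \mid \alpha, \beta, \gamma, \delta, x, t_x, n) = \frac{L(p, \theta \mid x, t_x, n)\, f(p \mid \alpha, \beta)\, f(\theta \mid \gamma, \delta)}{L(\alpha, \beta, \gamma, \delta \mid x, t_x, n)}, \tag{A9} \]

where the individual elements are given in (1), (2), (4), and (5). Taking the expectation of (A8) over the joint posterior distribution of $P$ and $\Theta$ gives us (11).

By the same logic, we can derive an expression for the probability that a customer with purchase history $(x, t_x, n)$ is alive at transaction opportunity $n + m$. Conditional on $p$ and $\theta$,

\[ P(\text{alive at } n + m \mid p, \theta, x, t_x, n) = \frac{p^x (1 - p)^{n-x} (1 - \theta)^{n+m}}{L(p, \theta \mid x, t_x, n)}. \]

Taking the expectation of this over the joint posterior distribution of $P$ and $\Theta$ yields

\[ P(\text{alive at } n + m \mid \alpha, \beta, \gamma, \delta, x, t_x, n) = \frac{B(\alpha + x, \beta + n - x)}{B(\alpha, \beta)} \cdot \frac{B(\gamma, \delta + n + m)}{B(\gamma, \delta)} \cdot \frac{1}{L(\alpha, \beta, \gamma, \delta \mid x, t_x, n)}. \tag{A10} \]

A.5. Derivation of (12)
By definition,

\[
\begin{aligned}
P(X(n, n + n^*) = x^* \mid p, \theta, x, t_x, n) ={}& \delta_{x^* = 0} \left[ 1 - P(\text{alive at } n \mid p, \theta, x, t_x, n) \right] \\
&+ P(X(n, n + n^*) = x^* \mid p, \theta, \text{alive at } n) \times P(\text{alive at } n \mid p, \theta, x, t_x, n).
\end{aligned}
\]

Substituting (A6) and (A7) in this, and taking the expectation over the joint posterior distribution of $P$ and $\Theta$, (A9), gives us (12).

A.6. Derivation of (13)
Conditional on $p$ and $\theta$, the expected number of transactions across the next $n^*$ transaction opportunities (i.e., in the interval $(n, n + n^*]$) by a customer with purchase history $(x, t_x, n)$ is

\[ E(X(n, n + n^*) \mid p, \theta, x, t_x, n) = E(X(n, n + n^*) \mid p, \theta, \text{alive at } n) \times P(\text{alive at } n \mid p, \theta, x, t_x, n). \]

Now

\[
\begin{aligned}
E(X(n, n + n^*) \mid p, \theta, \text{alive at } n) &= \sum_{t=n+1}^{n+n^*} P(Y_t = 1 \mid p, \text{alive at } t)\, P(\text{alive at } t \mid \theta, t > n) \\
&= p \sum_{t=n+1}^{n+n^*} \frac{(1 - \theta)^t}{(1 - \theta)^n} \\
&= p \sum_{s=1}^{n^*} (1 - \theta)^s \\
&= \frac{p (1 - \theta)}{\theta} - \frac{p (1 - \theta)^{n^* + 1}}{\theta}.
\end{aligned}
\tag{A11}
\]

Taking the expectation of the product of (A7) and (A11) over the joint posterior distribution of $P$ and $\Theta$, (A9), and simplifying (i.e., representing certain beta functions in terms of gamma functions and exploiting the recursive property of gamma functions) gives us (13).

A.7. Derivation of (14)
The number of discounted expected residual transactions for a customer alive at $n$ is

\[
\begin{aligned}
DERT(d \mid p, \theta, \text{alive at } n) &= \sum_{t=n+1}^{\infty} \frac{P(Y_t = 1 \mid p, \text{alive at } t)\, P(\text{alive at } t \mid \theta, t > n)}{(1 + d)^{t-n}} \\
&= p \sum_{t=n+1}^{\infty} \frac{(1 - \theta)^{t-n}}{(1 + d)^{t-n}} \\
&= \frac{p (1 - \theta)}{1 + d} \sum_{s=0}^{\infty} \left( \frac{1 - \theta}{1 + d} \right)^s,
\end{aligned}
\]

which, recalling (A2),

\[ = \frac{p (1 - \theta)}{d + \theta}. \tag{A12} \]

Multiplying this by the probability that a customer with purchase history $(x, t_x, n)$ (and latent transaction and dropout probabilities $p$ and $\theta$) is still alive at transaction opportunity $n$, (A7), gives us

\[ DERT(d \mid p, \theta, x, t_x, n) = \frac{p^{x+1} (1 - p)^{n-x} (1 - \theta)^{n+1}}{(d + \theta)\, L(p, \theta \mid x, t_x, n)}. \tag{A13} \]

Taking the expectation of this over the joint posterior distribution of $P$ and $\Theta$, (A9), gives us

\[ DERT(d \mid \alpha, \beta, \gamma, \delta, x, t_x, n) = \frac{B(\alpha + x + 1, \beta + n - x)}{B(\alpha, \beta)} \cdot \frac{1}{L(\alpha, \beta, \gamma, \delta \mid x, t_x, n)} \int_0^1 \frac{1}{d + \theta} \cdot \frac{\theta^{\gamma - 1} (1 - \theta)^{\delta + n}}{B(\gamma, \delta)}\, d\theta. \]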
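The geometric-series step that leads to (A12) is easy to check numerically. The Python sketch below compares the closed form $p(1 - \theta)/(d + \theta)$ with a long truncated version of the discounted sum; the function names, the truncation horizon, and the parameter values are assumptions made purely for illustration.

```python
def dert_alive_closed_form(p, theta, d):
    """Closed form (A12): DERT for a customer known to be alive at n."""
    return p * (1 - theta) / (d + theta)


def dert_alive_by_summation(p, theta, d, horizon=10_000):
    """Truncated version of the infinite discounted sum behind (A12).

    Period k = t - n contributes p * ((1 - theta) / (1 + d)) ** k.
    """
    ratio = (1 - theta) / (1 + d)
    return sum(p * ratio ** k for k in range(1, horizon + 1))


# Illustrative (assumed) values: p = 0.4, theta = 0.2, d = 0.1.
print(dert_alive_closed_form(0.4, 0.2, 0.1))
print(dert_alive_by_summation(0.4, 0.2, 0.1))
```

Because the common ratio $(1 - \theta)/(1 + d)$ is strictly below one whenever $\theta > 0$ or $d > 0$, the truncated sum converges quickly and the two values should agree to within floating-point error.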
The integral over $\theta$ in the expression for $DERT(d \mid \alpha, \beta, \gamma, \delta, x, t_x, n)$ can be evaluated by letting $s = 1 - \theta$:

\[
\begin{aligned}
\int_0^1 \frac{1}{d + \theta} \cdot \frac{\theta^{\gamma - 1} (1 - \theta)^{\delta + n}}{B(\gamma, \delta)}\, d\theta
&= \frac{1}{B(\gamma, \delta)} \int_0^1 \frac{s^{\delta + n} (1 - s)^{\gamma - 1}}{1 + d - s}\, ds \\
&= \frac{1}{B(\gamma, \delta)(1 + d)} \int_0^1 s^{\delta + n} (1 - s)^{\gamma - 1} \left( 1 - \frac{1}{1 + d}\, s \right)^{-1} ds,
\end{aligned}
\]

which, recalling (A3),

\[ = \frac{B(\gamma, \delta + n + 1)}{B(\gamma, \delta)(1 + d)}\; {}_2F_1\!\left(1, \delta + n + 1; \gamma + \delta + n + 1; \frac{1}{1 + d}\right), \]

giving us the expression in (14).

It is interesting to note that this expression for DERT differs from that for the conditional expectation, (13), by a factor of

\[ \frac{(\gamma - 1)\; {}_2F_1\!\left(1, \delta + n + 1; \gamma + \delta + n + 1; \frac{1}{1 + d}\right)}{(\gamma + \delta + n)(1 + d)} \left\{ 1 - \frac{\Gamma(\gamma + \delta + n)}{\Gamma(1 + \delta + n)} \cdot \frac{\Gamma(1 + \delta + n + n^*)}{\Gamma(\gamma + \delta + n + n^*)} \right\}^{-1}. \]

For any given analysis setting, this is a constant, independent of the customer's exact purchase history. Therefore, any ranking of customers on the basis of DERT will be exactly the same as that derived using the conditional expectation of purchasing over the next $n^*$ periods. When $\gamma > 1$ and $d = 0$ (i.e., there is no discounting of future purchases), this converges to 1 as $n^* \to \infty$.

Because $L(\alpha, \beta, \gamma, \delta \mid x, t_x, n) = 1$ when $x = t_x = n = 0$, it follows that the number of discounted expected transactions (DET) for a just-acquired customer is

\[ DET(d \mid \alpha, \beta, \gamma, \delta) = \frac{\alpha}{\alpha + \beta} \cdot \frac{\delta}{\gamma + \delta} \cdot \frac{{}_2F_1\!\left(1, \delta + 1; \gamma + \delta + 1; \frac{1}{1 + d}\right)}{1 + d}. \tag{A14} \]

To compute DET for a yet-to-be-acquired customer, we need to add 1 to this quantity (i.e., the purchase at time $t = 0$ that corresponds to the customer's first-ever purchase with the firm and therefore starts the transaction opportunity clock).

A.8. Derivation of (15)-(17)
We obtain (15) and (16) by integrating (A9) over $\theta$ and $p$, respectively.

By definition, the $(l, m)$th product moment ($l, m = 0, 1, 2, \ldots$) of the joint posterior distribution of $P$ and $\Theta$ is

\[ E(P^l \Theta^m \mid \alpha, \beta, \gamma, \delta, x, t_x, n) = \int_0^1 \!\! \int_0^1 p^l \theta^m f(p, \theta \mid \alpha, \beta, \gamma, \delta, x, t_x, n)\, dp\, d\theta, \]

which, recalling (A9),

\[
\begin{aligned}
&= \int_0^1 \!\! \int_0^1 p^l \theta^m\, \frac{L(p, \theta \mid x, t_x, n)\, f(p \mid \alpha, \beta)\, f(\theta \mid \gamma, \delta)}{L(\alpha, \beta, \gamma, \delta \mid x, t_x, n)}\, dp\, d\theta \\
&= \frac{B(\alpha + l, \beta)}{B(\alpha, \beta)} \cdot \frac{B(\gamma + m, \delta)}{B(\gamma, \delta)} \int_0^1 \!\! \int_0^1 \frac{L(p, \theta \mid x, t_x, n)\, f(p \mid \alpha + l, \beta)\, f(\theta \mid \gamma + m, \delta)}{L(\alpha, \beta, \gamma, \delta \mid x, t_x, n)}\, dp\, d\theta,
\end{aligned}
\]

which, recalling the derivation of (5), gives us (17).

Appendix B. Correlation Analyses
One of the assumptions associated with the BG/BB model is that the transaction probability $p$ and the dropout probability $\theta$ vary independently across customers. At first glance, this may appear to be unrealistic, but it is not nearly as restrictive as it may seem. More formally, we are assuming independent priors, which does not imply independence in the joint posterior distribution of $P$ and $\Theta$; in fact, we can see some fairly strong correlations in the posterior distributions, as we show here in two separate analyses that demonstrate how these correlations can be estimated and interpreted.

First, following an analysis shown in Abe (2009), Figure B.1 is a scatter plot of the means of the marginal posterior distributions of $P$ and $\Theta$. Each circle represents the pair of means for a particular purchase history $(x, t_x, n)$ (computed using (17) with $l = 1, m = 0$ and $l = 0, m = 1$, respectively), and the area of each circle is directly proportional to the number of customers who share the same purchase history (i.e., using the numbers from Table 2). The weighted correlation across the 22 pairs of numbers is -0.42. This implies, as common intuition would suggest, that customers who purchase more frequently (while alive) tend to live longer than light purchasers (but of course we do not want to imply any kind of causal connection here).

Figure B.1 Scatter Plot of the Marginal Posterior Means of $P$ and $\Theta$ for the 22 $(x, t_x)$ Patterns Associated with $n = 6$ (horizontal axis: $E(P)$, 0.0 to 1.0; vertical axis: $E(\Theta)$, 0.0 to 1.0)

However, this analysis tells only part of the story because it only considers the posterior means. When we take into account the full posterior distribution for a given customer, a different correlation analysis emerges. Suppose for each customer in a given recency/frequency group we made a number of draws from their joint posterior distribution: what would be the correlation between $p$ and $\theta$ across these draws? The joint posterior distribution of $P$ and $\Theta$ is given by (A9). For the special case where $t_x = n$, this collapses to

\[ f(p, \theta \mid \alpha, \beta, \gamma, \delta, x, t_x, n) = f(p \mid \alpha + x, \beta + n - x)\, f(\theta \mid \gamma, \delta + n). \]
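To make the posterior dependence between $P$ and $\Theta$ concrete, the following Python sketch computes the posterior Pearson correlation from the product moments derived in A.8. It is a minimal illustration under stated assumptions: the likelihood function is written out from the form of (5) implied by (4) and the beta mixing distributions (equation (5) itself is not reproduced in this section), and the function names and parameter values are hypothetical.

```python
from math import exp, lgamma, sqrt


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def likelihood(alpha, beta, gamma, delta, x, t_x, n):
    """Assumed form of L(alpha, beta, gamma, delta | x, t_x, n), eq. (5)."""
    lb_ab = log_beta(alpha, beta)
    lb_gd = log_beta(gamma, delta)
    # Customer alive through all n transaction opportunities.
    total = exp(log_beta(alpha + x, beta + n - x) - lb_ab
                + log_beta(gamma, delta + n) - lb_gd)
    # Customer "died" after t_x + i opportunities, i = 0, ..., n - t_x - 1.
    for i in range(n - t_x):
        total += exp(log_beta(alpha + x, beta + t_x - x + i) - lb_ab
                     + log_beta(gamma + 1, delta + t_x + i) - lb_gd)
    return total


def product_moment(l, m, alpha, beta, gamma, delta, x, t_x, n):
    """E(P^l Theta^m | ...) via the ratio of likelihoods obtained in A.8."""
    scale = exp(log_beta(alpha + l, beta) - log_beta(alpha, beta)
                + log_beta(gamma + m, delta) - log_beta(gamma, delta))
    return (scale
            * likelihood(alpha + l, beta, gamma + m, delta, x, t_x, n)
            / likelihood(alpha, beta, gamma, delta, x, t_x, n))


def posterior_correlation(alpha, beta, gamma, delta, x, t_x, n):
    """Pearson correlation of P and Theta under the joint posterior."""
    args = (alpha, beta, gamma, delta, x, t_x, n)
    e_p = product_moment(1, 0, *args)
    e_t = product_moment(0, 1, *args)
    cov = product_moment(1, 1, *args) - e_p * e_t
    var_p = product_moment(2, 0, *args) - e_p ** 2
    var_t = product_moment(0, 2, *args) - e_t ** 2
    return cov / sqrt(var_p * var_t)


# Hypothetical parameter values, chosen only for illustration.
params = (1.2, 0.75, 0.66, 2.78)
print(posterior_correlation(*params, x=3, t_x=6, n=6))  # case t_x = n
print(posterior_correlation(*params, x=0, t_x=0, n=6))  # no repeat purchases
```

When $t_x = n$ the sum over death times is empty, so the product moments factor and the computed correlation is zero up to floating-point error; for any other purchase history the mixture over possible death times induces dependence between the two latent probabilities.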
In other words, when $t_x = n$ the posterior distribution of $P$ is independent of the posterior distribution of $\Theta$. (Equivalently, the marginal posterior distributions of $P$ and $\Theta$, (15) and (16), collapse to the updated beta distributions $f(p \mid \alpha + x, \beta + n - x)$ and $f(\theta \mid \gamma, \delta + n)$, respectively.) In all other cases, the posterior distribution of an individual's transaction probability is not independent of the posterior distribution of her dropout probability. The joint posterior correlation is given by

\[ \operatorname{corr}(P, \Theta \mid \alpha, \beta, \gamma, \delta, x, t_x, n) = \frac{E(P\Theta) - E(P)\,E(\Theta)}{\sqrt{\left[ E(P^2) - E(P)^2 \right] \left[ E(\Theta^2) - E(\Theta)^2 \right]}}, \tag{B1} \]

where the individual terms are computed using (17). This correlation is reported in Table B.1 as a function of recency (the year of the individual's last transaction) and frequency (the number of repeat transactions).

Table B.1 The Posterior Correlation of $P$ and $\Theta$ as a Function of Recency and Frequency

Repeat transactions                 Year of last transaction
(1996-2001)            1995    1996    1997    1998    1999    2000    2001
0                      0.258
1                              0.193   0.250   0.203   0.105   0.030   0.000
2                                      0.165   0.238   0.159   0.047   0.000
3                                              0.174   0.214   0.071   0.000
4                                                      0.214   0.114   0.000
5                                                              0.190   0.000
6                                                                      0.000

This table shows that the intracustomer correlations are strictly positive (except when $t_x = n$), or, equivalently, if we were to draw from the joint posteriors across all the individuals that are represented within each cell of this table, we would see these positive correlations. In the most extreme case, i.e., when $x = t_x = 0$, we see a fairly strong relationship between $p$ and $\theta$. This makes sense: customers in this cell with a higher purchasing propensity are even more likely (than light purchasers) to be dead. However, across cells, the overall correlation is a fairly strong negative one, as discussed previously. In some sense, this combined analysis (within and across each type of customer) represents a form of Simpson's paradox (Simpson 1951, Wagner 1982).

Taken together, these two analyses provide a more complete picture of the correlations than shown by Abe (2009) and other researchers, who have limited themselves to a simple correlation across the posterior means. More importantly, these analyses put to rest any concerns that a simple empirical Bayesian model with independent priors will be unable to capture and reveal correlations in the underlying processes. To the contrary, these analyses arise quite naturally from the BG/BB model, and the same is true for the Pareto/NBD and other related models.

References
Abe, M. 2009. Counting your customers one by one: A hierarchical Bayes extension to the Pareto/NBD model. Marketing Sci. 28(3) 541-553.
Berger, P. D., B. Weinberg, R. C. Hanna. 2003. Customer lifetime value determination and strategic implications for a cruise-ship company. J. Database Marketing & Customer Strategy Management 11(1) 40-52.
Blattberg, R. C., G. Getz, J. S. Thomas. 2001. Customer Equity: Building and Managing Relationships as Valuable Assets. Harvard Business School Publishing, Boston.
Blattberg, R. C., B.-D. Kim, S. A. Neslin. 2008. Database Marketing: Analyzing and Managing Customers. Springer, New York.
Blattberg, R. C., E. C. Malthouse, S. A. Neslin. 2009. Customer lifetime value: Empirical generalizations and some conceptual questions. J. Interactive Marketing 23(2) 157-168.
Chatfield, C., G. J. Goodhardt. 1970. The beta-binomial model for consumer purchasing behaviour. Appl. Statist. 19(3) 240-250.
Colombo, R., W. Jiang. 1999. A stochastic RFM model. J. Interactive Marketing 13(3) 2-12.
Danaher, P. J., B. G. S. Hardie. 2005. Bacon with your eggs? Applications of a new bivariate beta-binomial distribution. Amer. Statistician 59(November) 282-286.
East, R., K. Hammond. 1996. The erosion of repeat-purchase loyalty. Marketing Lett. 7(2) 163-171.
Easton, G. 1980. Stochastic models of industrial buying behaviour. OMEGA 8(1) 63-69.
Ehrenberg, A. S. C. 1988. Repeat-Buying, 2nd ed. Charles Griffin & Company, London.
Fader, P. S., B. G. S. Hardie. 2005. Implementing the Pareto/NBD model given interval-censored data. Retrieved June 26, 2010, http://brucehardie.com/notes/011/.
Fader, P. S., B. G. S. Hardie. 2009. Probability models for customer-base analysis. J. Interactive Marketing 23(1) 61-69.
Fader, P. S., B. G. S. Hardie, K. L. Lee. 2005. RFM and CLV: Using iso-value curves for customer base analysis. J. Marketing Res. 42(4) 415-430.
Gupta, S., D. R. Lehmann. 2005. Managing Customers as Investments: The Strategic Value of Customers in the Long Run. Wharton School Publishing, Upper Saddle River, NJ.
Jackson, B. B. 1985a. Build customer relationships that last. Harvard Bus. Rev. 63(November-December) 120-128.
Jackson, B. B. 1985b. Winning and Keeping Industrial Customers. Lexington Books, New York.
Johnson, N. L. 1949. Bivariate distributions based on simple translation systems. Biometrika 36(3-4) 297-304.
Kumar, V. 2008. Managing Customers for Profit. Wharton School Publishing, Upper Saddle River, NJ.
Kumar, V., R. Venkatesan, T. Bohling, D. Beckmann. 2008. Practice Prize Report - The power of CLV: Managing customer lifetime value at IBM. Marketing Sci. 27(4) 585-599.
Mason, C. H. 2003. Tuscan lifestyles: Assessing customer lifetime value. J. Interactive Marketing 17(4) 54-60.
Massy, W. F., D. B. Montgomery, D. G. Morrison. 1970. Stochastic Models of Buying Behavior. MIT Press, Cambridge, MA.
Morrison, D. G., A. Perry. 1970. Some data based models for analyzing sales fluctuations. Decision Sci. 1(3-4) 258-274.
Morrison, D. G., D. C. Schmittlein. 1988. Generalizing the NBD model for customer purchases: What are the implications and is it worth the effort? J. Bus. Econom. Statist. 6(2) 145-159.
Morrison, D. G., R. D. H. Chen, S. L. Karpis, K. E. A. Britney. 1982. Modelling retail customer behavior at Merrill Lynch. Marketing Sci. 1(2) 123-141.
Netzer, O., J. M. Lattin, V. Srinivasan. 2008. A hidden Markov model of customer relationship dynamics. Marketing Sci. 27(2) 185-204.
Park, Y.-H., P. S. Fader. 2004. Modeling browsing behavior at multiple websites. Marketing Sci. 23(3) 280-303.
Pfeifer, P. E., M. E. Haskins, R. M. Conroy. 2005. Customer lifetime value, customer profitability, and the treatment of acquisition spending. J. Managerial Issues 17(1) 11-25.
Piersma, N., J.-J. Jonker. 2004. Determining the optimal direct mailing frequency. Eur. J. Oper. Res. 158(1) 173-182.
Rosset, S., E. Neumann, U. Eick, N. Vatnik. 2003. Customer lifetime value models for decision support. Data Mining Knowledge Discovery 7(3) 321-339.
Rust, R. T., K. N. Lemon, V. A. Zeithaml. 2004. Return on marketing: Using customer equity to focus marketing strategy. J. Marketing 68(1) 109-127.
Rust, R. T., V. A. Zeithaml, K. N. Lemon. 2000. Driving Customer Equity. The Free Press, New York.
Schmittlein, D. C., R. A. Peterson. 1994. Customer base analysis: An industrial purchase process application. Marketing Sci. 13(1) 41-67.
Schmittlein, D. C., D. G. Morrison, R. Colombo. 1987. Counting your customers: Who are they and what will they do next? Management Sci. 33(1) 1-24.
Simpson, E. H. 1951. The interpretation of interaction in contingency tables. J. Roy. Statist. Soc. Ser. B 13(2) 238-241.
Skellam, J. G. 1948. A probability distribution derived from the binomial distribution by regarding the probability of success as variable between the sets of trials. J. Roy. Statist. Soc. Ser. B 10(2) 257-261.
Urban, G. L., R. Karash. 1971. Evolutionary model building. J. Marketing Res. 8(1) 62-66.
Wagner, C. H. 1982. Simpson's paradox in real life. Amer. Statistician 36(1) 46-48.
