
A Nested Hidden Markov Model for Internet Browsing Behavior

Steven L. Scott and Il-Horn Hann University of Southern California Marshall School of Business December 1, 2006

Abstract

Internet browsers generate click stream data that supply information about the path taken through a web site. Click stream data have an interesting temporal structure in which sequences of individual page requests are nested within browsing sessions, and some users return to the web site for multiple sessions. We model sequences of page requests within a session using a hidden Markov mixture of first order Markov chains. A second hidden Markov chain describes variation between consecutive web sessions by the same user. Forward-backward recursions can be employed to simultaneously draw both latent Markov chains directly from their joint posterior distribution. The recursions make MCMC posterior simulation particularly efficient. The model allows probabilities of interest, such as the probability that a web session contains a purchase, to be computed directly from model parameters without resorting to Monte Carlo integration. We apply the model to data collected from a small e-commerce site and find that the session-level and page-level hidden Markov chains play different roles. The session-level model discovers a set of session types that correspond to recognizable patterns of user behavior. The page-level models account for data features that would be poorly described by first order Markov chains. The flexible page-level models allow the session-level model to describe the data with relatively few session types.

1 Introduction

This article presents a statistical model for describing data generated in a web server log file by a computer user navigating through a web site. Each entry in a web server log, known as a page request, contains the URL requested by the user, the IP address of the requesting computer, and a time stamp. The sequence of page requests is called a click stream because page requests are most often generated by a user clicking on a link in a web browser. Click stream data have a nested structure in which page requests occur within sessions by the same user, and some users return for multiple sessions. This article develops a hidden Markov model with latent data having the same nested structure as click streams.

Web traffic modeling has attracted considerable interest from researchers in statistics, computer science, and marketing. An overview of the issues involved with web traffic modeling can be found in a special issue of Statistical Science (Jank and Shmueli, 2006). Descriptive summaries of data in web server log files were presented by Catledge and Pitkow (1995), Tauscher and Greenberg (1997), and Cooley, Mobasher, and Srivastava (1999). Model based analyses of web traffic data typically involve either stochastic processes or some variation of regression models. The latter use session-level covariates, such as counts of visits to various page categories, to predict session-level outcomes, such as the probability that a visit to an e-commerce site results in a purchase. Examples of regression modeling include Bucklin and Sismeiro (2003), who studied the duration of web sessions using a decision theoretic probit regression model, and Moe, Chipman, George, and McCulloch (2002), Moe and Fader (2004), and Sismeiro and Bucklin (2004), all of whom used flexible extensions of generalized linear models to describe the probability that a visit to an e-commerce site results in a purchase. Stochastic processes operate at a finer level of detail by directly modeling the individual page requests in the click stream. Scott and Smyth (2003) and Kleinberg (2003) used point process models to study the intensity of page requests within a web session. Cadez et al. (2000a,b) modeled click streams using mixtures of first order Markov chains.
Eirinaki, Vazirgiannis, and Kapogiannis (2005) suggested a first order Markov chain on a graph determined by the link structure of pages on the site. Montgomery, Li, Srinivasan, and Liechty (2004) modeled customer paths through an e-commerce site using a hidden Markov model based on dynamic multinomial probit regression. Many of these articles present evidence that a simple first order Markov chain cannot accurately describe click stream data.

The model developed here describes consecutive web sessions by the same user using a session-level hidden Markov model. Conditional on the session-level hidden Markov chain, the distribution of a session's click stream is a page-level hidden Markov model. The mixture components for the page-level model are first order Markov chains with transition probabilities determined nonparametrically. This approach has several advantages over existing methods. First, the session-level model requires fewer mixture components than Cadez et al. (2000a) because its mixture components are considerably more flexible than first order Markov chains. Second, the session-level hidden Markov model describes temporal variation among repeat visitors to the site. Third, the model can use forward-backward recursions (Scott, 2002) to quickly and exactly compute the likelihood of any page request sequence. This contrasts with the dynamic probit regression models used by Montgomery et al. (2004), which can evaluate likelihood only by Monte Carlo approximation. The forward-backward recursions can also be used to make an exact draw from the joint posterior distribution of the session-level and page-level hidden Markov chains, given observed data and model parameters. Thus the recursions play a key role in a rapidly mixing data augmentation algorithm. Finally, the page-level model can be interpreted as a first order Markov chain on an expanded state space. This allows many session-level quantities of interest, such as purchase probabilities and expected session duration, to be computed using well known formulas for Markov chain absorption probabilities.

The remainder of the article proceeds as follows. Section 2 describes our data set and discusses some of the issues involved in modeling click stream data. Section 3 defines the model and reviews the relevant Markov chain absorption calculations. Section 4 develops an efficient MCMC posterior sampling strategy. Section 5 presents the results of a detailed case study in which we apply the model to the data from Section 2. Section 6 describes some of the limitations of the model and some possible extensions, and summarizes our conclusions.

2 Click Stream Data

We collected 62 consecutive days of click stream data from a small e-commerce web site. The site's sole product line consists of eight style and color variations of a baby sling used by parents to carry infant children. During the data collection period the site was not modified in terms of design, price, or product offerings. We excluded 1631 sessions that each contained only one page request. The deleted sessions either represent individuals who went to the wrong web page and left immediately, or automated computer programs randomly scanning the web. The remaining 9811 sessions were generated by 8375 visitors and contained 126,348 page requests. To these we added an artificial page request at the end of each session to help mark session boundaries.

# Sessions   1      2     3     4    5    6    7+
# Users      7650   512   127   33   17   11   25

Figure 1: (a) Numbers of sessions per user. (b) Histogram of number of page requests per session. (c) Histogram of log10 number of page requests per session.

The site was hosted by a major internet service provider (ISP), who grouped the data into sessions associated with individual IP addresses before releasing them. The data collection process preserved the sequence of page requests for each user, but the time stamp associated with each page request was accurate only to the minute. Because users typically generate several page requests per minute, the coarse time stamp effectively prevented us from considering continuous time models.

The site's overall conversion rate, defined as the percentage of sessions containing a purchase, is (121 purchases)/(9816 sessions) = 1.28%. The rate is low even by e-commerce standards, where 3% is typical. A well run corporate site can achieve a conversion rate of 7% or more (Montgomery et al., 2004).

Figure 1 describes basic summaries of the number of sessions per user and numbers of page requests per session. While most users visited the site for only one session, there were 725 users (8.7%) with multiple sessions. Although many sessions were very short (2 or 3 page requests), there were also several long sessions, with nearly 5% containing 50 or more page requests. The site contained 180 distinct web pages, 121 of which were unique receipt pages recording the details of individual purchases. To reduce the dimension of the problem we assigned each web page to one of 10 categories listed in Table 1. The categorization was based primarily on intuition, and there are clearly other choices that could have been made.

Category   Description
Home       Home page for the web site.
InBsk      Put product in shopping basket.
Info       Several information pages were placed in this category, such as the FAQ for the site, a page explaining how the product should fit, instructions on how to use the product, a page describing product safety, and a page of legal disclaimers.
OutBsk     Remove product from shopping basket.
Order      Place order.
Product    Page devoted to a single product.
Shop       Page devoted to displaying/comparing several products.
ShowBsk    Show the contents of the shopping basket.
SiteMap    Site map for the online store.
Leave      Marks the end of a session.

Table 1: Page category definitions.

From \ To   Home   InBsk   Info   OutBsk   Order   Product   Shop   ShowBsk   SiteMap   Leave
Home        .60    .00     .14    .01      .00     .10       .05    .00       .00       .10
InBsk       .08    .01     .00    .00      .00     .00       .00    .89       .00       .01
Info        .76    .00     .15    .00      .00     .01       .04    .00       .00       .04
OutBsk      .79    .00     .02    .00      .00     .08       .02    .05       .00       .03
Order       .03    .00     .02    .00      .00     .02       .56    .01       .00       .35
Product     .85    .01     .02    .01      .00     .06       .03    .00       .00       .03
Shop        .83    .00     .03    .00      .00     .07       .03    .00       .00       .03
ShowBsk     .23    .01     .08    .08      .20     .08       .09    .12       .00       .11
SiteMap     .77    .00     .19    .00      .00     .01       .02    .00       .00       .01
Leave       .20    .00     .05    .00      .00     .02       .72    .00       .00       .00

Table 2: Observed proportions of transition between page categories. Transitions are from rows to columns, so each row sums to 1 (up to rounding).

Table 2 presents the empirical distribution of transitions between page categories. A transition from Leave to any other category marks the beginning of a new session (not necessarily by the same user), so the initial distribution of page categories within a session can be read from the last row of the table. Most sessions begin on the Shop page rather than the Home page because the Shop page appears prominently on search engine results. The Home page is a frequent destination from most other pages on the site, including a 60% probability of a self-transition.

A peculiar feature of our data set is that some sessions contain long runs of home page requests. Every page on the site contains a company logo linked to the home page. The long runs of home page requests could be explained by frustrated customers repeatedly clicking on the logo during slow page downloads. They could also be the product of confused customers who click on the logo expecting to be taken to the product. If the customer clicks on the logo from the home page the home page will reload, but the customer's view will not change. Thus some customers may click on the logo several times before they figure out why nothing appears to be happening.

Some zeros in Table 2 are structural zeros, meaning transitions which are impossible based on the design of the web site. These are primarily associated with the shopping cart and ordering. Some of the transitions observed in Table 2 are logically impossible from a user's point of view. These are attributed to inner workings of internet traffic protocols that we do not attempt to model. For example, placing an item in the shopping cart should automatically display the updated basket to the user. However, shopping cart pages may be hosted by a separate, more secure server than the rest of the web site. If the shopping cart server is operating very slowly a user could grow impatient and return to the home page before the shopping cart transaction is complete.

There are other ways in which click stream data may differ from the actual sequence of pages viewed by a computer user. First, many users are assigned dynamic IP addresses by their ISPs. Dynamic IP addresses remain constant while the user is online, but may change each time a user reconnects to the ISP. Thus, the data may contain more repeat visits than suggested by Figure 1(a). Dynamic IP addresses are mainly associated with home users who have dial-up connections.
So-called always-on connections like DSL and cable modems rarely disconnect from the ISP (even when a user's computer is switched off), so the dynamic addresses assigned to them are essentially static.

A second feature of click stream data is cache censoring. Most web browsing software can cache frequently or recently viewed pages for rapid display. If a user requests a cached page then the web browsing software may fill the request internally instead of forwarding it to the internet. Thus cached pages accessed through the back button or bookmarks will not appear in web server log files. Page requests can also be generated by internal browser features. For example, some pages that display news, stock market information, or sports scores may refresh periodically even if no user is present to read them. Finally, not all page requests are generated by web browsers. The classic example is web spiders, which are programs that randomly crawl the web collecting information for search engines or comparative shopping sites.

The preceding difficulties need not deter one from studying click stream data. They merely serve as reminders that the data are an inexact proxy for what the user sees. With that caveat in mind, we proceed to model the data as they were actually observed by the web server.
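The way Table 2 is tabulated can be illustrated with a short sketch. It assumes each session is already a list of category names ending in Leave; the function name `empirical_transitions` and the toy sessions are hypothetical and are not the study data. Sessions are concatenated, so a transition out of Leave records the first page of the next session.

```python
import numpy as np

CATEGORIES = ["Home", "InBsk", "Info", "OutBsk", "Order",
              "Product", "Shop", "ShowBsk", "SiteMap", "Leave"]

def empirical_transitions(sessions, categories=CATEGORIES):
    """Row-normalized transition proportions between page categories.

    `sessions` is a list of lists of category names, each ending in "Leave".
    Sessions are concatenated, so a Leave -> X transition marks the first
    page of the following session.
    """
    index = {name: k for k, name in enumerate(categories)}
    counts = np.zeros((len(categories), len(categories)))
    stream = [page for session in sessions for page in session]
    for prev, curr in zip(stream[:-1], stream[1:]):
        counts[index[prev], index[curr]] += 1
    row_totals = counts.sum(axis=1, keepdims=True)
    # Rows with no observed transitions are left as zeros.
    return np.divide(counts, row_totals, out=np.zeros_like(counts),
                     where=row_totals > 0)

# Toy (made-up) sessions, used only to exercise the function.
toy_sessions = [
    ["Shop", "Home", "Home", "Product", "Leave"],
    ["Shop", "Product", "InBsk", "ShowBsk", "Order", "Leave"],
    ["Home", "Info", "Leave"],
]
table = empirical_transitions(toy_sessions)
```

In this sketch the very first session has no preceding Leave, so its starting page does not contribute to the last row; with thousands of sessions that edge effect is negligible.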

3 Nested Hidden Markov Models

3.1 Definition

Let $\mathcal{S}_0 = \{1, \ldots, S_0\}$ index the collection of web pages comprising the site. Let $y_{ijt} \in \mathcal{S}_0$ denote the $t$th page request generated by user $i$ in session $j$. Let $Y_i$ denote page requests from user $i$, grouped into sessions so that $Y_i = (Y_{i1}, \ldots, Y_{in_i})$, where $Y_{ij} = (y_{ij1}, \ldots, y_{ijT_{ij}})$. It will be convenient to assume that $\mathcal{S}_0$ contains an element marking the end of a session. For concreteness we label the web pages so that all sessions end with $y_{ijT_{ij}} = S_0$, which is treated as an absorbing state. Because all web sessions must eventually end, no other element of $\mathcal{S}_0$ can be absorbing.

An individual's page request data are modeled using a nested hidden Markov mixture of Markov chains. Assume that there are $S_2$ distinct types of sessions, and let $H_{ij} \in \mathcal{S}_2 = \{1, \ldots, S_2\}$ denote the session type associated with $Y_{ij}$. The latent session-type indicators obey a Markov chain with initial distribution $\pi_0$ and transition probabilities

\[ \Pr(H_{ij} = s \mid H_{i,j-1} = r) = \pi(r, s). \tag{1} \]

Conditional on $H_{ij} = H$, the sequence of page requests in session $Y_{ij}$ follows a second hidden Markov model with latent states $h_{ijt} \in \mathcal{S}_1 = \{1, \ldots, S_1\}$ having initial distribution $\phi_0^{(H)}$ and transition probabilities

\[ \Pr(h_{ijt} = s \mid h_{ij,t-1} = r, H_{ij} = H) = \phi^{(H)}(r, s). \tag{2} \]

Finally, conditional on $H_{ij} = H$ and $h_{ijt} = h$, page request $t$ obeys a Markov chain with initial distribution $P_0^{(H,h)}$ and transition probabilities

\[ \Pr(y_{ijt} \mid y_{ij,t-1}, h_{ijt} = h, H_{ij} = H) = P^{(H,h)}(y_{ij,t-1}, y_{ijt}). \tag{3} \]

Figure 2: DAG describing the first two sessions for user $i$, with observed nodes $y_{i11}, y_{i12}, y_{i13}, y_{i21}, y_{i22}, y_{i23}$, page-level latent nodes $h_{i11}, \ldots, h_{i23}$, and session-level latent nodes $H_{i1}, H_{i2}$. Open and filled circles represent latent and observed variables, respectively. Boxes group variables within the same session.

Figure 2 shows a directed acyclic graph (DAG) describing the relationship between the latent and observed variables.
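The generative structure in equations (1) to (3) can be sketched as a small simulator. This is a minimal illustration, assuming 0-based labels with the last page label playing the role of the end-of-session state; the function name `simulate_user`, the array layout, and the random parameter values are all hypothetical.

```python
import numpy as np

def simulate_user(pi0, pi, phi0, phi, P0, P, n_sessions, rng):
    """Simulate sessions for one user from the nested model (0-based labels).

    pi0: (S2,), pi: (S2, S2)                session-type chain, equation (1)
    phi0: (S2, S1), phi: (S2, S1, S1)       page-level latent chain, equation (2)
    P0: (S2, S1, S0), P: (S2, S1, S0, S0)   page chains, equation (3)
    The last page label, S0 - 1, is the absorbing end-of-session state.
    """
    S2, S1, S0 = P0.shape
    leave = S0 - 1
    sessions, H = [], None
    for j in range(n_sessions):
        H = rng.choice(S2, p=(pi0 if j == 0 else pi[H]))
        h = rng.choice(S1, p=phi0[H])
        y = rng.choice(S0, p=P0[H, h])
        pages = [int(y)]
        while y != leave:
            h = rng.choice(S1, p=phi[H, h])    # move the latent state first
            y = rng.choice(S0, p=P[H, h, y])   # then move the page under the new h
            pages.append(int(y))
        sessions.append((int(H), pages))
    return sessions

# Random (made-up) parameter values; only the dimensions echo the case study.
rng = np.random.default_rng(0)
S2, S1, S0 = 3, 3, 10
pi0 = rng.dirichlet(np.ones(S2))
pi = rng.dirichlet(np.ones(S2), size=S2)
phi0 = rng.dirichlet(np.ones(S1), size=S2)
phi = rng.dirichlet(np.ones(S1), size=(S2, S1))
P0 = rng.dirichlet(np.ones(S0), size=(S2, S1))
P0[..., -1] = 0.0                              # a session cannot start at its end
P0 /= P0.sum(axis=-1, keepdims=True)
P = rng.dirichlet(np.ones(S0), size=(S2, S1, S0))
sessions = simulate_user(pi0, pi, phi0, phi, P0, P, n_sessions=4, rng=rng)
```

Each simulated session is returned with its session type so that the latent and observed layers of Figure 2 can be inspected side by side.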

3.2 Prior Distribution

Denote the collection of all model parameters by $\theta = \{\pi, \pi_0, \phi^{(H)}, \phi_0^{(H)}, P^{(H,h)}, P_0^{(H,h)} : H \in \mathcal{S}_2, h \in \mathcal{S}_1\}$. Each component of $\theta$ is either a multinomial probability vector or a Markov transition matrix, so it is convenient to model $\theta$ using a product of independent Dirichlet distributions. Let

\[ D(p \mid \alpha) = \frac{\Gamma\left(\sum_i \alpha_i\right)}{\prod_i \Gamma(\alpha_i)} \prod_i p_i^{\alpha_i - 1} \]

represent the density of the Dirichlet distribution, evaluated at the discrete probability distribution $p$, where the parameter $\alpha$ is a vector of positive real numbers with the same dimension as $p$. The elements of $\alpha$ are interpretable as prior counts. A complete-data conjugate prior for $\theta$ is

\[
p(\theta) = D(\pi_0 \mid N_0) \prod_{H \in \mathcal{S}_2} \Bigg[ D(\pi(H, \cdot) \mid N(H, \cdot)) \, D(\phi_0^{(H)} \mid n_0^{(H)}) \prod_{h \in \mathcal{S}_1} \Big[ D(\phi^{(H)}(h, \cdot) \mid n^{(H)}(h, \cdot)) \, D(P_0^{(H,h)} \mid \nu_0^{(H,h)}) \prod_{r \in \mathcal{S}_0} D(P^{(H,h)}(r, \cdot) \mid \nu^{(H,h)}(r, \cdot)) \Big] \Bigg]. \tag{4}
\]

A uniform prior on $\theta$ can be obtained by setting each of the various $N$, $n$, $\nu$ parameters in (4) to 1. Alternatively, the Jeffreys prior for the multinomial distribution (Kass and Wasserman, 1996) sets $N_0 = N(H, \cdot) = 1_{S_2}/S_2$, $n_0^{(H)} = n^{(H)}(h, \cdot) = 1_{S_1}/S_1$, and $\nu_0^{(H,h)} = \nu^{(H,h)}(r, \cdot) = 1_{S_0}/S_0$ for all $(H, h)$, where $1_k$ denotes a $k$-vector of 1s. The Jeffreys prior is a proper distribution, but it is bowl-shaped with infinite ridges along the boundaries of the probability simplex. It is thus more diffuse and less informative than a uniform distribution. Equation (4) offers a convenient way to handle the absorbing states by setting $\nu^{(H,h)}(S_0, S_0) = \infty$, which forces $P^{(H,h)}(S_0, S_0) = 1$, assuming the other elements $\nu^{(H,h)}(S_0, \cdot)$ are finite.

3.3 The Augmented Markov Chain

The conditional distribution of $(h_{ijt}, y_{ijt})$ given $H_{ij}$ and $\theta$ is that of a Markov chain on $\mathcal{S}_1 \times \mathcal{S}_0$ with transition probabilities

\[ \Pr(h_{ijt}, y_{ijt} \mid h_{ij,t-1}, y_{ij,t-1}, H_{ij}, \theta) = \phi^{(H_{ij})}(h_{ij,t-1}, h_{ijt}) \, P^{(H_{ij}, h_{ijt})}(y_{ij,t-1}, y_{ijt}). \]

The transition probability matrix can be written

\[
P^{(H)} = \begin{pmatrix}
\phi^{(H)}(1,1) P^{(H,1)} & \phi^{(H)}(1,2) P^{(H,2)} & \cdots \\
\phi^{(H)}(2,1) P^{(H,1)} & \phi^{(H)}(2,2) P^{(H,2)} & \cdots \\
\phi^{(H)}(3,1) P^{(H,1)} & \cdots & \\
\vdots & & \ddots
\end{pmatrix}. \tag{5}
\]

Let $I(h, y) = (h - 1) S_0 + y$. The conditional probability of a transition from $(h_0, y_0)$ to $(h, y)$ is the element in row $I(h_0, y_0)$ and column $I(h, y)$ of $P^{(H)}$.
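The block structure of equation (5) can be assembled mechanically. The sketch below assumes 0-based labels, so the index function becomes `h * S0 + y`; the function names `augmented_matrix` and `state_index` and the random parameter values are hypothetical.

```python
import numpy as np

def augmented_matrix(phi, P):
    """Assemble the augmented transition matrix for one session type.

    phi: (S1, S1) latent transition matrix for this session type
    P:   (S1, S0, S0), where P[h] is the page transition matrix under latent state h
    Block (r, s) of the result is phi[r, s] * P[s].
    """
    S1 = P.shape[0]
    return np.block([[phi[r, s] * P[s] for s in range(S1)] for r in range(S1)])

def state_index(h, y, S0):
    """0-based counterpart of I(h, y) = (h - 1) S0 + y."""
    return h * S0 + y

# Random (made-up) parameters for a single session type.
rng = np.random.default_rng(0)
S1, S0 = 2, 4
phi = rng.dirichlet(np.ones(S1), size=S1)
P = rng.dirichlet(np.ones(S0), size=(S1, S0))
aug = augmented_matrix(phi, P)

# Probability of moving from (h0=1, y0=2) to (h=0, y=3).
prob = aug[state_index(1, 2, S0), state_index(0, 3, S0)]
```

Because each block row mixes proper transition matrices with weights that sum to one, every row of the assembled matrix sums to one.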

The augmented Markov chain can be used to obtain low dimensional model summaries through techniques for computing various absorption probabilities. Examples include the expected duration of each session type, the distribution of pages visited during a session, and the probability that any given subset of pages will be visited. The last case is of particular interest to e-commerce sites because it can be used to compute the probability that a session will contain a purchase. Formulas for computing absorption probabilities are derived in several introductory texts on stochastic processes, e.g. Resnick (1992).

3.3.1 Expected Session Duration and State Distribution

Let $V^{(H)}(h_0, y_0, h, y)$ denote the expected number of visits to the transient state $(h, y)$ in a session of type $H$, conditional on the initial state being $(h_0, y_0)$. Let $Q^{(H)}$ denote the matrix obtained by deleting rows and columns $I(1, S_0), \ldots, I(S_1, S_0)$ from $P^{(H)}$. Then we may represent $V^{(H)}$ with the matrix $V^{(H)} = (I - Q^{(H)})^{-1}$, where each row of $V^{(H)}$ is associated with a unique $(h_0, y_0)$, and each column with $(h, y)$. Averaging $V^{(H)}$ over the initial distribution of $(h_0, y_0)$ yields the expected number of visits to $(h, y)$,

\[ W^{(H)}(h, y) = \sum_{h_0 \in \mathcal{S}_1} \sum_{y_0 \in \tilde{\mathcal{S}}_0} \phi_0^{(H)}(h_0) \, P_0^{(H,h_0)}(y_0) \, V^{(H)}(h_0, y_0, h, y), \tag{6} \]

where $\tilde{\mathcal{S}}_0 = \mathcal{S}_0 \setminus \{S_0\}$. Then the expected number of page requests in a session of type $H$ is

\[ V_+^{(H)} = \sum_{h \in \mathcal{S}_1} \sum_{y \in \tilde{\mathcal{S}}_0} W^{(H)}(h, y). \tag{7} \]
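Equations (6) and (7) amount to one linear solve against the transient part of the augmented chain. The sketch below assumes 0-based labels with the last page label as the end-of-session state; the function name `expected_visits` and the one-page toy chain are hypothetical.

```python
import numpy as np

def expected_visits(phi0, phi, P0, P):
    """Expected visits W(h, y) to each transient state and their total.

    phi0: (S1,), phi: (S1, S1), P0: (S1, S0), P: (S1, S0, S0).
    The last page label, S0 - 1, is the end-of-session state.
    """
    S1, S0, _ = P.shape
    full = np.block([[phi[r, s] * P[s] for s in range(S1)] for r in range(S1)])
    # Transient states, ordered by latent state and then by page.
    keep = np.array([h * S0 + y for h in range(S1) for y in range(S0 - 1)])
    Q = full[np.ix_(keep, keep)]
    init = (phi0[:, None] * P0)[:, :S0 - 1].ravel()   # initial mass on transient states
    # Solves w (I - Q) = init, i.e. w = init (I - Q)^{-1}, without forming the inverse.
    w = np.linalg.solve((np.eye(len(keep)) - Q).T, init)
    W = w.reshape(S1, S0 - 1)
    return W, W.sum()

# One latent state, one real page, and the end state: a geometric session length.
phi0 = np.array([1.0])
phi = np.array([[1.0]])
P0 = np.array([[1.0, 0.0]])
P = np.array([[[0.75, 0.25], [0.0, 1.0]]])
W, expected_length = expected_visits(phi0, phi, P0, P)   # 1 / (1 - 0.75) = 4 requests
```

The count excludes the artificial end-of-session request, because only transient states enter the sum in equation (7).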

The ratio $\rho^{(H)}(h, y) = W^{(H)}(h, y) / V_+^{(H)}$ is the proportion of expected page visits from state $(h, y)$. Summing over either $h$ or $y$ gives the marginal distributions $\rho^{(H)}(y)$ and $\rho^{(H)}(h)$ respectively.

3.3.2 Visitation Probabilities

The probability that a given set of states $A_1$ will be visited during a session of type $H$ is computed as follows. Imagine that $A_1$ is closed (once entered, the probability of leaving is zero), let $A_0 = \{(h, S_0) : h \in \mathcal{S}_1\}$ and $A = A_0 \cup A_1$. Then $T = A^c$ is the set of transient states. Let $Q_A^{(H)}$ denote the matrix obtained by deleting from $P^{(H)}$ all rows and columns corresponding to states in $A$. Let $R_A^{(H)}$ denote the matrix obtained by retaining the rows of $P^{(H)}$ corresponding to $T$ and the columns corresponding to $A$. These matrices may be envisioned by relabeling the states so that

\[
P^{(H)} = \begin{pmatrix} Q_A^{(H)} & R_A^{(H)} \\ 0 & P_A^{(H)} \end{pmatrix}.
\]

Let $U^{(H)}(h_0, y_0, h, y)$ denote the conditional probability, given initial state $(h_0, y_0) \in T$, that the first excursion from $T$ is through state $(h, y) \in A$. Then we may represent $U^{(H)}$ with the matrix $U^{(H)} = (I - Q_A^{(H)})^{-1} R_A^{(H)}$, where each row in $U^{(H)}$ corresponds to a state in $T$, and each column to a state in $A$. Averaging $U^{(H)}$ over the initial distribution of $(h_0, y_0)$ gives

\[
\Pr(A_1 \mid H, \theta) = \sum_{(h_0, y_0) \in A_1} \phi_0^{(H)}(h_0) \, P_0^{(H,h_0)}(y_0) + \sum_{(h_0, y_0) \in T} \sum_{(h, y) \in A_1} \phi_0^{(H)}(h_0) \, P_0^{(H,h_0)}(y_0) \, U^{(H)}(h_0, y_0, h, y). \tag{8}
\]
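Equation (8) can be sketched in a few lines for the common case in which the target set consists of all augmented states whose page belongs to a chosen set of pages, such as an order page. That restriction, the 0-based labels, the function name `visit_probability`, and the toy chain are assumptions of this illustration.

```python
import numpy as np

def visit_probability(phi0, phi, P0, P, target_pages):
    """Probability that a session visits any page in `target_pages`.

    phi0: (S1,), phi: (S1, S1), P0: (S1, S0), P: (S1, S0, S0).
    The last page label, S0 - 1, is the end-of-session state and must not
    appear in `target_pages`.
    """
    S1, S0, _ = P.shape
    full = np.block([[phi[r, s] * P[s] for s in range(S1)] for r in range(S1)])
    init = (phi0[:, None] * P0).ravel()
    is_target = np.array([y in target_pages for h in range(S1) for y in range(S0)])
    is_end = np.array([y == S0 - 1 for h in range(S1) for y in range(S0)])
    transient = ~(is_target | is_end)
    Q = full[np.ix_(transient, transient)]
    R = full[np.ix_(transient, is_target)]
    # Columns of (I - Q_A)^{-1} R_A that correspond to the target states.
    U = np.linalg.solve(np.eye(Q.shape[0]) - Q, R)
    return init[is_target].sum() + init[transient] @ U.sum(axis=1)

# Toy chain: page 0 = browse, page 1 = order, page 2 = end of session.
phi0 = np.array([1.0])
phi = np.array([[1.0]])
P0 = np.array([[1.0, 0.0, 0.0]])
P = np.array([[[0.5, 0.2, 0.3],
               [0.0, 0.0, 1.0],
               [0.0, 0.0, 1.0]]])
p_order = visit_probability(phi0, phi, P0, P, target_pages={1})
```

In the toy chain the answer solves u = 0.2 + 0.5 u, so the order page is reached with probability 0.4.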

3.3.3 Marginal Distribution of Observed Data

While results for Markov chains may be used to calculate the quantities described above, it should be noted that the marginal distribution of $Y_{ij}$, averaging over $h_{ij1}, \ldots, h_{ijT_{ij}}$, is not a Markov chain. Instead, the transition distribution of $y_{ijt}$ is a mixture

\[ p(y_{ijt} \mid y_{ij1}, \ldots, y_{ij,t-1}, H, \theta) = \sum_{h \in \mathcal{S}_1} P^{(H,h)}(y_{ij,t-1}, y_{ijt}) \, w_t(h), \tag{9} \]

where the mixing weight $w_t(h) = \Pr(h_{ijt} = h \mid y_{ij1}, \ldots, y_{ij,t-1}, H, \theta)$ is a complicated function of past data (see Appendix A for its derivation).

As an example of non-Markov behavior that $p(Y_{ij} \mid H, \theta)$ can model, let $X_y$ be the length of a run of self-transitions in state $y$. If $Y_{ij}$ followed a Markov chain with transition probability matrix $P$ then $X_y$ would be a geometric random variable with success probability $1 - P(y, y)$. Under $p(Y_{ij} \mid H, \theta)$ the distribution of $X_y$ is essentially arbitrary. To understand why, recall that the waiting time until a Markov chain leaves a given set of states is known as a phase-type distribution. Phase-type distributions include all mixtures and convolutions of negative binomial distributions (of which geometric distributions are a special case), as well as any distribution with finite support on the positive integers (Neuts, 1981, page 46). Thus, with a large enough state space and appropriately chosen transition probabilities, virtually any distribution on $\mathbb{Z}^+$ can be approximated by a phase-type distribution. Under the hidden Markov model described above, $X_y$ is the time required for the augmented Markov chain to leave $\{(h, y) : h \in \mathcal{S}_1\}$, a phase-type random variable.
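The phase-type argument can be made concrete by computing the run-length distribution directly from the block of the augmented chain that stays on page $y$. In the sketch below, the run length is counted as the number of consecutive requests for page $y$ (so it starts at 1), `start` is an assumed distribution of the latent state at the beginning of the run, and the function name `run_length_pmf` and all numerical values are hypothetical.

```python
import numpy as np

def run_length_pmf(phi, P, y, start, max_len):
    """P(X_y = k) for k = 1..max_len under the page-level hidden Markov model.

    phi: (S1, S1), P: (S1, S0, S0), start: (S1,) latent distribution at run start.
    """
    # Sub-stochastic matrix of the augmented chain restricted to {(h, y): h in S1}:
    # B[r, s] = phi[r, s] * P[s, y, y].
    B = phi * P[:, y, y][None, :]
    pmf = np.empty(max_len)
    alive = np.asarray(start, dtype=float)   # alive[h] = P(run continues, latent state h)
    for k in range(max_len):
        nxt = alive @ B
        pmf[k] = alive.sum() - nxt.sum()     # run ends after exactly k + 1 requests
        alive = nxt
    return pmf

# With one latent state the run length is geometric.
P_one = np.array([[[0.6, 0.4], [0.0, 1.0]]])
geometric = run_length_pmf(np.array([[1.0]]), P_one, y=0, start=[1.0], max_len=5)

# With two sticky latent states (made-up values) short and long runs are mixed.
phi_two = np.array([[0.95, 0.05], [0.05, 0.95]])
P_two = np.array([[[0.10, 0.90], [0.0, 1.0]],
                  [[0.95, 0.05], [0.0, 1.0]]])
mixed = run_length_pmf(phi_two, P_two, y=0, start=[0.5, 0.5], max_len=50)
```

A first order Markov chain can only produce the geometric shape of the first example, whereas the second example places extra mass on both very short and very long runs, in the spirit of the long runs of Home page requests described in Section 2.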

4 Posterior Sampling

Let $\mathbf{Y} = \{Y_i\}$, $\mathbf{H} = \{H_{ij}\}$ and $\mathbf{h} = \{h_{ijt}\}$. A Markov chain with stationary distribution $p(\theta, \mathbf{H}, \mathbf{h} \mid \mathbf{Y})$ can be constructed by alternately drawing from $p(\mathbf{H}, \mathbf{h} \mid \mathbf{Y}, \theta)$ and $p(\theta \mid \mathbf{Y}, \mathbf{H}, \mathbf{h})$. The latter draw is trivial because complete data conjugacy implies that $p(\theta \mid \mathbf{H}, \mathbf{h}, \mathbf{Y})$ remains a product of independent Dirichlet distributions. The key to the sampler is that $p(\mathbf{H}, \mathbf{h} \mid \theta, \mathbf{Y})$ may be sampled directly, in a single step, without introducing intermediate Gibbs or Metropolis-Hastings draws. The draw is accomplished by repeatedly applying a set of forward-backward recursions for hidden Markov models (Baum et al., 1970; Chib, 1996; Scott, 2002). The recursions provide both a fast way to compute likelihood for hidden Markov models and a way to simulate the latent Markov chain given observed data. The recursions are summarized in Appendix A.

4.1 Sampling $p(\mathbf{h}, \mathbf{H} \mid \mathbf{Y})$

The key to sampling $p(\mathbf{h}, \mathbf{H} \mid \mathbf{Y}, \theta)$ is to view $Y_{i1}, \ldots, Y_{in_i}$ as a time series of multivariate observations from a hidden Markov model with hidden chain $H_{i1}, \ldots, H_{in_i}$. Then the forward-backward recursions can be employed to directly sample $p(H_{i1}, \ldots, H_{in_i} \mid Y_i, \theta)$. During the forward recursion one must be able to evaluate $p(Y_{ij} \mid H_{ij}, \theta)$, which is accomplished by a second recursion averaging over $h_{ij1}, \ldots, h_{ijT_{ij}}$. Running the forward-backward recursions for each user produces $\mathbf{H}$ from $p(\mathbf{H} \mid \mathbf{Y}, \theta)$. Conditional on $\mathbf{H}$, a second application of the recursions produces a sample from $p(h_{ij1}, \ldots, h_{ijT_{ij}} \mid \mathbf{Y}, \mathbf{H}, \theta)$ within each session. Sampling $p(\mathbf{H} \mid \mathbf{Y}, \theta)$ and then $p(\mathbf{h} \mid \mathbf{H}, \mathbf{Y}, \theta)$ produces the desired draw from $p(\mathbf{H}, \mathbf{h} \mid \mathbf{Y}, \theta)$.
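The two-level draw described above can be sketched with a generic forward-filtering, backward-sampling routine. This is not the recursion as written in Appendix A, only a standard variant under stated assumptions: labels are 0-based, all parameters are strictly positive, and the function names `page_forward`, `backward_sample`, and `sample_user` are hypothetical.

```python
import numpy as np

def page_forward(pages, phi0, phi, P0, P):
    """Forward recursion within one session for a fixed session type.

    Returns log p(Y_ij | H) and the filtered probabilities
    Pr(h_t | y_1..y_t, H) as an array of shape (T, S1).
    """
    T = len(pages)
    filt = np.empty((T, phi0.shape[0]))
    joint = phi0 * P0[:, pages[0]]
    loglik = 0.0
    for t in range(T):
        if t > 0:
            # Predict h_t, then weight by the page transition under h_t.
            joint = (filt[t - 1] @ phi) * P[:, pages[t - 1], pages[t]]
        norm = joint.sum()
        loglik += np.log(norm)
        filt[t] = joint / norm
    return loglik, filt

def backward_sample(filt, trans, rng):
    """Draw a latent path from filtered probabilities and a transition matrix."""
    T, S = filt.shape
    path = np.empty(T, dtype=int)
    path[-1] = rng.choice(S, p=filt[-1])
    for t in range(T - 2, -1, -1):
        w = filt[t] * trans[:, path[t + 1]]
        path[t] = rng.choice(S, p=w / w.sum())
    return path

def sample_user(sessions, pi0, pi, phi0, phi, P0, P, rng):
    """Draw session types and page-level latent states for one user."""
    S2, n = len(pi0), len(sessions)
    loglik = np.empty((n, S2))
    page_filters = [[None] * S2 for _ in range(n)]
    for j, pages in enumerate(sessions):
        for H in range(S2):
            loglik[j, H], page_filters[j][H] = page_forward(
                pages, phi0[H], phi[H], P0[H], P[H])
    # Session-level forward filter, treating each session as one observation.
    filt = np.empty((n, S2))
    for j in range(n):
        prior = pi0 if j == 0 else filt[j - 1] @ pi
        w = prior * np.exp(loglik[j] - loglik[j].max())
        filt[j] = w / w.sum()
    H_path = backward_sample(filt, pi, rng)
    h_paths = [backward_sample(page_filters[j][H_path[j]], phi[H_path[j]], rng)
               for j in range(n)]
    return H_path, h_paths

# Toy example with random (made-up) parameters; page 3 marks the end of a session.
rng = np.random.default_rng(1)
S2, S1, S0 = 2, 3, 4
pi0 = rng.dirichlet(np.ones(S2))
pi = rng.dirichlet(np.ones(S2), size=S2)
phi0 = rng.dirichlet(np.ones(S1), size=S2)
phi = rng.dirichlet(np.ones(S1), size=(S2, S1))
P0 = rng.dirichlet(np.ones(S0), size=(S2, S1))
P = rng.dirichlet(np.ones(S0), size=(S2, S1, S0))
user_sessions = [[0, 1, 1, 3], [2, 0, 3]]
H_draw, h_draws = sample_user(user_sessions, pi0, pi, phi0, phi, P0, P, rng)
```

The page-level filters are computed once for every candidate session type during the forward pass and then reused for the backward draw, so each user requires only one sweep over the data per MCMC iteration.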

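4.2 Sampling $p(\theta \mid \mathbf{H}, \mathbf{h}, \mathbf{Y})$

Let $\mathbf{N}(r, s)$ denote the number of transitions from $H_{ij} = r$ to $H_{i,j+1} = s$ and let $\mathbf{N}_0(s)$ denote the number of times $H_{i1} = s$. Let $\mathbf{n}^{(H)}(r, s)$ denote the number of transitions from $h_{ijt} = r$ to $h_{ij,t+1} = s$ with $H_{ij} = H$, and let $\mathbf{n}_0^{(H)}(s)$ be the number of times $H_{ij} = H$ and $h_{ij1} = s$. Let $\boldsymbol{\nu}^{(H,h)}(r, s)$ be the number of times $y_{ij,t-1} = r$ and $y_{ijt} = s$ when $H_{ij} = H$ and $h_{ijt} = h$, matching the indexing of equation (3). Finally, let $\boldsymbol{\nu}_0^{(H,h)}(s)$ be the number of times $y_{ij1} = s$ when $H_{ij} = H$ and $h_{ij1} = h$. Then $p(\theta \mid \mathbf{H}, \mathbf{h}, \mathbf{Y})$ may be sampled by independently drawing from the following Dirichlet distributions for $H \in \mathcal{S}_2$, $h \in \mathcal{S}_1$ and $y \in \mathcal{S}_0$:

\[
\begin{aligned}
\pi_0 &\sim D(N_0 + \mathbf{N}_0), &
\pi(H, \cdot) &\sim D(N(H, \cdot) + \mathbf{N}(H, \cdot)), \\
\phi_0^{(H)} &\sim D(n_0^{(H)} + \mathbf{n}_0^{(H)}), &
\phi^{(H)}(h, \cdot) &\sim D(n^{(H)}(h, \cdot) + \mathbf{n}^{(H)}(h, \cdot)), \\
P_0^{(H,h)} &\sim D(\nu_0^{(H,h)} + \boldsymbol{\nu}_0^{(H,h)}), &
P^{(H,h)}(y, \cdot) &\sim D(\nu^{(H,h)}(y, \cdot) + \boldsymbol{\nu}^{(H,h)}(y, \cdot)).
\end{aligned}
\]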
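The conjugate update has the same form for every block of parameters: add the observed transition counts to the prior counts and draw each row from a Dirichlet distribution. The sketch below shows this for the session-level transition matrix only; the function names and the sampled session-type paths are hypothetical, and labels are 0-based.

```python
import numpy as np

def count_transitions(path, n_states):
    """Matrix of transition counts along one latent path."""
    counts = np.zeros((n_states, n_states))
    for r, s in zip(path[:-1], path[1:]):
        counts[r, s] += 1
    return counts

def draw_transition_matrix(prior_counts, data_counts, rng):
    """Draw each row independently from Dirichlet(prior row + data row)."""
    posterior = (np.asarray(prior_counts, dtype=float)
                 + np.asarray(data_counts, dtype=float))
    return np.vstack([rng.dirichlet(row) for row in posterior])

rng = np.random.default_rng(2)
S2 = 3
# Hypothetical sampled session-type paths for three users.
H_paths = [[1, 1, 1], [2, 1], [0]]
data_counts = sum(count_transitions(path, S2) for path in H_paths)
prior_counts = np.full((S2, S2), 1.0 / S2)   # prior counts of 1/S2, as in Section 3.2
pi_draw = draw_transition_matrix(prior_counts, data_counts, rng)
```

The same two functions could be reused for the page-level latent chains and the page transition matrices by counting within each session type and latent state.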

5 Case Study

5.1 Model Selection

To determine $S_2$ and $S_1$ we ran the posterior sampler with 2, 3, and 4 session types, and 1-4 states within a session. In each case we assumed the Jeffreys prior described in Section 3.2, but changed to $\nu^{(H,h)}(S_0, S_0) = 10^{10}$. We recorded the value of log likelihood $\ell_{S_2,S_1}(\theta) = \log p(\mathbf{Y} \mid \theta, S_1, S_2)$ for each sampled $\theta$. Figure 3(a) shows boxplots representing the posterior distribution of $\ell_{S_2,S_1}$. For each $S_2$, log-likelihood increases until $S_1 = 3$, but little gain is seen by increasing to $S_1 = 4$. Among the models with $S_1 = 3$, both $\ell_{4,3}$ and $\ell_{3,3}$ are substantially greater than $\ell_{2,3}$. There is some overlap between the distributions of $\ell_{4,3}$ and $\ell_{3,3}$, with $\ell_{3,3} > \ell_{4,3}$ for roughly 12% of the posterior draws. There are 936 parameters in the (3,3) model, and 1252 in the (4,3) model. We made a subjective judgment that the overlap between $\ell_{3,3}$ and $\ell_{4,3}$ was large enough to choose the simpler (3,3) model and avoid the 316 additional parameters.

Figure 3: Log likelihood (a) and log posterior (b) across MCMC iterations for models with 2 (top row), 3, and 4 (bottom row) session types, and 1-4 states within a session. A constant has been added to each panel so that 0 is the maximum value.

We also made formal Bayes factor calculations and tried subtracting the BIC penalty of $k \log(n)/2$ from log likelihood. These were unhelpful because BIC always preferred simpler models, while Bayes factors always preferred additional complexity. BIC assumes the $n$ observations are independent and the $k$ parameters have a multivariate normal posterior distribution. The violation of both these assumptions explains how BIC and Bayes factors come to different conclusions despite the fact that differences in BIC approximate log Bayes factors under a particular prior (Kass and Raftery, 1995). A Newton and Raftery (1994) Bayes factor calculation indicated that the (4,4) model was most likely, and was $e^{98}$ times more likely than its closest competitor, the (3,4) model. This is far too large to be believed. Bayes factors proved untrustworthy because our very weak prior distribution leads to a reverse Lindley paradox (Bernardo and Smith, 1994) in which the more complex model is always preferred. Numerically the problem occurs because the prior offers an infinite reward for any transition probability that can be pushed to zero. As more states are added there is more opportunity for empty transition counts to push model parameters near zero. The phenomenon is apparent in Figure 3(b), which shows boxplots describing the distribution of un-normalized log posterior $\ell_{S_2,S_1}(\theta) + \log p(\theta)$. Notice that the scale of panel (b) is ten times that of panel (a). The discrepancy is entirely due to the prior, as both panels use the same likelihoods. The remainder of this Section describes the model with $S_2 = 3$ and $S_1 = 3$.

Figure 4: (a) Transition probabilities between session types. (b) Autocorrelation functions (ACF, plotted against lag). Panels are arranged in the same order as $\pi(r, s)$. The posterior mean of the initial distribution is $E(\pi_0 \mid \mathbf{Y}) = (.13, .55, .32)$.

5.2 Session Types

Figure 4 shows MCMC sample paths of $\pi$, the transition probabilities for the latent session types. The sampler was run for 20,000 iterations with the first 10,000 discarded as burn-in. Figure 4 reveals no evidence of label switching or multiple modes, which are sometimes encountered with hidden Markov and mixture models (Stephens, 2000; Celeux et al., 2000). The rapidly decaying autocorrelation functions in panel (b) are attributable to the efficient forward-backward sampling scheme from Section 4. About half of all users have initial sessions of type 2, about one third begin as type 3, and about one sixth begin as type 1. The type 1 and 2 sessions tend to persist, meaning that later sessions tend to be of the same type. Users returning from session type 3 have about a 50% probability of remaining in type 3, otherwise they move to type 2 and are unlikely to move back. This suggests that state 3 is at least partly populated by users who are learning about the site, and who graduate to session type 2 when they are ready.

Session   Home   InBsk   Info   OutBsk   Order   Product   Shop   ShowBsk   SiteMap
(Top) Distribution of page requests
1         .48    .01     .22    .01      .00     .12       .15    .01       .00
2         .69    .00     .13    .01      .00     .07       .09    .00       .00
3         .69    .00     .01    .01      .00     .10       .20    .00       .00
(Bottom) Distribution given not Home page
1                .01     .43    .02      .01     .23       .29    .01       .01
2                .01     .43    .02      .00     .24       .28    .01       .00
3                .00     .02    .03      .00     .31       .62    .01       .00

Table 3: (Top) Page request distributions by session type. (Bottom) Distribution given not Home page.

Table 3 shows the distribution of page requests within a session of each type, computed as described in Section 3.3.1. Type 1 sessions spend considerably less time on the Home page than the other session types, so Table 3 also shows the conditional distribution of page requests given that the user is not on the Home page. Excluding visits to the Home page, session types 1 and 2 have very similar profiles. Relative to the other session types, type 3 sessions spend a greater fraction of their time on Shopping pages and a much smaller fraction on Information pages, whether or not the Home page is taken into account.

The session types also differ according to the probability of shopping cart events, which are rare enough to be lost in the rounding of Table 3. Figure 5(a) shows the probability that an item is placed in the shopping cart. Figure 5(b) shows the probability that a session contains a purchase. Type 2 sessions clearly have a higher shopping cart probability than type 1 sessions. Type 3 sessions almost never use the shopping cart, so the probability that a session of type 3 contains a purchase is nearly zero. Session type 2 has a higher expected purchase probability than session type 1, but the purchase probability for type 1 sessions exceeded that of type 2 sessions in about 20% of posterior draws. Taken together, the clear difference in Figure 5(a) and the overlap in Figure 5(b) suggest that session type 1 has a higher conditional probability of completing a sale given that an item was placed in the shopping cart. This probability is plotted in Figure 5(c), where it is higher for type 1 sessions in 94% of posterior draws. The expected number of type 2 sessions containing shopping cart events was 211.05, compared to only 38.05 for type 1 sessions, so the sale-completion probability is estimated more precisely for type 2 sessions. Figure 5(d) plots a final summary showing that session type 2 has a much greater expected length (number of page requests) than the others.
The information in Table 3, Figure 4, and Figure 5 suggests that users in session types 1, 2, and 3 can be labeled "decisive shoppers," "deliberators," and "never buyers." Decisive shoppers appear briefly on

Figure 5: Posterior distributions of (a) the probability of placing an item in the shopping cart, (b) purchase probabilities (reference line is overall purchase probability), (c) the conditional probability of a purchase given that an item was placed in the shopping cart, and (d) the expected number of page requests. Boxplots represent uncertainty in the model parameters.

[Panels are boxplots by session type, with vertical axes Shopping Basket Probability, Purchase Probability, P(complete sale), and Page Requests.]

the site and rarely place an item in the shopping cart, but when they do they are more likely to complete a sale. Deliberators remain on the site longer and are more likely to place an item in the shopping cart, but are also more likely to leave without finalizing a sale. Moe and Fader (2004) suggested the term "never buyers" for users who briefly visit a site in order to check prices and features for items they are considering purchasing elsewhere. If never buyers are indeed deliberating about items on multiple web sites then it makes sense that those who return would tend to act as deliberators, as seen in Figure 4. A merchant could use this categorization to tailor promotion strategies for the different session types, for example by giving decisive shoppers an incentive to place an item in the shopping cart, offering "buy now" incentives that encourage deliberators to complete a sale after an item is placed in the cart, and educating never buyers about a product's merits relative to competitors. A merchant can track a user's probability of session membership in real time using the forward recursion and adjust the promotion strategy accordingly. Alternatively, a merchant with limited computing ability can base promotions on heuristics associated with the session types, for example by offering the "buy now" incentive only to users who have generated at least 15 page requests, attempting to educate users who exclusively visit Home, Product, or Shopping pages, and switching from education to price promotions for users who have visited Information pages.
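The heuristic rules in the preceding paragraph can be written as a small decision function. This is only a sketch: the function name, the returned labels, and the order in which the rules are checked are assumptions made here for illustration. The page-category names follow the abbreviations used in Table 3 and the figures, and the 15-request threshold is the one quoted in the text.

```python
def choose_promotion(pages_visited):
    """Pick a promotion from the page categories requested so far.

    pages_visited: list of page-category names in request order,
    e.g. ["Home", "Shop", "Product"].
    The rule precedence below is an illustrative assumption.
    """
    visited = set(pages_visited)
    # Long sessions resemble deliberators: encourage completing the sale.
    if len(pages_visited) >= 15:
        return "buy_now_incentive"
    # A visit to Information pages triggers the switch to price promotions.
    if "Info" in visited:
        return "price_promotion"
    # Users seen only on Home, Product, or Shopping pages get education.
    if visited and visited <= {"Home", "Product", "Shop"}:
        return "education"
    return "no_promotion"

print(choose_promotion(["Home", "Shop", "Product"]))
print(choose_promotion(["Home", "Info", "Shop"]))
```

A merchant with more computing resources would replace these rules with the filtered session-type probabilities from the forward recursion.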

5.3 The Role of Latent States Within a Session

The session types discussed above genuinely seem to capture different categories of browser behavior. By contrast, the latent states within a session compensate for the inability of a first order Markov chain to describe web traffic data. The sole exception is session type 1, with transition probabilities plotted in Figure 6(a). For type 1 sessions, the low probability of moving to state 1 and the high probability of leaving it suggest that it is an introductory state where users are learning to use the web site. This corresponds roughly to the behavior observed in Figure 4 where never buyers return as deliberators, except that here the learning occurs within a session rather than between sessions. To help understand the roles of the other states, Figure 7 plots the posterior distribution of the joint distribution of latent state and page category, (h, y), within each session type H, computed as described in Section 3.3.1. Each panel of Figure 7 shows the profile of pages visited by a session of type H while in state h. Summing over the page categories in each

Figure 6: Transition probabilities for latent states within session types 1 (a), 2 (b), and 3 (c). Final 10,000 iterations plotted. Posterior means of the initial state distributions, shown below each panel, are (.13, .44, .43) for session type 1, (.36, .30, .34) for session type 2, and (.12, .28, .60) for session type 3.

[Panels plot transition probability (0.0 to 1.0) against iteration (2000 to 10000).]

panel gives the marginal distribution of h within session type H. Figure 7 suggests that states 2 and 3 in session type 1 assign different weights to Product and Shopping pages, but are otherwise similar. The same can be said for states 1 and 2 in type 1 sessions, which model frequent visits to the Home page. Examining the page-level transition matrices is one way to understand the difference between states with similar profiles in Figure 7. Figure 8 shows the transition matrices for states 2 and 3 of session type 1. There are seven other panels like Figures 8(a) and 8(b), which emphasizes why low dimensional summaries like Figure 7 are necessary. Figure 8 shows some evidence of multimodality among the sparsely populated shopping cart pages (OutBsk, Order, ShowBsk), which prevents us from ascribing shopping cart events to individual states. However, the multimodality has no effect on model summaries that average over latent states. In particular, the session level summaries from Section 5.2 are unaffected. Note that the multimodality is not caused by label switching. If it were, the other rows of Figure 8 would be similarly multimodal. Figure 8 shows that state 2 assigns high self-transition probabilities to pages where self transitions are possible, while state 3 assigns higher probabilities of leaving the site. This suggests that an important role for the latent states is to correctly model session duration. With most of the risk of leaving the site concentrated in one state, the session duration can be extended or shortened by adjusting the probability of entering the risky state. Figure 9 explores this idea for the other

Figure 7: The joint distribution of latent state h and page category y within each session type H. The boxplots represent uncertainty about the model parameters. Columns represent h = 1, 2, 3. Rows represent H = 1, 2, 3.

[Each panel shows boxplots for the page categories Home, Info, InBsk, Order, OutBsk, Product, Shop, ShowBsk, and SiteMap.]

session types by plotting the distribution of h when the Leave page is entered for the first time. Each session type clearly has one state responsible for ending sessions. For type 2 sessions, states 1 and 2 are devoted to modeling the long runs of Home page requests mentioned in Section 2. The second row of Figure 7 shows that states 1 and 2 frequently visit the Home page, while state 3 contains a higher fraction of visits to Information, Product, and Shopping pages. Figure 6(b) shows that state 1 has a low probability of transitioning to state 3, while state 2 has a high probability. Thus state 1 is associated with Home page transitions that tend to be followed by self transitions, while state 2 is associated with Home page transitions that are followed by visits to other pages on the site.
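The claim that session duration is controlled by the probability of entering a single "risky" latent state can be illustrated with a toy calculation. The sketch below is not the paper's model: it collapses the page-level chain into a per-state probability of requesting the Leave page, and all numbers and function names are hypothetical. It uses the standard fundamental-matrix result for absorbing Markov chains to obtain the expected number of page requests.

```python
import numpy as np

def expected_session_length(pi0, Q, leave):
    """Expected number of page requests before the user leaves.

    pi0[h]   : distribution of the latent state at the first page request
    Q[r, s]  : latent-state transition probabilities
    leave[s] : probability that a request made after moving to state s is Leave
    """
    pi0 = np.asarray(pi0, dtype=float)
    Q = np.asarray(Q, dtype=float)
    leave = np.asarray(leave, dtype=float)
    # T[r, s]: move from latent state r to s AND remain on the site.
    T = Q * (1.0 - leave)[np.newaxis, :]
    # Fundamental matrix N = I + T + T^2 + ...; N[r, s] is the expected
    # number of page requests made in state s when starting from r.
    N = np.linalg.inv(np.eye(len(pi0)) - T)
    return float(pi0 @ N.sum(axis=1))

def toy_length(enter_risky):
    """Two latent states: 0 = 'safe' (never leaves), 1 = 'risky'."""
    Q = np.array([[1.0 - enter_risky, enter_risky],
                  [0.5, 0.5]])
    return expected_session_length([1.0, 0.0], Q, leave=[0.0, 0.4])

for e in (0.05, 0.1, 0.25, 0.5):
    print(f"P(enter risky state) = {e:.2f} -> expected length = {toy_length(e):.1f}")
```

In this toy setting the only route off the site runs through the risky state, so lowering the probability of entering it lengthens the session without changing any page-level behavior.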

[Figure 8: panels (a) and (b), with axes labeled by the page categories Home, InBsk, Info, OutBsk, Order, Product, Shop, ShowBsk, SiteMap, and Leave.]

Figure 8: Conditional transition probabilities for session type 1 (a) state 2 and (b) state 3.

The preceding observations suggest that the within-session latent states are correcting for deficiencies in first order Markov chains rather than providing insight about user behavior. A similar conclusion can be reached by examining the self transition probabilities in Figure 6. If the hidden Markov chain were capturing information about a user's evolving mental state then one would expect the diagonal elements of the latent-state transition matrix for each session type H to be large, indicating that the mental state persists over time. However, most of the diagonal elements are less than 1/2, which would imply that users' mental states tend to change with each click of the mouse. Thus it is difficult to give the within-session hidden Markov chain a meaningful economic or psychological interpretation.
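The link between small diagonal elements and short-lived states can be made explicit with a short worked calculation. The symbols q and D are introduced here for illustration and are not the paper's notation. Suppose a latent state has a constant self transition probability q, and let D be the number of consecutive page requests spent in that state. Then D is geometric:

$$\Pr(D = d) = q^{d-1}(1 - q), \quad d = 1, 2, \ldots, \qquad E(D) = \frac{1}{1 - q}.$$

When q < 1/2, this gives E(D) < 2 and Pr(D = 1) = 1 - q > 1/2, so the state is more likely than not to change at the very next click.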

Conclusion

This article has described data from a small e-commerce web site using a nested hidden Markov mixture of Markov chains. The session-level and page-level Markov chains play different roles in the model. The page-level model offers a flexible description of within-session behavior, which allows the session-level model to describe the data with a relatively small number of session types. We

[Figure 9: one panel per session type (Session 1, Session 2, Session 3); horizontal axis is State, vertical axis is probability from 0.0 to 1.0.]

Figure 9: Conditional distribution of h at the end of a session.

considered models with either 3 or 4 session types. By contrast, Cadez et al. (2000b) needed 40 session types to describe web log data using mixtures of first order Markov chains, though they were working with a more complex web site. The within-session models are complex, but because they can be interpreted as first order Markov chains on an augmented state space, marginal page distributions and visitation probabilities are easy to compute. These model summaries are simple to compute but are analytically complex, which makes them natural candidates for analysis by Bayesian posterior simulation. When applied to our sample data set the models identified three quite reasonable categories of user behavior. Our findings are robust in the sense that we found similar session types when we ran the model on a second set of data with a different observation window. The four state model that was our second choice in Section 5.1 split state 2 into two categories, but states 1 and 3 were essentially unchanged. The model described in this article was tailored to our specific data set in three ways that could easily be modified for other applications. First, our decision to use discrete time models for Yij was driven entirely by the coarse time stamp mentioned in Section 2. With a more precise time stamp we would have preferred a continuous time model such as the Markov modulated Poisson process used by Scott and Smyth (2003) and Scott (2004). Second, all transition probabilities in


our model were estimated nonparametrically. An alternative would have been to replace each row of a given transition matrix with a multinomial logit model. The lack of a regression structure prevents our model from considering demographic user information such as age, sex, or income. However, in practice such information is rarely available to online merchants, ethical and privacy considerations may prohibit its use, and past work has shown it to be of limited value (e.g. Rossi et al., 1996; Montgomery et al., 2004). Third, our model considered only two levels of nesting. More complex web sites could be modeled with more deeply nested models. For example, the web site Amazon.com has stores that specialize in books, housewares, clothing, etc. Customers who visit multiple stores can be modeled using a triply nested HMM, where the outer layer models transitions between trips to Amazon, the middle layer models transitions between Amazon stores, and the inner layer models individual page requests.
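The Conclusion's remark that the within-session model is a first order Markov chain on an augmented state space can be made concrete. The sketch below is one possible construction under stated assumptions, not the computation from Section 3.3.1: it pairs each latent state h with each page y, builds the transition matrix of the pair (h, y), and solves a linear system for the probability that a session reaches a target page (for example Order) before Leave. The function names, the three-page layout, and all numbers are hypothetical.

```python
import numpy as np

def augmented_chain(Q, P):
    """Transition matrix of the pair (h, y), indexed as h * K + y.

    Q[r, s]         : latent-state transition probabilities, shape (S, S)
    P[s, y_prev, y] : page transitions used while in latent state s, shape (S, K, K)
    """
    Q = np.asarray(Q, dtype=float)
    P = np.asarray(P, dtype=float)
    S, K = Q.shape[0], P.shape[1]
    T = np.zeros((S * K, S * K))
    for r in range(S):
        for s in range(S):
            # Move the latent state r -> s, then draw the next page from chain s.
            T[r * K:(r + 1) * K, s * K:(s + 1) * K] = Q[r, s] * P[s]
    return T

def prob_visit_before_leave(T, K, target, leave, start):
    """P(the page `target` is requested before `leave`), given a start distribution."""
    n = T.shape[0]
    pages = np.arange(n) % K
    hit = pages == target
    transient = ~(hit | (pages == leave))
    # Hitting probabilities u solve (I - T_tt) u_t = T_th 1, with u = 1 on `hit`
    # and u = 0 on `leave`.
    A = np.eye(int(transient.sum())) - T[np.ix_(transient, transient)]
    b = T[np.ix_(transient, hit)].sum(axis=1)
    u = np.zeros(n)
    u[hit] = 1.0
    u[transient] = np.linalg.solve(A, b)
    return float(np.asarray(start, dtype=float) @ u)

# Hypothetical example: 2 latent states; pages 0 = Browse, 1 = Order, 2 = Leave.
Q = np.array([[0.8, 0.2],
              [0.3, 0.7]])
P = np.array([
    [[0.7, 0.1, 0.2], [0.5, 0.0, 0.5], [0.0, 0.0, 1.0]],  # state 0: tends to stay
    [[0.2, 0.0, 0.8], [0.1, 0.0, 0.9], [0.0, 0.0, 1.0]],  # state 1: tends to leave
])
T = augmented_chain(Q, P)
start = np.zeros(T.shape[0])
start[0] = 1.0  # session begins in latent state 0 on the Browse page
p_order = prob_visit_before_leave(T, K=3, target=1, leave=2, start=start)
print("P(session contains an Order page):", round(p_order, 4))
```

Because the result is an exact function of the transition probabilities, evaluating it at each posterior draw of the parameters would give a posterior distribution for the visitation probability without any nested Monte Carlo integration.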

Forward-Backward Recursions for HMMs

Suppose $\{z_t\}$ follows a finite state Markov chain with initial distribution $\pi_0$ and transition probabilities $\Pr(z_t = s \mid z_{t-1} = r) = Q_t(r, s)$. Let $y_1^n = \{y_1, \ldots, y_n\}$ be a sequence of random variables with conditional distribution

$$p(y_1^n \mid z_1^n) = f^0_{z_1}(y_1) \prod_{t=2}^{n} f_{z_t}(y_t \mid y_{t-1}).$$

The mixture components $f_s$ may depend on a parameter, which we suppress in the notation. Allowing $f_s(y_t \mid y_{t-1})$ to depend on $y_{t-1}$ is a minor modification of the traditional hidden Markov model that does not substantially alter the recursions. The forward-backward recursions are used to efficiently compute the observed data likelihood $p(y_1^n) = \sum_{z_1^n} p(y_1^n \mid z_1^n)\, p(z_1^n)$, as well as the posterior distribution of the latent Markov chain given observed data and model parameters. The recursions consist of a forward step that computes the distribution of the $t$th transition given all the data up to time $t$, and a backward recursion that updates each distribution to condition on all observed data. Cowell et al. (1999) refer to these steps as "accumulating evidence" and "distributing evidence." There is also a stochastic backward step that simulates from $p(z_1^n \mid y_1^n)$. The derivation below follows Scott (2002).

The forward recursion operates on the set of transition distributions, represented by a sequence of matrices $P_t = (p_{trs})$ where $p_{trs} = \Pr(z_{t-1} = r, z_t = s \mid y_1^t)$. For $t > 0$ let $\pi_t(s) = \Pr(z_t = s \mid y_1^t)$ and $a_t(z_{t-1}, z_t, y_t) = p(z_{t-1}, z_t, y_t \mid y_1^{t-1})$. By definition,

$$a_t(r, s, y_t) = \pi_{t-1}(r)\, Q_t(r, s)\, f_s(y_t \mid y_{t-1}). \tag{10}$$

Equation (10) yields both the incremental likelihood $p(y_t \mid y_1^{t-1}) = \sum_{r,s} a_t(r, s, y_t)$ and the filtered transition distribution $p_{trs} = a_t(r, s, y_t) / p(y_t \mid y_1^{t-1})$. Computing $\pi_t(s) = \sum_r p_{trs}$ sets up the next step in the recursion. The recursion is initialized by replacing $f$ with $f^0$ in equation (10). The observed data log-likelihood can be computed as $\log p(y_1^n) = \log p(y_1) + \sum_{t=2}^{n} \log p(y_t \mid y_1^{t-1})$. With appropriate use of logarithms, $f_s$ and $p(y_t \mid y_1^{t-1})$ need only ever be evaluated on the log scale. The mixing weight in equation (9) is $w_t(h) = \sum_{r \in S} \pi_{t-1}(r)\, Q(r, h)$.

The stochastic version of the backward recursion simulates from $p(z_1^n \mid y_1^n)$. Begin with the factorization

$$p(z_1^n \mid y_1^n) = p(z_n \mid y_1^n) \prod_{t=1}^{n-1} p(z_t \mid z_{t+1}^n, y_1^n).$$

Then notice that, given $z_{t+1}$, $z_t$ is conditionally independent of $y_{t+1}^n$ and all later $z$'s. Thus $p(z_t \mid z_{t+1}^n, y_1^n) = \Pr(z_t = r \mid z_{t+1} = s, y_1^{t+1}) \propto p_{t+1,rs}$. Therefore, if one samples $(z_{n-1}, z_n)$ from the discrete bivariate distribution given by $P_n$ and then repeatedly samples $z_t$ from a multinomial distribution proportional to column $z_{t+1}$ of $P_{t+1}$, then $z_1^n$ is a draw from $p(z_1^n \mid y_1^n)$.
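The forward recursion of equation (10) and the stochastic backward step can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation. It assumes a transition matrix Q that does not change with t, takes the component densities as a precomputed array `lik` (a hypothetical input whose row t holds the density of observation t under each state, with the first row playing the role of the initial component), and works on the probability scale for readability instead of the log scale recommended above.

```python
import numpy as np

def forward(pi0, Q, lik):
    """Forward recursion: filtered transition matrices and log-likelihood.

    pi0[r]    : initial distribution
    Q[r, s]   : transition probabilities (assumed constant over t)
    lik[t, s] : density of observation t under state s (hypothetical input)
    """
    pi0 = np.asarray(pi0, dtype=float)
    Q = np.asarray(Q, dtype=float)
    lik = np.asarray(lik, dtype=float)
    n, S = lik.shape
    P = np.empty((n, S, S))
    loglik = 0.0
    pi = pi0
    for t in range(n):
        a = pi[:, np.newaxis] * Q * lik[t][np.newaxis, :]  # equation (10)
        inc = a.sum()             # incremental likelihood of observation t
        P[t] = a / inc            # filtered transition distribution
        loglik += np.log(inc)
        pi = P[t].sum(axis=0)     # column sums give the filtered state distribution
    return P, loglik

def sample_states(P, rng):
    """Stochastic backward step: one joint draw of the latent chain."""
    n, S, _ = P.shape
    z = np.empty(n, dtype=int)
    # Drawing the last state from the column sums of the final matrix and then
    # sampling backwards is equivalent to first drawing the final pair jointly.
    last = P[n - 1].sum(axis=0)
    z[n - 1] = rng.choice(S, p=last / last.sum())
    for t in range(n - 2, -1, -1):
        col = P[t + 1][:, z[t + 1]]   # column z_{t+1} of the next matrix
        z[t] = rng.choice(S, p=col / col.sum())
    return z

# Hypothetical two-state example with four observations.
rng = np.random.default_rng(1)
pi0 = np.array([0.5, 0.5])
Q = np.array([[0.9, 0.1],
              [0.2, 0.8]])
lik = np.array([[0.6, 0.1],
                [0.5, 0.2],
                [0.1, 0.7],
                [0.2, 0.6]])
P, loglik = forward(pi0, Q, lik)
z = sample_states(P, rng)
print("log-likelihood:", loglik)
print("one posterior draw of the latent states:", z)
```

Each call to `sample_states` costs one pass over the stored matrices, which is why a single forward pass followed by repeated backward draws is attractive inside an MCMC sampler. For long sequences the products in `forward` should be accumulated on the log scale to avoid underflow.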

References
Baum, L. E., Petrie, T., Soules, G., and Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics 41, 164-171.

Bernardo, J. M. and Smith, A. F. M. (1994). Bayesian Theory. John Wiley & Sons.

Bucklin, R. E. and Sismeiro, C. (2003). A model of web site browsing behavior estimated on clickstream data. Journal of Marketing Research XL, 249-267.

Cadez, I. V., Gaffney, S., and Smyth, P. (2000a). A general probabilistic framework for clustering individuals and objects. In Knowledge Discovery and Data Mining, 140-149.

Cadez, I. V., Heckerman, D., Meek, C., Smyth, P., and White, S. (2000b). Visualization of navigation patterns on a web site using model-based clustering. In Knowledge Discovery and Data Mining, 280-284.

Catledge, L. D. and Pitkow, J. E. (1995). Characterizing browsing strategies in the World-Wide Web. Computer Networks and ISDN Systems 27, 1065-1073.

Celeux, G., Hurn, M., and Robert, C. P. (2000). Computational and inferential difficulties with mixture prior distributions. Journal of the American Statistical Association 95, 957-970.

Chib, S. (1996). Calculating posterior distributions and modal estimates in Markov mixture models. Journal of Econometrics 75, 79-97.

Cooley, R., Mobasher, B., and Srivastava, J. (1999). Data preparation for mining world wide web browsing patterns. Knowledge and Information Systems 1, 5-32. citeseer.nj.nec.com/cooley99data.html.

Cowell, R. G., Dawid, A. P., Lauritzen, S. L., and Spiegelhalter, D. J. (1999). Probabilistic Networks and Expert Systems. Springer.

Eirinaki, M., Vazirgiannis, M., and Kapogiannis, D. (2005). Web path recommendations based on page ranking and Markov models. In WIDM '05: Proceedings of the 7th annual ACM international workshop on Web information and data management, 2-9, New York, NY, USA. ACM Press.

Jank, W. and Shmueli, G. (2006). A special issue on statistical challenges and opportunities in electronic commerce research. Statistical Science 21.

Kass, R. E. and Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association 90, 773-795.

Kass, R. E. and Wasserman, L. (1996). The selection of prior distributions by formal rules (corr: 1998v93 p412). Journal of the American Statistical Association 91, 1343-1370.

Kleinberg, J. (2003). Bursty and hierarchical structure in streams. Data Mining and Knowledge Discovery 7, 373-397.

MacDonald, I. L. and Zucchini, W. (1997). Hidden Markov and other models for discrete-valued time series. Chapman and Hall.

Moe, W. W., Chipman, H., George, E. I., and McCulloch, R. E. (2002). A Bayesian treed model of online purchasing behavior using in-store navigational clickstream. Tech. rep., University of Texas at Austin.

Moe, W. W. and Fader, P. S. (2004). Dynamic conversion behavior at e-commerce sites. Management Science 50, 326-335.

Montgomery, A. L., Li, S., Srinivasan, K., and Liechty, J. C. (2004). Modeling online browsing and path analysis using clickstream data. Marketing Science 23, 579-595.

Neuts, M. F. (1981). Matrix-Geometric Solutions in Stochastic Models. Dover Publications, Inc., New York.

Newton, M. A. and Raftery, A. E. (1994). Approximate Bayesian inference with the weighted likelihood bootstrap (Disc: P26-48). Journal of the Royal Statistical Society, Series B: Methodological 56, 3-26.

Resnick, S. I. (1992). Adventures in Stochastic Processes. Birkhaeuser.

Rossi, P. E., McCulloch, R. E., and Allenby, G. M. (1996). The value of purchase history data in target marketing. Marketing Science 15, 321-340.

Scott, S. L. (2002). Bayesian methods for hidden Markov models: Recursive computing in the 21st century. Journal of the American Statistical Association 97, 337-351.

Scott, S. L. (2004). A Bayesian paradigm for designing intrusion detection systems. Computational Statistics and Data Analysis 45, 69-83. (special issue on computer security).

Scott, S. L., James, G. M., and Sugar, C. A. (2005). Hidden Markov models for longitudinal comparisons. Journal of the American Statistical Association 100, 359-369.

Scott, S. L. and Smyth, P. (2003). The Markov modulated Poisson process and Markov Poisson cascade with applications to Web traffic modeling. In J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith, and M. West, eds., Bayesian Statistics 7, 671-680. Oxford University Press.

Sismeiro, C. and Bucklin, R. E. (2004). Modeling purchase behavior at an e-commerce web site: A task-completion approach. Journal of Marketing Research LXI, 306-323.

Stephens, M. (2000). Dealing with label switching in mixture models. Journal of the Royal Statistical Society, Series B, Methodological 62, 795-810.

Tauscher, L. and Greenberg, S. (1997). Revisitation patterns in world wide web navigation. In Human Factors in Computing Systems: Proceedings of the CHI '97 Conference. New York: ACM, 399-406.