

Binomial Mixture Modeling of University Credits

Article  in  Communication in Statistics- Theory and Methods · January 2013


DOI: 10.1080/03610926.2013.804565



Binomial mixture modelling of university credits
Leonardo Grilli*, Carla Rampichini* and Roberta Varriale**
*Department of Statistics, Computer Science, Applications - University of Florence
**ISTAT, Rome

Corresponding author: Leonardo Grilli, Department of Statistics, Computer Science, Applications, Viale Morgagni 59, 50134 Firenze. E-mail: grilli@ds.unifi.it

Pre-print version of the article to appear in 2013 in Communications in Statistics - Theory and Methods.
http://www.tandfonline.com/loi/lsta20#.UZNHoMrmDtk

Abstract

The paper reviews finite mixture models for binomial counts with concomitant variables.
These models are well known in theory, but they are rarely applied. We use a binomial

finite mixture to model the number of credits gained by freshmen during the first year at the

School of Economics of the University of Florence. The finite mixture approach allows us to
appropriately account for the large number of zeroes and the multi-modality of the observed

distribution. Moreover, we rely on a concomitant variable specification to investigate the role

of student background characteristics and of a compulsory pre-enrolment test in predicting

gained credits. In the paper we deal with model selection, including the choice of the number
of components, and we devise numerical and graphical summaries of the model results in

order to exploit the information content of the concomitant variable specification. The main

finding is that the introduction of the pre-enrolment test gives additional information for
student tutoring, even if the predictive power is modest.

Key Words: concomitant variables; excess zeroes; latent class; prediction; pre-enrolment

test.

1 Introduction

In the statistical literature there has been a growing interest in finite mixture modelling as a

tool to increase the flexibility of conventional parametric models (McLachlan and Peel, 2000;
Schlattmann, 2009). Finite mixture models can be seen as a compromise between a simple

parametric model and a non-parametric approach. Moreover, these models make it possible to account
for unobserved heterogeneity due to latent sub-populations, often called latent classes.
In our application the response of interest is the number of credits gained by university

freshmen during the first year. This is a count variable with a maximum of sixty common
to all students, thus the binomial distribution is a natural candidate. However, the large

number of zeroes and the multi-modality of the observed distribution call for a flexible model,

such as a binomial finite mixture. This model should be considered as an approximation

to the observed distribution and it is not intended to be an accurate representation of the
process of credit accumulation, which would require a more complex model taking into

account that each student can choose her own exam sequence and that exams have different

success rates.
Indeed, the main purpose of the analysis is to identify predictors of student performance

among background characteristics and the results of a pre-enrolment test. To this end we

use a concomitant variable approach (Dayton and Macready, 1988).


In applied research there are many examples of finite mixture models for unbounded

count data, where the outcome distribution is assumed to be Poisson or negative binomial.

In the case of bounded counts, the usual strategy is to use a truncated Poisson distribution

(Saffari et al., 2013), whereas binomial finite mixture models are rarely used. The only
two examples we found are: a study of fetal deaths in litters by Brooks et al. (1997), who

apply different types of finite mixture models without covariates; and a study of welfare

participation by Melkersson and Saarela (2004), who specify a hurdle model with a binomial
finite mixture for non-zero counts. In our application the zero counts are not modelled

separately, so there is no hurdle stage. Moreover, Melkersson and Saarela (2004) adopt a

mixture regression approach where the covariates affect the binomial probabilities, while we
adopt a concomitant variable approach where the covariates affect the mixture probabilities.

The application of binomial mixtures raises several issues that must be carefully addressed.
Most of the issues are common to all mixture models, such as the choice of the

number of components, whereas other issues pertain to models with concomitant variables,
such as the strategy to summarize the effects of the covariates. In fact, in models with

concomitant variables the effects are difficult to interpret and often results are presented in
a hasty way, so that the information content of the model is largely unexploited. The results

will be effectively presented in the form of tables and graphs for the component probabilities and

the expected responses, considering some relevant student profiles. Moreover, the predictive

performance of the model will be evaluated via cross-validation techniques.


The structure of the paper is as follows. Section 2 outlines the binomial finite mixture

model with concomitant variables and reviews some of the methods for selecting the number

of components. Section 3 describes the data and discusses model specification. Section 4
illustrates the results through numerical and graphical summaries, including the assessment

of the predictive ability. Section 5 concludes.

2 Finite mixture models for binomial counts

Let us consider a discrete random variable $y_i$ observed on a random sample of subjects $i = 1, \ldots, n$. A finite mixture model for $y_i$ assumes that its mass distribution function $P(y_i)$ is defined by a finite mixture of conditional distributions $P(y_i \mid u_i)$, where $u_i$ is a categorical latent variable taking values $k = 1, \ldots, K$ with prior probabilities $\pi_k = P(u_i = k)$, where $\pi_k > 0$ and $\sum_{k=1}^{K} \pi_k = 1$:

$$P(y_i) = \sum_{k=1}^{K} \pi_k \, P(y_i \mid u_i = k). \tag{1}$$

In this paper we assume that all the conditional distributions $P(y_i \mid u_i)$ are binomial with common number of trials $t$ and component-specific probabilities of success $\theta_k$:

$$P(y_i \mid u_i = k) = \binom{t}{y_i} \theta_k^{y_i} (1 - \theta_k)^{t - y_i}. \tag{2}$$

Titterington et al. (1985) show that, in general, finite mixtures of distributions of the exponential family are identified, although for the binomial distribution the number of components $K$ should be limited with respect to the number of trials $t$. In particular, the $K$-component binomial mixture model (1) with $0 < \theta_k < 1$ is identifiable if and only if $K \le \frac{1}{2}(t + 1)$ (McLachlan and Peel, 2000).
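
As a minimal sketch of equations (1) and (2), the following Python code evaluates the mixture probability mass function and the upper bound on K implied by the identifiability condition. The function names, the weights and the success probabilities are illustrative assumptions, not quantities taken from the paper.

```python
from math import comb


def binomial_pmf(y, t, theta):
    """Equation (2): probability of y successes in t trials."""
    return comb(t, y) * theta**y * (1.0 - theta) ** (t - y)


def mixture_pmf(y, t, weights, thetas):
    """Equation (1): weighted sum of the component pmfs."""
    return sum(w * binomial_pmf(y, t, th) for w, th in zip(weights, thetas))


def max_identifiable_components(t):
    """Largest integer K satisfying K <= (t + 1) / 2."""
    return (t + 1) // 2


# Illustrative values only, not estimates from the paper.
t = 20
weights = [0.3, 0.7]
thetas = [0.1, 0.6]
pmf = [mixture_pmf(y, t, weights, thetas) for y in range(t + 1)]
print(sum(pmf))                        # should equal 1 up to rounding error
print(max_identifiable_components(t))  # upper bound on K for this t
```

With an odd number of trials the bound is attained exactly, while with an even number (as in this toy setting) the integer division rounds it down.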

A common interpretation of the latent variable ui is in terms of latent classes, namely

the population is assumed to be partitioned into K latent classes, where ui = k for subject

i belonging to the k-th latent class. Thus, the prior probability πk corresponds to the
proportion of subjects in the k-th latent class (class size).

The covariates can enter a finite mixture model in two ways: through the conditional

distributions P (yi | ui ), yielding a Mixture Regression Model (Wedel and DeSarbo, 1995),
and through the component probabilities πk , yielding a Concomitant variable mixture model

(Dayton and Macready, 1988). The mixture regression approach allows the relationship

between the response variable and the covariates to differ across the latent classes. This
approach is not suitable in our application on university credits, where mixture modelling

is a way to account for the multi-modality of the response variable. Moreover, our interest

lies in predicting the performance of a student, which requires computing the component

probabilities using the available covariates. We therefore rely on the concomitant variable
approach.

In a concomitant variable mixture model the component probabilities of the finite mixture

vary across subjects according to a vector of covariates zi (usually including a constant for
the intercept):

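$$P(y_i \mid \mathbf{z}_i) = \sum_{k=1}^{K} \pi_{k|\mathbf{z}_i} \, P(y_i \mid u_i = k), \tag{3}$$

where $\pi_{k|\mathbf{z}_i} = P(u_i = k \mid \mathbf{z}_i)$, with $\pi_{k|\mathbf{z}_i} > 0$ and $\sum_{k=1}^{K} \pi_{k|\mathbf{z}_i} = 1$ for any subject $i$. Such constraints are satisfied by any model for nominal variables, like the multinomial logit model:

$$\pi_{k|\mathbf{z}_i} = \frac{\exp(\mathbf{z}_i' \boldsymbol{\beta}_k)}{\sum_{l=1}^{K} \exp(\mathbf{z}_i' \boldsymbol{\beta}_l)}, \qquad k = 1, \ldots, K, \tag{4}$$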

with β 1 = 0 for model identifiability. Therefore, the prior probabilities of class membership
depend on the covariates zi through a non-linear function.
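
The non-linear mapping in equation (4) can be sketched in a few lines of Python. The coefficient values below are hypothetical and serve only to show that fixing the first coefficient vector at zero makes the first class the reference category; the function name is likewise an assumption.

```python
import math


def class_probabilities(z, betas):
    """Equation (4) with beta_1 = 0: softmax of the linear predictors.

    z     : covariate vector (including the constant)
    betas : coefficient vectors beta_2, ..., beta_K
    """
    # Linear predictor of the reference class is 0 because beta_1 = 0.
    etas = [0.0] + [sum(zj * bj for zj, bj in zip(z, beta)) for beta in betas]
    shift = max(etas)  # subtracting the maximum avoids overflow in exp
    expo = [math.exp(e - shift) for e in etas]
    total = sum(expo)
    return [e / total for e in expo]


# Hypothetical coefficients for K = 3 classes (beta_2 and beta_3).
betas = [[0.5, 1.0], [-0.2, 2.0]]
z = [1.0, 0.0]  # constant plus one covariate set to zero
probs = class_probabilities(z, betas)
print(probs)
```

With the covariate at zero, the ratio of the second to the first probability equals the exponential of the second-class intercept, which is how the odds interpretation of the coefficients arises.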

The concomitant variable model (3) involves two sets of parameters: θ1 , . . . , θK in the
binomial mass function (2) and β 2 , . . . , β K in the multinomial model (4). The model is

identified if the matrix of the covariates is of full rank, in addition to the condition on the

number of components, $K \le \frac{1}{2}(t + 1)$ (Wang et al., 1996). For given K, the parameters

can be estimated with Maximum Likelihood using the EM algorithm (McLachlan and Peel,
2000).
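
To illustrate the estimation step, here is a minimal EM sketch for the simpler binomial mixture without concomitant variables, where both M-step updates have a closed form. It is not the Latent Gold implementation used in the paper; the function name, the toy data and the starting values are assumptions, and the concomitant variable case would additionally require fitting a weighted multinomial logit in the M-step.

```python
from math import comb, log


def em_binomial_mixture(y, t, weights, thetas, n_iter=100):
    """EM iterations for a K-component binomial mixture without covariates."""
    weights, thetas = list(weights), list(thetas)  # work on copies
    n, K = len(y), len(thetas)
    loglik_path = []
    for _ in range(n_iter):
        # E-step: posterior class probabilities given the current parameters.
        post, loglik = [], 0.0
        for yi in y:
            joint = [
                weights[k] * comb(t, yi) * thetas[k] ** yi * (1 - thetas[k]) ** (t - yi)
                for k in range(K)
            ]
            total = sum(joint)
            loglik += log(total)
            post.append([j / total for j in joint])
        loglik_path.append(loglik)
        # M-step: closed-form updates of class sizes and success probabilities.
        for k in range(K):
            n_k = sum(p[k] for p in post)
            weights[k] = n_k / n
            if n_k > 0:
                thetas[k] = sum(p[k] * yi for p, yi in zip(post, y)) / (t * n_k)
    return weights, thetas, loglik_path


# Toy data with t = 20 trials, made up for illustration.
y = [0, 0, 1, 0, 2, 15, 16, 14, 17, 15, 1, 0]
w_hat, theta_hat, path = em_binomial_mixture(y, 20, [0.5, 0.5], [0.2, 0.7])
print(w_hat, theta_hat)
```

The log-likelihood recorded at each iteration should never decrease, which is a convenient check on any EM implementation.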

Class membership can be predicted by assigning each unit to the class with the highest

probability, using either the prior probabilities πk|zi = P (ui = k | zi ) or the posterior probabilities
P (ui = k | yi , zi ), derived by means of Bayes' rule. When the aim is to classify the

sample units, the prediction is usually based on posterior probabilities. In our application,

however, we are interested in predicting the number of gained credits for a hypothetical new
student on the basis of the available covariates. In other words, we aim at making out-of-sample
predictions. Thus, in order to predict the response y∗ for a hypothetical subject

with covariates z∗ , we rely on the expected value based on prior, rather than posterior, class

membership probabilities (marginal mean prediction):



$$E(y_* \mid \mathbf{z}_*) = \sum_{k=1}^{K} \pi_{k|\mathbf{z}_*} \, E(y_* \mid u_* = k) = t \sum_{k=1}^{K} \pi_{k|\mathbf{z}_*} \, \theta_k, \tag{5}$$

where the last equality follows from the assumption of binomial components. The predicted
value ŷ∗ is obtained by plugging the estimated parameters into equation (5).
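
The difference between prior-based prediction and posterior-based classification can be made concrete with a small sketch. The two-class parameter values are hypothetical; the function names are assumptions introduced for illustration.

```python
from math import comb


def posterior_probabilities(y, t, priors, thetas):
    """Bayes' rule: P(u = k | y, z) is proportional to pi_{k|z} * P(y | u = k)."""
    joint = [
        p * comb(t, y) * th**y * (1 - th) ** (t - y)
        for p, th in zip(priors, thetas)
    ]
    total = sum(joint)
    return [j / total for j in joint]


def marginal_mean(t, priors, thetas):
    """Equation (5): t times the prior-weighted average of the theta_k."""
    return t * sum(p * th for p, th in zip(priors, thetas))


# Hypothetical two-class example with t = 20 trials.
t = 20
priors = [0.5, 0.5]
thetas = [0.2, 0.8]
post = posterior_probabilities(16, t, priors, thetas)  # needs the observed y
y_hat = marginal_mean(t, priors, thetas)               # needs covariates only
print(post, y_hat)
```

The posterior probabilities require the observed response, so they are unavailable for a new student; the marginal mean uses only the prior probabilities, which depend on the covariates alone.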

To assess the predictive ability of the estimated model, we can compare the observed responses $y_i$ with the corresponding predictions $\hat{y}_i$ for the sample individuals ($i = 1, \ldots, n$). The prediction error can be summarized in many ways, e.g. by the mean absolute error (MAE):

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |\hat{y}_i - y_i|. \tag{6}$$
Cross-validation techniques (Hastie et al., 2009) can be used to obtain a more reliable value
of MAE as a measure of the performance of out-of-sample prediction.
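
A cross-validated MAE can be sketched generically as follows. The helper `k_fold_mae` and the stand-in predictor (the training-sample mean) are assumptions made for illustration; in the actual analysis the fitting step would be the mixture model and the prediction would come from equation (5).

```python
import random


def k_fold_mae(y, fit, predict, n_folds=10, seed=0):
    """Mean absolute error where each unit is predicted by a model
    fitted on the folds that do not contain it."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    abs_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit(train)
        abs_errors.extend(abs(predict(model, i) - y[i]) for i in held_out)
    return sum(abs_errors) / len(abs_errors)


# Made-up responses; the "model" is simply the mean of the training units.
y = [0, 3, 6, 0, 12, 9, 15, 18, 0, 6, 9, 12, 3, 20, 17, 1, 0, 8, 11, 14]


def fit_mean(train):
    return sum(y[i] for i in train) / len(train)


def predict_mean(model, i):
    return model


cv_mae = k_fold_mae(y, fit_mean, predict_mean, n_folds=10)
print(cv_mae)
```

Passing the fitting and prediction steps as functions keeps the splitting logic independent of the model being evaluated.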

The choice of the number of mixture components (latent classes) is a critical issue: even if
models with different values of K are nested, the Likelihood Ratio Test (LRT) does not have

the standard chi-square distribution since the regularity conditions are not met. In applied

research, the issue is usually solved by comparing models via information criteria, such as

BIC and AIC and their modifications, though the methodological literature suggests using
statistical tests. Here we consider two tests: the LRT, whose asymptotic distribution may

be approximated by parametric bootstrap (McLachlan, 1987); and the EM-test recently

proposed by Li and Chen (2010). The EM-test compares a finite mixture model with K
components with a model having more than K components. The test statistic is a penalized

version of the LRT based on a few EM iterations. Li and Chen (2010) show that, under

weak conditions, the limiting distribution of the test statistic under the null hypothesis is a
mixture of a point mass at zero and several $\chi^2$ distributions.

Nylund, Asparouhov and Muthén (2007) perform a simulation study comparing various

methods for choosing the number of latent classes. The authors conclude that BIC is the

best information criterion, while bootstrap LRT is the best test (but the EM-test was not
considered).

3 Data description and model specification

We analyze data on 690 freshmen of the School of Economics in Florence in a.y. 2008/2009,

considering the students who took the compulsory pre-enrolment test in September 2008.

The aim is to evaluate their performance in terms of gained credits after one year, whose

distribution is shown in Figure 1. The number of credits ranges from 0 to 60 by 3, namely


{0, 3, 6, . . . , 60}. The sample distribution has a small percentage at the maximum (0.75% of

freshmen gained 60 credits), but it has a peak at the minimum (23% of freshmen did not

gain any credit). Therefore, the phenomenon is characterized by a substantial left censoring
that needs to be accounted for by the model. Moreover, the distribution of positive credits is
quite irregular, showing peaks at 6, 15, 24, 36 and 45 credits. This pattern results from the

paths followed by students, who can take exams yielding 6, 9 or 12 credits. The distribution

of positive credits has a median of 30 and a mean of 29.8.

[Figure 1: bar chart of the percentage of freshmen (vertical axis, 0 to 25) by gained credits after one year (horizontal axis, 0 to 60 in steps of 3).]

Figure 1: Number of gained credits after one year. Freshmen of the School of Economics of the University of Florence a.y. 2008/09.

The choice of a parametric distribution for gained credits is challenging due to the excess

zeroes and the multi-modality of the observed distribution. A natural approach is to use
a finite mixture of binomial distributions, yielding a multi-modal distribution with limited

support. Since each exam gives a number of credits multiple of 3, we define the response

variable as the number of gained credits divided by 3. In this way, the response variable
ranges from 0 to 20, thus it can be modelled with a binomial distribution with a number of

trials t = 20. The finite mixture approach automatically solves the issue of excess zeroes,

since they are captured by a component with a very low success probability. Note that zero-inflated models (Hall, 2000) and hurdle models (Saffari et al., 2013) solve the issue of excess zeroes, but they cannot easily account for the multi-modality of the positive counts. For example, the hurdle approach could be generalized to allow for multi-modality by modelling the positive counts through a mixture of shifted binomial distributions or a mixture of truncated Poisson distributions (Böhning and Kuhnert, 2006), although in the present application the Poisson distributions would also need to be truncated on the right tail.

The prediction of gained credits may be improved by exploiting background information

and test results through the concomitant variable mixture model (3). The available covariates
are:

• Background variables: Gender, Far-away resident (indicator for residence in the provinces

of Massa-Carrara and Grosseto or in a province out of Tuscany), Type of high school

(HS type: Scientific, Humanities, Technical, Other), High school irregular career (HS
irreg. career: indicator for age at high school diploma > 19), High school grade (HS

grade: from 60 to 100, centered at 80);

• Pre-enrolment test scores: Total test score, Partial test scores (Logic, Reading, Mathematics).

A summary of the number of gained credits by background characteristics is reported in


Table 1. Note that the last column reports the average number of gained credits for students

gaining at least one credit. The most important predictors seem to be the irregular career

and the high school grade. The type of school plays a role in predicting students who did
not gain any credit.

The pre-enrolment test is a compulsory test for evaluating the abilities of the candidates

wishing to enrol in one of the degree programs of the School of Economics of the University

of Florence: Management, Economics, Tourism, Marketing and Statistics. The test is based
on 40 multiple-choice items covering 3 areas: Logic (12 items), Reading (10 items) and

Mathematics (18 items). For each item, one out of 5 alternatives is correct, with the following

scoring system: 1 if correct, 0 if blank, −0.25 if wrong. Thus the total score ranges from
−10 to 40. The threshold for passing the test is fixed at 9: candidates with a lower total score are advised against enrolment. They can still enrol in a degree program of the School of Economics, but they are allowed to take examinations only after passing the test in one of its later editions.

Table 1: Freshmen’s performance after one year by background characteristics. School of Economics of the University of Florence a.y. 2008/09.

Variable              N     % with credits      Avg. credits
                            = 0       > 0       (credits > 0)
Gender
  Male                357   23.8      76.2      28.7
  Female              333   22.2      77.8      30.9
Far-away resident
  No                  643   22.7      77.3      29.8
  Yes                  47   27.7      72.3      29.5
HS type
  Scientific          254   19.3      80.7      30.7
  Humanities           53   17.0      83.0      30.3
  Technical           274   25.2      74.8      30.6
  Other               109   29.4      70.6      24.9
HS irreg. career
  No                  605   19.8      80.2      30.6
  Yes                  85   45.9      54.1      20.6
HS grade
  ≤ 80                434   27.0      73.0      26.3
  > 80                256   16.4      83.6      35.0
All                   690   23.0      77.0      29.8

The number of gained credits is strongly related to the test result, as shown by Table
2. Overall, the percentage of students gaining credits is 77.0% with a mean of 29.8 credits

out of 60. The performance is worse for students who did not pass the test (58.6% gained

credits, with a mean of 23.5) and it is better for students passing the test with a score above

the median (85.8% gained credits, with a mean of 33.7).


In the analysis we do not use the total score, but the three partial scores in Logic,

Reading and Math. In fact, we are interested in evaluating the role of each of the three

areas in predicting the student performance. For comparison purposes, we use standardized
partial scores.

Table 2: Freshmen’s performance after one year by test result. School of Economics of the University of Florence a.y. 2008/09.

Test result                        N     % with credits      Avg. credits
                                         = 0       > 0       (credits > 0)
Not passed (< 9)                   111   41.4      58.6      23.5
Passed below median (9 − 16.25)    297   24.6      75.4      27.4
Passed above median (> 16.25)      282   14.2      85.8      33.7
All                                690   23.0      77.0      29.8

The relationships between gained credits and partial scores are summarized by the corresponding
simple regression coefficients (Logic 3.49, Reading 4.35 and Math 5.67). However,

due to the correlations among the partial scores, the multiple regression coefficients give a

somewhat different picture (Logic 0.63, Reading 3.06 and Math 4.70): the score in Logic plays
little role in predicting gained credits once the Math and Reading scores are known.
The use of test scores as concomitant variables allows us to assess the predictive power

of the test in terms of number of gained credits, thus establishing if the pre-enrolment test

is an effective tool for student evaluation in addition to background characteristics of the


candidates already available from administrative records.

4 Results

Let us define the response variable for the i-th freshman as yi = creditsi /3, where creditsi
is the number of gained credits at the end of the first year. Conditionally on the latent

class ui = k, we assume that yi follows a binomial distribution with number of trials t = 20

and class-specific success probability θk . The marginal distribution of yi is given by the


concomitant variable finite mixture model (3) with the multinomial logit model (4) for the

component probabilities πk|zi . The model is fitted with maximum likelihood using the Syntax

module of Latent Gold (Vermunt and Magidson, 2008).

In model (3) the covariates affect the latent class probabilities, but they do not affect the

class-specific distribution. Therefore, in order to choose the number K of latent classes, we
first fit the model without covariates. Once the number of latent classes has been chosen,

the covariates are selected using the standard Wald test.

The number of components K of the finite mixture binomial model without covariates is

selected according to the BIC, the bootstrap LRT, and the EM-test of Li and Chen (2010).
The BIC and the bootstrap LRT are obtained from Latent Gold, whereas the EM-test is

performed using the R code embinom.R developed by Pengfei Li (see footnote 1).

The results are reported in Table 3 for K = 1, . . . , 6: the three criteria agree in selecting
a model with K = 5. Regarding the EM-test, under the null hypothesis the test statistic

has a positive probability of being equal to 0, as for the case K = 5 in Table 3. This means

that there is no evidence to reject the null hypothesis, suggesting that the model with K = 5
components provides a good fit to the data.

Table 3: Selection of the number of components K in the binomial mixture model without concomitant variables.

Number   Number   Log-likelihood   BIC      LRT statistic          EM-test statistic
comp.    param.                             (bootstrap p-value)    (p-value)
1        1        -4045.3          8097.1   3539.09 (0.0000)       3538.3 (0.0000)
2        3        -2275.8          4571.1   567.98 (0.0000)        700.3 (0.0000)
3        5        -1991.8          4016.2   135.50 (0.0000)        153.2 (0.0000)
4        7        -1924.0          3893.8   18.31 (0.0000)         17.5 (0.0001)
5        9        -1914.9          3888.5   0.03 (0.2640)          0.0 (1.0000)
6        11       -1914.8          3901.6
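
The BIC column can be checked from the reported log-likelihoods with the usual definition BIC = −2 log L + p log n. The sketch below assumes n = 690 (the sample size given in Section 3) and uses the rounded log-likelihoods of Table 3, so small discrepancies in the last digit are expected; the variable names are illustrative.

```python
from math import log

n = 690  # sample size reported in Section 3

# (number of components, number of parameters, log-likelihood) from Table 3
fits = [
    (1, 1, -4045.3),
    (2, 3, -2275.8),
    (3, 5, -1991.8),
    (4, 7, -1924.0),
    (5, 9, -1914.9),
    (6, 11, -1914.8),
]


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion (smaller is better)."""
    return -2.0 * loglik + n_params * log(n_obs)


bic_values = {K: bic(ll, p, n) for K, p, ll in fits}
best_K = min(bic_values, key=bic_values.get)

# LRT statistic for K versus K + 1 components: twice the log-likelihood gain.
lrt = {
    K: 2.0 * (fits[i + 1][2] - ll)
    for i, (K, p, ll) in enumerate(fits[:-1])
}

print(best_K)
print({K: round(v, 1) for K, v in bic_values.items()})
print({K: round(v, 2) for K, v in lrt.items()})
```

Because the log-likelihoods are rounded to one decimal, the recomputed LRT statistics are only accurate to about ±0.2, which matters for the K = 5 row where the reported statistic is close to zero.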

Table 4 reports the results for the binomial mixture model without concomitant variables
for K = 5 components. The first component has a proportion π̂1 = 0.22 and a probability of

success θ̂1 near zero, thus yielding an almost degenerate distribution with mass concentrated

in zero. The other four components, whose distributions are depicted in Figure 2, correspond
to latent classes of students with increasing performance in terms of gained credits (the

expected numbers of credits are 9, 23, 39 and 51, respectively). Table 4 also reports two relevant conditional probabilities: the probability of getting zero credits, P (credits = 0|u = k), and the probability of getting all or almost all of the sixty planned credits, P (credits ≥ 54|u = k).

Footnote 1: Downloadable from http://www.math.uwaterloo.ca/ p4li/software/index.htmltest

Table 4: Fitted binomial mixture model without concomitant variables for K = 5 components.

Component   πk     θk     E(credits|u = k)   P(credits = 0|u = k)   P(credits ≥ 54|u = k)
1           0.22   0.00   0                  1.000                  0.000
2           0.15   0.14   9                  0.045                  0.000
3           0.25   0.39   23                 0.000                  0.000
4           0.28   0.65   39                 0.000                  0.012
5           0.10   0.85   51                 0.000                  0.381

The predicted marginal probabilities P (credits = c) are obtained by plugging parameter


estimates into equations (1) and (2), recalling that y = credits/3. It is worth noting that

the model yields an excellent fit of the proportions at the extremes of the distribution. In

fact, Table 4 shows that the probability of gaining zero credits is not negligible only for
the first two components, so that P (credits = 0) ≈ 0.22 × 1.000 + 0.15 × 0.045 = 0.230,

equal to the observed proportion. Therefore, the fitted model adequately accounts for excess

zeroes. As for the right tail, the probability of gaining at least 54 credits (i.e. at most

one exam of 6 credits left out) is not negligible only for the last two components, so that
P (credits ≥ 54) ≈ 0.28 × 0.012 + 0.10 × 0.381 = 0.040, close to the observed proportion

0.046.
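
The two tail probabilities discussed above can be approximated directly from the rounded estimates in Table 4. This is only a sketch: since the table reports two-decimal values, the results agree with the figures in the text only up to rounding, and the variable names are illustrative.

```python
from math import comb

t = 20  # credits / 3, so 60 credits correspond to 20 "successes"

# Rounded estimates from Table 4.
weights = [0.22, 0.15, 0.25, 0.28, 0.10]
thetas = [0.00, 0.14, 0.39, 0.65, 0.85]


def binomial_pmf(y, theta):
    """Probability of y successes out of t trials."""
    return comb(t, y) * theta**y * (1.0 - theta) ** (t - y)


# P(credits = 0): mixture probability of y = 0.
p_zero = sum(w * binomial_pmf(0, th) for w, th in zip(weights, thetas))

# P(credits >= 54): mixture probability of y in {18, 19, 20}.
p_top = sum(
    w * sum(binomial_pmf(y, th) for y in range(18, t + 1))
    for w, th in zip(weights, thetas)
)

# Expected credits per component: 3 * t * theta_k = 60 * theta_k.
expected_credits = [3 * t * th for th in thetas]

print(round(p_zero, 3), round(p_top, 3))
print(expected_credits)
```

Summing over all five components, rather than only the two non-negligible ones, changes the results only beyond the third decimal.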

The five components of the fitted mixture model all have a relevant size and they are
well separated (see Table 4 and Figure 2, where the first component is not represented since

it has almost all the mass in zero). Thus, the multi-modal distribution of gained credits

depicted in Figure 1 is adequately approximated by the 5-component binomial mixture.


[Figure 2: line plot of P(credits|u=k) (vertical axis, 0.00 to 0.25) against gained credits (horizontal axis, 0 to 60) for components k = 2, 3, 4, 5.]

Figure 2: Distribution of components 2 to 5 of the fitted binomial mixture model of Table 4

In order to predict the number of gained credits, we exploit the background variables and the test scores to characterize the five components via the concomitant variable mixture model of equations (2) to (4). To select the covariates, we first fit the full model, then we remove the non-significant covariates. Specifically, for each covariate s, we perform the multiple Wald test for H0 : β2s = β3s = β4s = β5s = 0 and discard the covariate when the p-value is higher than 5%.

For the final model, Table 5 shows the parameter estimates and p-values of the multiple

Wald test for the covariates. The p-value of the standardized Logic test score is slightly higher
than 5%, but we retained it for comparability with the other test scores. As expected, the

estimates of the binomial probabilities are nearly identical to those of the model without

covariates (see Table 4).

The univariate Wald test for the k-th component (k = 2, 3, 4, 5) allows us to test if a
given covariate affects the odds comparing the k-th component with the first one (i.e. the

zero-credit latent class). Considering only significant coefficients, we see that students from

Technical or other high schools and students with irregular career have a lower performance
in terms of gained credits, while students with a good high school grade and students with

high scores in Reading and Math have a better performance.

Table 5: Binomial mixture model with concomitant variables: parameter estimates and p-values of the multiple Wald test for the covariates.

                                   Latent class                         p-value
                                   1      2       3       4       5
Binomial probability θk            0.00   0.15    0.38    0.64    0.85    -
Multinomial logit model† for πk
  Constant                         -      -0.03   0.22    0.96    -0.57   0.000
  HS Technical/other               -      -0.63   0.18    -0.40   -1.43   0.013
  HS irregular career              -      -0.39   -0.79   -3.08   -0.57   0.012
  HS grade                         -      -0.01   0.01    0.06    0.12    0.000
  Logic (std score)                -      -0.11   0.21    0.26    -0.34   0.052
  Reading (std score)              -      0.51    0.33    0.29    0.79    0.001
  Math (std score)                 -      -0.09   0.00    0.25    1.10    0.000

† Estimates are in italic when the p-value of the univariate Wald test is < 0.05.

In general in a multinomial model, the effect of covariates on the probabilities πk is

not linear and may even be non-monotone. In order to better understand the effect of

covariates, it is useful to transform coefficients into probabilities and credits. To this end,
Table 6 reports the predicted latent class probabilities and expected number of gained credits

for several student profiles. The expected number of credits for latent class k is obtained

from the corresponding binomial distribution as 60 × θk . These values are reported in the

last row of Table 6, whereas the expectations in the last column are the model predictions

obtained as weighted means, namely $\sum_k \pi_k (60 \times \theta_k)$.

Table 6: Binomial mixture model with concomitant variables: predicted latent class probabilities and expected number of gained credits.

                          Latent class                         Expected
                          1      2      3      4      5        n. of credits
Predicted probabilities
  Baseline student†       0.16   0.15   0.20   0.41   0.09     25.8
  HS Technical/other      0.20   0.11   0.31   0.36   0.03     22.9
  HS irregular career     0.38   0.25   0.21   0.04   0.12     14.8
  HS grade 60 (min)       0.24   0.30   0.26   0.19   0.01     16.5
  HS grade 100 (max)      0.06   0.04   0.08   0.47   0.35     37.8
  Weak student‡           0.48   0.22   0.28   0.01   0.00     9.0
Expected number of credits
                          0.0    8.8    22.7   38.1   50.8

† Baseline: HS Scientific/Humanities, regular career, HS grade=80, test scores=0.
‡ Weak: HS Technical/other, Irregular career, HS grade=60, test scores=0.
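
The conversion from coefficients to profile predictions can be sketched as follows, using the rounded estimates of Table 5. Because the inputs are rounded to two decimals, the output reproduces Table 6 only approximately; the function and argument names are assumptions introduced for illustration.

```python
import math

# Rounded estimates from Table 5, latent classes 2 to 5 (class 1 is the reference).
# Columns: constant, HS Technical/other, HS irregular career,
#          HS grade (centred at 80), Logic, Reading, Math (standardized scores).
BETA = [
    [-0.03, -0.63, -0.39, -0.01, -0.11, 0.51, -0.09],
    [0.22, 0.18, -0.79, 0.01, 0.21, 0.33, 0.00],
    [0.96, -0.40, -3.08, 0.06, 0.26, 0.29, 0.25],
    [-0.57, -1.43, -0.57, 0.12, -0.34, 0.79, 1.10],
]
THETA = [0.00, 0.15, 0.38, 0.64, 0.85]  # binomial probabilities, Table 5


def predict(tech=0, irregular=0, grade=80, logic=0.0, reading=0.0, math_score=0.0):
    """Return (latent class probabilities, expected credits) for a profile."""
    z = [1.0, tech, irregular, grade - 80, logic, reading, math_score]
    etas = [0.0] + [sum(b * x for b, x in zip(row, z)) for row in BETA]
    expo = [math.exp(e) for e in etas]
    total = sum(expo)
    probs = [e / total for e in expo]
    credits = sum(p * 60 * th for p, th in zip(probs, THETA))
    return probs, credits


baseline_probs, baseline_credits = predict()
weak_probs, weak_credits = predict(tech=1, irregular=1, grade=60)
print([round(p, 2) for p in baseline_probs], round(baseline_credits, 1))
print([round(p, 2) for p in weak_probs], round(weak_credits, 1))
```

Any other profile in Table 6 can be obtained by changing one argument at a time, for example `predict(grade=100)` for a student with the maximum high school grade.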

The baseline student is defined by having all the covariates equal to zero, namely she
comes from a Scientific/Humanities high school, with regular career and a grade at the mid-point
(80), and she obtained average test scores. This student has a low probability to be in

the zero-credit latent class and a high probability to be in the fourth one, with an expected

number of credits equal to 25.8. In Table 6, the four rows below the baseline student refer
to profiles that differ from the baseline by changing the covariates one at a time. Students

from Technical or other schools have a higher probability to be in the zero-credit class and

a lower probability to be in the top class, but overall the difference in terms of expected
number of credits is small (−2.9 credits). Students with irregular high school career have

a remarkably large probability to be in the zero-credit class and they have a substantially

lower expected number of credits (−11.0 credits). The high school grade is a good predictor
of student’s performance: the expected number of credits ranges from 16.5 for a grade at

the minimum to 37.8 for a grade at the maximum.

The weak student has the most unfavorable background characteristics, i.e. she comes

from a Technical or other high school, with irregular career and a grade at the minimum
(60). This kind of student has nearly fifty percent probability to be in the zero-credit class

and an almost null probability to be in the two top classes; as a consequence, the expected

number of gained credits is only 9.


To interpret the effect of the standardized test scores on gained credits, we compute the
expected number of credits for the baseline student by varying the scores one at a time

in a grid between −3 and +3. Figure 3 reports the three curves, showing non-linear patterns
that would be difficult to figure out without transforming the estimated coefficients. The

Logic score has a negligible effect, whereas the Reading score has a small positive effect. The

Math score has the largest effect, especially for high scores (note the asymmetry in Figure

3): given the background characteristics and the other test scores, a high Math score is
associated with a substantial increase of gained credits.

[Figure 3: line plot of the expected number of credits (vertical axis, 0 to 45) against the standardized test scores (horizontal axis, −3 to 3), with one curve each for Reading, Math and Logic.]

Figure 3: Expected number of gained credits by test scores (the value in zero refers to the baseline student).

Another interesting aspect is the probability of belonging to the zero-credit latent class.
Such probability strongly depends on background characteristics, for example it is 0.16 for

the baseline student and 0.48 for the weak student (see Table 6). Figure 4 reports the

probability of belonging to the zero-credit latent class for the weak student by varying the
scores one at a time in a grid between −3 and +3. It is worth noting that the Math and

Logic scores do not have any appreciable effect in reducing the size of the zero-credit class,

whereas the Reading score has a strong effect: the probability that a weak student falls in
the zero-credit class ranges from 0.78 for a Reading score equal to −3 to 0.21 for a Reading

score equal to +3. Thus, given the background characteristics and the other test scores, a

high Reading score is associated with a substantial increase in the probability of a successful

start-up of the university career.
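
The Reading curve for the weak student can be approximated from the rounded coefficients of Table 5, holding the Logic and Math scores at zero. This is a sketch under that rounding, so the endpoints differ slightly from the values quoted above; the names used are illustrative assumptions.

```python
import math

# Rounded estimates from Table 5 for latent classes 2 to 5:
# constant, HS Technical/other, HS irregular career,
# HS grade (centred at 80), Reading (std score).
COEF = {
    2: (-0.03, -0.63, -0.39, -0.01, 0.51),
    3: (0.22, 0.18, -0.79, 0.01, 0.33),
    4: (0.96, -0.40, -3.08, 0.06, 0.29),
    5: (-0.57, -1.43, -0.57, 0.12, 0.79),
}


def zero_class_probability(reading):
    """P(u = 1 | z) for the weak student with the given Reading score."""
    # Weak profile: Technical/other school, irregular career, grade 60.
    weak = (1.0, 1.0, 1.0, 60 - 80, reading)
    denom = 1.0 + sum(
        math.exp(sum(b * x for b, x in zip(beta, weak)))
        for beta in COEF.values()
    )
    return 1.0 / denom


grid = [-3, -2, -1, 0, 1, 2, 3]
curve = [zero_class_probability(r) for r in grid]
print([round(p, 2) for p in curve])
```

The curve is decreasing because all four Reading coefficients are positive, so a higher score raises the odds of every class relative to the zero-credit class.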


[Figure 4 plot. Vertical axis: probability of belonging to the zero-credit latent class, from 0.0 to 1.0. Horizontal axis: standardized test scores, from −3 to +3. Series: Reading, Math and Logic (std score).]

Figure 4: Probability of belonging to the zero-credit latent class by test scores (the value in
zero refers to the weak student).

To evaluate the predictive performance of the concomitant variable mixture model we
compute the predicted values ŷi from equation (5), using 10-fold cross-validation (Hastie et
al., 2009). The prediction errors (ŷi − yi) have a mean close to zero and a nearly symmetric
distribution, with quartiles equal to Q1 = −12.4, Q2 = −0.4 and Q3 = 11.4. The Mean
Absolute Error (MAE) is 13.0, which is similar to the value computed on the whole sample
(12.7); in this application, therefore, the advantage of using cross-validation is negligible.
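The cross-validated MAE can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the helper names (`k_fold_indices`, `cross_validated_errors`, `mean_absolute_error`), the made-up credit counts and the placeholder predictor (the training-fold mean, standing in for the mixture model's prediction) are all hypothetical.

```python
import random

def k_fold_indices(n_obs, k, seed=0):
    """Shuffle 0..n_obs-1 and deal the indices into k disjoint folds."""
    indices = list(range(n_obs))
    random.Random(seed).shuffle(indices)
    return [indices[start::k] for start in range(k)]

def cross_validated_errors(y, fit, predict, k=10, seed=0):
    """Return out-of-fold prediction errors (prediction minus observed)."""
    errors = [0.0] * len(y)
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit(train)                 # fit on the other k - 1 folds
        for i in fold:                     # predict only the held-out units
            errors[i] = predict(model, i) - y[i]
    return errors

def mean_absolute_error(errors):
    return sum(abs(e) for e in errors) / len(errors)

# Made-up credit counts, for illustration only.
credits = [0, 0, 9, 18, 27, 36, 45, 54, 60, 60, 12, 30]

def fit_mean(train):
    """Placeholder model: the mean of the training responses."""
    return sum(credits[i] for i in train) / len(train)

def predict_mean(model, i):
    """Placeholder prediction: the same value for every unit."""
    return model

errors = cross_validated_errors(credits, fit_mean, predict_mean, k=4)
print("cross-validated MAE:", mean_absolute_error(errors))
```

Replacing `fit_mean` and `predict_mean` with a routine that fits the mixture model on the training folds and returns the expected response for a held-out student would give the kind of comparison reported in the text.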

Moreover, to assess the predictive power of the covariates, we compare the average prediction
errors for the following nested models: model without covariates (MAE=15.7), model
with only background characteristics (MAE=13.3, reduction of 15%), and full model with
background characteristics and test scores (MAE=12.7, further reduction of 4%). Therefore,
in terms of predictive ability, the background characteristics make a relevant contribution.
The pre-enrolment test yields a further improvement, even if the predictive ability remains
modest.
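The percentage reductions can be checked directly from the three MAE values. The sketch below assumes that both percentages are expressed relative to the MAE of the model without covariates, which reproduces the quoted 15% and 4%. The text does not state the base explicitly, so this reading is an assumption; the helper name `relative_reduction` is hypothetical.

```python
# MAE values reported in the text.
mae_null = 15.7        # model without covariates
mae_background = 13.3  # background characteristics only
mae_full = 12.7        # background characteristics and test scores

def relative_reduction(before, after, base):
    """Percentage reduction from `before` to `after`, relative to `base`."""
    return 100.0 * (before - after) / base

step_background = relative_reduction(mae_null, mae_background, mae_null)
step_tests = relative_reduction(mae_background, mae_full, mae_null)
step_tests_alt = relative_reduction(mae_background, mae_full, mae_background)
total = relative_reduction(mae_null, mae_full, mae_null)

print(f"background characteristics: {step_background:.1f}%")                 # about 15.3%
print(f"test scores (base: null model): {step_tests:.1f}%")                  # about 3.8%
print(f"test scores (base: background-only model): {step_tests_alt:.1f}%")  # about 4.5%
print(f"overall: {total:.1f}%")                                              # about 19.1%
```

Under either base, the gain from the test scores is small compared with the gain from the background characteristics.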

5 Concluding remarks

The paper has presented a detailed data analysis based on a binomial mixture model with
concomitant variables. In our application to the number of gained university credits, the
binomial mixture model proved to be a flexible tool for modelling a complex response variable,
which is a bounded count characterized by a peak at zero and several modes.

The paper addressed the controversial issue of the choice of the number of components
using two standard methods, namely the BIC index and the bootstrap LRT, and a recently
proposed EM-test. In this application, the three methods led to the same conclusion, though
further research is needed to compare the EM-test with standard procedures.
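As a reminder of how the BIC comparison works, the sketch below evaluates the log-likelihood of a binomial mixture without covariates and the usual index BIC = −2 log L + p log n for two candidate specifications. The data, the number of trials, the parameter values and the function names are hypothetical. In practice the parameters would be maximum likelihood estimates (for example from an EM algorithm), and the parameter count p depends on which quantities are fixed by constraint.

```python
import math

def binomial_mixture_loglik(y, n_trials, weights, probs):
    """Log-likelihood of counts y under a finite mixture of Binomial(n_trials, p_k)."""
    loglik = 0.0
    for yi in y:
        density = sum(
            w * math.comb(n_trials, yi) * p ** yi * (1.0 - p) ** (n_trials - yi)
            for w, p in zip(weights, probs)
        )
        loglik += math.log(density)
    return loglik

def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion; smaller values are preferred."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# Made-up counts out of 6 trials, for illustration only.
y = [0, 0, 1, 3, 4, 5, 6]
n_trials = 6

# One component: a single success probability (1 free parameter).
ll_one = binomial_mixture_loglik(y, n_trials, [1.0], [0.45])

# Two components, one of them degenerate at zero: here we count the
# mixing weight and the second success probability (2 free parameters).
ll_two = binomial_mixture_loglik(y, n_trials, [0.3, 0.7], [0.0, 0.6])

print("BIC, one component: ", bic(ll_one, 1, len(y)))
print("BIC, two components:", bic(ll_two, 2, len(y)))
```

The bootstrap LRT and the EM-test address the same question by testing rather than by penalized likelihood, and are not sketched here.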

The results of the concomitant variable mixture model have been effectively presented
by means of tables and graphs, obtained by converting regression coefficients into predicted
component probabilities and expected responses for a set of student profiles defined by the
covariates. In this way the information content of the model has been well exploited.

The predictive performance of the model has been evaluated by means of cross-validation,
showing that background characteristics and test scores help to predict gained credits.
Further work should be done on the development of suitable diagnostic tools based on generalized
residuals, starting from the work of Wang et al. (1996) on Poisson mixture models.

The analysis of gained university credits confirmed the predictive role of background
characteristics such as the high school type and grade, and the regularity of the school
career. Moreover, the analysis showed that the pre-enrolment test designed by the School
of Economics of the University of Florence gives additional information. Thus, the test
results can be usefully added to background characteristics to yield valuable indications
for student tutoring: in particular, a low Reading score is related to a difficult start-up, while
a low Math score is related to a slow progression.

References
Böhning, D., Kuhnert, R. (2006). Equivalence of Truncated Count Mixture Distributions
and Mixtures of Truncated Count Distributions. Biometrics, 62:1207–1215.

Brooks, S. P., Morgan, B. J. T., Ridout, M. S., Pack, S. E. (1997). Finite Mixture Models
for Proportions. Biometrics, 53:1097–1115.

Dayton, C. M., Macready, G. B. (1988). Concomitant-Variable Latent-Class Models. Journal
of the American Statistical Association, 83:173–178.

Hall, D. B. (2000). Zero-inflated Poisson and binomial regression with random effects: a case
study. Biometrics, 56:1030–1039.

Hastie, T., Tibshirani, R., Friedman, J. (2009). The Elements of Statistical Learning: Data
Mining, Inference, and Prediction. Second Edition. Springer.

Li, P., Chen, J. (2010). Testing the order of a finite mixture model. Journal of the American
Statistical Association, 105:1084–1092.

McLachlan, G. (1987). On Bootstrapping the Likelihood Ratio Test Statistic for the Number
of Components in a Normal Mixture. Applied Statistics, 36:318–324.

McLachlan, G., Peel, D. (2000). Finite Mixture Models. New York: Wiley.

Melkersson, M., Saarela, J. (2004). Welfare Participation and Welfare Dependence among
the Unemployed. Journal of Population Economics, 17:409–431.

Nylund, K. L., Asparouhov, T., Muthén, B. O. (2007). Deciding the number of classes in
latent class analysis and growth mixture modeling: a Monte Carlo Simulation Study.
Structural Equation Modeling, 14:535–569.

Saffari, S. E., Adnan, R., Greene, W. (2012). Investigating the impact of excess zeros
on hurdle-generalized Poisson regression model with right censored count data. Statistica
Neerlandica, 67:67–80.

Schlattmann, P. (2009). Medical Applications of Finite Mixture Models. Berlin: Springer-Verlag.

Titterington, D. M., Smith, A. F. M., Makov, U. E. (1985). Statistical Analysis of Finite
Mixture Distributions. New York: Wiley.

Vermunt, J. K., Magidson, J. (2008). LG-Syntax User's Guide: Manual for Latent GOLD 4.5
Syntax Module. Belmont, MA: Statistical Innovations Inc.

Wang, P., Puterman, M. L., Cockburn, I., Le, N. D. (1996). Mixed Poisson regression models
with covariate dependent rates. Biometrics, 52:381–400.

Wedel, M., DeSarbo, W. S. (1995). A Mixture Likelihood Approach for Generalized Linear
Models. Journal of Classification, 12:21–55.
