University Credits Modelled with Binomial Mixture

Binomial Mixture Modeling of University Credits
Article in Communication in Statistics- Theory and Methods · January 2013

DOI: 10.1080/03610926.2013.804565

Binomial mixture modelling of university credits
Leonardo Grilli*, Carla Rampichini* and Roberta Varriale**
*Department of Statistics, Computer Science, Applications - University of Florence
**ISTAT, Rome
Corresponding author: Leonardo Grilli, Department of Statistics, Computer Science, Applications,
Viale Morgagni 59, 50134 Firenze. E-mail: grilli@ds.unifi.it
Pre-print version of the article to appear in 2013 in
COMMUNICATIONS IN STATISTICS - THEORY AND METHODS
http://www.tandfonline.com/loi/lsta20#.UZNHoMrmDtk
Abstract
The paper reviews finite mixture models for binomial counts with concomitant variables.
These models are well known in theory, but they are rarely applied. We use a binomial
finite mixture to model the number of credits gained by freshmen during the first year at the
School of Economics of the University of Florence. The finite mixture approach allows us to
appropriately account for the large number of zeroes and the multi-modality of the observed
distribution. Moreover, we rely on a concomitant variable specification to investigate the role
of student background characteristics and of a compulsory pre-enrolment test in predicting
gained credits. In the paper we deal with model selection, including the choice of the number
of components, and we devise numerical and graphical summaries of the model results in
order to exploit the information content of the concomitant variable specification. The main
finding is that the introduction of the pre-enrolment test gives additional information for
student tutoring, even if the predictive power is modest.
Key Words: concomitant variables; excess zeroes; latent class; prediction; pre-enrolment
test.
1 Introduction
In the statistical literature there has been a growing interest in finite mixture modelling as a
tool to increase the flexibility of conventional parametric models (McLachlan and Peel, 2000;
Schlattmann, 2009). Finite mixture models can be seen as a compromise between a simple
parametric model and a non-parametric approach. Moreover, these models make it possible to
account for unobserved heterogeneity due to latent sub-populations, often called latent classes.
In our application the response of interest is the number of credits gained by university
freshmen during the first year. This is a count variable with a maximum of sixty common
to all students, thus the binomial distribution is a natural candidate. However, the large
number of zeroes and the multi-modality of the observed distribution call for a flexible model,
such as a binomial finite mixture. This model should be considered as an approximation
to the observed distribution and it is not intended to be an accurate representation of the
process of credit accumulation, which would require a more complex model taking into
account that each student can choose her own exam sequence and that exams have different
success rates.
Indeed, the main purpose of the analysis is to identify predictors of student performance
among background characteristics and the results of a pre-enrolment test. To this end we
use a concomitant variable approach (Dayton and Macready, 1988).

In applied research there are many examples of finite mixture models for unbounded
count data, where the outcome distribution is assumed to be Poisson or negative binomial.
In the case of bounded counts, the usual strategy is to use a truncated Poisson distribution
(Saffari et al., 2013), whereas binomial finite mixture models are rarely used. The only
two examples we found are: a study of fetal deaths in litters by Brooks et al. (1997), who
apply different types of finite mixture models without covariates; and a study of welfare
participation by Melkersson and Saarela (2004), who specify a hurdle model with a binomial
finite mixture for non-zero counts. In our application the zero counts are not modelled
separately, so there is no hurdle stage. Moreover, Melkersson and Saarela (2004) adopt a
mixture regression approach where the covariates affect the binomial probabilities, while we
adopt a concomitant variable approach where the covariates affect the mixture probabilities.
The application of binomial mixtures raises several issues that must be carefully addressed.
Most of the issues are common to all mixture models, such as the choice of the
number of components, whereas other issues pertain to models with concomitant variables,
such as the strategy to summarize the effects of the covariates. In fact, in models with
concomitant variables the effects are difficult to interpret and results are often presented in
a hasty way, so that the information content of the model is largely unexploited. The results
will be presented in the form of tables and graphs for the component probabilities and
the expected responses, considering some relevant student profiles. Moreover, the predictive
performance of the model will be evaluated via cross-validation techniques.

The structure of the paper is as follows. Section 2 outlines the binomial finite mixture
model with concomitant variables and reviews some of the methods for selecting the number
of components. Section 3 describes the data and discusses model specification. Section 4
illustrates the results through numerical and graphical summaries, including the assessment
of the predictive ability. Section 5 concludes.
2 Finite mixture models for binomial counts
Let us consider a discrete random variable $y_i$ observed on a random sample of subjects
$i = 1, \dots, n$. A finite mixture model for $y_i$ assumes that its probability mass function $P(y_i)$
is defined by a finite mixture of conditional distributions $P(y_i \mid u_i)$, where $u_i$ is a categorical
latent variable taking values $k = 1, \dots, K$ with prior probabilities $\pi_k = P(u_i = k)$, where
$\pi_k > 0$ and $\sum_{k=1}^{K} \pi_k = 1$:
$$P(y_i) = \sum_{k=1}^{K} \pi_k \, P(y_i \mid u_i = k). \tag{1}$$
In this paper we assume that all the conditional distributions $P(y_i \mid u_i)$ are binomial with
common number of trials $t$ and component-specific probabilities of success $\theta_k$:
$$P(y_i \mid u_i = k) = \binom{t}{y_i} \theta_k^{y_i} (1 - \theta_k)^{t - y_i}. \tag{2}$$
Titterington et al. (1985) show that, in general, finite mixtures of distributions of the
exponential family are identified, although for the binomial distribution the number of components
$K$ should be limited with respect to the number of trials $t$. In particular, the $K$-component
binomial mixture model (1) with $0 < \theta_k < 1$ is identifiable if and only if $K \le \frac{1}{2}(t + 1)$
(McLachlan and Peel, 2000).
A common interpretation of the latent variable ui is in terms of latent classes, namely
the population is assumed to be partitioned into K latent classes, where ui = k for subject
i belonging to the k-th latent class. Thus, the prior probability πk corresponds to the
proportion of subjects in the k-th latent class (class size).
The covariates can enter a finite mixture model in two ways: through the conditional
distributions P (yi | ui ), yielding a Mixture Regression Model (Wedel and DeSarbo, 1995),
and through the component probabilities πk , yielding a Concomitant variable mixture model
(Dayton and Macready, 1988). The mixture regression approach allows the relationship
between the response variable and the covariates to differ across the latent classes. This
approach is not suitable in our application on university credits, where mixture modelling
is a way to account for the multi-modality of the response variable. Moreover, our interest
lies in predicting the performance of a student, which requires computing the component
probabilities using the available covariates. We therefore rely on the concomitant variable
approach.
In a concomitant variable mixture model the component probabilities of the finite mixture
vary across subjects according to a vector of covariates $\mathbf{z}_i$ (usually including a constant for
the intercept):
$$P(y_i \mid \mathbf{z}_i) = \sum_{k=1}^{K} \pi_{k|\mathbf{z}_i} \, P(y_i \mid u_i = k), \tag{3}$$
where $\pi_{k|\mathbf{z}_i} = P(u_i = k \mid \mathbf{z}_i)$, with $\pi_{k|\mathbf{z}_i} > 0$ and $\sum_{k=1}^{K} \pi_{k|\mathbf{z}_i} = 1$ for any subject $i$. Such
constraints are satisfied by any model for nominal variables, like the multinomial logit model:
$$\pi_{k|\mathbf{z}_i} = \frac{\exp(\mathbf{z}_i \boldsymbol{\beta}_k)}{\sum_{l=1}^{K} \exp(\mathbf{z}_i \boldsymbol{\beta}_l)}, \qquad k = 1, \dots, K, \tag{4}$$
with $\boldsymbol{\beta}_1 = \mathbf{0}$ for model identifiability. Therefore, the prior probabilities of class membership
depend on the covariates $\mathbf{z}_i$ through a non-linear function.
The concomitant variable model (3) involves two sets of parameters: $\theta_1, \dots, \theta_K$ in the
binomial mass function (2) and $\boldsymbol{\beta}_2, \dots, \boldsymbol{\beta}_K$ in the multinomial model (4). The model is
identified if the matrix of the covariates is of full rank, in addition to the condition on the
number of components, $K \le \frac{1}{2}(t + 1)$ (Wang et al., 1996). For given K, the parameters
can be estimated with Maximum Likelihood using the EM algorithm (McLachlan and Peel,
2000).
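To make the estimation step concrete, the following Python sketch runs EM for a binomial mixture without covariates, i.e. model (1)-(2). It is only an illustration under stated assumptions and not the implementation used in this paper: the function names `binom_pmf` and `em_binomial_mixture`, the evenly spread starting values, the fixed number of iterations, and the simulated data are all hypothetical choices made for the example.

```python
import math
import random


def binom_pmf(y, t, theta):
    """Binomial probability mass P(y | t, theta), as in equation (2)."""
    return math.comb(t, y) * theta**y * (1.0 - theta) ** (t - y)


def em_binomial_mixture(y, t, K, n_iter=100):
    """EM for a K-component binomial mixture with common number of trials t.

    Returns (pi, theta). Starting values are an arbitrary evenly spread grid
    (an assumption of this sketch, not a recommendation).
    """
    n = len(y)
    pi = [1.0 / K] * K
    theta = [(k + 1) / (K + 1) for k in range(K)]
    for _ in range(n_iter):
        # E-step: posterior class probabilities for each observation
        post = []
        for yi in y:
            w = [pi[k] * binom_pmf(yi, t, theta[k]) for k in range(K)]
            s = sum(w)
            post.append([wk / s for wk in w])
        # M-step: update class sizes and success probabilities
        for k in range(K):
            nk = sum(p[k] for p in post)
            pi[k] = nk / n
            if nk > 0:
                new_theta = sum(p[k] * yi for p, yi in zip(post, y)) / (t * nk)
                # keep theta strictly inside (0, 1) to avoid degenerate weights
                theta[k] = min(max(new_theta, 1e-12), 1.0 - 1e-12)
    return pi, theta


# Simulated data (not the data analysed here): 300 subjects with theta = 0.1
# and 700 subjects with theta = 0.7, each with t = 20 trials.
random.seed(1)
t = 20
y = [sum(random.random() < (0.1 if i < 300 else 0.7) for _ in range(t))
     for i in range(1000)]
pi_hat, theta_hat = em_binomial_mixture(y, t, K=2)
print([round(p, 2) for p in pi_hat], [round(th, 2) for th in theta_hat])
```

In practice EM should be restarted from several starting values, since the likelihood of a mixture can have multiple local maxima; the single run above is kept short for readability.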
Class membership can be predicted by assigning each unit to the class with the highest
probability, using either the prior probabilities $\pi_{k|\mathbf{z}_i} = P(u_i = k \mid \mathbf{z}_i)$ or the posterior
probabilities $P(u_i = k \mid y_i, \mathbf{z}_i)$, derived by means of Bayes' rule. When the aim is to classify the
sample units, the prediction is usually based on posterior probabilities. In our application,
however, we are interested in predicting the number of gained credits for a hypothetical new
student on the basis of the available covariates. In other words, we aim at making
out-of-sample predictions. Thus, in order to predict the response $y_*$ for a hypothetical subject
with covariates $\mathbf{z}_*$, we rely on the expected value based on prior, rather than posterior, class
membership probabilities (marginal mean prediction):
$$E(y_* \mid \mathbf{z}_*) = \sum_{k=1}^{K} \pi_{k|\mathbf{z}_*} \, E(y_* \mid u_* = k) = t \sum_{k=1}^{K} \pi_{k|\mathbf{z}_*} \theta_k, \tag{5}$$
where the last equality follows from the assumption of binomial components. The predicted
value $\hat{y}_*$ is obtained by plugging the estimated parameters into equation (5).
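For reference, the posterior class probabilities mentioned above can be written out explicitly. The expression below is simply Bayes' rule applied to model (3); it relies on the feature of the concomitant variable specification that, given the latent class, the response does not depend on the covariates.

```latex
P(u_i = k \mid y_i, \mathbf{z}_i)
  = \frac{\pi_{k|\mathbf{z}_i} \, P(y_i \mid u_i = k)}
         {\sum_{l=1}^{K} \pi_{l|\mathbf{z}_i} \, P(y_i \mid u_i = l)},
  \qquad k = 1, \dots, K.
```

The denominator is the marginal probability $P(y_i \mid \mathbf{z}_i)$ of equation (3), so the posterior probabilities sum to one over the classes.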
To assess the predictive ability of the estimated model, we can compare the observed
responses $y_i$ with the corresponding predictions $\hat{y}_i$ for the sample individuals ($i = 1, \dots, n$).
The prediction error can be summarized in many ways, e.g. by the mean absolute error
(MAE):
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |\hat{y}_i - y_i|. \tag{6}$$
Cross-validation techniques (Hastie et al., 2009) can be used to obtain a more reliable value
of MAE as a measure of the performance of out-of-sample prediction.
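A cross-validated MAE can be sketched in a few lines of Python. The code below is a generic illustration and not the procedure actually run for this paper: the helper `kfold_mae`, the random fold assignment, and the placeholder predictor (the training mean, standing in for the marginal mean prediction of equation (5)) are assumptions made for the example, and the data are made up.

```python
import random


def kfold_mae(z, y, fit, predict, k=10, seed=0):
    """Mean absolute error of out-of-fold predictions under k-fold CV.

    fit(z_train, y_train) returns a model; predict(model, z_new) returns
    the predicted response for one new subject.
    """
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[j::k] for j in range(k)]
    abs_errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        model = fit([z[i] for i in train_idx], [y[i] for i in train_idx])
        abs_errors += [abs(predict(model, z[i]) - y[i]) for i in test_idx]
    return sum(abs_errors) / len(abs_errors)


# Placeholder model (assumption): predict every new subject with the
# training mean, ignoring the covariates.
def fit_mean(z_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, z_new):
    return model


y_demo = [0, 0, 3, 6, 9, 12, 15, 18, 24, 30, 36, 45]  # made-up credit counts
z_demo = [None] * len(y_demo)                          # covariates not used
mae_cv = kfold_mae(z_demo, y_demo, fit_mean, predict_mean, k=4)
print(round(mae_cv, 2))
```

Replacing `fit_mean` and `predict_mean` with functions that fit the mixture model and evaluate equation (5) would give the cross-validated counterpart of equation (6).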
The choice of the number of mixture components (latent classes) is a critical issue: even if
models with different values of K are nested, the Likelihood Ratio Test (LRT) does not have
the standard chi-square distribution since the regularity conditions are not met. In applied
research, the issue is usually solved by comparing models via information criteria, such as
BIC and AIC and their modifications, though the methodological literature suggests using
statistical tests. Here we consider two tests: the LRT, whose asymptotic distribution may
be approximated by parametric bootstrap (McLachlan, 1987); and the EM-test recently
proposed by Li and Chen (2010). The EM-test compares a finite mixture model with K
components with a model having more than K components. The test statistic is a penalized
version of the LRT based on a few EM iterations. Li and Chen (2010) show that, under
weak conditions, the limiting distribution of the test statistic under the null hypothesis is a
mixture of a point mass at zero and several $\chi^2$ distributions.
Nylund, Asparouhov and Muthén (2007) perform a simulation study comparing various
methods for choosing the number of latent classes. The authors conclude that BIC is the
best information criterion, while bootstrap LRT is the best test (but the EM-test was not
considered).
3 Data description and model specification
We analyze data on 690 freshmen of the School of Economics in Florence in a.y. 2008/2009,
considering the students who took the compulsory pre-enrolment test in September 2008.
The aim is to evaluate their performance in terms of gained credits after one year, whose
distribution is shown in Figure 1. The number of credits ranges from 0 to 60 in steps of 3, namely
{0, 3, 6, . . . , 60}. The sample distribution has a small percentage at the maximum (0.75% of
freshmen gained 60 credits), but it has a peak at the minimum (23% of freshmen did not
gain any credit). Therefore, the phenomenon is characterized by a relevant left censoring
that needs to be accounted for by the model. Moreover, the distribution of positive credits is
quite irregular, showing peaks at 6, 15, 24, 36 and 45 credits. This pattern results from the
paths followed by students, who can take exams yielding 6, 9 or 12 credits. The distribution
of positive credits has a median of 30 and a mean of 29.8.
[Figure 1: bar chart; horizontal axis: gained credits after one year (0 to 60 in steps of 3); vertical axis: percent]
Figure 1: Number of gained credits after one year. Freshmen of the School of Economics of
the University of Florence a.y. 2008/09.
The choice of a parametric distribution for gained credits is challenging due to the excess
zeroes and the multi-modality of the observed distribution. A natural approach is to use
a finite mixture of binomial distributions, yielding a multi-modal distribution with limited
support. Since each exam gives a number of credits that is a multiple of 3, we define the response
variable as the number of gained credits divided by 3. In this way, the response variable
ranges from 0 to 20, so it can be modelled with a binomial distribution with number of
trials t = 20. The finite mixture approach automatically solves the issue of excess zeroes,
since they are captured by a component with a very low success probability. Note that
zero-inflated models (Hall, 2000) and hurdle models (Saffari et al., 2013) solve the issue of excess
zeroes, but they cannot easily account for the multi-modality of the positive counts. For
example, the hurdle approach could be generalized to allow for multi-modality by modelling
the positive counts through a mixture of shifted binomial distributions or a mixture of
truncated Poisson distributions (Böhning and Kuhnert, 2006), although in the present application
the Poisson distributions would also have to be truncated on the right tail.
The prediction of gained credits may be improved by exploiting background information
and test results through the concomitant variable mixture model (3). The available covariates
are:
• Background variables: Gender, Far-away resident (indicator for residence in the provinces
of Massa-Carrara and Grosseto or in a province out of Tuscany), Type of high school
(HS type: Scientific, Humanities, Technical, Other), High school irregular career (HS
irreg. career: indicator for age at high school diploma > 19), High school grade (HS
grade: from 60 to 100, centered at 80);
• Pre-enrolment test scores: Total test score, Partial test scores (Logic, Reading, Mathematics).
A summary of the number of gained credits by background characteristics is reported in
Table 1. Note that the last column reports the average number of gained credits for students
gaining at least one credit. The most important predictors seem to be the irregular career
and the high school grade. The type of school plays a role in predicting students who did
not gain any credit.
The pre-enrolment test is a compulsory test for evaluating the abilities of the candidates
wishing to enrol in one of the degree programs of the School of Economics of the University
of Florence: Management, Economics, Tourism, Marketing and Statistics. The test is based
on 40 multiple-choice items covering 3 areas: Logic (12 items), Reading (10 items) and
Mathematics (18 items). For each item, one out of 5 alternatives is correct, with the following
scoring system: 1 if correct, 0 if blank, −0.25 if wrong. Thus the total score ranges from
−10 to 40. The threshold for passing the test is fixed at 9: candidates with a lower total
score are advised against enrolment; they can still enrol in a degree program of
the School of Economics, but they are allowed to take examinations only after passing the
test in one of its later editions.
Table 1: Freshmen’s performance after one year by background characteristics. School of
Economics of the University of Florence a.y. 2008/09.
Variable               N    % with credits = 0   % with credits > 0   Avg. credits (credits > 0)
Gender
  Male               357          23.8                 76.2                  28.7
  Female             333          22.2                 77.8                  30.9
Far-away resident
  No                 643          22.7                 77.3                  29.8
  Yes                 47          27.7                 72.3                  29.5
HS type
  Scientific         254          19.3                 80.7                  30.7
  Humanities          53          17.0                 83.0                  30.3
  Technical          274          25.2                 74.8                  30.6
  Other              109          29.4                 70.6                  24.9
HS irreg. career
  No                 605          19.8                 80.2                  30.6
  Yes                 85          45.9                 54.1                  20.6
HS grade
  ≤ 80               434          27.0                 73.0                  26.3
  > 80               256          16.4                 83.6                  35.0
All                  690          23.0                 77.0                  29.8
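The scoring rule of the pre-enrolment test described above can be checked with a short Python sketch. The function name `total_score` and the example candidate are hypothetical; only the scoring weights, the number of items and the threshold come from the text.

```python
N_ITEMS = 40         # multiple-choice items in the test
PASS_THRESHOLD = 9   # minimum total score for passing


def total_score(n_correct, n_wrong):
    """Total score: +1 per correct answer, -0.25 per wrong answer, 0 per blank."""
    return n_correct - 0.25 * n_wrong


lowest = total_score(0, N_ITEMS)    # every item answered wrongly
highest = total_score(N_ITEMS, 0)   # every item answered correctly

# Hypothetical candidate: 15 correct, 20 wrong, 5 blank
candidate = total_score(15, 20)
passed = candidate >= PASS_THRESHOLD
print(lowest, highest, candidate, passed)
```

The two extreme cases reproduce the stated range of the total score, from −10 to 40.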
The number of gained credits is strongly related to the test result, as shown by Table
2. Overall, the percentage of students gaining credits is 77.0% with a mean of 29.8 credits
out of 60. The performance is worse for students who did not pass the test (58.6% gained
credits, with a mean of 23.5) and it is better for students passing the test with a score above
the median (85.8% gained credits, with a mean of 33.7).

In the analysis we do not use the total score, but the three partial scores in Logic,
Reading and Math. In fact, we are interested in evaluating the role of each of the three
areas in predicting the student performance. For comparison purposes, we use standardized
partial scores.
Table 2: Freshmen’s performance after one year by test result. School of Economics of the
University of Florence a.y. 2008/09.
Test result                          N    % with credits = 0   % with credits > 0   Avg. credits (credits > 0)
Not passed (< 9)                   111          41.4                 58.6                  23.5
Passed below median (9 − 16.25)    297          24.6                 75.4                  27.4
Passed above median (> 16.25)      282          14.2                 85.8                  33.7
All                                690          23.0                 77.0                  29.8
The relationships between gained credits and partial scores are summarized by the
corresponding simple regression coefficients (Logic 3.49, Reading 4.35 and Math 5.67). However,
due to the correlations among the partial scores, the multiple regression coefficients give a
somewhat different picture (Logic 0.63, Reading 3.06 and Math 4.70): the score in Logic plays
little role in predicting gained credits once Math and Reading scores are known.
The use of test scores as concomitant variables allows us to assess the predictive power
of the test in terms of number of gained credits, thus establishing if the pre-enrolment test
is an effective tool for student evaluation in addition to background characteristics of the
candidates already available from administrative records.
4 Results
Let us define the response variable for the i-th freshman as $y_i = \mathit{credits}_i / 3$, where $\mathit{credits}_i$
is the number of gained credits at the end of the first year. Conditionally on the latent
class $u_i = k$, we assume that $y_i$ follows a binomial distribution with number of trials t = 20
and class-specific success probability $\theta_k$. The marginal distribution of $y_i$ is given by the
concomitant variable finite mixture model (3) with the multinomial logit model (4) for the
component probabilities $\pi_{k|\mathbf{z}_i}$. The model is fitted with maximum likelihood using the Syntax
module of Latent Gold (Vermunt and Magidson, 2008).
In model (3) the covariates affect the latent class probabilities, but they do not affect the
class-specific distribution. Therefore, in order to choose the number K of latent classes, we
first fit the model without covariates. Once the number of latent classes has been chosen,
the covariates are selected using the standard Wald test.
The number of components K of the finite mixture binomial model without covariates is
selected according to the BIC, the bootstrap LRT, and the EM-test of Li and Chen (2010).
The BIC and the bootstrap LRT are obtained from Latent Gold, whereas the EM-test is
performed using the R code embinom.R developed by Pengfei Li (see footnote 1).
The results are reported in Table 3 for K = 1, . . . , 6: the three criteria agree in selecting
a model with K = 5. Regarding the EM-test, under the null hypothesis the test statistic
has a positive probability of being equal to 0, as for the case K = 5 in Table 3. This means
that there is no evidence to reject the null hypothesis, suggesting that the model with K = 5
components fits the data well.
Table 3: Selection of the number of components K in the binomial mixture model without
concomitant variables.
Number of    Number of    Log-likelihood    BIC       LRT statistic          EM-test statistic
components   parameters                               (bootstrap p-value)    (p-value)
1            1            -4045.3           8097.1    3539.09 (0.0000)       3538.3 (0.0000)
2            3            -2275.8           4571.1     567.98 (0.0000)        700.3 (0.0000)
3            5            -1991.8           4016.2     135.50 (0.0000)        153.2 (0.0000)
4            7            -1924.0           3893.8      18.31 (0.0000)         17.5 (0.0001)
5            9            -1914.9           3888.5       0.03 (0.2640)          0.0 (1.0000)
6            11           -1914.8           3901.6
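As a cross-check of Table 3, the BIC and LRT columns can be recomputed from the reported log-likelihoods. The Python sketch below assumes the usual definitions, BIC = −2 log L + p ln(n) with n = 690 and LRT = 2 (log L for K + 1 components minus log L for K components); these definitions are an assumption about how the software computes them, although they reproduce the tabulated values up to the rounding of the log-likelihoods.

```python
import math

n = 690  # sample size reported in Section 3

# (K, number of parameters, log-likelihood) copied from Table 3
fits = [(1, 1, -4045.3), (2, 3, -2275.8), (3, 5, -1991.8),
        (4, 7, -1924.0), (5, 9, -1914.9), (6, 11, -1914.8)]

# BIC = -2 logL + p ln(n); smaller values indicate a better trade-off
bic = {K: -2.0 * ll + p * math.log(n) for K, p, ll in fits}
best_K = min(bic, key=bic.get)

# LRT statistic comparing K components against K + 1 components
lrt = {fits[j][0]: 2.0 * (fits[j + 1][2] - fits[j][2])
       for j in range(len(fits) - 1)}

for K in sorted(bic):
    lrt_text = f"{lrt[K]:.2f}" if K in lrt else ""
    print(K, f"{bic[K]:.1f}", lrt_text)
print("K with smallest BIC:", best_K)
```

The bootstrap p-values and the EM-test statistics cannot be recovered from the table alone, since they require the data and repeated model fits.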
Table 4 reports the results for the binomial mixture model without concomitant variables
for K = 5 components. The first component has a proportion π̂1 = 0.22 and a probability of
success θ̂1 near zero, thus yielding an almost degenerate distribution with mass concentrated
in zero. The other four components, whose distributions are depicted in Figure 2, correspond
to latent classes of students with increasing performance in terms of gained credits (the
expected numbers of credits are 9, 23, 39 and 51, respectively). Table 4 also reports two
relevant conditional probabilities: the probability of getting zero credits, P(credits = 0 | u = k),
and the probability of getting all or almost all of the sixty planned credits,
P(credits ≥ 54 | u = k).
Footnote 1: Downloadable from http://www.math.uwaterloo.ca/ p4li/software/index.htmltest
Table 4: Fitted binomial mixture model without concomitant variables for K = 5 components.
Component   πk     θk     E(credits | u = k)   P(credits = 0 | u = k)   P(credits ≥ 54 | u = k)
1           0.22   0.00           0                   1.000                     0.000
2           0.15   0.14           9                   0.045                     0.000
3           0.25   0.39          23                   0.000                     0.000
4           0.28   0.65          39                   0.000                     0.012
5           0.10   0.85          51                   0.000                     0.381
The predicted marginal probabilities P(credits = c) are obtained by plugging parameter
estimates into equations (1) and (2), recalling that y = credits/3. It is worth noting that
the model yields an excellent fit of the proportions at the extremes of the distribution. In
fact, Table 4 shows that the probability of gaining zero credits is non-negligible only for
the first two components, so that P(credits = 0) ≈ 0.22 × 1.000 + 0.15 × 0.045 = 0.230,
equal to the observed proportion. Therefore, the fitted model adequately accounts for excess
zeroes. As for the right tail, the probability of gaining at least 54 credits (i.e. at most
one exam of 6 credits left out) is non-negligible only for the last two components, so that
P(credits ≥ 54) ≈ 0.28 × 0.012 + 0.10 × 0.381 = 0.040, close to the observed proportion
0.046.
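The tail probabilities discussed above can be recomputed directly from equations (1) and (2). The Python sketch below uses the estimates of Table 4 as printed, i.e. rounded to two decimals, so its results are close to, but not exactly equal to, the figures quoted in the text, which presumably come from the unrounded estimates. The function names are hypothetical.

```python
import math

T = 20                                   # number of trials (credits / 3)
PI = [0.22, 0.15, 0.25, 0.28, 0.10]      # class sizes from Table 4 (rounded)
THETA = [0.00, 0.14, 0.39, 0.65, 0.85]   # success probabilities from Table 4 (rounded)


def binom_pmf(y, t, theta):
    """Binomial probability mass, equation (2)."""
    return math.comb(t, y) * theta**y * (1.0 - theta) ** (t - y)


def mixture_pmf(y):
    """Marginal probability P(y) of the mixture, equation (1)."""
    return sum(p * binom_pmf(y, T, th) for p, th in zip(PI, THETA))


p_zero = mixture_pmf(0)                                   # P(credits = 0)
p_high = sum(mixture_pmf(y) for y in range(18, T + 1))    # P(credits >= 54)
total = sum(mixture_pmf(y) for y in range(T + 1))         # should equal 1

print(f"P(credits = 0)   = {p_zero:.3f}")
print(f"P(credits >= 54) = {p_high:.3f}")
```

Evaluating `mixture_pmf` over the whole grid y = 0, ..., 20 gives the fitted distribution that can be compared with the observed percentages of Figure 1.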
The five components of the fitted mixture model all have a relevant size and they are
well separated (see Table 4 and Figure 2, where the first component is not represented since
it has almost all the mass in zero). Thus, the multi-modal distribution of gained credits
depicted in Figure 1 is adequately approximated by the 5-component binomial mixture.

[Figure 2: line plot of P(credits | u = k) against gained credits (0 to 60) for components k = 2, 3, 4, 5]
Figure 2: Distribution of components 2 to 5 of the fitted binomial mixture model of Table 4
In order to predict the number of gained credits, we exploit the background variables
and the test scores to characterize the five components via the concomitant variable mixture
model of equations (2) to (4). To select the covariates, we first fit the full model, then we
remove the non-significant covariates. Specifically, for each covariate s, we perform the multiple
Wald test for $H_0: \beta_{2s} = \beta_{3s} = \beta_{4s} = \beta_{5s} = 0$ and discard the covariate when the p-value is
higher than 5%.
For the final model, Table 5 shows the parameter estimates and p-values of the multiple
Wald test for the covariates. The p-value of the standardized Logic test score is slightly higher
than 5%, but we retained it for comparability with the other test scores. As expected, the
estimates of the binomial probabilities are nearly identical to those of the model without
covariates (see Table 4).
The univariate Wald test for the k-th component (k = 2, 3, 4, 5) allows us to test if a
given covariate affects the odds comparing the k-th component with the first one (i.e. the
zero-credit latent class). Considering only significant coefficients, we see that students from
Technical or other high schools and students with irregular career have a lower performance
in terms of gained credits, while students with a good high school grade and students with
high scores in Reading and Math have a better performance.
Table 5: Binomial mixture model with concomitant variables: parameter estimates and
p-values of the multiple Wald test for the covariates.
                                        Latent class                  p-value
                                  1      2      3      4      5
Binomial probability θk         0.00   0.15   0.38   0.64   0.85        -
Multinomial logit model† for πk
  Constant                        -   -0.03   0.22   0.96  -0.57      0.000
  HS Technical/other              -   -0.63   0.18  -0.40  -1.43      0.013
  HS irregular career             -   -0.39  -0.79  -3.08  -0.57      0.012
  HS grade                        -   -0.01   0.01   0.06   0.12      0.000
  Logic (std score)               -   -0.11   0.21   0.26  -0.34      0.052
  Reading (std score)             -    0.51   0.33   0.29   0.79      0.001
  Math (std score)                -   -0.09   0.00   0.25   1.10      0.000
† Estimates are in italic when the p-value of the univariate Wald test is < 0.05.
In general, in a multinomial model the effect of the covariates on the probabilities πk is
not linear and may even be non-monotone. In order to better understand the effect of the
covariates, it is useful to transform coefficients into probabilities and credits. To this end,
Table 6 reports the predicted latent class probabilities and expected number of gained credits
for several student profiles. The expected number of credits for latent class k is obtained
from the corresponding binomial distribution as 60 × θk. These values are reported in the
last row of Table 6, whereas the expectations in the last column are the model predictions
obtained as weighted means, namely $\sum_{k} \pi_k (60 \times \theta_k)$.
Table 6: Binomial mixture model with concomitant variables: predicted latent class
probabilities and expected number of gained credits.
                                     Latent class                 Expected
                               1      2      3      4      5      n. of credits
Predicted probabilities
  Baseline student†          0.16   0.15   0.20   0.41   0.09        25.8
  HS Technical/other         0.20   0.11   0.31   0.36   0.03        22.9
  HS irregular career        0.38   0.25   0.21   0.04   0.12        14.8
  HS grade 60 (min)          0.24   0.30   0.26   0.19   0.01        16.5
  HS grade 100 (max)         0.06   0.04   0.08   0.47   0.35        37.8
  Weak student‡              0.48   0.22   0.28   0.01   0.00         9.0
Expected number of credits   0.0    8.8   22.7   38.1   50.8
† Baseline: HS Scientific/Humanities, regular career, HS grade=80, test scores=0.
‡ Weak: HS Technical/other, Irregular career, HS grade=60, test scores=0.
The baseline student is defined by having all the covariates equal to zero, namely she
comes from a Scientific/Humanities high school, with regular career and a grade at the
mid-point (80), and she obtained average test scores. This student has a low probability of being in
the zero-credit latent class and a high probability of being in the fourth one, with an expected
number of credits equal to 25.8. In Table 6, the four rows below the baseline student refer
to profiles that differ from the baseline by changing the covariates one at a time. Students
from Technical or other schools have a higher probability of being in the zero-credit class and
a lower probability of being in the top class, but overall the difference in terms of expected
number of credits is small (−2.9 credits). Students with irregular high school career have
a remarkably large probability of being in the zero-credit class and they have a substantially
lower expected number of credits (−11.0 credits). The high school grade is a good predictor
of student’s performance: the expected number of credits ranges from 16.5 for a grade at
the minimum to 37.8 for a grade at the maximum.
The weak student has the most unfavorable background characteristics, i.e. she comes
from a Technical or other high school, with irregular career and a grade at the minimum
(60). This kind of student has a nearly fifty percent probability of being in the zero-credit class
and an almost null probability of being in the two top classes; as a consequence, the expected
number of gained credits is only 9.
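The conversion from coefficients to profile-specific probabilities and expected credits can be sketched in Python as follows. The code applies equations (4) and (5) to the estimates of Table 5 as printed (two decimals), so its output agrees with Table 6 only up to rounding. The dictionary keys and function names are hypothetical labels chosen for this example, and the test scores are omitted because they are zero for both profiles shown.

```python
import math

THETA = [0.00, 0.15, 0.38, 0.64, 0.85]  # binomial probabilities from Table 5

# Multinomial logit coefficients for classes 1..5 (class 1 is the reference)
BETA = {
    "const":    [0.0, -0.03,  0.22,  0.96, -0.57],
    "hs_tech":  [0.0, -0.63,  0.18, -0.40, -1.43],  # HS Technical/other
    "hs_irreg": [0.0, -0.39, -0.79, -3.08, -0.57],  # HS irregular career
    "hs_grade": [0.0, -0.01,  0.01,  0.06,  0.12],  # HS grade centred at 80
}


def class_probs(profile):
    """Equation (4): multinomial logit probabilities with beta_1 = 0."""
    z = {"const": 1.0, **profile}
    eta = [sum(BETA[name][k] * value for name, value in z.items())
           for k in range(5)]
    denom = sum(math.exp(e) for e in eta)
    return [math.exp(e) / denom for e in eta]


def expected_credits(profile):
    """Equation (5) on the credit scale: 60 * sum_k pi_k * theta_k."""
    return 60.0 * sum(p * th for p, th in zip(class_probs(profile), THETA))


baseline = {}                                           # all covariates at zero
weak = {"hs_tech": 1, "hs_irreg": 1, "hs_grade": -20}   # grade 60, i.e. 60 - 80

for label, prof in [("baseline", baseline), ("weak", weak)]:
    print(label, [round(p, 2) for p in class_probs(prof)],
          round(expected_credits(prof), 1))
```

Other rows of Table 6 follow by changing one entry of the profile at a time, for example `{"hs_irreg": 1}` for a student with an irregular high school career.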

To interpret the effect of the standardized test scores on gained credits, we compute the
expected number of credits for the baseline student by varying the scores one at a time
in a grid between −3 and +3. Figure 3 reports the three curves, showing non-linear patterns
that would be difficult to figure out without transforming the estimated coefficients. The
Logic score has a negligible effect, whereas the Reading score has a small positive effect. The
Math score has the largest effect, especially for high scores (note the asymmetry in Figure
3): given the background characteristics and the other test scores, a high Math score is
associated with a substantial increase of gained credits.
[Figure 3: line plot; horizontal axis: standardized test scores (−3 to 3); vertical axis: expected number of credits; curves: Reading, Math and Logic (std scores)]
Figure 3: Expected number of gained credits by test scores (the value in zero refers to the
baseline student).
Another interesting aspect is the probability of belonging to the zero-credit latent class.
Such probability strongly depends on background characteristics, for example it is 0.16 for
the baseline student and 0.48 for the weak student (see Table 6). Figure 4 reports the
probability of belonging to the zero-credit latent class for the weak student by varying the
scores one at a time in a grid between −3 and +3. It is worth noting that the Math and
Logic scores do not have any appreciable effect in reducing the size of the zero-credit class,
whereas the Reading score has a strong effect: the probability that a weak student falls in
the zero-credit class ranges from 0.78 for a Reading score equal to −3 to 0.21 for a Reading
score equal to +3. Thus, given the background characteristics and the other test scores, a
high Reading score is associated with a substantial increase in the probability of a successful
start-up of the university career.
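The Reading curve for the weak student can be approximated from the printed coefficients of Table 5. The Python sketch below evaluates equation (4) for the zero-credit class over a grid of standardized Reading scores, holding the Logic and Math scores at zero; the constant names and the function `zero_class_prob_weak` are hypothetical, and because the coefficients are rounded to two decimals the endpoints differ slightly from the values quoted in the text.

```python
import math

# Table 5 coefficients for classes 1..5 (class 1 is the reference class)
CONST   = [0.0, -0.03,  0.22,  0.96, -0.57]
HS_TECH = [0.0, -0.63,  0.18, -0.40, -1.43]
HS_IRR  = [0.0, -0.39, -0.79, -3.08, -0.57]
HS_GRD  = [0.0, -0.01,  0.01,  0.06,  0.12]
READING = [0.0,  0.51,  0.33,  0.29,  0.79]


def zero_class_prob_weak(reading):
    """P(u = 1 | z) for the weak student at a given standardized Reading score.

    Weak student: HS Technical/other, irregular career, HS grade 60
    (i.e. -20 on the centred scale); Logic and Math scores set to zero.
    """
    eta = [CONST[k] + HS_TECH[k] + HS_IRR[k] - 20 * HS_GRD[k]
           + reading * READING[k] for k in range(5)]
    # eta[0] is zero for the reference class, so this is 1 / sum_k exp(eta_k)
    return 1.0 / sum(math.exp(e - eta[0]) for e in eta)


curve = {r: zero_class_prob_weak(r) for r in range(-3, 4)}
for r, p in curve.items():
    print(f"Reading = {r:+d}: P(zero-credit class) = {p:.2f}")
```

Since all the Reading coefficients are positive, the probability of the reference (zero-credit) class decreases monotonically along the grid.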

[Figure 4: line plot; horizontal axis: standardized test scores (−3 to 3); vertical axis: probability of belonging to the zero-credit latent class (0.0 to 1.0); curves: Reading, Math and Logic (std scores)]
Figure 4: Probability of belonging to the zero-credit latent class by test scores (the value in
zero refers to the weak student).
To evaluate the predictive performance of the concomitant variable mixture model we
compute the predicted values ŷi from equation (5), using 10-fold cross-validation (Hastie et
al., 2009). The prediction errors (ŷi − yi) have a mean close to zero and a nearly symmetric
distribution, with quartiles equal to Q1 = −12.4, Q2 = −0.4 and Q3 = 11.4. The Mean
Absolute Error (MAE) is 13.0, which is similar to the one computed on the whole sample
(12.7); therefore, in this application the advantage of using cross-validation is negligible.
Moreover, to assess the predictive power of the covariates, we compare the average absolute
prediction errors of the following nested models: the model without covariates (MAE=15.7), the
model with only background characteristics (MAE=13.3, a reduction of 15%), and the full model
with background characteristics and test scores (MAE=12.7, a further reduction of 4%). Therefore,
in terms of predictive ability, the background characteristics give a relevant contribution.
The pre-enrolment test yields a further improvement, even if the predictive ability remains
modest.
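
The percentages quoted for the nested models can be checked directly from the three MAE values. The sketch below is only a worked calculation; the variable names are illustrative, and since the text does not state which denominator is used for the further reduction, both candidates are computed.

```python
mae_null = 15.7        # model without covariates
mae_background = 13.3  # background characteristics only
mae_full = 12.7        # background characteristics and test scores

# Reduction due to background characteristics, relative to the null model.
reduction_background = (mae_null - mae_background) / mae_null

# Further reduction due to test scores, under two possible denominators.
further_vs_null = (mae_background - mae_full) / mae_null
further_vs_background = (mae_background - mae_full) / mae_background

# Overall reduction of the full model relative to the null model.
total_reduction = (mae_null - mae_full) / mae_null

for label, value in [
    ("background vs. no covariates", reduction_background),
    ("test scores, relative to no covariates", further_vs_null),
    ("test scores, relative to background only", further_vs_background),
    ("full model vs. no covariates", total_reduction),
]:
    print(f"{label}: {value:.1%}")
```

The first ratio is about 15.3%, in line with the reported 15%. The reported 4% agrees with the further reduction measured against the model without covariates (about 3.8%); measured against the background-only model it would be about 4.5%. The overall reduction of the full model is about 19.1%.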
5 Concluding remarks
The paper has presented a detailed data analysis based on a binomial mixture model with
concomitant variables. In our application on the number of gained university credits, the
binomial mixture model proved to be a flexible tool to model a complex response variable,
which is a bounded count characterized by a peak at zero and several modes.
The paper addressed the controversial issue of the choice of the number of components
using two standard methods, namely the BIC index and the bootstrap LRT, and a recently
proposed EM-test. In this application, the three methods led to the same conclusion, though
further research is needed to compare the EM-test with standard procedures.
The results of the concomitant variable mixture model have been effectively presented
by means of tables and graphs, based on converting regression coefficients into predicted
component probabilities and expected response for a set of student profiles defined by the
covariates. In this way the information content of the model has been well exploited.
The predictive performance of the model has been evaluated by means of cross-validation,
showing that background characteristics and test scores help predict gained credits. Further
work should be done on the development of suitable diagnostic tools based on generalized
residuals, starting from the work of Wang et al. (1996) on Poisson mixture models.
The analysis on gained university credits confirmed the predictive role of background
characteristics such as the high school type and grade, and the regularity of the school
career. Moreover, the analysis showed that the pre-enrolment test designed by the School
of Economics of the University of Florence gives additional information. Thus, the test
results can be effectively added to background characteristics to yield valuable indications
for student tutoring: in particular, a low Reading score is related to a difficult start-up, while
a low Math score is related to a slow progression.
References
Böhning, D., Kuhnert, R. (2006). Equivalence of Truncated Count Mixture Distributions
and Mixtures of Truncated Count Distributions. Biometrics, 62:1207–1215.
Brooks, S. P., Morgan, B. J. T., Ridout, M. S., Pack, S. E. (1997). Finite Mixture Models
for Proportions. Biometrics, 53:1097–1115.
Dayton, C. M., Macready, G. B. (1988). Concomitant-Variable Latent-Class Models. Journal
of the American Statistical Association, 83:173–178.
Hall, D. B. (2000). Zero-inflated Poisson and binomial regression with random effects: a case
study. Biometrics, 56:1030–1039.
Hastie, T., Tibshirani, R., Friedman, J. (2009). The Elements of Statistical Learning: Data
Mining, Inference, and Prediction. Second Edition. Springer.
Li, P., Chen, J. (2010). Testing the order of a finite mixture model. Journal of the American
Statistical Association, 105:1084–1092.
McLachlan, G. (1987). On Bootstrapping the Likelihood Ratio Test Statistic for the Number
of Components in a Normal Mixture. Applied Statistics, 36:318–324.
McLachlan, G., Peel, D. (2000). Finite Mixture Models. New York: Wiley.
Melkersson, M., Saarela, J. (2004). Welfare Participation and Welfare Dependence among
the Unemployed. Journal of Population Economics, 17:409–431.
Nylund, K. L., Asparouhov, T., Muthén, B. O. (2007). Deciding the number of classes in
latent class analysis and growth mixture modeling: a Monte Carlo Simulation Study.
Structural Equation Modeling, 14:535–569.
Saffari, S. E., Adnan, R., Greene, W. (2012). Investigating the impact of excess zeros
on hurdle-generalized Poisson regression model with right censored count data. Statistica
Neerlandica, 67:67–80.
Schlattmann, P. (2009). Medical Applications of Finite Mixture Models. Berlin: Springer-Verlag.
Titterington, D. M., Smith, A. F. M., Makov, U. E. (1985). Statistical Analysis of Finite
Mixture Distributions. New York: Wiley.
Vermunt, J. K., Magidson, J. (2008). LG-Syntax User's Guide: Manual for Latent GOLD 4.5
Syntax Module. Belmont, MA: Statistical Innovations Inc.
Wang, P., Puterman, M. L., Cockburn, I., Le, N. D. (1996). Mixed Poisson regression models
with covariate dependent rates. Biometrics, 52:381–400.
Wedel, M., DeSarbo, W. S. (1995). A Mixture Likelihood Approach for Generalized Linear
Models. Journal of Classification, 12:21–55.