Action M Agency
Prague, 2006
Typeset in LaTeX.
The cover page and PGM logo: Jiří Vomlel
Graphic art: Karel Solark
ISBN 80-86742-14-8
Preface
These are the proceedings of the third European workshop on Probabilistic Graphical Models
(PGM06) to be held in Prague, Czech Republic, September 12-15, 2006. The aim of this series
of workshops on probabilistic graphical models is to provide a discussion forum for researchers
interested in this topic. The first European PGM workshop (PGM02) was held in Cuenca, Spain
in November 2002. It was a successful workshop and several of its participants expressed interest in
having a biennial European workshop devoted particularly to probabilistic graphical models. The
second PGM workshop (PGM04) was held in Leiden, the Netherlands in October 2004. It was
also a success; more emphasis was put on collaborative work, and the participants also discussed
how to foster cooperation between European research groups.
There are two trends which can be observed in connection with PGM workshops. First, each
workshop is held during an earlier month than the preceding one. Indeed, PGM02 was held in
November, PGM04 in October and PGM06 will be held in September. Nevertheless, I think this
is a coincidence. The second trend is the increasing number of contributions (to be) presented. I
would like to believe that this is not a coincidence, but an indication of increasing research interest
in probabilistic graphical models.
A total of 60 papers were submitted to PGM06 and, after the reviewing and post-reviewing
phases, 41 of them were accepted for presentation at the workshop (21 talks, 20 posters) and
appear in these proceedings. The authors of these papers come from 17 different countries, mainly
European ones.
To handle the reviewing process (from May 15, 2006 to June 28, 2006), the PGM Program
Committee was considerably extended. This made it possible for every submitted paper to be
reviewed by three independent reviewers. The reviews were handled electronically using the START
conference management system.
Most of the PC members reviewed around seven papers. Some of them also helped in the post-
reviewing phase, when revisions to accepted papers were desired. I would therefore like
to express my sincere thanks to all of the PC members for the work they have done
towards the success of PGM06; they did much to improve the quality of
the contributions. Of course, my thanks also go to the authors.
Further thanks go to the sponsor, the DAR ÚTIA research centre, which covered the
expenses related to the START system. I am also indebted to the members of the organizing
committee for their help, and in particular to my co-editor, Jirka Vomlel.
My wish is for the participants in the PGM06 workshop to appreciate both the beauty of
Prague and the scientific program of the workshop. I believe there will be fruitful discussions
during the workshop, and I hope that the tradition of PGM workshops will continue.
Additional Reviewers
Alessandro Antonucci IDSIA Lugano, Switzerland
Janneke Bolt Utrecht University, Netherlands
Julia Flores University of Castilla-La Mancha, Spain
Marcel van Gerven Radboud University Nijmegen, Netherlands
Yimin Huang University of South Carolina, USA
Søren H. Nielsen Aalborg University, Denmark
Jana Novovičová Academy of Sciences, Czech Republic
Valerie Sessions University of South Carolina, USA
Petr Šimeček Academy of Sciences, Czech Republic
Marta Vomlelová Charles University, Czech Republic
Peter de Waal Utrecht University, Netherlands
Changhe Yuan University of Pittsburgh, USA
Organizing Committee
Chair:
Radim Jiroušek Academy of Sciences, Czech Republic
Committee members:
Milena Zeithamlová Action M Agency, Czech Republic
Marta Vomlelová Charles University, Czech Republic
Radim Lněnička Academy of Sciences, Czech Republic
Petr Šimeček Academy of Sciences, Czech Republic
Jiří Vomlel Academy of Sciences, Czech Republic
Contents
Some Variations on the PC Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Joaquín Abellán, Manuel Gómez-Olmedo, and Serafín Moral
The Independency tree model: a new approach for clustering and factorisation . . . . . . . . . . . . . . . 83
M. Julia Flores, José A. Gámez, and Serafín Moral
Learning the Tree Augmented Naive Bayes Classifier from incomplete datasets . . . . . . . . . . . . . . . 91
Olivier C.H. François and Philippe Leray
Dependency networks based classifiers: learning models by using independence tests . . . . . . . . . 115
José A. Gámez, Juan L. Mateo, and José M. Puerta
Unsupervised naive Bayes for data clustering with mixtures of truncated exponentials . . . . . . . . 123
José A. Gámez, Rafael Rumí, and Antonio Salmerón
Decision analysis with influence diagrams using Elvira's explanation facilities . . . . . . . . . . . . . . . . 179
Manuel Luque and Francisco J. Díez
Dynamic importance sampling in Bayesian networks using factorisation of probability trees . . . 187
Irene Martínez, Carmelo Rodríguez, and Antonio Salmerón
Abstraction and Refinement for Solving Continuous Markov Decision Processes . . . . . . . . . . . . . . 263
Alberto Reyes, Pablo Ibargüengoytia, L. Enrique Sucar, and Eduardo Morales
Abstract
This paper proposes some possible modifications of the basic PC learning algorithm and
reports experiments studying their behaviour. The variations are: determining
minimum-size cut sets between two nodes before studying the deletion of a link; making the
statistical decisions with a Bayesian score instead of a classical chi-square test;
refining the learned network by a greedy optimization of a Bayesian score;
and resolving link ambiguities by taking into account a measure of their strength. It will be
shown that some of these modifications can improve the performance of PC, depending on the
objective of the learning task: discovering the causal structure or approximating the joint
probability distribution of the problem variables.
3.3 Refinement

If the statistical tests do not make errors and the faithfulness hypothesis is verified, then the PC algorithm will recover a graph equivalent to the original one, but this can never be assured with finite samples. Also, even if we recover the original graph, when our objective is to approximate the joint distribution of all the variables, then, depending on the sample size, it can be more convenient to use a simpler graph than the true one. Imagine that the variables follow the graph of Figure 2. This graph can be recovered by the PC algorithm by doing only statistical independence tests of order 0 and 1 (conditioning on none or one variable). However, when we are going to estimate the parameters of the network, we have to estimate a high number of probability values. This can be too complex a model (too many parameters) if the database is not large enough. In this situation, it can be reasonable to try to refine the network, taking into account the actual orientation and the size of the model. In this sense, the result of the PC algorithm can be used as a starting point for a greedy search algorithm that optimizes a concrete metric.

In particular, our proposal is based on the following steps (see the sketch below):

1. Obtain an order compatible with the graph learned by the PC algorithm.

2. For each node, try to delete each one of its parents, or to add some of the non-parent preceding nodes as a parent, measuring the resulting Bayesian Dirichlet score. We make the movement with the highest score difference while this is positive.

Refinement can also solve some of the problems associated with the non-verification of the faithfulness hypothesis. Assume, for example, that we have a problem with 3 variables, X, Y, Z, and that the set of independencies is given by the independencies of the graph in Figure 3 plus the independence I(Y, Z | ∅). The PC algorithm will estimate a network where the link between Y and Z is lost. Even if the sample is large, we will estimate a too simple network which is not an I-map (Pearl, 1988). If we orient the link X → Z in the PC algorithm, refinement can produce the network in Figure 3, by checking that the Bayesian score is increased (as should be the case if I(Z, Y | X) is not verified).

Figure 3: A simple network

The idea of refining a learned Bayesian network by means of a greedy optimization of a Bayesian score has been used in a different context by Dash and Druzdzel (1999).

3.4 Triangles Resolution

Imagine that we have 3 variables, X, Y, Z, and that no independence relationship involving them is verified: each pair of variables is dependent, and conditionally dependent given the third one. As there is a tendency to decide for independence when the size of the conditioning
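The greedy refinement of Section 3.3 can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the local metric `bd_score` (a Bayesian Dirichlet score) and the data are assumed to be supplied, and structures are represented simply as parent sets.

```python
# Hedged sketch of the greedy refinement (Section 3.3): repeatedly try
# single-parent deletions and additions (additions restricted to nodes
# preceding the child in the order), keeping the best improving move.

def refine(order, parents, data, bd_score):
    """order:    node list compatible with the PC-learned graph
    parents:  dict mapping each node to its current set of parents
    bd_score: assumed callable (node, parent_set, data) -> float"""
    improved = True
    while improved:
        improved = False
        for i, node in enumerate(order):
            base = bd_score(node, parents[node], data)
            # candidate moves: delete one parent, or add one predecessor
            candidates = [parents[node] - {p} for p in parents[node]]
            candidates += [parents[node] | {q}
                           for q in order[:i] if q not in parents[node]]
            best_gain, best_set = 0.0, None
            for cand in candidates:
                gain = bd_score(node, cand, data) - base
                if gain > best_gain:
                    best_gain, best_set = gain, cand
            if best_set is not None:   # apply only positive moves
                parents[node] = best_set
                improved = True
    return parents
```

Because the score is decomposable, each move is evaluated locally, and the loop stops when no single addition or deletion improves the score.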
Abstract
The increasing complexity of models, the abundant electronic literature and the
relative scarcity of data make it necessary to apply the Bayesian approach to complex
queries based on prior knowledge and structural models. In the paper we discuss the
probabilistic semantics of such statements, the computational challenges of Bayesian
inference over complex Bayesian network features, and possible solutions, particularly
for features relevant in conditional analysis. We introduce a special feature called the Markov
Blanket Graph. Next we present an application of the ordering-based Monte Carlo method
over Markov Blanket Graphs and Markov Blanket sets.
In the Bayesian approach to a structural feature F with values F(G) ∈ {f_i}_{i=1}^R, we are interested in the feature posterior induced by the model posterior given the observations D_N, where G denotes the structure of the Bayesian network (BN):

p(f_i | D_N) = \sum_G 1(F(G) = f_i) \, p(G | D_N)    (1)

The importance of such inference results from (1) the frequently impractically high sample and computational complexity of the complete model, (2) a subsequent Bayesian decision-theoretic phase, (3) the availability of stochastic methods for estimating such posteriors, and (4) the focusedness of the data and the prior on certain aspects (e.g. by the pattern of missing values or better understood parts of the model). Correspondingly, there is a general expectation that for a small amount of data some properties of complex models can be inferred with high confidence and relatively low computation cost, preserving a model-based foundation.

The irregularity of the posterior over the discrete model space of Directed Acyclic Graphs (DAGs) poses serious challenges when such feature posteriors are to be estimated. This induced the research on the application of Markov Chain Monte Carlo (MCMC) methods for elementary features (Madigan et al., 1996; Friedman and Koller, 2000). This paper extends these results by investigating Bayesian inference about BN features with high cardinality, relevant in classification. In Section 1 we present a unified view of BN features enriched with free-text annotations as a probabilistic knowledge base (pKB) and discuss the corresponding probabilistic semantics. In Section 2 we overview the earlier approaches to feature learning. In Section 3 we discuss structural BN features and introduce a special feature called the Markov Blanket Graph or Mechanism Boundary Graph. Section 4 discusses its relevance in conditional modeling. In Section 5 we report an algorithm using ordering-based MCMC methods to perform inference over Markov Blanket Graphs and Markov Blanket sets. Section 6 presents results for the ovarian tumor domain.

1 BN features in pKBs

Probabilistic and causal interpretations of BNs ensure that structural features can express a wide range of relevant concepts based on conditional independence statements and causal assertions (Pearl, 1988; Pearl, 2000; Spirtes et al., 2001). To enrich this approach with subjective domain knowledge via free-text annotations, we introduce the concept of a Probabilistic Annotated Bayesian Network knowledge base.

Definition 1. A Probabilistic Annotated Bayesian Network knowledge base K for a fixed set V of discrete random variables is a first-order logical knowledge base including standard graph, string and BN related predicates, relations and functions. Let G_w represent a target DAG structure including all the target random variables. It includes free-text descriptions for the subgraphs and for their subsets. We assume that the models M of the knowledge base vary only in G_w (i.e. there is a mapping G → M) and that a distribution p(G_w | ξ) is available.

A sentence α is any well-formed first-order formula in K, the probability of which is defined as the expectation of its truth,

E_{p(M | K)}[α_M] = \sum_G α_{M(G)} \, p(G | K),

where α_{M(G)} denotes its truth value in the model M(G). This hybrid approach defines a distribution over models by combining a logical knowledge base with a probabilistic model. The logical knowledge base describes the certain knowledge in the domain, defining a set of models (legal worlds), and the probabilistic part (p(G_w | ξ)) expresses the uncertain knowledge over these worlds.

2 Earlier works

To avoid the statistical and computational burden of identifying complete models, related local algorithms for identifying causal relations were reported in (Silverstein et al., 2000) and in (Glymour and Cooper, 1999; Mani and Cooper, 2001). The majority of feature learning algorithms targets the learning of relevant variables for the conditional modeling of a central variable, i.e. they target the so-called feature subset selection (FSS) problem (Kohavi and John, 1997). Such examples are the Markov Blanket Approximating Algorithm (Koller and Sahami, 1996) and the Incremental Association Markov Blanket algorithm (Tsamardinos and Aliferis, 2003). The subgraphs of a BN as features were targeted in (Pe'er et al., 2001). The bootstrap approach inducing confidence measures for features such as compelled edges, Markov blanket membership and pairwise precedence was investigated in (Friedman et al., 1999).

On the contrary, the Bayesian framework offers many advantages, such as the normative, model-based combination of prior and data allowing unconstrained application in the small-sample region. Furthermore, the feature posteriors can be embedded in a probabilistic knowledge base, and they can be used to induce priors for other model spaces and for subsequent learning. Buntine (1991) proposed the concept of a posterior knowledge base conditioned on a given ordering for the analysis of BN models. Cooper and Herskovits (1992) discussed the general use of the posterior over BN structures to compute the posterior of arbitrary features. Madigan et al. (1996) proposed an MCMC scheme over the space of DAGs and orderings of the variables to approximate Bayesian inference. Heckerman et al. (1997) considered the application of the full Bayesian approach to causal BNs. Another MCMC scheme, the ordering-based MCMC method utilizing the ordering of the variables, was reported in (Friedman and Koller, 2000); they developed and used a closed form for the order-conditional posteriors of Markov blanket membership, beside the earlier closed form for parental sets.

3 BN features

The prevailing interpretation of BN feature learning assumes that the feature set is significantly simpler than the complete domain model, providing an overall characterization as marginals, and that the number of features and their values is tractable (e.g. linear or quadratic in the number of variables). Another interpretation is to identify high-scoring arbitrary subgraphs, parental sets or Markov blanket subsets, and estimate their posteriors. A set of simple features means a fragmentary representation of the distribution over the complete domain model from multiple, though simplified, aspects, whereas using a given complex feature means a focused representation from a single, but complex, point of view. A feature F is complex if
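Equation (1) casts feature inference as an expectation under the structure posterior, so any sampler over DAGs yields a direct Monte Carlo estimator. The following is a hedged sketch, not the paper's code: `sample_dags` stands in for a structure sampler such as the ordering-based MCMC of Section 5, and feature values are assumed hashable (e.g. a frozenset encoding a Markov blanket).

```python
from collections import Counter

# Hypothetical estimator for Eq. (1): p(f_i | D_N) is approximated by
# the fraction of posterior DAG samples G for which F(G) = f_i.

def feature_posterior(sample_dags, feature, n_samples=10_000):
    """sample_dags: assumed callable returning n DAGs drawn
                    approximately from p(G | D_N)
    feature:     callable F mapping a DAG to a (hashable) value f_i"""
    counts = Counter(feature(g) for g in sample_dags(n_samples))
    return {f: c / n_samples for f, c in counts.items()}
```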
Figure 1: The ranked posteriors and their MBM-based approximations of the 20 most probable MB(Pathology) sets for the single/unconstrained orderings (curves: P(MB | ≺, D), P(MB | ≺, D, MBM), P(MB | D), P(MB | D, MBM)).

G^{MAP} = \arg\max_G p(G | D_N)    (6)

MBG(Y)^{MAP} = \arg\max_{mbg(Y)} p(mbg(Y) | D_N)

MB(Y)^{MAP} = \arg\max_{mb(Y)} p(mb(Y) | D_N)

…model, because of the additional Age and Fluid variables in the domain model. The MAP MB feature value MB(Y)^{MAP} similarly differs from these models, for example w.r.t. the vascularization variables such as PI. Interestingly, the MAP MB feature value also differs from the MB feature value defined by the MAP MBG feature value, MB(Y, MBG(Y)^{MAP}), for example w.r.t. the TAMX and Solid variables. In conclusion, these results, together with the comparison against simple feature-based analyses such as the MBM-based analysis reported in Fig. 1, show the relevance of the complex feature-based analysis.

We also constructed an offline probabilistic knowledge base containing 10^4 MAP MBGs. It is connected with the annotated BN knowledge base defined in Def. 1, which allows an offline exploration of the domain from the point of view of conditional modeling. The histogram of the number of parameters and inputs for the MBGs using only the fourteen most relevant variables is reported in Fig. 2.

Figure 2: The histogram of the number of parameters and inputs of the MAP MBGs.

7 Conclusion

In the paper we presented a Bayesian approach for complex BN features, for the so-called Markov Blanket set and the Markov Blanket
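The Markov blanket features compared above can be read directly off a DAG. The following small sketch is our own illustration: it uses the standard definition MB(Y) = parents ∪ children ∪ co-parents, and it takes the Markov Blanket Graph to be the edges into Y and into its children, which is one common reading and an assumption here, not the paper's formal definition.

```python
# Illustrative helpers (assumptions noted above); a DAG is represented
# as a dict mapping each node to the set of its parents.

def markov_blanket(parents, y):
    """MB(y): parents of y, children of y, and the children's
    other parents (co-parents)."""
    children = {x for x, ps in parents.items() if y in ps}
    spouses = set()
    for c in children:
        spouses |= set(parents[c])
    return set(parents[y]) | children | (spouses - {y})

def markov_blanket_graph(parents, y):
    """Assumed MBG(y): the directed edges pointing to y and to
    each child of y, returned as a set of (parent, child) pairs."""
    children = {x for x, ps in parents.items() if y in ps}
    edges = {(p, y) for p in parents[y]}
    for c in children:
        edges |= {(p, c) for p in parents[c]}
    return edges
```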
Abstract
In biomedical domains, free-text electronic literature is an important resource for knowledge
discovery and acquisition, particularly for providing a priori components for evaluating
or learning domain models. Aiming at the automated extraction of this prior knowledge,
we discuss the types of uncertainty in a domain with respect to causal mechanisms,
formulate assumptions about how they are reported in scientific papers, and derive generative
probabilistic models for the occurrences of biomedical concepts in papers. These results allow
the discovery and extraction of latent causal dependency relations from the domain literature
using minimal linguistic support. Contrary to the currently prevailing methods,
which assume that relations are sufficiently formulated for linguistic methods, our approach
assumes only the reporting of causally associated entities, without their tentative
status or relations, and it can discover new relations and prune redundancies by providing
a domain-wide model. The proposed Bayesian network-based text mining is therefore an
important complement to linguistic approaches.
may include stochastic grammars for modeling the linguistic aspects of the publication; however, we will assume that the literature has a simplified agrammatical representation and that the corresponding generative model can be formalized as a BN (G_L, θ_L) as well.

The main question is the relation of P(G, θ) and P(G_L, θ_L). In the most general approach, the hypothetical posteriors P(G, θ | D_N, ξ_i), expressing personal beliefs over the domain models conditional on the experiments and the personal background knowledge ξ_i, determine or at least influence the parameters of the model (G_L, θ_L) in P(D^L_{N'} | (G_L, θ_L), ξ).

The construction or learning of a full-fledged decision-theoretic model of publication is currently not feasible given the state of quantitative modeling of scientific research and publication policies, not to mention the cognitive and even stylistic aspects of explanation, understanding and learning (Rosenberg, 2000). In a severely restricted approach, we focus only on the effect of the belief in domain models P(G, θ) on the belief in publication models P(G_L, θ_L). We assume that this transformation is local, i.e. that there is a simple probabilistic link between the model spaces, specifically between the structure of the domain model and the structure and parameters of the publication model, p(G_L, θ_L | G). Probabilistically linked model spaces allow the computation of the posterior over domain models given the literature: it is first updated by the literature data, then by the clinical data. A considerable advantage of this approach is the integration of literature and clinical data at the lowest level, and not through feature posteriors, i.e. through using literature posteriors in feature-based priors for the (clinical) data analysis (Antal et al., 2004).

We will assume that a bijective relation exists between the domain model structures G and the publication model structures G_L (T(G) = G_L), whereas the parameters θ_L may encode additional aspects of publication policies and explanation. We will focus on the logical link between the structures, where the posterior given the literature and possibly the clinical data is:

P(G | D_N, D^L_{N'}, ξ) ∝ P(G | ξ) \, P(D_N | G) \, P(D^L_{N'} | T(G))    (1)

This shows the equal status of the literature and the clinical data. In integrated learning from heterogeneous sources, however, scaling of the sources is advisable to express our confidence in them.
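The combination in Eq. (1), together with the source scaling just mentioned, fits in a few lines. A hypothetical illustration (the function names and the identity choice for T are our assumptions):

```python
import math

# Sketch of Eq. (1): the domain-structure posterior combines a prior
# with the clinical likelihood and the literature likelihood of the
# mapped structure T(G); kappa scales confidence in the literature
# source (kappa = 1 recovers Eq. (1) as written).

def combined_log_posterior(structures, log_prior, log_lik_clinical,
                           log_lik_literature, kappa=1.0):
    """The three score arguments are assumed callables mapping a
    structure G to a log value; T(G) = G_L is taken to be the
    identity, per the assumed bijection between model spaces."""
    unnorm = {g: log_prior(g) + log_lik_clinical(g)
                 + kappa * log_lik_literature(g) for g in structures}
    m = max(unnorm.values())   # stabilize the normalization
    log_z = m + math.log(sum(math.exp(v - m) for v in unnorm.values()))
    return {g: v - log_z for g, v in unnorm.items()}
```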
3 Concepts, associations, causation

Frequently a biomedical domain can be characterized by a dominant type of uncertainty w.r.t. the causal mechanisms. Such types of uncertainty show a certain sequentiality, described below, related to the development of biomedical knowledge, though a strictly sequential view is clearly an oversimplification.

(1) Conceptual phase: Uncertainty over the domain ontology, i.e. the relevant entities.

(2) Associative phase: Uncertainty over the association of entities, reported in the literature as indirect, associative hypotheses, frequently as clusters of entities. Though we accept the general view of causal relations behind associations, we assume that the exact causal functions and direct relations are unknown.

(3) Causal relevance phase: (Existential) uncertainty over causal relations (i.e. over mechanisms). Typically, direct causal relations are theorized as processes and mechanisms.

(4) Causal effect phase: Uncertainty over the strength of the autonomous mechanisms embodying the causal relations.

In this paper we assume that the target domain is already in the Associative or Causal phase, i.e. that the entities are more or less agreed upon, but their causal relations are mostly in the discovery phase. This holds in many biomedical domains, particularly in those linking biological and clinical levels. There the Associative phase is a crucial but lengthy knowledge accumulation process, where a wide range of research methods is used to report associated pairs or clusters of the domain entities. These methods admittedly produce causally oriented associative relations which are partial, biased and noisy.

4 Literature mining

Literature mining methods can be classified into bottom-up (pairwise) and top-down (domain model based) methods. Bottom-up methods attempt to identify individual relationships, and the integration is left to the domain expert. Linguistic approaches assume that the individual relations are sufficiently known, formulated and reported for automated detection methods. On the contrary, top-down methods concentrate on identifying consistent domain models by jointly analyzing the domain literature. They assume that mainly causally associated entities are reported, with or without tentative relations and direct structural knowledge. Their linguistic formulation is highly variable, not conforming to simple grammatical characterization. Consequently, top-down methods typically use agrammatical text representations and minimal linguistic support. They autonomously prune the redundant, inconsistent, indirect relations by evaluating consistent domain models, and can deliver results in domains already in the Associative phase.

Until recently mainly bottom-up methods have been analyzed in the literature: linguistic approaches extract explicitly stated relations, possibly with qualitative ratings (Proux et al., 2000; Hirschman et al., 2002); co-occurrence analysis quantifies the pairwise relations of variables by their relative frequency (Stapley and Benoit, 2000; Jenssen et al., 2001); kernel similarity analysis uses the textual descriptions or the occurrence patterns of variables in publications to quantify their relation (Shatkay et al., 2002); Swanson and Smalheiser (1997) discover relationships through the heuristic pattern analysis of citations and co-occurrences; in (Cooper, 1997) and (Mani and Cooper, 2000) local constraints were applied to cope with possible hidden confounders, to support the discovery of causal relations; the joint statistical analysis in (Krauthammer et al., 2002) fits a generative model to the temporal pattern of corroborations, refutations and citations of individual relations to identify true statements. The top-down method of de Campos (1998) learns a restricted BN thesaurus from the occurrence patterns of words in the literature. Our approach is closest to this and to those of Krauthammer et al. and Mani.

The reconstruction of informative and faithful priors over domain mechanisms or models from research papers is further complicated by the multiple aspects of uncertainty about the existence, scope (conditions of validity), strength, causality (direction), robustness to perturbation and relevance of mechanisms, and by the incompleteness of reported relations, because these are assumed to be well-known parts of common-sense knowledge or of the paradigmatic, already reported knowledge of the community.
variable. We assume also full observability of causal relevance, i.e. that the lack of occurrence of an entity in a paper means causal irrelevance w.r.t. the mechanisms and variables in the paper, and not a neutral omission. These assumptions allow the merging of the explanatory, explained and described statuses with the observable reported status, i.e. we can represent them jointly with a single binary variable. Note that these assumptions remain tenable in the case of reports of experiments, where the pattern of relevancies has a transitive-causal bias.

These assumptions would imply that we can model only full survey papers, but the general, unconstrained multinomial dependency model used in the transitive BNs provides enough freedom to avoid this. A possible semantics of the parameters of a binary, transitive literature BN can be derived from a causal stance: the presence of an entity X_i is influenced only by the presence of its potential explanatory entities, i.e. its parents. Consequently, P(X_i = 1 | Pa_{X_i} = pa_{X_i}) can be interpreted as the belief that the present parental variables can explain the entity X_i (Pa_{X_i} denotes the parents of X_i, and Pa_{X_i} → X_i denotes the parental substructure). In that way the parameters of a complete network can represent the priors for parental sets compatible with the implied ordering:

P(X_i = 1 | Pa_{X_i} = pa_{X_i}) = P(Pa_{X_i} = pa_{X_i})    (2)

where for notational simplicity pa_{X_i} denotes both the parental set and a corresponding binary representation.

The multinomial model allows entity-specific modifications at each node, combined into the parameters of the conditional probability model, that are independent of the other variables (i.e. unstructured noise). This permits the modeling of the description of the entities (P(X_i^D)), the beginning of the transitive scheme of causal explanation (P(X_i^B)) and the reverse effect of interrupting the transitive scheme (P(X_i^I)). These auxiliary variables model simplistic interventions, i.e. authors' intentions about publishing an observational model. Note that a backward model corresponding to an effect-to-cause or diagnostic interpretation and explanation method has a different structure with opposite edge directions.

In the Bayesian framework there is also structural uncertainty, i.e. uncertainty over the structure of the generative models (the literature BNs) themselves. So to compute the probability of a parental set Pa_{X_i} = pa_{X_i} given a literature data set D^L_{N'}, we have to average over the structures using the posterior given the literature data:

P(Pa_{X_i} = pa_{X_i} | D^L_{N'})    (3)
    = \sum_{(pa_{X_i} → X_i) ⊆ G} P(X_i = 1 | pa_{X_i}, G) \, P(G | D^L_{N'})
    ≈ \sum_G 1((pa_{X_i} → X_i) ⊆ G) \, P(G | D^L_{N'})    (4)

Consequently, the result of learning BNs from the literature can take multiple forms, e.g. a maximum a posteriori (MAP) structure and the corresponding parameters, or the posterior over the structures (Eq. 3). In the first case, the parameters can be interpreted structurally and converted into a prior for a subsequent learning. In the latter case, we neglect the parametric information, focusing on the structural constraints, and transform the posterior over the literature network structures into a prior over the structures of the real-world BNs (see Eq. 1).

6 The literature data sets

For our research we used the same collection of abstracts as that described in (Antal et al., 2004), which was a preliminary work using pairwise methods. The collection contains 2256 abstracts about ovarian cancer, mostly from between 1980 and 2002. Also, a name, a list of synonyms and a text kernel are available for each domain variable. The presence of the name (or a synonym) of a variable in a document is denoted by a binary value. Another binary representation of the publications is based on the kernel documents:

R^K_{ij} = 1 if 0.1 < sim(k_j, d_i), and 0 otherwise,    (5)

which expresses the relevance of kernel document k_j to document d_i using the term
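With posterior samples of literature-BN structures, the structural averaging of Eqs. (3)-(4) reduces to a relative frequency. A minimal sketch under assumed representations (the sampler producing the structures is not shown):

```python
# Hypothetical estimator for Eq. (4): average the indicator
# 1((pa_Xi -> Xi) is a subgraph of G) over structures sampled
# approximately from P(G | D^L_N').

def parental_set_posterior(dag_samples, xi, pa_xi):
    """dag_samples: list of structures, each a dict node -> parent set
    xi:          the child variable X_i
    pa_xi:       the candidate parental set pa_Xi
    Counts the samples containing every edge p -> xi, p in pa_xi."""
    pa_xi = set(pa_xi)
    hits = sum(1 for g in dag_samples if pa_xi <= set(g[xi]))
    return hits / len(dag_samples)
```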
Figure 1: The expert-provided (dotted), the MAP transitive (dashed) and the intransitive (solid) BNs compatible with the expert's total ordering of the thirty-five variables, using the literature data set (P^{MRCOREL}_3), the K2 noninformative parameter priors, and noninformative structure priors.
R. Baeza-Yates and B. Ribeiro-Neto. 1999. Modern Information Retrieval. ACM Press, New York.

G. F. Cooper and C. Yoo. 1999. Causal discovery from a mixture of experimental and observational data. In Proc. of the 15th Conf. on Uncertainty in Artificial Intelligence (UAI-1999), pages 116-125. Morgan Kaufmann.

G. Cooper. 1997. A simple constraint-based algorithm for efficiently mining observational databases for causal relationships. Data Mining and Knowledge Discovery, 2:203-224.

L. M. de Campos, J. M. Fernandez, and J. F. Huete. 1998. Query expansion in information retrieval systems using a Bayesian network-based thesaurus. In Gregory Cooper and Serafin Moral, editors, Proc. of the 14th Conf. on Uncertainty in Artificial Intelligence (UAI-1998), pages 53-60. Morgan Kaufmann.

N. Friedman and D. Koller. 2000. Being Bayesian about network structure. In Craig Boutilier and Moises Goldszmidt, editors, Proc. of the 16th Conf. on Uncertainty in Artificial Intelligence (UAI-2000), pages 201-211. Morgan Kaufmann.

D. Geiger and D. Heckerman. 1996. Knowledge representation and inference in similarity networks and Bayesian multinets. Artificial Intelligence, 82:45-74.

D. Heckerman, D. Geiger, and D. Chickering. 1995. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20:197-243.

D. Proux, F. Rechenmann, and L. Julliard. 2000. A pragmatic information extraction strategy for gathering data on genetic interactions. In Proc. of the 8th International Conference on Intelligent Systems for Molecular Biology (ISMB-2000), La Jolla, California, pages 279-285.

A. Rosenberg. 2000. Philosophy of Science: A contemporary introduction. Routledge.

S. J. Russell, J. Binder, D. Koller, and K. Kanazawa. 1995. Local learning in probabilistic networks with hidden variables. In IJCAI, pages 1146-1152.

H. Shatkay, S. Edwards, and M. Boguski. 2002. Information retrieval meets gene analysis. IEEE Intelligent Systems, 17(2):45-53.

B. Stapley and G. Benoit. 2000. Biobibliometrics: Information retrieval and visualization from co-occurrences of gene names in Medline abstracts. In Proc. of the Pacific Symposium on Biocomputing (PSB-00), volume 5, pages 529-540.

D. R. Swanson and N. R. Smalheiser. 1997. An interactive system for finding complementary literatures: a stimulus to scientific discovery. Artificial Intelligence, 91:183-203.

P. Thagard. 1998. Explaining disease: Correlations, causes, and mechanisms. Minds and Machines, 8:61-78.

X. Wu, P. Lucas, S. Kerr, and R. Dijkhuizen. 2001. Learning Bayesian-network topologies in realistic medical domains. In Medical Data Analysis: Second International Symposium, ISMDA, pages 302-308. Springer-Verlag, Berlin.
!#"$%&
'%(*),++ -/.101234'5.7637.981:;:;<=-/.10?>@-A2 :B3DC-FE-G(*37.
H +6<*6 8163JIK5-G(L(*)>3/(L(*),M4K5<ONP6 8Q0Q<RN181(L(TS H ./6),(L(L<*UG)V.1W,-4'%26<*XY:;<L-G(*)
Z\[^]`_GaGbGc >-/.Q.13dTeO8#U/-/.#3#fgNPh^<*6W;);2 (L-/.Q0
iFj1k/lQmPmGjGn#oPp1q!rs#j7tPt#j1k/qGnvuxwQyAoYmPy/jz|{V}
~R,xVYG
! /A,G V -A2)53P0Q),(L+6 1-A6)BP6)V.Q0\-,G),+ <L-/..#);6+630Q),-G(Yh^<*6 <Q2),:;<L+<*37.<.123/Y- ]
Y<L(L<*6Gg9-/.Q0:;-/.-G:B6 8Q-G(L(*v)2);U/-A2 01),0-G++);6+!3G\-,G),+<L-/.D.#);6+;!#<L01)V.1:B)+ 81UGUG),+6+6 1-A6:B2),0Y-G(
.1);6+=-A2)-v3Vh);2T8Q(),-/.Q+632),12),+)V./6-/.100Q),-G(h^<*6 4-/./<v3G26-/.76=-/.Q0:1-G(L(*)V.#U/<.#U
Q23/Q(*)V+<.8Q.1:B);26-G<.2),-G+37.1<.#U1)U/<*G))B#-/4Q(*),+634+ 13Vh?6 Q-A6^+37)3G!6 #),+)123/Q(*)V4+
:;-/.37.Q(*v)3P01),(L(*),09:B2),0Q-G(.#);6+:;-G(L(*),0 GQ;`1G/B*1V =%#),+)Gg#3Vh);G);2,gR-A2 )
+6<L(L(<L++<.1U-DUG2 -G1<L:;-G(2),12 ),+)V./6-A6<*37.(L-/.#U781-AUG)4-/.10+3/(8#6<*37.-G(*UG3G2 <*6 Q4+;^#)+<*6 81-A6<*37.
<L+=P8Q<*6)46 #)3/Y3/+ <*6)4h^<*6 J+),Q-A2 -A6),(*+v),:;<*XQ),0:B2),0Q-G(.#);6+;gh%1<L:|1-,G)Dv);)V.6 #)4+|81#),:B6
3G\81:|+6 810Q-/.10-G(*UG3G2 <*6 Q4<L:01);G),(*3/Y)V./6;^1<L+5Q-Gv);25U/<*G),+6h3-F3G2:B37./62|<LY8#6<*37.1+;
<*2 +6;g<*6^01),(L<*G);2|+%-4.1);h?UG2 -GYQ<L:;-G(R(L-/.#U781-AUG)=6343G2|81(L-A6)-/.76#)=3G:B2),0Q-G(O.1);6h3G2gvv3G6
+),Q-A2 -A6),(*-/.10.#37. ] +),Y-A2 -A6),(*+v),:;<*XQ),0%N#),:B37.10g<*6+ #3xh^+^6 Q-A6 GQ .137. ] +),Q-A2 -A6),(*@+v),:;< ]
XY),0.#);62),12 ),+)V./6),0h^<*6 @6 #).#);h(L-/.#U781-AUG):;-/.v)=),-G+ <L(*62 -/.1+3G2|),0<.763-/.),#81<*A-G(*)V./6
+),Q-A2 -A6),(*+v),:;<*XQ),0.#);6;g01);X.#),03VG);2-4(L-A2 UG);250Q37-G<.^1<L+%2),+ 8Q(*6^3/v)V.1+%8Q-.P8Qv);23G
.1);hv);2 +v),:B6<*G),+5-/.10:B37.1:B2);6)378#6:B37),+;XQ2|+6%3G-G(L(Tg<*6<),0Q<L-A6),(*)V.Q-GQ(*),+6 #)=)B1<L+6<.#U
-G(*UG3G2|<*6 Q+=3G2+),Q-A2|-A6),(*+v),:;<*XQ),0:B2),0Q-G(\.#);6+=63v)-GYQ(L<*),0J63.#37. ] +),Q-A2 -A6),(*+ ),:;<*XY),0
37.1),+;
xOG/P/R 0Q<*6<*37.1-G(4-G++58Q.1:B6<*37.<L+-G(L(*3Vh),063A-A2<.<*6+
:B2),0Q-G(1+);6<.101),v)V.101)V.76(*3Gv6 #)3G6 #);2 +,!%#)\2), ]
)3P:,8Q+%37.:B2 ),0Q-G(.#);6h3G2#+4dNP),:B6<*37.9f=d Z 3GW ] 2),+)V.76-A6<*37.<L+.1-A6 812 -G(L(*(*3#:;-G(),:;-/8Q+)6 #);2)-A2)
-/.g bAG/ fgh%1<L:-A2)-UG)V.#);2 -G(L<*W,-A6<*37.3G .#32),(L-A6<*37.1+|1<LQ+v);6h);)V.0Q<E);2)V./6:B2),0Y-G(+);6+;
\-,G),+<L-/..#);6+,^#)UG)V.#);2|-G(L<*W,-A6<*37.<L+-G:|1<*);G),0 ^#)P8#),+6<*37.?<L+3G2):B37Q(L<L:;-A6),0h^<*6 3G2)
92),(L-F#<.1U^6 #)2),P8Q<*2)V)V.766 1-A6O6 #):B37.10Y<*6<*37.1-G( UG)V.#);2 -G(R+v),:;<*XY:;-A6<*37.1+3G!:B2),0Q-G(O.#);6h3G2#+;gQh%1<L:
-G+ +8Q.1:B6<*37.1+3G6 #)3#01),(v)12),:;<L+)Gh^<*6 h):;-G(L( YGY,`#GG;$1V ^#)<L01),-3G
:B2),0Q-G(.1);6+),-G:|3G6 #)V<L+37.Q(*42),#81<*2),063v) ] .#37. ] +),Q-A2 -A6),(*+ ),:;<*XY),0:B2),0Y-G(v.#);6+\<L+<.-G:B663
(*37.#U63-:;(*3/+),0:B37.7G)B+);6; Z (*3/+),0:B37./G)B+);6+ -G(L(*3Vh3G2R2),(L-A6<*37.Q+ 1<LQ+v);6h);)V.:B37.10Y<*6<*37.1-G(P4-G++
3G-G+ +^8Q.1:B6<*37.1+%-A2)-G(L+31.#3xh%.-G+ V /A;, 8Q.1:B6<*37.1+<.40Q<E);2)V./6:B2),0Q-G(Q+);6+,g7h%1<L::;-/.4-G(L+3
-A6);2\eR);P<!dTe);#<Tg aGcA f+<.#U:B2),0Q-G(v+);6+\<.6 #) v)-A2%-,h\-,<.6 #).#);6;
Q(L-G:B)3G\4-G++8Q.1:B6<*37.1+4-AG),+:B2),0Q-G(.#);6h3G2#+ '(*6 #378#U746 #)5<L01),-3GR.137. ] +),Q-A2 -A6),(*D+v),:;<*XQ),0
-/. ^vV,=v7 7; 3P0Q),(%d-G(L(*);Gg5 aGa Ff :B2),0Q-G(.1);6+<L+2),(L-A6<*G),(*<.76 81<*6<*G)Gg<*6+ #3781(L0v)
H 6\:;-/.)5+ #3xh%.D6 1-A6-:B2),0Q-G(.1);6h3G2D<L+),P81<* ] +62),+ +),0J6 Q-A66 Q<L+#<.103G%.#);6+1-G+);)V.<.7G),+ ]
-G(*)V.7663- ;,54^G/B/; h^<*6 6 #)+-/) 6<*U/-A6),0$G);2(L<*66(*)G<.T-G:B6;g6 #);2)@1-G+v);)V..#3
UG2 -GY -A66)VQ6+3-A2D6301);G),(*3/-JUG)V.#);2|-G(^UG2 -G1<L:;-G(
'5.<v3G26-/.76P81),+6<*37.<L+h#);6 #);23G2=.13G6-G(L( (L-/.#U781-AUG)63@01),+ :B2 <Lv)6 #)V!-/.106 1);2)4<L+=.13-G( ]
:B2),0Q-G(.#);6h3G2#+:;-/.?v)2),Q2),+)V.76),0<.-h\-, UG3G2 <*6 Q63:B374Y8#6)h%<*6 6 1)V*^1<L+5-GQv),-A2 +
6 1-A6)V4Y1-G+<*W;),+(*3P:;-G(L<*6`G^#)-/.1+h);2<L+v3/+<*6<*G)
<*h)2 ),+62 <L:B66 #)=-A66)V./6<*37.636 #)=3/+6v3/Y81(L-A2 G R* xF V
F ! A"#$ %'|&7@) (7 ||+ *-, ..0/)V1 *
B243V
6#v)3G:B2),0Y-G(.#);6`h3G2 P+;gv6 #3/+):;-G(L(*),0 ;`1G/B* \x, |5B6GG76VGOF98|!:<;>=|
Q, dNP),:B6<*37.#f H .$6 1<L+:;-G+)Gg),-G:|?:B37. ] F 64G :Vx *F ? 3Fx@ : 6A % x
63v)-/.8Q.#3G26 8Q.Q-A6)U/-G<. 6 #)(L<*6);2|-A6 8#2)-G+
F`
v/O
A 1
6 #).#37. ] +),Q-A2 -A6)+v),:;<*XY:;-A6<*37.+);)V+=63)6 #) Re );681+XY2 +601);X.1)+37)%.13G6-A6<*37.4-/.106 #)^8Q.10Q- ]
G);563%3P01),(/4-/./<v3G26-/.76123/Q(*)V4+;gV-G+O<L(L(81+ ] )V./6-G(.#3G6<*37.3G5\-,G),+<L-/.$.#);6`h3G2ve);6
62 -A6),0<.NP),:B6<*37. _ NP),Q-A2|-A6),(*+v),:;<*XQ),0:B2),0Q-G( d !!! #"fv)-:B3/(L(*),:B6<*37. 3G2 -/.10137 A-A2 < ]
.#);6+,gV37.6 1)3G6 #);21-/.Q0gx1-VG)v);)V.56 #)+ 8Q#),:B63G -GQ(*),+,g%h%Q<L:|$6-AG)A-G(8#),+<. X.1<*6)+);6+,g-/.10%$
81:-G(*UG3G2 <*6 Q4<L:01);G),(*3/)V.76\d Z 3GWV4-/.g bAG/ f -0Q<*2),:B6),0-G:B#:;(L<L:@UG2|-GYdTK^''&=fgh%#3/+).#3#01),+
H .6 1<L+OQ-Gv);2h)U/<*G)6h34-F3G2R:B37.762 <LY816<*37.1+; -A2)-G++3#:;<L-A6),063@6 #)A-A2 <L-GQ(*),+<.( 3G2=),-G:
<*2 +6;gYh)01);X.#)-48Y.1<*XQ),0UG2 -G1<L:;-G((L-/.#U781-AUG)63 #)*+g-,/.10<L+6 #)v3/++ <LQ<L(L<*6+Q-G:B)3G2#)g43)!-
(*3#:;-G(L(*+v),:;<*:B2),0Q-G(=.#);6h3G2#+<.6 #)JUG)V.#);2 -G( UG)V.#);2|<L:),(*)V)V./653G5,/.10g764d#)f\--G++%8Q.1:B6<*37.
:;-G+)dNP),:B6<*37. f%%#).#);h2),12 ),+)V./6-A6<*37.<L+5<. ] 3G28#)-/.1096d:3)f6 1)12 3/Q-GQ<L(L<*66 1-A6;#)3)
+Y<*2),0g#<L-6 #) Z^Z > 62|-/.1+3G2|-A6<*37.d Z -/.#3);6 ^1)Q-A2)V.76+3G2 ) g-G:;:B3G2 0Q<.#U63$gY-A2)0Q)V.#3G6),0
-G(TLg aGa #fg9$6 #)3G2|-G(L<L+ 3G<. 81)V.1:B)0Q<L- ] 96 #)3/<./6F-A2|<L-GQ(*)=<=)`gPh%#3/+)^v3/++<LQ<L(L<*6`+Q-G:B)
UG2 -/4+;g-/.103G2 )UG)V.#);2 -G(L(* 3G=01),:;<L+ <*37.UG2|-GY1+ <L+1,/>-0 3G2),-G:|@?4)*A,/>B0gC6d#)ED ?)f<L+6 #):B37.10Q< ]
dCQ-/.#U);6-G(TLg aGa 9f H .6 1<L+(L-/.1U781-AUG)^6 #)^UG2 -GY 6<*37.1-G(-G++8Q.1:B6<*37.3G2#)U/<*G)V.6 #)!3/<./6A-G(8#)
3GR-:B2),0Q-G(.#);6\<L+\-/8#U7)V.76),0Dh^<*6 4:B37.7623/(.#3#01),+ ?)O3G6 #)Q-A2)V.76+3G#)`^1<L+\3G2|-G(L<L+ <L+^+ 8GF ]
6 1-A6\)B112),++\6 1)2 ),(L-A6<*37.1+ 1<LQ+v);6`h);)V.0Y<Ev);2)V.76 :;<*)V.7663412 3/);2|(*D<.7623#0Y81:B)6 #)3/(L(*3Vh^<.1U1
:B2),0Y-G(+);6+;)U/<*G))B1-/Q(*),+\634+|#3Vh6 1-A66 #) HJILKMN N:OBMQP4R '\-,G),+ <L-/..1);6h3G2dTTSf3xG);21
.#);h(L-/.#U78Q-AUG)4123xP<L01),+37.#)h^<*6 -.1-A6 8#2 -G(!h\-, <L+!-5Q-G<*2VUW$ YX1Z + 81:6 Q-A6 X <L+-5+);6!3GY:B37.10Y<*6<*37.1-G(
6301);Xv.#).#37. ] +),Q-A2|-A6),(*+v),:;<*XQ),0.1);6+;O-/.10h) 4-G++T8Y.1:B6<*37.1+64d#)[D ?)fg37.#)3G2),-G:|#)*9
U/<*G)-123#:B),0Y8#2 )632);3G2|81(L-A6)-/./+),Q-A2 -A6),(* -/.10J?)C*=,/>-0
+v),:;<*XQ),0.1);6^<.6 #).#);h(L-/.#U781-AUG)G )-G++|8Q)6 1)]\ G A_^D GY/TG 634-AG)#$
N#),:B37.10gh)-AG)-JG);2+<Y(*)3/Q+);2A-A6<*37. 2),Q2),+)V.76123/Q-GQ<L(L<L+6<L:<.101),v)V.101)V.Q:B)2),(L-A6<*37.Q+
dNP),:B6<*37.7fg7h%Q<L:|1-G++|8#2 12 <L+ <.#U/(*=v3Vh);281(#< ] v);6h);)V. 6 #)F-A2 <L-GY(*),+D<.`);G);2A-A2 <L-GQ(*)<L+
Q(L<L:;-A6<*37.Q+;h)5+ #3xh 6 Q-A63G2-/.74:B2),0Q-G(.#);6`h3G2 <.10Q),)V.Q01)V./63G\<*6+.#37. ] 01),+ :B)V.10Q-/.76.#37. ] Q-A2)V.76+
+v),:;<*XQ),0@h^<*6 @6 #).#);h(L-/.#U781-AUG)6 #);2)<L+5-+), ] :B37.10Y<*6<*37.1-G(37.J<*6+Q-A2 )V./6+;^P81+;g-TS 01);6);2 ]
-A2 -A6),(*@+v),:;<*XQ),0:B2),0Y-G(.#);6`h3G2 g01);X.#),03VG);2- 4<.#),+-3/<.76-G+ +8Q.1:B6<*37.+6df\-G:;:B3G2 0Q<.#UD63
(L-A2UG);201374-G<.gh%1<L:<L+D),#81<*A-G(*)V./6;^#)123 ] 6 #)3/(L(*3Vh%<.#U4T-G:B63G2 <*W,-A6<*37.
:B),0Y812)46362 -/.1+3G2| 6 #)D3G2);2=<.7636 #)(L-A66);2 "
.#);6`h3G2 <L+G);2$+<Q(*)Gg-/.10$6-AG),+37.1(*$(L<.#),-A2 b
6d:a!f 6d:3 ) D ? ) f dFf
6<)G^^#)=G);v3/<./6<L+%6 Q-A6%6 1<L+123#:B),0Y8#2 ):;-/. )dc
v)81+),0-G+-63P3/(63I+),Y-A2 -A6),M6 #):B2),0Y-G(+);6+
3G.137. ] +),Q-A2 -A6),(*+v),:;<*XQ),0.#);6+;=^1<L+-AG),+5<*6 3G2),-G:|a(*A,/e5g/h%1);2)\3G2),-G:gf1 !!!hEi 6 #)
v3/++<LY(*)633#01),(Tg9+),Q-A2|-A6),(*+v),:;<*XQ),0.#);6+;g A-G(8#),+d:3-) ?)f-A2):B37.1+<L+6)V./6h%<*6
a
12 3/Q(*)V+3G2|);2 (*3#01),(L(*),07.137. ] +),Q-A2 -A6),(* j kY l8C1J #Y 7l51mn2
+v),:;<*XQ),0=37.#),+;9-/.101)V.1:B)6381+) GY dT3G6 =)B1-G:B6
-/.10-GQQ23,1<-A6)Af)B1<L+6<.#UJ-G(*UG3G2|<*6 Q 3G2+),Q- ] Z 2 ),0Q-G(.#);6h3G2#+)B#6)V.10\-,G),+ <L-/. .#);6+63J01),-G(
2 -A6),(*+v),:;<*XQ),0@.#);6+634+3/(*G)+ 81:123/Q(*)V4+; h^<*6 <12),:;<L+ <*37.<. 12 3/Q-GQ<L(L<*6G ^1<L+J<L+3/ ]
N#37):B374)V.76+=37.6 1<L+2),+ 81(*6-/.10v);2 + ),: ] 6-G<.#),09),-/.1+3G:;(*3/+),0J:B37./G)B+);6+53G123/ ]
6<*G),+3G2T816 8#2)01);G),(*3/)V.76+-A2)@0Q<L+:,81++),0<. -GQ<L(L<*6`4-G++T8Y.1:B6<*37.1+;gh1<L:|-A2):;-G(L(*),0 V /A
NP),:B6<*37. c ^#)=3G2)6),:|Y.1<L:;-G(!Q-A26+3G!6 1<L+%Q- ] ,; dTe);#<Tg aGcA f)3/(L(*3Vh d Z 3GWV4-/.g bAGG f<.
v);2-A2):B3/(L(*),:B6),0<.'Qv)V.10Q<D' ] :B37.1+ <L01);2 <.#U37.1(*X.Q<*6),(*JUG)V.#);2 -A6),0:B2 ),0Q-G(\+);6+;g
<T)GLgP3/16-G<.#),0-G+6 1)%:B37.7G)BP81(L(13G-Xv.1<*6).P8Q ]
v);23G-G+ +T8Y.1:B6<*37.1+;J&);37);62 <L:;-G(L(*Gg-:B2),0Q-G(
+);6<L+- #A*A|1 ' :B2),0Q-G(!+);6:B37.76-G<.1+-/.<.#X ]
VTx|||4 * A 6GA x| 2G .1<*6)5.98Y=v);23GR4-G++8Q.1:B6<*37.Q+;gP8#637.1(*-=X.1<*6)
,
V G` Ox0A GG * .P8Qv);23G Eo7,5F Rqp#Y,TG# !6 #3/+)%:B3G22) ]
243V 8R F 8 7 6432FF 8|!:x x 8
F 3F ; +v37.10Y<.#U=636 1) ^F;T| 3G6 #)v3/(*963/v)Gg1h%Q<L:|
D=?v A
)76+R,Y\v3
@K<L+NO^(VLRQSRTp,
85O*]#
<}$)
; 4~v)7
968*
)/.0
; 6r]R@Yq$U3
@K<L+
NO^5V+L(R,QSRUT
ps
q] 5@5
z0
$pft$z0$,p,uE;
`
z
'
)+
6<)O6
~?$;< - - * )76?$;
?,B<*
- 46<*D$9( )m$A
)**$B<; )+)~
+9&U$)6FHI3K<LMONONOP(QSRT(VL/X
YkZ]R@YqN+K]R@^
Y[Q[LR@^(aL(RXN+K<N+R@MONL(RWI0K<L7M NVOV]QSRUT^(RP
eg^R@^+TUN+N+R,YL8X!R@MONKY\^(QSR@Y[lQSRgRL(3aNOP+TUN+_`^(VNOP
d@l5VYqN+V]pv?
B
)76
w0x=vz
{+Dm
pUzv)z0D?@6pUtuF/v)p
|t$z
w)+; ;<);O=$|bv
O}0"",;
?@6< s
E;<)*A
1054)6<*
)/.0
; 6
6 68v
9( ).b<}*D=?;<)+A
+*6<)
!&U* ( ')?$; - - *96/ *
6 68)76<6<D)+U 6FH
I3K<LMONONOP(QSRT(V0L8XY[ZN0Y[ZfRR^a$L(R7XNK<N+R@MONLRoR_
MON+KYq^QSR,Y`lQSR|KY[Q 0MQ[^aU]R@YqN+aSajQTUNRMONp?
B
)637"@
bF3; )6 6+
w0x=z
{D
:""
"$z; )$
)/.0
; 6KY[Q 0MQ[^a
R,Y\NaSaQTN+R@MON+p7
"$*
(v
$
w0yx=z
{+Dm
"
"$oxf;O?$}$9
+
D=vv)*6~k;D?$; )
+*6<)
?;< - - ** )76+]R,Y\$0,K<LUN ^(V+L(RQSRTp`(A
O*#@7,
tzrw)+; ;<);OE$cbv
O}c,w0x=z
{D
h
"
"Uv
FH$~k)+; )+
+).b<}6<)+?,;O(<)4E6<?,)7
)!6<)O6f~?$; - A
- * )76
+;<)7$$)+/.;<v6FH:I0K<L7M NONOP(QSRT(VL8X=Y[ZN
Y[Z>LRXN+K<NRMONhLRR@MON+KYq^QSR,Y`lcQSRKY`Q M+Q[^(aR_
Y\N+aSajQTN+R@MON+p$?
B
)63U
"(UE
; B
$~kD
$
F]r)+'`c7
"$:sZvNm0R,Y\NK\@K]Q9V+NL8XRL(3aNOP+TUN+hEF/n
;<)76<6pvv
r
u$E;
s,z0""vuU<;
B=
+
v<*
,$A
$)+?@)+v)
)~k;
; )$
68)+ 60RR^(ayVL/Xe!^
YkZvNm^Y`_
Q[MV^(RPKY`Q M+Q[^(a,R,Y\NaSajQTUN+R@MONp@/A\]
5$
n$);<Dm,Et$)
;<q3
"$%&'(*)+,
)h6<4U$A
}$)6<96c~|
+
6 Dvv)+96FHI0K LMONONOP(QSRUTV2L/XY[ZN
d@QUYkZR,R^(afL(R7XNK<N+R@MONL(RR@MON+KYq^QSR,Y`lQSR2K]_
Y`Q M+Q[^(a@]R,Y\N+aSajQTN+R@MON+p?
B
)763
5v"$%968)');
s*)4
E$dYq^
Y[Q9VY`Q[M ^aoNO^5V+LRQSRUTgQkY[ZE],K N_
M+Q9V+NI0K Li ^iQSajQkY`Q[N]V]3z},?$Dmh
qp$).W
;
GYj@ \%P
]2tw
RSM G<riKrMWr{|KyAQRSJ
]
RS\rxyM BH\GYv+X6]BS{}P^RSw2OQAdPqOdBHGYAzJqMv+]
qRS Z+AQ\^nxyp J G5Odw2lEBHOdx9]2w7__ BH
IKS\^ G<w+SPq2BHrGjSF]
\^P%wRSBHx~OdGvBHRS]
]\%J(xYBHRqOdG<wBHAdOOdtKG\^AdOdiw
J OdwWOQJR *w2BHZw?\nkw2RSY]
\\
v+J2OQAQRS\^GYxyJ~\^G<BH]
w2\OdG+IvRSw2OdG<Yw
\\%X@;RS{s+{!wqM|xyBHAQRRSX+G+\^IbA?kRSOdw2]FkjBSJ
P^RSOCAdBHs+A(w2OQ\nRS[GYv+J(]
k\%RSJ7]_ BHrGYjSX bBUX+kl\^w
w
\^\^]]2x~BHG OdGY\%\^JDK\^w2G<Ywb\f\^BSxyJ0RSRw2OQP%RSP^Gs+BH]
A \%X]
\qp BSP^w2OQYRS\G}w
RHSw2 Y\
w2J
\^YRSDKxy\}\^G<G<\(w
s+JqRHx?pk|"Zw2Y\^[]|v+\^xf]
RH\%kpzJ
\^J2xy`0OdG+RSRrIw2J2OQJ
wWRS\^GYGYDKJ|RS\^]wBHBHBHGYAZ+X(AQ\^\~xytK\%BHRS\^]
w2v+\~OQOdRSG+ByGYI?Jz]
\%BWBHX[w+sYOQRSP^J2GYw2w
OQRSP%RS]2\Gi(]
RHRH\n_kk GYRHk\^w2Y\J2Od\^w2sxyBHRSw2w2OQRSOQRSG.Gp BHA w(]
\%BHJ2AQvJ
RSRfGYX+J
\r\^w
p"\^]2x~Y\WOdGY\%J?%lw2YS\~OdG<2w
\^+GYHJ2qOdwani
v+X+]
\^RS]?vw
RrRJ
\%X+J\qBHw2A}Y\0OdBSw2P^w2OQw2RSYG\w2\^YDK\0\^G<BHwqIKpf\^G<wY\OdBSA WP^w2wOQBHRStKG\P
OdYGRrJ
RS\^]7G _
xv\^BH]
OdJ
GYRSJGBHBAdO v+wai]
RSwaZ+iAQv\^x \%Jqp BHAQRSG+IFOdw2xyRX+\^AdO G+IfX[O\^]
\^G<w Jw22YY\RSs+BHAQIKXb\^G<Zw \9J OdG[YsY\^p"GYP%Y\%X\ Z<ib<Hw2nYS\~d
\nS[v
S\%P^w
\%[X SW<R BHnGYYX
]
\%SJ2 v<GRSnGYw2+J
SOQ\% J?Jqp.?}RS^Y]2|t\|SX[} ]2q\OdDkOdlG+REP^INWsYkJ(JnRS.]
RSP%w
G\
RzOdJ2BSGOdP^x?w2w2YOQs+RS\ACGYBHJ~w
`\BH\^GYVWxyXOQJRSs+w2OQw2RSOd+A GOdSqBH \ A
"
ZP
llYRSRSw2OQ07P%\\^xyRH k|qRSBSw2OQP^RS w2WGOQRSBHR GfA"J2OQwJBHBHOdw
G[GY\(YX&BHsYGY\^OdwGYXP%OQP
\%JYXfRSsYZ<OQJ
P%i\%\9X&w2RHYk|w
\zRBSDrP^X+BHw2OQAd\^RSsYw
G.\\^]2pRHx~kOdBHGYYAd\\ w
w2Y\n[\;w2P
s+Y]
Rr\%JqJ
\^MYGbBHG+BSOdP^xw2OQBHRSw2G.OQRSM+GYOkJBH\^G<w
PriKpp Y\?%jS9PqBH]2]2OQ\%JRSs+w
YRHksYBS\^GYP^w2P%OQ\yRSG~ZRSOdw
w2OQJ"w2vYRr\yJ
J2P
OdZ+BHAQ\G+w
IKRW\yxyOdGRX+\^xy\^AYRSX[w2OOQRS\^G]
\^BHG<GYwXvP
\^Y]
J
RSRSOQG[P%\ _ w2GYOQ\^RSwaG }RS]2tOdxyA RRSX+s+\^w2AQAdJyO GYPq\~BHYGRqZ\w2s+Yw2\~OdA X+Odq\^\%DKX\^AQRSkRSv]9\%Xw2YE\fB%
iK\%DrJ2BHOCAdBHs[G _
&%
BHAdO |waiBSP
waivBH\%IKJ \^G<wWOdA IK\^w;ByP%\^]2wBHOdGBHxyRSs+G<w;RHk WR X}+BHw
\%RSRSJ
]2]~P^t]2BHOdw
ZGYR9\%XJZw2\zYs+\ BHw2Odw A Odx?P^qw2\%sYOQXRSJ2pGwZ
\;]
RSJ2vvRr\%J
P^\^O]qM
\%Xk+RSOd]AQ\fRSs+V]\%P^klw2]OQBHRSxyG\n _
Z<B;iG<s+vx?\^]7ZkRS\^]2]x~RHOdk.G+]
I\%J
BSRSP^s+w2]
OdDP%\%Odw2J}OQ\%OdJqGp}RS|]
X+BS\^P
]}w
BSR?P^w2ZOd\DOdPqwaiBH]2]
]2\%OQ\%<Xs+OdRS]
s+\%Jw
w2BHBzYGYP%\X(\^]2BHOwkIKBH\^B OdG<G]
w}\%BSPqJ
P^RSBHw2s+GyOdD]
PqP%OdwaBH\i~]2OQ]2JOQi~JAQRrRSX+J2s+\^w"w
wq\^Odp"w|]2x~BUYOdGY\\%\%P^WXyw
J"R GYfRS+w}]
OQ\%RSP
P%G+\^AdOdBSi(DKP^\%Z<w2X9Odi9DklOdY]
w2RSRqOQ\%x J ' ( jrK
*),+.-/10 32
Dkr\nRSBH]w
\%\%P^w2Xw2OdBHDKw2wzY\F\;BSw2P^YBHw2IK\fOdD\^G<BSOdwaP^wiKw2pOQOdJDOdBHwaGYi\;XsYOQJ~Z<J
\ iZ+Ws++wOQR J}BHIKAQJ
\^w
RbRGY\^X+Z<]\^BHiw
A\^Yv+]2Rq]
x~\nkOdGY\^xy]
\?\^RSGYw2w2YP%O\\_ BHIKB%\^i[G<Jw
JqBHp IK\^ G<w
P^Jw2OdDPqOdBHw2GOQ\%J]
\%BHP%GY\^OdXDK\ RqWGYR \^]
J2+BHGYOdvXBHw2]
Y\F\ w2WY\FR RSG+BHAdGi
.
]
w2Y\%SJ2\nv2AQRS\^GYDKJ
\^M"\(A RHw
RFk<\nBHn[Gfv\%\^BHP^DKGYw
\^X \%G<X w?WX+2S\^R ^vp\^GYOdGX+YJ;w2\yYRS\?\nGbYJ2BSw2OdYw2P^sw?\~BH\^P
w2xyOQRSBHRSGbG+w2IKOQBURS\9klGw
Od\^BHG ] A
EP
w2P^Y]
Y\q\\^BSG\^J
xy\zw2OdRSYGw2\~OQklRSs+\nG[w2s+vOQJ]
\%\;P^X+w
\n\^\%[w
X \^v]2\%Wx~P^Rw
Od\%GYX \%X]2WOQJ
Z<\%R iJqp9w2Y\?+YOdP
\9AQ\OdBHG<G+w
SIK\^GY\J2ROdOdP%G0waP^is+\nRH]
J_k
! w2Y\;YP
Y\?RSxyOQP%\?\^w2RHYkRX+BSP^RSw2AQOdRSDIrOdiFwaiKkp RS]WX+\%P^OQX[OdG+I+OQP
BSP^w2OdDOdwai
EP
"
w
Rv\^]7kRS]2x PqBHG6Z\bX+\%J
P^]2OdZ\%X6Z<iw2Y\0kRSAdQRqOdG+I
O. Bangsø, N. Søndberg-Madsen, and F. V. Jensen
}W U U [=[#KL[ U `f=[=
J2RHwk}BHw2GYXYBHBHw] Xywai]vBH\9G+IK\OdAkRS]X[Ow2Y\^\]z] kl\%] J RSRSxs+] P%w2Y\%J\B%YDK\^\^]] BH\WIKBH\GBHBHIKIK\^\^G<G<wqw p T W'%+~! r+ U U []^ |_'&=&=&^*.1J\F=*-kl5(+*-5J
w2 wYIK\|J2\^w2B%G<OdDKAw\^wax]iBHvBHIKtK\%\J
\%BHJxIKJ
\^B%\^G<iGYwqJ
MrBH\z\rAQJ
pw
IYRzR9pqw2AQ\^YB%wDK\
\ Z+s+AdJi;J
J2was+w2ix=vBH\
w|w
xyR(X[O\^w2G<Y\^w2\;]
OQRSJklGYBH]
RS\%xyXx \ p
pD24AD0<B243b*-5:-51Fm@R.@B24362436j-@v`24AD0<B240A@Dpm@R.1A@Dp@B52H:-J
243b*-5365m:-C=@DpD3b:-5i@B24k*-Aop/H~+fB[[#M %mf'lZ}[
h} \Z[[%Z}YH%Z[W-Z f=W U`U f=MW/\Z `BW U H%Z[ U U Q
EF =[B[8fB [BZs9f=[[B[=W'=[#|_'%_oQ_'=S
DrZBH\qAdBHsYw2\rOdG+M.If\rp IYvp\%RSw2v+YAQ\9\~Z+s+s+vAd i0OdwaA i|vX[\~]
RSBHvIK\^xyG<w RS]
J\~OdGYw2P^AdBHO GGBHw2w2YOQRS\GB%w
D<R _
EF
w2BHYIK\z\^G<v+w]
\nRHkk\^w2]
\^GYBHwP%\%waJiv\rYM\^GY]
\;RSw
Od\Www2X[OBHw\^]
v+J]
kl\n]
kRS\^x*]
\^GYB9P%\%J2J}wBHkGYRSXY]BHBS]
PnX _
w2 k}OdDJ
OdRSwai xyU\9v+AC]
BH\%GJ
RSwas+i]
vP%\%\9JGYBH\%]
\%\ X++J;+w
\%RXZOdGY\(J2OQ+X++\z\%XBHGFBHBHw?IKB\^G<J2wvwa\%iP^vO\rP p
Abstract
Loopy propagation provides for approximate reasoning with Bayesian networks. In pre-
vious research, we distinguished between two different types of error in the probabilities
yielded by the algorithm; the cycling error and the convergence error. Other researchers
analysed an equivalent algorithm for pairwise Markov networks. For such networks with
just a simple loop, a relationship between the exact and the approximate probabilities was
established. In their research, there appeared to be no equivalent for the convergence er-
ror, however. In this paper, we indicate that the convergence error in a Bayesian network
is converted to a cycling error in the equivalent Markov network. Furthermore, we show
that the prior convergence error in Markov networks is characterised by the fact that the
previously mentioned relationship between the exact and the approximate probabilities
cannot be derived for the loop node in which this error occurs.
convergence node of a loop. A convergence node¹ combines the messages from its parents as if the parents are independent. They may be depen-

exact probabilities which are denoted by Pr. Moreover, we studied the prior convergence error. Below, we apply our analysis to the example network from Figure 1. For the network in its prior state, the loopy-propagation algorithm establishes

$$\widetilde{\Pr}(c) = \sum_{A,B} \Pr(c \mid AB)\,\Pr(A)\,\Pr(B)$$

as probability for node C. Nodes A and B, however, may be dependent and the exact probability Pr(c) equals

$$\Pr(c) = \sum_{A,B} \Pr(c \mid AB)\,\Pr(B \mid A)\,\Pr(A)$$

The difference between the exact and approximate probabilities is

$$\Pr(c) - \widetilde{\Pr}(c) = x \cdot y \cdot z$$

where

$$x = \Pr(c \mid ab) - \Pr(c \mid a\bar b) - \Pr(c \mid \bar a b) + \Pr(c \mid \bar a\bar b), \qquad y = \Pr(b \mid a) - \Pr(b \mid \bar a), \qquad z = \Pr(a) - \Pr(a)^2$$

Figure 2: The probability of c as a function of Pr(a) and Pr(b), assuming independence of the parents A and B of C (surface), and as a function of Pr(a) (line segment).

line segment that matches the probability Pr(a) from the network and its orthogonal projection on the surface. For Pr(a) = 0.5, more specifically, the difference between $\widetilde{\Pr}(c)$ and Pr(c) is indicated by the vertical dotted line segment and equals 0.65 - 0.5 = 0.15. Informally speaking:

- the more curved the surface is, the larger the distance between a point on the line segment and its projection on the surface can be; the curvature of the surface is reflected by the factor x;
- the distance between a point on the segment and its projection on the surface depends on the orientation of the line segment; the orientation of the line segment is reflected by the factor y;
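This decomposition is easy to verify numerically. The following small Python sketch (an illustration added here, not from the paper; all parameter values are invented) builds one parameterisation of the loop and checks that the difference between the exact and the loopy prior probability of c equals x·y·z:

    # Numerical check of Pr(c) - Pr~(c) = x*y*z for a single-loop network
    # A -> B, A -> C, B -> C. Parameter values are arbitrary examples.
    pa = 0.5                      # Pr(a)
    pb_a, pb_na = 0.9, 0.1        # Pr(b|a), Pr(b|~a)
    pc = {(1, 1): 0.9, (1, 0): 0.1, (0, 1): 0.1, (0, 0): 0.9}  # Pr(c | A, B)

    pr_a = {1: pa, 0: 1 - pa}
    pr_b_given_a = {1: {1: pb_a, 0: 1 - pb_a}, 0: {1: pb_na, 0: 1 - pb_na}}
    pr_b = {b: sum(pr_b_given_a[a][b] * pr_a[a] for a in (0, 1)) for b in (0, 1)}

    # Exact: Pr(c) = sum_AB Pr(c|AB) Pr(B|A) Pr(A)
    exact = sum(pc[a, b] * pr_b_given_a[a][b] * pr_a[a]
                for a in (0, 1) for b in (0, 1))
    # Loopy prior approximation: the parents are treated as independent
    approx = sum(pc[a, b] * pr_a[a] * pr_b[b] for a in (0, 1) for b in (0, 1))

    x = pc[1, 1] - pc[1, 0] - pc[0, 1] + pc[0, 0]
    y = pb_a - pb_na
    z = pa - pa ** 2
    assert abs((exact - approx) - x * y * z) < 1e-12
    print(exact, approx, x * y * z)   # 0.82 0.5 0.32 (up to float rounding)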
matrices plus the other incoming vectors. Subsequently, he showed that the reflexive matrices also include the exact probability distribution and used those two observations to derive an analytical relationship between the approximated and the exact probabilities.

The potentials of the equivalent pairwise Markov network are: over the parent configurations,

$$\psi(ab) = px, \quad \psi(a\bar b) = (1-p)x, \quad \psi(\bar a b) = q(1-x), \quad \psi(\bar a\bar b) = (1-q)(1-x);$$

compatibility potentials between A, respectively B, and the states $a'b'$, $a'\bar b'$, $\bar a'b'$, $\bar a'\bar b'$ of the auxiliary node X, which equal 1 when the state of A (respectively B) agrees with the state of X and 0 otherwise; and potentials towards the convergence node C that pair the four states of X with $(c, \bar c)$ as $(r, 1-r)$, $(s, 1-s)$, $(t, 1-t)$ and $(u, 1-u)$.

Figure 4: A pairwise Markov network that represents the same joint probability distribution.
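As an illustration, the potential tables just listed can be built as follows; this sketch is not from the paper, the parameter values are invented, and the state-matching convention for the compatibility potentials is reconstructed from the tables above:

    # Potential tables for the equivalent pairwise Markov network (sketch)
    p, q, x = 0.7, 0.4, 0.3            # invented example parameters
    r, s, t, u = 0.9, 0.2, 0.6, 0.1

    A_STATES, B_STATES = ("a", "~a"), ("b", "~b")
    X_STATES = [(a, b) for a in A_STATES for b in B_STATES]  # states of X

    # psi over the joint parent configurations
    psi_AB = {("a", "b"): p * x, ("a", "~b"): (1 - p) * x,
              ("~a", "b"): q * (1 - x), ("~a", "~b"): (1 - q) * (1 - x)}

    # Compatibility potentials: 1 iff the state of A (resp. B) agrees with
    # the configuration encoded by the state of X, 0 otherwise (assumption).
    psi_AX = {(a, xs): 1.0 if xs[0] == a else 0.0
              for a in A_STATES for xs in X_STATES}
    psi_BX = {(b, xs): 1.0 if xs[1] == b else 0.0
              for b in B_STATES for xs in X_STATES}

    # Potential from X towards C: rows (r, 1-r), (s, 1-s), (t, 1-t), (u, 1-u)
    psi_XC = {xs: (v, 1 - v) for xs, v in zip(X_STATES, (r, s, t, u))}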
expected. We indeed find that the approximations $(1-x,\ 1-(1-x))$ equal the exact probabilities. Note also that, as expected, the messages from node A back to itself do not change any more after the first cycle. As in the Bayesian network, for node A no cycling of information occurs in the Markov network. For node B a similar evaluation can be made.

We now turn to the convergence node C. In the Bayesian network in its prior state a convergence error may emerge in this node. In the conversion of the Bayesian network into the Markov network, the convergence node C is placed outside the loop and the auxiliary node X is added. For X, the following reflexive matrices are computed for the clockwise and counterclockwise messages from node X back to itself respectively:

$$M^X = M^{BX} M^{AB} M^{XA} D^{CX} =
\begin{bmatrix}
px & px & q(1-x) & q(1-x)\\
(1-p)x & (1-p)x & (1-q)(1-x) & (1-q)(1-x)\\
px & px & q(1-x) & q(1-x)\\
(1-p)x & (1-p)x & (1-q)(1-x) & (1-q)(1-x)
\end{bmatrix}$$

with eigenvalues 1, 0, 0, 0; principal eigenvector $\big((px+q(1-x))/((1-p)x+(1-q)(1-x)),\ 1,\ (px+q(1-x))/((1-p)x+(1-q)(1-x)),\ 1\big)$ and other eigenvectors $(0, 0, 1, -1)$, $(1, -1, 0, 0)$ and $(0, 0, 0, 0)$.

$$M^X = M^{AX} M^{BA} M^{XB} M^{CX} =
\begin{bmatrix}
px & (1-p)x & px & (1-p)x\\
px & (1-p)x & px & (1-p)x\\
q(1-x) & (1-q)(1-x) & q(1-x) & (1-q)(1-x)\\
q(1-x) & (1-q)(1-x) & q(1-x) & (1-q)(1-x)
\end{bmatrix}$$

with eigenvalues 1, 0, 0, 0; principal eigenvector $(x/(1-x),\ x/(1-x),\ 1,\ 1)$ and other eigenvectors $(0, 1, 0, -1)$, $(1, 0, -1, 0)$ and $(0, 0, 0, 0)$.

On the diagonal of the reflexive matrices of X we find the probabilities Pr(AB). As the correct probabilities for a loop node are found on the diagonal of its reflexive matrices, these probabilities can be considered to be the exact probabilities for node X. The normalised vector of the componentwise multiplication of the principal eigenvectors of the two reflexive matrices of X equals the vector with the normalised probabilities Pr(A)Pr(B). Likewise, these probabilities can be considered to be the approximate probabilities for node X.

A first observation is that $\lambda_1$ equals 1 and the other eigenvalues equal 0, but the exact and the approximate probabilities of node X may differ. This is not consistent with Equation 1. The explanation is that for node X the matrix P is singular and therefore the matrix $P^{-1}$, which is needed in the derivation of the relationship between the exact and approximate probabilities, does not exist. Equation 1 thus isn't valid for the auxiliary node X. We note furthermore that the messages from node X back to itself may still change after the first cycle. We therefore find that, although in the Bayesian network there is no cycling of information, in the Markov network, for node X, information may cycle, resulting in errors computed for its probabilities.

The probabilities computed by the loopy-propagation algorithm for node C equal the normalised product $M^{XC} v$, where v is the vector with the approximate probabilities found at node X. It can easily be seen that these approximate probabilities equal the approximate probabilities found in the equivalent Bayesian network. Furthermore we observe that if node X would send its exact probabilities, that is, Pr(AB), exact probabilities for node C would be computed. In the Markov network we thus may consider the convergence error to be founded in the cycling of information for the auxiliary node X.

In Section 3, a formula for the size of the prior convergence error in the network from figure 1 is given. We there argued that this size is determined by the factors y and z that capture the degree of dependency between the parents of the convergence node, and the factor x that indicates to which extent the dependence between nodes A and B can affect the computed probabilities. In this formula, x is composed of the conditional probabilities of node C. In the analysis in the Markov network we have a similar finding. The effect of the degree of dependence between A and B is reflected in the difference between the exact and the approximate probabilities found for node X. The effect of the conditional probabilities at node C emerges in the transition of the message vector from X to C.
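The stated spectral properties of the first reflexive matrix can be checked numerically; the following sketch (added here for illustration, with invented parameter values) builds the matrix and verifies the eigenvalues, the diagonal and the principal eigenvector ratio:

    import numpy as np

    # First reflexive matrix M^X, with arbitrary example values for p, q, x
    p, q, x = 0.7, 0.4, 0.3
    row1 = [p * x, p * x, q * (1 - x), q * (1 - x)]
    row2 = [(1 - p) * x, (1 - p) * x, (1 - q) * (1 - x), (1 - q) * (1 - x)]
    M = np.array([row1, row2, row1, row2])

    vals, vecs = np.linalg.eig(M)
    print(np.round(vals.real, 6))   # approximately [1, 0, 0, 0]
    print(np.diag(M))               # the probabilities Pr(AB) on the diagonal

    # The principal eigenvector has the ratio stated in the text
    v = vecs[:, np.argmax(vals.real)].real
    ratio = (p * x + q * (1 - x)) / ((1 - p) * x + (1 - q) * (1 - x))
    assert np.isclose(v[0] / v[1], ratio)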
Preprocessing the MAP problem
Janneke H. Bolt and Linda C. van der Gaag
Department of Information and Computing Sciences, Utrecht University
P.O. Box 80.089, 3508 TB Utrecht, The Netherlands
{janneke,linda}@cs.uu.nl
Abstract
The MAP problem for Bayesian networks is the problem of finding for a set of variables an
instantiation of highest posterior probability given the available evidence. The problem is
known to be computationally infeasible in general. In this paper, we present a method for
preprocessing the MAP problem with the aim of reducing the runtime requirements for
its solution. Our method exploits the concepts of Markov and MAP blanket for deriving
partial information about a solution to the problem. We investigate the practicability of
our preprocessing method in combination with an exact algorithm for solving the MAP
problem for some real Bayesian networks.
In our further research, we will expand the experiments to networks with different numbers of variables, different cardinality and diverging connectivity. More specifically, we will investigate for which types of network preprocessing is most profitable. In our future experiments we will also take the effect of evidence into account. We will further investigate if the class of MAP problems that can be feasibly solved can be extended by our preprocessing method. We hope to report further results in the near future.
Abstract
Hierarchical latent class (HLC) models are tree-structured Bayesian networks where leaf nodes are observed while internal nodes are hidden. We explore the following two-stage approach for learning HLC models: One first identifies the shallow latent variables (latent variables adjacent to observed variables) and then determines the structure among the shallow and possibly some other deep latent variables. This paper is concerned with the first stage. In earlier work, we have shown how shallow latent variables can be correctly identified from quartet submodels if one could learn them without errors. In reality, one does make errors when learning quartet submodels. In this paper, we study the probability of such errors and propose a method that can reliably identify shallow latent variables despite the errors.
D2D: The generative model was a dogbone, and QSL produced a different dogbone.

The statistics are shown in Table 5. To understand the meaning of the numbers, consider the number 0.83% at the upper-left corner of the top table. It means that, when the sample size was 500, QSL returned dogbones in 0.83 percent, or 83, of the 10,011 cases where the generative models were forks. In all the other cases, QSL returned the correct fork structure.

                500             1000            2500            5000
                D2F     D2D     D2F     D2D     D2F     D2D     D2F     D2D
    BIC         95.3%   0.00%   87.0%   0.00%   68.2%   0.00%   51.5%   0.00%
    BICe        95.0%   0.05%   86.8%   0.00%   67.8%   0.04%   50.6%   0.04%
    AIC         71.2%   2.71%   60.5%   1.58%   46.7%   0.54%   39.0%   0.38%
    CS          88.5%   1.93%   83.4%   0.74%   67.0%   0.30%   51.5%   0.13%
    Total = 10023

It is clear from the tables that: The probability of D2F errors is large; the probability of F2D errors is small; and the probability of D2D errors is very small, especially when BIC or BICe are used for model selection. Also note that the probability of D2F decreases with sample size, but that of F2D errors does not. In the next section, we will use those observations when designing an algorithm for identifying shallow latent variables.

In terms of comparison among the scoring functions, BIC and BICe are clearly preferred over the other two as far as F2D and D2D errors are concerned. It is interesting to observe that BICe, although proposed as an improvement to BIC, is not as good as BIC when it comes to learning quartet models. For the rest of this paper, we use BIC.

5 Quartet-Based SLV Discovery: An Algorithm

The quartet-based approach to SLV discovery consists of three steps: (1) learn quartet submodels, (2) determine sibling relationships among manifest variables and hence obtain a sibling graph, and (3) introduce SLVs based on the sibling graph. Three questions ensue:

i) Which quartets should we use in Step 1?
ii) How do we determine sibling relationships in Step 2 based on results from Step 1?
iii) How do we introduce SLVs in Step 3 based on the sibling graph constructed in Step 2?

Our answer to the second question is simple: two manifest variables are regarded as non-siblings if they are not siblings in one of the quartet submodels. In the next two subsections, we discuss the other two questions.

5.1 SLV Introduction

As seen in Section 3, when QSL is error-free, the sibling graph one obtains has a nice property: every connected component is a fully connected subgraph. In this case, the rule for SLV introduction is obvious:

Introduce one latent variable for each connected component.

When QSL is error-prone, the sibling graph one obtains no longer has the aforementioned property. Suppose data are sampled from a model with the structure shown in Figure 3 (a). Then what one obtains might be the graphs (c), (d), or (e) instead of (b).

Nonetheless, we still use the above SLV introduction rule for the general case. We choose it for its simplicity, and for the lack of better alternatives. This choice also endows the SLV discovery algorithm being developed with error tolerance abilities.
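The SLV introduction rule amounts to extracting the connected components of the sibling graph; a minimal sketch (added here, with invented names, assuming the graph is given as an adjacency dictionary):

    def introduce_slvs(sibling_graph):
        """One shallow latent variable per connected component.

        sibling_graph: dict mapping each manifest variable to the set
        of its siblings. Returns the clusters, one SLV for each.
        """
        seen, clusters = set(), []
        for start in sibling_graph:
            if start in seen:
                continue
            stack, component = [start], set()
            while stack:                          # depth-first traversal
                v = stack.pop()
                if v in component:
                    continue
                component.add(v)
                stack.extend(sibling_graph[v] - component)
            seen |= component
            clusters.append(component)
        return clusters

    # Example: two components give two shallow latent variables
    g = {"Y1": {"Y2"}, "Y2": {"Y1"},
         "Y3": {"Y4", "Y5"}, "Y4": {"Y3", "Y5"}, "Y5": {"Y3", "Y4"}}
    print(introduce_slvs(g))   # [{'Y1', 'Y2'}, {'Y3', 'Y4', 'Y5'}]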
levels of the algorithm: the errors made when learning quartet substructures (QSL-level), the errors made when determining sibling relationships between manifest variables (edge-level), and the errors made when introducing SLVs (SLV-level). Each number in the table is an average over the 10 generative models. The corresponding standard deviations are given in parentheses.

QSL-level errors: We see that the probabilities of the QSL-level errors are significantly smaller than those reported in Section 4. This is because we deal with strong dependency models here, while the numbers in Section 4 are about general models. This indicates that the strong dependency assumption does make learning easier. On the other hand, the trends remain the same: the probability of D2F errors is large, that of F2D errors is small, and that of D2D errors is very small. Moreover, the probability of D2F errors decreases with sample size.

Edge-level errors: For the edge-level, the numbers of missing edges (ME) and the number of fake edges (FE) are reported. We see that the number of missing edges is always small, and in general it increases with the number of standing members m. This is expected since the larger m is, the more quartets one examines, and hence the more likely one makes F2D errors.

The number of fake edges is large when the sample size is small and m is small. In general, it decreases with sample size and m. It dropped to 1.3 for the case of sample size 2,500 and m = 5. This is also expected. As m increases, the number of quartets examined also increases. For two manifest variables U and V that are not siblings in the generative model, the probability of obtaining a dogbone (despite D2F errors) where U and V are not siblings also increases. The number of fake edges decreases with m because as m increases, the probability of D2F errors decreases.

SLV-level errors: We now turn to SLV-level errors. Because there were not many missing edges, true sibling clusters of the generative models were almost never broken up. There are only five exceptions. The first exception happened for one generative model in the case of sample size 500 and m = 3. In that case, a manifest variable from one true cluster was placed into another, resulting in one misclustering error (MC).

The other four exceptions happened for the following combinations of sample size and m: (500, 5), (1000, 5), (2500, 5). In those cases, one true sibling cluster was broken into two clusters, resulting in four latent commission errors (LCO).

Fake edges cause clusters to merge and hence lead to latent omission errors. In our experiments, the true clusters were almost never broken up. Hence a good way to measure latent omission errors is to use the total number of shallow latent variables, i.e. 13, minus the number of clusters returned by DiscoverSLVs. We call this the number of LO errors. In Table 2, we
Discriminative Scoring of Bayesian Network Classifiers:
a Comparative Study
Ad Feelders and Jevgenijs Ivanovs
Department of Information and Computing Science
Universiteit Utrecht, The Netherlands
Abstract
We consider the problem of scoring Bayesian Network Classifiers (BNCs) on the basis of the
conditional loglikelihood (CLL). Currently, optimization is usually performed in BN parameter
space, but for perfect graphs (such as Naive Bayes, TANs and FANs) a mapping to an equivalent
Logistic Regression (LR) model is possible, and optimization can be performed in LR parameter
space. We perform an empirical comparison of the efficiency of scoring in BN parameter space,
and in LR parameter space using two different mappings. For each parameterization, we study two
popular optimization methods: conjugate gradient, and BFGS. Efficiency of scoring is compared
on simulated data and data sets from the UCI Machine Learning repository.
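For reference, the CLL score itself is straightforward to compute once class posteriors are available; a minimal sketch (added here, names invented), assuming `posterior(x)` returns the classifier's distribution over classes for attribute vector x:

    import math

    def cll(data, posterior):
        """Conditional loglikelihood of a classifier on labelled data.

        data: iterable of (attribute_vector, class_label) pairs.
        posterior: callable mapping an attribute vector to a dict
                   {class_label: P(class | attributes)} under the classifier.
        """
        return sum(math.log(posterior(x)[c]) for x, c in data)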
Table 1 lists the UCI data sets used in the experiments, and their properties. To discretize numeric variables, we used the algorithm described in (Fayyad and Irani, 1993). The ? values in the votes data set denote a missing value.

Table 1: Data sets and their properties
    Tic-tac-toe   958    9   2   3
    Voting        435   16   2   3

Figure 3: Optimization curves on the Pima Indians data. (Legend: LR + CG, LR + CG_full.search, LR + BFGS, LR_Roos + CG, LR_Roos + CG_full.search, LR_Roos + BFGS, BN + CG.)

The fitted BNC structures for the UCI data were obtained by (1) connecting the class to all attributes, (2) using a greedy generative structure learning algorithm to add successive arcs. Step (2) was not applied for the Satimage data set for which we fitted the Naive Bayes model. All structures happened to be perfect graphs, thus there was no need for adjustment. In figure 2 we show the structure fitted to the Pima Indians data set. The other structures were of similar complexity.

Figure 3 depicts convergence curves for 9 optimization algorithms on the Pima Indians data. Table 2 depicts statistics for different UCI and artificial data sets. For each data set i we have the starting CLL value $CLL^i_{ML}$ and the maximal (over all algorithms) CLL value $CLL^i_{max}$. We define a threshold $t_i = CLL^i_{max} - 5\%\,(CLL^i_{max} - CLL^i_{ML})$. For every algorithm we computed the number of iterations (flops) needed to reach the threshold. For each data set these numbers are scaled dividing by the smallest value. The bound of 5% was selected heuristically. This bound is big enough, so all algorithms were able to reach it before 200 iterations and before satisfying the stopping criterion. The bound is small enough, so algorithms make in general quite a few iterations before achieving it. There are some data sets, however, on which all algorithms converge fast to a very high CLL value, and a clear distinction in performance is visible using a very small bound. Pima Indians is an example (the 5% threshold is set to -337.037). We still see that Table 2 contains a quite fair comparison, except that BFGS methods are underestimated for smaller bounds.

It is very interesting to notice that convergence
Table 2: Convergence speed in terms of iterations (top) and flops (bottom). Artificial data sets are named
in the following way: [number of parents of each attribute].[number of attributes].[domain size of each
attribute].[sample size]. Domain size denoted as 2-3 is randomly chosen between 2 and 3 with equal proba-
bilities.
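The thresholding and scaling used for Table 2 can be restated compactly; a sketch with invented variable names and illustrative values:

    def convergence_threshold(cll_ml, cll_max, bound=0.05):
        """t_i = CLL_max - bound * (CLL_max - CLL_ML); bound is 5% in the paper."""
        return cll_max - bound * (cll_max - cll_ml)

    def scaled_costs(costs_per_algorithm):
        """Scale iteration (or flop) counts by the smallest value on a data set."""
        smallest = min(costs_per_algorithm.values())
        return {alg: c / smallest for alg, c in costs_per_algorithm.items()}

    # Illustrative (invented) CLL values:
    print(convergence_threshold(cll_ml=-350.0, cll_max=-336.3))  # -336.985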
Serafín Moral
Departamento de Ciencias de la Computación e I. A.
Universidad de Granada
Granada, 18071, Spain
Abstract
Taking as an inspiration the so-called Explanation Tree for abductive inference in Bayesian
networks, we have developed a new clustering approach. It is based on exploiting the
variable independencies with the aim of building a tree structure such that in each leaf all
the variables are independent. In this work we produce a structure called Independency
tree. This structure can be seen as an extended probability tree, introducing a new
and very important element: a list of probabilistic single potentials associated to every
node. In the paper we will show that the model can be used to approximate a joint
probability distribution and, at the same time, as a hierarchical clustering procedure.
The Independency tree can be learned from data and it allows a fast computation of
conditional probabilities.
where $P_{x(i)}$ is the potential for variable $X_i$ which is in the path from the root to a leaf determined by configuration x (following in each inner node N with variable Var(N) = $X_j$, the child corresponding to the value of $X_j$ in x).

Figure 1: Illustrative independence tree structure learned from the exclusive dataset.

This decomposition is based on a set of independencies among variables $\{X_1, \ldots, X_m\}$. Assume that VL(N) is the set of variables of the

¹ D stands for Descendants.
Our structure seeks to keep in leaves those variables that remain independent. At the same time, if the distribution of one variable is shared by several leaves, we try to store it in their common ascendant, to avoid repeating it in all the leaves. For that reason, when one variable appears in a list for a node $N_j$ it means that this distribution is common for all levels from here to a leaf. For example, in figure 1, the binary variable X4 has a uniform (1/2, 1/2) distribution for all leaf cases (also for intermediate ones), since it is associated to the root node. On the other hand, we can see how the X2 distribution varies depending on the branch (left or right) we take from the root; that is, when X1 = 0 the behaviour of variable X2 is uniform, whereas when X1 takes the value 1, X2 is determined to be 0.

The intuition underlying this model is based on the idea that inside each cluster the variables are independent. When we have a set of data, groups are defined by common values in certain variables, the other variables having random variations. Imagine that we have a database with characteristics of different animals including mammals and birds. The presence of these two groups is based on the existence of dependencies between the variables (two legs is related with having wings and feathers). Once these variables are fixed, there can be other variables (size, colour) that can have random variations inside each group, but that do not define new subcategories.

Of course, there are some other possible alternatives to define clustering. Ours is based on the idea that when all the variables are independent, then subdividing the population is not useful, as we have a simple method to describe the joint behaviour of the variables. However, if some of the variables are dependent, then the values of some basic variables could help to determine the values of the other variables, and then it can be useful to divide the population in groups according to the values of these basic variables.

Another way of seeing it is that having independent variables is the simplest model we can have. So, we determine a set of categories such that each one is described in a very simple way (independent variables). This is exploited by other clustering algorithms such as the EM-based AutoClass clustering algorithm (Cheeseman and Stutz, 1996).

An important fact of this model is that it is a generalisation of some usual models in classification: the Naive Bayes model (a tree with only one inner node, associated to the class, and in its children all the rest of the variables are independent), and the classification tree (a tree in which the list of each inner node only contains one potential and the potential associated to the class variable always appears in the leaves). This generalisation means that our model is able to represent the same conditional probability distribution of the class variable with respect to the rest of the variables, taking as basis the same set of associated independencies.

From this model one may want to apply two kinds of operations:

The production of clusters and their characterisation. A simple in-depth traversal of the tree from the root to every leaf node will give us the corresponding clusters, and the potential lists associated to the path nodes will indicate the behaviour of the other variables not in the path.

The computation of the probability for a certain complete configuration with a value for each variable: x = {x1, x2, ..., xm}. This algorithm is recursive and works as follows:

    getProb(Configuration {x1, x2, ..., xm}, Node N)
      Prob <- 1
      1. For all Potential Pj in List(N)
         1.1. Xj <- Var(Pj)
         1.2. Prob <- Prob * Pj(Xj = xj)
      2. If N is a leaf then return Prob
      3. Else
         3.1. XN <- Var(N)
         3.2. Next Node N <- Branch child(XN = xN)
         3.3. return Prob * getProb({x1, x2, ..., xm}, N)

If we have a certain configuration we should go through the tree from root until leaves taking the corresponding branches. Every time we reach a node, we have to use the single potentials in the associated list to multiply the value of probability by the values of the potentials corresponding to this configuration.

It is also possible to compute the probability for an interest variable Z conditioned to
determines the root node N and its children in a recursive way. For that, for any variable $X_i$ in the list, it computes

$$Dep(X_i \mid D) = \sum_{X_j \in L} Dep(X_i, X_j \mid D)$$

Then, the variable $X_k$ with maximum value of $Dep(X_i|D)$ is considered.

If $Dep(X_k|D) > 0$, then we assign variable $X_k$ to node N and we add potential $P_k(D)$ to the list List(N), removing $X_k$ from L. For all the remaining variables $X_i$ in L, we compute $Dep(X_i, X_j|D)$ for $X_j \in L$ ($j \neq i$), and $X_j = X_k$. If all these values are less than or equal to 0, then we add potential $P_i(D)$ to List(N) and remove $X_i$ from L; i.e. we keep in this node the variables which are independent of the rest of the variables, including the variable in the node. Finally, we build a child of N for each one of the values $X_k = x_k$. This is done by calling the same procedure recursively, but with the new list of nodes, and changing database D to $D[X_k = x_k]$, where $D[X_k = x_k]$ is the subset of D given by those cases in which variable $X_k$ takes the value $x_k$.

If $Dep(X_k|D) \leq 0$, then the process is stopped and this is a leaf node. We build List(N) by adding potential $P_i(D)$ for any variable $X_i$ in L.

In this algorithm, the complexity of each node computation is limited by $O(m^2 \cdot n)$ where m is the number of variables and n is the database size. The number of leaves is limited by the database size n multiplied by the maximum number of cases of a variable, as a leaf with only one case of the database is never branched. But usually the number of nodes is much lower.

In the algorithm, we make some approximations with respect to the independencies represented by the model. First, we only look at one-to-one dependencies, and not at joint dependencies. It can be the case that $X_i$ is independent of $X_j$ and $X_k$ but not independent of $(X_j, X_k)$. However, testing these independencies is more costly and we do not have the possibility of a direct representation of this in the model. This assumption is made by other Bayesian network learning algorithms such as PC (Spirtes et al., 1993), where a link between two nodes is deleted if these nodes are marginally independent.

Another approximation is that it is assumed that all the variables are independent in a leaf when $Dep(X_k|D) \leq 0$, even if some of the terms we are adding are positive. We have found that this is a good compromise criterion to limit the complexity of learned models.

5 Experiments

To make an initial evaluation of the Independence Tree (IndepT) model, we decided to compare it with other well known unsupervised classification techniques that are also based on Probabilistic Graphical Models: learning of a Bayesian network (by a standard algorithm such as PC) and also Expectation-Maximisation with a Naive Bayes structure, which uses cross-validation to decide the number of clusters. Because we produce a probabilistic description of the dataset, we use the log-likelihood (logL) of the data given the model to score a given clustering. By using the logL as goodness measure we can compare our approach with other algorithms for probabilistic model-based unsupervised learning: probabilistic model-based clustering and Bayesian networks. This is a direct evaluation of the procedures as methods to encode a complex joint probability distribution. At the same time, it is also an evaluation of our method from the clustering point of view, showing whether the proposed segmentation is useful for a simple description of the population.

Then, the basic steps we have followed for the three procedures [IndepT, PC-Learn, EM-Naive] are:
1. Divide the data cases into a training or data set (SD) and a test set (ST), using (2/3, 1/3).
2. Build the corresponding model for the cases in SD.
3. Compute the log-likelihood of the obtained model over the data in ST.

Among the tested cases, apart from easy synthetic databases created to check the expected and right clusters, we have looked for real sets of cases. Some of them have been taken
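A compact sketch of the construction procedure described above follows; it is added here for illustration, with Dep abstracted as a user-supplied callable (its definition appears earlier in the paper) and data given as a list of dictionaries:

    class Node:
        def __init__(self):
            self.var, self.children, self.potentials = None, {}, []

    def marginal(var, data):
        """Single-variable potential P_var(D): relative frequencies in data."""
        counts = {}
        for case in data:
            counts[case[var]] = counts.get(case[var], 0) + 1
        n = len(data)
        return {value: c / n for value, c in counts.items()}

    def build_tree(variables, data, dep):
        """dep(xi, xj, data) stands for Dep(Xi, Xj | D) as defined in the paper."""
        node = Node()
        L = list(variables)
        if not L:
            return node
        total = {xi: sum(dep(xi, xj, data) for xj in L if xj != xi) for xi in L}
        xk = max(total, key=total.get)
        if total[xk] <= 0:                    # leaf: remaining variables independent
            node.potentials = [(xi, marginal(xi, data)) for xi in L]
            return node
        node.var = xk
        node.potentials = [(xk, marginal(xk, data))]
        L.remove(xk)
        for xi in list(L):                    # variables independent of all others
            if all(dep(xi, xj, data) <= 0 for xj in L + [xk] if xj != xi):
                node.potentials.append((xi, marginal(xi, data)))
                L.remove(xi)
        for xk_val in {case[xk] for case in data}:
            subset = [case for case in data if case[xk] == xk_val]
            node.children[xk_val] = build_tree(L, subset, dep)
        return node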
variables are independent. In this sense, it is at the same time a clustering algorithm.

In the experiments with real and synthetic data we have shown its good behaviour as a method of approximating a joint probability, providing results that are better (except for two cases for the PC algorithm) than the factorisation provided by a standard Bayesian network learning algorithm (PC) and by the EM clustering algorithm. In any case, more extensive experiments are necessary in order to compare it with other clustering and factorisation procedures.

We think that this model can be improved and exploited in several ways: (1) Some of the clusters can have very few cases or can be very similar in the distributions of the variables. We could devise a procedure to join these clusters; (2) we feel that the different numbers of values of the variables can have some undesirable effects. This problem could be solved by considering binary trees in which the branching is determined by a partition of the set of possible values of a variable in two parts. This could also allow us to extend the model to continuous variables; (3) we could determine different branching rules depending on the final objective: to approximate a joint probability or to provide a simple (though not necessarily exhaustive) partition of the state space that could help us to have an idea of how the variables take their values; (4) to use this model in classification problems.

Acknowledgments

This work has been supported by Spanish MEC under project TIN2004-06204-C03-{02,03}.

A. P. Dempster, N. M. Laird, and D. B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 1:1-38.

R. O. Duda, P. E. Hart, and D. J. Stork. 2001. Pattern Recognition. Wiley.

M. J. Flores and J. A. Gamez. 2005. Breeding value classification in manchego sheep: A study of attribute selection and construction. In Knowledge-Based Intelligent Information and Engineering Systems, LNAI, volume 3682, pages 1338-1346. Springer Verlag.

N. Friedman. 1998. The Bayesian structural EM algorithm. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 129-138. Morgan Kaufmann.

D. Hand, H. Mannila, and P. Smyth. 2001. Principles of Data Mining. MIT Press.

A. K. Jain, M. M. Murty, and P. J. Flynn. 1999. Data clustering: a review. ACM Computing Surveys, 31:264-323.

F. V. Jensen. 2001. Bayesian Networks and Decision Graphs. Springer Verlag.

L. Kaufman and P. Rousseeuw. 1990. Finding Groups in Data. John Wiley and Sons.

D. Lowd and P. Domingos. 2005. Naive Bayes models for probability estimation. In Proceedings of the 22nd ICML conference, pages 529-536. ACM Press.

D. J. Newman, S. Hettich, C. L. Blake, and C. J. Merz. 1998. UCI repository of machine learning databases.

J. M. Pena, J. A. Lozano, and P. Larranaga. 2000. An improved Bayesian structural EM algorithm for learning Bayesian networks for clustering. Pattern Recognition Letters, 21:779-786.
Abstract
The Bayesian network formalism is becoming increasingly popular in many areas such
as decision aid or diagnosis, in particular thanks to its inference capabilities, even when
data are incomplete. For classification tasks, Naive Bayes and Augmented Naive Bayes
classifiers have shown excellent performances. Learning a Naive Bayes classifier from
incomplete datasets is not difficult as only parameter learning has to be performed. But
there are not many methods to efficiently learn Tree Augmented Naive Bayes classifiers
from incomplete datasets. In this paper, we take up the structural em algorithm principle
introduced by (Friedman, 1997) to propose an algorithm to answer this question.
With mcar data, the first and simplest possible approach is the complete case analysis. This is a parameter estimation based on $D_{co}$, the set of completely observed cases in $D_o$. When D is mcar, the estimator based on $D_{co}$ is unbiased. However, with a high number of variables the probability for a case $[X_1^l \ldots X_n^l]$ to be completely measured is low and $D_{co}$ may be empty.

One advantage of Bayesian networks is that, if only $X_i$ and $P_i = Pa(X_i)$ are measured, then the corresponding conditional probability table can be estimated. Another possible method with mcar cases is the available case analysis, i.e. using for the estimation of each conditional probability $P(X_i|Pa(X_i))$ the cases in

where $N^*_{ijk} = E[N_{ijk}] = N \cdot P(X_i = x_k, P_i = pa_j \mid \theta)$ is obtained by inference in the network $\langle G, \theta \rangle$ if the $\{X_i, P_i\}$ are not completely measured, or else only by mere counting.

(Dempster et al., 1977) proved convergence of the em algorithm, as well as the fact that it is not necessary to find the global optimum $\theta^{i+1}$ of function $Q(\theta : \theta^i)$ but simply a value which would increase function Q (Generalized em).

2.3.4 Learning G with incomplete dataset

The main methods for structural learning with incomplete data use the em principle: Alternative Model Selection em (ams-em)
posed by (Friedman, 1997) or Bayesian Structural em (bs-em) (Friedman, 1998). We can also cite the Hybrid Independence Test proposed in (Dash and Druzdzel, 2003), which can use em to estimate the essential sufficient statistics that are then used for an independence test in a constraint-based method. (Myers et al., 1999) also proposes a structural learning method based on genetic algorithms and mcmc. We will now explain the structural em algorithm principle in detail and see how we can adapt it to learn a tan model.

3 Structural em algorithm

3.1 General principle

The EM principle, which we have described above for parameter learning, applies more generally to structural learning (Algorithm 1, as proposed by (Friedman, 1997; Friedman, 1998)).

Algorithm 1: Generic em for structural learning
1: Init: i = 0
   Random or heuristic choice of the initial Bayesian network (G^0, Θ^0)
2: repeat
3:   i = i + 1
4:   (G^i, Θ^i) = argmax_{G,Θ} Q(G, Θ : G^(i-1), Θ^(i-1))
5: until |Q(G^i, Θ^i : G^(i-1), Θ^(i-1)) − Q(G^(i-1), Θ^(i-1) : G^(i-1), Θ^(i-1))| ≤ ε

The maximization step in this algorithm (step 4) has to be performed in the joint space {G, Θ}, which amounts to searching the best structure and the best parameters corresponding to this structure. In practice, these two steps are clearly distinct²:

G^i = argmax_G Q(G, Θ : G^(i-1), Θ^(i-1))    (6)

Θ^i = argmax_Θ Q(G^i, Θ : G^(i-1), Θ^(i-1))    (7)

where Q(G, Θ : G*, Θ*) is the expectation of the likelihood of any Bayesian network <G, Θ> computed using a distribution of the missing data P(Dm | G*, Θ*).

Note that the first search (Eqn.6) in the space of possible graphs takes us back to the initial problem, i.e. the search for the best structure in a super-exponential space. However, with Generalised em it is sufficient to look for a better solution rather than the best possible one, without affecting the algorithm convergence properties. This search for a better solution can then be done in a limited space, like for example V_G, the set of the neighbours of graph G that have been generated by removal, addition or inversion of an arc.

Concerning the search in the space of the parameters (Eqn.7), (Friedman, 1997) proposes repeating the operation several times, using a clever initialisation. This step then amounts to running the parametric em algorithm for each structure G^i, starting with structure G^0 (steps 4 to 7 of Algorithm 2). The two structural em algorithms proposed by Friedman can therefore be considered as greedy search algorithms, with EM parameter learning at each iteration.

Algorithm 2: Detailed em for structural learning
1: Init: finished = false, i = 0
   Random or heuristic choice of the initial Bayesian network (G^0, Θ^(0,0))
2: repeat
3:   j = 0
4:   repeat
5:     Θ^(i,j+1) = argmax_Θ Q(G^i, Θ : G^i, Θ^(i,j))
6:     j = j + 1
7:   until convergence (Θ^(i,j) → Θ^(i,jo))
8:   if i = 0 or |Q(G^i, Θ^(i,jo) : G^(i-1), Θ^(i-1,jo)) − Q(G^(i-1), Θ^(i-1,jo) : G^(i-1), Θ^(i-1,jo))| > ε then
9:     G^(i+1) = argmax_{G ∈ V_{G^i}} Q(G, Θ : G^i, Θ^(i,jo))
10:    Θ^(i+1,0) = argmax_Θ Q(G^(i+1), Θ : G^i, Θ^(i,jo))
11:    i = i + 1
12:  else
13:    finished = true
14:  end if
15: until finished

3.2 Choice of function Q

We now have to choose the function Q that will be used for structural learning. The likelihood used for parameter learning is not a good indicator to determine the best graph since it gives more importance to strongly connected structures. Moreover, it is impossible to compute the marginal likelihood when data are incomplete, so that it is necessary to rely on an efficient approximation like those reviewed by (Chickering and Heckerman, 1996).

² The notation Q(G, Θ : . . . ) used in Eqn.6 stands for E_Θ[Q(G, Θ : . . . )] for Bayesian scores or Q(G, Θ° : . . . ) where Θ° is obtained by likelihood maximisation.
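The alternation of Eqn.6 and Eqn.7 can be summarised by the following Python skeleton. It is only a sketch: em_parameters, expected_score and best_neighbour are hypothetical stand-ins for the parametric em run (steps 4 to 7 of Algorithm 2), the evaluation of Q, and the search over the neighbourhood V_G, respectively.

    def structural_em(data, g0, theta0, eps=1e-4, max_iter=100):
        # Skeleton of Algorithms 1-2: alternate a parametric em run (Eqn.7)
        # with one generalised-EM structure move in the neighbourhood (Eqn.6).
        g, theta = g0, theta0
        prev_q = float("-inf")
        for _ in range(max_iter):
            theta = em_parameters(g, theta, data)      # steps 4-7 of Algorithm 2
            q = expected_score(g, theta, data)         # value of Q for (G, Theta)
            if abs(q - prev_q) <= eps:                 # convergence test (step 8)
                break
            prev_q = q
            g, theta = best_neighbour(g, theta, data)  # any Q-improving move (step 9)
        return g, theta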
with N*ijk = E_{G*,Θ*}[Nijk] = N · P(Xi = xk, Pi = paj | G*, Θ*) obtained by inference in the network {G*, Θ*} if {Xi, Pi} are not completely measured, or else by mere counting. With the same reasoning, (Friedman, 1998) proposes the adaptation of the bde score to incomplete data.

4 TAN-EM, a structural EM for classification

(Leray and Francois, 2005) have introduced mwst-em, an adaptation of mwst dealing with incomplete datasets. The approach we propose here uses the same principles in order to efficiently learn tan classifiers from incomplete datasets.

4.1 MWST-EM, a structural EM in the space of trees

Step 1 of Algorithm 2, like in all the previous algorithms, deals with the choice of the initial structure. The choice of an oriented chain graph linking all the variables, proposed by (Friedman, 1997), seems even more judicious here, since this chain graph also belongs to the tree space. Steps 4 to 7 do not change. They deal with the running of the parametric em algorithm for each structure B^i, starting with structure B^0.

… the sum of the local scores on all the nodes, i.e. function BIC of Eqn.2.

By applying the principle we described in section 3.2, we can then adapt mwst to incomplete data by replacing the local bic score of Eqn.11 with its expectation; to do so, we use a certain probability density of the missing data P(Dm | T*, Θ*):

M^Q_ij = [ Q^bic(Xi, Pi = {Xj}, Θ_{Xi|Xj} : T*, Θ*) ] − [ Q^bic(Xi, Pi = ∅, Θ_{Xi} : T*, Θ*) ]    (12)

With the same reasoning, running a maximum (weight) spanning tree algorithm on matrix M^Q enables us to get the best tree T that maximises the sum of the local scores on all the nodes, i.e. function Q^BIC of Eqn.9.

4.2 TAN-EM, a structural EM for classification

The score used to find the best tan structure is very similar to the one used in mwst, so we can adapt it to incomplete datasets by defining the following score matrix:

M^Q_ij = [ Q^bic(Xi, Pi = {C, Xj}, Θ_{Xi|Xj,C} : T*, Θ*) ] − [ Q^bic(Xi, Pi = {C}, Θ_{Xi|C} : T*, Θ*) ]    (13)

Using this new score matrix, we can use the approach previously proposed for mwst-em to
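Assuming a routine q_bic that returns the expected local score of Eqn.12/Eqn.13 (an assumption of this sketch, not the paper's code), the spanning-tree step could look as follows; scipy's minimum spanning tree is run on the negated score matrix so as to obtain a maximum weight tree.

    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree

    def best_augmenting_tree(n_vars, q_bic):
        # Build the score matrix M^Q and extract the maximum weight spanning
        # tree (scipy minimises, so the negated matrix is used). Scores are
        # assumed positive here; shift them beforehand otherwise, since zero
        # entries are treated by scipy as missing edges.
        m = np.zeros((n_vars, n_vars))
        for i in range(n_vars):
            for j in range(i):
                # symmetrised gain of linking X_i and X_j on top of the class C
                gain = q_bic(i, j) - q_bic(i, None)
                m[i, j] = m[j, i] = gain
        tree = minimum_spanning_tree(-m).toarray()
        return tree != 0   # boolean adjacency of the selected tree edges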
get the best augmented tree, and then connect the class node to all the other nodes to obtain the tan structure. We are currently using the same reasoning to find the best forest extension.

4.3 Related works

(Meila-Predoviciu, 1999) applies the mwst algorithm and the em principle, but in another framework, learning mixtures of trees. In this work, the data is complete, but a new variable is introduced in order to take into account the weight of each tree in the mixture. This variable isn't measured, so em is used to determine the corresponding parameters.

(Pena et al., 2002) propose a change inside the framework of the sem algorithm, resulting in an alternative approach for learning Bayes nets for clustering more efficiently.

(Greiner and Zhou, 2002) propose maximizing the conditional likelihood for BN parameter learning. They apply their method to mcar incomplete data by using available case analysis in order to find the best tan classifier.

(Cohen et al., 2004) deal with tan classifiers and the em principle for partially unlabeled data. In their work, only the variable corresponding to the class can be partially missing, whereas any variable can be partially missing in our tan-em extension.

5 Experiments

5.1 Protocol

The experiment stage aims at evaluating the Tree Augmented Naive Bayes classifier on incomplete datasets from the UCI repository³: Hepatitis, Horse, House, Mushrooms and Thyroid.

The tan-em method we proposed here is compared to the Naive Bayes classifier with em parameter learning. We also indicate the classification rate obtained by three methods: mwst-em, sem initialised with a random chain, and sem initialised with the tree given by mwst-em (sem+t). The first two methods are dedicated to classification tasks while the others do not consider the class node as a specific variable.

We also give a confidence interval for each classification rate, based on Eqn.14 proposed by (Bennani and Bossaert, 1996):

I(α, N) = ( T + Z²/(2N) ± Z √( T(1−T)/N + Z²/(4N²) ) ) / ( 1 + Z²/N )    (14)

where N is the number of samples in the dataset, T is the classification rate and Z = 1.96 for α = 95%.

5.2 Results

The results are summed up in Table 1. First, we can see that even if the Naive Bayes classifier often gives good results, the other tested methods obtain better classification rates. But, where all runs of nb-em give the same results, as em parameter learning only needs an initialisation, the other methods do not always give the same results, and thus not the same classification rates. We have also noticed (not reported here) that, excepting nb-em, tan-em seems the most stable method concerning the evaluated classification rate, while mwst-em seems to be the least stable.

The method mwst-em can obtain very good structures with a good initialisation. Then, initialising it with the results of mwst-em gives us stabler results (see (Leray and Francois, 2005) for a more specific study of this point).

In our tests, except for the house dataset, tan-em always obtains a structure that leads to better classification rates in comparison with the other structure learning methods.

Surprisingly, we also remark that mwst-em can give good classification rates even if the class node is connected to a maximum of two other attributes.

Regarding the log-likelihood reported in Table 1, we see that the tan-em algorithm finds structures that can also lead to a good approximation of the underlying probability distribution of the data, even with a strong constraint on the graph structure.

Finally, Table 1 illustrates that tan-em and mwst-em have about the same complexity (regarding the computational time) and are a good compromise between nb-em (classical Naive Bayes with em parameter learning) and mwst-em (greedy search with incomplete data).

³ http://www.ics.uci.edu/~mlearn/MLRepository.html
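Eqn.14 transcribes directly into code; T, n and z are the classification rate, test sample size and normal quantile named above:

    from math import sqrt

    def classification_rate_interval(t, n, z=1.96):
        # Confidence interval of Eqn.14 (Bennani and Bossaert, 1996)
        # for a classification rate t measured on n test samples.
        centre = t + z**2 / (2 * n)
        half = z * sqrt(t * (1 - t) / n + z**2 / (4 * n**2))
        denom = 1 + z**2 / n
        return ((centre - half) / denom, (centre + half) / denom)

    # e.g. classification_rate_interval(0.85, 155) -> roughly (0.79, 0.90)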
Table 1: First line: best classification rate (on 10 runs, except Mushrooms on 5, in %) on the test dataset and its confidence interval, for the following learning algorithms: nb-em, mwst-em, sem, tan-em and sem+t. Second line: log-likelihood estimated with test data and calculation time (sec) for the network with the best classification rate. The first six columns give the name of the dataset and some of its properties: number of attributes, learning sample size, test sample size, number of classes and percentage of incomplete samples.
6 Conclusions and prospects

Bayesian networks are a tool of choice for reasoning in uncertainty with incomplete data. However, most of the time, Bayesian network structural learning only deals with complete data. We have proposed here an adaptation of the learning process of the Tree Augmented Naive Bayes classifier from incomplete datasets (and not only partially labelled data). This method has been successfully tested on some datasets.

We have seen that tan-em was a good classification tool compared to the other Bayesian networks we could obtain with structural em-like learning methods.

Our method can easily be extended to unsupervised classification tasks by adding a new step in order to determine the best cardinality for the class variable.

Related future works are the adaptation of some other Augmented Naive Bayes classifiers for incomplete datasets (fan for instance), but also the study of these methods with mar datasets.

The mwst-em, tan-em and sem methods are respective adaptations of mwst, tan and greedy search to incomplete data. These algorithms all apply in (subspaces of) the dag space. (Chickering and Meek, 2002) proposed an optimal search algorithm (ges) which deals with the Markov equivalent space. Logically enough, the next step in our research is to adapt ges to incomplete datasets. Then we could test the results of this method on classification tasks.

7 Acknowledgement

This work was supported in part by the ist Programme of the European Community, under the pascal Network of Excellence, ist-2002-506778. This publication only reflects the authors' views.

References

Y. Bennani and F. Bossaert. 1996. Predictive neural networks for traffic disturbance detection in the telephone network. In Proceedings of IMACS-CESA96, page xx, Lille, France.

D. Chickering and D. Heckerman. 1996. Efficient approximation for the marginal likelihood of incomplete data given a Bayesian network. In UAI96, pages 158–168. Morgan Kaufmann.

D. Chickering and C. Meek. 2002. Finding optimal Bayesian networks. In Adnan Darwiche and Nir Friedman, editors, Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence (UAI-02), pages 94–102, S.F., Cal. Morgan Kaufmann Publishers.
D. Chickering, D. Geiger, and D. Heckerman. 1995. Learning Bayesian networks: Search methods and experimental results. In Proceedings of the Fifth Conference on Artificial Intelligence and Statistics, pages 112–128.

C.K. Chow and C.N. Liu. 1968. Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3):462–467.

I. Cohen, F.G. Cozman, N. Sebe, M.C. Cirelo, and T.S. Huang. 2004. Semisupervised learning of classifiers: Theory, algorithms, and their application to human-computer interaction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(12):1553–1568.

D. Dash and M.J. Druzdzel. 2003. Robust independence testing for constraint-based learning of causal structure. In Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI03), pages 167–174.

A. Dempster, N. Laird, and D. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, B 39:1–38.

P. Domingos and M. Pazzani. 1997. On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29:103–130.

N. Friedman, D. Geiger, and M. Goldszmidt. 1997. Bayesian network classifiers. Machine Learning, 29(2-3):131–163.

N. Friedman. 1997. Learning belief networks in the presence of missing values and hidden variables. In Proceedings of the 14th International Conference on Machine Learning, pages 125–133. Morgan Kaufmann.

N. Friedman. 1998. The Bayesian structural EM algorithm. In Gregory F. Cooper and Serafín Moral, editors, Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 129–138, San Francisco, July. Morgan Kaufmann.

D. Geiger. 1992. An entropy-based learning algorithm of Bayesian conditional trees. In Uncertainty in Artificial Intelligence: Proceedings of the Eighth Conference (UAI-1992), pages 92–97, San Mateo, CA. Morgan Kaufmann Publishers.

S. Geman and D. Geman. 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6):721–741, November.

R. Greiner and W. Zhou. 2002. Structural extension to logistic regression. In Proceedings of the Eighteenth Annual National Conference on Artificial Intelligence (AAAI02), pages 167–173, Edmonton, Canada.

D. Heckerman, D. Geiger, and M. Chickering. 1995. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20:197–243.

S. Lauritzen. 1995. The EM algorithm for graphical association models with missing data. Computational Statistics and Data Analysis, 19:191–201.

P. Leray and O. Francois. 2005. Bayesian network structural learning and incomplete data. In Proceedings of the International and Interdisciplinary Conference on Adaptive Knowledge Representation and Reasoning (AKRR 2005), Espoo, Finland, pages 33–40.

M. Meila-Predoviciu. 1999. Learning with Mixtures of Trees. Ph.D. thesis, MIT.

J.W. Myers, K.B. Laskey, and T.S. Lewitt. 1999. Learning Bayesian networks from incomplete data with stochastic search algorithms. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI99).

J.M. Pena, J. Lozano, and P. Larranaga. 2002. Learning recursive Bayesian multinets for data clustering by means of constructive induction. Machine Learning, 47(1):63–90.

M. Ramoni and P. Sebastiani. 1998. Parameter estimation in Bayesian networks from incomplete databases. Intelligent Data Analysis, 2:139–160.

M. Ramoni and P. Sebastiani. 2000. Robust learning with missing data. Machine Learning, 45:147–170.

D.B. Rubin. 1976. Inference and missing data. Biometrika, 63:581–592.

J.P. Sacha. 1999. New Synthesis of Bayesian Network Classifiers and Cardiac SPECT Image Interpretation. Ph.D. thesis, The University of Toledo.

D.J. Spiegelhalter and S.L. Lauritzen. 1990. Sequential updating of conditional probabilities on directed graphical structures. Networks, 20:579–605.
Abstract
In this paper we propose using dependency networks (Heckerman et al., 2000), a probabilistic graphical model similar to Bayesian networks, to model classifiers. The main difference between these two models is that in dependency networks cycles are allowed, with the consequence that the automatic learning process is much easier and can be parallelized. These properties make dependency networks a valuable model especially when large databases have to be handled. Because of these promising characteristics we analyse the usefulness of dependency-network-based Bayesian classifiers. We present an approach that uses independence tests based on the chi-square distribution in order to find relationships between predictive variables. We show that this algorithm is as good as some state-of-the-art Bayesian classifiers, like TAN and an implementation of the BAN model, and has, in addition, other interesting properties like scalability or good quality for visualizing relationships.
With this method the joint probability can be obtained by computing each term separately. The determination of the first term does not require Gibbs sampling. The second and third terms should be determined by using Gibbs sampling, but in this case we know all the values of their conditioning variables, so they too can be determined directly by looking at their conditional probability tables. If we assume that the class variable will be determined before the other variables (as is the case), by this procedure we can compute any joint probability avoiding the Gibbs sampling process.

4 Finding dependencies

In this section we present our algorithm to build a classifier model from data based on dependency networks. Our idea is to use independence tests in order to find dependencies between predictive variables Xi and Xj, but taking into account the class variable, and in this way extend the Naive Bayes structure. It is known that the statistic

2 N_c I(Xi; Xj | C = c)

follows a χ² distribution with (|Xi| − 1)(|Xj| − 1) degrees of freedom under the null hypothesis of independence, where N_c is the number of input instances in which C = c and I(Xi; Xj | C = c) is the mutual information between variables Xi and Xj when the class variable C is equal to c. Using this expression, for any variable we can find all the other variables that are (in)dependent given the class.

But it is not this set that we are looking for; we need a set of variables which makes that variable independent from the others, that is, its Markov blanket. So, we try to discover this set by carrying out an iterative process, for every predictive variable independently, in which at each step the variable not yet chosen which shows the most dependency with the variable under study (Xi) is selected as a new parent. This selection is done considering all the parents previously chosen. Thus the statistic actually used is

2 N_c I(Xi; Xj | C = c, Pai),

where Pai is the current set of Xi's parents minus C. We assess the goodness of every candidate parent by the difference between the percentile of the χ² distribution (with α = 0.025) and the statistic value. Here the degrees of freedom must be

df = (|Xi| − 1) · (|Xj| − 1) · ∏_{P ∈ Pai} |P|.

Obviously this process finishes when all the candidate parents not yet selected show independence with Xi according to this test. In order to make the search more efficient, every candidate parent that appears independent at any step of the algorithm is rejected and is not taken into account in subsequent iterations. The reason for rejecting these variables is efficiency alone, although we know that any of these rejected variables could become dependent in posterior iterations.

Figure 1 shows the pseudo-code of this algorithm, called ChiSqDN.

    Initialize structure to Naive Bayes
    For each variable Xi
        Pai = ∅
        Cand = X \ {Xi}            // candidate parents
        While Cand ≠ ∅
            For each variable Xj ∈ Cand
                Let val be the assessment for Xj by the χ² test
                If val ≥ 0 Then
                    Remove Xj from Cand
            Let Xmax be the best variable found
            Pai = Pai ∪ {Xmax}
            Remove Xmax from Cand
        For each variable P ∈ Pai
            Make new link P → Xi

Figure 1: Pseudo-code for the ChiSqDN algorithm.

It is clear that determining the degrees of freedom is critical in order to perform this test properly. The scheme outlined before is the general way of assessing this parameter; nonetheless, in some cases, especially for deterministic or almost deterministic relationships, the tests carried out using degrees of freedom obtained in this way can fail. Deterministic relationships are pretty
…validation in which a 2-fold cv is performed five times and in each repetition a different partition is used. With this process we have the validation power of the 10-fold cv but with less correlation between each result (Dietterich, 1998). Once we have performed our experiments in that way, we have the results for each classifier in Table 2.

                 TAN       BAN+BIC   ChiSqDN
    australian   0.838841  0.866957  0.843478
    heart        0.819259  0.826667  0.823704
    hepatitis    0.852930  0.872311  0.865801
    iris         0.948000  0.962667  0.962667
    lung         0.525000  0.525000  0.525000
    pima         0.772135  0.773177  0.744531
    post op      0.653333  0.673333  0.673333
    segment      0.936970  0.909437  0.931169
    soybean      0.898661  0.906893  0.903950
    vehicle      0.719385  0.697163  0.698345
    vote         0.945299  0.944840  0.931045

Table 2: Classification accuracy estimated for the algorithms tested.

5.1 Analysis of the results

The analysis done for the previous results consists in carrying out a statistical test. The test selected is the paired Wilcoxon signed-ranks test, a non-parametric test which examines two samples in order to decide whether the differences shown by them can be assumed to be not significant. In our case, if this test holds, then we can conclude that the classifiers which yield the analysed results have similar classification power.

Thus, this test is done with a level of significance of 0.05 (α = 0.05) and it shows that both reference classifiers do not differ significantly from our classifier learned with the ChiSqDN algorithm. When we compare our algorithm with TAN we obtain a p-value of 0.9188, and for the comparison with the BAN model the p-value is 0.1415. Therefore the ChiSqDN algorithm is as good as the Bayesian classifiers considered, in spite of the approximation introduced in the factorization of the joint probability distribution due to the use of general dependency networks.

Nonetheless, also as a consequence of the use of dependency networks, we can take advantage of the ease of learning and the possibility of doing this process in parallel without introducing extra workload.

Apart from these characteristics that can be exploited from a dependency network, we can focus on its descriptive capability, i.e. how good the graph of each model is at showing relations over the domain. Taking into account the pima dataset, figures 3 and 4 show the graphs yielded by the algorithms BAN+BIC and ChiSqDN respectively.

Obviously, for every variable in these graphs, those that form its Markov blanket are the most informative or most related. The difference is that for a Bayesian network graph the Markov blanket set is formed by the parents, children and parents of children of every variable, whereas for a dependency network graph the Markov blanket set is all the variables directly connected. Thus in tables 3 and 4 we show the Markov blanket extracted from the corresponding graph for every variable. Evidently the sets do not match, due to the automatic learning process, but they are quite similar. Apart from that, it is clear that discovering these relationships from the dependency network graph is easier, especially for people not used to dealing with Bayesian networks, than doing it from a Bayesian network. We think that this point is an important feature of dependency networks because it can lead us to a tuning process helped by human experts after the automatic learning.

    Variable  Markov blanket
    mass      skin, class
    skin      age, mass, insu, class
    insu      skin, plas, class
    plas      insu, class
    age       pres, preg, skin, class
    pres      age, class
    preg      age, class
    pedi      class

Table 3: Markov blanket for each predictive variable from the BAN+BIC classifier.
dependency networks we can take advantage of
plas pedi
mass pres
preg
insu age
skin
Figure 3: Network yielded by BAN algorithm with BIC metric for the pima dataset.
class
pedi
insu
preg
plas skin
age
mass pres
The ChiSqDN algorithm shown here learns structural information for every variable independently, so we plan to study a parallelized version in the future. We will also try to improve this algorithm, in terms of complexity, by using simpler statistics. Our idea is to use an approximation of the mutual information over multiple variables based on a decomposition into simpler terms (Roure Alcobé, 2004).

Acknowledgments

This work has been partially supported by the Spanish Ministerio de Ciencia y Tecnología, the Junta de Comunidades de Castilla-La Mancha and the European Social Fund under projects TIN2004-06204-C03-03 and PBI-05-022.

References

Wray L. Buntine. 1994. Operations for learning with graphical models. Journal of Artificial Intelligence Research, 2:159–225.

Juan L. Mateo. 2006. Dependency networks: Use in automatic classification and Estimation of Distribution Algorithms (in Spanish). University of Castilla-La Mancha.

D.J. Newman, S. Hettich, C.L. Blake, and C.J. Merz. 1998. UCI Repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html.

Josep Roure Alcobé. 2004. An approximated MDL score for learning Bayesian networks. In Proceedings of the Second European Workshop on Probabilistic Graphical Models 2004 (PGM04).

Gideon Schwarz. 1978. Estimating the dimension of a model. Annals of Statistics, 6(2):461–464.

Yusuf Kenan Yilmaz, Ethem Alpaydin, Levent Akin, and Taner Bilgi. 2002. Handling of deterministic relationships in constraint-based causal discovery. In First European Workshop on Probabilistic Graphical Models (PGM02), pages 204–211.
Abstract
In this paper we propose a naive Bayes model for unsupervised data clustering, where the class variable is hidden. The feature variables can be discrete or continuous, since the conditional distributions are represented as mixtures of truncated exponentials (MTEs). The number of classes is determined using the data augmentation algorithm. The proposed model is compared with the conditional Gaussian model on some real-world and synthetic databases.
Example 1. The function f defined as

f(x|y) =
    5 + e^(2x) + e^(x),    y = 0, 0 < x < 2,
    1 + e^(x),             y = 0, 2 ≤ x < 3,
    1/5 + e^(2x),          y = 1, 0 < x < 2,
    4 e^(3x),              y = 1, 2 ≤ x < 3,

is an MTE potential for a conditional x|y relation, with X continuous, x ∈ (0, 3]. Observe that this potential is not actually a conditional density: it must be normalised to integrate up to one.

In general, an MTE conditional density f(x|y) with one conditioning discrete variable, with states y1, . . . , ys, is learnt from a database as follows:

1. Divide the database in s sets, according to the states of variable Y.

2. For each set, learn a marginal density for X, as shown above, with k and m constant.

4 Probabilistic clustering by using MTEs

In the clustering algorithm we propose here, the conditional distribution for each variable given the class is modeled by an MTE density. In the MTE framework, the domain of the variables is split into pieces and in each resulting interval an MTE potential is fitted to the data. In this work we will use the so-called five-parameter MTE, which means that in each split there are at most five parameters to be estimated from data:

f(x) = a0 + a1 e^(a2 x) + a3 e^(a4 x).    (2)

The choice of the five-parameter MTE is motivated by its low complexity and high fitting power (Cobb et al., 2006). The maximum number of splits of the domain of each variable has been set to four, again according to the results in the cited reference.

We start the algorithm with two clusters with the same probability, i.e., the hidden class variable has two states and the marginal probability of each state is 1/2. The conditional distribution for each feature variable given each of the two possible values of the class is initially the same, and is computed as the marginal MTE density estimated from the train database according to the estimation algorithm described in section 3.

The initial model is refined using the data augmentation method (Tanner and Wong, 1987), sketched in code after the following steps:

1. First of all, we divide the train database into two parts, one of them properly for training and the other one for testing the intermediate models.

2. For each record in the test and train databases, the distribution of the class variable is computed.

3. According to the obtained distribution, a value for the class variable is simulated and inserted in the corresponding cell of the database.

4. When a value for the class variable has been simulated for all the records, the conditional distributions of the feature variables are re-estimated using the method described in section 3, using the train database as sample.

5. The log-likelihood of the new model is computed from the test database. If it is higher than the log-likelihood of the initial model, the process is repeated. Otherwise, the process is stopped, returning the best model found for two clusters.

In order to avoid long cycles, we have limited the number of iterations in the procedure above to 100. After obtaining the best model for two clusters, the algorithm continues increasing the number of clusters, repeating the procedure above for three clusters, and so on.
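A compact sketch of the refinement loop above; model bundles three hypothetical routines (class_posterior, reestimate, loglik) standing in for the inference, MTE re-estimation and scoring steps of sections 3 and 4:

    import random

    def data_augmentation(train, test, model, max_iter=100):
        # Data augmentation refinement (Tanner and Wong, 1987): simulate the
        # hidden class, re-fit the conditionals, keep the model while the
        # test log-likelihood keeps improving.
        best, best_ll = model, model.loglik(test)
        for _ in range(max_iter):
            for db in (train, test):
                for record in db:
                    # simulate a class value from its posterior distribution
                    probs = best.class_posterior(record)
                    record["class"] = random.choices(range(len(probs)), probs)[0]
            candidate = best.reestimate(train)   # re-fit MTE conditionals
            ll = candidate.loglik(test)
            if ll <= best_ll:                    # no improvement: stop
                break
            best, best_ll = candidate, ll
        return best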
    DataSet     #instances  #attributes  discrete?  classes  source
    returns     113         5            no         -        (Shenoy and Shenoy, 1999)
    pima        768         8            no         2        UCI (Newman et al., 1998)
    bupa        345         6            no         2        UCI (Newman et al., 1998)
    abalone     4177        9            2          -        UCI (Newman et al., 1998)
    waveform    5000        21           no         3        UCI (Newman et al., 1998)
    waveform-n  5000        40           no         3        UCI (Newman et al., 1998)
    net15       10000       15           2          -        (Romero et al., 2006)
    sheeps      3087        23           5          -        real farming data (Gamez, 2005)

Table 1: Brief description of the data sets used in the experiments. The column discrete indicates whether the dataset contains discrete/nominal variables (if so, the number of discrete variables is given). The column classes shows the number of class labels when the dataset comes from a classification task.
Two independent runs have been carried out for each algorithm and data set:

- The algorithm proposed in this paper is executed over the given training set. Let #mte_clusters be the number of clusters it has decided. Then, the EM algorithm is executed receiving as inputs the same training set and a number of clusters equal to #mte_clusters.

- The WEKA EM algorithm is executed over the given training set. Let #em_clusters be the number of clusters it has decided. Then, our algorithm is executed receiving as inputs the same training set and a number of clusters equal to #em_clusters.

In this way we try to compare the performance of both algorithms over the same number of clusters.

Table 2 shows the results obtained. Out of the 8 databases used in the experiments, 7 refer to real-world problems, while the other one (Net15) is a randomly generated network with a joint MTE distribution (Romero et al., 2006). A rough analysis of the results suggests a better performance of the EM clustering algorithm: in 5 out of the 8 problems, it obtains a model with higher log-likelihood than the MTE-based algorithm. However, it must be pointed out that the number of clusters contained in the best models generated by the MTE-based algorithm is significantly lower than the number of clusters produced by the EM. In some cases, a model with many clusters, all of them with low probability, is not as useful as a model with fewer clusters but with higher probability, even if the latter has lower likelihood.

Another fact that must be pointed out is that, in the four problems where the exact number of classes is known (pima, bupa, waveform and waveform-n), the number of clusters returned by the MTE-based algorithm is more accurate. Besides, forcing the number of clusters in the EM algorithm to be equal to the number of clusters generated by the MTE method, the differences turn in favor of the MTE-based algorithm.

After this analysis, we consider the MTE-based clustering algorithm a promising method. It must be taken into account that the EM algorithm implemented in Weka is very optimised and sophisticated, while the MTE-based method presented here is a first version in which many aspects are still to be improved. Even in these conditions, our method is competitive in some problems. Models which are away from normality (as Net15) are more appropriate for applying the MTE-based algorithm rather than EM.

Table 2: Results obtained. Columns 2-4 represent the number of clusters discovered by the proposed MTE-based method, its result and the result obtained by EM when using that number of clusters. Columns 5-7 represent the number of clusters discovered by the EM algorithm, its result and the result obtained by the MTE-based algorithm when using that number of clusters.

6 Discussion and concluding remarks

In this paper we have introduced a novel unsupervised clustering algorithm which is based on the MTE model. The model has been tested…
A.P. Dempster, N.M. Laird, and D.B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38.

R.O. Duda, P.E. Hart, and D.J. Stork. 2001. Pattern Classification. Wiley.

N. Friedman. 1998. The Bayesian structural EM algorithm. In Gregory F. Cooper and Serafín Moral, editors, Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 129–138. Morgan Kaufmann.

M.A. Tanner and W.H. Wong. 1987. The calculation of posterior distributions by data augmentation (with discussion). Journal of the American Statistical Association, 82:528–550.

I.H. Witten and E. Frank. 2005. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann.
Francisco J. Díez
Department of Artificial Intelligence, UNED
Juan del Rosal 16
28040 Madrid, Spain
Abstract
In previous work we have introduced dynamic limited-memory influence diagrams (DLIMIDs) as an extension of LIMIDs aimed at representing infinite-horizon decision processes. If a DLIMID respects the first-order Markov assumption then it can be represented by a 2TLIMID. Given that the treatment selection algorithm for LIMIDs, called single policy updating (SPU), can be infeasible even for small finite-horizon models, we propose two alternative algorithms for treatment selection with 2TLIMIDs. First, single rule updating (SRU) is a hill-climbing method inspired by SPU which need not iterate exhaustively over all possible policies at each decision node. Second, a simulated annealing algorithm can be used to avoid the local-maximum policies found by SPU and SRU.
Figure 1: Chance nodes are shown as circles, decision nodes as squares and utility nodes as diamonds. The 2TLIMID (left) can be unrolled into a DLIMID (right), where large brackets denote the boundary between subsequent time-slices. Note that, due to the definition of a 2TLIMID, the informational predecessors of a decision can only occur in the same or the preceding time-slice.
2.3 Dynamic LIMIDs and 2TLIMIDs

Although LIMIDs can often represent finite-horizon decision processes more compactly than standard influence diagrams, they cannot represent infinite-horizon decision processes. Recently, we introduced dynamic LIMIDs (DLIMIDs) as an explicit representation of decision processes (van Gerven et al., 2006). To represent time, we use T ⊆ N to denote a set of time points, which we normally assume to be an interval {u | t ≤ u ≤ t', {t, u, t'} ⊆ N}, also written as t : t'. We assume that chance variables, decision variables and utility functions are indexed by a superscript t ∈ T, and use C^T, D^T and U^T to denote all chance variables, decision variables and utility functions at times t ∈ T, where we abbreviate C^T ∪ D^T with V^T.

A DLIMID is simply defined as a LIMID (C^T, D^T, U^T, G, P) such that for all pairs of variables X^t, Y^u ∈ V^T ∪ U^T it holds that if t < u then Y^u cannot precede X^t in the ordering induced by G. If T = 0 : N, where N ∈ N is the (possibly infinite) horizon, then we suppress T altogether, and we suppress indices for individual chance variables, decision variables and utility functions when clear from context. If a DLIMID respects the Markov assumption that the future is independent of the past, given the present, then it can be compactly represented by a 2TLIMID (see Fig. 1), which is a pair T = (L0, Lt) with prior model L0 = (C^0, D^0, U^0, G^0, P^0) and transition model Lt = (C^(t-1:t), D^(t-1:t), U^t, G, P) such that for all V^(t-1) ∈ V^(t-1:t) in the transition model it holds that π_{G^t}(V^(t-1)) = ∅. The transition model is not yet bound to any specific t, but if bound to some t ∈ 1 : N, then it is used to represent P(C^t : D^(t-1:t)) and the utility functions U ∈ U^t, where both G and P do not depend on t. The prior model is used to represent the initial distribution P^0(C^0 : D^0) and the utility functions U ∈ U^0. The interface of the transition model is the set I^t ⊆ V^(t-1) such that (V_i^(t-1), V_j^t) ∈ A(G) ⇒ V_i^(t-1) ∈ I^t. Given a horizon N, we may unroll a 2TLIMID for N time-slices in order to obtain a DLIMID with the following joint distribution:

P(C, D) = P^0(C^0 : D^0) · ∏_{t=1}^{N} P(C^t : D^(t-1:t)).

Let Δ^t = {P_D^t(D | π_G(D)) | D ∈ D^t} denote the strategy for a time-slice t and let the whole strategy be Δ = Δ^0 ∪ · · · ∪ Δ^N. Given Δ^0, L0 defines a distribution over the variables in V^0:

P_{Δ^0}(V^0) = P^0(C^0 : D^0) · ∏_{D ∈ D^0} P_D(D | π_{G^0}(D))

and given a strategy Δ^t with t > 0, Lt defines the following distribution over the variables in V^t:

P_{Δ^t}(V^t | I^t) = P(C^t : D^u) · ∏_{D ∈ D^t} P_D(D | π_G(D))

with u = t − 1 : t. Combining both equations, given a horizon N and a strategy Δ, a 2TLIMID induces a distribution over the variables in V:

P_Δ(V) = P_{Δ^0}(V^0) · ∏_{t=1}^{N} P_{Δ^t}(V^t | I^t).    (1)
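Structurally, the unrolling of Eqn.1 just re-instantiates the transition factors once per time-slice. A minimal sketch of this bookkeeping follows; the factor representation (plain name/time pairs) is ours, and a real implementation would attach the CPTs and policies:

    def unroll(prior_factors, transition_factors, horizon):
        # Collect the factors of the distribution induced by a 2TLIMID over
        # N time-slices, per Eqn.1: the prior model covers slice 0 and the
        # transition model is repeated for slices 1..N.
        factors = [(name, 0) for name in prior_factors]
        for t in range(1, horizon + 1):
            factors += [(name, t) for name in transition_factors]
        return factors

    # e.g. unroll(["P0(C0:D0)", "P_D(D0)"], ["P(Ct:Dt-1:t)", "P_D(Dt)"], horizon=3)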
Figure 7: Policy graph for the best strategy found by simulated annealing.

…ing arcs, then these stand for a negative finding (dashed arc) and a positive finding (solid arc) respectively. Most of the time, the policy graph associates a negative test result with no treatment and a positive test result with treatment.

5 Conclusion

In this paper we have demonstrated that reasonable strategies can be found for infinite-horizon DLIMIDs by means of SRU. Although computationally more expensive, SA considerably improves the found strategies by avoiding local maxima. Both SRU and SA do not suffer from the intractability of SPU when the number of informational predecessors increases. The approach does require that good strategies can be found using a limited amount of memory, since otherwise found strategies will fail to approximate the optimal strategy. This requirement should hold especially between time-slices, since the state-space of memory variables can become prohibitively large when a large part of the observed history is required for optimal decision-making. Although this restricts the types of decision problems that can be managed, DLIMIDs, constructed from a 2TLIMID, allow the representation of large, or even infinite-horizon, decision problems, something which standard…

References

G.F. Cooper. 1988. A method for using belief networks as influence diagrams. In Proceedings of the 4th Workshop on Uncertainty in AI, pages 55–63, University of Minnesota, Minneapolis.

R. Cowell, A.P. Dawid, S.L. Lauritzen, and D. Spiegelhalter. 1999. Probabilistic Networks and Expert Systems. Springer.

T. Dean and K. Kanazawa. 1989. A model for reasoning about persistence and causation. Computational Intelligence, 5(3):142–150.

R.A. Howard and J.E. Matheson. 1984. Influence diagrams. In R.A. Howard and J.E. Matheson, editors, Readings in the Principles and Applications of Decision Analysis. Strategic Decisions Group, Menlo Park, CA.

S. Kirkpatrick, C.D. Gelatt Jr., and M.P. Vecchi. 1983. Optimization by simulated annealing. Science, 220:671–680.

S.L. Lauritzen and D. Nilsson. 2001. Representing and solving decision problems with limited information. Management Science, 47(9):1235–1251.

N. Meuleau, K.-E. Kim, L.P. Kaelbling, and A.R. Cassandra. 1999. Solving POMDPs by searching the space of finite policies. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, pages 417–426, Stockholm, Sweden.

K.P. Murphy. 2002. Dynamic Bayesian Networks. Ph.D. thesis, UC Berkeley.

J. Pearl. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers, 2nd edition.

S. Ross. 1983. Introduction to Stochastic Dynamic Programming. Academic Press, New York.

M.A.J. van Gerven, F.J. Díez, B.G. Taal, and P.J.F. Lucas. 2006. Prognosis of high-grade carcinoid tumor patients using dynamic limited-memory influence diagrams. In Intelligent Data Analysis in bioMedicine and Pharmacology (IDAMAP) 2006. Accepted for publication.

S.D. Whitehead and D.H. Ballard. 1991. Learning to perceive and act by trial and error. Machine Learning, 7:45–83.
Rosario Susi
rsusi@estad.ucm.es
Departamento de Estadística e Investigación Operativa III
Universidad Complutense de Madrid
Avda. Puerta de Hierro s/n,
28040 Madrid, Spain
Abstract
We present the behavior of a sensitivity measure defined to evaluate the impact of model inaccuracies on the posterior marginal density of the variable of interest, after evidence propagation has been performed, for extreme perturbations of parameters in Gaussian Bayesian networks. This sensitivity measure is based on the Kullback-Leibler divergence and yields different expressions depending on the type of parameter (mean, variance or covariance) to be perturbed. This analysis is useful for knowing the extreme effect of uncertainty about some of the initial parameters of the model in a Gaussian Bayesian network. These concepts and methods are illustrated with some examples.

Keywords: Gaussian Bayesian network, Sensitivity analysis, Kullback-Leibler divergence.
If the perturbation δ is added to the mean of the evidential variable, μ_e^δ = μ_e + δ, the posterior density of the variable of interest, with the perturbation, is

Xi | E = e, δ ~ N( μ_i^(Y|E=e) + δ σ_ie / σ_ee , σ_ii^(Y|E=e) );

then

S_δ(f(x_i|e), f(x_i|e,δ)) = δ² (σ_ie / σ_ee)² / ( 2 σ_ii^(Y|E=e) ).

The perturbation added to the mean of any other non-evidential variable, different from the variable of interest, has no influence over Xi; then f(x_i|e,δ) = f(x_i|e), and the sensitivity measure is zero.

When the perturbation is added to the variance of the evidential variable, being σ_ee^δ = σ_ee + δ with δ > −σ_ee (1 − max_j ρ_je²), where ρ_je is the corresponding correlation coefficient, the posterior density of interest is Xi | E = e, δ ~ N( μ_i^(Y|E=e,δ), σ_ii^(Y|E=e,δ) ) with

μ_i^(Y|E=e,δ) = μ_i + σ_ie / (σ_ee + δ) · (e − μ_e)   and   σ_ii^(Y|E=e,δ) = σ_ii − σ_ie² / (σ_ee + δ);

therefore

S_δ(f(x_i|e), f(x_i|e,δ)) = (1/2) [ ln( σ_ii^(Y|E=e,δ) / σ_ii^(Y|E=e) ) + ( σ_ii^(Y|E=e) + (μ_i^(Y|E=e) − μ_i^(Y|E=e,δ))² ) / σ_ii^(Y|E=e,δ) − 1 ],

with μ_i^(Y|E=e) − μ_i^(Y|E=e,δ) = σ_ie δ (e − μ_e) / ( σ_ee (σ_ee + δ) ).
For extreme perturbations the behavior of the sensitivity measure is as follows:

1. When the perturbation is added to a mean, the sensitivity measure grows without bound,

   lim_{δ→±∞} S_δ(f(x_i|e), f(x_i|e,δ)) = ∞,

   whenever the perturbed mean actually influences Xi (for the evidential mean, whenever σ_ie ≠ 0); as δ → 0, every measure tends to 0.

2. When the perturbation is added to the variance of the evidential variable, the measure converges to a finite value,

   lim_{δ→∞} S_δ(f(x_i|e), f(x_i|e,δ)) = (1/2) [ ln( σ_ii / σ_ii^(Y|E=e) ) + ( σ_ii^(Y|E=e) + σ_ie² (e − μ_e)² / σ_ee² ) / σ_ii − 1 ],

   since σ_ii^(Y|E=e,δ) = σ_ii − σ_ie²/(σ_ee + δ) → σ_ii and μ_i^(Y|E=e,δ) → μ_i when δ → ∞.

Proof. 1. It follows directly. 2. It follows by taking limits in the expression of S_δ obtained above.

The behavior of the sensitivity measure is thus the expected one, except when the extreme perturbation is added to the evidential variance: with known evidence, the posterior density of interest with the perturbation in the model, f(x_i|e,δ), is not so different from the posterior density of interest without the perturbation, f(x_i|e). Therefore, although an extreme perturbation added to the evidential variance can exist, the sensitivity measure tends to a finite value.
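Since both posteriors are univariate Gaussians, the measure can be evaluated directly from the network parameters. A sketch under the formulas above; the direction of the Kullback-Leibler divergence (from the unperturbed to the perturbed density) is our assumption:

    from math import log

    def kl_normal(mu1, var1, mu2, var2):
        # KL divergence between N(mu1, var1) and N(mu2, var2).
        return 0.5 * (log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

    def sensitivity_variance_perturbation(mu_i, mu_e, e, s_ii, s_ie, s_ee, delta):
        # Sensitivity measure for a perturbation delta added to the
        # evidential variance, using the perturbed posterior mean/variance.
        mu_post  = mu_i + s_ie / s_ee * (e - mu_e)
        var_post = s_ii - s_ie ** 2 / s_ee
        mu_pert  = mu_i + s_ie / (s_ee + delta) * (e - mu_e)
        var_pert = s_ii - s_ie ** 2 / (s_ee + delta)
        return kl_normal(mu_post, var_post, mu_pert, var_pert)

    # as delta grows, the returned value approaches the finite limit above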
Acknowledgments

This research was supported by the MEC from Spain Grant MTM2005-05462 and Universidad Complutense de Madrid-Comunidad de Madrid Grant UCM2005-910395.

References

Castillo, E., Gutiérrez, J.M., Hadi, A.S. 1997. Expert Systems and Probabilistic Network Models. New York: Springer-Verlag.

Castillo, E., Kjærulff, U. 2003. Sensitivity analysis in Gaussian Bayesian networks using a symbolic-numerical technique. Reliability Engineering and System Safety, 79:139–148.

Chan, H., Darwiche, A. 2005. A distance measure for bounding probabilistic belief change. International Journal of Approximate Reasoning, 38(2):149–174.

Coupé, V.M.H., van der Gaag, L.C. 2002. Properties of sensitivity analysis of Bayesian belief networks. Annals of Mathematics and Artificial Intelligence, 36:323–356.

Gómez-Villegas, M.A., Main, P., Susi, R. 2006. Sensitivity analysis in Gaussian Bayesian networks
Abstract
This paper addresses the problem of learning a Bayes net (BN) structure from a database. We advocate first searching the Markov network (MN) space to obtain an initial BN and, then, refining it into an optimal BN. More precisely, it can be shown that under classical assumptions our algorithm obtains the moral graph of the optimal BN in polynomial time. This MN is thus optimal w.r.t. inference. The process is attractive in that, in addition to providing optimality guarantees, the MN space is substantially smaller than the traditional search spaces (those of BNs and equivalence classes (ECs)). In practice, we face the classical shortcoming of constraint-based methods, namely the unreliability of high-order conditional independence tests, and handle it using efficient conditioning set computations based on graph triangulations. Our preliminary experiments are promising, both in terms of the quality of the produced solutions and in terms of time responses.
Though they are strongly related, BNs and MNs are not able to represent precisely the same independence shapes. MNs are mostly used in the image processing and computer vision community (Pérez, 1998), while BNs are particularly used in the AI community working on modeling and prediction tools like expert systems (Cowell et al., 1999). In the latter prospect, BNs are mostly preferred for two reasons. First, the kind of independences they can express using arc orientations is considered more interesting than the independence shapes only representable by MNs (Pearl, 1988). Secondly, as BN parameters are probabilities (while those of MNs are non-negative potential functions without any real actual meaning), they are considered easier to assess, to manipulate and to interpret. In BNs, the direction is exploited only through V-structures, that is, three-node chains whose central node is a collider the neighbors of which are not connected by an arc. This idea is expressed in a theorem (Pearl, 1988) stating that two BNs are equivalent if and only if they have the same V-structures and the same skeleton, where the skeleton of a DAG is the undirected graph resulting from the removal of its arc directions.

Under this assumption, B* and equivalent BNs are obviously optimal. Using the axiomatics in (Pearl, 1988), it can be shown that, in the MN framework, the DAG-isomorphism hypothesis entails the so-called intersection property, leading to the existence of a unique optimal MN, say G* (Pearl, 1988). Moreover, G* is the moral graph of B*.

3 Markov network search

Searching the BN or the EC space is often performed as an optimization process, that is, the algorithm looks for a structure optimizing some goodness-of-fit measure, the latter being a decomposable scoring function that involves maximum likelihood estimates. For MNs, the computation of these estimates is hard and requires time-expensive methods (Murray and Ghahramani, 2004) (unless the MN is triangulated, which is not frequent). Hence score-based exploration strategies seem inappropriate for MN searches. Using statistical CI tests and combining them to reveal the MN's edges seems a more promising approach. This is the one we follow here.
MNs are unable to encode V-structure information since it is a directed notion. As a DAG without V-structures is chordal (i.e. triangulated), it is not surprising that a result (Pearl et al., 1989) states that the independences of a family of distributions can be […] moralization. The moral graph of a DAG is the undirected graph obtained by first adding edges between non-adjacent nodes with a common child (i.e. the extremities of V-structures) and then removing the arc orientations. It is easily seen that the moral graph of a BN is an optimal undirected I-map of this graph, even if some independences of the BN have necessarily been lost in the transformation. The converse, i.e. getting a directed I-map from a MN, is less easy and will be addressed in Section 4.

Algorithm LEARN MN
Input: database
Output: moral graph G = (V, E2)
1. E1 ← ∅
2. foreach edge (X, Y) ∉ E1 do
     search, if necessary, a new S_XY
     s.t. X ⊥_(V,E1) Y | S_XY
   […]
7. if ∃ (X, Y) ∈ E2 s.t. X ⊥ Y | S_XY then
8.   remove edge (X, Y) from E2 and go to line 6
9. return G = (V, E2)
End of algorithm

Unlike most works, where learning a MN amounts to independently learning the Markov blanket of all the variables, we construct a MN in a local search manner. Algorithm LEARN MN consists in two consecutive phases: the first one adds edges to the empty graph until it converges toward a graph G1 = (V, E1)
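The two phases of LEARN MN can be sketched as follows; indep and separator are hypothetical stand-ins for the statistical CI test and for the triangulation-based conditioning-set computation, and this naive edge sweep deliberately ignores the paper's efficiency machinery:

    def learn_mn(variables, indep, separator):
        # Control-flow sketch of LEARN MN: grow the edge set, then re-test
        # every kept edge and drop those whose endpoints prove independent.
        e1 = set()
        changed = True
        while changed:                                 # first phase
            changed = False
            for x in variables:
                for y in variables:
                    if x < y and (x, y) not in e1:
                        s = separator(x, y, e1)        # new S_XY if necessary
                        if not indep(x, y, s):
                            e1.add((x, y))
                            changed = True
        e2 = set(e1)
        for x, y in sorted(e1):                        # second phase
            s = separator(x, y, e2 - {(x, y)})
            if indep(x, y, s):
                e2.discard((x, y))
        return e2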
[…]
3. If X …
4. Else Return G′ = (V, E)
[…]
9.  foreach pair (X, S) in N_ij do
10.   add (X, min(C_i ∩ C_k, S)) to N
11. call Distribute(C_k, C_i, N)
12. N′ ← message M_ik sent during collect
13. foreach pair (X, S) in N′ do
14.   add (X, min(C_i ∩ C_k, S)) to N_ij
15. done
End of algorithm

Before stating the properties of MN2BN, let us illustrate it with an example. Consider the graphs in Figure 2. Assuming a DAG-isomorph distribution, B* represents the optimal BN. Suppose we obtained the optimal MN G*: we now apply MN2BN to it.
(E, {C})
G = G , F and R belong to a single clique. Assume
C CE
(A, {B}) MN2BN chooses F . After saving its neighbors set
(A, {B}) in G (l.4), F is removed from G (l.7) as well as its
AB B BCD
(A, {B}), (E, {C}) (A, {B}), (E, {C}) adjacent edges (l.6), which results in a new current
BD BDF B BG graph G = G1 . The empty DAG B is altered by
(G, {B}), (F, {B, D}) (G, {B}) adding arcs from these neighbors toward F , hence
resulting in the next current DAG B = B 1 . By di-
Figure 1: Messages sent during Collect/diffusion. recting F s adjacent arcs toward F , we may create
Figure 2: The successive current graphs G = G*, G1, . . . , G6 and current DAGs B, B1, . . . , B6 = B0 produced by MN2BN on the example (nodes C, F, H, J, K, L, R).
V-structures, hence some of the edges remaining in G between F's neighbors may correspond to moralization edges and may thus be safely discarded. To this end, for each candidate edge, DEMORALIZE tests whether its deletion would introduce spurious independence in the DAG by searching a separator in the current remaining undirected graph G. If not, we discard it from G. In our example, there is only one candidate edge, (C, J). We therefore compute a set separating C from J in G1 without (C, J), say K, and test if C ⊥ J | K. As this independence does not hold in a distribution exactly encoded by B*, the test fails and the edge is kept. For the second iteration, only node R belongs to a single clique. The process is similar but, this time, moralization edge (K, L) can be discarded. The algorithm then goes on similarly and finally returns B6 = B0. It is easily seen that B0 is a DAG which is an I-map of B* and that its moral graph is G*. Note that by mapping G* into B6, not all moralization edges have been identified. For instance, (C, F) was not.

Now, let us explain why MN2BN produces DAGs closely related to the optimal BN we look for.

Proposition 2. Assuming a DAG-isomorph distribution and sound statistical tests, if MN2BN is applied to G*, the returned graph B0 is a DAG that is an I-map of B*, and its moral graph is G*. Moreover, the algorithm runs in polynomial time.

Sketch of proof. By induction on V, we prove that, at each iteration, G is the moral graph of a subgraph of B*. This entails the existence of a node in a single clique at each iteration. Then we prove that, given a DAG B, its moral graph G and a node X belonging to a single clique of G, there exists a DAG B′ s.t.: i) B′ is an I-map of B, ii) G is the moral graph of B′ and iii) X has no child in B′. Finally, we prove the proposition by induction on V.

B0 encodes at least as many independences as G*, and possibly more if some moralization edges have been discarded. In the case where MN2BN selects nodes in the inverse topological order of B*, B0 is actually B*. However, this should not happen very often, and recovering B* requires in general to refine B0 by identifying all remaining hidden V-structures. Under a DAG-isomorph distribution and sound statistical tests, the second phase of GES (Chickering, 2002) is known to transform an I-map of the generative distribution into B*. Applied to B0, it can therefore be used as the refinement phase of our algorithm. This phase has an exponential complexity but benefits from an anytime property, in the sense that it proceeds by constructing a sequence of BNs sharing the same moral graph, the quality of which converges monotonically from B0 to B*. This last phase preserves the GES score-based optimality.

With real-world databases, LEARNMN can fail to recover precisely G*. In this case, departing from our theoretical framework, we lose our guarantee of accuracy. We also lose the guarantee to always find a node belonging to a single clique. However, MN2BN can easily be modified to handle this situation: if no node is found on line 3, just add edges to G so that a given node forms a clique with its neighbors. Whatever the node chosen, B0 is guaranteed [...]
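Putting the example together, the MN2BN loop can be sketched compactly. This is a hedged illustration of the procedure described above, not the authors' exact pseudocode; pick_simplicial and is_independent are hypothetical helpers standing in for the single-clique node selection and the statistical test used by DEMORALIZE:

def mn2bn(adj, pick_simplicial, is_independent):
    """Sketch of the MN2BN loop (illustrative, under the stated assumptions).

    adj maps each node to its neighbor set in the current MN G;
    pick_simplicial(adj) is assumed to return a node lying in a single
    clique (or one made so, per the modification discussed above), and
    is_independent(x, y, s) stands in for the DEMORALIZE test."""
    arcs = []                                   # the DAG B0 under construction
    adj = {v: set(ns) for v, ns in adj.items()}
    while adj:
        f = pick_simplicial(adj)
        neigh = set(adj[f])
        for v in neigh:                         # direct F's adjacent edges toward F
            arcs.append((v, f))
            adj[v].discard(f)
        del adj[f]                              # remove F and its adjacent edges
        # Demoralisation: try to drop candidate edges between F's old neighbors.
        for x in sorted(neigh):
            for y in sorted(neigh):
                if x < y and y in adj.get(x, ()):
                    sep = (adj[x] | adj[y]) - {x, y}
                    if is_independent(x, y, sep):
                        adj[x].discard(y)
                        adj[y].discard(x)
    return arcs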
Results, Avg (Stdev) per network; GES values were reported for four of the seven networks only:

                alarm (65)    barley (126)  carpo (102)   hailfinder (99)  insurance (70)  mildew (80)   water (123)
size 20000
  GES +   13.6 (2.69)   26.6 (3.85)   17.6 (5.14)   17.9 (2.34)    (four networks only)
  GES -   39.7 (1.10)   38.7 (4.22)   44.4 (3.17)   94.6 (2.15)    (four networks only)
  LMN +   2.1 (1.30)    15.7 (1.10)   9.4 (1.28)    9.7 (3.00)     0.4 (0.49)      3.2 (0.75)    0 (0)
  LMN -   9.6 (1.11)    60.1 (2.02)   23.2 (1.89)   10.1 (0.94)    15.5 (1.63)     22.3 (1.55)   52.3 (1.85)
  K2 +    3.2 (0.75)    3 (0)         3.1 (1.58)    0 (0)          0 (0)           1 (0)         0 (0)
  K2 -    19.7 (1.79)   87.8 (0.60)   31.1 (2.43)   26 (0)         19.3 (1.10)     59 (0)        71.6 (2.58)
size 2000
  GES +   14 (0.63)     11.8 (3.60)   11.8 (2.32)   3.9 (1.14)     (four networks only)
  GES -   62 (0.77)     66.4 (2.20)   59.2 (1.47)   110.3 (1.73)   (four networks only)
  LMN +   3.4 (0.80)    13.3 (2.19)   7.5 (3.35)    2.2 (1.17)     0.3 (0.46)      4.5 (1.20)    0.2 (0.40)
  LMN -   21.3 (2.33)   79 (1.73)     33.3 (2.61)   15.4 (1.28)    24.9 (2.12)     40.4 (1.20)   75 (3.87)
  K2 +    3.2 (0.60)    3.1 (0.30)    2.3 (0.78)    0 (0)          0 (0)           0 (0)         0 (0)
  K2 -    34.9 (1.37)   98.3 (0.78)   45.7 (3.72)   50.2 (1.83)    38.2 (1.47)     64.4 (1.28)   98.2 (2.48)
size 500
  GES +   3.9 (1.64)    10.3 (2.28)   5.3 (1.00)    0.5 (0.50)     (four networks only)
  GES -   65 (0)        89.1 (2.07)   65.1 (1.14)   118.2 (1.25)   (four networks only)
  LMN +   7.1 (2.26)    8.2 (1.25)    5.9 (2.12)    2 (1.26)       1.1 (0.94)      4.4 (1.50)    0.4 (0.92)
  LMN -   31.5 (2.01)   90.2 (1.08)   54.2 (2.52)   33.6 (3.07)    34.7 (1.68)     56.8 (0.98)   88.3 (2.24)
  K2 +    3.7 (0.78)    3.4 (0.49)    4.5 (2.29)    0.1 (0.30)     0.3 (0.46)      0 (0)         0 (0)
  K2 -    42.7 (1.00)   105.2 (0.98)  60.8 (2.44)   65.5 (1.20)    47.2 (0.98)     74.8 (0.75)   101.6 (1.20)
R.G. Cowell, A.P. Dawid, S.L. Lauritzen, and D.J. Spiegelhalter. 1999. Probabilistic Networks and Expert Systems. Springer-Verlag.

S. van Dijk, L. van der Gaag, and D. Thierens. 2003. A skeleton-based approach to learning Bayesian networks from data. In Proceedings of PKDD.

F. van den Eijkhof, H.L. Bodlaender, and A.M.C.A. Koster. 2002. Safe reduction rules for weighted treewidth. Technical Report UU-CS-2002-051, Utrecht University.

J. Flores, J. Gámez, and K. Olesen. 2003. Incremental compilation of Bayesian networks. In Proceedings of UAI, pages 233–240.

S.B. Gillispie and M.D. Perlman. 2001. Enumerating Markov equivalence classes of acyclic digraph models. In Proceedings of UAI, pages 171–177.

D. Heckerman, D. Geiger, and D.M. Chickering. 1995. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20(3):197–243.

F.V. Jensen. 1996. An Introduction to Bayesian Networks. Springer-Verlag, North America.

W. Lam and F. Bacchus. 1993. Using causal information and local measures to learn Bayesian networks. In Proceedings of UAI, pages 243–250.

A.L. Madsen and F.V. Jensen. 1999. Lazy propagation: A junction tree inference algorithm based on lazy inference. Artificial Intelligence, 113:203–245.

P. Munteanu and M. Bendou. 2001. The EQ framework for learning equivalence classes of Bayesian networks. In Proceedings of the IEEE International Conference on Data Mining, pages 417–424.

I. Murray and Z. Ghahramani. 2004. Bayesian learning in undirected graphical models: Approximate MCMC algorithms. In Proceedings of UAI.

J. Pearl, D. Geiger, and T.S. Verma. 1989. The logic of influence diagrams. In R.M. Oliver and J.Q. Smith, editors, Influence Diagrams, Belief Networks and Decision Analysis. John Wiley and Sons.

J. Pearl. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.

P. Pérez. 1998. Markov random fields and images. CWI Quarterly, 11(4):413–437.

R.W. Robinson. 1973. Counting labeled acyclic digraphs. In F. Harary, editor, New Directions in Graph Theory, pages 239–273. Academic Press.

P. Spirtes, C. Glymour, and R. Scheines. 2000. Causation, Prediction, and Search. Springer Verlag.

J. Tian, A. Paz, and J. Pearl. 1998. Finding minimal d-separators. Technical Report TR-254, Cognitive Systems Laboratory.

T.S. Verma and J. Pearl. 1990. Causal networks: Semantics and expressiveness. In Proceedings of UAI, pages 69–76.
! "
! $ )
#
! $
%$
! $ $
$ *$
+ $ ,
$- !
+$$.
+
/$ + #
)
#
+
+#
0 $
$
! '
$'
1/$
!
$ 2 # $
+ * $
34456
$
3447- !
.' /
!$$ $
$ /
$'
!
'
/ $ $
+
1
$ '
$ #/$
!
%$$
)$/
# !
$$ $
$ ,
$
/
'
!
$
/$ $
$ !
.
$ '
$
!
! $ . #$
# / '
!
#
!
.
#
'
8
$
/+ #/$ *
$$
:;<;6 $# $
3447- !.
$ . /$
!
'
/
/#
6
$
+#
$
! '.
+
+ +$$ !
. *
$$ / - $.
# '
'
'
# / #$
$ 9 / +
!
/
# / $.
'
$$
0 .
/$ / +
!
$
+#
1$
! /
$ !
' %
.
$+
/$
$
$ +
$
#/$
!
$$ 0 ! / $
.'
#
$
$
# $$ .
/
' $ !
/$
$
$ #/$ +
' $ +
*
$ 9 + +
$ $
34456
$
3447- +
+
$
/#
$
/ ! #/$ #
$#
.
$$
+
!
/$ . /
$$ $
$ $ $
$ $ ' 1/$ !
/ $$
/$ * $
.0 $ !
!
'
34446 $
3444- #/$ +
& $
$
$
+ .
+ + /
! $
#
$ $
' , %' :
+ 0 $
!
$
' '
1
!
$ + 1
$
' !
$$
+ %
$ +
* $
34456
$
3 =
+ 1 $ $ 3447- / !$ !
!
$ !
! . , '' +
> /
+
$ * $
3444-
0
$ !
5
#
. $'
! 9
!
$
1' $ #/$
! !
$ 9
%$$
7 ? !.
$
/
+
$
* $
3444-
0
$ + $ $ $ 0
!
#/$
/ !.
+
/# ' / !
,$ 0$
$
.'
$
$ * %$$
+ + /$
/#
&@().
* $
34456
$' /
! #
*+
$
3447-- / +
$
/# /
/# #/$ -
! #
$# #/$
'
'
/
# /
!
$$
+'
A
+
!
/# #/$
: !$$
! #/$ *$'
/.
$
*-
E
'
$ $ !
/
$ #/$ $ #/$ 9
#
+
2
$
/ *- * $
34456
$
3447-
''
+ $$
+
A
*
$$
:;<;-
' / . #/$ + 9 $ $
/#
/ *(- #/$
$$
/$
* $
34446 $
3444- !
$ $ 9
0' #
$#' $
.
3 #$ '
#/$ $ + $ #/$
*
$$
:;<;6
! #$ $ . $# $
3447-
!
'
$ #/$
$ C. #/$
$$
+'
$ .
/D *
-
$
.
/
'.
$
E F F
B
*:-
"
0 $
$
+ %'.
= / $$
. :
$ / $ $
#/$ +
/$
!$$ '.
/
!
.
#.
'
$ !
$
! #
B
* - /
* - B
* - $$ / ! ''
$
#/$ * - +
>
/
!
/# 6 '
!
2
'
! #/$ !
/#
$
/# #/$ / : !/$ /
$$
9
+
$
/# %
$
/$
!
'
! ' '
!
e2 x2 e4 x4 e5 x5 e2 x2 e4 x4 e5 x5
2 4 2 3 2 3
e6 x6 -8 e1 x1 6 12 -8 e1 x1
3 -5 -7 -5
e8 x8 e3 x3 e7 x7 e'8 x8 e3 x3
%' :A
0 $
! $ #/$ &@()
$ '
!
$$
+' '
A AB
AB
AB
AB 3 F > F
AB 3 F = F
AB =
F
AB < 5 F
AB ? F *
$$
/
'$- C/D #/$ + $$
$ !
/
* - #/$
+ 6
/# #
!
$ #$
!
$
$
'
+
*-
/#
$$ $$ ,#$
+
+ $$ $$ $# #/$ # / $1 + 0 !
$
!
2 !
$
! $$ $#
!
#/$
# $# #/$ ' !
$.
+ $# +
'
$A $
+'
A
1' $ $
/+
/. +
$ #/$ &@()
$
# #/$
!
$ ! .
$$ $# #/$ /
*-
!
/# #
.
+ $ G
! $ !
+
$ $
2 #
$
/ ' /
/.
#
$ $
AB 3 F > F
+
$ #/$ &@()
$
AB =
F
!
$
$'
!
/#
$$ ,#$ $$
$ 9
!
/# #/$
AB =*3 F > F
- F
/# #/$ $ !
+
.
B 7 F :3 F =
F $
! $ *3444- $
B 7 F :3 F 9 '# /
*
* --
+ B 2 $ !
$$
+ + # $1 B =
F
! +
$ ,
+ +
# #/$
/ ' /
/#.
!
$
/ $
$
$$ 0
+
/# $
%$$
+ 1
/
'$
$ $. $ #/$ &@()
$ + $.
$'
$ #/$
*
-
+ %' :/ @
#/$ $ +
$ * .
/ $1 + +
'' - %
$
' 9 $
/# #/$ #
! $
#/$ ! +
' /
! #/$
!$$
1 B F +
/
! (
! "#
0
$ / .
$
+ '$ !
+
$
$ % $
$
' *-
! #/$
$#'
!
/ B + B * -
.
$ $ C0' 0D %
!
# $ #/$ &@()
$
/#
$$ $$
,#$
$
$
+ +
$
! '$
$
$
+
+
+
!
'.
$ @0
' !$$
! #/$
/ /$ $ #/$ &@() $$ $ #/$
1 *
xk(i) 0
xk(i) xk(i) xi
i J i J i J
%' 3A
! !$$ " 0 !
$
$ * 0 $
= 7
/# #/$-
+
#/$
@
!
$
$ $
! 0
!
/0
/
+
/# #/$
#$ +
'
+
+ +
/ (.
/ 1
+
/# $ #/$
!
+ # $
/
!
0
+ *-
"
/# /
0
$
+
* !
-
+ $
!
/# #/$
/# 0' 0
+
!
$
$
- !
$$
+ / $$
/# #/$
$'
!
$
!
* $
-
$
!
!$$ 0
+ %' 3 $
'
$'
6
$
/$
!
/'
+
$
/#
! !$$ 0
+ /$
/ '
! G
$
$ + /.
! / #/$
#
$
+ %' 3
$ '
!
'
/# " / 0 #/$
! #/$
$$
! !
#' '
!
!
!$$ 0 $$ ! . #/$
/# #/$
$
!
, + +
$
$
$ / . # /$
$
#.
!
/# / 0 /$
/ #/$
'
+
+
!
$' $
! 1
$
/# / ! !
$ #
I
# *344>-
/$
'$ $
# $
+
#
$ "
., !'
$A 2 .
$ 1/$ '#
'
+ +
$
!
#$ $'
#$/$ !
#/$ @
+#
' *)
$ $
:;;?6
:;;;6
$ !
'
$$ %'.
#H $
344:-
3
/
!
/
+
/
!
/.
/
* '
$
! !
''
"
#
$
$
$
$
'
/ $
$
$$
#
$
% % & " $
9 + $ !
!$$
! /
#
!
"# 0 %
%
"$ %
!
@#$
# / !$$ $
.
% # $
$
*
5- #
$ & '
"# 1
$
' !
$$ $
$ B
"# 23 3 2 # !
0# $
.
%
%
*- #! $'
/
*-
#/$
! $'
!
$ & '
" #
3 $' +
*-
#
$
! $' $$ .
$ #/$ &@()
$ !
%$$
+$$.
)$/
!
$$
!
0 #$/$
$$
+
9
$$ $ #!
$
+ # 0 + !
$
/ / #
" / 0
+ $ $. ' $
$
!
!
$ / %
+ / / ''
!
#/$ $'
*$$- .
$ #/$ &@()
$
' $$ $
!
/ !
$
$$'
'
.
' $'
$ " / 0 !
!'
$/$ %
!
!
/#
$$ ,#$
$
$
+
$
!
$$ ' $'
$$
+$$ '#
$$ !.
! +
0 ''
$
$#'
/$ ,
#
!
!
!
+ #
' $'
*I!
1.287 -0.03126 x1 x1
-1.785
connections between observed variables
x1 connections to/from hidden variables
%' =A
$ ' $ #/$ &@()
$
$
$
+ $$
/#
$$ ,#$
+
*-
/#
$$
,#$ */ $$
,#$-
$ %
" / +
$ *- '
/$
' /+
$ */- *- 0 !
$
'$
$ #/$ $
! :444 #
+
+ $ #/$
! %' >
+ $ $ $'
.
/ ' +
$ # 1
! .
! 0
! / 0A +
* +
,$ .
2 '
&@()
$
!
1/$- +
$
/
/$ +
.
$
$$
' " #
$
$
/ 0
$ 34 9
'
/# #/$
CD
!
! .
>3
!
!
/$ ''
$
! !
.
$ + $
* $ -
'$ ) +
/!
!+
.
$ 2
/
$'
'
/ / $
$
$1
!
$
/$
!
/$
/.
!$ $' +
. $
! $/$ '
#
.
$ $
''
$
. $ " / 0 + /
! .
'
#
!
./ $ #/$ /
! .
# $'
* $
34446 $
/ #/$
+ " $.
3444-
$1
!
$ . '
*)
$ $
:;;?6
:;;;-
/# +
!
# $$
/
%$$
+ !
$$.$
.
+$'
#$/$ !
.
! $'
$ !
$ '
#
$ " / $/$
2 0.
!.' !+
$ '
*)
$ $
:;;?6
:;;;-
. %
$
+ / /$
$# $.
" /
/$ #$ $'
/$ # + .
$
/$
$
+
#/$
$
# $$ +
*=
/#
. + ' / $$
%' >A
$ ' $ #. # $% &
%
/$ &@()
$
'
'
(
)*+&
$ !
@
, - ./
+
'
! G $%
/# #/$
!
$$
+ . **
/ *-
$ * #H 2:
# 8
* ,%
>
3444-
$
.
2
3
/
%=
/
/
/
! 3 %>
=
$
+.
$ / 2 3
3/
*6+7+
!
$$
+
$
#$ +*
/ ' $$ / !
' 2:
3
, 8? *
" $'
* #H $
344:-
$.
!"
'
!
#
$ " +
$
$ , @3
=A $ , B +
/ @C%3% 0
/
=
23 > 3> %C3
%
( !
)
#!*+,' >
)+)*
2 # $
+
+
$ @3 B
%
$
$ + /
#
$#
.'
!
' #
*
"- . ) /
)
( $%/>
D2
#
#/$ 0 * $
34456
$
3447-
/
+ # . " "%93 2:
; # 8
/ 9
!
!
' $ #. * E2
=>3
3
%
3>
$
$0 (
/$
+
+
! 1
"
!
2
#1!$%%3'
$
+
/# . >
*) ,/3> "
$
!
+ !$$ )$/
#$/$
- "2 - "
$ B%3 # "
1 #/$
!
% *) F
>
33
2=
+
#$
!+
/
%
4 ( .
/
$
.
. .$
Abstract
Causal independence modelling is a well-known method both for reducing the size of prob-
ability tables and for explaining the underlying mechanisms in Bayesian networks. In this
paper, we propose an application of an extended class of causal independence models,
causal independence models based on the symmetric Boolean function, for classification.
We present an EM algorithm to learn the parameters of these models, and study con-
vergence of the algorithm. Experimental results on the Reuters data collection show the competitive classification performance of causal independence models based on the symmetric Boolean function in comparison to the noisy OR model and, consequently, to other state-of-the-art classifiers.
\Pr(e^+ \mid C_1, \ldots, C_n) \;=\; \sum_{f(H_1,\ldots,H_n)=\top} \;\prod_{i=1}^{n} \Pr(H_i \mid C_i). \qquad (1)

In this paper, we assume that the function f in Equation (1) is a Boolean function. However, there are 2^{2^n} different n-ary Boolean functions (Enderton, 1972; Wegener, 1987); thus, the potential number of causal interaction models is huge. However, if we assume that the order of [...]

Let l denote the number of successes in n independent trials, where p_i is the probability of success in the ith trial, i = 1, \ldots, n; let p = (p_1, \ldots, p_n); then B(l; p) denotes the Poisson binomial distribution (Le Cam, 1960; Dar[...]). For a symmetric f, which depends on H_1, \ldots, H_n only through the number of successes l, the sum in Equation (1) can be grouped accordingly:

\Pr(e^+ \mid C_1, \ldots, C_n) \;=\; \sum_{0 \le l \le n:\, f(l)=\top} \;\; \sum_{(H_1,\ldots,H_n):\, \sum_i H_i = l} \;\prod_{i=1}^{n} \Pr(H_i \mid C_i),

where f(l) denotes the common value of f on all assignments with l successes.
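Since a symmetric f depends only on the number of successes, Equation (1) can be evaluated by direct enumeration. A minimal sketch, assuming an "at least threshold successes" trigger (the noisy OR being the threshold-1 special case); the names here are illustrative, not the paper's:

from itertools import product

def p_effect(p_hidden, threshold=1):
    """Sketch of Equation (1) for a symmetric Boolean trigger function.

    p_hidden[i] is assumed to be Pr(H_i = 1 | C_i); f is taken to be
    "at least `threshold` successes", so only the success count matters."""
    n = len(p_hidden)
    total = 0.0
    for h in product((0, 1), repeat=n):        # enumerate hidden configurations
        if sum(h) >= threshold:                # f(H_1, ..., H_n) = TRUE
            pr = 1.0
            for hi, pi in zip(h, p_hidden):
                pr *= pi if hi else (1.0 - pi)
            total += pr
    return total

print(p_effect([0.3, 0.5, 0.2]))  # noisy OR case: 1 - 0.7*0.5*0.8 = 0.72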
D. Heckerman and J.S. Breese. 1994. A new look at causal independence. In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, pages 286–292.

M. Henrion. 1989. Some practical issues in constructing belief networks. Uncertainty in Artificial Intelligence, 3:161–173.

R. Jurgelenaite, T. Heskes and P.J.F. Lucas. 2006. Noisy threshold functions for modelling causal independence in Bayesian networks. Technical Report [...]

J. Vomlel. 2006. Noisy-or classifier. International Journal of Intelligent Systems, 21:381–398.

I. Wegener. 1987. The Complexity of Boolean Functions. John Wiley & Sons, New York.

N.L. Zhang and D. Poole. 1996. Exploiting causal independence in Bayesian networks inference. Journal of Artificial Intelligence Research, 5:301–328.
Abstract
While quantitative probabilistic networks (QPNs) allow the expert to state influences between nodes in the network as influence signs, rather than conditional probabilities, inference in these networks often leads to ambiguous results due to unresolved trade-offs in the network. Various enhancements have been proposed that incorporate a notion of strength of the influence, such as enhanced and rich enhanced operators. Although inference in standard (i.e., not enhanced) QPNs can be done in time polynomial in the length of the input, the computational complexity of inference in such enhanced networks has not been determined yet. In this paper, we introduce relaxation schemes to relate these enhancements to the more general case where continuous influence intervals are used. We show that inference in networks with continuous influence intervals is NP-hard, and remains NP-hard when the intervals are discretised and the interval [-1, 1] is divided into blocks of length 1/4. We discuss membership in NP, and show how these general complexity results may be used to determine the complexity of specific enhancements to QPNs. Furthermore, this might give more insight into the particular properties of feasible and infeasible approaches to enhance QPNs.
Example 1. (3satex)
U = {u1, u2, u3, u4}, and C = {(u1 ∨ u2 ∨ u3), (u1 ∨ u2 ∨ u3), (u2 ∨ u3 ∨ u4)}.
This instance is satisfiable, for example with the truth assignment u1 = T, u2 = F, u3 = F, and u4 = T.

Figure 4: Variable gadget Vg.

4.1 Construct for our proofs

For each variable in the 3sat instance, our network contains a variable gadget as shown in figure 4. After the instantiation of node I with [1, 1], the influence at node D equals [1/2, 1] ⊗ [-1/2, 1/2] ⊗ [-1, -1/2], which is either [-1, -1/2], [1/2, 1] or [-1, 1], depending on the order of evaluation. We will use the non-associativity of the ⊗-operator in this network as a non-deterministic choice of assignment of truth values to variables. As we will see later, an evaluation order that leads to [-1, 1] can be safely dismissed (it will act as a falsum in the clauses, making both x and ¬x false), so we will concentrate on [1/2, 1] (which will be our T assignment) and [-1, -1/2] (F assignment) as the two possible choices.

We construct literals ui from our 3sat instance, each with a variable gadget Vg as input. Therefore, each variable can have a value of [-1, -1/2] or [1/2, 1] as influence, non-deterministically. Furthermore, we add clause-networks Clj for each clause in the instance and connect each variable ui to a clause-network Clj if the variable occurs in Cj. The influence associated with this arc (ui, Clj) is defined as F^[p,q](ui, Clj), where [p, q] equals [-1, 0] if ¬ui is in Cj, and [0, 1] if ui is in Cj (figure 5). Note that an ⊗-operation with [-1, 0] will transform a value of [-1, -1/2] into [1/2, 1] and vice versa, and [0, 1] will not change them. We can therefore regard an influence F^[-1,0] as a negation of the truth assignment for that influence. Note that [-1, 1] will stay the same in both cases.

If we zoom in on the clause-network in figure 6, we will see that the three incoming variables in a clause, that have a value of either [-1, -1/2], [1/2, 1], or [-1, 1], are multiplied with the arc influence Fi,j to form variables, and then combined with the instantiation node (with a value of [1, 1]), forming nodes wi. Note that [-1, -1/2] ⊕ [1, 1] = [-1, 1] ⊕ [1, 1] = [0, 1] and that [1/2, 1] ⊕ [1, 1] = [1/2, 1]. Since a [-1, 1] result in the variable gadget does not change by multiplying with the Fi,j influence, and leads to the same value as an F variable, such an assignment will never satisfy the 3sat instance.

The influences associated with these nodes wi are multiplied by [1/2, 1] and added together in the clause result Oj. At this point, Oj has a value of [k/4, 1], where k equals the number of literals which are true in this clause. The consecutive adding to [-1/4, 1], multiplication with [0, 1] and adding to [-1, 1] has the function of a logical or-operator, giving a value for Cj of [-3/4, 1] if no literal in the clause was true, and [1, 1] if one or more were true.

We then combine the separate clauses Cj into a variable Y, by adding edges from each clause to Y using intermediate variables D1 to Dn-1.
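For reference, the target of the reduction is ordinary 3sat satisfiability; a brute-force sketch, with clause literals encoded as signed integers (the literals below are illustrative, since the negations in Example 1 are not fully legible above; the assignment u1 = T, u2 = F, u3 = F, u4 = T satisfies them):

from itertools import product

def satisfiable_3sat(n_vars, clauses):
    """Brute-force 3sat check, the decision problem targeted by the reduction.

    clauses holds literals as +i / -i for u_i / not-u_i (1-based indices)."""
    for assign in product((False, True), repeat=n_vars):
        if all(any(assign[abs(l) - 1] == (l > 0) for l in clause)
               for clause in clauses):
            return True
    return False

print(satisfiable_3sat(4, [(1, 2, -3), (-2, 3, 4), (2, -3, 4)]))  # True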
Figure 6: The clause construction.

Figure 7: Connecting the clauses.

4.2 NP-hardness proof

Using the construct presented in the previous section, the computational complexity of the iPieqnetd problem can be established as follows.

Theorem 1. The iPieqnetd problem is NP-hard.

Proof. To prove NP-hardness, we construct a transformation from the 3sat problem. Let (U, C) be an instance of this problem, and let Q(U,C) be the interval-based qualitative probabilistic network constructed from this instance, as described in the previous section. When the node I ∈ Q is instantiated with [1, 1], then I has an influence of [1, 1] on Y (and therefore an influence on Y′ which is a true subset of [-1, 1]) if and only if all nodes Cj have a value of [1, 1], i.e. there exists an ordering of the operators in the variable-gadget such that at least one literal in C is true. We conclude that (U, C) has a solution with at least one true literal in each clause, if and only if the iPieqnetd problem has a solution for network Q(U,C), instantiated [...]
Acknowledgements
This research has been (partly) supported by
the Netherlands Organisation for Scientific Re-
search (NWO).
The authors wish to thank Hans Bodlaender
and Silja Renooij for their insightful comments
on this subject.
Abstract
Explanation of reasoning in expert systems is necessary for debugging the knowledge base,
for facilitating their acceptance by human users, and for using them as tutoring systems.
Influence diagrams have proved to be effective tools for building decision-support systems,
but explanation of their reasoning is difficult, because inference in probabilistic graphical
models seems to have little relation with human thinking. The current paper describes
some explanation capabilities for influence diagrams and how they have been implemented
in Elvira, a public software tool.
2 Influence diagrams
2.1 Definition of an ID
An influence diagram (ID) consists of a directed acyclic graph that contains three kinds of nodes: chance nodes VC, decision nodes VD and utility nodes VU; see Figure 1. Chance nodes represent random variables not controlled by the decision maker. Decision nodes correspond to actions under the direct control of the decision maker. Utility nodes represent the decision maker's preferences. Utility nodes can not be parents of chance or decision nodes. Given that each node represents a variable, we will use the terms variable and node interchangeably.

Figure 1: ID with two decisions (ovals), two chance nodes (squares) and three utility nodes (diamonds). There is a directed path including all the decisions and node U0.

In the extended framework proposed by Tatman and Shachter (1990) there are two kinds of utility nodes: ordinary utility nodes, whose parents are decision and/or chance nodes, and super-value nodes, whose parents are utility nodes and which can in turn be of two types, sum and product. We assume that there is a utility node U0, which is either the only utility node or a descendant of all the other utility nodes, and therefore has no children.²

There are three kinds of arcs in an ID, depending on the type of node they go into. Arcs into chance nodes represent probabilistic dependency. Arcs into decision nodes represent availability of information, i.e., an arc X → D means that the state of X is known when making decision D.

² An ID that does not fulfill this condition can be transformed by adding a super-value node U0 of type sum whose parents are the utility nodes that did not have descendants. The expected utility and the optimal strategy (both defined below) of the transformed diagram are the same as those of the original one.

Standard IDs require that there is a directed path that includes all the decision nodes and indicates the order in which the decisions are made. This in turn induces a partition of VC such that, for an ID having n decisions {D0, . . . , Dn-1}, the partition contains n + 1 subsets {C0, C1, . . . , Cn}, where Ci is the set of chance variables C such that there is a link C → Di and no link C → Dj with j < i; i.e., Ci represents the set of random variables known for Di and unknown for previous decisions. Cn is the set of variables having no link to any decision, i.e., the variables whose true value is never known directly.

Given a chance or decision variable V, two decisions Di and Dj such that i < j, and two links V → Di and V → Dj, the former link [...]
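The induced partition {C0, . . . , Cn} can be read off mechanically from the information links; a small sketch with hypothetical names:

def information_partition(chance_vars, decisions, links):
    """Sketch of the induced partition {C_0,...,C_n}: C_i holds the chance
    variables with a link into D_i and none into an earlier decision;
    C_n holds those never observed. links is a set of (X, D_i) pairs."""
    n = len(decisions)
    partition = [[] for _ in range(n + 1)]
    for c in chance_vars:
        first = next((i for i, d in enumerate(decisions) if (c, d) in links), n)
        partition[first].append(c)
    return partition

# ID of Figure 1: X is never observed directly, Y is observed before D.
print(information_partition(["X", "Y"], ["T", "D"], {("Y", "D")}))
# -> [[], ['Y'], ['X']]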
The evaluation of an ID consists in finding the MEU and an optimal strategy. It can be proved (Cowell et al., 1999; Jensen, 2001) that

MEU = \sum_{c_0} \max_{d_0} \cdots \sum_{c_{n-1}} \max_{d_{n-1}} \sum_{c_n} P(v_C : v_D)\, \psi(v_C, v_D). \qquad (8)

For instance, the MEU for the ID in Figure 1, assuming that U0 is of type sum, is

MEU = \max_{t} \sum_{y} \max_{d} \sum_{x} P(x)\, P(y \mid t, x)\, \underbrace{(U_1(x, d) + U_2(t))}_{U_0(x,d,t)}. \qquad (9)

Figure 2: Cooper policy network (CPN) for the ID in Figure 1.
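Equation (9) can be evaluated by brute force for small IDs; a sketch assuming the tables of Figure 1's ID are given as dictionaries (all names hypothetical):

def meu(p_x, p_y_given_tx, u1, u2):
    """Sketch of Equation (9):
    MEU = max_t sum_y max_d sum_x P(x) P(y|t,x) (U1(x,d) + U2(t)).

    Assumed inputs: p_x[x], p_y_given_tx[(y, t, x)], u1[(x, d)], u2[t],
    all over binary domains {0, 1}."""
    xs, ts, ds, ys = (0, 1), (0, 1), (0, 1), (0, 1)
    best = float("-inf")
    for t in ts:                                   # outer decision: do the test?
        value_t = 0.0
        for y in ys:                               # observed test result
            value_t += max(                        # inner decision: treat?
                sum(p_x[x] * p_y_given_tx[(y, t, x)] * (u1[(x, d)] + u2[t])
                    for x in xs)
                for d in ds)
        best = max(best, value_t)
    return best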
2.3 Cooper policy networks

When a strategy Δ = {P_D | D ∈ V_D} is defined for an ID, we can convert this into a Bayesian network, called Cooper policy network (CPN), as follows: each decision D is replaced by a chance node with probability potential P_D and parents IPred(D), and each utility node U is converted into a chance node whose parents are its functional predecessors (cf. Sec. 2.1); the values of the new chance variables are {+u, ¬u} and their probability is P_CPN(+u | FPred(U)) = norm_U(ψ_U(FPred(U))), where norm_U is a bijective linear transformation that maps the utilities onto the interval [0, 1] (Cooper, 1988).

For instance, the CPN for the ID in Figure 1 is displayed in Figure 2. Please note the addition of the non-forgetting link T → D, and that the parents of node U0 are no longer U1 and U2 but T, X, and D, which were chance or decision nodes in the ID.

The joint distribution of the CPN is:

P_CPN(v_C, v_D, v_U) = P(v_C, v_D) \prod_{U \in V_U} P_U(u \mid pa(U)). \qquad (10)

For a configuration r defined over a set of variables R ⊆ V_C ∪ V_D and U a utility node, it is possible to prove that

P_CPN(r) = P(r) \qquad (11)

and

P_CPN(+u \mid r) = norm_U(EU_U(Δ, r)). \qquad (12)

This equation allows us to compute EU_U(Δ, r) as norm_U^{-1}(P_CPN(+u | r)); i.e., the expected utility for a node U in the ID can be computed from the marginal probability of the corresponding node in the CPN.

3 Explanation of influence diagrams in Elvira

3.1 Explanation of the model

The explanation of IDs in Elvira is based, to a great extent, on the methods developed for explanation of Bayesian networks (Lacave et al., 2000; Lacave et al., 2006b). One of the methods that has proven most useful is the automatic coloring of links. The definitions in (Lacave et al., 2006b) for the sign of influence and magnitude of influence, inspired by (Wellman, 1990), have been adapted to incoming arcs to ordinary utility nodes.

Specifically, the magnitude of the influence gives a relative measure of how a variable is influencing an ordinary utility node (see (Lacave et al., 2006a) for further details). Then, the influence of a link pointing to a utility node is positive when higher values of A lead to higher utilities. Cooper's transformation norm_U guarantees that the magnitude of the influence is normalized.

For instance, in Figure 1 the link X → Y is colored in red because it represents a positive influence: the presence of the disease increases the probability of a positive result of the test. The link X → U1 is colored in blue because it [...]
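Cooper's norm_U and its inverse, as used in Equations (10) to (12), amount to a linear rescaling; a minimal sketch, assuming the utility range [u_min, u_max] is known:

def make_norm(u_min, u_max):
    """Sketch of a bijective linear transformation norm_U and its inverse,
    mapping utilities onto [0, 1] (assuming u_max > u_min)."""
    def norm(u):
        return (u - u_min) / (u_max - u_min)
    def inv(p):
        return u_min + p * (u_max - u_min)
    return norm, inv

# Per Equation (12), EU_U(Delta, r) can be recovered as inv(P_CPN(+u | r)).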
[...]the prior case, i.e., the prevalence of the disease; the red bar is longer than the green one because P(+x | +y) > P(+x). The global utility for the second case is 81.05, which is smaller than the green bar close to it (the expected utility for the general population) because the presence of the symptom worsens the prognosis. The red bar for Treatment=yes is the probability that a patient having a positive result in the test receives the treatment; this probability is 100% because the optimal strategy determines that all symptomatic patients must be treated.

The possibility of introducing evidence in Elvira has been useful for building IDs in medicine: when we were interested in computing the posterior probability of diagnoses given several sets of findings, we needed to manually convert the ID into a Bayesian network by removing decision and utility nodes, and each time the ID was modified we had to convert it into a Bayesian network to compute the probabilities. Now the probabilities can be computed directly on the ID.

[...] of the test is irrelevant. The MEU for this decision problem is U1(+x, +d).

In contrast, the introduction of evidence in Elvira (which may include findings for decision variables as well) does not lead to a new decision scenario nor to a different strategy, since the strategy is determined before introducing the evidence. Put another way, in Elvira we adopt the point of view of an external observer of a system that includes the decision maker as one of its components. The probabilities and expected utilities given by Equations 2 and 4 are those corresponding to the subpopulation indicated by e when treated with strategy Δ. For instance, given the evidence {+x}, the probability P(+t | +x) shown by Elvira is the probability that a patient suffering from X receives the test, which is 100% (it was 0% in Ezawa's scenario), and P(+d | +x) is the probability that he receives the treatment; contrary to Ezawa's scenario, this probability may differ from 100% because of false negatives. The expected utility for a patient suffering from X is [...]
[...]able for IDs, such as the possibility of handling several evidence cases simultaneously. Another novelty of Elvira with respect to most software tools for IDs is the ability to include super-value nodes in the IDs, even when expanding the decision tree equivalent to an ID.

We have also mentioned in this paper how these explanation facilities, which we have developed as a response to the needs that we have encountered when building medical models, have helped us to explain the IDs to the physicians collaborating with us and to debug those models (see (Lacave et al., 2006a) for further details).

Acknowledgments

Part of this work has been financed by the Spanish CICYT under project TIC2001-2973-C05. Manuel Luque has also been partially supported by the Department of Education of the Comunidad de Madrid and the European Social Fund (ESF).

References

F.J. Díez, J. Mira, E. Iturralde, and S. Zubillaga. 1997. DIAVAL, a Bayesian expert system for echocardiography. Artificial Intelligence in Medicine, 10:59–73.

F.J. Díez. 2004. Teaching probabilistic medical reasoning with the Elvira software. In R. Haux and C. Kulikowski, editors, IMIA Yearbook of Medical Informatics, pages 175–180. Schattauer, Stuttgart.

The Elvira Consortium. 2002. Elvira: An environment for creating and using probabilistic graphical models. In Proceedings of the First European Workshop on Probabilistic Graphical Models (PGM'02), pages 1–11, Cuenca, Spain.

F.V. Jensen. 2001. Bayesian Networks and Decision Graphs. Springer-Verlag, New York.

C. Lacave and F.J. Díez. 2002. A review of explanation methods for Bayesian networks. Knowledge Engineering Review, 17:107–127.

C. Lacave, R. Atienza, and F.J. Díez. 2000. Graphical explanation in Bayesian networks. In Proceedings of the International Symposium on Medical Data Analysis (ISMDA-2000), pages 122–129, Frankfurt, Germany. Springer-Verlag, Heidelberg.

C. Lacave, M. Luque, and F.J. Díez. 2006a. Explanation of Bayesian networks and influence diagrams in Elvira. Technical Report, CISIAD, UNED, Madrid.

C. Lacave, A. Onisko, and F.J. Díez. 2006b. Use of Elvira's explanation facility for debugging probabilistic expert systems. Knowledge-Based Systems. In press.

M. Luque, F.J. Díez, and C. Disdier. 2005. Influence diagrams for medical decision problems: Some limitations and proposed solutions. In J.H. Holmes and N. Peek, editors, Proceedings of the Intelligent Data Analysis in Medicine and Pharmacology, pages 85–86.

J.W. Wallis and E.H. Shortliffe. 1984. Customized explanations using causal knowledge. In B.G. Buchanan and E.H. Shortliffe, editors, Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project, chapter 20, pages 371–388. Addison-Wesley, Reading, MA.

M.P. Wellman. 1990. Fundamental concepts of qualitative probabilistic networks. Artificial Intelligence, 44:257–303.
Abstract
Factorisation of probability trees is a useful tool for inference in Bayesian networks. Probabilistic potentials, some of whose parts are proportional, can be decomposed as a product
of smaller trees. Some algorithms, like lazy propagation, can take advantage of this fact.
Also, the factorisation can be used as a tool for approximating inference, if the decompo-
sition is carried out even if the proportionality is not completely reached. In this paper we
propose the use of approximate factorisation as a means of controlling the approximation
level in a dynamic importance sampling algorithm.
[...]contain T1 and T2 can be decomposed by factorisation. Formally, approximate factorisation is defined through the concept of ε-factorisability.

Definition 1. A probability tree T is ε-factorisable within context (X_C = x_C), with proportionality factors α with respect to a divergence measure D, if there is an x_i ∈ Ω_X and a set L ⊆ Ω_X \ {x_i} such that for every x_j ∈ L there is an α_j > 0 such that

D(T^{R(X_C = x_C, X = x_j)}, \; α_j \, T^{R(X_C = x_C, X = x_i)}) \le ε,

where T^{R(X_C = x_C, X = x_i)} denotes the subtree of T which is reached by the branch where variables X_C take values x_C and X takes value x_i. Parameter ε ≥ 0 is called the tolerance of the approximation.

Observe that if ε = 0, we have exact factorisation. In the definition above, D and α = (α_1, . . . , α_k) are related in such a way that α can be computed in order to minimise the value of the divergence measure D.

In (Martínez et al., 2005) several divergence measures are considered, giving the optimal α in each case. In this work we will consider only the measure that showed the best performance in (Martínez et al., 2005), which we describe now. Consider a probability tree T. Let T1 and T2 be sub-trees of T below a variable X, for a given context (X_C = x_C), with leaves P = {p_i : i = 1, . . . , n; p_i ≠ 0} and Q = {q_i : i = 1, . . . , n} respectively. As we described before, approximate factorisation is achieved by replacing T2 by another tree T2* such that T2* is proportional to T1. It means that the leaves of T2* will be Q* = {α p_i : i = 1, . . . , n}, where α is the proportionality factor between T1 and T2. Let us denote by {r_i = q_i / p_i, i = 1, . . . , n} the ratios between the leaves of T2 and T1.

The χ² divergence, defined as

D_{χ²}(T2, T2*) = \sum_{i=1}^{n} \frac{(q_i - α p_i)^2}{q_i}, [...]
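For this divergence the minimising proportionality factor has a closed form, obtained by setting the derivative with respect to α to zero; a sketch (whether this coincides with the optimal factor reported in (Martínez et al., 2005) depends on their exact formulation):

def chi2_alpha(p, q):
    """Proportionality factor minimising sum_i (q_i - a*p_i)^2 / q_i.

    Setting d/da = 0 gives a = sum_i p_i / sum_i p_i^2/q_i.
    The q_i (leaves of T2) are assumed non-zero, as are the p_i."""
    num = sum(p)
    den = sum(pi * pi / qi for pi, qi in zip(p, q))
    return num / den

def eps_factorisable(p, q, eps):
    """True when replacing T2 by a*T1 stays within tolerance eps."""
    a = chi2_alpha(p, q)
    return sum((qi - a * pi) ** 2 / qi for pi, qi in zip(p, q)) <= eps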
three networks are called Link (Jensen et al.,
1995), Munin1 and Munin2 (Andreassen et al.,
1989). Table 1 displays, for each network, the
number of variables, number of observed vari-
ables (selected at random) and its size. By the
size we mean the sum of the clique sizes that re-
sulted when using HUGIN (Jensen et al., 1990)
for computing the exact posterior distributions
given the observed variables.
A. Cano, S. Moral, and A. Salmerón. 2002. Lazy evaluation in Penniless propagation over join trees. Networks, 39:175–185.

A. Cano, S. Moral, and A. Salmerón. 2003. Novel strategies to approximate probability trees in Penniless propagation. International Journal of Intelligent Systems, 18:193–203.

J. Cheng and M.J. Druzdzel. 2000. AIS-BN: An adaptive importance sampling algorithm for evidential reasoning in large Bayesian networks. Journal of Artificial Intelligence Research, 13:155–188.

P. Dagum and M. Luby. 1993. Approximating probabilistic inference in Bayesian belief networks is NP-hard. Artificial Intelligence, 60:141–153.

Elvira Consortium. 2002. Elvira: An environment for creating and using probabilistic graphical models. In J.A. Gámez and A. Salmerón, editors, Proceedings of the First European Workshop on Probabilistic Graphical Models, pages 222–230.

K.W. Fertig and N.R. Mann. 1980. An accurate approximation to the sampling distribution of the studentized extreme-valued statistic. Technometrics, 22:83–90.

L.D. Hernández, S. Moral, and A. Salmerón. 1998. A Monte Carlo algorithm for probabilistic propagation in belief networks based on importance sampling and stratified simulation techniques. International Journal of Approximate Reasoning, 18:53–91.

I. Martínez, S. Moral, C. Rodríguez, and A. Salmerón. 2005. Approximate factorisation of probability trees. ECSQARU'05. Lecture Notes in Artificial Intelligence, 3571:51–62.

S. Moral and A. Salmerón. 2005. Dynamic importance sampling in Bayesian networks based on probability trees. International Journal of Approximate Reasoning, 38(3):245–261.

J. Pearl. 1987. Evidential reasoning using stochastic simulation of causal models. Artificial Intelligence, 32:247–257.

R.Y. Rubinstein. 1981. Simulation and the Monte Carlo Method. Wiley (New York).

A. Salmerón, A. Cano, and S. Moral. 2000. Importance sampling in Bayesian networks using probability trees. Computational Statistics and Data Analysis, 34:387–413.

R.D. Shachter and M.A. Peot. 1990. Simulation approaches to general probabilistic inference on belief networks. In M. Henrion, R.D. Shachter, L.N. Kanal, and J.F. Lemmer, editors, Uncertainty in Artificial Intelligence, volume 5, pages 221–231. North Holland (Amsterdam).

N.L. Zhang and D. Poole. 1996. Exploiting causal independence in Bayesian network inference. Journal of Artificial Intelligence Research, 5:301–328.
INSA Rouen
St. Etienne du Rouvray, France
Abstract
Semi-Markovian causal models (SMCMs) are an extension of causal Bayesian networks for
modeling problems with latent variables. However, there is a big gap between the SMCMs
used in theoretical studies and the models that can be learned from observational data
alone. The result of standard algorithms for learning from observations is a complete partial ancestral graph (CPAG), representing the Markov equivalence class of maximal ancestral graphs (MAGs). In MAGs not all edges can be interpreted as immediate causal
relationships. In order to apply state-of-the-art causal inference techniques we need to
completely orient the learned CPAG and to transform the result into a SMCM by removing
non-causal edges. In this paper we combine recent work on MAG structure learning from
observational data with causal learning from experiments in order to achieve that goal.
More specifically, we provide a set of rules that indicate which experiments are needed
in order to transform a CPAG to a completely oriented SMCM and how the results of
these experiments have to be processed. We will propose an alternative representation
for SMCMs that can easily be parametrised and where the parameters can be learned
with classical methods. Finally, we show how this parametrisation can be used to develop
methods to efficiently perform both probabilistic and causal inference.
[...]able C, induces a change in the probability distribution of variable E, in that specific context (Neapolitan, 2003).

Figure 1: (a) A problem domain represented by a causal DAG model with observable and latent variables. (b) A semi-Markovian causal model representation of (a). (c) A maximal ancestral graph representation of (a).

An example causal inference query in the SMCM of Figure 1(b) is P(V6 = v6 | do(V2 = v2)).

Tian and Pearl have introduced theoretical causal inference algorithms to perform causal inference in SMCMs (Pearl, 2000; Tian and Pearl, 2002). However, these algorithms assume that any distribution that can be obtained from the JPD over the observable variables is available. We will show that their algorithm performs efficiently on our parametrisation.
In a SMCM a bi-directed edge between two variables represents a latent variable that is a common cause of these two variables. Although this seems very restrictive, it has been shown that models with arbitrary latent variables can be converted into SMCMs while preserving the same independence relations between the observable variables (Tian and Pearl, 2002).

The semantics of both directed and bi-directed edges imply that SMCMs are not maximal. This means that they do not represent all dependencies between variables with an edge. This is because in a SMCM an edge either represents an immediate causal relation or a latent common cause, and therefore dependencies due to a so-called inducing path will not be represented by an edge.

A node is a collider on a path if both its immediate neighbors on the path point into it.

Definition 2. An inducing path is a path in a graph such that each observable non-endpoint node of the path is a collider, and an ancestor of at least one of the endpoints.

Inducing paths have the property that their endpoints can not be separated by conditioning on any subset of the observable variables. For instance, in Figure 1(a), the path V1 → V2 ← L1 → V6 is inducing.

SMCMs are specifically suited for another type of inference, i.e., causal inference.

2.3 Maximal Ancestral Graphs

Maximal ancestral graphs are another approach to modeling with latent variables (Richardson and Spirtes, 2002). The main research focus in that area lies on learning the structure of these models.

Ancestral graphs (AGs) are complete under marginalisation and conditioning. We will only discuss AGs without conditioning, as is commonly done in recent work.

Definition 4. An ancestral graph without conditioning is a graph containing directed and bi-directed edges, such that there is no bi-directed edge between two variables that are connected by a directed path.

Pearl's d-separation criterion can be extended to be applied to ancestral graphs, and is called m-separation in this setting. M-separation characterizes the independence relations represented by an ancestral graph.

Definition 5. An ancestral graph is said to be maximal if, for every pair of non-adjacent nodes (Vi, Vj), there exists a set Z such that Vi and Vj are independent conditional on Z.

A non-maximal AG can be transformed into a MAG by adding some bi-directed edges (indicating confounding) to the model. See Figure 1(c) for an example MAG representing the same model as the underlying DAG in (a).
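Returning to Definition 2, the inducing-path condition translates directly into a path test; a sketch in which the three helpers are assumed to be derived from the graph at hand:

def is_inducing_path(path, is_collider, is_ancestor, observable):
    """Sketch of Definition 2: a path is inducing when every observable
    non-endpoint node on it is a collider on the path and an ancestor of
    one of the endpoints. is_collider(v, path), is_ancestor(v, w) and the
    observable set are assumed precomputed from the graph."""
    x, y = path[0], path[-1]
    return all(is_collider(v, path) and (is_ancestor(v, x) or is_ancestor(v, y))
               for v in path[1:-1] if v in observable)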
[...]cumstances the edge is correct and otherwise it can be removed.

In the example of Figure 1(c), we can block the inducing path by performing an experiment on V2, and hence can check that V1 and V6 do not covary with each other in these circumstances, so the edge can be removed.

In Table 3 an overview of the actions to resolve i-false edges is given.

After resolving all problems of Type 1 and 2, we end up with the SMCM structure shown in Figure 3(f). This representation is no longer consistent with the MAG representation, since there are bi-directed edges between two variables on a directed path, i.e., V2, V6. There is a potentially i-false edge V1 ↔ V6 in the structure, with inducing path V1, V2, V6, so we need to perform an experiment on V2, blocking all other paths between V1 and V6 (this is also done by exp(V2) in this case). Given that the original structure is as in Figure 1(a), performing exp(V2) shows that V1 and V6 are independent, i.e., exp(V2): (V1 ⊥ V6). Thus the bi-directed edge between V1 and V6 is removed, giving us the SMCM of Figure 1(b).

Figure 3: [...]periment on V2 while conditioning on V3. (f) After resolving all problems of Type 1 and 2.

[...] of the FCI algorithm can be seen in Figure 3(a). We will first resolve problems of Type 1 and 2, and then remove i-false edges. The result of each step is indicated in Figure 3.

4 Parametrisation of SMCMs

As mentioned before, in their work on causal inference, Tian and Pearl provide an algorithm for performing causal inference given knowledge of the structure of an SMCM and the joint probability distribution (JPD) over the observable variables. However, they do not provide a parametrisation to efficiently store the JPD over the observables.

We start this section by discussing the factorisation for SMCMs introduced in (Tian and Pearl, 2002). From that result we derive an additional representation for SMCMs and a parametrisation that facilitates probabilistic and causal inference. We will also discuss how these parameters can be learned from data.

4.1 Factorising with Latent Variables

Consider an underlying DAG with observable variables V = {V1, . . . , Vn} and latent variables [...]

Taking into account that in a SMCM the latent variables are implicitly represented by bi-directed edges, we introduce the following definition.

Definition 8. In a SMCM, the set of observable variables can be partitioned by assigning two variables to the same group iff they are connected by a bi-directed path. We call such a group a c-component (from confounded component) (Tian and Pearl, 2002).

For instance, in Figure 1(b) variables V2, V5, V6 belong to the same c-component. Then it can be readily seen that c-components and their associated latent variables form respective partitions of the observable and latent variables. Let Q[Si] denote the contribution of a c-component with observable variables Si ⊆ V to the mixture of products in Equation 1. Then we can rewrite the JPD as follows: P(v) = \prod_{i \in \{1,\ldots,k\}} Q[S_i].

Finally, (Tian and Pearl, 2002) proved that each Q[S] can be calculated as follows. Let V_{o_1} < . . . < V_{o_n} be a topological order over V, let V^{(i)} = {V_{o_1}, . . . , V_{o_i}} for i = 1, . . . , n, and let V^{(0)} = ∅. Then

Q[S] = \prod_{V_i \in S} P(v_i \mid (T_i \cup Pa(T_i)) \setminus \{V_i\}) \qquad (2)

where Ti is the c-component of the SMCM G reduced to variables V^{(i)} that contains Vi. The SMCM G reduced to a set of variables V′ ⊆ V is the graph obtained by removing from the graph all variables V \ V′ and the edges that are connected to them.

In the rest of this section we will develop a method for deriving a DAG from a SMCM.
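Definition 8 amounts to computing connected components of the bi-directed part of the graph; a sketch (the bi-directed edges V2-V5 and V5-V6 below are assumptions chosen only to reproduce the c-component {V2, V5, V6} of Figure 1(b)):

def c_components(nodes, bidirected):
    """Sketch of Definition 8: group observable nodes connected by a path
    of bi-directed edges into c-components."""
    adj = {v: set() for v in nodes}
    for x, y in bidirected:
        adj[x].add(y)
        adj[y].add(x)
    seen, comps = set(), []
    for v in nodes:
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps

print(c_components(["V1", "V2", "V3", "V4", "V5", "V6"],
                   [("V2", "V5"), ("V5", "V6")]))
# groups V2, V5 and V6 together, matching the example in the text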
Abstract
We study partitions of the symmetric group which have desirable geometric properties.
The statistical tests defined by such partitions involve counting all permutations in the
equivalence classes. These permutations are the linear extensions of partially ordered sets
specified by the data. Our methods refine rank tests of non-parametric statistics, such
as the sign test and the runs test, and are useful for the exploratory analysis of ordinal
data. Convex rank tests correspond to probabilistic conditional independence structures
known as semi-graphoids. Submodular rank tests are classified by the faces of the cone of
submodular functions, or by Minkowski summands of the permutohedron. We enumerate
all small instances of such rank tests. Graphical tests correspond to both graphical models
and to graph associahedra, and they have excellent statistical and algorithmic properties.
Abstract
In this paper we compare Naïve Bayes (NB) models, general Bayes Net (BN) models and Proba-
bilistic Decision Graph (PDG) models w.r.t. accuracy and efficiency. As the basis for our analysis
we use graphs of size vs. likelihood that show the theoretical capabilities of the models. We also
measure accuracy and efficiency empirically by running exact inference algorithms on randomly
generated queries. Our analysis supports previous results by showing good accuracy for NB mod-
els compared to both BN and PDG models. However, our results also show that the advantage of
the low complexity inference provided by NB models is not as significant as assessed in a previous
study.
[Figures: log-likelihood plotted against effective model size for the BN, NB and PDG models (legend: BN, NB, PDG, -H(D)), with panels including "King, Rook vs. King" (training and test), "Pageblocks" and "Abalone"; companion panels plot empirical accuracy (log-likelihood) and empirical complexity (ms) against effective model size.]
Abstract
When an incremental structural learning method gradually modifies a Bayesian network (BN)
structure to fit observations, as they are read from a database, we call the process structural adap-
tation. Structural adaptation is useful when the learner is set to work in an unknown environment,
where a BN is to be gradually constructed as observations of the environment are made. Existing
algorithms for incremental learning assume that the samples in the database have been drawn from
a single underlying distribution. In this paper we relax this assumption, so that the underlying dis-
tribution can change during the sampling of the database. The method that we present can thus be
used in unknown environments, where it is not even known whether the dynamics of the environ-
ment are stable. We briefly state formal correctness results for our method, and demonstrate its
feasibility experimentally.
Before presenting our method for structural adaptation, we describe the problem more precisely. We say that a sequence is sampled from a piecewise DAG faithful distribution if the sequence can be partitioned into sets such that each set is a database sampled from a single DAG faithful distribution; the rank of the sequence is the size of the smallest such partition. Formally, let D = (d1, . . . , dl) be a sequence of observations over variables V. We say that D is sampled from a piecewise DAG faithful distribution (or simply that it is a piecewise DAG faithful sequence) if there are indices 1 = i1 < · · · < im = l + 1 such that each D_j = (d_{i_j}, . . . , d_{i_{j+1}-1}), for 1 ≤ j ≤ m - 1, is a sequence of samples from a DAG faithful distribution. The rank of the sequence is defined as min_j (i_{j+1} - i_j), and we say that m - 1 is its size and l its length. A pair of consecutive samples, d_i and d_{i+1}, constitutes a shift in D if there is a j such that d_i is in D_j and d_{i+1} is in D_{j+1}. Obviously, any sequence of observations can be made indistinguishable from a piecewise DAG faithful sequence by selecting the partitions small enough, so we restrict our attention to sequences that are piecewise DAG faithful of at least rank r. However, we assume that neither the actual rank nor the size of the sequence is known, and specifically we do not assume that the indices i1, . . . , im are known.

The learning task that we address in this paper consists of incrementally learning a BN while receiving a piecewise DAG faithful sequence of samples, and making sure that after each sample point the BN structure is as close as possible to the distribution that generated this point. Throughout the paper we assume that each sample is complete, so that no observations in the sequence have missing values. Formally, let D be a complete piecewise DAG faithful sample sequence of length l, and let [...] For a method M to adapt to a DAG faithful sample sequence D wrt. dist then means that M seeks to minimize its deviance on D wrt. dist as much as possible.

4 A Structural Adaptation Method

The method proposed here continuously monitors the data stream D and evaluates whether the last, say k, observations fit the current model. When this turns out not to be the case, we conclude that a shift in D took place k observations ago. To adapt to the change, an immediate approach could be to learn a new network from the last k cases. By following this approach, however, we would unfortunately lose all the knowledge gained from cases before the last k observations. This is a problem if some parts of the perfect maps of the two distributions on each side of the shift are the same, since in such situations we re-learn those parts from the new data even though they have not changed. Not only is this a waste of computational effort, but it can also be the case that the last k observations, while not directly contradicting these parts, do not enforce them either, and consequently they are altered erroneously. Instead, we try to detect where in the perfect maps of the two distributions changes have taken place, and only learn these parts of the new perfect map. This presents challenges, not only in detection, but also in learning the changed parts and having them fit the non-changed parts seamlessly. Hence, the method consists of two main mechanisms: one, monitoring the current BN while receiving observations and detecting when and where the model should be changed; and two, re-learning the parts of the model that conflict with the observations, and integrating the re-learned parts with the remaining parts of the model. These two mechanisms are described below in Sections 4.1 and 4.2, respectively.
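The bookkeeping in the rank and size definitions above is straightforward; a tiny sketch:

def rank_and_size(indices):
    """Sketch of the definitions above: for cut indices 1 = i_1 < ... < i_m = l+1,
    the size of the sequence is m - 1 and its rank is min_j (i_{j+1} - i_j)."""
    size = len(indices) - 1
    rank = min(b - a for a, b in zip(indices, indices[1:]))
    return rank, size

print(rank_and_size([1, 4, 9, 13]))  # segments of lengths 3, 5, 4 -> (3, 3)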
[...]fore call the parameter k the allowed delay of the method. When the actual detection has taken place, as a last step, the detection algorithm invokes the updating algorithm (UPDATENET()) with the set of nodes for which SHIFTINSTREAM() detected a change, together with the last k observations.

Algorithm 1 Algorithm for BN adaptation. Takes as input an initial network B, defined over variables V, a series of cases D, and an allowed delay k for detecting shifts in D.
1: procedure ADAPT(B, V, D, k)
2:   D′ ← []
3:   cX ← [] (for each X ∈ V)
4:   loop
5:     d ← NEXTCASE(D)
6:     APPEND(D′, d)
7:     S ← ∅
8:     for X ∈ V do
9:       c ← CONFLICTMEASURE(B, X, d)
10:      APPEND(cX, c)
11:      if SHIFTINSTREAM(cX, k) then
12:        S ← S ∪ {X}
13:    D′ ← LASTKENTRIES(D′, k)
14:    if S ≠ ∅ then
15:      UPDATENET(B, S, D′)

To monitor how well each observation d = (d1, . . . , dm) fits the current model B, and especially the connections between a node Xi and the remaining nodes in B, we have followed the approach of Jensen et al. (1991): if the current model is correct, then we would in general expect that the probability of observing d dictated by B is higher than or equal to that yielded by most other models. This should especially be the case for the empty model E, where all nodes are unconnected. That is, in B, we expect the individual attributes of d to be positively correlated (unless d is a rare case, in which case all [...]

[...] case, we cannot use that value directly for determining whether a shift has occurred. Rather, we look at the block of values for the last k cases, and compare these with those from before that. If there is a tendency towards higher values in the former, then we conclude that this cannot be caused only by rare cases, and that a shift must have occurred. Specifically, for each variable X, SHIFTINSTREAM(cX, k) checks whether there is a significant increase in the values of the last k entries in cX relative to those before that. In our implementation, SHIFTINSTREAM(cX, k) calculates the negative of the second discrete cosine transform component (see e.g. (Press et al., 2002)) of the last 2k measures in cX, and returns true if this statistic exceeds a pre-specified threshold value. We are unaware of any previous work using this technique for change point detection, but we chose to use it as it outperformed the more traditional methods of log-odds ratios and t-tests in our setting.

4.2 Learning and Incorporating Changes

When a shift involving nodes S has been detected, UPDATENET(B, S, D′) in Algorithm 2 adapts the BN B around the nodes in S to fit the empirical distribution defined by the last k cases D′ read from D. Throughout the text, both the cases and the empirical distribution will be denoted D′. Since we want to reuse the knowledge encoded in B that has not been deemed outdated by the detection part of the method, we will update B to fit D′ based on the assumption that only the nodes in S need updating of their probabilistic bindings (i.e. the structure associated with their Markov boundaries in B_{D′}). Ignoring most details for the moment, the updating method in Algorithm 2 first runs through the nodes [...]
3. If Rule 1 cannot be applied, and if X − Y is a link and there is a directed path from X to Y, then direct the link X − Y as X → Y.

³That Rule 1 is sensible is proved in (Nielsen and Nielsen, 2006). Intuitively, we try to identify a graph fragment for X in B that can be merged with the graph fragments learned from D′. It turns out that the arcs directed by Rule 1 are exactly those that would have been learned by PARTIALLYDIRECT(X, GX, MBB(X), PB).

To investigate how our method behaves in practice, we ran a series of experiments. We constructed 100 experiments, where each consisted of five randomly generated BNs B1, . . . , B5 over ten variables, each having between two and five states. We made sure that Bi was structurally identical to Bi−1 except for the connection between two randomly chosen nodes. All CPTs in Bi were kept the same as in Bi−1, except for the nodes with a new parent
Abstract
Lyme disease is an infection evolving in three stages. It is characterised by a number of symptoms whose manifestations evolve over time. In order to correctly classify the disease it is important to include the clinical history of the patient. Consultations are typically scattered at non-equidistant points in time, and the probability of observing symptoms depends on the time since the disease was inflicted on the patient.
A simple model of the evolution of symptoms over time forms the basis of a dynamically
tailored model that describes a specific patient. The time of infliction of the disease is
estimated by a model search that identifies the most probable model for the patient given
the pattern of symptom manifestations over time.
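As a reading aid, the model search sketched in the abstract can be phrased as follows; every name in this Python fragment (build_model, likelihood, the grid of candidate infection times) is hypothetical, since the paper itself gives no code.

def estimate_infection_time(consultations, candidate_times, build_model, likelihood):
    # Slide the assumed time of infection over a grid of candidates,
    # build the tailored patient-specific model for each, and keep the
    # candidate whose model makes the observed symptom pattern most probable.
    best_time, best_score = None, float('-inf')
    for t0 in candidate_times:
        model = build_model(t0, consultations)
        score = likelihood(model, consultations)
        if score > best_score:
            best_time, best_score = t0, score
    return best_time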
identified based on the available evidence. In section 5 a preliminary evaluation of the approach is described, and finally we conclude in section 6.

2 Lyme Disease

Lyme disease and its manifestations are described in three stages as shown in Table 1 (http://www.oeghmp.at/eucalb/diagnosis case-definition-outline.html).

Erythema migrans is the most frequent manifestation. The rash, which migrates from the site of the tick bite, continuously grows in size. Erythema migrans is fairly characteristic and the diagnosis is clinical. Laboratory testing is useless as only half of the patients are positive when the disease is localized to the skin only. Neuroborreliosis is less frequent; in Denmark the incidence is around 3/100,000 per year. The IgM or IgG may be positive in 80% of patients with neuroborreliosis, but if the duration of the clinical disease is more than two months, then 100% of patients are positive. Figure 1 shows that not only the probability of being positive but also the level of antibodies vary as a function of time. Thus the laboratory measurement of IgM and IgG antibodies depends both on the duration of the clinical disease and on the dissemination of the infection to organ systems other than the skin. The incidence of the other manifestations of Lyme disease is lower. The diagnosis of Lyme arthritis is especially difficult as the symptoms are similar to arthritis due to other causes and because the disease is rare.

Figure 1: IgM and IgG development after infection of Lyme disease (vertical axis: antibody level; time axis: bite, 3 weeks, 4-6 weeks, 4-6 months, 2 years).

The epidemiology is complex. For example, different age groups are at different risks, there is a large seasonal variation due to variations in tick activity (Figure 2), the incubation period may vary from a few days to several years, and some clinical manifestations are very rare. There are large variations in incubation time and clinical progression. Some patients may have the disease starting with a rash (erythema migrans) and then progressing to neuroborreliosis, but most patients with neuroborreliosis are not aware of a preceding rash. The description of the disease and its progression is primarily based on (Smith et al., 2003) and (Gray et al., 2002).

There are also large individual variations in
Figure 5: The structure of the tailored patient specific model is composed of background knowledge and consultation modules. (Node labels visible in the figure include Tick Activity, Tick Bite, Month of Bite, Recall Bite, Gender, Borrelia and Lyme disease, with IgM/IgG test nodes under the clinical and laboratory findings.)
sion of the disease. Further, the model by Fisker et al. takes the use of data from previous consultations into consideration.

Both models implement the possibility of reading the probability of having Lyme disease in one of the three stages, but with two different approaches. The model by Fisker et al. models the stages as states in a single variable, whereas the model by Dessau and Andersen models the stages explicitly as causally dependent variables.

The main problem in both models is the temporal progression of Lyme disease. Dessau and Andersen's model gives a static picture of the current situation that only takes the total duration of the period of illness into consideration. Fisker et al. are more explicit in the modelling of the progress of the disease by letting states denote time intervals since a symptom was first observed and by duplicating variables for laboratory findings. This enables inclusion of findings from the past, but is still limited to single observations for clinical evidence and two entries for laboratory tests.

Besides the difficulties in determining the structure of the models, a considerable effort was required to specify the conditional probabilities.

In the following section we propose a further elaboration of the modelling by introducing continuous time and an arbitrary number of consultations.

4 Tailoring patient specific models

We aim for a Bayesian network that incorporates the full clinical history of individual patients. The model is composed of modules, where each module describes a consultation. The problem is that consultations may appear at random points in time and that the conditional probabilities for the symptoms at each consultation vary, depending on the time that has passed since the tick bite. Therefore frameworks such as dynamic Bayesian networks and Markov chains do not apply. We introduce continuous time inspired by (Nodelman et al., 2002). The proposed approach involves a model for the conditional probabilities, but before reaching that point we determine the structure of the model.

4.1 Structure

The structure of the patient specific model is illustrated in Figure 5. At the first consultation the background knowledge module is linked to a generic consultation module consisting of the disease node, clinical findings and laboratory findings. In order to keep the focus on the overall modelling technique we deliberately kept the model simple. The consultation module is a naive Bayes model, and subsequent consultations are included in the model by extending it with a new consultation module. This process can be repeated for an arbitrary number of consultations. The consultation modules are connected only through the disease nodes. This is a quite strong assumption that may be debated, but it simplifies things and keeps the complexity of the resulting model linear in the number of consultations. Less critical is that the disease nodes do not take the stage of the disease into account; we consider that the classification into stages is mostly for descriptive purposes, but it could be modeled as stages in
the disease node, although this would complicate

(Figure: P(EM | Lyme disease) as a function of time, plotted for the age groups 0-20 and 21-70.)
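The modular construction of section 4.1 can be sketched in plain Python; the node names loosely follow Figure 5, and the background edges are assumptions chosen for illustration only.

BACKGROUND_EDGES = [
    ('Month of Bite', 'Tick Activity'),
    ('Tick Activity', 'Tick Bite'),
    ('Tick Bite', 'Lyme disease 1'),
]

def consultation_module(i):
    # Naive Bayes module for consultation i: the disease node is the
    # single parent of the clinical and laboratory finding nodes.
    disease = 'Lyme disease %d' % i
    findings = ['IgM %d' % i, 'IgG %d' % i, 'Clinical findings %d' % i]
    return [(disease, f) for f in findings]

def tailored_model(n_consultations):
    # Consultation modules are linked only through the disease nodes, so
    # the model grows linearly with the number of consultations.
    edges = list(BACKGROUND_EDGES)
    for i in range(1, n_consultations + 1):
        edges += consultation_module(i)
        if i > 1:
            edges.append(('Lyme disease %d' % (i - 1), 'Lyme disease %d' % i))
    return edges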
ability of Lyme disease is gradually increased for each added consultation, as the general development pattern of the disease is confirmed in typical courses.

In the estimates from the model by Dessau and Andersen the probability of Lyme disease decreases as the later stage symptoms are observed. This reasoning does not seem appropriate when it is known from earlier consultations that the clinical history points toward a borrelial infection. The model by Dessau and Andersen is not designed to incorporate the clinical history, but from the results it can be seen that this parameter is important in order to provide reasonable estimates of the probability of Lyme disease.

6 Conclusion

Temporal reasoning is an integral part of medical expertise. Many decisions are based on prognoses or diagnoses where symptoms evolve over time and the effects of treatments typically become apparent only with some delay. A common scenario in general practice is that a patient attends the clinic at different times on a non-regular basis.

Lyme disease is an infection characterised by a number of symptoms whose manifestations evolve over time. In order to correctly classify the disease it is important to include the clinical history of the patient. We have proposed a method to include consultations scattered over time at non-equidistant points. A description of the evolution of symptoms over time forms the basis of a dynamically tailored model that describes a specific patient. The time of infliction of the disease is estimated by a model search that identifies the most probable model for the patient given the pattern of symptoms over time.

Based on a preliminary evaluation it can be concluded that the method proposed for handling non-equidistant time intervals in order to incorporate the clinical history works satisfactorily. In cases where the clinical history confirmed the general development pattern of Lyme disease, the estimated probability of the disease was gradually increased from consultation to consultation, whereas it was reduced when the history did not confirm the pattern.

We conclude that the proposed method seems viable for temporal domains, where changing conditional probabilities can be modeled by continuous functions.

References

[Arroyo-Figueroa and Sucar, 1999] Gustavo Arroyo-Figueroa and Luis Enrique Sucar. 1999. A Temporal Bayesian Network for Diagnosis and Prediction. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, pages 13-20. ISBN: 1-55860-614-9.

[Dessau and Andersen, 2001] Ram Dessau and Maj-Britt Nordahl Andersen. 2001. Beslutningsstøtte med Bayesiansk netværk - En model til tolkning af laboratoriesvar. Technical report, Aalborg Universitet. In Danish.

[Fisker et al., 2002] Brian Fisker, Kristian Kirk, Jimmy Klitgaard, and Lars Reng. 2002. Decision Support System for Lyme Disease. Technical report, Aalborg Universitet.

[Gray et al., 2002] J.S. Gray, O. Kahl, R.S. Lane, and G. Stanek. 2002. Lyme Borreliosis: Biology, Epidemiology and Control. Cabi Publishing. ISBN: 0-85199-632-9.

[Jensen, 2001] Finn V. Jensen. 2001. Bayesian Networks and Decision Graphs. Springer Verlag. ISBN: 0-387-95259-4.

[Nodelman et al., 2002] Uri Nodelman, Christian R. Shelton, and Daphne Koller. 2002. Continuous Time Bayesian Networks. In Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence, pages 378-387. ISBN: 1-55860-897-4.

[Smith et al., 2003] Martin Smith, Jeremy Gray, Marta Granstrom, Crawford Revie, and George Gettinby. 2003. European Concerted Action on Lyme Borreliosis. http://vie.dis.strath.ac.uk/vie/LymeEU/. Read 04-12-2003.
Abstract
We study dynamic reliability of systems where system components age with a constant failure rate and there is a budget constraint. We develop a methodology to effectively prepare a predictive maintenance plan for such a system using dynamic Bayesian networks (DBNs). The DBN representation allows monitoring the system reliability in a given planning horizon and predicting the system state under different replacement plans. When the system reliability falls below a predetermined threshold value, component replacements are planned such that the maintenance budget is not exceeded. The decision of which component(s) to replace is an important issue since it affects future system reliability and, consequently, the next time to do replacement. Component marginal probabilities given the system state are used to determine which component(s) to replace. Two approaches are proposed for calculating the marginal probabilities of components. The first is a myopic approach where the only evidence belongs to the current planning time. The second is a look-ahead approach where all the subsequent time intervals are included as evidence.
Figure 3: Scenario 2 - system reliability and replacement plan with the myopic approach (probability versus time).

Figure 4: Scenario 2 - system reliability and replacement plan with the look-ahead approach (probability versus time).
ments while the look-ahead approach plans a total of 16 replacements.

When the threshold value increases, an interesting question arises: is it still worth accounting for future information in choosing which component to replace? Table 3 shows the number of replacements found by the myopic and the look-ahead approaches at various threshold (L) values for scenario 2. When L = 0.80, the myopic approach finds fewer replacements than the look-ahead approach. This is because, as the threshold increases, more frequent replacements will be planned; hence, focusing on short-term reliability instead of future reliability may result in a smaller number of replacements.

Table 3: Number of replacements for the myopic and look-ahead approaches at various threshold values for T = 100

Threshold   Myopic   Look-ahead
0.50        11       10
0.60        15       15
0.70        23       22
0.80        33       35
0.90        67       67
0.95        100      100

In order to understand how good our methodology is, we enumerate all possible solutions of k replacements in a reasonable time horizon. Here, k refers to the upper bound of the minimum number of replacements given by our DBN algorithm. The total number of solutions in a horizon of T periods with k replacements is given as:

$\binom{T}{k}\, 2^k$   (8)

The first term is the total number of all possible size-k subsets of a size-T set. This corresponds to the total number of all possible time alternatives for k replacements in horizon T. The second term is the total number of all possible replacements for two components. Since this solution space becomes intractable with increasing T and k, a smaller part of the planning horizon is taken for enumeration of both scenarios. We started working with T = 50 periods, where 2 repairs are proposed, and T = 30 periods, where 3 repairs are proposed by our algorithm for scenario 1 and scenario 2, respectively. The next replacements correspond to t = 62 (Figure 2) and t = 40 (Figure 4). Hence, we increased T to 61 and to 39, the periods just before the 3rd and 4th replacements given by our algorithm for scenarios 1 and 2. The numbers of feasible solutions found by enumeration are reported in Table 4.
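Equation (8) can be checked directly; the following small Python computation reproduces the solution-space sizes implied by the settings mentioned in the text (the printed values follow from the formula, not from Table 4).

from math import comb

def num_plans(T, k):
    # Choose k replacement times out of T periods, then pick one of the
    # two components at each replacement: C(T, k) * 2^k plans in total.
    return comb(T, k) * 2 ** k

print(num_plans(50, 2))  # 1225 * 4 = 4900 plans for scenario 1
print(num_plans(30, 3))  # 4060 * 8 = 32480 plans for scenario 2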
When we decrease k, the number of replacements, by 1 (k = 1 and k = 2 for scenarios 1 and 2, re-
Abstract
We present a sound and complete graphical criterion for reading dependencies from the
minimal undirected independence map of a graphoid that satisfies weak transitivity. We
argue that assuming weak transitivity is not too restrictive.
Lebesgue measure zero wrt M(H) by Theorem 2, and (iv) the union of the probability distributions in (ii) and (iii) has Lebesgue measure zero wrt M(H) because the finite union of sets of Lebesgue measure zero has Lebesgue measure zero.

Let p(U, W) denote any probability distribution in M(H) that is faithful to H and satisfies that p(U|W = w) has the same independencies for all w. As proven in the paragraph above, such a probability distribution exists. Fix any w and let X, Y and Z denote three mutually disjoint subsets of U. Then, X ⊥ Y|Z in p(U|W = w) iff X ⊥ Y|ZW in p(U, W) iff sep(X, Y|ZW) in H iff sep(X, Y|Z) in G. Then, p(U|W = w) is faithful to G.

The proof for Gaussian probability distributions is analogous. In this case, p(U, W) is a Gaussian probability distribution and thus, for any w, p(U|W = w) is Gaussian too (Anderson, 1984). Theorem 2 is not needed in the proof because, as discussed in Section 3, any Gaussian probability distribution p(U, W) satisfies that p(U|W = w) has the same independencies for all w.

The theorem above has previously been proven for multinomial probability distributions in (Geiger and Pearl, 1993), but that proof constrains the cardinality of U. Our proof does not constrain the cardinality of U and applies not only to multinomial but also to Gaussian probability distributions. It has been proven in (Frydenberg, 1990) that sep in a UG G is complete in the sense that it identifies all the independencies holding in every Gaussian probability distribution for which G is an independence map. Our result is stronger because it proves the existence of a Gaussian probability distribution with exactly these independencies. We learned from one of the reviewers, whom we thank for it, that a rather different proof of the theorem above for Gaussian probability distributions is reported in (Lnenicka, 2005).

The theorem above proves that sep in the MUI map G of a WT graphoid p is complete in the sense that it identifies all the independencies in p that can be identified by studying G alone. However, sep in G is not complete if being complete is understood as being able to identify all the independencies in p. Actually, no sound criterion for reading independencies from G alone is complete in the latter sense. An example follows.

Example 1. Let p be a multinomial (Gaussian) probability distribution that is faithful to the DAG X → Z ← Y. Such a probability distribution exists (Meek, 1995). Let G denote the MUI map of p, namely the complete UG. Note that p is not faithful to G. However, by Theorem 3, there exists a multinomial (Gaussian) probability distribution q that is faithful to G. As discussed in Section 3, p and q are WT graphoids. Let us assume that we are dealing with p. Then, no sound criterion can conclude X ⊥ Y | ∅ by just studying G, because this independency does not hold in q, and it is impossible to know whether we are dealing with p or q on the sole basis of G.
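The sep criterion used throughout is ordinary vertex separation in an undirected graph, which can be tested by reachability once the separator is removed. The sketch below is our own illustration in Python, applied to the MUI map of Example 1 (the complete UG over X, Y, Z):

from collections import deque

def sep(graph, xs, ys, zs):
    # sep(X, Y|Z) holds iff every path between xs and ys passes through
    # zs, i.e. ys is unreachable from xs once the nodes in zs are removed.
    xs, ys, zs = set(xs), set(ys), set(zs)
    seen, frontier = set(xs), deque(xs)
    while frontier:
        node = frontier.popleft()
        for nbr in graph[node]:
            if nbr in zs or nbr in seen:
                continue
            if nbr in ys:
                return False  # unblocked path found
            seen.add(nbr)
            frontier.append(nbr)
    return True

G = {'X': {'Y', 'Z'}, 'Y': {'X', 'Z'}, 'Z': {'X', 'Y'}}
print(sep(G, {'X'}, {'Y'}, set()))  # False: X ⊥ Y cannot be read from G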
5 Reading Dependencies

In this section, we propose a sound and complete criterion for reading dependencies from the MUI map of a WT graphoid. We define the dependence base of an independence model p as the set of all the dependencies X ⊥̸ Y | U \ (XY) with X, Y ∈ U. If p is a WT graphoid, then additional dependencies in p can be derived from its dependence base via the WT graphoid properties. For this purpose, we rephrase the WT graphoid properties as follows. Let X, Y, Z and W denote four mutually disjoint subsets of U. Symmetry: Y ⊥̸ X|Z ⇒ X ⊥̸ Y|Z. Decomposition: X ⊥̸ Y|Z ⇒ X ⊥̸ YW|Z. Weak union: X ⊥̸ Y|ZW ⇒ X ⊥̸ YW|Z. Contraction, X ⊥̸ YW|Z ⇒ X ⊥̸ Y|ZW ∨ X ⊥̸ W|Z, is problematic for deriving new dependencies because it contains a disjunction in the right-hand side and, thus, it should be split into two properties: contraction1, X ⊥̸ YW|Z ∧ X ⊥ Y|ZW ⇒ X ⊥̸ W|Z, and contraction2, X ⊥̸ YW|Z ∧ X ⊥ W|Z ⇒ X ⊥̸ Y|ZW. Likewise, intersection, X ⊥̸ YW|Z ⇒ X ⊥̸ Y|ZW ∨ X ⊥̸ W|ZY, gives rise to intersection1, X ⊥̸ YW|Z ∧ X ⊥ Y|ZW ⇒ X ⊥̸ W|ZY, and intersection2
pendence base of p and the fact that p is a WT graphoid.

Theorem 5. Let p be a WT graphoid and G its MUI map. Then, con in G identifies all the dependencies in the WT graphoid closure of the dependence base of p.

Proof. It suffices to prove (i) that all the dependencies in the dependence base of p are identified by con in G, and (ii) that con satisfies the WT graphoid properties. Since the first point is trivial, we only prove the second point. Let X, Y, Z and W denote four mutually disjoint subsets of U.

Symmetry con(Y, X|Z) ⇒ con(X, Y|Z). Trivial.

Decomposition con(X, Y|Z) ⇒ con(X, YW|Z). Trivial if W contains no node in the path X1:n in the left-hand side. If W contains some node in X1:n, then let Xm denote the closest node to X1 such that Xm ∈ X1:n ∩ W. Then, the path X1:m satisfies the right-hand side because (X \ X1)(YW \ Xm)Z blocks all the other paths in G between X1 and Xm, since (X \ X1)(Y \ Xn)Z blocks all the paths in G between X1 and Xm except X1:m, because otherwise there exist several unblocked paths in G between X1 and Xn, which contradicts the left-hand side.

Weak union con(X, Y|ZW) ⇒ con(X, YW|Z). Trivial because W contains no node in the path X1:n in the left-hand side.

Contraction1 con(X, YW|Z) ∧ sep(X, Y|ZW) ⇒ con(X, W|Z). Since ZW blocks all the paths in G between X and Y, then (i) the path X1:n in the left-hand side must be between X and W, and (ii) all the paths in G between X1 and Xn that are blocked by Y are also blocked by (W \ Xn)Z and, thus, Y is not needed to block all the paths in G between X1 and Xn except X1:n. Then, X1:n satisfies the right-hand side.

Contraction2 con(X, YW|Z) ∧ sep(X, W|Z) ⇒ con(X, Y|ZW). Since Z blocks all the paths in G between X and W, the path X1:n in the left-hand side must be between X and Y and, thus, it satisfies the right-hand side.

Intersection con(X, YW|Z) ∧ sep(X, Y|ZW) ⇒ con(X, W|ZY). Since ZW blocks all the paths in G between X and Y, the path X1:n in the left-hand side must be between X and W and, thus, it satisfies the right-hand side.

Weak transitivity2 con(X, Xm|Z) ∧ con(Xm, Y|Z) ∧ sep(X, Y|ZXm) ⇒ con(X, Y|Z) with Xm ∈ U \ (XYZ). Let X1:m and Xm:n denote the paths in the first and second, respectively, con statements in the left-hand side. Let X1:m:n denote the path X1, . . . , Xm, . . . , Xn. Then, X1:m:n satisfies the right-hand side because (i) Z does not block X1:m:n, and (ii) Z blocks all the other paths in G between X1 and Xn, because otherwise there exist several unblocked paths in G between X1 and Xm or between Xm and Xn, which contradicts the left-hand side.

Weak transitivity1 con(X, Xm|Z) ∧ con(Xm, Y|Z) ∧ sep(X, Y|Z) ⇒ con(X, Y|ZXm) with Xm ∈ U \ (XYZ). This property never applies because, as seen in weak transitivity2, sep(X, Y|Z) never holds since Z does not block X1:m:n.

Note that the meaning of completeness in the theorem above differs from that in Theorem 3. It remains an open question whether con in G identifies all the dependencies in p that can be identified by studying G alone. Note also that con in G is not complete if being complete is understood as being able to identify all the dependencies in p. Actually, no sound criterion for reading dependencies from G alone is complete in this sense. Example 1 illustrates this point. Let us now assume that we are dealing with q instead of with p. Then, no sound criterion can
present the causal relations between the gene expression levels and the protein amounts. It is important that the Bayesian network is dynamic because a gene can regulate some of its regulators and even itself with some time delay. Since the technology for measuring the state of the protein nodes is not widely available yet, the data in most projects on gene expression data analysis are a sample of the probability distribution represented by the dynamic Bayesian network after marginalizing the protein nodes out. The probability distribution with no node marginalized out is almost surely faithful to the dynamic Bayesian network (Meek, 1995) and, thus, it satisfies weak transitivity (see Section 3) and, thus, so does the probability distribution after marginalizing the protein nodes out (see Theorem 1). The assumption that the probability distribution sampled is strictly positive is justified because measuring the state of the gene nodes involves a series of complex wet-lab and computer-assisted steps that introduce noise in the measurements (Sebastiani et al., 2003).

Additional evidence supporting the claim that the results in this paper can be helpful for learning gene dependencies comes from the increasing attention that graphical Gaussian models of gene networks have been receiving from the bioinformatics community (Schafer and Strimmer, 2005). A graphical Gaussian model of a gene network is nothing more than the MUI map of the probability distribution underlying the gene network, which is assumed to be Gaussian, hence the name of the model. This underlying probability distribution is then a WT graphoid and, thus, the results in this paper apply.

Acknowledgments

We thank the anonymous referees for their comments. This work is funded by the Swedish Research Council (VR-621-2005-4202), the Ph.D. Programme in Medical Bioinformatics, the Swedish Foundation for Strategic Research, Clinical Gene Networks AB, and Linkoping University.

References

Theodore W. Anderson. 1984. An Introduction to Multivariate Statistical Analysis. John Wiley & Sons.

Ann Becker, Dan Geiger, and Christopher Meek. 2000. Perfect Tree-Like Markovian Distributions. In 16th Conference on UAI, pages 19-23.

Remco R. Bouckaert. 1995. Bayesian Belief Networks: From Construction to Inference. PhD Thesis, University of Utrecht.

Morten Frydenberg. 1990. Marginalization and Collapsability in Graphical Interaction Models. Annals of Statistics, 18:790-805.

Dan Geiger and Judea Pearl. 1993. Logical and Algorithmic Properties of Conditional Independence and Graphical Models. The Annals of Statistics, 21:2001-2021.

Radim Lnenicka. 2005. On Gaussian Conditional Independence Structures. Technical Report 2005/14, Academy of Sciences of the Czech Republic.

Christopher Meek. 1995. Strong Completeness and Faithfulness in Bayesian Networks. In 11th Conference on UAI, pages 411-418.

Kevin Murphy and Saira Mian. 1999. Modelling Gene Expression Data Using Dynamic Bayesian Networks. Technical Report, University of California.

Masashi Okamoto. 1973. Distinctness of the Eigenvalues of a Quadratic Form in a Multivariate Sample. Annals of Statistics, 1:763-765.

Judea Pearl. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.

Jose M. Pena, Roland Nilsson, Johan Bjorkegren, and Jesper Tegner. 2006. Identifying the Relevant Nodes Without Learning the Model. In 22nd Conference on UAI, pages 367-374.

Juliane Schafer and Korbinian Strimmer. 2005. Learning Large-Scale Graphical Gaussian Models from Genomic Data. In Science of Complex Networks: From Biology to the Internet and WWW.

Paola Sebastiani, Emanuela Gussoni, Isaac S. Kohane, and Marco F. Ramoni. 2003. Statistical Challenges in Functional Genomics (with Discussion). Statistical Science, 18:33-60.

Milan Studeny. 2005. Probabilistic Conditional Independence Structures. Springer.
Abstract
We propose a novel approach for solving continuous and hybrid Markov Decision Processes (MDPs) based on two phases. In the first phase, an initial approximate solution is obtained by partitioning the state space based on the reward function and solving the resulting discrete MDP. In the second phase, the initial abstraction is refined and improved: states with high variance in their value with respect to neighboring states are partitioned, and the MDP is solved locally to improve the policy. In our approach, the reward function and transition model are learned from a random exploration of the environment, and the method can work both with purely continuous spaces and with hybrid ones that combine continuous and discrete variables. We demonstrate the method empirically in several simulated robot navigation problems of different sizes and complexities. Our results show an approximately optimal solution with an important reduction in state-space size and solution time compared to a fine discretization of the space.
same reward value, we assign a different qualitative state value. This generates more states but at the same time creates more guidance that helps produce more adequate policies. States with similar reward are partitioned so that each q-state is a continuous region. Figure 1 shows this tree transformation in a two-dimensional domain.

Figure 1: Transformation of the reward decision tree into a Q-tree. Nodes in the tree represent continuous variables and edges evaluate whether this variable is less or greater than a particular bound.

Each branch in the Q-tree denotes a set of constraints for each partition qi. Figure 2 illustrates the constraints associated to the example presented above, and its representation in a 2-dimensional space.

Figure 2: In a Q-tree, branches are constraints and leaves are qualitative states. A graphical representation of the tree is also shown. Note that when an upper or lower variable bound is infinite, it must be understood as the upper or lower variable bound in the domain.

3.2 Qualitative MDP Model Specification

We can define a qualitative MDP as a factored MDP with a set of hybrid qualitative-discrete factors. The qualitative state space Q is an additional factor that concentrates all the continuous variables. The idea is to substitute all these variables by this abstraction to reduce the dimensionality of the state space. Initially, only the continuous variables involved in the reward function are considered but, as described in Section 4, other continuous variables can be incorporated in the refinement stage. Thus, a Qualitative MDP state is described in factored form as X = {X1, . . . , Xn, Q}, where X1, . . . , Xn are the discrete factors and Q is a factor that represents the relevant continuous dimensions.

3.3 Learning Qualitative MDPs

The Qualitative MDP model is learned from data based on a random exploration of the environment, with 10% Gaussian noise introduced on the action outcomes. We assume that the agent can explore the state space and, for each state-action pair, can receive some immediate reward. Based on this random exploration, an initial partition, Q0, of the continuous dimensions is obtained, and the reward and transition functions are induced.

Given a set of state transitions represented as a set of random variables, Oj = {Xt, A, Xt+1}, for j = 1, 2, . . . , M, for each state and action A executed by the agent, and a reward (or cost) Rj associated to each transition, we learn a qualitative factored MDP model:

1. From a set of examples {O, R}, obtain a decision tree, RDT, that predicts the reward function R in terms of the continuous and discrete state variables, X1, . . . , Xk, Q. We used J48, a Java re-implementation of C4.5 (Quinlan, 1993) in Weka, to induce a pruned decision tree.

2. Obtain from the decision tree, RDT, the set of constraints for the continuous variables relevant to determine the qualitative states (q-states) in the form of a Q-tree. In terms of the domain variables, we obtain a new variable Q representing the reward-based qualitative state space whose values are the q-states.
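Steps 1 and 2 can be illustrated with any decision-tree learner; the sketch below substitutes scikit-learn's DecisionTreeRegressor for the J48/Weka learner actually used, so it only shows how reward-based q-state constraints fall out of a fitted reward tree.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def qtree_constraints(X, R):
    # Fit a reward tree and return, for every leaf (one q-state), the
    # list of (feature, low, high) interval constraints along the path
    # from the root, mirroring the Q-tree construction of step 2.
    tree = DecisionTreeRegressor(min_samples_leaf=20).fit(X, R)
    t = tree.tree_
    qstates = []

    def walk(node, path):
        if t.children_left[node] == -1:  # leaf reached: one q-state
            qstates.append(list(path))
            return
        f, thr = t.feature[node], t.threshold[node]
        walk(t.children_left[node], path + [(f, -np.inf, thr)])
        walk(t.children_right[node], path + [(f, thr, np.inf)])

    walk(0, [])
    return qstates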
Figure 4: Abstraction and refinement process. Upper left: reward regions. Upper right: explo-
ration process. Lower left: initial qualitative states and their corresponding policies, where u=up,
d=down, r=right, and l=left. Lower right: refined partition.
dark-colored squares with negative reward. The remaining regions in the navigation area receive 0 reward (white). Experimentally, we express the size of a rewarded (non-zero reward) region as a function of the navigation area. Rewarded regions are multivalued and can be distributed randomly over the navigation area. The number of rewarded squares is also variable. Since obstacles are not considered robot states, they are not included.

The robot sensor system included the x-y position, angular orientation, and navigation bounds detection. In a set of experiments the possible actions are discrete orthogonal movements to the right, left, up, and down. Figure 4, upper left, shows an example of a navigation problem with 26 rewarded regions. The reward function can have six possible values. In this example, goals are represented as different-scale light colors. Similarly, negative rewards are represented with different-scale dark colors. The planning problem is to automatically obtain an optimal policy for the robot to achieve its goals while avoiding negatively rewarded regions.

The qualitative representation and refinement were tested with several problems of different sizes and complexities, and compared to a fine discretization of the environment in terms of precision and complexity. The precision is evaluated by comparing the policies and values per state. The policy precision is obtained by comparing the policies generated with respect to the policy obtained from a fine discretization. That is, we count the number of fine cells in which the policies are the same:

PP = (NEC/NTC) × 100,

where PP is the policy precision in percentage, NEC is the number of fine cells with the same policy, and NTC is the total number of fine cells. This measure is pessimistic because in some states it is possible for more than one action to have the same or similar value, and in this measure only one is considered correct.

The utility error is calculated as follows. The utility values of all the states in each representation are first normalized. The sum of the absolute differences of the utility values of the corresponding states is evaluated and averaged over all the differences.

Figure 4 shows the abstraction and refinement process for the motion planning problem presented above. A color inversion and gray scale format is used for clarification. The upper left figure shows the rewarded regions. The upper right figure illustrates the exploration process. The lower left figure shows the initial qualitative states and their corresponding policies. The lower right figure shows the refined partition.

Table 1 presents a comparison between the behavior of seven problems solved with a simple discretization approach and our qualitative approach. Problems are identified with a number as shown in the first column. The first five columns describe the characteristics of each problem. For example, problem 1 (first row) has 2 reward cells with values different from zero that occupy 20% of the number of cells, the number of different reward values is 3 (e.g., -10, 0 and 10), and we generated 40,000 samples to build the MDP model.

Table 2 presents a comparison between the qualitative and the refined representation. The first three columns describe the characteristics of the qualitative model and the following columns describe the characteristics with our refinement process. They are compared in terms of the utility error in %, the policy precision also in %, and the time spent in the refinement in minutes.

As can be seen from Table 1, there is a significant reduction in the complexity of the problems using our abstraction approach. This can be clearly appreciated from the number of states and processing time required to solve the problems. This is important since in complex domains, where it can be difficult to define an adequate abstraction or solve the resulting MDP problem, one option is to create abstractions and hope for suboptimal policies. To evaluate the quality of the results, Table 2 shows that the proposed abstraction produces on average only 9.17% error in the utility value when compared against the values obtained from the discretized problem. Finally, since an initial refinement can miss some relevant aspects of the domain, a refinement process may be necessary to obtain more adequate policies. Our refinement process is able to maintain or improve the utility values in all cases. The percentage of policy precision with respect to the initial qualitative model can sometimes decrease with the refinement process. This is due to our pessimistic measure.

Table 2: Comparative results between the initial abstraction and the proposed refinement process.

                 Qualitative                          Refinement
id   util. error (%)   policy precis. (%)   util. error (%)   policy precis. (%)   refin. time (min)
1    7.38              80                   7.38              80                   0.33
2    9.03              64                   9.03              64                   0.247
3    10.68             64                   9.45              74                   6.3
4    12.65             52                   8.82              54.5                 6.13
5    7.13              35                   5.79              36                   1.23
6    11.56             47.2                 11.32             46.72                10.31
7    5.78              44.78                5.45              43.89                23.34
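The two evaluation measures can be sketched as follows; the min-max normalization of the utilities is our assumption, since the paper only says the values are first normalized.

import numpy as np

def policy_precision(policy_fine, policy_abstract):
    # PP = (NEC / NTC) * 100: percentage of fine cells whose policy
    # agrees with the policy induced by the abstract model.
    a, b = np.asarray(policy_fine), np.asarray(policy_abstract)
    return 100.0 * np.mean(a == b)

def utility_error(util_fine, util_abstract):
    # Normalize both utility vectors, then average the absolute
    # differences over all corresponding states.
    a = np.asarray(util_fine, dtype=float)
    b = np.asarray(util_abstract, dtype=float)
    a = (a - a.min()) / (a.max() - a.min())
    b = (b - b.min()) / (b.max() - b.min())
    return float(np.mean(np.abs(a - b)))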
6 Conclusions and Future Work

In this paper, a new approach for solving continuous and hybrid infinite-horizon MDPs is described. In the first phase we use an exploration strategy of the environment and a machine learning approach to induce an initial state abstraction. We then follow a refinement process to improve on the utility value. Our approach creates significant reductions in space and time, allowing relatively large problems to be solved quickly. The utility values on our abstracted representation are reasonably close (less than 10% error) to those obtained using a fine discretization of the domain. A new refinement process to improve the results of the initially proposed abstraction is also presented. It always improves or at least maintains the utility values obtained from the qualitative abstraction. Although tested on small solvable problems for comparison purposes, the approach can be applied to more complex domains where a simple discretization approach is not feasible.

As future research work we would like to improve our refinement strategy to select a better segmentation of the abstract states and to use an alternative search strategy. We also plan to test our approach in other domains.

Acknowledgments

This work was supported in part by the Instituto de Investigaciones Electricas, Mexico and CONACYT Project No. 47968.

References

R.E. Bellman. 1957. Dynamic Programming. Princeton U. Press, Princeton, N.J.

C. Boutilier, T. Dean, and S. Hanks. 1999. Decision-theoretic planning: structural assumptions and computational leverage. Journal of AI Research, 11:1-94.

G. F. Cooper and E. Herskovits. 1992. A Bayesian method for the induction of probabilistic networks from data. Machine Learning.

T. Dean and R. Givan. 1997. Model minimization in Markov Decision Processes. In Proc. of the 14th National Conf. on AI, pages 106-111. AAAI.

T. Dean and K. Kanazawa. 1989. A model for reasoning about persistence and causation. Computational Intelligence, 5:142-150.

Z. Feng, R. Dearden, N. Meuleau, and R. Washington. 2004. Dynamic programming for structured continuous Markov decision problems. In Proc. of the 20th Conf. on Uncertainty in AI (UAI-2004), Banff, Canada.

C. Guestrin, M. Hauskrecht, and B. Kveton. 2004. Solving factored MDPs with continuous and discrete variables. In Twentieth Conference on Uncertainty in Artificial Intelligence (UAI 2004), Banff, Canada.

M. Hauskrecht and B. Kveton. 2003. Linear program approximation for factored continuous-state Markov decision processes. In Advances in Neural Information Processing Systems NIPS(03), pages 895-902.

B. Kuipers. 1986. Qualitative simulation. AI, 29:289-338.

M.L. Puterman. 1994. Markov Decision Processes. Wiley, New York.

J.R. Quinlan. 1993. C4.5: Programs for machine learning. Morgan Kaufmann, San Francisco, Calif., USA.
Abstract
Selecting a single model for clustering ignores the uncertainty left by finite data as to which is the correct model to describe the dataset. In fact, the fewer samples the dataset has, the higher the uncertainty is in model selection. In these cases, a Bayesian approach may be beneficial, but unfortunately this approach is usually computationally intractable and only approximations are feasible. For supervised classification problems, it has been demonstrated that model averaging calculations, under some restrictions, are feasible and efficient. In this paper, we extend the expectation model averaging (EMA) algorithm originally proposed in Santafe et al. (2006) to deal with model averaging of naive Bayes models for clustering. The extended algorithm, EMA-TAN, allows us to perform an efficient approximation for model averaging over the class of tree augmented naive Bayes (TAN) models for clustering. We also present some empirical results that show how the EMA algorithm based on TAN outperforms other clustering methods.
Table 1: Comparison between EMA-TAN models and EM-TAN, EM-BNET and EMA learned from datasets
sampled from random TAN models
J. Cerquides and R. Lopez de Mantaras. 2005. TAN classifiers based on decomposable distributions. Machine Learning, 59(3):323-354.

P. Cheeseman and J. Stutz. 1996. Bayesian classification (Autoclass): Theory and results. In Advances in Knowledge Discovery and Data Mining, pages 153-180. AAAI Press.

B. Clarke. 2003. Comparing Bayes model averaging and stacking when model approximation error cannot be ignored. Journal of Machine Learning Research, 4:683-712.

G. F. Cooper and E. Herskovits. 1992. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9:309-347.

D. Dash and G. F. Cooper. 2004. Model averaging for prediction with discrete Bayesian networks. Journal of Machine Learning Research, 5:1177-1203.

A. P. Dempster, N. M. Laird, and D. B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society B, 39:1-38.

P. Domingos. 2000. Bayesian averaging of classifiers and the overfitting problem. In Proceedings of the Seventeenth International Conference on Machine Learning, pages 223-230.

R. Duda and P. Hart. 1973. Pattern Classification and Scene Analysis. John Wiley and Sons.

N. Friedman and D. Koller. 2003. Being Bayesian about network structure: A Bayesian approach to structure discovery in Bayesian networks. Machine Learning, 50:95-126.

N. Friedman, D. Geiger, and M. Goldszmidt. 1997. Bayesian network classifiers. Machine Learning, 29:131-163.

N. Friedman. 1998. The Bayesian structural EM algorithm. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, pages 129-138.

D. Heckerman, D. Geiger, and D. M. Chickering. 1995. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20:197-243.

J. Hoeting, D. Madigan, A. E. Raftery, and C. Volinsky. 1999. Bayesian model averaging. Statistical Science, 14:382-401.

F. V. Jensen. 2001. Bayesian Networks and Decision Graphs. Springer Verlag.

D. Madigan and A. E. Raftery. 1994. Model selection and accounting for model uncertainty in graphical models using Occam's window. Journal of the American Statistical Association, 89:1535-1546.

T. Minka. 2002. Bayesian model averaging is not model combination. MIT Media Lab note.

J. M. Pena, J. A. Lozano, and P. Larranaga. 2002. Learning recursive Bayesian multinets for clustering by means of constructive induction. Machine Learning, 47(1):63-90.

G. Santafe, J. A. Lozano, and P. Larranaga. 2006. Bayesian model averaging of naive Bayes for clustering. IEEE Transactions on Systems, Man, and Cybernetics, Part B. Accepted for publication.
Abstract
In this paper we introduce the Dynamic Weighting A* (DWA*) search algorithm for solving MAP. By exploiting asymmetries in the distribution of MAP variables, the algorithm can greatly reduce the search space and yield high-quality MAP solutions.
but it is capable of handling difficult cases that exact methods cannot tackle.

3 Solving MAP using Dynamic Weighting A* Search

We propose in this section an algorithm for solving MAP using Dynamic Weighting A* search, which incorporates dynamic weighting (Pearl, 1988) in the heuristic function, relevance reasoning (Druzdzel and Suermondt, 1994), and dynamic ordering in the search tree.

The general idea of DWA* is that in each inner node of the probability tree, we can compute the value of item (a) in the function above exactly. We can estimate the heuristic value of item (b) for the MAP variables that have not been instantiated, given the initial evidence set and the already instantiated MAP variables as the new evidence. In order to fit the typical format of the cost function of A* search, we can take the logarithm of the equation above, which will not change its monotonicity. Then we get f(n) = g(n) + h(n), where g(n) corresponds to the exactly computed item (a) and h(n) to the heuristic estimate of item (b).
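The dynamically weighted cost itself did not survive the extraction; a standard statement of dynamic weighting (following Pearl, 1988), which is presumably the form the algorithm uses, is

    f(n) = g(n) + h(n) + ε [1 − d(n)/N] h(n),

where d(n) is the depth of node n, N is the total number of MAP variables, and ε > 0 controls how strongly the heuristic is inflated near the root. A minimal Python sketch of this priority computation, with g and h taken as negative log-probabilities so that smaller is better (our illustration; all names are hypothetical):

    def dwa_star_priority(g_cost, h_est, depth, n_map_vars, eps=1.0):
        """f(n) = g(n) + (1 + eps*(1 - d(n)/N)) * h(n): shallow nodes get
        their heuristic inflated by up to a factor (1 + eps); as the depth
        approaches N the weight decays to 1 and ordering approaches plain A*."""
        weight = 1.0 + eps * (1.0 - depth / n_map_vars)
        return g_cost + weight * h_est

    # Toy usage: the same heuristic estimate is trusted more at depth 9/10
    # than at depth 2/10, so the nearly complete assignment is expanded first.
    print(dwa_star_priority(0.92, 0.69, depth=2, n_map_vars=10))  # ~2.16
    print(dwa_star_priority(0.92, 0.69, depth=9, n_map_vars=10))  # ~1.68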
g_1 h_1 − g_2 h_2 > 0  ⟺  (p_1 h_1)/(p_2 h_2) > 1  ⟺  p_1/p_2 > h_2/h_1 .
0.8
0.6
It is clear that only when the ratio between
0.4 JUHHG\JXHVV the probability of suboptimal assignment and
FRQVWDQWZHLJKWLQJ
0.2 G\QDPLFZHLJKWLQJ the optimal one is larger than the ratio be-
0
RSWLPDO tween the inadmissible heuristic function and
0 5 10 15 20 25 30 35 40 the ideal one, will the algorithm find a subop-
T h e n u m b e r o f M A P v a ria b le s
timal solution. Because of large asymmetries
Figure 1: An empirical comparison of the con- among probabilities that are further amplified
stant weighting and the dynamic weighting by their multiplicative combination (Druzdzel,
heuristics based on greedy guess. 1994), we can expect that for most cases, the
ratios between p1 and p2 are far less than 1.
3.3 Searching with Inadmissible Heuristics for the MAP Problem

Since the minimum weight that can guarantee the heuristic function to be admissible is unknown before the MAP problem is solved, and it may vary between cases, we normally set the weight ε to be a safe parameter that is supposed to be larger than the required minimum (in our experiments, we set ε to be 1.0). However, if ε is accidentally smaller than the required minimum, the heuristic may become inadmissible. Even though the heuristic function will sometimes break the rule of admissibility, as long as the greedy guess is not too divergent from the ideal estimate, the algorithm will still achieve optimality. Our simulation results also confirm the robustness of the algorithm.

3.4 Improvements to the Algorithm

There are two main techniques that we used to improve the efficiency of the basic A* algorithm.
M. J. Druzdzel. 1994. Some properties of joint probability distributions. In Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence (UAI-94), pages 187–194. Morgan Kaufmann Publishers, San Francisco, California.

C. Yuan, T. Lu, and M. J. Druzdzel. 2004. Annealed MAP. In Proceedings of the 20th Annual Conference on Uncertainty in Artificial Intelligence (UAI-04), pages 628–635. AUAI Press, Arlington, Virginia.
Abstract
The paper discusses the problem of characterizing collections of conditional independence triples (independence models) that are representable by a discrete distribution. The known results are summarized, and the number of representable models over a four-element set, often mistakenly claimed to be 18300, is corrected. In the second part, bounds on the number of positively representable models over a four-element set are derived.
Independence models I1 and I2 over N will be called isomorphic if there exists a permutation of N that maps one onto the other. See (Studeny and Vejnarova, 1998), p. 5, for more details.

Footnotes: The relation A independent of B given C might be defined analogously; however, we will see that such relationships are uniquely determined by the elementary ones (|A| = |B| = 1). Of course, it is also possible to consider representability in other distribution frameworks than the discrete distributions, cf. (Lnenicka, 2005).
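As an illustration of this definition (ours, not the paper's), isomorphism of two independence models over a small ground set N can be checked by brute force over all permutations, mapping each elementary triple pointwise:

    from itertools import permutations

    # An independence model over N is a set of elementary triples (a, b, C):
    # "a independent of b given C", with a < b elements of N, C a subset of N.
    N = (1, 2, 3, 4)

    def relabel(model, perm):
        """Apply a permutation of N (given as a dict) to every triple."""
        return {(min(perm[a], perm[b]), max(perm[a], perm[b]),
                 frozenset(perm[c] for c in C)) for a, b, C in model}

    def isomorphic(m1, m2):
        """True iff some permutation of N maps model m1 onto model m2."""
        return any(relabel(m1, dict(zip(N, p))) == m2 for p in permutations(N))

    # Toy usage: {<1,2 | {}>} and {<3,4 | {}>} are isomorphic via 1->3, 2->4.
    print(isomorphic({(1, 2, frozenset())}, {(3, 4, frozenset())}))  # True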
Figure 3: Irreducible models over N = {1, 2, 3, 4}.
Figure 4: Undecided types.
Evaluating Causal Effects using Chain Event Graphs

Peter Thwaites and Jim Smith
University of Warwick Statistics Department
Coventry, United Kingdom
Abstract

The Chain Event Graph (CEG) is a coloured mixed graph used for the representation of finite discrete distributions. It can be derived from an Event Tree (ET) together with a set of equivalence statements relating to the probabilistic structure of the ET. CEGs are especially useful for representing and analysing asymmetric processes, and collections of implied conditional independence statements over a variety of functions can be read from their topology. The CEG is also a valuable framework for expressing causal hypotheses, and manipulated-probability expressions analogous to that given by Pearl in his Back Door Theorem can be derived. The expression we derive here is valid for a far larger set of interventions than can be analysed using Bayesian Networks (BNs), and also for models which have insufficient symmetry to be described adequately by a Bayesian Network.
1 Introduction

Bayesian Networks are good graphical representations for many discrete joint probability distributions. However, many asymmetric models (by which we mean models with non-symmetric sample space structures) cannot be fully described by a BN. Such processes arise frequently in, for example, biological regulation, risk analysis and Bayesian policy analysis.

In eliciting these models it is usually sensible to start with an Event Tree (Shafer, 1996), which is essentially a description of how the process unfolds rather than how the system might appear to an observer. Working with an ET can be quite cumbersome, but ETs do reflect any model asymmetry, both in model development and in model sample space structure.

The Chain Event Graph (Riccomagno & Smith, 2005; Smith & Anderson, 2006; Thwaites & Smith, 2006a) is a graphical structure designed for analysis of asymmetric systems. It retains the advantages of the ET, whilst typically having far fewer edges and vertices. Moreover, the CEG can be read for a rich collection of conditional independence properties of the model. Unlike Jaeger's very useful Probabilistic Decision Graph (2002), this includes all the properties that can be read from the equivalent BN if the CEG represents a symmetric model, and far more if the model is asymmetric.

In the next section we show how CEGs can be constructed. We then describe how we can use CEGs to analyse the effects of causal manipulation.

Bayesian Networks are often extended to apply also to a control space. When it is valid to make this extension the BN is called causal. Although there is debate (Pearl, 2000; Lauritzen, 2001; Dawid, 2002) about terminology, it is certainly the case that BNs are useful for analysing (in Pearl's notation) the effects of manipulations of the form Do X = x0 in symmetric models, where X is a variable to be manipulated and x0 the setting that this variable is to be manipulated to. This type of intervention, which might be termed atomic, is actually a rather coarse manipulation, since we would need to extend the space to make predictions of effects when X is manipulated to any value. Although there is a case for only considering such manipulations when a model is very symmetric, it is too coarse to capture many of the manipulations we might want to consider in asymmetric environments. We use this paper to show how CEGs can be used to analyse a far more refined singular manipulation in models which may have insufficient symmetry to be described adequately by a Bayesian Network.

2 CEG construction

We can produce a CEG from an Event Tree which we believe represents the model (see for example Figure 1). This ET is just a graphical description of how the process unfolds, and the set of atoms of the Event Space (or path sigma algebra) of the tree is simply the set of root-to-leaf paths within the tree. Any random variables defined on the tree are measureable with respect to this path sigma algebra.

[Figure 1: ET for machine example; root v0, non-leaf vertices v1 to v9, and edges labelled 1 to 14 with associated probabilities π.]

Example 1.
A machine in a production line utilises two replaceable components A and B. Faults in these components do not automatically cause the machine to fail, but do affect the quality of the product, so the machine incorporates an automated monitoring system, which is completely reliable for finding faults in A, but which can detect a fault in B when it is functioning correctly.

In any monitoring cycle, component A is checked first, and there are three initial possibilities: A, B checked and no faults found (λ1 on the ET in Figure 1); A checked, fault found, machine switched off (λ2); A checked, no fault found, B checked, fault found, machine switched off (λ3).

If A is found faulty it is replaced and the machine switched back on (vertex v1), and B is then checked. B is then either found not faulty (λ4), or faulty and the machine switched off (λ5).

If B is found faulty by the monitoring system, then it is removed and checked (vertices v2 and v3). There are then three possibilities, whose probabilities are independent of whether or not component A has been replaced: B is not in fact faulty, the machine is reset and restarted (λ6); B is faulty, is successfully replaced and the machine restarted (λ7); B is faulty, is replaced unsuccessfully and the machine is left off until the engineer can see it (λ8).

At the time of any monitoring cycle, the quality of the product produced (10) is unaffected by the replacement of A unless B is also replaced. It is however dependent on the effectiveness of B, which depends on its age, but also, if it is a new component, on the age of A; so:

π(good product | A and B replaced) = π12 > π(good product | only B replaced) = π14 > π(good product | B not replaced) = π10.

An ET for this set-up is given in Figure 1 and a derived CEG in Figure 2. Note that:

- The subtrees rooted in the vertices v4, v5, v6 and v8 of the ET are identical (both in physical structure and in probability distribution), so these vertices have been conjoined into the vertex (or position) w4 in the CEG.

- The subtrees rooted in v2 and v3 are not identical (as π11 differs from π13, and π12 from π14), but the edges leaving v2 and v3 carry identical probabilities. The equivalent positions in the CEG, w2 and w3, have been joined by an undirected edge.

- All leaf-vertices of the ET have been conjoined into one sink-vertex in the CEG, labelled w∞.

To complete the transformation, note that vi maps to wi for 0 ≤ i ≤ 3, v7 to w5 and v9 to w6.
[Figure 2: derived CEG for the machine example, with positions w0, w1, w2, w3, w4, w5, w6 and sink-vertex w∞; w2 and w3 are joined by an undirected edge.]
A formal description of the process is as follows: Consider the ET T = (V(T), E(T)), where each element of E(T) has an associated edge probability. Let S(T) ⊂ V(T) be the set of non-leaf vertices of the ET.

Let vi ≺ vj indicate that there is a path joining vertices vi and vj in the ET, and that vi precedes vj on this path.

Let X(v) be the sample space of X(v), the random variable associated with the vertex v. (X(v) can be thought of as the set of edges leaving v, so in our example X(v1) = {B found not faulty, B found faulty}.)

For any v ∈ S(T) and vl ∈ V(T)\S(T) such that v ≺ vl:
- label v as v^0;
- let v^{i+1} be the vertex such that v^i ≺ v^{i+1} ≺ vl and for which there is no vertex v′ such that v^i ≺ v′ ≺ v^{i+1}, for i ≥ 0;
- label vl as v^m, where the path consists of m edges of the form e(v^i, v^{i+1}).

Definition 1. For any v1, v2 ∈ S(T), v1 and v2 are termed equivalent iff there is a bijection ψ which maps the set of paths (and component edges) Λ1 = {λ1(v1, vl1) | vl1 ∈ V(T)\S(T)} onto Λ2 = {λ2(v2, vl2) | vl2 ∈ V(T)\S(T)} in such a way that:
(a) ψ(e(v1^i, v1^{i+1})) = e(ψ(v1^i), ψ(v1^{i+1})) = e(v2^i, v2^{i+1}) for 0 ≤ i ≤ m(λ);
(b) π(v1^{i+1} | v1^i) = π(v2^{i+1} | v2^i),
where v1^{i+1} and v2^{i+1} label the same value on the sample spaces X(v1^i) and X(v2^i) for i ≥ 0.
The set of equivalence classes induced by the bijection is denoted K(T), and the elements of K(T) are called positions.

Definition 2. For any v1, v2 ∈ S(T), v1 and v2 are termed stage-equivalent iff there is a bijection φ which maps the set of edges E1 = {e1(v1, v1′) | v1′ ∈ X(v1)} onto E2 = {e2(v2, v2′) | v2′ ∈ X(v2)} in such a way that
π(v1′ | v1) = π(φ(v1′) | φ(v1)) = π(v2′ | v2),
where v1′ and v2′ label the same value on the sample spaces X(v1) and X(v2).
The set of equivalence classes induced by the bijection is denoted L(T), and the elements of L(T) are called stages.

A CEG C(T) of our model is constructed as follows:
(1) V(C(T)) = K(T) ∪ {w∞}.
(2) Each w, w′ ∈ K(T) will correspond to a set of v, v′ ∈ S(T). If, for such v, v′, there exists a directed edge e(v, v′) ∈ E(T), then there exists a directed edge e(w, w′) ∈ E(C(T)).
(3) If there exists an edge e(v, vl) ∈ E(T) such that v ∈ S(T) ...
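To make the notion of positions concrete, here is a minimal Python sketch (our illustration, not the authors' construction) that groups the non-leaf vertices of a small probability-labelled event tree by recursively comparing their rooted subtrees, which is exactly the merging that produces CEG positions:

    # Each internal vertex maps to its outgoing edges: (value, probability, child).
    # Leaves have no entry. In this toy tree, v1 and v2 root identical subtrees.
    tree = {
        "v0": [("a", 0.5, "v1"), ("b", 0.5, "v2")],
        "v1": [("ok", 0.9, "l1"), ("bad", 0.1, "l2")],
        "v2": [("ok", 0.9, "l3"), ("bad", 0.1, "l4")],
    }

    def signature(v):
        """Canonical form of the subtree rooted at v: all leaves collapse to
        one symbol; an internal vertex becomes the sorted tuple of its
        labelled, probability-weighted child subtree signatures."""
        if v not in tree:                  # leaf: every leaf is equivalent
            return "leaf"
        return tuple(sorted((val, p, signature(ch)) for val, p, ch in tree[v]))

    def positions():
        """Group non-leaf vertices with equal subtree signatures."""
        groups = {}
        for v in tree:
            groups.setdefault(signature(v), []).append(v)
        return list(groups.values())

    print(positions())   # [['v0'], ['v1', 'v2']]: v1, v2 merge into one position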
Abstract
It is common knowledge that the EM algorithm can be trapped at local maxima and
consequently fails to reach global maxima. We empirically investigate the severity of this
problem in the context of hierarchical latent class (HLC) models. Our experiments were
run on HLC models where dependency between neighboring variables is strong. (The
reason for focusing on this class of models will be made clear in the main text.) We
first ran EM from randomly generated single starting points, and observed that (1) the
probability of hitting global maxima is generally high, (2) it increases with the strength of
dependency and sample sizes, and (3) it decreases with the amount of extreme probability
values. We also observed that, at high dependence strength levels, local maxima are far apart from global ones in terms of likelihood. These findings imply that local maxima can be reliably avoided by running EM from a few starting points and hence are not a serious issue. This is confirmed by our second set of experiments.
[Figure 2: The structure of the generative models, with latent variables X1 to X4 and manifest variables Y1 to Y9.]

We examined 25 models in total. From each of the 25 models, four training sets of 100, 500, 1000, and 5000 records were sampled. On each training set, we ran EM independently 100 times, each time starting from a randomly selected initial point. The stopping threshold of EM was set at 0.0001. The highest loglikelihood obtained in the 100 runs is regarded as the global maximum.

3.2 Results

The results are summarized in Figures 3 and 4. In Figure 3, there is a plot for each combination of DS level (row) and sample size (column). As mentioned earlier, 5 models were created for each DS level. The plot is for one of those 5 models. In the plot, there are three curves: solid, dashed, and dotted². The dashed and dotted curves are for Section 4. The solid curve is for this section. The curve depicts a distribution function F(x), where x is loglikelihood and F(x) is the percent of EM runs that achieved a loglikelihood no larger than x.

While a plot in Figure 3 is about one model at a DS level, a plot in Figure 4 represents an aggregation of results about all 5 models at a DS level. Loglikelihoods for different models can be in very different ranges. To aggregate them in a meaningful way, we introduce the concept of the relative likelihood shortfall of an EM run.

Consider a particular model. We have run EM 100 times and hence obtained 100 loglikelihoods. The maximum is regarded as the optimum and is denoted by l*. Suppose a particular EM run resulted in loglikelihood l. Then the relative likelihood shortfall of that EM run is defined as (l − l*)/l*. This value is nonnegative. The smaller it is, the higher the quality of the parameters produced by the EM run. In particular, the relative likelihood shortfall of the run that produced the global maximum l* is 0.

For a given DS level, there are 5 models and hence 500 EM runs. We put the relative likelihood shortfalls of all those EM runs into one set and let, for any nonnegative real number x, F(x) be the percent of the elements in the set that are no larger than x. We call F(x) the distribution function of (aggregated) relative likelihood shortfalls of EM runs, or simply the distribution of EM relative likelihood shortfall, for the DS level. The first 5 plots in Figure 4 depict the distributions of EM relative likelihood shortfall for the 5 DS levels. There are four curves in each plot, each corresponding to a sample size³.

3.2.1 Probability of Hitting Global Maxima

The most interesting question is how often EM hits global maxima. To answer this question, we first look at the solid curves in Figure 3. Most of them are stair-shaped. In each curve, the x-position of the rightmost stair is the global maximum, and the height of that stair is the frequency of hitting the global maximum.

We see that the frequency of hitting global maxima was generally high for high DS levels. In particular, for DS level 3 or above, EM reached global maxima more than half of the time. For sample size 500 or above, the frequency was even greater than 0.7. On the other hand, the frequency was low for DS level 1, especially when the sample size was small.

The first 5 plots in Figure 4 tell the same story. In those plots, the global maxima are represented by x = 0. The heights of the curves at x = 0 are the frequencies of hitting global maxima. We see that for DS level 3 or above, the frequency of hitting global maxima is larger than 0.5, except for DS level 3 and sample size 100. And once again, the frequency was low for DS level 1.

² Note that in some plots (e.g., that for DS level 4 and sample size 5000) the curves overlap and are indistinguishable.
³ Note that in Figure 4 (b) and (c) some curves are close to the y-axes and are hardly distinguishable.
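Restated as a short sketch (ours, not the authors' code), the shortfalls and their aggregated empirical distribution function can be computed as:

    def relative_shortfalls(loglikelihoods):
        """Relative likelihood shortfall of each EM run: (l - l*) / l*, where
        l* is the best loglikelihood over the runs. Loglikelihoods are
        negative, so every shortfall is nonnegative and the best run scores 0."""
        l_star = max(loglikelihoods)
        return [(l - l_star) / l_star for l in loglikelihoods]

    def F(x, shortfalls):
        """Empirical distribution: fraction of runs with shortfall <= x."""
        return sum(s <= x for s in shortfalls) / len(shortfalls)

    # Toy usage with three EM runs on one model:
    runs = [-925.0, -915.0, -910.0]
    s = relative_shortfalls(runs)      # [0.0165, 0.0055, 0.0]
    print(F(0.01, s))                  # 2/3 of runs within 1% of the optimum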
Figure 3: Distributions of loglikelihoods obtained by different EM runs. Solid curves are for EM runs with single starting points. Dashed and dotted curves are for runs of multiple restart EM with settings 4×10 and 16×50 respectively (see Section 4). Each curve depicts a distribution function F(x), where x is loglikelihood and F(x) is the percent of runs that achieved a loglikelihood no larger than x.
[Figure 4: Distributions of EM relative likelihood shortfall; panels include (d) DS level 4, (e) DS level 5, and (f) DS level 3 with 15% zero components.]
an optimal starting point by chance. Moreover, due to the small discrepancy among local maxima (see Section 3.2.5), it demands a large number of initial steps to distinguish an optimal starting point from the others. Therefore, the effectiveness of multiple restart EM degenerates. In such situations, one can either increase the number of starting points and initial steps, or appeal to more sophisticated methods for avoiding local maxima.

5 Conclusions

We have empirically investigated the severity of local maxima for EM in the context of strong dependence HLC models. We have observed that (1) the probability of hitting global maxima is generally high, (2) it increases with the strength of dependency and sample sizes, (3) it decreases with the amount of extreme probability values, and (4) likelihoods of local maxima are far apart from those of global maxima at high dependence strength levels. We have also empirically shown that local maxima can be reliably avoided by using multiple restart EM with a few starting points and hence are not a serious issue.
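The remedy the authors recommend, multiple restart EM, is easy to state in code. A self-contained sketch of ours (a two-component 1-D Gaussian mixture stands in for an HLC model): run EM from several random starting points and keep the run with the best loglikelihood.

    import math, random

    def em_gmm(data, iters=200, tol=1e-4, rng=random):
        """EM for a two-component 1-D Gaussian mixture from a random start;
        returns (loglikelihood, parameters)."""
        w, mu1, mu2 = 0.5, rng.choice(data), rng.choice(data)
        s1 = s2 = max(1e-3, (max(data) - min(data)) / 4)
        prev_ll = -math.inf
        for _ in range(iters):
            # E-step: responsibilities of component 1 and the loglikelihood.
            r, ll = [], 0.0
            for x in data:
                p1 = w * math.exp(-0.5 * ((x - mu1) / s1) ** 2) / (s1 * math.sqrt(2 * math.pi))
                p2 = (1 - w) * math.exp(-0.5 * ((x - mu2) / s2) ** 2) / (s2 * math.sqrt(2 * math.pi))
                r.append(p1 / (p1 + p2))
                ll += math.log(p1 + p2)
            if abs(ll - prev_ll) < tol:
                prev_ll = ll
                break
            prev_ll = ll
            # M-step: re-estimate weight, means and standard deviations.
            n1 = sum(r)
            w = n1 / len(data)
            mu1 = sum(ri * x for ri, x in zip(r, data)) / n1
            mu2 = sum((1 - ri) * x for ri, x in zip(r, data)) / (len(data) - n1)
            s1 = max(1e-3, math.sqrt(sum(ri * (x - mu1) ** 2 for ri, x in zip(r, data)) / n1))
            s2 = max(1e-3, math.sqrt(sum((1 - ri) * (x - mu2) ** 2 for ri, x in zip(r, data)) / (len(data) - n1)))
        return prev_ll, (w, mu1, s1, mu2, s2)

    def multiple_restart_em(data, n_starts=8):
        """Run EM from several random starting points; keep the best run."""
        return max(em_gmm(data) for _ in range(n_starts))

    random.seed(0)
    data = [random.gauss(0, 1) for _ in range(150)] + [random.gauss(5, 1) for _ in range(150)]
    best_ll, params = multiple_restart_em(data)
    print(round(best_ll, 1), [round(p, 2) for p in params])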
Our discussion has been restricted to a specific class of HLC models. In particular, we have defined strong dependency in an operational way. One can devise more formal definitions and carry out similar studies for generalized strong dependence models. Another piece of future work would be theoretical exploration to support our experiences.

Based on our observations, we have analyzed the performance of multiple restart EM. One can exploit those observations to analyze more sophisticated strategies for avoiding local maxima. We believe that those observations can also give some inspiration for developing new methods in this direction.

Acknowledgments

Research on this work was supported by Hong Kong Grants Council Grant #622105.
[Pages 310–316 (Y. Xiang): the body text of this paper is unrecoverable due to font-encoding damage in the extraction. The legible fragments (node labels such as d_a, d_b, d_c, d_d, d_12v, d_memory, m_g, m_h, m_i, m_j, t_p, t_q, t_temperature, t_humidity, u_v, u_w, and figure references to a design network, a division tree, separators, and rounds of message passing) suggest a paper on evaluating designs in a design network compiled into a tree of divisions for distributed computation.]
Hybrid Loopy Belief Propagation
Changhe Yuan
Department of Computer Science and Engineering
Mississippi State University
Mississippi State, MS 39762
cyuan@cse.msstate.edu

Marek J. Druzdzel
Decision Systems Laboratory
School of Information Sciences
University of Pittsburgh
Pittsburgh, PA 15260
marek@sis.pitt.edu
Abstract
We propose an algorithm called Hybrid Loopy Belief Propagation (HLBP), which extends
the Loopy Belief Propagation (LBP) (Murphy et al., 1999) and Nonparametric Belief
Propagation (NBP) (Sudderth et al., 2003) algorithms to deal with general hybrid Bayesian
networks. The main idea is to represent the LBP messages with mixtures of Gaussians and
formulate their calculation as Monte Carlo integration problems. The new algorithm is
general enough to deal with hybrid models that may represent linear or nonlinear equations
and arbitrary probability distributions.
and the message π_{Y_j}^{(t+1)}(x) that X sends to its child Y_j is defined as:

    π_{Y_j}^{(t+1)}(x) = λ_X(x) ∏_{k≠j} λ_{Y_k}^{(t)}(x) ∫_u P(x|u) ∏_k π_X^{(t)}(u_k) du ,    (2)

where λ_X(x) is a message that X sends to itself representing evidence. For simplicity, I use only integrations in the message definitions. It is evident that no closed-form solutions exist for the messages in general hybrid models. However, we observe that their calculations can be formulated as Monte Carlo integration problems.

First, let us look at the π_{Y_j}^{(t+1)}(x) message defined in Equation 2. We can rearrange the equation in the following way:

    π_{Y_j}^{(t+1)}(x) = ∫_u λ_X(x) ∏_{k≠j} λ_{Y_k}^{(t)}(x) [ P(x|u) ∏_k π_X^{(t)}(u_k) ] du ,

where the bracketed factor is the joint distribution P(x, u). Essentially, we put all the integrations outside of the other operations. Given the new formula, we realize that we have a joint probability distribution over x and the u_i's, and the task is to integrate all the u_i's out and get the marginal probability distribution over x. Since P(x, u) can be naturally decomposed into P(x|u)P(u), the calculation of the message can be solved using a Monte Carlo sampling technique called the composition method (Tanner, 1993). The idea is to first draw samples for each u_i from π_X^{(t)}(u_i), and then sample from the product of λ_X(x) ∏_{k≠j} λ_{Y_k}^{(t)}(x) P(x|u). We will discuss how to take the product in the next subsection. For now, let us assume that the computation is possible. To make life even easier, we make further ...

Now, for the messages sent from X to its different children, we can share most of the calculation. We first get samples for the u_i's, and then sample from the product of λ_X(x) ∏_k λ_{Y_k}^{(t)}(x) P(x|u). For each different message π_{Y_j}^{(t+1)}(x), we use the same sample x but assign it different weights 1/λ_{Y_j}^{(t)}(x).

Let us now consider how to calculate the λ_X^{(t+1)}(u_i) message defined in Equation 1. First, we rearrange the equation analogously:

    λ_X^{(t+1)}(u_i) = ∫_{x, u_k: k≠i} λ_X(x) ∏_j λ_{Y_j}^{(t)}(x) [ P(x|u) ∏_{k≠i} π_X^{(t)}(u_k) ] ,    (3)

where the bracketed factor is P(x, u_k : k ≠ i | u_i). It turns out that here we are facing a quite different problem from the calculation of π_{Y_j}^{(t+1)}(x). Note that now we have P(x, u_k : k ≠ i | u_i), a joint distribution over x and the u_k (k ≠ i) conditional on u_i, so the whole expression is only a likelihood function of u_i, which is not guaranteed to be integrable. As in (Sudderth et al., 2003; Koller et al., 1999), we choose to restrict our attention to densities and assume that the ranges of all continuous variables are bounded (maybe large). The assumption only solves part of the problem. Another difficulty is that the composition method is no longer applicable here, because we have to draw samples for all parents u_i before we can decide P(x|u). We note that for any fixed u_i, Equation 3 is an integration over x and u_k, k ≠ i, and can be estimated using Monte Carlo methods. That means that we can estimate λ_X^{(t+1)}(u_i) up to a constant for any value u_i, although we do not know how to sample from it. This is exactly the time when importance sampling becomes handy. We can estimate the message by drawing a set of importance samples as follows: sample u_i from a ...
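A sketch of the composition step just described (ours; one-parent case, all names hypothetical): draw the parent from its incoming π message, draw x from P(x|u), and let the remaining λ factors become sample weights.

    import math, random

    def sample_pi_message(draw_parent, conditional, lambda_msgs, n=1000):
        """Composition method for a message of the form of Equation 2:
        draw u ~ pi(u), then x ~ P(x|u); the remaining lambda factors are
        evaluated at x and become weights, so the outgoing message is
        represented by the weighted sample {(x_i, w_i)}."""
        samples = []
        for _ in range(n):
            u = draw_parent()          # u ~ pi_X(u)
            x = conditional(u)         # x ~ P(x | u)
            w = 1.0
            for lam in lambda_msgs:    # evidence flowing up from other children
                w *= lam(x)
            samples.append((x, w))
        return samples

    # Toy model: X = U + noise; one lambda message favouring x near 2.
    random.seed(1)
    msg = sample_pi_message(
        draw_parent=lambda: random.gauss(0.0, 1.0),
        conditional=lambda u: u + random.gauss(0.0, 0.5),
        lambda_msgs=[lambda x: math.exp(-0.5 * (x - 2.0) ** 2)])
    mean = sum(x * w for x, w in msg) / sum(w for _, w in msg)
    print(round(mean, 2))   # weighted mean is pulled toward the evidence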
... a joint probability distribution of all the labels P(L_1, L_2, ..., L_K). We note that the joint probability distribution of the labels of all the messages P(L_1, L_2, ..., L_K) can be factorized using the chain rule as follows:

    P(L_1, L_2, ..., L_K) = ∏_{i=1}^{K} P(L_i | L_1, ..., L_{i-1}) .

[Algorithm residue: for each message i, calculate the normalized weights w_i^k; randomly pick a component, say j_i, from the i-th message using the normalized weights; set (μ_i, σ_i) to (μ_{i j_i}, σ_{i j_i}); increment i; and accumulate w_Importance = w_Importance · w_{i j_i}.]
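The garbled box above evidently picks one Gaussian component per incoming message in sequence; a sketch of that sequential scheme for 1-D Gaussian-mixture messages (our reconstruction, not the authors' code; the importance-weight bookkeeping in the residue is omitted for brevity):

    import math, random

    def gauss_pdf(x, mu, var):
        return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

    def pick_components(mixtures, rng=random):
        """Choose one component per mixture in sequence. The running product
        of the chosen Gaussians is itself a Gaussian (mu, var); component j
        of the next mixture is scored by its weight times the overlap
        integral N(mu_j; mu, var + var_j), and one component is drawn from
        the normalized scores."""
        mu = var = None
        chosen = []
        for mix in mixtures:               # mix: list of (weight, mean, variance)
            if mu is None:
                scores = [w for w, _, _ in mix]
            else:
                scores = [w * gauss_pdf(m, mu, var + v) for w, m, v in mix]
            j = rng.choices(range(len(mix)), weights=scores)[0]
            _, m, v = mix[j]
            if mu is None:
                mu, var = m, v
            else:                          # product of two Gaussian densities
                new_var = 1.0 / (1.0 / var + 1.0 / v)
                mu, var = new_var * (mu / var + m / v), new_var
            chosen.append(j)
        return chosen, (mu, var)

    # Toy usage: two 2-component mixtures with components near 0 and near 3;
    # the picked components usually agree (both near 0 or both near 3).
    random.seed(2)
    mixes = [[(0.5, 0.0, 1.0), (0.5, 3.0, 1.0)],
             [(0.5, 0.1, 1.0), (0.5, 2.9, 1.0)]]
    print(pick_components(mixes))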
Figure 4: (a) Emission network (without dashed nodes) and augmented Emission network (with dashed nodes). (b,c) The influence of the number of message samples and the number of message integration samples on the precision of HLBP on the augmented Emission network, with CO2Sensor and DustSensor both observed to be true and Penetrability set to 0.5. (d) Posterior probability distribution of DustEmission when observing CO2Emission to be 1.3, Penetrability to be 0.5, and WasteType to be 1 in the Emission network. (e,f,g,h) Results of HLBP and Lazy HLBP: (e) error on Emission, (f) running time on Emission, (g) error on augmented Emission, (h) running time on augmented Emission.
As the number of message samples increases, the difference becomes negligible. Since the importance sampler is much more efficient, we use it in all our other experiments.

6.2 Results on Emission Networks

We note that mean and variance alone provide only limited information about the actual posterior probability distributions. Figure 4(d) shows the posterior probability distribution of node DustEmission when observing CO2Emission at 1.3, Penetrability at 0.5, and WasteType at 1. We also plot in the same figure the corresponding normal approximation with mean 3.77 and variance 1.74. We see that the normal approximation does not reflect the true posterior. While the actual posterior distribution has a multimodal shape, the normal approximation does not tell us where the mass really is. We also report the estimated posterior probability distribution of DustEmission by HLBP. HLBP seemed able to estimate the shape of the actual distribution very accurately.

In Figures 4(e,f), we plot the error curves of HLBP and Lazy HLBP (HLBP enhanced by Lazy LBP) as a function of the propagation length. We can see that HLBP needs only several steps to converge. Furthermore, HLBP achieves better precision than its lazy version, but Lazy HLBP is much more efficient than HLBP. Theoretically, Lazy LBP should not affect the results of HLBP but only improve its efficiency if the messages are calculated exactly. However, we use importance sampling to estimate the messages. Since we use the messages from the last iteration as the importance functions, iterations will help improve the functions.

We also tested the HLBP algorithm on the augmented Emission network (Lerner et al., 2001) with CO2Sensor and DustSensor both observed to be true and Penetrability set to 0.5, and report the results in Figures 4(g,h). We again observed that not many iterations are needed for HLBP to converge. In this case, Lazy HLBP provides comparable results while improving the efficiency of HLBP.
Abstract
One practical problem with building large scale Bayesian network models is the exponential growth of the number of numerical parameters in conditional probability tables. Obtaining a large number of probabilities from domain experts is too expensive and too time-demanding in practice. A widely accepted solution to this problem is the assumption of independence of causal influences (ICI), which allows for parametric models that define conditional probability distributions using only a number of parameters that is linear in the number of causes. ICI models, such as the noisy-OR and the noisy-AND gates, have been widely used by practitioners. In this paper we propose PICI, probabilistic ICI, an extension of the ICI assumption that leads to more expressive parametric models. We provide examples of three PICI models and demonstrate how they can cope with a combination of positive and negative influences, something that is hard for the noisy-OR and noisy-AND gates.
In this section we briefly introduce the concept of independence of causal influences (ICI). First ... The function f that combines the individual influences is the deterministic OR. It is important ...
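For contrast with the probabilistic combination functions proposed in the paper, here is the deterministic-OR case in miniature (our example, not the paper's): the noisy-OR gate, where each present cause activates its mechanism independently and the mechanisms are combined by deterministic OR.

    def noisy_or(p, present):
        """P(Y = true | causes): each present cause i fires its mechanism
        independently with probability p[i]; Y is the OR of the mechanisms,
        so Y is false only if every active mechanism fails."""
        q = 1.0
        for pi, on in zip(p, present):
            if on:
                q *= 1.0 - pi
        return 1.0 - q

    # Three causes with activation probabilities 0.9, 0.6, 0.3:
    print(noisy_or([0.9, 0.6, 0.3], [True, True, False]))  # 1 - 0.1*0.4 = 0.96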
= (1/2) I(y = a) + (1/2) I(y = b) .

The fact that the combination function is decomposable may be easily exploited by inference algorithms. Additionally, we showed that this model presents benefits for learning from small data sets (Zagorecki et al., 2006).

Theoretically, this model is amechanistic, because it is possible to obtain the parameters of this model (probability distributions over mechanism variables) by asking an expert only for probabilities of the form P(Y|X). For example, assuming the variables in the model are binary, we have 2n parameters in the model. It would be enough to select 2n arbitrary probabilities P(Y|X) out of the 2^n available and create a set of 2n linear equations by applying Equation 4. In practice, though, one needs to ensure that the set of equations has exactly one solution, which in the general case is not guaranteed.

As an example, let us assume that we want to model classification of a threat at a military checkpoint. There is an expected terrorist threat at that location and there are particular elements of behavior that can help spot a terrorist. We can expect that a terrorist may approach the checkpoint in a large vehicle, being the only ...

7 CONCLUSIONS

In this paper, we formally introduced a new class of models for local probability distributions that we call PICI. The new class is an extension of the widely accepted concept of independence of causal influences. The basic idea is to relax the assumption that the combination function should be deterministic. We claim that such an assumption is necessary neither for clarity of the models and their parameters, nor for other aspects such as convenient decompositions of the combination function that can be exploited by inference algorithms.

To support our claim, we presented two conceptually distinct models for local probability distributions that address different limitations of existing models based on ICI. These models have clear parameterizations that facilitate their use by human experts. The proposed models can be directly exploited by inference algorithms due to the fact that they can be explicitly represented by means of a BN, and their combination function can be decomposed into a chain of binary relationships. This property has been recognized to provide significant inference speed-ups for ICI models. Finally, because ...