International Conference on Electrical Engineering and Computer Science (ICEECS-2012), May 12, 2012,
Trivandrum, India, ISBN Number: 978-93-81693-58-2

A Method for Inferring the Structure of Bayesian Networks in
Continuous Systems Using Copulas


Hari M. Koduvely
Center for Knowledge Driven Intelligent Systems
Infosys Labs, Infosys
Bangalore, INDIA
harimanassery_k@infosys.com
Madhu Gopinathan
vMobo Inc.
Bangalore, INDIA
madhu@vmobo.com


Abstract: We describe a new method to discover the structure of Bayesian networks from continuous data without discretization. Our method makes use of mathematical functions called copulas, and their empirical estimation using Bernstein polynomials, to find conditional independence between variables in the data set. Once the conditional independence relations between all the variables are obtained using the copula method, we use the Inductive Causation (IC) algorithm proposed by Judea Pearl to infer the structure of the Bayesian network. We have applied our method to a supply chain performance management system and shown that the copula method, combined with the IC algorithm, produces better results than applying the IC algorithm to discretized data.

Keywords: Graphical Models; Bayesian Networks; Copula; Causal Models
I. INTRODUCTION
Many complex systems such as gene regulatory networks
and supply chain networks contain highly interacting sub-
components. One important aspect of understanding and managing such complex systems is the cause-effect relationships between these components. For example, in a supply chain management system, one would be interested in monitoring and controlling some Key Performance Indicators (KPIs) to improve the overall efficiency of the supply chain. To change the values of high-level business metrics such as total revenue, one would typically have to change operational-level metrics in the system, such as the number of suppliers in a given category. Therefore, for managing a supply chain network it is essential to understand how the different KPIs are causally related to each other.
To understand the cause-effect relationship between
different variables, it is not enough to estimate the statistical
correlations between them. Two variables could be
correlated just because they both are influenced by a third
variable which is acting as a common cause. It is hard to
eliminate such spurious correlations using conventional
statistical methods. In the last two decades, two main frameworks have been developed for modeling cause-effect relationships: Bayesian Networks and Structural Equation Modeling [1].
Bayesian Networks, or Belief Networks, which belong to the class of graphical models, are a powerful and intuitive way to represent causal relations between variables in a complex interacting system [1]. A Bayesian Network is a directed acyclic graph (DAG) encoding an N-dimensional probability distribution over N variables, which are the nodes in the network, with the edges representing the direct dependencies between them. Bayesian Networks have been used in several practical applications such as diagnosis of diseases in medicine, root cause determination, bioinformatics, object tracking in computer vision, and sensor networks [2, 3].

The first step in building a Bayesian Network model is to
determine the structure of the network which represents the
cause-effect relationship between different variables. The
next step is to determine the conditional probability
distributions, which encode the strength of these cause-effect
relationships, from data. Once the structure of the network and the conditional distributions are known, the Bayesian network can be used for inference and decision making. Learning the structure of the network from data is more difficult than learning the probability distributions or performing inference; it is in general an NP-hard problem [4]. Many issues related to scalability, accuracy, and the discrete or continuous nature of the variables are still not completely solved. Therefore, in many cases, the structure is constructed manually with the help of a domain expert rather than by using algorithms.
One particular issue associated with discovering the
network structure is that many of the current algorithms
require the variables to be discrete in nature. This is because
in the case of discrete variables, there are well known and
easily implementable methods to determine the conditional
independence between variables. Continuous variables are dealt with either by discretizing them or by modeling them using known parametric distributions [5]. Both discretization of continuous variables and modeling them with parametric distributions introduce approximations and hence reduce accuracy. The error involved in discretizing continuous variables can in general be reduced by increasing the number of states. However, this has two problems. Firstly, the memory requirements go up, since the total number of possible states in the system increases exponentially. Secondly, more data is required. This is
because, for each state of a variable, some minimum amount of data is required to accurately compute the conditional distributions or scores associated with the network. Another approach is to use non-parametric methods; this is the approach used by Hofmann and Tresp for continuous variables [9].
In this paper, we describe a new method to discover the structure of Bayesian networks for continuous variables without the need to discretize them. We make use of mathematical functions called copulas and their empirical estimation methods, along with the Inductive Causation algorithm of Judea Pearl, for discovering the Bayesian network structure. Copulas are used in mathematics and statistics to represent multivariate distributions, but their application in the machine learning community is rare. The objective of this work is to introduce the method of copulas into classical machine learning problems such as discovery of the structure of Bayesian Networks, and to compare its performance with methods involving discretization of continuous data on a problem of practical interest. To our knowledge, there is no prior work in which copulas have been used for inferring the structure of Bayesian networks from continuous data.
The rest of the paper is organized as follows. In section 2, we briefly review Bayesian Networks and the Inductive Causation (IC) algorithm, which is used for discovering the network structure from data. In section 3, we introduce the concept of copulas and show how to estimate the local conditional dependencies using them. In section 4, we use the method of copulas along with the IC algorithm to discover the structure of the Bayesian network in a supply chain performance management system, and compare our method with the discretization-based approach using its implementation in the open source machine learning software Weka. Section 5 summarizes the work and outlines directions for future work.
II. BAYESIAN NETWORKS AND INDUCTIVE
CAUSATION ALGORITHM
A. Bayesian Networks
A Bayesian Network consists of N variables {X_1, X_2, ..., X_N}, which form the set of nodes V, and a set E of directed edges between these nodes representing the cause-effect relationships between them. The set (V, E) forms a Directed Acyclic Graph (DAG). The Bayesian Network is a formal representation of the knowledge of conditional dependencies between the variables in V.
If P(X_1, X_2, ..., X_N) denotes the joint distribution of V, then the Bayesian Network represents the factorization of P into conditional probabilities given by

P(X_1, X_2, \dots, X_N) = \prod_{i=1}^{N} P(X_i \mid P_i)    (1)

Here P_i denotes the set of parents of variable X_i (i.e., there is a directed edge from each node in P_i to X_i). Since Bayesian Networks contain directed edges, they can represent cause-effect relationships between variables rather than merely statistical correlations. An example of a Bayesian Network and the factorization of the corresponding joint distribution is shown in Fig. 1.

Figure 1. An example Bayesian network over variables X_1, ..., X_5 and the factorization of the corresponding joint probability distribution: P(X_1, X_2, X_3, X_4, X_5) = P(X_5 | X_4) P(X_4 | X_2, X_3) P(X_2 | X_1) P(X_3 | X_1) P(X_1).
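As a minimal illustration of the factorization in Eq. (1), the following Python sketch evaluates the joint probability of the network in Fig. 1 from its conditional probability tables; all probability values are hypothetical and chosen only for illustration.

# Minimal sketch of the factorization in Eq. (1) for the network of Fig. 1.
# All probability values are hypothetical, for illustration only.

# Parents of each binary variable X1..X5, matching Fig. 1.
parents = {"X1": [], "X2": ["X1"], "X3": ["X1"],
           "X4": ["X2", "X3"], "X5": ["X4"]}

# P(Xi = 1 | parent assignment), keyed by the tuple of parent values.
cpt = {
    "X1": {(): 0.3},
    "X2": {(0,): 0.2, (1,): 0.7},
    "X3": {(0,): 0.4, (1,): 0.6},
    "X4": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.9},
    "X5": {(0,): 0.2, (1,): 0.8},
}

def joint_probability(assignment):
    # P(X1,...,X5) as the product of P(Xi | parents(Xi)), as in Eq. (1).
    prob = 1.0
    for var, pa in parents.items():
        p_one = cpt[var][tuple(assignment[p] for p in pa)]
        prob *= p_one if assignment[var] == 1 else 1.0 - p_one
    return prob

print(joint_probability({"X1": 1, "X2": 1, "X3": 0, "X4": 1, "X5": 1}))
# = P(X5=1|X4=1) P(X4=1|X2=1,X3=0) P(X2=1|X1=1) P(X3=0|X1=1) P(X1=1)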

B. Learning the Structure of Bayesian Networks from Data
There are two approaches for learning the structure of
Bayesian Networks from data. The first approach, known as the Score + Search method, performs a heuristic search in the space of all possible networks and selects the network which gives the maximum score for the given data. There is neither a guarantee that the solution found by this method is globally optimal, nor an accurate estimate of how close the generated solution is to the optimal one. The second is a constraint-based learning approach in which the optimal structure is found by estimating the conditional dependencies between all the variables. Since a Bayesian Network, by design, encodes the conditional dependencies between variables, this is a more natural approach for discovering the structure. The Inductive Causation (IC) algorithm proposed by Judea Pearl is one of the commonly used algorithms of this type [1]. One main drawback of the constraint-based method is that it is very sensitive to the accuracy of the conditional dependence estimates obtained from data.
This becomes particularly significant when the
variables involved are continuous in nature. For example,
consider the case of testing the dependency between two

continuous variables Y and Z conditioned on X. If X is a
categorical variable, and if there are sufficient observations
per category it is possible to design tests for dependency
between Y and Z for each category and later combine the
results. On the other hand, if X, Y and Z are continuous, it can be shown that the hypothesis that Y is independent of Z given X is not testable against any alternative [6]. One simple explanation for this is that an i.i.d. sample (X, Y, Z) has, with probability one, at most one observed pair (Y, Z) for a given X. Some methods have been suggested in the literature for testing the conditional independence of continuous variables; these are based on the partial correlation coefficient and Kendall's partial tau. However, these methods are applicable only when certain restrictive conditions are met [7]. Therefore, in general, estimating conditional independence for continuous variables from data is a hard problem.
From the applications perspective, this is a very
serious limitation. For example, consider the case of using
Bayesian Networks in a Performance Management System
(PMS). Typically in a PMS, there would be several KPIs
(Key Performance Indicators) which are used as measures
of performance of various entities. These KPIs are typically continuous variables and are also aggregated measures (averages over a day, week, month, etc.). Aggregation reduces the number of available records. Therefore, the aggregated and continuous nature of KPIs poses challenges for discovering cause-effect relationships between them using Bayesian Networks.

C. Inductive Causation Algorithm

Now we describe the Inductive Causation (IC) algorithm used for discovering the structure of Bayesian Networks from data [1]. The main steps of the IC algorithm are:
i. Start with a set of vertices V.
ii. For a pair of variables a and b in V, search for a set S_ab such that a and b are conditionally independent given S_ab.
iii. Connect a and b with an undirected edge if no such set S_ab is found.
iv. Repeat this for all pairs of variables.
v. In the resulting undirected graph, for each pair of non-adjacent variables a and b connected to a common variable c through undirected edges, check whether c is in S_ab. If it is, do not orient the edges. If it is not, add arrows from a to c and from b to c.
vi. In the partially directed graph that results, orient as many of the undirected edges as possible, subject to two conditions:
a. The orientation should not create a new v-structure (a → c ← b).
b. The orientation should not create a directed cycle.
For the last step (vi), Pearl suggests four rules for obtaining a maximally oriented graph; for details, readers may refer to [1]. The most important step in the IC algorithm is step (ii), where the conditional dependence between two nodes a and b, conditioned on a set of variables S_ab, is determined. Any standard statistical procedure that tests conditional dependence can be used at this step. In the case of discrete variables, standard statistical tests such as the Chi-square test can be applied. However, in the case of continuous variables, as mentioned earlier, testing conditional independence accurately poses a challenge.
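As an illustration of how a conditional independence test plugs into step (ii), the following Python sketch outlines steps (i)-(v) of the IC algorithm; cond_indep is a placeholder for any such test (for example, the copula-based test of the next section), max_cond_size is an illustrative bound we introduce for the search over separating sets, and the full orientation rules of step (vi) are omitted.

from itertools import combinations

def ic_skeleton(variables, cond_indep, max_cond_size=2):
    # Sketch of IC steps (i)-(v). cond_indep(a, b, S) is assumed to be a
    # user-supplied test returning True when a and b are conditionally
    # independent given the set S (e.g. the copula-based test of Sec. III).
    edges, sep = set(), {}
    # Steps (ii)-(iv): connect a and b unless some separating set S_ab exists.
    for a, b in combinations(variables, 2):
        others = [v for v in variables if v not in (a, b)]
        found = None
        for size in range(max_cond_size + 1):
            for S in combinations(others, size):
                if cond_indep(a, b, set(S)):
                    found = set(S)
                    break
            if found is not None:
                break
        if found is None:
            edges.add(frozenset((a, b)))
        else:
            sep[frozenset((a, b))] = found
    # Step (v): for non-adjacent a, b with a common neighbour c, record the
    # v-structure a -> c <- b whenever c is not in S_ab.
    v_structures = []
    for a, b in combinations(variables, 2):
        if frozenset((a, b)) in edges:
            continue
        for c in variables:
            if c in (a, b):
                continue
            if (frozenset((a, c)) in edges and frozenset((b, c)) in edges
                    and c not in sep.get(frozenset((a, b)), set())):
                v_structures.append((a, c, b))
    return edges, v_structures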
The problem of estimating conditional independence of continuous variables from data has been studied in probability and statistics using the method of copulas [8]. The authors of [8] used this method to discover Granger causality between some important economic variables. However, to our knowledge, no one has so far used it for discovering the structure of Bayesian Networks involving many continuous variables and hierarchical relationships from data. In our work, we use the method of copulas to test conditional dependence in step (ii) of the IC algorithm. In the next section, we give a brief review of the concept of copulas and show how to test conditional independence using them.

III. ESTIMATION OF CONDITIONAL INDEPENDENCE
USING COPULAS
A. Introduction to Copulas
Copulas are functions used to represent the joint probability distributions of multivariate systems in a convenient way. The main advantage of copulas is that the joint distribution can be separated into two parts: the marginal distribution of each variable by itself, and the copula function which combines these marginal distributions into a joint distribution. More formally, if P(X_1, X_2, ..., X_N) is the joint distribution of N random variables {X_1, X_2, ..., X_N} and F_i(X_i) is the marginal distribution of each random variable X_i, then according to Sklar's theorem there exists a copula function C such that [10]

P(X_1, X_2, \dots, X_N) = C(F_1(X_1), F_2(X_2), \dots, F_N(X_N))    (2)

Sklar's theorem also states that if the marginal distributions are continuous then the copula function is unique. Since each marginal distribution function is invertible, writing u_i = F_i(X_i) we can express this as

P(F_1^{-1}(u_1), F_2^{-1}(u_2), \dots, F_N^{-1}(u_N)) = C(u_1, u_2, \dots, u_N)    (3)
or in simple notation


P(X_1, X_2, \dots, X_N) = C(u_1, u_2, \dots, u_N)    (4)

where each u_i is a continuous variable in the range [0,1]. From the copula function, which represents a probability distribution, one can define the associated copula density function as follows:

c(u_1, u_2, \dots, u_N) = \frac{\partial^N C(u_1, u_2, \dots, u_N)}{\partial u_1 \, \partial u_2 \cdots \partial u_N}    (5)
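In practice the values u_i = F_i(X_i) are not known and are replaced by pseudo-observations computed from the empirical CDF (ranks divided by n + 1). A minimal Python sketch, assuming NumPy and SciPy are available; the helper name pseudo_observations is our own:

import numpy as np
from scipy.stats import rankdata

def pseudo_observations(X):
    # Map each column of an (n x N) data matrix into (0, 1) with the
    # empirical CDF: u_ij = rank(x_ij) / (n + 1). Dividing by n + 1 keeps
    # the pseudo-observations strictly inside the unit interval.
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    return np.column_stack([rankdata(X[:, j]) / (n + 1.0)
                            for j in range(X.shape[1])])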

To estimate the copula density function from data, we use a non-parametric approximation of the copula function based on Bernstein polynomials [12]. The Bernstein approximation to the copula density is given by

c(u_1, u_2, \dots, u_N) = \sum_{k_1=0}^{m-1} \cdots \sum_{k_N=0}^{m-1} p(k_1, k_2, \dots, k_N) \prod_{i=1}^{N} B(m, k_i, u_i)    (6)

where B(m, k, u) is the Bernstein polynomial defined by

B(m, k, u) = \frac{m!}{k! \, (m-k)!} \, u^{k} (1-u)^{m-k}, \qquad 0 \le u \le 1    (7)

and p(k_1, k_2, ..., k_N) is the N-dimensional empirical copula density. The empirical copula density is calculated from data by first computing the empirical copula function, defined by

C\!\left(\frac{i_1}{n}, \dots, \frac{i_N}{n}\right) = \frac{\#\{\, \text{tuples } (X_1, \dots, X_N) : X_1 \le X_1(i_1), \dots, X_N \le X_N(i_N) \,\}}{n}    (8)
where X_j(i_j) denotes the i_j-th order statistic of X_j and n is the number of rows of data. The empirical copula density is then obtained from the empirical copula function through the differentiation operation

p(u_1, \dots, u_N) = \frac{\partial^N C(u_1, \dots, u_N)}{\partial u_1 \cdots \partial u_N}    (9)
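The following Python sketch illustrates one common reading of the estimator in Eqs. (6)-(9): the empirical copula density p is approximated by the fraction of pseudo-observations in each grid cell of side 1/m (a finite-difference version of Eq. (9)), and the cell frequencies are then smoothed with Bernstein (binomial) weights. This is an illustrative sketch under our own normalization choices, not the authors' implementation.

import numpy as np
from scipy.stats import binom

def bernstein_copula_density(U, m):
    # Returns a function that evaluates a Bernstein-smoothed estimate of the
    # copula density of the pseudo-observations U (an n x d array in (0,1)).
    U = np.asarray(U, dtype=float)
    n, d = U.shape
    idx = np.minimum((U * m).astype(int), m - 1)   # grid cell of each point
    p = np.zeros((m,) * d)
    for row in idx:
        p[tuple(row)] += 1.0 / n                   # empirical cell frequency

    def density(point):
        # c(point) = sum over cells of p[cell] * prod_i m * B(m-1, k_i, u_i),
        # where B(.) is evaluated via the binomial probability mass function.
        dens = p
        for axis in range(d):
            w = m * binom.pmf(np.arange(m), m - 1, point[axis])
            dens = np.tensordot(dens, w, axes=([0], [0]))
        return float(dens)

    return density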

B. Testing Conditional Independence Using Copulas
The problem of testing conditional independence can be
reformulated in terms of copulas as follows:

We are interested in finding from data whether two
continuous random variables Y and Z are dependent
conditioned on a third continuous random variable X. If Y
and Z are conditionally independent then

P(Y \mid X, Z) = P(Y \mid X)    (10)

This means that knowing Z gives no additional information for estimating the probability of Y given the value of X. It also implies that if X is fixed there is no direct dependency between Y and Z.

The above statement can be written as a hypothesis test using copula densities [8].

Null Hypothesis H_0:

P\!\left( \frac{c(F_X(X), F_Y(Y), F_Z(Z))}{c(F_X(X), F_Y(Y)) \, c(F_X(X), F_Z(Z))} = 1 \right) = 1    (11)

Alternate Hypothesis H_1:

P\!\left( \frac{c(F_X(X), F_Y(Y), F_Z(Z))}{c(F_X(X), F_Y(Y)) \, c(F_X(X), F_Z(Z))} = 1 \right) < 1    (12)
where c(F_X(X), F_Y(Y), F_Z(Z)) is the copula density function. To estimate the similarity of two copula density functions, we use the Hellinger distance, defined by [11]

H_D = \int_{[0,1]^3} \left( 1 - \sqrt{ \frac{c(u, v) \, c(u, w)}{c(u, v, w)} } \right)^{2} dC(u, v, w)    (13)

Once the Hellinger distance is computed using the empirical copula densities estimated from data, it is relatively straightforward to decide whether H_0 or H_1 holds: if the Hellinger distance is close to zero (or sufficiently small) then H_0 holds, otherwise it is rejected. For accurately estimating H_D, we have used a bootstrap method. Thus, using the copula density and the Hellinger distance, one can perform the conditional independence test on continuous variables.
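Putting the pieces together, the sketch below approximates the integral in Eq. (13) by averaging over the observed pseudo-observations (which are draws from C) and calibrates the rejection threshold with a simple permutation-style bootstrap. The bootstrap scheme, the significance level, and the default m are illustrative assumptions, not the authors' exact procedure; pseudo_observations and bernstein_copula_density are the helpers sketched above.

import numpy as np

def hellinger_ci_test(x, y, z, m=4, n_boot=200, alpha=0.05, seed=0):
    # Sketch of the test of Y independent of Z given X (Eqs. 10-13).
    rng = np.random.default_rng(seed)
    U = pseudo_observations(np.column_stack([x, y, z]))  # columns u, v, w

    def hellinger(Uxyz):
        c_xyz = bernstein_copula_density(Uxyz, m)
        c_xy = bernstein_copula_density(Uxyz[:, [0, 1]], m)
        c_xz = bernstein_copula_density(Uxyz[:, [0, 2]], m)
        # Monte Carlo approximation of Eq. (13): average over observed points.
        vals = []
        for u, v, w in Uxyz:
            num = c_xy([u, v]) * c_xz([u, w])
            den = max(c_xyz([u, v, w]), 1e-12)
            vals.append((1.0 - np.sqrt(num / den)) ** 2)
        return float(np.mean(vals))

    h_obs = hellinger(U)
    # Permute the Y column to mimic the null of no link between Y and Z;
    # this also destroys the X-Y link, so it is only a rough calibration.
    h_null = []
    for _ in range(n_boot):
        Ub = U.copy()
        Ub[:, 1] = rng.permutation(Ub[:, 1])
        h_null.append(hellinger(Ub))
    threshold = np.quantile(h_null, 1.0 - alpha)
    return h_obs <= threshold   # True: H0 (conditional independence) retained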

IV. APPLICATION OF COPULA METHOD TO INFER
BAYESIAN NETWORK IN A SUPPLY CHAIN PERFORMANCE
MANAGEMENT SYSTEM
We used our copula based method for discovering the
Bayesian network structure in a supply chain performance
management system. This system is designed to monitor and improve the efficiency of the procurement process in supply chains. It consists of several Key Performance Indicators, from which we have chosen a few important ones for our study. These are Total Number of Invoices (Tot_Inv_Quantity), Total Invoice Amount (Tot_Inv_Amt),
Average Price (Avg_Price), Percentage of Invoice Amount
from Non-Local Suppliers (Pcnt_Nonloc_Inv_Amt),
Percentage of Invoice Amount from High Cost Suppliers
(Pcnt_Highco_Inv_Amt), and Percentage of Defective Items (Pcnt_Defect). Though this is a relatively small system, it has sufficient complexity for validating our method. The data set for this study was generated using a purchase order simulation tool. Two years of daily purchase order data were used. From the purchase order data, the KPIs were calculated by straightforward aggregation using SQL queries. The final data set had 730 rows.
The structure of the Bayesian network obtained using our copula method is shown in Fig. 2. The structure captures the expected cause-effect relationships between the KPIs. For example, changing the percentage of orders from non-local suppliers or from high-cost suppliers can cause changes in the average price. This in turn can cause a change in the total spend. Similarly, a change in the percentage of defective items (a measure of supplier quality) can cause a change in the total number of invoices, because in the purchase order system we have used, defective items are reordered by the buyer. This in turn can increase the spend amount. We experimented with different values of m, the order of the Bernstein polynomial, and found that values in the range 3-5 gave the best results.
Next, we used the implementation of the IC algorithm in Weka to generate the network structure after discretizing the data. We used three different discretization schemes: in the first case, we discretized each KPI into 2 states; in the second case, each KPI had 5 states; and in the third case, 10 states. We discretized by splitting each KPI's value range into equal-size intervals and treating each interval as a state. More sophisticated methods of discretization produced qualitatively similar results. In all cases we used the IC algorithm in Weka with the Bayes score function.
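For reference, the equal-width discretization used for these baselines can be reproduced along the following lines; the sketch uses pandas, and the DataFrame name kpis is a hypothetical placeholder for the six KPI columns.

import pandas as pd

def discretize_equal_width(df, n_states):
    # Split each KPI's value range into n_states equal-width intervals and
    # label each interval as a state, as in the 2/5/10-state baselines.
    return df.apply(lambda col: pd.cut(col, bins=n_states, labels=False))

# Example (kpis is a hypothetical DataFrame with the six KPI columns):
# kpis_2, kpis_5, kpis_10 = (discretize_equal_width(kpis, s) for s in (2, 5, 10))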



















Figure 2. Structure of the Bayesian network obtained using the copula method.

Figures 3, 4 and 5 show the Bayesian network structures obtained with discretization into 2, 5 and 10 states, respectively. It is clear from the figures that, keeping all other factors constant, changing the discretization scheme changes the network structure. As mentioned earlier, this is due to a combination of two factors: the increase in the number of possible states, and the increase in the amount of data required to compute the conditional probability distributions accurately. Figure 3 captures some cause-effect relationships correctly. The arrows from the percentage of non-local suppliers and the percentage of high-cost suppliers to average price, and the arrow from the percentage of defects to the total order volume, are the same as in the network obtained using the copula method. However, the arrow between Total Number of Orders and Total Spend Amount is reversed. As the number of states increases, fewer edges are seen in the graph. This is because more data is needed to compute the conditional probability distributions accurately; therefore, the conditional independence test between some nodes gives incorrect results, leading to the removal of the edges between those nodes.
V. SUMMARY AND FUTURE WORK
In this paper, we have introduced a new method, based on
copulas, for discovering the structure of Bayesian networks
from data for the case of continuous variables. Copulas are
commonly used in statistics for representing multivariate
distributions in a more convenient way. We have applied our method to a performance management system in the context of supply chains and showed that it gives more accurate results than methods available in the literature which involve discretization of the variables. In future work, we will study how this method can be made more scalable in terms of computational time for systems involving a large number of variables.









Figure 3. Structure of the Bayesian network obtained by discretizing the data into 2 states.










Figure 4. Structure of the Bayesian network obtained by discretizing the data into 5 states.


















Figure 5. Structure of the Bayesian network obtained by discretizing the data into 10 states.




ACKNOWLEDGMENT
We would like to thank Piyush Kumar Marwaha for implementing the copula method on the .NET platform and for helping with validation.





REFERENCES

[1] Judea Pearl, Causality: Models, Reasoning and Inference (Second Edition), Cambridge University Press (2009).
[2] O. Pourret, P. Naim and B. Marcot, Bayesian Networks: A Practical Guide to Applications, Wiley Publications (2008).
[3] N. Friedman, M. Linial, I. Nachman and D. Pe'er, Using Bayesian Networks to Analyze Expression Data, Journal of Computational Biology, 7(3-4): 601-620 (2000).
[4] D. M. Chickering, C. Meek and D. Heckerman, "Large-sample learning of Bayesian networks is NP-hard," in: U. Kjaerulff and C. Meek (Eds.), Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, Acapulco, Mexico, 2003, pp. 124-133.
[5] L. D. Fu, A comparison of state-of-the-art algorithms for learning Bayesian network structure from continuous data, Master's Thesis, Vanderbilt University, 2005 (http://etd.library.vanderbilt.edu/available/etd-12022005-171510/).
[6] W. P. Bergsma, Testing Conditional Independence for Continuous Random Variables, unpublished.
[7] E. L. Korn, The ranges of limiting values of some partial correlations under conditional independence, The American Statistician, Vol. 38, pp. 61-62 (1984).
[8] T. Bouezmarni, J. V. K. Rombouts and A. Taamouti, A Nonparametric Copula Based Test for Conditional Independence with Applications to Granger Causality, Journal of Business & Economic Statistics, Dec. 2011.
[9] R. Hofmann and V. Tresp, Discovering Structure in Continuous Variables Using Bayesian Networks, Advances in Neural Information Processing Systems 8, MIT Press, Cambridge, MA, 1996.
[10] http://en.wikipedia.org/wiki/Copula_statistics.
[11] http://en.wikipedia.org/wiki/Hellinger_distance
[12] A. Sancetta and S. Satchell, Bernstein Approximations to the Copula Function and Portfolio Optimization, Cambridge Working Papers in Economics (2001), http://www.dspace.cam.ac.uk/handle/1810/284

