net/publication/260585491
CITATIONS READS
3 12,914
2 authors, including:
SEE PROFILE
Some of the authors of this publication are also working on these related projects:
All content following this page was uploaded by Andy Ray Terrel on 05 March 2015.
Abstract—We add a random variable type to a mathematical modeling lan- Consider an artilleryman firing a cannon down into a valley.
guage. We demonstrate through examples how this is a highly separable way He knows the initial position (x0 , y0 ) and orientation, θ , of the
to introduce uncertainty and produce and query stochastic models. We motivate cannon as well as the muzzle velocity, v, and the altitude of
the use of symbolics and thin compilers in scientific computing.
the target, y f .
# Inputs
Index Terms—Symbolics, mathematical modeling, uncertainty, SymPy >>> x0 = 0
>>> y0 = 0
FT
>>> yf = -30 # target is 30 meters below
Introduction >>> g = -10 # gravitational constant
>>> v = 30 # m/s
Scientific computing is becoming more challenging. On the >>> theta = pi/4
computational machinery side heterogeneity and increased If this artilleryman has a computer nearby he may write some
parallelism are increasing the required effort to produce high code to evolve forward the state of the cannonball to see where
performance codes. On the scientific side, computation is it hits lands.
used for problems of increasing complexity by an increas- >>> while y > yf: # evolve time forward until y hits the ground
ingly broad and untrained audience. The scientific community ... t += dt
... y = y0 + v*sin(theta)*t
is attempting to fill this widening need-to-ability gulf with ... + g*t**2 / 2
various solutions. This paper discusses symbolic mathematical >>> x = x0 + v*cos(theta)*t
RA
modeling. Notice that in this solution the mathematical description of
2
Symbolic mathematical modeling provides an important the problem y = y0 + v sin(θ )t + gt2 lies within the while loop.
interface layer between the description of a problem by domain The problem and method are woven together. This makes it
scientists and description of methods of solution by compu- difficult both to reason about the problem and to easily swap
tational scientists. This allows each community to develop out new methods of solution.
asynchronously and facilitates code reuse. If the artilleryman also has a computer algebra system he
In this paper we will discuss how a particular problem do- may choose to model this problem and solve it separately.
main, uncertainty propagation, can be expressed symbolically. >>> t = Symbol(’t’) # SymPy variable for time
We do this by adding a random variable type to a popular >>> x = x0 + v * cos(theta) * t
>>> y = y0 + v * sin(theta) * t + g*t**2
mathematical modeling language, SymPy [Sym, Joy11]. This >>> impact_time = solve(y - yf, t)
allows us to describe stochastic systems in a highly separable >>> xf = x0 + v * cos(theta) * impact_time
and minimally complex way.
D
v g
Distribution of velocity
FT
In this case he believes that the velocity is normally distributed
impact_time
with mean 30 and standard deviation 1.
>>> from sympy.stats import *
>>> v = Normal(’v’, 30, 1)
>>> pdf = density(v)
x y yf >>> z = Symbol(’z’)
>>> plot(pdf(z), (z, 27, 33))
√ − 1 (z−30)2
2e 2
√
x0 v t y0 theta g 2 π
RA
v is now a random variable. We can query it with the
following operators
Fig. 2: A graph of all the varibles in our system. Variables on top P -- # Probability
depend on variables connected below them. The leaves are inputs to E -- # Expectation
our system. variance -- # Variance
density -- # Probability density function
sample -- # A random sample
q 2 2 2
v −v sin (θ ) + −4gy0 + 4gy f + v2 sin2 (θ ) cos (θ )
This converts a random/stochastic expression v > 31 into
x0 +
2g a deterministic computation. The expression P(v > 31)
actually produces an intermediate integral expression which
Rather than produce a numeric result, SymPy produces an
is solved with SymPy’s integration routines.
abstract syntax tree. This form of result is easy to reason about
>>> P(v > 31, evaluate=False)
for both humans and computers. This allows for the manipu- Z ∞ √ − 1 (z−30)2
lations which provide the above expresssions and others. For 2e 2
√ dz
example if the artilleryman later decides he needs derivatives 31 2 π
he can very easily perform this operation on his graph. Every expression in our graph that depends on v is now a
random expression
Motivating Example - Uncertainty Modeling We can ask similar questions about the these expressions.
For example we can compute the probability density of the
To control the velocity of the cannon ball the artilleryman
position of the ball as a function of time.
introduces a certain quantity of gunpowder to the cannon. He >>> a,b = symbols(’a,b’)
is unable to pour exactly the desired quantity of gunpowder >>> density(x)(a) * density(y)(b)
however and so his estimate of the velocity will be uncertain. (b+5t 2 )
2
√ √
2 2(b+5t 2 )
He models this uncertain quantity as a random variable that − a2 − 2a
e t e t2 e30 t e30 t
can take on a range of values, each with a certain probability. πt 2 e900
UNCERTAINTY MODELING WITH SYMPY STATS 3
x0 v t y0 theta g
studied one (computing integrals) to which we can apply
general techniques.
Fig. 4: A graph of all the varibles in our system. Red variables are Sampling
stochastic. Every variable that depends on the uncertain input, v, is
red due to its dependence. One method to approximate difficult integrals is through
sampling.
FT
SymPy.stats contains a basic Monte Carlo backend which
Probability that the cannon
1.2 ball has not yet landed can be easily accessed with an additional keyword argument.
>>> E(impact_time, numsamples=10000)
1.0 5.36178452172906
0.8 Implementation
0.6
A RandomSymbol class/type and the functions P,
E, density, sample are the outward-facing core of
0.4 sympy.stats and the PSpace class in the internal core rep-
resenting the mathematical concept of a probability space.
RA
0.2 A RandomSymbol object behaves in every way like a
standard sympy Symbol object. Because of this one can
4.5 5.0 0.05.5 6.0 6.5 replace standard sympy variable declarations like
Time x = Symbol(’x’)
0.2
with code like
x = Normal(’x’, 0, 1)
and continue to use standard SymPy without modification.
After final expressions are formed the user can query
Or we can plot the probability that the ball is still in the air
them using the functions P, E, density, sample.
at time t
>>> plot( P(y>yf), (t, 4.5, 6.5))
These functions inspect the expression tree, draw out the
RandomSymbols and ask these random symbols to construct
Note that to obtain these expressions the only novel work the
a probabaility space or PSpace object.
D
0.15
0.10
Temperature
0.05
0.0020 25 30 35 40
Temperature (C)
FT
new data into our prior understanding. And plot the three
together.
>>> data = 26 + noise
>>> T_posterior = Given(T, Eq(observation, 26))
We now describe how SymPy.stats obtained this result. The
expression T_posterior contains two random variables,
T and noise each of which can independently take on Measurement Noise
different values. We plot the joint distribution below in figure Fig. 6: The joint prior distribution of the temperature and measure-
6. We represent the observation that T + noise == 26 as ment noise. The constraint T + noise == 26 (diagonal line) and
a diagonal line over the domain for which this statement is the resultant posterior distribution of temperature on the left.
RA
true. We project the probability density on this line to the left
to obtain the posterior density of the temperature.
These gemoetric operations correspond exactly to Bayesian High quality implementations of vertical slices of the stack
probability. All of the operations such as restricting to the con- are available through higher level libraries such as PETSc and
dition, projecting to the temperature axis, etc... are managed Trilinos or through code generation solutions such as FENICS.
using core SymPy functionality. These projects provide end to end solutions but do not provide
intermediate interface layers. They also struggle to generalize
well to novel hardware.
Multi-Compilation Symbolic mathematical modeling attempts to serve as a thin
horizontal interface layer near the top of this stack, a relatiely
Scientific computing is a demanding field. Solutions frequently
empty space at present.
encompass concepts in a domain discipline (such as fluid
SymPy stats is designed to be as vertically thin as possible.
D
Conclusion
SymPy.stats
Uncertainty We have foremost demonstrated the use of sympy.stats a
module that enhances sympy with a random variable type. We
Scientific description have shown how this module allows mathematical modellers
to describe the undertainty of their inputs and compute the
Math / PDE description uncertainty of their outputs with simple and non-intrusive
FEniCS changes to their symbolic code.
Linear Algebra/ Secondarily we have motivated the use of symbolics in
Matrix Expressions computation and argued for a more separable computational
PETSc/Trilinos stack within the scientific computing domain.
Sparse matrix algorithms
R EFERENCES
Parallel solution / [Sym] SymPy Development Team (2012). SymPy: Python library for
scheduler symbolic mathematics URL http://www.sympy.org.
BLAS/LAPACK [Roc12] M. Rocklin, A. Terrel, Symbolic Statistics with SymPy Computing
Numerical Linear Algebra in Science & Engineering, June 2012
[Joy11] D. Joyner, O. Čertík, A. Meurer, B. Granger, Open source com-
puter algebra systems: SymPy ACM Communications in Computer
C/FORTRAN CUDA gcc/nvcc
FT
Algebra, Vol 45 December 2011
−1
HT HT
0
0
Σ 0
I
Σ = [I 0] I − Σ0 R I
[H I] Σ
0 R I
[H I] 0 R 0
−1
µ= µ+ΣH T (R+HΣH T ) (Hµ−data)
D
−1
Σ= I−ΣH T (R+HΣH T ) H Σ