
# Solving Differential Equations

## through Means of Deep Learning

Juliane Braunsmann
February 8, 2019

1 Neural Networks

3 Learning with a residual loss...
▶ ... and hard assignment of constraints
▶ ... and soft assignment of constraints

4 Summary


Machine Learning

A machine-learning algorithm is not programmed explicitly; it learns from examples what's expected.

We require three things:

▶ input data points, e.g. images for image tagging, speech for speech recognition
▶ examples of the expected output, e.g. tagged images, transcribed audio files
▶ a way to measure whether the algorithm is doing a good job, i.e. whether the output of the algorithm is close to the expected output; this is what enables learning


Formalization

Training data, consisting of pairs of input data and expected output:

{(x₁, y₁), …, (x_N, y_N)} ⊆ 𝒳 × 𝒴,

assumed to be i.i.d. samples (observations) from an unknown probability distribution P.

Goal: Given a new input x ∈ 𝒳, predict an output ŷ ∈ 𝒴, i.e. find a prediction function f: x ↦ ŷ.

Assumption: (x, y) is another independent observation of P.

To measure whether the prediction function is doing a good job, we have to define a loss function

L: 𝒴 × 𝒴 → ℝ,

where L(y, ŷ) measures how close the expected output y is to the predicted output ŷ.

Typical losses

Some typical loss functions are:

▶ squared loss: L(y, ŷ) = |y − ŷ|² for 𝒴 = ℝ,
▶ zero-one loss: L(y, ŷ) = 𝟙_{ŷ ≠ y} for arbitrary 𝒴,
▶ cross-entropy loss: L(y, ŷ) = −(y log(ŷ) + (1 − y) log(1 − ŷ)) for 𝒴 = [0, 1].

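These three losses are simple enough to state directly in code; a minimal Python sketch (the function names are mine, and the clipping constant in the cross-entropy is a standard numerical safeguard, not part of the definition):

```python
import numpy as np

def squared_loss(y, y_hat):
    """Squared loss for targets in R."""
    return abs(y - y_hat) ** 2

def zero_one_loss(y, y_hat):
    """Zero-one loss: 1 if the prediction misses the target, else 0."""
    return 0.0 if y_hat == y else 1.0

def cross_entropy_loss(y, y_hat, eps=1e-12):
    """Binary cross-entropy for y, y_hat in [0, 1]."""
    y_hat = np.clip(y_hat, eps, 1 - eps)  # avoid log(0)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
```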

Average loss

The expected loss of a prediction function f is

E(f) = 𝔼_P[L(y, f(x))] = ∫_{𝒳×𝒴} L(y, f(x)) dP(x, y) ≈ (1/N) ∑_{i=1}^{N} L(yᵢ, f(xᵢ)) = ∑_{i=1}^{N} Eᵢ(f),

with Eᵢ(f) = (1/N) L(yᵢ, f(xᵢ)). Then find

f* = arg min_f E(f),

where the type of f is determined by the applied prediction algorithm.

## There are many different types of prediction functions in machine learning:

▶ decision trees
▶ support vector machines
▶ naive Bayes classifiers
▶ k-nearest neighbor algorithm
▶ neural networks, specifically deep learning


Deep Learning

“Deep learning is a specific subfield of machine learning: a new take on learning representations from data that puts an emphasis on learning successive layers of increasingly meaningful representations.”

— François Chollet, Deep Learning with Python [Cho17]


Feedforward Neural Network¹

[Figure: a single artificial neuron. Inputs x₁, …, xₙ with weights w₁, …, wₙ and a bias b (input +1) are summed and passed through an activation function 𝜎, producing the output 𝜎(∑_{i=1}^{n} wᵢxᵢ + b).]

¹ https://github.com/PetarV-/TikZ/tree/master/Multilayer%20perceptron


[Figure: the same single neuron, next to a small feedforward network with an input layer (I₁, I₂, I₃), a hidden layer and an output layer (O₁, O₂).]


Some typical activation functions are:

▶ Sigmoid function: 𝜎(z) = 1 / (1 + exp(−z))
▶ Tanh function: 𝜎(z) = (exp(z) − exp(−z)) / (exp(z) + exp(−z))
▶ ReLU function: 𝜎(z) = max(z, 0)
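For reference, the three activations as plain NumPy functions (a sketch; in practice one would use a framework's built-ins, which are numerically hardened):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid: squashes R into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Hyperbolic tangent: squashes R into (-1, 1)."""
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

def relu(z):
    """Rectified linear unit: max(z, 0)."""
    return np.maximum(z, 0.0)
```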


Typical activation functions

[Figure: plot of the ReLU (scaled), tanh and sigmoid activation functions.]


Formalization of (feed-forward) neural network

Given an input vector zˡ ∈ ℝⁿ, the output of the fully connected layer l + 1 is

(𝜎(∑_{i=1}^{n} wˡ_{i,j} zˡᵢ + bˡⱼ))_{j=1,…,m} ∈ ℝᵐ.

Such layers can be concatenated, yielding a deep neural network parametrized by weight matrices Wˡ and bias vectors bˡ for each layer. We denote these parameters by 𝜃 and the corresponding neural network by f_𝜃.

We write

E(𝜃) = ∑_{i=1}^{N} Eᵢ(𝜃) = (1/N) ∑_{i=1}^{N} L(yᵢ, f_𝜃(xᵢ)).
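The layer formula translates directly into code; a minimal sketch (the helper names and the toy parameters are mine):

```python
import numpy as np

def dense_layer(z, W, b, sigma):
    """One fully connected layer: maps z in R^n to sigma(W @ z + b) in R^m."""
    return sigma(W @ z + b)

def feedforward(x, params, sigma):
    """Concatenation of dense layers; params is a list of (W, b) pairs."""
    z = x
    for W, b in params:
        z = dense_layer(z, W, b, sigma)
    return z

# two layers with the identity activation: second layer sums the components
params = [(np.eye(2), np.zeros(2)), (np.array([[1.0, 1.0]]), np.zeros(1))]
out = feedforward(np.array([1.0, 2.0]), params, sigma=lambda t: t)
```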


[Figure 1.9 from [Cho17]: the training loop. The input X passes through layers (data transformations) parametrized by weights, producing predictions Y′; a loss score comparing Y′ with the true targets Y is used as a feedback signal to adjust the weights.]

Initially, the weights of the network are assigned random values.

Let E(𝜃) = ∑_{i=1}^{N} Eᵢ(𝜃) be a loss function. Then stochastic gradient descent can be presented as follows:
▶ Initialize weights 𝜃 and a learning rate 𝜂.
▶ Repeat until a stopping criterion is met:
  ▶ choose an index 1 ≤ i ≤ N
  ▶ calculate Eᵢ(𝜃) (forward pass)
  ▶ calculate ∇_𝜃 Eᵢ(𝜃) using backpropagation and update 𝜃 ← 𝜃 − 𝜂∇_𝜃 Eᵢ(𝜃) (backward pass)
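The loop above can be sketched on a toy one-parameter model where the gradient is available in closed form (in a real network it would come from backpropagation); the data, model and learning rate here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy training data for the model f_theta(x) = theta * x, true theta = 2
xs = rng.uniform(-1.0, 1.0, size=100)
ys = 2.0 * xs

theta, eta = 0.0, 0.1                 # initialize weight and learning rate
for step in range(500):               # stopping criterion: fixed step count
    i = rng.integers(len(xs))         # choose an index
    # forward pass: E_i(theta) = (y_i - theta * x_i)^2
    # backward pass: analytic gradient of E_i with respect to theta
    grad = -2.0 * xs[i] * (ys[i] - theta * xs[i])
    theta -= eta * grad               # update theta = theta - eta * grad
```

After a few hundred updates theta is close to the true value 2.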



Beginnings

The idea of solving differential equations by means of machine learning was introduced in 1998 by Isaac Lagaris, Aristidis Likas and Dimitrios Fotiadis (University of Ioannina, Greece) [LLF98].

They saw the following advantages in this approach:
▶ the solution learned by a neural network has a differentiable, closed analytic form
▶ the method is general
▶ it can be implemented efficiently on parallel architectures
▶ the neural network solution showed good generalization properties on unseen points
▶ high accuracy could be achieved with far fewer parameters

General setup

Problem
Let Ω ⊆ ℝⁿ be bounded. Given a general differential equation of the form

G(x, 𝜓(x), ∇𝜓(x), ∇²𝜓(x)) = 0 for x ∈ Ω,
subject to 𝜓(x) = g(x) on Γ ⊆ ∂Ω,

where 𝜓: Ω ⊆ ℝⁿ → ℝ, G: ℝⁿ × ℝ × ℝⁿ × ℝⁿˣⁿ → ℝ and g: Γ → ℝ, find a solution 𝜓.

Idea: parametrize the solution 𝜓 with a neural network as 𝜓 = 𝜓_𝜃 for some parameters 𝜃.


Loss functions
There exist several possibilities for defining loss functions:
▶ use the residual error, i.e. minimize

E(𝜃) = ∫_Ω G(x, 𝜓_𝜃(x), ∇𝜓_𝜃(x), ∇²𝜓_𝜃(x))² dx

▶ minimize an energy (variational) functional, i.e. solve

min_{𝜓̃ ∈ H} ∫_Ω I(x, 𝜓̃(x), ∇𝜓̃(x)) dx

for some space H [NM18; EY18]

Both integrals then have to be discretized for training, which is done using a Monte Carlo integration method.
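A Monte Carlo estimate of such an integral is just an average of squared residuals over uniformly sampled points; a sketch on Ω = (0, 1) with a hand-picked residual whose exact integral is known (the function name and sample count are mine):

```python
import numpy as np

rng = np.random.default_rng(1)

def mc_residual_loss(residual, n_samples=10_000):
    """Monte Carlo estimate of the integral of residual(x)^2 on (0, 1)."""
    x = rng.uniform(0.0, 1.0, size=n_samples)  # uniform collocation samples
    return np.mean(residual(x) ** 2)

# sanity check with residual G(x) = x: the exact integral of x^2 is 1/3
loss = mc_residual_loss(lambda x: x)
```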


Interpretation of the integral as an infinite sum

For xᵢ ∈ Ω we can write

E(𝜃) = ∫_Ω G(x, 𝜓_𝜃(x), ∇𝜓_𝜃(x), ∇²𝜓_𝜃(x))² dx
“=” ∑_{i=1}^{∞} G(xᵢ, 𝜓_𝜃(xᵢ), ∇𝜓_𝜃(xᵢ), ∇²𝜓_𝜃(xᵢ))²
= ∑_{i=1}^{∞} Eᵢ(𝜃).

We can interpret the points xᵢ ∈ Ω as an infinite amount of training data.

▶ no risk of overfitting


Discretization of the integral using collocation points

We can instead choose some fixed collocation points, for example a grid of equidistant points, or uniformly distributed points. The collocation points can then be seen as training data.
▶ fewer training points, thus faster training

In the literature:
▶ [LLF98], [McF06], [BN18] use 10–100 equidistant points to discretize the interval [0, 1]
▶ [SS18], [EY18] use “infinite” training data, up to 500 million data points in total


Dealing with boundary conditions

Problem: how can we ensure that 𝜓_𝜃 satisfies the boundary conditions? In the literature, there are two ways to deal with boundary conditions:
▶ add a penalty term to the loss function (“soft assignment”)
▶ use trial solutions that automatically satisfy the boundary conditions (“hard assignment”)



One possibility to obtain a trial solution is to define

𝜓ᵗˢ_𝜃(x) = A(x) + F(x, N_𝜃(x)),

where A satisfies the boundary conditions, F does not contribute to the boundary conditions and N_𝜃 is a neural network with parameters 𝜃.
The functions A and F need to be defined “by hand” according to the problem.


Example in 1D
Consider the linear diffusion equation

Lu = d²u/dx² = f, 0 < x < 1,
u(0) = g₀, u(1) = g₁.

In this case, we can use the trial solution

𝜓ᵗˢ_𝜃(x) = (1 − x)g₀ + xg₁ + x(1 − x)N_𝜃(x),

and we have 𝜓ᵗˢ_𝜃(0) = g₀, 𝜓ᵗˢ_𝜃(1) = g₁.

The corresponding loss function is

E(𝜃) = ∫₀¹ ((L𝜓ᵗˢ_𝜃)(x) − f(x))² dx,

where, writing N′_𝜃 = dN_𝜃/dx,

(L𝜓ᵗˢ_𝜃)(x) = 2((1 − x)N′_𝜃(x) − xN′_𝜃(x) − N_𝜃(x)) + x(1 − x)N″_𝜃(x).
Demonstration
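The defining property of the hard assignment — the boundary values are exact for any parameters — can be checked directly; here N_theta is a small untrained network with random weights (layer sizes and boundary values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# tiny random one-hidden-layer network N_theta: R -> R (untrained)
W1, b1 = rng.normal(size=(10, 1)), rng.normal(size=10)
W2, b2 = rng.normal(size=10), rng.normal()

def N(x):
    """Scalar network output for scalar input x."""
    return W2 @ np.tanh(W1[:, 0] * x + b1) + b2

g0, g1 = 1.0, -2.0

def psi_trial(x):
    """Trial solution (1-x)*g0 + x*g1 + x*(1-x)*N(x):
    boundary conditions hold for ANY network parameters."""
    return (1 - x) * g0 + x * g1 + x * (1 - x) * N(x)
```

At x = 0 and x = 1 the network term is multiplied by zero, so the boundary values are exact before any training takes place.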

## Comparison with FEM [LLF98] (2D nonlinear PDE)


## Extension to irregular boundaries [MM09]

It is necessary to specify A, which satisfies the boundary condition, and F, which doesn’t
contribute to the boundary condition → not trivial if the boundary is more complex.
One way to construct such a function F for irregular boundaries is proposed in [McF06;
MM09] with the introduction of length factors.

Length factor
For a domain Ω and a boundary Γ ⊆ ∂Ω, a length factor is a function L: Ω → ℝ such that
L(x) = 0 for all x ∈ Γ and L(x) ≠ 0 for all x ∈ Ω ⧵ Γ.

## Interpretation: length factor describes a sort of distance to the boundary


We can use the length factor to define a trial solution

𝜓ᵗˢ_𝜃(x) = A(x) + L(x)N_𝜃(x),

where L is a length factor which is zero on the boundary and A is an extension of the boundary data to the whole domain.

How do we find the maps A and L?


A unified framework [BN18]

𝜓ᵗˢ_𝜃(x) = A(x) + L(x)N_𝜃(x).

▶ learn the maps A and L
▶ use a smoothed distance function for L, i.e. L(x) ≈ d(x), where

d(x) = min_{x_b ∈ Γ} ‖x − x_b‖

▶ this is achieved by training a smooth network L_𝜂: Ω → ℝ to approximate d
▶ train a network A_𝜅: Ω → ℝ such that A_𝜅(x) = g(x) for x ∈ Γ
▶ only relatively few points are needed to train these networks (only collocation points on Γ and a few in the interior for the distance function)
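The training target d for the smoothed distance network can be computed from sampled boundary points alone; a sketch on the unit square (the boundary sampling and helper name are mine):

```python
import numpy as np

def distance_to_boundary(x, boundary_pts):
    """d(x) = min over sampled boundary points of ||x - x_b||;
    this is the target the smooth network L_eta is trained to fit."""
    return np.min(np.linalg.norm(boundary_pts - x, axis=1))

# sample the boundary Γ of the unit square
t = np.linspace(0.0, 1.0, 101)
boundary = np.vstack([
    np.column_stack([t, np.zeros_like(t)]),  # bottom edge
    np.column_stack([t, np.ones_like(t)]),   # top edge
    np.column_stack([np.zeros_like(t), t]),  # left edge
    np.column_stack([np.ones_like(t), t]),   # right edge
])

d_center = distance_to_boundary(np.array([0.5, 0.5]), boundary)
```

By construction d vanishes on Γ and is positive in the interior, exactly the length-factor property.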


Figure: Smoothed distance function on a star-shaped domain, learned using a single layer with 20 neurons and fewer than 1000 collocation points.
Source: [BN18]


Figure: From left to right: A_𝜅, the boundary function continued to the whole domain, the neural network solution, and the difference to the exact solution.
Source (middle, right): [BN18]


A more complex example for linear diffusion [BN18]

For problems with simple polygonal boundaries, high-quality meshes can be generated → not clear whether the ANN method is competitive with FEM.

▶ consider a domain with a more complicated boundary: Sweden
▶ the authors report that mesh generation had not finished after 16 hours
▶ their method took 10 minutes on a high-end laptop, implemented in SciPy → even more performance is expected when using frameworks such as PyTorch or TensorFlow

[Figure 10: Boundary and collocation points used to compute the smoothed distance function. (a) Collocation and boundary points used to compute d(x). (b) Smoothed distance function using a single hidden layer with 20 neurons.]
Source: [BN18]

[Figure 11: Solution and error for the diffusion equation in a complex 2D geometry using five hidden layers with 10 neurons each. (a) ANN solution to the 2D stationary diffusion equation. (b) Difference between the exact and computed solution.]
Source: [BN18]

Recap

## Two possibilities to deal with boundary conditions:

▶ use trial solutions that automatically satisfy the boundary conditions (“hard
assignment”)
▶ add a penalty term to the loss function (“soft assignment”)


Boundary penalty

An alternative to defining trial functions which satisfy the boundary conditions exactly is to add penalty terms to the loss function, i.e.

E(𝜃) = ∫_Ω G(x, 𝜓_𝜃(x), ∇𝜓_𝜃(x), ∇²𝜓_𝜃(x))² dx + 𝜆 ∫_Γ (𝜓_𝜃(x) − g(x))² dx

for some hyperparameter 𝜆.

▶ Advantages: easy to formulate; different types of boundary conditions can be incorporated without further effort
▶ Disadvantages: boundary conditions are not exactly satisfied; training has to be balanced

[SS18] call this method the “Deep Galerkin Method”.
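For the simple 1D problem 𝜓″ = 0 with 𝜓(0) = 0, 𝜓(1) = 1, the penalized loss can be sketched as follows (𝜆, the sampling counts and all function names are illustrative choices of mine):

```python
import numpy as np

rng = np.random.default_rng(3)

def soft_loss(psi, residual, g, lam=10.0, n_interior=1000):
    """Residual loss plus boundary penalty on Omega = (0, 1):
    mean residual(x)^2 + lam * mean (psi(x_b) - g(x_b))^2."""
    x = rng.uniform(0.0, 1.0, size=n_interior)  # interior collocation points
    xb = np.array([0.0, 1.0])                   # boundary Gamma = {0, 1}
    interior = np.mean(residual(x) ** 2)
    boundary = np.mean((psi(xb) - g(xb)) ** 2)
    return interior + lam * boundary

# sanity check with the exact solution psi(x) = x: both terms vanish
psi = lambda x: x
loss = soft_loss(psi, residual=lambda x: np.zeros_like(x), g=lambda x: x)
```

In training, 𝜆 balances how strongly boundary violations are punished against the interior residual.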


A more complex example [SS18]

Another application of learning algorithms is finding the solution of a PDE over a range of different setups (i.e. physical conditions, boundary conditions).

Burgers’ equation
The one-dimensional (viscous) Burgers’ equation is given by

∂u/∂t = 𝜈 ∂²u/∂x² − 𝛼u ∂u/∂x, (t, x) ∈ [0, 1] × [0, 1],
u(t, x = 0) = a, t ∈ [0, 1],
u(t, x = 1) = b, t ∈ [0, 1],
u(t = 0, x) = g(x) = a + (b − a)x, x ∈ [0, 1],

for a parameter space 𝒫 = {(a, b, 𝜈, 𝛼) ∈ ℝ⁴}.


Approach

Instead of solving the equation for each configuration in the parameter space separately, a neural network is trained on the whole space by sampling from it.

A network with 6 layers of 200 neurons each is used (1200 neurons in total).


Result

Figure: The deep learning solution is in red, the solution found via finite differences in blue. The problem setups are (𝜈, 𝛼, a, b) = (0.01, 0.95, 0.9, −0.9) and (0.09, 0.95, 0.5, −0.5).
Source: [SS18]


Another example (stock price dynamics) [SS18]

[SS18] demonstrate that their method also works in high dimensions by analysing a free boundary PDE describing stock price dynamics (Black–Scholes model)
→ number of dimensions = number of stocks

They choose parameters for which a semi-analytic solution exists (the problem can be reduced to a one-dimensional PDE which can be solved by finite difference methods).

| Number of dimensions | Error  |
|----------------------|--------|
| 3                    | 0.05 % |
| 20                   | 0.03 % |
| 100                  | 0.11 % |
| 200                  | 0.22 % |


Examples indicate that the method works, but is there any theory to support this?


Approximation Theorem

Neural Network Approximation Theorem for PDEs [SS18]
Denote by Θₙ = ℝⁿ × ℝᵈˣⁿ the set of parameters of neural networks with n hidden units and input dimension d. Let 𝜃ₙ be a parameter configuration which minimizes E(𝜃) over the set Θₙ and let f_𝜃ₙ be the corresponding neural network. Under certain growth and smoothness assumptions on the nonlinear terms, for a class of quasilinear parabolic PDEs, there exists a sequence of optimizers 𝜃ₙ such that

E(𝜃ₙ) → 0 as n → ∞

as well as

f_𝜃ₙ → u

strongly in L^𝜌 for 𝜌 < 2.


Concept of the proof

(1) prove that E(𝜃ₙ) → 0 as n → ∞ using neural network function approximation results
(2) establish that each neural network f_𝜃ₙ satisfies a PDE with a source term fₙ
(3) prove the convergence f_𝜃ₙ → u in L^𝜌 as n → ∞ using the smoothness of the neural network approximations and compactness arguments


Approximation theorem by Hornik

Uniformly m-dense subset
Let X ⊆ ℝᵈ be compact and for f ∈ Cᵐ(X) define the Sobolev norm

‖f‖_{m,X} = max_{|𝛼|≤m} sup_{x∈X} |D^𝛼 f(x)|,

where 𝛼 is a multi-index. A subset S of Cᵐ(X) is called uniformly m-dense if for every f ∈ Cᵐ(X) and 𝜀 > 0 there is a function g = g(f, 𝜀) ∈ S such that ‖f − g‖_{m,X} < 𝜀.

Approximation theorem
If the activation function 𝜓 ∈ Cᵐ(X) is nonconstant and bounded, then the set of neural networks with one hidden layer, {f_𝜃 | 𝜃 ∈ ∪_{n≥1} Θₙ}, is uniformly m-dense in Cᵐ(X).


Application to Step (1)

Consider a quasilinear parabolic PDE

𝒢[u](t, x) = ∂ₜu(t, x) − div(𝛼(t, x, u(t, x), ∇u(t, x))) + 𝛾(t, x, u(t, x), ∇u(t, x)) = 0
for (t, x) ∈ Ω_T = (0, T] × Ω

for a bounded set Ω ⊆ ℝᵈ, with additional boundary conditions and a solution u ∈ C²(Ω̄_T). We can then apply the approximation theorem to get a function f_𝜃 such that

‖f_𝜃 − u‖_{2,Ω̄_T} < 𝜀,

and thus

sup_{(t,x)∈Ω_T} |∂ₜu(t, x) − ∂ₜf_𝜃(t, x)| + max_{|𝛼|≤2} sup_{(t,x)∈Ω̄_T} |D^𝛼 u(t, x) − D^𝛼 f_𝜃(t, x)| < 𝜀.

Using Lipschitz continuity assumptions on (the derivatives of) 𝛼 and 𝛾, the convergence E(𝜃) → 0 can be shown.

Further remarks

Some more work has to be done to prove step (3): it is not obvious that E(𝜃) = ‖𝒢[f_𝜃]‖²_{L²(Ω_T)} + boundary term norms → 0 implies f_𝜃 → u.

The approximation theorem is only an existence result; minimization of the functional E(𝜃) is highly non-trivial, since it is not convex. However, deep learning optimization algorithms are designed to deal with these kinds of optimization problems and work quite well in practice.



Advantages

▶ the solution has a closed analytical form; no interpolation is necessary
▶ the loss function is straightforward to formulate, and the problem-dependent additional effort is minimal
▶ the method is general and mesh-free
▶ the total number of sampled points might be large, but they can be processed sequentially without harming convergence
▶ transfer learning can be used to solve similar problems
▶ the method parallelizes on GPUs, with frameworks that make a relatively easy implementation possible and have a large community
▶ conversely: the body of literature on PDEs is very large and thus offers the possibility to study neural networks in a well-understood context ([MQH18])


Disadvantages

▶ the choice of a suitable network architecture is difficult
▶ dependence on initialization
▶ due to the non-convex optimization there is a risk of ending up in a local minimum
▶ convergence results?


Summary

▶ neural networks offer a novel way to approximate solutions to differential equations by minimizing a residual loss
▶ there exist different methods to treat boundary conditions
▶ there is a theorem which guarantees that it is possible to express the solutions of certain PDEs by neural networks, and that minimizing the L² residual error leads to a solution
▶ results concerning convergence speed and approximation accuracy are still lacking


Bibliography

[BN18] Jens Berg and Kaj Nyström. “A unified deep artificial neural network approach to partial differential equations in complex geometries”. In: Neurocomputing 317 (Nov. 2018), pp. 28–41.

[Cho17] François Chollet. Deep Learning with Python. 1st ed. Greenwich, CT, USA: Manning Publications Co., 2017.

[EY18] Weinan E and Bing Yu. “The Deep Ritz Method: A Deep Learning-Based Numerical Algorithm for Solving Variational Problems”. In: Communications in Mathematics and Statistics 6.1 (Mar. 2018), pp. 1–12.

[LLF98] I. E. Lagaris, A. Likas, and D. I. Fotiadis. “Artificial neural networks for solving ordinary and partial differential equations”. In: IEEE Transactions on Neural Networks 9.5 (Sept. 1998), pp. 987–1000.

[McF06] Kevin Stanley McFall. “An artificial neural network method for solving boundary value problems with arbitrary irregular boundaries”. (Apr. 2006).

[MM09] K. S. McFall and J. R. Mahan. “Artificial Neural Network Method for Solution of Boundary Value Problems With Exact Satisfaction of Arbitrary Boundary Conditions”. In: IEEE Transactions on Neural Networks 20.8 (Aug. 2009), pp. 1221–1233.

[MQH18] Martin Magill, Faisal Qureshi, and Hendrick W. de Haan. “Neural Networks Trained to Solve Differential Equations Learn General Representations”. In: arXiv:1807.00042 [physics, stat] (June 2018).

[NM18] “… for High-Dimensional Random Partial Differential Equations”. In: arXiv:1806.02957 [physics, stat] (June 2018).

[SS18] Justin Sirignano and Konstantinos Spiliopoulos. “DGM: A deep learning algorithm for solving partial differential equations”. In: Journal of Computational Physics 375 (Dec. 2018), pp. 1339–1364.