
Machine Learning for the Materials Scientist

Chris Fischer*, Kevin Tibbetts, Gerbrand Ceder
Massachusetts Institute of Technology, Cambridge, MA

Dane Morgan
University of Wisconsin, Madison, WI

NGDM, October 10, 2007


Motivation: materials design through calculation

Computing power: exponential scaling with time
(Moore, G. ISSCC 2003 slides, http://www.intel.com)

Run-time: polynomial scaling with number of atoms, from O(N^3) to O(N)
(Skylaris, C. et al. J. Chem. Phys. 122, 084119 (2005))


DFT as a predictive tool

Figure credits:
Burkett, T. et al. Phys. Rev. Lett. 93 (2004)
Norskov, J. et al. MRS Bulletin 31 (2006)
Marzari, N. MRS Bulletin 31 (2006), courtesy of M. Lazzeri, Paris VI Jussieu
Marzari, N. MRS Bulletin 31 (2006), courtesy of D. Scherlis, MIT


computational materials design strategies

Calculating properties of realistic nanostructures ab initio

Lee, Y. S. et al. PRL 95 076804 (2005)
Galli, G. University of California, Davis


computational materials design strategies

Which combinations yield the optimal material?


Outline

Machine learning in Computational Materials Design

Searching for Structure: combining historical information with Density Functional Theory

Data Mining the High-Throughput engine

Wrap-up



Motivation: searching for new materials

for i in (relevant chemistries) {
    ...
    ...
    getStablePhases(i);
    ...
    ...
    calculateProperty(i);
    i = nextChemistry();
}

Slide annotation: depends on which phases are stable and their structure.


Motivation: materials by design

for i in (relevant chemistries) {
    ...
    ...
    getStablePhases(i);
    ...
    ...
    calculateProperty(i);
    i = nextChemistry();
}

Slide annotations:
Machine Learning needed here!
Depends on which phases are stable and their structure.


The need for machine learning

[Diagram: DFT Code -> Material Property Predictions]

The DFT code doesn't know what to calculate next.


The need for machine learning

[Diagram boxes: DFT Code; Material Property Predictions; Machine Learning Framework; Database of Computed and Experimental results]


Computational Materials Design poised for impact

'Commodity' computational resources

Open source electronic structure software

~$200-250k capital investment

Computing budget: ~50k compounds/year


Computational Materials Design poised for impact

ICSD: World's largest database of inorganic crystal structures
First entry: 1913
# of entries: 100,243
# usable compounds: 29,962

Computing budget: ~50k compounds/year

The structure search problem

for i in (relevant chemistries) {
    ...
    ...
    getStablePhases(i);
    ...
    ...
    calculateProperty(i);
    i = nextChemistry();
}

Slide annotations:
Where do we put the atoms if no experimental structure is known?
Depends on which phases are stable and their structure.


Strategies to search for structure

Coordinate Search: optimize energy (or free energy) directly in the space of atomic coordinates

or

Heuristic Rules: chemical intuition


Methods to search for structure

Coordinate Search: optimize energy (or free energy) directly in the space of atomic coordinates

\[ \text{GroundState} \equiv \arg\min_{\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_N} E(\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_N) \]

# of dimensions = 3N - 3 + dim(a, b, c, α, β, γ)

Complex energy landscape
(Doye, J. PRL, 88, 238701, (2002))




Methods to search for structure

Coordinate Search: optimize energy (or free energy) directly in the space of atomic coordinates

\[ \text{GroundState} \equiv \arg\min_{\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_N} E(\vec{r}_1, \vec{r}_2, \ldots, \vec{r}_N) \]

# of dimensions = 3N - 3 + dim(a, b, c, α, β, γ)

Proposed Solutions

Calculate energy of a finite set of structure prototypes

Use a stochastic optimization procedure (hop from basin to basin),
e.g., Simulated Annealing, Genetic Algorithms

Knowledge is not transferred across chemistries

(Doye, J. PRL, 88, 238701, (2002))
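
To make the "hop from basin to basin" idea concrete, here is a minimal sketch of simulated annealing. It is only an illustration: the one-dimensional `energy` function is a hypothetical stand-in for the real energy surface E(r_1, ..., r_N), and the cooling schedule, step size, and seed are arbitrary assumptions, not values from the talk.

```python
import math
import random


def energy(x):
    """Toy rugged 1D landscape (hypothetical stand-in for E(r_1, ..., r_N))."""
    return 0.1 * x * x + math.sin(5.0 * x)


def simulated_annealing(start, steps=20000, t_start=2.0, t_end=1e-3,
                        step_size=0.5, seed=0):
    """Metropolis simulated annealing with a geometric cooling schedule."""
    rng = random.Random(seed)
    x, e = start, energy(start)
    best_x, best_e = x, e
    for k in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (k / (steps - 1))
        x_new = x + rng.uniform(-step_size, step_size)
        e_new = energy(x_new)
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability so the walker can leave a basin.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


best_x, best_e = simulated_annealing(start=4.0)
print(f"lowest energy found: {best_e:.3f} at x = {best_x:.3f}")
```

Note that nothing learned in one run carries over to a different landscape, which is the "knowledge is not transferred across chemistries" limitation named on the slide.
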


Methods to search for structure

Heuristic Rules
Use previous experiments to suggest what to calculate

How?
Identify a set of simple parameters based on alloy constituents

1932: Pauling, electronegativity
1935: Laves & Witte, $r_{A,B}$
1926, 1936-7: Hume-Rothery, Mott & Jones, $n_{at}^{e}$
1976: Miedema, $n_{ws}$




Methods to search for structure

Heuristic Rules
Plot stable structures in space of parameters
1983: Villars
1986: Pettifor
[Figure: structure maps; axis labels include $r_{A,B}$ and $n_{at}^{e}$]

Heuristic rules efficiently code historical knowledge and provide transfer of knowledge.

Can we leverage historical knowledge to intelligently search for structure?


description of knowledge base

Knowledge Base: Experimental Data

Pauling File, Binaries Edition (Villars, P. et al. J. of Alloys and Compounds, (2004))

1335 binary alloys
3975 non-unique compounds
4263 compounds total

Alloys not containing the elements:
He, B, C, N, O, F, Ne, Si, P, S, Cl, Ar, As, Se, Br, Kr, Te, I, Xe, At, Rn



Machine learning framework: concepts

$\mathbf{x} = (x_A, x_0, \ldots, x_{1/2}, \ldots, x_B)$: low-temperature state of alloy

$\text{Data} \equiv \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$: database of N binary alloys

$p(\mathbf{x})$: probability of low-temperature state (fitted to data)

$p(\mathbf{x} \mid e)$: probability of low-temperature state conditioned on evidence 'e'


how to use the machine learning framework

[Diagram boxes: Database of Computed and Experimental results; Machine Learning Framework, $p(\mathbf{x} \mid e)$; Set of likely structure candidates; DFT Code; Material Property Predictions]


Preliminaries and open questions

Are probabilities consistent with physical intuition?

Do probabilities encode the physics of structure stability?


quantifying correlation in probabilistic framework

Pair Cumulant

\[ g_{ij}(x_i, x_j) = \frac{p(x_i, x_j)}{p(x_i)\,p(x_j)} \]

$p(x_i, x_j)$: probability that both structures occur in the same system, estimated from database
$p(x_i)$: marginal probability that $x_i$ occurs

Scale of $g_{ij}(x_i, x_j)$:
$g_{ij} > 1$: correlated
$g_{ij} = 1$: uncorrelated
$0 \le g_{ij} < 1$: anti-correlated

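
A minimal sketch of how the pair cumulant could be estimated by counting, assuming a database stored as one dictionary per binary system. The composition slots, the prototype labels "S1", "T1", "T2", and the six toy systems are all hypothetical and are not taken from the Pauling File.

```python
# Hypothetical toy database: each binary system maps a composition slot to the
# structure prototype observed there ("none" = no compound at that composition).
toy_db = [
    {"1/3": "S1", "3/4": "none"},
    {"1/3": "S1", "3/4": "none"},
    {"1/3": "none", "3/4": "T1"},
    {"1/3": "none", "3/4": "T1"},
    {"1/3": "S1", "3/4": "T2"},
    {"1/3": "none", "3/4": "none"},
]


def pair_cumulant(db, i, a, j, b):
    """g_ij = p(x_i=a, x_j=b) / (p(x_i=a) * p(x_j=b)), using frequency estimates."""
    n = len(db)
    p_i = sum(1 for s in db if s[i] == a) / n
    p_j = sum(1 for s in db if s[j] == b) / n
    p_ij = sum(1 for s in db if s[i] == a and s[j] == b) / n
    if p_i == 0 or p_j == 0:
        raise ValueError("cumulant undefined: a marginal probability is zero")
    return p_ij / (p_i * p_j)


# S1 and T1 never share a system in the toy data: anti-correlated.
g_anti = pair_cumulant(toy_db, "1/3", "S1", "3/4", "T1")
# T2 only ever appears together with S1: correlated.
g_corr = pair_cumulant(toy_db, "1/3", "S1", "3/4", "T2")
print(g_anti, g_corr)
```

With so few systems the raw frequencies are noisy; a real database would call for some treatment of rare prototypes, which this sketch leaves out.
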


how probabilities represent physics of mixing

Do probabilities embody real physical effects?

Compounds stabilized by the "size" effect:

\[ g_{ij}(x_i, x_j) = \frac{p(x_i, x_j)}{p(x_i)\,p(x_j)} \]

[Figure: structure prototypes along the composition axis c_B (ticks at 0, 1/4, 1/3, 1/2, 2/3, 3/4, 1); labels: MgCu2, Fe3C; pair-cumulant values 8.48 and ~0. Annotation: places 'small' atoms on 'large' atom sites.]
Data from Pauling File, Binaries Edition
G. Ceder

how probabilities represent physics of mixing: more interesting correlations

Gd2Co7 (AABAAB... stacking) and PuNi3 (ABAB... stacking): $g_{ij}(x_i, x_j) = 54$

[Figure: stacking sequences of A and B layers for the two structures]

Both structures share the same local environments

Structure correlation observations

Correlation factors are the probabilistic analogue of heuristic rules.

No explicit reference to physics. Physics is embedded in experimental data.


Information theory for structure stability

Suppose I know Fe3C forms at c = 3/4; how does this change the prediction at c = 1/2?

How much information is carried by knowledge of structure?

Mutual Information

\[ I_{i,j} = \sum_{x_i, x_j} p(x_i, x_j) \log\left[ \frac{p(x_i, x_j)}{p(x_i)\,p(x_j)} \right] \]

\[ I_{i,j} = \left\langle \log\left[ g_{ij}(x_i, x_j) \right] \right\rangle \]
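
The mutual information above can be estimated from paired observations by simple counting. The sketch below assumes the data arrive as a list of (x_i, x_j) pairs, one per alloy system; the labels "S1", "S2", "T1", "T2" are hypothetical, and the natural logarithm is an arbitrary choice of units (nats).

```python
import math
from collections import Counter


def mutual_information(pairs):
    """I_ij = sum over (x_i, x_j) of p(x_i,x_j) * log[p(x_i,x_j) / (p(x_i) p(x_j))]."""
    n = len(pairs)
    joint = Counter(pairs)
    marg_i = Counter(a for a, _ in pairs)
    marg_j = Counter(b for _, b in pairs)
    total = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # g is the pair cumulant for this particular (a, b) combination.
        g = p_ab / ((marg_i[a] / n) * (marg_j[b] / n))
        total += p_ab * math.log(g)
    return total


# Knowing x_i fixes x_j completely: one bit (log 2 nats) of information.
mi_locked = mutual_information([("S1", "T1"), ("S1", "T1"), ("S2", "T2"), ("S2", "T2")])
# Every combination equally likely: knowing x_i says nothing about x_j.
mi_free = mutual_information([("S1", "T1"), ("S1", "T2"), ("S2", "T1"), ("S2", "T2")])
print(mi_locked, mi_free)
```

Only combinations that actually occur enter the sum, so the 0 log 0 terms never need special handling.
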


Information theory for structure stability

[Figure: matrix of mutual information values; color scale: degree of correlation]

Each element of the matrix is the correlation between X_i and X_j,
e.g., X_i = "AB prototype" and X_j = "A2B prototype".

\[ I_{i,j} = \sum_{x_i, x_j} p(x_i, x_j) \log\left[ \frac{p(x_i, x_j)}{p(x_i)\,p(x_j)} \right] \]


Prediction and validation in Li-Pt



Predicting structures in Li-Pt

[Figure: Li-Pt system; phase labels: LiRh, AlB2, MgCu2, CuPt7 (a.k.a. MgPt7), and an unknown phase marked "??"]

Use these as conditioning evidence for: $p(\mathbf{x} \mid e)$
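
As a rough illustration of conditioning on evidence, the sketch below ranks candidate structures at one composition by their empirical frequency among systems that match the known phases. This plain conditional-frequency count is a simplified stand-in chosen for brevity, not the probabilistic model fitted in the talk, and the database, slots, and labels are hypothetical.

```python
from collections import Counter


def rank_candidates(db, target, evidence):
    """Rank structures at `target` by empirical p(x_target | evidence)."""
    # Keep only the systems that agree with every piece of evidence.
    matches = [s for s in db if all(s.get(k) == v for k, v in evidence.items())]
    counts = Counter(s[target] for s in matches if target in s)
    total = sum(counts.values())
    return [(proto, c / total) for proto, c in counts.most_common()]


# Hypothetical toy database: composition slot -> observed prototype.
toy_db = [
    {"1/2": "S1", "3/4": "T1"},
    {"1/2": "S1", "3/4": "T1"},
    {"1/2": "S1", "3/4": "T2"},
    {"1/2": "S2", "3/4": "T2"},
]

ranked = rank_candidates(toy_db, "3/4", {"1/2": "S1"})
print(ranked)
```

Exact matching breaks down quickly as evidence accumulates, because few systems share all known phases; `rank_candidates` returns an empty list in that case.
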


Predicting structures in Li-Pt

[Figure: known phases and suggested phases in Li-Pt]


cross validation to evaluate performance

Success of method depends on how short this list is: the set of likely structure candidates.

[Diagram boxes: Database of Computed and Experimental results; Machine Learning Framework, $p(\mathbf{x} \mid e)$; Set of likely structure candidates; DFT Code; Material Property Predictions]


Cross validation results

Length of list = average 'loss'

[Figure: cross-validation curves; legend: "Independent Variables", "Including structure correlation"]

~28 candidates req'd for frequency ranking

10 candidates --> 95% chance of seeing the ground state (GS)!

Nature Materials, 5, 641-646, 2006

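
The list-length measure can be sketched as follows, assuming each cross-validation case is a pair of (ranked candidate list, true ground state). In a leave-one-out setting each ranked list would come from a model fitted without that system; here the four cases and their labels are hypothetical.

```python
def hit_rate(cases, k):
    """Fraction of held-out cases whose ground state is among the top k candidates."""
    hits = sum(1 for ranked, truth in cases if truth in ranked[:k])
    return hits / len(cases)


def list_length_for(cases, target):
    """Smallest k whose hit rate reaches `target`, or None if it is never reached."""
    longest = max(len(ranked) for ranked, _ in cases)
    for k in range(1, longest + 1):
        if hit_rate(cases, k) >= target:
            return k
    return None


# Hypothetical held-out cases: (ranked candidates, true ground state).
cases = [
    (["S1", "S2", "S3"], "S1"),
    (["S2", "S1", "S3"], "S1"),
    (["S3", "S2", "S1"], "S1"),
    (["S1", "S3", "S2"], "S1"),
]

print(hit_rate(cases, 1), list_length_for(cases, 0.95))
```

A shorter list at a fixed hit rate means fewer DFT calculations per system.
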
Some open questions

ICSD: World's largest database of inorganic crystal structures
First entry: 1913
# of entries: 100,243
# usable compounds: 29,962
# structure prototypes: 2,485

What is the information content in a chemical database?

How many 'independent' crystal structures exist in nature?


Structure prediction: wrap-up

for i in (relevant chemistries) {
    ...
    ...
    getStablePhases(i);
    ...
    ...
    calculateProperty(i);
    i = nextChemistry();
}

Slide annotations:
Much more needed here
Now have efficient tool for this


Directions for future work/collaboration

[Diagram boxes: DFT Code; Material Property Predictions; Database of Computed results; Machine Learning Framework]


Directions for future work/collaboration

DFT Code -> Set of features:
● Charge density
● Total energy
● Bulk moduli
● Coordination
● Bond strength
● Bond character
● Magnetic moments
● Polarization
● ...


Directions for future collaboration

[Diagram boxes: DFT Code; Database of Computed results; Machine Learning Framework (functional mapping); Material Properties]

Material Properties:
-catalytic activity
-conductivity
-plasticity
-voltage/energy density


The End

Data from High Throughput alloy study

Online structure predictor: http://datamine.mit.edu

ITR grant (DMR-031253)




