
Analysis of Binding Interactions:
Obtaining Meaningful Values with Monte Carlo Simulations

Simon Chiang

Thursday, April 16, 2009


Theoretical Studies of Protein-DNA Binding Interactions
Goal of studies:
• Measurement of Energetic Parameters
• Model Effect of Proteins on Gene Expression

“Questionable” Resolution:
-Poorly reproducible values
-Large standard deviations

“Meaningful” Resolution:
-Reproducible values
-Small standard deviations

Monte Carlo Simulations: Identification, Rational Design



Resolving Parameters from Data
• Results of a single-site protein-DNA binding experiment

[Figure: Single Site Binding Isotherm. Fractional Saturation (0.0 to 1.0) vs. Log [Protein] (-12 to -8); inset shows protein + DNA binding with equilibrium constant Keq]

ΔG = -RT ln Keq
Parameter to Resolve: ΔGbinding
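
The relation ΔG = -RT ln Keq and the single-site isotherm on this slide can be made concrete with a minimal Python sketch. The temperature (20 °C) and the function names are assumptions for illustration; the slides do not state them.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K)
TEMP_K = 293.15     # assumed temperature (20 °C); not given in the slides


def keq_from_dg(dg_kcal):
    """Association constant (1/M) from dG = -RT ln Keq."""
    return math.exp(-dg_kcal / (R_KCAL * TEMP_K))


def fractional_saturation(log10_protein, dg_kcal):
    """Single-site isotherm: Y = Keq[P] / (1 + Keq[P])."""
    kx = keq_from_dg(dg_kcal) * 10.0 ** log10_protein
    return kx / (1.0 + kx)


# Tabulate the isotherm for a hypothetical dG of -10.2 kcal/mol.
for log_p in (-12, -10, -8, -6):
    print(log_p, round(fractional_saturation(log_p, -10.2), 4))
```

The curve reaches half saturation where [Protein] = 1/Keq, so a single ΔG value fixes the horizontal position of the whole isotherm.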



Fitting Program Detail
Input: Model for Data
Fitting Program (repeated until the fit stops improving):
• Calculate Curve
• Evaluate Fit of Data to Curve (Least-Squares)
• Refine Guess Values
Output: Fit Value ± Uncertainty

[Figure: two panels of Fractional Saturation (0.0 to 1.0) vs. Log [X] (-12 to -8); left: data with the curve calculated from ΔGguess; right: data with the fitted curve, ΔGfit ± σGfit]
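
The calculate, evaluate, refine loop can be sketched for the one-parameter case as below. This is an illustration only, not the fitting program used in the talk: the golden-section search, the assumed 20 °C temperature, the concentration grid, and the noise-free data are all hypothetical choices, and no uncertainty estimate is computed.

```python
import math

RT = 1.987e-3 * 293.15  # kcal/mol; 20 °C is an assumed temperature


def model(log10_x, dg):
    """Calculate curve: single-site fractional saturation at log10 [X]."""
    kx = math.exp(-dg / RT) * 10.0 ** log10_x
    return kx / (1.0 + kx)


def sum_sq(dg, data):
    """Evaluate fit: sum of squared residuals between data and curve."""
    return sum((y - model(lx, dg)) ** 2 for lx, y in data)


def fit_dg(data, lo=-16.0, hi=-4.0, tol=1e-6):
    """Refine guess: golden-section search, assuming one minimum in [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if sum_sq(c, data) < sum_sq(d, data):
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
    return 0.5 * (a + b)


# Hypothetical noise-free data generated from a known dG, then refit.
true_dg = -10.2
log_x = [-10.0 + 0.5 * i for i in range(11)]
data = [(lx, model(lx, true_dg)) for lx in log_x]
dg_fit = fit_dg(data)
print(round(dg_fit, 3))
```

With perfect data the search recovers the generating value; with real, noisy data the same loop returns whichever value best matches that particular data set.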



Notes on fitting
• A fit value only optimizes the fit curve to a single, imperfect data set.
  Data Set 1: ΔGfit1 = -10.2 kcal/mol

• Different data set, different curve, different fit value.
  Data Set 2: ΔGfit2 = -10.1 kcal/mol



Criteria for a Meaningfully Resolved Parameter

[Figure: the Fit Program applied to 5000 data sets (ΔGfit1, ΔGfit2, ΔGfit3, ΔGfit4, ΔGfit5, ...) gives a histogram of Number of Occurrences vs. Fit ΔG Value with width σ. Single peak: resolution of a single value]

• Fit values over many data sets follow a distribution with a single peak in probability
• The distribution is confined near this value (small uncertainty)



Ideally Distributions are Gaussian

[Figure: Gaussian histogram of Number of Occurrences vs. Fit Value, with σ marked on either side of the peak]

• Fit programs generally report uncertainties as a standard deviation, σ


• Standard deviations are only rigorously accurate for
Gaussian Distributions, and can be wildly inaccurate for
other distributions.
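
A small numerical illustration of this point, using made-up fit values rather than results from the talk: for a Gaussian sample, a sizeable share of fit values lies within half a standard deviation of the mean, whereas for a two-peaked sample almost none do, even though both can be reported as "mean ± σ". The sample sizes, peak positions, and widths below are assumptions.

```python
import random
import statistics


def summarize(values):
    """Return (mean, sigma, fraction of values within sigma/2 of the mean)."""
    mean = statistics.fmean(values)
    sigma = statistics.stdev(values)
    near = sum(1 for v in values if abs(v - mean) <= 0.5 * sigma)
    return mean, sigma, near / len(values)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical fit values: one Gaussian peak vs. two narrow peaks.
gaussian_fits = [rng.gauss(-10.0, 0.5) for _ in range(5000)]
two_peak_fits = [rng.gauss(rng.choice((-11.0, -9.0)), 0.1) for _ in range(5000)]

g_mean, g_sigma, g_near = summarize(gaussian_fits)
b_mean, b_sigma, b_near = summarize(two_peak_fits)
print("gaussian:", round(g_mean, 2), round(g_sigma, 2), round(g_near, 3))
print("two-peak:", round(b_mean, 2), round(b_sigma, 2), round(b_near, 3))
```

In the two-peak case the reported mean falls between the peaks, where essentially no fit ever lands, so the summary describes a value the experiment never produces.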



Not all parameters follow Gaussian Distributions

[Figure: the Fit Program applied to 5000 data sets gives a histogram of Number of Occurrences vs. Fit Value with no single peak in probability]

It is perilous to trust the results of a fit program in this case:
• No resolution of a single value
• Reported σ could be very inaccurate (non-Gaussian)



Triplicate is not enough.

[Figure: Number of Occurrences vs. Fit Value; results of 3 experiments given by dots]

Three results alone suggest a Gaussian with low σ, resolving a single value

• Repeating an experiment many times is necessary to establish that a parameter is meaningfully resolved (i.e., reproducible).



Monte Carlo Simulations:
Repeating experiments computationally
Basic Procedure:
Experimental Data Sets and Model for Data -> Simulated Data Sets -> Parameter Distributions

[Figure: histogram of Number of Occurrences vs. Fit Parameter Value]

• Simulated data points are made to resemble the experimental data points and to have uncertainties equal in magnitude to theirs
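
The basic procedure can be sketched end to end for a single-site system. Everything specific here is an assumption for illustration: the best-fit ΔG, the concentration grid, the Gaussian noise model with a fixed standard deviation, the 20 °C temperature, and the golden-section fitter. It is not the simulation method used in the talk, which the slides do not detail.

```python
import math
import random
import statistics

RT = 1.987e-3 * 293.15  # kcal/mol; 20 °C is an assumed temperature


def model(log10_x, dg):
    """Single-site fractional saturation at log10 [X]."""
    kx = math.exp(-dg / RT) * 10.0 ** log10_x
    return kx / (1.0 + kx)


def fit_dg(data, lo=-16.0, hi=-4.0, tol=1e-4):
    """One-parameter least-squares fit by golden-section search."""
    def ssr(dg):
        return sum((y - model(lx, dg)) ** 2 for lx, y in data)

    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if ssr(c) < ssr(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)


def monte_carlo(dg_best, log_x, noise_sd, n_sets, seed=0):
    """Simulate n_sets noisy data sets around the best-fit curve and refit each."""
    rng = random.Random(seed)
    fits = []
    for _ in range(n_sets):
        simulated = [(lx, model(lx, dg_best) + rng.gauss(0.0, noise_sd))
                     for lx in log_x]
        fits.append(fit_dg(simulated))
    return fits


log_x = [-10.0 + 0.5 * i for i in range(11)]   # hypothetical titration points
fits = monte_carlo(dg_best=-10.2, log_x=log_x, noise_sd=0.03, n_sets=500)
mc_mean = statistics.fmean(fits)
mc_sd = statistics.stdev(fits)
print(round(mc_mean, 3), round(mc_sd, 3))
```

The list `fits` is the parameter distribution; a histogram of it shows whether the parameter has a single, narrow peak, which a mean and standard deviation alone cannot reveal.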



Generating Simulated Data
A non-trivial task
• Several methods exist; they differ subtly in approach and in how long they have been statistically accepted.

• A universal requirement: uncertainties on experimental data points must be well-resolved
  • Generally requires repeating an experiment several times.

[Figure: data points with well-resolved uncertainties on the data]



Utility of Monte Carlo Simulations

[Figure: two histograms of Number of Occurrences vs. Fit Parameter Value compared side by side ("vs"), one marked with σ]

• Simulations allow an assessment of the parameter distributions from several repetitions of an experiment rather than thousands.
• Generally a few hours of CPU time for 10k data sets



Overview of my program:
• Able to study multi-site, arbitrarily configured systems binding multiple ligands undergoing linked reactions

Binding Simulator Program
• Binding Curves & Derivatives
• Monte Carlo Simulations

Applies to:
• Experiments that monitor Fractional Saturation
• Experimental/Computer-generated Data



Bacteriophage λ Right Operator (OR)

[Figure: operator sites OR1, OR2, OR3, bound by λ repressor dimers, lie between the cro gene (lytic state) and the cI gene (lysogenic state)]

(Ptashne, Mark. A Genetic Switch, 1986)



OR Statistical Mechanical Model
• Complex set of equations, determined by five free energy parameters. (Ackers, G. et al. PNAS (1982) 79, 1129-1133)
3 intrinsic site binding free energies: ΔG1, ΔG2, ΔG3
2 pairwise cooperative interaction free energies: ΔG12, ΔG23
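
To show how five free energies determine the individual-site binding curves, here is a minimal partition-function sketch in Python. The slides do not show the configuration table, so the details are assumptions: eight occupancy configurations, a triply liganded state that carries only the site 1-site 2 interaction (ΔG12), and a temperature of 20 °C. Other treatments of pairwise cooperativity exist, and the function name is hypothetical.

```python
import math
from itertools import product

RT = 1.987e-3 * 293.15  # kcal/mol; 20 °C is an assumed temperature


def site_saturations(log10_dimer, dg1, dg2, dg3, dg12, dg23):
    """Fractional saturation of sites 1-3 from a sum over 8 configurations."""
    x = 10.0 ** log10_dimer
    intrinsic = (dg1, dg2, dg3)
    z = 0.0                     # partition function
    bound = [0.0, 0.0, 0.0]     # summed weights with each site occupied
    for state in product((0, 1), repeat=3):
        dg = sum(g for g, s in zip(intrinsic, state) if s)
        if state[0] and state[1]:
            dg += dg12          # also used for the triply liganded state (assumption)
        elif state[1] and state[2]:
            dg += dg23          # only sites 2 and 3 occupied
        weight = math.exp(-dg / RT) * x ** sum(state)
        z += weight
        for i, s in enumerate(state):
            if s:
                bound[i] += weight
    return tuple(b / z for b in bound)


# Hypothetical parameter values in kcal/mol, evaluated at 1 nM dimer.
print(site_saturations(-9.0, -12.5, -10.5, -9.4, -2.9, -2.9))
```

Under these assumptions ΔG23 enters only through the configuration with sites 2 and 3 occupied and site 1 empty, which hints at why that parameter is hard to pin down when site 1 binds tightly.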



Why not to blindly trust fit programs: A dramatic OR example

[Figure: Fractional Saturation (0.0 to 1.0) vs. Log [λ repressor dimer] (-12 to -6); curves for Site 1, Site 2, and Site 3]

Fit Values for two data sets (kcal/mol):

          Set 1     Set 2
ΔG1       -12.5     -12.5
ΔG2       -10.5     -10.5
ΔG3        -9.4       0
ΔG12       -2.9      -2.9
ΔG23       -2.9     -12.9

Erroneous Conclusions:
Resolved Parameters: ΔG1, ΔG2*, ΔG12*
Unresolved Parameters: ΔG3, ΔG23



Monte Carlo Analysis of OR

[Figure: five histograms of number of occurrences vs. fit value, one per parameter; σ in kcal/mol]
ΔG1 (σ = 0.1), axis -12.8 to -12.0
ΔG2 (‘σ’ = 0.5), axis -12.0 to -10.0
ΔG3 (‘σ’ = 0.5), axis -11.0 to -8.0
ΔG12 (‘σ’ = 0.5), axis -4.0 to -1.0
ΔG23 (‘σ’ = 0.5), axis -4 to 0

• Fit values and uncertainties are only meaningful for ΔG1
• ‘Standard deviations’ on the other parameters are inaccurate



Resolution problems endemic to cooperative systems
• The root of poor resolution is a signal-to-noise issue.

[Figure: Fractional Saturation (0.0 to 1.0) vs. Log [λ repressor dimer] (-12 to -6); curves for Site 1, Site 2, and Site 3]

• Example: Sites 1 and 2 are fully saturated, due to cooperativity, at concentrations where site 3 begins to fill.
• Therefore there is little direct data on ΔG23, the pairwise interaction between just site 2 and site 3.



OR parameter resolution problem recognized/addressed in the literature
• Published solution uses empirically chosen mutant OR operators that emphasize poorly resolved parameters:
  OR+, OR-1, OR-3, OR-1-3

• Resolution required global analysis of wild-type OR and 3 mutant operators.
(Brenowitz, M., et al. PNAS (1986) 83, 8462-8466)



MC Analysis of published OR resolution technique:

[Figure: five histograms of number of occurrences vs. fit value, one per parameter; σ in kcal/mol]
ΔG1 (σ = 0.1), axis -12.8 to -12.2
ΔG2 (σ = 0.1), axis -10.7 to -10.3
ΔG3 (‘σ’ = 0.2), axis -9.6 to -8.8
ΔG12 (σ = 0.1), axis -3.2 to -2.6
ΔG23 (‘σ’ = 0.2), axis -3.6 to -2.4

• Standard deviations for ΔG3, ΔG23 still somewhat inaccurate
• Overall resolution is much better (more Gaussian, lower σ)



Improved resolution with rational choice of mutants
• Study of distributions clarifies how different mutants resolve specific parameters.
  OR+, OR-2, OR-1-3

• Global analysis of wild-type OR and 2 mutant operators is predicted to improve resolution



MC Analysis of rationally designed resolution technique:

[Figure: five histograms of number of occurrences vs. fit value, one per parameter; σ in kcal/mol]
ΔG1 (σ = 0.05), axis -12.6 to -12.4
ΔG2 (σ = 0.06), axis -10.7 to -10.3
ΔG3 (σ = 0.06), axis -9.7 to -9.3
ΔG12 (σ = 0.1), axis -3.2 to -2.6
ΔG23 (σ = 0.2), axis -3.5 to -2.0

• Rational design results in meaningful fit values and standard deviations for all parameters; resolution in fact improves.



Summary
• Monte Carlo Simulations examine the resolution of fit parameters without literally repeating experiments thousands of times.
• Analysis of distributions can assist in the rational design of experiments.
• In the future my program and the insights obtained from analyzing OR will be applied to study the 4-site PRE system.



Acknowledgements
UCHSC Dept. of Pharmaceutical Sciences
• Dr. David Bain

Bain Laboratory
• Dr. Aaron Heneghan
• Nancy Berton
• Michael Miura

