
QSAR

- QSAR = Quantitative Structure-Activity Relationships
- Current applications
  - Two-dimensional
  - Three-dimensional: requires molecular alignment
- Foundation: physical organic chemistry
  - Relationships between structure and reactivity (equilibrium and rate constants for related structures)
  - Originally formulated by Hammett, extended by Taft and others

Hansch’s Application of the Hammett Equation

- Biological activity of indoleacetic acid-like synthetic hormones
- log(1/C) = -k1(log P)^2 + k2(log P) + k3σ + k4
  - C: concentration having a standard response in a standard time
  - P: octanol/water partition coefficient
  - log P reflects pharmacokinetic influence on activity: does the compound get where it needs to go?
  - σ reflects pharmacodynamic influence on activity: does the electronic nature of the compound induce activity?
- Why is there a squared log P term?

Log (1/C) Versus Log (P)

[Figure: plot of log 1/C against log P (axis range -3 to 3), annotated "Poor bioavailability"]

- Compounds that are too polar will not partition into membranes in the first place
- Compounds that are not polar enough cannot partition back out
- This concept promoted a field of study called pharmacokinetics, studying ADMET properties (Absorption, Distribution, Metabolism, Excretion and Toxicology).
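The squared log P term makes the Hansch equation a downward parabola in log P (for k1 > 0), so activity peaks at log P = k2 / (2 k1) and falls off for compounds that are either too polar or not polar enough. The following is a minimal sketch of that idea; the coefficient values are hypothetical and chosen only for illustration, not fitted values from Hansch's work.

```python
def hansch_log_inv_c(log_p, sigma, k1, k2, k3, k4):
    """Hansch equation: log(1/C) = -k1*(logP)^2 + k2*logP + k3*sigma + k4."""
    return -k1 * log_p ** 2 + k2 * log_p + k3 * sigma + k4


def optimal_log_p(k1, k2):
    """log P that maximizes log(1/C); needs k1 > 0 (downward parabola)."""
    if k1 <= 0:
        raise ValueError("k1 must be positive for the parabola to have a maximum")
    return k2 / (2.0 * k1)


# Hypothetical coefficients, for illustration only (not from the lecture).
K1, K2, K3, K4 = 0.5, 1.0, 0.8, 2.0

# Scan the same log P range as the plot, holding sigma at zero.
for lp in (-3, -2, -1, 0, 1, 2, 3):
    value = hansch_log_inv_c(lp, 0.0, K1, K2, K3, K4)
    print(f"log P = {lp:+d}  log(1/C) = {value:.2f}")

print(f"optimum log P = {optimal_log_p(K1, K2):.2f}")
```

Setting the derivative of the equation with respect to log P to zero gives the optimum directly, which is why only k1 and k2 appear in `optimal_log_p`.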

Importance of Hansch’s Work

- Demonstrated that biological activities could be quantitatively related to physical and chemical characteristics
- Developed a group-additive method for calculating log P (so that predictions could be made for compounds prior to their synthesis)
- Utilized a QSAR equation to assist in developing a physical interpretation or generalization about biological activity

Descriptors

- Descriptor: a numeric representation of structure
- Descriptors used in the Hansch approach (log P, σ) are empirical (derived from experimental observation)
- Limitations
  - σ is a substituent descriptor -> won’t be applicable to non-congeneric series
  - log P is an experimentally determined value -> a computational method is needed before it can be used to make predictions

Computing Log P

- Initial attempt: π method
  - Use measured log P for largest possible substructure
  - Add contributions (π values) for substituents
- More common: fragment summation methods
  - Hansch’s implementation: CLOGP
  - Defines two hydrophobic fragment types
    - Isolating carbons (ICs): carbons not double or triple bonded to a heteroatom
    - Hydrogens attached to ICs (ICHs)
  - Contiguous remaining groups are polar fragments

Example Fragmentation

[Structure drawing: atom labels H, N, CH3, O and Cl]

- 2 polar fragments
- 7 ICs
- 7 ICHs

Other Considerations

- Fragment environment -> different values stored for fragments in these environments
  - Aliphatic
  - Benzyl
  - Vinyl
  - Styryl
  - Aromatic
- Interactions among fragments
  - Handled by adding correction factors

Example LogP Calculation

[Structure drawing: atom labels H, N, CH3, O and Cl]

Term                 Contribution
Cl                   0.94
amide                -1.51
C (aliphatic)        0.2
C (aromatic)         6(0.13)
H                    7(0.225)
correction factors   -0.12 + 0.30 - 0.84

0.94 + (-1.51) + 0.2 + 6(0.13) + 7(0.225) - 0.12 + 0.30 - 0.84 = 1.34
Measured value = 1.28
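The fragment summation in the example log P calculation can be sketched as a sum of count times value over the fragments, plus the correction factors. The fragment values and counts below are transcribed from the worked example; the function and variable names are illustrative and are not part of CLOGP. Note that the values as printed add up to about 1.325, slightly below the 1.34 reported on the slide, possibly because the printed fragment values are rounded.

```python
# (label, count, value per fragment), transcribed from the worked example.
FRAGMENTS = [
    ("Cl", 1, 0.94),
    ("amide", 1, -1.51),
    ("C aliphatic", 1, 0.20),
    ("C aromatic", 6, 0.13),
    ("H on isolating carbon", 7, 0.225),
]
# Correction factors for fragment interactions, also from the example.
CORRECTIONS = [-0.12, 0.30, -0.84]


def fragment_log_p(fragments, corrections):
    """Sum count * value over all fragments, then add the correction factors."""
    fragment_total = sum(count * value for _, count, value in fragments)
    return fragment_total + sum(corrections)


estimate = fragment_log_p(FRAGMENTS, CORRECTIONS)
measured = 1.28  # measured value quoted in the example

print(f"estimated log P = {estimate:.3f}")
print(f"estimate - measured = {estimate - measured:+.3f}")
```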

Non-Empirical Descriptors

- Topological
  - Descriptors computed from structural formula
  - Conformation independent
- Geometric
  - Descriptors computed from molecular geometry
  - Conformation and stereochemistry dependent
- Electrostatic
  - Descriptors computed from the charges or charge distribution of the molecule
  - Some are conformation/stereochemistry dependent

Class Exercise I

- Build a molecule containing multiple functional groups (a drug for your disease)
- Perform a conformational search of your choice, with an appropriate forcefield
- Open the resulting database and use Compute->Descriptors to calculate all descriptors implemented in MOE for each of your conformations
  - Which ones do not change with conformation?
  - Which ones do change with conformation?

Wiener’s Path Number, w

- An example topological descriptor
- Applied to QSPR of hydrocarbon boiling points in 1947
- Sum of bond distances between carbon atom pairs in the molecule
- Physical meaning: a reflection of size and compactness

Wiener’s Path Number (cont’d)

Example: carbon chain C1-C2-C3-C4 with C5 attached to C2

Pair    Distance
C1-C2   1
C1-C3   2
C1-C4   3
C1-C5   2
C2-C3   1
C2-C4   2
C2-C5   1
C3-C4   1
C3-C5   2
C4-C5   3
Sum = 18

Calculation is simplified by multiplying the number of heavy atoms on each side of every bond and summing:
(1x4)+(3x2)+(4x1)+(1x4) = 18
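Both ways of computing the path number can be checked with a short script. This is a minimal sketch that represents the carbon skeleton as a list of bonds; the function names are illustrative. The bond-product shortcut is only valid for acyclic skeletons, where removing a bond splits the molecule into two parts.

```python
from collections import deque

# Example skeleton from the slide: chain C1-C2-C3-C4 with C5 attached to C2.
BONDS = [(1, 2), (2, 3), (3, 4), (2, 5)]


def adjacency(bonds):
    """Build an atom -> set of neighbours map from a bond list."""
    graph = {}
    for a, b in bonds:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distances_from(graph, start):
    """Breadth-first search: number of bonds from start to every atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in graph[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def wiener_by_pairs(bonds):
    """Sum of bond distances over all unordered pairs of atoms."""
    graph = adjacency(bonds)
    total = sum(sum(distances_from(graph, a).values()) for a in graph)
    return total // 2  # every pair was counted once from each end


def wiener_by_bonds(bonds):
    """Shortcut for acyclic skeletons: for each bond, multiply the number
    of atoms on its two sides, then sum over bonds."""
    graph = adjacency(bonds)
    n_atoms = len(graph)
    total = 0
    for a, b in bonds:
        # Collect the atoms reachable from a without crossing the a-b bond.
        seen = {a}
        stack = [a]
        while stack:
            atom = stack.pop()
            for nbr in graph[atom]:
                if (atom, nbr) in ((a, b), (b, a)) or nbr in seen:
                    continue
                seen.add(nbr)
                stack.append(nbr)
        total += len(seen) * (n_atoms - len(seen))
    return total


print("pair sum:    ", wiener_by_pairs(BONDS))
print("bond product:", wiener_by_bonds(BONDS))
```

Passing a different bond list, for example a straight five-carbon chain, lets the same functions be used to compare isomers.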

Comparison of Structures

C5H12 isomers
  n-pentane              2(1x4)+2(2x3) = 20
  2-methylbutane         3(1x4)+1(2x3) = 18
  2,2-dimethylpropane    4(1x4) = 16

C6H14 isomers
  n-hexane               2(1x5)+2(2x4)+(3x3) = 35
  3-methylpentane        3(1x5)+2(2x4) = 31
  2-methylpentane        3(1x5)+(2x4)+(3x3) = 32
  2,2-dimethylbutane     4(1x5)+(2x4) = 28
  2,3-dimethylbutane     4(1x5)+(3x3) = 29

Maximum Negative Charge

- An example electronic descriptor
- The charge on the atom with the greatest partial negative charge
- Physical meaning:
  - Might indicate ability of the molecule to accept a hydrogen bond or interact with a metal ion
- Conformation dependence varies based on partial charge assignment method
  - Forcefield partial charges are generally conformation and stereochemically independent
  - Quantum mechanical charge distributions are generally conformation and stereochemically variable

Shadow Indices

- An example geometric descriptor
- Calculated from the area of the molecule projected onto the XY, YZ and XZ planes
- Physical meaning:
  - Captures shape and size of molecule
- Orientation dependent
- Conformation dependent
- Stereochemistry independent

Shadow Indices (S1, S2, S3)

[Figure]

Electronic/Geometric

- Common 3D QSAR methods (CoMFA, CoMSIA, ...) use electronic descriptors calculated on a grid (thus having geometric dependence)
  - First requires alignment of molecules on the grid
  - Alignment should place groups interacting with common receptor sites in the same location
- This process results in a huge number of descriptors per molecule
  - Many of the descriptors are correlated

QSAR Dataset

Structures 1-5 [Structure drawing: barbiturate ring with substituents R and R']

No.  R          R'                    Log (1/C[M])
1    methyl     1-methyl-1-propenyl   2.64
2    methyl     1-methylvinyl         2.12
3    ethyl      ethyl                 3.09
4    ethyl      1-methylbutyl         4.05
5    sec-butyl  2-methylallyl         3.42

Structures 6-7 [Structure drawing: barbiturate ring with substituent R'' and a second ring]

No.  R''        ring         Log (1/C[M])
6    ethyl      unsaturated  2.96
7    isopropyl  saturated    3.55

Structure 8 [Structure drawing]

No.  Log (1/C[M])
8    3.98

QSAR Exercise

- Hypnotic effects of the previous barbiturates were assayed between 1923 and 1949
- They are a subset of 108 barbiturate derivatives for which QSAR equations* have been derived based solely on log P and (log P)^2
- Use the database on UMdrive: https://umdrive.memphis.edu/aparrill/public/ to derive a QSAR equation for the subset
- What is the average absolute residual error?

*Hansch, Steward, Anderson and Bentley, J. Med. Chem., 1968, 11, 1, 1.

QSAR prediction

- Predict C for one of the following structures using your equation:

[Structure drawing: barbiturate ring with substituents R and R']

No.  R       R'                    Log (1/C[M])
9    ethyl   1-methyl-1-propenyl   3.15
10   propyl  1-methylvinyl         3.04
11   ethyl   isobutyl              3.63
12   propyl  1-methylbutyl         3.90
13   ethyl   2-methylallyl         3.23

- What is your residual error?
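One way to approach the exercise is an ordinary least-squares fit of log(1/C) against log P and (log P)^2, followed by the mean absolute residual. The sketch below uses only the standard library and solves the normal equations directly. The numbers in it are placeholder data invented for illustration, not values from the course database; replace them with the log P and log(1/C) columns you export.

```python
def solve_linear(a, b):
    """Solve a small square system a x = b (Gaussian elimination, pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_parabolic_qsar(log_p, activity):
    """Least-squares coefficients (a, b, c) of a*(logP)^2 + b*logP + c."""
    rows = [[lp * lp, lp, 1.0] for lp in log_p]
    # Normal equations: (X^T X) coeffs = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, activity)) for i in range(3)]
    return solve_linear(xtx, xty)


def predict(coeffs, lp):
    """Predicted log(1/C) at a given log P."""
    a, b, c = coeffs
    return a * lp * lp + b * lp + c


def mean_absolute_residual(coeffs, log_p, activity):
    """Average of |observed - predicted| over the data set."""
    residuals = [y - predict(coeffs, lp) for lp, y in zip(log_p, activity)]
    return sum(abs(r) for r in residuals) / len(residuals)


def concentration(log_inv_c):
    """Convert log(1/C) back to a molar concentration C."""
    return 10.0 ** (-log_inv_c)


# PLACEHOLDER data, invented for illustration; not from the database.
log_p = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
activity = [2.1, 2.9, 3.4, 3.6, 3.5, 3.1]

coeffs = fit_parabolic_qsar(log_p, activity)
print("a, b, c =", [round(k, 3) for k in coeffs])
print("mean |residual| =", round(mean_absolute_residual(coeffs, log_p, activity), 3))
print("predicted C at log P = 1.8:", concentration(predict(coeffs, 1.8)))
```

For a held-out compound, the residual error is the observed log(1/C) minus `predict(coeffs, lp)` for that compound's log P.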

Exercise Discussion

- Hypnotic effects depend solely on log P because:
  - electronic effects on GABA receptor binding are small relative to log P effects on transport
  - metabolism of barbiturates is non-specific and dependent only on log P
- Residual errors are significantly lower when the set of compounds used to derive the model is larger

The Descriptor Explosion

- Most programs used in QSAR can calculate hundreds of standard descriptors + field-based descriptors
- Quantitative models with an overwhelming number of independent variables are underdetermined (prone to overfitting)
  - Multiple sets of coefficients exist that fit the dependent variable
  - Most of these will not be predictive (fit the data, but without physical meaning)

Variable Selection

- Principal Components Analysis
- Elimination of Correlated Descriptors
- Genetic Function Approximation (GFA)
  - Implemented in Cerius2
  - Evolves models with subsets of possible descriptors to improve the fit of the data
    - Initially develops random population of QSAR models
    - Evaluates fitness (fit) of the models
    - Selects those with better features to create next generation of models from

Principal Components Analysis

- Principal components analysis is a variable reduction method (an alteration of the coordinate system), allowing visual analysis of multi-dimensional data in fewer dimensions
- The first principal component explains the maximum amount of variation possible in the data set in one direction; the % of variation explained can be precisely calculated
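Elimination of correlated descriptors, listed under Variable Selection, can be illustrated with a simple greedy filter on pairwise Pearson correlation. This is a sketch of one common approach, not the procedure of any particular QSAR package; the 0.9 threshold, the descriptor names and the values are all assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: correlation undefined, treated as 0 here
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(descriptors, threshold=0.9):
    """Greedy filter: keep a descriptor only if its |r| with every
    descriptor already kept is below the threshold."""
    kept = []
    for name, values in descriptors.items():
        if all(abs(pearson(values, descriptors[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical descriptor table (one value per molecule), illustration only.
descriptors = {
    "weight": [100.0, 120.0, 140.0, 160.0],
    "volume": [50.5, 60.0, 70.2, 80.1],   # tracks weight almost linearly
    "dipole": [1.2, 0.3, 2.5, 0.9],
}

print(drop_correlated(descriptors))
```

Because the filter is greedy, the result depends on the order in which descriptors are listed: the first member of a correlated group is the one that survives.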

PCA: Example

[Figure]

Suggested Preprocessing

- Autoscaling
  - Needed if measurements are of different types with different ranges
- Mean centering
  - Always required for PCA due to orthogonality of the components
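The preprocessing steps and the "% of variation explained" by the first principal component can be sketched with the standard library alone. The code below autoscales each descriptor (mean centering plus division by the standard deviation), builds the covariance matrix, and finds the leading component by power iteration. Power iteration is used here as a simple stand-in for a full eigendecomposition; the descriptor names and values are hypothetical.

```python
import math


def autoscale(columns):
    """Mean-center each column and divide by its sample standard deviation."""
    scaled = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        if sd == 0:
            raise ValueError("constant column: delete it before autoscaling")
        scaled.append([(v - mean) / sd for v in col])
    return scaled


def covariance_matrix(columns):
    """Sample covariance matrix of columns that are already mean-centered."""
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in columns]
            for ci in columns]


def first_principal_component(cov, iterations=500):
    """Power iteration: (loadings, eigenvalue) of the leading component."""
    size = len(cov)
    # Asymmetric start vector, to avoid starting orthogonal to the answer.
    vec = [float(i + 1) for i in range(size)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0:
            raise ValueError("covariance matrix has no variance to explain")
        vec = [v / norm for v in nxt]
    eigenvalue = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(size))
                     for i in range(size))
    return vec, eigenvalue


def explained_fraction(cov, eigenvalue):
    """Share of total variance (trace of cov) captured by this component."""
    return eigenvalue / sum(cov[i][i] for i in range(len(cov)))


# Hypothetical descriptor columns (one value per molecule), illustration only.
names = ["mol_weight", "log_p", "polar_area"]
columns = [
    [180.2, 206.3, 151.2, 194.2, 230.3],
    [1.2, 3.5, 0.5, -0.1, 3.2],
    [63.6, 37.3, 49.3, 58.4, 37.3],
]

cov = covariance_matrix(autoscale(columns))
loadings, eigenvalue = first_principal_component(cov)
print("PC1 explains", round(100 * explained_fraction(cov, eigenvalue), 1), "% of variance")
for name, loading in zip(names, loadings):
    print(f"  loading on {name}: {loading:+.3f}")
```

The loadings show how strongly each descriptor contributes to the first component, and the explained fraction is the quantity to accumulate over components when deciding how many to keep.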

Why Mean Centering?

[Figure]

How Many Components (Rank)?

[Figure]

Class Exercise II

- Compute the principal components for your database from the previous exercise
  - Compute 10-20 descriptors other than log P
  - You may need to delete fields with identical values for all structures first
- Generate a principal components report
  - How many principal components are needed to describe >75% of the variability in the descriptors? How many for >90%?
  - Which descriptors contribute most significantly to the first principal component?

PCA Strengths/Weaknesses

- Strengths
  - Displays highly dimensional data with relatively few plots
  - Can filter noise from data sets
  - Can determine amount of variation contained in each descriptor (loading)
- Weaknesses
  - Inherent dimensionality (rank) must be determined
  - If the dimensionality is greater than three, visualization is still difficult

Reading
- Sections 2.2F & 2.2G
- Section 2.4 Problems 13-17
