NATAL - RN
2016
Approval date:
___/___/___
Acknowledgments
To my laboratory colleagues, especially Edjan, for the mutual help with our
respective projects.
And to my friends, for putting up with me using my scientist's degree to win
arguments.
EPIGRAPH
The nitrogen in our DNA, the calcium in our teeth, the iron in our blood, the carbon in our
apple pies were made in the interiors of collapsing stars. We are made of starstuff.
Carl Sagan.
LIST OF FIGURES
Figure 12. Pepstatin A and allophenylnorstatine.
CONTENTS
1 INTRODUCTION
1.1 Malaria
1.1.4 Resistance
1.2 Drug discovery
1.2.2 4D-QSAR
2 OBJECTIVES
3.1 Material
3.1.1 Computers
3.1.3 Receptor
3.2 Methodology
3.2.5.1 Leave-N-out and y-randomization
4.4 4D LQTA-QSAR
REFERENCES
ANNEXES
Annex 2. Script written to apply the virtual filters in Matlab
Annex 3. Table with the PLS regression for the best 4D LQTA-QSAR model
1. INTRODUCTION
1.1 Malaria
The etiological agents of malaria are unicellular eukaryotic protozoa belonging to the phylum Apicomplexa, family Plasmodiidae and genus Plasmodium, of which the species P. falciparum, P. vivax, P. malariae and P. ovale are known to cause the disease. Malaria is transmitted by the bite of the female Anopheles mosquito, by sharing contaminated needles and syringes, by blood transfusion, or from mother to baby before or during delivery. The main clinical signs are high fever, chills, sweating and headache (MINISTÉRIO DA SAÚDE, 2010).
Although in most cases malaria is not life-threatening, its manifestation in pregnant women and children is extremely dangerous. P. falciparum is the most virulent of the species causing human malaria and the main cause of reported deaths in children, besides being the only one capable of progressing to cerebral malaria (SNOW; KORENROMP; GOUWS, 2004).
The growing resistance to current therapy (WHITE et al., 1999) and the classification of malaria as a neglected disease (BRASIL, 2008) underline the importance of innovative research for the design of new antimalarial drug candidates. Indeed, molecular modeling techniques have been playing an important role in the discovery of antimalarial drug candidates (BARMADE et al., 2015).
The cost of this type of project is purely computational and therefore quite low, and its results can minimize unnecessary expenses in the synthesis of active compounds. The use of high levels of molecular quantum theory is minimal in this type of study, which further lowers the computational cost per molecule investigated in QSAR studies. Thus, even with modest workstations, empirical results can be generated in a timely manner and used directly in the screening of new compounds as well as in the proposal of chemical structures with possible in vitro and in vivo activity.
(atovaquone) the antibiotics (tetracycline, doxycycline
chemotherapeutic agents, such as sulfonamides (sulfadoxine), tetracyclines (tetracycline,
[Figure: portfolio of antimalarial drugs in development, including OZ439/PQ, MMV048, ferroquine, tafenoquine, rectal artesunate, injectable artesunate, co-trimoxazole, pediatric pyronaridine/artesunate, ACT, P218, KAE609, ASAQ, GSK030, KAF156 and pyronaridine/artesunate. Sponsors named: Takeda, MMV, Sanofi, UCT/TIA, GSK, Guilin, Sigma Tau, Institute of Tropical Medicine, Shin Poong, Novartis and Sanofi Aventis/DNDi.]
used, quantitatively establishing a correlation between
two-dimensional (2D), which describe properties that can be calculated from a 2D representation (e.g., number of atoms, number of bonds, connectivity indices, among others); and three-dimensional (3D), which depend on the conformation of the molecules (e.g., van der Waals volume, solvent-accessible surface area, among others) (XUE; BAJORATH, 2000).
1.2.2 4D-QSAR
4D-QSAR analysis, first proposed by Hopfinger and co-workers (HOPFINGER et al., 1997), incorporates conformational freedom and alignment freedom into the development of 3D-QSAR models, thereby adding a fourth dimension to the molecular descriptors. In this approach, the descriptor values are the occupancy measures of the atoms that make up the molecules of the investigated set, obtained by sampling the conformational and alignment spaces (ANDRADE et al., 2010).
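The idea of an occupancy descriptor can be illustrated with a minimal Python sketch. It is not the implementation used by Hopfinger's program: the function names, the grid origin and spacing, and the toy coordinates are all assumptions made for illustration. Each grid cell receives the fraction of sampled conformations in which at least one atom falls inside it.

```python
from collections import Counter
from math import floor


def grid_cell(point, origin, spacing):
    """Map a 3D coordinate to the integer index of the grid cell that contains it."""
    return tuple(floor((c - o) / spacing) for c, o in zip(point, origin))


def occupancy_descriptors(conformations, origin=(0.0, 0.0, 0.0), spacing=1.0):
    """Fraction of sampled conformations in which each grid cell holds at least one atom."""
    counts = Counter()
    for atoms in conformations:
        # A cell is counted once per conformation, however many atoms fall inside it.
        occupied = {grid_cell(atom, origin, spacing) for atom in atoms}
        counts.update(occupied)
    n_conformations = len(conformations)
    return {cell: hits / n_conformations for cell, hits in counts.items()}


# Toy ensemble (invented coordinates, in angstroms): two conformations of a two-atom molecule.
ensemble = [
    [(0.2, 0.3, 0.1), (1.4, 0.2, 0.3)],
    [(0.4, 0.1, 0.2), (0.6, 1.5, 0.3)],
]
occupancy = occupancy_descriptors(ensemble)
```

In this toy case the cell visited in both conformations receives an occupancy of 1.0 and the other two cells receive 0.5, which is the kind of value that would enter the descriptor matrix.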
A newer 4D-QSAR approach, developed by the Laboratory of Theoretical and Applied Chemometrics (LQTA, Unicamp), is based on generating a conformational ensemble profile for each compound, instead of the single conformation used under the principles of 3D-QSAR (KUBINYI, 1997). This conformational profile is then used to calculate the 3D descriptors (MIF, Molecular Interaction Fields).
This methodology simultaneously covers the main features of CoMFA (Comparative Molecular Field Analysis), a technique developed by Cramer showing that the biological property of compounds can be correlated with the steric and electrostatic energies (MIF descriptors) arising from the interactions of the ligand in the active site of the biological target (CRAMER; PATTERSON; BUNCE, 1988), and of 4D-QSAR (HOPFINGER et al., 1997).
These 3D- and 4D-QSAR methodologies present some problems. In 3D-QSAR carried out with CoMFA, the fact that the bioactive conformation is not known forces the user to choose a single conformation for each molecule (generally assumed to be the bioactive one). In 4D-QSAR, the absence of interaction descriptors makes the resulting models harder to interpret (GHASEMI; SAFAVI-SOHI; BARBOSA, 2012). Moreover, there are no free programs that generate the descriptors for these two methodologies. A free program, Open3DQSAR (TOSCO; BALLE, 2011), is available, but using CoMFA itself requires purchasing the commercial program. To use 4D-QSAR, it is necessary to obtain a collaborative license from Hopfinger's group.
LQTA-QSAR uses the free GROMACS package (VAN DER SPOEL et al., 2005) to run the molecular dynamics (MD) simulations and estimate the resulting conformational profile. The MD simulations can be run with explicit solvent molecules, which gives a more accurate approximation of biological conditions (but increases the computational cost), or with implicit solvent, which is less expensive.
Open3DQSAR generates a virtual cubic box (grid) around the alignment of all structures (Figure 6). For each type of molecular interaction field (steric and electrostatic descriptors), the interaction energy between a probe and each molecule is calculated at each corner of the grid (PATEL et al., 2014; TOSCO; BALLE, 2011).
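As a rough illustration of how such field values arise, the sketch below places a probe at the nodes of a regular grid and sums a Coulomb term and a 12-6 Lennard-Jones term over the atoms of a molecule. It is not Open3DQSAR's code: the function names, the probe charge, the single epsilon and sigma values shared by all atoms, and the minimum distance used to avoid division by zero are assumptions chosen only to keep the example short.

```python
from itertools import product
from math import dist

# Conventional factor converting e^2/angstrom to kcal/mol.
COULOMB_CONSTANT = 332.06


def grid_points(lower, upper, spacing):
    """Nodes of a regular 3D grid between two opposite corners (inclusive)."""
    axes = []
    for lo, hi in zip(lower, upper):
        n_nodes = int(round((hi - lo) / spacing)) + 1
        axes.append([lo + i * spacing for i in range(n_nodes)])
    return list(product(*axes))


def probe_energies(point, atoms, probe_charge=1.0, epsilon=0.1, sigma=3.0, min_r=0.5):
    """Coulomb and Lennard-Jones energies (kcal/mol) of a probe placed at `point`.

    `atoms` is a list of ((x, y, z), partial_charge) pairs. One epsilon/sigma pair is
    used for every atom, which is a simplification (hypothetical values).
    """
    coulomb = 0.0
    lennard_jones = 0.0
    for xyz, charge in atoms:
        r = max(dist(point, xyz), min_r)  # clamp very short distances
        coulomb += COULOMB_CONSTANT * probe_charge * charge / r
        sr6 = (sigma / r) ** 6
        lennard_jones += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return coulomb, lennard_jones


# Toy molecule: a single atom with charge -1 at the origin.
molecule = [((0.0, 0.0, 0.0), -1.0)]
nodes = grid_points((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), 1.0)
field = {node: probe_energies(node, molecule) for node in nodes}
```

Each molecule (or each aligned conformation) contributes one row of such values, one column per grid node and field type, which is why the number of descriptors grows so quickly with the size of the box.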
Generating MIF descriptors with 3D-QSAR methodologies produces a very large number of descriptors. The total number of variables available is much larger than the number that will actually be included in the models, and it includes noise and redundant or irrelevant information (TEÓFILO; MARTINS; FERREIRA, 2009a). The goal of variable selection is to reduce the volume of data to be processed and to improve the predictive performance of the model (ARAKAWA; HASEGAWA; FUNATSU, 2007; GONZÁLEZ et al., 2008).
Figure 6. Representation of a cubic box used to calculate MIF descriptors. The probe used to calculate the interaction energies is in the upper left corner.
Source: adapted from (a) (DE ARAÚJO SANTOS et al., 2014) and (b) (DOWEYKO, 2004).
1.2.4 Validation of the LQTA-QSAR models
Validation of a QSAR model is essential to guarantee the quality of its predictive ability. Validation must therefore be both internal (on the model itself) and external (on the prediction set). Leave-one-out (LOO) cross-validation is used to determine the number of latent variables in the PLS model (KIRALJ; FERREIRA, 2009). It consists of excluding one sample at a time from the training set, building a model without that sample, and then predicting the activity of the sample left out. The procedure is repeated as many times as there are samples in the training set. The differences between the experimental and estimated values for the removed samples are used to calculate
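Cross-validated correlation coefficient:
$$Q^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_{cv,i})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
Coefficient of multiple determination:
$$R^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$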
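The leave-one-out procedure described in this section can be sketched in a few lines of Python. To keep the example self-contained, a one-variable least-squares line stands in for the PLS model, which is a deliberate simplification; the function names and the toy data are hypothetical, and the denominator uses the mean of all training activities, which is one common convention and an assumption here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept (stand-in for PLS)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def q2_loo(xs, ys):
    """Leave-one-out Q2: 1 - PRESS / total sum of squares of the training activities."""
    press = 0.0
    for i in range(len(xs)):
        # Build the model without sample i, then predict the sample left out.
        x_train = xs[:i] + xs[i + 1:]
        y_train = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / total


# Toy data (invented): one descriptor and a nearly linear activity.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [2.1, 3.9, 6.2, 7.8, 10.1]
q2 = q2_loo(descriptor, activity)
```

In a real study the same loop would be repeated for each candidate number of latent variables, keeping the number beyond which Q2 no longer improves.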
Val105,
Asp214. Phe111,
Ile123,
Tyr17,
Ile32,
Ser215,
Ser218,
Phe120,
Met15,
Ile123,Ile14,
Tyr77.
Thr114.
Pro243,
Leu271,
Phe244,
Met286,
Ile290,
Tyr273.
Tyr192,
Trp193,
Ile300,
Ile212,
Phe294.
Asn39,
Leu131,
Ser132,
Ile133.
based on this chemical core showed great
2 OBJECTIVES
The general objective of this dissertation is to develop an LQTA-QSAR predictive model for a series of Plasmepsin II inhibitors.
2.1 Specific objectives
3 MATERIAL AND METHODS
3.1 Material
3.1.1 Computers
All studies were carried out at the Laboratory of Computational Pharmaceutical Chemistry, on workstations running Linux with Intel Xeon E5-2690 processors (2.9 GHz, 20 MB cache) and 16 GB of RAM. The software used is available under academic licenses or is free to use.
3.1.3 Receptor
The crystal structure of the enzyme Plasmepsin II from Plasmodium falciparum was obtained from the online Protein Data Bank (PDB) (BERNSTEIN et al., 1977) under the code 4CKU (JAUDZEMS et al., 2014), with a resolution of 1.85 Å and an R-value of 0.209 (Figure 13). Structures with incomplete amino acid residues were rebuilt by homology modeling with Swiss Model (GUEX; PEITSCH, 1997).
Source: <http://www.rcsb.org>
3.2 Methodology
3.2.1 Standardization of the data set and preparation of the three-dimensional structures
The 3D structures of the compounds were drawn with MarvinSketch version 15.9 (ChemAxon). The protonation state was estimated for pH = 7.40 and hydrogen atoms were added with Avogadro (HANWELL et al., 2012). The data set selected from the literature was carefully curated following the protocol described by Fourches & Tropsha (FOURCHES, 2010).
All molecular models, in their appropriate protonation states, underwent energy minimization with the MMFF94 force field and were then optimized at the semi-empirical PM6 level with Gaussian. AM1-BCC charges (JAKALIAN; JACK; BAYLY, 2002) were added with UCSF Chimera (PETTERSEN et al., 2004).
For the molecular dynamics simulations, the bonds of the whole data set were parameterized for the AMBER99SB-ILDN force field (LINDORFF-LARSEN et al., 2010).
of Open3DQSAR and the script written to apply the virtual filters to the matrices in Matlab are available in Annex I.
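$$E'_{i,j,k} = \begin{cases} E_{i,j,k}, & E_{i,j,k} \le 30\ \text{kcal mol}^{-1} \\ 30 + \log(E_{i,j,k} - 29), & E_{i,j,k} > 30\ \text{kcal mol}^{-1} \end{cases} \qquad (3)$$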
The second virtual filter eliminated the descriptors with low correlation, that is, those that behave like a random vector. To this end, descriptors whose absolute Pearson correlation (|r|) with the biological activity (y) is at the level of random noise are eliminated.
The last virtual filtering step aims to eliminate descriptors whose distribution profile is very different from that of the biological activity (y), using the Comparative Distribution Detection Algorithm (CDDA) (BARBOSA; FERREIRA, 2012). This algorithm provides a way to quantify how similarly y is distributed with respect to a specific descriptor.
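The variance and correlation filters can be sketched in plain Python as follows. This is an illustration, not the Matlab routines used in this work: the function names are hypothetical, the variance cutoff of 0.02 is the value that appears in the annex script, and the fixed |r| threshold of 0.3 is only a placeholder, since the actual cutoff is set at the level of random noise. CDDA is not reproduced here.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def filter_descriptors(columns, y, min_variance=0.02, min_abs_r=0.3):
    """Keep descriptors that vary enough and correlate with y above a noise threshold.

    `columns` maps a descriptor name to its values over the samples.
    `min_abs_r` is a hypothetical fixed threshold used only for illustration.
    """
    kept = {}
    for name, values in columns.items():
        if variance(values) < min_variance:
            continue  # first filter: nearly constant descriptor
        if abs(pearson_r(values, y)) < min_abs_r:
            continue  # second filter: behaves like a random vector
        kept[name] = values
    return kept


# Toy example (invented values): four samples, three descriptors.
y = [1.0, 2.0, 3.0, 4.0]
columns = {
    "flat": [1.0, 1.0, 1.01, 1.0],
    "noise": [1.0, -1.0, -1.0, 1.0],
    "signal": [2.0, 4.0, 6.0, 8.5],
}
selected = filter_descriptors(columns, y)
```

Only the descriptor that both varies and follows the activity survives; the nearly constant one is removed by the variance filter and the uncorrelated one by the correlation filter.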
4 RESULTS AND DISCUSSION
4.1 Characterization of the data set
A data set of 49 compounds was used to build receptor-dependent LQTA-QSAR models (JAUDZEMS et al., 2014). Figure 15 shows the main scaffold of the inhibitors in the chosen series. The molecules in the data set vary at four modification points, R1, R2, R3 and R4, always keeping the allophenylnorstatine moiety with a thiazolidine ring.
Following the convention of Schechter (SCHECHTER; BERGER, 1967), the regions of substrates or inhibitors (P1-Pn/P1'-Pn') correspond to the regions with which they interact in the active site (S1-Sn/S1'-Sn') of the plasmepsins (BHAUMIK; GUSTCHINA; WLODAWER, 2012).
Figure 15. Characteristic pharmacophore of the series of compounds used to build the LQTA-QSAR model: five-membered thiazolidine ring and the sites of possible substitution (R1, R2, R3 and R4).
Compound
R1
R2
R3
pKi
CH3
8.04
CH3
8.07
CH3
CH3
7.92
CH2 CH3
9.16
CH2 CH3
CH2 CH3
8.29
CH2 CH3
CH2 CH3
8.37
CH2 CH3
CH2 CH3
CH2 CH3
Compound
pKi
Compound
pKi
9.30
17
6.95
10
7.14
18
6.14
11
7.65
19
7.02
Table 4 (continued).
Compound
Compound
pKi
pKi
12
6.27
20
7.38
13
7.95
21
6.25
15
7.82
23
16
24
6.56
Compound
25
R1
R2
R3
R4
pKi
4.02
Compound
R2
R3
CH3
CH3
5.06
4.89
29
CH3
CH3
7.15
30
5.03
31
6.22
26
27
R1
R4
pKi
32
CH3
CH3
7.52
CH3
CH3
7.69
5.09
33
35
Compound4  pKi
36  8.52
37  7.82
38  8.05
39  9.7
40  8.3
Compound4  R1  R2  R3  pKi
41  CH3  7.75
42  CH3  8.30
43  NH2  8.40
44  NH2  7.92
45  OH  7.95
46  OH  8.40
47  OCH3  8.05
48  OCH3  7.77
49  OCH3  8.40
1 Miura et al., 2010; 2 Kiso et al., 2004; 3 Nezami et al., 2002; 4 Hidaka et al., 2007.
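[Figure 21: flowchart of the virtual filters (variance cutoffs of 0.02 and 0.03, correlation, and distribution by CDDA) applied to the initial 49x31680 Lennard-Jones (LJ) and Coulomb (QQ) descriptor matrices. Matrix sizes shown along the flowchart: 49x4925, 49x4342, 49x198, 37x246 and 37x197.]
Figure 21. Result of the virtual filters applied to the LJ descriptors.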
The final 37x443 matrix was submitted to the QSAR modelling program (available at <http://lqta.iqm.unicamp.br/QSAR_modelling/QSARmodeling.zip>) for
0.95
0.79
10
No. of latent variables
2
Figure 24. Graphical representation of the descriptors of the final model. Views with compounds 33 (a) and 25 (b).
Figure 27. Contour maps obtained by Muthas (MUTHAS et al., 2005). The S1 region is marked by the green map, which represents a sterically favorable region for bulky groups.
Figure 28. Compound 39 (blue), pKi 9.7, and compound 25 (green), pKi 4.02. Amino acid residues close to the descriptors are shown in purple.
5 CONCLUSIONS AND PERSPECTIVES
REFERENCES
AFONSO, A. et al. Malaria parasites can develop stable resistance to artemisinin but lack mutations in candidate genes atp6 (encoding the sarcoplasmic and endoplasmic reticulum Ca2+ ATPase), tctp, mdr1, and cg10. Antimicrobial agents and chemotherapy, v. 50, n. 2, p. 480-9, 1 Feb. 2006.
ANDRADE, C. H. et al. 4D-QSAR: perspectives in drug design. Molecules (Basel, Switzerland), v. 15, n. 5, p. 3281-94, May 2010.
ARAKAWA, M.; HASEGAWA, K.; FUNATSU, K. The recent trend in QSAR modeling - Variable selection and 3D-QSAR methods. Current Computer-Aided Drug Design, v. 3, n. 4, p. 254-262, 2007.
ARAV-BOGER, R.; SHAPIRO, T. A. Molecular mechanisms of resistance in antimalarial chemotherapy: the unmet challenge. Annual review of pharmacology and toxicology, v. 45, p. 565-85, 7 Jan. 2005.
ASOJO, O. A. et al. Novel uncomplexed and complexed structures of plasmepsin II, an aspartic protease from Plasmodium falciparum. Journal of Molecular Biology, v. 327, n. 1, p. 173-181, 2003.
AZOUZI, S.; EL KIRAT, K.; MORANDAT, S. Hematin loses its membranotropic activity upon oligomerization into malaria pigment. Biochimica et Biophysica Acta (BBA) - Biomembranes, v. 1848, n. 11, p. 2952-2959, 2015.
BARBOSA, E. G.; FERREIRA, M. M. C. Digital Filters for Molecular Interaction Field Descriptors. Molecular Informatics, v. 31, n. 1, p. 75-84, 11 Jan. 2012.
BARMADE, M. A. et al. Discovery of anti-malarial agents through application of in silico studies. Combinatorial chemistry & high throughput screening, v. 18, n. 2, p. 151-187, 2015.
BARREIRO, E. J. Medicinal Chemistry and the paradigm of the lead compound. Revista Virtual de Química, v. 1, n. 1, p. 26-34, 2009.
BAS, D. C.; ROGERS, D. M.; JENSEN, J. H. Very fast prediction and rationalization of pKa values for protein-ligand complexes. Proteins, v. 73, n. 3, p. 765-83, 15 Nov. 2008.
BASCO, L. K.; RINGWALD, P. Molecular epidemiology of malaria in Yaounde, Cameroon I. Analysis of point mutations in the dihydrofolate reductase-thymidylate synthase gene of Plasmodium falciparum. The American journal of tropical medicine and hygiene, v. 58, n. 3, p. 369-73, 1998.
BERNSTEIN, F. C. et al. The Protein Data Bank: a computer-based archival file for macromolecular structures. Journal of molecular biology, v. 112, n. 3, p. 535-42, 25 May 1977.
BHAUMIK, P. et al. Structural insights into the activation and inhibition of histo-aspartic protease from Plasmodium falciparum. Biochemistry, v. 50, n. 41, p. 8862-8879, 2011.
BHAUMIK, P.; GUSTCHINA, A.; WLODAWER, A. Structural studies of vacuolar plasmepsins. Biochimica et Biophysica Acta - Proteins and Proteomics, v. 1824, n. 1, p. 207-223, 2012.
BIAMONTE, M. A.; WANNER, J.; LE ROCH, K. G. Recent advances in malaria drug discovery. Bioorganic & medicinal chemistry letters, v. 23, n. 10, p. 2829-43, 15 May 2013.
BJELIC, S. et al. Computational inhibitor design against malaria plasmepsins. Cellular and Molecular Life Sciences, v. 64, n. 17, p. 2285-2305, 2007.
BOSS, C. et al. Inhibitors of the Plasmodium falciparum parasite aspartic protease plasmepsin II as potential antimalarial agents. Current medicinal chemistry, v. 10, n. 11, p. 883-907, 2003.
BRAGA, É. M.; FONTES, C. J. F. Plasmodium - Malária. In: NEVES, D. P. (Ed.). Parasitologia Humana. 11. ed. São Paulo: Atheneu, 2005. p. 143-161.
BRASIL. MINISTÉRIO DA SAÚDE. SECRETARIA DE VIGILÂNCIA EM SAÚDE. DEPARTAMENTO DE VIGILÂNCIA. Malaria's treatment in Brazil - practical guide. 2010.
BRASIL. MINISTÉRIO DA SAÚDE. SECRETARIA DE CIÊNCIA, T. E I. E. (SCTIE). D. DE C. E T. (DECIT). P. EM SAÚDE NO B. Revista de Saúde Pública, v. 42, n. 4, 2008.
BRINKWORTH, R. I. et al. Hemoglobin-degrading, aspartic proteases of blood-feeding parasites. Substrate specificity revealed by homology models. Journal of Biological Chemistry, v. 276, n. 42, p. 38844-38851, 2001.
CHIRICO, N.; GRAMATICA, P. Real external predictivity of QSAR models: how to evaluate it? Comparison of different validation criteria and proposal of using the concordance correlation coefficient. Journal of chemical information and modeling, v. 51, n. 9, p. 2320-35, 2011.
COHEN, N. C. Guidebook on molecular modelling in drug design. San Diego: Academic Press, 1996.
CRAMER, R. D.; PATTERSON, D. E.; BUNCE, J. D. Comparative molecular field analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins. Journal of the American Chemical Society, v. 110, n. 18, p. 5959-67, 1 Aug. 1988.
CRUCIANI, G. Molecular Interaction Fields. Perugia, IT: Wiley-VCH Verlag GmbH & Co. KGaA, 2006.
DALRYMPLE, U.; MAPPIN, B.; GETHING, P. W. Malaria mapping: understanding the global endemicity of falciparum and vivax malaria. BMC Medicine, v. 13, n. 1, p. 140, 2015.
DAN, N.; BHAKAT, S. New paradigm of an old target: An update on structural biology and current progress in drug design towards plasmepsin II. European journal of medicinal chemistry, v. 95, p. 324-348, 2015.
DE ARAÚJO SANTOS, R. A. et al. Mixed 2D-3D-LQTA-QSAR study of a series of Plasmodium falciparum dUTPase inhibitors. Medicinal Chemistry Research, v. 24, n. 3, p. 1098-1111, 6 Aug. 2014.
DE FARIAS SILVA, N.; LAMEIRA, J.; ALVES, C. N. Computational analysis of aspartic protease plasmepsin II complexed with EH58 inhibitor: A QM/MM MD study. Journal of Molecular Modeling, v. 17, n. 10, p. 2631-2638, 2011.
DONDORP, A. M. et al. Artemisinin resistance: current status and scenarios for containment. Nature reviews. Microbiology, v. 8, n. 4, p. 272-80, Apr. 2010.
DOUGLAS, R. G. et al. Active migration and passive transport of malaria parasites. Trends in Parasitology, v. 31, n. 8, p. 357-362, 2015.
DOWEYKO, A. M. 3D-QSAR illusions. Journal of Computer-Aided Molecular
6 ANNEXES
6.2 Annex 2. Script written to apply the virtual filters in Matlab.
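% Descriptor labels exported with the fields (VDW, ELE), one label per descriptor
vdwname = VDW';
elename = ELE';
% Reshape the Coulomb (ele) and Lennard-Jones (vdw) fields: 31680 grid points x 54 structures
ele2 = reshape(ele, 31680, 54);
vdw2 = reshape(vdw, 31680, 54);
% Transpose so that rows are structures and columns are descriptors
ele3 = ele2';
vdw3 = vdw2';
% Variance filter: remove descriptors with variance below 0.02
varLJ = var(vdw3);
vdw4 = vdw3;
vdw4(:, varLJ < 0.02) = [];
[vdw4, varLJ] = cutoff_mesmo(vdw3, 0.02);  % helper function not listed in this annex
ele4 = ele3;
ele4(:, varLJ < 0.02) = [];
elename2 = elename;
elename2(:, varLJ < 0.02) = [];
vdwname2 = vdwname;
vdwname2(:, varLJ < 0.02) = [];
% Remove five structures from both matrices and keep them apart
% (index spacing restored from "19222731 34"; assumed to be five indices)
vdw5 = vdw4;
ele5 = ele4;
elimin = [19 22 27 31 34];
vdw5(elimin, :) = [];
ele5(elimin, :) = [];
extLJ = vdw4(elimin, :);
extQQ = ele4(elimin, :);
% Correlation cutoff at the level of random noise (helper function not listed)
corr_cut_val(y)
% Distribution filter (CDDA)
[vdw6, vdwname3] = cdda(vdw5, y, vdwname2, 0.1, 0.3);
[ele6, elename3] = cdda(ele5, y, elename2, 0.1, 0.3);
% Remove inter-correlated descriptors (|r| above 0.9)
[XoutLJ, Xname_outLJ] = inter_corr_rem(vdw6, vdwname3, y, 0.9);
[XoutELE, Xname_outELE] = inter_corr_rem(ele6, elename3, y, 0.9);
% Join the LJ and Coulomb descriptors
XN = [XoutLJ XoutELE];
XnameN = [Xname_outLJ Xname_outELE];
% Hierarchical clustering used to choose the test set (X2 is not defined in this excerpt)
d = pdist(X2);
z = linkage(d);
dendrogram(z, 0);
% Split into training and external sets (testset is not defined in this excerpt)
ytrain = y;
ytrain(testset) = [];
yext = y(testset);
XN2 = XN;
XN2(testset, :) = [];
Xext2 = XN(testset, :);
% Export the matrices as ASCII files
save descN.dat XN2 -ascii
save yextN.dat yext -ascii
save ytrainN.dat ytrain -ascii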
6.3 Annex 3. Table with the PLS regression for the best 4D LQTA-QSAR model
Regression vector (coefficients of descriptors D1 to D10)
0.073078  0.0065555  0.06235406  -0.00486644  0.01830458  0.02077115  0.00309439  -3.49385318  0.01240195  0.01425531
Descriptors
Row  D1  D2  D3  D4  D5  D6  D7  D8  D9  D10
1  34.58622  -26.5864  34.00342  -61.7916  1.850097  1.369324  -57.5891  -0.13128  -16.3696  6.685121
2  32.98143  22.53707  35.69502  -16.6665  31.35194  0.080294  -70.9426  -0.11563  -17.2638  -0.03453
3  34.62861  -27.6889  37.59481  -41.5923  10.13478  7.518929  -88.1076  -0.13269  -11.3574  10.15886
4  33.94449  -6.85812  34.14438  -73.6739  20.57318  4.625244  -90.0973  -0.13062  -12.1956  2.33021
5  35.11832  -0.23186  34.08909  -39.8034  3.565506  19.17647  -88.5303  -0.09555  -14.5177  10.86729
6  32.38606  -29.9775  35.25512  43.96798  31.7857  4.641404  -76.798  -0.05589  -16.3869  -0.05132
7  34.95184  20.48283  34.83275  -70.1234  -0.27574  2.483003  -115.336  -0.1232  -19.9878  22.26222
8  36.42481  6.637085  35.37526  -106.441  31.72549  0.489916  -81.7706  0.078381  -15.839  31.3525
9  36.65759  -7.71524  35.50188  8.694194  7.526613  22.50404  -43.8311  -0.14196  -18.3921  32.48228
10  33.80257  6.58935  34.68317  29.13157  -0.32773  7.321241  -32.6947  -0.05752  -29.6135  2.604802
11  34.09216  -9.16186  36.74104  9.337798  32.17135  1.860433  -41.4951  -0.06722  -17.274  4.029307
12  34.86923  11.40218  36.6443  31.74899  -0.18126  3.500519  -59.4224  -0.07365  -13.8183  28.92
13  37.24233  -0.51467  35.71986  -201.33  30.79509  31.97454  -94.3062  -0.17544  -6.58456  33.22965
14  32.53176  21.67416  33.56827  -22.4891  32.14644  32.28259  -45.0082  -0.04788  14.3798  -0.22204
15  32.31909  15.34149  32.97112  -55.7464  11.44379  22.66038  126.2354  -0.17194  -12.0359  0.32527
16  35.07513  48.28254  33.66425  -46.7692  -0.38247  16.65829  -203.218  -0.11206  14.95574  31.26275
17  34.57823  51.33432  34.93441  -1.39963  34.15524  1.175658  9.314931  -0.1251  7.044745  4.931028
18  34.22676  12.35282  34.76508  -11.9591  31.39277  8.121331  53.30782  -0.13414  21.48538  18.23493
19  37.09978  64.68356  36.57412  17.38044  29.12618  11.16674  -22.9858  -0.10867  -16.2568  32.19693
20  35.7403  14.10081  36.9441  -59.3608  0.94865  1.605744  -17.67  -0.15395  -3.41493  26.60423
21  31.32228  -22.6374  36.53522  -30.3597  31.61793  32.20605  8.581951  -0.20464  -10.9782  -0.3311
22  35.7394  -15.1101  36.5642  -54.7008  32.2582  1.069265  -21.4922  -0.22532  -18.4599  31.00199
23  33.65535  -17.7531  38.32017  19.45007  11.346  30.97021  -29.9616  -0.23587  -15.1136  1.94004
24  38.32029  -16.2242  36.13382  -25.0638  31.19549  8.712752  -0.3877  -0.10226  -13.1842  33.25722
25  34.15938  -25.1789  37.30352  -52.5167  32.37888  9.158957  -21.8296  -0.23007  -33.4492  3.210202
26  34.77232  32.35683  38.02796  -25.482  9.167486  1.91983  -40.9488  -0.05417  -13.6285  17.4188
27  38.41507  49.83027  36.05158  -72.5363  32.88108  25.61545  -5.16806  -0.145  0.512689  33.03444
28  37.83609  -16.5183  36.32409  -55.4929  9.917931  2.189746  0.246465  -0.10417  -11.3757  33.07935
29  35.09341  -12.5611  37.60172  -50.417  -0.33218  31.99676  -25.2275  -0.1353  -3.2853  6.459479
30  37.11355  66.75574  35.38187  -37.1926  36.23695  1.368796  -39.1844  -0.1376  -10.6176  32.36128
31  15.6297  -3.33622  38.70792  -3.98568  -0.57096  -0.29545  -17.3468  -0.02351  -13.3842  -0.32932
32  31.49017  -14.5028  36.19517  17.63766  1.271896  1.133354  -29.5976  -0.03437  -14.0198  -0.29345
33  34.96353  36.41247  34.07679  -80.2636  1.663467  5.951532  -82.1483  -0.23326  -13.9145  8.562773
34  35.49317  -26.2865  34.15069  19.91936  6.182395  6.521636  -44.2121  -0.07405  -27.2159  13.87245
35  34.58876  -61.3604  36.58301  -12.3259  0.097181  3.728484  -111.765  -0.06412  -45.2052  7.617981
36  35.66969  -49.9338  35.84031  -2.06757  1.707503  0.25094  -178.711  -0.05801  -62.8218  23.67948
37  33.1254  -57.3448  18.57819  -7.83085  -0.32949  -0.45855  -64.8361  -0.06766  -77.6581  1.642828
Independent term (TI)
1.75631127
Descriptor contributions (descriptor value x regression coefficient, D1 to D10) and their sum, for the rows available; a dash marks a missing value
Row  D1  D2  D3  D4  D5  D6  D7  D8  D9  D10  Sum
1  2.527491639  -0.174287211  2.120251104  0.300705017  0.033865249  0.028442434  -0.17820298  0.458684925  -0.203014465  0.095298472  5.009234183
6  2.366709  -0.19652  2.1983  -0.21397  0.581824  0.096407  -0.23764  0.195276  -0.20323  -0.00073  4.586427
7  2.55421  0.134275  2.171963  0.341251  -0.00505  0.051575  -0.3569  0.430448  -0.24789  0.317355  5.391248
8  2.661852  0.043509  2.205791  0.517987  0.580722  0.010176  -0.25303  -0.27385  -0.19643  0.44694  5.74366
9  2.678864  -0.05058  2.213686  -0.04231  0.137771  0.467435  -0.13563  0.495974  -0.2281  0.463045  6.00016
10  2.470224  0.043196  2.162636  -0.14177  -0.006  0.152071  -0.10117  0.200955  -0.36727  0.037132  4.450013
11  2.491387  -0.06006  2.290953  -0.04544  0.588883  0.038643  -0.1284  0.23486  -0.21423  0.057439  5.25403
21  2.28897  -0.1484  2.278119  0.147743  0.578753  0.668957  0.026556  0.714988  -0.13615  -0.00472  6.414816
27  2.807297  0.326662  2.247962  0.352994  0.601874  0.532062  -0.01599  0.506592  0.006358  0.470916  7.836726
28  2.764986  -0.10829  2.264954  0.270053  0.181544  0.045484  0.000763  0.36397  -0.14108  0.471556  6.113943
29  2.564556  -0.08234  2.34462  0.245351  -0.00608  0.664609  -0.07806  0.472733  -0.04074  0.092082  6.176719
30  2.712184  0.437617  2.206203  0.180995  0.663302  0.028431  -0.12125  0.480766  -0.13168  0.46132  6.917889
31  1.142187  -0.02187  2.413596  0.019396  -0.01045  -0.00614  -0.05368  0.082127  -  -  -
35  -  -  -  -  -  -  -  -  -0.56063  0.108597  3.971875
Sum + TI and Ycal
Row  Sum + TI  Ycal
1  6.765545453  6.765546
2  7.166518793  7.166519
3  7.18858251  7.188583
4  7.211703045  7.211704
5  7.138796681  7.138797
6  6.342738668  6.342739
7  7.147559184  7.14756
8  7.499971606  7.499972
9  7.756471563  7.756472
10  6.206323902  6.206324
11  7.010341674  7.010342
12  6.893386475  6.893387
13  9.622583286  9.622584
14  7.940478713  7.940479
15  8.072750606  8.07275
16  7.69558736  7.695589
17  8.078051863  8.078052
18  8.467816936  8.467817
19  8.418469763  8.41847
20  7.923881317  7.923881
21  8.17112743  8.171127
22  8.361578414  8.361579
23  7.816739221  7.816739
24  8.244042377  8.244042
25  7.819283364  7.819284
26  7.354249284  7.35425
27  9.59303709  9.593037
28  7.870254531  7.870255
29  7.933030118  7.93303
30  8.674200199  8.674201
31  5.150797243  5.150797
32  6.030835479  6.030836
33  7.72984956  7.72985
34  6.441013222  6.441013
35  5.728186628  5.728187
36  5.525098729  5.525099
37  4.078185971  4.078186
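The way a predicted value is assembled from this annex can be checked with a short Python sketch. The regression vector and the independent term are copied from the table; the descriptor values are the first entry of each descriptor column, under the assumption that these first entries belong to the same sample. The function and variable names are hypothetical.

```python
# Regression coefficients of the ten selected descriptors (from the annex table).
REGRESSION_VECTOR = [
    0.073078, 0.0065555, 0.06235406, -0.00486644, 0.01830458,
    0.02077115, 0.00309439, -3.49385318, 0.01240195, 0.01425531,
]
# Independent term (intercept) of the PLS regression.
INDEPENDENT_TERM = 1.75631127


def predict_pki(descriptors):
    """Predicted activity: independent term plus the sum of descriptor x coefficient."""
    contributions = [x * b for x, b in zip(descriptors, REGRESSION_VECTOR)]
    return INDEPENDENT_TERM + sum(contributions)


# First entry of each descriptor column (assumed to describe the same sample).
first_sample = [
    34.58622, -26.5864, 34.00342, -61.7916, 1.850097,
    1.369324, -57.5891, -0.13128, -16.3696, 6.685121,
]
ycal_first = predict_pki(first_sample)
```

Because the tabulated descriptors are rounded, the result agrees with the first Ycal entry (6.765546) only to about four decimal places.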
HETATM 1 C VDP
HETATM 2 C ELP
HETATM 3 C VDP
HETATM 4 C ELN
HETATM 5 C VDP
HETATM 6 C VDP
HETATM 7 C ELP
HETATM 8 C VDN
HETATM 9 C ELP
HETATM 10 C VDP
END
Carolina Arruda Braz, Edjan Carlos Dantas da Silva, Euzébio Guimarães Barbosa*
Department of Pharmacy. Natal, RN-Brazil.
Email:
Abstract
Malaria remains a pressing problem for drug discovery. Plasmodium falciparum, the most virulent species of the parasite, compounds the issue through its increasing resistance to current therapy. Plasmepsin II belongs to a group of aspartic proteases responsible for the metabolism of hemoglobin by Plasmodium and represents a potential pharmaceutical target against malaria. A set of plasmepsin II inhibitors was used to build LQTA-QSAR models. Molecular dynamics simulations were performed beforehand, to better understand the binding mode of the potential inhibitors and to support the construction of the predictive model. The LQTA-QSAR model was obtained by calculating Coulomb and Lennard-Jones descriptors from the aligned conformations with Open3DQSAR. The model was built by PLS regression using 37 samples, 10 variables and 2 latent variables. It achieved Q² = 0.92, R² = 0.95 and a predictive Q² = 0.78 for a test set of 12 molecules. With a better understanding of the binding mode through molecular dynamics, a 4D predictive model gave better results than a 3D one, mainly because of the difficulty posed by protein flexibility. The model has good statistics, is robust (as shown by leave-N-out tests) and can be used to predict the activity of proposed Plasmepsin II inhibitors.
82
Introduction
Malaria is caused by a parasite of the genre Plasmodium and constitutes a large threat to human health. Infections
with malaria parasite results in half a million cases of death, mostly among African children (ORGANIZATION,
2014). Of the four species of Plasmodium parasites, P. vivax, P. ovale, P. malariae and P. falciparum, the last is
the most lethal and responsible for the increase in malaria resistance for the current drug therapy (HYDE, 2009;
SNOW; KORENROMP; GOUWS, 2004; WHITE et al., 1999). Artemisin-based combination therapies have been
recommended by the World Health Organization as first line treatment for P.falciparum malaria in endemic
countries, but increasing resistance to these treatments has stimulated the search and development of new drugs
with novel modes of action(BIAMONTE; WANNER; LE ROCH, 2013; DONDORP et al., 2010; KLEIN, 2013;
WELLS; VAN HUIJSDUIJNEN; VAN VOORHIS, 2015; WHITE, 2004).
The life cycle of the parasite involves a sexual stage (in the invertebrate host) and an asexual stage (in hepatocytes and erythrocytes), the latter taking place in the human body. Most symptoms manifest when the developing parasites are released from the red blood cells. The parasite degrades human hemoglobin in the food vacuole to produce amino acids essential for its development and maturation (KHAN; WATERS, 2004; TUTEJA, 2007). Plasmepsins (Plms), a class of eukaryotic aspartic proteases of P. falciparum, are involved in this process and are currently under investigation as drug targets (ERSMARK; SAMUELSSON; HALLBERG, 2006; HUIZING; MONDAL; HIRSCH, 2015).
Ten different plasmepsins have been identified in P. falciparum, of which four (I-IV) are active in the food vacuole during the intra-erythrocytic stage (BHAUMIK; GUSTCHINA; WLODAWER, 2012). Plasmepsins are among the oldest targets that still hold interest in drug discovery for the development of novel antimalarial drugs (DAN; BHAKAT, 2015). Structure-based drug design focuses mainly on Plm II as a target, since it was the first plasmepsin for which a crystal structure became available, providing a starting point for developing novel protease inhibitors as promising antimalarial agents (HUIZING; MONDAL; HIRSCH, 2015; SILVA et al., 1996). The Plm II sequence also enabled the development of in silico models for Plm I, IV and HAP, as the binding site regions of these proteins are similar (ERSMARK; SAMUELSSON; HALLBERG, 2006).
The mature Plm II consists of 329 amino acid residues in a single peptide chain folded into two domains (C- and N-terminal). The domains are connected at the bottom of the binding cleft, which contains the catalytic dyad (formed by residues Asp34 and Asp214), a β-hairpin turn and the flap, which covers the active site and interacts with the substrate (ASOJO et al., 2003). The crystal structure of Plm II (PDB code 4CKU (JAUDZEMS et al., 2014)) and a list of important amino acid residues in their respective pockets (BRINKWORTH et al., 2001) are displayed in Figure 1. The pockets are named following Schechter and Berger (SCHECHTER; BERGER, 1967), as adapted from Huizing et al. (HUIZING; MONDAL; HIRSCH, 2015) (Table 1). In blue the S2 pocket; in orange the S1 pocket; in yellow the flap; in red the catalytic dyad; in green the S1/S3 pocket and the S2 pocket in purple.
Fig. 1 Plasmepsin II crystal structure.
Table 1 Important amino acid residues of the Plm II binding site, grouped by pocket.
Catalytic dyad: Asp34, Asp214.
Flap: Pro43, Met75, Val105, Phe111, Ile123.
S1/S3: Tyr17, Ile32, Ser215, Ser218, Phe120, Met15, Ile123, Ile14, Thr114.
S2: Pro243, Leu271, Phe244, Met286, Ile290, Tyr273.
S1': Tyr192, Trp193, Ile300, Ile212, Phe294.
S2': Asn39, Leu131, Ser132, Ile133.
Peptidomimetic inhibitors are the first choice when developing compounds targeting proteases because of their isosteric nature with the non-cleavable transition-state complex (DAN; BHAKAT, 2015). Allophenylnorstatine-based compounds were selected because they mimic the transition state of the protein substrate: plasmepsins make an initial cleavage in the hemoglobin alpha chain between residues Phe33 and Leu34 (DE FARIAS SILVA; LAMEIRA; ALVES, 2011; FRIEDMAN; CAFLISCH, 2007). These compounds are characterized by a unique unnatural amino acid, allophenylnorstatine [Apns; (2S,3S)-3-amino-2-hydroxy-4-phenylbutyric acid] (NEZAMI et al., 2002).
Quantitative structure-activity relationship (QSAR) methods have become an essential part of modern drug design, saving both cost and time. Hardly any drug is now developed without prior QSAR analyses (ANDRADE et al., 2010), since these studies help to understand and explain the mechanism of drug action at the protein active site, in addition to suggesting designs of more specific drug candidates (VERMA; KHEDKAR; COUTINHO, 2010). QSAR methodology is based on the concept of a quantitative correlation, established with statistical or mathematical tools, between differences in the biological activity of compounds and differences in their structural or physicochemical properties (PATEL et al., 2014).
3D QSAR analysis is a common method in molecular design, but 4D QSAR analysis, introduced by Hopfinger et al. (HOPFINGER et al., 1997), adds conformational flexibility and freedom of alignment to the development of the model. In this study, an alternative 4D QSAR approach, LQTA-QSAR (MARTINS et al., 2009), was used; such a model can be used to screen libraries of compounds and to design novel, potentially active molecules.
In this study, molecular dynamics simulations and docking were combined to simulate the interaction of allophenylnorstatine-based compounds within the active site of Plasmepsin II for a better understanding of the binding mode of these inhibitors. A 4D LQTA-QSAR model was built to predict inhibitory potency and to provide a means of designing new drug-like ligands.
Methods
Dataset of known Plasmepsin II inhibitors
A dataset of allophenylnorstatine-based Plasmepsin II inhibitors with a common thiazolidinecarboxylic ring and their biological activities (Ki values) was taken from the literature (HIDAKA et al., 2007; KISO et al., 2004; MIURA et al., 2010; NEZAMI et al., 2002). It was divided into a training set of 37 compounds and a test set of 12 compounds, chosen by picking samples evenly from a dendrogram derived from Hierarchical Cluster Analysis. The Hierarchical Cluster Analysis was performed with complete linkage, considering the distribution of both biological data values and structural diversity. The Ki values, in units of molarity (M), were transformed into pKi (-log Ki). Inaccurate values were not included in the study. Molecules in the test set were used to evaluate the predictivity of the LQTA-QSAR model developed. Compound 33 (Table 2) was used as the reference molecule (template) due to its high potency.
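The pKi transformation mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' script; the function name `ki_to_pki` and the example Ki value are hypothetical and do not come from Table 2.

```python
import math


def ki_to_pki(ki_molar: float) -> float:
    """Convert an inhibition constant given in mol/L to pKi = -log10(Ki)."""
    if ki_molar <= 0:
        raise ValueError("Ki must be a positive concentration in mol/L")
    return -math.log10(ki_molar)


# Hypothetical example: a Ki of 0.5 nM expressed in mol/L (not a Table 2 value).
example_pki = ki_to_pki(5e-10)
print(round(example_pki, 2))
```

Working on the logarithmic scale keeps potent and weak inhibitors on a comparable numeric range, which is what a linear regression method such as PLS expects.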
The chemical structures and biological activity values of all compounds are presented in Table 2.
Compound
R1
R2
R3
pKi
CH3
8.04
CH3
CH3
7.7
CH3
8.07
CH3
CH3
7.92
CH2 CH3
9.16
CH2 CH3
CH2 CH3
8.29
CH2 CH3
CH2 CH3
8.37
CH2 CH3
CH2 CH3
CH2 CH3
Compound
pKi
Compound
pKi
9.30
17
6.95
10
7.14
18
6.14
11
7.65
19
7.02
12
6.27
20
7.38
13
7.95
21
6.25
14
6.26
22
7.25
15
7.82
23
16
Compound
R1
24
6.56
R2
R3
R4
pKi
25
4.02
26
CH3
CH3
5.06
27
4.89
28
CH3
CH3
5.85
29
CH3
CH3
7.15
30
5.03
31
6.22
32
CH3
CH3
7.52
CH3
CH3
7.69
34
35
5.09
33
Compound4
pKi
36
8.52
37
7.82
8.05
38
39
9.7
40
8.3
Compound4
R1
R2
R3
pKi
41
CH3
7.75
42
CH3
8.30
43
NH2
8.40
44
NH2
7.92
45
46
47
OH
OCH3
OH
-
7.95
8.40
8.05
48
OCH3
7.77
49
OCH3
8.40
All 3D structures were constructed using MarvinSketch (MARVINSKETCH, [s.d.]). Protonation states were estimated at pH = 7.40 to simulate physiological conditions, and hydrogens were added accordingly. All structures were curated using the Fourches and Tropsha protocol (FOURCHES; MURATOV; TROPSHA, 2010) to safeguard the quality of the final model. Energy minimizations were then performed with the MMFF94 force field and at the semi-empirical PM6 level with Gaussian (FRISCH et al., 2009), and AM1-BCC charges were added using UCSF Chimera (PETTERSEN et al., 2004).
The Receptor
A 3D structure of Plasmepsin II was retrieved from the Protein Data Bank under the code 4CKU (JAUDZEMS et al., 2014). Missing amino acid side chains were rebuilt using the Swiss-Model web server (GUEX; PEITSCH, 1997), and protonation states were set for pH = 7.40 using the PROPKA3 web server (BAS; ROGERS; JENSEN, 2008). The final protein model was used to perform molecular docking with compound 33 (the most active one).
Molecular docking
Automatic molecular docking was not able to produce correct bioactive conformers. Every automatically docked conformation diverged extensively from the binding pose expected for allophenylnorstatine-based Plasmepsin II inhibitors, based on the known ligand co-crystallized with the receptor (JAUDZEMS et al., 2014). Even redocking experiments were unsuccessful. A manual docking procedure was therefore performed with the atom-by-atom mouse movement and torsion adjustment tools implemented in UCSF Chimera. Geometry optimizations were carried out to eliminate distortions in atom positions. The manually docked ligands were further optimized by molecular dynamics simulations to simulate induced fit.
The Molecular Dynamics (MD) simulations for all ligand-receptor complexes were performed using the GROMACS software. Every initial state was obtained from the preceding docking procedure. All simulations used implicit solvent with the Generalized Born formalism (FAN et al., 2005). Initially, the geometry of each complex was optimized with the steepest descent algorithm; if the convergence criterion was not met, the L-BFGS algorithm was used to reach convergence. Restraints were applied to the protein backbone and, partially, to the side chains. A leap-frog integrator was used to integrate Newton's equations of motion, center-of-mass translation and rotation were removed, and all bonds were converted to constraints. Lennard-Jones and Coulombic interactions were cut off at 2 nm. Each simulation lasted about 2000 ps.
After the molecular dynamics simulation with compound 33, the binding site was shaped to enable proper docking of the remaining ligands of the data set. Every complex was then submitted to further refinement by molecular dynamics simulations.
The 4D-QSAR model was built following the principles of the LQTA-QSAR methodology described previously (MARTINS et al., 2009). Initially, the energy-optimized structures were submitted to the topobuild program to generate the GAFF (WANG et al., 2004) topology files for GROMACS (VAN DER SPOEL et al., 2005). The conformations needed to create the model were retrieved from the MD simulations. Every simulation started from the coordinates resulting from the docking of each ligand to the binding site. An optimal conformation was selected from the MD results for each of the 49 molecules of the data set. The selection was based on a clustering analysis of all MD frames using the UCSF Chimera clustering tool. For each molecule, the conformation selected was the one that the ligand presented most often during the MD run.
The conformations selected for each molecule from the MD simulations were aligned and submitted to the calculation of molecular interaction descriptors using the free open3Dgrid package (TOSCO; BALLE, 2011). Lennard-Jones (LJ) and Coulomb (QQ) descriptors were generated as described by Barbosa and Ferreira (BARBOSA; FERREIRA, 2012). A grid box with a spacing of 1.0 Å and a size of 33 x 32 x 30 points was created, yielding 63360 descriptors.
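The descriptor count follows directly from the grid dimensions. The sketch below assumes that each grid point contributes one LJ and one QQ value, which is an inference from the reported total rather than a statement in the text; the grid origin is a hypothetical placeholder and the helper names are illustrative.

```python
from itertools import product


def grid_points(shape, spacing=1.0, origin=(0.0, 0.0, 0.0)):
    """Yield the (x, y, z) coordinates of a regular grid.

    The origin is a hypothetical placeholder; the real box position
    depends on where the aligned ligands sit in the binding site.
    """
    nx, ny, nz = shape
    ox, oy, oz = origin
    for i, j, k in product(range(nx), range(ny), range(nz)):
        yield (ox + i * spacing, oy + j * spacing, oz + k * spacing)


def descriptor_count(shape, fields=("LJ", "QQ")):
    """Number of descriptors, assuming one value per field per grid point."""
    nx, ny, nz = shape
    return nx * ny * nz * len(fields)


shape = (33, 32, 30)
n_points = sum(1 for _ in grid_points(shape))
print(n_points, descriptor_count(shape))
```

For the 33 x 32 x 30 box this gives 31680 grid points and 63360 descriptors, matching the total reported above.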
Data preprocessing and descriptor filtering were applied as described by Barbosa and Ferreira (BARBOSA; FERREIRA, 2012). The generated matrix, with dimensions of 49 x 63360, was processed in the following steps. An energy cut-off was applied to the LJ descriptors to avoid positive values of a much higher order of magnitude while keeping information in the region close to the atoms of the molecules. As stated in Equation 1, if an LJ descriptor at a given x, y, z position had an energy equal to or lower than 30 kcal/mol, no cut-off was applied. Otherwise, the logarithm of the residual was added to 30 kcal/mol (MARTINS et al., 2009).
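Equation 1
\[
E'_{x,y,z} =
\begin{cases}
E_{x,y,z}, & E_{x,y,z} \le 30\ \text{kcal mol}^{-1} \\
30 + \log\left(E_{x,y,z} - 29\right), & E_{x,y,z} > 30\ \text{kcal mol}^{-1}
\end{cases}
\]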
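The cut-off of Equation 1 can be sketched as a small function. Two points are assumptions here: the logarithm is taken as base 10, and the "residual" is read as the energy minus 29 kcal/mol, so that the function is continuous at 30 kcal/mol. The names `LJ_CUTOFF` and `truncate_lj` are illustrative.

```python
import math

LJ_CUTOFF = 30.0  # kcal/mol, threshold stated in the text


def truncate_lj(energy: float) -> float:
    """Apply the logarithmic cut-off to one Lennard-Jones descriptor.

    Values at or below the threshold are returned unchanged. Larger values
    are compressed as 30 + log10(E - 29) (base 10 is an assumption).
    """
    if energy <= LJ_CUTOFF:
        return energy
    return LJ_CUTOFF + math.log10(energy - (LJ_CUTOFF - 1.0))


# Hypothetical descriptor values in kcal/mol.
for value in (-5.0, 30.0, 129.0, 1029.0):
    print(value, truncate_lj(value))
```

Under these assumptions a steric clash of 1029 kcal/mol is mapped to 33 kcal/mol, so very large repulsive values no longer dominate the variance of a descriptor.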
Variables with variance lower than 0.02 kcal/mol were excluded. Descriptors whose absolute correlation with the biological activity (independent and dependent variable, respectively) was lower than 0.3 were eliminated. Descriptors showing a poor distribution when compared with the biological activity were eliminated using the digital filter described in (BARBOSA; FERREIRA, 2012). Finally, nearly identical descriptors were discarded, keeping only those with the best correlation with the dependent variable.
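The first two filters (variance and correlation thresholds) can be sketched as follows. This is an illustration only: it uses the sample variance, which is an assumption, it does not implement the digital filter or the removal of near-identical descriptors, and the descriptor names and values are hypothetical.

```python
import math


def sample_variance(values):
    """Unbiased sample variance (use of n - 1 is an assumption)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant vector."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def filter_descriptors(columns, activity, min_variance=0.02, min_abs_r=0.3):
    """Keep descriptors that pass the variance and |r| thresholds from the text."""
    kept = []
    for name, values in columns.items():
        if sample_variance(values) < min_variance:
            continue  # nearly constant descriptor
        if abs(pearson_r(values, activity)) < min_abs_r:
            continue  # weakly correlated with the biological activity
        kept.append(name)
    return kept


# Hypothetical toy data: four compounds, three descriptors.
activity = [5.0, 6.0, 7.0, 8.0]
columns = {
    "flat": [1.0, 1.0, 1.01, 1.0],
    "noise": [1.0, -1.0, -1.0, 1.0],
    "signal": [0.0, 1.0, 2.0, 3.0],
}
print(filter_descriptors(columns, activity))
```

In this toy case only the descriptor that varies and tracks the activity survives both thresholds.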
After filtering, the data set was divided into a training set and a test set based on the dendrogram obtained by Hierarchical Cluster Analysis (KIRALJ; FERREIRA, 2010). The compounds with the lowest and highest Ki values were never selected for the external data set. The training set, containing 37 compounds, was used to develop the LQTA-QSAR model, and the test set, with 12 compounds, to evaluate the predictive ability of the model. The Ordered Predictors Selection (OPS) algorithm was applied to select the most informative variables among the remaining descriptors (TEÓFILO; MARTINS; FERREIRA, 2009b), and partial least squares (PLS) regression was then used to obtain the QSAR model.
Model Validation
The robustness of the best model and the presence of chance correlations were examined by Leave-N-out (LNO) cross-validation and y-randomization, respectively. Leave-one-out (LOO) cross-validation was applied to define the optimum number of factors in PLS (KIRALJ; FERREIRA, 2009). In LNO cross-validation, N compounds (N = 1, 2, ...) are left out of the training set, a new model is built without them, and the values of the dependent variable are predicted for the omitted compounds. A cross-validated correlation coefficient (Q²) is calculated, and the model is considered robust if the deviation of the Q² LNO values from the Q² LOO value does not exceed 0.05 while N increases up to at least 20-30% of the data set samples (GRAMATICA, 2007; KIRALJ; FERREIRA, 2009). In the y-randomization test, PLS models are built using a scrambled dependent variable vector, and the resulting models should be of poor quality. The absolute correlation between each scrambled y vector and the original one (|r|) was plotted against Q² and R², and simple linear fits of |r| versus Q²yrand and |r| versus R²yrand were made. Models not obtained by chance correlation have intercepts aQ < 0.05 and aR < 0.3 (ERIKSSON et al., 2003).
Equation 2
i) \( Q^2_{yrand} = a_Q + b_Q\,|r| \)
ii) \( R^2_{yrand} = a_R + b_R\,|r| \)
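The intercept criterion of the y-randomization test can be sketched with an ordinary least-squares line fit. The code assumes that the |r|, Q² and R² values of the randomized models (plus the real model at |r| = 1) have already been computed elsewhere; all numbers below are hypothetical and the function names are illustrative.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = a + b * x; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def passes_y_randomization(abs_r, q2, r2, max_a_q=0.05, max_a_r=0.3):
    """Check the intercept limits a_Q < 0.05 and a_R < 0.3 quoted in the text."""
    a_q, _ = fit_line(abs_r, q2)
    a_r, _ = fit_line(abs_r, r2)
    return a_q < max_a_q and a_r < max_a_r


# Hypothetical values: four scrambled models and the real model (|r| = 1).
abs_r = [0.05, 0.10, 0.20, 0.30, 1.00]
q2 = [-0.40, -0.35, -0.20, -0.10, 0.92]
r2 = [0.10, 0.15, 0.25, 0.30, 0.95]
print(passes_y_randomization(abs_r, q2, r2))
```

The intercepts estimate the Q² and R² a model would reach when y carries no information about the original activities (|r| = 0), which is why small values indicate the absence of chance correlation.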
The external predictability was evaluated by the external Q² statistical parameter (Q²ext) (KIRALJ; FERREIRA, 2009). The descriptors of the best model were visualized in UCSF Chimera, which also supported the interpretation of the relationships between their positions and the adjacent binding site.
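The text does not spell out the formula for the external Q². The sketch below uses one common definition, 1 - PRESS/SS with the sum of squares taken around the training-set mean; this choice is an assumption, and other variants exist. The data are hypothetical.

```python
def q2_external(y_test, y_pred, y_train_mean):
    """External Q2 as 1 - PRESS / SS, with SS centered on the training-set mean.

    This is one common definition and is assumed here, not taken from the text.
    """
    press = sum((obs - pred) ** 2 for obs, pred in zip(y_test, y_pred))
    ss = sum((obs - y_train_mean) ** 2 for obs in y_test)
    return 1.0 - press / ss


# Hypothetical pKi values for three external compounds.
observed = [6.0, 7.0, 8.0]
predicted = [6.5, 7.0, 7.5]
print(q2_external(observed, predicted, y_train_mean=7.0))
```

A value close to 1 means that the prediction errors for the external compounds are small compared with the spread of their activities.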
It was observed that the thiazolidine portion of the ligand occupied the S1' pocket of the protein, allowing the ligand branch to reach the S2 pocket and reaching a more stable conformation than that presented in the crystallographic form. In all simulations, the benzyl side chain interacted with residues of the S1/S3 pocket, showing π-interactions with Phe120, Tyr77, and Phe111 (flap). The catalytic dyad formed by residues Asp34 and Asp214 interacts with the hydroxyl and carbonyl groups of the Apns scaffold, forming H-bonds with the carboxylate oxygens of the aspartates. The thiazolidine ring interacted with residues Tyr192, Ile300 and Ile212 of the hydrophobic S1' pocket. That the ligands are designed to mimic a substrate transition state is clear from the binding mode, in which the inhibitors extend on both sides of the catalytic center and are intended to optimally fill the S1/S3 and S1' pockets (SHIM; MACKERELL, JR., 2011).
This treatment of the conformations provided suitably aligned allophenylnorstatine derivatives for constructing 4D LQTA-QSAR models. After descriptor filtering and selection by OPS (TEÓFILO; MARTINS; FERREIRA, 2009b), a PLS model with 2 latent variables and 10 selected descriptors was obtained, with good statistical parameters (Q² = 0.92, R² = 0.95) and suitable external prediction (Q²ext = 0.79). Figure 4 shows the scatterplot of predicted versus experimental pKi values. The recommended validation procedures for QSAR models were performed to ensure robustness and rule out chance correlation (KIRALJ; FERREIRA, 2009). The best model maintained predictability with up to 15 samples (40%) left out in the Leave-N-out validation, with the Q² LNO values deviating from Q² LOO by less than 0.05. The linear regressions required by the y-randomization test delivered intercept values for R² and Q² lower than 0.3 and 0.0 (ERIKSSON et al., 2003), respectively, as shown in Figure 4.
Fig. 4 Validation performed on the final model. a. Plot of calculated versus experimental pKi values; black dots are training set samples and red dots the external set. b. Regression line computed for the Q² value of the real model and the Q² values obtained from the randomized models. c. Regression line computed for the R² value of the real model and the R² values obtained from the randomized models. d. Leave-N-out validation.
Descriptors of the best model are represented graphically in Figure 5 (dark and light blue, orange and red spheres). Lennard-Jones (LJ) descriptors are shown in orange and red (negative and positive regression sign, respectively), and electrostatic (QQ) descriptors in light and dark blue (negative and positive regression sign, respectively). The bioactive conformations of the most active compound, 33 (Figure 5a), and of an inactive analog, compound 25 (Figure 5b), are shown to facilitate visualization of the descriptor positions.
The only negative Lennard-Jones descriptor (LJ8), in orange, is related to the adjacent residues Tyr192 and Leu131 within the Plm II binding pocket and represents an unfavorable steric interaction correlated with the biological activity. The three positive LJ descriptors (LJ10, LJ1, and LJ6), in red near the thiazolidine ring, indicate favorable steric regions. The presence of hydrophobic substituents, such as the dimethyl substituents in compound 33 (Figure 5a) and other compounds of the data set, correlates with a further increase in inhibitory potency. The lack of these substituents (Figure 5b) is reflected in poor biological activity. These descriptors also correlate with the orientation in which the sulfur of the thiazolidine ring binds to the site: the further a ligand departs from the optimum conformation presented by compound 33, the lower its activity, as shown more specifically in Figure 6. The positive descriptors LJ3 and LJ5 favor bulky substituents in the S1 pocket region, in agreement with other studies (MUTHAS et al., 2005) reporting that greater activity is correlated with more bulk in this region.
The positive QQ7 descriptor represents a favorable electronegative region and can be correlated with the addition of polar groups that interact especially with the S1 pocket, increasing the inhibitory activity. As shown in Figure 5b, inactive ligands do not interact fully with this pocket. The positive descriptor QQ2 can be interpreted as favoring modifications that increase the electronegativity of the phenyl portion. Descriptor QQ9 likewise relates favorably to electronegative modifications, associated with interactions with the Tyr77 residue. The only negative descriptor, QQ4, indicates that negative electrostatic interactions at its position increase the activity. Also, aliphatic substituents confer less inhibitory activity than cyclic ones at this position.
The binding pocket, along with the descriptor positions and selected ligands, is available to readers who wish to perform their own screening procedures. An Excel spreadsheet gives the PLS regression equation of the 4D LQTA-QSAR model for immediate use. What is required is (i) proposed ligands bound within the provided pocket and (ii) computation of the open3Dgrid energies at the specific positions, and of the specific nature, of the model descriptors. LJ descriptors must be modified according to Equation 1.
Conclusion
A 4D LQTA-QSAR model was constructed for an allophenylnorstatine-based data set of Plasmepsin II inhibitors. The model presented excellent statistical parameters and suitable predictive power, allowing it to be used to assist drug development research. The results enable a better understanding of the binding mode in Plm II and explain structural features that are reflected in the biological response of inhibiting the parasite. These insights can be helpful for screening and designing new active compounds.
AFONSO, A. et al. Malaria parasites can develop stable resistance to artemisinin but lack mutations in candidate genes atp6 (encoding the sarcoplasmic and endoplasmic reticulum Ca2+ ATPase), tctp, mdr1, and cg10. Antimicrobial agents and chemotherapy, v. 50, n. 2, p. 480-9, 1 Feb. 2006.
ANDRADE, C. H. et al. 4D-QSAR: perspectives in drug design. Molecules (Basel, Switzerland), v. 15, n. 5, p. 3281-94, May 2010.
ARAKAWA, M.; HASEGAWA, K.; FUNATSU, K. The recent trend in QSAR modeling - Variable selection and 3D-QSAR methods. Current Computer-Aided Drug Design, v. 3, n. 4, p. 254-262, 2007.
ARAV-BOGER, R.; SHAPIRO, T. A. Molecular mechanisms of resistance in antimalarial chemotherapy: the unmet challenge. Annual review of pharmacology and toxicology, v. 45, p. 565-85, 7 Jan. 2005.
ASOJO, O. A. et al. Novel uncomplexed and complexed structures of plasmepsin II, an aspartic protease from Plasmodium falciparum. Journal of Molecular Biology, v. 327, n. 1, p. 173-181, 2003.
AZOUZI, S.; EL KIRAT, K.; MORANDAT, S. Hematin loses its membranotropic activity upon oligomerization into malaria pigment. Biochimica et Biophysica Acta (BBA) - Biomembranes, v. 1848, n. 11, p. 2952-2959, 2015.
BARBOSA, E. G.; FERREIRA, M. M. C. Digital Filters for Molecular Interaction Field
CRUCIANI, G. Molecular Interaction Fields. Perugia IT: Wiley - VCH Verlag GmbH & Co. KGaA, 2006.
DALRYMPLE, U.; MAPPIN, B.; GETHING, P. W. Malaria mapping: understanding the global endemicity of falciparum and vivax malaria. BMC Medicine, v. 13, n. 1, p. 140, 2015.
DAN, N.; BHAKAT, S. New paradigm of an old target: An update on structural biology and current progress in drug design towards plasmepsin II. European journal of medicinal chemistry, v. 95, p. 324-348, 2015.
DE ARAÚJO SANTOS, R. A. et al. Mixed 2D-3D-LQTA-QSAR study of a series of Plasmodium falciparum dUTPase inhibitors. Medicinal Chemistry Research, v. 24, n. 3, p. 1098-1111, 6 Aug. 2014.
DE FARIAS SILVA, N.; LAMEIRA, J.; ALVES, C. N. Computational analysis of aspartic protease plasmepsin II complexed with EH58 inhibitor: A QM/MM MD study. Journal of Molecular Modeling, v. 17, n. 10, p. 2631-2638, 2011.
DONDORP, A. M. et al. Artemisinin resistance: current status and scenarios for containment. Nature reviews. Microbiology, v. 8, n. 4, p. 272-80, Apr. 2010.
DOUGLAS, R. G. et al. Active migration and passive transport of malaria parasites. Trends in Parasitology, v. 31, n. 8, p. 357-362, 2015.
DOWEYKO, A. M. 3D-QSAR illusions. Journal of Computer-Aided Molecular Design, v. 18, n. 7-9, p. 587-596, 2004.
ELEBRING, T.; GILL, A.; PLOWRIGHT, A. T. What is the most important approach in current drug discovery: doing the right things or doing things right? Drug Discovery Today, v. 17, n. 21-22, p. 1166-1169, 2012.
ERIKSSON, L. et al. Methods for reliability and uncertainty assessment and for applicability evaluations of classification- and regression-based QSARs. Environmental health perspectives, v. 111, n. 10, p. 1361-75, Aug. 2003.
ERSMARK, K.; SAMUELSSON, B.; HALLBERG, A. Plasmepsins as potential targets for new antimalarial therapy. Medicinal Research Reviews, v. 26, n. 5, p. 626-666, 2006.
FAN, H. et al. Comparative study of generalized Born models: protein dynamics. Proceedings of the National Academy of Sciences of the United States of America, v. 102, n. 19, p. 6760-4, 10 May 2005.
FOURCHES, D.; MURATOV, E.; TROPSHA, A. Trust, but verify: on the importance of chemical structure curation in cheminformatics and QSAR modeling research. Journal of chemical information and modeling, v. 50, n. 7, p. 1189-204, 26 Jul. 2010.
FRIEDMAN, R.; CAFLISCH, A. The protonation state of the catalytic aspartates in plasmepsin II. FEBS Letters, v. 581, n. 21, p. 4120-4124, 2007.
FRISCH, M. J.; TRUCKS, G. W.; SCHLEGEL, H. B.; SCUSERIA, G. E.; ROBB, M. A.; CHEESEMAN, J. R.; SCALMANI, G.; BARONE, V.; MENNUCCI, B.; PETERSSON, G. A.; NAKATSUJI, H.; CARICATO, M.; LI, X.; HRATCHIAN, H. P.; IZMAYLOV, A. F.; BLOINO, J.; ZHENG, G.; SONNENB, D. J. g09. Wallingford CT, 2009.
GHASEMI, J. B.; SAFAVI-SOHI, R.; BARBOSA, E. G. 4D-LQTA-QSAR and docking study on potent gram-negative specific LpxC inhibitors: A comparison to CoMFA modeling. Molecular Diversity, v. 16, n. 1, p. 203-213, 2012.
GINER ALMARAZ, S. et al. Guidelines for Treatment of Malaria in the United States. Jama, v. 15, n. 4, p. 227-238, 2010.