Drug Discovery: Exploring The Utility of Cluster

Drug Discovery: Exploring the Utility of Cluster Oriented Genetic
Algorithms in Virtual Library Design

B. Sharma & I. Parmee
ACDDM Lab (www.ad-comtech.co.uk/ACDDM_Group.htm)
University of West of England, Bristol BS16 1QY
bhuvan.sharma, ian.parmee@uwe.ac.uk

M. Whittaker & A. Sedwell
Evotec OAI
151 Milton Park, Abingdon, Oxon, OX14 4SD
mark.whittaker, alistair.sedwell@evotecoai.com

Abstract- In silico combinatorial library design involves the identification of molecules that have a greater probability of exhibiting desired biological activity when subjected to in vitro screening (assaying) against a particular biological target. The paper introduces the integration of cluster-oriented genetic algorithms (COGAs) with such machine-based library design environments.

COGAs have a proven capability to identify high-performance regions of complex, continuous design spaces relating to engineering design problems. Modifications to the basic COGA approach are described that allow a transfer of this capability from continuous variable parameter space to the highly discrete spaces described by reactants across reagent libraries. Results relating firstly to the identification of optimal molecules and secondly to the focussing of reagent libraries in terms of high-performance reactants are presented. Single objective optimisation and focussing are initially considered before moving on to multiple objective satisfaction.

1 Introduction

Drug design and discovery is a systematic, serial process of identification and modification of chemical structure to achieve desired results against biological targets associated with a particular disease. Traditionally, the process involves the development of a biochemical assay for a biological target of interest and the subsequent screening of large numbers of drug-like organic chemical compounds in a high-throughput manner to identify hit compounds. Such hit compounds usually possess weak biological activity that requires improvement by a process of medicinal chemistry optimisation, first into a lead series with robust properties and then into a drug candidate that is suitable for evaluation in human clinical trials. This process requires the optimisation of multiple parameters including biological activity, selectivity for the biological target of interest over related proteins, pharmacokinetics, pharmacodynamics and pharmaceutical properties.

In modern drug discovery extensive use is being made of in silico techniques to find hit molecules through virtual screening and to then aid their subsequent optimisation. A large compound collection in the form of virtual libraries is described by the reagents required for their synthesis as shown in figure 1. Reagents (inputs to the reaction equation in figure 1) are the chemical reactants required to make a set of molecules. Examples of reagents could be carboxylic acids, amines, aldehydes etc. All molecules that contain the key reactive functional group for a reagent would constitute possible reactants across that reagent. In this sense, in a variable parameter space, a reagent refers to a particular variable parameter (dimension) whereas reactants are the available values / molecules across that dimension.

[Figure 1 scheme: R1-COOH + H2N-R2 → R1-CO-NH-R2 (Carboxylic Acid + Primary Amine → Secondary Amide)]
Figure 1. Sample reaction scheme

The number of possible compounds in a virtual library can be in the order of many millions and this number increases combinatorially with the number of included reagents.

These libraries may then be subjected to in silico screening to identify compounds (hits) that exhibit activity against the biological target during the subsequent assay. In silico models such as molecular docking, pharmacophore matching, QSAR (Quantitative Structure-Activity Relationship), and chemical similarity are utilised as objective functions to identify possible high-performance compounds that have a greater probability of exhibiting desired biological activity during in vitro assaying.

The size of the virtual libraries coupled with the computational expense relating to most of the objective functions rules out exhaustive search (i.e. complete enumeration of all library members from the corresponding chemical reagents). What is therefore required is a search process that rapidly samples and identifies as many high performance molecules as possible within available time limitations. The development and integration of appropriate search techniques during in silico screening could significantly enhance the drug discovery process by both improving the hit rates during assaying and reducing drug design cycle times.

In collaboration with Evotec OAI, Cluster-oriented Genetic algorithms (COGAs) have been integrated as a potential user-interactive search and exploratory tool with Evotec OAI’s existing drug design software, EVOSeek™. COGA [1] has a proven capability within engineering design environments described by continuous variables to
identify regions of high performance (HP) solutions. This can be achieved with no a priori knowledge of the problem space in terms of possible number of local optima and the settings of niche radii, sharing factors etc.

The successful transfer of COGA technology to combinatorial chemical space offers significant potential in terms of the ability to identify groupings of HP reactants and hence support the focussing of reagent libraries or the identification of individual HP molecules. This library focussing and optimisation capability is therefore the motivation for the research described in the following sections. This work represents a proof-of-concept relating to the potential of COGA integration.

Section 2 is a brief review of library design literature. Section 3 introduces the COGA methodology. Section 4 describes initial experimentation to determine basic COGA performance. Section 5 concentrates upon identifying optimal molecules from a test library whereas Section 6 presents the introduction of a tabu element to the basic COGA to enable focussing of this library. Scaleability issues are investigated in section 7 before moving on to multi-objective satisfaction in section 8.

2 Background

A review of drug design approaches and strategies can be found in Tsinopoulos and McCarthy [2]. Computer Aided Drug Design found application in the early 1990s in terms of the modelling of structure-activity relationships (SARs). The introduction of evolutionary computing (EC) techniques to modelling strategies began to emerge a little later. Milne [3] reports only five published papers employing evolutionary algorithms between 1989 and 1992. However between 1993 and 1997 more than 210 EC-based papers appeared as reviewed in [4,5,6,7]. The first published application of EC to combinatorial library design was by Sheridan et al. [8] utilising a measure of chemical similarity as an objective function. Papers by Singh et al. [9] and Weber et al. [10] however utilised the results from in vitro biological activity to provide a fitness measure for a GA-based search thus providing a proof-of-concept of the potential of a GA to direct reactant selection for chemical synthesis.

Gillet et al. [11] developed a GA based technique (SELECT) to optimise virtual libraries against a single diversity objective using a distance based diversity matrix. A weighted sum approach for multiple objectives was refined [12] via the introduction of a multi objective genetic algorithm (MoSELECT). Wright et al. [13] developed a different selection scheme in MoSELECT-II for the optimization of library size and configuration. Brown et al. [14], using a GA based approach called GALOPED, also addressed size and configuration whereas Pickett et al. [15] introduced Monte Carlo approaches to achieve similar objectives.

There is little comparison between previous work and the aims and objectives of the research described in the paper. Our objective, initially using chemical similarity, is to optimise and focus a library as opposed to Gillet’s objective of generating a diverse library as described in [11]. Whereas we introduce search, optimisation and multi-objective satisfaction in large reagent libraries, other works [8,9,10] have attempted single objective optimisation on significantly smaller libraries. This, plus the absence of suitable benchmark problems, makes comparison of our results with other works very difficult.

3 Cluster Oriented Genetic Algorithms (COGA)

Cluster Oriented Genetic Algorithms, initially developed by Parmee in the early 1990s [1], provide the means to identify high-performance (HP) regions of complex conceptual engineering design spaces and enable the extraction of information from such regions [16,17]. COGAs identify HP solution regions through the on-line adaptive filtering of solutions generated by a genetic algorithm. Further work resulted in several variations of COGA and also identified and illustrated the manner in which the COGA approach can be utilised to generate highly relevant design information relating to single and multi-objective problem domains [17,18,19].

COGA comprises two primary components: the diverse search engine, which utilises a highly exploratory genetic algorithm to search the design space, and the adaptive filter (AF), which extracts solutions from each COGA generation and stores them within a Final Cluster Set (FCS). The AF scales solution fitness in terms of distance from the mean (figure 2) and only solutions that lie above a pre-defined threshold value, Rf, are copied to the FCS. By reducing the severity of Rf, more HP solutions, albeit with a lower average fitness, can enter the FCS. The user can therefore vary the filter setting in order to identify regions ranging from succinct groupings of very high performance solutions to larger regions of high and lower performance solutions. Design space exploration is enhanced in the underlying search engine via variable mutation regimes [1] or Halton injection sequences [20]. Sufficient HP regional set-cover (in terms of number of solutions) can be achieved to allow significant qualitative and quantitative design information to be extracted.

It is not possible within the space available to provide a more detailed description of COGA. However, anyone wishing to replicate the research in this paper can refer to well-documented COGA development in many papers available at http://www.ad-comtech.co.uk/Parmee-Publications.htm.

The COGA approach was primarily developed for search and exploration across design spaces described by continuous variable parameters which tend to predominate in engineering design. Typical COGA output from continuous design spaces comprises clusters of solutions describing high performance regions. The spread and distribution of these high quality solutions can offer a wealth of information relating to the characteristics of the search space and the complex relationships between variable and objective space as demonstrated in Parmee et al. [17,19,21].

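The adaptive filter described in Section 3 can be illustrated with a short sketch. This is not the authors' implementation: it assumes that "distance from the mean" is measured in standard deviations (a z-score), which the text does not state explicitly, and the function name adaptive_filter and the sample fitness values are hypothetical.

```python
from statistics import mean, pstdev

def adaptive_filter(population, rf):
    """Return the solutions whose scaled fitness exceeds the threshold rf.

    population: list of (solution, fitness) pairs for one generation.
    rf: filter threshold; ASSUMED here to be in standard-deviation units.
    """
    fitnesses = [f for _, f in population]
    mu = mean(fitnesses)
    sigma = pstdev(fitnesses)
    if sigma == 0:
        return []  # all fitnesses equal: nothing stands out from the mean
    return [(s, f) for s, f in population if (f - mu) / sigma > rf]

# Hypothetical generation of five (amine index, acid index) solutions.
generation = [((1, 7), 0.10), ((2, 7), 0.20), ((3, 9), 0.30),
              ((4, 2), 0.40), ((5, 5), 0.90)]

# Solutions passing the filter are accumulated in the Final Cluster Set.
final_cluster_set = []
final_cluster_set.extend(adaptive_filter(generation, rf=1.5))
```

Under these assumptions, lowering rf admits more solutions with a lower average fitness, which mirrors the behaviour the text attributes to reducing the severity of Rf.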
3.1 Representation

One of the challenges in the research described in the following sections has been to modify the COGA approach in order to ensure a similar utility to that proven in engineering design when searching the discrete combinatorial problem spaces that are an all pervading aspect of drug design.

Given the very positive results from the engineering design domain the initial assumption was that application of COGAs would significantly support the identification of ‘best’ reactants. The chemist could interact with the evolutionary process by varying the adaptive filter to identify either succinct groupings of high-performance molecules or larger collections of lesser performance molecules in terms of any chosen in silico objective function (e.g. QSAR, chemical similarity etc).

Binary representation has generally been utilised within the original COGA algorithms. However binary representation for reactants of each reagent revealed a number of potential problems. For instance, directly mapping binary strings onto the integer space comprising the numbered address of each reactant molecule of a reagent grouping results in illegal solutions and / or a degree of redundancy. Problems relating to the crossover of binary strings, the generation of non-feasible solutions and a subsequent requirement for chromosome repair were also inherent.

Figure 2: The adaptive filter (AF)

A straightforward integer encoding was therefore adopted where integer values represent an index of a reactant’s location in the proprietary database. The chromosome in our integer representation scheme has a length (number of genes) equal to the number of dimensions (reagents) of the virtual library. A gene can then take an integer value between one and the maximum number of reactant molecules across that dimension. This number can directly be transposed to the index of that reactant molecule in the chemical database of molecules. The phenotype then is the product molecule of the reaction involving the reactant molecules represented by each gene in the chromosome.

4 Initial Experimentation and Results

A test virtual library from the reaction scheme of Figure 1 comprising amines and acids was initially chosen to assess the performance of COGA utilizing Tanimoto Similarity as a simple test objective function. The two reagents each possess 400 reactants creating a search space of 160,000 possible solutions. To allow a true evaluation of COGA performance and proof-of-concept all the product molecules in this virtual library were enumerated and their chemical similarity to a specific drug molecule (methotrexate) calculated. This allows the top 0.5% solutions to be plotted as shown in figure 3 which illustrates a typical distribution of high-performance solutions against which COGA output can be compared.

Figure 3. Top 0.5% solutions in test library identified by exhaustive search and enumeration

The plot in Figure 3 indicates that:
• good solutions can be distributed across the entire range of a virtual library;
• reactants producing a high percentage of good solutions can be easily identified.

With these results in mind it was intended to determine the potential of the COGA approach in terms of:
• optimisation i.e. the identification of a number of very high performance molecules;
• focussed library i.e. the identification of high performance reactants that provide a focussed combinatorial compound library the members of which include a significant number of high performance molecules.

It was initially assumed that appropriate settings of the adaptive filter threshold of COGA would result in the achievement of each of these objectives. High filter settings would provide smaller numbers of HP solutions whereas low settings would identify much larger numbers of high performance solutions with a lower average fitness which would also support the identification of high performance reactants i.e. reactants generally exhibiting high-performance across all possible combinations.

Before investigating these initial assumptions both variable mutation COGA (vmCOGA) and Halton injection COGA (hiCOGA) were investigated with the test virtual library from Figure 1 to determine their comparative performance. Performance criteria relate to each COGA’s ability to identify, within their final clustering sets (FCSs), the greatest number of the enumerated top 0.5% solutions from the exhaustive search. The degree of robustness of each approach was also considered to be a significant criterion. The overall objective of this comparative study was to identify which approach to concentrate further effort on.

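The integer representation of Section 3.1 and the comparison criterion of Section 4 can be sketched together on a toy library. Everything specific here is an assumption made for illustration: fingerprints are modelled as plain sets of "on" bit positions, a product fingerprint is taken as the union of its reactants' fingerprints, and the names tanimoto, decode and top_fraction are hypothetical. A real system would obtain fingerprints from a cheminformatics toolkit and an enumerated product structure.

```python
import itertools

def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of 'on' bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)

def decode(chromosome, reagents):
    """Map 1-based gene values to one reactant per reagent (dimension)."""
    return tuple(reagent[gene - 1] for gene, reagent in zip(chromosome, reagents))

def top_fraction(scores, fraction):
    """Return the chromosomes ranked in the top `fraction` by score."""
    keep = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:keep])

# Toy reagents: each reactant is a hypothetical fingerprint fragment.
amines = [{1, 2}, {2, 3}, {7, 8}, {1, 9}]
acids = [{4, 5}, {5, 6}, {10, 11}, {4, 12}]
target = {1, 2, 4, 5}  # stand-in for the reference molecule's fingerprint

# Exhaustive enumeration is feasible only because this library is tiny.
scores = {}
for chrom in itertools.product(range(1, len(amines) + 1),
                               range(1, len(acids) + 1)):
    amine, acid = decode(chrom, (amines, acids))
    scores[chrom] = tanimoto(amine | acid, target)  # union as product: an assumption

# Top 5 of 16 in this toy case; the paper uses the top 0.5% of 160,000 (800).
best = top_fraction(scores, 5 / 16)
fcs = {(1, 1), (2, 1), (3, 3)}  # hypothetical Final Cluster Set
matches = len(fcs & best)       # the per-run count compared in Section 4
```

The count `matches` corresponds to the per-run figure the paper reports when comparing hiCOGA and vmCOGA against the enumerated top 0.5%.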
The top 0.5% of the possible 160,000 solutions of the test virtual library comprised 800 molecules. An adaptive filter setting of 0.9 was initially introduced and a typical plot of the solutions from a hiCOGA’s FCS is shown in figure 4. Similar typical plots from vmCOGA’s FCS have also been generated. The results indicated that both vmCOGA and hiCOGA identify significant numbers of high performance solutions across the library. Fifty runs each of hiCOGA and vmCOGA were executed, in each of which a population size of 100 individuals running over 200 generations was utilised. For each run the number of FCS solutions that match those in the top 0.5% of the test library was extracted; the results are shown in figure 5a and 5b. On average hiCOGA identified ~200 solutions out of the best 800 (top 0.5%), whereas vmCOGA identified ~150 solutions. It is also evident from the plots that hiCOGA proved to be more robust, with the standard deviation of matching solutions being significantly less than that of the vmCOGA. Robustness is of particular importance in terms of integrating COGA techniques with the in silico drug design processes within Evotec OAI’s EVOseek™ software.

Figure 4: Plot of solutions from the FCS of a typical hiCOGA search of the test library

Figure 5: Numbers of HP solutions matching the test set for 50 runs of (a) hiCOGA and (b) vmCOGA

Further experimentation involving variation of the number of Halton injections per generation and variation of the vmCOGA’s mutation regimes further indicated that, in terms of discrete space search, hiCOGA performs better both in terms of the identification of high performance solutions and robustness.

5 Optimisation

For the enumerated 400 x 400 amine-acid test library the fitness (chemical similarity against methotrexate) range is between 0.0003 and 0.5812. The identification of small numbers of very high performance molecules can be achieved by appropriate setting of the adaptive filter.

Figure 6a: Variation of solution count in FCS at differing Rf settings (x-axis: Rf setting, -1.5 to 2; y-axis: count x10^3, 0 to 4)

Figure 6b: Variation of average solution fitness in FCS at differing Rf settings (x-axis: Rf setting, -1.5 to 2; y-axis: average fitness, 0.48 to 0.56)

Figure 6a shows the variation of numbers of HP solutions in the FCS at different filter settings. Figure 6b shows the average solution fitness within the FCS for different filter settings. The results represent average hiCOGA FCS fitness of 50 runs of 200 generations each at each filter setting with a population size of 100. A typical result from a single 200 generation run of hiCOGA with a filter setting of 2.0 would be an FCS containing circa 45 solutions with an average fitness of 0.56. Dropping the
filter to the previous 0.9 setting results in circa 400 solutions in the FCS with an average fitness of 0.54.

This optimisation process has now been integrated with Evotec OAI’s EVOseek™ software with an appropriate user-interface that allows the chemist to explore reagent space via use of different objective functions and the setting of constraints (e.g. molecular weight, calculated lipophilicity, hydrogen bond donor and acceptor counts). The chemist can request visualisation of a number of the top performing molecules from the optimisation process which are then presented in a tabular form as shown in figure 7. These have been generated from the integrated software on a live (i.e. non-enumerated) library comprising two reagents viz. haloalkane and amine. Fitness was calculated using Tanimoto chemical similarity against a molecule known to be present in the library.

Figure 7: User interface from EVOseek™ showing the best performing molecules

6 Generating Focussed Libraries

Returning to the plots of the fully enumerated test set (figure 3) of Section 4 and the results from the initial hiCOGA implementation (figure 4), these indicate the existence of high-performance (HP) axes relating to individual reactants. These significantly differ from the typical HP regions identified in continuous design space [16,17,21]. The identification of such axes allows the chemist to develop a focussed library of reactants that provide an enrichment of high performance molecules.

Experimentation showed that both hiCOGA and vmCOGA FCSs contain some HP axes but it is apparent when comparing output to the enumerated test set that many others were not identified. It appears that the explorative nature of the underlying GAs was not sufficient to avoid convergence upon a subset of available HP axes.

Further experimentation showed that increasing exploration via increased Halton injection or mutation was not the answer as HP axis identification did not improve and it appeared likely that an appropriate balance between exploration and exploitation does not exist that will overcome the problem. An alternative search and exploration strategy that rapidly identifies a HP axis but then moves on to discover other axes in unexplored regions of search space was required. One way to achieve this is to introduce some form of memory into the COGA process. In this respect elements of tabu search seemed to offer some utility in formulating a new approach called tabu-COGA (taCOGA).

COGA’s FCS is a repository of high fitness solutions, an analysis of which during run time could give an indication of the presence of HP axes. After a specified initial number of generations, solutions in the FCS can be analysed at each subsequent generation to assess the count of solutions involving specific reactants. A threshold count could be used to identify when a reactant is eligible to be declared as a HP axis. Thereafter, this reactant would become tabu and the search could be directed to other less-visited areas of library space. Some form of replacement strategy is therefore required that, to some extent, re-initialises tabu solutions. Two such strategies have been considered:
a) Reactant Replacement (RR): Here the tabu reactant is replaced by another reactant randomly selected from the entire search space.
b) Fitness Reassignment (FR): In this approach any solution containing a tabu reactant is assigned a low fitness.

The objective of taCOGA is to identify the maximum number of true HP axes within a minimum number of generations / fitness evaluations. With this in mind determining a robust and meaningful threshold count presented a problem. Too low a count and individual HP solutions could falsely render a reactant ‘high-performance’ whereas too high a count could result in very lengthy COGA run times to identify a sufficient number of HP axes. A preliminary experiment utilising hiCOGA with the same test library and filter and injection settings as utilised in the experiments of section 4 was carried out to establish an initial threshold count for taCOGA.

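The tabu mechanism of Section 6 can be sketched as a threshold count over the FCS followed by one of the two replacement strategies. This is an illustrative reading of the description, not the taCOGA code: the function names are hypothetical, the toy threshold of 3 is chosen only to keep the example small, and drawing the replacement from non-tabu reactants only is an assumption (the text says only that the replacement is randomly selected from the entire search space).

```python
import random
from collections import Counter

def find_new_tabu(fcs, dimension, threshold, tabu):
    """Reactants on `dimension` whose FCS solution count reaches `threshold`."""
    counts = Counter(solution[dimension] for solution in fcs)
    return {r for r, n in counts.items() if n >= threshold and r not in tabu}

def reactant_replacement(population, dimension, tabu, n_reactants, rng):
    """RR: swap each tabu gene for a random reactant index.

    ASSUMPTION: the replacement is drawn from non-tabu reactants only.
    """
    allowed = [r for r in range(1, n_reactants + 1) if r not in tabu]
    replaced = []
    for solution in population:
        genes = list(solution)
        if genes[dimension] in tabu:
            genes[dimension] = rng.choice(allowed)
        replaced.append(tuple(genes))
    return replaced

def fitness_reassignment(solution, fitness, dimension, tabu, low=0.0):
    """FR: give any solution containing a tabu reactant a low fitness."""
    return low if solution[dimension] in tabu else fitness

# Hypothetical FCS of (R1 index, R2 index) solutions; reactant 3 dominates R1.
fcs = [(3, 1), (3, 2), (3, 5), (4, 2), (7, 2)]
tabu_r1 = find_new_tabu(fcs, dimension=0, threshold=3, tabu=set())

population = [(3, 9), (5, 1)]
rng = random.Random(0)  # seeded only to make the sketch repeatable
new_population = reactant_replacement(population, 0, tabu_r1, 10, rng)
penalised = fitness_reassignment((3, 9), 0.5, 0, tabu_r1)
```

RR changes the population immediately, whereas FR leaves tabu reactants in place and relies on selection to remove them over later generations.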
Average results from 50 runs utilising increasing counts indicated that, for threshold counts below 10, the variation in the number of identified high performance axes was higher than the degree of variation between threshold counts greater than 10 (see figure 8). For this reason a threshold count of 10 was initially adopted for further investigation of the approach. However, further validation in this area is required.

Figure 8. Change in HP axes identified with increasing tabu threshold count.

A series of experiments then assessed the exploratory potential of the two replacement strategies (RR and FR) with a tabu threshold count of 10. 50 runs for each strategy were initiated to determine the number of HP axes along each reagent that they could identify. Results are given in figure 9a and 9b. The reactant replacement strategy consistently identifies a greater number of HP axes. It is likely that the immediate reassignment of tabu reactants is more effective than the more gradual replacement of tabu reactants that occurs with fitness reassignment.

Figure 9a. Number of HP Axes for R1 and R2 identified via RR (Reactant Replacement) strategy. Results given for 50 runs.

Figure 9b. Number of HP Axes for R1 and R2 identified via FR (Fitness Reassignment) strategy. Results given for 50 runs.

7 Increasing Dimension

Having established a basic working system via the two dimensional enumerated test library, hiCOGA was then integrated with a live library of three reagents (primary amines, aromatic acids, aldehydes) with a total size of 400 x 400 x 400 (64 x 10^6) reactant combinations. In this case a focussed library approach (i.e. with tabu), again using chemical similarity against methotrexate as a criterion, was introduced with a hiCOGA filter setting of 0.9. Resulting plots of FCS solutions projected onto two dimensional hyperplanes are shown in figure 10. The plots show identification of a significant number of high performing reactant axes which can be further investigated by the chemist.

A four reagent virtual library comprising aliphatic ketone, isocyanide, primary amine and unsaturated carboxylic acid, creating a library of 5297 x 32 x 17510 x 23060 (6.8 x 10^13) molecules, was then introduced. Chemical similarity against a target molecule known to have close analogs in the library provided an objective for an optimising hiCOGA (i.e. no tabu). The tentative fitness range therefore was between 0.0 and 0.9. The best solution fitness achieved is 0.761 in just 200 generations with a population size of 100, which is considered a promising result considering the size and nature of the search space.

8 Multi-objective satisfaction within combinatorial libraries

Various in silico models can be used as objective functions to ascertain molecule performance against differing criteria. Molecules best satisfying a number of differing criteria have a greater probability of passing the subsequent in vitro tests. In addition to chemical similarity other simple in silico objectives are cLog P, Polar surface area (PSA) and Molar Refractivity. In addition, more sophisticated Quantitative Structure Activity Relationships (QSARs), which correlate the activity of a compound against a particular property (e.g. aqueous solubility, Caco-2 permeability, blood-brain-barrier permeability, hERG channel blockade) with its structural features, can also be used as objective functions. To date chemical similarity, cLog P, and QSAR models for aqueous solubility have been utilised to investigate the utility of the developed taCOGA approach for multi-objective satisfaction.

The multi-objective COGA techniques (MOCOGA) developed for searching continuous design spaces [17] should also offer utility in discrete chemical space. The continuous space approach involves running a COGA for each objective with a relatively relaxed filter setting of around 0.5. It has been shown that the resulting identified regions give a very good indication of the degree of conflict between the objectives under investigation [17,19,21]. Mutually inclusive regions of high performance relating to two or more objectives indicate a low degree of conflict whereas mutually exclusive regions indicate a high degree of conflict i.e. significantly lower
performance solutions will have to be considered to satisfy all objectives.

Figure 10: taCOGA FCS solutions projected on 2 dimensional hyperplanes for a 3 reagent library.

Similarly, in discrete reagent space HP reactant axes that are common to a number of objectives can represent sets of good compromise solutions. In order to identify such common axes an optimisation approach is not appropriate. Initial experimentation utilising the 400x400 enumerated virtual library of section 4 with high Rf filter settings illustrates identification of significantly fewer HP axes. Also, just one common HP solution existed in the FCSs of two selected objectives (aqueous solubility and chemical similarity) from 50 hiCOGA runs with an Rf setting of 0.9. However introducing the taCOGA focussing approach results in the identification of significant numbers of high performing reactant axes common to each objective as shown in figure 11. An average of 25 common R1 reactants were identified. Repeating the experiment using cLogP and chemical similarity resulted in the identification of an average of 30 common HP R1 reactants and running taCOGA on all three objectives identifies an average of 10 common HP R1 reactants. In each case lesser numbers of common R2 reactants were identified.

Figure 11: Number of high performing R1 and R2 axes clusters common to various objectives identified using tabu COGA. Results given for 50 runs.

It is apparent from these preliminary results that taCOGA offers a great deal of utility when integrated with multi-objective reagent library search. This relates directly to the focussing of libraries in terms of solutions that have a higher probability of high performance regarding several objectives during subsequent in vitro assaying processes. Further analysis of the identified reagents by the chemist and machine-based sampling of HP reactant axes can result in further focussing of the taCOGA sets.

9 Further work

An extensive and successful study relating to the satisfaction of molecular constraints to further promote ‘drug likeness’ has been carried out. This has resulted in the introduction of appropriate penalties that further focus the reagent libraries. Unfortunately there is insufficient space to include results from the study in this paper.

User-interfaces that present library search results in a succinct manner to the chemist and support user-interaction with the evolutionary search and exploration
processes have been developed and are currently being assessed within Evotec OAI.

Further research relating to user-preference and multi-objective satisfaction is currently underway and, now that proof-of-concept has been achieved, further development of the basic techniques described will likely increase performance both in terms of number and fitness of identified solutions and in reduced drug discovery cycle-times.

Information emerging from such optimisation and focussing could be utilised to refine in silico objective functions in a similar manner to the problem redefinition aspects of previous interactive evolutionary design work [16,22]. Further information regarding degree of objective conflict may also be available as has been the case with engineering design applications [17,21].

10 Conclusions

A significant utility has been identified via the transfer of basic COGA techniques from the engineering design domain into drug discovery processes. This proof-of-concept has been achieved via experimental development of hiCOGA to enable successful integration with discrete reagent library focussing and optimisation. A tabu-oriented COGA (taCOGA) approach has been successfully developed to support the identification of larger numbers of high-performance reactants. Results are presented from both enumerated reagent libraries and from the integration of COGA with Evotec OAI’s library definition and evaluation software. Scaleability aspects have been successfully addressed and the manner in which multi-objective considerations can be taken into account has been illustrated.

All results so far indicate a significant potential for the COGA concept, providing a firm foundation for complex reagent library search and exploration.

References

1. Parmee I.C. The Maintenance of Search Diversity for Effective Design Space Decomposition using Cluster Oriented Genetic Algorithms (COGAs) and MultiAgent Strategies (GAANT), in Proc. Adaptive Computing in Engineering Design and Control, 1996, University of Plymouth, UK, pp 128-138.
2. Tsinopoulos C., McCarthy I.P. An evolutionary classification of the strategies for drug discovery. Manufacturing Complexity Network Conference, Cambridge, 2002, pp 373-385.
3. Milne G. W. Mathematics as a basis for chemistry. J. Chem. Inf Comput Sci., 1997, 37(4), pp 639-44.
4. Parrill, A. Evolutionary and genetic methods in drug design, Drug Discovery Today 1996, 1, 514-521.
5. Brown, R.D. and Clark, D.E. Genetic Diversity: Application of evolutionary algorithms to combinatorial library design. Expert Opinion on Therapeutic Patents 1998, 8, pp 1447-1460.
6. Agrafiotis D, Lobanov V.S, Salemme F; Combinatorial Informatics in the post Genomic Era. Nat Rev Drug Discov. 2002 May, 1(5), pp 337-46.
7. Lobanov V. S, Agrafiotis D K. Scalable Methods for the Construction and Analysis of Virtual Combinatorial Libraries, Combin. Chem. and High-Throughput Screen., 2002, 5, pp 167-178.
8. Sheridan R.P., Kearsley S.K. Using a genetic algorithm to suggest combinatorial libraries. J. Chem Informatics Comp. Sci., 1995, 35, pp 310-320.
9. Singh J, Ator M.A., Jaeger E.P. et al. Application of genetic algorithms to combinatorial synthesis: A computational approach to lead identification, J. Am Chem. Soc., 1996, 118, pp 1669-1676.
10. Weber L, Wallabaum S, Broger C, Gubernator K: Optimization of Biological Activity of combinatorial Libraries by a genetic algorithm. Angew. Chem. Int. Ed. Engl. 1995 34, pp 2280-2282.
11. V. J. Gillet, P. Willett, J. Bradshaw, and D. V. S. Green: Selecting combinatorial libraries to optimize diversity and physical properties. J. Chem. Inf. Comput. Sci. 1999, 39, pp 169–177.
12. Gillet V, Khatib, Willet. P. Combinatorial Library design using a multi-objective genetic algorithm. J. Chem. Inf. Comp. Sci., 2002, 42, pp 375-385.
13. Wright T, Gillet V, Green D, et al. Optimising the size and configuration of Combinatorial Libraries. J. Chem. Inf. Comp. Sci. 2003, 43, pp 381-390.
14. Brown, R. D., Martin, Y. C. Designing combinatorial library mixtures using a genetic algorithm. Journal of Medicinal Chemistry, 1997, 40, pp 2304–2313.
15. Pickett S. D, McLay I. M, Clark D.E. Enhancing the hit to lead properties of lead optimization libraries. J. Chem Inf. Comput. Sci. 2000, 40, pp 263-272.
16. Parmee I.C., Cvetkovic D, Watson A, Bonham C. Multi-objective satisfaction within an interactive evolutionary design environment. Evolutionary Computation, 2000, 8, MIT Press, pp 197-222.
17. Parmee I.C., Bonham C. R. Towards the support of innovative conceptual design through interactive designer/evolutionary computing strategies. Artificial Intelligence for Engineering Design Analysis and Manufacturing, 2000, 14, pp 3-16.
18. Parmee I.C., Abraham J.A. User Centric Evolutionary Design. Procs Design 2004, Dubrovnik, 2004 May.
19. Abraham J.A., Parmee I.C. User Centric Evolutionary Design Systems – the visualisation of emerging multi-objective design information. Xth International Conference on Computing in Civil and Building Engineering, Weimar. 2004, June 02-04.
20. Bonham C. R., Parmee I. C. Improving the performance of cluster oriented genetic algorithms (COGAs). IEEE Congress on Evolutionary Computation, Washington D. C, 1996, pp 554-561.
21. Parmee I.C., Abraham J.A. Supporting Implicit Learning via the Visualisation of COGA Multiobjective Data. IEEE Congress on Evolutionary Computation, Portland, USA, 2004, pp 395-402.
22. Parmee I. C. Improving problem definition through interactive evolutionary computation. Artificial Intelligence in Engineering Design, Analysis and Manufacture, 2002, 16(3).
