
Day 4: KNIME Practical

George Papadatos, ChEMBL group, EMBL-EBI


Francis Atkinson, ChEMBL group, EMBL-EBI

Outline
Introduction to KNIME
Basic components
Desktop, nodes, dialogs, workflows

Demo
Compound selection for focused screening

Read chemical data


Calculate properties
Apply drug- and lead-likeness filters
Remove nasty compounds
Pick diverse molecules
Visualize results and plot properties

Exercises 1 & 2 (hands-on)

12/12/2013

Resources for Computational Drug Discovery

Are there KNIME users among us?



What is KNIME?

KNIME = Konstanz Information Miner


Developed at the University of Konstanz in Germany
Desktop version available free of charge (Open Source)
Modular platform for building and executing workflows using
predefined components, called nodes
Core functionality available for tasks such as standard data
mining, analysis and manipulation
Extra features and functionality available in KNIME through
extensions from various groups and vendors
Written in Java based on the Eclipse SDK platform


KNIME resources
Web pages (documentation)
www.knime.org | tech.knime.org | tech.knime.org/installation-0

Downloads
knime.org/download-desktop

Community forum
tech.knime.org/forum

Books and white papers


knime.org/node/33079

Myself
georgep@ebi.ac.uk


What can you do with KNIME?


Data manipulation and analysis
File & database I/O, sorting, filtering, grouping, joining, pivoting

Data mining / machine learning


R, WEKA, KNIME, interactive plotting

Chemoinformatics
Conversions, similarity, clustering, (Q)SAR analysis, MMPs, reaction
enumeration

Scripting integration
R, Perl, Python, Matlab, Octave, Groovy

Reporting
So much more
Bioinformatics, HTS & image analysis, network & text mining
Marketing, big data and business analytics

Community contribution nodes


http://tech.knime.org/community
Chemoinformatics
ChEMBL and ChEBI (EBI); SureChEMBL nodes coming soon!
CDK (EBI), RDKit (Novartis), Indigo (GGA), Erl Wood (Eli Lilly), Enalos
(NovaMechanics)

Bioinformatics
HCS (MPI), NGS (Konstanz), Image analysis

Text mining
Palladian

Integration
Python, Perl, R, Groovy, Matlab (MPI), PDB web services client (Vernalis)


Installation & updates


Download and unzip KNIME
No further setup required
Additional nodes after first launch
knime.ini contains arguments & parameters for launch

New software (nodes) from update sites


http://tech.knime.org/update/community-contributions/release

Workflows and data are stored in a workspace


/Users/georgep/knime/workspace_mac_new

C:\knime_2.8.2\workspace

Customization in: File > Preferences > KNIME


KNIME Workbench

[Screenshot of the KNIME Workbench. Labelled areas: toolbar (Auto-layout, Execute, Execute all nodes), tabs, workflow projects, favorite nodes, public server, workflow editor, node repository, node description, outline, console]

KNIME nodes: Overview


Node = the basic processing unit of a KNIME workflow; each node performs a particular task
Title
Icon
Sequence number
Input port(s) on the left of the icon
Output port(s) on the right of the icon
Status display (traffic lights)
Red (not ready)
Amber (ready)
Green (executed)
Blue bar during execution (with percentage or flashing)

Right-click menu
To configure and execute the node, display the output views, edit the node, and display data for the ports

KNIME nodes: Dialogs


Double click to configure
Configuration menus for
selected nodes

Explicit column type


An example completed workflow


Workflows can be imported and exported as .zip files
With or without the underlying data
File > Import KNIME workflow
File > Export KNIME workflow


Any questions so far?



Compound selection for focused screening


1. Read chemical data
2. Calculate phys/chem properties
3. Apply drug- and lead-likeness filters
4. Apply more filters (e.g. remove solubility liabilities)
5. Apply substructural filters (PAINS subset)
6. Pick diverse molecules

The objective


First steps - I
Locate the directory with today's material
Copy and paste it to your desktop
You can take it with you too
Open the presentation file
Import FocusedScreeningSelection.zip into KNIME
Menu: File > Import KNIME workflow

First steps - II
Open a new workflow
Right click on the workflow projects area

Part 1: Reading chemical data


SDF Reader
.\data\SMDC_cleaned_nodups.sdf
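
For readers curious about what the SDF Reader has to parse, here is a minimal Python sketch. It is not the node's implementation. It assumes the usual SD file layout: the molfile block ends at `M  END`, data items start with a `> <FieldName>` header and end at a blank line, and each record is closed by a `$$$$` line. The function name `read_sdf_records` and the sample record are hypothetical.

```python
import re

# A record ends with a line containing only "$$$$".
RECORD_END = re.compile(r"^\$\$\$\$[ \t]*\r?\n?", re.MULTILINE)
# A data item header looks like "> <FieldName>".
FIELD_HEADER = re.compile(r"^>.*<([^>]+)>")


def read_sdf_records(text):
    """Return a list of (molblock, fields) tuples parsed from SD file text."""
    records = []
    for chunk in RECORD_END.split(text):
        if not chunk.strip():
            continue  # skip the empty tail after the last "$$$$"
        molblock, fields = [], {}
        in_molblock, current = True, None
        for line in chunk.splitlines():
            if in_molblock:
                molblock.append(line)
                if line.startswith("M  END"):
                    in_molblock = False
                continue
            header = FIELD_HEADER.match(line)
            if header:
                current = header.group(1)
                fields[current] = []
            elif current is not None:
                if line.strip():
                    fields[current].append(line)
                else:
                    current = None  # a blank line closes the data item
        joined = {name: "\n".join(value) for name, value in fields.items()}
        records.append(("\n".join(molblock), joined))
    return records


# Hypothetical one-record sample (an empty molecule with two data items).
SAMPLE_SDF = (
    "mol1\n  sketch\n\n"
    "  0  0  0  0  0  0  0  0  0  0999 V2000\n"
    "M  END\n"
    "> <ID>\nCPD-1\n\n"
    "> <MW>\n180.16\n\n"
    "$$$$\n"
)

if __name__ == "__main__":
    for molblock, fields in read_sdf_records(SAMPLE_SDF):
        print(fields)
```

In practice a chemistry toolkit should do this parsing; the sketch only shows how structures and property columns sit side by side in one file.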

Inspect the structures

Right click on the node


Molecule to RDKit


Any questions so far?



Part 2: Property-based filtering


Descriptor Calculation


Java Snippet

.\code\Lipinski.txt
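
The contents of Lipinski.txt are not reproduced here. As a rough illustration of what such a snippet typically computes, the Python sketch below counts violations of the standard Rule of Five (MW <= 500, logP <= 5, H-bond donors <= 5, H-bond acceptors <= 10). The function name and argument names are hypothetical, and the assumption that the snippet outputs a violation count is ours.

```python
def lipinski_violations(mw, logp, hbd, hba):
    """Count Rule-of-Five violations for one compound.

    mw   : molecular weight
    logp : calculated logP
    hbd  : number of hydrogen-bond donors
    hba  : number of hydrogen-bond acceptors
    """
    checks = [
        mw > 500,    # molecular weight above 500
        logp > 5,    # logP above 5
        hbd > 5,     # more than 5 H-bond donors
        hba > 10,    # more than 10 H-bond acceptors
    ]
    # True counts as 1, so the sum is the number of violated rules.
    return sum(checks)


if __name__ == "__main__":
    # Hypothetical property values for a small, polar compound.
    print(lipinski_violations(mw=180.2, logp=1.2, hbd=1, hba=4))
```

A numeric violation count is convenient because a downstream row splitter can then separate passes from fails with a single threshold.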


Numeric Row Splitter
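
Conceptually, a numeric row splitter sends rows whose value lies inside a range to one output and all other rows to a second output. The Python sketch below shows that idea only. The function name `split_rows`, the column name `violations` and the threshold used in the example are hypothetical.

```python
def split_rows(rows, column, lower=None, upper=None):
    """Split rows into (inside, outside) by a numeric range on one column.

    Bounds are inclusive; a bound of None means "no limit on that side".
    """
    inside, outside = [], []
    for row in rows:
        value = row[column]
        in_range = (lower is None or value >= lower) and (
            upper is None or value <= upper
        )
        (inside if in_range else outside).append(row)
    return inside, outside


if __name__ == "__main__":
    # Hypothetical table: compound IDs with a precomputed violation count.
    table = [
        {"id": "A", "violations": 0},
        {"id": "B", "violations": 2},
        {"id": "C", "violations": 1},
    ]
    passes, fails = split_rows(table, "violations", upper=1)
    print([r["id"] for r in passes], [r["id"] for r in fails])
```

Keeping the rejected rows in a second output is what makes the "inspect the fails" steps possible.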


Inspect the Lipinski fails


Right click on the node


Java Snippet

.\code\Oprea.txt
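
The contents of Oprea.txt are not reproduced here. The Python sketch below illustrates a lead-likeness check of the same general shape: each property is tested against a lower and upper limit and the violations are counted. The limits in `LEAD_LIKE_LIMITS` are commonly quoted lead-likeness cutoffs and are an assumption, not necessarily the values used in the practical; the property names are hypothetical too.

```python
# Assumed limits as (lower, upper); None means no limit on that side.
LEAD_LIKE_LIMITS = {
    "mw": (None, 450),
    "logp": (-3.5, 4.5),
    "rings": (None, 4),
    "rotatable_bonds": (None, 10),
    "hbd": (None, 5),
    "hba": (None, 8),
}


def lead_like_violations(props, limits=LEAD_LIKE_LIMITS):
    """Count how many properties fall outside their allowed range."""
    count = 0
    for name, (low, high) in limits.items():
        value = props[name]
        too_low = low is not None and value < low
        too_high = high is not None and value > high
        if too_low or too_high:
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical property values for one compound.
    compound = {"mw": 310.4, "logp": 2.1, "rings": 3,
                "rotatable_bonds": 4, "hbd": 2, "hba": 5}
    print(lead_like_violations(compound))
```

Passing the limits as a dictionary makes it easy to swap in whichever cutoffs a project actually uses.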


Numeric Row Splitter


Inspect the Oprea fails


Right click on the node


Numeric Row Splitter


Inspect the Solubility fails


Right click on the node


Any questions so far?



Part 3: Substructure-based filtering


Molecule to Indigo


File reader

.\data\PAINS_clean_half.sdf

Query Molecule to Indigo


Inspect the SMARTS rules


Chunk Loop Start


Substructure Matcher


Loop End
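
The loop pattern used here (start a chunk loop, run the loop body on each chunk, collect the results at the loop end) can be sketched in plain Python. This is an illustration of the pattern only, not of KNIME internals. The names `chunked` and `run_chunk_loop` are hypothetical, and the loop body in the example is a trivial stand-in for the substructure matching step.

```python
from itertools import islice


def chunked(rows, size):
    """Yield consecutive lists of at most `size` rows (the 'loop start')."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def run_chunk_loop(rows, size, body):
    """Apply `body` to every chunk and concatenate the outputs (the 'loop end')."""
    collected = []
    for chunk in chunked(rows, size):
        collected.extend(body(chunk))
    return collected


if __name__ == "__main__":
    # Stand-in body: keep only the even numbers in each chunk.
    result = run_chunk_loop(range(10), 4, lambda chunk: [x for x in chunk if x % 2 == 0])
    print(result)
```

Processing in chunks keeps the memory footprint of each iteration small, which matters when every row is matched against many query patterns.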


Inspect matched structures


Right click on the node


Reference Row Filter
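
A reference row filter keeps or removes rows of one table depending on whether their key also appears in a second, reference table. In this workflow the reference table would presumably hold the compounds that matched a structural alert, so those rows are excluded. The Python sketch below shows the idea; the function name, the `id` key and the sample rows are hypothetical.

```python
def reference_row_filter(rows, reference_rows, key, exclude=True):
    """Filter `rows` by membership of row[key] in the reference table.

    exclude=True  removes rows found in the reference table.
    exclude=False keeps only rows found in the reference table.
    """
    reference_keys = {row[key] for row in reference_rows}
    return [row for row in rows if (row[key] in reference_keys) != exclude]


if __name__ == "__main__":
    # Hypothetical tables: all compounds, and those flagged by a filter.
    compounds = [{"id": "A"}, {"id": "B"}, {"id": "C"}]
    flagged = [{"id": "B"}]
    clean = reference_row_filter(compounds, flagged, key="id")
    print([row["id"] for row in clean])
```

Building a set of reference keys first makes each membership test a constant-time lookup.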


Any questions so far?



Part 4: Diversity picking and plotting


RDKit Fingerprint


Inspect the fingerprints


Right click on the node


RDKit Diversity Picker
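
To give a feel for fingerprint-based diversity picking, here is a small pure-Python sketch of a MaxMin-style picker using Tanimoto distance. Fingerprints are modelled as sets of "on" bit positions. Whether the node uses exactly this variant is an assumption, and the function names and toy fingerprints are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def maxmin_pick(fingerprints, n_pick, first=0):
    """Greedy MaxMin selection: repeatedly pick the item whose minimum
    distance to the already picked items is largest."""
    if not fingerprints or n_pick <= 0:
        return []
    picked = [first]
    # Minimum distance from every item to the picked set so far.
    min_dist = [1.0 - tanimoto(fp, fingerprints[first]) for fp in fingerprints]
    target = min(n_pick, len(fingerprints))
    while len(picked) < target:
        candidates = [i for i in range(len(fingerprints)) if i not in picked]
        best = max(candidates, key=lambda i: min_dist[i])
        picked.append(best)
        # Update the minimum distances with the newly picked item.
        for i, fp in enumerate(fingerprints):
            dist = 1.0 - tanimoto(fp, fingerprints[best])
            if dist < min_dist[i]:
                min_dist[i] = dist
    return picked


if __name__ == "__main__":
    # Toy fingerprints: three similar items and one unrelated item.
    fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}]
    print(maxmin_pick(fps, 2))
```

The result depends on the starting item, which is why pickers of this kind usually expose a seed or a choice of first compound.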


2D/3D Scatterplot


Inspect the plot


Right click on the node


Any questions so far?



Exercise 1

Read an SD file with drug information from ChEMBL
Inspect the structures and their properties
Select only drugs that were released in or after 1990 (First Approval)
Select only drugs that target human (Homo sapiens)
How many drugs remain now?
Save the workflow

Tips
Open a new workflow
Use the SDF Reader node
Use the Numeric Row Splitter node to filter on First Approval >= 1990
Use the Nominal Value Row Filter node to filter on Organism = Homo sapiens
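
For comparison with the node-based solution, the two filters can be sketched in a few lines of Python. This is an illustration only. The column names `First Approval` and `Organism` come from the exercise; the function name and the sample rows are hypothetical, and values are assumed to arrive as strings, as they would from an SD file.

```python
def select_recent_human_drugs(rows, min_year=1990, organism="Homo sapiens"):
    """Keep rows approved in or after `min_year` whose target organism matches."""
    selected = []
    for row in rows:
        year = row.get("First Approval")
        if year in (None, ""):
            continue  # no approval year recorded, so the row cannot pass
        if int(year) >= min_year and row.get("Organism") == organism:
            selected.append(row)
    return selected


if __name__ == "__main__":
    # Hypothetical rows, not real ChEMBL records.
    drugs = [
        {"Name": "drug-1", "First Approval": "1985", "Organism": "Homo sapiens"},
        {"Name": "drug-2", "First Approval": "1998", "Organism": "Homo sapiens"},
        {"Name": "drug-3", "First Approval": "2004", "Organism": "Escherichia coli"},
        {"Name": "drug-4", "First Approval": "", "Organism": "Homo sapiens"},
    ]
    print(len(select_recent_human_drugs(drugs)))
```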

Exercise 2

Continue from your previous workflow


Calculate MW and logP of the drug compounds
Generate a scatter plot of MW and logP
Can you see any compounds with high MW and logP?

Tips
Use the Molecule to RDKit node
Use the RDKit Descriptor Calculator node
Include the SlogP and ExactMW descriptors

Use the 2D/3D Scatterplot node


Any questions? Last chance!



Conclusions
Compound selection for focused screening
Typical scenario

KNIME
Open and free
Data analysis
Chemoinformatics toolkits
Erl Wood, RDKit, Indigo, CDK, etc.

Lots of other functionality

More advanced KNIME on Friday around lunch time


Further reading
Open data and tools
1. Irwin, J. J.; Sterling, T.; Mysinger, M. M.; Bolstad, E. S.; Coleman, R. G., ZINC:
A free tool to discover chemistry for biology. Journal of Chemical Information
and Modeling 2012 ASAP.
2. Saubern, S.; Guha, R.; Baell, J. B., KNIME workflow to assess PAINS filters in
SMARTS format. Comparison of RDKit and Indigo cheminformatics libraries.
Molecular Informatics 2011, 30, (10), 847-850.
3. Barnes, M. R.; Harland, L.; Foord, S. M.; Hall, M. D.; Dix, I.; Thomas, S.;
Williams-Jones, B. I.; Brouwer, C. R., Lowering industry firewalls: precompetitive informatics initiatives in drug discovery. Nature Reviews Drug
Discovery 2009, 8, (9), 701-708.
4. Berthold, M. R.; Cebron, N.; Dill, F.; Gabriel, T. R.; Kötter, T.; Meinl, T.; Ohl, P.;
Sieb, C.; Thiel, K.; Wiswedel, B., KNIME: The Konstanz Information Miner. In
Data Analysis, Machine Learning and Applications, Preisach, C.; Burkhardt, H.;
Schmidt-Thieme, L.; Decker, R., Eds. Springer: Berlin, 2008; pp 319-326.
5. Tiwari, A.; Sekhar, A. K. T., Workflow based framework for life science
informatics. Computational Biology and Chemistry 2007, 31, (5-6), 305-319.


Further reading
High throughput screening
1. Bajorath, J., Integration of virtual and high-throughput screening. Nature
Reviews Drug Discovery 2002, 1, (11), 882-894.
2. Harper, G.; Pickett, S. D.; Green, D. V. S., Design of a compound
screening collection for use in High Throughput Screening. Combinatorial
Chemistry & High Throughput Screening 2004, 7, (1), 63-70.

Lead- and drug-likeness


1. Chuprina, A.; Lukin, O.; Demoiseaux, R.; Buzko, A.; Shivanyuk, A., Drug- and
lead-likeness, target class, and molecular diversity analysis of 7.9 million
commercially available organic compounds provided by 29 suppliers. Journal of
Chemical Information and Modeling 2010, 50, (4), 470-479.
2. Lipinski, C. A., Lead- and drug-like compounds: the rule-of-five revolution. Drug
Discovery Today: Technologies 2004, 1, (4), 337-341.
3. Oprea, T. I.; Davis, A. M.; Teague, S. J.; Leeson, P. D., Is there a difference
between leads and drugs? A historical perspective. Journal of Chemical
Information and Computer Sciences 2001, 41, (5), 1308-1315.


Further reading
Physicochemical properties and drug discovery
1. Brüstle, M.; Beck, B.; Schindler, T.; King, W.; Mitchell, T.; Clark, T., Descriptors,
physical properties, and drug-likeness. Journal of Medicinal Chemistry 2002, 45,
(16), 3345-3355.
2. Hill, A. P.; Young, R. J., Getting physical in drug discovery: A contemporary
perspective on solubility and hydrophobicity. Drug Discovery Today 2010, 15,
(15/16), 648-655.
3. Leeson, P. D.; Springthorpe, B., The influence of drug-like concepts on decision-making
in medicinal chemistry. Nature Reviews Drug Discovery 2007, 6, (11), 881-890.

Structural alerts in HTS


1. Baell, J. B.; Holloway, G. A., New substructure filters for removal of Pan Assay
Interference Compounds (PAINS) from screening libraries and for their exclusion in
bioassays. Journal of Medicinal Chemistry 2010, 53, (7), 2719-2740.
2. Rishton, G. M., Reactive compounds and in vitro false positives in HTS. Drug
Discovery Today 1997, 2, (9), 382-384.


Further reading
Similarity and diversity
1. Ashton, M.; Barnard, J.; Casset, F.; Charlton, M.; Downs, G.; Gorse, D.; Holliday,
J.; Lahana, R.; Willett, P., Identification of diverse database subsets using
property-based and fragment-based molecular descriptions. Quantitative
Structure-Activity Relationships 2002, 21, (6), 598-604.
2. Bender, A.; Glen, R. C., Molecular similarity: a key technique in molecular
informatics. Organic and Biomolecular Chemistry 2004, 2, 3204-3218.
3. Gorse, A.-D., Diversity in medicinal chemistry space. Current Topics in Medicinal
Chemistry 2006, 6, (1), 3-18.
4. Maldonado, A.; Doucet, J.; Petitjean, M.; Fan, B.-T., Molecular similarity and
diversity in chemoinformatics: From theory to applications. Molecular Diversity
2006, 10, (1), 39-79.
5. Rogers, D.; Hahn, M., Extended-connectivity fingerprints. Journal of Chemical
Information and Modeling 2010, 50, (5), 742-754.
6. Schuffenhauer, A.; Brown, N., Chemical diversity and biological activity. Drug
Discovery Today: Technologies 2006, 3, (4), 387-395.
7. Willett, P.; Barnard, J. M.; Downs, G. M., Chemical similarity searching. Journal
of Chemical Information and Computer Sciences 1998, 38, (6), 983-996.

