
Babel: molecular structure file conversion

version 3.3

OpenEye Scientific Software, Inc.

July 11, 2007

9 Bisbee Court, Suite D Santa Fe, NM 87508 www.eyesopen.com support@eyesopen.com

Copyright © 1997-2007 OpenEye Scientific Software, Santa Fe, New Mexico. All rights reserved.

All rights reserved. This material contains proprietary information of OpenEye Scientific Software. Use of copyright notice is precautionary only and does not imply publication or disclosure. The information supplied in this document is believed to be true but no liability is assumed for its use or the infringement of the rights of others resulting from its use. Information in this document is subject to change without notice and does not represent a commitment on the part of OpenEye Scientific Software. This package is sold/licensed/distributed subject to the condition that it shall not, by way of trade or otherwise, be lent, re-sold, hired out or otherwise circulated without OpenEye Scientific Software's prior consent, in any form of packaging or cover other than that in which it was produced. No part of this manual or accompanying documentation may be reproduced, stored in a retrieval system on optical or magnetic disk, tape, CD, DVD or other medium, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise for any purpose other than for the purchaser's personal use without a legal agreement or other written permission granted by OpenEye. This product should not be used in the planning, construction, maintenance, operation or use of any nuclear facility nor the flight, navigation or communication of aircraft or ground support equipment. OpenEye Scientific Software shall not be liable, in whole or in part, for any claims arising from such use, including death, bankruptcy or outbreak of war.

Windows is a registered trademark of Microsoft Corporation. Apple and Macintosh are registered trademarks of Apple Computer, Inc. AIX and IBM are registered trademarks of International Business Machines Corporation. UNIX is a registered trademark of the Open Group. RedHat is a registered trademark of RedHat, Inc. Linux is a registered trademark of Linus Torvalds. Alpha is a trademark of Digital Equipment Corporation. SPARC is a registered trademark of SPARC International Inc. SYBYL is a registered trademark of TRIPOS, Inc. MDL is a registered trademark and ISIS is a trademark of MDL Information Systems, Inc. SMILES, SMARTS, and SMIRKS may be trademarks of Daylight Chemical Information Systems. Macromodel is a trademark of Schrödinger, Inc. Schrödinger, Inc may be a wholly owned subsidiary of the Columbia University, New York. Python is a trademark of the Python Software Foundation. Java is a trademark or registered trademark of Sun Microsystems, Inc. in the U.S. or other countries. The forefront of chemoinformatics is a trademark of Daylight Chemical Information Systems, Inc. Other products and software packages referenced in this document are trademarks and registered trademarks of their respective vendors or manufacturers.

CONTENTS

1 Introduction

2 Theory
  2.1 Introduction
  2.2 Formats
  2.3 Flavors
  2.4 Multiconformer databases

3 Installation and Licensing
  3.1 Installation
  3.2 Licensing

4 Usage
  4.1 Command line interface
  4.2 Command line options

5 Example executions

A Release Notes
  A.1 Babel 3.3
  A.2 Babel3 2.2
  A.3 Babel3 2.1
  A.4 Babel3 2.0
  A.5 Babel2 2.0b1

B Known problems and caveats
  B.1 Reporting of non-compliant molecule records
  B.2 Aromaticity perception of very large ring systems

CHAPTER

ONE

Introduction
The OpenEye Babel program interconverts molecule files among several supported formats. Babel is built upon, and in most ways is a thin wrapper around, OEChem, the OpenEye toolkit for chemistry and the chemoinformatics foundation for most OpenEye applications. Babel is intended to provide convenient and flexible access to the file interconversion capabilities integral to the OpenEye suite of software.

The program name Babel has a rich and proud history involving Arizona (a U.S. state formerly in the New Mexico territories), the open source software movement, intrigue, daring, and several colorful characters. For now may it suffice to say that this incarnation, the OpenEye Babel application, should not be confused with OpenBabel, the original open source Babel, or OELib's Babel.

CHAPTER

TWO

Theory

2.1

Introduction

Babel should not really need a theory manual. The relevant theory is OEChem theory. However, a few points are worth noting.

1. OEChem adheres to format specifications insofar as they are defined by authoritative documentation.
2. Format variants are generally handled by flavors.
3. In some cases de facto format variants exist by virtue of their general usage, which OEChem may support when not in conflict with the defined format (e.g., MOL2 files with absent hydrogens).
4. Different formats not only differ in their encoding, but also differ in the information represented, e.g., the molecular representation. Where source information is absent, use of the term conversion is something of a stretch, as information must be inferred (e.g., bonds from PDB files). Thus, it is not always possible to guarantee correct output for all conversions, when correctness is defined by information not strictly contained in the data source.
5. OEChem is specifically designed to handle and interconvert the different chemical models intrinsic to certain formats, for example, the varying aromaticity models of Tripos, MDL, and Daylight.

2.2

Formats

The current list of supported formats can be displayed by running babel -helpformats. The formats are as follows:

extension       OEChem code             format              read  write
smi             ( 1) OEFormat::SMI      SMILES              yes   yes
mdl,mol,rxn     ( 2) OEFormat::MDL      MDL Mol             yes   yes
ent,pdb         ( 3) OEFormat::PDB      PDB                 yes   yes
mol2,syb        ( 4) OEFormat::MOL2     Tripos MOL2         yes   yes
bin             ( 5) OEFormat::BIN      OEBinary v1         yes   no
tdt             ( 6) OEFormat::TDT      Daylight TDT        no    no
ism,isosmi      ( 7) OEFormat::ISM      Isomeric SMILES     yes   yes
mol2h           ( 8) OEFormat::MOL2H    MOL2 with H         yes   yes
sd,sdf          ( 9) OEFormat::SDF      MDL SDF             yes   yes
can             (10) OEFormat::CAN      Canonical SMILES    yes   yes
mf              (11) OEFormat::MF       Molecular Formula   no    yes
xyz             (12) OEFormat::XYZ      XYZ                 yes   yes
fasta,seq       (13) OEFormat::FASTA    FASTA               yes   yes
mopac,pac       (14) OEFormat::MOPAC    MOPAC               no    yes
oeb             (15) OEFormat::OEB      OEBinary v2         yes   yes
dat,mmd,mmod    (16) OEFormat::MMOD     Macromodel          yes   yes
sln             (17) OEFormat::SLN      Tripos SLN          no    yes
rd,rdf          (18) OEFormat::RDF      MDL RDF             yes   no
cdx             (19) OEFormat::CDX      ChemDraw CDX        yes   yes
skc             (20) OEFormat::SKC      MDL ISIS Sketch     yes   no

This list is completely determined by the OEChem library upon which Babel is built. Similarly, all applications with OEChem inside can normally handle these same formats.

2.3

Flavors

For each supported format there may be input and output flavors. Some flavors are generic and applicable to multiple formats. If none are specified the default flavor will be used. Babel is intended to expose all the format flavors defined in the official OEChem API. To list the available flavors, run babel --help all:
prompt> babel --help all

[ASCII-art OpenEye logo and "Babel" banner]

babel - molecular structure file conversion

version: 3.3
copyright (c) 2005,2006,2007 OpenEye Scientific Software, Inc.
OEChem version: 1.4.3
platform: osx-10.4-g++3.3-G4
built: 20070401

licensee: OpenEye
site: Albuquerque

Complete parameter list
  basic : file i/o and other top level options
    -in : input file
    -out : output file
    -firstonly : convert first molecule
    -helpformats : list supported formats
    -n : convert only N molecules (0 means all)
    -skip : skip first N molecules
    -v : verbose
    -vv : very verbose
  advanced
    -ifmt : input format
    -ofmt : output format
    -add2d : add 2D coordinates
    -chunk : split the output into several files
    -chunk_filecount : number of chunk output files
    -chunk_molcount : molecules per chunk output file
    -chunk_prefix : file prefix for output chunk files (default is <inbase>_XX)
    -hydrogens : hydrogen handling
    -igz : ungzip input
    -input_params : read execution parameters from file
    -mc : treat the file as multi-conformer
    -mc2sc : one output scmol for each input mcmol
    -mc_isomer : treat the file as multi-conformer (isomeric)
    -mc_titles : mc perception title-sensitive
    -mdlstereocorrect : correct non-compliant MDL stereo if possible
    -molcount : write molcount to stdout
    -nowarn : supress warnings
    -ogz : gzip output
    -output_names : write molecule names to file
    -output_params : write execution parameters to file
    -parts2mols : split out connected components
    -perceive_residues : perceive macromolecular residues
    -perceive_residues_preserve_All
    -perceive_residues_preserve_AlternateLocation
    -perceive_residues_preserve_ChainID
    -perceive_residues_preserve_HetAtom
    -perceive_residues_preserve_InsertCode
    -perceive_residues_preserve_ResidueName
    -perceive_residues_preserve_ResidueNumber
    -perceive_residues_preserve_SerialNumber
    -quiet : minimal verbosity, no banner
    -sc : handle as single-conformer molecules
    -sd2title : copy specified SD data to title
    -stereofrom3d : perceive stereo from input 3D
  flavors, input generic : generic (format non-specific) input flavors
    -iAroMask
    -iFlavorNone : raw, no standardizations
    -iGenericMask
    -iOEAroModelDaylight
    -iOEAroModelMDL
    -iOEAroModelMMFF
    -iOEAroModelOpenEye
    -iOEAroModelTripos
    -iRings
  flavors, output generic : generic (format non-specific) output flavors
    -oAroMask
    -oFlavorNone : raw, no standardizations
    -oGenericMask
    -oOEAroModelDaylight
    -oOEAroModelMDL
    -oOEAroModelMMFF
    -oOEAroModelOpenEye
    -oOEAroModelTripos
    -oRings
  flavors, input format specific : format specific input flavors
  flavors, mmod specific
    -mmodiDefault : default flavors
    -mmodiFormalCrg
  flavors, mol2 specific
    -mol2iDefault : default flavors
    -mol2iM2H
  flavors, pdb specific
    -pdbiALL : read all atoms including alternate locations, dummy atoms, etc.
    -pdbiAllMask
    -pdbiBasicMask
    -pdbiBondOrder
    -pdbiCHARGE : read partial charges from b-factor field
    -pdbiConnect
    -pdbiDATA : preserve header data as generic data
    -pdbiDELPHI : combines -pdbiCHARGE and -pdbiRADIUS
    -pdbiDefault : default flavors
    -pdbiEND : read END as separator
    -pdbiENDM : read ENDM as separator
    -pdbiExtraMask
    -pdbiFormalCrg
    -pdbiImplicitH
    -pdbiRADIUS : read atomic radius from occupancy field
    -pdbiRings
    -pdbiTER : read TER as separator
    -pdbiTerMask
  flavors, smiles specific
    -smiiCanon
    -smiiDefault : default flavors
    -smiiStrict
  flavors, xyz specific
    -xyziBondOrder
    -xyziConnect
    -xyziDefault : default flavors
    -xyziExtraMask
    -xyziFormalCrg
    -xyziImplicitH
    -xyziRings
  flavors, output format specific : format specific output flavors
  flavors, mdl specific
    -mdloCurrentParity : write internal parity
    -mdloDefault : default flavors
    -mdloMCHG : write MCHG and MRAD fields for charged/radical atoms
    -mdloMDLParity : write MDL parity
    -mdloMISO : write ISO field for isotopes
    -mdloMMask
    -mdloMRGP : write RGP field for each R-group atom
    -mdloMV30 : MDL V3000 format
    -mdloNoParity : write no parity
    -mdloPMask
  flavors, mf specific
    -mfoDefault : default flavors
    -mfoTitle
  flavors, mmod specific
    -mmodoAtomTypes
    -mmodoDefault : default flavors
  flavors, mol2 specific
    -mol2oAtomNames
    -mol2oAtomTypeNames
    -mol2oBondTypeNames
    -mol2oDefault : default flavors
    -mol2oHydrogens
    -mol2oNameMask
    -mol2oOrderAtoms
    -mol2oSubstructure
  flavors, mopac specific
    -mopacoCHARGES : write charges
    -mopacoDefault : default flavors
    -mopacoXYZ : cartesian coords (default is internal coords/z-matrix)
  flavors, pdb specific
    -pdboBONDS : write CONECT records (all single without -pdboORDERS)
    -pdboBOTH : bi-directional CONECT records
    -pdboCHARGE : write partial charges to b-factor field
    -pdboCurrentResidues
    -pdboDELPHI : combines -pdboCHARGE and -pdboRADIUS
    -pdboDefault : default flavors
    -pdboELEMENT
    -pdboFormalCrg
    -pdboHETBONDS
    -pdboNoResidues
    -pdboOEResidues
    -pdboORDERS : include bond orders in CONECT records
    -pdboOrderAtoms
    -pdboRADIUS : write atomic radii to occupancy field
    -pdboTER : terminate with TER rather than END
  flavors, smiles specific
    -smioAtomMaps
    -smioAtomStereo
    -smioBondStereo
    -smioCanonical
    -smioDefault : default flavors
    -smioExtBonds
    -smioHydrogens
    -smioImpHCount
    -smioIsotopes
    -smioKekule
    -smioRGroups
    -smioSmiMask
    -smioSuperAtoms

2.4

Multiconformer databases

OEChem and Babel are designed to handle multiconformer files in a consistent way across formats. OEBinary is an explicitly multiconformer format. Others are not, so consistent rules must exist to determine whether subsequent molecules are in fact conformers of the same molecule. Controls exist to specify whether stereoisomers are considered the same or different molecules, and whether titles should distinguish molecules. Within one multiconformer molecule, OEChem requires that atoms and bonds be identically ordered.
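The grouping rule described above can be illustrated with a short Python sketch. It is not OEChem's implementation: the record layout (a dict with title, smiles and isosmiles keys) and the function name group_conformers are hypothetical, and the sketch ignores the atom and bond ordering requirement. It only shows how consecutive records might be merged, with stereoisomers and titles optionally distinguishing molecules.

```python
from itertools import groupby


def group_conformers(records, isomeric=False, use_titles=False):
    """Group consecutive records that look like conformers of one molecule.

    records    -- list of dicts with hypothetical keys 'title', 'smiles'
                  (stereo-free) and 'isosmiles' (stereo-aware)
    isomeric   -- if True, different stereoisomers are different molecules
    use_titles -- if True, a change of title also starts a new molecule
    """
    def key(rec):
        parts = [rec["isosmiles"] if isomeric else rec["smiles"]]
        if use_titles:
            parts.append(rec["title"])
        return tuple(parts)

    # groupby only merges adjacent items, matching the "subsequent molecules" rule.
    return [list(group) for _, group in groupby(records, key=key)]


records = [
    {"title": "a", "smiles": "CC(N)C(=O)O", "isosmiles": "C[C@H](N)C(=O)O"},
    {"title": "a", "smiles": "CC(N)C(=O)O", "isosmiles": "C[C@H](N)C(=O)O"},
    {"title": "b", "smiles": "CC(N)C(=O)O", "isosmiles": "C[C@@H](N)C(=O)O"},
    {"title": "c", "smiles": "CCO", "isosmiles": "CCO"},
]

sizes_plain = [len(g) for g in group_conformers(records)]
sizes_isomeric = [len(g) for g in group_conformers(records, isomeric=True)]
sizes_titles = [len(g) for g in group_conformers(records, use_titles=True)]
```

In this toy data the first three records merge into one molecule when stereo is ignored, but split into two molecules when stereoisomers or titles are taken into account.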

CHAPTER

THREE

Installation and Licensing

3.1

Installation

As with other OpenEye packages, Babel is normally shipped as a gzipped tarball, to be installed into a subdirectory openeye (normally /usr/local/openeye). For example:
prompt> cd /usr/local
prompt> tar xzvf $HOME/babel-3.3-centos-3.6-i586.tar.gz

3.2

Licensing

Babel requires a valid OpenEye license for the product OEChem, and only OEChem licensees are entitled to use Babel. (One feature, generating 2D coordinates, invoked by the option -add2d, requires an Ogham (oedepict) license, but this is not required for any other operation.) The license file should be defined by the environment variable OE_LICENSE. To request an evaluation license use OpenEye's online request form. If already a licensee, contact support@eyesopen.com or your local system administrator for a license file. To purchase OEChem and Babel, or other OpenEye software, contact business@eyesopen.com.
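Before running Babel from a script, it can help to confirm that OE_LICENSE is set and points at an existing file. The following Python sketch does only that; the function name find_license is hypothetical, and the check says nothing about whether the license itself is valid or covers OEChem.

```python
import os


def find_license(environ=os.environ):
    """Return (path, message) for the file named by OE_LICENSE.

    path is None when the variable is unset or the file does not exist.
    This does not validate the license contents.
    """
    path = environ.get("OE_LICENSE")
    if not path:
        return None, "OE_LICENSE is not set"
    if not os.path.isfile(path):
        return None, "OE_LICENSE points to a missing file: " + path
    return path, "ok"


path, message = find_license()
print(message)
```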

CHAPTER

FOUR

Usage

4.1

Command line interface

The command line interface is similar to that of other OpenEye applications. Normally, input and output file formats are implied by filename extensions. Also, gzipped input and output are allowed and implied by the .gz extension. Standard input and output can be used by specifying only the file extension (for example, .sdf.gz in place of a filename). Extensive online help is available. The simplest way to get started is just to type babel and follow the directions.
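The extension rules above can be summarized in a small Python sketch. It is an illustration only, not how Babel parses its arguments: the function name infer_format is hypothetical, the table holds just a few of the extensions listed in section 2.2, and the case-insensitive matching is an assumption.

```python
# A few extensions from the format table in section 2.2 (not the full list).
FORMATS = {
    "smi": "SMILES",
    "sdf": "MDL SDF",
    "sd": "MDL SDF",
    "mol2": "Tripos MOL2",
    "pdb": "PDB",
    "oeb": "OEBinary v2",
}


def infer_format(filename):
    """Infer format, gzip status and stdin/stdout use from a file argument."""
    name = filename.lower()  # assumption: extensions match case-insensitively
    gzipped = name.endswith(".gz")
    if gzipped:
        name = name[: -len(".gz")]
    base, dot, ext = name.rpartition(".")
    if not dot or ext not in FORMATS:
        raise ValueError("cannot infer format from " + repr(filename))
    return {
        "format": FORMATS[ext],
        "gzipped": gzipped,
        # An argument that is only an extension means standard input/output.
        "stdio": base == "",
    }


print(infer_format("foo.sdf.gz"))
print(infer_format(".sdf.gz"))
```

When the extension is missing or misleading, the options -ifmt, -ofmt, -igz and -ogz described in section 4.2 state the format explicitly.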

4.2

Command line options

4.2.1

Basic

-in  File containing input molecules.
-out  File containing output molecules.
-firstonly  Convert first molecule only and exit.
-helpformats  List supported formats.
-n  Convert first n molecules (0, the default, means all).
-skip  Skip first n molecules.
-v  Verbose.
-vv  Very verbose.
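The combined effect of -skip and -n can be pictured as a slice over the input stream. The Python sketch below assumes that skipping happens before counting, which the option descriptions suggest but do not state; the function name select is hypothetical.

```python
from itertools import islice


def select(records, skip=0, n=0):
    """Yield at most n records after skipping the first `skip` records.

    n == 0 means "all remaining records", mirroring the default of -n.
    Assumption: skipped records do not count toward n.
    """
    stop = None if n == 0 else skip + n
    return islice(records, skip, stop)


first_three_after_two = list(select(range(10), skip=2, n=3))
everything = list(select(range(5)))
first_only = list(select(range(5), n=1))  # comparable to -firstonly
```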


4.2.2

Advanced

-ifmt  Input format specification. Not normally needed, since formats are implied by filename extensions. Specify by extension (e.g., mol2, sdf, smi, etc.).
-ofmt  Output format specification. Not normally needed, since formats are implied by filename extensions. Specify by extension (e.g., mol2, sdf, smi, etc.). Useful with -chunk.
-add2d  Generate 2D coordinates and include in the output. This functionality is disabled in the absence of a valid Ogham (oedepict) license. -add2d is incompatible with output formats which cannot represent 2D.
-chunk  Split, a.k.a. chunk, the output into several files. See also -chunk_prefix, -chunk_molcount, and -chunk_filecount. By default, output file names are built from the input file directory and basename, plus the specified output format. Output format should be specified by -ofmt and -ogz.
-chunk_filecount  Number of chunk output files.
-chunk_molcount  Molecules per chunk output file.
-chunk_prefix  File prefix for output chunk files (default is INPATH/INBASE_XX).
-hydrogens  Hydrogen handling: allowed values are add, delete and same (meaning same as input).
-igz  Ungzip input. Not normally needed, since file extension can imply gzipped.
-input_params  Read execution parameters from file.
-mc  Treat the file as multi-conformer. Not needed for OEB, which is multi-conformer by default.
-mc2sc  One output scmol for each input mcmol.
-mc_isomer  Treat the file as multi-conformer (isomeric).
-mc_titles  Multi-conformer perception title-sensitive.
-mdlstereocorrect  Correct non-compliant MDL stereo if possible.
-molcount  Write molecule count to stdout.
-nowarn  Suppress warnings.
-ogz  Gzip output. Not normally needed, since file extension can imply gzipped. Useful with -chunk.
-output_names  Write molecule names (titles) to file.
-output_params  Write execution parameters to file.
-parts2mols  Split out connected components.
-perceive_residues  Perceive macromolecular residues.
-perceive_residues_preserve_All
-perceive_residues_preserve_AlternateLocation
-perceive_residues_preserve_ChainID
-perceive_residues_preserve_HetAtom
-perceive_residues_preserve_InsertCode
-perceive_residues_preserve_ResidueName
-perceive_residues_preserve_ResidueNumber
-perceive_residues_preserve_SerialNumber
-quiet  Minimal verbosity, no banner.
-sc  Handle input and output as single-conformer molecules. This is the default for all input formats except OEBinary v1 and v2 (.bin and .oeb).
-sd2title  Copy specified SD data to title.
-stereofrom3d  Perceive stereo from input 3D.

4.2.3

Format flavors

If no format flavor flags are invoked, Babel will use the default flavors for the specified input and output formats. These defaults are also available by using a flavor which combines the individual flavors; for example, -mdloDefault. To see what these defaults are, view the detailed help for the specific default flavor (e.g., babel --help -mdloDefault).

If any input flavors are specified, the user must take full control over all input flavors. Likewise for output flavors. So, to add one flavor to the defaults, that flavor should be used in combination with the default flavor (e.g., babel -mdloDefault -mdloMV30). Combining flavors involves a bitwise OR-ing of an integer datatype which represents a binary array for these purposes. Babel reports flavors used as hex integers.

Generic input flavorings (format non-specific)

-iAroMask
-iFlavorNone  Raw, no standardizations.
-iGenericMask
-iOEAroModelDaylight  Daylight aromaticity model.
-iOEAroModelMDL  MDL aromaticity model.
-iOEAroModelMMFF  MMFF aromaticity model.
-iOEAroModelOpenEye  OpenEye aromaticity model.
-iOEAroModelTripos  Tripos aromaticity model.
-iRings  Perceive rings.

Generic output flavorings (format non-specific)

-oAroMask
-oFlavorNone  Raw, no standardizations.
-oGenericMask
-oOEAroModelDaylight  Daylight aromaticity model.
-oOEAroModelMDL  MDL aromaticity model.
-oOEAroModelMMFF  MMFF aromaticity model.
-oOEAroModelOpenEye  OpenEye aromaticity model.
-oOEAroModelTripos  Tripos aromaticity model.
-oRings

Input format specific flavorings: mmod

-mmodiDefault  Default flavors.
-mmodiFormalCrg

Input format specific flavorings: mol2

-mol2iDefault  Default flavors.
-mol2iM2H

Input format specific flavorings: pdb

-pdbiALL  Read all atoms including alternate locations, dummy atoms, etc.
-pdbiAllMask
-pdbiBasicMask
-pdbiBondOrder
-pdbiCHARGE  Read partial charges from b-factor field.
-pdbiConnect
-pdbiDATA  Preserve header data as generic data.
-pdbiDELPHI  Combines -pdbiCHARGE and -pdbiRADIUS.
-pdbiDefault  Default flavors.
-pdbiEND  Read END as separator.
-pdbiENDM  Read ENDM as separator.
-pdbiExtraMask
-pdbiFormalCrg
-pdbiImplicitH
-pdbiRADIUS  Read atomic radius from occupancy field.
-pdbiRings
-pdbiTER  Read TER as separator.
-pdbiTerMask

Input format specific flavorings: smiles

-smiiCanon  Skips Kekulization test.
-smiiDefault  Default flavors.
-smiiStrict  Disallow format extensions.

Input format specific flavorings: xyz

-xyziBondOrder
-xyziConnect
-xyziDefault  Default flavors.
-xyziExtraMask
-xyziFormalCrg
-xyziImplicitH
-xyziRings

Output format specific flavorings: mdl

-mdloCurrentParity  Write internal parity.
-mdloDefault  Default flavors.
-mdloMCHG  Write MCHG and MRAD fields for charged/radical atoms.
-mdloMDLParity  Write MDL parity.
-mdloMISO  Write ISO field for isotopes.
-mdloMMask
-mdloMRGP  Write RGP field for each R-group atom.
-mdloMV30  MDL V3000 format.
-mdloNoParity  Write no parity.
-mdloPMask

Output format specific flavorings: mf

-mfoDefault  Default flavors.
-mfoTitle  Include title.

Output format specific flavorings: mmod

-mmodoAtomTypes
-mmodoDefault  Default flavors.

Output format specific flavorings: mol2

-mol2oAtomNames
-mol2oAtomTypeNames
-mol2oBondTypeNames
-mol2oDefault  Default flavors.
-mol2oHydrogens
-mol2oNameMask
-mol2oOrderAtoms
-mol2oSubstructure

Output format specific flavorings: mopac

-mopacoCHARGES  Write charges.
-mopacoDefault  Default flavors.
-mopacoXYZ  Cartesian coords (default is internal coords/z-matrix).

Output format specific flavorings: pdb

-pdboBONDS  Write CONECT records (all single without -pdboORDERS).
-pdboBOTH  Write bi-directional CONECT records.
-pdboCHARGE  Write partial charges to b-factor field.
-pdboCurrentResidues
-pdboDELPHI  Combines -pdboCHARGE and -pdboRADIUS.
-pdboDefault  Default flavors.
-pdboELEMENT  Writes the chemical symbol in columns 77-78 of the output.
-pdboFormalCrg  Writes non-zero formal charges in columns 79-80 (and implies -pdboELEMENT).
-pdboHETBONDS  All bonds between (and to/from) hetero atoms are written to the output PDB file.
-pdboNoResidues
-pdboOEResidues
-pdboORDERS  Include bond orders in CONECT records.
-pdboOrderAtoms
-pdboRADIUS  Write atomic radii to occupancy field.
-pdboTER  Terminate with TER rather than END.

Output format specific flavorings: smiles

-smioAtomMaps
-smioAtomStereo
-smioBondStereo
-smioCanonical
-smioDefault  Default flavors.
-smioExtBonds
-smioHydrogens
-smioImpHCount
-smioIsotopes
-smioKekule
-smioRGroups
-smioSmiMask
-smioSuperAtoms
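The flavor-combination rule at the start of this section (specifying any flavor replaces the defaults, and flavors combine by bitwise OR) can be sketched in Python. The bit values and the composition of the default below are hypothetical placeholders chosen for illustration; they are not the real OEChem constants, which Babel reports at run time as hex integers.

```python
from functools import reduce
from operator import or_

# Hypothetical bit assignments for a few MDL output flavors.
# These are NOT the actual OEChem values.
MCHG = 0x01
MISO = 0x02
MRGP = 0x04
MV30 = 0x08

# Hypothetical default: a combination of individual flavors.
DEFAULT = MCHG | MISO | MRGP


def combine(*flavors):
    """Bitwise-OR the given flavor bits into one integer flavor."""
    return reduce(or_, flavors, 0)


# Like "babel -mdloMV30": the defaults are dropped, only MV30 is set.
only_v3000 = combine(MV30)

# Like "babel -mdloDefault -mdloMV30": the defaults plus MV30.
default_plus_v3000 = combine(DEFAULT, MV30)

print(hex(only_v3000))
print(hex(default_plus_v3000))
```

The point of the sketch is the difference between the two results: naming a single flavor on its own clears every default bit, which is why the default flavor is normally listed alongside any addition.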

CHAPTER

FIVE

Example executions
1. prompt> babel -in foo.sdf -out bar.mol2
   Convert SDF file to MOL2 file.

2. prompt> babel -in foo.sdf.gz -out bar.smi
   Convert gzipped SDF file to SMILES.

3. prompt> babel foo.sdf.gz bar.smi
   Convert gzipped SDF file to SMILES using the shortcut keyless syntax.

4. prompt> cat foo.sdf.gz | babel -in .sdf.gz -out bar.smi
   Convert gzipped SDF stream from stdin to SMILES.

5. prompt> babel -in mongodb.sdf.gz -out bar.oeb.gz -mc
   Convert gzipped SDF file to OEBinary multiconformer file, where consecutive molecules may be interpreted as conformers of the same molecule.

6. prompt> babel -in mongodb.sdf.gz -out bar.oeb.gz -mc_isomer
   Convert gzipped SDF file to OEBinary multiconformer file, where consecutive molecules may be interpreted as conformers of the same molecule, but different stereoisomers are considered different molecules.

7. prompt> babel -in mongodb.sdf.gz -mc -quiet
   Create no output but report counts for input SDF file handled as multiconformer, without fanfare.

8. prompt> babel \
       -in mongodb.sdf.gz \
       -ofmt sdf \
       -ogz \
       -chunk \
       -chunk_prefix datadir/mongo_part \
       -chunk_molcount 100000
   Split the input file into several files containing 100000 molecules each, in .sdf.gz format.
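To estimate what the chunking example produces, the arithmetic can be sketched in Python. The function names are hypothetical, and the output naming (prefix, underscore, two-digit index starting at 01, then the extension) is an assumption modeled on the documented <inbase>_XX default; Babel's actual file names may differ.

```python
def chunk_sizes(total_mols, molcount):
    """Molecule counts per output file when splitting by a fixed molcount."""
    full, rest = divmod(total_mols, molcount)
    return [molcount] * full + ([rest] if rest else [])


def chunk_names(prefix, count, ext=".sdf.gz"):
    """Hypothetical output names: <prefix>_01<ext>, <prefix>_02<ext>, ..."""
    width = max(2, len(str(count)))
    return ["{}_{:0{}d}{}".format(prefix, i, width, ext) for i in range(1, count + 1)]


sizes = chunk_sizes(250000, 100000)
names = chunk_names("datadir/mongo_part", len(sizes))
for name, size in zip(names, sizes):
    print(name, size)
```

For an input of 250000 molecules and -chunk_molcount 100000, this gives two full files and one final file holding the remaining 50000 molecules.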

APPENDIX

A

Release Notes

A.1

Babel 3.3

July 2007 v3.3

1. Babel 3.3 is a minor update from Babel3 v2.2, largely to provide full compatibility with OEChem 1.5.0.
2. The name of the program has been changed from babel3 to babel, for simplicity. Incrementing the major version number only reflects this name change.
3. Option -output_names is added to extract a list of names.
4. Option -mdlstereocorrect is added to correct non-compliant MDL stereo if possible.
5. Option -molcount is added for use in automation.
6. Bug fixed: Fixed -chunk so -n 0 is allowed.
7. Options added to allow residue perception with fine control:
   -perceive_residues
   -perceive_residues_preserve_AlternateLocation
   -perceive_residues_preserve_ChainID
   -perceive_residues_preserve_HetAtom
   -perceive_residues_preserve_InsertCode
   -perceive_residues_preserve_ResidueName
   -perceive_residues_preserve_ResidueNumber
   -perceive_residues_preserve_SerialNumber
   -perceive_residues_preserve_All

A.2

Babel3 2.2

October 2006 v2.2

1. Babel3 2.2 is a minor update largely to provide full compatibility with OEChem 1.4.2. Thereby, writing of the MDL V3000 file format is added (use -mdloDefault -mdloMV30). (Note that a spurious warning is generated by OEChem 1.4.2 when the atom count exceeds 999 with MDL V3000 output.) (Note that V3000 reading is not yet available.) Also included are improvements to PDB handling.
2. Option -quiet added, to avoid banner and verbose messages.
3. Options -chunk, -chunk_prefix, -chunk_molcount and -chunk_filecount added, to facilitate the task of splitting an input file into several output files. This can be useful for simple divide and conquer approaches to parallel processing. This replicates the functionality of the Rocs auxiliary program chunker but with format control.
4. Option -add2d added, to generate and add 2D coordinates. This feature is disabled in the absence of a valid Ogham (oedepict) license. Very useful for generating 2D SD files from 3D or 0D input (e.g., PDB, SMILES). 2D coordinates are required for preserving cis/trans stereochemistry in SD files. This task can also be done with the Ogham program depict.
5. Option -stereofrom3d added, to perceive stereochemistry from 3D coordinates, for writing to a non-3D format.

A.3

Babel3 2.1

April 2006 v2.1

1. Babel3 2.1 is a minor update largely to provide full compatibility with OEChem 1.4.0. Thereby, reading of the MDL ISIS Sketch File format is added.

A.4

Babel3 2.0

August 2005 v2.0

1. Babel3 2.0 is the first officially supported version of this program. Prior to this, the program Babel2 (babel2.cpp) was an unsupported OEChem example, despite its critical functionality. Hence the promotion. Babel2 v2.0b1 was this program's beta, prior to renaming.
2. Built with OEChem 1.3.4.
3. Performance has been improved significantly (x3 in typical use).
4. Fixed one-off bug in mol count with -n option.
5. Keyless syntax enabled; e.g., babel foo.sdf bar.mol2.
6. Multiconformer perception for XYZ and PDB format disallowed, due to fundamental format limitations (ref: OEChem 1.3.4 manual).

A.5

Babel2 2.0b1

May 2005 v2.0b1

1. This beta release (called Babel2) reflects the form and features of the planned Babel 2.0, to be the first officially supported version. Prior to this, the program (babel2.cpp) has been an unsupported OEChem example, despite its critical functionality. Hence the promotion.

APPENDIX

B

Known problems and caveats

B.1

Reporting of non-compliant molecule records

The Babel program makes use of the OEChem function OEReadMolecule() and input molecule streams. This is a highly robust method for processing input data, including recovering from input errors and broken, non-compliant files. However, error recovery sometimes entails not reporting input errors, or not correlating them precisely with input records.

B.2

Aromaticity perception of very large ring systems

The OpenEye aromaticity model considers all rings, including non-SSSR rings, with no limit on size. Thus with some large ring systems (esp. larger buckminsterfullerenes) the algorithm can be impractically slow (days). This problem does not affect Babel's normal task of processing small organic molecules or oligomeric macromolecules.