Chem-X at NCI
Daniel W. Zaharevitz, Frederick Biomedical Supercomputing Center,
Developmental Therapeutics Program, PRI/DynCorp
Introduction
Chem-X is a molecular modeling and database program. It is a product
of Chemical Design, Ltd. Chemical Design can be contacted at:
Roundway House, Cromwell Park,
Chipping Norton, Oxfordshire OX7 5SR
UK
Tel: (0608)644000
FAX: (0608)642244
200 Route 17S, Suite 120
Mahwah, NJ 07430
USA
Tel: (201)529-3323
FAX: (201)529-2443
This document attempts to explain a few Chem-X features that are
modified at the NCI. Parts of these explanations follow closely
the cited parts of the Chem-X Reference Manual, which should be
consulted for a much more thorough and detailed explanation. Of course,
if Chemical Design made their manual Web accessible then I could
just put hot links to their pages ( HINT, HINT ).
Parameterisation in Chem-X is fully discussed in Chapter 15 of the Reference
Manual. The default parameters are found in $CDL_FILES/extended.mmp. The
associated parameterisation database is $CDL_FILES/extended.dbs. At the NCI
we have added two platinum atom types to the parameterisation by running
the addpt.log script. Note that we don't add all the
bond length and bond angle parameters, because without the ChemInorganic module
we really can't do much modeling of Pt compounds. There are no changes or
additions to the parameterisation database, so we just create an empty database
with the new parameterisation and copy the old segments to the new database.
The parameter file and the associated database are kept in the $CDL_USERFRAG
directory and is automatically loaded on startup by calls in the
$CDL_CHEMX_MANAGER/chemx.ini file.
Generating Conformations
Generating conformations in Chem-X is documented in Chapter 21 of the
Reference Manual. The first step is to determine the bonds to be rotated.
Chem-X will determine this automatically. Terminal bonds ( for example, C-H )
are excluded as are bonds to terminal groups ending in identical atoms (
for example, -CH3, -OH, -CF3 ). Bonds that are in rings are excluded by default,
but they can be included. We have found the Chem-X ring conformation search to
be unacceptably slow for large database purposes and we routinely exclude
ring bonds from consideration. The number of points to be considered around
each rotatable bond is determined by the bond type. This parameter can be
set independently for the four different bond types:
- single bonds
- alpha ( sp2 - sp3 ) bonds
- conjugated single bonds
- double bonds
The conformations are usually generated by systematically rotating the bonds,
but there is a switch to pick the conformations randomly out of the search
space ( see Reference Manual, Section 21.4.5).
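As a rough illustration of the systematic and random searches described above, the following Python sketch enumerates torsion settings for a list of rotatable bonds. It is not Chem-X code: the point counts in POINTS_PER_BOND_TYPE, the even spacing starting at 0 degrees, and the function names are all assumptions made for the example.

```python
import itertools
import random

# Hypothetical number of points per bond type (not the Chem-X defaults).
POINTS_PER_BOND_TYPE = {"single": 3, "alpha": 6, "conjugated": 2, "double": 2}


def angles_for(bond_type, points=POINTS_PER_BOND_TYPE):
    """Evenly spaced torsion angles (degrees) for one bond; spacing is assumed."""
    n = points[bond_type]
    return [k * 360.0 / n for k in range(n)]


def torsion_grid(bond_types, points=POINTS_PER_BOND_TYPE):
    """Systematic search: every combination of angles over all bonds."""
    per_bond = [angles_for(t, points) for t in bond_types]
    return itertools.product(*per_bond)


def random_conformations(bond_types, n, points=POINTS_PER_BOND_TYPE, seed=0):
    """Random search: draw n points from the same search space."""
    rng = random.Random(seed)
    per_bond = [angles_for(t, points) for t in bond_types]
    return [tuple(rng.choice(a) for a in per_bond) for _ in range(n)]


if __name__ == "__main__":
    bonds = ["single", "single", "alpha"]
    print(len(list(torsion_grid(bonds))))  # size of the search space
    print(random_conformations(bonds, 2))
```

The size of the systematic search is the product of the point counts over all rotatable bonds, which is why the per-type point settings matter so much for large databases.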
Each conformation that is generated is evaluated and a decision is made as to whether
to keep or discard the conformation. Obviously it would be ideal to make this decision
based on a detailed energy calculation, but a rule-based approach ( Reference Manual,
Section 21.4.3 ) is the only practical
possibility for large databases. The rule-based approach is based on the work of
Dolata ( Dolata, D.P., et al., J. Comput.-Aided Mol. Des., 1987, 1, 73-85 ).
A rule is defined in terms of the three central torsion angles
occurring in a six atom chain. The possible torsion angles are divided into a set of six
ranges and each range is represented by a letter. For single, conjugated single, or
double bonds the ranges are labeled
[Figure: range labels for single, conjugated single, double bonds]
and for alpha ( sp2 - sp3 ) bonds
[Figure: range labels for alpha bonds]
For example, n-hexane with the interior C-C bonds fully eclipsed would be described as aaa
while the fully extended, all trans geometry would be described by
ddd. A rule is a description of which atom types are in the six atom chain and a list
of torsion angle combinations that are considered high energy. There are four levels
of rules: soft, medium, hard and very hard. A conformation is rejected if it is listed
in:
- one or more hard or very hard rules
- two or more medium rules
- four or more soft rules
- one medium rule and two soft rules
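The rejection criteria above can be sketched in Python as follows. This is an illustration only, not the Chem-X implementation: the function name is hypothetical, and the thresholds are read as "at least" counts (for example, one or more medium rules together with two or more soft rules), which is an assumption about how the combined case is meant.

```python
from collections import Counter


def is_rejected(matched_levels):
    """Decide rejection from the levels of the rules a conformation matches.

    matched_levels: iterable of "soft", "medium", "hard" or "very hard",
    one entry per rule that lists the conformation as high energy.
    """
    n = Counter(matched_levels)
    if n["hard"] + n["very hard"] >= 1:
        return True
    if n["medium"] >= 2:
        return True
    if n["soft"] >= 4:
        return True
    # Combined case, interpreted as thresholds (assumption).
    if n["medium"] >= 1 and n["soft"] >= 2:
        return True
    return False


if __name__ == "__main__":
    print(is_rejected(["soft", "soft", "soft"]))
    print(is_rejected(["medium", "soft", "soft"]))
```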
The level of rule is represented by the case of the letters in the rule:
- all upper case indicates a very hard rule
- two upper case and one lower case indicates a hard rule
- one upper case and two lower case indicates a medium rule
- all lower case indicates a soft rule
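The case convention listed above amounts to counting upper case letters in a three-letter torsion code. A minimal Python sketch, with a hypothetical function name and an assumed input check:

```python
LEVEL_BY_UPPER_COUNT = {3: "very hard", 2: "hard", 1: "medium", 0: "soft"}


def rule_level(code):
    """Return the rule level encoded by the letter case of a 3-letter code."""
    if len(code) != 3 or not code.isalpha():
        raise ValueError("expected a three-letter torsion code, got %r" % code)
    upper = sum(1 for ch in code if ch.isupper())
    return LEVEL_BY_UPPER_COUNT[upper]


if __name__ == "__main__":
    for code in ("KHA", "KKa", "Khb", "kla"):
        print(code, rule_level(code))
```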
If the rules would reject every conformation about a bond, then the rules applied to that bond are
softened by one level. An example of a rule is the one for the chain hydrogen, sp3 carbon, sp2 carbon,
sp3 carbon, sp3 carbon, any atom:
H/C+4/CSP2/C+4/C+4/*/
Khb/jhb/Ihb/hhb/Ghb/lhb/kgb/Jgb/igb/Hgb/ggb/Lgb/kla/jla/ila/hla/gla/lla/KKa/ -
JKa/IKa/HKa/GKa/LKa/KJa/JJa/IJa/HJa/GJa/LJa/kia/jia/iia/hia/gia/lia/KHA/JHa/ -
IHA/HHa/GHA/LHa/KGa/JGA/IGa/HGA/GGa/LGA/Khf/jhf/Ihf/hhf/Ghf/lhf/kgf/Jgf/igf/ -
Hgf/ggf/Lgf/kle/jle/ile/hle/gle/lle/KKe/JKe/IKe/HKe/GKe/LKe/KJe/JJe/IJe/HJe/ -
GJe/LJe/kie/jie/iie/hie/gie/lie/KHE/JHe/IHE/HHe/GHE/LHe/KGe/JGE/IGe/HGE/GGe/ -
LGE/Khd/jhd/Ihd/hhd/Ghd/lhd/kgd/Jgd/igd/Hgd/ggd/Lgd/klc/jlc/ilc/hlc/glc/llc/ -
KKc/JKc/IKc/HKc/GKc/LKc/KJc/JJc/IJc/HJc/GJc/LJc/kic/jic/iic/hic/gic/lic/KHC/ -
JHc/IHC/HHc/GHC/LHc/KGc/JGC/IGc/HGC/GGc/LGC/
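To make the layout of the rule above easier to see, here is a small Python sketch that splits an excerpt of it into the six atom chain and the list of three-letter torsion codes. It assumes, from the appearance of the text only, that "/" separates fields and that a trailing " -" marks a continued line; the function name is hypothetical and this is not a documented Chem-X file parser.

```python
# Excerpt of the rule shown above (first continuation line and part of the second).
RULE_TEXT = """\
H/C+4/CSP2/C+4/C+4/*/
Khb/jhb/Ihb/hhb/Ghb/lhb/kgb/Jgb/igb/Hgb/ggb/Lgb/kla/jla/ila/hla/gla/lla/KKa/ -
JKa/IKa/HKa/GKa/LKa/
"""


def parse_rule(text):
    """Split rule text into (atom chain, list of torsion codes)."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    # First line: the six atom types, '/'-separated.
    chain = [tok for tok in lines[0].split("/") if tok]
    codes = []
    for ln in lines[1:]:
        if ln.endswith("-"):  # assumed continuation marker
            ln = ln[:-1].rstrip()
        codes.extend(tok for tok in ln.split("/") if tok)
    return chain, codes


if __name__ == "__main__":
    chain, codes = parse_rule(RULE_TEXT)
    print(chain)
    print(len(codes), codes[:3])
```

In each code of this rule the first two letters come from the g-l set and the third from the a-f set, which matches a chain whose first two central bonds are alpha ( sp2 - sp3 ) bonds and whose third is an ordinary single bond.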
Chem-X uses keys as a way to rapidly screen out compounds that could not possibly satisfy
a query. Keys are discussed in Chapter 29 of the Reference Manual.
There are three types of keys in Chem-X:
- formula key - counts of various atom classes in the molecule, such as
hydrogen bond acceptors, halogens, oxygens, etc. See Appendix L.2 in the Reference Manual.
- bond key - counts of various bond patterns in the molecule. See Appendix L.3 in the Reference
Manual.
- 3D distance keys - bin coded distances between centers in the molecule.
The 3D distance keys require a bit more explanation. In order to save time and
space, Chem-X only calculates distances between important centers for its 3D
distance keys. There are four types of centers:
- Hydrogen bond donor
- Hydrogen bond acceptor
- Positive charge center
- Ring center
In the newer version of Chem-X ( Jan '94 ) there is the possibility of using
a hydrophobic center, but the use of this center type is still under
investigation.
The specific definitions of these centers are part of the parameterisation
( see Appendix L.1 ) and the parameterisation fragment database. There is a
31 bit distance key for each pairing of center types ( including a center type
paired with itself ), for a total of 10 keys in all ( see Reference Manual, page 29-38 ).
Each bit in the key represents
a specific distance range.
For each conformation that is accepted, all center to center distances are calculated
and the appropriate bits to set are determined. Each key stored for a given molecule
is the logical OR of the corresponding keys for all accepted conformations.
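The bookkeeping described above can be sketched in Python as follows. Only the four center types, the 31 bit key width, the count of 10 keys and the OR over conformations come from the text; the bin boundaries ( BIN_START, BIN_WIDTH ), the clamping of out-of-range distances and all function names are hypothetical and are not the actual Chem-X distance ranges.

```python
from itertools import combinations_with_replacement

CENTER_TYPES = ("donor", "acceptor", "positive", "ring")
KEY_BITS = 31

# Hypothetical bin layout in Angstroms; the real ranges are in the Reference Manual.
BIN_START = 2.0
BIN_WIDTH = 0.5

# All unordered pairs of center types, including same-type pairs: 10 in total.
PAIRS = list(combinations_with_replacement(CENTER_TYPES, 2))


def distance_bit(distance):
    """Bit mask for the bin containing this distance (clamped to the key width)."""
    index = int((distance - BIN_START) // BIN_WIDTH)
    index = max(0, min(KEY_BITS - 1, index))
    return 1 << index


def conformation_keys(distances):
    """Keys for one conformation from (type_a, type_b, distance) triples."""
    keys = {pair: 0 for pair in PAIRS}
    for type_a, type_b, distance in distances:
        pair = tuple(sorted((type_a, type_b), key=CENTER_TYPES.index))
        keys[pair] |= distance_bit(distance)
    return keys


def molecule_keys(conformations):
    """Stored keys: logical OR of the keys of all accepted conformations."""
    merged = {pair: 0 for pair in PAIRS}
    for distances in conformations:
        for pair, key in conformation_keys(distances).items():
            merged[pair] |= key
    return merged


if __name__ == "__main__":
    accepted = [[("donor", "acceptor", 3.2)], [("acceptor", "donor", 6.9)]]
    keys = molecule_keys(accepted)
    print(len(keys), bin(keys[("donor", "acceptor")]))
```

Because the stored key is an OR over conformations, a set bit only says that some accepted conformation places a pair of centers in that distance range, which is what makes the keys usable as a fast negative screen.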
ChemLib is a set of library routines that allow the user access to the Chem-X
graphics routines and the Chem-X data structures. It allows the user to write
custom routines to manipulate Chem-X data. Full details of ChemLib can be found
in the ChemLib Programming Guide. At the NCI we have written ChemLib routines
to handle:
- Counting chiral centers in a database.
- Writing the 3D Keys to an ASCII file.