					               SMILES
• Simplified Molecular Input Line Entry
  System (SMILES)
• Widely used AND computationally
  efficient
• Uses atomic symbols and a set of
  intuitive rules
• Uses hydrogen-suppressed molecular
  graphs (HSMG)
          SMILES Bonds
      SINGLE*            -

      DOUBLE         =

      TRIPLE         #

     AROMATIC*           :
* usually omitted: a bond with no symbol is read
  as single (or aromatic between aromatic ring atoms)
              Butanols
[Structure diagrams: 2-Butanol, iso-Butanol, tert-Butanol]
       SMILES Branches
• Represented by enclosure in
  parentheses
• Can be nested or stacked
• Examples:
             CC(O)CC is 2-Butanol
             OCC(C)C is iso-Butanol
            OC(C)(C)C is tert-Butanol
           SMILES Bonds
        Ethene                C=C
     Chloroethene            ClC=C
  1,1-Dichloroethene       ClC(Cl)=C
cis-1,2-Dichloroethene     Cl/C=C\Cl
    Trichloroethene       ClC(Cl)=CCl
   Perchloroethene       ClC(Cl)=C(Cl)Cl
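Because SMILES works on hydrogen-suppressed graphs, the hydrogens on organic-subset atoms are implied: normal valence minus the bond orders actually written. The Python sketch below illustrates this for acyclic, non-aromatic strings such as the ethene and butanol examples. The name `implicit_hydrogens`, the valence table, and the restriction to C, N, O and the halogens are assumptions made for this illustration, not part of any particular toolkit.

```python
import re

# Assumed normal valences for the organic-subset atoms handled in this sketch.
VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1, "Br": 1, "I": 1}
# "/" and "\" are single bonds that also carry cis/trans information.
ORDER = {"-": 1, "=": 2, "#": 3, "/": 1, "\\": 1}
TOKEN = re.compile(r"Cl|Br|[CNOFI]|[-=#/\\()]")

def implicit_hydrogens(smiles):
    """Return (symbol, hydrogen count) per atom for an acyclic, non-aromatic SMILES."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        # Ring numbers, aromatic atoms and bracket atoms are rejected, not guessed.
        raise ValueError("only acyclic organic-subset SMILES without brackets are handled")
    symbols, bond_sum = [], []
    stack, prev, pending = [], None, 1
    for tok in tokens:
        if tok in ORDER:
            pending = ORDER[tok]          # order of the next bond written
        elif tok == "(":
            stack.append(prev)            # branch hangs off the current atom
        elif tok == ")":
            prev = stack.pop()            # main chain resumes at that atom
        else:
            symbols.append(tok)
            bond_sum.append(0)
            idx = len(symbols) - 1
            if prev is not None:
                bond_sum[prev] += pending
                bond_sum[idx] += pending
            prev, pending = idx, 1        # unmarked bonds default to single
    return [(s, VALENCE[s] - b) for s, b in zip(symbols, bond_sum)]
```

For instance, `implicit_hydrogens("ClC(Cl)=CCl")` places a single hydrogen on the second carbon, matching trichloroethene (C2HCl3).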
          SMILES Atoms
• Use normal chemical symbols
• Add punctuation symbols if necessary
• No super- or subscripts
        SMILES Symbols
• String of alphanumeric characters and
  certain punctuation symbols
• Terminates at the first space
  encountered when read left to right
• The ORGANIC SUBSET:
        B, C, N, O, P, S, F, Cl, Br, I
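A hypothetical `tokenize` function makes these reading rules concrete. It scans left to right, stops at the first space, tries the two-letter symbols Cl and Br before C and B, and otherwise accepts bracket atoms, aromatic lowercase atoms, ring numbers, and bond, branch, and dot symbols. It is a minimal sketch, not a validating parser.

```python
import re

# Alternatives are tried in order, so "Cl"/"Br" win over "C"/"B".
TOKEN = re.compile(
    r"\[[^\]]+\]"            # bracket atom, e.g. [NH4+]
    r"|Br|Cl|[BCNOPSFI]"     # aliphatic organic-subset atom
    r"|[bcnops]"             # aromatic organic-subset atom
    r"|%\d\d|\d"             # ring-closure number
    r"|[-=#:/\\.()]"         # bond, dot or branch symbol
)

def tokenize(smiles):
    """Split a SMILES string into tokens; reading stops at the first space."""
    smiles = smiles.split(" ", 1)[0]
    tokens, pos = [], 0
    while pos < len(smiles):
        m = TOKEN.match(smiles, pos)
        if m is None:
            raise ValueError(f"unexpected character {smiles[pos]!r} at {pos}")
        tokens.append(m.group())
        pos = m.end()
    return tokens
```

Anything after the first space, such as a trailing name, is ignored: `tokenize("CC(O)CC 2-Butanol")` sees only `CC(O)CC`.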
      Other SMILES Atoms
• Aliphatic or nonaromatic carbon: C
• Atom in aromatic ring: lowercase letter
• Designate ring closure with pairs of
  matching digits, e.g.
     c1ccccc1 (or C1=CC=CC=C1) is Benzene,
      whereas
     C1CCCCC1 is Cyclohexane
        SMILES Charges
• Specify attached hydrogens and
  charges in square brackets
• Number of attached hydrogens is the
  symbol H followed by optional digit
  SMILES Charges
   [H+]         proton
  [OH-]    hydroxide anion
[OH3+]    hydronium cation
 [Fe++]     iron(II) cation
[NH4+]    ammonium cation
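The bracket notation can be read with a small regular expression. The sketch below is a simplification under stated assumptions: the hypothetical `parse_bracket_atom` handles only an element symbol, an optional H count, and a charge written either as repeated signs (`++`) or as a sign plus digits (`+2`). Isotopes, chirality marks, and aromatic bracket atoms such as `[nH]` are not covered.

```python
import re

# Element symbol, optional H count, optional charge; nothing else is accepted.
BRACKET_ATOM = re.compile(r"\[([A-Z][a-z]?)(H\d*)?(\+\d+|-\d+|\++|-+)?\]")

def parse_bracket_atom(text):
    """Return (element, attached hydrogens, formal charge) for a simple bracket atom."""
    m = BRACKET_ATOM.fullmatch(text)
    if m is None:
        raise ValueError(f"not a simple bracket atom: {text!r}")
    element, h, charge = m.groups()
    # "H" alone means one hydrogen; "H4" means four.
    h_count = 0 if h is None else (int(h[1:]) if len(h) > 1 else 1)
    if charge is None:
        charge_value = 0
    elif charge[1:].isdigit():
        charge_value = int(charge)                    # "+2" -> 2, "-3" -> -3
    else:
        sign = 1 if charge[0] == "+" else -1
        charge_value = sign * len(charge)             # "++" -> 2, "-" -> -1
    return element, h_count, charge_value
```

Both `[Fe++]` and `[Fe+2]` come out as iron with charge +2, which is why either spelling can be used for the iron(II) cation.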
   SMILES Cyclic Structures
• Break one single or one aromatic bond
  in each ring
• Number in any order
  – Designate ring-breaking atoms by the
    same digit following the atomic symbol
          Cyclic Structures
• Ring numbers mark where a ring is
  opened and closed
• The same number follows the symbols of
  the ring-opening and ring-closing atoms
• Ring numbers 1 – 9 are used (full SMILES
  also allows two-digit numbers with a %
  prefix, e.g. %10)
• Each number appears twice per ring, once
  to open it and once to close it; a closed
  number may be reused
• An atom can carry two consecutive
  numbers, e.g., Naphthalene: c12ccccc1cccc2
         Naphthalene
[Structure diagram]
c12ccccc1cccc2
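Pairing ring numbers only requires remembering which atom opened each number. The hypothetical `ring_bonds` below is a sketch: it counts atoms with a crude tokenizer and ignores any bond symbol written before a ring number. It returns each ring-closure bond as (opening atom, closing atom, number), with atoms numbered from 0 in the order they are written.

```python
import re

# Bracket atom, two-letter or one-letter atom, %nn ring number, or any single character.
TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|%\d\d|.")

def ring_bonds(smiles):
    """Pair ring-closure numbers; return (opening atom, closing atom, number)."""
    open_rings = {}
    bonds = []
    atom = -1
    for tok in TOKEN.findall(smiles):
        if tok.startswith("[") or tok[0].isalpha():
            atom += 1                                   # every atom token advances the index
        elif tok.isdigit() or (tok.startswith("%") and len(tok) == 3):
            number = int(tok.lstrip("%"))
            if number in open_rings:                    # second occurrence closes the ring
                bonds.append((open_rings.pop(number), atom, number))
            else:                                       # first occurrence opens it
                open_rings[number] = atom
    if open_rings:
        raise ValueError(f"ring numbers never closed: {sorted(open_rings)}")
    return bonds
```

For naphthalene it reports the closures (0, 5) and (0, 9), so atoms 0 and 5 are the two fusion atoms shared by both rings. In `C1CC1C1CC1` the number 1 is used for two separate rings, because the first ring is already closed when the number is reused.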
      SMILES Conventions
• Avoid two consecutive left parentheses
  if possible
• Use as few branches as possible
• Tautomeric bonds are not designated;
  enter the appropriate form
      Further Restrictions
• A branch cannot begin a SMILES
  notation
• A branch cannot immediately follow a
  double- or triple-bond symbol
• Example: C=(CC)C is invalid, whereas
  C(=CC)C (2-butene) and C(CC)=C
  (1-butene) are valid SMILES
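These restrictions, plus balanced parentheses, can be checked mechanically. `check_branches` is a hypothetical helper written for this illustration. It looks only at the parentheses and at the character before each `(`, and does not otherwise validate the string.

```python
def check_branches(smiles):
    """Return a list of branch-rule violations (an empty list means none were found)."""
    smiles = smiles.split(" ", 1)[0]          # a SMILES string ends at the first space
    problems = []
    if smiles.startswith("("):
        problems.append("branch at start of string")
    depth = 0
    for i, ch in enumerate(smiles):
        if ch == "(":
            # A branch may not directly follow a double- or triple-bond symbol.
            if i > 0 and smiles[i - 1] in "=#":
                problems.append(f"branch follows bond symbol at position {i}")
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                problems.append(f"unmatched ')' at position {i}")
                depth = 0
    if depth > 0:
        problems.append("unclosed '('")
    return problems
```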
          SMILES Fragments
•   Nitro             •   N(=O)(=O)
•   Nitrate           •   ON(=O)(=O)
•   Nitrite           •   ON(=O)
•   Sulfonic acid     •   S(=O)(=O)O
•   Cyanide/Nitrile   •   C#N
•   Azide             •   N=N#N
•   Azido             •   N=[N+]=[N-]
SMILES Metals
[Al] [As] [Au] [Be]
[Bi] [Cd] [Ca] [Fe]
[Hg] [K] [Li] [Mg]
[Na] [Ni] [Pt] [Sb]
    [Sn] [Zn] [Zr]
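The rule behind this list is that only organic-subset elements may be written without brackets. A minimal sketch (the name `atom_token` is hypothetical) that writes an element symbol accordingly:

```python
# The organic subset listed earlier; every other element needs square brackets.
ORGANIC_SUBSET = {"B", "C", "N", "O", "P", "S", "F", "Cl", "Br", "I"}

def atom_token(symbol):
    """Write an element symbol as a SMILES atom: bare if in the organic subset, else bracketed."""
    return symbol if symbol in ORGANIC_SUBSET else f"[{symbol}]"
```

A bare organic-subset symbol implies hydrogens up to its normal valence (a lone `C` is methane). A bracket atom carries only the hydrogens written inside it, so `[Fe]` is a bare iron atom.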
   Disconnected Structures
• Indicated by a dot
• Tetramethylammonium bromide

          C[N+](C)(C)C.[Br-]
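Because the dot separates components, you can check a salt for consistency by splitting on the dot and summing the bracket charges in each piece. The helpers below are hypothetical sketches. `bracket_charge` reads only a charge written at the end of a bracket atom (`+`, `++`, `-`, `+2`, and so on).

```python
import re

def bracket_charge(atom):
    """Formal charge of one bracket atom such as '[N+]', '[Fe++]' or '[Fe+2]'."""
    m = re.search(r"([+-])(\d+)\]$|([+-]+)\]$", atom)
    if m is None:
        return 0
    if m.group(1):                                    # sign followed by digits
        return int(m.group(1) + m.group(2))
    signs = m.group(3)                                # repeated signs
    return len(signs) if signs[0] == "+" else -len(signs)

def components(smiles):
    """Split on '.' and pair each component with its net formal charge."""
    return [
        (part, sum(bracket_charge(a) for a in re.findall(r"\[[^\]]*\]", part)))
        for part in smiles.split(".")
    ]
```

For `C[N+](C)(C)C.[Br-]` the two components carry +1 and -1, so the salt is neutral overall.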
 Isomeric and Chiral SMILES
• Isomeric configuration indicated by
  forward and backward slashes: / \
• Examples:
  – trans-1,2-dibromoethene: Br/C=C/Br
     • Direction of the slash continues
  – cis-1,2-dibromoethene: Br/C=C\Br
     • Direction of the slash reverses
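• Chirality indicated inside square brackets
  by “@” (looking from the first neighbor,
  the remaining neighbors are listed
  anticlockwise) or “@@” (clockwise)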
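For the simple X/C=C/Y layout used in these examples, the slash rule takes only a few lines: matching slashes mean trans, opposite slashes mean cis. `double_bond_geometry` is a hypothetical sketch that handles only that exact layout. Other layouts, such as substituents written as branches, are rejected.

```python
import re

# Only the layout X/C=C/Y (either slash direction) is recognised.
PATTERN = re.compile(r"(Cl|Br|[A-Z])([/\\])C=C([/\\])(Cl|Br|[A-Z])")

def double_bond_geometry(smiles):
    """Return 'cis' or 'trans' for a SMILES of the form X/C=C/Y."""
    m = PATTERN.fullmatch(smiles)
    if m is None:
        raise ValueError("only the layout X/C=C/Y is handled")
    first, second = m.group(2), m.group(3)
    # Same slash character: direction continues (trans); different: reverses (cis).
    return "trans" if first == second else "cis"
```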
         Some Applications
• JMDraw/SMILESViewer (Christoph
  Steinbeck)
• JME Molecular Editor (Peter Ertl)
• STN Express (SMILES as output)
• Tripos (dbtranslate: SMILES to MOL)
• Marvin (Ferenc Csizmadia)
  http://chemaxon.com/marvin/
• CACTVS http://www2.ccc.uni-erlangen.de/cactvs/
       Another Application
• SMILESCAS Database
  http://www.syrres.com/esc/smilecas.htm
  Over 103,000 SMILES notations
• Input CAS Registry Number
• Leads to SMILES and thence to a
  structure search

				
posted:1/22/2012