SMARTS Examples

					734b3cc9-5cd4-4f0b-bbc3-ae252acb1894.doc                                                             Page 1
of 25

                                 SMARTS Examples
Table of Contents
1. Introduction
2. Functional Groups by Element
3. Gross Structural Features
4. Meta-SMARTS
5. Electron & Proton Features
6. Breakdown of Complex SMARTS
7. Interesting Example SMARTS

1. Introduction
When using SMARTS to do searches, it is often helpful to have example queries from which to start. This
document contains many potentially useful example SMARTS, which may be used to perform searches or serve
as templates, examples, and ideas.

These SMARTS have been tested, but they may still contain errors. Please send corrections, improvements,
additions, and questions to support@daylight.com.

2. Functional Groups by Element
    C                   C&O                    H           N            O           P         S            X



C
alkane

Alkyl Carbon
       [CX4]


alkene (-ene)

Allenic Carbon
       [$([CX2](=C)=C)]
Vinylic Carbon
       [$([CX3]=[CX3])]
       Ethenyl carbon


alkyne (-yne)
Acetylenic Carbon
       [$([CX2]#C)]


arene (Ar, aryl-, aromatic hydrocarbons)

Arene
        c


C&O
carbonyl

Carbonyl group. Low specificity
      [CX3]=[OX1]
      Hits carboxylic acid, ester, ketone, aldehyde, carbonic acid/ester, anhydride, carbamic acid/ester, acyl
      halide, amide.
Carbonyl group
      [$([CX3]=[OX1]),$([CX3+]-[OX1-])]
      Hits either resonance structure
Carbonyl with Carbon
      [CX3](=[OX1])C
      Hits aldehyde, ketone, carboxylic acid (except formic), anhydride (except formic), acyl halides (acid
      halides). Won't hit carbamic acid/ester, carbonic acid/ester.
Carbonyl with Nitrogen.
      [OX1]=CN
      Hits amide, carbamic acid/ester, polypeptide.
Carbonyl with Oxygen.
      [CX3](=[OX1])O
      Hits ester, carboxylic acid, carbonic acid or ester, carbamic acid or ester, anhydride. Won't hit aldehyde
      or ketone.
Acyl Halide
      [CX3](=[OX1])[F,Cl,Br,I]
      acid halide, -oyl halide
Aldehyde
      [CX3H1](=O)[#6]
      -al
Anhydride
      [CX3](=[OX1])[OX2][CX3](=[OX1])
Amide
      [NX3][CX3](=[OX1])[#6]
      -amide
Amidinium
      [NX3][CX3]=[NX3+]
Carbamate.
      [NX3,NX4+][CX3](=[OX1])[OX2,OX1-]
      Hits carbamic esters, acids, and zwitterions
Carbamic ester
       [NX3][CX3](=[OX1])[OX2H0]
Carbamic acid.
       [NX3,NX4+][CX3](=[OX1])[OX2H,OX1-]
       Hits carbamic acids and zwitterions.
Carboxylate Ion.
       [CX3](=O)[O-]
       Hits conjugate bases of carboxylic, carbamic, and carbonic acids.
Carbonic Acid or Carbonic Ester
       [CX3](=[OX1])(O)O
       Carbonic Acid, Carbonic Ester, or combination
Carbonic Acid or Carbonic Acid-Ester
       [CX3](=[OX1])([OX2])[OX2H,OX1H0-1]
       Hits acid and conjugate base. Won't hit carbonic acid diester
Carbonic Ester (carbonic acid diester)
       C[OX2][CX3](=[OX1])[OX2]C
       Won't hit carbonic acid or combination carbonic acid/ester
Carboxylic acid
       [CX3](=O)[OX2H1]
       -oic acid, COOH
Carboxylic acid or conjugate base.
       [CX3](=O)[OX1H0-,OX2H1]
Cyanamide
       [NX3][CX2]#[NX1]
Ester
       [#6][CX3](=O)[OX2H0][#6]
       Also hits anhydrides. Won't hit formic anhydride.
Ketone
       [#6][CX3](=O)[#6]
       -one


ether

Ether
        [OD2]([#6])[#6]


H
hydrogen atoms

Hydrogen Atom
       [H]
       Hits SMILES that are hydrogen atoms: [H+] [2H] [H][H]
Not a Hydrogen Atom
       [!#1]
       Hits SMILES that are not hydrogen atoms.
Proton
         [H+]
         Hits positively charged hydrogen atoms: [H+]


hydrogen count

Mono-Hydrogenated Cation
      [+H]
      Hits atoms that have a positive charge and exactly one attached hydrogen: F[C+](F)[H]
Not Mono-Hydrogenated
      [!H] or [!H1]
      Hits atoms that don't have exactly one attached hydrogen.


N
amide: see carbonyl
amine (-amino)

Primary or secondary amine, not amide.
       [NX3;H2,H1;!$(NC=O)]
       Not ammonium ion (N must be 3-connected), not ammonia (H count can't be 3). Primary or secondary is
       specified by N's H-count (H2 & H1 respectively). Also note that "&" (and) is the default operator
       and is higher precedence than "," (or), which is higher precedence than ";" (and). Will hit cyanamides
       and thioamides.
Enamine
       [NX3][CX3]=[CX3]
Primary amine, not amide.
       [NX3;H2;!$(NC=[!#6]);!$(NC#[!#6])][#6]
       Not amide (C not double bonded to a hetero-atom), not ammonium ion (N must be 3-connected), not
       ammonia (N's H-count can't be 3), not cyanamide (C not triple bonded to a hetero-atom).
Two primary or secondary amines
       [NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)]
       Here we use the disconnection symbol (".") to match two separate unbonded identical patterns.
Enamine or Aniline Nitrogen
       [NX3][$(C=C),$(cc)]
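
The precedence note in the "Primary or secondary amine, not amide" entry above can be made concrete with a small sketch. The Python below is only an illustration and is not part of any SMARTS toolkit: the helper names split_top_level and group_by_precedence are hypothetical, and only the explicit operators ";", "," and "&" are handled (adjacent primitives such as NX3, which are joined by the implicit "&", stay together as one token).

```python
def split_top_level(expr, sep):
    """Split expr on sep, ignoring separators nested inside () or []."""
    parts, current, depth = [], [], 0
    for ch in expr:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def group_by_precedence(atom_expr):
    """Group an atom expression (the text between the outer [ ]) by operator.

    Outer list:  terms joined by ";" (low-precedence and).
    Middle list: alternatives joined by "," (or).
    Inner list:  pieces joined by an explicit "&" (high-precedence and).
    """
    return [
        [split_top_level(alt, "&") for alt in split_top_level(term, ",")]
        for term in split_top_level(atom_expr, ";")
    ]


# Atom expression of the "Primary or secondary amine, not amide" entry.
groups = group_by_precedence("NX3;H2,H1;!$(NC=O)")
for term in groups:
    print(term)
```

Reading the result from the outside in: the atom must be NX3, and (H2 or H1), and not $(NC=O).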


amino acids

Generic amino acid: low specificity.
       [NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]
       For use w/ non-standard a.a. search. hits pro but not gly. Hits acids and conjugate bases. Hits single a.a.s
       and specific residues w/in polypeptides (internal, or terminal).
Dipeptide group. generic amino acid: low specificity.
       [NX3H2,NH3X4+][CX4H]([*])[CX3](=[OX1])[NX3,NX4+][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-]
       Won't hit pro or gly. Hits acids and conjugate bases.
Amino Acid
      [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]
      Replace * w/ a specific a.a. side chain from the 18_standard_side_chains list to hit a specific standard
      a.a. Won't work with Proline or Glycine; they have their own SMARTS (see side chain list). Hits acids
      and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
      E.g. usage: Alanine side chain is [CH3X4]. Search is
      [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([CH3X4])[CX3](=[OX1])[OX2H,OX1-,N]
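
The "replace *" step described for the Amino Acid entry is plain text substitution, so it can be scripted. The following Python sketch assumes the template contains exactly one "[*]" slot; the names AA_TEMPLATE and with_side_chain are made up for this illustration and do not come from any toolkit.

```python
# Template from the "Amino Acid" entry; "[*]" marks the side-chain slot.
AA_TEMPLATE = (
    "[$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])"
    "[CX3](=[OX1])[OX2H,OX1-,N]"
)


def with_side_chain(template, side_chain):
    """Return template with its single "[*]" slot replaced by side_chain."""
    if template.count("[*]") != 1:
        raise ValueError("expected exactly one [*] placeholder")
    return template.replace("[*]", side_chain)


# Alanine side chain, as listed under "amino acid side chains".
alanine_query = with_side_chain(AA_TEMPLATE, "[CH3X4]")
print(alanine_query)
```

Because the slot sits inside a branch "( )", a multi-atom side chain such as [CH2X4][OX2H] (serine) can be substituted the same way.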


amino acid side chains

Alanine side chain
        [CH3X4]
Arginine side chain.
        [CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]
        Hits acid and conjugate base.
Asparagine side chain.
        [CH2X4][CX3](=[OX1])[NX3H2]
        Also hits Gln side chain when used alone.
Aspartate (or Aspartic acid) side chain.
        [CH2X4][CX3](=[OX1])[OH0-,OH]
        Hits acid and conjugate base. Also hits Glu side chain when used alone.
Cysteine side chain.
        [CH2X4][SX2H,SX1H0-]
        Hits acid and conjugate base
Glutamate (or Glutamic acid) side chain.
        [CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]
        Hits acid and conjugate base.
Glycine
        [$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N])]
Histidine side chain.
        [CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]:
        [$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1
        Hits acid & conjugate base for either Nitrogen. Note that the Ns can be either ([(Cationic 3-connected
        with one H) or (Neutral 2-connected without any Hs)] where there is a second-neighbor who is [3-
        connected with one H]) or (3-connected with one H).
Isoleucine side chain
        [CHX4]([CH3X4])[CH2X4][CH3X4]
Leucine side chain
        [CH2X4][CHX4]([CH3X4])[CH3X4]
Lysine side chain.
        [CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]
        Acid and conjugate base
Methionine side chain
        [CH2X4][CH2X4][SX2][CH3X4]
Phenylalanine side chain
        [CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1
Proline
        [$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]
Serine side chain
        [CH2X4][OX2H]
Thioamide
        [NX3][CX3]=[SX1]
Threonine side chain
        [CHX4]([CH3X4])[OX2H]
Tryptophan side chain
        [CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12
Tyrosine side chain.
        [CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1
        Acid and conjugate base
Valine side chain
        [CHX4]([CH3X4])[CH3X4]
Glycine
        N[CX4H2][CX3](=[OX1])[O,N]
Proline
        N1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[O,N]


azide (-azido)

Azide group.
       [$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])]
       Hits any atom with an attached azide.
Azide ion.
       [$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])]
       Hits N in azide ion


azo

Nitrogen.
       [#7]
       Nitrogen in N-containing compound. Aromatic or aliphatic. Most general interpretation of "azo".
Azo Nitrogen. Low specificity.
       [NX2]=N
       Hits diazene, azoxy and some diazo structures
Azo Nitrogen. Diazene
       [NX2]=[NX2]
       (diaza alkene)
Azoxy Nitrogen.
       [$([NX2]=[NX3+]([O-])[#6]),$([NX2]=[NX3+0](=[O])[#6])]
Diazo Nitrogen
       [$([#6]=[N+]=[N-]),$([#6-]-[N+]#[N])]
Azole.
       [$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]
       5-member aromatic heterocycle w/ 2 double bonds. Contains N & another non-C (N, O, S). Subclasses are
       furo-, thio-, pyrro- (replace CH of furfuran, thiophene, pyrrole w/ N).


hydrazine
Hydrazine H2NNH2
      [NX3][NX3]


hydrazone

Hydrazone C=NNH2
      [NX3][NX2]=[*]


imine

Substituted imine
       [CX3;$([C]([#6])[#6]),$([CH][#6])]=[NX2][#6]
       Schiff base
Substituted or un-substituted imine
       [$([CX3]([#6])[#6]),$([CX3H][#6])]=[$([NX2][#6]),$([NX2H])]
Iminium
       [NX3+]=[CX3]


imide

Unsubstituted dicarboximide
       [CX3](=[OX1])[NX3H][CX3](=[OX1])
Substituted dicarboximide
       [CX3](=[OX1])[NX3H0]([#6])[CX3](=[OX1])
Dicarboxdiimide
       [CX3](=[OX1])[NX3H0]([NX3H0]([CX3](=[OX1]))[CX3](=[OX1]))[CX3](=[OX1])


nitrate

Nitrate group
        [$([NX3](=[OX1])(=[OX1])O),$([NX3+]([OX1-])(=[OX1])O)]
        Also hits nitrate anion
Nitrate Anion
        [$([OX1]=[NX3](=[OX1])[OX1-]),$([OX1]=[NX3+]([OX1-])[OX1-])]


nitrile

Nitrile
        [NX1]#[CX2]
Isonitrile
        [CX1-]#[NX2+]
nitro

Nitro group.
       [$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8]
       Hits both forms.
Two Nitro groups
       [$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8].[$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8]


nitroso

Nitroso-group
       [NX2]=[OX1]


n-oxide

N-Oxide
      [$([#7+][OX1-]),$([#7v5]=[OX1]);!$([#7](~[O])~[O]);!$([#7]=[#7])]
      Hits both forms. Won't hit azoxy, nitro, nitroso, or nitrate.


O
hydroxyl (includes alcohol, phenol)

Hydroxyl
       [OX2H]
Hydroxyl in Alcohol
       [#6][OX2H]
Hydroxyl in Carboxylic Acid
       [OX2H][CX3]=[OX1]
Hydroxyl in H-O-P-
       [OX2H]P
Enol
       [OX2H][#6X3]=[#6]
Phenol
       [OX2H][cX3]:[c]
Enol or Phenol
       [OX2H][$(C=C),$(cc)]
Hydroxyl_acidic
       [$([OH]-*=[!#6])]
       An acidic hydroxyl is a hydroxyl bonded to an atom which is multiply bonded to a hetero atom; this
       includes carboxylic, sulphur, phosphorus, halogen and nitrogen oxyacids.


peroxide

Peroxide groups.
          [OX2,OX1-][OX2,OX1-]
          Also hits anions.


P
phosphoric compounds

Phosphoric_acid groups.
      [$(P(=[OX1])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)]),
      $([P+]([OX1-])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)])]
      Hits both depiction forms. Hits orthophosphoric acid and polyphosphoric acid anhydrides. Doesn't hit
      monophosphoric acid anhydride esters (including acidic mono- & di- esters) but will hit some
      polyphosphoric acid anhydride esters (mono- esters on pyrophosphoric acid and longer, di- esters on
      linear triphosphoric acid and longer).
Phosphoric_ester groups.
      [$(P(=[OX1])([OX2][#6])([$([OX2H]),$([OX1-]),$([OX2][#6])])[$([OX2H]),$([OX1-]),$([OX2][#6]),$([OX2]P)]),
      $([P+]([OX1-])([OX2][#6])([$([OX2H]),$([OX1-]),$([OX2][#6])])[$([OX2H]),$([OX1-]),$([OX2][#6]),$([OX2]P)])]
      Hits both depiction forms. Doesn't hit non-ester phosphoric_acid groups.


S
thio groups (thio-, thi-, sulpho-, mercapto-)

Carbo-Thiocarboxylate
        [S-][CX3](=S)[#6]
Carbo-Thioester
        S([#6])[CX3](=O)[#6]
Thio analog of carbonyl
        [#6X3](=[SX1])([!N])[!N]
        Where S replaces O. Not a thioamide.
Thiol, Sulfide or Disulfide Sulfur
        [SX2]
Thiol
        [#16X2H]
Sulfur with at least one hydrogen.
        [#16!H0]
Thioamide
        [NX3][CX3]=[SX1]


sulfide

Sulfide
          [#16X2H0]
        -alkylthio. Won't hit thiols. Hits disulfides.
Mono-sulfide
        [#16X2H0][!#16]
        alkylthio- or alkoxy-. Won't hit thiols. Won't hit disulfides.
Di-sulfide
        [#16X2H0][#16X2H0]
        Won't hit thiols. Won't hit mono-sulfides.
Two Sulfides
        [#16X2H0][!#16].[#16X2H0][!#16]
        Won't hit thiols. Won't hit mono-sulfides. Won't hit disulfides.


sulfinate

Sulfinate
        [$([#16X3](=[OX1])[OX2H0]),$([#16X3+]([OX1-])[OX2H0])]
        Won't hit Sulfinic Acid. Hits Both Depiction Forms.
Sulfinic Acid
        [$([#16X3](=[OX1])[OX2H,OX1H0-]),$([#16X3+]([OX1-])[OX2H,OX1H0-])]
        Won't hit substituted Sulfinates. Hits Both Depiction Forms. Hits acid and conjugate base (sulfinate).


sulfone

Sulfone. Low specificity.
       [$([#16X4](=[OX1])=[OX1]),$([#16X4+2]([OX1-])[OX1-])]
       Hits all sulfones, including heteroatom-substituted sulfones: sulfonic acid, sulfonate, sulfuric acid mono-
       & di- esters, sulfamic acid, sulfamate, sulfonamide... Hits Both Depiction Forms.
Sulfone. High specificity.
       [$([#16X4](=[OX1])(=[OX1])([#6])[#6]),$([#16X4+2]([OX1-])([OX1-])([#6])[#6])]
       Only hits carbo- sulfones (Won't hit heteroatom-substituted molecules). Hits Both Depiction Forms.
Sulfonic acid. High specificity.
       [$([#16X4](=[OX1])(=[OX1])([#6])[OX2H,OX1H0-]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H,OX1H0-])]
       Only hits carbo- sulfonic acids (Won't hit heteroatom-substituted molecules). Hits acid and conjugate
       base. Hits Both Depiction Forms. Hits Arene sulfonic acids.
Sulfonate
       [$([#16X4](=[OX1])(=[OX1])([#6])[OX2H0]),$([#16X4+2]([OX1-])([OX1-])([#6])[OX2H0])]
       (sulfonic ester) Only hits carbon-substituted sulfur (Oxygen may be heteroatom-substituted). Hits Both
       Depiction Forms.
Sulfonamide.
       [$([#16X4]([NX3])(=[OX1])(=[OX1])[#6]),$([#16X4+2]([NX3])([OX1-])([OX1-])[#6])]
       Only hits carbo- sulfonamide. Hits Both Depiction Forms.
Carbo-azosulfone
       [SX4](C)(C)(=O)=N
       Partial N-Analog of Sulfone
Sulfonamide
       [$([SX4](=[OX1])(=[OX1])([!O])[NX3]),$([SX4+2]([OX1-])([OX1-])([!O])[NX3])]
       (sulfa drugs) Won't hit sulfamic acid or sulfamate. Hits Both Depiction Forms.


sulfoxide

Sulfoxide Low specificity.
       [$([#16X3]=[OX1]),$([#16X3+][OX1-])]
       (sulfinyl, thionyl) Analog of carbonyl where S replaces C. Hits all sulfoxides, including heteroatom-
       substituted sulfoxides, dialkylsulfoxides, carbo-sulfoxides, sulfinate, sulfinic acids... Hits Both Depiction
       Forms. Won't hit sulfones.
Sulfoxide High specificity
       [$([#16X3](=[OX1])([#6])[#6]),$([#16X3+]([OX1-])([#6])[#6])]
       (sulfinyl, thionyl) Analog of carbonyl where S replaces C. Only hits carbo-sulfoxides (Won't hit
       heteroatom-substituted molecules). Hits Both Depiction Forms. Won't hit sulfones.


sulfate

Sulfate
        [$([#16X4](=[OX1])(=[OX1])([OX2H,OX1H0-])[OX2][#6]),$([#16X4+2]([OX1-])([OX1-])([OX2H,OX1H0-])[OX2][#6])]
        (sulfuric acid monoester) Only hits when oxygen is carbon-substituted. Hits acid and conjugate base.
        Hits Both Depiction Forms.
Sulfuric acid ester (sulfate ester) Low specificity.
        [$([SX4](=O)(=O)(O)O),$([SX4+2]([O-])([O-])(O)O)]
        Hits sulfuric acid, sulfuric acid monoesters (sulfuric acids) and diesters (sulfates). Hits acid and
        conjugate base. Hits Both Depiction Forms.
Sulfuric Acid Diester.
        [$([#16X4](=[OX1])(=[OX1])([OX2][#6])[OX2][#6]),$([#16X4+2]([OX1-])([OX1-])([OX2][#6])[OX2][#6])]
        Only hits when oxygen is carbon-substituted. Hits Both Depiction Forms.


sulfamate

Sulfamate.
      [$([#16X4]([NX3])(=[OX1])(=[OX1])[OX2][#6]),$([#16X4+2]([NX3])([OX1-])([OX1-])[OX2][#6])]
      Only hits when oxygen is carbon-substituted. Hits Both Depiction Forms.
Sulfamic Acid.
      [$([#16X4]([NX3])(=[OX1])(=[OX1])[OX2H,OX1H0-]),$([#16X4+2]([NX3])([OX1-])([OX1-
      ])[OX2H,OX1H0-])]
      Hits acid and conjugate base. Hits Both Depiction Forms.


sulfene

Sulfenic acid.
       [#16X2][OX2H,OX1H0-]
       Hits acid and conjugate base.
Sulfenate.
       [#16X2][OX2H0]


X
halide (-halo -fluoro -chloro -bromo -iodo)

Any carbon attached to any halogen
      [#6][F,Cl,Br,I]
Halogen
      [F,Cl,Br,I]
Three_halides groups
      [F,Cl,Br,I].[F,Cl,Br,I].[F,Cl,Br,I]
      Hits SMILES that have three halides.
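
Patterns such as "Three_halides groups", "Two Nitro groups" and "Two Sulfides" repeat one SMARTS and join the copies with the disconnection symbol ("."). A minimal Python sketch of that construction follows; the helper name repeat_pattern is hypothetical, and how a given toolkit treats repeated or overlapping matches is not addressed here.

```python
def repeat_pattern(pattern, count):
    """Join count copies of pattern with the "." disconnection symbol."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return ".".join([pattern] * count)


# Rebuilds the "Three_halides groups" SMARTS from the single-halogen pattern.
three_halides = repeat_pattern("[F,Cl,Br,I]", 3)
print(three_halides)
```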


acyl halide

Acyl Halide
      [CX3](=[OX1])[F,Cl,Br,I]
      (acid halide, -oyl halide)


3. Gross Structural Features


 Chirality     Orbital Configuration      Connectivity      Chains & Branching       Rotation     Cyclic Features



Chirality
Specified chiral carbon.
       [$([#6X4@](*)(*)(*)*),$([#6X4@H](*)(*)*)]
       Matches carbons whose chirality is specified (clockwise or anticlockwise). Will not match molecules
       whose chirality is unspecified but that could otherwise be considered chiral. Therefore, it also won't
       match molecules that would be chiral due to an implicit connection (i.e. implicit H).
"No-conflict" chiral match
       C[C@?](F)(Cl)Br
       Will match molecules with chiralities as specified or unspecified.
"No-conflict" chiral match where an H is present
       C[C@?H](Cl)Br
       Will match molecules with chiralities as specified or unspecified.

Orbital Configuration
sp2 cationic carbon
        [$([cX2+](:*):*)]
        Aromatic cationic sp2 carbon with a free electron in a non-bonding sp2 hybrid orbital
Aromatic sp2 carbon.
        [$([cX3](:*):*),$([cX2+](:*):*)]
        The first recursive SMARTS matches carbons that are three-connected; the second case matches two-
        connected carbons (i.e. cations with a free electron in a non-bonding sp2 hybrid orbital).
Any sp2 carbon.
        [$([cX3](:*):*),$([cX2+](:*):*),$([CX3]=*),$([CX2+]=*)]
        The first recursive SMARTS matches carbons that are three-connected and aromatic. The second case
        matches two-connected aromatic carbons (i.e. cations with a free electron in a non-bonding sp2 hybrid
        orbital). The third case matches three-connected non-aromatic carbons (alkenes). The fourth case
        matches non-aromatic cationic alkene carbons.
Any sp2 nitrogen.
        [$([nX3](:*):*),$([nX2](:*):*),$([#7X2]=*),$([NX3](=*)=*),$([#7X3+](-*)=*),$([#7X3+H]=*)]
        Can be aromatic 3-connected with 2 aromatic bonds (e.g. pyrrole, pyridine-N-oxide), aromatic 2-
        connected with 2 aromatic bonds (and a free pair of electrons in a nonbonding orbital, e.g. pyridine),
        either aromatic or non-aromatic 2-connected with a double bond (and a free pair of electrons in a
        nonbonding orbital, e.g. C=N), non-aromatic 3-connected with 2 double bonds (e.g. a nitro group; this
        form does not exist in reality, but SMILES can represent the charge-separated resonance structures as a
        single uncharged structure), either aromatic or non-aromatic 3-connected cation w/ 1 single bond and 1
        double bond (e.g. a nitro group; here the individual charge-separated resonance structures are specified),
        or either aromatic or non-aromatic 3-connected hydrogenated cation with a double bond (as the previous
        case but R is hydrogen), respectively.
Explicit Hydrogen on sp2-Nitrogen
        [$([#1X1][$([nX3](:*):*),$([nX2](:*):*),$([#7X2]=*),$([NX3](=*)=*),$([#7X3+](-*)=*),$([#7X3+H]=*)])]
        (H must be an isotope or ion)
sp3 nitrogen
        [$([NX4+]),$([NX3]);!$(*=*)&!$(*:*)]
        One atom that is (a 4-connected N cation or a 3-connected N) and is not double bonded and is not
        aromatically bonded.
Explicit Hydrogen on an sp3 N.
        [$([#1X1][$([NX4+]),$([NX3]);!$(*=*)&!$(*:*)])]
        One atom that is a 1-connected H that is bonded to an sp3 N. (H must be an isotope or ion)
sp2 N in N-Oxide
        [$([$([NX3]=O),$([NX3+][O-])])]
sp3 N in N-Oxide Exclusive:
        [$([$([NX4]=O),$([NX4+][O-])])]
        Only hits if O is explicitly present. Won't hit if * is in SMILES in place of O.
sp3 N in N-Oxide Inclusive:
        [$([$([NX4]=O),$([NX4+][O-,#0])])]
        Hits if O could be present. Hits if * is used in place of O in SMILES.


Connectivity
Quaternary Nitrogen
        [$([NX4+]),$([NX4]=*)]
        Hits non-aromatic Ns.
Tricoordinate S double bonded to N.
        [$([SX3]=N)]
S double-bonded to Carbon
        [$([SX1]=[#6])]
        Hits terminal (1-connected S)
Triply bonded N
        [$([NX1]#*)]
Divalent Oxygen
        [$([OX2])]


Chains & Branching
Unbranched_alkane groups.
       [R0;D2][R0;D2][R0;D2][R0;D2]
       Only hits alkanes (single-bond chains). Only hits chains of at least 4 members. All non-(implicit-
       hydrogen) atoms count as branches (e.g. halide substituted chains count as branched).
Unbranched_chain groups.
       [R0;D2]~[R0;D2]~[R0;D2]~[R0;D2]
       Hits any bond (single, double, triple). Only hits chains of at least 4 members. All non-(implicit-
       hydrogen) atoms count as branches (e.g. halide substituted chains count as branched).
Long_chain groups.
       [AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]~[AR0]
       Aliphatic chains at least 8 members long.
Atom_fragment
       [!$([#6+0]);!$(C(F)(F)F);!$(c(:[!c]):[!c])!$([#6]=,#[!#6])]
       (CLOGP definition) A fragment atom is an atom that is not an isolating carbon.
Carbon_isolating
       [$([#6+0]);!$(C(F)(F)F);!$(c(:[!c]):[!c])!$([#6]=,#[!#6])]
       This definition is based on that in CLOGP: a charge-neutral carbon that is not a CF3 carbon, is not an
       aromatic C between two aromatic hetero atoms (e.g. in tetrazole), and is not multiply bonded to a hetero
       atom.
Terminal S bonded to P
       [$([SX1]~P)]
Nitrogen on -N-C=N-
       [$([NX3]C=N)]
Nitrogen on -N-N=C-
       [$([NX3]N=C)]
Nitrogen on -N-N=N-
       [$([NX3]N=N)]
Oxygen in -O-C=N-
       [$([OX2]C=N)]


Rotation
Rotatable bond
       [!$(*#*)&!D1]-!@[!$(*#*)&!D1]
       An atom which is not triply bonded and not one-connected (i.e. terminal), connected by a single non-ring
       bond to an equivalent atom. Note that logical operators can be applied to bonds ("-&!@"). Here, the
       overall SMARTS consists of two atoms and one bond. The bond is "single and not ring". *#* is any atom
       triple bonded to any atom. Enclosing this SMARTS in parentheses and preceding it with $ lets us use
       $(*#*) as an atom primitive in a recursive SMARTS. The purpose is to avoid bonds such as
       c1ccccc1-C#C, which would be considered rotatable without this specification.
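
The way the rotatable-bond SMARTS is assembled from smaller pieces can be traced step by step. The Python sketch below only concatenates strings; the helper name recursive is hypothetical and simply wraps an environment in "$( )" as described in the note for the Rotatable bond entry.

```python
def recursive(environment):
    """Wrap a SMARTS environment so it can be used as an atom primitive."""
    return "$(" + environment + ")"


# "*#*" is any atom triple bonded to any atom.
triple_bonded = recursive("*#*")             # -> "$(*#*)"

# End atom: not triply bonded and not one-connected (terminal).
end_atom = "[!" + triple_bonded + "&!D1]"

# Bond: single ("-") and not in a ring ("!@"), written with the implicit "&".
rotatable_bond = end_atom + "-!@" + end_atom
print(rotatable_bond)
```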


Cyclic Features
Bicyclic
        [$([*R2]([*R])([*R])([*R]))].[$([*R2]([*R])([*R])([*R]))]
        Bicyclic compounds have 2 bridgehead atoms with 3 arms connecting the bridgehead atoms.
Ortho
        *-!:aa-!:*
        Ortho-substituted ring
Meta
        *-!:aaa-!:*
        Meta-substituted ring
Para
        *-!:aaaa-!:*
        Para-substituted ring
Acyclic bonds
        *!@*
Single bond and not in a ring
        *-!@*
Non-ring atom
        [R0] or [!R]
Macrocycle groups.
        [r;!r3;!r4;!r5;!r6;!r7]
S in aromatic 5-ring with lone pair
        [sX2r5]
Aromatic 5-Ring O with Lone Pair
        [oX2r5]
N in 5-sided aromatic ring
        [nX2r5]
Spiro-ring center
        [X4;R2;r4,r5,r6](@[r4,r5,r6])(@[r4,r5,r6])(@[r4,r5,r6])@[r4,r5,r6]
        Ring sizes 4-6.
N in 5-ring arom
        [$([nX2r5]:[a-]),$([nX2r5]:[a]:[a-])]
        Anion
CIS or TRANS double bond in a ring
        */,\[R]=;@[R]/,\*
        An isomeric SMARTS consisting of four atoms and three bonds.
CIS or TRANS double or aromatic bond in a ring
        */,\[R]=,:;@[R]/,\*
Unfused benzene ring
        [cR1]1[cR1][cR1][cR1][cR1][cR1]1
       To find a benzene ring which is not fused, we write a SMARTS of 6 aromatic carbons in a ring where
       each atom is only in one ring.
Multiple non-fused benzene rings
       [cR1]1[cR1][cR1][cR1][cR1][cR1]1.[cR1]1[cR1][cR1][cR1][cR1][cR1]1
Fused benzene rings
       c12ccccc1cccc2
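
The Ortho, Meta and Para entries above differ only in the number of aromatic atoms between the two substituents, so they can be generated from one template in which each "*" is replaced by a more specific atom expression. The Python sketch below is illustrative only; LINKERS and ring_substitution are made-up names, and the substituent patterns in the example are arbitrary choices taken from elsewhere in this document.

```python
# Aromatic atoms on the ring path between the substituents, as in the
# Ortho ("aa"), Meta ("aaa") and Para ("aaaa") entries.
LINKERS = {"ortho": "aa", "meta": "aaa", "para": "aaaa"}


def ring_substitution(position, first="*", second="*"):
    """Build a substituted-ring SMARTS; "-!:" is a single, non-aromatic bond."""
    return first + "-!:" + LINKERS[position] + "-!:" + second


# With the default "*" substituents this reproduces the Para entry.
print(ring_substitution("para"))

# Halogen ortho to a hydroxyl, using [F,Cl,Br,I] and [OX2H] from section 2.
print(ring_substitution("ortho", "[F,Cl,Br,I]", "[OX2H]"))
```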


4. Meta-SMARTS


        Amino Acids                            Recursive or Multiple                           Tools & Tricks



Amino Acids
Generic amino acid: low specificity.
        [NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]
        For use w/ non-standard a.a. search. hits pro but not gly. Hits acids and conjugate bases. Hits single a.a.s
        and specific residues w/in polypeptides (internal, or terminal).
A.A. Template for 20 standard a.a.s
        [$([$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),
        $([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),
        $([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N])]
        Pro, Gly, Other. Replace * w/ the entire 18_standard_side_chains list to get "any standard a.a." Hits
        acids and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or
        terminal).
Proline
        [$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]
Glycine
        [$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N])]
Other a.a.
        [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]
        Replace * w/ a specific a.a. side chain from the 18_standard_side_chains list to hit a specific standard
        a.a. Won't work with Proline or Glycine; they have their own SMARTS (see side chain list). Hits acids
        and conjugate bases. Hits single a.a.s and specific residues w/in polypeptides (internal, or terminal).
           Example usage:
           Alanine side chain is [CH3X4]
           Alanine Search is
        [$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([CH3X4])[CX3](=[OX1])[OX2H,OX1-,N]
18_standard_aa_side_chains.
        ([$([CH3X4]),$([CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]),
        $([CH2X4][CX3](=[OX1])[NX3H2]),$([CH2X4][CX3](=[OX1])[OH0-,OH]),
        $([CH2X4][SX2H,SX1H0-]),$([CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]),
        $([CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:
        [#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1),
        $([CHX4]([CH3X4])[CH2X4][CH3X4]),$([CH2X4][CHX4]([CH3X4])[CH3X4]),
        $([CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]),$([CH2X4][CH2X4][SX2][CH3X4]),
        $([CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1),$([CH2X4][OX2H]),
        $([CHX4]([CH3X4])[OX2H]),
        $([CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12),
        $([CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1),$([CHX4]([CH3X4])[CH3X4])])
       Can be any of the standard 18 (Pro & Gly are treated separately). Hits acids and conjugate bases.
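
The 18_standard_aa_side_chains entry is built by wrapping each side-chain SMARTS in "$( )" and or-ing the results inside one atom expression. A minimal Python sketch of that construction is given below; the helper name any_of is hypothetical, and only three of the 18 side chains are used to keep the example short.

```python
def any_of(patterns):
    """Or a list of SMARTS together as recursive environments: [$(a),$(b),...]."""
    if not patterns:
        raise ValueError("at least one pattern is required")
    return "[" + ",".join("$(" + p + ")" for p in patterns) + "]"


# Alanine, serine and threonine side chains from the side chain list.
side_chains = ["[CH3X4]", "[CH2X4][OX2H]", "[CHX4]([CH3X4])[OX2H]"]
small_set = any_of(side_chains)
print(small_set)
```

The result is a single atom expression, so it can be placed wherever the templates in this section show "[*]".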
N in Any_standard_amino_acid.
       [$([$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),
       $([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),
       $([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([$([CH3X4]),
       $([CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]),
       $([CH2X4][CX3](=[OX1])[NX3H2]),$([CH2X4][CX3](=[OX1])[OH0-,OH]),
       $([CH2X4][SX2H,SX1H0-]),$([CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]),
       $([CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:
       [#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1),
       $([CHX4]([CH3X4])[CH2X4][CH3X4]),$([CH2X4][CHX4]([CH3X4])[CH3X4]),
       $([CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]),$([CH2X4][CH2X4][SX2][CH3X4]),
       $([CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1),$([CH2X4][OX2H]),
       $([CHX4]([CH3X4])[OX2H]),
       $([CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12),
       $([CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1),
       $([CHX4]([CH3X4])[CH3X4])])[CX3](=[OX1])[OX2H,OX1-,N])]
       Format is A.A. Template for 20 standard a.a.s, where * is replaced by the entire 18_standard_side_chains
       list (or'd together). A generic amino acid with any of the 18 side chains, or proline or glycine. Hits
       "standard" amino acids that have terminally appended groups (i.e. "standard" refers to the side chains).
       (Pro, Gly, or 18 normal a.a.s.) Hits single a.a.s and specific residues w/in polypeptides (internal, or
       terminal).
Non-standard amino acid.
       [$([NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]);!$([$([$([NX3H,NX4H2+]),
       $([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),
       $([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),
       $([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([$([CH3X4]),
       $([CH2X4][CH2X4][CH2X4][NHX3][CH0X3](=[NH2X3+,NHX2+0])[NH2X3]),
       $([CH2X4][CX3](=[OX1])[NX3H2]),$([CH2X4][CX3](=[OX1])[OH0-,OH]),
       $([CH2X4][SX2H,SX1H0-]),$([CH2X4][CH2X4][CX3](=[OX1])[OH0-,OH]),
       $([CH2X4][#6X3]1:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:
       [#6X3H]:[$([#7X3H+,#7X2H0+0]:[#6X3H]:[#7X3H]),$([#7X3H])]:[#6X3H]1),
       $([CHX4]([CH3X4])[CH2X4][CH3X4]),$([CH2X4][CHX4]([CH3X4])[CH3X4]),
       $([CH2X4][CH2X4][CH2X4][CH2X4][NX4+,NX3+0]),$([CH2X4][CH2X4][SX2][CH3X4]),
       $([CH2X4][cX3]1[cX3H][cX3H][cX3H][cX3H][cX3H]1),$([CH2X4][OX2H]),
       $([CHX4]([CH3X4])[OX2H]),
       $([CH2X4][cX3]1[cX3H][nX3H][cX3]2[cX3H][cX3H][cX3H][cX3H][cX3]12),
       $([CH2X4][cX3]1[cX3H][cX3H][cX3]([OHX2,OH0X1-])[cX3H][cX3H]1),
       $([CHX4]([CH3X4])[CH3X4])])[CX3](=[OX1])[OX2H,OX1-,N])])]
       Generic amino acid but not a "standard" amino acid ("standard" refers to the 20 normal side chains).
       Won't hit amino acids that are non-standard due solely to the fact that groups are terminally appended to
       the polypeptide chain (N or C term). Format is [$(generic a.a.); !$(not a standard one)]. Hits single a.a.s
       and specific residues w/in polypeptides (internal or terminal).
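
The "generic but not standard" format can be assembled mechanically from two smaller patterns. The following is a minimal sketch using plain Python string composition; the helper name generic_but_not and the shortened alanine-like exclusion pattern are illustrative assumptions, not part of the original list, and no toolkit is used to validate the result.

```python
def generic_but_not(include: str, exclude: str) -> str:
    """Build a one-atom SMARTS: atom is in `include` AND NOT in `exclude`.

    Both arguments are SMARTS whose first atom is the atom of interest.
    ';' is the low-precedence "and", '!' negates the recursive SMARTS.
    """
    return f"[$({include});!$({exclude})]"


# Generic amino acid pattern, taken from the entry above.
generic_aa = "[NX3,NX4+][CX4H]([*])[CX3](=[OX1])[O,N]"
# Hypothetical, shortened "standard" list containing only an alanine-like residue.
alanine_like = "[NX3H2,NX4H3+][CX4H]([CH3X4])[CX3](=[OX1])[OX2H,OX1-,N]"

print(generic_but_not(generic_aa, alanine_like))
```

In the full entry the exclude argument is the whole Any_standard_amino_acid pattern rather than a single residue.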


Recursive or Multiple
Recursive SMARTS: Atoms connected to particular SMARTS

Ortho
          [SMARTS_expression]-!:aa-!:[SMARTS_expression]
Meta
          [SMARTS_expression]-!:aaa-!:[SMARTS_expression]
Para
       [SMARTS_expression]-!:aaaa-!:[SMARTS_expression]
Hydrogen
       [$([#1][SMARTS_expression])]
       Hydrogen must be explicit, i.e., an isotope or charged.
Nitrogen
       [$([#7][SMARTS_expression])]
Oxygen
       [$([#8][SMARTS_expression])]
Fluorine
       [$([#9][SMARTS_expression])]
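
The templates in this group differ only in the aromatic path length or the atomic number, so they can be filled in programmatically. This is a small sketch under the assumption that the placeholders are replaced by complete atom expressions; the function names substituent_pair and neighbour_of are hypothetical.

```python
# Number of aromatic atoms spanned between the two attachment points.
RING_PATHS = {"ortho": "aa", "meta": "aaa", "para": "aaaa"}


def substituent_pair(position: str, first: str, second: str) -> str:
    """Fill the Ortho/Meta/Para template; '-!:' is a single, non-aromatic bond."""
    return f"{first}-!:{RING_PATHS[position]}-!:{second}"


def neighbour_of(atomic_number: int, environment: str) -> str:
    """Recursive SMARTS that hits an atom of the given element attached to `environment`."""
    return f"[$([#{atomic_number}]{environment})]"


print(substituent_pair("para", "[OX2H]", "[NX3H2]"))  # [OX2H]-!:aaaa-!:[NX3H2]
print(neighbour_of(8, "[CX3]=[OX1]"))                 # [$([#8][CX3]=[OX1])]
```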


Recursive SMARTS: Multiple groups

Two possible groups
       [$(SMARTS_expression_A),$(SMARTS_expression_B)]
       Hits atoms in either environment or group of interest, A or B.
         Example usages:
         Azide group is: [$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])]
         Azide ion is: [$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])]
         Azide or azide ion is: [$([$(*-[NX2-]-[NX2+]#[NX1]),$(*-[NX2]=[NX2+]=[NX1-])]),
         $([$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])])]
Recursive SMARTS
       [$([atom_that_gets_hit][other_atom][other_atom])]
       Hits the first atom within the parentheses. Example usages:
         [$([CX3]=[OX1])] hits Carbonyl Carbon
         [$([OX1]=[CX3])] hits Carbonyl Oxygen
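
Because a recursive SMARTS behaves like an atom primitive, "A or B" patterns nest. The sketch below rebuilds the "azide or azide ion" example from its four resonance forms with plain string joins; the helper name any_of is an assumption made for illustration.

```python
def any_of(*environments: str) -> str:
    """Return one atom that is in any of the given environments: [$(A),$(B),...]."""
    return "[" + ",".join(f"$({env})" for env in environments) + "]"


azide_group = any_of("*-[NX2-]-[NX2+]#[NX1]", "*-[NX2]=[NX2+]=[NX1-]")
azide_ion = any_of("[NX1-]=[NX2+]=[NX1-]", "[NX1]#[NX2+]-[NX1-2]")

# Nesting: each bracketed result is itself a valid one-atom SMARTS.
azide_any = any_of(azide_group, azide_ion)
print(azide_any)
```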


Single only, Double only, Single or Double

Sulfide
          [#16X2H0]
          (-alkylthio) Won't hit thiols. Hits disulfides too.
Mono-sulfide
        [#16X2H0][!#16]
        (alkylthio- or alkoxy-) R-S-R Won't hit thiols. Won't hit disulfides.
Di-sulfide
        [#16X2H0][#16X2H0]
        Won't hit thiols. Won't hit mono-sulfides.
Two sulfides
        [#16X2H0][!#16].[#16X2H0][!#16]
        Won't hit thiols. Won't hit mono-sulfides. Won't hit disulfides.
Acid/conj-base
        [OX2H,OX1H0-]
        Hits acid and conjugate base.
Non-acid Oxygen
        [OX2H0]
Acid/base
        [H1,H0-]
        Works for any atom if base form has no Hs & acid has only one.


Multiple Disconnected Groups

Two disconnected SMARTS fragments
      ([Cl!$(Cl~c)].[c!$(c~Cl)])
      A molecule that contains a chlorine and an aromatic carbon but which are not connected to each other.
      Uses component-level SMARTS. Both SMARTS fragments must be in the same SMILES target
      fragment.
Two disconnected SMARTS fragments
      ([Cl]).([c])
      Hits SMILES that contain a chlorine and an aromatic carbon but which are in different SMILES
      fragments.
Two not-necessarily connected SMARTS fragments
      ([Cl].[c])
      Uses component-level SMARTS. Both SMARTS fragments must be in the same SMILES target
      fragment.
Two not-necessarily connected fragments
      ([SMARTS_expression]).([SMARTS_expression])
      Uses component-level SMARTS. SMARTS fragments are each in different SMILES target fragments.
Two primary or secondary amines
      [NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)]
      Here we use the "disconnection" symbol (".") to match two separate not-necessarily bonded identical
      patterns.
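
The three grouping forms used in this section differ only in where the parentheses go. A minimal sketch of that difference, using hypothetical helper names and plain string composition:

```python
def same_component(*fragments: str) -> str:
    """(A.B): all fragments must match within one SMILES target fragment."""
    return "(" + ".".join(fragments) + ")"


def different_components(*fragments: str) -> str:
    """(A).(B): each fragment must match in a different SMILES target fragment."""
    return ".".join(f"({frag})" for frag in fragments)


def anywhere(*fragments: str) -> str:
    """A.B: fragments need not be bonded; no constraint on target fragments."""
    return ".".join(fragments)


amine = "[NX3;H2,H1;!$(NC=O)]"
print(same_component("[Cl]", "[c]"))        # ([Cl].[c])
print(different_components("[Cl]", "[c]"))  # ([Cl]).([c])
print(anywhere(amine, amine))
```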


Tools & Tricks
Alternative/Equivalent Representations

Any carbon aromatic or non-aromatic
       [#6] or [c,C]
SMILES wildcard
       [#0]
       This SMARTS hits the SMILES *
Factoring
       [OX2,OX1-][OX2,OX1-] or [O;X2,X1-][O;X2,X1-]
       Factor out common atomic expressions in the recursive SMARTS. May improve human readability.
High-precedence "and"
       [N&X4&+,N&X3&+0] or [NX4+,NX3+0]
       High-precedence "and" (&) is the default logical operator. "Or" (,) is lower precedence than &, and
       low-precedence "and" (;) is lower precedence than both & and ",".
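
The factoring trick relies on ";" binding more loosely than ",", so a primitive placed before ";" applies to every alternative after it. The sketch below generates both spellings from the same inputs; the function names expanded and factored are assumptions for illustration.

```python
def expanded(common: str, alternatives: list[str]) -> str:
    """Repeat the common primitive in every alternative: [OX2,OX1-]."""
    return "[" + ",".join(common + alt for alt in alternatives) + "]"


def factored(common: str, alternatives: list[str]) -> str:
    """State the common primitive once, joined with low-precedence ';': [O;X2,X1-]."""
    return "[" + common + ";" + ",".join(alternatives) + "]"


oxygen_states = ["X2", "X1-"]
print(expanded("O", oxygen_states) * 2)  # [OX2,OX1-][OX2,OX1-]
print(factored("O", oxygen_states) * 2)  # [O;X2,X1-][O;X2,X1-]
```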


Hydrogens

Any atom w/ at-least 1 H
       [*!H0,#1]
       In SMILES and SMARTS, Hydrogen is not considered an atom (unless it is specified as an isotope).
       The hydrogen count is instead considered a property of an atom. This SMARTS provides a way to
       effectively hit Hs themselves.
Hs on Carbons
       [#6!H0,#1]
Atoms w/ 1 H
       [H,#1]


5. Electron & Proton Features


       Acids & Bases            Charge                  H-bond Donors & Acceptors                      Radicals



Acids & Bases
Acid
      [!H0;F,Cl,Br,I,N+,$([OH]-*=[!#6]),+]
      Proton donor
Carboxylic acid
      [CX3](=O)[OX2H1]
      (-oic acid, COOH)
Carboxylic acid or conjugate base.
      [CX3](=O)[OX1H0-,OX2H1]
Hydroxyl_acidic
      [$([OH]-*=[!#6])]
      An acidic hydroxyl is a hydroxyl bonded to an atom which is multiply bonded to a hetero atom; this
      includes carboxylic, sulphur, phosphorus, halogen and nitrogen oxyacids.
Phosphoric_Acid
       [$(P(=[OX1])([$([OX2H]),$([OX1-]),$([OX2]P)])([$([OX2H]),$([OX1-]),$([OX2]P)])
       [$([OX2H]),$([OX1-]),$([OX2]P)]),$([P+]([OX1-])([$([OX2H]),$([OX1-]),$([OX2]P)])
       ([$([OX2H]),$([OX1-]),$([OX2]P)])[$([OX2H]),$([OX1-]),$([OX2]P)])]
       Hits both forms. Hits orthophosphoric acid and polyphosphoric acid anhydrides. Doesn't hit
       monophosphoric acid anhydride esters (including acidic mono- & di- esters) but will hit some
       polyphosphoric acid anhydride esters (mono- esters on pyrophosphoric acid and longer, di- esters on
       linear triphosphoric acid and longer). Hits acid and conjugate base.
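
The Phosphoric_Acid pattern repeats one three-way substituent six times across two depiction forms, which makes it easier to read when built from parts. A minimal sketch, assuming the pattern is written on a single line; the function and variable names are illustrative only.

```python
def phosphoric_acid_smarts() -> str:
    """Rebuild the Phosphoric_Acid SMARTS from its repeated substituent."""
    # Each of the three remaining positions on P: hydroxyl, oxide anion, or P-O-P bridge.
    substituent = "[$([OX2H]),$([OX1-]),$([OX2]P)]"
    tail = f"({substituent})({substituent}){substituent}"
    double_bond_form = f"P(=[OX1]){tail}"           # P=O depiction
    charge_separated_form = f"[P+]([OX1-]){tail}"   # P+-O- depiction
    return f"[$({double_bond_form}),$({charge_separated_form})]"


print(phosphoric_acid_smarts())
```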
Sulfonic Acid. High specificity.
       [$([#16X4](=[OX1])(=[OX1])([#6])[OX2H,OX1H0-]),
       $([#16X4+2]([OX1-])([OX1-])([#6])[OX2H,OX1H0-])]
       Only hits carbo-sulfonic acids (won't hit heteroatom-substituted molecules). Hits acid and conjugate
       base. Hits both depiction forms. Hits arene sulfonic acids.
Acyl Halide
       [CX3](=[OX1])[F,Cl,Br,I]
       (acid halide, -oyl halide)


Charge
Anionic divalent Nitrogen
        [NX2-]
Oxenium Oxygen
        [OX2H+]=*
Oxonium Oxygen
        [OX3H2+]
Carbocation
        [#6+]
sp2 cationic carbon.
        [$([cX2+](:*):*)]
        Aromatic cationic sp2 carbon with a free electron in a non-bonding sp2 hybrid orbital
Azide ion.
        [$([NX1-]=[NX2+]=[NX1-]),$([NX1]#[NX2+]-[NX1-2])]
        Hits N in azide ion
Zwitterion High Specificity
        [+1]~*~*~[-1]
        +1 charged atom separated by any 3 bonds from a -1 charged atom.
Zwitterion Low Specificity, Crude
        [$([!-0!-1!-2!-3!-4]~*~[!+0!+1!+2!+3!+4]),
        $([!-0!-1!-2!-3!-4]~*~*~[!+0!+1!+2!+3!+4]),
        $([!-0!-1!-2!-3!-4]~*~*~*~[!+0!+1!+2!+3!+4]),
        $([!-0!-1!-2!-3!-4]~*~*~*~*~[!+0!+1!+2!+3!+4]),
        $([!-0!-1!-2!-3!-4]~*~*~*~*~*~[!+0!+1!+2!+3!+4]),
        $([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~[!+0!+1!+2!+3!+4]),
        $([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~*~[!+0!+1!+2!+3!+4]),
        $([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~*~*~[!+0!+1!+2!+3!+4]),
        $([!-0!-1!-2!-3!-4]~*~*~*~*~*~*~*~*~*~[!+0!+1!+2!+3!+4])]
        Variously charged moieties separated by up to ten bonds.
Zwitterion Low Specificity
        ([!-0!-1!-2!-3!-4].[!+0!+1!+2!+3!+4])
        Variously charged moieties that are within the same molecule but not-necessarily connected. Uses
        component-level grouping.
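
The crude zwitterion pattern is one alternative per chain length, from two bonds up to ten, so it can be generated in a loop rather than typed by hand. This is a sketch under the assumption that the two charged-atom expressions are copied verbatim from the entry; the function name and the max_intervening parameter are hypothetical.

```python
def crude_zwitterion(max_intervening: int = 9) -> str:
    """One recursive alternative per number of wildcard atoms between the two ends."""
    first_end = "[!-0!-1!-2!-3!-4]"
    second_end = "[!+0!+1!+2!+3!+4]"
    alternatives = []
    for n in range(1, max_intervening + 1):
        linker = "~" + "*~" * n  # n wildcard atoms, n + 1 bonds of any type
        alternatives.append(f"$({first_end}{linker}{second_end})")
    return "[" + ",".join(alternatives) + "]"


pattern = crude_zwitterion()
print(pattern.count("$("))  # 9
```

With the default of nine intervening atoms the longest alternative spans ten bonds, matching the description of the entry.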



H-bond Donors & Acceptors
Hydrogen-bond acceptor
       [#6,#7;R0]=[#8]
       Only hits carbonyl and nitroso. Matches a 2-atom pattern consisting of a carbon or nitrogen not in a ring,
       double bonded to an oxygen.
Hydrogen-bond acceptor
       [!$([#6,F,Cl,Br,I,o,s,nX3,#7v5,#15v5,#16v4,#16v6,*+1,*+2,*+3])]
       A H-bond acceptor is a heteroatom with no positive charge; note that negatively charged oxygen or
       sulphur are included. Excluded are halogens (including F), heteroaromatic oxygen, sulphur and pyrrole
       N. Higher oxidation levels of N, P, S are excluded. Note P(III) is currently included. Zeneca's work
       would imply that (O=S=O) should also be excluded.
Hydrogen-bond donor.
       [!$([#6,H0,-,-2,-3])]
       A H-bond donor is a non-negatively charged heteroatom with at least one H
Hydrogen-bond donor.
       [!H0;#7,#8,#9]
       Must have an N-H bond, an O-H bond, or a F-H bond
Possible intramolecular H-bond
       [O,N;!H0]-*~*-*=[$([C,N;R0]=O)]
       Note that the overall SMARTS consists of five atoms. The fifth atom is defined by a "recursive
       SMARTS", where "$()" encloses a valid nested SMARTS and acts syntactically like an atom-primitive
       in the overall SMARTS. Multiple nesting is allowed.


Radicals
Carbon Free-Radical
       [#6;X3v3+0]
       Hits a neutral carbon with three single bonds.
Nitrogen Free-Radical
       [#7;X2v4+0]
       Hits a neutral nitrogen with two single bonds or with a single and a triple bond.


6. Breakdown of Complex SMARTS


                   Amino Acid                                                Ester or Amide



Amino Acid
[$([$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]1[CX4H]([CH2][CH2][CH2]1)[CX3](=[OX1])[OX2H,OX1-,N]),
$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H2][CX3](=[OX1])[OX2H,OX1-,N]),
$([$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N])]
[$(                            Proline
[                                  N:
$([                                   terminal
NX3H                                      neutral
,                                         or
NX4H2+])                                  + charged
,                                     or
$([NX3](C)(C)(C))]1                   internal
[CX4H]                             C: alpha
([CH2][CH2][CH2]1)                     pro side chain
[CX3]                              C: of COOH
(=[OX1])                           O: =O of COOH
[OX2H,OX1-,N]                      O: term COOH (neutral or -) or intern
),                             OR
$(                             Glycine
[                                  N:
$([                                   terminal
NX3H2                                    neutral
,                                        or
NX4H3+])                                 + charged
,                                     or
$([NX3H](C)(C))]                      internal
[CX4H2]                            C: alpha (w/ H side chain)
[CX3]                              C: of COOH
(=[OX1])                           O: =O of COOH
[OX2H,OX1-,N]                      O: term COOH (neutral or -) or intern
),                             OR
$(                             Other amino acid
[                                  N:
$([                                   terminal
NX3H2                                    neutral
,                                        or
NX4H3+])                                 + charged
,                                     or
$([NX3H](C)(C))]                      internal
[CX4H]                             C: alpha
([*])                                  any side chain
[CX3]                              C: of COOH
(=[OX1])                           O: =O of COOH
[OX2H,OX1-,N]                      O: term COOH (neutral or -) or intern
)]
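
The breakdown above maps directly onto string pieces: an N expression, an alpha carbon with its side chain, and a shared carboxyl tail, wrapped as three recursive alternatives. The sketch below reassembles the generic amino acid SMARTS from those pieces; the function name, variable names, and the side_chain parameter are assumptions for illustration.

```python
def amino_acid_smarts(side_chain: str = "[*]") -> str:
    """Reassemble the generic amino acid SMARTS from the annotated pieces."""
    # C of COOH, its =O, then terminal OH / O- or an internal amide N.
    carboxyl = "[CX3](=[OX1])[OX2H,OX1-,N]"
    # Proline N: terminal (neutral or + charged) or internal.
    proline_n = "[$([NX3H,NX4H2+]),$([NX3](C)(C)(C))]"
    # Any other N: terminal (neutral or + charged) or internal.
    other_n = "[$([NX3H2,NX4H3+]),$([NX3H](C)(C))]"

    proline = f"{proline_n}1[CX4H]([CH2][CH2][CH2]1){carboxyl}"
    glycine = f"{other_n}[CX4H2]{carboxyl}"
    other = f"{other_n}[CX4H]({side_chain}){carboxyl}"
    return "[" + ",".join(f"$({part})" for part in (proline, glycine, other)) + "]"


print(amino_acid_smarts())
```

Passing a more specific side_chain expression in place of the default wildcard is how a narrower variant would be produced under this sketch.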



Ester or Amide
[#6][CX3](=O)[$([OX2H0]([#6])[#6]),$([#7])]
[#6]                      An atom that is a carbon
[CX3]                     Connected to an atom that is a three-connected carbon
(=O)                           Which is double bonded to an oxygen
[                         Connected to an atom
$(                             That is in an environment where
[OX2H0]                           An atom that is a two-connected oxygen, without hydrogens
([#6])[#6])                         Is connected to two carbons, one of them being the carbonyl C
,                                  Or
$(                                 That is in an environment where
[#7]                                  An atom that is a nitrogen.
)]



7. Interesting Example SMARTS
Oxygen double bonded to aliphatic carbon or nitrogen, single bonded to an aromatic ring, with a halogen in
meta position
       [#8]=[C,N]-aaa[F,Cl,Br,I]
Aliphatic carbon attached to oxygen with any bond
       C~O
Oxygen or nitrogen, with at least one hydrogen attached and not in a ring
       [O,N;!H0;R0]
Oxygen double bonded to aliphatic carbon or nitrogen
       [#8]=[C,N] or O=[C,N]
Aliphatic atom single-bonded to any carbon which isn't a trifluoromethyl carbon
       A[#6;!$(C(F)(F)F)]
PCB
       [$(c:cCl),$(c:c:cCl),$(c:c:c:cCl)]-[$(c:cCl),$(c:c:cCl),$(c:c:c:cCl)]
       Polychlorinated Biphenyls. Overall SMARTS is atom-bond-atom. Note that ":" is an explicit aromatic
       bond, and "-" is an explicit single bond. On each side of the single bond, we use three nested SMARTS to
       represent the ortho, meta, and para positions.
Imidazolium Nitrogen
       [nX3r5+]:c:n
1-methyl-2-hydroxy benzene with either a Cl or H at the 5 position.
       [c;$([*Cl]),$([*H1])]1ccc(O)c(C)c1 or Cc1:c(O):c:c:[$(cCl),$([cH])]:c1
       The "H" primitive in SMARTS means "total number of attached hydrogens", i.e., [C] will match C in
       [CH4] methane, [CH3] methyl, [CH2] methylene, etc., while [CH3] will only match methyl. This is similar to
       the use of "H" in SMILES to specify hydrogen count. The default value for the SMARTS "H" primitive
       is 1 (same as SMILES, e.g., [CH2]=[CH]-[OH] same as C=CO). This H-specification value includes all
       attached hydrogens: implicit and explicit (e.g., isotopic [2H]).
Nonstandard atom groups.
       [!#1;!#2;!#3;!#5;!#6;!#7;!#8;!#9;!#11;!#12;!#15;!#16;!#17;!#19;!#20;!#35;!#53]
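
The exclusion list above is a chain of negated atomic numbers joined with low-precedence "and", so it can be derived from a list of the elements considered standard. A minimal sketch; the constant and function names are hypothetical, and the atomic numbers are copied from the entry.

```python
# Atomic numbers treated as "standard" in the entry above
# (H, He, Li, B, C, N, O, F, Na, Mg, P, S, Cl, K, Ca, Br, I).
STANDARD_ATOMIC_NUMBERS = [1, 2, 3, 5, 6, 7, 8, 9, 11, 12, 15, 16, 17, 19, 20, 35, 53]


def nonstandard_atom_smarts(standard=STANDARD_ATOMIC_NUMBERS) -> str:
    """Match any atom whose atomic number is not in `standard`."""
    return "[" + ";".join(f"!#{z}" for z in standard) + "]"


print(nonstandard_atom_smarts())
```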

				