					          Sequence Variation in Ensembl

1 of 25

          •   SNPs
          •   SNPs in Ensembl
          •   Linkage disequilibrium
          •   SNPs in BioMart
          •   DAS sources

2 of 25
          Single nucleotide polymorphisms
          • Two human genomes differ by

          • Polymorphism: a DNA variation in
            which each possible sequence is
            present in at least 1% of people

          • Most polymorphisms (~90%) take
            the forms of SNPs: variations that
            involve just one nucleotide
             • ~1 out of every 300 bases in the human genome
             • ~10 million in the human genome
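
The two figures above are consistent with each other. A minimal arithmetic check, assuming a haploid human genome of roughly 3 billion bases (that size is an assumption added here, not a number from the slide, and the function name is illustrative):

```python
# Assumption: ~3 billion bases in the haploid human genome (not stated on the slide).
GENOME_SIZE_BP = 3_000_000_000


def expected_snp_count(genome_size_bp, bases_per_snp=300):
    """Estimate the number of SNPs given one SNP per `bases_per_snp` bases."""
    return genome_size_bp // bases_per_snp


snp_count = expected_snp_count(GENOME_SIZE_BP)
print(f"{snp_count:,} SNPs expected at 1 per 300 bases")
```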
3 of 25
                Functional Consequences

          • SNPs in coding areas that   Cause of most monogenic
            alter aa sequence           disorders, e.g.:
                                          Hemochromatosis (HFE)
                                          Cystic fibrosis (CFTR)
                                          Hemophilia (F8)

          • SNPs in coding areas that   May affect splicing
            don’t alter aa sequence

          • SNPs in promoter or         May affect the level, location or
            regulatory regions          timing of gene expression

          • SNPs in other regions       No direct known impact on
                                        phenotype, useful as markers

4 of 25
              Practical Applications

          • Disease diagnosis
          • Association studies
          • Pharmacogenomics
          • Forensic testing
          • Population genetics and
            evolutionary studies
          • Marker-assisted selection

5 of 25
          Practical Applications

6 of 25
                      SNPs in Ensembl
          • Most SNPs imported from dbSNP (rs……):
             • Imported data: alleles, flanking sequences, frequencies,
             • Calculated data: position, synonymous status, peptide
               shift, ….

          • For human also:
             •   HGVbase
             •   TSC
             •   Affy GeneChip 100K and 500K Mapping Array
             •   Affy Genome-Wide SNP array 6.0
             •   Ensembl-called SNPs (from Celera reads and Jim
                 Watson’s and Craig Venter’s genomes)

          • For mouse, rat, dog and chicken also:
             • Sanger- and Ensembl-called SNPs (other strains / breeds)
7 of 25
          • dbSNP: central repository for simple genetic polymorphisms:
             • single-base nucleotide substitutions
             • small-scale multi-base deletions or insertions
             • retroposable element insertions and microsatellite
               repeat variations


          • For human (dbSNP build 128):
             • 34,434,159 submissions (ss#’s)
             • 11,883,685 RefSNP clusters (rs#’s)
             • 6,262,709 validated
             •    737,679 with frequency
8 of 25
              SNPs in Ensembl - Types
          Non-synonymous          In coding sequence, resulting in an aa change
          Synonymous              In coding sequence, not resulting in an aa change
          Frameshift              In coding sequence, resulting in a frameshift
          Stop lost               In coding sequence, resulting in the loss of a stop codon
          Stop gained             In coding sequence, resulting in the gain of a stop codon

          Essential splice site   In the first 2 or the last 2 basepairs of an intron
          Splice site             1-3 bps into an exon or 3-8 bps into an intron

          Upstream                Within 5 kb upstream of the 5'-end of a transcript
          Regulatory region       In regulatory region annotated by Ensembl
          5' UTR                  In 5' UTR
          Intronic                In intron
          3' UTR                  In 3' UTR
          Downstream              Within 5 kb downstream of the 3'-end of a transcript
          Intergenic              More than 5 kb away from a transcript
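
The location-based rows of the table above (Upstream, Downstream, Intergenic) can be sketched as a small function. This is only an illustration of the 5 kb rule for a single transcript, not Ensembl's own implementation; the function name, the coordinate convention (1-based, start <= end) and the "Within transcript" label are assumptions made for the example.

```python
FLANK_BP = 5000  # "within 5 kb" window taken from the table above


def classify_by_location(pos, tx_start, tx_end, strand):
    """Classify a SNP position relative to one transcript.

    pos, tx_start, tx_end: genomic coordinates, with tx_start <= tx_end.
    strand: +1 for a forward-strand transcript, -1 for reverse.
    """
    if tx_start <= pos <= tx_end:
        # Exon, intron or UTR: needs the transcript structure to refine further.
        return "Within transcript"

    before = pos < tx_start  # SNP lies at a lower genomic coordinate
    distance = tx_start - pos if before else pos - tx_end
    if distance > FLANK_BP:
        return "Intergenic"

    # Lower coordinates are 5' of a forward-strand transcript,
    # but 3' of a reverse-strand transcript.
    on_five_prime_side = before if strand == 1 else not before
    return "Upstream" if on_five_prime_side else "Downstream"


print(classify_by_location(900, 1000, 2000, 1))   # forward-strand transcript
print(classify_by_location(900, 1000, 2000, -1))  # same position, reverse strand
```

The same genomic position is upstream of a forward-strand transcript but downstream of a reverse-strand one, which is why the strand has to be part of the input.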

9 of 25
           SNPs in Ensembl - Species

           •   Human    •   Platypus
           •   Chimp    •   Chicken
           •   Mouse    •   Zebrafish
           •   Rat      •   Tetraodon
           •   Dog      •   Mosquito
           •   Cow

10 of 25

           For human, mouse and rat Ensembl defines all
           SNP alleles respective to the + strand of the
           genome assembly! (to be able to merge dbSNP
           data with Sanger resequencing data)
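
A minimal sketch of what the + strand convention means in practice: alleles described on the strand of a reverse-strand gene have to be reverse-complemented before they match the genome-assembly view. The function names and the "X/Y" allele string format are assumptions for the example, not an Ensembl API.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "-": "-"}


def reverse_complement(allele):
    """Reverse-complement one allele; '-' (deletion) is left unchanged."""
    return "".join(COMPLEMENT[base] for base in reversed(allele))


def alleles_on_plus_strand(alleles, strand):
    """Convert an allele string such as 'C/T' to + strand orientation.

    strand: +1 if the alleles are already given on the + strand,
            -1 if they are given on the reverse strand.
    """
    if strand == 1:
        return alleles
    return "/".join(reverse_complement(a) for a in alleles.split("/"))


print(alleles_on_plus_strand("C/T", -1))  # alleles described on the reverse strand
```

So a C/T change reported relative to a reverse-strand transcript corresponds to G and A on the + strand of the assembly.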

           Those cases where SNPs are shown as part of a

11 of 25
                      5 MINUTE EXERCISE
           A missense SNP, C1858T, in PTPN22 (Tyrosine-protein
           phosphatase non-receptor type 22) has been identified as a
           genetic risk factor for rheumatoid arthritis.
           This SNP is also referred to as R620W.

           1.   Find the SNPView page for this SNP.

           2.   Why are the alleles on this page given as A/G?

           3.   What is the minor allele of this SNP in Caucasians?

12 of 25
           SNPs in Ensembl
             GeneSNPView (1)


                               InterPro domains

                                        SNP alleles

13 of 25
           SNPs in Ensembl
             GeneSNPView (2)

14 of 25
                    SNPs in Ensembl
                      TranscriptSNPView (1)

           Shows SNP alleles in different:

           • Individuals (human):
             Celera HuAA, HuCC, HuDD and HuFF,
             Craig Venter, Jim Watson

           • Strains (mouse, rat)

           • Breeds (chicken, dog)

15 of 25
                         SNPs in Ensembl
                          TranscriptSNPView (2)

           individuals                            Resequencing

                                                    SNP alleles

                                                          Alleles in
                                                    different individuals

16 of 25
           SNPs in Ensembl
            TranscriptSNPView (3)

17 of 25
                    5 MINUTE EXERCISE
           1.   Find the TranscriptSNPView page for human PTPN22.

           2.   Do all individuals (HuAA, HuCC, HuDD, HuFF, Venter
                and Watson) have resequence coverage at the
                position of the C1858T (R620W) SNP?

           3.   Does any of the individuals have a higher risk of developing
                rheumatoid arthritis based on their genotype at this
                position?

           4.   Is there an individual who is heterozygous at this
                position?

18 of 25
                Haplotypes and Linkage

           A haplotype is a set of SNPs on a single
           chromatid that are statistically associated

           Linkage disequilibrium describes a
           situation in which some combinations of
           SNP alleles occur more or less frequently
           in a population than would be expected
           from a random formation of haplotypes
           from alleles based on their frequencies

19 of 25
                         Measures of LD
           • D = P(AB) – P(A)P(B)
              • D ranges from –0.25 to +0.25
              • D = 0 indicates linkage equilibrium
              • its range depends on allele frequencies, so values are hard
                to compare between SNP pairs

           • D’ = D / maximum possible value
              • D’ = 1 indicates perfect LD
              • estimates of D’ strongly inflated in small samples

           • r² = D² / (P(A)P(B)P(a)P(b))
              • r² = 1 indicates perfect LD
              • measure of choice
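
The three measures can be computed from one haplotype frequency and two allele frequencies. The sketch below follows the formulas on this slide; the function name is illustrative, and two details are assumptions not stated on the slide: D' is reported as an absolute value, and the "maximum possible value" is taken from the standard bounds on D for the given allele frequencies.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus (so P(a) = 1 - p_a)
    p_b:  frequency of allele B at the second locus (so P(b) = 1 - p_b)
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")

    d = p_ab - p_a * p_b  # D = P(AB) - P(A)P(B)

    # Largest |D| compatible with these allele frequencies (depends on the sign of D).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max  # assumption: D' reported as an absolute value

    r2 = d * d / (p_a * p_b * (1 - p_a) * (1 - p_b))
    return d, d_prime, r2


# Example: allele A is only ever seen together with allele B,
# but B also occurs without A.
d, d_prime, r2 = ld_measures(p_ab=0.3, p_a=0.3, p_b=0.5)
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r2 = {r2:.3f}")
```

In this example D' reaches 1 while r² stays well below 1, because r² only reaches 1 when the two SNPs also have matching allele frequencies.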

20 of 25
           Linkage Disequilibrium


                              It is also possible
                              to export SNP
                              information for
                              upload into the
                              software tool

21 of 25
           Linkage Disequilibrium

22 of 25
                    5 MINUTE EXERCISE

           Retrieve all non-synonymous SNPs for the human
           CFTR gene using BioMart and export their id,
           genomic position, alleles and peptide shift
           (hint: which dataset should you start with?).

23 of 25
                           DAS Sources
           For human, data from the following DAS Sources can be
           visualised on ContigView:

           •   DGV and DGV loci:
               Structural variations from the Database of Genomic
               Variations (CNVs, InDels, inversions etc.)

           •   RedonCNV regions and RedonCNV loci:
               Copy number variations from Redon et al. paper

           •   SegDup Washu:
               Segmental Duplications, University of Washington

24 of 25
           QUESTIONS

            ANSWERS
25 of 25
