Variations

					          Sequence Variation in Ensembl




1 of 25
                          Outline

          •   SNPs
          •   SNPs in Ensembl
          •   Linkage disequilibrium
          •   SNPs in BioMart
          •   DAS sources




2 of 25
          Single nucleotide polymorphisms
                       (SNPs)
          • Two human genomes differ by
            ~0.1%

          • Polymorphism: a DNA variation in
            which each possible sequence is
            present in at least 1% of people

          • Most polymorphisms (~90%) take
            the form of SNPs: variations that
            involve just one nucleotide
             • ~1 out of every 300 bases in the human
               genome
             • ~10 million in the human genome
3 of 25
                Functional Consequences

          SNP location                  Possible consequence

          • Coding regions, altering    Cause of most monogenic
            the amino acid sequence     disorders, e.g.:
                                          Hemochromatosis (HFE)
                                          Cystic fibrosis (CFTR)
                                          Hemophilia (F8)

          • Coding regions, not         May affect splicing
            altering the amino acid
            sequence

          • Promoter or regulatory      May affect the level, location or
            regions                     timing of gene expression

          • Other regions               No known direct impact on
                                        phenotype; useful as markers

4 of 25
              Practical Applications

          • Disease diagnosis
          • Association studies
          • Pharmacogenomics
          • Forensic testing
          • Population genetics and
            evolutionary studies
          • Marker-assisted selection


5 of 25
          Practical Applications




6 of 25
                      SNPs in Ensembl
          • Most SNPs imported from dbSNP (rs……):
             • Imported data: alleles, flanking sequences, frequencies,
               ….
             • Calculated data: position, synonymous status, peptide
               shift, ….

          • For human also:
             •   HGVbase
             •   TSC
             •   Affy GeneChip 100K and 500K Mapping Array
             •   Affy Genome-Wide SNP array 6.0
             •   Ensembl-called SNPs (from Celera reads and Jim
                 Watson’s and Craig Venter’s genomes)

          • For mouse, rat, dog and chicken also:
             • Sanger- and Ensembl-called SNPs (other strains / breeds)
7 of 25
                                dbSNP
          • Central repository for simple genetic
            polymorphisms:
             • single-base nucleotide substitutions
             • small-scale multi-base deletions or insertions
             • retroposable element insertions and microsatellite
               repeat variations


          • http://www.ncbi.nlm.nih.gov/SNP/index.html

          • For human (dbSNP build 128):
             • 34,434,159 submissions (ss#’s)
             • 11,883,685 RefSNP clusters (rs#’s)
             • 6,262,709 validated
             •    737,679 with frequency
8 of 25
              SNPs in Ensembl - Types
          Non-synonymous          In coding sequence, resulting in an aa change
          Synonymous              In coding sequence, not resulting in an aa change
          Frameshift              In coding sequence, resulting in a frameshift
          Stop lost               In coding sequence, resulting in the loss of a stop codon
          Stop gained             In coding sequence, resulting in the gain of a stop codon

          Essential splice site   In the first 2 or the last 2 basepairs of an intron
          Splice site             1-3 bps into an exon or 3-8 bps into an intron

          Upstream                Within 5 kb upstream of the 5'-end of a transcript
          Regulatory region       In regulatory region annotated by Ensembl
          5' UTR                  In 5' UTR
          Intronic                In intron
          3' UTR                  In 3' UTR
          Downstream              Within 5 kb downstream of the 3'-end of a transcript
          Intergenic              More than 5 kb away from a transcript
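
The coding-sequence categories in the table above can be illustrated with a minimal sketch. It assumes the standard genetic code and a substitution confined to a single codon. The function name `coding_consequence` is hypothetical, and this is not Ensembl's own implementation; frameshifts, splice sites and the positional categories are not covered.

```python
# Standard genetic code, with codons enumerated in base order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def coding_consequence(ref_codon: str, alt_codon: str) -> str:
    """Classify a substitution within one codon (hypothetical helper).

    Returns one of the coding category names used in the table above.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "Synonymous"
    if alt_aa == "*":          # '*' marks a stop codon
        return "Stop gained"
    if ref_aa == "*":
        return "Stop lost"
    return "Non-synonymous"


if __name__ == "__main__":
    # An arginine codon changed to a tryptophan codon by a C -> T substitution.
    print(coding_consequence("CGG", "TGG"))
    # Two leucine codons.
    print(coding_consequence("CTG", "CTA"))
```

A lookup table keeps the classification to a few comparisons; a production tool would also need the transcript structure and reading frame to find the affected codon in the first place.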




9 of 25
           SNPs in Ensembl - Species

           •   Human    •   Platypus
           •   Chimp    •   Chicken
           •   Mouse    •   Zebrafish
           •   Rat      •   Tetraodon
           •   Dog      •   Mosquito
           •   Cow


10 of 25
                           Caveat

           For human, mouse and rat, Ensembl reports all
           SNP alleles relative to the + strand of the
           genome assembly, so that dbSNP data can be
           merged with Sanger resequencing data.

           Exceptions:
           Cases where SNPs are shown as part of a
           sequence
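
As a rough illustration of what reporting alleles on the + strand involves, the sketch below converts an allele string reported on the - strand to its + strand equivalent by reverse-complementing each allele. The function name and the `+1`/`-1` strand encoding are assumptions made for this example, not an Ensembl API.

```python
# Translation table for complementing DNA bases; '-' (deletion) is left as is.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def alleles_on_plus_strand(alleles: str, strand: int) -> str:
    """Express an allele string such as 'C/T' on the + strand (hypothetical helper).

    strand is +1 if the alleles are already given on the + strand,
    -1 if they are given on the - strand.
    """
    if strand == 1:
        return alleles
    if strand == -1:
        # Reverse-complement each allele; for single bases this is just the complement.
        return "/".join(a.translate(COMPLEMENT)[::-1] for a in alleles.split("/"))
    raise ValueError("strand must be +1 or -1")


if __name__ == "__main__":
    print(alleles_on_plus_strand("C/T", -1))   # alleles of a SNP reported on the - strand
    print(alleles_on_plus_strand("AG/-", -1))  # a two-base deletion reported on the - strand
```

This is why a SNP described in the literature relative to a gene's coding strand can appear with complementary alleles when the gene lies on the - strand.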




11 of 25
                      5 MINUTE EXERCISE
           A missense SNP, C1858T, in PTPN22 (Tyrosine-protein
           phosphatase non-receptor type 22) has been identified as a
           genetic risk factor for rheumatoid arthritis.
           This SNP is also referred to as R620W.

           1.   Find the SNPView page for this SNP.

           2.   Why are the alleles on this page given as A/G?

           3.   What is the minor allele of this SNP in Caucasians?



12 of 25
           SNPs in Ensembl
             GeneSNPView (1)




           [Figure: GeneSNPView display. Labelled features: Transcript, InterPro domains, SNP alleles]



13 of 25
           SNPs in Ensembl
             GeneSNPView (2)




14 of 25
                    SNPs in Ensembl
                      TranscriptSNPView (1)


           Shows SNP alleles in different:

           • Individuals (human):
             Celera HuAA, HuCC, HuDD and HuFF,
             Craig Venter, Jim Watson

           • Strains (mouse, rat)

           • Breeds (chicken, dog)


15 of 25
                         SNPs in Ensembl
                          TranscriptSNPView (2)




           [Figure: TranscriptSNPView display. Labelled features: Different individuals, Resequencing coverage, SNP alleles, Alleles in different individuals]



16 of 25
           SNPs in Ensembl
            TranscriptSNPView (3)




17 of 25
                    5 MINUTE EXERCISE
           1.   Find the TranscriptSNPView page for human PTPN22.

           2.   Do all individuals (HuAA, HuCC, HuDD, HuFF, Venter
                and Watson) have resequencing coverage at the
                position of the C1858T (R620W) SNP?

           3.   Do any of the individuals have a higher risk of
                developing rheumatoid arthritis based on their
                genotype at this position?

           4.   Is any individual heterozygous at this
                position?


18 of 25
                Haplotypes and Linkage
                    Disequilibrium

           A haplotype is a set of SNPs on a single
           chromatid that are statistically associated

           Linkage disequilibrium (LD) describes a
           situation in which some combinations of
           SNP alleles occur more or less frequently
           in a population than would be expected if
           haplotypes formed at random from alleles
           according to their frequencies


19 of 25
                         Measures of LD
           • D = P(AB) - P(A)P(B)
              • D ranges from -0.25 to +0.25
              • D = 0 indicates linkage equilibrium
              • dependent on allele frequencies, therefore of little use
                for comparing SNP pairs


           • D' = D / D_max, where D_max is the maximum possible value
             of D given the allele frequencies
              • D' = 1 indicates perfect LD
              • estimates of D' are strongly inflated in small samples


           • r² = D² / (P(A)P(B)P(a)P(b))
              • r² = 1 indicates perfect LD
              • measure of choice
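
The three measures above can be computed from one haplotype frequency and two allele frequencies. The sketch below is a minimal illustration for two biallelic SNPs; the function name and argument names are assumptions for this example, and it returns the absolute value of D' so that the result lies between 0 and 1.

```python
from __future__ import annotations


def ld_measures(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return (D, |D'|, r^2) for two biallelic SNPs (hypothetical helper).

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first SNP  (P(a) = 1 - p_a)
    p_b:  frequency of allele B at the second SNP (P(b) = 1 - p_b)
    """
    d = p_ab - p_a * p_b

    # The largest magnitude D can reach depends on its sign and on the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denominator if denominator > 0 else 0.0
    return d, d_prime, r_squared


if __name__ == "__main__":
    # Alleles A and B always occur together, both at frequency 0.5.
    print(ld_measures(0.5, 0.5, 0.5))
    # Rare allele B is only ever seen with A, but A is also common without B.
    print(ld_measures(0.1, 0.5, 0.1))
```

The second call shows why r² is usually preferred: |D'| reaches 1 whenever one of the four haplotypes is absent, whereas r² stays low unless the allele frequencies at the two SNPs also match.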



20 of 25
           Linkage Disequilibrium

                                  LDView




                              It is also possible
                              to export SNP
                              information for
                              upload into the
                              HaploView
                              software tool



21 of 25
           Linkage Disequilibrium
                  LDTableView




22 of 25
                    5 MINUTE EXERCISE

           Retrieve all non-synonymous SNPs for the human
           CFTR gene using BioMart and export their id,
           genomic position, alleles and peptide shift
           (hint: which dataset should you start with?).




23 of 25
                           DAS Sources
           For human, data from the following DAS Sources can be
           visualised on ContigView:

           •   DGV and DGV loci:
               Structural variations from the Database of Genomic
               Variants (CNVs, InDels, inversions etc.)

           •   RedonCNV regions and RedonCNV loci:
               Copy number variations from the Redon et al. paper

           •   SegDup Washu:
               Segmental duplications, University of Washington



24 of 25
           Q U E S T I O N S

            A N S W E R S
25 of 25

				