									             Online Resources for
      Genetic Variation Study – Part One
              Workshop Attendees:
    Please complete the workshop sign-in form.
To help us develop bioinformatics workshops that
are more relevant to your research, please take our
        online User Needs Survey, thanks!
 NML Bioinformatics Service User Needs Survey:

 From the NML Bioinformatics web page, click “NML Support Requests” under the “Support Request” section.
 Then click “this online user needs survey” under the “Tell us how to serve your information needs better!” section.

             Yi-Bu Chen, Ph.D.
          Bioinformatics Specialist
           Norris Medical Library
      University of Southern California


              323-442-3309
       yibuchen@belen.hsc.usc.edu

                                      Dec. 6, 2007
                   Workshop Outline
 Overview of Bioinformatics Support Program at NML
 Human Genetic Variation Overview
    Main types of genetic variations
    Basics of the single nucleotide polymorphisms (SNPs)
 NCBI Genetic Variation Resources: dbSNP and OMIM
    dbSNP overview
    dbSNP search examples
    OMIM overview
 International HapMap Project
    The HapMap project: overview and major findings
    HapMap search examples
   The Perlegen Genetic Variation Database
   Genome Variation Server (SeattleSNPs)
   Ensembl SNPs
   Hands-on Search Question
Polymorphisms: How different are we?




Human vs. Chimp: ~96% similar overall (~99% similar in terms of SNPs)
Human vs. Human: ~99.9% similar, with around 3.2 million single nucleotide differences (SNPs account for up to 90% of all genomic variations; total possible SNPs near 12 million)
       Adapted from a lecture slide by Jonathan Wren, NYU
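As a rough consistency check on the human-vs-human figure, dividing the quoted number of single nucleotide differences by the genome size used elsewhere in this workshop (3.2 × 10⁹ bp) gives the fraction of differing positions. This is a back-of-the-envelope sketch that ignores indels and other variant types:

```latex
\frac{3.2 \times 10^{6}\ \text{differences}}{3.2 \times 10^{9}\ \text{bp}} = 10^{-3} = 0.1\%
\qquad\Rightarrow\qquad 100\% - 0.1\% = 99.9\%\ \text{similar}
```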
Why do we care about genetic variations?
 1. Genetic variations underlie phenotypic differences among individuals.

 2. Genetic variations influence our predisposition to complex diseases and our responses to drugs and environmental factors.

 3. Genetic variations reveal clues to ancestral human migration history.
           Main Types of Genetic Variations
A. Single nucleotide mutation
    Resulting in single nucleotide polymorphisms (SNPs)
    Accounts for up to 90% of human genetic variations
    The majority of SNPs do NOT directly or significantly contribute to any phenotype

B. Insertion or deletion of one or more nucleotide(s)
   1. Tandem repeat polymorphisms
    Tandem repeats are genomic regions in which a sequence motif is repeated head-to-tail, with a copy number that varies between individuals.
    Used as genetic markers for DNA fingerprinting (forensics, parentage testing)
    Some repeat expansions cause genetic diseases (e.g., the CAG repeat expansion in Huntington's disease)
       Microsatellites (Short Tandem Repeats): repeat unit 1-6 bases long
       Minisatellites: repeat unit 11-100 bases long

   2. Insertion/Deletion (INDEL or DIPS) polymorphisms
      Often result from localized rearrangements between homologous tandem repeats.

C. Gross chromosomal aberration
    Deletions, inversions, or translocations of large DNA fragments
    Rare, but often cause serious genetic diseases
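To make the tandem repeat definition concrete, here is a minimal sketch that scans a sequence for the longest exact tandem repeat and labels it with the unit-length ranges quoted above (microsatellite 1-6 bases, minisatellite 11-100 bases). The function names and the example sequence are hypothetical; real repeat finders such as dedicated STR callers also tolerate mismatches and partial copies, which this sketch does not.

```python
def longest_tandem_repeat(seq: str, max_unit: int = 6):
    """Return (motif, copies, start) for the exact tandem repeat covering the
    most bases, considering unit lengths 1..max_unit. Returns ("", 0, -1)
    if no motif occurs at least twice in a row."""
    seq = seq.upper()
    best = ("", 0, -1)
    for unit in range(1, max_unit + 1):
        for start in range(len(seq) - unit + 1):
            motif = seq[start:start + unit]
            copies, pos = 1, start + unit
            # Extend while the next block of `unit` bases repeats the motif.
            while seq[pos:pos + unit] == motif:
                copies += 1
                pos += unit
            # Keep the run covering the most bases; ties keep the earlier
            # candidate (shorter unit first, then earlier start).
            if copies >= 2 and copies * unit > best[1] * len(best[0]):
                best = (motif, copies, start)
    return best


def classify_repeat_unit(unit_len: int) -> str:
    """Label a repeat by unit length, using the ranges quoted on this slide."""
    if 1 <= unit_len <= 6:
        return "microsatellite (STR)"
    if 11 <= unit_len <= 100:
        return "minisatellite"
    return "unclassified"


motif, copies, start = longest_tandem_repeat("GATTACACACACAGG")
print(motif, copies, start, classify_repeat_unit(len(motif)))
```

For the example sequence this reports the (AC)4 run starting at index 4 as a microsatellite. Note that the slide's two ranges leave unit lengths 7-10 unassigned, so the sketch returns "unclassified" for them rather than guessing.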
  How many variations are present in the human genome?
 SNPs appear once per 0.1-1 kb interval, or on average once per 300 bp. Given the size of the entire human
  genome (3.2 × 10⁹ bp), the total number of SNPs is roughly 11 million (3.2 × 10⁹ / 300 ≈ 1.07 × 10⁷).
  Their high density and relatively easy assays make SNPs ideal genomic markers.

 In silico estimates put the number of potentially polymorphic variable number tandem repeats (VNTRs)
  at over 100,000 across the human genome.

 Short insertions/deletions are very difficult to quantify; their number likely falls between
  those of SNPs and VNTRs.
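The SNP estimate in the first bullet is simply genome size divided by average SNP spacing. A small sketch makes that arithmetic explicit for the quoted range of spacings (one SNP per 0.1-1 kb, average one per 300 bp); the function name is illustrative only.

```python
GENOME_SIZE_BP = 3.2e9  # human genome size quoted on this slide


def expected_snp_count(bp_per_snp: float, genome_size_bp: float = GENOME_SIZE_BP) -> float:
    """Expected number of SNPs if, on average, one SNP occurs every `bp_per_snp` bases."""
    if bp_per_snp <= 0:
        raise ValueError("spacing must be positive")
    return genome_size_bp / bp_per_snp


# The slide's range (one SNP per 0.1-1 kb) and its stated average (one per 300 bp).
for spacing in (100, 300, 1000):
    print(f"1 SNP per {spacing} bp -> {expected_snp_count(spacing) / 1e6:.1f} million SNPs")
```

The 300 bp average gives about 10.7 million SNPs, while the ends of the 0.1-1 kb range span roughly 3.2 to 32 million, which is why counts in this period are usually quoted as "around 10 million or more".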
 Types of Single Base Substitutions

 Transitions
  Change of one purine (A,G) for another purine, or a
  pyrimidine (C,T) for another pyrimidine

 Transversions
  Change of a purine (A,G) for a pyrimidine (C,T), or
  vice versa.

 The cytosine to thymine (C>T) transition accounts
  for approximately 2 out of every 3 SNPs in the human
  genome.
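The transition/transversion rule above depends only on whether both bases belong to the same chemical class. A minimal sketch of that rule (the function name is illustrative):

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base substitution ref>alt as a transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in PURINES | PYRIMIDINES or alt not in PURINES | PYRIMIDINES or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Transition: both bases in the same class (purine<->purine or pyrimidine<->pyrimidine).
    if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        return "transition"
    return "transversion"


for change in ("C>T", "A>G", "A>T", "G>C"):
    ref, alt = change.split(">")
    print(change, substitution_type(ref, alt))
```

With four bases there are 4 possible transitions but 8 possible transversions, so the prevalence of C>T changes noted above is not explained by counting alone.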
           SNP or Mutation?
 Call it a SNP IF
  the single base change occurs in a population at a
  frequency of 1% or higher.

 Call it a mutation IF
  the single base change occurs in less than 1% of a
  population.

 A SNP is a polymorphic position where the point
  mutation has become established in the population at an appreciable frequency (1% or higher).
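The 1% convention above is a threshold on the minor allele frequency (MAF). A minimal sketch for a biallelic site, using hypothetical allele counts (function names are illustrative, not from any particular tool):

```python
def minor_allele_frequency(allele_counts: dict) -> float:
    """MAF at a biallelic site from allele counts, e.g. {"C": 180, "T": 20}."""
    total = sum(allele_counts.values())
    if total <= 0:
        raise ValueError("no alleles counted")
    if len(allele_counts) == 1:
        return 0.0  # monomorphic in this sample
    if len(allele_counts) != 2:
        raise ValueError("this sketch handles biallelic sites only")
    return min(allele_counts.values()) / total


def snp_or_mutation(allele_counts: dict, threshold: float = 0.01) -> str:
    """Apply the 1% population-frequency convention described above."""
    return "SNP" if minor_allele_frequency(allele_counts) >= threshold else "mutation (rare variant)"


print(snp_or_mutation({"C": 180, "T": 20}))  # MAF 0.10
print(snp_or_mutation({"C": 995, "T": 5}))   # MAF 0.005
```

In practice the label also depends on the sample: a variant rare in one population can exceed 1% in another, so frequencies should be read per population.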
From a Mutation to a SNP
                     SNPs Classification
SNPs can occur anywhere in the genome and are classified by their location:

 Intergenic region
 Gene region
    can be further classified by location within the gene: promoter, UTR, intronic, or exonic (coding) region
            Coding Region SNPs
 Synonymous
 Non-Synonymous
   Missense – amino acid change
   Nonsense – changes an amino acid codon to a stop codon.




                                                   Geospiza Green Arrow™ tutorial by Sandra Porter, Ph.D.
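The synonymous/missense/nonsense categories can be decided by translating the reference and mutant codons with the standard genetic code. The sketch below assumes the codon is given on the coding strand and the SNP position is 0-based within the codon; the function name is hypothetical, and the extra "stop-loss" label (a stop codon mutated to an amino acid codon) is added only so that case is not mislabeled as missense.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_coding_snp(codon: str, pos: int, alt: str) -> str:
    """Classify a SNP at 0-based position `pos` of a reference codon (coding strand DNA)."""
    codon, alt = codon.upper(), alt.upper()
    mutant = codon[:pos] + alt + codon[pos + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"   # amino acid codon -> stop codon
    if ref_aa == "*":
        return "stop-loss"  # not covered on the slide; labeled separately
    return "missense"


# GAA encodes glutamate (E); three hypothetical changes at different codon positions.
print(classify_coding_snp("GAA", 2, "G"))  # GAG, still E
print(classify_coding_snp("GAA", 0, "T"))  # TAA, stop
print(classify_coding_snp("GAA", 1, "T"))  # GTA, valine
```

Third-position changes are often synonymous because of codon degeneracy, which is why many coding SNPs fall in the synonymous class.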
       The Consequences of SNPs
  The phenotypic consequence of a SNP depends strongly
  on where it occurs and on the nature of the change.
 No consequence
 Affect gene transcription quantitatively or
  qualitatively.
 Affect gene translation quantitatively or
  qualitatively.
 Change protein structure and function.
 Change gene regulation at different steps.
 Simple/Complex Genetic Diseases and SNPs
 Simple genetic diseases (Mendelian diseases) are
  often caused by mutations in a single gene.
  -- e.g. Huntington’s, Cystic fibrosis, PKU, etc.
 Many complex diseases result from variants in multiple
  genes, interactions among those genes, and interactions
  with environmental factors.
  -- e.g. cancers, heart diseases, Alzheimer's, diabetes,
  asthma, etc.
 The majority of SNPs may not directly cause any
  disease.
 SNPs are ideal genomic markers (dense and easy to
  assay) for locating disease loci in association studies.
  Main Genetic Variation Resources
 NCBI dbSNP
    http://www.ncbi.nlm.nih.gov/SNP/index.html

 NCBI Online Mendelian Inheritance in Man
  (OMIM)
    http://www.ncbi.nlm.nih.gov/sites/entrez?db=OMIM
 International HapMap Project
    http://www.hapmap.org/
 Perlegen
    http://genome.perlegen.com
 Genome Variation Server (Seattle SNPs)
    http://gvs.gs.washington.edu/GVS/
  Where to Find Bioinformatics Resources for
          Genetic Variation Studies?

 OBRC: Online Bioinformatics
  Resources Collection (Univ. of Pittsburgh)
   http://www.hsls.pitt.edu/guides/genetics/obrc
   The most comprehensive annotated collection of bioinformatics
   databases and software tools on the Web, with over 200
   resources relevant to genetic variation studies.


 HUGO Mutation Database Initiative
   http://www.hgvs.org/dblist/dblist.html
     NCBI dbSNP Database: Overview
 URL: http://www.ncbi.nlm.nih.gov/SNP/index.html

 The NCBI’s Single Nucleotide Polymorphism
  database (dbSNP) is the largest and primary
  public-domain archive for simple genetic variation
  data.

 Polymorphism data in dbSNP include:
     Single-base nucleotide substitutions (SNPs)
     Small-scale multi-base deletions or insertions (also called
      deletion/insertion polymorphisms, DIPs, or INDELs)
     Microsatellite tandem repeat variations (also called short
      tandem repeats or STRs).
dbSNP Data Stats (build 128, Oct. 2007)
http://www.ncbi.nlm.nih.gov/SNP/snp_summary.cgi
            dbSNP Data Types
dbSNP contains two classes of records:
   Submitted records
     The original observations of sequence variation;
     submitted SNP (ss) record IDs start with ss
     (e.g., ss5586300)
   Computationally annotated records
     Generated during the dbSNP "build" cycle by
     computation based on the original submitted
     data; Reference SNP cluster (refSNP) IDs start
     with rs (e.g., rs4986582)
            dbSNP Submitted Record
   Provides information on the SNP and the conditions under
    which it was collected.
  Provides links to collection methods (assay technique),
   submitter information (contact data, individual submitter),
   and variation data (frequencies, genotypes).




ss5586300
From Submitted Record to Reference SNP Cluster

1. SNP records are submitted by researchers.
2. The SNP position is mapped to the reference genomic contigs.
3a. If the SNP position is not unique (a refSNP already exists at that position), the submission is assigned to the existing RefSNP cluster.
3b. If the SNP position is unique, a new rs# is assigned.
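The clustering step above can be sketched as grouping submissions by mapped position. This is a simplified illustration under stated assumptions: submissions are already mapped to a (contig, coordinate) pair, and only that position decides membership. The real dbSNP build applies additional criteria not shown here. All ss IDs, contig names, and the generated rs numbers are hypothetical placeholders, not real dbSNP records.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class RefSNPCluster:
    rs_id: str
    position: Tuple[str, int]                          # (reference contig, coordinate)
    members: List[str] = field(default_factory=list)   # submitted (ss) IDs


def cluster_submissions(submissions: Iterable[Tuple[str, str, int]],
                        first_rs: int = 1) -> List[RefSNPCluster]:
    """Group (ss_id, contig, coordinate) records: a submission whose mapped
    position already has a cluster joins it; otherwise a new rs ID is issued.
    The rs numbering here is a placeholder, not real dbSNP numbering."""
    clusters = {}
    next_rs = first_rs
    for ss_id, contig, coord in submissions:
        key = (contig, coord)
        if key not in clusters:
            clusters[key] = RefSNPCluster(f"rs{next_rs}", key)
            next_rs += 1
        clusters[key].members.append(ss_id)
    return list(clusters.values())


# Hypothetical submissions; ssA and ssC map to the same position.
subs = [("ssA", "contig1", 100), ("ssB", "contig1", 250), ("ssC", "contig1", 100)]
for c in cluster_submissions(subs):
    print(c.rs_id, c.position, c.members)
```

This mirrors why one refSNP (rs) record can list several submitted (ss) records from different groups.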
   Different Ways to Search SNPs in dbSNP
 dbSNP Web site
   http://www.ncbi.nlm.nih.gov/SNP/index.html
   Direct search of ss records; batch search; SNP record
   submission; NO search limits
 Entrez SNP
   http://www.ncbi.nlm.nih.gov/sites/entrez?db=Snp
   Search limit options allow precise retrieval
 Entrez Gene Record’s SNP Links Out Feature
   Direct links to corresponding SNP records; access to genotype
   and linkage disequilibrium data
 NCBI’s MapViewer
   Visualize SNPs in the genomic context along with other types
   of genetic data.
      Search SNPs from dbSNP Web Page
 dbSNP Web site
   http://www.ncbi.nlm.nih.gov/SNP/index.html
   Search SNPs from Entrez SNP Web Page
 Entrez SNP
  http://www.ncbi.nlm.nih.gov/sites/entrez?db=Snp
  dbSNP is part of the Entrez integrated information
  retrieval system and may be searched using either field qualifiers
  (aliases) or a combination of search limits from 14
  categories.
                      Entrez SNP Search Limits
   Organisms
   Chromosome (including W and Z for non-mammals)
   Chromosome Ranges
   Map Weight (how many times the SNP maps to the genome)
   Function Class (coding non-synonymous; intron; etc.)
   SNP Class (types of variations)
   Method Class (methods for determining the variations)
   Validation Status (whether and how the data were validated)
   Variation Alleles (using IUPAC codes)
   Annotation (records with links to other NCBI databases)
   Heterozygosity (% of heterozygous genotype)
   Success Rate (likelihood that the SNP is real)
   Created Build ID
   Updated Build ID

http://www.ncbi.nlm.nih.gov/portal/query.fcgi?db=Snp
http://www.ensembl.org/common/helpview?kw=snpview;ref=
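The Variation Alleles limit takes IUPAC ambiguity codes, which encode a set of possible bases as one letter (e.g., an A/G SNP is R). A minimal sketch of that mapping, assuming alleles are written slash-separated as in dbSNP displays; the function name is illustrative.

```python
# IUPAC nucleotide ambiguity codes for sets of two or more bases.
IUPAC_CODES = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def alleles_to_iupac(alleles: str) -> str:
    """Convert an allele string such as 'A/G' to its IUPAC code ('R')."""
    bases = frozenset(alleles.upper().replace("/", ""))
    if not bases or not bases <= frozenset("ACGT"):
        raise ValueError(f"unexpected alleles: {alleles!r}")
    if len(bases) == 1:
        return next(iter(bases))  # a single base is its own code
    return IUPAC_CODES[bases]


for a in ("A/G", "C/T", "A/C/T", "A/C/G/T"):
    print(a, alleles_to_iupac(a))
```

Because the lookup uses sets, allele order does not matter: "G/A" and "A/G" both map to R.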
              Search dbSNP: Example 1
Some mutations in the human BRCA1 gene
have been reported to be involved in the
early onset of breast cancer.
Retrieve all validated non-synonymous
coding reference SNPs for BRCA1 from
dbSNP.

Hint: start from Entrez SNP: http://www.ncbi.nlm.nih.gov/sites/entrez?db=Snp
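The same Entrez SNP query can also be issued programmatically through NCBI's E-utilities ESearch service (db=snp, with the query in the term parameter). The sketch below only builds the URL and does not send it. The field tags in the example term ([Gene Name], [Organism]) are assumptions to check against the Entrez SNP search field list; further limits for this exercise, such as function class and validation status, would be appended with AND using their own field tags, which are not specified here.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint; db=snp selects dbSNP.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def snp_esearch_url(term: str, retmax: int = 100) -> str:
    """Build (but do not send) an ESearch URL for an Entrez SNP query."""
    params = {"db": "snp", "term": term, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


# Hypothetical query: field tags are assumptions, see the note above.
print(snp_esearch_url("BRCA1[Gene Name] AND human[Organism]"))
```

urlencode takes care of escaping the brackets and spaces, so the term can be written exactly as it would be typed into the Entrez search box.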
Entrez SNP Search Results Example 1
dbSNP Ref SNP Record Example 1: Summary
       http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=4986852




This Ref SNP cluster contains multiple submitted SNP records from different groups.
dbSNP Ref SNP Record Example 1:
 SNP position and the flanking region
               dbSNP Ref SNP Record Example 1:
                 GeneView of an individual SNP




Because of alternative splicing, the very same SNP
can fall in different regions of different transcripts.
dbSNP Ref SNP Record Example 1:
  TableView of an individual SNP


Notice that the individual SNP maps to the same position on the reference genomic contig, but to different positions on mRNAs and proteins because of alternative splicing.
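Why one genomic position lands at different mRNA positions can be shown by mapping a genomic coordinate onto each isoform's exons. This sketch assumes a plus-strand gene with 1-based, inclusive exon coordinates listed in transcript order; the exon coordinates and the function name are hypothetical. Minus-strand genes would need the exons walked in reverse with reversed offsets, which is not handled here.

```python
from typing import List, Optional, Tuple


def genomic_to_transcript(pos: int, exons: List[Tuple[int, int]]) -> Optional[int]:
    """Map a genomic coordinate to a 1-based transcript coordinate.

    Assumes a plus-strand transcript whose exons are given as 1-based,
    inclusive (start, end) pairs in transcript order. Returns None if the
    position falls in an intron or outside the transcript.
    """
    offset = 0
    for start, end in exons:
        if start <= pos <= end:
            return offset + (pos - start) + 1
        offset += end - start + 1  # length of this exon
    return None


# Hypothetical gene: isoform 2 skips the middle exon of isoform 1.
isoform1 = [(100, 199), (300, 399), (500, 599)]
isoform2 = [(100, 199), (500, 599)]
snp_pos = 550
print(genomic_to_transcript(snp_pos, isoform1), genomic_to_transcript(snp_pos, isoform2))
```

The same genomic SNP sits at transcript position 251 in isoform 1 but 151 in isoform 2, and a SNP inside the skipped exon would be exonic in one isoform and absent from the other.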
    dbSNP Ref SNP Record Example 1:
Links to Various Annotated NCBI Databases




Link to the OMIM record where documented clinical and genetic data for this SNP can be found.

Warning: the lack of an OMIM link does not necessarily mean that this SNP is unrelated to any OMIM record.
              dbSNP Ref SNP Record Example 1:
Population Allele Frequency, Genotype and Heterozygosity Data

Link to the detailed population genotype data.

Data from the National Cancer Institute.

Data from the NIH Polymorphism Discovery Resource.

Data from the Centre d'Etude du Polymorphisme Humain (CEPH).

Data from the International HapMap Project.
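The allele frequency and heterozygosity figures on this page can be derived from genotype counts. This sketch computes allele frequencies, observed heterozygosity (fraction of heterozygous individuals), and the Hardy-Weinberg expected heterozygosity 2pq for a biallelic SNP. The counts in the usage line are hypothetical, and dbSNP's own heterozygosity values may be estimated differently.

```python
def genotype_summary(n_hom_ref: int, n_het: int, n_hom_alt: int) -> dict:
    """Allele frequencies plus observed and expected (Hardy-Weinberg, 2pq)
    heterozygosity from genotype counts at a biallelic SNP."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_hom_ref + n_het) / (2 * n)  # each individual carries two alleles
    q = 1 - p
    return {"p": p, "q": q, "obs_het": n_het / n, "exp_het": 2 * p * q}


print(genotype_summary(36, 48, 16))  # hypothetical counts
```

A large gap between observed and expected heterozygosity can flag genotyping problems or population structure, which is one reason frequencies are reported per population.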
dbSNP Ref SNP Record Example 1: GeneView and SequenceView of ALL SNPs
           dbSNP Ref SNP Record Example 1:
Links to View SNPs on 3D Structure, Conserved Domains,
            and Multiple Sequence Alignment
              Search dbSNP: Example 2

Mutations in the dopamine receptor D5 (DRD5)
gene have been observed in patients with
various neurological disorders.
Find how many refSNP records have been
reported for DRD5, and display all of these
refSNPs in their chromosomal context.

Hint: start from Entrez Gene: http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene
Search dbSNP: SNP Links from Entrez Gene Record
Search dbSNP: SNP Display Using NCBI Map Viewer
Search dbSNP: Configure Map Viewer
   to Display other Relevant Data
               SNPs Display in Map Viewer: Legend
 Click any column heading to see the refSNP legend.




http://www.ncbi.nlm.nih.gov/SNP/get_html.cgi?whichHtml=verbose
SNPs Display in Map Viewer: Legend
     Online Mendelian Inheritance in Man (OMIM):
                   A Brief Overview
 URL: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM

 OMIM is a human genetic disorders database built and curated using
  results from published studies.

 Each OMIM record summarizes the current state of knowledge
  of the genetic basis of a disorder, including:
     description and clinical features of a disorder or a gene involved in
      genetic disorders;
     biochemical and other features;
     cytogenetics and mapping;
     molecular and population genetics;
     diagnosis and clinical management;
     animal models for the disorder;
     allelic variants.

 OMIM is searchable via NCBI Entrez, and its records are cross-linked to
  other NCBI resources.
Online Mendelian Inheritance in Man Stats




               http://www.ncbi.nlm.nih.gov/Omim/mimstats.html
              OMIM: Allelic Variants
 The OMIM database includes genetic disorders caused by
  various mutations and variations, from SNPs to large-scale
  chromosomal abnormalities.
 The listed allelic variants are searchable through the "Allelic
  Variants" field.
    Single nucleotide substitutions (SNPs);
    small insertions and deletions (INDEL/DIPS);
    frame shifts caused by these INDELs.

 Allelic variants are identified by a 10-digit OMIM number (the
  gene's 6-digit MIM number followed by a 4-digit variant number,
  e.g., 113705.0001), and can be searched in two ways:
    Search for a gene or a disease, when retrieved, view its allelic
     variants.
    Use the Limits to narrow your search to:
     -- retrieve only records that contain allelic variant information;
     -- search for particular terms within the allelic variants field.
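The allelic variant number format described above (6-digit MIM number, a dot, then a 4-digit variant number) can be split mechanically, which helps when cross-referencing variants against their parent gene entry. The ID below uses the BRCA1 MIM number that appears later in this section (113705) with an illustrative ".0001" suffix; the function name is hypothetical.

```python
import re

# Allelic variant IDs: 6-digit MIM number, a dot, then a 4-digit variant number.
AV_ID = re.compile(r"^(\d{6})\.(\d{4})$")


def parse_allelic_variant(av_id: str):
    """Split an OMIM allelic variant ID such as '113705.0001' into
    (mim_number, variant_number)."""
    m = AV_ID.match(av_id.strip())
    if m is None:
        raise ValueError(f"not an allelic variant ID: {av_id!r}")
    return m.group(1), int(m.group(2))


print(parse_allelic_variant("113705.0001"))  # ('113705', 1)
```

Keeping the MIM number as a string preserves it exactly as written; only the variant suffix is converted to an integer for sorting.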
       Notes on OMIM Allelic Variants
For most genes, only selected mutations are included
  Criteria for inclusion include: the first mutation to be
  discovered, high population frequency, distinctive phenotype,
  historic significance, unusual mechanism of mutation, unusual
  pathogenetic mechanism, and distinctive inheritance.
Most of the allelic variants represent disease-
 producing mutations, NOT polymorphisms.
A few polymorphisms are included, many of which
 show a positive statistical correlation with particular
 common disorders.
Few neutral polymorphisms are included in OMIM.
Some SNPs in dbSNP are not linked to their
 corresponding OMIM records.
       http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=113705
Sequence variations view in UniProt Beta




                  http://beta.uniprot.org/uniprot/P38398
Assessing Polymorphisms: Genotypes and Genotyping




Genotype: Each person has two copies of every
 chromosome except the X and Y chromosomes in males. The set of
 alleles at a given locus forms the genotype.
Genotyping: the process of identifying a person's genotype
 at a given locus (or loci).
Whole-genome genotyping of all SNPs in a human
 genome? (11.8 million and counting)
  Technologically daunting
  Prohibitively expensive and time consuming
     Assessing Polymorphisms: the Origin of Haplotype
Two ancestral chromosomes are scrambled through recombination over many generations to yield different descendant chromosomes.

If a genetic variant marked by the X on the ancestral chromosome increases the risk of a particular disease, the two descendants who inherit that part of the ancestral chromosome will be at increased risk.

Adjacent to the variant marked by the X are many SNPs that can be used to identify the location of the variant.

Haplotype: a particular combination of alleles along a chromosome that tends to be inherited as a unit.

http://www.hapmap.org/originhaplotype.html
              Assessing Polymorphisms:
Linkage Disequilibrium, Haplotype Block, and Tag SNPs




                                                                                        Adapted from Nature 426, 6968: 789-796 (2003)
 Linkage Disequilibrium (LD): if two alleles are inherited together more often than expected by
  chance from their individual frequencies, the alleles are in linkage disequilibrium.
 When most SNPs are strongly correlated with one or more neighbors, these correlations can be
  used to define haplotypes, which serve as excellent proxies for individual SNPs.
 Because haplotypes can be identified by a much smaller number of SNPs (tag SNPs),
  assessing polymorphisms via haplotypes dramatically reduces genotyping work.
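LD between two biallelic SNPs is usually summarized by D, the normalized |D'|, and r², all computed from haplotype frequencies. This sketch uses the standard definitions (D = p_AB − p_A·p_B; D' = D/D_max; r² = D²/(p_A·p_a·p_B·p_b)) and assumes the four haplotype frequencies are already known, e.g. from phased data. Tools such as Haploview estimate those frequencies from genotypes first, which is not shown. The example frequencies are hypothetical.

```python
def ld_stats(p_AB: float, p_Ab: float, p_aB: float, p_ab: float):
    """Return (D, |D'|, r^2) for two biallelic loci from the four
    haplotype frequencies (which should sum to 1)."""
    p_A = p_AB + p_Ab  # frequency of allele A at locus 1
    p_B = p_AB + p_aB  # frequency of allele B at locus 2
    p_a, p_b = 1 - p_A, 1 - p_B
    if min(p_A, p_a, p_B, p_b) <= 0:
        raise ValueError("both loci must be polymorphic")
    D = p_AB - p_A * p_B  # deviation from independent assortment
    d_max = min(p_A * p_b, p_a * p_B) if D > 0 else min(p_A * p_B, p_a * p_b)
    d_prime = abs(D) / d_max
    r2 = D * D / (p_A * p_a * p_B * p_b)
    return D, d_prime, r2


print(ld_stats(0.4, 0.1, 0.1, 0.4))  # hypothetical haplotype frequencies
print(ld_stats(0.5, 0.0, 0.0, 0.5))  # only AB and ab haplotypes: complete LD
```

For the first example D = 0.15, |D'| = 0.6 and r² = 0.36; when only two haplotypes exist, both |D'| and r² reach 1, meaning either SNP perfectly predicts the other.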
        Assessing Polymorphisms: Tag SNPs




 Tag SNP: a representative SNP from which other SNPs in its “neighborhood” (in terms of both
  physical distance and genealogy) can be inferred (predicted).
 An r² of 0.8 or greater is sufficient for tag SNP mapping to
  obtain good coverage of untyped SNPs.
 Tag SNPs allow a smaller number of marker SNPs to be genotyped
  with very little loss of power.
 If LD between SNPs is low, almost every SNP might have to be
  genotyped to capture all variation information.
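Tag SNP selection with an r² threshold can be illustrated with a simple greedy pairwise approach: repeatedly pick the SNP that captures (r² ≥ 0.8 with) the most SNPs not yet captured. This is one common strategy, offered as a sketch; it is not necessarily the algorithm used by the HapMap tag SNP picker, which also supports other tagging methods. The SNP names and r² values are hypothetical.

```python
from typing import Dict, FrozenSet, List


def greedy_tag_snps(snps: List[str], r2: Dict[FrozenSet[str], float],
                    threshold: float = 0.8) -> List[str]:
    """Pick tag SNPs so every SNP has r^2 >= threshold with some tag.

    r2 maps frozenset({snp1, snp2}) to pairwise r^2; missing pairs count as 0.
    Each round picks the SNP that captures the most still-untagged SNPs.
    """
    def r2_of(a: str, b: str) -> float:
        return 1.0 if a == b else r2.get(frozenset((a, b)), 0.0)

    untagged = set(snps)
    tags: List[str] = []
    while untagged:
        # Every untagged SNP captures at least itself, so each round makes progress.
        best = max(sorted(snps), key=lambda s: sum(r2_of(s, t) >= threshold for t in untagged))
        tags.append(best)
        untagged -= {t for t in untagged if r2_of(best, t) >= threshold}
    return tags


pairs = {frozenset(("snp1", "snp2")): 0.9, frozenset(("snp2", "snp3")): 0.85,
         frozenset(("snp1", "snp3")): 0.5}
print(greedy_tag_snps(["snp1", "snp2", "snp3", "snp4"], pairs))
```

Here snp2 captures snp1 and snp3, while snp4 is in LD with nothing and must tag itself, so four SNPs are covered by two genotyped tags. With low LD, nearly every SNP ends up as its own tag, as the last bullet notes.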
                                       Goals
                                        Create a public genome-wide
                                         database of common human genetic
                                         variation in the context of geographic
                                         distribution
                                        Provide such information to guide
                                         genetic studies of clinical phenotypes

 Phase I (Oct. 2002)
    One million common SNPs (every 5 kb across the genome) were
     genotyped in 269 DNA samples from four populations.
    Common SNPs: minor allele frequency (MAF) ≥ 0.05
    YRI: Yoruba in Nigeria (30 trios); CEU: Utah residents with European
     ancestry (30 trios); CHB: 45 Han Chinese; JPT: 44 Japanese
 Phase II
    An additional 4.6 million SNPs were genotyped.

 ENCODE (Encyclopedia of DNA Elements)
    Collection of ten regions, each 500 kb in length.
    Each 500 kb region was re-sequenced and all SNPs were genotyped.
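The Phase I sample total quoted above follows from the panel sizes, counting each trio as three people (two parents and one child):

```latex
\underbrace{30 \times 3}_{\text{YRI}} + \underbrace{30 \times 3}_{\text{CEU}}
+ \underbrace{45}_{\text{CHB}} + \underbrace{44}_{\text{JPT}}
= 90 + 90 + 45 + 44 = 269
```

The 270 figure used elsewhere for the HapMap samples corresponds to a JPT panel of 45.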
              HapMap Progress
PHASE I – completed
  1,000,000 SNPs successfully typed in all 269 HapMap samples
  At least one common SNP every 5 kb across the genome
  ENCODE variation reference resource available

PHASE II – data generation complete, about 4.6 million
SNPs typed in total.

ENCODE-HAPMAP – A much more detailed variation resource
  48 samples sequenced
  All discovered SNPs (and any others in dbSNP) typed in all
   270 HapMap samples
  Current data set – average 1 SNP every 279 bp
             HapMap Data Overview
Basic Data: genotypes of the 270 individual samples (frequencies of SNP alleles and
genotypes in each population)
Recent data release (Full Data Set): January 11, 2007, NCBI B35 (includes both Phase I and II data,
genotypes from the Illumina 100k and 300k genotyping arrays, and the Affymetrix nsSNPs)
Phase I: 600,000 common SNPs in 270 individuals
Phase II: 4-5 million SNPs in the same individuals




Available for bulk download:
 All genotype data, haplotype phasing data (from PHASE)
 Pedigree trio files
 Raw LD data (D', r²), recombination rates and hotspots
 Allele and genotype frequencies
 SNP assays and protocols
 Allocated SNPs (dbSNP reference clusters chosen for genotyping)
                                        Adapted from Alanna Morrison, Human Genetics Center, Feb. 2007 lecture
   Major Findings of the HapMap Project
 Extensive Redundancy of SNP: over 90% of all SNPs on the
  map have highly statistically significant correlation to one or
  more neighbors.
 Confirmed the generality of recombination hotspots and long
  segments of strong LD (haplotype blocks), with average block
  lengths ranging from 7.3 kb (YRI) to 16.3 kb (CEU), and
  65-85% of the human genome contained in such blocks.
 Revealed limited haplotype diversity: although each haplotype
  block contains 30-70 SNPs, on average only 4-5.6 common
  haplotypes exist per block, and these can be identified by a
  smaller number of SNPs (tag SNPs).
 The density of common SNPs can be reduced by 75–90% with
  essentially no loss of information. That is, the genotyping
  burden can be reduced from one common SNP every 500 bp to
  one SNP every 2 kb (YRI) to 5 kb (CEU and CHB/JPT).
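The 75-90% reduction quoted in the last bullet follows directly from the two densities it compares: going from one common SNP every 500 bp to one every 2 kb or 5 kb.

```latex
1 - \frac{500\ \text{bp}}{2{,}000\ \text{bp}} = 0.75 \;\;(75\%,\ \text{YRI}),
\qquad
1 - \frac{500\ \text{bp}}{5{,}000\ \text{bp}} = 0.90 \;\;(90\%,\ \text{CEU and CHB/JPT})
```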
What can you do on the HapMap Web site?
 Search for SNPs in a gene or any region of
  interest (ROI).
   View patterns of LD in the ROI.
   Select tagSNPs in the ROI.
 Download information on the SNPs in ROI
  for genotype/haplotype data analysis and
  visualization in Haploview or other software.
 Generate and retrieve customized subset data.
 Download the entire data set in bulk.
       Search HapMap: Example 1

SNPs in the human BRCA1 gene have been
reported to be involved in the early onset
of breast cancer.
Find all available genotype and LD data
for SNPs documented for BRCA1 in the
HapMap database.

http://www.hapmap.org/
               HapMap Search Example 1
Step 1: Open the Genome Browser with the Latest Full Data Set




                                         Click “HapMap Genome Browser (B35 full data set)”
          HapMap Search Example 1
Step 2: Specify the landmark/region of interest
            Enter the gene name “brca1” to specify
                your region of interest

                                        When there are multiple transcripts,
                                             click the one of your choice
              HapMap Search Example 1
Step 3: Examine and determine the desired region for display




The mRNA
Examine the region for display using different scales
Genotype frequency
Genotyped SNPs in the region; the pie chart shows allele frequencies (reference vs. other)
      HapMap Search Example 1
Step 4: Display genotype data for each refSNP
     HapMap Search Example 1
Step 5: Select the desired tracks for display

Select the desired analysis results for display
Click “Update Image” once the configuration is done
  HapMap Search Example 1
Step 6: Configure the tag SNP Picker




Select the desired population
Select the desired tagging method
Select the r² value to set the desired stringency
Set the MAF threshold: the lowest minor allele frequency of SNPs to be captured by the tag SNPs
Specify SNPs to be included/excluded as tag SNPs
                  HapMap Search Example 1
                    Step 7: Configure the LD Plot




Configure the LD plot display
Select the LD measure and range
Select the desired populations
Customize the color display for LD values
HapMap Search Example 1
Step 8: Tag SNPs and LD Plot
Genotyped SNPs in the region
The LD plot shows LD between different pairs of SNPs
Tag SNPs selected based on your criteria
                  HapMap Search Example 1
            Step 9: Download various data and files




Select the desired data or file for download, then click “Go”
The genotype data can be used for in-depth LD and haplotype analysis with the free Haploview program.
Haploview: http://www.broad.mit.edu/mpg/haploview/
Haploview
Screenshots
     HapMap Data Extraction using HapMart




                             Select desired population
www.hapmap.org
HapMap Data Extraction using HapMart:
       Data filter and export
             Perlegen Sciences
 Founded in 2000 with the mission of identifying clinically
  relevant patterns of genetic variation.
 Over 1.6 million common SNPs genotyped in 71
  individuals from 3 American populations of European,
  African, and Asian ancestry (about 1 SNP per 1,871 bp)
 Genome-wide association (GWA) studies on over 100,000
  individuals.
 Re-sequenced the nuclear genomes of 15 inbred
  laboratory mouse strains and generated genotype data.
 A specialized Mouse Genome Browser allows users to
  visualize the SNPs and LR-PCR primer pairs and
  access the SNP genotypes for the 15 strains
  http://mouse.perlegen.com/mouse/browser.html
Perlegen Human Genotype Browser

           http://genome.perlegen.com/cgi-bin/gbrowse/
Perlegen Human Genotype Browser
 Hosts raw genotyping data for 4.5 million human
  SNPs from HapMap, Perlegen, and other projects.
 SNP data generated for candidate genes involved in
  cardiovascular diseases and inflammatory processes.
 Tools for searching, visualization and analysis of
  genotype data for association studies.
 Merging SNP data sets from different populations.
 Using Genome Variation Server
       http://gvs.gs.washington.edu/GVS/index.jsp




       Select the search type to start the search

             Upload your genotype data for analysis



Detailed online tutorial
GVS Search Example: rs9939609 (FTO gene)
     Step 1: Select query type
GVS Search Example: rs9939609 (FTO gene)
     Step 2: Select population(s)
GVS Search Example: rs9939609 (FTO gene)
     Step 3: Configure parameters
GVS Search Example: rs9939609 (FTO gene)
  Step 4: Display Results—Genotype data
GVS Search Example: rs9939609 (FTO gene)
  Step 4: Display Results—Genotype data

                                      rs9939609

                  SNP ID




        Sample
GVS Search Example: rs9939609 (FTO gene)
     Step 5: Display results—TagSNPs



                     TagSNPs Table Display
GVS Search Example: rs9939609 (FTO gene)
  Step 5: Display results—TagSNPs
                    Bin   TagSNPs Graphic Display
GVS Search Example: rs9939609 (FTO gene)
                   Step 6: Display results—LD
GVS Search Example: rs9939609 (FTO gene)
     Step 7: Display results—Summary
                 SNPs in Ensembl
                http://www.ensembl.org/index.html
• Most SNPs imported from dbSNP (rs……):
   • Imported data: alleles, flanking sequences, frequencies, ….
   • Calculated data: position, synonymous status, peptide shift, ….
• For human also:
   •   HGVbase
   •   TSC
   •   Affy GeneChip 100K and 500K Mapping Array
   •   Ensembl-called SNPs (from Celera reads)

• For mouse and rat also:
   • Sanger- and Ensembl-called SNPs
    SNPs in Ensembl
MapView: SNP density on chromosome
     SNPs in Ensembl
ContigView: SNPs in genomic context
       SNPs in Ensembl
GeneSeqView: SNPs in genomic sequence
            SNPs in Ensembl
TransView & ProtView: SNPs in transcript/ protein
              SNPs in Ensembl
What SNPs does my gene contain? > GeneSNPView
SNPs in Ensembl

       Info about one specific SNP? > SNPView:

       • SNP Report
       • Genotype and allele
         frequencies per population
       • Located in transcripts
       • SNP Context
       • Individual genotypes
https://www.pharmgkb.org/index.jsp
                User Question
A recent report (Frayling et al., Science 2007) found that a
common variant (rs9939609, A>T) in the FTO (fat mass
and obesity associated) gene is associated with body
mass index and predisposes to obesity and diabetes.
Adults homozygous for the risk allele A (16% of those
studied) weighed about 3 kg more and had 1.67-fold
increased odds of obesity compared with those not
carrying the risk allele.
Use HapMap and dbSNP to find the genotype data
for this SNP in different populations.
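Before looking up the population data, a quick expectation can be derived from the question itself. Assuming Hardy-Weinberg equilibrium (an assumption, not something the question states), a 16% frequency of AA homozygotes implies a risk-allele frequency of p = √0.16 = 0.4, and from that the expected frequencies of all three genotypes. The function name is illustrative; compare its output with the per-population frequencies that HapMap and dbSNP report.

```python
import math


def hwe_from_homozygote_frequency(freq_hom_risk: float):
    """Assuming Hardy-Weinberg equilibrium, infer the risk-allele frequency p
    from the frequency of risk-allele homozygotes (p^2), then return p and
    the expected genotype frequencies p^2, 2pq, q^2."""
    if not 0 <= freq_hom_risk <= 1:
        raise ValueError("frequency must be between 0 and 1")
    p = math.sqrt(freq_hom_risk)
    q = 1 - p
    return p, {"AA": p * p, "AT": 2 * p * q, "TT": q * q}


p, genotypes = hwe_from_homozygote_frequency(0.16)
print(round(p, 2), {g: round(f, 2) for g, f in genotypes.items()})
# 0.4 {'AA': 0.16, 'AT': 0.48, 'TT': 0.36}
```

If a population's observed genotype counts differ markedly from these expectations, the difference may reflect real population differences in allele frequency rather than an error, which is exactly what the population breakdowns in HapMap and dbSNP let you check.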
         Answer 1: Searching HapMap
Use the refSNP# (must start with rs) as the landmark for the search
Click on the pie chart for detailed population genotype data
        Answer 1: Searching HapMap
                                    Population genotype data, including the
                                       homozygous risk genotype AA




Retrieve detailed genotyping data
Answer 2: Searching NCBI’s dbSNP
  http://www.ncbi.nlm.nih.gov/sites/entrez?db=Snp




                        Click on the rs record for a detailed SNP data report
Answer 2: Searching NCBI’s dbSNP




               Genotype data from Perlegen’s project with different population samples
               Acknowledgement
   In addition to those already stated, some slides of this
   workshop were adapted from the sources below:
1. Chattopadhyay A. and M.R. Tennant. “Genetic Variation Resources”. Lecture slides for the 2007 NCBI
   Advanced Workshop for Bioinformatics Information Specialists.
2. Stein L. “Using HapMap.org: A tutorial”. Presentation slides from the Official HapMap Tutorial.
3. Overduin B. “Sequence Variation in Ensembl”. Lecture slides for “Ensembl Courses and Workshops”.
  Recommended Topics for the Second Part of
“Online Resources for Genetic Variation Study”

   Functional analysis of SNPs
   Tools for SNP discovery and genotyping
   Tools for TagSNPs selection
   Tools for genome wide association study
   Genetic association databases
   Others??

Please evaluate this workshop to help me improve
future presentations:
http://www.zoomerang.com/survey.zgi?p=WEB226GJV4RJWR

Have questions or comments about this workshop?
Please contact:

Yi-Bu Chen, Ph.D.
Bioinformatics Specialist
Norris Medical Library
University of Southern California

323-442-3309
yibuchen@belen.hsc.usc.edu

								