Linkage analysis


Linkage analysis
         Jan Hellemans
Finding causal mutations


 2 opposing strategies
     sequence then select
     select then sequence
 Sequencing
     traditional Sanger sequencing only possible after selection
     Massively parallel sequencing possible prior to or after selection
         RNA sequencing
         exome sequencing
         genome sequencing
Finding causal mutations


 Selection
     positional (prior to sequencing)
        linkage analysis
        GWAS
        structural variations (e.g. microdeletions)
     functional (prior to & after sequencing)
        candidate genes selected based on known function or involvement
          in related disorders
        filtering of variants based on functional predictions
     overlap (after sequencing)
        looking for genes / variants that occur in multiple independent
          patients

     mostly a combination is used
exome sequencing
Aims



          Interpret microsatellite results
          Add genotypes to pedigrees
          Create pedigree and genotype files
          Calculate and interpret LOD-scores
          Delineate linkage intervals

          Basic principles of linkage analysis
          Analyze other types of markers
          Association studies
          Learn how to work with specific pedigree programs
Starting linkage analysis
Preparations


 Clearly define the phenotype
     If the phenotype is not specific enough, then you may be analyzing
      different disorders that map to different genomic loci
     LOD scores are additive
 Find suitable families
     larger is better
     more patients is better
 Collect genomic DNA from as many family members as
  possible
 Determine the type of inheritance
 Calculate the power to prove linkage with the available
  material (SLink – not part of this course)
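Because LOD scores are additive, the scores of independent families with the same disorder can be summed at each recombination fraction. The following is a minimal sketch of that idea; the family names and LOD values are invented for illustration and are not course data.

```python
# Hypothetical LOD scores per family on a shared grid of recombination fractions.
thetas = [0.0, 0.1, 0.2, 0.3, 0.4]
lod_by_family = {
    "family_1": [1.8, 1.5, 1.1, 0.7, 0.3],
    "family_2": [1.5, 1.2, 0.9, 0.5, 0.2],
}


def combined_lod(lod_by_family):
    """Sum the LOD scores of all families, one recombination fraction at a time."""
    curves = list(lod_by_family.values())
    return [round(sum(values), 6) for values in zip(*curves)]


total = combined_lod(lod_by_family)
best_theta, best_lod = max(zip(thetas, total), key=lambda pair: pair[1])
print(total)
print(best_theta, best_lod)
```

In this made-up example neither family reaches a LOD score of 3 on its own, but the sum at a recombination fraction of 0 does. Summing is only meaningful if both families really have the same disorder, which is why the phenotype has to be defined clearly.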
Linkage analysis types


 Directed linkage analysis
     Evaluate linkage at a specific locus such as a candidate gene
     Common approach: evaluate an intragenic, a 5’ and a 3’ marker,
      often microsatellites
 Genome wide linkage analysis
     Screen for linkage for markers spread across the entire genome
     Microsatellites: ~400 markers spaced at about 10cM
     SNPs: 500k SNP array
 Homozygosity mapping
     Screen only affected individuals in inbred families
     Select homozygous markers (typically SNP markers)
     Very efficient technology
 Fine mapping
     Some linked markers are known, but the borders of the linkage interval
      still need to be defined
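The core of homozygosity mapping, looking for stretches of consecutive homozygous markers in an affected individual, can be sketched as below. This is only an illustration under simplifying assumptions: the genotypes are invented, missing calls are not handled, and the `min_markers` threshold is a hypothetical parameter (real SNP-array analyses use far more markers and dedicated software).

```python
def homozygous_runs(snps, min_markers=3):
    """Return (start, end, n_markers) for each run of consecutive homozygous SNPs.

    snps: list of (position_mb, genotype) sorted by position,
          with genotypes written as two letters, e.g. "AA" or "AB".
    """
    runs, current = [], []
    for position, genotype in snps:
        if genotype[0] == genotype[1]:
            current.append(position)  # homozygous: extend the current run
        else:
            if len(current) >= min_markers:
                runs.append((current[0], current[-1], len(current)))
            current = []  # a heterozygous marker ends the run
    if len(current) >= min_markers:
        runs.append((current[0], current[-1], len(current)))
    return runs


# Hypothetical genotypes of one affected individual.
snps = [(1.0, "AB"), (2.0, "AA"), (3.0, "BB"), (4.0, "AA"), (5.0, "BB"),
        (6.0, "AB"), (7.0, "AA"), (8.0, "AB")]
print(homozygous_runs(snps))
```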
Exercise – Part 1


 2 inbred families with a recessive disorder
 With a homozygosity mapping based on 500k SNP
  arrays 2 candidate regions could be identified
     Chromosome 4
     Patient 1 homozygous for
        6.052Mb – 14.488Mb
        21.008Mb – 37.477Mb
     Patient 2 homozygous for
        11.186Mb – 37.219Mb
 Task: find microsatellite markers to confirm linkage
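If both patients are assumed to carry a mutation at the same locus, the regions of interest are those that are homozygous in both. A minimal sketch of that intersection, using the chromosome 4 coordinates given for the two patients (the function name is illustrative only):

```python
def intersect(regions_a, regions_b):
    """Return the overlaps between two lists of (start_mb, end_mb) intervals."""
    shared = []
    for start_a, end_a in regions_a:
        for start_b, end_b in regions_b:
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # keep only real overlaps
                shared.append((start, end))
    return sorted(shared)


patient_1 = [(6.052, 14.488), (21.008, 37.477)]
patient_2 = [(11.186, 37.219)]

candidate_regions = intersect(patient_1, patient_2)
print(candidate_regions)  # [(11.186, 14.488), (21.008, 37.219)]
```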
Find additional flanking markers


   Find physical position of marker in NCBI > UniSTS
   NCBI map viewer: http://www.ncbi.nlm.nih.gov/mapview/
   Go to Homo sapiens and to the right chromosome
   Maps & options: show
     DeCode, Généthon & Marshfield (genetic maps)
     Genes
   Set region: e.g. 2Mb up- and downstream of your marker
   Click ‘Data as table view’
   Click on STS behind a marker to see its details
   Select markers that
     locate to only 1 genomic location
     have a PCR product with an extended size range
      (one size -> not polymorphic)
http://www.ncbi.nlm.nih.gov/projects/mapview
Exercise – Part 1 > possible solution


 Markers in 1st candidate region
       D4S3017 (21.078Mb)
       D4S3044 (25.189Mb)
       D4S1618 (33.857Mb)
       D4S3350 (33.857Mb)
       D4S2988 (36.889Mb)
 Markers in 2nd candidate region
       D4S1582 (10.311Mb)
       D4S2906 (12.321Mb)
       D4S2944 (13.141Mb)
       D4S1602 (14.059Mb)
       D4S2960 (15.437Mb)
  Order primers & analyze them on all family members
Analyzing microsatellite data
Microsatellites > basics


 Repeats of short sequences (e.g. 2bp)
  NNNNAC(AC)nACNNNN
 Number of repeats is variable (unstable sequence)
 Number of repeats determines the allele
 Number of repeats corresponds to specific length of
  PCR product:
        allele 1: NNNNACACACACACNNNN       (5*AC -> 18bp)
        allele 2: NNNNACACACACACACNNNN     (6*AC -> 20bp)
        allele 3: NNNNACACACACACACACNNNN   (7*AC -> 22bp)
        ...
 Determine length to know the allele (sequencer)
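The relation between repeat number and PCR product length can be written out as a small sketch. It reuses the toy sequence from the slide (4 placeholder bases on each side of an AC repeat); real amplicons have much longer flanking sequences, so the constants are illustrative only.

```python
FLANK_5 = "NNNN"     # placeholder 5' flanking bases, as on the slide
FLANK_3 = "NNNN"     # placeholder 3' flanking bases
REPEAT_UNIT = "AC"


def product(n_repeats):
    """Sequence of the PCR product for an allele with n_repeats repeats."""
    return FLANK_5 + REPEAT_UNIT * n_repeats + FLANK_3


def repeats_from_length(length_bp):
    """Number of repeats that corresponds to a measured product length."""
    repeat_bp = length_bp - len(FLANK_5) - len(FLANK_3)
    if repeat_bp < 0 or repeat_bp % len(REPEAT_UNIT) != 0:
        raise ValueError("length does not fit a whole number of repeats")
    return repeat_bp // len(REPEAT_UNIT)


for n in (5, 6, 7):
    print(n, product(n), len(product(n)))
```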
Microsatellites > determine size


 Use internal size standard (other color)
 (Figure: peaks labelled 220bp, 225bp and 230bp)
Microsatellites > heterozygotes




 (Figure: peaks labelled 220bp, 223bp, 225bp and 230bp)
Microsatellites > stutter peaks


 Repeats are difficult to copy -> polymerase slips
 Some amplicons have 1 repeat less,
  a few even lose multiple repeats
 Small repeats are more prone to slippage and show
  more pronounced stutter peaks
 Largest product is the correct one
 Distance between peaks = length of a repeat
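The two rules above (the largest product is the allele, and stutter peaks lie whole repeats below it) can be turned into a small labelling sketch. It assumes a single allele without +A peaks, a dinucleotide repeat, and a sizing tolerance of 0.4bp; all of these are assumptions made for the example, as are the peak sizes.

```python
def classify_peaks(peak_sizes, repeat_length=2, tolerance=0.4):
    """Label the peaks of ONE allele as allelic, stutter or unexplained."""
    allelic = max(peak_sizes)  # largest product is taken as the allele
    labels = {}
    for size in sorted(peak_sizes, reverse=True):
        # Nearest whole number of repeats below the allelic peak.
        n_repeats = round((allelic - size) / repeat_length)
        expected = allelic - n_repeats * repeat_length
        if size == allelic:
            labels[size] = "allelic peak"
        elif n_repeats >= 1 and abs(size - expected) <= tolerance:
            labels[size] = f"stutter peak {n_repeats}"
        else:
            labels[size] = "unexplained"
    return labels


# Hypothetical peak sizes (bp) from one homozygous sample.
print(classify_peaks([225.0, 223.1, 220.9]))
```

Heterozygous samples need the same logic applied per allele, which this sketch does not attempt.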
 (Figure: allelic peak preceded by a 1st and a 2nd stutter peak)
 Allelic peaks are the highest
 Stutter peaks are lower
 (Figures: two example plots with the allelic peaks A1 and A2 marked)
Microsatellites > +A peaks


 Taq polymerase tends to add an extra A at the 3’ end
 Variable degree of products with or without this extra A
 Do not confuse with stutter peaks (only 1bp difference)
 (Figure: allelic peak, 1st stutter peak and 2nd stutter peak, each
  accompanied by a +A peak)
Microsatellites > complex plots (stutter & +A)




 (Figure: complex plot with the allelic peaks A1 and A2 marked)
Microsatellites > multiplex


 Combine multiple markers in a single analysis ($$$)
      Different size range
      Multicolor
      Commercial kits: e.g. 16 markers / lane
Microsatellite plots examples
Genotyping pedigrees
Genotyping pedigrees


 Screen one or multiple markers for some or all family
  members
 For every marker:
     Make a list of all occurring allele sizes
     Due to technical variation in sizing, the same allele can have a slightly
      different size in different measurements (-0.4bp to +0.4bp). Give all
      alleles within this range the same allele number
     Add the allele numbers to the pedigree at the corresponding
      individual/marker combination
     Find the right phase
 Advanced software like GeneMapper can generate
  tables with allele numbers for every sample / marker
 Advanced pedigree programs like Progeny can store
  genotype information for family members
 Verify inheritance
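Turning measured sizes into allele numbers can be sketched as follows. The grouping rule used here (start a new allele whenever the gap to the previous sorted size exceeds the tolerance) is a simplifying assumption, and the sizes are invented; programs such as GeneMapper use their own binning.

```python
def assign_allele_numbers(sizes, tolerance=0.4):
    """Map each measured size (bp) to an allele number, smallest allele = 1."""
    numbers = {}
    allele = 0
    previous = None
    for size in sorted(set(sizes)):
        # A gap larger than the tolerance is treated as a different allele.
        if previous is None or size - previous > tolerance:
            allele += 1
        numbers[size] = allele
        previous = size
    return numbers


# Hypothetical sizes measured for one marker across a family.
measured = [223.1, 224.9, 223.3, 227.2, 225.1, 222.9]
alleles = assign_allele_numbers(measured)
for size in sorted(alleles):
    print(size, alleles[size])
```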
Exercise – Part 2


 Genotype 3 markers in all available individuals of 2
  families
 Pedigrees & microsatellite plots in
  ExercisePart2-GenotypingData.pdf
 Add allele numbers for the 3 markers to the pedigree
 Interpret the genotyped pedigrees: linked?
Family 1
Family 2
Exercise – Part 2 > Conclusions


 D4S1582
     Mendelian error -> cannot be interpreted
 D4S2944
     Linked
 D4S3017
     Not-linked: unaffected individuals with the same genotype as a patient
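A Mendelian error such as the one found for D4S1582 is an inconsistency between the genotypes of a child and its parents. A minimal sketch of such a check for one child is given below; the genotypes are invented, and dedicated tools (PedCheck is one of the programs offered through EasyLinkage) check whole pedigrees.

```python
def is_mendelian(child, father, mother):
    """True if the child's two alleles can come one from each parent.

    Genotypes are (allele, allele) tuples of allele numbers; 0 = unknown.
    Trios with unknown alleles cannot be checked and are not flagged.
    """
    if 0 in child + father + mother:
        return True
    first, second = child
    return (first in father and second in mother) or \
           (second in father and first in mother)


# Hypothetical genotypes for one marker.
print(is_mendelian((1, 2), (1, 3), (2, 4)))  # consistent
print(is_mendelian((1, 5), (1, 3), (2, 4)))  # allele 5 is in neither parent
print(is_mendelian((1, 1), (1, 2), (2, 3)))  # mother cannot transmit allele 1
```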
Calculate LOD scores
EasyLinkage


 EasyLinkage = UI for linkage analysis
     http://genetik.charite.de/hoffmann/easyLINKAGE/index.html#start
     Bioinformatics. 2005 Feb 1;21(3):405-7       PMID: 15347576
     Bioinformatics. 2005 Sep 1;21(17):3565-7     PMID: 16014370
 Interface for many linkage analysis programs
 Input
       Pedigree file (linkage format)
       Genotype file(s)
       Marker information (already provided for popular markers)
       Settings
Pedigree file


 Naming requirements for EasyLinkage:
  p_xxx.pro  e.g. p_SMMD.pro
 Format:
      Tab delimited text file
      1 individual per row
 Columns:
        1  family ID
        2  person ID
        3  father ID
        4  mother ID
        5  sex (1=male, 2=female, 0=unknown)
        6  affection status (1=unaffected, 2=affected, 0=unknown)
        7  DNA availability (optional, relevant for power calculations)
        8  liability class (to be provided if multiple liability classes are used)
Genotype files


 Person IDs have to match exactly those provided in
  the pedigree file
 Naming requirements for EasyLinkage:
  MarkerName_xxx.abi  e.g. D1S1609_SMMD.abi
 Format:
     Tab delimited text file
     1 individual per row
 Columns (for microsatellite based analysis):
     1  marker (same as in file name and matching a marker in an
      available marker set)
     2  custom information (content doesn’t matter, but column must be
      present)
     3  individual ID (match person ID in pedigree file)
     4 & 5  genotypes for 2 alleles (unknown=0)
Marker information


 Contains information on the chromosome and position of
  every marker
 Already available for a number of commercial SNP-
  arrays and for the microsatellite markers from
     Genethon
     Marshfield
     DeCode
 Custom marker sets can be created (see manual)
EasyLinkage settings


 Choose a program:
     FastLink  Parametric, single-point
     SuperLink  Parametric, single-/multipoint
     SPLink  Nonparametric, single-point
     Genehunter  Nonpara-/parametric, single-/multipoint
     Genehunter Plus  Nonpara-/parametric, single-/multipoint
     Genehunter MOD  Nonpara-/parametric, single-/multipoint
     Genehunter Imprinting  Nonpara-/parametric, single-/multipoint
     GeneHunter TwoLocus  Parametric, two-locus, single-/multipoint
     Merlin  Nonpara-/parametric, single-/multipoint
     SimWalk  Nonparametric, single-/multipoint
     Allegro  Nonpara-/parametric, single-/multipoint & simulation, single-/multipoint
     PedCheck  Mendelian error check
     FastSLink  Simulation, single-/multi-point
EasyLinkage settings


   Parametric <-> non-parametric
   Single point <-> multipoint
   Frequency of the disease allele
   Penetrance vectors (wt/wt, wt/mt, mt/mt)
       Standard dominant: 0 1 1
       Standard recessive: 0 0 1
       Reduced penetrance: replace 1 by penetrance (e.g. 0.9)
       Phenocopy: replace 0 by the phenocopy rate (e.g. 0.1)
       Example: 0.01 0.9 0.99
        1% chance to show a similar phenotype despite a normal genotype
        90% chance to show the phenotype when 1 mutant allele (dominant
        with incomplete penetrance)
        99% likelihood to present with the phenotype if both alleles are mutant
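The replacement rules for penetrance vectors can be sketched as a small helper. The function name and its arguments are hypothetical and not part of EasyLinkage; the helper simply replaces every 1 by the penetrance and every 0 by the phenocopy rate.

```python
def penetrance_vector(mode, penetrance=1.0, phenocopy=0.0):
    """Return the penetrances for (wt/wt, wt/mt, mt/mt) of a simple model."""
    standard = {"dominant": (0, 1, 1), "recessive": (0, 0, 1)}
    if mode not in standard:
        raise ValueError("mode must be 'dominant' or 'recessive'")
    if not (0.0 <= penetrance <= 1.0 and 0.0 <= phenocopy <= 1.0):
        raise ValueError("penetrance and phenocopy must lie between 0 and 1")
    # Replace every 1 by the penetrance and every 0 by the phenocopy rate.
    return tuple(penetrance if value == 1 else phenocopy
                 for value in standard[mode])


print(penetrance_vector("dominant"))
print(penetrance_vector("recessive", penetrance=0.9))
print(penetrance_vector("dominant", penetrance=0.9, phenocopy=0.01))
```

A vector such as 0.01 0.9 0.99, in which heterozygous and homozygous mutant genotypes have different penetrances, cannot be produced by this simple rule and has to be entered by hand.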
Evaluate calculated LOD-scores


 Maximum LOD-scores can be seen in EasyLinkage
 Details about LOD-scores at different recombination
  fractions can be found in text files generated by
  EasyLinkage  process in Excel (generate graphs, ...)
 Standard rules for LOD-scores
       >3  significant linkage
       2<LOD<3  suggestive linkage
       -2<LOD<2  uninformative
       <-2  significant absence of linkage
Interpreting LOD plots
 (Figure: four example plots of the LOD score (y-axis, -5 to 5) against
  the recombination fraction (x-axis, 0 to 0.5))
Exercise – Part 3


 Generate one pedigree file containing all family
  members of both families (use global IDs)
 Generate a genotype file for each of the tested markers
 Run SuperLink analysis with the right settings
 Evaluate results
Exercise – Part 3 > Results
Strengthen the evidence


 Analyze more family members
 Analyze more families
 Analyze flanking markers
       Look for more informative markers that result in higher LOD-scores
       A series of flanking markers allows for multipoint linkage analysis
       A series of linked markers gives more confidence (subjective)
       Flanking markers can also be used to fine-map the linkage interval
Determine the linkage interval




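 (Figure: two marker maps with markers scored L (linked) or NL (not linked);
  the stretches between an L and an NL marker are marked with ?, and the
  resulting candidate region is indicated)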
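Delineating the interval from a set of ordered markers can be sketched as follows: the linked markers form a run, and the nearest non-linked marker on each side bounds the candidate region. Marker names, positions and statuses below are invented for illustration.

```python
def linkage_interval(markers):
    """Return the non-linked markers flanking the run of linked markers.

    markers: list of (name, position_mb, status) sorted by position,
             with status "L" (linked) or "NL" (not linked).
    Returns (left_flank, right_flank); a flank is None if no non-linked
    marker was typed on that side. Returns None if nothing is linked.
    """
    linked = [i for i, marker in enumerate(markers) if marker[2] == "L"]
    if not linked:
        return None
    first, last = linked[0], linked[-1]
    if any(markers[i][2] != "L" for i in range(first, last + 1)):
        raise ValueError("linked markers are interrupted by a non-linked one")
    left = markers[first - 1] if first > 0 else None
    right = markers[last + 1] if last + 1 < len(markers) else None
    return left, right


# Hypothetical markers, ordered by position.
markers = [
    ("M1", 10.0, "NL"),
    ("M2", 12.0, "L"),
    ("M3", 14.0, "L"),
    ("M4", 16.0, "L"),
    ("M5", 18.0, "NL"),
]
left, right = linkage_interval(markers)
print(left[0], left[1], "-", right[0], right[1])
```

Typing extra markers between a linked marker and its non-linked neighbour narrows the interval further.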
Exercise 2: find the linkage interval
Post linkage


 Create a list of all the genes within the linkage interval
     NCBI map viewer
     UCSC (also for non-coding RNAs)
 Evaluate known gene functions for relevance to the
  investigated phenotype
 Sequence genes
     Start with those that seem the most relevant to the disorder
     Start with the coding regions
     Screen the entire region with capture sequencing
 Finding a mutation and proving its causality is the
  ultimate proof

				