Zebra Finch Seg Dup Analysis by ccf65261


									Zebra Finch Seg Dup Analysis
 1. Genome
 2. Parameters for Pipeline
 3. Analysis
                   Zebra Finch Genome
•   The genome (Jul. 2008 assembly of the zebra finch genome, taeGut1,
    WUSTL v3.2.4) was downloaded from UCSC. This assembly was produced by
    the Genome Sequencing Center at the Washington University in St. Louis
    (WUSTL) School of Medicine.

•   The zebra finch DNA used for the shotgun sequencing and the BAC and
    cosmid libraries was derived from a single male domesticated zebra finch.
    The initial assembly was generated using PCAP with approximately 6X
    coverage. About 1.0 Gb of the 1.2-Gb genome has been ordered and
    oriented along 33 chromosomes and one linkage group. The chromosome
    names are based on their homologous chromosomes in the chicken (Gallus
    gallus).

•   Total genome size (gapped)      1,233,186,341 bp
          Seg Dup detection pipelines

• WGAC (whole-genome assembly comparison): detects Seg Dup in
  genomic assemblies by looking for homologous pairs (>1 kb in
  length, >90% identity).

• WSSD (whole-genome shotgun sequence detection): detects Seg Dup
  in given sequences based on the depth of coverage of WGS
  (whole-genome shotgun) reads. A region is called when its depth
  of coverage > average + 3 SD. Done by Ginger Cheng.
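
The two detection criteria above can be sketched in a few lines. This is only an illustration, not the pipeline's code: the function names are hypothetical, identity is assumed to be a fraction between 0 and 1, and the average and SD are assumed to be computed over the same depth windows being tested (the slides do not say which regions supply them, nor whether a sample or population SD is used).

```python
from statistics import mean, pstdev


def passes_wgac_filter(length_bp, identity, min_len=1000, min_identity=0.90):
    """WGAC criterion from the slide: >1 kb in length and >90% identity."""
    return length_bp > min_len and identity > min_identity


def wssd_threshold(depths):
    """WSSD cutoff from the slide: average depth + 3 SD.

    Assumption: population SD over the supplied windows.
    """
    return mean(depths) + 3 * pstdev(depths)


def wssd_windows(depths):
    """Indices of windows whose depth exceeds the average + 3 SD cutoff."""
    cutoff = wssd_threshold(depths)
    return [i for i, d in enumerate(depths) if d > cutoff]
```

For example, `wssd_windows([10] * 20 + [100])` flags only the last window, since its depth lies far above the cutoff.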
     Parameters and notes for WGAC pipeline


• Repeats
   – The sequences downloaded from UCSC were already soft-masked.
       • UCSC rmsk options: RepeatMasker -align -s -species 'Taeniopygia guttata'
   – The repeat coordinates were reverse-generated from the soft-masked
     sequences.


• Blast parsing seeds in WGAC pipeline:
   – the seed size is 250 bp.
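
Reverse-generating repeat coordinates from soft-masked sequence, as described above, amounts to finding runs of lowercase bases. The sketch below is a minimal illustration and not the script used in this analysis: the function name is hypothetical, and the 0-based, half-open coordinate convention is an assumption.

```python
import re


def softmasked_intervals(seq):
    """Return (start, end) intervals of lowercase (soft-masked) runs.

    Assumption: coordinates are 0-based and half-open, as in BED files.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"[a-z]+", seq)]
```

For example, `softmasked_intervals("ACGTacgtNNNNacGT")` returns `[(4, 8), (12, 14)]`. Lowercase `n` would also be reported as masked, so gap handling needs to be decided separately.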
                    Result from WGAC Pipeline
• Total pairs of WGAC detected
        (>1 kb and >90% identity)                                      198180
• Inter chromosome pairs                                                81415
• Intra chromosome pairs                                               116742
• Chromosome inter and intra
     (excluding chr_random and chrUn)                                    26510
• ChrUn inter and intra                                              172670
• Total WGAC NR (bp)                                            384,501,909
• Total genome size (with gap)                                1,233,186,341
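
The slides do not define how the NR (non-redundant) total is computed. A common reading, assumed here, is that each base covered by any WGAC alignment is counted once, that is, the size of the union of all alignment intervals per chromosome. The function name and the (chromosome, start, end) tuple layout with half-open coordinates are hypothetical.

```python
def nonredundant_bp(intervals):
    """Count bases covered by at least one interval, each base once.

    intervals: iterable of (chrom, start, end), 0-based half-open (assumed).
    """
    total = 0
    cur_chrom = cur_start = cur_end = None
    for chrom, start, end in sorted(intervals):
        if chrom == cur_chrom and start <= cur_end:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
        else:
            if cur_chrom is not None:
                total += cur_end - cur_start
            cur_chrom, cur_start, cur_end = chrom, start, end
    if cur_chrom is not None:
        total += cur_end - cur_start
    return total
```

For example, two overlapping intervals `("chr1", 0, 100)` and `("chr1", 50, 150)` plus `("chr2", 0, 10)` give 160 bp, not 210 bp.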

Notes:
•   The NR space of WGAC is about 31% of the zebra finch genome, which is too high. This is
    due either to incomplete repeat masking or to redundant sequences in chr_random
    and chrUn. In 87% of the total WGAC pairs (inter and intra), at least one sequence
    of the pair is on chrUn. This indicates that a large portion of the false-positive
    WGAC pairs comes from chrUn.
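
The two percentages in the note follow directly from the totals reported in the result table. The short check below uses only those figures; the variable names are illustrative.

```python
# Figures taken from the WGAC result table.
total_nr_bp = 384_501_909    # Total WGAC NR (bp)
genome_bp = 1_233_186_341    # Total genome size (with gap)
total_pairs = 198_180        # Total pairs of WGAC detected
chrun_pairs = 172_670        # ChrUn inter and intra

nr_fraction = total_nr_bp / genome_bp
chrun_fraction = chrun_pairs / total_pairs

print(f"NR space / genome: {nr_fraction:.1%}")    # 31.2%
print(f"Pairs involving chrUn: {chrun_fraction:.1%}")  # 87.1%
```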
          General analysis of WGAC length and identity distribution
1.        Length distribution peaked at 1-2 kb, intra > inter, with 87% of WGAC related to chrUn.
2.        Identity distribution peaked at 97-98%. Few are higher than 99%.
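
Distributions like these can be built by summing aligned bases per bin. The sketch below is illustrative only: the function name is hypothetical, identity is assumed to be given in percent, and bins are assumed to be labelled by their lower edge (whole kb for length, whole percent for identity), which may differ from the binning used in the plots.

```python
import math
from collections import Counter


def bp_by_bin(pairs):
    """Sum alignment length (bp) per length bin and per identity bin.

    pairs: iterable of (length_bp, identity_pct).
    Bins are labelled by lower edge: whole kb and whole percent (assumed).
    """
    by_length = Counter()
    by_identity = Counter()
    for length_bp, identity_pct in pairs:
        by_length[length_bp // 1000] += length_bp
        by_identity[math.floor(identity_pct)] += length_bp
    return by_length, by_identity
```

Running the function separately on the inter- and intra-chromosomal pairs gives the two series compared in each panel.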




[Figure, two panels. Left: "WGAC length distribution"; x-axis: WGAC Length (bp), bins from 1 kb to 50 kb; y-axis: Total (bp); series: interlen, intralen. Right: "WGAC identity distribution"; x-axis: Identity, 90.00% to 100.00%; y-axis: Total (bp); series: inter, intra.]
General analysis: NR distribution by chromosome, high SD in chrUn

[Figure, two panels: "Non-redundant WGAC length distribution on chromosome" and "Percentage of non-redundant WGAC on chromosome"; axes: Chromosome (one entry per chromosome, including the _random, linkage-group, chrZ, and chrUn sequences) and Total (bp); series: both, inter, intra.]
                                                                                                                                                                                                                                 chr6_random                                                                                                                                                                         chr7
                                                                                                                                                                                                                                 chr6                                                                                                                                                                                chr6_random
                                                                                                                                                                                                                                 chr5_random                                                                                                                                                                         chr6
                                                                                                                                                                                                                                 chr5                                                                                                                                                                                chr5_random
                                                                                                                                                                                                                                 chr4_random                                                                                                                                                                         chr5
                                                                                                                                                                                                                                 chr4A_random                                                                                                                                                                        chr4_random
                                                                                                                                                                                                                                 chr4A                                                                                                                                                                               chr4A_random
                                                                                                                                                                                                                                 chr4                                                                                                                                                                                chr4A
                                                                                                                                                                                                                                 chr3_random                                                                                                                                                                         chr4
                                                                                                                                                                                                                                 chr3                                                                                                                                                                                chr3_random
                                                                                                                                                                                                                                 chr2_random                                                                                                                                                                         chr3
                                                                                                                                                                                                                                 chr2                                                                                                                                                                                chr2_random
                                                                                                                                                                                                                                 chr1B_random                                                                                                                                                                        chr2
                                                                                                                                                                                                                                 chr1B                                                                                                                                                                               chr1B_random
                                                                                                                                                                                                                                 chr1A_random                                                                                                                                                                        chr1B
                                                                                                                                                                                                                                 chr1A                                                                                                                                                                               chr1A_random
                                                                                                                                                                                                                                 chr1_random                                                                                                                                                                         chr1A
                                                                                                                                                                                                                                 chr1
                                                                                                                                                                                                                                                                                                                                                                                                                     chr1_random




                                                                                                                                                                                                                             0
                                                                                                                                                                                 80000000

                                                                                                                                                                                            60000000

                                                                                                                                                                                                       40000000

                                                                                                                                                                                                                  20000000
                                                                                                                           160000000

                                                                                                                                         140000000

                                                                                                                                                       120000000

                                                                                                                                                                     100000000
                                                                                                                                                                                                                                                                                                                                                                                                                     chr1




                                                                                                                                                                                                                                                                                                                            90.00%
                                                                                                                                                                                                                                                                                                                                     80.00%
                                                                                                                                                                                                                                                                                                                                              70.00%
                                                                                                                                                                                                                                                                                                                                                       60.00%
                                                                                                                                                                                                                                                                                                                                                                50.00%
                                                                                                                                                                                                                                                                                                                                                                         40.00%
                                                                                                                                                                                                                                                                                                                                                                                  30.00%
                                                                                                                                                                                                                                                                                                                                                                                           20.00%
                                                                                                                                                                                                                                                                                                                                                                                                    10.00%
                                                                                                                                                                                                                                                                                                                                                                                                             0.00%
                                                                                                                                                                                                                                                                                                                  100.00%
                                                                                                                                                          chromosome                                                                                                                                                                           percent (%)
Global images show the inter- and intrachromosomal pairs of at least 10 kb and above 90% identity, without and with chrUn.
     Red indicates the interchromosomal pairs and blue indicates the intrachromosomal pairs.




    Without chrUn
                                                                    With chrUn
              WGAC page
• http://eichlerlab.gs.washington.edu/help/linchen/zfinch/zfinch_wgac.html
                WSSD analysis done by Ginger
            http://eichlerlab.gs.washington.edu/help/ginger/zebrafinch/

•   Downloaded the WGS reads (about 11,683,735 reads) from the Trace Archive
    at NCBI.

•   Downloaded the finished zebra finch BACs. These BACs are used to determine
    the threshold for WGS depth of coverage. For a 5-kb window, the average
    number of reads is 59. The threshold is 110 reads for a 5-kb window and
    22 reads for a 1-kb window.

•   Used UCSC taeGut1 database rmsk tables as input to mask the genome for
    repeats with divergence <=10%.

    (UCSC rmsk options: RepeatMasker -align -s -species 'Taeniopygia guttata')
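
The WSSD rule (depth of coverage > average + 3 SD) can be illustrated with a minimal sketch. It assumes the per-window read counts on the finished BACs are already available as a list; the counts below are made up for illustration, and `depth_threshold` and `flag_windows` are hypothetical helpers, not part of the actual WSSD pipeline.

```python
from statistics import mean, pstdev


def depth_threshold(window_counts, n_sd=3.0):
    """Return mean + n_sd * SD of the read counts per window."""
    return mean(window_counts) + n_sd * pstdev(window_counts)


def flag_windows(window_counts, threshold):
    """Return indices of windows whose read count exceeds the threshold."""
    return [i for i, count in enumerate(window_counts) if count > threshold]


# Illustrative counts only; these are NOT data from the zebra finch BACs.
control_counts = [50, 55, 59, 60, 62, 68]
threshold = depth_threshold(control_counts)

# Windows from a region under test (also illustrative).
test_counts = [58, 61, 140, 57]
flagged = flag_windows(test_counts, threshold)
print(f"threshold = {threshold:.1f} reads, flagged windows = {flagged}")
```

If the 5-kb threshold of 110 reads was derived by this rule from the average of 59 reads, the implied SD would be (110 - 59) / 3 = 17 reads per window.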
                               WSSD results

• A total of 16,076 regions covering 44,218,871 bp were found in
  wssdGE10K_nogap.tab (which has a 10-kb cut-off). 13,782 of them
  are on chrUn.

• A summary table of the intersection of WGAC with WSSD is at
  http://eichlerlab.gs.washington.edu/help/linchen/zfinch/data/wgacCMPwssd.out.xls
General view showing WGAC (>5kb) and WSSD on all chromosomes




                                            Grey above the lines is WSSD
                                            Brown below the lines is WGAC
            Union of WSSD and WGAC
           gene intersect with Seg Dups

• A nonredundant union of WGAC and WSSD is generated with the cut-
  off size at 10 kb (AllDup10kb.tab). There are 3,839 nonredundant (NR)
  regions covering 50,902,487 bp, which is about 6.7 Mb more than WSSD
  alone (44,218,871 bp).

• However, be aware that there may be false-positive sites, especially
  on chrUn, since we know the number of false-positive WGACs is high
  on both the chromosomes and chrUn.
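
A nonredundant union of two interval sets can be sketched as below. This is a minimal illustration, not the code that produced AllDup10kb.tab: it assumes half-open (chrom, start, end) intervals, assumes the 10-kb cut-off is applied after merging (the slides do not say at which step it is applied), and uses made-up coordinates. The function name `nonredundant_union` is hypothetical.

```python
def nonredundant_union(wgac, wssd, min_size=10_000):
    """Merge overlapping intervals from both sets, then keep regions >= min_size.

    Intervals are half-open (chrom, start, end) tuples.
    Assumption: the size cut-off is applied to the merged regions.
    """
    merged = []
    for chrom, start, end in sorted(wgac + wssd):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or touches the previous region: extend it.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [(c, s, e) for c, s, e in merged if e - s >= min_size]


# Illustrative coordinates only.
wgac = [("chr1", 0, 8_000), ("chr2", 100, 5_100)]
wssd = [("chr1", 6_000, 12_000), ("chrUn", 0, 15_000)]

regions = nonredundant_union(wgac, wssd)
total_bp = sum(end - start for _, start, end in regions)
print(len(regions), "NR regions,", total_bp, "bp")
```

In this toy input the two chr1 intervals merge into one 12-kb region, and the 5-kb chr2 interval is dropped by the cut-off.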
                               Summary table 1

                                total            chrN         chrUn  No. nr intervals   file
wssd (bp)                  44,218,871      11,237,985    35,080,886               729   wssdGE10K_nogap.tab
wgac (bp)                 384,501,909     232,493,308   152,008,601             7,387   oo.weild10kb.join.all.cull
AllDup (bp)               394,988,746     235,022,961   159,965,785             5,934   allDUP
Wssd and Wgac shared        8,195,577       3,182,128     5,013,449
Genome (bp)             1,233,186,341   1,057,961,026   175,225,315
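
To put the totals in Summary table 1 in proportion, the sketch below divides each total by the gapped genome size. The numbers are copied from the "total" column of the table; the dictionary and variable names are illustrative only.

```python
# Totals (bp) copied from the "total" column of Summary table 1.
GENOME_BP = 1_233_186_341

totals_bp = {
    "wssd": 44_218_871,
    "wgac": 384_501_909,
    "AllDup": 394_988_746,
    "shared": 8_195_577,
}

# Fraction of the gapped genome covered by each set.
fractions = {name: bp / GENOME_BP for name, bp in totals_bp.items()}

for name, fraction in fractions.items():
    print(f"{name:8s} {fraction:7.2%}")
```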
            Large SDs >=10 kb


• SDs >=10 kb in size were pulled out. There are a total of
  3,839 intervals covering 50,902,487 bp in allDup.tab.
The study of the chromosome-only
              WGAC
• Segmental duplications on sequences assigned to
  chromosomes should be more reliable, with
  fewer artifacts.

• These sequences should reflect the best of the
  assembly.
• Total Dup length 105,145,288 bp
• Intra Dup length 100,234,309 bp
• Inter Dup length   8,499,428 bp

• Most of the duplication at >90% identity is intrachromosomal.
• These intrachromosomal duplications are predominantly short
  range; see the global view on the next slide.
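
The inter/intra split above can be sketched as a simple classification of WGAC pairs. This is an illustration only: the pair layout (chrom1, start1, end1, chrom2, start2, end2) is assumed, the coordinates are made up, and the 1-Mb cut-off for "short range" is a hypothetical value, since the slides do not define it.

```python
def classify_pair(pair, short_range_bp=1_000_000):
    """Label a pair as 'inter', 'intra_short' or 'intra_long'.

    short_range_bp is a hypothetical cut-off; the slides give no value.
    """
    chrom1, start1, end1, chrom2, start2, end2 = pair
    if chrom1 != chrom2:
        return "inter"
    # Distance between the two copies; 0 if they touch or overlap.
    distance = max(0, max(start1, start2) - min(end1, end2))
    return "intra_short" if distance <= short_range_bp else "intra_long"


# Illustrative pairs only.
pairs = [
    ("chr1", 1_000, 6_000, "chr1", 50_000, 55_000),
    ("chr1", 1_000, 6_000, "chr1", 5_000_000, 5_005_000),
    ("chr1", 1_000, 6_000, "chr2", 1_000, 6_000),
]
labels = [classify_pair(p) for p in pairs]
print(labels)
```

Note that the intra and inter lengths reported above sum to more than the total length (100,234,309 + 8,499,428 > 105,145,288), presumably because some bases take part in both kinds of pair and are counted once in the total.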
 Global views at 90%-5k and 94%-5k, respectively, showing that a significant
number of WGAC pairs are short-range intrachromosomal duplications.
Blowup view showing WGAC on chromosome 1 at 5k
and 94%. This is WGAC detected on sequences assigned
                  to chromosomes only
Detail of a sample region on chr1

Figure labels:
    Intra chromosome homology pairs
    WSSD
    Assembly gaps
    Grey: depth of coverage by reads
    Color: the average identity for the reads mapped to the region
        Red > 99%
        Orange > 98%
        Yellow > 97%
        Green > 96%
                     Text description for slide 20

•   Each black line represents a chromosome region, as indicated by the ticks.
•   Blue bars and pairs are the intrachromosomal homologous pairs (segmental
    duplications) found.
•   Red bars and pairs on the chromosome line represent the interchromosomal
    homologous pairs (interchromosomal segmental duplications).
•   The grey bars under the chromosome line represent the depth of coverage
    of the region by WGS reads in 1-kb windows. The longer the bar, the
    higher the depth of coverage by sequence reads.
•   The color bar under the chromosome line represents the average identity of
    all the reads mapped to the region: red (>99%), orange (>98%),
    yellow (>97%), green (>96%).
•   The black bar above the chromosome line represents detected WSSD.
•   The purple vertical lines on the chromosome line represent the assembly gaps.
•   Each tick represents 10,000 bp; each line is 100 kb.
                        Result
• Most of the intrachromosomal pairs are very close to
  each other. In most cases, one sequence of the pair
  has gaps on both ends, which suggests that the contig is
  not physically connected to its adjacent sequences and
  was placed at its current position by mate pairs.
• Some of them are also next to each other, separated by
  a gap.
• In the sampled regions, we have not seen a single contig
  that contains both sequences of an intrachromosomal
  segmental duplication pair.
• Given the observations above, we think there is a high
  possibility that these pairs are assembly artifacts
  introduced by the assembler.
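
The contig check described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: it assumes half-open (start, end) coordinates on one chromosome for both the copies and the assembly gaps, uses made-up coordinates, and the function names are hypothetical. Two copies lie in a single contig only if no gap overlaps the span that covers both.

```python
def gaps_between(copy_a, copy_b, gaps):
    """Return the gaps that overlap the span covering both copies."""
    span_start = min(copy_a[0], copy_b[0])
    span_end = max(copy_a[1], copy_b[1])
    return [(s, e) for s, e in gaps if s < span_end and e > span_start]


def in_single_contig(copy_a, copy_b, gaps):
    """True if no assembly gap separates or interrupts the two copies."""
    return not gaps_between(copy_a, copy_b, gaps)


# Illustrative coordinates only.
gaps = [(9_000, 9_100), (20_000, 20_500)]

# Two copies next to each other, separated by a gap.
separated = in_single_contig((1_000, 8_000), (9_100, 15_000), gaps)

# Two copies with no gap anywhere between them.
together = in_single_contig((10_000, 12_000), (13_000, 19_000), gaps)

print(separated, together)
```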

								