
Divergence between closely related EST Sequences in Lettuce and

Thesis capstone report

DNA sequence evolution in Sunflower and Lettuce
Yi Zou
Advisors: Dr. Loren Rieseberg, Dr. Sun Kim
Major: Bioinformatics
07/16/2004

Background
• Sunflower and lettuce represent two major subfamilies of the Compositae, one of the largest and most diverse families of flowering plants.
• Sunflower is an important oilseed crop; domestication and breeding have focused on seed traits.
• Lettuce is an important leaf crop; domestication and breeding have focused on leaf traits.
• An extensive lettuce and sunflower EST database is available (CGPDB).

Background
• Examination of DNA differences between closely related species of Compositae will provide insight into the nature of mutational rates and processes in this family.
• Hypothesis: genes associated with primary domestication traits (seeds in sunflower and leaves in lettuce) will evolve faster than genes expressed in other tissues.
• Hypothesis: upstream enzymes in metabolic pathways will evolve less rapidly than downstream enzymes.

Goals
• Compare the distribution of indels and base substitutions among closely related lettuce and sunflower EST sequences
• Compare rates of EST sequence divergence for genes from different tissue types
• Compare rates of EST sequence divergence among different pathways, and protein evolution among specific genes along major metabolic pathways

Data

http://cgpdb.ucdavis.edu

CGPDB
• Contains about 112,000 individual ESTs sequenced from both sunflower and lettuce
• Sunflower: about 44,000 individual ESTs, previously assembled into 4430 unique contigs, were sequenced from two Helianthus annuus cultivars: RHA801 (exotic) and RHA280 (oil).
• Lettuce: around 68,000 ESTs, previously assembled into 8179 unique contigs (genes), were sequenced from two species: Lactuca serriola (wild) and L. sativa (cultivated).


Data Analysis – Example from sunflower

[Figure: sunflower example comparing Genotype 1 and Genotype 2]

Data Analysis
– Comparison of complete EST sequences

Data Analysis
– Comparison of coding regions only

[Figure labels: sun_vs_ath_TIGR_unique, lettuce_vs_ath_TIGR_unique]
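The slides do not show how differences were tallied, so the following is only a minimal sketch of the kind of counting these comparisons imply. It assumes two already-aligned sequences of equal length with "-" as the gap character. The function name and the returned keys are hypothetical, and normalization into percentage rates is left out because the slides do not state the denominator.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq1, seq2, gap="-"):
    """Count substitutions and indels between two aligned sequences.

    Assumption: seq1 and seq2 come from a pairwise alignment, so they have
    the same length and use `gap` for alignment gaps.
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    counts = {"transitions": 0, "transversions": 0,
              "indel_events": 0, "indel_bases": 0}
    gap_side = 0  # 0 = not in a gap, 1 = gap open in seq1, 2 = gap open in seq2
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == gap and b == gap:
            continue  # column carries no information for this pair
        if a == gap or b == gap:
            side = 1 if a == gap else 2
            counts["indel_bases"] += 1
            if side != gap_side:  # a new run of gaps starts a new indel event
                counts["indel_events"] += 1
            gap_side = side
            continue
        gap_side = 0
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # identical bases, or ambiguous codes such as N
        same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
        counts["transitions" if same_class else "transversions"] += 1
    counts["substitutions"] = counts["transitions"] + counts["transversions"]
    return counts


if __name__ == "__main__":
    print(count_differences("ACGTTAGC--AC", "ACATT-GCTTAG"))
```

Counting indel events and indel bases separately mirrors the two denominators used in the results (substitutions per indel and substitutions per indel base length).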

Result – Sequence information for assembling contigs and consensus
[Figure: two bar charts, "Sunflower EST comparison information" (comparisons S1–S4; 4430 total contigs) and "Lettuce EST comparison information" (comparisons L1–L4; 8179 total contigs). x-axis: EST No. from both genotypes; y-axis: contig numbers. Series: total contigs, contigs with ESTs from two genotypes, contigs with coding region, contigs without indels, contigs without substitutions.]

Result
– Comparison of complete EST sequences between two sunflower and two lettuce genotypes

Species                                   Sunflower                           Lettuce
Comparison name                           S1       S2       S3       S4       L1       L2       L3       L4
Substitutions / total indel base length   2.97     2.79     2.64     2.58     2.91     2.53     2.37     2.25
Substitutions / total indels              6.91     5.88     5.35     5.21     7.15     5.93     5.44     5.05
Transitions / substitutions               48.56%   47.42%   47.21%   47.07%   48.50%   47.98%   47.52%   47.14%
Transversions / substitutions             51.44%   52.58%   52.79%   52.93%   51.50%   52.02%   52.48%   52.86%
Transitions / transversions               0.94     0.90     0.89     0.89     0.94     0.92     0.91     0.89

Result
– Comparison of coding region only

Species                                   Sunflower                                       Lettuce
Comparison name                           S1          S2          S3          S4          L1          L2          L3          L4
IndelNoRate                               1.66%       1.68%       1.86%       1.92%       1.11%       0.88%       0.88%       0.91%
SubRate                                   9.97%       7.69%       7.12%       7.41%       8.87%       5.75%       5.05%       4.48%
SubRate / IndelNoRate                     6.01        4.59        3.82        3.85        8.00        6.52        5.73        4.92
p-value (SubRate vs IndelNoRate)          < 2.2e-16   < 2.2e-16   < 2.2e-16   < 2.2e-16   < 2.2e-16   < 2.2e-16   < 2.2e-16   < 2.2e-16
SubRate / IndelLengthRate                 2.83        2.34        2.02        2.09        3.74        3.49        3.26        3.01
p-value (SubRate vs IndelLengthRate)      < 2.2e-16   < 0.0001    < 0.0001    < 0.0001    < 2.2e-16   < 2.2e-16   < 2.2e-16   < 2.2e-16
TrsRate                                   4.84%       3.62%       3.32%       3.39%       4.32%       2.77%       2.40%       2.09%
TrvRate                                   5.13%       4.07%       3.80%       4.02%       4.55%       2.98%       2.65%       2.39%
Trs vs Trv                                0.94        0.89        0.87        0.84        0.95        0.93        0.91        0.87
p-value (TrsRate vs TrvRate)              0.2543      0.1156      0.1391      0.09167     0.1843      0.2948      0.263       0.1784

Conclusion 1
– Substitutions are 3-6 times more frequent than indels in both sunflower and lettuce, regardless of whether coding regions or complete EST sequences are analyzed.


Data Analysis – EST divergence for different tissue types
Lettuce:
• TAG0 - callus - "cls"
• TAG1 - roots - "rot"
• TAG2 - none (leaf) - "not"
• TAG3 - flowers pre-fert - "flr"
• TAG4 - flowers post-fert - "flo"
• TAG5 - chemical induction - "chi"
• TAG6 - none - "nos"
• TAG7 - roots env stress - "rts"
• TAG8 - shoots env stress - "shs"
• TAG9 - germinating seeds - "gsd"
• TAG10 - flowers env stress - "fls"
• TAG11 - leaves dark grown - "lvd"
• Tag_1_7: all root-related contigs
• Tag_3_4_10: all flower-related contigs
• Tag_7_8_10: all contigs related to environmental stress

Sunflower:
• TAG0 - callus - "cls"
• TAG1 - roots - "rot"
• TAG2 - disk & ray flowers - "drf"
• TAG3 - flowers pre-fert - "flr"
• TAG4 - developing kernel - "dkn"
• TAG5 - chemical induction - "chi"
• TAG6 - none - "nos"
• TAG7 - roots env stress - "rts"
• TAG8 - shoots env stress - "shs"
• TAG9 - germinating seeds - "gsd"
• TAG10 - flowers env stress - "fls"
• TAG11 - hulls - "hls"
• Tag_1_7: all root-related contigs
• Tag_3_10: all flower-related contigs
• Tag_7_8_10: all contigs related to environmental stress
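The pooled groups (Tag_1_7, Tag_3_4_10 or Tag_3_10, Tag_7_8_10) can be represented as sets of TAG numbers. The sketch below assumes that a contig counts as specific to a group when every EST in it comes from a library in that group. The slides do not state this criterion, so it and the function name are assumptions.

```python
# Pooled tag groups as listed on the slides.
LETTUCE_GROUPS = {
    "Tag_1_7": {1, 7},         # root related
    "Tag_3_4_10": {3, 4, 10},  # flower related
    "Tag_7_8_10": {7, 8, 10},  # environmental stress
}
SUNFLOWER_GROUPS = {
    "Tag_1_7": {1, 7},
    "Tag_3_10": {3, 10},
    "Tag_7_8_10": {7, 8, 10},
}


def specific_groups(est_tags, groups):
    """Return the groups that contain every TAG seen among a contig's ESTs.

    Assumption: `est_tags` is an iterable of TAG numbers, one per EST.
    """
    tags = set(est_tags)
    if not tags:
        return []
    return sorted(name for name, members in groups.items() if tags <= members)


if __name__ == "__main__":
    print(specific_groups([1, 7, 7], LETTUCE_GROUPS))
    print(specific_groups([3, 4], SUNFLOWER_GROUPS))
```

Because TAG7 and TAG10 each belong to two groups, a contig can fall into more than one pooled category under this rule.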

Result
– Number of tissue-specific contigs in sunflower
[Figure: number of tissue-specific contigs in sunflower]

Result
– Number of tissue-specific contigs in lettuce
[Figure: bar chart "Lettuce TAG-specific contig information"; x-axis: Tissue and Treatment; legend: contigs with coding region found in both genotypes]

Result
– Rates of sequence divergence among tissue-specific contigs in sunflower and lettuce
[Figure: two bar charts, "Sunflower tissue specific EST divergence" and "Lettuce tissue specific EST divergence"; series: indel rate, substitution rate; x-axis: Tissue (TAG); y-axis: Content (%).
Sunflower tags: TAG4(kernel), TAG9(gsd), TAG11(hls), TAG8(shs), TAG0(cls), TAG_1_7(root), TAG_3_10(flower).
Lettuce tags: TAG9(gsd), TAG8(shs), TAG0(cls), TAG11(lvd), TAG_1_7(root), TAG_3_4_10(flower).]

Result
– Comparison of rates of sequence divergence for genes expressed in seeds versus other tissues
[Figure: bar chart of indel rate and substitution rate for DknHls (seeds) versus Non-DknHls (other) contigs; y-axis: Content (%)]
T-test, SubRateKH vs SubRateNonKH: P-value = 0.0009414
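The slide reports only the resulting p-value and does not say which variant of the t-test was used. As an illustration, the sketch below computes a two-sample Welch statistic (unequal variances) and its approximate degrees of freedom using the standard library only. The choice of Welch's test is an assumption, and the input numbers are toy values, not the study's substitution rates.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's two-sample t-test.

    Assumption: each sample holds per-contig rates for one group,
    with at least two values per group.
    """
    # Squared standard error of each group mean.
    se2_a = variance(sample_a) / len(sample_a)
    se2_b = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


if __name__ == "__main__":
    # Toy values for illustration only.
    t, df = welch_t([2, 4, 6], [1, 2, 3])
    print(round(t, 4), round(df, 4))
```

Turning the statistic into a p-value requires the t distribution, which a statistics package provides; it is omitted here to keep the sketch dependency-free.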

Result
– Rates of sequence divergence among treatment-specific contigs in sunflower and lettuce
[Figure: two bar charts, "Sunflower treatment specific EST divergence" and "Lettuce treatment specific EST divergence"; series: indel rate, substitution rate; x-axis: Treatment (TAG); y-axis: Content (%).
Sunflower tags: TAG1(rot), TAG7(rts), TAG_1_7(root), TAG5(chi), TAG_7_8_10(stress), TAG3(flr), TAG10(fls), TAG_3_10(flower).
Lettuce tags: TAG1(rot), TAG_1_7(root), TAG5(chi), TAG7(rts), TAG_3_4_10(flower), TAG_7_8_10(stress), TAG4(flo), TAG3(flr), TAG10(fls).]

Conclusion 2
– As predicted, sunflower genes expressed in seeds evolve significantly faster than genes expressed in other tissues. Artificial selection for large seeds and high seed oil content may contribute to these higher rates.

– For lettuce, there are no significant differences in rates of sequence evolution among different tissues
– No differences were found in sunflower or lettuce among biotic and abiotic stress treatments


Data Analysis
– EST divergence among metabolic pathways

• To identify contigs for specific pathways, metabolic pathway information from the TAIR database (The Arabidopsis Information Resource: http://www.arabidopsis.org/) was used.
• Each contig in the CGPDB was assigned to an Arabidopsis gene locus (or remained unassigned) based on the BLAST results.
• Genes (contigs) for different metabolic pathways were clustered and protein divergence was estimated.
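The assignment of contigs to Arabidopsis loci can be sketched as a best-hit selection over BLAST tabular output. The example below assumes the standard 12-column tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score). The E-value cutoff, the function name, and the contig and locus IDs are hypothetical, since the slides do not give the actual criteria.

```python
import csv
import io


def best_hits(blast_tab, max_evalue=1e-10):
    """Map each query contig to its highest-scoring subject.

    Assumptions: `blast_tab` is BLAST tabular text (12 tab-separated columns)
    and `max_evalue` is an illustrative cutoff, not the study's threshold.
    Contigs with no hit passing the cutoff stay unassigned (absent from the result).
    """
    best = {}
    for row in csv.reader(io.StringIO(blast_tab), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment headers
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


if __name__ == "__main__":
    # Hypothetical hits for illustration only.
    example = (
        "contigA\tAT1G00001.1\t85.0\t200\t30\t0\t1\t200\t5\t204\t1e-50\t190.0\n"
        "contigA\tAT2G00002.1\t70.0\t150\t45\t0\t1\t150\t10\t159\t1e-20\t95.0\n"
        "contigB\tAT3G00003.1\t60.0\t50\t20\t0\t1\t50\t1\t50\t0.5\t30.0\n"
    )
    print(best_hits(example))
```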

Data Analysis
– protein evolution along major metabolic pathways

• Metabolic pathways:
– Lipid metabolic pathways
– Phenylpropanoid biosynthetic pathways
– Cellulose, lignin, sucrose … etc. metabolic pathways

• The nonsynonymous substitution rate (Ka) was calculated for enzymes in different positions along pathways
• Software DnaSP 4.0 was utilized for this calculation
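To make the distinction behind Ka concrete, the sketch below classifies codon differences between two aligned coding sequences as synonymous or nonsynonymous using the standard genetic code. It is not the DnaSP calculation: a Ka estimate also divides nonsynonymous changes by the number of nonsynonymous sites and handles codons with several changes, which this simplified version just skips. The function name and example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_differences(cds1, cds2):
    """Count synonymous and nonsynonymous single-base codon differences.

    Assumption: cds1 and cds2 are gap-free, in frame, and of equal length.
    Codons with ambiguous bases or more than one changed base are skipped.
    """
    if len(cds1) != len(cds2) or len(cds1) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")
    counts = {"synonymous": 0, "nonsynonymous": 0, "skipped": 0}
    for i in range(0, len(cds1), 3):
        c1, c2 = cds1[i:i + 3].upper(), cds2[i:i + 3].upper()
        if c1 == c2:
            continue
        changed = sum(x != y for x, y in zip(c1, c2))
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE or changed != 1:
            counts["skipped"] += 1
            continue
        same_aa = CODON_TABLE[c1] == CODON_TABLE[c2]
        counts["synonymous" if same_aa else "nonsynonymous"] += 1
    return counts


if __name__ == "__main__":
    # GCT -> GCC keeps alanine; AAA -> AGA changes lysine to arginine.
    print(classify_codon_differences("ATGGCTAAA", "ATGGCCAGA"))
```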

Result – Comparison of metabolic pathway genes between CGPDB and TAIR

• Based on the BLAST results, the contigs in CGPDB were compared with genes in TAIR and assigned to the appropriate pathways.
• Currently there are 186 pathways with more than 800 unique reactions in the TAIR database. For these reactions, 1144 unique locus IDs were assigned to the enzymes involved.
• Among the TAIR loci, 72.1% match contigs in the sunflower database and 83.15% match contigs in the lettuce database.

Result
– Rates of sequence evolution for sunflower metabolic pathway-specific contigs

Pathway                                  Lipid      Lipid-Bio  Lipid-Deg  Lipid-Ls   Phenylpropanoid  Cellulose  Lignin     Starch     Sucrose
Pathway-related contigs                  134        59         16         30         24               7          12         9          11
Contigs with coding region found         51         28         6          8          10               4          6          2          4
Indel No. rate                           1.68%      1.46%      2.15%      3.15%      1.76%            1.27%      1.32%      1.70%      0.93%
Indel length rate                        3.40%      3.38%      2.68%      6.70%      4.34%            1.55%      3.34%      2.47%      1.32%
Substitution rate                        9.81%      10.69%     8.44%      15.71%     15.80%           6.03%      11.64%     4.96%      3.92%
Transition rate                          4.98%      5.49%      3.72%      8.13%      7.64%            3.22%      4.95%      3.12%      2.10%
Transitions / substitutions              0.51       0.51       0.44       0.52       0.48             0.53       0.43       0.63       0.54
Transversion rate                        4.83%      5.19%      4.71%      7.59%      8.16%            2.80%      6.69%      1.84%      1.82%
Transversions / substitutions            0.49       0.49       0.56       0.48       0.52             0.47       0.57       0.37       0.46
Substitution rate / indel rate           5.85       7.31       3.92       4.99       8.97             4.74       8.82       2.92       4.20
Substitution rate / indel length rate    2.89       3.16       3.15       2.35       3.64             3.89       3.49       2.01       2.97
Transitions / transversions              1.03       1.06       0.79       1.07       0.94             1.15       0.74       1.70       1.15

Result
– Rates of sequence evolution for lettuce metabolic pathway-specific contigs

Pathway                                  Lipid      Lipid-Bio  Lipid-Deg  Lipid-Ls   Phenylpropanoid  Cellulose  Lignin     Starch     Sucrose
Pathway-related contigs                  177        73         18         39         29               5          14         14         15
Contigs with coding region found         123        50         14         24         14               1          10         8          11
Indel No. rate                           1.14%      0.77%      1.43%      1.27%      1.33%            5.33%      1.31%      1.32%      1.11%
Indel length rate                        2.33%      1.47%      2.96%      3.08%      2.59%            14.05%     2.68%      1.89%      2.41%
Substitution rate                        7.46%      5.38%      11.18%     7.11%      11.64%           42.93% ?   11.88%     11.11%     7.63%
Transition rate                          3.26%      2.57%      5.56%      3.61%      5.32%            21.71%     5.92%      5.29%      3.29%
Transitions / substitutions              0.44       0.48       0.50       0.51       0.46             0.51       0.50       0.48       0.43
Transversion rate                        4.20%      2.82%      5.62%      3.51%      6.32%            21.22%     5.96%      5.82%      4.34%
Transversions / substitutions            0.56       0.52       0.50       0.49       0.54             0.49       0.50       0.52       0.57
Substitutions / total indel No.          6.57       6.96       7.81       5.61       8.75             8.05       9.06       8.39       6.87
Substitutions / total indel length       3.20       3.65       3.78       2.31       4.50             3.06       4.44       5.86       3.17
Transitions / transversions              0.78       0.91       0.99       1.03       0.84             1.02       0.99       0.91       0.76

Result
– Nonsynonymous substitution rate (Ka) for genes along four metabolic pathways in sunflower and lettuce
[Figure: four panels, "Lignin Pathway Ka", "Phenylpropanoid pathway Ka", "Sucrose Bio Pathway Ka", and "Glycerolipid pathway Ka"; x-axis: relative enzyme position; y-axis: Ka value; series: LetKa, SunKa]

Conclusion 3
• Rates of sequence divergence did not differ among metabolic pathways
• Rates of protein evolution (Ka) did not vary along metabolic pathways (i.e., upstream genes evolved at the same rate as downstream genes)

Summary
– Substitutions are much more frequent than indels in both sunflower and lettuce
– Sunflower genes expressed in seeds evolve significantly faster than genes expressed in other tissues
– There are no significant differences in rates of sequence evolution among different tissues in lettuce
– Rates of sequence divergence did not vary significantly either among or along metabolic pathways in either sunflower or lettuce

Acknowledgments
• Thanks
– Dr. Loren Rieseberg
– Dr. Sun Kim
– Dr. Sheri Church
– Dr. Zhao Lai


				