Comparative genomics analysis of NtcA regulons in cyanobacteria: Regulation of nitrogen assimilation and its coupling to photosynthesis
Zhengchang Su, Victor Olman, Fenglou Mao and Ying Xu
Wen-Ting Huang Jau-Chi Huang
Outline
• Introduction
• Method
• Result
• Conclusion
Introduction
• DNA transcription
[Figure: double-stranded DNA (5’ to 3’ and 3’ to 5’) with two coding regions separated by an intron]
• mRNA
[Figure: mRNA strand with labeled 5’ and 3’ ends]
Regulatory elements
[Figure: RNA polymerase at its binding site in the regulatory region; upstream and downstream are labeled relative to the transcription direction]
• cis-regulation
– Regulatory regions of genes and the regulated genes are on the same chromosome
• Phylogenetic footprinting
– Identifies regulatory elements by finding conserved regions in a set of orthologous non-coding DNA sequences from multiple species.
• Cyanobacteria
– Bacteria
– Live in water
– Gram-negative, oxygenic phototrophs
– Nitrogen control in cyanobacteria is mediated by NtcA
http://www.ucmp.berkeley.edu/bacteria/cyanointro.html
• NtcA
– A protein which regulates the assimilation of nitrogen.
• NtcA binding site
– Base motif “GTAN8TAC”
– ~14 bp
– Intron
– Nitrogen fixation related genes
– In many cases, a –10 σ70-like box “TAN3T” lies within the 31 bp downstream
High false positive rate
• The motif is too short to identify reliably on its own
• 3 methods to reduce false positives:
– Coding region
– –10 like box
– Orthologous genes
Materials
• Nine sequenced cyanobacterial genomes were downloaded from GenBank.
• ftp.ncbi.nih.gov/genomes/Bacteria/
Method
• Step 1:
– Prepare training sets
– Get the profiles (GTAN8TAC, TAN3T)
• Step 2:
– Scan genomic sequences and score each motif.
• Step 3:
– Decide the cutoff.
Known
• Possible NtcA binding sites (GTAN8TAC)
– Appear in the upstream intergenic regions
– In many cases, there is a –10 like box (TAN3T) in the 31 bp region downstream of the NtcA binding site
[Figure: upstream region between two transcription units, with a 31 bp span marked]
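The two known properties above can be sketched as a simple consensus scan. This is a minimal illustration, not the authors' method (they score sites with profiles rather than exact consensus matches). The function name `find_candidate_promoters` and the test sequence are hypothetical.

```python
import re

# Consensus patterns taken from the slides: GTAN8TAC and TAN3T (N = any base).
# The lookahead lets overlapping candidate sites be reported as well.
NTCA_SITE = re.compile(r"(?=(GTA[ACGT]{8}TAC))")
MINUS10_BOX = re.compile(r"TA[ACGT]{3}T")


def find_candidate_promoters(upstream, window=31):
    """Return (site_start, site, box_start) for each GTAN8TAC match.

    box_start is the position of the first TAN3T match lying entirely within
    the `window` bases downstream of the site, or None if there is no such match.
    """
    hits = []
    for match in NTCA_SITE.finditer(upstream):
        site = match.group(1)
        start = match.start(1)
        end = start + len(site)
        box = MINUS10_BOX.search(upstream[end:end + window])
        box_start = end + box.start() if box else None
        hits.append((start, site, box_start))
    return hits


# Hypothetical upstream sequence built for illustration only.
toy = "CC" + "GTA" + "A" * 8 + "TAC" + "G" * 10 + "TATAAT" + "CC"
print(find_candidate_promoters(toy))
```

Exact consensus matching like this is what produces the high false positive rate noted earlier, which is why the slides move on to profile-based scoring.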
Prepare training sets
• They chose 11 genes that are known to be regulated by NtcA from the nine cyanobacterial genomes.
• They used phylogenetic footprinting and identified 51 putative NtcA binding sites.
• These 51 sites constitute the training set A1 for the NtcA binding site.
• The 31 bp regions downstream of these sites were then searched for a –10 like box; the boxes found form the training set B1.
A1 & B1
A2 & B2
• They collected 12 experimentally verified NtcA binding sites and their downstream regions from seven other cyanobacteria.
• They also included the sites that phylogenetic footprinting failed to find.
Profiles
• They combined A1 and A2 to construct the profile of NtcA binding sites.
Profiles
• They combined B1 and B2 to construct the profile of –10 like boxes.
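As a rough sketch of what "constructing a profile" from a training set can look like, the snippet below builds per-position base frequencies from aligned sites of equal length. The pseudocount and the toy sites are assumptions for illustration; the slides do not say how the authors handled zero counts, and the real training sets (A1, A2, B1, B2) are not reproduced here.

```python
from collections import Counter


def build_profile(sites, alphabet="ACGT", pseudocount=1.0):
    """Return a list of {base: frequency} dicts, one per aligned position.

    The pseudocount (an assumption, not from the slides) keeps every
    frequency above zero so that log-ratios can be taken later.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all training sites must have the same length")
    total = len(sites) + pseudocount * len(alphabet)
    profile = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        profile.append({b: (counts[b] + pseudocount) / total for b in alphabet})
    return profile


# Toy aligned sites, for illustration only.
toy_sites = ["GTA", "GTA", "GTC"]
for column in build_profile(toy_sites):
    print(column)
```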
Scan genomic sequences
[Figure: upstream region between two transcription units]
GTAAAGTTAAGTTCCTTCAAAGCATTCGTGG
TTAAAGTTAAGTTCTTTTAAAGCTTTCGTGG
$$S_{M_1}(t) = \max_{h \subset t} \sum_{i=1}^{l} I_i \ln \frac{p[i, h(i)]}{q[h(i)]}$$
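The scoring function on this slide can be sketched in a few lines: slide a window h of the motif length l along the sequence t, sum the weighted log-ratios of profile frequency p to background frequency q, and keep the maximum. The slides do not define the weight I_i; the sketch assumes it is the information content (relative entropy) of profile column i against the background, and assumes the profile has no zero frequencies. All names are hypothetical.

```python
import math


def information_content(column, background):
    """Assumed definition of I_i: relative entropy of a column vs. background."""
    return sum(p * math.log(p / background[b]) for b, p in column.items() if p > 0)


def motif_score(profile, background, t):
    """Maximum weighted log-odds score over all length-l windows h of t."""
    l = len(profile)
    if len(t) < l:
        raise ValueError("sequence is shorter than the motif")
    weights = [information_content(col, background) for col in profile]
    best = -math.inf
    for start in range(len(t) - l + 1):
        h = t[start:start + l]
        score = sum(
            w * math.log(col[base] / background[base])
            for w, col, base in zip(weights, profile, h)
        )
        best = max(best, score)
    return best


# Toy two-column profile and uniform background, for illustration only.
toy_profile = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
uniform = {b: 0.25 for b in "ACGT"}
print(motif_score(toy_profile, uniform, "CCATCC"))
```

Weighting each column by its information content means that conserved positions (such as the GTA and TAC arms) dominate the score, while the variable N positions contribute little.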
Scan genomic sequences
[Figure: upstream region between two transcription units]
GTAAAGTTAAGTTCCTTCAAAGCATTCGTGG
TTAAAGTTAAGTTCTTTTAAAGCTTTCGTGG
$$S_{M_2}(t) = \max_{h \subset t} \sum_{i=1}^{l} I_i \ln \frac{p[i, h(i)]}{q[h(i)]}$$
The scoring functions
$$S_{M_1 \dots M_z}(t) = \sum_{j=1}^{z} S_{M_j}$$
Orthologous genes
• The presence of similar motifs in the regulatory regions of orthologous genes can increase the prediction accuracy.
• They predicted two genes in two genomes to be orthologous to each other if they are a pair of reciprocal best hits in BLASTP searches.
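The reciprocal-best-hit rule can be sketched as follows, assuming the BLASTP results have already been parsed into (query, subject, score) tuples. The parsing step, the choice of score, and the gene names are all hypothetical; the slides only state the reciprocal-best-hit criterion itself.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical gene identifiers and scores, for illustration only.
a_vs_b = [("a1", "b1", 200), ("a1", "b2", 150), ("a2", "b2", 90)]
b_vs_a = [("b1", "a1", 198), ("b2", "a1", 140), ("b2", "a2", 80)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy input, a2's best hit is b2, but b2's best hit is a1, so only the a1/b1 pair is reported as orthologous.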
Orthologous genes
[Figure: upstream region of a transcription unit]
Cutoff
• For each genome, the cutoff is the largest score that still includes all of that genome's binding sites in the training sets.
• P-value
– p[S(CU)>sc]<0.01 or 0.05
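A minimal sketch of the two cutoff ideas follows. The largest cutoff that still recovers every training site is the minimum score among those sites. For the p-value, the sketch assumes an empirical estimate: the fraction of scores from some null set that exceed the cutoff. Which sequences make up that null set is not stated on the slide, so `null_scores` is a placeholder, and all numbers are toy values.

```python
def training_cutoff(training_scores):
    """Largest cutoff that keeps every training site: the minimum training score."""
    return min(training_scores)


def empirical_p_value(null_scores, cutoff):
    """Fraction of null scores strictly above the cutoff (assumed null model)."""
    return sum(1 for s in null_scores if s > cutoff) / len(null_scores)


# Toy scores, for illustration only.
cutoff = training_cutoff([7.2, 5.1, 9.0])
print(cutoff, empirical_p_value([1.0, 2.0, 3.0, 6.0], cutoff))
```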
Analysis
“GTA________TAC”
“TA___T”
Niche of NtcA in cyanobacteria … ?
• Some genes bearing NtcA promoters might coordinate photosynthesis and nitrogen fixation.
• The RNA polymerase σ-factor in cyanobacteria might bear an NtcA promoter and be regulated by NtcA.
Conclusion
• The false positive rate is reduced by 8.2- to 90.9-fold.
• Some binding sites might be missed due to the lack of orthologues in the other genomes.
• NtcA promoters are found for many genes involved in the various stages of the photosynthetic process.
Thank You