Mycobacterium tuberculosis, the causative agent of tuberculosis

Mycobacterium tuberculosis, the causative agent of tuberculosis, remains the leading
infectious disease agent, causing millions of deaths each year. In addition, the emergence
of extensively drug-resistant tuberculosis (XDR TB) strains illustrates the survival
strategy the organism uses to sustain its pathogenic success: mutating the drug targets.
The organism's ability to evolve drug resistance while enhancing its pathogenicity
appears, at least in part, to be provided by gene duplication. This evolutionary mechanism
adds extra DNA copies to the existing genetic material, giving the organism a redundant
copy of a gene and creating an opportunity for one of the copies to acquire a new function
(neofunctionalization). This project aims to identify the expanded gene families in
Mycobacterium tuberculosis and investigate the potential contribution of gene
duplication events to pathogenicity.

The availability of the complete genome sequence of Mycobacterium tuberculosis, strain
H37Rv, along with other microbial genomes, made it possible to compare gene family
expansion across different organisms and to identify major differences between them.
To identify gene duplicates in tuberculosis complex organisms, protein signature and
sequence data for 77 selected organisms were retrieved from the InterPro and UniProtKB
databases. Perl scripts were written to cluster proteins with the same protein signature
data (identical InterPro matches, and thus shared protein functions or domains) into
duplicate gene sets. A duplicate gene cluster contains only proteins whose domain
composition is identical over their entire length; proteins that differed in even a single
domain were excluded from the cluster.
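The signature-based grouping can be illustrated with a minimal Python sketch. The original work used Perl scripts that are not shown here, so this is not the project's code; it assumes the InterPro matches for each protein have already been parsed into a dictionary mapping a protein accession to its InterPro entry IDs. The function name and the accessions and InterPro IDs in the example are hypothetical placeholders.

```python
from collections import defaultdict


def cluster_by_signature(matches):
    """Group proteins whose sets of InterPro entries are identical.

    matches: dict mapping a protein accession to an iterable of InterPro IDs
    (assumed input format). Proteins with no InterPro match are skipped,
    because they carry no signature to compare.
    Returns a dict mapping each signature (a sorted tuple of InterPro IDs)
    to a sorted list of the proteins sharing it, keeping only groups of
    two or more proteins.
    """
    groups = defaultdict(set)
    for protein, ipr_ids in matches.items():
        signature = tuple(sorted(set(ipr_ids)))
        if signature:  # no matches -> cannot be clustered by signature
            groups[signature].add(protein)
    # A signature carried by a single protein is not a duplicate set.
    return {sig: sorted(members) for sig, members in groups.items() if len(members) >= 2}


# Hypothetical example: P1 and P2 carry exactly the same two entries,
# P3 lacks one of them, and P4 has no InterPro match at all.
example = {
    "P1": ["IPR000001", "IPR000002"],
    "P2": ["IPR000002", "IPR000001"],
    "P3": ["IPR000001"],
    "P4": [],
}
print(cluster_by_signature(example))
# {('IPR000001', 'IPR000002'): ['P1', 'P2']}
```

Comparing sets of InterPro IDs is a simplification: it ignores domain order and repeat counts, so a stricter reading of "identical over the entire length" would compare ordered domain architectures instead.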

In addition to the InterPro data, the complete protein sequences of each organism were
clustered into related sets by running BlastClust at a range of percent-identity thresholds
and sequence-length coverage thresholds. Proteins found in both an InterPro duplicate
gene cluster and a sequence-based duplicate gene cluster were treated as potential
duplicate genes for each organism and used to build a final gene duplication matrix. This
matrix shows the degree of protein family expansion across the pathogenic and
non-pathogenic groups.
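The intersection step and the resulting matrix can be illustrated with a short Python sketch. It assumes BlastClust output lists one cluster per line with sequence identifiers separated by whitespace, and that the InterPro signature clusters are available as a dict from a family key to a set of protein IDs. The function names, the rule that a protein is kept only when at least one other member of its signature cluster shares its sequence cluster, and the choice to count duplicate proteins per family in each matrix cell are all illustrative assumptions rather than the project's documented procedure.

```python
def parse_blastclust(lines):
    """Read BlastClust-style output: one cluster per line, IDs separated
    by whitespace (assumed format). Blank lines are ignored."""
    return [set(line.split()) for line in lines if line.strip()]


def potential_duplicates(signature_clusters, sequence_clusters):
    """Keep proteins that share both a signature cluster and a sequence cluster.

    signature_clusters: dict mapping a family key to a set of protein IDs.
    sequence_clusters: list of sets of protein IDs (e.g. from parse_blastclust).
    A family is reported only if at least two of its proteins survive.
    """
    result = {}
    for family, members in signature_clusters.items():
        kept = set()
        for seq_cluster in sequence_clusters:
            shared = set(members) & seq_cluster
            if len(shared) >= 2:
                kept |= shared
        if len(kept) >= 2:
            result[family] = kept
    return result


def duplication_matrix(per_organism):
    """Build a family-by-organism matrix of duplicate-gene counts.

    per_organism: dict mapping organism name -> output of potential_duplicates.
    Returns (families, organisms, rows), where rows[i][j] is the number of
    potential duplicates of family i in organism j (0 if not duplicated).
    """
    organisms = sorted(per_organism)
    families = sorted({fam for dups in per_organism.values() for fam in dups})
    rows = [
        [len(per_organism[org].get(fam, ())) for org in organisms]
        for fam in families
    ]
    return families, organisms, rows


# Hypothetical data for two organisms.
sig_a = {"famX": {"a1", "a2", "a3"}, "famY": {"a4", "a5"}}
seq_a = parse_blastclust(["a1 a2 a9", "a3", "a4 a5"])
sig_b = {"famX": {"b1", "b2"}}
seq_b = parse_blastclust(["b1 b2"])

per_org = {
    "org_A": potential_duplicates(sig_a, seq_a),
    "org_B": potential_duplicates(sig_b, seq_b),
}
families, organisms, rows = duplication_matrix(per_org)
print(families, organisms, rows)
# ['famX', 'famY'] ['org_A', 'org_B'] [[2, 2], [2, 0]]
```

In the example, protein a3 shares famX's signature but falls in a different sequence cluster, so it is not counted; this is the kind of disagreement between the two methods that the intersection is meant to filter out.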

A preliminary analysis of the results brought to light many duplicate gene sets that have
expanded only in the tuberculosis complex group and may be important candidates for
involvement in pathogenicity. In addition, whole-genome clustering of protein sequences
across all the selected organisms is under way to cross-verify the results and to recover
duplicate genes missed by the signature-based clustering (not every protein in a genome
has an InterPro match, so not every protein can be clustered by signature). The duplicate
gene sets that have expanded in tuberculosis complex organisms, and in no other organism,
may encode functions whose expansion conferred a subfunctionalization or
neofunctionalization benefit on the organism. Such functions may therefore play an
important role in the organism's pathogenicity and will be considered for further
evolutionary studies.
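Selecting the families that have expanded only in the tuberculosis complex could be done by filtering the duplication matrix. The Python sketch below is a hypothetical illustration: it assumes the matrix is stored as a dict mapping each family to per-organism duplicate counts, treats a family as "expanded" in an organism when it has at least two potential duplicates there, and by default requires expansion in every listed tuberculosis complex organism. The organism labels, counts, and threshold are made up for the example.

```python
def tb_specific_families(matrix, tb_complex, min_copies=2, require_all=True):
    """Return families expanded in the TB complex and in no other organism.

    matrix: dict mapping family -> dict mapping organism -> duplicate count
            (organisms missing from the inner dict are treated as 0).
    tb_complex: set of organism names regarded as TB complex members.
    min_copies: count at which a family counts as expanded (assumed 2).
    require_all: if True, the family must be expanded in every TB complex
                 organism; if False, in at least one.
    """
    selected = []
    for family, counts in matrix.items():
        expanded = {org for org, n in counts.items() if n >= min_copies}
        inside = expanded & tb_complex
        outside = expanded - tb_complex
        if outside:
            continue  # also expanded elsewhere, so not TB-specific
        if require_all and inside != tb_complex:
            continue  # missing from at least one TB complex organism
        if inside:
            selected.append(family)
    return sorted(selected)


# Made-up counts for illustration only.
matrix = {
    "famX": {"M_tuberculosis": 4, "M_bovis": 3, "E_coli": 1},
    "famY": {"M_tuberculosis": 2, "M_bovis": 2, "E_coli": 2},
    "famZ": {"M_tuberculosis": 3, "M_bovis": 1},
}
tb = {"M_tuberculosis", "M_bovis"}
print(tb_specific_families(matrix, tb))                     # ['famX']
print(tb_specific_families(matrix, tb, require_all=False))  # ['famX', 'famZ']
```

Whether to require expansion in all tuberculosis complex members or in any one of them is a design choice; the stricter setting yields fewer but more consistently expanded candidates.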