The Molecular Integration Database (MID) for democratizing HIV data

Heikki Lehväslaiho
29 May 2007, Nairobi
Bioinformatics for Africa

Classical pathogen biology
- clinical aspects of the host
- physiology
- biochemistry
- immunology
- genetics
[Diagram label: medical bioinformatics]

Pathogen biology today
- genomics: especially when applied to small genomes!
- biochemistry
- immunology
- clinical aspects of the host
- physiology

Genomics paradigm
- genome variation is central: both host and pathogen
- time series of samples
- tools are freely available and distributed in a timely manner
- data are freely available and distributed in a timely manner

What is needed?
- tools (freely available): database engine, query interface, visualisation

Prerequisites for tools in MID
- Open source
- Under active development
- Web interface: build if needed
- Derived work has to be distributable

What is needed?
- tools (freely available): database engine, query interface, visualisation
- capacity building: (fast) internet connection, knowledge of bioinformatics, translational skills

Multidimensional data
- researchers are not in the same location: communication is difficult
- researchers are specialists in different fields: "translation of knowledge" is needed

How does this work?
- Consult researchers
- Create a central service with tools
- Consult researchers
- Create better tools
- Consult researchers
- Distribute new tools
- Consult researchers

Purpose of the MID
- Integrate HIV biomedical information created by several laboratories into one query-enabled interface
  - A big step up from spreadsheets
  - Central location for all exchanged information
- backups
- versioning
- audit trail
- Not a LIMS!

- Elucidate HIV pathogenesis and immune escape that influence the set point in heterosexually acquired HIV subtype C infection
- Cohort of female sex workers in KwaZulu-Natal, participants in a Phase II/IIb microbicide trial in Durban, cohorts in Vulindlela, KZN

CAPRISA data collection
[Diagram labels: Clinical Report Forms; Clinical Review; Patients; DataFax; Samples; National Institute for Communicable Diseases (NICD); Immunological Assay Results; MID; SANBI; CAPRISA Lab; University of Cape Town; HIV Seq Pipeline]

The databases
- production database
  - MySQL, PostgreSQL
  - modified by the maintainer only (security)
  - generic tables to minimize changes
- query database
  - derived from the production db
  - read-only by the authorised users
  - BioMart: www.biomart.org

BioMart architecture
[Diagram labels: MartExplorer; MartShell; MartView; JAVA; Perl; Retrieval; BioMart API; Databases; Public data (local or remote); MartBuilder; MartEditor; Vega; SNP; myDatabase; Schema transformation; myMart; Configuration; XML; MSD; UniProt; Ensembl]

Querying with BioMart (screenshots, unpublished data)
- Choose the data set
- Choose the output
- Select the filters
- View the output

Other parts of MID
- Summary: patient data progress monitoring
- glossary
- data entry programs
  - sequence pipeline for sequence assembly and quality control
- visualisation
- bug reporting

CAPRISA data summary
[Screenshot: unpublished data]

Sequence pipeline
- One program is now split into three:
  - Bushman: assembly manager
    - general purpose tool
    - installable locally to save bandwidth
  - ppp: HIV quality control pipeline
  - REGA: HIV subtyping tool
[Diagram labels: Bushman; Quality and Analysis Pipeline; ppp: contamination & quality analysis; REGA subtype tool; Annotated sequence]

Bushman, beta version
[Screenshot]

Bushman report
[Screenshot]

Public Perl Pipeline for HIV
- Reads in the consensus sequence
- Outputs annotation in XML
- Fast and easily configurable
- Checks for:
  1. Contaminants
  2. Location and orientation in the reference HXB2 sequence
  3. Protein sequence changes

Visualisation
- GBrowse genome viewer
  - next: seamless linking between BioMart and GBrowse
- General purpose plotting tool
[Figure: low-resolution view of Nef peptide mismatches over time in two patients]

Challenges
- low bandwidth
- Complexity of data and user interface
  - GUI design
  - sparsely populated huge data matrix
  - subtle errors -> data cleaning

Contributors

SANBI
- Allan Kamau (database programming)
- Ruby van Rooyen (interface programming)
- Adam Dawe (content testing, user)
- Kavisha Ramdayal (testing, glossary)
- Alan Powell (interface programming)
- Anelda Boardman (laboratory coordination)
- Tulio de Oliveira (HIV evolution, testing)
- Heikki Lehväslaiho (project manager, programming)
- Win Hide (project co-ordinator)

CAPRISA
- Salim Abdool Karim (CAPRISA)
- Koleka Mlisana (CAPRISA)
- Lynn Morris (NICD)
- Clive Gray (NICD)
- Carolyn Williamson (UCT)
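
Illustration: sequence checks and XML annotation

The checks listed under "Public Perl Pipeline for HIV" (locating a consensus sequence in a reference, determining its orientation, flagging possible contaminants, and writing the annotation as XML) can be illustrated with a small sketch. This is not the ppp code, which is written in Perl and is not shown in these slides. It is a toy in Python that uses exact string matching where a real pipeline would use sequence alignment. The function names, the XML element names, and the short reference string are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Translation table for complementing DNA bases (assumption: plain ACGT input).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def locate(read, reference):
    """Find an exact match of `read` in `reference` on either strand.

    Returns (start, end, strand) with 1-based inclusive coordinates on the
    reference, or None if neither orientation matches.
    """
    for strand, candidate in (("+", read), ("-", reverse_complement(read))):
        pos = reference.find(candidate)
        if pos != -1:
            return pos + 1, pos + len(candidate), strand
    return None


def annotate(name, read, reference):
    """Build a small XML annotation for one consensus sequence.

    Element and attribute names are hypothetical, chosen for illustration.
    """
    root = ET.Element("sequence", {"name": name, "length": str(len(read))})
    hit = locate(read, reference)
    if hit is None:
        # A read that cannot be placed on the reference is a contaminant candidate.
        warning = ET.SubElement(root, "warning")
        warning.text = "no match in reference; possible contaminant"
    else:
        start, end, strand = hit
        ET.SubElement(
            root,
            "location",
            {"start": str(start), "end": str(end), "strand": strand},
        )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    toy_reference = "GATTACAGGCCTTAAC"  # stand-in for a real reference such as HXB2
    print(annotate("read_forward", "CAGGCC", toy_reference))
    print(annotate("read_reverse", "CTGTAA", toy_reference))
    print(annotate("read_unknown", "GGGGGG", toy_reference))
```

Exact matching keeps the sketch short; real reads contain mutations, which is precisely what the third check (protein sequence changes) is meant to report, so a production pipeline would align the read rather than search for it verbatim.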
