Principal Investigator/Program Director: Abdool Karim, Salim S.
Core C: Viral diversity and bioinformatics
Name of Core Leader: Carolyn Williamson
Human Subjects: Yes
IRB Approval Date: Pending
Assurance compliance number: Pending
Vertebrate Animals: No
Proposed Period of Support: From 1 January 2002 To 31 December 2006
Costs Requested for Initial Budget Period: Direct Costs: $ Total Costs: $
Costs Requested for the Entire Budget Period: Direct Costs: $ Total Costs: $
Applicant Organization: University of Natal, King George V Avenue, Durban 4041, South Africa
PHS 398 (Rev. 4/98)

DESCRIPTION

The Viral Diversity and Bioinformatics Core will support the objectives of all four CAPRISA projects by undertaking PCR analysis of partial and full-length genomes, heteroduplex mobility and tracking assays to define genetic variability, sequence analysis, and drug resistance genotyping/phenotyping, and by establishing integrated databases of sequence, clinical and immunological information. The functions of the core have been distributed among three centers to avoid duplication and to make the best use of existing strengths and expertise.
The three centers involved complement each other by providing distinct services and unique expertise: The sequencing, quality control of sequences, heteroduplex mobility assays and heteroduplex tracking assays will be conducted at the University of Cape Town by the core leader, Carolyn Williamson. Resistance genotyping, OLA and SNP will be performed at the University of Natal by core co-leader, Sharon Cassol. The phylogenetic analysis, computational training and support, as well as the bioinformatics database will be located at the South African National Bioinformatics Institute, University of Western Cape by core co-leader, Winston Hide. The bioinformatics component of this core will have a computational biology section which will be responsible for the development of new tools needed for the analysis of evolution of viral populations. The core will be linked via a password protected website that provides information from a back-end database on project progress at each site, such as sample tracking and analytical tools and the progress of pipelined analyses. The web interface will serve as a medium for communication between sites and also as a directed information portal to support the activities at each site. The CAPRISA investigators involved in this core will also participate in the training program as outlined in the administration core. In this regard, the core will work in collaboration with three US laboratories based at the University of Washington (James Mullins) on advanced computational analysis, the Henry Jackson Foundation (Francine McCutchan) on whole genome sequencing and the University of North Carolina (Ronald Swanstrom) on development of methodologies for tracking sequence changes over time. This core will contribute to all four CAPRISA projects. 
It will be involved in sequencing and tracking genetic changes in HIV over time in the acute seroconversion project, genetic characterisation of breakthrough infections in the study of highly exposed persistently seronegative individuals, subtyping of circulating HIV-1 strains in the evolving HIV epidemic project, and screening HIV reverse transcriptase for drug resistance mutations in the antiretroviral therapy project. In addition, the database will integrate sequence and immunological data with clinical data for analysis of the relationship between viral isolates, selective pressures and set-point in the acute seroconversion project.

PERFORMANCE SITE(S) (organization, city, state)
Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
Faculty of Science, University of the Western Cape, Cape Town, South Africa
Nelson R. Mandela School of Medicine, University of Natal, Durban, South Africa

KEY PERSONNEL
Name                          Organization                                Role on Project
Carolyn Williamson, PhD       University of Cape Town, SA                 Core leader
Winston Hide, PhD             University of the Western Cape, SA          Core co-leader
Sharon Cassol, PhD            University of Natal, SA                     Core co-leader
Joanne van Harmelen, PhD      University of Cape Town, SA                 Co-investigator
Cathal Seoighe, PhD           University of the Western Cape, SA          Co-investigator
James Mullins, PhD            University of Washington, Seattle, USA      Co-investigator
Francine McCutchan, PhD       Henry Jackson Foundation, USA               Co-investigator
Ronald Swanstrom, PhD         University of North Carolina, USA           Co-investigator
VIRAL DIVERSITY AND BIOINFORMATICS CORE

TABLE OF CONTENTS
Cover Page
Description, Performance Sites, and Key Personnel
Table of Contents
Detailed budget for initial budget period
Budget for entire proposed period of support
Research Plan
  A. Specific aims
  B. Background and significance
  C. Core activities
  D. Core leader and personnel
  E. Conclusion
  F. References

RESEARCH PLAN

A. SPECIFIC AIMS

The overall goal of the Viral Diversity and Bioinformatics Core is to perform sequencing, HIV genetic diversity assays and drug resistance assays, and to provide bioinformatics, computational biology and integrated databases. The specific aims are as follows:

Specific aim 1: To perform PCR on sub-genomic fragments
Sub-genomic regions will be amplified, cloned and sequenced in order to define epitopes corresponding to those targeted by CTL responses.

Specific aim 2: To perform PCR on near full-length genomes
For amplification of near full-length genomes, proviral DNA primers will be used to amplify the whole genome except for the 5’ LTR region (9 kb fragment). A subset of well-characterised samples will be sequenced in full in order to assess viral evolution across the genome.

Specific aim 3: To perform heteroduplex mobility assays
Heteroduplex mobility assays (HMA) are commonly used for subtyping and can also be used to identify intra-person diversity, because complex mixtures of quasispecies are visualized as a diffuse smear on a gel. HMA will be used to screen for dual infections, and to define gag and env subtypes.

Specific aim 4: To perform heteroduplex tracking assays
The heteroduplex tracking assay allows the detection of minor species that represent as little as 3% of the total population, samples the population of mixed viral genotypes accurately, and is labour-efficient, allowing many samples to be analysed. The heteroduplex tracking assay will be used to track HIV genetic diversity over time.

Specific aim 5: To provide sequence analysis
A production module will include development of a semi-automated pipeline for sequence assembly and annotation in order to provide standardised conditions for sequence generation and to provide a basis for quality control that includes a simple phylogenetic screen. A dedicated phylogenetics module will perform alignments, construct phylogenetic trees and detect recombination. A selection module will provide reconstruction of ancestral nodes, analysis of selection pressures along internal branches, detection of clustering of sites with similar rates of evolution and mapping of mutations to tertiary protein structure.

Specific aim 6: To perform resistance genotyping/phenotyping
Resistance genotyping of the RT and protease genes will be performed on an ABI 3100 capillary sequencer using the ViroSeq HIV-1 Genotyping v.2 Kit and Software System. When applied to recent seroconvertors and drug-naïve patients, sequencing will provide important baseline information on the frequency of naturally occurring polymorphisms and resistance mutations in South Africa. When applied to patients on therapy, sequencing will provide information on the frequency of development of drug resistance.
Sequence information will also facilitate the design of subtype C-specific primers for use in OLA and SNP assays. Specific aim 7: To develop and provide databases for the integration of sequence, clinical and immunological information generated by all the projects and cores The Bioinformatics and Computational Biology component of this core will develop and implement an integrated schema for sharing of information generated by all projects and cores. Information on sequence alignment, evolution, cohort statistics, host genetic background, epitope mapping, clinical data and epidemiological data will be cross-referenced to viral sequences via filesharing with the Epidemiology and Biostatistics Core’s separate demographic, laboratory, epidemiological and clinical databases. B. BACKGROUND AND SIGNIFICANCE The overall goals of this core are to undertake the following for the CAPRISA projects: (i) HIV diversity assays and sequencing, (ii) Monitoring for drug resistance, and (iii) Bioinformatic, computational biology support and integration of data. HIV-1 variability has been widely appreciated since it became possible to compare two sequenced viral genomes. Because HIV-1 replicates constantly in its host, it is constantly subjected to immune selection that drives continuous evolution. In addition, progressive immunodeficiency and the intermittent application of drug selective pressure represent other forces that change the environment in which the virus replicates. The presence of all of these selective pressures is captured in the sequence of the viral genome, and changes in these pressures select for changes in the genomic text. For this reason, understanding the patterns and dynamic nature of HIV-1 sequence diversity represents a critical window into the biology and pathogenesis of HIV-1. Intra-subject variability reveals the outcomes of virus – host interactions. 
Inter-subject variability contains the history of the entire epidemic and describes the challenges of vaccine development. This core aims to provide assays that will define genetic diversity within individuals over time, and to collate this with immunological responses and clinical data through an integrated database.

This core will contribute to all four CAPRISA projects. It will be involved in sequencing and tracking genetic changes in HIV over time in the acute seroconversion project, genetic characterisation of breakthrough infections in the study of highly exposed persistently seronegative individuals, subtyping of circulating HIV-1 strains in the evolving HIV epidemic project, and screening HIV reverse transcriptase for drug resistance mutations in the antiretroviral therapy project. In addition, the database will integrate sequence and immunological data with clinical data for analysis of the relationship between viral isolates, selective pressures and set-point in the acute seroconversion project.

The University of Cape Town laboratories are equipped for basic molecular techniques, with heating blocks, microfuges, incubators, shaking incubators, hybridisation ovens, centrifuges (Beckman J2-21, Beckman L7-55), a Speedvac concentrator, pH meters, fridges, freezers, and other small equipment. The laboratories have vertical and horizontal gel running equipment and power packs. There is a centralized facility for gel documentation that includes a Kodak UV transilluminator, camera, PC recording software and printer. In addition there are three Perkin Elmer GeneAmp PCR machines, one Roche LightCycler, one ABI 310 Genetic Analyzer and four -80°C freezers. Pentium II or III computers with internet access are readily available in the laboratory.

The South African National Bioinformatics Institute, under the directorship of Winston Hide, has full infrastructure to support and develop in silico studies of sequence information.
It is the national node for the European Molecular Biology Network, a system of networked bioinformatics training and service provision centres. The computational and networking infrastructure includes broadband networking facilities, a terabyte-level data warehousing facility and three multi-processor servers comprising 2 (SUN), 4 (Compaq) and 16 (SGI) processors respectively. Dedicated direct internet access from Cape Town to Vienna, Virginia, gives researchers using the facilities high-bandwidth access to the USA. The national facility has an established record of international and local collaboration on large-scale projects in bioinformatics, and supports pathogen bioinformatics at both the US National Institutes of Health National Center for Biotechnology Information, Bethesda, and, more recently, The Institute for Genomic Research, Gaithersburg, Maryland. The South African National Bioinformatics Institute provides genetic information to Africa through web-based systems, because internet links from South Africa to other African countries offer better bandwidth than those countries' own connections overseas. In addition, it has recently conducted a pilot HIV-Africa sequence data project with linked information to the US National Center for Biotechnology Information. The Institute has an established core of programming expertise in open source genome annotation systems, and contributes actively to the development of the Wellcome Trust genome annotation project ENSEMBL (www.sanger.ac.uk).

The Africa Centre/University of Natal HIV-1 Molecular Laboratories are located in the Nelson Mandela School of Medicine in Durban, South Africa. The molecular diagnostics unit has recently been established and is now operational. When completed in September 2001, the entire facility will consist of approximately 3,250 sq ft of space, including a Level 3 Biocontainment Unit, as well as separate PCR and sequencing laboratories.
The primary activities of the laboratory are: 1) to provide a specially designed, centrally located Level 3 laboratory space to investigators at the Africa Centre and the University of Natal, so that infectious agents can be handled in a specified, physically restricted area; 2) to facilitate sharing of high-cost equipment needed for basic and clinical research in order to broaden the research capabilities of all investigators and avoid redundancy; 3) to provide ongoing training, supervision and monitoring of Level 3 biosafety practices; 4) to foster the development of new African and international investigators who are dedicated to finding a cure for AIDS; and 5) to encourage an African-based multi-disciplinary approach to HIV/AIDS research.

The PCR and Sequencing laboratories have been established in a separate, dedicated space to minimize potential carry-over contamination problems. The PCR unit contains all of the equipment needed to support qualitative and quantitative assessment of HIV-1 DNA and RNA (Roche DNA PCR; Roche Monitor RT-PCR; Nuclisens HIV-1 RNA), including the Nuclisens Automated Extractor. Other equipment includes a 9600 thermocycler, a class IIA/3B biosafety cabinet, freezers (-20°C, -70°C), refrigerators, microfuges and low-speed table-top centrifuges with covered containers, an ELISA reader and plate washer, hot plates, water baths, computers and printers. The cloning/sequencing unit contains an ABI 3100 capillary sequencer, electrophoretic equipment, a transilluminator, a chemical hood and a range of microfuges, low-speed centrifuges and small equipment.

C. CORE ACTIVITIES

Organisation and structure of the core
The core comprises three components: The viral diversity and sequencing component will be located at the University of Cape Town. Besides undertaking sequencing, it will also monitor viral diversity using the heteroduplex mobility assay and heteroduplex tracking assay.
In addition it will track sequence data, ensure quality control of sequences and undertake phylogenetic analysis in conjunction with the bioinformatics component of the core. The drug resistance component, located at the Africa Centre laboratory based at the Nelson R Mandela School of Medicine, University of Natal in Durban, will undertake genotypic and phenotypic screening for drug resistance. The bioinformatics and computational biology component will be located in Cape Town at the South African National Bioinformatics Institute at the University of the Western Cape. This component will provide computational support and training for both the other components, including quality control, assembly and phylogenetics, as well as developing and providing new computational tools for the analysis of viral populations. This section of the core will maintain the sequence database and interlink information from each of the research projects to enable the association of molecular events and diversity in HIV isolates with host immune responses and clinical data. Information will be made available to the group through a restricted access centralised database via a website. The core organisational chart is provided below.

[Organisational chart: HIV Diversity and Bioinformatics Core. Core Committee: C. Williamson (Director), W. Hide (Co-Director), S. Cassol (Co-Director), C. Seoighe, J. van Harmelen. Sub-cores: DNA Sequencing and Diversity (C. Williamson and J.H. van Harmelen); HIV Resistance (S. Cassol); Bioinformatics and Computational Biology (W. Hide). The chart also shows the sample repository (core A: receive, store, retrieve and process samples); DNA sequencing, training and student supervision; PCR/HMA/HTA and cloning (H. Bredell); resistance testing by a research associate (P. Owira) using ViroSeq genotyping, OLA and SNP assays; and, within the bioinformatics sub-core, evolutionary analysis, systems and applications development, database integration, data entry and sequence management, and web site development (W. Hide, C. Seoighe, a scientist to be named, and a consultant). Other personnel named in the chart: J. Grobler, L. Malaza, C. Rademeyer, B. Londt.]

Prioritisation and resource allocation within the core will be undertaken by the Viral Diversity and Bioinformatics Core Committee comprising the core leader (Carolyn Williamson), the core co-leaders (Win Hide and Sharon Cassol) and two local co-investigators (Cathal Seoighe and Joanne van Harmelen). In addition, this committee will monitor progress of the core, and support the coordination of the three components of the core. The Core Committee will also coordinate core activities with other CAPRISA activities and encourage collaboration with other research efforts.

Specimen handling and shipment
The central specimen repository for CAPRISA (see core B for a detailed description of the CAPRISA repository) is based in Durban at the Nelson R Mandela School of Medicine. All specimens collected during fieldwork will be labelled, transported, processed and stored by the central CAPRISA specimen repository according to standard operating procedures. Specimens will be shipped from the central repository to the Cape Town laboratory in compliance with IATA dangerous goods packing instructions. Fresh blood specimens will be processed in the central repository and shipped to Cape Town on dry ice, as frozen plasma, sera, cells, DNA or culture supernatants, by experienced courier services such as World Couriers, who specialise in shipping hazardous materials.
On receipt in Cape Town, specimens will be further processed, catalogued and stored. DNA or RNA will be isolated using a Nuclisens automated extractor. The Laboratory Data Management System (LDMS) utilised by the central CAPRISA specimen repository will be installed to track the type, volume and precise location of each aliquot (freezer, rack, box, slot).

Specific aim 1: To perform PCR on sub-genomic fragments
PCR of HIV subtype C gene fragments is well established at the University of Cape Town laboratory, where several genetic diversity studies have been undertaken over a number of years (Williamson et al., 1995; van Harmelen et al., 1997; van Harmelen et al., 1999a; van Harmelen et al., 1999b; Bredell et al., 1998; Mashishi et al., submitted; Williamson et al., submitted). The PCR primers currently in use in the laboratory amplify South African subtype C sequences and include primers spanning gag, gp120, tat, nef, rev and a portion of pol (Williamson et al., submitted). The PCR facilities are established in a separate, dedicated space to minimize potential carry-over contamination problems.

In the acute seroconvertor project, sub-genomic regions will be amplified, cloned and sequenced. The regions for sequencing will be selected on the basis of correspondence to those targeted by CTL responses. Changes in these regions will be monitored over time to determine the frequency and relevance of CTL escape. Viral populations in the seroconvertors will be amplified in order to follow the evolution of viral populations from HIV acquisition to set-point. In addition to tracking diversity in CTL epitopes, the V3 and V1/2 regions will be amplified for the heteroduplex tracking assay and heteroduplex mobility assay, and the V2-V5 and gag regions for subtyping by heteroduplex mobility assay (described in greater detail in specific aim 3).
In the highly exposed persistently seronegative individuals project, regions being targeted by CTL in the study participants will be compared and contrasted with those targeted in HIV+ individuals. The viruses from highly exposed persistently seronegative individuals who become infected after a long period of apparent resistance, so-called "breakthrough" infections, will be sequenced. Sequences will be annotated and attempts will be made to correlate CTL responses prior to infection with viral sequence to determine if there is evidence of CTL escape.

Specific aim 2: To perform PCR on near full-length genomes
For amplification of near full-length genomes, proviral DNA primers will be used to amplify the whole genome except for the 5’ LTR region (9 kb fragment). Proviral DNA from the outer reaction will be diluted 10^-1 to 10^-5 in order to amplify a single provirus copy for sequence analysis. One µl of each outer reaction will be transferred to the inner reaction and the highest dilution giving a positive PCR product will be utilised for direct PCR sequencing. Nested PCR product pooled from approximately eight reactions will be purified, and approximately 80 sequencing reactions with primers spanning the HIV-1 genome will be used to completely sequence overlapping fragments in both directions. Sequences will be joined together using the PHRED/PHRAP assembler. Where necessary, subtype C specific primers will be designed based on the approximately 65 full-length subtype C sequences available (Los Alamos database and 48 from Botswana, Novitsky, pers. comm.). This technique of amplifying the whole genome has been optimised for sequencing both cultured isolates and DNA extracted from PBMC, but has not been applied to plasma because of the limitations of the initial RT step. For problematic DNA samples, it may be possible to amplify the genome in three or more large, overlapping fragments.
This will expand both the coverage of the DNA sequence analysis, and hence variation within potential CTL epitopes, as well as enhance project feasibility. A subset of well-characterised samples from the acute seroconvertor project will be sequenced in full in order to assess viral evolution across the genome from HIV acquisition to set-point. Ten viruses will be monitored at three time points.

Specific aim 3: To perform heteroduplex mobility assays
HMA is based on the principle that when two related but distinct DNA fragments are mixed, denatured and reannealed, the resulting heteroduplexes show a shift in mobility on polyacrylamide gels. DNA heteroduplexes that contain insertions, deletions, or clustered mutations (one strand relative to the other) form a bend and migrate more slowly in a native polyacrylamide gel. Thus it is possible to display a complex mixture of genotypes as discrete species in a gel. Heteroduplex mobility assays are commonly used for subtyping and can also be used to identify intra-person diversity, because complex mixtures of quasispecies are visualized as a diffuse smear on a gel. The method has been routinely used in the University of Cape Town laboratory since it was published in 1993 (Delwart et al., 1993). The laboratory has participated in a number of international programmes utilizing HMA, including subtyping isolates for the UNAIDS collaborating centre for the Global Network for Isolation and Characterization of HIV; it was a testing site for the gag HMA (L. Heyndrickx, ITM, Antwerp, Belgium) and for the gag HMA kit for the NIH AIDS Reagent programme (USA); and it has been selected by the WHO/African AIDS Vaccine Programme (AAVP) to run a workshop to train African scientists in this technique. In CAPRISA, HMA will be used in the acute seroconvertor project to screen for dual infections, and in the evolving epidemiology project to define the gag and env subtypes present.
Specific aim 4: To perform heteroduplex tracking assays Direct sequencing of PCR products gives the predominant sequence in the population but reveals little information concerning the number of sequence variants. A second approach is to clone the PCR products and sequence them individually. While this gives information about the precise sequence of different members of the population, it is a very time-consuming and limited approach to understanding the population structure, i.e. the number and relative proportion of genotypic species. An important experimental advance in the study of viral diversity has been the development of a more powerful genotypic sampling strategy - the heteroduplex tracking assay. Delwart, Mullins and colleagues introduced the approach of heteroduplex tracking assays (Delwart et al., 1993, 1994, 1995, 1997), using it both to describe env gene changes over time within a patient and as a tool for a modified form of subtyping. The heteroduplex tracking assay differs from heteroduplex mobility assay in that one strand of the heteroduplex is a labelled DNA probe, and if only a single strand of the probe is labelled then each band in the gel represents a different genotype. Strengths of the heteroduplex tracking assay are that it allows the detection of minor species that represent as little as 3% of the total population, it is very accurate in its sampling of the population of mixed viral genotypes, and it is labour-efficient allowing many samples to be analysed. In CAPRISA, studies involving the heteroduplex tracking assay will be undertaken in close collaboration with Ronald Swanstrom who has extensive experience with this assay (Nelson et al., 1997; Ping et al., 1999, 2000; Nelson et al., 1998, 2000; Resch et al., 2001, Freel et al, 2001). To implement this technique in the South African laboratories, a PhD student, Jandre Grobler, will visit Ronald Swanstrom’s laboratory in the latter half of 2001 to learn this technique. 
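The limitation of clonal sequencing noted above can be illustrated with a short calculation of how many independent clones would have to be sequenced to observe a variant present at 3% of the population at least once. This is a minimal sketch only: it assumes clones are sampled independently and without amplification or cloning bias, and the function name and the 95% confidence level are illustrative choices rather than part of the proposed methods.

```python
import math


def clones_needed(variant_fraction, confidence=0.95):
    """Smallest number of independently sampled clones that gives at least
    `confidence` probability of seeing a variant of the given frequency
    at least once (simple binomial sampling; an illustrative assumption)."""
    if not 0 < variant_fraction < 1:
        raise ValueError("variant_fraction must be between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    # Probability of missing the variant in n clones is (1 - f)^n.
    # Solve (1 - f)^n <= 1 - confidence for n and round up.
    n = math.log(1 - confidence) / math.log(1 - variant_fraction)
    return math.ceil(n)


if __name__ == "__main__":
    print(clones_needed(0.03))
```

Under these assumptions about 99 clones per sample would be needed for a 3% variant, which is the scale of effort that a gel-based population assay avoids.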

V3 heteroduplex tracking assay: The V3 heteroduplex tracking assay, as performed in Ronald Swanstrom’s laboratory, will be established to examine evolution in V3 as a marker for co-receptor switch (Nelson et al., 1997; Ping et al., 1999, 2000; Nelson et al., 2000). For this analysis, the V3 region will be amplified in a non-nested RT-PCR strategy to reduce the risk of contamination. Each lane displays the single-stranded probe and the self-annealed probe as internal markers, which aids high-resolution lane-to-lane comparison of the annealed heteroduplexes. The method is rapid, which enables the analysis of large numbers of samples to develop population data and to reveal those patients whose samples warrant more detailed analysis. Although it is acknowledged that a switch to the X4 phenotype is rare in subtype C infected individuals, this method is still valuable in identifying evolutionary variants in V3 among viruses. If multiple bands are present and the sample is of sufficient interest, blunt-end cloning will be performed on the same PCR product used in the V3 heteroduplex tracking assay (only a small portion of the PCR product is consumed in the assay). Individual colonies will be screened directly by colony PCR followed by V3 heteroduplex tracking assay to identify clones with sequences that correspond to the bands of interest. Our experience is that V3 sequence variability in subtype C viruses is not related to the evolution of X4 variants. This may be a distinct feature among HIV-1 subtypes (Ping et al., 1999).

V1/V2 heteroduplex tracking assay: The V1/V2 variable regions of env probably derive their sequence variability as the result of antibody selection. Differences in sequences between infected subjects and within a subject over time include insertions, deletions, and point mutations. There can be as few as one predominant species present at a given time, or, more typically, five to ten.
The V1/V2 heteroduplex tracking assay is a very useful approach for displaying the presence of multiple viral populations within a subject and also for monitoring changing selective pressures on the V1/V2 region of the Env protein. This method for tracking variation over time to investigate the relationship between sequence variation and neutralization response will be applied in the acute seroconvertor project. This will contribute to addressing the hypothesis that the presence of viral escape variants that are resistant to antibody neutralization is correlated with set-point. This method has been developed and applied to study subtype B HIV-1, and Swanstrom et al. are now developing an equivalent probe for subtype C viruses.

Specific aim 5: To provide sequencing and sequence analysis
Sequencing will be performed on PCR or plasmid DNA templates using the ABI Big Dye terminator kits (Applied Biosystems, Foster City, CA) according to the manufacturer's instructions and analysed on the ABI automatic gel sequencer. Processes for sequence generation and analysis will be implemented in a modular manner. The experience gained by James Mullins at the University of Washington Molecular Diversity CFAR core has been shared with the South African team and this will continue on an ongoing basis. A production module will include development of a semi-automated pipeline for sequence assembly and annotation in order to provide standardised conditions for sequence generation and to provide a basis for quality control that includes a simple phylogenetic screen. A dedicated phylogenetics module will perform alignments, construct phylogenetic trees and detect recombination. A selection module will provide reconstruction of ancestral nodes, analysis of selection pressures along internal branches, detection of clustering of sites with similar rates of evolution and mapping of mutations to tertiary protein structure where this is known.
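As an illustration of the simplest input to an analysis of selection pressure, the sketch below counts synonymous and nonsynonymous codon differences between two codon-aligned sequences using the standard genetic code. It is a hedged example, not the core's selection module: the function name is hypothetical, the raw counts are not normalised by the number of synonymous and nonsynonymous sites, and no correction is made for multiple substitutions, so the output is not a dN/dS estimate.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_changes(ref, query):
    """Return (synonymous, nonsynonymous) counts of differing codons between
    two in-frame, codon-aligned nucleotide sequences of equal length.
    Codons containing gaps or ambiguity codes are skipped."""
    if len(ref) != len(query) or len(ref) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")
    synonymous = nonsynonymous = 0
    for i in range(0, len(ref), 3):
        a = ref[i:i + 3].upper()
        b = query[i:i + 3].upper()
        if a == b or a not in CODON_TABLE or b not in CODON_TABLE:
            continue  # identical, gapped or ambiguous codon
        if CODON_TABLE[a] == CODON_TABLE[b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


if __name__ == "__main__":
    # Made-up example: AAA->AAG is silent (Lys), CTG->CCG changes Leu to Pro.
    print(classify_codon_changes("ATGAAACTG", "ATGAAGCCG"))
```

In practice such counts would be fed into an established codon-substitution method rather than interpreted directly.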
Production module: This will include sequence production, tracking of project progress, quality control, phylogenetic screening, management of sequences and submission to databases. Sequence chromatograms will be checked for errors and compared to local and GenBank sequence databases for potential sample mix-ups or contamination (Korber et al., 1995; Learn et al., 1996). Sequences will be submitted to AUTOFINISH (http://bozeman.mbt.washington.edu/) and assembled using PHRED/PHRAP and other sequence assembly software. Finished sequences will be annotated with features such as open reading frames and protein products using ARTEMIS (www.sanger.ac.uk/software), a commonly used small-genome annotation tool. Sequences will be formatted into GenBank entries and stored in both flatfile and relational formats. Assembled and annotated sequences will be distributed to all four CAPRISA projects via the restricted access web interface. Quality control, tracking of sequences, databasing, and project management tools are currently available from the South African National Bioinformatics Institute and are also used by large scale sequencing projects at the University of Washington and the Sanger Centre (http://www.sanger.ac.uk/Software/). Submission of sequences to the international databases and communication with the Los Alamos database team will be managed through the Bioinformatics Core. Isolate sequences will be databased with unique accessions in a relational MySQL database under a schema that will be developed to support the activities of the CAPRISA projects. Submission to the international databases will be performed using standard available tools such as NCBI’s Sequin (www.ncbi.nlm.nih.gov). Samples will be tracked using a Laboratory Data Management System as this is the system that will be used by the CAPRISA central repository for all specimens from all CAPRISA projects.
Sequence and sample quality will be tracked and values will be made available to the database via a quality table for each sample underlying each isolate sequence. Outputs from these efforts will be merged with the Epidemiology and Biostatistics core’s databases on demographic, laboratory, epidemiological and clinical data to enable subsequent integration with sequence data. The Epidemiology and Biostatistics core will develop and maintain databases with epidemiological, clinical, laboratory and behavioural data, where available, on each subject in each CAPRISA study, and these will be shared with the Viral Diversity and Bioinformatics core so that sequence information can be integrated with other data. This integration process will be undertaken by the Viral Diversity and Bioinformatics core and the integrated databases will then be available to CAPRISA investigators through restricted web access.
Phylogenetics module: Assembled sequences will be aligned with CLUSTALW (Thompson et al., 1994) or other alignment programmes, as necessary, followed by manual adjustment using SeqLab (Wisconsin Genetics Package) and BioEdit (Hall, T.A. 1999. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl. Acids. Symp. Ser. 41:95-98). Pairwise evolutionary nucleotide distances (excluding gaps in the pairwise alignment) will be estimated using a general-time-reversible model with site-to-site variation in substitution rates (discrete approximation of a gamma distribution with a shape parameter, alpha, to be determined from the available data) (Leitner et al., 1997). Neighbour-joining evolutionary trees (Saitou and Nei, 1987) will be constructed using full data sets for each gene region. Bootstrap analysis will be performed (Felsenstein, 1988) to assess the support at internal nodes of the trees.
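As an illustration of the pairwise gap handling described above, the following Python sketch computes an uncorrected distance between two aligned sequences while skipping every column in which either sequence has a gap. It is a simplified stand-in and not the general-time-reversible model with gamma-distributed rates named in the text: the uncorrected p-distance and the Jukes-Cantor correction are used only because they fit in a few lines. The function names and the choice of gap characters are assumptions made for this example.

```python
import math

# Characters treated as alignment gaps (an assumption for this sketch).
GAP_CHARS = {"-", "."}


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap are excluded for this pair
    only, which mirrors "excluding gaps in the pairwise alignment".
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue  # pairwise deletion of gapped columns
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differences / compared


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed p-distance.

    Used here as a simple placeholder for the richer substitution
    model proposed in the text.
    """
    if p >= 0.75:
        raise ValueError("p-distance too large for Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Toy aligned fragments, not real isolate data.
    d = p_distance("ACGTAC-T", "ACGAACGT")
    print(d, jukes_cantor(d))
```

A matrix of such distances over all sequence pairs is the input that a neighbour-joining implementation would consume; a production analysis would replace both functions with the model fitted to the data.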
Selection Analysis Module: Overriding selection pressures on viral gene sequences will be assessed through analysis of the site-adjusted frequency of synonymous and non-synonymous site changes within individual codons (Kumar et al., 1994). The codemlsites programme from the PAML package (Yang, 1997) will be used to implement a maximum likelihood estimate of the rate of evolution at each sequence position of each sequence alignment. A variable length sliding window will be used to assess clustering of regions with high and low rates of evolution. Regions showing rates higher or lower than cut-off values will be highlighted. In addition, reconstruction of ancestral states will be performed. CAPRISA investigators from all projects will be able to browse evolutionary rates on the website and to set the size of a sliding window to look for regions of high and low diversity. Where the 3-D structure is available, sequence sites of particular interest will be highlighted through modification of data files of the Protein Explorer programme (http://molvis.sdsc.edu/protexpl/). Analysis of intrapatient sequence diversity and evolutionary trends, and assessment of overriding evolutionary pressures, will be performed in the selection analysis module. A series of modular analyses will be performed to populate tables in a back end MySQL database according to the schema developed in the integrated database.
Specific aim 6: To perform resistance genotyping/phenotyping
Resistance genotyping of the RT and protease genes will be performed on an ABI 3100 capillary sequencer using the ViroSeq HIV-1 Genotyping v.2 Kit and Software System. This method is well established in the laboratory of Dr. Cassol at the University of Natal. When applied to acute seroconvertors and drug naïve patients, sequencing will provide important baseline information on the frequency of naturally-occurring polymorphisms and resistance mutations in South Africa.
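The baseline frequency survey mentioned above amounts to a simple tally over genotyped samples. The Python sketch below shows one way this could be done, under stated assumptions: each sample is represented as a mapping from RT codon number to the amino acid called at that codon, and the watch list contains only two widely reported RT resistance mutations (M184V and K103N) as examples. The watch list, the data layout and the function name are illustrative and are not the panel or software output used by the core.

```python
from collections import Counter

# Illustrative watch list only: RT codon -> resistance-associated residue.
# M184V and K103N are widely reported mutations; this is not the core's panel.
WATCH_LIST = {184: "V", 103: "N"}


def mutation_frequencies(genotypes, watch_list=WATCH_LIST):
    """Fraction of genotyped samples carrying each watched mutation.

    genotypes: iterable of dicts mapping RT codon number -> amino acid.
    A sample that lacks a call at a codon is left out of that codon's
    denominator rather than being counted as wild-type.
    """
    carriers = Counter()
    typed = Counter()
    for calls in genotypes:
        for codon, residue in watch_list.items():
            if codon not in calls:
                continue  # codon not covered for this sample
            typed[codon] += 1
            if calls[codon] == residue:
                carriers[codon] += 1
    return {
        codon: carriers[codon] / typed[codon]
        for codon in watch_list
        if typed[codon]
    }


if __name__ == "__main__":
    # Toy calls for four hypothetical samples, not study data.
    samples = [
        {184: "M", 103: "K"},
        {184: "V", 103: "K"},
        {184: "M"},
        {184: "M", 103: "N"},
    ]
    print(mutation_frequencies(samples))
```

Keeping a separate denominator per codon avoids understating a frequency when some samples were only partially sequenced.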
Sequence information will also facilitate the design of subtype C-specific primers for use in OLA and SNP assays. The main advantages of OLA and SNP are that they are rapid, can be applied to large numbers of samples and, in the case of SNP, can be used to quantitate the amount of mutant versus wild-type virus. Both assays are more sensitive than conventional sequencing for the detection of low copy number mutations (i.e. when the mutation constitutes less than 10% of the viral population). OLA and SNP are most useful when testing for a limited number of known mutations and when screening for the earliest emergence of resistance. Since separate reactions are required for the detection of each individual mutation, OLA and SNP are not particularly useful (or practical) when the treatment is unknown, or when multiple drugs have been used and many potential mutations are possible. Both assays are based on covalent interactions occurring between two adjacent oligonucleotide primers hybridised to an RT (or protease) PCR amplicon. In the OLA assay, fluorescein-labelled mutant and digoxigenin-labelled wild-type primers are designed to span the mutation site. The 3’-end of the primer forms a perfect match with either the mutant or the wild-type target. The second primer carries biotin at its 5’-end and acts as a "common" anchor primer. Following hybridisation and ligation, the biotinylated product(s) are captured on a streptavidin-coated plate and detected by conventional ELISA methods. In the SNP assay (performed on a LightCycler; Roche, Mannheim, Germany), the first primer (wild-type or mutant) is labelled with fluorescein, and the anchor primer is labelled with an LC Red fluorophore. Following hybridisation, the close proximity of these primers leads to a fluorescence resonance energy transfer, which is detected and quantified during a melting curve analysis (http://www.biochem.roche.com).
The same oligonucleotide primers (Lisa Frenkel, personal communication; Edelstein et al., 1998), modified for use with subtype C viruses, will be used for both OLA and SNP-based assays. For purposes of this proposal, sequencing will be considered to be the gold standard against which newly developed OLA and SNP assays will be evaluated. To obtain a better understanding of the biological significance of resistance-associated mutations and polymorphisms in the South African context, a subset of genotypically-resistant specimens will be subjected to resistance phenotyping using the consensus protocol developed by the Virology Committee of the AIDS Clinical Trials Group. In addition, a representative subset of drug resistant viruses will be shipped to either Virco or Photosense for an analysis of viral "fitness" using a commercially available Recombinant Phenotypic Assay. Resistance sequences will be analysed and interpreted utilizing ABI software, in combination with ADRA (Antiviral Drug Resistance Analysis) from the Los Alamos HIV Sequence Database (http://www.hiv.lanl.gov) and the Stanford HIV RT and Protease Sequence Database (http://hivdb.stanford.edu). These methods will be used to screen for resistance in individuals treated with the antiretroviral combination DDI, 3TC and nevirapine in the CAPRISA antiretroviral therapy project. In this study, treatment will be initiated during tuberculosis therapy using a directly observed strategy or self-administered on completion of tuberculosis therapy. In addition, a baseline survey of resistance mutations will be done on the samples from the acute seroconvertor project.
Specific aim 7: To develop and provide databases for the integration of sequence, clinical and immunological information generated by all the projects and cores
The bioinformatics and computational biology component of this core will develop and implement an integrated schema for sharing of information generated by all CAPRISA projects and cores.
Information on sequence alignment, evolution, cohort statistics, host genetic background, epitope mapping, clinical data and epidemiological data will be cross-referenced to viral sequences via filesharing with the Epidemiology and Biostatistics core’s demographic, laboratory, behavioural, epidemiological and clinical database. An integrated query system will be implemented to support hypothesis testing. Sequences will be databased and annotated with relevant information from each of the research projects by means of automated and semi-automated annotation. The routine study databases for each of the projects will be established and maintained by the Epidemiology and Biostatistics core which will supply information relevant to each project in computer readable formats, that will subsequently be linked with underlying isolate sequences to allow for subsequent association studies and delivery to a web interface that presents the integrated data. The datasystem will constitute a dynamically generated web interface to the users developed in ZOPE with a python middleware layer associated with necessary parsers, file upload and download. The back end database will be MySql. Filesharing will be integrated with the Laboratory Data Management System. Relational tables will be populated as described below, with the ability to update entries relevant to sequence annotation by hand. Sequences will be curated and processed using the Viral Diversity and Bioinformatics core facility tools and MySql database. The structure of the query system and the underlying database have already been developed under a separate proposal for genome annotation and databasing funded by a three year project supported by the South African National Innovation fund (Human Disease — A Genomic Perspective 1998-2001). Additional hand annotation of isolate sequences will be possible at anytime through generation of a GenBank flatfile entry from the database through a python parser. 
The GenBank entry will be read into ARTEMIS and entries can then be submitted back into the system through updated generation of a GenBank entry for parsing through python into the MySQL database.
Integration module: Each sequence will be processed through the production sequence analysis module, a phylogenetic analysis module, a selection analysis module (the preceding three modules have been described in detail under specific aim 5), an immunology module (this aim) and a clinical information module (this aim). Other modules may be added as the study matures and needs are identified. The integration module (this aim) will integrate sequence information with immunological information such as epitope maps and HLA type, coupled with epidemiology and cohort information (from the Epidemiology and Biostatistics core), and will be performed for each isolate via sharing of data tables and/or structured file sharing. Information gained from these cores will be extracted into tables for the relational database, and will be linked to the appropriate isolate sequences at a residue position level, together with appropriate mapping to available three dimensional structures for viewing using Cn3D or Protein Explorer as appropriate.
Immunological data module: Sequence data for epitope mapping from the acute seroconversion and highly exposed persistently seronegative individuals projects will be hand mapped onto isolate sequence entries using ARTEMIS. Entries will be further annotated according to HLA restriction such that viral sequence and the autologous HLA genotype of the host are linked via unique accessions. Known epitope changes restricted by a particular HLA allele(s) will be accessible via searches through a web interface.
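The linkage between isolate accessions, host HLA genotype and annotated epitope changes can be pictured with a small relational sketch. The Python example below uses the standard-library sqlite3 module purely as a self-contained stand-in for the MySQL back end named in the text. All table names, column names and identifiers are hypothetical and do not represent the schema developed for CAPRISA.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with three hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE isolate (
            accession TEXT PRIMARY KEY,   -- unique isolate accession
            host_id   TEXT NOT NULL       -- subject the isolate came from
        );
        CREATE TABLE host_hla (
            host_id TEXT NOT NULL,
            allele  TEXT NOT NULL         -- one row per HLA allele carried
        );
        CREATE TABLE epitope_change (
            accession          TEXT NOT NULL REFERENCES isolate(accession),
            gene               TEXT NOT NULL,
            position           INTEGER NOT NULL,  -- residue position
            restricting_allele TEXT NOT NULL,
            change             TEXT NOT NULL
        );
        """
    )
    return conn


def changes_for_allele(conn, allele):
    """Epitope changes restricted by `allele` in hosts that carry it."""
    rows = conn.execute(
        """
        SELECT e.accession, e.gene, e.position, e.change
        FROM epitope_change AS e
        JOIN isolate  AS i ON i.accession = e.accession
        JOIN host_hla AS h ON h.host_id = i.host_id
        WHERE e.restricting_allele = ? AND h.allele = ?
        ORDER BY e.accession, e.position
        """,
        (allele, allele),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    # Placeholder rows, not study data.
    conn.execute("INSERT INTO isolate VALUES ('ISO1', 'HOST1')")
    conn.execute("INSERT INTO host_hla VALUES ('HOST1', 'ALLELE-X')")
    conn.execute(
        "INSERT INTO epitope_change VALUES ('ISO1', 'gag', 10, 'ALLELE-X', 'A>T')"
    )
    print(changes_for_allele(conn, "ALLELE-X"))
```

Joining through the host table, rather than storing the HLA type on each sequence record, keeps one copy of the host genotype even when several isolates come from the same subject over time.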
The database and search facility will include quantitative information on the number of CTL and Th degenerate and non- degenerate epitopes recognized in any one individual at any one time, providing a qualitative approach for equating viral control with the number of regions targeted by host CTL. Examples of immunological data will include: correlation with end-point data such as viral load; CD4 count and the clinical status of the individual; frequency of HLA types and haplotypes in South Africa. The highly exposed persistently seronegative cohort information will be further annotated with any available sequence data from breakthrough infections. Clinical data module: Host clinical status data for each viral isolate will be captured from the acute seroconversion, highly exposed persistently seronegative individuals and evolving epidemiology projects by the Epidemiology and Biostatistics core and then shared electronically with the South African National Bioinformatics Institute to create integrated databases. Standardised clinical descriptions will be used so that subsequent hypothesis testing can be performed across all available datasets. Descriptions will be linked directly to each isolate sequence using the database and will be queryable via a restricted access web interface. Links with other research and training activities The Immunology core will create research and training infrastructure that could be used for other research activities as well. Sharing of CAPRISA core capabilities will be encouraged, while ensuring that CAPRISA needs have highest priority. Such sharing will benefit other projects and researchers; in return CAPRISA will benefit from collaborating with other projects. South African National Bioinformatics Institute training program: The existing training infrastructure at the South African National Bioinformatics Institute provides regular training courses. 
These will be extended to include CAPRISA investigators so that they become familiar and competent in techniques for bioinformatics analyses, and also in use of the facilities provided by the Viral Diversity and Bioinformatics core. Viral molecular evolution courses will be taught by leading experts in the field who have already taught such courses in South Africa or the USA, e.g. James Mullins (University of Washington).
Development of HIV-1 subtype C candidate vaccines: This project is funded by the South African AIDS Vaccine Initiative and is co-ordinated by Professor Anna-Lise Williamson, with Carolyn Williamson as a co-investigator on the project. Multiple vaccine candidates expressing several HIV-1 genes (env, nef, tat and gag-pol) from a local HIV-1 subtype C isolate will be compared. The vaccine approaches being developed are based on DNA, MVA, BCG and virus like particles (VLPs) produced in plants. CAPRISA infrastructure will be made available to this project when required, e.g. when each new construct needs to be sequenced to ensure integrity of the gene during vaccine manufacture.
Does infection with one subtype protect from infection with a second subtype? This project is funded by the European Union under the leadership of M. Hoelscher, with Carolyn Williamson as a co-investigator on the project. A large cohort has been set up in the Mbeya region of Tanzania. In this region there are multiple subtypes circulating and over 50% of the women are infected with unique recombinant viruses. Carolyn Williamson is the principal collaborator from South Africa on this project, and Lucky Malaza, from the University of Cape Town laboratory, is currently being trained in Francine McCutchan’s laboratory to screen for dual infections. The South African role in this project is to monitor viral dynamics over time, i.e.
to monitor time from infection to superinfection and determine if the infection with a second subtype affects viral set-point; to determine the contribution that each subtype makes to viral load; and lastly, to monitor for the possible emergence of recombinants. This aims to shed light on the pathogenicity of recombinant viruses, as well as correlates of protection. Full-length sequencing has been established as part of this project.
D. CORE LEADER AND PERSONNEL
Carolyn Williamson, PhD, is the Core Leader and will be responsible for the overall conduct of the core. She is the lead investigator on the viral diversity and sequencing component of the core and will be responsible for co-ordinating interaction with the principal investigators of all the CAPRISA projects and the other core leaders, through her participation as a member of the CAPRISA Steering Committee and her direct interaction with the investigators and project officers on all the CAPRISA projects. Dr Williamson has extensive experience in studies involved in characterisation of HIV subtypes and has been instrumental in detailing the diversity of HIV-1 in South Africa. Her laboratory was responsible for subtyping isolates by heteroduplex mobility assay for the UNAIDS collaborating centre for the Global Network for Isolation and Characterization of HIV and was used as a testing site for the gag heteroduplex mobility assay. Her laboratory is currently responsible for sequencing gag and env for the HIVNET 028 study on recent seroconvertors in Southern Africa; is the South African partner in the European Union funded project on HIV superinfection in Tanzania; and was responsible, along with Ronald Swanstrom, for selecting South African isolates for inclusion into a number of leading candidate vaccines including the Venezuelan Equine Encephalitis replicon vaccine. Dr Williamson is currently Associate Professor, Department of Virology, University of Cape Town and WHO consultant on HIV-1 genetic diversity.
She directs the molecular HIV research program at the University of Cape Town. She is internationally recognized as an expert in her field: she was invited to represent South African scientists in the WHO meeting on African strategy for an AIDS vaccine and continues to participate in the Biomedical working group as part of the African AIDS Vaccine Program. In addition, she was invited to act as an advisor on the UNAIDS Network for HIV Isolation and Characterization (1999) and acted as the Southern African representative at the UNAIDS/European Commission sponsored workshop on HIV-1 subtypes and their implications for epidemiology, pathogenicity, vaccines and diagnosis (Tanzania, 1997). She has reviewed articles for the journals AIDS and Archives of Virology; was a member of the International Review Committee, 12th and 13 th World AIDS Conferences in Geneva and South Africa respectively (1998 and 2000). She was a scientific evaluator for European Commission research proposals as part of a program for scientific and technological cooperation with the developing world (INCO-DC) (1997). Her national and international recognition in HIV/AIDS is illustrated by numerous invitations to address workshops and congresses on the relevance of HIV-1 genetic diversity. Winston Hide, PhD, is the Core co-Leader and will be responsible for the Bioinformatics and Computational Biology component. He has broad experience in complex biological analyses and has published in areas of molecular evolution, algorithm design, gene expression, databasing and disease gene discovery and characterisation and is the Director of the South African National Bioinformatics Institute. Winston Hide was a consultant bioinformatics designer of the drug discovery system at RW Johnson PRI, has 14 years of experience in the analysis of sequence diversity and in large and small scale sequence analysis with an established publication record in molecular evolution and genome analysis systems development. 
The Institute is responsible for development of bioinformatics within South Africa and its training globally through the International Star Alliance global bioinformatics training programme. His laboratory has supported the development of bioinformatics in Africa, is responsible in part for the World Health Organisation Genomics and Bioinformatics capacity development initiative in Africa, and is a grant holder from the World Health Organisation for the development of a network of sites worldwide for the analysis of trypanosome diseases. Winston Hide, also a regular trainer at World Health Organisation bioinformatics workshops, will co-ordinate interaction on the bioinformatics aspect of the core for all the CAPRISA projects. He will be responsible for setting up a centralised facility for integration of data as well as contributing to advanced computer and analytical expertise needed for sequence analysis. He will also be responsible for communication and sharing of biological data between laboratories within the core. Winston Hide will be responsible for interpretation of data and performing analyses in collaboration with Dr Seoighe and Dr Williamson.
Sharon Cassol, PhD, is the Core co-Leader and will be responsible for resistance genotyping and phenotyping. Before moving to South Africa, Dr Cassol was the Principal Investigator of a comparative study to assess the performance of several different genotyping methods, including the ABI, Visible Genetics and LiPA systems. The primary goal of this work (conducted on behalf of the Ontario Ministry of Health) was to determine the feasibility and clinical utility of large-scale resistance genotyping at the provincial level. During the same time period, Dr Cassol also served as the lead virologist of POLARIS, a prospective longitudinal study of HIV-1 seroconversion in the Province of Ontario.
In addition to screening for drug resistance in drug-naïve seroconvertors, this multi-disciplinary cohort study was designed to investigate the clinical impact and public health implications of primary resistance. Joanne van Harmelen, PhD, is a member of the Core Committee and will be responsible for overseeing the heteroduplex mobility assays and the heteroduplex tracking assays. She has been working in the laboratory of Carolyn Williamson for over 6 years and has been the key person involved in the initial characterisation of HIV-1 subtypes in South Africa. She has set up the heteroduplex mobility assay in the University of Cape Town laboratory as a rapid screening technique for both the gag and env regions. In addition, Dr van Harmelen was involved in the selection of genes to be included in possible candidate vaccines and was responsible for generating the full-length sequences of the three isolates chosen for vaccine development (Du151, Du422 and Du179) as well as an additional isolate from Cape Town (CTSc2). Cathal Seoighe, PhD, is a member of the Core Committee and will be the lead computational biologist and be responsible for training and development of advanced computational techniques for evolutionary analysis in this core. Trained initially in Theoretical Physics in Trinity College Dublin he has gained useful experience from computational modelling of physical systems. His mathematical skills and experience of probabilistic modelling will form an important addition to the skills pool of the core. He has a PhD in molecular evolution and comparative genomics and subsequent post-doctoral work has focussed on developing algorithms to compare rates of sequence evolution in families of related proteins. He has organised and taught in bioinformatics training courses for South African biologists, including a training course in phylogenetic methods and molecular evolution of HIV. 
He will maintain and provide support for standard bioinformatics tools and develop new software applications to perform non-standard analyses, as required. James Mullins, PhD, is an internationally recognised leader in defining the HIV-1 sequence diversity and analysis and has indicated a commitment to contribute to the advanced computer analysis of HIV sequences and in training. He is currently the Professor of the Departments of Microbiology, Medicine and Laboratory Medicine as well as the Chairman of the Department of Microbiology at the University of Washington. Francine McCutchan, PhD is a long-standing leader in HIV diversity with a particular interest in recombination. She has well established methodologies in full-length genome sequencing. She has already trained three South African scientists in her laboratory, two in full-length genome sequencing and one in methodologies for screening for recombinant viruses. She has indicated a continued commitment to assist in future full-genome sequencing of South African isolates as well as training of South African students and staff. Dr McCutchan is Chief of the Global Molecular Epidemiology Program at the Henry M. Jackson Foundation, a component of the US Military HIV Research Program. Over the last 10 years, her program helped to identify many of the major HIV-1 subtypes. In addition, she has contributed substantially to the database of HIV sequences, and has gathered data on the global distribution of HIV subtypes. Inter-subtype recombinant HIV, now known to represent some of the most important strains in the pandemic, were initially identified, and continue to be studied intensively, by her program. Ronald Swanstrom, PhD is an expert in tracking viral dynamics. He has been involved in the selection of genes for inclusion into subtype C based vaccines. 
Two students will go to his laboratory for training in heteroduplex tracking assay and Dr Swanstrom has indicated his commitment to building South African capacity in AIDS research. Dr Swanstrom received his postdoctoral training in retroviruses at the University of California at San Francisco before moving to the University of North Carolina at Chapel Hill. In the mid 1980s he started working on the molecular biology of HIV. Over the last 15 years he has worked in a number of areas of AIDS research. The initial work was with the HIV-1 protease, and this has led to an ongoing interest in the nature of resistance to protease inhibitors. He has also explored the nature of X4 variants which are able to enter cells using a different coreceptor. He is part of a collaborative team trying to exploit the Venezuelan Equine Encephalitis vaccine vector for HIV-1 vaccine development. More recently, he has been exploring patterns of viral sequence variability as a tool for revealing the nature of virus-host interactions. There are ongoing, active collaborations with scientists in South Africa, Malawi, China, Puerto Rico, and Cuba. In 1998 he became Director of the newly formed UNC Center For AIDS Research. The Center is a collaborative effort between HIV researchers at UNC Chapel Hill, Family Health International, and Research Triangle Institute. E. CONCLUSION The Viral Diversity and Bioinformatics Core has well established capabilities both in terms of personnel and resources to perform viral sequencing, heteroduplex assays, computational analysis on sequences and drug resistance assays. In addition, integration of sequence data with clinical, laboratory, epidemiological and behavioural data will be undertaken by this core. 
A Viral Diversity and Bioinformatics Core Committee will oversee this core under the leadership of Dr Carolyn Williamson; this committee will monitor progress of the core, prioritise the use of core resources and support the coordination of the three components of the core. The Core Committee will also coordinate core activities with other CAPRISA activities and encourage collaboration with other research efforts. While geographically split between Cape Town and Durban, this core ensures that existing strengths are optimised rather than rebuilding all capabilities in one center. Thus the inclusion of laboratories in Cape Town and Durban together with the South African National Bioinformatics Institute in Cape Town is a strength of this core. The Core is further strengthened by support and training from leading US investigators. A web interface will serve as a medium for communication between sites within this core and with the other CAPRISA investigators.
F. REFERENCES
Bredell H, Williamson C, Sonnenberg P, Martin DJ, Morris L. Genetic characterization of HIV type 1 from migrant workers in three South African gold mines. AIDS Res Hum Retroviruses 1998; 14(8): 677-84.
Delwart EL, Busch MP, Kalish ML, Mosley JW, Mullins JI. Rapid molecular epidemiology of human immunodeficiency virus transmission. AIDS Res Hum Retroviruses 1995; 11: 1081-1093.
Delwart EL, Pan H, Sheppard HW, Wolpert D, Neumann AU, Korber B, Mullins JI. Slower evolution of human immunodeficiency virus type 1 quasispecies during progression to AIDS. J Virol 1997; 71: 7498-7508.
Delwart EL, Sheppard HW, Walker BD, Goudsmit J, Mullins JI. Human immunodeficiency virus type 1 evolution in vivo tracked by DNA heteroduplex mobility assays. J Virol 1994; 68: 6672-6683.
Delwart EL, Shpaer EG, Louwagie J, McCutchan FE, Grez M, Rubsamen-Waigmann H, Mullins JI. Genetic relationships determined by a DNA heteroduplex mobility assay: analysis of HIV-1 env genes. Science 1993; 262: 1257-1261.
Edelstein RE, Nickerson DA, Tobe VO, Manns-Arcuino LA, Frenkel LM. Oligonucleotide ligation assay for detecting mutations in the human immunodeficiency virus type 1 pol gene that are associated with resistance to zidovudine, didanosine and lamivudine. J Clin Microbiol 1998; 36: 569-72.
Felsenstein J. Phylogenies from molecular sequences: Inference and reliability. Ann Rev Genet 1988; 22: 521-65.
Freel SA, Williams JM, Nelson JA, Patton LL, Fiscus SA, Swanstrom R, Shugars DC. Characterization of human immunodeficiency virus type 1 in saliva and blood plasma by V3-specific heteroduplex tracking assay and genotype analyses. J Virol 2001; 75(10): 4936-40.
Heyndrickx L, Janssens W, Zekeng L, Musonda R, Anagonou S, Van der Auwera G, Coppens S, Vereecken K, De Witte K, Van Rampelbergh R, Kahindo M, Morison L, McCutchan FE, Carr JK, Albert J, Essex M, Goudsmit J, Asjo B, Salminen M, Buve A, van Der Groen G. Simplified strategy for detection of recombinant human immunodeficiency virus type 1 group M isolates by gag/env heteroduplex mobility assay. J Virol 2000; 74(1): 363-70.
Korber BTM, Learn G, Mullins JI, Hahn BH, Wolinsky S. Protecting HIV sequence databases. Nature 1995; 378: 242-243.
Learn GH, Korber BT, Foley B, Hahn BH, Wolinsky SM, Mullins JI. Maintaining the integrity of human immunodeficiency virus sequence databases. J Virol 1996; 70(8): 5720-30.
Leitner T, Kumar S, Albert J. Tempo and mode of nucleotide substitutions in gag and env gene fragments in human immunodeficiency virus type 1 populations with a known transmission history. J Virol 1997; 71(6): 4761-70.
Mashishi T, Loubser S, Hide W, Hunt G, Morris L, Ramjee G, Abdool-Karim S, Williamson C, Gray CM. Conserved domains of subtype C nef from South African HIV-1 infected individuals include cytotoxic T lymphocyte epitope-rich regions. Submitted.
Nelson JAE, Fiscus SA, Swanstrom R. Evolutionary variants of the human immunodeficiency virus type 1 V3 region characterized by using a heteroduplex tracking assay. J Virol 1997; 71: 8750-8.
Ping LH, Nelson JAE, Hoffman IF, Schock J, Lamers SL, Goodman M, Vernazza P, Kazembe P, Maida M, Zimba D, Goodenow MM, Eron JJ Jr, Fiscus SA, Cohen MS, Swanstrom R. Characterization of V3 sequence heterogeneity in subtype C human immunodeficiency virus type 1 isolates from Malawi: underrepresentation of X4 variants. J Virol 1999; 73: 6271-6281.
Ping LH, Cohen MS, Hoffman I, Vernazza P, Seillier-Moiseiwitsch F, Chakraborty H, Kazembe P, Zimba D, Maida M, Fiscus SA, Eron JJ, Swanstrom R, Nelson JA. Effects of genital tract inflammation on human immunodeficiency virus type 1 V3 populations in blood and semen. J Virol 2000; 74(19): 8946-52.
Resch W, Parkin N, Stuelke EL, Watkins T, Swanstrom R. A multiple-site-specific heteroduplex tracking assay as a tool for the study of viral population dynamics. Proc Natl Acad Sci U S A 2001; 98: 176-181.
Saitou N, Nei M. The neighbour-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 1987; 4: 406-425.
Tanaka T, Nei M. Positive darwinian selection observed at the variable-region genes of immunoglobulins. Mol Biol Evol 1989; 6(5): 447-59.
Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994; 22(22): 4673-80.
van Harmelen J, van der Ryst E, Loubser AS, York D, Madurai S, Lyons S, Wood R, Williamson C. A predominantly HIV type 1 subtype C-restricted epidemic in South African urban populations. AIDS Res Hum Retroviruses 1999; 15(4): 395-8.
van Harmelen J, van der Ryst E, Wood R, Lyons SF, Williamson C. Restriction fragment length polymorphism analysis for rapid gag subtype determination of human immunodeficiency virus type 1 in South Africa. J Virol Methods 1999; 78(1-2): 51-9.
van Harmelen J, Wood R, Lambrick M, Rybicki EP, Williamson AL, Williamson C. An association between HIV-1 subtypes and mode of transmission in Cape Town, South Africa. AIDS 1997; 11(1): 81-7.
Van Harmelen J, Williamson C, Kim B, Morris L, Carr J, Maartens G, Abdool Karim SS, McCutchan F. Characterisation of full-length HIV-1 sequences from South Africa. Submitted.
Williamson C, Engelbrecht S, Lambrick M, van Rensburg EJ, Wood R, Bredell W, Williamson AL. HIV-1 subtypes in different risk groups in South Africa. Lancet 1995; 346(8977): 782.
Williamson C, Morris L, Maughan M, Ping LH, Dryga S. Characterization and Selection of HIV-1 Subtype C Isolates For Use in Vaccine Development. Submitted.
Yang Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 1997; 13: 555-6.