Bovine Genome Annotation Workshop - Bovine Genome Annotation

					Bovine Genome Annotation
      January 15, 2007
         San Diego
 Rough Agenda for Workshop
• Melissa Landrum - Resources at NCBI
• Chris Childers, TAMU - Manual annotation
  tutorial using tools familiar to biologists
• Chris Elsik, TAMU - Organization of
  community annotation, submission website,
  tools and data at (to be
  continued after the Apollo talk)
• Coffee Break 3:00-3:30
• Lynn Crosby, Harvard - The Apollo
  Annotation Editor
• Chris Elsik - finish up the talk that was started
  before coffee break
•   Organization of community annotation
•   Rules
•   Signing Up
•   Objectives
•   The Annotation Submission Site
•   Chromosome Coordinate Systems
•   Resources at BovineGenome.Org
•   Bovine Data for Apollo
Bovine Annotation Themes and Leaders
•   Muscle - Jim Reecy
•   Immune function - Ross Tellam, Loren Skow
•   Lactation - Tad Sonstegard, Monique Rijnkels
•   Energy partitioning, metabolism, rumen function - Steve
•   Reproduction, endocrinology, sex determination,
    development - Mike Roberts
•   Imprinted genes, HDACs, methyl transferases - Randy Jirtle
•   Bovine models of human diseases - Frank Nicholas
•   Non-coding RNA - Brian Dalrymple
•   Behavior, maternal nurturing - Clare Gill
•   Prion protein - John Williams
•   Adipose - Diane Spurlock-Moody
•   Blood/Skin - Fernando Garcia
•   All Other - Chris Elsik
   Rules Related to Cooperation
• When different participants wish to work on the same
  area, theme leaders will coordinate, and participants will
  cooperate for joint publication.
• Theme leaders will make sure all contributors are listed
  as authors.
• When two people submit conflicting gene models, the annotator will resolve the conflict.
• Submitted gene models may be modified by the
  annotator at Those that are
  modified will be made available on the website, so that
  submitters may politely dispute the modification via
  email if they disagree.
• Submitters will follow gene model submission and
  publication deadlines.
        Rules Related to Data
• All annotations will be based on the current
  assembly (3.1).
• The ONLY exceptions will be genes that cross
  scaffolds, which may be affected by upcoming
  assembly changes, or genes that are not found in
  the assembly.
• Gene models that are annotated using additional
  sequence data, not found in the assembly, will not
  be incorporated into the official gene set until there
  is a future assembly that does contain the gene.
• However, these genes may be submitted as
  tentative official gene models.
• Gene models must be submitted to the website
  (not emailed).
           How To Sign Up (2 steps)

• 1. Sign up on Baylor’s Bovine Genome

• 2. Register at annotation submission site at
  BovineGenome.Org (not live yet)
     Manual Gene Annotation: Objectives
• Check consensus gene models for correct gene structure
• Identify consensus genes that should be split or merged
• Correct 5’ and 3’ ends (missing start and termination codons)
• Annotate splice variants
• Identify diverged gene family members - using special PSI-
  BLAST Database at which combines
  NCBI NR protein database with bovine predicted and ab initio
• Identify novel genes - not included in consensus gene set
• Annotate genes that cross scaffolds
• Functional annotation (homolog description and ids,
  phylogenetic analysis to identify orthologs)
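One of the objectives above, correcting 5’ and 3’ ends, can be partly screened for automatically before opening a gene model in an editor. The following is a minimal sketch, not part of the workshop tools: it assumes the coding sequence is given 5’ to 3’ on the coding strand and uses the standard nuclear genetic code (ATG start; TAA, TAG, TGA stops). The function name `check_cds` is hypothetical.

```python
# Stop codons of the standard nuclear genetic code (an assumption of this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(cds):
    """Return a list of problems found in a coding sequence (empty if none)."""
    seq = cds.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of 3")
    if not seq.startswith("ATG"):
        problems.append("missing start codon")
    if seq[-3:] not in STOP_CODONS:
        problems.append("missing termination codon")
    # Look at every in-frame codon except the last one for premature stops.
    inner_codons = [seq[i:i + 3] for i in range(0, len(seq) - 3, 3)]
    if any(codon in STOP_CODONS for codon in inner_codons):
        problems.append("internal stop codon")
    return problems


print(check_cds("ATGAAATAA"))  # complete model, no problems reported
print(check_cds("ATGAAA"))     # 3' end is incomplete
```

A model that fails these checks is a candidate for manual correction; passing them does not show that the gene structure is right.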
  Objectives (more specifically)
• There will be one Bovine Official Gene Set (OGS)
  and 6 preliminary gene sets
  (NCBI, Ensembl, Fgenesh, Fgenesh++, Geneid, S
• The first step is to check if your gene is in the
  OGS set, then check if the OGS gene model is
• Gene models from the preliminary gene sets may
  be used to replace the OGS gene model.
• Or any of these gene models may be modified.
• Indicate in “Data Source” which dataset the gene
  model you started with came from.
• If your gene is not found in any gene set, but is
        The Annotation Submission Site
•   A preliminary gene model id will automatically be assigned to
    your submission.
•   Clicking on a field name will allow you to see a description of
    that field.
•   There are a few required fields - data source, chromosome or
    ChrUn scaffold id, assembly if not 3.1
•   The minimum sequence information required if you are
    changing the sequence of a gene model is the protein
•   Tabs will be stripped out of all fields, so don’t depend on them.
•   You may view and edit your submissions.
•   Other users cannot edit your submissions.
•   User information is searchable so leaders may contact their
    group members.
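Because tabs are stripped and a few fields are required, it can help to pre-check a record locally before typing it into the form. The sketch below is only an illustration: the field keys (`data_source`, `location`, `assembly`) and the example values are hypothetical and are not the names used by the submission site.

```python
# Hypothetical keys for the required fields named on the slide.
REQUIRED_FIELDS = ("data_source", "location")
DEFAULT_ASSEMBLY = "3.1"  # assembly only needs to be stated if it is not 3.1


def prepare_submission(fields):
    """Strip tabs from every field and report which required fields are empty."""
    cleaned = {key: value.replace("\t", "").strip() for key, value in fields.items()}
    missing = [key for key in REQUIRED_FIELDS if not cleaned.get(key)]
    if not cleaned.get("assembly"):
        cleaned["assembly"] = DEFAULT_ASSEMBLY
    return cleaned, missing


record = {"data_source": "Fgenesh\t++", "location": "Chr5", "notes": ""}
cleaned, missing = prepare_submission(record)
print(cleaned["data_source"], cleaned["assembly"], missing)
```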
Top half of submission page
Bottom half of submission page
Top half of edit page
          Coordinate Systems
• There are currently two coordinate systems.
• Terminology for these systems used by NCBI,
  Baylor, Ensembl, may be
• At
• Scaffolds = ChrUn (unassigned) and assigned
  sequence segments equivalent in the assembly
  process to the ChrUn sequences. These are the
  segments that may be ordered and assigned to
• Chromosomes = (Chr1-29 and ChrX). These are
  concatenated scaffolds that represent an entire
• ChrUn are always on the scaffold coordinate
          Coordinate Systems
• The BLAST site has
  databases for both coordinate systems.
• The Genome Browser is currently using the
  chromosome coordinate system for all
  chromosomes, except ChrUn (always the
  scaffold system)
• We will provide another browser with the
  scaffold coordinate system for all scaffolds (not
  just ChrUn)
• Data for Apollo will always be on the scaffold
  coordinate system.
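Since chromosomes are concatenated scaffolds, a feature annotated in Apollo on scaffold coordinates can be located in the chromosome-based browser if the placement of its scaffold is known. The sketch below shows the arithmetic under stated assumptions: coordinates are 1-based and inclusive, and each scaffold has a known offset, length and orientation on its chromosome. The `Placement` record and the numbers used are hypothetical; real placements would come from the assembly's scaffold layout. ChrUn scaffolds need no conversion because they are always on the scaffold system.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    """Hypothetical description of where a scaffold sits on a chromosome."""
    chrom: str
    offset: int   # 1-based chromosome position of the scaffold's leftmost base
    length: int   # scaffold length in bases
    strand: int   # +1 if placed forward, -1 if placed reverse-complemented


def scaffold_to_chromosome(placement, start, end):
    """Convert a 1-based inclusive scaffold interval to chromosome coordinates."""
    if not (1 <= start <= end <= placement.length):
        raise ValueError("interval lies outside the scaffold")
    if placement.strand == 1:
        return (placement.chrom,
                placement.offset + start - 1,
                placement.offset + end - 1)
    # Reversed scaffold: scaffold position p lands at offset + (length - p),
    # so the interval's end becomes the new start.
    return (placement.chrom,
            placement.offset + placement.length - end,
            placement.offset + placement.length - start)


forward = Placement("Chr1", offset=1001, length=500, strand=1)
reverse = Placement("Chr1", offset=1001, length=500, strand=-1)
print(scaffold_to_chromosome(forward, 10, 20))
print(scaffold_to_chromosome(reverse, 10, 20))
```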
       Resources that will be
              available at
         BovineGenome.Org and
•   Two genome browsers - chromosome and
    scaffold coordinate systems
•   BLAST site with assembly sequences
    (chromosome and scaffold sequences), all
    gene prediction sets, and special requests
•   Connection of BLAST results to genome browser
•   PSI-BLAST site to identify diverged family members
•   Ftp site to download Apollo formatted data
•   The following slides have examples from BeeBase
BeeBase - Link from BLAST hit to Gbrowse
BeeBase - Gbrowse with predicted
    genes and BLAST hits
BeeBase - Gbrowse
tracks on assembly 2
            Gbrowse Things
• EST direction:
   – The track label will indicate whether arrows
     designate coding direction or strand of alignment
• Spliced Features - for spliced EST
  tracks, arrows always indicate coding
  direction. Splice sites are modeled.
• BLAST or BLAT alignment tracks - gaps
  between segments roughly indicate
  introns, but splices are not modeled (splice
  sites are likely incorrect). This also applies to
  your BLAST searches viewed in Gbrowse.
              Splice Sites
• Apollo helps you find splice sites and
  start/stop codons
• Gbrowse does not help you find splice sites
  unless the track specifically models splice
  sites (predicted CDS and transcript tracks;
  exonerate and splign alignments)
• BLAST hits do not always reveal correct
  splice sites
  – The ends of BLAST HSPs can be fuzzy - they may
    extend slightly past the real homology, or stop
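Because HSP ends are fuzzy, an intron inferred from the gap between two BLAST segments is worth checking against the genomic sequence. The sketch below is an illustration only, not how Apollo works internally: it tests whether a proposed intron has the common GT...AG boundaries and, if not, looks a few bases either side for boundaries that do. It assumes the gene is on the forward strand of the supplied sequence and that coordinates are 1-based and inclusive; the function names and the toy sequence are hypothetical.

```python
def is_canonical_intron(sequence, intron_start, intron_end):
    """True if the intron begins with GT and ends with AG (forward strand)."""
    intron = sequence[intron_start - 1:intron_end].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def candidate_introns(sequence, intron_start, intron_end, window=3):
    """List GT...AG introns whose ends lie within `window` bases of the proposal."""
    hits = []
    for start_shift in range(-window, window + 1):
        for end_shift in range(-window, window + 1):
            start = intron_start + start_shift
            end = intron_end + end_shift
            if start >= 1 and end <= len(sequence) and is_canonical_intron(sequence, start, end):
                hits.append((start, end))
    return hits


# Toy example: exon (1-6), intron (7-18), exon (19-24).
genomic = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"
print(is_canonical_intron(genomic, 9, 17))   # gap implied by fuzzy HSP ends
print(candidate_introns(genomic, 9, 17))     # nearby GT...AG alternatives
```

More than one candidate can be returned; choosing between them still requires checking that the joined exons keep the reading frame and match the homolog.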
     Bovine Data for Apollo
• Bovine data for Apollo will be provided
  on the ftp site
• Configuration file
• GAME-XML with OGS and preliminary
  gene sets
• BLAST searches of NCBI NR protein
  database and ESTs
        A Strategy for Apollo
• 1. Search bovine scaffolds with a protein
  homolog of your gene of interest using
• 2. Search bovine predicted proteins with your
  protein homolog with BLASTP to identify the
  homologous bovine OGS gene model (or
  gene model from an input gene set).
• 3. Download the Apollo XML and BLAST
  output file for the correct scaffold from our ftp
• 4. Load the data into Apollo and locate the
  gene model from step 2.
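Steps 1 and 2 both end with picking the best hit out of a BLAST result. As a rough sketch, assuming the search was saved in the standard 12-column tab-delimited BLAST format (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score), the scaffold or gene model to follow up can be pulled out like this. The function name and the identifiers in the example are hypothetical, and the slides do not say which output format the site provides.

```python
def best_hit(tabular_text):
    """Return the highest-scoring hit from 12-column tabular BLAST output, or None."""
    best = None
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        hit = {
            "query": cols[0],
            "subject": cols[1],
            "percent_identity": float(cols[2]),
            "s_start": int(cols[8]),
            "s_end": int(cols[9]),
            "evalue": float(cols[10]),
            "bit_score": float(cols[11]),
        }
        if best is None or hit["bit_score"] > best["bit_score"]:
            best = hit
    return best


# Made-up example rows; the scaffold names are placeholders.
example = (
    "myProtein\tscaffoldA\t85.0\t200\t30\t0\t1\t200\t5000\t5599\t1e-70\t250\n"
    "myProtein\tscaffoldB\t60.0\t150\t60\t2\t10\t160\t900\t451\t1e-20\t95.5\n"
)
hit = best_hit(example)
print(hit["subject"], hit["s_start"], hit["s_end"])
```

The subject start and end give the region of the scaffold to go to once the data are loaded in Apollo (step 4); a subject start greater than the subject end indicates a hit on the reverse strand.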