Bovine Genome Annotation

      January 15, 2007
         San Diego
   Rough Agenda for Workshop
• Melissa Landrum - Resources at NCBI
• Chris Childers, TAMU - Manual annotation
  tutorial using tools familiar to biologists
• Chris Elsik, TAMU - Organization of community
  annotation, submission website, and tools and data (to be continued after the
  Apollo talk)
• Coffee Break 3:00-3:30
• Lynn Crosby, Harvard - The Apollo Annotation
• Chris Elsik - finish the talk that was started
  before the coffee break
•   Organization of community annotation
•   Rules
•   Signing Up
•   Objectives
•   The Annotation Submission Site
•   Chromosome Coordinate Systems
•   Resources at BovineGenome.Org
•   Bovine Data for Apollo
     Bovine Annotation Themes and Leaders
•   Muscle - Jim Reecy
•   Immune function - Ross Tellam, Loren Skow
•   Lactation - Tad Sonstegard, Monique Rijnkels
•   Energy partitioning, metabolism, rumen function - Steve Moore
•   Reproduction, endocrinology, sex determination, development -
    Mike Roberts
•   Imprinted genes, HDACs, methyltransferases - Randy Jirtle
•   Bovine models of human diseases - Frank Nicholas
•   Non-coding RNA - Brian Dalrymple
•   Behavior, maternal nurturing - Clare Gill
•   Prion protein - John Williams
•   Adipose - Diane Spurlock-Moody
•   Blood/Skin - Fernando Garcia
•   All Other - Chris Elsik
      Rules Related to Cooperation
• When different participants wish to work on the same area,
  theme leaders will coordinate, and participants will cooperate
  for joint publication.
• Theme leaders will make sure all contributors are listed as
• When two people submit conflicting gene models, the annotator will resolve the conflict.
• Submitted gene models may be modified by the annotator. Those that are modified will be made
  available on the website, so that submitters may politely
  dispute the modification via email if they disagree.
• Submitters will follow gene model submission and
  publication deadlines.
          Rules Related to Data
• All annotations will be based on the current assembly
• The ONLY exceptions will be genes that cross
  scaffolds, which may be affected by upcoming assembly
  changes, or genes that are not found in the assembly.
• Gene models that are annotated using additional
  sequence data, not found in the assembly, will not be
  incorporated into the official gene set until there is a
  future assembly that does contain the gene.
• However, these genes may be submitted as tentative
  official gene models.
• Gene models must be submitted to the website (not
             How To Sign Up (2 steps)

• 1. Sign up on Baylor’s Bovine Genome Listserv

• 2. Register at the annotation submission site at
  BovineGenome.Org (not live yet)
        Manual Gene Annotation: Objectives
• Check consensus gene models for correct gene structure
• Identify consensus genes that should be split or merged
• Correct 5’ and 3’ ends (missing start and termination codons)
• Annotate splice variants
• Identify diverged gene family members using a special PSI-BLAST
  database that combines the NCBI NR protein
  database with bovine predicted and ab initio proteins
• Identify novel genes - not included in consensus gene set
• Annotate genes that cross scaffolds
• Functional annotation (homolog description and ids, phylogenetic
  analysis to identify orthologs)
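Part of the "correct 5' and 3' ends" check can be automated. The Python sketch below assumes you have the coding sequence of a gene model as a plain nucleotide string and uses the standard genetic code; it reports a missing ATG start, a missing terminal stop (TAA, TAG, TGA), or an internal in-frame stop. The helper name and messages are illustrative only, and because rare genes use non-ATG starts, a flag is a prompt to look at the evidence rather than proof of an error.

```python
# Sketch: flag missing start/termination codons in a CDS string.
# The helper name and messages are illustrative, not part of any official tool.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds_ends(cds: str) -> list[str]:
    """Return a list of problems found at the 5' and 3' ends of a CDS."""
    seq = cds.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length not a multiple of 3")
    # Split into complete codons only (any trailing partial codon is ignored).
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if not codons or codons[0] != "ATG":
        problems.append("missing start codon (5' end)")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("missing termination codon (3' end)")
    # A stop before the last codon usually means a wrong frame or exon boundary.
    if any(c in STOP_CODONS for c in codons[:-1]):
        problems.append("internal in-frame stop codon")
    return problems


print(check_cds_ends("ATGAAACCCTAA"))  # []
```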
     Objectives (more specifically)
• There will be one Bovine Official Gene Set (OGS) and
  6 preliminary gene sets (NCBI, Ensembl, Fgenesh,
  Fgenesh++, Geneid, SGP)
• The first step is to check whether your gene is in the OGS,
  then check if the OGS gene model is correct.
• Gene models from the preliminary gene sets may be
  used to replace the OGS gene model.
• Or any of these gene models may be modified.
• Indicate in “Data Source” which dataset the gene
  model you started with came from.
• If your gene is not found in any gene set, but is found
  in the assembly using BLAST, you may create the gene
  model from scratch.
      The Annotation Submission Site
• A preliminary gene model id will automatically be assigned to your
• Clicking on a field name will allow you to see a description of that
• There are a few required fields - data source, chromosome or ChrUn
  scaffold id, assembly (if not 3.1)
• The minimum sequence information required if you are changing the
  sequence of a gene model is the protein sequence.
• Tabs will be stripped out of all fields, so don’t depend on them.
• You may view and edit your submissions.
• Other users cannot edit your submissions
• User information is searchable so leaders may contact their group
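The required-field rules above can be summarized as a small pre-submission check. This is a hypothetical sketch: the field names (data_source, chromosome_or_scaffold, assembly, protein_sequence) are invented for illustration and are not the submission site's real form names. It strips tabs as the site does, fills in assembly 3.1 when none is given, and requires a protein sequence only when the gene model sequence was changed.

```python
DEFAULT_ASSEMBLY = "3.1"  # assembly assumed when none is given


def prepare_submission(fields: dict[str, str],
                       sequence_changed: bool) -> tuple[dict[str, str], list[str]]:
    """Strip tabs and report missing required fields (hypothetical field names)."""
    # The site strips tabs from every field, so do the same before submitting.
    cleaned = {name: value.replace("\t", "") for name, value in fields.items()}
    missing = [name for name in ("data_source", "chromosome_or_scaffold")
               if not cleaned.get(name)]
    # Assembly is only required when it is not 3.1; otherwise fill in the default.
    if not cleaned.get("assembly"):
        cleaned["assembly"] = DEFAULT_ASSEMBLY
    # A protein sequence is the minimum required when the model's sequence changes.
    if sequence_changed and not cleaned.get("protein_sequence"):
        missing.append("protein_sequence")
    return cleaned, missing


record, missing = prepare_submission(
    {"data_source": "Fgenesh++", "chromosome_or_scaffold": "Chr5\t",
     "protein_sequence": "MKV\tL"},
    sequence_changed=True,
)
print(record["chromosome_or_scaffold"], record["assembly"], missing)  # Chr5 3.1 []
```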
Top half of submission page
Bottom half of submission page
Top half of edit page
             Coordinate Systems
• There are currently two coordinate systems.
• Terminology for these systems may differ among NCBI,
  Baylor, and Ensembl.
• At
• Scaffolds = ChrUn (unassigned) and assigned sequence
  segments equivalent in the assembly process to the
  ChrUn sequences. These are the segments that may be
  ordered and assigned to chromosomes.
• Chromosomes = Chr1-29 and ChrX. These are
  concatenated scaffolds that represent an entire chromosome.
• ChrUn sequences are always on the scaffold coordinate system
  (because they have not been assigned to chromosomes)
   Coordinate Systems (Continued)
• The BLAST site has databases
  for both coordinate systems.
• The Genome Browser is currently using the
  chromosome coordinate system for all chromosomes,
  except ChrUn (always the scaffold system)
• We will provide another browser with the scaffold
  coordinate system for all scaffolds (not just ChrUn)
• Data for Apollo will always be on the scaffold
  coordinate system.
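Because chromosomes are concatenated scaffolds, a position on a placed scaffold maps to a chromosome position by an offset, counted backwards if the scaffold was placed in reverse orientation. The sketch below assumes a simplified AGP-style placement record (chromosome, 1-based inclusive start and end, orientation) for each scaffold; the Placement type and the example values are hypothetical, and real placements should come from the assembly's own files. ChrUn scaffolds need no conversion because they stay on the scaffold system.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """Hypothetical, simplified AGP-style placement of one scaffold."""
    chromosome: str
    chrom_start: int  # 1-based, inclusive
    chrom_end: int    # 1-based, inclusive
    orientation: str  # "+" or "-"


def scaffold_to_chromosome(pos: int, placement: Placement) -> tuple[str, int]:
    """Convert a 1-based scaffold position to (chromosome, 1-based position)."""
    length = placement.chrom_end - placement.chrom_start + 1
    if not 1 <= pos <= length:
        raise ValueError(f"position {pos} is outside the scaffold (1..{length})")
    if placement.orientation == "+":
        # Same direction: shift by the scaffold's start on the chromosome.
        return placement.chromosome, placement.chrom_start + pos - 1
    # Reverse orientation: count back from the scaffold's end on the chromosome.
    return placement.chromosome, placement.chrom_end - pos + 1


example = Placement("Chr5", 1001, 2000, "-")  # invented values
print(scaffold_to_chromosome(1, example))  # ('Chr5', 2000)
```

The sketch converts positions only; for a reverse-oriented scaffold the strand of each feature also flips, which a full conversion would need to handle.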
 Resources that will be available
     at BovineGenome.Org
• Two genome browsers - chromosome and scaffold
  coordinate systems
• BLAST site with assembly sequences
  (chromosome and scaffold sequences), all gene
  prediction sets, and special requests
• Connection of BLAST results to genome browser
• PSI-BLAST site to identify diverged gene family members
• FTP site to download Apollo-formatted data
• The following slides have examples from BeeBase
BeeBase - Link from BLAST hit to Gbrowse
BeeBase - Gbrowse with predicted genes and BLAST hits
BeeBase - Gbrowse tracks on assembly 2
               Gbrowse Things
• EST direction:
   – The track label will indicate whether arrows designate
     coding direction or strand of alignment
• Spliced Features - for spliced EST tracks, arrows
  always indicate coding direction. Splice sites are
• BLAST or BLAT alignment tracks - gaps between
  segments roughly indicate introns, but splices are
  not modeled (splice sites are likely incorrect). This
  also applies to your BLAST searches viewed in
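To make the point that gaps between alignment segments only roughly indicate introns concrete, the sketch below turns the HSP coordinates from one BLAST or BLAT alignment into rough intron intervals. It assumes all HSPs belong to one query on the same scaffold and strand and are already in the correct order along the query; the 30 bp minimum gap is an arbitrary illustrative threshold. The output marks where to look, not where the splice sites are.

```python
def approximate_introns(hsps: list[tuple[int, int]],
                        min_intron: int = 30) -> list[tuple[int, int]]:
    """Return rough intron intervals (1-based, inclusive) from gaps between HSPs.

    `hsps` holds (start, end) genomic coordinates of HSPs for one query on one
    scaffold and strand. Splice sites are NOT modeled, so boundaries are rough.
    """
    # Minus-strand hits are often reported with start > end; normalize them.
    spans = sorted((min(s, e), max(s, e)) for s, e in hsps)
    introns = []
    if not spans:
        return introns
    covered_end = spans[0][1]
    for start, end in spans[1:]:
        gap = start - covered_end - 1
        # Skip overlaps and gaps too short to plausibly be introns.
        if gap >= min_intron:
            introns.append((covered_end + 1, start - 1))
        covered_end = max(covered_end, end)
    return introns


print(approximate_introns([(100, 200), (500, 650), (900, 1000)]))
# [(201, 499), (651, 899)]
```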
                   Splice Sites
• Apollo helps you find splice sites and start/stop codons
• Gbrowse does not help you find splice sites unless
  the track specifically models splice sites (predicted
  CDS and transcript tracks; exonerate and splign
• BLAST hits do not always reveal correct splice sites
   – The ends of BLAST HSPs can be fuzzy - they may
     extend slightly past the real homology, or stop short
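Since HSP ends can overshoot or stop short, one way to refine a rough intron is to scan a small window around each boundary for the canonical GT donor and AG acceptor dinucleotides. This sketch is a deliberate simplification under stated assumptions: it checks only GT-AG introns on the plus strand (non-canonical GC-AG and AT-AC introns are missed), ignores reading frame and splice-site scoring, and the 10 bp window is arbitrary. Apollo's own splice-site display should remain the primary guide.

```python
def candidate_introns(genome: str, approx_start: int, approx_end: int,
                      window: int = 10) -> list[tuple[int, int]]:
    """List GT...AG intron candidates near an approximate intron.

    Coordinates are 1-based, inclusive, on the plus strand of `genome`.
    Candidates closest to the approximation come first.
    """
    seq = genome.upper()
    scored = []
    for start in range(max(1, approx_start - window), approx_start + window + 1):
        # Canonical donor: the intron starts with GT.
        if seq[start - 1:start + 1] != "GT":
            continue
        for end in range(approx_end - window, approx_end + window + 1):
            # Canonical acceptor: the intron ends with AG (not overlapping the GT).
            if end < start + 3 or end > len(seq) or seq[end - 2:end] != "AG":
                continue
            shift = abs(start - approx_start) + abs(end - approx_end)
            scored.append((shift, start, end))
    scored.sort()
    return [(s, e) for _, s, e in scored]


toy_genome = "AAAAA" + "GTAAAAAAAG" + "CCCCC"  # invented sequence
print(candidate_introns(toy_genome, 7, 14, window=3))  # [(6, 15)]
```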
      Bovine Data for Apollo
• Bovine data for Apollo will be provided on the FTP site:
• Configuration file
• GAME-XML with OGS and preliminary
  gene sets
• BLAST searches of NCBI NR protein
  database and ESTs
         A Strategy for Apollo
• 1. Search bovine scaffolds with a protein homolog
  of your gene of interest using TBLASTN at the BLAST site.
• 2. Search bovine predicted proteins with your
  protein homolog with BLASTP to identify the
  homologous bovine OGS gene model (or gene
  model from an input gene set).
• 3. Download the Apollo XML and BLAST output
  file for the correct scaffold from our ftp site.
• 4. Load the data into Apollo and locate the gene
  model from step 2.
• 5. View all sources of evidence to evaluate and
  modify the gene model.
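Steps 1 and 2 both come down to picking the best hits from a BLAST report. If you run the same searches with NCBI BLAST+ using tabular output (-outfmt 6, whose default twelve columns are listed in FIELDS below), the sketch ranks subjects (scaffolds for TBLASTN, predicted proteins for BLASTP) by their best HSP bitscore. Whether the BovineGenome.Org BLAST site offers tabular output is not stated in these slides, so treat that as an assumption; the IDs in the example report are invented.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_subjects(tabular: str, top: int = 3) -> list[tuple[str, float]]:
    """Rank subject IDs by their single highest HSP bitscore."""
    best: dict[str, float] = {}
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        # Skip blank lines and comment lines (as in -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(FIELDS, row))
        score = float(hit["bitscore"])
        subject = hit["sseqid"]
        best[subject] = max(score, best.get(subject, score))
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top]


report = "\n".join([
    "q1\tScaffold_A\t85.0\t120\t18\t0\t1\t120\t5000\t5359\t1e-50\t200.5",
    "q1\tScaffold_A\t70.0\t60\t18\t0\t130\t190\t8000\t8179\t1e-10\t60.1",
    "q1\tScaffold_B\t60.0\t100\t40\t1\t1\t100\t300\t599\t1e-20\t90.0",
])
print(best_subjects(report))  # [('Scaffold_A', 200.5), ('Scaffold_B', 90.0)]
```

Ranking by the single best HSP is the simplest choice. For TBLASTN against scaffolds, a multi-exon gene is split into several HSPs, so summing bitscores per scaffold may rank the correct scaffold more reliably.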