Figure 9 Overview of features of draft human genome.
Please note that this figure is too large to display in image form. Instead it has been split into four
PDFs. PDF 1 (3265K) shows chromosomes 1-3 and 20-22, PDF 2 (3049K) shows
chromosomes 4-6 and 17-19, PDF 3 (2287K) shows chromosomes 7-9 and 14-16 and PDF
4 (2737K) shows chromosomes 10-11, X, Y, and 12-13.
The Figure shows the occurrences of twelve important types of feature across the human
genome. Large grey blocks represent centromeres and centromeric heterochromatin (size not
precisely to scale). Each of the feature types is depicted in a track, from top to bottom as follows.
(1) Chromosome position in Mb.
(2) The approximate positions of Giemsa-stained chromosome bands at the 800 band resolution.
(3) Level of coverage in the draft genome sequence. Red, areas covered by finished clones;
yellow, areas covered by predraft sequence. Regions covered by draft sequenced clones are in
orange, with darker shades reflecting increasing shotgun sequence coverage.
(4) GC content. Percentage of bases in a 20,000-base window that are C or G.
(5) Repeat density. Red line, density of SINE class repeats in a 100,000-base window; blue line,
density of LINE class repeats in a 100,000-base window.
(6) Density of SNPs in a 50,000-base window. The SNPs were detected by sequencing and
alignments of random genomic reads. Some of the heterogeneity in SNP density reflects the
methods used for SNP discovery. Rigorous analysis of SNP density requires comparing the
number of SNPs identified to the precise number of bases surveyed.
(7) Non-coding RNA genes. Brown, functional RNA genes such as tRNAs, snoRNAs and
rRNAs; light orange, RNA pseudogenes.
(8) CpG islands. Green ticks represent regions of 200 bases with CpG levels significantly higher
than in the genome as a whole, and GC ratios of at least 50%.
(9) Exofish ecores. Regions of homology with the pufferfish T. nigroviridis (ref. 292) are blue.
(10) ESTs with at least one intron when aligned against genomic DNA are shown as black ticks.
(11) The starts of genes predicted by Genie or Ensembl are shown as red ticks. The starts of
known genes from the RefSeq database (ref. 110) are shown in blue.
(12) The names of genes that have been uniquely located in the draft genome sequence,
characterized and named by the HGM Nomenclature Committee. Known disease genes from the
OMIM database are red, other genes blue.
This Figure is based on an earlier version of the draft genome sequence than that analysed in
the text, owing to production constraints. We are aware of various errors in the Figure,
including omissions of some known genes and misplacements of others. Some genes are mapped
to more than one location, owing to errors in assembly, close paralogues or pseudogenes.
Manual review was performed to select the most likely location in these cases and to correct
other regions. For updated information, see http://genome.ucsc.edu/
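
As an illustration of how two of the tracks above can be computed, the following is a minimal sketch, not the pipeline used to produce the Figure. It computes the GC percentage per window as in track (4) and a SNP density normalized by the number of bases actually surveyed, as the note under track (6) recommends. The function names are hypothetical, and two choices are assumptions not stated in the caption: windows are taken as consecutive and non-overlapping, and ambiguous bases (such as N) are excluded from the denominator.

```python
from typing import List


def gc_percent_windows(seq: str, window: int = 20_000) -> List[float]:
    """Return the percentage of G or C bases in consecutive windows.

    Assumptions (not from the caption): windows do not overlap, the last
    window may be shorter, and only A, C, G, T count toward the denominator.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    percentages = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        called = sum(chunk.count(base) for base in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        # A window with no called bases has no defined GC content.
        percentages.append(100.0 * gc / called if called else float("nan"))
    return percentages


def snp_density(snp_count: int, bases_surveyed: int) -> float:
    """Return SNPs per surveyed base, rather than per window length."""
    if bases_surveyed <= 0:
        raise ValueError("bases_surveyed must be positive")
    return snp_count / bases_surveyed
```

Dividing by the bases surveyed rather than by the nominal window size keeps regions with sparse read coverage from appearing artificially SNP-poor.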