The Human Genome Project at UC Santa Cruz
November 9, 2004
The Human Genome Project Began in 1990
• The Mission of the HGP: The quest to
understand the human genome and the
role it plays in both health and disease.
“The true payoff from the
HGP will be the ability to
better diagnose, treat, and
prevent disease.”
- Francis Collins, Director of the HGP
and the National Human Genome
Research Institute (NHGRI)
The genome is our Genetic Blueprint
• Nearly every human
cell contains 23 pairs
of chromosomes
– chromosomes 1-22, plus XY or XX
• XY = Male
• XX = Female
• Length of chr 1-22, X,
Y together is ~3.2
billion bases (about 2
The Genome is
Who We Are on the inside!
• Chromosomes consist of DNA
– molecular strings of A,
C, G, & T
– base pairs, A-T, C-G
• Information coded
– DNA sequences that
– less than 3% of
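The base-pairing rule above (A with T, C with G) is simple enough to write down as a few lines of code. This is a minimal Python sketch, assuming a DNA strand is written as a plain string of the letters A, C, G, and T; the function names are hypothetical, chosen only for illustration.

```python
# Base-pairing rule: A pairs with T, and C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(strand: str) -> str:
    """Return the partner base for each base of `strand`, in the same order."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in the opposite direction,
    which is how the second strand of DNA is usually written."""
    return complement_strand(strand)[::-1]


if __name__ == "__main__":
    strand = "TATCCTCTCTGA"
    print(complement_strand(strand))   # ATAGGAGAGACT
    print(reverse_complement(strand))  # TCAGAGAGGATA
```

Because the pairing is one-to-one, taking the reverse complement twice gives back the original strand.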
5000 bases per page
TATCCTCTCTGATTTCTTTGTGCAGTGTTTTGTAATTCTCAT TGTAGAGATTTTTCACCTCCCTGGTTAGTTGTATTTTACCCTAGATATTT TATTCTTTTTGTGAAAATTGTGAATGGGAT
TGCCTTCCTGATTTGACTGC CAGCTTGGTTACTGTTGGTTTATAGAAATGCTAGTGATTTTTGTACATTG ATTTTCTTTCTAAAACTTTGCTGAAGTTTTTTTTATTAGCAGAAGGAGCT
GCTGGGATTTCCTATGTTGAATAGGAGT CATGAGAGAGGGCATCAAATCTACACATATCAAATACTAACCTTGAATGTCTAGATATTT TATTCTTTTTGTGAAAATTGTGAATGGGAT
How much data make up the genome?
• 3 pallets with 40 boxes per pallet x 5000
pages per box x 5000 bases per page =
3 billion bases
• To get accurate
• Now: Shred 18 pallets
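The pallet arithmetic on this slide can be checked directly. This small Python sketch uses only the numbers given on the slide (40 boxes per pallet, 5000 pages per box, 5000 bases per page); the function name is hypothetical. It also shows that 18 pallets hold six times as much printed sequence as 3 pallets.

```python
# Numbers taken from the slide.
BOXES_PER_PALLET = 40
PAGES_PER_BOX = 5000
BASES_PER_PAGE = 5000


def bases_on_pallets(pallets: int) -> int:
    """Number of printed DNA letters (bases) on `pallets` pallets."""
    return pallets * BOXES_PER_PALLET * PAGES_PER_BOX * BASES_PER_PAGE


if __name__ == "__main__":
    print(f"{bases_on_pallets(3):,} bases on 3 pallets")    # 3,000,000,000 bases on 3 pallets
    print(f"{bases_on_pallets(18):,} bases on 18 pallets")  # 18,000,000,000 bases on 18 pallets
    print(bases_on_pallets(18) // bases_on_pallets(3))      # 6
```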
The Beginning of the Project
• Most of the first 10 years of the project were
spent improving the technology to
sequence and analyze DNA.
• Scientists all around the world worked to
make detailed maps of our chromosomes
and sequence model organisms, such as the
worm, fruit fly, and mouse.
UC Santa Cruz Gets Involved
Because of the work Professor David
Haussler was doing in the field of
computational biology, UC Santa Cruz
was invited to participate in the HGP in
late 1999.
Computational biology (or
bioinformatics) is a research
field that uses computers to
help solve biological problems.
The Tech Awards honored the UCSC Genome Bioinformatics Group in 2003!
The Challenges
• First there was the Assembly:
The DNA sequence is so long that
no technology can read it all at
once, so it was broken into small pieces.
There were millions of clones
(small sequence fragments).
The assembly process included
finding where the pieces
overlapped in order to put the
3,200,000-piece puzzle together.
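The idea of finding where pieces overlap can be illustrated with a toy "greedy" assembler that repeatedly glues together the two fragments sharing the longest overlap. This is a minimal Python sketch, not the software actually used for the human genome, which had to cope with sequencing errors, repeated sequence, and millions of pieces. It assumes error-free fragments written as strings, no fragment wholly contained inside another, and a minimum overlap of 3 letters; the function names and example fragments are hypothetical.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest end of `a` that matches the start of `b`.
    Returns 0 if no overlap of at least `min_len` letters exists."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Merge the pair of pieces with the largest overlap until no pair overlaps.
    Returns the assembled pieces; a single string means the puzzle is complete.
    Assumes no fragment is wholly contained inside another one."""
    pieces = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(pieces) > 1:
        best_len, best_pair = 0, None
        for a, b in permutations(pieces, 2):  # try every ordered pair
            length = overlap(a, b, min_len)
            if length > best_len:
                best_len, best_pair = length, (a, b)
        if best_pair is None:
            break  # nothing overlaps any more: the puzzle has gaps
        a, b = best_pair
        pieces.remove(a)
        pieces.remove(b)
        pieces.append(a + b[best_len:])  # glue b onto a without repeating the overlap
    return pieces


if __name__ == "__main__":
    # Hypothetical fragments of the made-up sequence ATGCGTACGTTAGC.
    reads = ["CGTTAGC", "ATGCGTAC", "GTACGTTA"]
    print(greedy_assemble(reads))  # ['ATGCGTACGTTAGC']
```

The input order of the fragments does not matter here, because every ordered pair is compared before each merge.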
The “Working Draft” of the Human Genome
[Diagram: a freeze of sequence data (generated by NCBI) and clone layouts (generated by Washington University) are combined with maps to produce the working draft assembly]
UCSC put the human
genome sequence on the
web on July 7, 2000.
[Photo caption: Cyber geeks]
UCSC put the
sequence on CD in
October 2000, with
The Completion of the Human Genome
• June 2000 White House
announcement that the majority
of the human genome (80%)
had been sequenced (working
draft)
• Working draft made available
on the web July 2000 at
• Publication of 90 percent of the
sequence in the February 2001
issue of the journal Nature.
• Completion of 99.99% of the
genome as finished sequence on
The Project is not Done…
• Next there is the Annotation:
The sequence is like a topographical map,
the annotation would include cities, towns,
schools, libraries and coffee shops!
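In computing terms, an annotation can be pictured as a list of labeled stretches of the sequence, each with a start and an end position, much like labels on a map. The Python sketch below is a minimal illustration under that assumption. The feature names and positions are hypothetical, and positions follow a 0-based, end-exclusive convention (the same one used by UCSC's BED annotation format).

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """One labeled stretch of sequence, like a town marked on a map."""
    name: str
    start: int  # first position covered (0-based)
    end: int    # first position NOT covered (end-exclusive)


def features_at(features: list[Feature], position: int) -> list[str]:
    """Names of all features that cover `position`."""
    return [f.name for f in features if f.start <= position < f.end]


if __name__ == "__main__":
    # Hypothetical annotation of a short stretch of one chromosome.
    annotation = [
        Feature("geneA", 100, 500),
        Feature("geneA exon 1", 100, 200),
        Feature("geneB", 800, 1200),
    ]
    print(features_at(annotation, 150))  # ['geneA', 'geneA exon 1']
    print(features_at(annotation, 650))  # []
```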
So, where are the genes?
How do genes work?
And, how do scientists use
this information for scientific
understanding and to
What do genes do anyway?
• We only have ~27,000 genes, so that means that
each gene has to do a lot.
• Genes make proteins that make up nearly all we
are (muscles, hair, eyes).
• Almost everything that happens in our bodies
happens because of proteins (walking, digestion,
Eye Color and Hair Color
are determined by genes
Of Mice and Men:
It’s all in the genes
Humans and Mice have about the same
number of genes. But we are so different
from each other, how is this possible?
Did you say
One human gene can make many different
proteins while a mouse gene can only
make a few!
Genes are important
• By selecting different pieces of a gene, your
body can make many kinds of proteins. (This
process is called alternative splicing.)
• If a gene is “expressed” that means it is turned
on and it will make proteins.
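Alternative splicing can be modeled, very roughly, as choosing which pieces (exons) of a gene to keep. The Python sketch below assumes the simplest case, exon skipping: the first and last exons are always kept, and each middle exon is either kept or skipped while staying in its original order. Real cells do not make every possible combination, and the exon sequences and function name here are hypothetical.

```python
from itertools import combinations


def splice_variants(exons: list[str]) -> list[str]:
    """All transcripts that keep the first and last exon plus any in-order
    subset of the middle exons (a toy model of exon skipping).
    Assumes the gene has at least two exons."""
    first, *middle, last = exons
    variants = []
    for k in range(len(middle) + 1):           # keep k of the middle exons
        for kept in combinations(middle, k):   # combinations keep the original order
            variants.append(first + "".join(kept) + last)
    return variants


if __name__ == "__main__":
    gene = ["ATG", "GCC", "TTA", "TAA"]  # hypothetical exons
    for transcript in splice_variants(gene):
        print(transcript)
    # With 2 middle exons there are 2 ** 2 = 4 possible transcripts.
```

In this toy model a gene with n middle exons can give up to 2 ** n different transcripts, which is one way a single gene can lead to many proteins.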
What we’ve learned from our
genome so far…
• There are a relatively small number of human
genes, less than 30,000, but they have a complex
architecture that we are only beginning to
understand and appreciate.
– We know where 85% of genes are in the
genome.
– We don’t know where the other 15% are
because we haven’t seen them “on” (they may only
be expressed during fetal development).
– We only know what about 20% of our genes
do so far.
• So it is relatively easy to locate genes in the
genome, but it is hard to figure out what they do.
How do scientists find genes?
• The genome is so large that useful
information is hard to find.
• Researchers at UCSC decided to make a
computational microscope to help
scientists search the genome.
• Just as you would use Google to find
something on the internet, researchers
can use the “UCSC Genome Browser”
to find information in the human genome.
Explore it at http://genome.ucsc.edu
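The Genome Browser does far more than text search, but the most basic question, "where does this short stretch of DNA occur?", can be sketched in a few lines of Python. This is a toy substring search, not how the browser or UCSC's BLAT search tool works (real tools use much faster indexed methods). Positions are 0-based, the function name is hypothetical, and the example sequence is the first 25 letters of the sample page of sequence shown earlier.

```python
def find_all(sequence: str, query: str) -> list[int]:
    """0-based start positions of every match of `query` in `sequence`,
    including matches that overlap each other."""
    positions = []
    start = sequence.find(query)
    while start != -1:
        positions.append(start)
        start = sequence.find(query, start + 1)  # step one letter, so overlaps are found
    return positions


if __name__ == "__main__":
    page = "TATCCTCTCTGATTTCTTTGTGCAG"
    print(find_all(page, "TCT"))  # [5, 7, 14]
```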
The UCSC Genome Browser
The browser takes you from
early maps of the genome . . .
. . . to a multi-resolution view . . .
. . . at the gene cluster level . . .
. . . the single gene level . . .
. . . the single exon level . . .
. . . and at the single base level
The Continuing Project
• Finding the complete set of genes and annotating
the entire sequence. Annotation is like detailing;
scientists annotate sequence by listing what has
been learned experimentally and computationally
about its function.
• Proteomics is the study of the structure and function
of groups of proteins. Proteins are really important,
but we don’t really understand how they work.
• Comparative Genomics is the process of
comparing different genomes in order to better
understand what they do and how they work, for example by
comparing humans, chimpanzees, and mice, which
are all mammals but are very different.
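One of the simplest measurements in comparative genomics is how many positions match between two stretches of DNA that have already been lined up (aligned). The Python sketch below assumes the alignment has already been done, so both sequences have the same length; real comparisons between species must first account for insertions and deletions. The sequences and function name are hypothetical, not real human or mouse DNA.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two aligned, equal-length sequences match."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("expected two non-empty sequences of the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    species_1 = "ATGCGTACGT"  # hypothetical aligned stretch
    species_2 = "ATGCGTTCGT"  # differs at one position
    print(percent_identity(species_1, species_2))  # 90.0
```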
Who works on this stuff anyway?
• Biologists and Chemists understand the
physical sciences. They take biology and
chemistry classes.
• Computer Scientists program the computers
(the same people who make video games!). They
take math and computer classes.
• Computer Engineers try to build better, faster,
smarter computers. They take math, physics and
• Social Scientists try to understand how this new
information and technology will impact our lives.
They take sociology and philosophy classes.
UCSC Summer Workshop on
Human Genome Research
• Held annually in July
• It’s a free event for
students and teachers
• Workshops by faculty and
researchers on a wide
array of topics
• Tours of our laboratories
• Free breakfast and lunch
• Travel funds are available
• RSVP: 831-459-1702 or
How can I work on this project, or
something like it?
• Read about it online at www.genome.gov,
or in Nature, Science, or other scientific
journals.
• Take biology, chemistry, math, physics,
and English classes in high school.
• OR take classes at your local community
college or University-Extension in biology,
bioinformatics, or genetics.
• Go to college and get a degree in science,
engineering, math, or social sciences.
• Majors: Biology, Mathematics, Ocean Sciences, Physics,
Education, Sociology, Philosophy, Community Studies
• Degrees: BS (BA), MS (MA)
• Careers: Research Staff (National Laboratory,
Company/University, Research Foundation); Entry-Level
(National Laboratory); Teaching (Community College,
Public Schools, Private Schools)
A research degree in any of these majors will take you far!
Thank you for letting us come
talk to you today and share
what we do!
Bye! UCSC, Slugs