Why Do Only Transcriptomes?
Sequencing transcripts (i.e. expressed genes) is inherently cheaper than
sequencing genomes, because it obviates the need to sequence the intronic
and intergenic regions, which can be orders of magnitude larger. Obviously
one can never get all the genes just by sequencing transcripts, and it is not
our intention to argue for one or the other, since (a) the pros and cons were
laid out two decades ago for the Human Genome Project, and (b) BGI-Shenzhen
is also sequencing de novo genomes, like the giant panda, with the new
technology. The point is that once you get away from the few dozen obviously
important plant species, almost none of the roughly half a million plant
species known to humanity has been touched by genomics at any level.

However, we are emphatically not generating ESTs (i.e. expressed sequence
tags), which commit only a single read to each transcript. We are making
shotgun libraries and trying to reconstruct full-length transcripts by
computationally assembling the fragments. Based on preliminary
experiments with 1 Gb of raw sequence per species, we expect the first few
thousand of the most abundantly expressed genes to be full-length. For the
next ten thousand genes, we still recovered more of the coding region than
traditional ESTs that prime off the poly-A tail and capture mostly the 3'-UTR.
That said, thanks to continuing technology improvements, we expect that
most of the project will generate 2 Gb of raw sequence per species.

