Transmeta Crusoe


Transmeta Crusoe
Microprocessor
Dr. Doug L. Hoffman
Computer Science 330
Spring 2002
Transmeta Crusoe
"Today [in RISC] we have large design teams and long design
cycles. The performance story is also much less clear now. The
die sizes are no longer small. It just doesn't seem to make as
much sense. Superscalar and out-of-order execution are the
biggest problem areas that have impeded performance [leaps].
The MIPS R10,000 and HP PA-8000 seem much more
complex to me than today's standard CISC architecture, which
is the Pentium II. So where is the advantage of RISC, if the
chips aren't as simple anymore?"
David Ditzel, Transmeta CEO
Transmeta’s 80x86
Architecture?
Crusoe microprocessors can run the same
software that runs on IBM PC-compatible
personal computers.
Smaller, simpler logic. Only about half the logic
transistors of an x86 processor.
Consumes between one-third and
one-thirtieth the power of a comparable x86 processor.
Implements none of the x86 instructions in
hardware.
x86 vs. Crusoe
The blue stuff is silicon, and the yellow is software. Crusoe's blue part is smaller
because its branch-prediction and out-of-order execution (OOO) hardware has moved
off the die and into software. All of those functions are now done in real time by a
special program while the application code is executing.
Transmeta’s Crusoe
The highest-performance Crusoe chip, the TM5400
Crusoe Features
Dynamic binary translation gives programs the
impression that they are running on an x86
machine.
VLIW processor executes up to 4 instructions in
parallel.
LongRun power control adjusts CPU power to the
tasks being performed.
Transmeta-ese
Individual instructions are called atoms.
VLIW instruction groups are called
molecules.
Commit and rollback allows instructions to
be undone.
Code Morphing®
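Commit and rollback can be pictured with a small model. The sketch below is a hypothetical illustration, not Transmeta's implementation: it assumes a set of working registers that translated code updates speculatively, a shadow copy that holds the last committed x86 state, and a buffer that holds memory stores until a commit. The class and method names are invented for this example.

```python
class SpeculativeState:
    """Hypothetical model of commit/rollback: working vs. shadow registers,
    plus a buffer that holds stores until they are committed."""

    def __init__(self, reg_names):
        self.working = {name: 0 for name in reg_names}  # updated by translated code
        self.shadow = dict(self.working)                # last committed x86 state
        self.memory = {}                                # committed memory contents
        self.pending_stores = []                        # stores not yet visible

    def write_reg(self, name, value):
        self.working[name] = value

    def store(self, addr, value):
        self.pending_stores.append((addr, value))

    def commit(self):
        # Make everything done since the last commit permanent.
        self.shadow = dict(self.working)
        for addr, value in self.pending_stores:
            self.memory[addr] = value
        self.pending_stores.clear()

    def rollback(self):
        # Undo everything done since the last commit.
        self.working = dict(self.shadow)
        self.pending_stores.clear()


state = SpeculativeState(["eax", "ebx"])
state.write_reg("eax", 5)
state.store(0x1000, 7)
state.commit()             # eax = 5 and [0x1000] = 7 are now permanent
state.write_reg("eax", 99)
state.store(0x1000, 8)
state.rollback()           # e.g. something went wrong inside the translation
print(state.working["eax"], state.memory[0x1000])  # 5 7
```

After the rollback, eax is back to 5 and memory still holds 7, as if the speculative work had never run.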
Transmeta-ese
VLIW vs. Superscalar
A "traditional" VLIW machine does reordering and parallelism hunting in software.
For a straight-ahead VLIW design like Intel's IA-64, the piece of software that does
all this is the compiler. The compiler extracts the parallelism from the code, finds
the dependencies between instructions, and produces optimized code that the VLIW
core can run, in order, as fast as possible.
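The compiler's job of finding dependencies and packing independent instructions together can be sketched as greedy list scheduling. This is a minimal illustration, not the algorithm used by IA-64 compilers or by Code Morphing: it assumes single-cycle latencies, treats every hazard (including write-after-read) conservatively as needing a later bundle, tracks register dependencies only (memory aliasing is ignored), and uses a hypothetical instruction format in which a load reads a named memory variable.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dest: Optional[str]     # register written, if any
    srcs: Tuple[str, ...]   # registers (or memory names) read


def depends(later: Instr, earlier: Instr) -> bool:
    """True if `later` must issue after `earlier` (RAW, WAR or WAW hazard)."""
    raw = earlier.dest is not None and earlier.dest in later.srcs
    war = later.dest is not None and later.dest in earlier.srcs
    waw = later.dest is not None and later.dest == earlier.dest
    return raw or war or waw


def schedule(instrs: List[Instr], width: int = 4) -> List[List[Instr]]:
    """Place each instruction in the earliest bundle that respects its
    dependencies and still has a free slot (at most `width` per bundle)."""
    bundles: List[List[Instr]] = []
    cycle_of: List[int] = []
    for i, ins in enumerate(instrs):
        earliest = 0
        for j in range(i):
            if depends(ins, instrs[j]):
                earliest = max(earliest, cycle_of[j] + 1)
        c = earliest
        while c < len(bundles) and len(bundles[c]) >= width:
            c += 1
        if c == len(bundles):
            bundles.append([])
        bundles[c].append(ins)
        cycle_of.append(c)
    return bundles


program = [
    Instr("r1 = load a", "r1", ("a",)),
    Instr("r2 = load b", "r2", ("b",)),
    Instr("r3 = r1 + r2", "r3", ("r1", "r2")),
    Instr("r4 = r5 * 2", "r4", ("r5",)),
    Instr("r6 = r3 - r4", "r6", ("r3", "r4")),
]
for n, bundle in enumerate(schedule(program)):
    print(n, [ins.text for ins in bundle])
```

The independent multiply "r4 = r5 * 2" is hoisted into the first bundle ahead of the add, which is the kind of reordering a superscalar chip would otherwise discover in hardware at run time.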
Code Morphing
The x86 architecture is an ill-defined amoeba containing such features as
segmentation, ASCII arithmetic, and variable-length instructions; the square
inside the blob is the VLIW processor and its functions.
Code Morphing
Since Crusoe is a VLIW machine that's made to run
code compiled for a superscalar machine, its
compilation and scheduling scheme is a hybrid
of both approaches. Crusoe's Code Morphing software
takes a compiled x86 program and recompiles
it on the fly into Crusoe's native VLIW instruction
format. This recompilation uses sophisticated
compiler algorithms to extract parallelism from the
code, look for dependencies and do all those things
that a state-of-the-art VLIW compiler does.
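To make the recompilation step concrete, the sketch below splits a few simplified two-operand x86 instructions into RISC-like atoms: a memory operand becomes an explicit load (and, for a memory destination, a store) around a three-operand operation. The atom syntax, the temporary register t0 and the function name are hypothetical, and a real translator also has to model x86 condition flags, complex addressing modes and many more opcodes.

```python
def translate(x86_line, temp="t0"):
    """Translate one simplified x86 instruction ("op dst, src") into a list
    of hypothetical atom strings. Flags and addressing modes are ignored."""
    op, operands = x86_line.split(maxsplit=1)
    dst, src = [s.strip() for s in operands.split(",")]

    if dst.startswith("["):                  # memory destination
        addr = dst.strip("[]")
        if op == "mov":
            return [f"st [{addr}], {src}"]
        return [f"ld {temp}, [{addr}]",      # load, operate, store back
                f"{op} {temp}, {temp}, {src}",
                f"st [{addr}], {temp}"]

    if src.startswith("["):                  # memory source
        addr = src.strip("[]")
        if op == "mov":
            return [f"ld {dst}, [{addr}]"]
        return [f"ld {temp}, [{addr}]", f"{op} {dst}, {dst}, {temp}"]

    if op == "mov":
        return [f"mov {dst}, {src}"]
    return [f"{op} {dst}, {dst}, {src}"]     # two-operand x86 -> three-operand atom


for line in ["add [ebx], eax", "add eax, [ecx]", "mov edx, eax"]:
    print(line, "->", translate(line))
```

A single x86 read-modify-write such as "add [ebx], eax" becomes three simple atoms, which the scheduler is then free to interleave with atoms from neighboring instructions.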
Code Morphing Details
Takes x86 instructions and recompiles them on the fly
into VLIW instructions (atoms).
As it recompiles them, it optimizes them, making them
run, in many cases, more efficiently than the original
x86 code.
Finally, a scheduler reorders the atoms and groups them
into molecules.
Once translated, the VLIW code is stored in a special
part of memory, accessible only by the Code Morphing
software, so that the same x86 code need not be
translated again the next time it runs.
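Reusing translations can be modeled as a cache keyed by the address of the original x86 code. In this sketch the class name is hypothetical, the translator is a stand-in function, and details such as the cache's size limit, eviction, and invalidating entries when the x86 code is overwritten are left out.

```python
class TranslationCache:
    """Hypothetical model of the translation cache: x86 address -> native code."""

    def __init__(self, translate):
        self.translate = translate   # callable doing the expensive recompilation
        self.cache = {}
        self.translations = 0        # how many times we actually had to translate

    def lookup(self, x86_addr):
        native = self.cache.get(x86_addr)
        if native is None:           # miss: translate once and keep the result
            native = self.translate(x86_addr)
            self.cache[x86_addr] = native
            self.translations += 1
        return native


tcache = TranslationCache(lambda addr: f"molecules for {addr:#x}")
for _ in range(3):
    tcache.lookup(0x401000)          # only the first call translates
tcache.lookup(0x401020)
print(tcache.translations)           # 2
```

Four lookups cost only two translations; every repeat visit to the same code skips the recompiler entirely.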
But that’s not all...
Code Morphing Details
Software continues to monitor how an
application is being used.
If it finds that a process is spending a lot of time
in one part of the code, it turns on more levels
of optimization to make that part of the
program run faster.
It only optimizes the parts of the code being
used. Things that are executed infrequently are
not optimized.
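The idea of turning on more optimization for hot code can be sketched as per-block execution counters with thresholds. The tier names and threshold values below are invented for illustration; the slides do not say what counts Code Morphing uses or how many optimization levels it has.

```python
from collections import Counter

# Hypothetical tiers: (minimum execution count, optimization level name).
TIERS = [(0, "interpret"), (50, "basic translation"), (1000, "optimized translation")]


class HotSpotMonitor:
    def __init__(self, tiers=TIERS):
        self.tiers = tiers
        self.counts = Counter()   # executions seen per code block
        self.upgrades = []        # (block, count, new tier) each time a block is promoted

    def tier_for(self, count):
        # Highest tier whose threshold this count has reached.
        return max(i for i, (threshold, _) in enumerate(self.tiers) if count >= threshold)

    def execute(self, block):
        before = self.tier_for(self.counts[block])
        self.counts[block] += 1
        after = self.tier_for(self.counts[block])
        if after != before:       # crossed a threshold: re-optimize this block
            self.upgrades.append((block, self.counts[block], self.tiers[after][1]))
        return self.tiers[after][1]


monitor = HotSpotMonitor()
for _ in range(1200):
    monitor.execute("loop_body")
for _ in range(3):
    monitor.execute("startup_code")
print(monitor.upgrades)
```

Only the loop that runs 1200 times is promoted (at 50 and 1000 executions); code that runs three times never earns the cost of heavier optimization.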
Code Morphing
One of the challenges of
creating the Code Morphing
software was to make the
Crusoe processor, in many
cases, bug-compatible with
the x86 so that it would
generate the so-called Blue
Screen of Death at many of
the same times an x86
processor would.
Processor Features
Five execution units: two arithmetic-logic units, a
load/store unit, a branch unit, and a floating-point unit.
Can execute up to four instructions (atoms) per cycle.
Sixty-four general-purpose and 32 floating-point
working registers shadowed by 48 general-
purpose and 16 floating-point registers.
64KB level-one (L1) instruction and data caches
and a 256KB level-two (L2) cache.
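Because a molecule's atoms issue together, every molecule has to fit the execution units listed above. The check below encodes those limits (two ALUs, one load/store unit, one branch unit, one floating-point unit, at most four atoms). The unit labels, the function names and the simple in-order packing policy are assumptions for illustration, and the atoms passed to pack() are assumed to be independent of one another.

```python
from collections import Counter

# Execution-unit counts from the slide; labels are hypothetical.
UNITS = {"alu": 2, "mem": 1, "branch": 1, "fpu": 1}
MAX_ATOMS = 4   # up to four atoms issue per cycle


def fits(molecule):
    """True if a list of atom unit types can issue together in one cycle."""
    if len(molecule) > MAX_ATOMS:
        return False
    need = Counter(molecule)
    return all(need[unit] <= UNITS.get(unit, 0) for unit in need)


def pack(atoms):
    """Greedily pack independent atoms, in order, into molecules."""
    molecules = []
    for atom in atoms:
        if molecules and fits(molecules[-1] + [atom]):
            molecules[-1].append(atom)
        else:
            molecules.append([atom])
    return molecules


print(pack(["alu", "mem", "alu", "alu", "fpu", "branch"]))
```

The third ALU atom cannot join the first molecule because only two ALUs exist, so it starts a second molecule; this resource limit is enforced by the software scheduler rather than by issue logic on the chip.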
Even more important
What it doesn’t have
no superscalar decode, grouping, or issue
logic.
no register renaming.
no segmentation hardware.
no floating-point stack hardware.
less interlock and bypassing logic than a
traditional central processing unit.
Low Power Features
If you have fewer transistors, you burn less
power.
Only those functional units that are absolutely
needed to execute an instruction are turned on.
LongRun hardware adjusts both
the supply voltage and the clock frequency so
that each application runs only as fast as it must
to get the job done.
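The payoff from lowering the supply voltage along with the clock follows from the usual CMOS approximation for dynamic power, P ≈ C·V²·f: power falls linearly with frequency but quadratically with voltage. The sketch below picks the slowest operating point that still meets a workload's demand and reports its power relative to the fastest point. The frequency/voltage table is hypothetical, not a published Crusoe table, and in practice the required speed has to be estimated from how busy the processor is rather than being known in advance.

```python
# Hypothetical operating points (MHz, volts), slowest first.
OPERATING_POINTS = [(300, 1.2), (400, 1.3), (500, 1.4), (600, 1.5), (700, 1.6)]


def relative_power(freq, volt, ref=OPERATING_POINTS[-1]):
    """Dynamic power relative to the reference point, using P ~ V^2 * f."""
    ref_f, ref_v = ref
    return (volt ** 2 * freq) / (ref_v ** 2 * ref_f)


def pick_point(required_mhz):
    """Lowest operating point whose frequency meets the demand."""
    for freq, volt in OPERATING_POINTS:
        if freq >= required_mhz:
            return freq, volt
    return OPERATING_POINTS[-1]   # demand exceeds the maximum: run flat out


freq, volt = pick_point(350)
print(freq, volt, round(relative_power(freq, volt), 3))  # 400 1.3 0.377
```

With these made-up numbers, running at 400 MHz instead of 700 MHz keeps about 57% of the clock rate but needs only about 38% of the dynamic power, because the lower voltage is squared.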
Hardware and Software
Architecture
Processor upgrades are
simplified because the layer of
software between the applications
and the chip frees the designers to
change the chip architecture
without forcing x86 software
developers to recompile
their code.
Code Morphing software can be
updated independently of
hardware by loading a software
upgrade into Flash memory.
The Last Word
"Considering the complexity of the project, it is amazing
how well it works, how fast it works, and how low-
power it is. For the end-user, this is just a normal PC,
but under the hood, it is a technological marvel."
-- Marc Fleischmann, Transmeta
"Revolutionary may be an overstatement, but they are
definitely different..."
-- Cahners Microprocessor Report