Transmeta Crusoe

    Dr. Doug L. Hoffman
    Computer Science 330
    Spring 2002
Transmeta Crusoe
"Today [in RISC] we have large design teams and long design
cycles. The performance story is also much less clear now. The
die sizes are no longer small. It just doesn't seem to make as
much sense. Superscalar and out-of-order execution are the
biggest problem areas that have impeded performance [leaps].
The MIPS R10,000 and HP PA-8000 seem much more
complex to me than today's standard CISC architecture, which
is the Pentium II. So where is the advantage of RISC, if the
chips aren't as simple anymore?"

                                 David Ditzel, Transmeta CEO
  Transmeta’s x86-compatible
  Crusoe microprocessors can run the same
  software that runs on IBM PC-compatible
  personal computers.

Smaller, simpler logic. Only about half the logic
 transistors of an x86 processor.
Consumes between one-third and
 one-thirtieth the power.
Implements none of the x86 instructions in hardware.
X86 vs. Crusoe

The blue stuff is silicon, and the yellow is software. Crusoe's blue part is smaller
because the branch prediction and out-of-order execution (OOO) hardware has moved
off the die and into software. All of those functions are now done in real time by a
special program as the application code is executing.
Transmeta’s Crusoe

  The highest-performance Crusoe chip, the TM5400
Crusoe Features

Dynamic binary translation gives programs the
 impression that they are running on an x86 processor.
VLIW processor executes up to 4 instructions in a cycle.
LongRun power control adjusts CPU power to the
 tasks being performed.

Individual instructions are called atoms.
 VLIW instruction groups are called molecules.
Commit and rollback allow instructions to
 be “un-done”.
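The commit-and-rollback idea can be sketched as a working register file backed by a shadow copy. This is only a toy model: the class name `ShadowedRegisters` and its methods are hypothetical, and it covers registers only, not memory.

```python
class ShadowedRegisters:
    """Toy register file: 'working' holds speculative values,
    'shadow' holds the last committed values."""

    def __init__(self, count):
        self.working = [0] * count
        self.shadow = [0] * count

    def write(self, index, value):
        # Speculative write: only the working copy changes.
        self.working[index] = value

    def commit(self):
        # Make the speculative state official.
        self.shadow = list(self.working)

    def rollback(self):
        # Discard everything written since the last commit.
        self.working = list(self.shadow)


regs = ShadowedRegisters(4)
regs.write(0, 10)
regs.commit()
regs.write(0, 99)   # speculative work that turns out to be wrong
regs.rollback()
print(regs.working[0])   # prints 10
```

Rollback restores the last committed state, so the speculative write of 99 is "un-done".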
Code Morphing®
VLIW vs. Superscalar

A "traditional" VLIW machine does reordering and parallelism hunting in software.
For a straight-ahead VLIW design like Intel's IA-64, the piece of software that does
all this is the compiler. The compiler extracts the parallelism from the code, looks
for dependencies, etc., and produces optimized code that the VLIW core can run as
fast as possible, in-order.
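As a rough illustration of what such a compiler does, the sketch below packs operations into in-order bundles of up to four independent operations. The function name `schedule`, the `(dest, sources)` operation format, and the assumption that every register is written only once are all simplifications made for this example.

```python
def schedule(ops, width=4):
    """Pack ops into in-order bundles of at most `width` independent ops.

    Each op is (dest, sources). Assumes every register is written once,
    so only read-after-write dependencies matter.
    """
    ready_at = {}   # register -> index of the first bundle that may read it
    bundles = []
    for dest, sources in ops:
        # Earliest bundle in which all inputs are available.
        slot = max((ready_at.get(s, 0) for s in sources), default=0)
        # Skip bundles that are already full.
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1
        while len(bundles) <= slot:
            bundles.append([])
        bundles[slot].append(dest)
        ready_at[dest] = slot + 1
    return bundles


program = [
    ("a", []),          # a = load
    ("b", []),          # b = load
    ("c", ["a", "b"]),  # c = a + b
    ("d", ["a"]),       # d = a * 2
    ("e", ["c", "d"]),  # e = c - d
]
print(schedule(program))   # prints [['a', 'b'], ['c', 'd'], ['e']]
```

Five operations fit into three bundles because the two loads are independent, and so are `c` and `d`.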
 Code Morphing

The x86 architecture is an ill-defined amoeba containing such features as
segmentation, ASCII arithmetic, and variable-length instructions; the square
inside the blob is the VLIW processor and its functions.
Code Morphing

          Since Crusoe is a VLIW machine that's made to run
          code compiled for a superscalar machine, its
          compilation and scheduling scheme is sort of a hybrid
          of both approaches. Crusoe's Code Morphing software
          actually takes a compiled x86 program and recompiles
          it, on-the-fly, to Crusoe's native VLIW instruction
          format. This recompilation uses sophisticated
          compiler algorithms to extract parallelism from the
          code, look for dependencies and do all those things
          that a state-of-the-art VLIW compiler does.
Code Morphing Details

 Takes x86 instructions and recompiles them on the fly
  into VLIW instructions (atoms).
 As it recompiles them, it optimizes them, making them
  run, in many cases, more efficiently than the original
  x86 code.
 Finally, a scheduler reorders the atoms and groups them
  into molecules.
 Once translated, the VLIW code is stored in a special
  part of memory, accessible only by the Code Morphing
  software, so that particular program need not be
  translated again.
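The translate-once behavior can be sketched as a cache keyed by the address of the x86 code block. The names `TranslationCache` and `fake_translate` are hypothetical, and the "translation" here is a stand-in that only tags each instruction.

```python
class TranslationCache:
    """Toy translate-once cache keyed by the address of an x86 code block."""

    def __init__(self, translate):
        self.translate = translate   # hypothetical translator function
        self.cache = {}
        self.translations = 0        # how many times we really translated

    def fetch(self, address, x86_block):
        if address not in self.cache:
            self.cache[address] = self.translate(x86_block)
            self.translations += 1
        return self.cache[address]


def fake_translate(x86_block):
    # Stand-in for decode, optimize, and schedule: just tags each instruction.
    return [f"atom({insn})" for insn in x86_block]


tc = TranslationCache(fake_translate)
block = ["mov eax, 1", "add eax, ebx"]
tc.fetch(0x1000, block)
tc.fetch(0x1000, block)   # second call is served from the cache
print(tc.translations)    # prints 1
```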
                But that’s not all...
Code Morphing Details

Software continues to monitor how an
 application is being used.
If it finds that a process is spending a lot of time
 in one part of the code, it turns on more levels
 of optimization to make that part of the
 program run faster.
It only optimizes the parts of the code being
 used. Things that are executed infrequently are
 not optimized.
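One way to picture this monitoring is an execution counter per code block that raises the optimization level as the block gets hotter. The thresholds and level numbers below are invented for illustration; the source does not give real values.

```python
from collections import Counter

# Hypothetical thresholds: (minimum executions, optimization level).
THRESHOLDS = [(1000, 2), (50, 1)]


def optimization_level(count):
    for minimum, level in THRESHOLDS:
        if count >= minimum:
            return level
    return 0   # cold code: no extra optimization effort


executions = Counter()


def run_block(address):
    # Count the execution and report the level this block has earned.
    executions[address] += 1
    return optimization_level(executions[address])


for _ in range(60):
    hot = run_block(0x2000)
cold = run_block(0x3000)
print(hot, cold)   # prints 1 0
```

The block executed 60 times earns a higher level, while the block executed once stays unoptimized.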
Code Morphing

                One of the challenges of
                creating the Code Morphing
                software was to make the
                Crusoe processor, in many
                cases, bug-compatible with
                the x86 so that it would
                generate the so-called Blue
                Screen of Death at many of
                the same times an x86
                processor would.
Processor Features

Five execution units: two arithmetic-logic units, a
 load/store unit, a branch unit, and a floating-point unit.
Can execute four instructions in a cycle.
64 general-purpose and 32 floating-point
 working registers, shadowed by 48 general-purpose
 and 16 floating-point registers.
64KB level one (L1) caches and a 256KB level
 two (L2) cache.
             Even more important
What it doesn’t have

no superscalar decode, grouping, or issue.
no register renaming.
no segmentation hardware.
no floating-point stack hardware.
less interlock and bypassing logic than a
 traditional central processing unit.
Low Power Features

If you have fewer transistors, you burn less power.
Only those functional units that are absolutely
 needed to execute an instruction are turned on.
LongRun hardware adjusts both
 the supply voltage and the clock frequency so
 that each application runs only as fast as it must
 to get the job done.
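Adjusting voltage together with frequency matters because dynamic CMOS power scales roughly as C * V^2 * f. The sketch below applies that general rule of thumb; the function name and the operating point are hypothetical, not published LongRun settings.

```python
def relative_dynamic_power(voltage_ratio, frequency_ratio):
    """Dynamic CMOS power scales roughly as C * V^2 * f.

    Both ratios are relative to the full-speed operating point.
    """
    return voltage_ratio ** 2 * frequency_ratio


# Hypothetical operating point: half the clock at 70% of the voltage.
print(round(relative_dynamic_power(0.7, 0.5), 3))   # prints 0.245
```

Halving the clock alone would halve the power; lowering the voltage as well brings it to about a quarter in this example.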
Hardware and Software
                Processor upgrades are
                simplified because the layer of
                software between the applications
                and the chip frees the designers to
                change the chip architecture
                without forcing x86 software
                developers to recompile
                their code.

                Code Morphing software can be
                updated independently of
                hardware by loading a software
                upgrade into Flash memory.
The Last Word

"Considering the complexity of the project, it is amazing
  how well it works, how fast it works, and how low-
  power it is. For the end-user, this is just a normal PC,
  but under the hood, it is a technological marvel."
              -- Marc Fleischmann, Transmeta

"Revolutionary may be an overstatement, but they are
  definitely different..."
             -- Cahners Microprocessor Report
