64-Bit Micro-Architecture: IA-64 or x86-64?
Paul Spiteri
BSc (Hons) Software Engineering B.Sc.
Supervisor: Antony Nicol
Second Reader: Rosemary Brown
December 2003



Abstract
A growing number of modern applications are running into the limits of 32-bit microprocessor
architectures.

This paper examines whether there is a need to move to a 64-bit platform, and compares the two
architectural choices currently available to fulfil 64-bit needs, exposing the merits and weaknesses
of each.

An explanation and comparison of RISC & CISC architecture designs examines the advantages of each
and concludes RISC is now the most effective philosophy when coupled with a modern efficient program
compiler that can harness the concurrency potential of super-scalar microprocessors.

The architectures examined and compared are the new Intel IA-64 RISC architecture and AMD’s
x86-64 extension to the existing x86 CISC architecture.

Potential problems with existing software backwards-compatibility and compiler dependency are
considered.

The paper concludes that IA-64 is a sensible design which overcomes most issues raised with x86 over
its lifespan and is a worthy successor. However, the x86-64 extension by AMD also addresses some of
the original x86 flaws and modernises the design.
This allows the x86-64 to be currently superior due to its full backwards-compatibility with existing
compiled software, and easy migration support.




1.     Introduction

The current 32-bit generation of microprocessors has been in mainstream production since the
launch of the Intel 80386 CPU in 1985. There is a strong argument that the step up to 64-bit CPUs
is necessary to continue enhancing performance.
Considering 32-bit has been in use for such a long period, and today’s architectures have little
resemblance to those original 80386 designs, this sounds a sensible suggestion. Bear in mind the 80386
is already an extension to the original 8086 16-bit architecture.

Several factors must be considered before the route to a 64-bit platform can be decided.
   • Is there an actual need for 64-bit registers?
   • Is the current 32-bit addressable memory limitation of 4 GB becoming problematic?
   • Should an all new architecture be designed or an existing architecture extended to 64-bit?
The disadvantages of moving to a 64-bit platform are equally important.

This paper will try to answer these questions as well as comparing the various approaches to 64-bit
computing from rival CPU manufacturers, specifically the Intel IA-64 architecture and the AMD x86-64
architecture.




Figure 1: Basic Comparison of 32-bit & 64-bit
The diagram highlights a variety of differences, including the doubling in size of the internal data
paths.


The views of Intel and AMD on 64-bit computing differ considerably. While both believe there is a
purpose in extending CPU design to 64 bits, they see the demand as lying in very different
places, and believe it should be met in different ways.

The Intel approach is aimed solely towards high end enterprise servers, requiring fast processing power
and large quantities of memory for demanding applications such as large scale databases.
(Rattner) claims "It could be the end of the decade" before mainstream desktops need more than 4GB of
memory.

AMD, however, believe there is demand right now for the advantages 64-bit brings to workstation and
even desktop computers. Their approach is to design a CPU that runs existing software at high
speed, with the capability to expand to 64 bits. This allows the 4GB memory barrier to be broken and
can potentially unlock extra processing speed in certain circumstances, as described in (Dorian
2002).


According to Tim Sweeney (Lead Developer of Unreal 3-D Engine, Epic Games), there is a genuine use
for 64-bit workstation CPUs today.

“On a daily basis we're running into the Windows 2GB barrier with our next-generation content
development and pre-processing tools.
If cost-effective, backwards-compatible 64-bit CPUs were available today, we'd buy them today. We
need them today.
And our next-generation 3d engine won't just support 64-bit, but will basically REQUIRE it on the
content-authoring side.”

Sweeney believes a 64-bit CPU should be brought to the home sooner rather than later, as his work
requires it. As Intel disagrees, Sweeney is siding with AMD at this time:
“We tell Intel this all the time, pleading for a cost-effective 64-bit desktop solution. Intel should be
listening to customers and taking the leadership role on the 64-bit desktop transition and not making
ridiculous "end of the decade" statements.”


While 64-bit addressing resolves memory constraint issues, the other advantage of 64-bit registers is the
ability to handle a larger dynamic range of numbers. Currently, if a 64-bit value is required, the compiler
pairs two 32-bit registers together to hold it, so each 64-bit operation takes more than one 32-bit
instruction. This has obvious performance drawbacks.
However, it should be clarified that for most applications 32-bit registers are sufficient to avoid overflow or
underflow.
According to (Stokes 2003), it is mainly scientific computing that requires 64-bit values, for simulations and
similar work. One other use for larger registers is cryptography. As this often involves multiplication of
huge integers, a processor that can handle 64 bits at a time would offer large performance gains.
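
The register pairing described above can be sketched in C++. This is only an illustration, not compiler output: the names Pair32, add64_on_32bit, add64_native and to_u64 are hypothetical, and the sketch simply shows that a 64-bit addition built from 32-bit pieces needs an add, a carry check and a second add, where a 64-bit register needs a single add.

```cpp
#include <cstdint>

// Hypothetical model of a 64-bit value held in a pair of 32-bit registers.
struct Pair32 {
    std::uint32_t lo;  // low 32 bits
    std::uint32_t hi;  // high 32 bits
};

// Add two paired values using only 32-bit arithmetic.
Pair32 add64_on_32bit(Pair32 a, Pair32 b) {
    Pair32 r;
    r.lo = a.lo + b.lo;                             // wraps modulo 2^32
    std::uint32_t carry = (r.lo < a.lo) ? 1u : 0u;  // a wrapped sum means a carry out
    r.hi = a.hi + b.hi + carry;                     // add the high halves plus the carry
    return r;
}

// The same addition with a native 64-bit register: a single operation.
std::uint64_t add64_native(std::uint64_t a, std::uint64_t b) {
    return a + b;
}

// View a register pair as one 64-bit number, for comparison.
std::uint64_t to_u64(Pair32 p) {
    return (static_cast<std::uint64_t>(p.hi) << 32) | p.lo;
}
```

For example, adding 1 to the pair {lo = 0xFFFFFFFF, hi = 1} carries out of the low half and yields {lo = 0, hi = 2}, which matches the native 64-bit result.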


Intel’s reluctance to bring 64-bit CPUs to the home user stems from their belief that x86 should not be
given another lease of life by further extension. Their 64-bit architecture is all new, which has positioned
it solely for high-end server use, where deploying new software and operating systems is less of an
inconvenience than on the desktop.


AMD, on the other hand, have taken the x86 extension approach (Cleveland 2001). Their chip has been
designed to be fully compatible with existing 32-bit code, avoiding the need to migrate immediately
to a new operating system and software. Instead, 64-bit operation is an optional extra.


Both of these design concepts will be further examined in detail, later in the paper.
To clarify, this 64-bit computer architecture paper is not focused on performance testing. Rather, it is
focused on architecture design, and discusses long-term potentials.

Section 2 identifies critical areas of architecture design that have influenced the two competing designs
examined in detail within section 3. Section 4 considers deployment issues and choices leading towards
conclusions drawn in Section 5.




2.     Essential Knowledge

There are two basic requirements for a CPU to be 64-bit. It must be able to address a memory capacity
significantly larger than 4GB (e.g. 1 TB) and have general purpose registers with a 64-bit dynamic
range.
An understanding of RISC & CISC designs is also critical to this paper, as each of the rival architectures
examined follows one of the two.
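
The memory figures above follow from simple arithmetic on the address width, sketched below. The helper name addressable_bytes and the constants GiB and TiB are hypothetical, and the sketch assumes byte-addressable memory and treats GB and TB as powers of two.

```cpp
#include <cstdint>

// Number of distinct byte addresses reachable with an address of the given width.
// Only valid for widths below 64, where the shift cannot overflow.
constexpr std::uint64_t addressable_bytes(unsigned address_bits) {
    return std::uint64_t{1} << address_bits;
}

constexpr std::uint64_t GiB = std::uint64_t{1} << 30;
constexpr std::uint64_t TiB = std::uint64_t{1} << 40;

// A 32-bit address reaches exactly 4 GB; 40 address bits are needed for 1 TB.
static_assert(addressable_bytes(32) == 4 * GiB, "32-bit addresses reach 4 GB");
static_assert(addressable_bytes(40) == 1 * TiB, "40-bit addresses reach 1 TB");
```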


2.1:    RISC vs. CISC
       CISC (Complex Instruction Set Computers)
       RISC (Reduced Instruction Set Computers)

       The simplest way to compare the two architectures is to examine how a relatively simple task is
       performed on each, and study the advantages and disadvantages of each.
       For this example, the problem is to calculate the cube of a given number, 20.
       In a high-level language such as C++, the statements to cube the value of 20 stored in
       variable ‘A’ would be:

          int A = 20;
          A = Cube(A);

2.2:     The RISC Approach
       RISC processors use only simple instructions that can generally be executed within one clock
       cycle. Thus the cube operation would be performed by using the multiply operation twice,
       resulting in 20 * 20 * 20 being calculated. The ‘MOV’ instruction is also used to move data into
       the registers.

       In order to perform the exact series of steps described in the C++ code, a compiler would need to
       emit five lines of assembly:

          MOV A, 20
          MOV B, A
          MUL B, A
          MUL B, A
          MOV A, B
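
For reference, the Cube() call in the C++ example is not defined in the paper. A minimal sketch of one possible definition follows, written so that each statement lines up with one step of the RISC listing above. The body is an assumption made purely for illustration.

```cpp
// Hypothetical definition of the Cube() function used in the C++ example.
// The comments relate each statement to the RISC listing.
int Cube(int value) {
    int result = value;  // MOV B, A
    result *= value;     // MUL B, A  (value squared)
    result *= value;     // MUL B, A  (value cubed)
    return result;       // MOV A, B
}
```

With A set to 20, the two multiplications give 400 and then 8000.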



2.3:     The CISC Approach
       The main goal of a CISC microprocessor is to complete a given task with the smallest number of
       actual assembly language instructions. This is achieved by building processor hardware that is
       capable of understanding and executing a large number of operations. For this particular example,
       the CISC processor comes with an instruction that can calculate the cube of a value.
       This allows the compiler to generate code that would in all probability, look like this:

          MOV       A, 20
          CUBE      A

       ‘A’ represents a main memory location, as CISC instructions can usually refer to memory and not
       only registers. As you can see, the assembly language of the CISC processor compares very
       closely to the original C++ code.

2.4:    RISC / CISC Comparison

       At first, the RISC method seems like a much less efficient way of completing the operation. As
       there are more instructions to execute, the program file will be larger, since every opcode must be stored.
       Also, the compiler has a much more complex task in breaking the high-level language statement into
       multiple basic operations, rather than simply converting it straight to the CISC CUBE instruction.
       Debugging the larger RISC code may also be more difficult (Mann 1997), as the instruction
       sequence is longer and less like the original code.

       However, as each RISC instruction is quick to execute, the total time is likely to be similar to that
       of the CISC version even though there are more instructions.

       (Tang 1996) points out that the RISC instructions also require fewer transistors
       than the complex instructions, leaving more room for general purpose registers etc. As the chips
       can be made smaller, this can reduce the per-chip cost dramatically.

       RISC has a further advantage in that its simple instructions can be pipelined to execute simultaneously,
       which can greatly improve performance but is heavily reliant on appropriate code. This greatly
       increases the need for an efficient compiler.

       The essence of RISC architecture is that it allows the execution of more operations in parallel and
       at a higher rate than possible with a CISC architecture employing similar implementation
       complexity. It can not only improve parallelism by pipelining, but also make superscalar and
       out-of-order execution. (Zhongli Ding)



3.      Architectural Comparison
3.1: The Competitors

       There are two 64-bit designs currently in contention, one from each major PC CPU manufacturer
       (Intel & AMD).
       The Intel architecture, named IA-64, is a clean-slate design conceptualised in the early 1990s.
       (Crawford 2000) describes the architecture as a RISC design that can execute multiple instructions
       simultaneously. It makes use of VLIW (very long instruction words) for added flexibility, to group
       instructions that can be executed in parallel (a characteristic referred to as superscalar).

       On the other hand, (Cleveland 2001) describes how AMD is taking a less disruptive approach to
       the challenge of 64-bit computing with x86-64. Their design is an extension of the existing x86-32
       CISC architecture that overcomes maximum addressable memory restrictions and allows 64-bit
       calculations to be performed natively.

       Whether one approach is better or not is difficult to judge, as both have obvious advantages and
       disadvantages.
       Each aspect of the designs will now be examined, to help determine which is superior in differing
       scenarios.




3.2:     GPR (General Purpose Register) Comparison
       As the IA-64 is a RISC-based architecture, the number of general purpose registers is large.
       Compared with traditional x86 designs, the 128 integer registers and 128 floating-point registers are
       positively massive. The general purpose registers are 64 bits wide, and the floating-point registers
       are 82 bits wide. (Turley 2002) / (Huck, Morris 2000).


       Figure 2: A simplified diagram of IA-64 internal registers
       The diagram shows the general purpose integer and floating point registers, and their widths.
       Note the floating point is slightly wider (82-bit), and the Program Counter (PC) is 64-bit, like
       the general integer registers.

       This architecture gives the programmer or compiler an incredibly rich supply of registers, totalling
       328.


       AMD’s choice of register design was easily decided once they chose to extend the existing x86-32
       architecture.
       AMD is implementing the extension to 64-bit in the same way that the original 8086 set of eight
       16-bit registers was extended to 32 bits for the launch of the 80386 CPU.
       The full 32-bit registers were accessed by using different assembly language mnemonics to address
       each general purpose register, e.g. AX became EAX. For AMD’s 64-bit extension the new
       mnemonic prefix is R, e.g. the AX register becomes RAX for the full 64 bits.

       (Leibson 2000) details a further augmentation that AMD has implemented to the existing x86
       design. While operating in 64-bit mode, the CPU can also make use of an extra eight general
       purpose registers, which should help overcome some of the limitations of this ageing architecture.
       The new registers double the total GPRs, as the diagram below shows.




       Figure 3: A diagram detailing the new/extended register set for x86-64
       This diagram shows the general purpose registers have been doubled in width, and also doubled
       in quantity. The program counter also has been doubled in width.

Overall, the first impression when comparing the two register sets is that the IA-64 appears vastly
superior. While the x86-64 has eased the restrictive quantity of registers of its predecessors,
the number of IA-64 registers is still many times larger, which appears very useful and should lead
to substantial performance benefits.

While this is extremely useful during function execution, Gwennap (Oct 1999) points out:
"With IA-64’s 128 integer registers, saving and restoring the entire register file takes more than
four times as long as on a standard RISC processor."

Traditional function calls would involve pushing the 128 general purpose registers onto the stack,
and popping them on return. This is far from ideal as it is very time consuming.

(Turley 2002) describes how the IA-64 architecture brings with it a method to counter this
problem. The solution is called ‘register frames’.
The registers are split into groups which are individually visible to separate tasks. Each group, or
‘frame’, maps logical register numbers onto different physical registers. Each task’s frame logically
starts at GR32, although the physical register behind that name is unlikely to be GR32 itself.

Parameters can be efficiently passed between functions using this method, by allowing separate
frames to overlap which results in two register names pointing to the same physical register
location.
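
The mapping from logical to physical registers, and the overlap used for parameter passing, can be sketched as follows. This is a deliberately simplified model and not Intel's actual mechanism: the names RegisterFile, Frame, physical_index and call_with_overlap are hypothetical, as is the assumption of a 96-entry framed register file that wraps around.

```cpp
#include <array>
#include <cstdint>

// Simplified, hypothetical model of IA-64 style register frames.
constexpr int kFirstFramed = 32;   // every frame names its registers from GR32
constexpr int kPhysicalRegs = 96;  // assumed size of the framed physical file

struct RegisterFile {
    std::array<std::int64_t, kPhysicalRegs> physical{};
};

struct Frame {
    int base;  // index of the physical register this frame calls GR32
    int size;  // number of registers allocated to the frame

    // Translate a logical register number (GR32, GR33, ...) to a physical index.
    int physical_index(int logical) const {
        return (base + (logical - kFirstFramed)) % kPhysicalRegs;
    }
};

// Build a callee frame that begins inside the caller's frame, so that the
// caller's last `shared` registers are also the callee's first registers.
Frame call_with_overlap(const Frame& caller, int shared, int callee_size) {
    return Frame{caller.base + caller.size - shared, callee_size};
}
```

With a caller frame of 11 registers and a callee that overlaps the last 3 of them, the caller's GR40 and the callee's GR32 resolve to the same physical register, so a value written by the caller is visible to the callee without any copying.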




Figure 4: An example of IA-64 Register Framing
The diagram shows an example of register framing, where two tasks have allocated themselves 11
registers each, both starting at different locations yet named GR32 onwards. They also overlap to
allow data to be shared.


Regardless of the large number of registers, eventually the processor will run out. When this
occurs, the traditional approach of pushing registers onto the stack is used. However, this differs from
other architectures in that the procedure is fully automated by the processor and does not require
handling by the programmer or compiler while creating the instruction code.

Taken as a whole, one would have to say the IA-64 approach to general purpose registers is
superior. The vast quantity of available registers and the elegant handling of them outclass
x86-64’s implementation. However, there is no doubt x86-64 is still a marked improvement over
x86-32.




3.3:     Floating Point Comparison
       The IA-64 floating point unit is of course a new design, and while its power is not in question
       there do appear to be some interesting quirks in the design.

       (Gwennap May 1999) The floating-point instruction set is built around a fused multiply-add
       (MAC) construct. Simple addition and multiplication are synthesized, using the constants +0.0 and
       +1.0 stored in FR0 and FR1, respectively.

       This means there is actually no dedicated instruction to multiply or add two numbers together.
       This is due to the FPU being designed to perform MAC Ops (Multiply-Accumulate Operations),
       in which two numbers are multiplied together and the result is added to a third.
       ‘Standard’ floating point multiplication is performed by setting the third
       ‘adder’ value to zero.
       (Sharangpani 2000) explains how, similarly, a floating point add can be performed by setting
       the multiplier value to 1, and the ‘adder’ to the second floating-point value to combine.
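
The synthesis of plain multiply and add from a single multiply-add operation can be sketched with the C++ standard library function std::fma, which computes x * y + z as one fused operation. The wrapper names fma_multiply and fma_add are hypothetical, and the constants 0.0 and 1.0 play the role of the values the paper describes as held in FR0 and FR1.

```cpp
#include <cmath>

// Multiplication synthesised from multiply-add: a * b + 0.0
double fma_multiply(double a, double b) {
    return std::fma(a, b, 0.0);
}

// Addition synthesised from multiply-add: a * 1.0 + b
double fma_add(double a, double b) {
    return std::fma(a, 1.0, b);
}
```

For instance, fma_multiply(3.0, 4.0) evaluates 3.0 * 4.0 + 0.0 and fma_add(3.0, 4.0) evaluates 3.0 * 1.0 + 4.0.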

       On the other hand, the AMD designers have chosen to modernise the x86 design by
       ignoring the legacy x87 floating point unit while updating the architecture to 64 bits. Figure 3 shows
       it has not been extended, as programmers are encouraged to use the SSE/SSE2 unit for floating
       point operations. This allows much higher performance. Note that the 128-bit width of the SSE
       registers describes how much data they hold, as each register packs several 32-bit or 64-bit values,
       and is not the precision of a single value; the x87 unit works with 80-bit extended-precision values.

       Essentially, both methods employed here are quite similar. The typical disadvantage of the x86
       does not apply here as the floating point unit is roughly as modern as IA-64.
       AMD have made it quite clear the old method should be completely bypassed and the modern SSE
       unit should be used.
       I have come to the conclusion that neither of the architectures can claim their method is
       significantly superior.


3.4.1: x86 Compatibility – IA-64
      The ability to execute existing software written for the highly popular x86 platform is clearly
      important when designing a new CPU.
      IA-64 does allow for x86 binaries to be executed in an x86 compatibility mode, which essentially
      ‘emulates’ an x86 CPU through hardware. All of the x86 instructions are supported, including
      additional instruction sets such as MMX or SSE, and can be used to run an entire x86-32 operating
      system, or just individual applications from within an IA-64 OS.
      This compatibility mode actually maps the x86 general purpose registers onto its own IA-64
      register set, as shown below.


       Figure 5: Diagram showing x86 on IA-64
       This diagram shows the standard general purpose registers well known to x86-32 programmers
       (EAX, EBX, ECX etc.) map onto the IA-64 general purpose registers starting at GR8. Only the
       lower 32-bits are usable.

     However, preparing to execute x86 instructions is not a simple task. (Gwennap 99) warns of this:
     "The switching overhead comes in preparing for the transition. Because of the register overlap, any
     shared registers with important data must be explicitly saved to memory before switching modes.
     Before calling an x86 routine, IA-64 code must properly set up the x86 segment descriptors, PSR,
     and EFLAG registers. This mode-switch overhead makes it impossible to mix x86 and IA-64 code
     at the subroutine level."


3.4.2: x86 Compatibility – x86-64
      (Zeichick 2003) points out that the full binary compatibility with existing x86 software that AMD
      x86-64 has, is probably its greatest asset.

     This is achieved by using various modes of operation.
     The first is ‘legacy mode’ which instructs the CPU to function exactly as a standard x86-32 CPU
     would, with full speed compatibility for 32-bit software, and the extra 64-bit registers disabled.

     The other mode is ‘64-bit long mode’. This mode is set by the operating system during start-up,
     and thus requires a 64-bit operating system to utilise it.
     This ‘long mode’ is further split into two sub-modes: ‘64-bit mode’ and ‘compatibility mode’.
     This allows individual processes running on the OS to be either 32-bit legacy code or 64-bit code,
     with no performance penalty.




     Figure 6: Diagram showing how x86-64 allows legacy and 64-bit code to be run simultaneously.
     The diagram shows how a 64-bit operating system can in fact run legacy 32-bit software in
     compatibility mode, while also running 64-bit software, when the CPU has ‘64-bit long mode’
     enabled.

     Of course, only 64-bit long mode code can take advantage of the extra registers and larger
     addressable memory capabilities.

     The default pointer size is 64-bit for long mode programs, to ensure they can point to data
     anywhere within the maximum addressable memory range. However, the default integer size
     remains at 32-bit as the majority of uses do not require such large values. While there should be no
     performance loss to use 64-bit integers by default, the waste of memory would be significant
     within the internal cache.
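
The cache argument above is a matter of element size. A minimal sketch follows, using the hypothetical helpers footprint_bytes and values_per_cache. The 64 KB cache size in the example is an assumption chosen only to make the numbers concrete, and is not a figure taken from either architecture.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes needed to store `count` integers of type T.
template <typename T>
constexpr std::size_t footprint_bytes(std::size_t count) {
    return count * sizeof(T);
}

// Number of values of type T that fit in a cache of the given size.
template <typename T>
constexpr std::size_t values_per_cache(std::size_t cache_bytes) {
    return cache_bytes / sizeof(T);
}
```

Under the assumed 64 KB cache, values_per_cache<std::int32_t>(64 * 1024) gives 16384 entries while values_per_cache<std::int64_t>(64 * 1024) gives 8192, so defaulting every integer to 64 bits would halve the number of values the cache can hold.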


     All research for x86 code compatibility concludes that the advantage here clearly goes to AMD
     x86-64. Not only can it execute 32-bit code at a high level of performance, but it can run it
     simultaneously alongside full 64-bit applications when under a 64-bit compatible operating
     system.
     The Intel IA-64, on the other hand, struggles to execute x86 code with any serious level of
     performance, barely rivalling a 75 MHz Pentium. While this may improve in future revisions of
     the design, currently AMD have this area of 64-bit computing under their control.



4.    Architectural Summary
4.1 Deployment Issues

     The above details of each architecture show how widely the two approaches to 64-bit
     computing differ, with few similarities between them.

     As (Hans de Vries 2003) details, the main weakness of the IA-64 architecture is the requirement
     for entirely new software to run on it, due to the completely new instruction set etc. The AMD design
     also cannot take advantage of its extra features without recompilation of software, but is still arguably
     the fastest x86-32 microprocessor architecture created to date.
     Recompiling existing x86 software should also become simpler: California-based PathScale
     recently (Q4’03) announced a new version of their compiler suite with native code generation for
     x86-64, claiming that this combination achieves the highest performance by some 40%.

     Research is showing companies are unwilling to go through the demanding task of moving software
     over to IA-64. When Sun Microsystems CEO Scott McNealy was asked why they chose x86-64
     over IA-64 he responded:
     "We maintained binary compatibility with the entire x86 software base with x86-64. We took the
     Xeon Solaris binary and it immediately ran on Opteron [x86-64 Architecture CPU].
     Itanium [IA-64 Architecture] would require an entire re-write and recompile and re-certification
     of our operating system and then of every application that ran on top of it."


     Another deployment issue is cost of production. Theories by (Hans de Vries 2003) & (Leibson
     2000) point towards the 64-bit extension element of x86-64 actually being relatively inexpensive
     in terms of CPU core size. This is backed up by Marty Seyer, vice president of AMD's
     Microprocessor Business Unit, who said:
     "I think it will be in the '05 timetable. Late '05," when queried when AMD would cease production
     of 32-bit CPUs.
     This implies there is little reason not to include 32 & 64-bit support throughout their entire product
     line, which can only aid x86-64 take-up by the industry.


     However, Froutan (2003) has a differing view on this dual support.
     "By building 32-bit support into its 64-bit processors, AMD is actually giving developers less of a
     reason to port their applications to 64 bits and, therefore, slowing down the adoption of 64-bit
     systems."

     I disagree with this argument by Froutan. For AMD to create a 64-bit CPU their most logical path
     was to extend their existing 32-bit CPU, the Athlon which is an x86 chip. Judging from recent
     market share and financial reports, they do not have the resources to develop or persuade the
     industry to adopt an all new architecture platform. To remove the 32-bit compatibility mode would
     be suicidal considering the slow uptake of the backwards-incompatible IA-64 during its early
     years.
4.2 Compiler Optimisation Importance

     For IA-64 it is now the job of the compiler to locate parallelism and take advantage of the
     super-scalar architecture. The importance of compilers and parallelism for this architecture is
     highlighted by the opening lines of (Bharadwaj, Pierce 2000):
     "In planning the new architecture, Intel designers wanted to exploit the high level of
     instruction-level parallelism (ILP) found in application code. To accomplish this goal, they
     incorporated a powerful set of features such as control and data speculation, register framing, and
     a large register file. By using these features, the compiler plays a crucial role in achieving the
     overall performance of an IA-64 platform."

     (Stokes 2003) and (Hans de Vries 2003) both explain that x86-64 does not rely on compiler
     optimisations as heavily, mainly due to the inherent x86 handicap of being nonparallel in nature. As
     it is a super-scalar design, it does try to find instruction-level parallelism in hardware, as well as
     reordering instructions. However, it is limited by having no foreknowledge of the entire program,
     which restricts the scope of the optimisations to the few instructions already in the pipeline.

     (Turley 2002) describes how the IA-64 and x86-64 are essentially complete opposites in this
     area. The IA-64 does not re-order instructions at all, as this is purely the job of the compiler, whereas
     the x86-64 has to constantly search the binary stream of opcodes looking for ways of
     reorganising instructions to make the most efficient use of its hardware resources.
     All this must be achieved within nanoseconds without affecting the critical path or impacting the
     clock frequency.
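
The idea of instruction-level parallelism discussed above can be illustrated at source level. In the sketch below, both hypothetical functions compute the same sum. The first forms one long chain in which every addition depends on the previous result; the second keeps four independent running totals, so additions to different totals have no dependency on one another and could in principle be issued together. Whether a given compiler or CPU actually exploits this is not something the sketch demonstrates.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One dependency chain: each addition must wait for the previous total.
std::uint64_t sum_chained(const std::vector<std::uint32_t>& v) {
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}

// Four independent chains: the four additions in the loop body do not
// depend on each other, which exposes instruction-level parallelism.
std::uint64_t sum_independent(const std::vector<std::uint32_t>& v) {
    std::uint64_t acc[4] = {0, 0, 0, 0};
    std::size_t i = 0;
    for (; i + 4 <= v.size(); i += 4) {
        acc[0] += v[i];
        acc[1] += v[i + 1];
        acc[2] += v[i + 2];
        acc[3] += v[i + 3];
    }
    for (; i < v.size(); ++i) {
        acc[0] += v[i];  // leftover elements when the size is not a multiple of 4
    }
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

A compiler with a view of the whole loop can perform this kind of restructuring ahead of time, whereas hardware reordering only sees the instructions currently in flight.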


     It is clear from the evidence presented that the IA-64 approach to software optimisation is superior.
     Allowing the compiler to create optimisations is more efficient than trying to locate
     shortcuts during run-time. This also allows the CPU core to be smaller, as it needs no re-ordering
     hardware, which could potentially reduce production costs.

     However, the notable disadvantage is that it simply shifts the complexity onto the compiler.



5.    Conclusion
     The original problems highlighted by this paper such as memory capacity restrictions, can easily
     be solved by migrating to a 64-bit platform. However, there is no single solution.

     I fully appreciate Intel’s approach to the 64-bit design task. If a 64-bit platform is to be created, it
     does seem logical to design an all new architecture using all the latest concepts and developments
     to create the fastest microprocessor.

     Compiler developments in recent years have helped shift the advantage from CISC architectures
     towards RISC, as RISC is better able to take advantage of super-scalar architecture.
     Therefore Intel’s approach does appear ideal in terms of bringing the best theoretical architecture
     to the industry.

     However, it appears Intel is being too conservative with their forecast of bringing 64-bit CPUs to
     the desktop by the end of the decade. There are situations in some desktop applications, and
     certainly in workstation scenarios, where 64-bit registers and memory capacity greater than 4GB are
     advantageous or even required. This need can only increase in the near future, meaning Intel
     risks turning people to AMD’s 64-bit solution if they do not alter their public roadmap and bring
     64-bit computing to the desktop ahead of schedule.


(Leibson 2001) describes AMD’s decision to extend the existing x86 platform to 64 bits as
a logical move for two reasons:
1. It has worked before (16-bit to 32-bit).
2. They do not have the market share to successfully pioneer an all-new architecture.

It also has practical advantages, as existing software that is currently hitting the 4 GB limit can be
recompiled and be immediately faster while no longer suffering this limitation. (Kaplowitz 2003)
further examines the ease with which software can be ported to x86-64.
Also, since the processor can run 32-bit and 64-bit code concurrently, the migration process is eased
dramatically.
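
The 4 GB figure follows directly from the pointer width, as the short sketch below shows. The
function name addressable_bytes is hypothetical, and the results are architectural ceilings for a
flat byte-addressed pointer; real processors typically implement fewer address bits than the full
64.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_bytes(pointer_bits: int) -> int:
    """Number of distinct byte addresses a flat pointer of this width can name."""
    return 2 ** pointer_bits


if __name__ == "__main__":
    print(addressable_bytes(32) // GIB)  # 4, the 32-bit limit in GB
    print(addressable_bytes(64) // GIB)  # 17179869184, i.e. 2**34 GB
```

A 64-bit pointer can therefore name 2**32 times as many bytes as a 32-bit one.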


However, the long term future should be considered. Turley believes:
It is inconceivable that Intel cannot extend and enhance IA-64 for another decade or more.
On the other hand, the x86-64 architecture is the result of more than a decade of stretching and
enhancing.

Nonetheless, (McGrath 2000) points out that AMD have tried to modernise the x86 architecture in as
many ways as possible. For example, floating-point performance has been brought up to date
by using the SSE/SSE2 FP unit in place of the antiquated x87 unit. The other common complaint,
the shortage of registers in x86, has also been partially addressed by doubling the number of
general-purpose registers from 8 to 16.

Overall, at this time the x86-64 does appear to be an effective answer to the original problems
raised that demand 64-bit solutions. While the IA-64 architecture is fascinating and has clear
advantages over traditional x86, until higher take-up is achieved through lower prices and greater
availability of software, it is potentially destined for failure when compared to the inexpensive, fast
and compatible AMD x86-64.

Of course, if there is one manufacturer who could successfully roll out a new architecture to the
industry, it is Intel. To discount them would be a considerable error of judgement.




References
Sean Cleveland (2001)
x86-64 Technology White Paper
Advanced Micro Devices Inc.

Daniel Mann
‘Why the x86 CISC beat RISC’
http://www.amd-embedded.com/Benchmarks/whyx86.htm

Linley Gwennap (Oct 1999)
Merced (IA-64) Shows Innovative Design
Microprocessor Report, Volume 13, Number 13

Linley Gwennap (May 1999)
IA-64: A Parallel Instruction Set
Microprocessor Report, Volume 13, Number 7

Peter Song (Jan 1998)
Demystifying EPIC and IA-64
Microdesign Resources

Martin Hopkins (Feb 2000)
A Critical Look at IA-64
Microprocessor Report

Kevin McGrath (Sept 2000)
x86-64: Extending the x86 architecture to 64-bits
Stanford University

Jay Bharadwaj, William Y. Chen, Weihaw Chuang, Gerolf Hoflehner, Kalyan Muthukumar, Jim Pierce (Sept 2000)
The Intel IA-64 Compiler Code Generator
Intel

Harsh Sharangpani, Ken Arora (Oct 2000)
Itanium Processor Micro-Architecture
Intel

John Crawford (Sept 2000)
Introducing the Itanium Processors
Intel

Tim Sweeney - Quote (Feb 2003)
Founder and President, Epic Games
http://www.amd.com/us-en/Weblets/0,,7832_8366_7823_8718%5E8320,00.html

Jerry Huck, Dale Morris, Jonathan Ross, Allan Knies, Hans Mulder, Rumi Zahir (Oct 2000)
Introducing the IA-64 Architecture
Hewlett Packard / Intel

Peter Dorian (Feb 2002)
Making a Sound Choice between 32 and 64 Bits
Meta Group White Paper

Steve Leibson (April 2000)
AMD Drops 64-Bit Hammer On X86
Microprocessor Report

Jon Stokes (Jan 1999)
A Preview of Intel’s IA-64
Arstechnica

Jon Stokes (March 2003)
An Introduction to 64-bit Computing and x86-64
Arstechnica

Yi Gao, Shilang Tang, Zhongli Ding (1996)
Comparison between CISC and RISC
http://doms.uwimona.edu.jm:1104/coursefiles/CS52R/cs52r_lectures/p_29.pdf

Michael Kanellos / Justin Rattner (Intel) (Feb 2003)
Intel takes slow road to 64-bit PC chips
http://zdnet.com.com/2100-1103-985432.html

Alan Zeichick (June 2003)
Extending the x86 Architecture to 64-bits
AMD64 DevSource

Hans de Vries (Sept 2003)
Understanding the detailed Architecture of AMD's 64 bit Core
http://www.chip-architect.com/news/2003_09_21_Detailed_Architecture_of_AMDs_64bit_Core.html

Jim Turley (Feb 2002)
64-Bit CPUs: What You Need to Know
http://www.extremetech.com/article2/0,3973,231,00.asp

Robert McMillan (Nov 2003)
Are the Days of 32-Bit Chips Numbered?
http://www.pcworld.com/news/article/0,aid,113516,00.asp

David Kaplowitz (June 2003)
IBM Universal Database Software on the AMD Opteron Processor
http://www.amd.com/us-en/assets/content_type/DownloadableAssets/AMD_IBM_DB2_on_AMD_Opteron_2003-06-17.pdf

Nelson H. F. Beebe
A Selected Bibliography of Publications about Microprocessors
Centre for Scientific Computing, University of Utah

Paul Froutan (June 2003)
Understanding the AMD64 Solution
Microprocessor Report

Scott McNealy - Interview (2003)
Sun Microsystems
http://www.computing.co.uk/News/1151480


