Brief Architecture Description
AMD 64-bit Processor
Professor Vetón Kepuska
Digital Systems Design (ECE-5571)
Spring Term 2003
Florida Institute of Technology (FIT)
TABLE OF CONTENTS
Title Page
Table of Contents
Introduction and Scope
64-Bit Processor Mode Registers
Instruction Subsets
Memory Management
Status (Flags) Register
Operand and Data Types
ABSTRACT
The most important distinguishing feature of the new AMD 64-bit processor, when
compared to others of its kind such as the Intel Itanium 2 processor, is that it is backward
compatible with the existing x86 architecture. This allows it to natively execute (without
re-compilation) legacy 16-bit or 32-bit x86-based applications on operating systems such
as DOS, Windows, and Linux. At the same time, because of its 64-bit addressing
capabilities and its expanded set of register resources, new or re-compiled 64-bit
applications can also execute natively.
The drawback to this approach is the inherent complexity of a processor that can now be
categorized as a “jack of all trades.” One cannot help but wonder how many new features
or capabilities were left out of the design because they would have impaired the execution
of legacy applications. The AMD design team has, without doubt, accomplished a very
significant feat, but it did so with a complicated design that retains all the resources
previously used by legacy applications and incorporates the new resources needed by
today’s more demanding 64-bit applications.
INTRODUCTION AND SCOPE
The new AMD 64-bit processor incorporates two main operating modes: long and
legacy. The long mode incorporates two sub-modes: 64-bit mode and compatibility
mode. The legacy mode incorporates three sub-modes: protected mode, virtual-8086
mode, and real mode (see figure below).
In order to limit the amount of material covered in this report, I decided to focus on the
64-bit mode of operation (a sub-mode of long mode) that, as shown in the figure above, is
designed to support the current and future needs of 64-bit operating systems and
applications. More specifically, I will attempt to describe the architecture that supports
64-bit addresses with 32-bit and 64-bit operands along with the 64-bit wide
general-purpose registers (GPRs).
The report briefly discusses, but does not describe in detail, the more advanced media
and floating-point instructions available on this processor. Entire volumes of
documentation on the AMD web site discuss these instruction types and their
capabilities. I encourage the reader to download the set of manuals listed in the
REFERENCES section of this report for more in-depth coverage of these and a multitude
of other related topics. The majority of the material presented here was abstracted from
these manuals.
64-Bit PROCESSOR MODE REGISTERS
A system with the AMD 64-bit processor can support any of the modes of operation
described in the abstract section of this report at the same time. The operating system
and/or applications written to execute on such a system can query the processor using the
CPUID instruction to determine its capabilities before using any of its extended 64-bit
features. The CPUID instruction will return processor feature information in the EAX,
EBX, ECX, and EDX registers.
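As a rough illustration of the feature check described above, the sketch below decodes a
CPUID result in Python. It assumes the register value has already been obtained by some
other means (Python cannot execute CPUID directly), and it relies on the long-mode flag
being reported in bit 29 of EDX for extended function 8000_0001h, as documented in the
AMD manuals. The function name and the sample values are hypothetical.

```python
# Assumption: the long-mode (LM) flag is bit 29 of EDX as returned by
# CPUID extended function 8000_0001h.
LONG_MODE_BIT = 29


def supports_long_mode(edx: int) -> bool:
    """Return True if the long-mode bit is set in the given EDX value."""
    return bool((edx >> LONG_MODE_BIT) & 1)


# Hypothetical EDX values, for illustration only.
edx_with_long_mode = 1 << LONG_MODE_BIT
edx_without_long_mode = 0

print(supports_long_mode(edx_with_long_mode))     # True
print(supports_long_mode(edx_without_long_mode))  # False
```

A real program would obtain EDX from the processor (for example through a small
assembly routine) and only then enable its 64-bit code paths.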
If after executing the CPUID instruction, your application determines that the processor
supports the 64-bit mode of operation, then your program can use the set of extended
registers and stack described in the table below. Note that all subsets of these registers
used for compatibility and legacy modes are also accessible from the 64-bit mode of
operation. Thus 64-bit mode has access to a much larger register set than is shown in
the table below.
Register Set or Stack               Name                       Number   Size (bits)
General Purpose Registers (GPRs)    RAX, RBX, RCX, RDX, RBP,   16       64
                                    RSI, RDI, RSP, R8-R15
128-bit XMM Registers               XMM0-XMM15                 16       128
64-bit MMX Registers                MMX0-MMX7                  8        64
x87 Registers                       FPR0-FPR7                  8        80
Instruction Pointer                 RIP                        1        64
Flags                               RFLAGS                     1        64
Stack                               --                         --       64
INSTRUCTION SUBSETS
The instruction set is subdivided into four subsets, each utilizing its own set of registers:
1. General Purpose instructions include: Basic x86 integer instructions and
instructions to load, store, or operate on general-purpose registers or memory data.
The effective operand size, opcode, address size, or stack size used by the instruction
determines the register size, or the subset of bytes, that is referenced from these
64-bit registers. For example, register RAX in an instruction references all 64 bits,
EAX the low 32 bits, AX the low 16 bits, and AL the low 8 bits stored in the register.
This allows a single 64-bit register to be used in all modes of operation. Access to
the extended GPRs requires the use of a single REX prefix. A REX prefix must
immediately precede the first opcode byte of an instruction and must have a value in
the range 40h to 4Fh (see MOV RAX example below).
2. 128-bit Media instructions include: Streaming Single-Instruction, Multiple-Data
(SIMD) extensions to load, store or operate on 128-bit XMM register data, and the
processing of vector (packed: defined as integer or floating-point values packed into
a single operand) and scalar data types. SIMD instructions can be performed
independently and simultaneously on multiple sets of vector (packed) data types.
These instructions are most useful when executing intensive multimedia and scientific
applications that process blocks of data at a time.
3. 64-bit Media instructions include: Multimedia extensions (MMX) and AMD
3DNow! instructions to load, store or operate on 64-bit MMX register data. As in the
128-bit media instructions above, these instructions are performed on vector (packed)
and scalar data types and are also characterized as SIMD instructions.
4. x87 Floating-Point instructions: these form one of three floating-point subsets,
which together can use up to three distinct register sets to process floating-point
operands:
(1) 128-bit media instructions support 32-bit single-precision and 64-bit
double-precision floating-point operations, in addition to integer operations. These
operations include a dedicated floating-point exception-reporting mechanism.
(2) 64-bit media instructions support single-precision floating-point operations on
both vector and scalar data, but do not support the exception-reporting mechanism.
(3) x87 floating-point instructions support single-precision, double-precision and
80-bit extended-precision floating-point operations on scalar-only data types and a
dedicated exception-reporting mechanism.
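The register naming described in item 1 of the list above (RAX, EAX, AX, and AL as
progressively narrower views of one 64-bit register) can be sketched with simple bit
masks. This is only a model of the naming convention, not of how the hardware is
implemented; the function name and the sample value are hypothetical.

```python
def sub_registers(rax: int) -> dict:
    """Return the RAX, EAX, AX and AL views of a single 64-bit register value."""
    rax &= 0xFFFFFFFFFFFFFFFF          # keep only 64 bits
    return {
        "RAX": rax,                    # all 64 bits
        "EAX": rax & 0xFFFFFFFF,       # low 32 bits
        "AX": rax & 0xFFFF,            # low 16 bits
        "AL": rax & 0xFF,              # low 8 bits
    }


# Hypothetical register content, for illustration only.
for name, value in sub_registers(0x1122334455667788).items():
    print(f"{name} = {value:X}h")
```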
The 128-bit and 64-bit media instructions are designed to accelerate image processing,
music synthesis, speech recognition, full-motion video, and 3D graphics rendering
applications by processing vector (packed) data simultaneously using SIMD technology.
For example, a single 128-bit XMM register can hold 16 byte-sized integer data elements,
and a 128-bit vector instruction can operate on all 16 integers independently and at once.
Thus a single instruction with two such 128-bit registers as operands can perform 16
integer additions and return the results to the first operand register. In addition,
AMD’s SIMD implementation includes saturating-arithmetic instructions that simplify
the handling of overflow or underflow conditions by saturating at the largest or smallest
value that can be represented in the destination data type, thus avoiding unexpected
wrap-around results.
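The packed, saturating addition described above can be modeled in a few lines. The sketch
below treats two 16-byte values as stand-ins for XMM registers and adds them element by
element with unsigned saturation, which is the behavior of an unsigned saturating byte
add such as PADDUSB. The function name and test data are hypothetical, and the loop only
models the result; the hardware performs all 16 additions in one instruction.

```python
def packed_add_saturate_u8(a: bytes, b: bytes) -> bytes:
    """Add 16 unsigned byte elements pairwise, clamping each sum at 255."""
    if len(a) != 16 or len(b) != 16:
        raise ValueError("each operand must hold exactly 16 byte elements")
    return bytes(min(x + y, 255) for x, y in zip(a, b))


# Hypothetical operands: the first eight elements would overflow a byte.
first = bytes([250] * 8 + [10] * 8)
second = bytes([10] * 16)

print(list(packed_add_saturate_u8(first, second)))
```

With wrap-around arithmetic the first eight sums would be 4 (260 modulo 256); with
saturation they are clamped to 255, the largest value a byte can hold.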
MEMORY MANAGEMENT
Virtual memory consists of the entire (virtual/linear) address space available to programs
(2^64 = 18446744073709551616 memory addresses), parts of which are located in
memory with the rest located on magnetic disk or other storage media. The operating
system can use selectors for code, stack, and data segments to protect running processes
from each other, but the base address is always 0 when running in this mode.
Segmentation in 64-bit mode is disabled. The entire virtual address space is considered a
single, un-segmented unit starting at address 0. The AMD engineers leave the
implementation of any segmentation model when operating in this mode to the operating
system software developer and claim that this allows for “more efficient coding of new
64-bit multi-programming operating systems”.
Segment registers hold the selectors used to access memory segments. Since 64-bit mode
does away with segmentation, the CPU recognizes only three of the available six
segment-registers. These are CS (used to store attributes like the “default-size” (D) bit
and the “long-mode” (L) bit) and FS/GS (used by software to make address calculations).
But even when using those three registers, 64-bit mode utilizes a very small subset of
Physical Memory is defined as the installed memory in a system that can be accessed by
the processor through its memory bus interface. The number of address bits on the bus
(52 in this case) determines the maximum capacity of the physical memory space. As
shown in the figure below, paging is used in 64-bit mode to translate addresses between
the large virtual or linear address space and the physical address space.
In 64-bit mode, virtual addresses of 64 bits in size can be generated. These addresses are
then passed to the long-mode paging function, which translates them into physical
addresses of up to 52 bits (a physical address space of up to 2^52 = 4503599627370496
bytes). With 4-Kbyte pages, the low 12 bits of the virtual address select one of the
2^12 = 4096 bytes within a page, and the remaining upper bits are used to look up the
physical page through the page-translation tables.
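To make the address arithmetic concrete, the sketch below splits a virtual address into a
page number and a byte offset. It assumes 4-Kbyte pages, in which the low 12 bits of the
address are the offset within the page; the walk through the page-translation tables is
deliberately left out. The function name and the sample address are hypothetical.

```python
PAGE_OFFSET_BITS = 12                 # assumption: 4-Kbyte pages
PAGE_SIZE = 1 << PAGE_OFFSET_BITS     # 4096 bytes per page
PHYSICAL_ADDRESS_BITS = 52            # maximum physical address width cited above


def split_virtual_address(vaddr: int):
    """Return (virtual page number, byte offset within the page)."""
    return vaddr >> PAGE_OFFSET_BITS, vaddr & (PAGE_SIZE - 1)


page, offset = split_virtual_address(0x12345678)   # hypothetical address
print(f"page = {page:X}h, offset = {offset:X}h")
print(f"physical address space = {1 << PHYSICAL_ADDRESS_BITS} bytes")
```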
The system uses little-endian byte ordering: the least-significant byte is stored at the
lowest memory address and the most-significant byte is stored at the highest memory
address. The image below shows how a quad-word (defined to be 8 bytes long) data
type is stored in memory or in a general-purpose register. Instructions are read from
memory one byte at a time starting at the lowest address. For example, the 64-bit
MOV RAX instruction consisting of ten bytes (48 B8 8877665544332211) has “48” as a REX
instruction prefix specifying a 64-bit operand size, “B8” as the opcode that along with
the prefix identifies the 64-bit RAX register as the destination, and
“8877665544332211” as the eight bytes of the immediate operand to be moved to RAX,
listed in memory order. Because the immediate is stored least-significant byte first, the
value loaded into RAX is 1122334455667788h. In memory, “48” is stored at the lowest
instruction address, and “11” is stored at the highest.
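The byte layout of this MOV RAX example can be reproduced with a short sketch. It builds
the ten instruction bytes from the REX prefix (48h), the opcode (B8h), and a 64-bit
immediate packed in little-endian order. The helper name is hypothetical, and the code
handles only this one instruction form; it is not a general assembler.

```python
import struct

REX_W = 0x48            # REX prefix selecting a 64-bit operand size
MOV_RAX_IMM = 0xB8      # opcode: move immediate into rAX


def encode_mov_rax_imm64(value: int) -> bytes:
    """Return the ten bytes of MOV RAX, imm64 in memory (lowest address first)."""
    assert 0x40 <= REX_W <= 0x4F                   # REX prefixes lie in 40h-4Fh
    immediate = struct.pack("<Q", value)           # "<Q": little-endian, 8 bytes
    return bytes([REX_W, MOV_RAX_IMM]) + immediate


encoded = encode_mov_rax_imm64(0x1122334455667788)
print(" ".join(f"{byte:02X}" for byte in encoded))
```

The least-significant byte of the immediate (88h) directly follows the opcode, and the
most-significant byte (11h) ends up at the highest address of the instruction.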
In 64-bit mode, addresses specified with 16 bits or 32 bits are zero-extended to form
the full 64-bit address. The instruction-pointer register (RIP) is used to locate the next
instruction to execute in memory. In 64-bit mode RIP contains the displacement (offset)
from base address 0 of the one and only segment. RIP can also aid in the calculation of
an effective address: in RIP-relative addressing, a signed displacement (near pointer) is
added to the RIP value of the next instruction. This is important because it allows for
the addressing of data in position-independent code. Far pointers containing addresses to
data are not used in 64-bit mode since the architecture supports only a single data
segment (flat-memory model).
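The two address calculations mentioned above, zero-extension and RIP-relative addressing,
can be sketched as follows. The sketch assumes that the displacement is a signed 32-bit
value added to the RIP of the next instruction and that the result wraps at 64 bits. The
function names and sample addresses are hypothetical.

```python
MASK64 = (1 << 64) - 1


def zero_extend(value: int, bits: int) -> int:
    """Keep the low `bits` bits; the upper bits of the 64-bit address are zero."""
    return value & ((1 << bits) - 1)


def rip_relative_address(next_rip: int, disp32: int) -> int:
    """Effective address = RIP of the next instruction + signed displacement."""
    if not -(1 << 31) <= disp32 < (1 << 31):
        raise ValueError("displacement must fit in a signed 32-bit field")
    return (next_rip + disp32) & MASK64


# Hypothetical values, for illustration only.
print(f"{zero_extend(0xFFFF1234, 32):016X}h")
print(f"{rip_relative_address(0x401000, -0x20):016X}h")
```

Because the data is located relative to the instruction itself, the same code works
unchanged wherever it is loaded, which is what makes it position-independent.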
The stack is described by a “stack frame” that is tracked with two 64-bit registers: a
stack-frame base pointer (rBP) and a stack pointer (rSP). In legacy and compatibility
modes, the “stack-segment” register (SS) selects the current stack segment; in 64-bit
mode the stack-segment base is treated as 0. Whenever rBP and rSP are used as base
registers, the processor accesses the current stack segment. The figure below shows the
state of SS, rBP and rSP before and after data has been pushed onto the stack. After the
push, rSP points to the new top of the stack.
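The effect of a push on rSP can be modeled with a small sketch. It assumes the usual x86
behavior in 64-bit mode: the stack grows toward lower addresses, a push first decrements
rSP by 8 and then stores the quadword in little-endian order. The class name, the stack
size, and the pushed value are hypothetical.

```python
import struct


class Stack64:
    """Toy model of a downward-growing stack of 64-bit values."""

    def __init__(self, size: int = 64):
        self.memory = bytearray(size)
        self.rsp = size                      # empty stack: rSP is just past the top

    def push(self, value: int) -> None:
        if self.rsp < 8:
            raise OverflowError("stack is full")
        self.rsp -= 8                        # rSP moves to the new top of the stack
        struct.pack_into("<Q", self.memory, self.rsp, value & 0xFFFFFFFFFFFFFFFF)

    def pop(self) -> int:
        if self.rsp + 8 > len(self.memory):
            raise IndexError("stack is empty")
        (value,) = struct.unpack_from("<Q", self.memory, self.rsp)
        self.rsp += 8
        return value


stack = Stack64()
print("rSP before push:", stack.rsp)
stack.push(0x1122334455667788)
print("rSP after push: ", stack.rsp)
print(f"popped value:    {stack.pop():X}h")
```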
STATUS (FLAGS) REGISTER
As outlined in the 64-Bit PROCESSOR MODE REGISTERS section, the RFLAGS
register is a 64-bit register. It is accessed when the processor is in the 64-bit mode of
operation. The figure below highlights the low 16 bits of the RFLAGS register. This
set of bits includes the flags that are accessible to software applications.
These flags include one control flag (direction (DF)) and six status flags (carry (CF),
parity (PF), auxiliary carry (AF), zero (ZF), sign (SF) and overflow (OF)). The DF
controls the direction of string operations. The six status flags provide result information
from logical and arithmetic operations, and control information for conditional move and
jump instructions. Bits 31-16 are flags accessible to system software only, and bits 63-32
are reserved.
Direction Flag (DF): Determines the order in which strings are processed. Setting the
flag to “1” means that the data pointer for the next string instruction will be decremented.
Setting the flag to “0” means that the data pointer for the next string instruction will be
incremented.
Carry Flag (CF): The carry flag gets set to “1” by the hardware when the last integer
addition results in a carry or the last integer subtraction results in a borrow out of the
most-significant bit position of the result. The flag gets cleared to zero otherwise. The
carry flag is also used in conjunction with shift and rotate instructions to shift bits of
operands. AND, OR and XOR logical instructions clear the carry flag. Finally, bit-test
instructions (BTx) set the value of the carry flag depending on the tested bit’s value.
Parity Flag (PF): The parity flag gets set to “1” by the hardware if the last result of
certain operations includes an even number of “1” bits in the least-significant byte.
Auxiliary Carry Flag (AF): The auxiliary carry flag gets set to “1” by the hardware if
the last binary-coded decimal operation results in a carry (during addition) or a borrow
(during subtraction) out of bit 3 of the result. The flag is cleared otherwise.
Zero Flag (ZF): The zero flag gets set to “1” by the hardware if the last arithmetic
operation results in a zero, otherwise the flag is cleared. Other instructions like compare
and test also affect this flag.
Sign Flag (SF): The sign flag gets set to “1” by the hardware if the last arithmetic
operation resulted in a negative value, otherwise the flag is cleared. The value of the sign
flag is equal to the value of the most significant bit of the result.
Overflow Flag (OF): The overflow flag gets set to “1” by hardware when the most-
significant (or sign) bit of the result of the last signed-integer operation differed from the
signs of the operands. The flag is otherwise cleared. This condition indicates that the
magnitude of the result of the operation is too big (overflow) or too small (underflow)
for its defined data type.
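The six status-flag rules described above can be checked against a small model. The
sketch below computes the flags for an 8-bit addition only; it assumes AF reflects a carry
out of bit 3 and that OF is set when both operands have the same sign and the result has
the opposite sign. The function name and the sample operands are hypothetical.

```python
def add8_flags(a: int, b: int) -> dict:
    """Add two 8-bit values and return the result with the six status flags."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    return {
        "result": result,
        "CF": int(total > 0xFF),                              # carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),           # even number of 1 bits
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),               # carry out of bit 3
        "ZF": int(result == 0),                               # result is zero
        "SF": result >> 7,                                    # copy of the sign bit
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0), # signed overflow
    }


print(add8_flags(0x7F, 0x01))   # largest positive signed byte plus one
print(add8_flags(0xFF, 0x01))   # unsigned wrap-around to zero
```

The first case overflows as a signed addition (OF set, CF clear), while the second
overflows only as an unsigned addition (CF set, OF clear), which is why the two flags are
kept separate.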
OPERAND DATA TYPES
Operands can be obtained from a variety of sources. These sources include registers,
memory, I/O ports, or immediate values. Operands obtained from an immediate source are
included as part of the instruction and do not require a memory, register, or I/O port
access.
Operands can have the following data types:
Double Quadword 16 bytes (64-bit mode only)
Quadword 8 bytes (64-bit mode only)
Doubleword 4 bytes
Word 2 bytes
Byte 1 byte
Packed BCD 1 byte divided into two 4-bit pieces
BCD Digit 1 byte holding a single 4-bit digit in its low-order half
Bit 1 bit
These operands can be interpreted as the following types of numbers and strings:
Signed (two’s-complement) integers
Binary Coded Decimal (BCD) digits
Packed BCD digits
Strings, including bit strings
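The point that one operand can be interpreted in several ways is easiest to see with a
single byte. The sketch below reads the same 8 bits as an unsigned integer, as a signed
(two’s-complement) integer, and as a packed BCD value with one decimal digit per 4-bit
half. The function names and the sample byte are hypothetical.

```python
def as_unsigned(byte: int) -> int:
    """Interpret the low 8 bits as an unsigned integer (0 to 255)."""
    return byte & 0xFF


def as_signed(byte: int) -> int:
    """Interpret the low 8 bits as a two's-complement integer (-128 to 127)."""
    byte &= 0xFF
    return byte - 0x100 if byte & 0x80 else byte


def as_packed_bcd(byte: int) -> int:
    """Interpret the low 8 bits as two BCD digits (high half, then low half)."""
    high, low = (byte >> 4) & 0xF, byte & 0xF
    if high > 9 or low > 9:
        raise ValueError("each 4-bit half must hold a decimal digit 0-9")
    return high * 10 + low


value = 0x93   # hypothetical operand
print(as_unsigned(value), as_signed(value), as_packed_bcd(value))
```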
CONCLUSION
The new AMD 64-bit processor (also referred to as “Opteron”) is a very complex digital
system. It would take a lot more than two weeks and ten pages to thoroughly describe all
of its characteristics in detail. In this report, my intent was to provide enough new
information about its new 64-bit mode of operation to be instructive and to hopefully
encourage the reader to seek more in-depth, related materials from other sources.
The report lists the main modes of operation of the processor and places emphasis on the
new 64-bit mode. It lists the 64-bit mode registers and briefly discusses the available
instruction subsets (General Purpose, 128-bit Media, 64-bit Media, and x87
Floating-Point) and their capabilities. Thereafter, the report gives a more detailed
description of the processor’s 64-bit mode memory management characteristics, stack,
status register, and available operand data types.
It is my personal opinion that, in the interest of maintaining backward compatibility with
the legacy x86 architecture, AMD has limited the capabilities of the new processor. I
believe that AMD should have followed Intel’s example and taken a dual-track approach: (1) to
enhance the legacy 32-bit x86 architecture processors to continue to support their existing
customer base, and (2) to create a brand new 64-bit processor with innovative new
features, without the legacy support, but with support for the upcoming 64-bit operating
systems and applications. The latter could have resulted in a simpler design with more
silicon available for the addition of new features or capabilities and perhaps a smaller
footprint. I see the new AMD 64-bit processor as a transitional processor, one that will
have a temporary lifespan.
Nevertheless, once the choice to remain backward compatible had been made, the AMD
processor design team did an excellent job of incorporating 64-bit support into their
existing line of 32-bit processors.
REFERENCES
1. “AMD x86-64 Architecture Programmer’s Manual Volume 1: Application
Programming”, Publication No. 24592, Revision 3.07, Advanced Micro Devices
(AMD), September 2002 (http://www.amd.com/us-
2. “AMD x86-64 Architecture Programmer’s Manual Volume 2: System
Programming”, Publication No. 24593, Revision 3.07, Advanced Micro Devices
(AMD), September 2002 (http://www.amd.com/us-
3. “AMD x86-64 Architecture Programmer’s Manual Volume 3: General-Purpose
and System Instructions”, Publication No. 24594, Revision 3.02, Advanced Micro
Devices (AMD), August 2002 (http://www.amd.com/us-
4. “AMD x86-64 Architecture Programmer’s Manual Volume 4: 128-Bit Media
Instructions”, Publication No. 26568, Revision 3.03, Advanced Micro Devices
(AMD), August 2002 (http://www.amd.com/us-
5. “AMD x86-64 Architecture Programmer’s Manual Volume 5: 64-Bit Media and
x87 Floating-Point Instructions”, Publication No. 26569, Revision 3.02,
Advanced Micro Devices (AMD), August 2002 (http://www.amd.com/us-
6. Internet Usenet Newsgroups: comp.sys.intel and alt.comp.hardware.amd