Programming for the Cell Broadband
Engine Architecture on the PlayStation 3
By Faisal Rabbani
Table Of Contents
CBEA - Cell Broadband Engine Architecture
  The PPE - Power Processing Element
  The SPE - Synergistic Processing Elements
Language Extension Examples for CBEA
  Vector Data Types
  Signals and Mailboxes
  MFC Direct Memory Access
  SPU Intrinsic Calls
Working with IBM Cell BE SDK 2.1
  Supported Platforms
  SDK Components
  Installing the SDK
  Additional Commands
The SDK and Yellow Dog Linux 5.02 (and PS3)
  Updating The GNU C Library
The “Hello World” App
  The Next Step
CBEA - Cell Broadband Engine Architecture
The Cell Broadband Engine is the first implementation of a new family of
microprocessors conforming to the Cell Broadband Engine Architecture (CBEA). The CBEA
is a new architecture that extends the 64-bit PowerPC Architecture. The CBEA and the
Cell Broadband Engine are the result of a collaboration between Sony, Toshiba, and IBM
(known as STI), formally started in early 2001.
The Cell Broadband Engine microprocessor comprises nine processors operating on
a shared, coherent memory. They are specialized into two types: the PowerPC
Processor Element (PPE) and the Synergistic Processor Element (SPE). The Cell
Broadband Engine has one PPE and eight SPEs.
Note that on the PlayStation 3 one of the SPEs is disabled and another is locked,
leaving six SPEs available to the programmer. The locked SPE is dedicated to
running the Game Operating System, and the disabled SPE improves chip yields
(i.e., a Cell BE chip does not have to be discarded if one of its SPEs is
defective).
The PPE - Power Processing Element
The PPE is a general-purpose, dual-threaded, 64-bit RISC processor that conforms to the
PowerPC Architecture (version 2.02) and supports the Vector/SIMD Multimedia
Extension (VMX). Programs written for the PowerPC 970 processor can run on the Cell
Broadband Engine without modification.
The PPE can run both 32-bit and 64-bit operating systems and applications. It has a 32 KB
L1 instruction cache, a 32 KB L1 data cache, a unified 512 KB L2 cache for data and
instructions, and thirty-two 128-bit vector registers (each holding sixteen 8-bit, eight
16-bit, or four 32-bit components).
The SPE - Synergistic Processing Elements
Each of the eight SPEs is a 128-bit RISC processor specialized for computationally
intensive, data-rich SIMD applications. Each SPE consists of an SPU (Synergistic
Processing Unit) and an MFC (Memory Flow Controller).
The SPU handles instruction control and execution. It includes a single register file
with 128 registers (each 128 bits wide), a unified (instructions and data) 256 KB
local store (LS), and a DMA (Direct Memory Access) interface.
The MFC contains a DMA controller that supports DMA transfers. Programs running on
the SPU, the PPE, or another SPU use the MFC’s DMA transfers to move instructions
and data between the SPU’s LS and main storage. A single DMA transfer can move up to
16 KB; for larger transfers the SPU can issue a DMA-list command, which can describe
up to 2048 DMA transfers (each one up to 16 KB in size).
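As a rough illustration of these limits, the following portable C sketch computes how many transfers a buffer needs and whether a single DMA-list command could describe it. It runs on any host and does not touch the MFC; the names DMA_MAX_BYTES, DMA_LIST_MAX_ELEMENTS, dma_transfers_needed, and fits_in_one_dma_list are hypothetical helpers introduced here, not part of the SDK.

```c
#include <stddef.h>

/* Limits quoted in the text above. */
#define DMA_MAX_BYTES         (16u * 1024u)  /* one DMA transfer: up to 16 KB */
#define DMA_LIST_MAX_ELEMENTS 2048u          /* transfers per DMA-list command */

/* Number of transfers of at most 16 KB needed to move `bytes` bytes. */
size_t dma_transfers_needed(size_t bytes)
{
    return (bytes + DMA_MAX_BYTES - 1) / DMA_MAX_BYTES;  /* round up */
}

/* Nonzero if a single DMA-list command could describe the whole buffer. */
int fits_in_one_dma_list(size_t bytes)
{
    return dma_transfers_needed(bytes) <= DMA_LIST_MAX_ELEMENTS;
}
```

By this arithmetic, one DMA-list command can cover at most 2048 x 16 KB = 32 MB.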
Because the MFC executes DMA commands autonomously while the SPU keeps executing
instructions, DMA transfers can be scheduled to overlap with computation and so hide
memory latency.
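One common way to schedule transfers this way is double buffering: request the next block of data before computing on the current one. The sketch below is only a host-side model of that ordering, written in portable C. Here memcpy stands in for a DMA GET and completes immediately, so nothing actually overlaps; on an SPU the fetch would be an asynchronous MFC command whose completion is awaited before the buffer is used. CHUNK, fetch_chunk, and sum_double_buffered are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 4  /* elements per simulated transfer; tiny for illustration */

/* Stand-in for a DMA GET: copies up to CHUNK ints from "main storage". */
static size_t fetch_chunk(int *local, const int *src, size_t remaining)
{
    size_t n = remaining < CHUNK ? remaining : CHUNK;
    memcpy(local, src, n * sizeof *local);
    return n;
}

/* Sums `count` ints using two local buffers: before buffer `cur` is
   summed, the next chunk has already been requested into the other one. */
long sum_double_buffered(const int *src, size_t count)
{
    int buf[2][CHUNK];
    size_t len[2];
    int cur = 0;
    long total = 0;
    size_t done = 0;

    len[cur] = fetch_chunk(buf[cur], src, count);       /* prime first buffer */
    while (len[cur] > 0) {
        int next = cur ^ 1;
        size_t fetched = done + len[cur];
        /* "Issue" the next transfer before computing on the current data. */
        len[next] = fetch_chunk(buf[next], src + fetched, count - fetched);
        for (size_t i = 0; i < len[cur]; i++)           /* compute step */
            total += buf[cur][i];
        done = fetched;
        cur = next;                                     /* swap buffers */
    }
    return total;
}
```

The point of the design is the order of operations inside the loop: the fetch for the next buffer is issued first, so on real hardware the compute step would run while that transfer is in flight.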
Note: storage of data and instructions in the Cell Broadband Engine is big-endian.
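To make the byte order concrete, here is a small portable C sketch that stores and loads a 32-bit value in big-endian order, that is, with the most significant byte at the lowest address. The helper names store_be32 and load_be32 are hypothetical; on the Cell itself no such conversion is needed for native data, but it can matter when exchanging data with a little-endian host.

```c
#include <stdint.h>

/* Store a 32-bit value into `out` in big-endian order:
   most significant byte at the lowest address. */
void store_be32(uint8_t out[4], uint32_t value)
{
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}

/* Read a big-endian 32-bit value back, independent of host byte order. */
uint32_t load_be32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```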
Language Extension Examples for CBEA
Vector Data Types
Though the PPU (the PowerPC Processing Unit of the PPE) has built-in support for VMX
(Vector/SIMD Multimedia Extension) instructions, certain vector types are available
only on the SPU, and others only on the PPU. The CBEA language extensions support the
following vector declarations for the SPU, the PPU, or both:
vector unsigned char: 16 8-bit unsigned chars - both
vector signed char: 16 8-bit signed chars - both
vector unsigned short: 8 16-bit unsigned half-words - both
vector signed short: 8 16-bit signed half-words - both
vector unsigned int: 4 32-bit unsigned words - both
vector signed int: 4 32-bit signed words - both
vector unsigned long long: 2 64-bit unsigned double-words - SPU only
vector signed long long: 2 64-bit signed double-words - SPU only
vector float: 4 32-bit single-precision floats - both
vector double: 2 64-bit double-precision floats - SPU only
vector bool char: 16 8-bit booleans, 0 (false) or 255 (true) - PPU only
vector bool short: 8 16-bit booleans, 0 (false) or 65535 (true) - PPU only
vector bool int: 4 32-bit booleans, 0 (false) or 2^32 - 1 (true) - PPU only
vector pixel: 8 16-bit unsigned half-words, 1/5/5/5 pixel format - PPU only
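Every type in this list occupies the same 128 bits; only the lane width changes. The following portable C sketch models that with a union and derives the lane count from the element size. It is a host-side illustration only: vec128_model and lanes_for_element_size are hypothetical names, and real code would use the vector keyword types listed above rather than this union.

```c
#include <stdint.h>

/* One 128-bit quantity viewed as different lane layouts. */
typedef union {
    uint8_t  u8[16];   /* like vector unsigned char:      16 x 8-bit  */
    uint16_t u16[8];   /* like vector unsigned short:      8 x 16-bit */
    uint32_t u32[4];   /* like vector unsigned int:        4 x 32-bit */
    uint64_t u64[2];   /* like vector unsigned long long:  2 x 64-bit */
    float    f32[4];   /* like vector float:               4 x 32-bit */
    double   f64[2];   /* like vector double:              2 x 64-bit */
} vec128_model;

/* Number of lanes for a given element size in bytes (1, 2, 4, or 8). */
unsigned lanes_for_element_size(unsigned element_bytes)
{
    return (unsigned)sizeof(vec128_model) / element_bytes;
}
```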
Both PPU VMX instructions and SPU vector instructions are supported by C/C++
language extensions that define vector data types and vector intrinsics (intrinsics are
commands in the form of C-language function calls mapped to one or more inline-
assembly instructions).
However, these extensions differ between the PPU and the SPU. For example, given
vectors v1, v2, and v3 of the same data type:
1) The vector addition intrinsic (which supports short, int, and float) on the PPU
looks like:
v3 = vec_add( v1, v2 );
and the addition intrinsic on the SPU looks like:
v3 = spu_add( v1, v2 );
2) The vector multiply intrinsic (which supports float and double vectors), by
contrast, exists only for the SPU, in the form:
v3 = spu_mul( v1, v2 );
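For readers without a Cell toolchain at hand, the lane-wise meaning of these intrinsics on float vectors can be modeled in plain C. The sketch below is a scalar stand-in, not the intrinsics themselves: vec4f_model, model_add, and model_mul are hypothetical names, and a real vector unit performs all four lane operations in a single instruction.

```c
/* Scalar model of a 4-lane single-precision vector. */
typedef struct {
    float lane[4];
} vec4f_model;

/* Lane-wise addition, the operation vec_add / spu_add perform on floats. */
vec4f_model model_add(vec4f_model a, vec4f_model b)
{
    vec4f_model r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

/* Lane-wise multiplication, the operation spu_mul performs on floats. */
vec4f_model model_mul(vec4f_model a, vec4f_model b)
{
    vec4f_model r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}
```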
Signals and Mailboxes
In addition to DMA transfers, the PPE can use signals to send information to an SPE,
and mailboxes to exchange information with the SPEs in both directions.
Signals are values sent on the signal-notification channels, which are inbound (to an
SPE) only. Each SPE has two 32-bit signal-notification registers, each of which has a
corresponding memory-mapped I/O (MMIO) register into which the sending processor
writes the signal-notification data.
Mailboxes are queues in an SPE’s MFC used to exchange 32-bit messages between the SPE
and the PPE or other devices. Two mailboxes (the SPU Write Outbound Mailbox and the
SPU Write Outbound Interrupt Mailbox) are provided for sending messages from the
SPE. One mailbox (the SPU Read Inbound Mailbox) is provided for sending messages to
the SPE.
In the following example, the SPU reads the channel count register to check for inbound
mailbox messages before invoking a (blocking) read command to read the message from
the register. The SPU then writes to the outbound channel the same message incremented
by one:
if( spu_readchcnt( SPU_RdInMbox ) )
{
    unsigned int var = spu_readch( SPU_RdInMbox );
    spu_writech( SPU_WrOutMbox, ++var );
}
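The check-then-read pattern above can be tried on any host with a small queue model. The sketch below is not SDK code: mailbox_model, mailbox_write, mailbox_read, and echo_plus_one are hypothetical names, and the four-entry depth is an assumption made for the model. The count field plays the role of the channel count that spu_readchcnt() returns.

```c
#include <assert.h>
#include <stdint.h>

#define INBOX_DEPTH 4  /* assumed depth of the modeled inbound queue */

typedef struct {
    uint32_t slot[INBOX_DEPTH];
    unsigned head;   /* index of the oldest message */
    unsigned count;  /* messages waiting (models the channel count) */
} mailbox_model;

/* Sender side: returns 1 on success, 0 if the queue is full. */
int mailbox_write(mailbox_model *mb, uint32_t msg)
{
    if (mb->count == INBOX_DEPTH)
        return 0;
    mb->slot[(mb->head + mb->count) % INBOX_DEPTH] = msg;
    mb->count++;
    return 1;
}

/* Receiver side: the caller must check count first, as the SPU code does. */
uint32_t mailbox_read(mailbox_model *mb)
{
    uint32_t msg;
    assert(mb->count > 0);
    msg = mb->slot[mb->head];
    mb->head = (mb->head + 1) % INBOX_DEPTH;
    mb->count--;
    return msg;
}

/* Mirrors the SPU snippet: if a message is waiting, reply with message + 1.
   Returns 1 if a reply was produced, 0 if the inbox was empty. */
int echo_plus_one(mailbox_model *inbox, uint32_t *reply)
{
    if (inbox->count == 0)
        return 0;
    *reply = mailbox_read(inbox) + 1;
    return 1;
}
```

Checking the count before reading is what keeps the receiver from blocking: in the SPU code, spu_readch() on an empty inbound mailbox would stall until a message arrives.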
MFC Direct Memory Access
SPEs rely on asynchronous DMA transfers to hide memory latency and transfer overhead
by moving data in parallel with Synergistic Processing Unit (SPU) computation. In
the following code, the SPE issues a DMA GET command to receive 4 KB (4096 bytes)
of data, and performs computations while waiting for the DMA transfer to
complete. (The SPU receives the effective address via mailbox.)
unsigned int tag_id = 31;
unsigned int tag_mask = 1
#include <stdio.h>

int main( unsigned long long id,
          unsigned long long argp,
          unsigned long long envp )
{
    /*
       The first parameter of an SPU program is always the
       SPE id assigned by the PPU when the program is loaded.
       The second and third parameters are optional and are
       passed by the PPU.
    */
    printf( "Hello Cell (0x%llx)!\n", id );
    return 0;
}
4) Copy the following source to a file labeled “Makefile” and save it in the “spu”
directory as well:
PROGRAM_spu := spu_main
LIBRARY_embed := lib_spu_main.a
include /opt/ibm/cell-sdk/prototype/make.footer
5) Change directory back to the “hello” directory:
cd ..
6) Copy the following source-code to a file labeled “main.c” and save it in this
directory:
#include
#include
#include
#include
#include <libspe2.h> // CBEA SPE programming
#include <pthread.h> // POSIX threads programming
/* Handle to our SPU program */
extern spe_program_handle_t spu_main;
#define MAX_SPUS 16
/* Arguments passed to the thread function */
typedef struct
{
    spe_context_ptr_t ctx;
    unsigned int entry;
    int runflags;
    void* argp;
    void* envp;
    spe_stop_info_t stop_info;
} thread_arg_t;
pthread_t threads[ MAX_SPUS ]; //SPE thread
thread_arg_t thread_args[ MAX_SPUS ]; //SPE thread’s argument
/* spe thread function –will invoke the spu program */
void* thread_function( void *arg )
{
/*
spe_context_run() is a blocking call so we will
run each spe’s context in a separate thread
*/
thread_arg_t* p = (thread_arg_t*)arg;
if( spe_context_run( p->ctx, &p->entry, p->runflags,
p->argp, p->envp, &p->stop_info ) MAX_SPUS )
num_spus = MAX_SPUS;
for( i = 0; i
#include <ppu_intrinsics.h> //for low-level OS instructions
#include <altivec.h> //for VMX intrinsics
And the SPE sample code can access SPU intrinsics by including multiple header files in
the SPE source file:
#include <spu_intrinsics.h> //for vector/channel manipulation
#include <spu_mfcio.h> //for MFC composite intrinsic calls
The Next Step
IBM’s developerWorks website contains a wealth of CBEA resources, such as
tutorials, articles, technical documents, and sample code, all available for free at
http://www.ibm.com/developerworks/power/.