Shared by: Kerala g | Posted: 12/7/2011 | Pages: 14

Programming for the Cell Broadband Engine Architecture on the PlayStation 3



By Faisal Rabbani

Table Of Contents

CBEA - Cell Broadband Engine Architecture
    The PPE - Power Processing Element
    The SPE - Synergistic Processing Elements

Language Extension Examples for CBEA
    Vector Data Types
    Signals and Mailboxes
    MFC Direct Memory Access
    SPU Intrinsic Calls

Working with IBM Cell BE SDK 2.1
    Supported Platforms
    SDK Components
    Installing the SDK
    Additional Commands

The SDK and Yellow Dog Linux 5.02 (and PS3)
    Updating The GNU C Library

The “Hello World” App
    The Next Step


CBEA - Cell Broadband Engine Architecture





The Cell Broadband Engine is the first implementation of a new family of microprocessors conforming to the Cell Broadband Engine Architecture (CBEA). The CBEA is a new architecture that extends the 64-bit PowerPC Architecture. The CBEA and the Cell Broadband Engine are the result of a collaboration between Sony, Toshiba, and IBM (known as STI), formally started in early 2001.



The Cell Broadband Engine microprocessor comprises nine processors operating on a shared, coherent memory. The processors are specialized into two types: the PowerPC Processor Element (PPE) and the Synergistic Processor Element (SPE). The Cell Broadband Engine has one PPE and eight SPEs.



Note that on the PlayStation 3 one of the SPEs is disabled and another is locked, leaving six SPEs available to the programmer. The locked SPE is dedicated to running the Game Operating System, and the disabled SPE improves chip yields (i.e., a Cell BE chip does not have to be discarded if one of its SPEs is defective).







The PPE - Power Processing Element



The PPE is a general-purpose, dual-threaded, 64-bit RISC processor that conforms to the

PowerPC Architecture (version 2.02) supporting the Vector/SIMD Multimedia

Extensions (VMX). (Programs written for the PowerPC 970 processor can run on the Cell

Broadband Engine without modification).



The PPE can run both 32-bit and 64-bit operating systems and applications. It has a 32 KB L1 instruction cache, a 32 KB L1 data cache, a unified 512 KB L2 cache for data and instructions, and thirty-two 128-bit vector registers (each holding 16 8-bit, 8 16-bit, or 4 32-bit components).







The SPE - Synergistic Processing Elements



Each of the 8 SPEs is a 128-bit RISC processor specialized for computationally intensive, data-rich SIMD applications. Each SPE comprises an SPU (Synergistic Processor Unit) and an MFC (Memory Flow Controller).

The SPU deals with instruction control and execution. It includes a single register file with 128 registers (each 128 bits wide), a unified (instructions and data) 256 KB local store (LS), and a DMA (Direct Memory Access) interface.



The MFC contains a DMA controller that supports DMA transfers. Programs running on the SPU, the PPE, or another SPU use the MFC’s DMA transfers to move instructions and data between the SPU’s LS and main storage. A single DMA transfer can be up to 16 KB in size, but the SPU can also issue DMA-list commands, each of which can represent up to 2048 DMA transfers (each one up to 16 KB in size).
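
The size limits above translate into simple arithmetic when a buffer larger than 16 KB has to be moved. The following is a minimal sketch in portable C, not SDK code: the names dma_chunk_count, dma_chunk_size, and dma_fits_one_list are hypothetical helpers introduced only for illustration, and the sketch ignores the alignment and size rules that real MFC transfers must also obey.

```c
#include <stddef.h>

/* Limits quoted in the text above. */
#define DMA_MAX_TRANSFER_BYTES (16u * 1024u) /* one DMA transfer: up to 16 KB */
#define DMA_LIST_MAX_ELEMENTS  2048u         /* one DMA list: up to 2048 transfers */

/* Number of transfers of at most 16 KB needed to move `bytes` bytes. */
size_t dma_chunk_count(size_t bytes)
{
    return (bytes + DMA_MAX_TRANSFER_BYTES - 1) / DMA_MAX_TRANSFER_BYTES;
}

/* Size in bytes of chunk number `index` (0-based), or 0 if past the end. */
size_t dma_chunk_size(size_t bytes, size_t index)
{
    size_t offset = index * DMA_MAX_TRANSFER_BYTES;
    if (offset >= bytes)
        return 0;
    size_t remaining = bytes - offset;
    return remaining < DMA_MAX_TRANSFER_BYTES ? remaining
                                              : DMA_MAX_TRANSFER_BYTES;
}

/* Nonzero if the whole buffer can be described by a single DMA list. */
int dma_fits_one_list(size_t bytes)
{
    return dma_chunk_count(bytes) <= DMA_LIST_MAX_ELEMENTS;
}
```

For example, a 40000-byte buffer needs three transfers (16384, 16384, and 7232 bytes), and a single DMA list can describe at most 2048 x 16 KB = 32 MB.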



Because the MFC executes DMA commands autonomously while the SPU continues to execute instructions, DMA transfers can be scheduled to hide memory latency.



Note: storage of data and instructions in the Cell Broadband Engine is big-endian.
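
Byte order matters as soon as data is exchanged with a little-endian machine (for example, files produced on a PC). The sketch below, in portable C, shows what big-endian storage of a 32-bit word means; store_be32, load_be32, and host_is_big_endian are hypothetical helper names, not part of the SDK.

```c
#include <stdint.h>

/* Write `value` into out[0..3] most significant byte first (big-endian). */
void store_be32(uint32_t value, uint8_t out[4])
{
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)(value);
}

/* Rebuild a 32-bit value from four bytes stored in big-endian order. */
uint32_t load_be32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/* Returns 1 if the machine running this code stores integers big-endian. */
int host_is_big_endian(void)
{
    const uint32_t probe = 1;
    return *(const uint8_t *)&probe == 0;
}
```

Storing 0x12345678 with store_be32 yields the bytes 0x12, 0x34, 0x56, 0x78 in increasing address order, which is the layout the Cell Broadband Engine uses natively.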









Language Extension Examples for CBEA







Vector Data Types



Though the PPU (the PowerPC Processor Unit of the PPE) has built-in support for VMX (Vector/SIMD Multimedia Extension) instructions, certain vector types are available only to the SPU. The CBEA language extensions support the following vector declarations for the SPU, the PPU, or both:



• vector unsigned char: 16 8-bit unsigned chars - both
• vector signed char: 16 8-bit signed chars - both
• vector unsigned short: 8 16-bit unsigned half-words - both
• vector signed short: 8 16-bit signed half-words - both
• vector unsigned int: 4 32-bit unsigned words - both
• vector signed int: 4 32-bit signed words - both
• vector unsigned long long: 2 64-bit unsigned double-words - SPU only
• vector signed long long: 2 64-bit signed double-words - SPU only
• vector float: 4 32-bit single-precision floats - both
• vector double: 2 64-bit double-precision floats - SPU only
• vector bool char: 16 8-bit booleans, 0 (false) or 255 (true) - PPU only
• vector bool short: 8 16-bit booleans, 0 (false) or 65535 (true) - PPU only
• vector bool int: 4 32-bit booleans, 0 (false) or 2^32 - 1 (true) - PPU only
• vector pixel: 8 16-bit unsigned half-words, 1/5/5/5 pixel - PPU only


Both PPU VMX instructions and SPU vector instructions are supported by C/C++

language extensions that define vector data types and vector intrinsics (intrinsics are

commands in the form of C-language function calls mapped to one or more inline-

assembly instructions).



However, these extensions differ between the PPU and the SPU. For example, given vectors v1, v2, and v3 of the same data type:

1) The vector addition intrinsic (which supports short, int, and float) on the PPU looks like:
    v3 = vec_add( v1, v2 )

And the addition intrinsic on the SPU looks like:
    v3 = spu_add( v1, v2 )

2) The vector multiply intrinsic (which operates on floating-point vectors), by contrast, exists only for the SPU, in the form:
    v3 = spu_mul( v1, v2 )
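
To see what such an element-wise intrinsic computes without a Cell toolchain, the same operations can be sketched with the generic vector extension of GCC and Clang. This is an assumption-laden stand-in, not the Cell API: the type vec4f and the functions add4 and mul4 are hypothetical names, and vector_size is a compiler extension rather than part of the CBEA language extensions.

```c
/* GCC/Clang generic vector extension: four 32-bit floats in one
   128-bit value. This stands in for the Cell type `vector float`. */
typedef float vec4f __attribute__((vector_size(16)));

/* Element-wise addition, the operation vec_add / spu_add perform. */
vec4f add4(vec4f v1, vec4f v2)
{
    return v1 + v2;
}

/* Element-wise multiplication, the operation spu_mul performs. */
vec4f mul4(vec4f v1, vec4f v2)
{
    return v1 * v2;
}
```

With v1 = {1, 2, 3, 4} and v2 = {10, 20, 30, 40}, add4 returns {11, 22, 33, 44} and mul4 returns {10, 40, 90, 160}: each of the four lanes is processed independently.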





Signals and Mailboxes



In addition to DMA transfers, the PPE can use signals to send information to an SPE, and mailboxes to send information to and receive information from the SPEs.

Signals are information sent on the signal-notification channels. These channels are inbound (to an SPE) registers only. Each SPE has two 32-bit signal-notification registers, each of which has a corresponding memory-mapped I/O (MMIO) register into which the signal-notification data is written by the sending processor.

Mailboxes are queues in an SPE’s MFC for exchanging 32-bit messages between the SPE and the PPE or other devices. Two mailboxes (the SPU Write Outbound Mailbox and the SPU Write Outbound Interrupt Mailbox) are provided for sending messages from the SPE. One mailbox (the SPU Read Inbound Mailbox) is provided for sending messages to the SPE.



In the following example, the SPU reads the channel count register to check for inbound mailbox messages before invoking a (blocking) read command to read the message from the register. The SPU then writes the same message, incremented by one, to the outbound mailbox channel:

if( spu_readchcnt( SPU_RdInMbox ) )
{
    unsigned int var = spu_readch( SPU_RdInMbox );
    spu_writech( SPU_WrOutMbox, ++var );
}
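
The check-then-read pattern above can be tried out on any machine with a small software model of a mailbox queue. The sketch below is a simulation only: mailbox_t, mailbox_put, mailbox_get, and echo_plus_one are hypothetical names, and the queue depth of four entries is an assumption that is not stated in this document.

```c
#include <stdint.h>

#define INBOUND_DEPTH 4 /* assumption: four-entry inbound mailbox queue */

typedef struct {
    uint32_t entries[INBOUND_DEPTH];
    unsigned head;  /* index of the oldest waiting message */
    unsigned count; /* messages waiting, like the channel count */
} mailbox_t;

/* Sender side: returns 1 if the message was queued, 0 if the queue is full. */
int mailbox_put(mailbox_t *mb, uint32_t msg)
{
    if (mb->count == INBOUND_DEPTH)
        return 0;
    mb->entries[(mb->head + mb->count) % INBOUND_DEPTH] = msg;
    mb->count++;
    return 1;
}

/* Receiver side: the caller must check mb->count before calling. */
uint32_t mailbox_get(mailbox_t *mb)
{
    uint32_t msg = mb->entries[mb->head];
    mb->head = (mb->head + 1) % INBOUND_DEPTH;
    mb->count--;
    return msg;
}

/* Mirrors the SPU snippet: if a message is waiting, read it and store
   the value incremented by one in *reply. Returns 1 if it replied. */
int echo_plus_one(mailbox_t *inbound, uint32_t *reply)
{
    if (inbound->count) {
        uint32_t var = mailbox_get(inbound);
        *reply = ++var;
        return 1;
    }
    return 0;
}
```

Checking the count first is what keeps the real SPU from stalling: a read from an empty inbound mailbox channel blocks until a message arrives.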







MFC Direct Memory Access



SPEs rely on asynchronous DMA transfers to hide memory latency and transfer overhead by moving information in parallel with Synergistic Processor Unit (SPU) computation. In the following code, the SPE issues a DMA GET command to receive 4 KB (4096 bytes) of information and performs computations while waiting for the DMA transfer to complete. (The SPU receives the effective address via mailbox.)



unsigned int tag_id = 31;
unsigned int tag_mask = 1 << tag_id;



#include <stdio.h>

int main( unsigned long long id,
          unsigned long long argp,
          unsigned long long envp )
{
    /*
       The first parameter of an SPU program is always the
       SPE id assigned by the PPU when the program is loaded.

       The second and third parameters are optional, and are
       passed by the PPU.
    */

    printf( "Hello Cell (0x%llx)!\n", id );
    return 0;
}



4) Copy the following source to a file labeled “Makefile” and save it in the “spu”

directory as well:



PROGRAM_spu := spu_main

LIBRARY_embed := lib_spu_main.a



include /opt/ibm/cell-sdk/prototype/make.footer



5) Change directory back to the “hello” directory:

    cd ..



6) Copy the following source-code to a file labeled “main.c” and save it in this

directory:



#include

#include

#include

#include

#include <libspe2.h> // CBEA SPE programming
#include <pthread.h> // POSIX threads programming



/* Handle to our SPU program */
extern spe_program_handle_t spu_main;

#define MAX_SPUS 16

/* Arguments passed to the thread function */
typedef struct
{
    spe_context_ptr_t ctx;
    unsigned int entry;
    int runflags;
    void* argp;
    void* envp;
    spe_stop_info_t stop_info;
} thread_arg_t;

pthread_t threads[ MAX_SPUS ];        // SPE threads
thread_arg_t thread_args[ MAX_SPUS ]; // SPE thread arguments

/* SPE thread function - will invoke the SPU program */
void* thread_function( void *arg )
{
    /*
       spe_context_run() is a blocking call, so we
       run each SPE's context in a separate thread
    */

    thread_arg_t* p = (thread_arg_t*)arg;

if( spe_context_run( p->ctx, &p->entry, p->runflags,

p->argp, p->envp, &p->stop_info ) MAX_SPUS )

num_spus = MAX_SPUS;



for( i = 0; i

#include <ppu_intrinsics.h> //for low-level OS instructions
#include <altivec.h> //for VMX intrinsics



And the SPE sample code can access SPU intrinsics by including multiple header files in

the SPE source file:

#include <spu_intrinsics.h> //for vector/channel manipulation
#include <spu_mfcio.h> //for MFC composite intrinsic calls









The Next Step



IBM’s developerWorks website contains a wealth of CBEA resources such as tutorials, articles, technical documents, and sample code, all available for free at http://www.ibm.com/developerworks/power/.


