Bitwidth Analysis with Application to Silicon Compilation

Amit Chaudhari

Paper by Mark Stephenson*, Jonathan Babb+, Saman Amarasinghe*
*MIT Laboratory for Computer Science
+Princeton
At the ACM SIGPLAN Conference on Programming Language Design and Implementation, Vancouver, British Columbia, June 2000

Goal
• For a program written in a high-level language, automatically find the minimum number of bits needed to represent:
  – Each static variable in the program
  – Each operation in the program

Usefulness of Bitwidth Analysis
• Higher Language Abstraction
• Enables other compiler optimizations
  1. Synthesizing application-specific processors
  2. Optimizing for power-aware processors
  3. Extracting more parallelism for SIMD processors

Bitwidth Opportunities
• Runtime profiling reveals plenty of bitwidth opportunities.
• For the SPECint95 benchmark suite:
  – Over 50% of operands use less than half the number of bits specified by the programmer.

Analysis Constraints
• Bitwidth results must maintain program correctness for all input data sets
  – Results are not runtime/data dependent
• A static analysis can do very well, even in light of this constraint

Bitwidth Extraction
• Use abundant hints in the source language to discover bitwidths with near-optimal precision.
• Caveats
  – Analysis limited to fixed-point variables.
  – The hints assume source program correctness.

The Hints
• Bitwidth refining constructs
  1. Arithmetic operations
  2. Boolean operations
  3. Bitmask operations
  4. Loop induction variable bounding
  5. Clamping operations
  6. Type castings
  7. Static array index bounding

1. Arithmetic Operations
• Example
    int a; unsigned b;
    a = random();
    b = random();          a: 32 bits   b: 32 bits
    a = a / 2;             a: 31 bits   b: 32 bits
    b = b >> 4;            a: 31 bits   b: 28 bits

2. Boolean Operations
• Example
    int a;                 a: 32 bits
    a = (b != 15);         a: 1 bit

3. Bitmask Operations
• Example
    int a;                 a: 32 bits
    a = random() & 0xff;   a: 8 bits

4. Loop Induction Variable Bounding
• Applicable to for-loop induction variables.
• Example
    int i;                         i: 32 bits
    for (i = 0; i < 6; i++) {      i: 3 bits
        …
    }                              i: 3 bits

5. Clamping Optimization
• Multimedia codes often simulate saturating instructions.
• Example
    int valpred;                   valpred: 32 bits
    if (valpred > 32767)
        valpred = 32767;
    else if (valpred < -32768)
        valpred = -32768;          valpred: 16 bits

6. Type Casting (Part I)
• Example
    int a; char b;                 a: 32 bits   b: 8 bits
    a = b;                         a: 8 bits    b: 8 bits

6. Type Casting (Part II)
• Example
    int a; char b;                 a: 32 bits   b: 8 bits
    b = a;                         a: 8 bits    b: 8 bits

7. Array Index Optimization
• The bitwidth of an array index can be bounded using the bounds of the array.
• Example
    int a, b; int X[1024];         a: 32 bits   b: 32 bits
    X[a] = X[4*b];                 a: 10 bits   b: 8 bits

Propagating Data-Ranges
• Data-flow analysis
• Three candidate lattices
  – Bitwidth
  – Vector of bits
  – Data-ranges
• The same statement, a = a + 1, under each lattice:
    Lattice           Before        After
    Bitwidth          a: 4 bits     a: 5 bits
    Vector of bits    a: 1X         a: XXX
    Data-ranges       a: <0,13>     a: <1,14>   (four bits are required)
• Propagate data-ranges forward and backward over the control-flow graph using transfer functions described in the paper
• Use Static Single Assignment (SSA) form with extensions to:
  – Gracefully handle pointers and arrays.
  – Extract data-range information from conditional statements.
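To make the data-range lattice a little more concrete, the following is a minimal C sketch of a forward transfer function for addition, together with the bitwidth implied by a range. The names DataRange, range_add, and bits_required are hypothetical and are not taken from the Bitwise compiler; the transfer functions described in the paper are not reproduced here, and overflow of the bounds themselves is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* A data-range <lo, hi>: every value of the variable lies in [lo, hi]. */
typedef struct {
    int64_t lo;
    int64_t hi;
} DataRange;

/* Forward transfer function for c = a + b: add the bounds pairwise.
 * Assumption: the 64-bit bounds do not overflow (not checked here). */
DataRange range_add(DataRange a, DataRange b) {
    DataRange r = { a.lo + b.lo, a.hi + b.hi };
    return r;
}

/* Number of bits needed to hold every value in r.
 * Non-negative ranges are treated as unsigned; ranges that contain
 * negative values get a two's-complement width (sign bit included). */
int bits_required(DataRange r) {
    assert(r.lo <= r.hi);
    int bits = 1;
    if (r.lo >= 0) {
        /* Grow the width until no set bit of hi remains above it. */
        while (bits < 64 && ((uint64_t)r.hi >> bits) != 0) {
            bits++;
        }
        return bits;
    }
    /* Signed case: need -2^(bits-1) <= lo and hi <= 2^(bits-1) - 1. */
    while (bits < 64) {
        int64_t min = -((int64_t)1 << (bits - 1));
        int64_t max = ((int64_t)1 << (bits - 1)) - 1;
        if (r.lo >= min && r.hi <= max) {
            break;
        }
        bits++;
    }
    return bits;
}
```

With a = <0,13>, range_add(a, <1,1>) yields <1,14> and bits_required reports 4 bits, whereas propagating bitwidths alone would have grown a from 4 bits to 5. The same helper gives 16 bits for the clamped range <-32768,32767> and 3 bits for a loop index in <0,5>.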
Example of Data-Range Propagation
    a0 = input()
    a1 = a0 + 1
    if (a1 < 0)
        true branch:   a2 = a1:(a1<0)
                       a3 = a2 + 1
        false branch:  a4 = a1:(a1≥0)
                       c0 = a4
    a5 = φ(a3,a4)
    b0 = array[a5]
• a1:(a1<0) and a1:(a1≥0) are range-refinement functions.
• Ranges shown for each statement (the array's bounds are [0:9]):
    Statement            Forward         After backward propagation
    a0 = input()         <-128, 127>     <-2, 8>
    a1 = a0 + 1          <-127, 127>     <-1, 9>
    a2 = a1:(a1<0)       <-127, -1>      <-1, -1>
    a3 = a2 + 1          <-126, 0>       <0, 9>
    a4 = a1:(a1≥0)       <0, 127>        <0, 9>
    c0 = a4              <0, 127>        <0, 9>
    a5 = φ(a3,a4)        <-126, 127>     <0, 9>

What to do with Loops?
• Finding the fixed point around back edges will often saturate data-ranges.
• Instructions in loops make up the bulk of dynamically executed instructions!

Their Loop Solution
• Find the closed-form solutions to commonly occurring sequences.
  – A sequence is a mutually dependent group of instructions.
• Use the closed-form solutions to determine final ranges.

Finding the Closed-Form Solution
    a = 0                      <0,0>
    for i = 1 to 10
        a = a + 1              <1,460>
        for j = 1 to 10
            a = a + 2          <3,480>
        for k = 1 to 10
            a = a + 3          <24,510>
    ... = a + 4                <510,510>
• Non-trivial to find the exact ranges
• Can easily find conservative range of <0,510>

Solving the Linear Sequence
    a = 0
    for i = 1 to 10            <1,10>
        a = a + 1
        for j = 1 to 10        <1,100>
            a = a + 2
        for k = 1 to 10        <1,100>
            a = a + 3
    ... = a + 4
• Figure out the iteration count of each loop.
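As a rough illustration of the closed-form idea on this example, the C sketch below derives each instruction's iteration-count range from the trip counts of its enclosing loops, multiplies it by the instruction's increment, sums the results, and takes the union with the initial value. The names Increment, iteration_range, and solve_linear_sequence are hypothetical. The sketch assumes non-negative increments, and for a nonzero initial value it offsets the summed contributions by that value before taking the union, which is an assumption of this sketch (the slides only show the case a = 0).

```c
#include <assert.h>
#include <stddef.h>

/* A data-range <lo, hi>. */
typedef struct {
    long lo;
    long hi;
} DataRange;

/* One "a = a + step" instruction inside a loop nest (hypothetical type).
 * trips[] holds the trip count of each enclosing loop, outermost first. */
typedef struct {
    long step;
    const long *trips;
    size_t depth;
} Increment;

/* Iteration-count range <1, product of the enclosing trip counts>. */
DataRange iteration_range(const long *trips, size_t depth) {
    DataRange r = { 1, 1 };
    for (size_t i = 0; i < depth; i++) {
        assert(trips[i] >= 1);
        r.hi *= trips[i];
    }
    return r;
}

/* Conservative range of the variable over the whole sequence. */
DataRange solve_linear_sequence(long init, const Increment *incs, size_t n) {
    DataRange sum = { 0, 0 };
    for (size_t i = 0; i < n; i++) {
        assert(incs[i].step >= 0); /* sketch handles non-negative steps only */
        DataRange it = iteration_range(incs[i].trips, incs[i].depth);
        /* Contribution of this instruction: <it.lo, it.hi> * <step, step>. */
        sum.lo += it.lo * incs[i].step;
        sum.hi += it.hi * incs[i].step;
    }
    /* Assumption: offset by the initial value, then union with it. */
    DataRange out = { init + sum.lo, init + sum.hi };
    if (init < out.lo) out.lo = init;
    if (init > out.hi) out.hi = init;
    return out;
}
```

For the loop nest above, the increments 1, 2, and 3 sit under trip counts {10}, {10, 10}, and {10, 10}, which gives iteration ranges <1,10>, <1,100>, and <1,100>, and an overall range of <0,510> when a starts at 0.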
Solving the Linear Sequence
    a = 0
    for i = 1 to 10            <1,10>
        a = a + 1              <1,10>*<1,1>=<1,10>
        for j = 1 to 10        <1,100>
            a = a + 2          <1,100>*<2,2>=<2,200>
        for k = 1 to 10        <1,100>
            a = a + 3          <1,100>*<3,3>=<3,300>
    ... = a + 4
• Find out how much each instruction contributes to the sequence using its iteration count.
• Sum all the contributions together, and take the data-range union with the initial value:
    (<1,10>+<2,200>+<3,300>) ∪ <0,0> = <0,510>

Results
• Standalone Bitwise compiler.
  – Bits cut from scalar variables
  – Bits cut from array variables
• With the DeepC silicon compiler.

[Figure: Percentage of Original Scalar Bits. Percentage of bits remaining (0 to 100) per benchmark, with Bitwise versus dynamic profile. Benchmarks: softfloat, adpcm, bubblesort, life, intmatmul, jacobi, median, mpegcorr, convolve, histogram, intfir, parity, pmatch, sor.]

[Figure: Percentage of Original Array Bits. Percentage of bits remaining (0 to 100) per benchmark, with Bitwise versus dynamic profile. Same benchmarks as above.]

DeepC Compiler Targeted to FPGAs
    C/Fortran program
    → Suif Frontend
    → Pointer alias and other high-level analyses
    → Bitwidth Analysis
    → Raw parallelization
    → MachSuif Codegen
    → DeepC specialization
    → Verilog
    → Traditional CAD optimizations
    → Physical Circuit

FPGA Area
[Figure: Area (CLB count, 0 to 2000) per benchmark, without Bitwise versus with Bitwise. Benchmark (main datapath width): adpcm (8), bubblesort (32), convolve (16), histogram (16), intfir (32), intmatmul (16), jacobi (8), life (1), median (32), mpegcorr (16), newlife (1), parity (32), pmatch (32), sor (32).]
• On average, the bitwidth-optimized circuit used 57% less area.

FPGA Clock Speed
[Figure: XC4000-09 clock speed (MHz, 0 to 150; 50 MHz target) per benchmark, without Bitwise versus with Bitwise. Benchmarks: adpcm, bubblesort, convolve, histogram, intfir, intmatmul, jacobi, life, median, mpegcorr, newlife, parity, pmatch, sor.]

Power Savings
[Figure: Average dynamic power (mW, 0 to 5) for bubblesort, histogram, jacobi, and pmatch, without bitwidth analysis versus with bitwidth analysis.]
• On average, the analysis reduced power by 50%.
• C → ASIC
  – IBM SA27E process
    • 0.15 micron drawn
  – 200 MHz
• Methodology
  – C → RTL
  – RTL simulation → register switching activity
  – Synthesis reports dynamic power

Summary
• Bitwise: a scalable bitwidth analyzer
  – Standard data-flow analysis
  – Loop analysis
  – Incorporate pointer analysis
• Demonstrated savings when targeting silicon from high-level languages
  – 57% less area
  – up to 86% improvement in clock speed
  – less than 50% of the power

Thank You
