Pre-Silicon Simulation of Multi-Core Benchmarks



Shubu Mukherjee
Principal Engineer
Director, SPEARS Group
Intel Corporation



Panel in Symposium on Workload Characterization, Sep 27, 2007
Detailed Model Good for Core Analysis
[Figure: diagram labeled Socket, Core, Uncore]




Single-core simulation model executes ~12 milliseconds
of a real machine’s execution
•    Assumes core model speed = 1 KIPS (kilo simulated instructions per second)
•    Assumes each simulation run takes about 10 hours
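The ~12 ms figure can be checked with a short back-of-the-envelope calculation. The sketch below uses the two assumptions from the slide (1 KIPS, 10-hour run). The clock frequency and IPC of the real machine are not given on the slide; the values `CLOCK_HZ = 3e9` and `IPC = 1.0` are illustrative assumptions chosen because they reproduce the stated number.

```python
# Values stated on the slide
SIM_SPEED_IPS = 1_000   # 1 KIPS: simulated instructions per wall-clock second
RUN_HOURS = 10          # length of one simulation run

# Assumptions for illustration only (not stated on the slide)
CLOCK_HZ = 3e9          # hypothetical real-machine clock frequency
IPC = 1.0               # hypothetical instructions per cycle on the real machine


def simulated_instructions(sim_speed_ips, run_hours):
    """Total instructions simulated in one run."""
    return sim_speed_ips * run_hours * 3600


def real_time_covered_ms(instructions, clock_hz, ipc):
    """Real-machine execution time (ms) that those instructions represent."""
    return instructions / (clock_hz * ipc) * 1e3


insts = simulated_instructions(SIM_SPEED_IPS, RUN_HOURS)
covered_ms = real_time_covered_ms(insts, CLOCK_HZ, IPC)
print(f"{insts:,} instructions cover about {covered_ms:.1f} ms of real execution")
```

Under these assumptions, a 10-hour run simulates 36 million instructions, which a real core would retire in roughly 12 ms.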




Four-Socket Platform Model Too Slow




 •   1-socket simulation model executes ~1-3 milliseconds of a real
     machine’s execution
 •   4-socket simulation model executes only hundreds of microseconds of
     a real machine’s execution (recall that disk latency is in milliseconds)
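One way to read these numbers: if a sequential simulator has a fixed instruction budget per run, that budget is shared among all simulated cores, so the real time covered per core shrinks as cores are added. The sketch below starts from the ~12 ms single-core figure quoted earlier in the deck. The core count per socket (`cores_per_socket=4`) is a hypothetical value, and the model ignores uncore and interconnect simulation cost, so it is an optimistic estimate rather than the author's calculation.

```python
SINGLE_CORE_COVERAGE_MS = 12.0  # single-core figure quoted earlier in the deck


def coverage_per_core_us(single_core_ms, sockets, cores_per_socket):
    """Real time covered per core (microseconds) when a fixed simulation
    budget is divided evenly among all simulated cores.
    Assumes no extra cost for uncore or interconnect simulation."""
    total_cores = sockets * cores_per_socket
    return single_core_ms * 1000.0 / total_cores


for sockets in (1, 4):
    # cores_per_socket=4 is a hypothetical configuration
    us = coverage_per_core_us(SINGLE_CORE_COVERAGE_MS, sockets, cores_per_socket=4)
    print(f"{sockets}-socket model: about {us:.0f} microseconds per core")
```

With four cores per socket, this gives about 3 ms for one socket and about 750 microseconds for four sockets, in line with the ranges on the slide.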




What Does a 10x Speed Improvement Give Us?

Improved Accuracy
•   Via greater coverage of benchmark slices
•   Better glassjaw analysis


Faster Turnaround
•   Improved Latency
•   Faster debugging


Improved Benchmarking
•   Greater coverage of benchmarks
•   Enables multithreaded (cooperative) benchmarks



Approaches to Boost Simulation Speed
(one key charter for SPEARS)




 Improve Basic Infrastructure


 Create Faster Core Models That are Less Accurate


 Go Parallel in a Modular Fashion


 Use Accelerators, such as FPGAs




    What’s Novel Here?
Parallel Simulation is an Old Technology
•   Distributed, discrete-event simulation, Fujimoto, 1990
•   Wisconsin Wind Tunnel I + II, Reinhardt et al., 1992 and Mukherjee et al., 1997
•   Customized for specific applications (e.g., shared memory)


So, What Are the Challenges?
•   Starting point is several millions of lines of non-parallel C++ code (!)
•   This is production software, so it must be stable (unlike “research” software)
•   Parallel infrastructure must be modular, built once, used repeatedly without
    changing any architecture model code
•   Deal with new problems: load imbalance at multiple levels
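The modularity requirement above can be sketched as a thin parallel wrapper that drives existing model objects from separate threads without touching the model code. The slides do not describe the synchronization scheme; the quantum-based barrier used here is an assumption, and `SocketModel`, `run_parallel`, `quantum`, and `num_quanta` are hypothetical names introduced only for illustration.

```python
import threading


class SocketModel:
    """Hypothetical stand-in for an existing sequential socket model.
    The parallel wrapper below never modifies this class."""

    def __init__(self, socket_id):
        self.socket_id = socket_id
        self.cycle = 0

    def step(self, cycles):
        # A real model would simulate the cores and uncore here.
        self.cycle += cycles


def run_parallel(models, quantum, num_quanta):
    """Run each model on its own thread, synchronizing every `quantum` cycles
    so that no socket runs ahead of the others by more than one quantum."""
    barrier = threading.Barrier(len(models))

    def worker(model):
        for _ in range(num_quanta):
            model.step(quantum)
            barrier.wait()  # every thread waits here for the slowest socket

    threads = [threading.Thread(target=worker, args=(m,)) for m in models]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


models = [SocketModel(i) for i in range(4)]
run_parallel(models, quantum=100, num_quanta=10)
print([m.cycle for m in models])
```

The barrier also shows where load imbalance hurts: each quantum takes as long as the slowest socket, so faster threads sit idle. This Python version only illustrates the structure; CPython threads do not run compute-bound code in parallel, so it is not a way to measure speedup.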


Current Status: Created infrastructure, Work-In-Progress




Speedup of the Pthread-per-socket Model
(on Clovertowns)




 •   Speedup scales linearly with problem size
 •   A lot more room for improvement exists
