Discovering and Exploiting Program Phases
Timothy Sherwood, Erez Perelman, Greg Hamerly, Suleyman Sair, Brad Calder
CSE 231 Presentation by Justin Ma
400 Million Instructions
[Diagram labels: Non-Existent Processor / New Processor, New Compiler, Spec2000 Benchmark, Simulator]
400 Million Instructions
• Suppose you have a time budget…
• Less than half a second of execution time
• What would you simulate?
– Beginning?
– Middle?
– End?
400 Million Instructions
Programs exhibit diverse modes of behavior
gzip
gcc
400 Million Instructions
• Suppose you have a time budget…
• Less than half a second of execution time
• What would you simulate?
– Beginning?
– Middle?
– End?
– Samples of different modes of behavior
Program Phases
• Observation: programs exhibit various modes of periodic behavior
• These modes are program phases
• Challenge: Extract these automatically
Phase Basics
• Intervals – slices of time
• Phases – intervals with similar behavior
[Plot: IPC versus time (instruction count)]
Defining “Similar Behavior”
• Metric for comparing intervals?
– Cache misses?
– IPC?
– Branch misprediction rates?
• Problem: Performance metrics alone are too architecture-dependent
Defining “Similar Behavior”
• Code path traversal
– Directly affects time-varying behavior
– Execute same code, same performance
– Architecture independent
• Metrics for code path traversal
– Frequency of branches
– Frequency of function calls
– Frequency of basic block executions
Basic Block Vector
[Animation over basic blocks B1 to B4: each interval keeps one counter per block, and the counters accumulate as the blocks execute.]
Time t: B1 = 2, B2 = 1, B3 = 1, B4 = 2
Time t + 1: B1 = 2, B2 = 2, B3 = 0, B4 = 2
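The counting shown on this slide can be sketched as follows. This is a minimal illustration, not the authors' tooling: the trace is hypothetical, the interval length is measured in block executions for simplicity (the deck measures time in instructions), and `basic_block_vectors` is an assumed helper name.

```python
from collections import Counter

def basic_block_vectors(trace, interval_len, block_ids):
    """Split a trace of executed basic-block IDs into fixed-size intervals
    and count how often each block runs in each interval."""
    vectors = []
    for start in range(0, len(trace), interval_len):
        counts = Counter(trace[start:start + interval_len])
        vectors.append([counts[b] for b in block_ids])
    return vectors

blocks = ["B1", "B2", "B3", "B4"]
# Hypothetical trace, six block executions per interval.
trace = ["B1", "B2", "B4", "B1", "B3", "B4",   # interval t
         "B1", "B2", "B4", "B1", "B2", "B4"]   # interval t + 1
bbvs = basic_block_vectors(trace, 6, blocks)
print(bbvs)
```

The trace was chosen so that the two resulting vectors match the final counts on the slide.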
Distance between the vectors at time t and time t + 1 (the B1 and B4 terms are zero):
Manhattan distance = |1 – 2| + |1 – 0| = 2
Euclidean distance = sqrt((1 – 2)^2 + (1 – 0)^2) = sqrt(2)
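As a quick check of the two distances, here is a small sketch using the slide's final counts for time t and time t + 1. The function names are illustrative only.

```python
import math

def manhattan(a, b):
    """Sum of absolute per-block differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Square root of the summed squared per-block differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

bbv_t = [2, 1, 1, 2]    # counts for B1..B4 at time t
bbv_t1 = [2, 2, 0, 2]   # counts for B1..B4 at time t + 1
d_manhattan = manhattan(bbv_t, bbv_t1)
d_euclidean = euclidean(bbv_t, bbv_t1)
print(d_manhattan, d_euclidean)
```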
Basic Block Similarity Matrix
[Figures: basic block similarity matrices for gzip and gcc]
BBV similarity between intervals reflects performance similarity
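A similarity matrix of this kind can be sketched as the pairwise distance between every pair of interval vectors. The sketch below assumes each vector is first normalized to sum to 1 and that Manhattan distance is the measure; the slides do not spell out either choice, so treat both as assumptions.

```python
def normalize(v):
    """Scale a vector so its entries sum to 1 (left unchanged if all zero)."""
    total = sum(v)
    return [x / total for x in v] if total else list(v)

def similarity_matrix(bbvs):
    """Entry [i][j] is the Manhattan distance between normalized
    intervals i and j; 0 means identical code-path profiles."""
    normed = [normalize(v) for v in bbvs]
    n = len(normed)
    return [[sum(abs(a - b) for a, b in zip(normed[i], normed[j]))
             for j in range(n)] for i in range(n)]

# Hypothetical intervals: the first and third execute the same mix of blocks.
example_bbvs = [[2, 1, 1, 2], [2, 2, 0, 2], [2, 1, 1, 2]]
matrix = similarity_matrix(example_bbvs)
```

Small entries far from the diagonal indicate that two intervals distant in time follow similar code paths, which is what makes recurring phases visible.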
Automatic Phase Classification
• Classify intervals into phases
– We do not know which BBVs correspond to particular phases a priori
• k-means clustering
– Iterative clustering algorithm
– Dimension reduction: random linear projection
– Try different k values; use BIC to choose the best
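The pipeline above (project, then cluster) can be sketched in plain Python. Several details here are assumptions rather than the authors' method: the projection matrix uses uniform entries in [-1, 1], the target dimension of 2 is arbitrary, and the farthest-point seeding is a deterministic stand-in for whatever initialization the original work used. Scoring each k with BIC is not shown.

```python
import random

def random_projection(vectors, out_dim, rng):
    """Project each vector down to out_dim dimensions with a random matrix."""
    in_dim = len(vectors[0])
    matrix = [[rng.uniform(-1.0, 1.0) for _ in range(out_dim)]
              for _ in range(in_dim)]
    return [[sum(v[i] * matrix[i][j] for i in range(in_dim))
             for j in range(out_dim)] for v in vectors]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(points, k):
    """Deterministic seeding (an assumption): start at points[0], then keep
    adding the point farthest from the centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(points, k, max_iter=100):
    """Alternate assignment and centroid update until labels stop changing."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical BBVs: three intervals from one phase, three from another.
bbvs = [[10, 0, 0, 10], [9, 1, 0, 10], [10, 0, 1, 9],
        [0, 10, 10, 0], [1, 9, 10, 0], [0, 10, 9, 1]]
projected = random_projection(bbvs, 2, random.Random(0))
labels, centroids = kmeans(projected, 2)
```

In practice the same clustering would be repeated for several values of k, with a model-selection score deciding which k to keep.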
Automatic Phase Classification
Clustering accurately distinguishes phases automatically
SimPoint
• Simulate large programs on a budget
• Perform detailed simulation on representative code snippets
– Choose centroid interval from each phase (10 million instructions)
• Extrapolate large program performance
– Weighted by frequency of phase
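The selection and extrapolation steps can be sketched as below, assuming phase labels and centroids are already available from clustering. The sketch estimates CPI rather than IPC because, with equal-length intervals, whole-program CPI is a plain weighted average of per-interval CPI. All names and numbers are hypothetical.

```python
def pick_simulation_points(points, labels, centroids):
    """For each phase, pick the interval closest to the phase centroid and
    record the phase's weight (its share of all intervals)."""
    picks = {}
    for phase, centroid in enumerate(centroids):
        members = [i for i, l in enumerate(labels) if l == phase]
        if not members:
            continue
        rep = min(members, key=lambda i: sum(
            (x - c) ** 2 for x, c in zip(points[i], centroid)))
        picks[phase] = (rep, len(members) / len(labels))
    return picks

def estimate_cpi(picks, simulated_cpi):
    """Weight each representative's simulated CPI by its phase frequency."""
    return sum(weight * simulated_cpi[rep] for rep, weight in picks.values())

# Hypothetical one-dimensional interval signatures and clustering result.
points = [[1.0], [1.2], [0.8], [5.0]]
labels = [0, 0, 0, 1]
centroids = [[1.0], [5.0]]
picks = pick_simulation_points(points, labels, centroids)

# Pretend detailed simulation was run only on the chosen intervals.
simulated_cpi = {0: 2.0, 3: 1.0}
estimate = estimate_cpi(picks, simulated_cpi)
```

Only two of the four intervals are simulated in detail here, yet every interval contributes to the estimate through its phase weight.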
SimPoint
• Simulate 400 million instructions total
Accurate estimate despite instruction budget
Why SimPoint Succeeds
• Program behavior varies over time
• SimPoint intelligently chooses which intervals to simulate
• Regularity within program phases allows accurate extrapolation
Online Classification
• Detect phases as program is running
• Applications
– Thread scheduling
– Power management
– Predicting future phases
• Challenges
– One pass of input
– Limited storage
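One way to meet both constraints (a single pass, bounded storage) is to keep a small table of phase signatures and match each new interval against it. The sketch below is a generic illustration of that idea, not the authors' online design; the class name, the distance threshold and the table size are all assumptions.

```python
class OnlinePhaseTracker:
    """Single-pass classifier with a bounded table of phase signatures."""

    def __init__(self, threshold, max_phases):
        self.threshold = threshold      # max distance to count as "same phase"
        self.max_phases = max_phases    # storage limit on stored signatures
        self.signatures = []            # one normalized vector per known phase

    def classify(self, bbv):
        total = sum(bbv) or 1
        sig = [x / total for x in bbv]
        best, best_dist = None, None
        for phase_id, stored in enumerate(self.signatures):
            d = sum(abs(a - b) for a, b in zip(sig, stored))
            if best_dist is None or d < best_dist:
                best, best_dist = phase_id, d
        if best is not None and best_dist <= self.threshold:
            return best                 # close enough to a known phase
        if len(self.signatures) < self.max_phases:
            self.signatures.append(sig) # start a new phase
            return len(self.signatures) - 1
        return best                     # table full: nearest known phase

tracker = OnlinePhaseTracker(threshold=0.5, max_phases=4)
# Hypothetical stream of per-interval vectors, seen one at a time.
stream = [[2, 1, 1, 2], [2, 1, 1, 2], [0, 5, 5, 0], [2, 2, 0, 2], [0, 10, 9, 1]]
phase_ids = [tracker.classify(v) for v in stream]
```

Each interval is examined once and then discarded; only the signature table persists between intervals.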
Online Classification
High variance in metrics across full trace
Low variance shows online classification succeeds in finding phases
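The comparison on these slides can be sketched by computing the variance of a metric over the whole trace and then separately within each detected phase. The IPC values and phase labels below are made up for illustration.

```python
from statistics import pvariance

def variance_by_phase(metric, phases):
    """Group per-interval metric values by phase ID and return the
    population variance within each phase."""
    groups = {}
    for value, phase in zip(metric, phases):
        groups.setdefault(phase, []).append(value)
    return {phase: pvariance(values) for phase, values in groups.items()}

# Hypothetical per-interval IPC and the phase assigned to each interval.
ipc = [1.0, 1.1, 0.9, 2.0, 2.1, 1.9]
phases = [0, 0, 0, 1, 1, 1]

overall = pvariance(ipc)
per_phase = variance_by_phase(ipc, phases)
```

If the classification is good, every per-phase variance should be much smaller than the variance over the full trace.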
Conclusions
• Phases are a vital abstraction
– Performance varies greatly within a program
– Attributable to different modes of behavior
• Can discover phases automatically
– Offline: k-means clustering
– Online
• Code path characterization
– Strong correlation with actual performance
– SimPoint exploits this with great success
Outline
• Introduction (motivate)
• Basics (definitions, BBV, BBMatrix)
• Offline Phase Classification
– SimPoints
• Online Phase Classification
• Conclusions
Limitations of Clustering
Bayesian Information Criterion
• Fit to Gaussians
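For reference, a commonly used form of the criterion is given below. The notation is assumed rather than taken from the slides: D is the set of interval vectors, R the number of intervals, d the number of dimensions after projection, and p_k the number of free parameters when k spherical Gaussians share one variance.

```latex
\mathrm{BIC}(D, k) = \hat{\ell}(D \mid k) - \frac{p_k}{2} \ln R,
\qquad
p_k = (k - 1) + d\,k + 1
```

Here \hat{\ell}(D \mid k) is the maximized log-likelihood of the data under the fitted Gaussians. Under this sign convention a higher score is better, and the penalty term discourages adding clusters that do not improve the fit enough.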
Self-Modifying Code
Self-modifying code
Program Phases
Learning Phases