Shared by: huanglianjiang1
posted: 12/22/2011
pages: 9
Path Profile Estimation and Superblock Formation

Jeff Pang

Jimeng Sun

Motivation

[Diagram: Compile, Optimize, Run, Profile]

Why Continuous Profiling?
– Continuous Optimization
– Dynamic Optimization
– Realistic Profiles

Challenges:
– Automated
– Low overhead
– Accuracy

Related Work:

H. Chen, et al. Dynamic Trace Selection Using Performance Hardware Sampling. CGO, 2003.

A. Shye, et al. Analysis of Path Profiling Information Gathered with Performance Monitoring Hardware. ICCA, 2005.

Goals

[Diagram: Run with Simulated PMU, Path Profile Sample, Path Profile Estimation, Superblock Formation]

• Take advantage of modern Performance Monitoring Units (PMUs)
– As found in the Pentium 4, Itanium, PPC 970, etc.
– Allow sampling of the last few branches executed
– "Simulated" for our project using instrumentation
• Estimate the full path profile from the samples
• Validate by doing Superblock formation
– An optimization to improve scheduling on VLIW processors
– Path-based Superblocks based on Young (1997)

Design Overview

[Diagram: source, frontend, instrument (pmu sim), instrumented program, sampled profile, offline estimator, estimated path profile, superblock, backend, optimized program]

• Implemented PMU simulator and Superblock optimization as SUIF passes
• Implemented Estimator offline using sampled branch profiles and SUIF CFG
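
To make the idea of a simulated PMU concrete, here is a minimal Python sketch of a branch trace buffer that remembers the last few branches and records a snapshot once every fixed number of branches. It is not the SUIF pass from the project: the class name `SimulatedPMU`, the `on_branch` hook, and the `depth` and `period` parameters are all hypothetical names chosen for illustration.

```python
from collections import Counter, deque


class SimulatedPMU:
    """Hypothetical branch trace buffer (illustration only).

    Keeps the last `depth` branches and, once every `period` branches,
    records the current buffer contents as one sample.
    """

    def __init__(self, depth=4, period=1000):
        self.buffer = deque(maxlen=depth)  # oldest branch falls out automatically
        self.period = period
        self.count = 0
        self.samples = Counter()           # sampled branch sequence -> times seen

    def on_branch(self, src, dst):
        """Assumed to be called by instrumentation at every executed branch."""
        self.buffer.append((src, dst))
        self.count += 1
        full = len(self.buffer) == self.buffer.maxlen
        if self.count % self.period == 0 and full:
            self.samples[tuple(self.buffer)] += 1


# Tiny usage example with a short buffer and period so the effect is visible.
pmu = SimulatedPMU(depth=2, period=3)
for src, dst in [("A", "B"), ("B", "D"), ("D", "E"),
                 ("E", "G"), ("A", "C"), ("C", "D")]:
    pmu.on_branch(src, dst)
```

With a real workload, `period` trades overhead against accuracy: a longer period means fewer samples to collect but a coarser picture of the hot paths.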

Path Sampling

• Exact path profile
– Accurate
– But expensive
• Edge profile
– Inaccurate (due to the independence assumption)
– Cheap
• It is hard (impossible) to reconstruct the path information
• Sampling path profile
– Periodically sample 4 consecutive branches (branch trace buffer)
– Cheap to collect and more accurate than edge profile

[Figure: CFG with blocks A, B, C, D, E, F, G; every edge count is 50]
Exact paths: ABDEG, ACDFG
Edge profile: ABDEG, ACDFG and ABDFG, ACDEG
Sampling: {AB, DE}, {AC, DF} => ABDEG, ACDFG
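
The independence assumption behind the edge profile can be worked through on the example CFG from this slide, where every edge executes 50 times. The Python sketch below is illustrative only: the function names are hypothetical, it assumes an acyclic CFG, and the entry count of 100 is assumed from the two outgoing edges of A. It estimates each path's frequency by multiplying branch probabilities along the path.

```python
# Edge counts for the example CFG: every edge executes 50 times.
edge_count = {
    ("A", "B"): 50, ("A", "C"): 50,
    ("B", "D"): 50, ("C", "D"): 50,
    ("D", "E"): 50, ("D", "F"): 50,
    ("E", "G"): 50, ("F", "G"): 50,
}


def successors(node):
    """Blocks reachable from `node` by one edge."""
    return [dst for (src, dst) in edge_count if src == node]


def edge_estimate(entry, exit_block, total):
    """Estimate path frequencies from edge counts alone.

    Each branch is treated as independent of the branches before it,
    which is exactly the assumption that makes edge profiles inaccurate.
    Only valid for acyclic CFGs.
    """
    estimates = {}

    def walk(node, path, freq):
        if node == exit_block:
            estimates[path] = freq
            return
        out = successors(node)
        out_total = sum(edge_count[(node, s)] for s in out)
        for s in out:
            walk(s, path + s, freq * edge_count[(node, s)] / out_total)

    walk(entry, entry, total)
    return estimates


estimates = edge_estimate("A", "G", total=100)
for path in sorted(estimates):
    print(path, estimates[path])
```

Under this assumption all four paths (ABDEG, ABDFG, ACDEG, ACDFG) receive the same estimate of 25, although the exact profile on the slide contains only ABDEG and ACDFG at 50 each. Sampling consecutive branches such as {AB, DE} keeps the correlation that the edge profile loses.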

Hot Path Formation

• Sampled paths are short
• Sampled paths => longer paths
– Join 2 paths if they can merge into one simple path and the frequencies of both paths are large
– e.g. 5000 ABCD, 4000 CDEF => 4000 ABCDEF
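
The joining rule can be sketched in Python as follows. This is a minimal illustration, not the project's estimator: it assumes paths are strings of single-character block names, that "large" means at or above a caller-supplied `threshold` (a hypothetical parameter), and that the joined path takes the smaller of the two frequencies, which is what the example above suggests.

```python
def join_paths(p, q):
    """Merge p and q on the longest overlap between a suffix of p and a
    prefix of q. Return None if they do not overlap or if the result
    would visit a block twice (i.e. would not be a simple path)."""
    for k in range(min(len(p), len(q)), 0, -1):
        if p[-k:] == q[:k]:
            merged = p + q[k:]
            if len(set(merged)) == len(merged):
                return merged
            return None
    return None


def join_hot(samples, threshold):
    """Join every ordered pair of sampled paths whose frequencies both
    reach `threshold`. The joined frequency is the minimum of the two
    (an assumption inferred from the slide's example)."""
    joined = {}
    for p, fp in samples.items():
        for q, fq in samples.items():
            if p == q or min(fp, fq) < threshold:
                continue
            merged = join_paths(p, q)
            # Skip merges that merely reproduce one of the inputs.
            if merged is None or len(merged) <= max(len(p), len(q)):
                continue
            joined[merged] = max(joined.get(merged, 0), min(fp, fq))
    return joined


hot = join_hot({"ABCD": 5000, "CDEF": 4000}, threshold=1000)
print(hot)
```

Repeating `join_hot` on its own output would grow paths further; how many rounds to run and how to pick the threshold are left open here.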

Path Estimation Accuracy

• We compare the top 100 paths captured by the exact path profile and the estimated path profile
• The success rate is

  \frac{\sum_{est \cap act} cycle_{act}}{\sum_{act} cycle_{act}}

[Figure: Accuracy (0% to 100%) for adpcm_e, 099.go, 132.ijpeg; series: 10k, 100k, 1m]
[Figure: runtime (0 to 30) for adpcm_e, 099.go, 132.ijpeg]
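
The success rate can be computed directly from the two top-path sets. The Python sketch below is illustrative: the function name, the argument layout, and the cycle counts in the usage example are made up for the example and are not results from the slides.

```python
def success_rate(estimated_top, actual_cycles):
    """Fraction of the actual hot-path cycles that the estimate also found.

    estimated_top: set of paths in the estimated top-N list
    actual_cycles: dict mapping each path in the actual top-N list to the
                   cycles spent on it in the exact profile (cycle_act)
    """
    total = sum(actual_cycles.values())
    covered = sum(cycles for path, cycles in actual_cycles.items()
                  if path in estimated_top)
    return covered / total


# Hypothetical numbers: the estimate finds one of the two actual hot paths.
rate = success_rate({"ABDEG", "ABDFG"}, {"ABDEG": 600, "ACDFG": 400})
print(rate)
```

Weighting by actual cycles means that missing a rarely executed path costs little, while missing the hottest path costs a lot.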

Superblock Formation

[Figure: three CFG transformation panels: Tail Duplication, Loop Unrolling, Combinations]

• Creates larger regions to schedule over for hot paths
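
As an illustration of tail duplication, the Python sketch below removes side entrances from a hot path so that it becomes a single-entry region. It is a simplified sketch under stated assumptions, not the SUIF pass: the function name and the successor-map representation are hypothetical, and the CFG is the example from the Path Sampling slide with ABDEG chosen as the hot path.

```python
def tail_duplicate(succ, trace):
    """Return a new successor map in which `trace` has no side entrances.

    succ:  dict mapping every block to its list of successors
    trace: hot path as a list of blocks
    Blocks from the first side entrance to the end of the trace are
    copied (name + "'"), and every edge that entered the trace from the
    side is redirected to the corresponding copy.
    """
    preds = {b: [] for b in succ}
    for b, targets in succ.items():
        for t in targets:
            preds[t].append(b)

    # First trace block reached from anywhere other than its trace predecessor.
    first = next((i for i in range(1, len(trace))
                  if any(p != trace[i - 1] for p in preds[trace[i]])), None)
    if first is None:
        return {b: list(t) for b, t in succ.items()}

    copy = {b: b + "'" for b in trace[first:]}
    on_trace = set(zip(trace, trace[1:]))
    new = {}
    for b, targets in succ.items():
        # Keep on-trace edges; send side entrances to the duplicated tail.
        new[b] = [copy[t] if t in copy and (b, t) not in on_trace else t
                  for t in targets]
    for b, dup in copy.items():
        # Copies stay inside the duplicated tail where possible.
        new[dup] = [copy.get(t, t) for t in succ[b]]
    return new


cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E", "F"],
       "E": ["G"], "F": ["G"], "G": []}
superblock_cfg = tail_duplicate(cfg, ["A", "B", "D", "E", "G"])
```

After the transformation, C branches to the copy D' and F to the copy G', so the original A, B, D, E, G can only be entered at A. The extra copies are the source of the code expansion that superblock formation incurs.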

Superblock Performance

• Performance results pending
– Waiting for CASH simulator setup…
• Superblock formation on P4 useless
– Causes 0-5% slowdown on tested benchmarks (probably due to icache misses)
– Need multi-issue architecture to see sched. benefits?

[Figure: Code Expansion (x86 ELF); y-axis: Normalized Exe Size (0 to 1.8); x-axis: Application (adpcm_e, 132.ijpeg, 099.go); series: base, exact, estimate]


