									HP Caliper




Eric Gouriou
September 2003
        Today’s Agenda

            •   Intended audience of this presentation
            •   What is HP Caliper?
            •   Measurements
            •   Usage
            •   Caliper cheat sheet
            •   Limitations
            •   Future plans
            •   More information on Caliper, DSPP
            •   Questions
3/16/2012                             HP Caliper DSPP Presentation   page 2
        Intended Audience

            •   Developers
            •   Tuning experts
            •   Engineers porting to HP-UX IPF
            •   Anyone interested in performance




        What is HP Caliper?

            •   Performance analysis & improvement tool
            •   Dynamic performance measurement tool for
                C / C++ / Fortran / assembly applications
            •   Data collection vehicle for compiler-feedback/PBO
            •   Works on all programs as is
                (32/64-bit, debug or optimized, stripped, etc.)
            •   Multiple measurements via a unified interface
            •   Provides insights thanks to
                –   Itanium Performance Monitoring Unit (PMU)
                –   Dynamic binary instrumentation

        Key Features

            •   Default measurement configurations, each customizable
            •   Selective process & module measurements
            •   Text & HTML reports
            •   Performance datafile
            •   Three measurement models
                –   start application under Caliper
                –   attach to running process
                –   auto-invocation


        Measurements

        •   PMU event counts
            –   Total count of selected hardware events per process
            –   Negligible overhead
            –   Default set of events, can be overridden
            –   400+ events described in Itanium 2 documentation
            –   Non-default use for advanced users
            ------------------------------------------------
            Counter               Priv. Mask      Count
            -------------------------------------------
            IA64_INST_RETIRED      8 (USER)     3414917
            NOPS_RETIRED           8 (USER)      684477
            CPU_CYCLES             8 (USER)     1899187
            BACK_END_BUBBLE_ALL    8 (USER)      810631
            -------------------------------------------
            % of Cycles lost due to stalls (lower is better):
              42.68 = 100 * BACK_END_BUBBLE_ALL / CPU_CYCLES

            Effective CPI (lower is better):
              0.6956 = CPU_CYCLES / (IA64_INST_RETIRED - NOPS_RETIRED)

            Effective CPI during unstalled execution (lower is better):
              0.3987 = (CPU_CYCLES - BACK_END_BUBBLE_ALL) / (IA64_INST_RETIRED - NOPS_RETIRED)
            -------------------------------------------
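
The derived metrics in the report above follow directly from the four counter values. As a minimal sketch (the function and dictionary key names are illustrative and not part of Caliper), the same arithmetic can be reproduced like this:

```python
# Counter values copied from the sample report above.
counts = {
    "IA64_INST_RETIRED": 3414917,
    "NOPS_RETIRED": 684477,
    "CPU_CYCLES": 1899187,
    "BACK_END_BUBBLE_ALL": 810631,
}


def derived_metrics(c):
    """Apply the three formulas printed in the report to a dict of counts."""
    # Retired instructions that are not NOPs.
    useful = c["IA64_INST_RETIRED"] - c["NOPS_RETIRED"]
    return {
        "stall_pct": 100 * c["BACK_END_BUBBLE_ALL"] / c["CPU_CYCLES"],
        "effective_cpi": c["CPU_CYCLES"] / useful,
        "unstalled_cpi": (c["CPU_CYCLES"] - c["BACK_END_BUBBLE_ALL"]) / useful,
    }


metrics = derived_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Rounded to the precision used in the report, the results match the 42.68, 0.6956 and 0.3987 shown above.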
        Measurements (cont’d)

        •   Histograms from samples of PMU data
            –   Allows identification of hotspots
            –   Module summary, function summary, function details,
                selected global counts and derived metrics
            –   Flat profile, Cache / TLB / Branch Prediction / ALAT
            –   Per-thread data available
            –   Very low overhead
            Function Summary
            ---------------------------------------------------------------------------------
            % Total   Cumulat
               IP      % of         IP
            Samples    Total     Samples  Function                                File
            ---------------------------------------------------------------------------------
               9.71     9.71        1286  livermore::init                         livermore.c
               6.03    15.75         799  livermore::main                         livermore.c
               4.07    19.82         539  libc.so.1::T_19_3c30_cl___doprnt_main   doprnt.c
               0.60    20.42          79  libc.so.1::_f80_to_dec                  bindec.c
               0.45    20.86          59  libc.so.1::getenv                       getenv.c
            ...

        Measurements (cont’d)

            •   Traces of PMU samples
                –     Provides full details for each sample
                –     Low overhead but high volume of data
                –     Customize configuration file for relevant data
                ---------------------------------------------------------------------------------------------
                       -----------------------DCache Miss---------------------- ------IP Samples------
                Sample Addr:Slot                     Data Bundle                Bundle Address
                Number (module:function)             Runtime Address    Latency (module:function)
                ---------------------------------------------------------------------------------------------
                     1 0x3eda0:0                     0x200000007979f700       5 0x502f0
                       (dld.so::MM_malloc)                                      (dld.so::BU_grow)

                     2 0x211c0:0                     0x200000007950b200      26 0x212a0
                       (dld.so::LE_finish_create)                               (dld.so::LE_finish_create)

                     3 0x37bf0:0                     0x20000000795297b8     172 0x37c40
                       (dld.so::R_apply_eplt_relocs)                            (dld.so::R_apply_eplt_relocs)
                ...
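
Because a trace produces one row per sample, it usually needs filtering before it is readable. The sketch below is plain post-processing of the three rows shown above, not a Caliper feature. The tuple layout, the function name and the threshold of 20 are assumptions made for illustration.

```python
# The three D-cache miss samples shown in the trace above:
# (sample number, function containing the miss, latency as reported)
samples = [
    (1, "dld.so::MM_malloc", 5),
    (2, "dld.so::LE_finish_create", 26),
    (3, "dld.so::R_apply_eplt_relocs", 172),
]


def expensive_misses(rows, min_latency):
    """Keep samples whose latency is at least min_latency, worst first."""
    kept = [row for row in rows if row[2] >= min_latency]
    return sorted(kept, key=lambda row: row[2], reverse=True)


# min_latency=20 is a hypothetical threshold chosen for this example.
for number, function, latency in expensive_misses(samples, 20):
    print(f"sample {number}: {function} latency={latency}")
```

With this threshold, only samples 3 and 2 remain, in that order.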




        Measurements (cont’d)

            •   Source-level event counts
                –   Function call counts, arc counts
                –   High overhead, precise counts
                –   Done via dynamic binary instrumentation
                Function Count Details
                ----------------------------------------------------------
                      Total Function                           File [Line]
                ----------------------------------------------------------
                        150 livermore::abs                     livermore.c [~405]
                        104 libc.so.1::__milli_memset
                         92 libc.so.1::__milli_memmove
                ...
                Arc Counts
                --------------------------------------------------------------------------------------------
                  Total    Taken % Taken Source Address        Source Function                File [Line,Col]
                                          Target Address       Target Function                File [Line,Col]
                --------------------------------------------------------------------------------------------
                 28672   28616      99% 0x4005e70:2            livermore::init                livermore.c[376,7]
                                         0x4005e20:0           livermore::init                livermore.c[377,9]
                ...
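
The "% Taken" column relates the two counts on the same row. A minimal sketch of that calculation follows. Truncating to a whole percent is an assumption that happens to match the 99% shown above (the exact ratio is about 99.8%); it is not documented Caliper behavior.

```python
def percent_taken(total, taken):
    """Share of executions in which the arc was taken, as a whole percent."""
    if total == 0:
        return 0  # avoid dividing by zero for arcs that never executed
    return (100 * taken) // total  # integer division truncates (assumed)


# Values from the livermore::init arc in the report above.
print(percent_taken(28672, 28616))
```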

        Measurements (cont’d)

        •   Call graph profile (gprof-like)
            –   Flat profile and call graph
            –   High overhead
            –   Hybrid of exact counts and PMU sampling
            Call Graph
            --------------------------------------------------------------
                                 De-      Called/Total          Parents
            Index %Time    Self scen-     Called+Self      Name      Index
                                 dants    Called/Total         Children
            --------------------------------------------------------------
                           0.00   0.00         1/1               *ROOT* [1]
            [2]     25.1   0.00   0.00         1            livermore::main [2]
                           0.00   0.00       150/150             livermore::abs [52]
                           0.00   0.00        30/30              livermore::clock [45]
                           0.00   0.00        14/14              livermore::init [3]
                           0.00   0.00        18/18              libc.so.1::printf [4]
            --------------------------------------------------------------
                           0.00   0.00        14/14              livermore::main [2]
            [3]     10.6   0.00   0.00        14            livermore::init [3]
            --------------------------------------------------------------
                           0.00   0.00        18/18              livermore::main [2]
            [4]      8.3   0.00   0.00        18            libc.so.1::printf [4]
                           0.00   0.00        18/18              libc.so.1::_doprnt [5]
                           0.00   0.00        18/18              libc.so.1::_wrtchk [7]

        Measurements (cont’d)

            •   PBO profile gathering configuration
                –   Auto-invoked when compiling using
                         +Oprofile=collect     (+I deprecated)
                –   Data used to improve compiler optimizations
                         +Oprofile=use         (+P deprecated)
                –   Can be done manually (caliper pbo ...), but this is
                    sub-optimal and not recommended
                –   Variable overhead
                > cc +Oprofile=collect -o livermore livermore.c
                > ./livermore
                [...]
                > ls flow*
                flow.data   flow.log
                > cc +Oprofile=use +O3 -o livermore livermore.c
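
The collect / run / use cycle above is easy to script. The sketch below only assembles the three command lines as argument lists, using the compiler flags from the slide. The function name and its parameters are illustrative, and nothing is executed, so it can be tried on a machine without the HP-UX compiler.

```python
def pbo_steps(source, output, opt_level="+O3"):
    """Return the collect, run and use command lines as argv lists."""
    return [
        # 1. Build an instrumented binary that collects a profile.
        ["cc", "+Oprofile=collect", "-o", output, source],
        # 2. Run it on representative input to produce the profile data.
        [f"./{output}"],
        # 3. Rebuild, letting the compiler use the collected profile.
        ["cc", "+Oprofile=use", opt_level, "-o", output, source],
    ]


for argv in pbo_steps("livermore.c", "livermore"):
    print("> " + " ".join(argv))
```

Each list could be handed to `subprocess.run` on a system where these tools are installed.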




        Usage

            •   Typical command line
                    caliper config_file [caliper_options] program [program_args]

            •   Example
                    > caliper fprof --process=all cc -o livermore livermore.c
            •   Configuration files:
                –   packaged ones
                –   copy/modify
                –   command-line overrides
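
The command-line template above puts Caliper's own options before the program name and the program's arguments after it. As a small sketch (the helper name and parameter names are hypothetical), the ordering can be captured like this, using the example from the slide:

```python
import shlex


def caliper_command(config_file, program, caliper_options=(), program_args=()):
    """Assemble: caliper config_file [caliper_options] program [program_args]."""
    return ["caliper", config_file, *caliper_options, program, *program_args]


argv = caliper_command(
    "fprof",
    "cc",
    caliper_options=["--process=all"],
    program_args=["-o", "livermore", "livermore.c"],
)
print(shlex.join(argv))
```

Keeping the two option groups as separate parameters avoids accidentally passing a Caliper option to the measured program, or the reverse.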



        Usage (cont’d)

              Type                      Configuration Files      Comments
            ------------------------------------------------------------------------------
              Histograms                fprof,                   reduced samples,
                                        [d|i]cache_miss,         very low impact
                                        [d|i]tlb_miss,
                                        branch_prediction,
                                        alat_miss

              Call graph                cgprof                   sampled + exact,
                                                                 high impact

              Sampled details           pmu_trace                large data volume

              Total HW event counts     total_cpu                exact totals, no impact

              Exact source-level        arc_count,               exact details,
              event counts              func_count               high impact

              Compiler feedback         pbo                      “black box”
        Caliper Cheat Sheet

            •   Where should I start?
                –   Global view
                –   fprof, both for profile and per-process derived metrics
                –   cgprof, caller/callee, check for surprises
                –   dcache_miss, use latency threshold to show expensive misses
            •   Drill-down
                –   Restrict processes, libraries, functions measured
            •   What is missing for a global view?
                –   System-wide measurements
                –   Multiplexed global counts (vs. many total_cpu runs)
        Caliper Cheat Sheet (cont’d)

            •   Tuning the data collection parameters
                –   Multi-process application?
                    Check process tree output, select processes using --process

                –   Multi-threaded application?
                    Check    --threads=all                     (per-thread histograms)
                    versus   --threads=sum-all                 (default, aggregated data)

                –   Libraries of interest or out of your control?
                    Use --module-include / --module-exclude

                –   Functions of interest?
                    Check --user-regions=rum/sum and/or triggered samples
        Caliper Cheat Sheet (cont’d)

            •   Better reports
                –   Use HTML output (--html); text is the default
                –   Use datafiles
                    They allow multiple reports from a single run
                    Faster collection in multi-process runs

                –   Check source-level reporting
                    --report-details=statement

                –   Vary amount of details generated

        Caliper Cheat Sheet (cont’d)

            •   PBO
                –   (Almost) free performance for some applications
                –   Use +Oprofile=collect/use
                –   ‘caliper pbo’ works on +O1 binaries but isn’t
                    recommended
                –   Can use ‘chatr +I enable’ to enable auto-invocation
                –   Trade-offs for large multi-process applications:
                    one vs. many Caliper instances


        Limitations

            •   Application characteristics
                –   no dynamic library reload
            •   Measurement & control
                –   pbo profile collection requires +O1 binary
                    (automatic when using +Oprofile=collect)
                –   HW limits the measurements possible per run
                –   per-thread data limited to histograms
            •   Other
                –   emulated PA binaries are not measured
                –   minimal dynamic code support
                –   limited gcc/g++ support
                –   setuid binaries require workaround
                –   limited support for MPAS binaries

        Future Plans

            •   PMU Measurements
                –   multiplexed PMU runs
                –   richer derived metrics
                –   system-wide measurements
                –   kernel profiles
            •   PBO
                –   PMU cache data collected for PBO
            •   Data Files:
                –   aggregation
                –   merging
                –   diffing

        Future Plans (cont’d)

            •   Usability
                –   Graphs with HTML reports
                –   Reports on demand
                –   Function context
                –   Call stacks
            •   Remove limitations:
                – Detach for runs involving instrumentation
                – MPAS applications
                – Library load/unload
                – Dynamically generated code



        More Information

            •   The Caliper web page is on the DSPP website:
                     <http://www.hp.com/go/hpcaliper>
            •   Documentation / Support / Downloads
            •   The Caliper mailing lists:
                –   Majordomo lists <majordomo@cxx.cup.hp.com>
                –   For product announcements:
                     <caliper-announce>
                –   For announcements and user forum:
                     <caliper>

 DSPP Tools & Resources for Itanium 2
 Set You Up for Success

            Software
             –   development environments, compilers, operating
                 systems, installation/configuration tools, performance
                 tools and more

            Technical documentation
             –   white papers, tutorials, reference documents and
                 manuals, FAQs, known problems, sample code, etc.

            Training and Education
             –   online and classroom training



 More DSPP Tools & Resources


       Community
            –   Itanium forums, source code repository,
                document sharing and mailing lists

       Equipment
            –   rentals and purchase discounts
       Partner Resources
       News & Events




     Where to go …

      Start with the Itanium web site for DSPP partners:
               http://www.hp.com/go/dspp_itanium


      Contact points for additional information, general support,
      equipment, localization resources and more:

      Americas       spp@cup.hp.com
                     telephone 1.800.249.3294

      Europe         dspp.emea@hp.com
                     telephone 800.100.929.70

      Asia-Pac       hpdev.support@hp.com or go to
                     www.hp.com/go/dspp for local country contacts





            Questions?



								