VTune: Intel’s Visual
K. Sridharan, VTune Development Manager
What is VTune?
VTune is Intel’s performance tuning environment
for Windows™ developers.
VTune is now bundled with:
– Intel C/C++ and FORTRAN compilers
– Intel Performance Library Suite
– Intel Architecture tutorials, processor manuals
and computer-based training materials.
Thursday, April 14, Intel Corporation 2
Overview of VTune
VTune is a performance tuning tool that enables
Windows 95 and NT* developers to:
Monitor the performance of all active software.
Identify “HotSpots” in a program and analyze its
performance as it executes on Intel Architecture
Examine each instruction and uncover problems at
the machine-code level.
Help optimize code using context-sensitive on-line
A periodic interrupt drives the Sampling
– Interrupt sources can be VTD, RTC or NMI*.
– Sample data is stored in a buffer until it is full.
– Sampling is disabled while the buffer is being written out.
Analyze the sampling data.
– Each sample records the cs:eip address and the module name.
– Addresses are matched to an application or OS routine.
– Sampling data is stored in a Microsoft Access 7.0 database.
– Sampling data is easy to import into Excel.
Event-Based Sampling on
CPU events report on internal CPU states.
Event sampling is enabled by the CPU and the APIC.
VTune helps manage the choice of events, such as:
– Data Cache Misses
– Partial Stall Counts
– Branch Statistics
• Mispredictions and Total Branches Taken
– Clock Count Statistics
From the system-wide view, you can zoom into a
specific module of interest.
– VTune displays all of the HotSpots in the module, organized by function,
memory location, class name and source file.
– For each HotSpot, VTune displays the symbol name, address, and
the number of samples collected.
For detailed information, double-click a HotSpot
to open a source code or assembly view.
The HotSpot view helps you to identify sections in your code
that take the most CPU time and that have potential
Shows performance information by
– The target machine can differ from the host machine.
– Indicates instruction pairing on the Pentium® processor and decoder
groups on the Pentium® Pro processor.
– Shows penalties incurred by the instruction.
– Shows execution clocks or micro-ops.
– The disassembly is shown in a Portal that can be moved or
enlarged; the size of the Portal affects the speed of analysis.
Assists in pinpointing the issues identified
by sampling or static analysis.
Uses advanced simulator technology to
run, trace and simulate your actual code in
Fine-tune performance attributes of key
sections of your code, such as:
• Cache behavior analysis
• Branch Prediction results
VTune’s Code Coaches
The C and Fortran Code Coaches offer specific
suggestions for modifications that
improve performance, for example:
• Loop interchange and loop-invariant code motion
• Converting scalar code to MMX™ code using intrinsic calls
• Traversing an array/list: binary search, hash tables
• Consider using MMX™ technology
• Consider using Intel’s Performance Libraries.
Intel Compiler Strategy
compilers track Intel
Available as a plug-in extension to
Microsoft Developer Studio
Compiler Performance Features
Floating-point optimizations
Floating-point alignment
Intel Performance Library Suite
Signal Processing Library V3.0
– DSP, Filtering, Transform and Telephony
Recognition Primitives Library V3.0
– Voice/Speech Recognition and Processing
Image Processing Library V1.0 Beta
– Photo Editing, Enhancement and Transform
Math Kernel Library V2.0
– Scientific and Technical Computation
Why Use Intel’s Performance Libraries?
Highly optimized for performance
An alternative to developing the code yourself
Stay current with the latest Intel Architectures
High-level programming language interface
Reduce development effort
VTune Coming Attractions
VTune Support For Java:
– First release is VTune 2.5:
• view performance in every component, including Java
• call graph
• working with major vendors.
• Beta this month.
VTune Coming Attractions
– Beta later this year.
– All OS events (Perfmon) and Events over time
– Process, Processor Views.
– Call graph
– Memory Pattern Analysis
– C++/ASM coach
– Dynamic Analysis for PII
Code Analysis of binary files
Perform code analysis without having
to build and sample an executable.
Accepts .exe, .dll or .obj files.
Zoom into source code by double-clicking
the desired function from
• If no source code is available, the binary is
C/C++, FTN or Assembly or ?? displayed.
Sample times are shown per source line.
– Source and disassembly can be intermixed.
Any of these file types can supply symbol
and line number information:
– .SYM, .HDR, .DMP, .DBG, .MAP, .PDB, C7/CV (NB09),
NB10, NB11, FB09, FB0A, DWARF 2.
Libraries Platform Support
operating system (Microsoft* Win 95 and
Microsoft* Win NT)
– Multi-Media Libraries (DLLs and Static Libraries)
» Intel C/C++ Compiler Plug-in
» MSVC/C++ V4.x
» Borland V5.0
– Math Kernel Library (Static Libraries)
» Intel Fortran 77 Compiler
» Microsoft Powerstation Fortran V4.0
» Watcom Fortran 77 V10.6
Multi-Media Library Features
– Optimized processor-specific DLLs and static libraries for:
• Intel486™ Processor
• Pentium® Processor
• Pentium® Processor with MMX™ Technology
• Pentium® Pro Processor
• Pentium® II Processor
– Processor detection and processor-specific DLL loading
C-callable programming interface
Support for integer and floating point data
Custom DLL Builder for minimum memory footprint
Math Kernel Library
Workstation technical computing library
– Basic Linear Algebra Subroutines (BLAS)
– Single and double precision FFTs
– Optimized kernels at every level of BLAS
– Level 3 BLAS is multithreaded;
performance scales on SMP systems
– Optimized for Pentium® Pro Processor
– Interfaces to several FORTRAN compilers
– Static libraries
– A DLL version will be available