callgrind
Callgrind is a tracing profiler which records the function call history of a target program and collects the number of executed instructions. It is part of the valgrind tool suite.
Profiling data is collected by instrumentation rather than sampling of the target program.
Callgrind does not capture the actual time spent in a function but computes the
inclusive & exclusive cost of a function based on the instructions fetched
(Ir = Instruction read). This provides reproducibility and high precision and
is a major difference from sampling profilers like perf or vtune.
Because only instructions are counted, effects like slow IO are not reflected,
which should be kept in mind when analyzing callgrind results.
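The relation between inclusive and exclusive cost can be sketched as follows. This is only an illustration: the CallNode type and the Ir numbers are made up for the example and do not reflect callgrind's internal data model or output format.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical call-tree node (an assumption for illustration only).
struct CallNode {
    std::uint64_t exclusive_ir;     // Ir executed in the function body itself
    std::vector<CallNode> callees;  // functions called from this function
};

// Inclusive cost = exclusive cost of the function + inclusive cost of all callees.
std::uint64_t inclusive_ir(const CallNode& node) {
    std::uint64_t total = node.exclusive_ir;
    for (const CallNode& callee : node.callees) {
        total += inclusive_ir(callee);
    }
    return total;
}

// Made-up example: main_fn (5 Ir) calls compute (100 Ir), which calls helper (20 Ir).
CallNode example_tree() {
    CallNode helper{20, {}};
    CallNode compute{100, {helper}};
    CallNode main_fn{5, {compute}};
    return main_fn;
}
```

In this made-up tree, compute has an exclusive cost of 100 Ir and an inclusive cost of 120 Ir, while main_fn has an inclusive cost of 125 Ir.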
By default the profiler data is dumped when the target process terminates, but callgrind_control allows for interactive control of callgrind.
# Run a program under callgrind.
valgrind --tool=callgrind -- <prog> [<args>]
# Interactive control of callgrind.
callgrind_control [opts]
opts:
-b ............. show current backtrace
-e ............. show current event counters
-s ............. show current stats
--dump[=file] .. dump current collection
-i=on|off ...... turn instrumentation on|off
Results can be analyzed by using one of the following tools:
- callgrind_annotate (cli)
  # Show only specific trace events (default is all).
  callgrind_annotate --show=Ir,Dr,Dw [callgrind_out_file]
- kcachegrind (ui)
The following is a collection of frequently used callgrind options.
valgrind --tool=callgrind [opts] -- <prog>
opts:
--callgrind-out-file=<file> .... output file, rather than callgrind.out.<pid>
--dump-instr=<yes|no> .......... annotation on instruction level,
                                 allows for asm annotations
--instr-atstart=<yes|no> ....... control if instrumentation is enabled from
                                 beginning of the program
--separate-threads=<yes|no> .... create separate output files per thread,
                                 appends -<thread_id> to the output file
--cache-sim=<yes|no> ........... control if cache simulation is enabled
Trace events
By default callgrind collects the following events:
- Ir: Instruction read
Callgrind also provides a functional cache simulation with its own model,
which is enabled by passing --cache-sim=yes.
This simulates a 2-level cache hierarchy with separate L1 instruction and
data caches (L1i / L1d) and a unified last level (LL) cache.
When enabled, this collects the following additional events:
- I1mr: L1 cache miss on instruction read
- ILmr: LL cache miss on instruction read
- Dr: Data read access
- D1mr: L1 cache miss on data read
- DLmr: LL cache miss on data read
- Dw: Data write access
- D1mw: L1 cache miss on data write
- DLmw: LL cache miss on data write
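These raw counters are usually easier to interpret as miss rates. The following is a minimal sketch of one common way to derive them. The CacheEvents struct and the helper functions are hypothetical and not part of any callgrind API, and the totals would have to be copied by hand from the profiler output.

```cpp
#include <cstdint>

// Event totals named after the events listed above.
// The struct itself is an assumption made for this example.
struct CacheEvents {
    std::uint64_t Ir, I1mr, ILmr;
    std::uint64_t Dr, D1mr, DLmr;
    std::uint64_t Dw, D1mw, DLmw;
};

// Misses per access; returns 0 when there were no accesses at all.
double ratio(std::uint64_t misses, std::uint64_t accesses) {
    return accesses == 0
               ? 0.0
               : static_cast<double>(misses) / static_cast<double>(accesses);
}

// L1 instruction cache: misses per instruction read.
double l1i_miss_rate(const CacheEvents& e) { return ratio(e.I1mr, e.Ir); }

// L1 data cache: read + write misses per data access.
double l1d_miss_rate(const CacheEvents& e) {
    return ratio(e.D1mr + e.D1mw, e.Dr + e.Dw);
}

// Last-level cache: all LL misses relative to all accesses (Ir + Dr + Dw).
double ll_miss_rate(const CacheEvents& e) {
    return ratio(e.ILmr + e.DLmr + e.DLmw, e.Ir + e.Dr + e.Dw);
}
```

Since the events come from a simulated cache model, the resulting rates describe that model and can differ from the behavior of the real hardware.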
Profile a specific part of the target
Programmatically enable/disable instrumentation using the macros defined in the callgrind header.
#include <valgrind/callgrind.h>

int main() {
    // init ..

    // Only the code between the two macros is instrumented.
    CALLGRIND_START_INSTRUMENTATION;
    compute();
    CALLGRIND_STOP_INSTRUMENTATION;

    // shutdown ..
}
In this case, callgrind should be launched with --instr-atstart=no.
Alternatively, instrumentation can be controlled with callgrind_control -i on|off.
The files cg_example.cc and Makefile provide a full example.
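As a possible variation on the snippet above, the start/stop pair can be wrapped in a small scope guard, with a no-op fallback so the code still builds on machines without the valgrind headers. This is a sketch only: InstrumentationScope, profiled_compute and the compute(int) stand-in are hypothetical names introduced here and are not taken from cg_example.cc.

```cpp
// Use the real client-request macros when the valgrind headers are installed,
// otherwise fall back to no-ops so the program still compiles.
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#  endif
#endif
#ifndef CALLGRIND_START_INSTRUMENTATION
#  define CALLGRIND_START_INSTRUMENTATION do {} while (0)
#  define CALLGRIND_STOP_INSTRUMENTATION  do {} while (0)
#endif

// Hypothetical helper: instrumentation is on for the lifetime of the object.
class InstrumentationScope {
public:
    InstrumentationScope() { CALLGRIND_START_INSTRUMENTATION; }
    ~InstrumentationScope() { CALLGRIND_STOP_INSTRUMENTATION; }
    InstrumentationScope(const InstrumentationScope&) = delete;
    InstrumentationScope& operator=(const InstrumentationScope&) = delete;
};

// Stand-in for the compute() call in the snippet above: sums 1..n.
long compute(int n) {
    long sum = 0;
    for (int i = 1; i <= n; ++i) {
        sum += i;
    }
    return sum;
}

long profiled_compute(int n) {
    InstrumentationScope scope;  // instrumentation turned on here
    return compute(n);           // turned off again when scope is destroyed
}
```

As with the macro version, the program would be launched with --instr-atstart=no so that only the guarded region is instrumented. Outside of valgrind the macros have no effect on what the program computes.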