If you are distributing the computations with MPI, then using an MPI-aware tool would give you more sensible results: with a distributed application, you might have issues of load imbalance, where one MPI process is idle waiting for data to come from other processes. If you happen to be profiling exactly that MPI process, your performance profile will be all wrong.
So, the first step is usually to find out about the communication and load-balance pattern of your program, and to identify a sample input that gives you the workload you want (e.g., CPU-intensive on rank 0). For instance,
mpiP is an MPI profiling tool that can produce a very complete report about the communication pattern, how much time each MPI call took, etc.
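For example, if your mpiP build supports it, you can preload it without relinking. This is only a sketch: the install path is an assumption, and -x is Open MPI's flag for exporting an environment variable (other launchers forward environment differently):

# Preload mpiP so it can intercept the MPI calls; no relinking needed.
# /opt/mpiP/lib/libmpiP.so is an assumed install path -- adjust it.
mpirun -np 8 -x LD_PRELOAD=/opt/mpiP/lib/libmpiP.so ./my_app
# mpiP then drops a text report (a file ending in .mpiP) with per-call
# timing and the communication pattern into the working directory.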
Then you can run a code-profiling tool on one or more selected MPI ranks. In any case, blindly using perf
on a single MPI rank is likely not a good idea, because its measurements will also include time spent inside the MPI library code, which is probably not what you are looking for.
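If you do decide to profile a selected rank with perf, one common trick is a tiny wrapper script around your application. This is just a sketch, and the rank environment variable depends on your MPI implementation (OMPI_COMM_WORLD_RANK for Open MPI, PMI_RANK for MPICH-derived MPIs, SLURM_PROCID under Slurm):

#!/bin/sh
# profile-rank0.sh (hypothetical name): run perf on rank 0 only,
# and let every other rank run unprofiled.
RANK="${OMPI_COMM_WORLD_RANK:-${PMI_RANK:-$SLURM_PROCID}}"
if [ "$RANK" = "0" ]; then
    exec perf record -g -o perf.rank0.data -- "$@"
else
    exec "$@"
fi

You would then launch with something like mpirun -np 8 ./profile-rank0.sh ./my_app.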
I know this question is pretty old (Feb '16), but here is a response in case it helps someone else.
The problem is that you've entered '-F 999', indicating that you want to sample the events at a frequency of 999 times a second. For 'trace' events, you generally don't want to do sampling: when I select sched:sched_switch, for instance, I want to see every context switch.
If you enter -F 999 then you will get only a sampling of the context switches...
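The exact command from the question isn't shown, but the problematic form would look something like this reconstruction:

# Sampled tracing: perf takes ~999 samples/second per CPU, so most of
# the sched_switch events never make it into the output file.
perf record -a -g -F 999 -e sched:sched_switch -- sleep 5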
If you look at the output of your 'perf record' cmd with something like:
perf script --verbose -I --header -i perf.dat -F comm,pid,tid,cpu,time,period,event,trace,ip,sym,dso > perf.txt
then you would see that the 'period' (the number between the timestamp and the event name) would not (usually) be == 1.
If you use a 'perf record' cmd without '-F 999' (like the one further below), you'll see a period of 1 in the 'perf script' output, like:
Binder:695_5 695/2077 [000] 16231.700440: 1 sched:sched_switch: prev_comm=Binder:695_5 prev_pid=2077 prev_prio=120 prev_state=S ==> next_comm=kworker/u16:17 next_pid=7665 next_prio=120
A long-winded explanation, but basically: don't do that (where 'that' is '-F 999').
If you just do something like:
perf record -a -g -e sched:sched_switch -e sched:sched_blocked_reason -e sched:sched_stat_sleep -e sched:sched_stat_wait sleep 5
then the output would show every context switch with the call stack for each event.
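Since perf record writes to ./perf.data by default, you can then browse the events and their call chains with:

# dump every recorded event plus its call stack
perf script | less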
And you might need to do:
echo 1 > /proc/sys/kernel/sched_schedstats
to get the sched_stat events.
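Equivalently (and a bit easier to put in a script), you can flip the same knob via sysctl:

# same effect as the echo above; read it back with
# 'sysctl kernel.sched_schedstats' to check the current value
sysctl -w kernel.sched_schedstats=1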
This is an old question, but this is now possible with
--call-graph dwarf
(see the perf-record man page). I believe this requires a somewhat recent Linux kernel (>= 3.9? I'm not entirely sure). You can check whether your distro's perf package is linked with libdw or libunwind with
readelf -d $(which perf) | grep -e libdw -e libunwind
On Fedora 20, perf is linked with libdw.
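A typical invocation would then look like this (the binary name is illustrative, and your program should be built with debug info, e.g. -g, so there is DWARF data to unwind with):

# DWARF-based call graphs copy chunks of the stack into perf.data,
# so expect a noticeably bigger file than with frame pointers.
perf record --call-graph dwarf ./my_app
perf report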