Blame - third_party/gperftools/docs/cpuprofile.html - RealtimeRoboticsGroup/test

blob: c81feb6ae1f3df2e0d8ee58f8ce508cbe0934375 [file] [log] [blame]

Austin Schuh	745610d	2015-09-06 18:19:50 -0700	[diff] [blame]	1	<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
				2	<HTML>
				3
				4	<HEAD>
				5	<link rel="stylesheet" href="designstyle.css">
				6	<title>Gperftools CPU Profiler</title>
				7	</HEAD>
				8
				9	<BODY>
				10
				11	<p align=right>
				12	<i>Last modified
				13	<script type=text/javascript>
				14	var lm = new Date(document.lastModified);
				15	document.write(lm.toDateString());
				16	</script></i>
				17	</p>
				18
				19	<p>This is the CPU profiler we use at Google. There are three parts
				20	to using it: linking the library into an application, running the
				21	code, and analyzing the output.</p>
				22
				23	<p>On the off-chance that you should need to understand it, the CPU
				24	profiler data file format is documented separately,
				25	<a href="cpuprofile-fileformat.html">here</a>.
				26
				27
				28	<H1>Linking in the Library</H1>
				29
				30	<p>To install the CPU profiler into your executable, add
				31	<code>-lprofiler</code> to the link-time step for your executable.
				32	(It's also probably possible to add in the profiler at run-time using
				33	<code>LD_PRELOAD</code>, e.g.
				34	<code>% env LD_PRELOAD="/usr/lib/libprofiler.so" <binary></code>,
				35	but this isn't necessarily recommended.)</p>
				36
				37	<p>This does <i>not</i> turn on CPU profiling; it just inserts the
				38	code. For that reason, it's practical to just always link
				39	<code>-lprofiler</code> into a binary while developing; that's what we
				40	do at Google. (However, since any user can turn on the profiler by
				41	setting an environment variable, it's not necessarily recommended to
				42	install profiler-linked binaries into a production, running
				43	system.)</p>
				44
				45
				46	<H1>Running the Code</H1>
				47
				48	<p>There are several alternatives to actually turn on CPU profiling
				49	for a given run of an executable:</p>
				50
				51	<ol>
				52	<li> <p>Define the environment variable CPUPROFILE to the filename
				53	to dump the profile to. For instance, if you had a version of
				54	<code>/bin/ls</code> that had been linked against libprofiler,
				55	you could run:</p>
				56	<pre>% env CPUPROFILE=ls.prof /bin/ls</pre>
				57	</li>
				58	<li> <p>In addition to defining the environment variable CPUPROFILE
				59	you can also define CPUPROFILESIGNAL. This allows profiling to be
				60	controlled via the signal number that you specify. The signal number
				61	must be unused by the program under normal operation. Internally it
				62	acts as a switch, triggered by the signal, which is off by default.
				63	For instance, if you had a copy of <code>/bin/chrome</code> that had been
				64	been linked against libprofiler, you could run:</p>
				65	<pre>% env CPUPROFILE=chrome.prof CPUPROFILESIGNAL=12 /bin/chrome &</pre>
				66	<p>You can then trigger profiling to start:</p>
				67	<pre>% killall -12 chrome</pre>
				68	<p>Then after a period of time you can tell it to stop which will
				69	generate the profile:</p>
				70	<pre>% killall -12 chrome</pre>
				71	</li>
				72	<li> <p>In your code, bracket the code you want profiled in calls to
				73	<code>ProfilerStart()</code> and <code>ProfilerStop()</code>.
				74	(These functions are declared in <code><gperftools/profiler.h></code>.)
				75	<code>ProfilerStart()</code> will take
				76	the profile-filename as an argument.</p>
				77	</li>
				78	</ol>
				79
				80	<p>In Linux 2.6 and above, profiling works correctly with threads,
				81	automatically profiling all threads. In Linux 2.4, profiling only
				82	profiles the main thread (due to a kernel bug involving itimers and
				83	threads). Profiling works correctly with sub-processes: each child
				84	process gets its own profile with its own name (generated by combining
				85	CPUPROFILE with the child's process id).</p>
				86
				87	<p>For security reasons, CPU profiling will not write to a file -- and
				88	is thus not usable -- for setuid programs.</p>
				89
				90	<p>See the include-file <code>gperftools/profiler.h</code> for
				91	advanced-use functions, including <code>ProfilerFlush()</code> and
				92	<code>ProfilerStartWithOptions()</code>.</p>
				93
				94
				95	<H2>Modifying Runtime Behavior</H2>
				96
				97	<p>You can more finely control the behavior of the CPU profiler via
				98	environment variables.</p>
				99
				100	<table frame=box rules=sides cellpadding=5 width=100%>
				101
				102	<tr valign=top>
				103	<td><code>CPUPROFILE_FREQUENCY=<i>x</i></code></td>
				104	<td>default: 100</td>
				105	<td>
				106	How many interrupts/second the cpu-profiler samples.
				107	</td>
				108	</tr>
				109
				110	<tr valign=top>
				111	<td><code>CPUPROFILE_REALTIME=1</code></td>
				112	<td>default: [not set]</td>
				113	<td>
				114	If set to any value (including 0 or the empty string), use
				115	ITIMER_REAL instead of ITIMER_PROF to gather profiles. In
				116	general, ITIMER_REAL is not as accurate as ITIMER_PROF, and also
				117	interacts badly with use of alarm(), so prefer ITIMER_PROF unless
				118	you have a reason prefer ITIMER_REAL.
				119	</td>
				120	</tr>
				121
				122	</table>
				123
				124
				125	<h1><a name="pprof">Analyzing the Output</a></h1>
				126
				127	<p><code>pprof</code> is the script used to analyze a profile. It has
				128	many output modes, both textual and graphical. Some give just raw
				129	numbers, much like the <code>-pg</code> output of <code>gcc</code>,
				130	and others show the data in the form of a dependency graph.</p>
				131
				132	<p>pprof <b>requires</b> <code>perl5</code> to be installed to run.
				133	It also requires <code>dot</code> to be installed for any of the
				134	graphical output routines, and <code>gv</code> to be installed for
				135	<code>--gv</code> mode (described below).
				136	</p>
				137
				138	<p>Here are some ways to call pprof. These are described in more
				139	detail below.</p>
				140
				141	<pre>
				142	% pprof /bin/ls ls.prof
				143	Enters "interactive" mode
				144	% pprof --text /bin/ls ls.prof
				145	Outputs one line per procedure
				146	% pprof --gv /bin/ls ls.prof
				147	Displays annotated call-graph via 'gv'
				148	% pprof --gv --focus=Mutex /bin/ls ls.prof
				149	Restricts to code paths including a .Mutex. entry
				150	% pprof --gv --focus=Mutex --ignore=string /bin/ls ls.prof
				151	Code paths including Mutex but not string
				152	% pprof --list=getdir /bin/ls ls.prof
				153	(Per-line) annotated source listing for getdir()
				154	% pprof --disasm=getdir /bin/ls ls.prof
				155	(Per-PC) annotated disassembly for getdir()
				156	% pprof --text localhost:1234
				157	Outputs one line per procedure for localhost:1234
				158	% pprof --callgrind /bin/ls ls.prof
				159	Outputs the call information in callgrind format
				160	</pre>
				161
				162
				163	<h3>Analyzing Text Output</h3>
				164
				165	<p>Text mode has lines of output that look like this:</p>
				166	<pre>
				167	14 2.1% 17.2% 58 8.7% std::_Rb_tree::find
				168	</pre>
				169
				170	<p>Here is how to interpret the columns:</p>
				171	<ol>
				172	<li> Number of profiling samples in this function
				173	<li> Percentage of profiling samples in this function
				174	<li> Percentage of profiling samples in the functions printed so far
				175	<li> Number of profiling samples in this function and its callees
				176	<li> Percentage of profiling samples in this function and its callees
				177	<li> Function name
				178	</ol>
				179
				180	<h3>Analyzing Callgrind Output</h3>
				181
				182	<p>Use <a href="http://kcachegrind.sourceforge.net">kcachegrind</a> to
				183	analyze your callgrind output:</p>
				184	<pre>
				185	% pprof --callgrind /bin/ls ls.prof > ls.callgrind
				186	% kcachegrind ls.callgrind
				187	</pre>
				188
				189	<p>The cost is specified in 'hits', i.e. how many times a function
				190	appears in the recorded call stack information. The 'calls' from
				191	function a to b record how many times function b was found in the
				192	stack traces directly below function a.</p>
				193
				194	<p>Tip: if you use a debug build the output will include file and line
				195	number information and kcachegrind will show an annotated source
				196	code view.</p>
				197
				198	<h3>Node Information</h3>
				199
				200	<p>In the various graphical modes of pprof, the output is a call graph
				201	annotated with timing information, like so:</p>
				202
				203	<A HREF="pprof-test-big.gif">
				204	<center><table><tr><td>
				205	<img src="pprof-test.gif">
				206	</td></tr></table></center>
				207	</A>
				208
				209	<p>Each node represents a procedure. The directed edges indicate
				210	caller to callee relations. Each node is formatted as follows:</p>
				211
				212	<center><pre>
				213	Class Name
				214	Method Name
				215	local (percentage)
				216	<b>of</b> cumulative (percentage)
				217	</pre></center>
				218
				219	<p>The last one or two lines contains the timing information. (The
				220	profiling is done via a sampling method, where by default we take 100
				221	samples a second. Therefor one unit of time in the output corresponds
				222	to about 10 milliseconds of execution time.) The "local" time is the
				223	time spent executing the instructions directly contained in the
				224	procedure (and in any other procedures that were inlined into the
				225	procedure). The "cumulative" time is the sum of the "local" time and
				226	the time spent in any callees. If the cumulative time is the same as
				227	the local time, it is not printed.</p>
				228
				229	<p>For instance, the timing information for test_main_thread()
				230	indicates that 155 units (about 1.55 seconds) were spent executing the
				231	code in <code>test_main_thread()</code> and 200 units were spent while
				232	executing <code>test_main_thread()</code> and its callees such as
				233	<code>snprintf()</code>.</p>
				234
				235	<p>The size of the node is proportional to the local count. The
				236	percentage displayed in the node corresponds to the count divided by
				237	the total run time of the program (that is, the cumulative count for
				238	<code>main()</code>).</p>
				239
				240	<h3>Edge Information</h3>
				241
				242	<p>An edge from one node to another indicates a caller to callee
				243	relationship. Each edge is labelled with the time spent by the callee
				244	on behalf of the caller. E.g, the edge from
				245	<code>test_main_thread()</code> to <code>snprintf()</code> indicates
				246	that of the 200 samples in <code>test_main_thread()</code>, 37 are
				247	because of calls to <code>snprintf()</code>.</p>
				248
				249	<p>Note that <code>test_main_thread()</code> has an edge to
				250	<code>vsnprintf()</code>, even though <code>test_main_thread()</code>
				251	doesn't call that function directly. This is because the code was
				252	compiled with <code>-O2</code>; the profile reflects the optimized
				253	control flow.</p>
				254
				255	<h3>Meta Information</h3>
				256
				257	<p>The top of the display should contain some meta information
				258	like:</p>
				259	<pre>
				260	/tmp/profiler2_unittest
				261	Total samples: 202
				262	Focusing on: 202
				263	Dropped nodes with <= 1 abs(samples)
				264	Dropped edges with <= 0 samples
				265	</pre>
				266
				267	<p>This section contains the name of the program, and the total
				268	samples collected during the profiling run. If the
				269	<code>--focus</code> option is on (see the <a href="#focus">Focus</a>
				270	section below), the legend also contains the number of samples being
				271	shown in the focused display. Furthermore, some unimportant nodes and
				272	edges are dropped to reduce clutter. The characteristics of the
				273	dropped nodes and edges are also displayed in the legend.</p>
				274
				275	<h3><a name=focus>Focus and Ignore</a></h3>
				276
				277	<p>You can ask pprof to generate a display focused on a particular
				278	piece of the program. You specify a regular expression. Any portion
				279	of the call-graph that is on a path which contains at least one node
				280	matching the regular expression is preserved. The rest of the
				281	call-graph is dropped on the floor. For example, you can focus on the
				282	<code>vsnprintf()</code> libc call in <code>profiler2_unittest</code>
				283	as follows:</p>
				284
				285	<pre>
				286	% pprof --gv --focus=vsnprintf /tmp/profiler2_unittest test.prof
				287	</pre>
				288	<A HREF="pprof-vsnprintf-big.gif">
				289	<center><table><tr><td>
				290	<img src="pprof-vsnprintf.gif">
				291	</td></tr></table></center>
				292	</A>
				293
				294	<p>Similarly, you can supply the <code>--ignore</code> option to
				295	ignore samples that match a specified regular expression. E.g., if
				296	you are interested in everything except calls to
				297	<code>snprintf()</code>, you can say:</p>
				298	<pre>
				299	% pprof --gv --ignore=snprintf /tmp/profiler2_unittest test.prof
				300	</pre>
				301
				302
				303	<h3>Interactive mode</a></h3>
				304
				305	<p>By default -- if you don't specify any flags to the contrary --
				306	pprof runs in interactive mode. At the <code>(pprof)</code> prompt,
				307	you can run many of the commands described above. You can type
				308	<code>help</code> for a list of what commands are available in
				309	interactive mode.</p>
				310
				311	<h3><a name=options>pprof Options</a></h3>
				312
				313	For a complete list of pprof options, you can run <code>pprof
				314	--help</code>.
				315
				316	<h4>Output Type</h4>
				317
				318	<p>
				319	<center>
				320	<table frame=box rules=sides cellpadding=5 width=100%>
				321	<tr valign=top>
				322	<td><code>--text</code></td>
				323	<td>
				324	Produces a textual listing. (Note: If you have an X display, and
				325	<code>dot</code> and <code>gv</code> installed, you will probably
				326	be happier with the <code>--gv</code> output.)
				327	</td>
				328	</tr>
				329	<tr valign=top>
				330	<td><code>--gv</code></td>
				331	<td>
				332	Generates annotated call-graph, converts to postscript, and
				333	displays via gv (requres <code>dot</code> and <code>gv</code> be
				334	installed).
				335	</td>
				336	</tr>
				337	<tr valign=top>
				338	<td><code>--dot</code></td>
				339	<td>
				340	Generates the annotated call-graph in dot format and
				341	emits to stdout (requres <code>dot</code> be installed).
				342	</td>
				343	</tr>
				344	<tr valign=top>
				345	<td><code>--ps</code></td>
				346	<td>
				347	Generates the annotated call-graph in Postscript format and
				348	emits to stdout (requres <code>dot</code> be installed).
				349	</td>
				350	</tr>
				351	<tr valign=top>
				352	<td><code>--pdf</code></td>
				353	<td>
				354	Generates the annotated call-graph in PDF format and emits to
				355	stdout (requires <code>dot</code> and <code>ps2pdf</code> be
				356	installed).
				357	</td>
				358	</tr>
				359	<tr valign=top>
				360	<td><code>--gif</code></td>
				361	<td>
				362	Generates the annotated call-graph in GIF format and
				363	emits to stdout (requres <code>dot</code> be installed).
				364	</td>
				365	</tr>
				366	<tr valign=top>
				367	<td><code>--list=<<i>regexp</i>></code></td>
				368	<td>
				369	<p>Outputs source-code listing of routines whose
				370	name matches <regexp>. Each line
				371	in the listing is annotated with flat and cumulative
				372	sample counts.</p>
				373
				374	<p>In the presence of inlined calls, the samples
				375	associated with inlined code tend to get assigned
				376	to a line that follows the location of the
				377	inlined call. A more precise accounting can be
				378	obtained by disassembling the routine using the
				379	--disasm flag.</p>
				380	</td>
				381	</tr>
				382	<tr valign=top>
				383	<td><code>--disasm=<<i>regexp</i>></code></td>
				384	<td>
				385	Generates disassembly of routines that match
				386	<regexp>, annotated with flat and
				387	cumulative sample counts and emits to stdout.
				388	</td>
				389	</tr>
				390	</table>
				391	</center>
				392
				393	<h4>Reporting Granularity</h4>
				394
				395	<p>By default, pprof produces one entry per procedure. However you can
				396	use one of the following options to change the granularity of the
				397	output. The <code>--files</code> option seems to be particularly
				398	useless, and may be removed eventually.</p>
				399
				400	<center>
				401	<table frame=box rules=sides cellpadding=5 width=100%>
				402	<tr valign=top>
				403	<td><code>--addresses</code></td>
				404	<td>
				405	Produce one node per program address.
				406	</td>
				407	</tr>
				408	<td><code>--lines</code></td>
				409	<td>
				410	Produce one node per source line.
				411	</td>
				412	</tr>
				413	<td><code>--functions</code></td>
				414	<td>
				415	Produce one node per function (this is the default).
				416	</td>
				417	</tr>
				418	<td><code>--files</code></td>
				419	<td>
				420	Produce one node per source file.
				421	</td>
				422	</tr>
				423	</table>
				424	</center>
				425
				426	<h4>Controlling the Call Graph Display</h4>
				427
				428	<p>Some nodes and edges are dropped to reduce clutter in the output
				429	display. The following options control this effect:</p>
				430
				431	<center>
				432	<table frame=box rules=sides cellpadding=5 width=100%>
				433	<tr valign=top>
				434	<td><code>--nodecount=<n></code></td>
				435	<td>
				436	This option controls the number of displayed nodes. The nodes
				437	are first sorted by decreasing cumulative count, and then only
				438	the top N nodes are kept. The default value is 80.
				439	</td>
				440	</tr>
				441	<tr valign=top>
				442	<td><code>--nodefraction=<f></code></td>
				443	<td>
				444	This option provides another mechanism for discarding nodes
				445	from the display. If the cumulative count for a node is
				446	less than this option's value multiplied by the total count
				447	for the profile, the node is dropped. The default value
				448	is 0.005; i.e. nodes that account for less than
				449	half a percent of the total time are dropped. A node
				450	is dropped if either this condition is satisfied, or the
				451	--nodecount condition is satisfied.
				452	</td>
				453	</tr>
				454	<tr valign=top>
				455	<td><code>--edgefraction=<f></code></td>
				456	<td>
				457	This option controls the number of displayed edges. First of all,
				458	an edge is dropped if either its source or destination node is
				459	dropped. Otherwise, the edge is dropped if the sample
				460	count along the edge is less than this option's value multiplied
				461	by the total count for the profile. The default value is
				462	0.001; i.e., edges that account for less than
				463	0.1% of the total time are dropped.
				464	</td>
				465	</tr>
				466	<tr valign=top>
				467	<td><code>--focus=<re></code></td>
				468	<td>
				469	This option controls what region of the graph is displayed
				470	based on the regular expression supplied with the option.
				471	For any path in the callgraph, we check all nodes in the path
				472	against the supplied regular expression. If none of the nodes
				473	match, the path is dropped from the output.
				474	</td>
				475	</tr>
				476	<tr valign=top>
				477	<td><code>--ignore=<re></code></td>
				478	<td>
				479	This option controls what region of the graph is displayed
				480	based on the regular expression supplied with the option.
				481	For any path in the callgraph, we check all nodes in the path
				482	against the supplied regular expression. If any of the nodes
				483	match, the path is dropped from the output.
				484	</td>
				485	</tr>
				486	</table>
				487	</center>
				488
				489	<p>The dropped edges and nodes account for some count mismatches in
				490	the display. For example, the cumulative count for
				491	<code>snprintf()</code> in the first diagram above was 41. However
				492	the local count (1) and the count along the outgoing edges (12+1+20+6)
				493	add up to only 40.</p>
				494
				495
				496	<h1>Caveats</h1>
				497
				498	<ul>
				499	<li> If the program exits because of a signal, the generated profile
				500	will be <font color=red>incomplete, and may perhaps be
				501	completely empty</font>.
				502	<li> The displayed graph may have disconnected regions because
				503	of the edge-dropping heuristics described above.
				504	<li> If the program linked in a library that was not compiled
				505	with enough symbolic information, all samples associated
				506	with the library may be charged to the last symbol found
				507	in the program before the library. This will artificially
				508	inflate the count for that symbol.
				509	<li> If you run the program on one machine, and profile it on
				510	another, and the shared libraries are different on the two
				511	machines, the profiling output may be confusing: samples that
				512	fall within shared libaries may be assigned to arbitrary
				513	procedures.
				514	<li> If your program forks, the children will also be profiled
				515	(since they inherit the same CPUPROFILE setting). Each process
				516	is profiled separately; to distinguish the child profiles from
				517	the parent profile and from each other, all children will have
				518	their process-id appended to the CPUPROFILE name.
				519	<li> Due to a hack we make to work around a possible gcc bug, your
				520	profiles may end up named strangely if the first character of
				521	your CPUPROFILE variable has ascii value greater than 127.
				522	This should be exceedingly rare, but if you need to use such a
				523	name, just set prepend <code>./</code> to your filename:
				524	<code>CPUPROFILE=./Ägypten</code>.
				525	</ul>
				526
				527
				528	<hr>
				529	<address>Sanjay Ghemawat<br>
				530	<!-- Created: Tue Dec 19 10:43:14 PST 2000 -->
				531	<!-- hhmts start -->
				532	Last modified: Fri May 9 14:41:29 PDT 2008
				533	<!-- hhmts end -->
				534	</address>
				535	</BODY>
				536	</HTML>