<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<HTML>

<HEAD>
  <link rel="stylesheet" href="designstyle.css">
  <title>Gperftools Heap Profiler</title>
</HEAD>

<BODY>

<p align=right>
  <i>Last modified
  <script type=text/javascript>
    var lm = new Date(document.lastModified);
    document.write(lm.toDateString());
  </script></i>
</p>

<p>This is the heap profiler we use at Google to explore how C++
programs manage memory.  This facility can be useful for</p>
<ul>
  <li> Figuring out what is in the program heap at any given time
  <li> Locating memory leaks
  <li> Finding places that do a lot of allocation
</ul>

<p>The profiling system instruments all allocations and frees.  It
keeps track of various pieces of information per allocation site.  An
allocation site is defined as the active stack trace at the call to
<code>malloc</code>, <code>calloc</code>, <code>realloc</code>, or
<code>new</code>.</p>

<p>There are three parts to using it: linking the library into an
application, running the code, and analyzing the output.</p>


<h1>Linking in the Library</h1>

<p>To install the heap profiler into your executable, add
<code>-ltcmalloc</code> to the link-time step for your executable.
Also, while we don't necessarily recommend this form of usage, it's
possible to add in the profiler at run-time using
<code>LD_PRELOAD</code>:</p>
<pre>% env LD_PRELOAD="/usr/lib/libtcmalloc.so" &lt;binary&gt;</pre>

<p>This does <i>not</i> turn on heap profiling; it just inserts the
code.  For that reason, it's practical to just always link
<code>-ltcmalloc</code> into a binary while developing; that's what we
do at Google.  (However, since any user can turn on the profiler by
setting an environment variable, it's not necessarily recommended to
install profiler-linked binaries into a running production
system.)  Note that if you wish to use the heap profiler, you must
also use the tcmalloc memory-allocation library.  There is currently
no way to use the heap profiler separately from tcmalloc.</p>


<h1>Running the Code</h1>

<p>There are several ways to turn on heap profiling
for a given run of an executable:</p>

<ol>
  <li> <p>Set the environment variable <code>HEAPPROFILE</code> to the
       filename to dump the profile to.  For instance, to profile
       <code>/usr/local/bin/my_binary_compiled_with_tcmalloc</code>:</p>
       <pre>% env HEAPPROFILE=/tmp/mybin.hprof /usr/local/bin/my_binary_compiled_with_tcmalloc</pre>
  <li> <p>In your code, bracket the code you want profiled in calls to
       <code>HeapProfilerStart()</code> and <code>HeapProfilerStop()</code>.
       (These functions are declared in <code>&lt;gperftools/heap-profiler.h&gt;</code>.)
       <code>HeapProfilerStart()</code> takes the
       profile-filename-prefix as an argument.  Then, as often as
       you'd like before calling <code>HeapProfilerStop()</code>, you
       can use <code>HeapProfilerDump()</code> or
       <code>GetHeapProfile()</code> to examine the profile.  In case
       it's useful, <code>IsHeapProfilerRunning()</code> will tell you
       whether you've already called <code>HeapProfilerStart()</code>.</p>
</ol>
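<p>As a minimal sketch of the second option, the helper below brackets a
piece of work with the profiler calls named above.
<code>RunWithHeapProfile</code> is a hypothetical name, not part of
gperftools.  The declarations are written out by hand and marked weak (a
GCC/Clang extension, assumed here) only so that the sketch still builds
when tcmalloc is not linked in; real code would include
<code>&lt;gperftools/heap-profiler.h&gt;</code> instead.</p>

```cpp
#include <functional>

// Hand-written declarations for the functions this document says are
// declared in <gperftools/heap-profiler.h>. The weak attribute is an
// assumption of this sketch: it lets the code link without tcmalloc.
extern "C" {
void HeapProfilerStart(const char* prefix) __attribute__((weak));
int IsHeapProfilerRunning() __attribute__((weak));
void HeapProfilerDump(const char* reason) __attribute__((weak));
void HeapProfilerStop() __attribute__((weak));
}

// Hypothetical helper: runs `work` with heap profiling bracketed around it.
// Returns true only if the profiler was available and started by this call.
bool RunWithHeapProfile(const char* prefix,
                        const std::function<void()>& work) {
  const bool available = HeapProfilerStart && IsHeapProfilerRunning &&
                         HeapProfilerDump && HeapProfilerStop;
  // Leave an already-running profiler (e.g. started via HEAPPROFILE) alone.
  const bool started = available && !IsHeapProfilerRunning();
  if (started) HeapProfilerStart(prefix);
  work();
  if (started) {
    HeapProfilerDump("end of profiled region");  // dump before stopping
    HeapProfilerStop();
  }
  return started;
}
```

<p>A call such as <code>RunWithHeapProfile("/tmp/mybin.hprof", [] { /* code
to profile */ });</code> then profiles only the bracketed region.</p>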


<p>For security reasons, heap profiling will not write to a file --
and is thus not usable -- for setuid programs.</p>

<H2>Modifying Runtime Behavior</H2>

<p>You can more finely control the behavior of the heap profiler via
environment variables.</p>

<table frame=box rules=sides cellpadding=5 width=100%>

<tr valign=top>
  <td><code>HEAP_PROFILE_ALLOCATION_INTERVAL</code></td>
  <td>default: 1073741824 (1 GB)</td>
  <td>
    Dump heap profiling information each time the specified number of
    bytes has been allocated by the program.
  </td>
</tr>

<tr valign=top>
  <td><code>HEAP_PROFILE_INUSE_INTERVAL</code></td>
  <td>default: 104857600 (100 MB)</td>
  <td>
    Dump heap profiling information whenever the high-water memory
    usage mark increases by the specified number of bytes.
  </td>
</tr>

<tr valign=top>
  <td><code>HEAP_PROFILE_TIME_INTERVAL</code></td>
  <td>default: 0</td>
  <td>
    Dump heap profiling information each time the specified
    number of seconds has elapsed.
  </td>
</tr>

<tr valign=top>
  <td><code>HEAPPROFILESIGNAL</code></td>
  <td>default: disabled</td>
  <td>
    Dump heap profiling information whenever the specified signal is sent to the
    process.
  </td>
</tr>

<tr valign=top>
  <td><code>HEAP_PROFILE_MMAP</code></td>
  <td>default: false</td>
  <td>
    Profile <code>mmap</code>, <code>mremap</code>, and <code>sbrk</code>
    calls in addition
    to <code>malloc</code>, <code>calloc</code>, <code>realloc</code>,
    and <code>new</code>.  <b>NOTE:</b> this causes the profiler to
    profile calls internal to tcmalloc, since tcmalloc and friends use
    <code>mmap</code> and <code>sbrk</code> internally for allocations.
    One partial solution is
    to filter these allocations out when running <code>pprof</code>,
    with something like
    <code>pprof --ignore='DoAllocWithArena|SbrkSysAllocator::Alloc|MmapSysAllocator::Alloc'</code>.
  </td>
</tr>

<tr valign=top>
  <td><code>HEAP_PROFILE_ONLY_MMAP</code></td>
  <td>default: false</td>
  <td>
    Only profile <code>mmap</code>, <code>mremap</code>, and <code>sbrk</code>
    calls; do not profile
    <code>malloc</code>, <code>calloc</code>, <code>realloc</code>,
    or <code>new</code>.
  </td>
</tr>

<tr valign=top>
  <td><code>HEAP_PROFILE_MMAP_LOG</code></td>
  <td>default: false</td>
  <td>
    Log <code>mmap</code>/<code>munmap</code> calls.
  </td>
</tr>

</table>
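
<p>To make the two byte-based intervals concrete, here is a simplified model
of when a dump would be triggered.  It is a sketch of the rules described in
the table, not the profiler's actual code; <code>DumpPolicy</code> and
<code>ShouldDump</code> are hypothetical names, and treating 0 as "check
disabled" is an assumption of the sketch.</p>

```cpp
#include <cstdint>

// Simplified, hypothetical model of the interval checks from the table.
struct DumpPolicy {
  int64_t allocation_interval;  // HEAP_PROFILE_ALLOCATION_INTERVAL (bytes)
  int64_t inuse_interval;       // HEAP_PROFILE_INUSE_INTERVAL (bytes)
  int64_t last_dump_alloc;      // total bytes allocated at the last dump
  int64_t high_water_mark;      // in-use bytes at the last in-use dump
};

// Returns true if a profile dump is due, and updates the bookkeeping.
// Assumption: an interval of 0 disables the corresponding check.
bool ShouldDump(DumpPolicy& p, int64_t total_allocated, int64_t in_use) {
  bool dump = false;
  // Rule 1: enough new bytes have been allocated since the last dump.
  if (p.allocation_interval > 0 &&
      total_allocated >= p.last_dump_alloc + p.allocation_interval) {
    dump = true;
  }
  // Rule 2: the in-use high-water mark grew by at least the interval.
  if (p.inuse_interval > 0 &&
      in_use >= p.high_water_mark + p.inuse_interval) {
    p.high_water_mark = in_use;
    dump = true;
  }
  if (dump) p.last_dump_alloc = total_allocated;
  return dump;
}
```

<p>With the defaults from the table, the first rule fires after each further
1073741824 allocated bytes and the second after each 104857600-byte rise in
peak usage.</p>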

<H2>Checking for Leaks</H2>

<p>You can use the heap profiler to manually check for leaks, for
instance by reading the profiler output and looking for large
allocations.  However, for that task, it's easier to use the <A
HREF="heap_checker.html">automatic heap-checking facility</A> built
into tcmalloc.</p>


<h1><a name="pprof">Analyzing the Output</a></h1>

<p>If heap-profiling is turned on in a program, the program will
periodically write profiles to the filesystem.  The sequence of
profiles will be named:</p>
<pre>
  &lt;prefix&gt;.0000.heap
  &lt;prefix&gt;.0001.heap
  &lt;prefix&gt;.0002.heap
  ...
</pre>
<p>where <code>&lt;prefix&gt;</code> is the filename-prefix supplied
when running the code (e.g. via the <code>HEAPPROFILE</code>
environment variable).  Note that if the supplied prefix
does not start with a <code>/</code>, the profile files will be
written to the program's working directory.</p>
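
<p>The naming scheme can be illustrated with a few lines of C++.  This only
reproduces the pattern shown above for scripting purposes;
<code>ProfileFileName</code> is a hypothetical helper, not something the
profiler exports.</p>

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: builds "<prefix>.NNNN.heap" for a given sequence
// number, following the naming pattern described in the text.
std::string ProfileFileName(const std::string& prefix, int sequence) {
  char suffix[32];
  // Four digits, zero padded, followed by the ".heap" extension.
  std::snprintf(suffix, sizeof(suffix), ".%04d.heap", sequence);
  return prefix + suffix;
}
```

<p>For example, <code>ProfileFileName("/tmp/profile", 100)</code> yields
<code>/tmp/profile.0100.heap</code>.</p>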

<p>The profile output can be viewed by passing it to the
<code>pprof</code> tool -- the same tool that's used to analyze <A
HREF="cpuprofile.html">CPU profiles</A>.</p>

<p>Here are some examples.  These examples assume the binary is named
<code>gfs_master</code>, and a sequence of heap profile files can be
found in files named:</p>
<pre>
  /tmp/profile.0001.heap
  /tmp/profile.0002.heap
  ...
  /tmp/profile.0100.heap
</pre>

<h3>Why is a process so big</h3>

<pre>
  % pprof --gv gfs_master /tmp/profile.0100.heap
</pre>

<p>This command will pop up a <code>gv</code> window that displays
the profile information as a directed graph.  Here is a portion
of the resulting output:</p>

<p><center>
<img src="heap-example1.png">
</center></p>

<p>A few explanations:</p>
<ul>
<li> <code>GFS_MasterChunk::AddServer</code> accounts for 255.6 MB
     of the live memory, which is 25% of the total live memory.
<li> <code>GFS_MasterChunkTable::UpdateState</code> is directly
     accountable for 176.2 MB of the live memory (i.e., it directly
     allocated 176.2 MB that has not been freed yet).  Furthermore,
     it and its callees are responsible for 729.9 MB.  The
     labels on the outgoing edges give a good indication of the
     amount allocated by each callee.
</ul>

<h3>Comparing Profiles</h3>

<p>You often want to skip allocations during the initialization phase
of a program so you can find gradual memory leaks.  One simple way to
do this is to compare two profiles -- both collected after the program
has been running for a while.  Specify the name of the first profile
using the <code>--base</code> option.  For example:</p>
<pre>
   % pprof --base=/tmp/profile.0004.heap gfs_master /tmp/profile.0100.heap
</pre>

<p>The memory usage in <code>/tmp/profile.0004.heap</code> will be
subtracted from the memory usage in
<code>/tmp/profile.0100.heap</code> and the result will be
displayed.</p>
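
<p>Conceptually, the subtraction works per allocation site.  The sketch below
models a profile as a map from a site name to in-use bytes, which is a
deliberate simplification (real profiles are keyed by stack trace);
<code>SiteBytes</code> and <code>SubtractBase</code> are hypothetical names
used only for illustration.</p>

```cpp
#include <cstdint>
#include <map>
#include <string>

// Simplified model: allocation site name -> in-use bytes.
using SiteBytes = std::map<std::string, int64_t>;

// Returns later - base for every site that appears in either profile.
SiteBytes SubtractBase(const SiteBytes& base, const SiteBytes& later) {
  SiteBytes delta = later;
  for (const auto& entry : base) {
    // operator[] creates a zero entry for sites missing from `later`,
    // so memory freed since the base profile shows up as negative.
    delta[entry.first] -= entry.second;
  }
  return delta;
}
```

<p>Sites whose usage is unchanged since the base profile come out as zero,
which is what makes steadily growing sites stand out.</p>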

<h3>Text display</h3>

<pre>
% pprof --text gfs_master /tmp/profile.0100.heap
   255.6  24.7%  24.7%    255.6  24.7% GFS_MasterChunk::AddServer
   184.6  17.8%  42.5%    298.8  28.8% GFS_MasterChunkTable::Create
   176.2  17.0%  59.5%    729.9  70.5% GFS_MasterChunkTable::UpdateState
   169.8  16.4%  75.9%    169.8  16.4% PendingClone::PendingClone
    76.3   7.4%  83.3%     76.3   7.4% __default_alloc_template::_S_chunk_alloc
    49.5   4.8%  88.0%     49.5   4.8% hashtable::resize
    ...
</pre>

<ul>
  <li> The first column contains the direct memory use in MB.
  <li> The fourth column contains memory use by the procedure
       and all of its callees.
  <li> The second and fifth columns are just percentage
       representations of the numbers in the first and fourth columns.
  <li> The third column is a cumulative sum of the second column
       (i.e., the <code>k</code>th entry in the third column is the
       sum of the first <code>k</code> entries in the second column).
</ul>
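
<p>The relationship between the first three columns can be written down
directly.  The following sketch derives the percentage and running-sum
columns from the direct memory use; <code>Row</code> and
<code>TextColumns</code> are hypothetical names, and the total is passed in
explicitly rather than read from a profile.</p>

```cpp
#include <vector>

// One output row: columns one to three of the text display.
struct Row {
  double flat_mb;   // column 1: direct memory use in MB
  double flat_pct;  // column 2: column 1 as a percentage of the total
  double sum_pct;   // column 3: running sum of column 2
};

// Hypothetical helper: computes columns 2 and 3 from column 1.
std::vector<Row> TextColumns(const std::vector<double>& flat_mb,
                             double total_mb) {
  std::vector<Row> rows;
  double running = 0.0;
  for (double mb : flat_mb) {
    const double pct = 100.0 * mb / total_mb;
    running += pct;  // the k-th sum covers the first k entries
    rows.push_back({mb, pct, running});
  }
  return rows;
}
```

<p>The fifth column is obtained the same way as the second, using the fourth
column as input.</p>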

<h3>Ignoring or focusing on specific regions</h3>

<p>The following command will give a graphical display of a subset of
the call-graph.  Only paths in the call-graph that match the regular
expression <code>DataBuffer</code> are included:</p>
<pre>
% pprof --gv --focus=DataBuffer gfs_master /tmp/profile.0100.heap
</pre>

<p>Similarly, the following command will omit a subset of the
call-graph.  All paths in the call-graph that match the regular
expression <code>DataBuffer</code> are discarded:</p>
<pre>
% pprof --gv --ignore=DataBuffer gfs_master /tmp/profile.0100.heap
</pre>
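
<p>The effect of the two options can be modeled as a filter over call paths.
This is a rough sketch under the assumption that a path "matches" when any
of its frames matches the regular expression; <code>Path</code>,
<code>AnyFrameMatches</code>, and <code>FilterPaths</code> are hypothetical
names, and an empty pattern stands for an option that was not given.</p>

```cpp
#include <regex>
#include <string>
#include <vector>

// One call path, outermost frame first.
using Path = std::vector<std::string>;

// True if any frame in the path matches the regular expression.
bool AnyFrameMatches(const Path& path, const std::regex& re) {
  for (const auto& frame : path) {
    if (std::regex_search(frame, re)) return true;
  }
  return false;
}

// Keeps paths that match `focus` (if given) and drops paths that match
// `ignore` (if given). Empty strings mean "option not supplied".
std::vector<Path> FilterPaths(const std::vector<Path>& paths,
                              const std::string& focus,
                              const std::string& ignore) {
  std::vector<Path> kept;
  for (const auto& path : paths) {
    if (!focus.empty() && !AnyFrameMatches(path, std::regex(focus))) continue;
    if (!ignore.empty() && AnyFrameMatches(path, std::regex(ignore))) continue;
    kept.push_back(path);
  }
  return kept;
}
```

<p>In this model, <code>FilterPaths(paths, "DataBuffer", "")</code>
corresponds to the <code>--focus</code> command and
<code>FilterPaths(paths, "", "DataBuffer")</code> to the
<code>--ignore</code> command.</p>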

<h3>Total allocations + object-level information</h3>

<p>All of the previous examples have displayed the amount of in-use
space, i.e., the number of bytes that have been allocated but not
freed.  You can also get other types of information by supplying a
flag to <code>pprof</code>:</p>

<center>
<table frame=box rules=sides cellpadding=5 width=100%>

<tr valign=top>
  <td><code>--inuse_space</code></td>
  <td>
     Display the number of in-use megabytes (i.e. space that has
     been allocated but not freed).  This is the default.
  </td>
</tr>

<tr valign=top>
  <td><code>--inuse_objects</code></td>
  <td>
     Display the number of in-use objects (i.e. number of
     objects that have been allocated but not freed).
  </td>
</tr>

<tr valign=top>
  <td><code>--alloc_space</code></td>
  <td>
     Display the number of allocated megabytes.  This includes
     the space that has since been de-allocated.  Use this
     if you want to find the main allocation sites in the
     program.
  </td>
</tr>

<tr valign=top>
  <td><code>--alloc_objects</code></td>
  <td>
     Display the number of allocated objects.  This includes
     the objects that have since been de-allocated.  Use this
     if you want to find the main allocation sites in the
     program.
  </td>
</tr>

</table>
</center>
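
<p>The four views differ only in which counters they report.  As an
illustration, assume each allocation site carries the four counters below
(<code>SiteStats</code> and its field names are hypothetical); the views
described in the table then reduce to simple arithmetic.</p>

```cpp
#include <cstdint>

// Hypothetical per-site counters accumulated while profiling.
struct SiteStats {
  int64_t allocs;       // number of objects ever allocated at this site
  int64_t alloc_bytes;  // bytes ever allocated at this site
  int64_t frees;        // number of those objects freed so far
  int64_t free_bytes;   // bytes freed so far
};

// --alloc_objects / --alloc_space: everything ever allocated,
// including objects and space that have since been de-allocated.
int64_t AllocObjects(const SiteStats& s) { return s.allocs; }
int64_t AllocSpace(const SiteStats& s) { return s.alloc_bytes; }

// --inuse_objects / --inuse_space: allocated but not yet freed.
int64_t InuseObjects(const SiteStats& s) { return s.allocs - s.frees; }
int64_t InuseSpace(const SiteStats& s) { return s.alloc_bytes - s.free_bytes; }
```

<p>A site with a large <code>AllocSpace</code> but a small
<code>InuseSpace</code> allocates heavily without holding on to memory, which
is why the <code>--alloc_*</code> views are the ones to use for finding busy
allocation sites.</p>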


<h3>Interactive mode</h3>

<p>By default -- if you don't specify any flags to the contrary --
pprof runs in interactive mode.  At the <code>(pprof)</code> prompt,
you can run many of the commands described above.  You can type
<code>help</code> for a list of what commands are available in
interactive mode.</p>



<h1>Caveats</h1>

<ul>
  <li> Heap profiling requires the use of libtcmalloc.  This
       requirement may be removed in a future version of the heap
       profiler, and the heap profiler separated out into its own
       library.

  <li> If the program linked in a library that was not compiled
       with enough symbolic information, all samples associated
       with the library may be charged to the last symbol found
       in the program before the library.  This will artificially
       inflate the count for that symbol.

  <li> If you run the program on one machine, and profile it on
       another, and the shared libraries are different on the two
       machines, the profiling output may be confusing: samples that
       fall within the shared libraries may be assigned to arbitrary
       procedures.

  <li> Several libraries, such as some STL implementations, do their
       own memory management.  This may cause strange profiling
       results.  We have code in libtcmalloc to cause STL to use
       tcmalloc for memory management (which in our tests is better
       than STL's internal management), though it only works for some
       STL implementations.

  <li> If your program forks, the children will also be profiled
       (since they inherit the same HEAPPROFILE setting).  Each
       process is profiled separately; to distinguish the child
       profiles from the parent profile and from each other, all
       children will have their process-id attached to the HEAPPROFILE
       name.

  <li> Due to a hack we make to work around a possible gcc bug, your
       profiles may end up named strangely if the first character of
       your HEAPPROFILE variable has an ASCII value greater than 127.
       This should be exceedingly rare, but if you need to use such a
       name, just prepend <code>./</code> to your filename:
       <code>HEAPPROFILE=./&Auml;gypten</code>.
</ul>

<hr>
<address>Sanjay Ghemawat
<!-- Created: Tue Dec 19 10:43:14 PST 2000 -->
</address>
</body>
</html>