<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<HTML>

<HEAD>
<link rel="stylesheet" href="designstyle.css">
<title>Gperftools Heap Profiler</title>
</HEAD>

<BODY>

<p align=right>
<i>Last modified
<script type=text/javascript>
var lm = new Date(document.lastModified);
document.write(lm.toDateString());
</script></i>
</p>

<p>This is the heap profiler we use at Google, to explore how C++
programs manage memory. This facility can be useful for</p>
<ul>
<li> Figuring out what is in the program heap at any given time
<li> Locating memory leaks
<li> Finding places that do a lot of allocation
</ul>

<p>The profiling system instruments all allocations and frees. It
keeps track of various pieces of information per allocation site. An
allocation site is defined as the active stack trace at the call to
<code>malloc</code>, <code>calloc</code>, <code>realloc</code>, or
<code>new</code>.</p>
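<p>As an illustration of this bookkeeping, the following C++ sketch keys per-site counters by the full stack trace and charges each free back to the site that allocated the block. It is a simplified model, not tcmalloc's implementation: <code>SiteTable</code>, <code>SiteStats</code> and <code>StackTrace</code> are hypothetical names, and stack frames are function-name strings rather than the return addresses a real profiler records.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stack trace: function names, outermost caller first.
// A real profiler records return addresses instead of names.
using StackTrace = std::vector<std::string>;

// Counters kept for one allocation site.
struct SiteStats {
  std::size_t allocs = 0;       // number of allocations made at this site
  std::size_t frees = 0;        // number of those that were later freed
  std::size_t alloc_bytes = 0;  // total bytes ever allocated at this site
  std::size_t free_bytes = 0;   // total bytes later freed
};

class SiteTable {
 public:
  // Called for every malloc/calloc/realloc/new, with the active stack trace.
  void RecordAlloc(const void* ptr, std::size_t bytes, const StackTrace& site) {
    assert(live_.count(ptr) == 0);  // an address cannot be live twice
    SiteStats& s = sites_[site];
    s.allocs += 1;
    s.alloc_bytes += bytes;
    live_[ptr] = Live{site, bytes};
  }

  // Called for every free/delete. The freed bytes are charged back to the
  // site that allocated the block, not to the caller of free().
  void RecordFree(const void* ptr) {
    auto it = live_.find(ptr);
    if (it == live_.end()) return;  // block was not recorded by this table
    SiteStats& s = sites_[it->second.site];
    s.frees += 1;
    s.free_bytes += it->second.bytes;
    live_.erase(it);
  }

  // Returns the counters for `site`, or nullptr if nothing was allocated there.
  const SiteStats* Find(const StackTrace& site) const {
    auto it = sites_.find(site);
    return it == sites_.end() ? nullptr : &it->second;
  }

 private:
  struct Live {
    StackTrace site;    // where the block was allocated
    std::size_t bytes;  // its size
  };
  std::map<StackTrace, SiteStats> sites_;  // one entry per distinct stack trace
  std::map<const void*, Live> live_;       // blocks not yet freed
};
```

<p>Keying by the whole stack trace, rather than only by the immediate caller of <code>malloc</code>, is what allows memory to be attributed both to a function and to the functions that called it.</p>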

<p>There are three parts to using it: linking the library into an
application, running the code, and analyzing the output.</p>

<h1>Linking in the Library</h1>

<p>To install the heap profiler into your executable, add
<code>-ltcmalloc</code> to the link-time step for your executable.
Also, while we don't necessarily recommend this form of usage, it's
possible to add in the profiler at run-time using
<code>LD_PRELOAD</code>:</p>
<pre>% env LD_PRELOAD="/usr/lib/libtcmalloc.so" &lt;binary&gt;</pre>

<p>This does <i>not</i> turn on heap profiling; it just inserts the
code. For that reason, it's practical to just always link
<code>-ltcmalloc</code> into a binary while developing; that's what we
do at Google. (However, since any user can turn on the profiler by
setting an environment variable, it's not necessarily recommended to
install profiler-linked binaries into a production, running
system.) Note that if you wish to use the heap profiler, you must
also use the tcmalloc memory-allocation library. There is currently
no way to use the heap profiler separately from tcmalloc.</p>


<h1>Running the Code</h1>

<p>There are two ways to turn on heap profiling for a given run of an
executable:</p>

<ol>
<li> <p>Set the environment variable <code>HEAPPROFILE</code> to the
filename prefix to dump profiles to. For instance, to profile
<code>/usr/local/bin/my_binary_compiled_with_tcmalloc</code>:</p>
<pre>% env HEAPPROFILE=/tmp/mybin.hprof /usr/local/bin/my_binary_compiled_with_tcmalloc</pre>
<li> <p>In your code, bracket the code you want profiled in calls to
<code>HeapProfilerStart()</code> and <code>HeapProfilerStop()</code>.
(These functions are declared in <code>&lt;gperftools/heap-profiler.h&gt;</code>.)
<code>HeapProfilerStart()</code> takes the profile filename prefix as
an argument. Then, as often as you'd like before calling
<code>HeapProfilerStop()</code>, you can use
<code>HeapProfilerDump()</code> or <code>GetHeapProfile()</code> to
examine the profile. In case it's useful,
<code>IsHeapProfilerRunning()</code> will tell you whether you've
already called <code>HeapProfilerStart()</code> or not.</p>
</ol>
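<p>A minimal sketch of the second approach follows. It assumes the declarations in <code>&lt;gperftools/heap-profiler.h&gt;</code> named above, and that <code>GetHeapProfile()</code> returns a heap-allocated string that the caller releases with <code>free()</code>. <code>BuildTable()</code> is a hypothetical workload, and <code>USE_GPERFTOOLS</code> is a macro invented for this example: define it and link with <code>-ltcmalloc</code> to call the real profiler; without it, hypothetical no-op stand-ins let the file compile on its own.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

#ifdef USE_GPERFTOOLS  // hypothetical opt-in macro: build with -DUSE_GPERFTOOLS -ltcmalloc
#include <gperftools/heap-profiler.h>
#else
// Hypothetical stand-ins with the same names, so the sketch builds without
// gperftools. They only track whether "profiling" is on; nothing is written.
static bool g_stub_running = false;
void HeapProfilerStart(const char* /*prefix*/) { g_stub_running = true; }
void HeapProfilerStop() { g_stub_running = false; }
void HeapProfilerDump(const char* /*reason*/) {}
int IsHeapProfilerRunning() { return g_stub_running ? 1 : 0; }
char* GetHeapProfile() {
  char* text = static_cast<char*>(std::malloc(1));  // empty, malloc'ed profile
  if (text != nullptr) text[0] = '\0';
  return text;
}
#endif

// Hypothetical workload whose allocations we want to see in the profile.
std::vector<std::string> BuildTable(std::size_t rows) {
  std::vector<std::string> table;
  for (std::size_t i = 0; i < rows; ++i) table.emplace_back(64, 'x');
  return table;
}

// Brackets the workload with HeapProfilerStart() and HeapProfilerStop().
std::size_t ProfileBuildTable(const char* prefix, std::size_t rows) {
  HeapProfilerStart(prefix);  // dumps are named <prefix>.NNNN.heap
  std::vector<std::string> table = BuildTable(rows);  // still live below
  if (IsHeapProfilerRunning()) {
    HeapProfilerDump("table built");   // profile includes the live table
    char* profile = GetHeapProfile();  // or fetch the profile as text
    std::free(profile);                // the caller owns this buffer
  }
  HeapProfilerStop();
  assert(!IsHeapProfilerRunning());
  return table.size();
}
```

<p>Called as <code>ProfileBuildTable("/tmp/mybin", 1000)</code> against the real library, this should leave a profile named <code>/tmp/mybin.NNNN.heap</code>, where <code>NNNN</code> is the dump's sequence number.</p>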


<p>For security reasons, heap profiling will not write to a file --
and is thus not usable -- for setuid programs.</p>

<H2>Modifying Runtime Behavior</H2>

<p>You can more finely control the behavior of the heap profiler via
environment variables.</p>

<table frame=box rules=sides cellpadding=5 width=100%>

<tr valign=top>
<td><code>HEAP_PROFILE_ALLOCATION_INTERVAL</code></td>
<td>default: 1073741824 (1 GB)</td>
<td>
Dump heap profiling information each time the specified number of
bytes has been allocated by the program.
</td>
</tr>

<tr valign=top>
<td><code>HEAP_PROFILE_INUSE_INTERVAL</code></td>
<td>default: 104857600 (100 MB)</td>
<td>
Dump heap profiling information whenever the high-water memory
usage mark increases by the specified number of bytes.
</td>
</tr>

<tr valign=top>
<td><code>HEAP_PROFILE_TIME_INTERVAL</code></td>
<td>default: 0 (disabled)</td>
<td>
Dump heap profiling information each time the specified
number of seconds has elapsed.
</td>
</tr>

<tr valign=top>
<td><code>HEAPPROFILESIGNAL</code></td>
<td>default: disabled</td>
<td>
Dump heap profiling information whenever the specified signal is sent to the
process.
</td>
</tr>

<tr valign=top>
<td><code>HEAP_PROFILE_MMAP</code></td>
<td>default: false</td>
<td>
Profile <code>mmap</code>, <code>mremap</code>, and <code>sbrk</code>
calls in addition
to <code>malloc</code>, <code>calloc</code>, <code>realloc</code>,
and <code>new</code>. <b>NOTE:</b> this causes the profiler to
profile calls internal to tcmalloc, since tcmalloc and friends use
mmap and sbrk internally for allocations. One partial solution is
to filter these allocations out when running <code>pprof</code>,
with something like
<code>pprof --ignore='DoAllocWithArena|SbrkSysAllocator::Alloc|MmapSysAllocator::Alloc'</code>.
</td>
</tr>

<tr valign=top>
<td><code>HEAP_PROFILE_ONLY_MMAP</code></td>
<td>default: false</td>
<td>
Only profile <code>mmap</code>, <code>mremap</code>, and <code>sbrk</code>
calls; do not profile
<code>malloc</code>, <code>calloc</code>, <code>realloc</code>,
or <code>new</code>.
</td>
</tr>

<tr valign=top>
<td><code>HEAP_PROFILE_MMAP_LOG</code></td>
<td>default: false</td>
<td>
Log <code>mmap</code>/<code>munmap</code> calls.
</td>
</tr>

</table>
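<p>The two byte-count intervals can be modeled as follows. This C++ sketch illustrates the rules stated in the table; it is not tcmalloc's code. <code>IntervalFromEnv</code> and <code>DumpTrigger</code> are hypothetical names, the in-use rule compares against the high-water mark recorded at the previous dump, and the time, signal and mmap settings are left out.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Reads an interval from the environment, e.g.
// IntervalFromEnv("HEAP_PROFILE_INUSE_INTERVAL", 104857600).
// Falls back to `fallback` if the variable is unset or not a whole number.
std::int64_t IntervalFromEnv(const char* name, std::int64_t fallback) {
  const char* value = std::getenv(name);
  if (value == nullptr || *value == '\0') return fallback;
  char* end = nullptr;
  const long long parsed = std::strtoll(value, &end, 10);
  return *end == '\0' ? static_cast<std::int64_t>(parsed) : fallback;
}

// Simplified model of the allocation-interval and in-use-interval rules.
class DumpTrigger {
 public:
  DumpTrigger(std::int64_t alloc_interval, std::int64_t inuse_interval)
      : alloc_interval_(alloc_interval), inuse_interval_(inuse_interval) {}

  // Records an allocation; returns true if a profile should be dumped now.
  bool OnAlloc(std::int64_t bytes) {
    total_allocated_ += bytes;
    in_use_ += bytes;
    bool dump = false;
    if (alloc_interval_ > 0 &&
        total_allocated_ >= allocated_at_last_dump_ + alloc_interval_) {
      dump = true;  // another alloc_interval bytes allocated since last dump
    } else if (inuse_interval_ > 0 &&
               in_use_ >= high_water_mark_ + inuse_interval_) {
      dump = true;  // in-use bytes rose inuse_interval above the high-water mark
    }
    if (dump) {
      allocated_at_last_dump_ = total_allocated_;
      if (in_use_ > high_water_mark_) high_water_mark_ = in_use_;
    }
    return dump;
  }

  // Records a free; frees never trigger a dump in this model.
  void OnFree(std::int64_t bytes) {
    assert(bytes <= in_use_);
    in_use_ -= bytes;
  }

 private:
  std::int64_t alloc_interval_;
  std::int64_t inuse_interval_;
  std::int64_t total_allocated_ = 0;         // bytes ever allocated
  std::int64_t in_use_ = 0;                  // bytes allocated and not freed
  std::int64_t allocated_at_last_dump_ = 0;  // total_allocated_ at last dump
  std::int64_t high_water_mark_ = 0;         // in-use peak recorded at dumps
};
```

<p>With the defaults from the table, the model would be built as <code>DumpTrigger(IntervalFromEnv("HEAP_PROFILE_ALLOCATION_INTERVAL", 1073741824), IntervalFromEnv("HEAP_PROFILE_INUSE_INTERVAL", 104857600))</code>.</p>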

<H2>Checking for Leaks</H2>

<p>You can use the heap profiler to manually check for leaks, for
instance by reading the profiler output and looking for large
allocations. However, for that task, it's easier to use the <A
HREF="heap_checker.html">automatic heap-checking facility</A> built
into tcmalloc.</p>

<h1><a name="pprof">Analyzing the Output</a></h1>

<p>If heap profiling is turned on in a program, the program will
periodically write profiles to the filesystem. The sequence of
profiles will be named:</p>
<pre>
&lt;prefix&gt;.0000.heap
&lt;prefix&gt;.0001.heap
&lt;prefix&gt;.0002.heap
...
</pre>
<p>where <code>&lt;prefix&gt;</code> is the filename prefix supplied
when running the code (e.g. via the <code>HEAPPROFILE</code>
environment variable). Note that if the supplied prefix
does not start with a <code>/</code>, the profile files will be
written to the program's working directory.</p>
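<p>The naming scheme can be sketched in C++ as below. <code>ProfileFileName</code> and <code>WritesToWorkingDirectory</code> are hypothetical helpers that only restate the pattern and the relative-path rule described above; they are not part of the gperftools API.</p>

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Returns the name of the dump with sequence number `seq`, following the
// <prefix>.0000.heap, <prefix>.0001.heap, ... pattern.
std::string ProfileFileName(const std::string& prefix, int seq) {
  assert(seq >= 0);
  char suffix[32];
  std::snprintf(suffix, sizeof(suffix), ".%04d.heap", seq);  // zero-pad to 4 digits
  return prefix + suffix;
}

// A prefix that does not start with '/' is relative, so the files are
// created in the program's working directory.
bool WritesToWorkingDirectory(const std::string& prefix) {
  return prefix.empty() || prefix[0] != '/';
}
```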

<p>The profile output can be viewed by passing it to the
<code>pprof</code> tool -- the same tool that's used to analyze <A
HREF="cpuprofile.html">CPU profiles</A>.</p>

<p>Here are some examples. These examples assume the binary is named
<code>gfs_master</code>, and a sequence of heap profile files can be
found in files named:</p>
<pre>
/tmp/profile.0001.heap
/tmp/profile.0002.heap
...
/tmp/profile.0100.heap
</pre>

<h3>Why is a process so big?</h3>

<pre>
% pprof --gv gfs_master /tmp/profile.0100.heap
</pre>

<p>This command pops up a <code>gv</code> window that displays
the profile information as a directed graph. Here is a portion
of the resulting output:</p>

<p><center>
<img src="heap-example1.png">
</center></p>

A few explanations:
<ul>
<li> <code>GFS_MasterChunk::AddServer</code> accounts for 255.6 MB
of the live memory, which is 25% of the total live memory.
<li> <code>GFS_MasterChunkTable::UpdateState</code> is directly
accountable for 176.2 MB of the live memory (i.e., it directly
allocated 176.2 MB that has not been freed yet). Furthermore,
it and its callees are responsible for 729.9 MB. The
labels on the outgoing edges give a good indication of the
amount allocated by each callee.
</ul>

<h3>Comparing Profiles</h3>

<p>You often want to skip allocations during the initialization phase
of a program so you can find gradual memory leaks. One simple way to
do this is to compare two profiles -- both collected after the program
has been running for a while. Specify the name of the first profile
using the <code>--base</code> option. For example:</p>
<pre>
% pprof --base=/tmp/profile.0004.heap gfs_master /tmp/profile.0100.heap
</pre>

<p>The memory usage in <code>/tmp/profile.0004.heap</code> will be
subtracted from the memory usage in
<code>/tmp/profile.0100.heap</code> and the result will be
displayed.</p>
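<p>Conceptually, <code>--base</code> performs a per-site subtraction. The C++ sketch below models it on in-use bytes keyed by function name. <code>HeapProfile</code> and <code>SubtractBase</code> are hypothetical, and dropping sites whose difference is zero is a simplification chosen for this example rather than documented <code>pprof</code> behavior. A negative entry would mean the site released memory between the two profiles.</p>

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <string>

// Hypothetical in-memory view of a profile: in-use bytes per site.
using HeapProfile = std::map<std::string, long long>;

// Returns later - base for every site, omitting sites that did not change.
HeapProfile SubtractBase(const HeapProfile& later, const HeapProfile& base) {
  HeapProfile diff = later;
  for (const auto& [site, bytes] : base) {
    diff[site] -= bytes;  // sites missing from `later` become negative
  }
  for (auto it = diff.begin(); it != diff.end();) {
    it = (it->second == 0) ? diff.erase(it) : std::next(it);
  }
  return diff;
}
```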

<h3>Text display</h3>

<pre>
% pprof --text gfs_master /tmp/profile.0100.heap
   255.6  24.7%  24.7%    255.6  24.7% GFS_MasterChunk::AddServer
   184.6  17.8%  42.5%    298.8  28.8% GFS_MasterChunkTable::Create
   176.2  17.0%  59.5%    729.9  70.5% GFS_MasterChunkTable::UpdateState
   169.8  16.4%  75.9%    169.8  16.4% PendingClone::PendingClone
    76.3   7.4%  83.3%     76.3   7.4% __default_alloc_template::_S_chunk_alloc
    49.5   4.8%  88.0%     49.5   4.8% hashtable::resize
   ...
</pre>

<ul>
<li> The first column contains the direct memory use in MB.
<li> The fourth column contains memory use by the procedure
and all of its callees.
<li> The second and fifth columns are just percentage
representations of the numbers in the first and fourth columns.
<li> The third column is a cumulative sum of the second column
(i.e., the <code>k</code>th entry in the third column is the
sum of the first <code>k</code> entries in the second column.)
</ul>
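<p>To make the column relationships concrete, here is a C++ sketch that parses rows of the <code>--text</code> output shown above and checks that the third column is the running sum of the second. <code>TextRow</code>, <code>ParseTextRow</code> and <code>CumulativeColumnConsistent</code> are hypothetical names. The check allows a small tolerance because the printed percentages are rounded to one decimal place: the running sum of the six rows above reaches 88.1%, while <code>pprof</code> prints 88.0%.</p>

```cpp
#include <cassert>
#include <cmath>
#include <sstream>
#include <string>
#include <vector>

// One row of `pprof --text` heap output.
struct TextRow {
  double flat_mb = 0;      // column 1: memory allocated directly, in MB
  double flat_pct = 0;     // column 2: column 1 as a percentage of the total
  double running_pct = 0;  // column 3: running sum of column 2
  double cum_mb = 0;       // column 4: memory used by the function and callees
  double cum_pct = 0;      // column 5: column 4 as a percentage of the total
  std::string name;        // the function
};

// Parses e.g. "255.6  24.7%  24.7%    255.6  24.7% GFS_MasterChunk::AddServer".
bool ParseTextRow(const std::string& line, TextRow* row) {
  std::istringstream in(line);
  char p2 = 0, p3 = 0, p5 = 0;  // the '%' signs after columns 2, 3 and 5
  if (!(in >> row->flat_mb >> row->flat_pct >> p2 >> row->running_pct >> p3 >>
        row->cum_mb >> row->cum_pct >> p5)) {
    return false;
  }
  if (p2 != '%' || p3 != '%' || p5 != '%') return false;
  std::getline(in >> std::ws, row->name);
  return !row->name.empty();
}

// True if each row's column 3 equals the sum of column 2 so far, to within
// `tolerance` percentage points.
bool CumulativeColumnConsistent(const std::vector<TextRow>& rows,
                                double tolerance) {
  double sum = 0;
  for (const TextRow& r : rows) {
    sum += r.flat_pct;
    if (std::fabs(sum - r.running_pct) > tolerance) return false;
  }
  return true;
}
```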

<h3>Ignoring or focusing on specific regions</h3>

<p>The following command will give a graphical display of a subset of
the call-graph. Only paths in the call-graph that match the regular
expression <code>DataBuffer</code> are included:</p>
<pre>
% pprof --gv --focus=DataBuffer gfs_master /tmp/profile.0100.heap
</pre>

<p>Similarly, the following command will omit a subset of the
call-graph. All paths in the call-graph that match the regular
expression <code>DataBuffer</code> are discarded:</p>
<pre>
% pprof --gv --ignore=DataBuffer gfs_master /tmp/profile.0100.heap
</pre>
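<p>The effect of the two flags on a set of samples can be modeled as below. This is a simplified C++ sketch, not <code>pprof</code>'s implementation: <code>Sample</code> and <code>FilterSamples</code> are hypothetical, each sample carries its call stack as function names, and a path is assumed to match if any frame on it contains a match for the regular expression.</p>

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// One profile sample: the allocation's call stack and its in-use bytes.
struct Sample {
  std::vector<std::string> stack;  // function names, outermost first
  long long bytes;
};

// True if any frame on the sample's call path matches `re`.
bool PathMatches(const Sample& s, const std::regex& re) {
  for (const std::string& frame : s.stack) {
    if (std::regex_search(frame, re)) return true;
  }
  return false;
}

// focus == true keeps only matching paths (like --focus);
// focus == false drops matching paths (like --ignore).
std::vector<Sample> FilterSamples(const std::vector<Sample>& in,
                                  const std::string& pattern, bool focus) {
  const std::regex re(pattern);
  std::vector<Sample> out;
  for (const Sample& s : in) {
    if (PathMatches(s, re) == focus) out.push_back(s);
  }
  return out;
}
```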

<h3>Total allocations + object-level information</h3>

<p>All of the previous examples have displayed the amount of in-use
space, i.e., the number of bytes that have been allocated but not
freed. You can also get other types of information by supplying a
flag to <code>pprof</code>:</p>

<center>
<table frame=box rules=sides cellpadding=5 width=100%>

<tr valign=top>
<td><code>--inuse_space</code></td>
<td>
Display the number of in-use megabytes (i.e. space that has
been allocated but not freed). This is the default.
</td>
</tr>

<tr valign=top>
<td><code>--inuse_objects</code></td>
<td>
Display the number of in-use objects (i.e. number of
objects that have been allocated but not freed).
</td>
</tr>

<tr valign=top>
<td><code>--alloc_space</code></td>
<td>
Display the number of allocated megabytes. This includes
the space that has since been de-allocated. Use this
if you want to find the main allocation sites in the
program.
</td>
</tr>

<tr valign=top>
<td><code>--alloc_objects</code></td>
<td>
Display the number of allocated objects. This includes
the objects that have since been de-allocated. Use this
if you want to find the main allocation sites in the
program.
</td>
</tr>

</table>
</center>
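<p>All four views can be derived from the same per-site counters. In the C++ sketch below, <code>SiteCounters</code>, <code>View</code> and <code>ValueFor</code> are hypothetical names; the values are raw bytes and object counts, whereas <code>pprof</code> reports the space views in megabytes.</p>

```cpp
#include <cassert>
#include <cstdint>

// Counters kept for one allocation site (hypothetical layout).
struct SiteCounters {
  std::int64_t alloc_count;  // objects ever allocated here
  std::int64_t alloc_bytes;  // bytes ever allocated here
  std::int64_t free_count;   // of those, objects since freed
  std::int64_t free_bytes;   // of those, bytes since freed
};

enum class View { kInuseSpace, kInuseObjects, kAllocSpace, kAllocObjects };

// The value a site contributes under each of the four flags.
std::int64_t ValueFor(const SiteCounters& c, View v) {
  assert(c.free_bytes <= c.alloc_bytes && c.free_count <= c.alloc_count);
  switch (v) {
    case View::kInuseSpace:   return c.alloc_bytes - c.free_bytes;  // --inuse_space
    case View::kInuseObjects: return c.alloc_count - c.free_count;  // --inuse_objects
    case View::kAllocSpace:   return c.alloc_bytes;                 // --alloc_space
    case View::kAllocObjects: return c.alloc_count;                 // --alloc_objects
  }
  return 0;  // not reached for valid View values
}
```

<p>A site that allocates many short-lived objects therefore looks small under <code>--inuse_space</code> but large under <code>--alloc_space</code>, which is why the latter is the better choice for finding the main allocation sites.</p>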


<h3>Interactive mode</h3>

<p>By default -- if you don't specify any flags to the contrary --
pprof runs in interactive mode. At the <code>(pprof)</code> prompt,
you can run many of the commands described above. You can type
<code>help</code> for a list of the commands available in
interactive mode.</p>


<h1>Caveats</h1>

<ul>
<li> Heap profiling requires the use of libtcmalloc. This
requirement may be removed in a future version of the heap
profiler, and the heap profiler separated out into its own
library.

<li> If the program links in a library that was not compiled
with enough symbolic information, all samples associated
with the library may be charged to the last symbol found
in the program before the library. This will artificially
inflate the count for that symbol.

<li> If you run the program on one machine, and profile it on
another, and the shared libraries are different on the two
machines, the profiling output may be confusing: samples that
fall within the shared libraries may be assigned to arbitrary
procedures.

<li> Several libraries, such as some STL implementations, do their
own memory management. This may cause strange profiling
results. We have code in libtcmalloc to cause STL to use
tcmalloc for memory management (which in our tests is better
than STL's internal management), though it only works for some
STL implementations.

<li> If your program forks, the children will also be profiled
(since they inherit the same <code>HEAPPROFILE</code> setting). Each
process is profiled separately; to distinguish the child
profiles from the parent profile and from each other, all
children will have their process ID attached to the
<code>HEAPPROFILE</code> name.

<li> Due to a hack we make to work around a possible gcc bug, your
profiles may end up named strangely if the first character of
your <code>HEAPPROFILE</code> variable has a byte value greater than
127 (that is, it is not an ASCII character). This should be
exceedingly rare, but if you need to use such a name, just prepend
<code>./</code> to your filename:
<code>HEAPPROFILE=./Ägypten</code>.
</ul>

<hr>
<address>Sanjay Ghemawat
<!-- Created: Tue Dec 19 10:43:14 PST 2000 -->
</address>
</body>
</html>