gperftools
----------
(originally Google Performance Tools)

The fastest malloc we've seen; works particularly well with threads
and STL. Also: thread-friendly heap-checker, heap-profiler, and
cpu-profiler.


OVERVIEW
--------

gperftools is a collection consisting of a high-performance
multi-threaded malloc() implementation and some pretty nifty
performance analysis tools.

gperftools is distributed under the terms of the BSD License. Join our
mailing list at gperftools@googlegroups.com for updates:
https://groups.google.com/forum/#!forum/gperftools

gperftools was the original home of the pprof program. Note, however,
that the original pprof (which is still included with gperftools) is
now deprecated in favor of the Go version at
https://github.com/google/pprof


TCMALLOC
--------
Just link in -ltcmalloc or -ltcmalloc_minimal to get the advantages of
tcmalloc -- a replacement for malloc and new. See below for some
environment variables you can use with tcmalloc, as well.

tcmalloc functionality is available on all systems we've tested; see
INSTALL for more details. See README_windows.txt for instructions on
using tcmalloc on Windows.

NOTE: When compiling programs with gcc that you plan to link with
libtcmalloc, it's safest to pass in the flags

   -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free

when compiling. gcc makes some optimizations assuming it is using its
own, built-in malloc; that assumption obviously isn't true with
tcmalloc. In practice, we haven't seen any problems with this, but
the expected risk is highest for users who register their own malloc
hooks with tcmalloc (using gperftools/malloc_hook.h). The risk is
lowest for folks who use tcmalloc_minimal (or, of course, who pass in
the above flags :-) ).
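As a rough illustration of the linking step, here is a small C sketch of
an allocator-agnostic workload. Nothing in it refers to tcmalloc, so
which malloc it exercises is decided purely at link time. The file name
alloc_demo.c, the function churn_allocations(), and the DEMO_MAIN macro
are made up for this example, and the build line in the comment is one
plausible invocation rather than a tested recipe.

```c
/* alloc_demo.c (hypothetical file name).
 * One plausible build line, assuming gperftools is installed:
 *   gcc -O2 -fno-builtin-malloc -fno-builtin-calloc \
 *       -fno-builtin-realloc -fno-builtin-free \
 *       -DDEMO_MAIN alloc_demo.c -o alloc_demo -ltcmalloc
 */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Allocate, touch, and free `rounds` blocks of `size` bytes each.
 * Returns how many blocks were allocated and verified. */
size_t churn_allocations(size_t rounds, size_t size) {
    size_t ok = 0;
    if (size == 0) return 0;            /* nothing to touch in an empty block */
    for (size_t i = 0; i < rounds; i++) {
        unsigned char *p = malloc(size);
        if (p == NULL) continue;        /* skip failed allocations */
        memset(p, 0xAB, size);          /* write to the block */
        ok += (p[size - 1] == 0xAB);    /* read it back */
        free(p);
    }
    return ok;
}

#ifdef DEMO_MAIN                        /* hypothetical macro: standalone build */
int main(void) {
    printf("verified %zu blocks\n", churn_allocations(100000, 64));
    return 0;
}
#endif
```

Because the source never names the allocator, the same file can be
linked with and without -ltcmalloc to compare the two.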


HEAP PROFILER
-------------
See docs/heapprofile.html for information about how to use tcmalloc's
heap profiler and analyze its output.

As a quick-start, do the following after installing this package:

1) Link your executable with -ltcmalloc
2) Run your executable with the HEAPPROFILE environment variable set:
     $ HEAPPROFILE=/tmp/heapprof <path/to/binary> [binary args]
3) Run pprof to analyze the heap usage:
     $ pprof <path/to/binary> /tmp/heapprof.0045.heap  # run 'ls' to see options
     $ pprof --gv <path/to/binary> /tmp/heapprof.0045.heap

You can also use LD_PRELOAD to heap-profile an executable that you
didn't compile.

There are other environment variables, besides HEAPPROFILE, that you
can set to adjust the heap-profiler behavior; see "ENVIRONMENT
VARIABLES" below.

The heap profiler is available on all unix-based systems we've tested;
see INSTALL for more details. It is not currently available on Windows.
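To have something worth looking at in pprof, it helps if the program
allocates from more than one place. The C sketch below is one such toy
workload, with a small and a large allocation site. The names
make_small(), make_large(), hold_blocks(), and the DEMO_MAIN macro are
invented for this example; the code itself uses only standard C, so the
profiler is enabled purely through -ltcmalloc and HEAPPROFILE as in the
steps above.

```c
/* Toy workload with two allocation sites (all names are hypothetical). */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

enum { SMALL_SIZE = 64, LARGE_SIZE = 1 << 20 };

static char *make_small(void) { return malloc(SMALL_SIZE); }
static char *make_large(void) { return malloc(LARGE_SIZE); }

/* Keeps `n` small and `n` large blocks alive at the same time, then
 * frees them all. Returns the number of bytes that were live at the peak. */
size_t hold_blocks(size_t n) {
    char **small = calloc(n, sizeof *small);
    char **large = calloc(n, sizeof *large);
    size_t live = 0;
    if (small != NULL && large != NULL) {
        for (size_t i = 0; i < n; i++) {
            small[i] = make_small();
            large[i] = make_large();
            if (small[i] != NULL) live += SMALL_SIZE;
            if (large[i] != NULL) live += LARGE_SIZE;
        }
        /* Peak usage is here; almost all of it comes from make_large(). */
        for (size_t i = 0; i < n; i++) {
            free(small[i]);
            free(large[i]);
        }
    }
    free(small);
    free(large);
    return live;
}

#ifdef DEMO_MAIN                        /* hypothetical macro: standalone build */
int main(void) {
    printf("peak live bytes: %zu\n", hold_blocks(64));
    return 0;
}
#endif
```

Compilers may inline the two small helpers at higher optimization
levels, so building without optimization is probably the easiest way to
keep the two sites distinct in a profile.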


HEAP CHECKER
------------
See docs/heap_checker.html for information about how to use tcmalloc's
heap checker.

In order to catch all heap leaks, tcmalloc must be linked *last* into
your executable. The heap checker may mischaracterize some memory
accesses in libraries listed after it on the link line. For instance,
it may report these libraries as leaking memory when they're not.
(See the source code for more details.)

As a quick-start, do the following after installing this package:

1) Link your executable with -ltcmalloc
2) Run your executable with the HEAPCHECK environment variable set:
     $ HEAPCHECK=1 <path/to/binary> [binary args]

Other values for HEAPCHECK: normal (equivalent to "1"), strict, draconian

You can also use LD_PRELOAD to heap-check an executable that you
didn't compile.

The heap checker is only available on Linux at this time; see INSTALL
for more details.
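A deliberately leaky program is a simple way to try the quick-start
above. In the C sketch below, copy_text() and the DEMO_MAIN macro are
hypothetical names; the standalone main() frees one copy and drops the
other, which is the kind of allocation the heap checker is meant to
report. The exact wording of the report is not shown here because it
is not reproduced from a real run.

```c
/* Leak demo (hypothetical names). Build without optimization, or with
 * -fno-builtin-malloc, so the compiler does not remove the unused allocation. */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of `text`, or NULL on failure.
 * The caller owns the result and is expected to free() it. */
char *copy_text(const char *text) {
    size_t len = strlen(text) + 1;      /* include the terminating NUL */
    char *copy = malloc(len);
    if (copy != NULL) memcpy(copy, text, len);
    return copy;
}

#ifdef DEMO_MAIN                        /* hypothetical macro: standalone build */
int main(void) {
    char *kept = copy_text("freed before exit");
    char *lost = copy_text("never freed");
    if (kept != NULL && lost != NULL) printf("%s / %s\n", kept, lost);
    free(kept);
    lost = NULL;                        /* deliberate leak: last pointer dropped */
    return 0;
}
#endif
```

Running the standalone build as `HEAPCHECK=normal ./leaky` (binary name
assumed) should exercise the checker on the dropped allocation.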


CPU PROFILER
------------
See docs/cpuprofile.html for information about how to use the CPU
profiler and analyze its output.

As a quick-start, do the following after installing this package:

1) Link your executable with -lprofiler
2) Run your executable with the CPUPROFILE environment variable set:
     $ CPUPROFILE=/tmp/prof.out <path/to/binary> [binary args]
3) Run pprof to analyze the CPU usage:
     $ pprof <path/to/binary> /tmp/prof.out        # -pg-like text output
     $ pprof --gv <path/to/binary> /tmp/prof.out   # really cool graphical output

There are other environment variables, besides CPUPROFILE, that you
can set to adjust the cpu-profiler behavior; see "ENVIRONMENT
VARIABLES" below.

The CPU profiler is available on all unix-based systems we've tested;
see INSTALL for more details. It is not currently available on Windows.

NOTE: CPU profiling doesn't work after fork (unless you immediately
      do an exec()-like call afterwards). Furthermore, if you do
      fork, and the child calls exit(), it may corrupt the profile
      data. You can use _exit() to work around this. We hope to have
      a fix for both problems in the next release of perftools
      (hopefully perftools 1.2).
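The _exit() workaround from the note can be sketched in plain POSIX C as
follows. The function name run_child() is hypothetical, and the sketch
only shows the process-handling pattern; it does not itself start the
profiler.

```c
/* fork() + _exit() pattern (run_child is a hypothetical name). */
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that terminates with _exit(code) rather than exit(code).
 * Returns the child's exit status, or -1 if fork/wait failed. */
int run_child(int code) {
    pid_t pid = fork();
    if (pid < 0) return -1;             /* fork failed */
    if (pid == 0) {
        /* Child: _exit() ends the process immediately, without running
         * atexit() handlers. */
        _exit(code);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A parent that is being profiled would call run_child() wherever it
previously forked a child that ended with exit().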


EVERYTHING IN ONE
-----------------
If you want the CPU profiler, heap profiler, and heap leak-checker to
all be available for your application, you can do:
   gcc -o myapp ... -lprofiler -ltcmalloc

However, if you have a reason to use the static versions of the
library, this two-library linking won't work:
   gcc -o myapp ... /usr/lib/libprofiler.a /usr/lib/libtcmalloc.a  # errors!

Instead, use the special libtcmalloc_and_profiler library, which we
make for just this purpose:
   gcc -o myapp ... /usr/lib/libtcmalloc_and_profiler.a


CONFIGURATION OPTIONS
---------------------
For advanced users, there are several flags you can pass to
'./configure' that tweak tcmalloc performance. (These are in addition
to the environment variables you can set at runtime to affect
tcmalloc, described below.) See the INSTALL file for details.


ENVIRONMENT VARIABLES
---------------------
The cpu profiler, heap checker, and heap profiler will lie dormant,
using no memory or CPU, until you turn them on. (Thus, there's no
harm in linking -lprofiler into every application, and also -ltcmalloc
assuming you're ok using the non-libc malloc library.)

The easiest way to turn them on is by setting the appropriate
environment variables. We have several variables that let you
enable/disable features as well as tweak parameters.

Here are some of the most important variables:

HEAPPROFILE=<pre> -- turns on heap profiling and dumps data using this prefix
HEAPCHECK=<type>  -- turns on heap checking with strictness 'type'
CPUPROFILE=<file> -- turns on cpu profiling and dumps data to this file.
PROFILESELECTED=1 -- if set, cpu-profiler will only profile regions of code
                     surrounded with ProfilerEnable()/ProfilerDisable().
CPUPROFILE_FREQUENCY=x -- how many interrupts/second the cpu-profiler samples.

PERFTOOLS_VERBOSE=<level> -- the higher the level, the more messages malloc emits
MALLOCSTATS=<level>       -- prints memory-use stats at program-exit

For a full list of variables, see the documentation pages:
   docs/cpuprofile.html
   docs/heapprofile.html
   docs/heap_checker.html


COMPILING ON NON-LINUX SYSTEMS
------------------------------

Perftools was developed and tested on x86 Linux systems, and it works
in its full generality only on those systems. However, we've
successfully ported much of the tcmalloc library to FreeBSD, Solaris
x86, and Darwin (Mac OS X) x86 and ppc; and we've ported the basic
functionality in tcmalloc_minimal to Windows. See INSTALL for details.
See README_windows.txt for details on the Windows port.


PERFORMANCE
-----------

If you're interested in some third-party comparisons of tcmalloc to
other malloc libraries, here are a few web pages that have been
brought to our attention. The first discusses the effect of using
various malloc libraries on OpenLDAP. The second compares tcmalloc to
win32's malloc.
   http://www.highlandsun.com/hyc/malloc/
   http://gaiacrtn.free.fr/articles/win32perftools.html

It's possible to build tcmalloc in a way that gains faster
performance (particularly for deletes) at the cost of more memory
fragmentation (that is, more unusable memory on your system). See the
INSTALL file for details.


OLD SYSTEM ISSUES
-----------------

When compiling perftools on some old systems, like RedHat 8, you may
get an error like this:
   ___tls_get_addr: symbol not found

This means that you have a system where some parts are updated enough
to support Thread Local Storage, but others are not. The perftools
configure script can't always detect this kind of case, leading to
that error. To fix it, just comment out (or delete) the line
   #define HAVE_TLS 1
in your config.h file before building.


64-BIT ISSUES
-------------

There are two issues that can cause program hangs or crashes on x86_64
64-bit systems, which use the libunwind library to get stack-traces.
Neither issue should affect the core tcmalloc library; they both
affect the perftools tools such as cpu-profiler, heap-checker, and
heap-profiler.

1) Some libc's -- at least glibc 2.4 on x86_64 -- have a bug where the
libc function dl_iterate_phdr() acquires its locks in the wrong
order. This bug should not affect tcmalloc, but may cause occasional
deadlock with the cpu-profiler, heap-profiler, and heap-checker.
The more dlopen() calls an executable makes, the more likely the
deadlock becomes. Most executables don't make any, though several
library routines like getgrgid() call dlopen() behind the scenes.

2) On x86-64 64-bit systems, while tcmalloc itself works fine, the
cpu-profiler tool is unreliable: it will sometimes work, but sometimes
cause a segfault. I'll explain the problem first, and then some
workarounds.

Note that this only affects the cpu-profiler, which is a
gperftools feature you must turn on manually by setting the
CPUPROFILE environment variable. If you do not turn on cpu-profiling,
you shouldn't see any crashes due to perftools.

The gory details: The underlying problem is in the backtrace()
function, which is a built-in function in libc.
Backtracing is fairly straightforward in the normal case, but can run
into problems when having to backtrace across a signal frame.
Unfortunately, the cpu-profiler uses signals in order to register a
profiling event, so every backtrace that the profiler does crosses a
signal frame.

In our experience, the only time there is trouble is when the signal
fires in the middle of pthread_mutex_lock. pthread_mutex_lock is
called quite a bit from system libraries, particularly at program
startup and when creating a new thread.

The solution: The dwarf debugging format has support for 'cfi
annotations', which make it easy to recognize a signal frame. Some OS
distributions, such as Fedora and gentoo 2007.0, already have added
cfi annotations to their libc. A future version of libunwind should
recognize these annotations; these systems should not see any
crashes.

Workarounds: If you see problems with crashes when running the
cpu-profiler, consider inserting ProfilerStart()/ProfilerStop() into
your code, rather than setting CPUPROFILE. This will profile only
those sections of the codebase. Though we haven't done much testing,
in theory this should reduce the chance of crashes by limiting the
signal generation to only a small part of the codebase. Ideally, you
would not use ProfilerStart()/ProfilerStop() around code that spawns
new threads, or is otherwise likely to cause a call to
pthread_mutex_lock!
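The workaround can be sketched in C as below: ProfilerStart() and
ProfilerStop() bracket a single-threaded, compute-only region, so that
sampling signals are generated only while that region runs. The
WITH_GPERFTOOLS and DEMO_MAIN macros, the function names, and the
profile path are assumptions made for this example; the fallback stubs
exist only so the sketch still builds where gperftools is not installed.

```c
/* Profiling one region only (macro and function names are hypothetical).
 * With gperftools: gcc -DWITH_GPERFTOOLS -DDEMO_MAIN region.c -o region -lprofiler */
#include <assert.h>
#include <stdio.h>

#ifdef WITH_GPERFTOOLS
#include <gperftools/profiler.h>        /* declares ProfilerStart(), ProfilerStop() */
#else
/* No-op stand-ins so the sketch builds without gperftools. */
static int ProfilerStart(const char *fname) { (void)fname; return 0; }
static void ProfilerStop(void) {}
#endif

/* CPU-bound work with no thread creation and no locking of its own. */
static unsigned long sum_of_squares(unsigned long n) {
    unsigned long total = 0;
    for (unsigned long i = 1; i <= n; i++) total += i * i;
    return total;
}

/* Profiles only the call to sum_of_squares(), writing to `profile_path`. */
unsigned long profiled_sum(unsigned long n, const char *profile_path) {
    ProfilerStart(profile_path);        /* sampling begins here */
    unsigned long result = sum_of_squares(n);
    ProfilerStop();                     /* sampling ends; profile is written */
    return result;
}

#ifdef DEMO_MAIN                        /* hypothetical macro: standalone build */
int main(void) {
    printf("%lu\n", profiled_sum(1000000UL, "/tmp/region.prof"));
    return 0;
}
#endif
```

Thread creation and other setup stay outside the bracketed region,
which is the arrangement the workaround recommends.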

---
17 May 2011