Austin Schuh | 745610d | 2015-09-06 18:19:50 -0700 | 1 | IMPORTANT NOTE FOR 64-BIT USERS |
| 2 | ------------------------------- |
| 3 | There are known issues with some perftools functionality on x86_64 |
| 4 | systems. See 64-BIT ISSUES, below. |
| 5 | |
| 6 | |
| 7 | TCMALLOC |
| 8 | -------- |
| 9 | Just link in -ltcmalloc or -ltcmalloc_minimal to get the advantages of |
| 10 | tcmalloc -- a replacement for malloc and new. See below for some |
| 11 | environment variables you can use with tcmalloc, as well. |
| 12 | |
| 13 | tcmalloc functionality is available on all systems we've tested; see |
| 14 | INSTALL for more details. See README_windows.txt for instructions on |
| 15 | using tcmalloc on Windows. |
| 16 | |
| 17 | NOTE: When compiling programs with gcc that you plan to link |
| 18 | with libtcmalloc, it's safest to pass in the flags |
| 19 | |
| 20 | -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free |
| 21 | |
| 22 | when compiling. gcc makes some optimizations assuming it is using its |
| 23 | own, built-in malloc; that assumption obviously isn't true with |
| 24 | tcmalloc. In practice, we haven't seen any problems with this, but |
| 25 | the expected risk is highest for users who register their own malloc |
| 26 | hooks with tcmalloc (using gperftools/malloc_hook.h). The risk is |
| 27 | lowest for folks who use tcmalloc_minimal (or, of course, who pass in |
| 28 | the above flags :-) ). |
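A small sketch of the kind of code where the built-in assumption can matter. Because gcc treats malloc and free as built-ins by default, an optimizing build may decide that the allocation below has no observable effect and drop the malloc/free pair, in which case a registered malloc hook would never see it. Compiling with -fno-builtin-malloc and -fno-builtin-free makes gcc treat them as ordinary function calls. The function name alloc_and_free is hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocates n bytes and frees them right away.
 * Returns 1 if the allocation succeeded, 0 otherwise.
 * With built-in malloc/free, an optimizer may remove both calls,
 * since the memory is never read or written. */
int alloc_and_free(size_t n) {
  assert(n > 0);
  void *p = malloc(n);
  int ok = (p != NULL);
  free(p);
  return ok;
}
```

Whether the calls are actually removed depends on the compiler version and optimization level; comparing the generated assembly with and without the flags is the reliable way to check.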
| 29 | |
| 30 | |
| 31 | HEAP PROFILER |
| 32 | ------------- |
| 33 | See doc/heapprofile.html for information about how to use tcmalloc's |
| 34 | heap profiler and analyze its output. |
| 35 | |
| 36 | As a quick-start, do the following after installing this package: |
| 37 | |
| 38 | 1) Link your executable with -ltcmalloc |
| 39 | 2) Run your executable with the HEAPPROFILE environment var set: |
| 40 | $ HEAPPROFILE=/tmp/heapprof <path/to/binary> [binary args] |
| 41 | 3) Run pprof to analyze the heap usage |
| 42 | $ pprof <path/to/binary> /tmp/heapprof.0045.heap # run 'ls' to see options |
| 43 | $ pprof --gv <path/to/binary> /tmp/heapprof.0045.heap |
| 44 | |
| 45 | You can also use LD_PRELOAD to heap-profile an executable that you |
| 46 | didn't compile. |
| 47 | |
| 48 | There are other environment variables, besides HEAPPROFILE, you can |
| 49 | set to adjust the heap-profiler behavior; see "ENVIRONMENT VARIABLES" |
| 50 | below. |
| 51 | |
| 52 | The heap profiler is available on all unix-based systems we've tested; |
| 53 | see INSTALL for more details. It is not currently available on Windows. |
| 54 | |
| 55 | |
| 56 | HEAP CHECKER |
| 57 | ------------ |
| 58 | See doc/heap_checker.html for information about how to use tcmalloc's |
| 59 | heap checker. |
| 60 | |
| 61 | In order to catch all heap leaks, tcmalloc must be linked *last* into |
| 62 | your executable. The heap checker may mischaracterize some memory |
| 63 | accesses in libraries listed after it on the link line. For instance, |
| 64 | it may report these libraries as leaking memory when they're not. |
| 65 | (See the source code for more details.) |
| 66 | |
| 69 | As a quick-start, do the following after installing this package: |
| 70 | |
| 71 | 1) Link your executable with -ltcmalloc |
| 72 | 2) Run your executable with the HEAPCHECK environment var set: |
| 73 | $ HEAPCHECK=1 <path/to/binary> [binary args] |
| 74 | |
| 75 | Other values for HEAPCHECK: normal (equivalent to "1"), strict, draconian |
| 76 | |
| 77 | You can also use LD_PRELOAD to heap-check an executable that you |
| 78 | didn't compile. |
| 79 | |
| 80 | The heap checker is only available on Linux at this time; see INSTALL |
| 81 | for more details. |
| 82 | |
| 83 | |
| 84 | CPU PROFILER |
| 85 | ------------ |
| 86 | See doc/cpuprofile.html for information about how to use the CPU |
| 87 | profiler and analyze its output. |
| 88 | |
| 89 | As a quick-start, do the following after installing this package: |
| 90 | |
| 91 | 1) Link your executable with -lprofiler |
| 92 | 2) Run your executable with the CPUPROFILE environment var set: |
| 93 | $ CPUPROFILE=/tmp/prof.out <path/to/binary> [binary args] |
| 94 | 3) Run pprof to analyze the CPU usage |
| 95 | $ pprof <path/to/binary> /tmp/prof.out # -pg-like text output |
| 96 | $ pprof --gv <path/to/binary> /tmp/prof.out # really cool graphical output |
| 97 | |
| 98 | There are other environment variables, besides CPUPROFILE, you can set |
| 99 | to adjust the cpu-profiler behavior; see "ENVIRONMENT VARIABLES" below. |
| 100 | |
| 101 | The CPU profiler is available on all unix-based systems we've tested; |
| 102 | see INSTALL for more details. It is not currently available on Windows. |
| 103 | |
| 104 | NOTE: CPU profiling doesn't work after fork (unless you immediately |
| 105 | do an exec()-like call afterwards). Furthermore, if you do |
| 106 | fork, and the child calls exit(), it may corrupt the profile |
| 107 | data. You can use _exit() to work around this. We hope to have |
| 108 | a fix for both problems in the next release of perftools |
| 109 | (hopefully perftools 1.2). |
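The difference between exit() and _exit() in a forked child can be sketched as follows. This is a self-contained illustration, not perftools code: the handler exit_time_handler is a hypothetical stand-in for any exit-time cleanup (such as a profiler flushing its data), and child_handler_ran is a hypothetical helper. exit() runs handlers registered with atexit(), while _exit() terminates the process without running them, which is presumably why _exit() avoids the problem described above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int g_pipe_wr = -1; /* write end of the demo pipe; -1 when unused */

/* Hypothetical stand-in for exit-time cleanup inherited by the child. */
static void exit_time_handler(void) {
  if (g_pipe_wr >= 0) {
    if (write(g_pipe_wr, "x", 1) != 1) {
      /* nothing useful to do while exiting */
    }
  }
}

/* Forks a child that terminates with _exit() (flag = 1) or exit() (flag = 0).
 * Returns 1 if the child's exit-time handler ran, 0 if not, -1 on error. */
int child_handler_ran(int use_underscore_exit) {
  static int registered = 0;
  int fds[2];
  char c;

  assert(use_underscore_exit == 0 || use_underscore_exit == 1);
  if (!registered) {
    if (atexit(exit_time_handler) != 0) return -1;
    registered = 1;
  }
  if (pipe(fds) != 0) return -1;
  fflush(NULL); /* avoid duplicating buffered stdio output in the child */
  g_pipe_wr = fds[1];

  pid_t pid = fork();
  if (pid < 0) {
    g_pipe_wr = -1;
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  if (pid == 0) {
    close(fds[0]);
    if (use_underscore_exit) _exit(0); /* skips atexit handlers */
    exit(0);                           /* runs atexit handlers */
  }

  /* Parent: disarm the handler, then see whether the child wrote a byte. */
  g_pipe_wr = -1;
  close(fds[1]);
  ssize_t n = read(fds[0], &c, 1); /* 1 = handler ran, 0 = EOF */
  close(fds[0]);
  waitpid(pid, NULL, 0);
  return (int)n;
}
```

Calling child_handler_ran(0) and child_handler_ran(1) from a test program shows the two behaviors side by side.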
| 110 | |
| 111 | |
| 112 | EVERYTHING IN ONE |
| 113 | ----------------- |
| 114 | If you want the CPU profiler, heap profiler, and heap leak-checker to |
| 115 | all be available for your application, you can do: |
| 116 | gcc -o myapp ... -lprofiler -ltcmalloc |
| 117 | |
| 118 | However, if you have a reason to use the static versions of the |
| 119 | library, this two-library linking won't work: |
| 120 | gcc -o myapp ... /usr/lib/libprofiler.a /usr/lib/libtcmalloc.a # errors! |
| 121 | |
| 122 | Instead, use the special libtcmalloc_and_profiler library, which we |
| 123 | make for just this purpose: |
| 124 | gcc -o myapp ... /usr/lib/libtcmalloc_and_profiler.a |
| 125 | |
| 126 | |
| 127 | CONFIGURATION OPTIONS |
| 128 | --------------------- |
| 129 | For advanced users, there are several flags you can pass to |
| 130 | './configure' that tweak tcmalloc performance. (These are in addition |
| 131 | to the environment variables you can set at runtime to affect |
| 132 | tcmalloc, described below.) See the INSTALL file for details. |
| 133 | |
| 134 | |
| 135 | ENVIRONMENT VARIABLES |
| 136 | --------------------- |
| 137 | The cpu profiler, heap checker, and heap profiler will lie dormant, |
| 138 | using no memory or CPU, until you turn them on. (Thus, there's no |
| 139 | harm in linking -lprofiler into every application, and also -ltcmalloc |
| 140 | assuming you're ok using the non-libc malloc library.) |
| 141 | |
| 142 | The easiest way to turn them on is by setting the appropriate |
| 143 | environment variables. We have several variables that let you |
| 144 | enable/disable features as well as tweak parameters. |
| 145 | |
| 146 | Here are some of the most important variables: |
| 147 | |
| 148 | HEAPPROFILE=<pre> -- turns on heap profiling and dumps data using this prefix |
| 149 | HEAPCHECK=<type> -- turns on heap checking with strictness 'type' |
| 150 | CPUPROFILE=<file> -- turns on cpu profiling and dumps data to this file. |
| 151 | PROFILESELECTED=1 -- if set, cpu-profiler will only profile regions of code |
| 152 | surrounded with ProfilerEnable()/ProfilerDisable(). |
| 153 | CPUPROFILE_FREQUENCY=x -- how many interrupts/second the cpu-profiler samples. |
| 154 | |
| 155 | TCMALLOC_DEBUG=<level> -- the higher level, the more messages malloc emits |
| 156 | MALLOCSTATS=<level> -- prints memory-use stats at program-exit |
| 157 | |
| 158 | For a full list of variables, see the documentation pages: |
| 159 | doc/cpuprofile.html |
| 160 | doc/heapprofile.html |
| 161 | doc/heap_checker.html |
| 162 | |
| 163 | |
| 164 | COMPILING ON NON-LINUX SYSTEMS |
| 165 | ------------------------------ |
| 166 | |
| 167 | Perftools was developed and tested on x86 Linux systems, and it works |
| 168 | in its full generality only on those systems. However, we've |
| 169 | successfully ported much of the tcmalloc library to FreeBSD, Solaris |
| 170 | x86, and Darwin (Mac OS X) x86 and ppc; and we've ported the basic |
| 171 | functionality in tcmalloc_minimal to Windows. See INSTALL for details. |
| 172 | See README_windows.txt for details on the Windows port. |
| 173 | |
| 174 | |
| 175 | PERFORMANCE |
| 176 | ----------- |
| 177 | |
| 178 | If you're interested in some third-party comparisons of tcmalloc to |
| 179 | other malloc libraries, here are a few web pages that have been |
| 180 | brought to our attention. The first discusses the effect of using |
| 181 | various malloc libraries on OpenLDAP. The second compares tcmalloc to |
| 182 | win32's malloc. |
| 183 | http://www.highlandsun.com/hyc/malloc/ |
| 184 | http://gaiacrtn.free.fr/articles/win32perftools.html |
| 185 | |
| 186 | It's possible to build tcmalloc in a way that gains faster |
| 187 | performance (particularly for deletes) at the cost of more memory |
| 188 | fragmentation (that is, more unusable memory on your system). See the |
| 189 | INSTALL file for details. |
| 190 | |
| 191 | |
| 192 | OLD SYSTEM ISSUES |
| 193 | ----------------- |
| 194 | |
| 195 | When compiling perftools on some old systems, like RedHat 8, you may |
| 196 | get an error like this: |
| 197 | ___tls_get_addr: symbol not found |
| 198 | |
| 199 | This means that you have a system where some parts are updated enough |
| 200 | to support Thread Local Storage, but others are not. The perftools |
| 201 | configure script can't always detect this kind of case, leading to |
| 202 | that error. To fix it, just comment out (or delete) the line |
| 203 | #define HAVE_TLS 1 |
| 204 | in your config.h file before building. |
| 205 | |
| 206 | |
| 207 | 64-BIT ISSUES |
| 208 | ------------- |
| 209 | |
| 210 | There are two issues that can cause program hangs or crashes on x86_64 |
| 211 | 64-bit systems, which use the libunwind library to get stack-traces. |
| 212 | Neither issue should affect the core tcmalloc library; they both |
| 213 | affect the perftools tools such as cpu-profiler, heap-checker, and |
| 214 | heap-profiler. |
| 215 | |
| 216 | 1) Some libc's -- at least glibc 2.4 on x86_64 -- have a bug where the |
| 217 | libc function dl_iterate_phdr() acquires its locks in the wrong |
| 218 | order. This bug should not affect tcmalloc, but may cause occasional |
| 219 | deadlock with the cpu-profiler, heap-profiler, and heap-checker. |
| 220 | The likelihood increases with the number of dlopen() calls an executable makes. |
| 221 | Most executables don't have any, though several library routines like |
| 222 | getgrgid() call dlopen() behind the scenes. |
| 223 | |
| 224 | 2) On x86_64 64-bit systems, while tcmalloc itself works fine, the |
| 225 | cpu-profiler tool is unreliable: it will sometimes work, but sometimes |
| 226 | cause a segfault. I'll explain the problem first, and then some |
| 227 | workarounds. |
| 228 | |
| 229 | Note that this only affects the cpu-profiler, which is a |
| 230 | gperftools feature you must turn on manually by setting the |
| 231 | CPUPROFILE environment variable. If you do not turn on cpu-profiling, |
| 232 | you shouldn't see any crashes due to perftools. |
| 233 | |
| 234 | The gory details: The underlying problem is in the backtrace() |
| 235 | function, which is a built-in function in libc. |
| 236 | Backtracing is fairly straightforward in the normal case, but can run |
| 237 | into problems when having to backtrace across a signal frame. |
| 238 | Unfortunately, the cpu-profiler uses signals in order to register a |
| 239 | profiling event, so every backtrace that the profiler does crosses a |
| 240 | signal frame. |
| 241 | |
| 242 | In our experience, the only time there is trouble is when the signal |
| 243 | fires in the middle of pthread_mutex_lock. pthread_mutex_lock is |
| 244 | called quite a bit from system libraries, particularly at program |
| 245 | startup and when creating a new thread. |
| 246 | |
| 247 | The solution: The dwarf debugging format has support for 'cfi |
| 248 | annotations', which make it easy to recognize a signal frame. Some OS |
| 249 | distributions, such as Fedora and gentoo 2007.0, already have added |
| 250 | cfi annotations to their libc. A future version of libunwind should |
| 251 | recognize these annotations; these systems should not see any |
| 252 | crashes. |
| 253 | |
| 254 | Workarounds: If you see problems with crashes when running the |
| 255 | cpu-profiler, consider inserting ProfilerStart()/ProfilerStop() into |
| 256 | your code, rather than setting CPUPROFILE. This will profile only |
| 257 | those sections of the codebase. Though we haven't done much testing, |
| 258 | in theory this should reduce the chance of crashes by limiting the |
| 259 | signal generation to only a small part of the codebase. Ideally, you |
| 260 | would not use ProfilerStart()/ProfilerStop() around code that spawns |
| 261 | new threads, or is otherwise likely to cause a call to |
| 262 | pthread_mutex_lock! |
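The workaround above might look like the following sketch. ProfilerStart() and ProfilerStop() are declared in gperftools/profiler.h; everything else here is an assumption for illustration: USE_GPERFTOOLS is a hypothetical opt-in macro, the stubs exist only so the sketch compiles without gperftools installed, and hot_loop and profile_hot_section are made-up names. When building against the real library, define USE_GPERFTOOLS and link with -lprofiler.

```c
#include <assert.h>
#include <stddef.h>

#ifdef USE_GPERFTOOLS
#include <gperftools/profiler.h> /* real API; link with -lprofiler */
#else
/* Stand-in stubs (NOT the real implementation) so this compiles anywhere. */
static int ProfilerStart(const char *fname) { (void)fname; return 1; }
static void ProfilerStop(void) {}
#endif

/* Hypothetical CPU-bound work: sums the integers 0 .. n-1. */
static long hot_loop(long n) {
  long sum = 0;
  for (long i = 0; i < n; i++) sum += i;
  return sum;
}

/* Profiles only the hot section. Threads should be created before
 * this call, so no profiling signals fire during thread startup. */
long profile_hot_section(const char *profile_path, long n) {
  assert(profile_path != NULL);
  int started = ProfilerStart(profile_path); /* nonzero means started */
  long result = hot_loop(n);
  if (started) ProfilerStop();
  return result;
}
```

Keeping the profiled region small and free of thread creation follows the advice above; the resulting profile file can then be passed to pprof as in the CPU PROFILER quick-start.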
| 263 | |
| 264 | --- |
| 265 | 17 May 2011 |