Squashed 'third_party/gperftools/' changes from 54505f1d50..c25941200e

c25941200e fix cmake gperftools_enable_libunwind invalid
43504ab709 tcmalloc page fences: add TCMALLOC_PAGE_FENCE_READABLE option
d57a9ea8bc README: replace "golang" moniker with "Go"
f7c6fb6c8e bump version to 2.9.1
c2f60400a8 prefer backtrace() on OSX
a015377a54 Set tcmalloc heap limit prior to testing oom
c939dd5531 correctly check sized delete hint when asserts are on
47b5b59ca9 bump version to 2.9
d7cbc8c2ff unbreak cmake build
be0bbdb340 amputate various unused bits from elfcore.h
42bab59f25 liberate profile handler from linux_syscall_support
4629511e99 liberate spinlock futex waits from linux_syscall_support includes
2e7094a862 liberate malloc_hook_mmap_linux.h from linux_syscall_support
35301e2e59 add missing noopt wrappings around more operator new calls
fa412adfe3 Fix thread-safety (annotalysis) annotations
cc496aecb8 tcmalloc: Switch thread-safety annotations to support clang
96ba58e19b bump version to 2.9rc
9ce32aefa9 upgrade test bot to xenial (ubuntu 16.04 LTS)
91ff311449 don't default to generic_fp without frame pointers
4cf7dd0a75 enable emergency_malloc on all architectures with mmap
37087ec536 prefer libunwind on x86-64 even with -fno-omit-frame-pointer
f4aa2a435e implement generic frame pointer backtracer
17bab484ae always respect --enable-frame-pointers
22c0eceddc add emacs mode line annotations to remaining files
b12139ddba delete-trailing-whitespace on all files
419c85814d amputate unused dynamic annotations support
73a72cdb61 don't check for snprintf
95b52b0504 don't check for unused uname symbol
01c2697fac amputate unused SleepForMilliseconds from sysinfo.{h,cc}
ac68c97187 don't check for useless __builtin_stack_pointer
7271bb72be unbreak cmake check for TLS support
7c106ca241 don't bother checking for stl namespace and use std
0d6f32b9ce use standard way to print size_t-sized ints
0c11d35f4a amputate checking for __int64
92718aaaeb amputate checking for conflict-signal.h
9bb2937261 amputate checking for inline keyword support
d9c4c3b481 profile-handler: use documented sigev_notify_thread_id in sigevent
43459feb33 configure.ac: check for features.h once
290b123c01 atomicops: Remove Acquire_Store / Release_Load
3b1c60cc4e Add support for Elbrus 2000 (e2k)
c5747615da syscall: Mark x8 as clobbered
d8eb315fb1 bump version to 2.8.1
6ed61f8e91 add note that cmake build is preliminary
6bbf2ed150 Update cmake
913d3eb7d7 Fix a few macros for Apple
64a73b1cb8 Work on fixing linking errors in stacktrace
b788d51eb4 Fix conditional definitions
495229b625 Make internal tcmalloc libs
cca7f6f669 More unit tests and libraries
11dc65c3c4 Fix config headers, add more unit tests
6078fe40d9 Finish configure.ac conversion to CMake, start on Makefile.am
515fb22196 Generate config header
4adb5ff74d Add architecture checks
fa9bedc82c Add most of CMake build
9e4f72fd1f Define options, start system checks
a6ce98174b Add CMakeLists.txt
3134955875 Additional porting for riscv64.
f0e289bdbb Enable build on riscv64.
6c715b4fa1 docs: fix simple typo, defininitions -> definitions
02d5264018 Revert "drop page heap lock when returning memory back to kernel"
151cbf5146 Add OS X arm64 program counter
140e3481d0 Merge pull request #1231 from PatriosTheGreat/master
0fc5cabdfc Fix implicit-int-float-conversion warning.
bda3c82e11 Increase kMaxStackDepth to 254
1d9b8bb59d don't test sbrk hook if we're on linux and don't have __sbrk
180bfa10d7 bumped version to 2.8
c1bcc412ba Don't try to mark esp as clobbered in linux syscall support.
50f89afaed liberate gperftools tests from relying on -fno-builtin-XXX flags
98ccd0f102 prevent inlining in heap-checker unittest
e521472f1a fix linking of page_heap_test on windows
e5f77d6485 chmod -x Makefile.am gperftools.sln
6b92e00cec don't assume HAVE_MMAP on mingw builds
4cddede399 New ProfilerGetStackTrace()
db7aa547ab bumped version to 2.8rc
be3da70298 drop page heap lock when returning memory back to kernel
87acc2782f amputate span events history
e40c7f231a Fix mmap syscall on s390
b7607ee7d4 tcmalloc: ability to disable fallback allocator in memfs
1bfcb5bc3a tcmalloc: fragmentation overhead instrumentation
36bf1309de Fix a clang-tidy readability warning for static member access
2b2a962c2b Remove executable flag for c++ files
8f308afbfe Increase kClassSizesMax to 128 to allow for page size of 4K
d3fefdb694 Allow configuring page size to 4K, 8K, 16K, 32K, 64K, 128K and 256K
cf2df3b000 Fix the removed std::allocator::pointer member type removed in C++20
31024506c5 Add mips64* support
fe62a0baab Update config.h in Windows
8272631b5a Fix a long time typo
c1d546d7b2 never test and always default HAVE_MMAP to on
fba6ce0e7a Fix build on FreeBSD
98ac4ee9bc Fix typos
9e5b162873 don't try to mark rsp as clobbered in linux syscall support
1e36ed7055 Use initial exec TLS model for all thread local variables from thread_cache.cc
8f9a873fce Fix accessing PC on FreeBSD/powerpc and powerpc64
fc00474ddc Include asm/ptrace.h when accessing ucontext_t
5574c87e39 Compile time aggressive decommit option
e9ab4c5304 undef mmap64 function
5eec9d0ae3 Drop not very portable and not very useful unwind benchmark.
1561f0946f check for __sbrk
1de76671d4 Fix mmap region iteration while no regions are recorded.
acdcacc28f Use off64_t instead of __off64_t
0177a2420a Return early in WriteProfile to reduce indentation
b85652bf26 Add generic.total_physical_bytes property to MallocExtension
90df23c81f Make some tcmalloc constants truly const
49dbe4362b Add comment about gperftools 2.8 not deduplicating heapz samples.
63a12a5ed3 Drop de-duplication of heap sample (aka heapz) entries.
954f9dc0e3 Add flag to disable installing unmaintained & deprecated pprof.
893bff51bc Avoid static initialization of pprof path for symbolization.
69867c523b Clean up MSVC projects
f2bca77aed Fix page_heap_test flakiness
c41688bf20 Use standard-conforming alignof in debugallocation.cc
71c8cedaca Fix incompatible aliasing warnings
8dd3040358 Format and fix out of bound access in CpuProfilerSwitch
467502e70a provide constexpr constructor for Sampler
1fb543cc70 Patch _free_dbg to make Debug mode in MSVC works
267f431d80 Use indirect system calls in the linux spinlock implementation
73ee9b1544 Use indirect system calls in the mmap malloc hooks.
3af509d4f9 benchmark: use angle brackets to include ucontext.h
0cdda6d7cc use utf-8 for special symbols
c7a0cfda88 Fix potential missing nul character on resolved symbol names
e42bfc8c06 tcmalloc: use relative addresses with the windows addr2line wrapper
d8f8d1cced tcmalloc: add long form flag '--exe' to specify the binary
25c53aca12 tcmalloc: fixes for the windows addr2line wrapper
f02e28f348 Replace builtin_expect configure test with a direct GCC compiler check
62c4eca6e7 Under x64, the PE loader looks for callbacks in constant sections
0b588e7490 Fix uninitialized memory use in sampler_test
51a5613f21 Upgrade MSVC projects to MSVC2015
44da4ce539 build with c++11 or later
f47a52ce85 Make _recalloc adhere to MS's definition
fe87ffb7ea Disable large allocation report by default
9608fa3bcf bumped version to 2.7
db890ccfad Clean up src/windows/config.h
497ea33165 Fix WIN32_OVERRIDE_ALLOCATORS for VS2017
ebc85cca90 Enable aligned new/delete declarations on Windows when applicable
a3badd6d21 Really fix CheckAddressBits compilation warning
7c718fe176 Add tests for sized deallocation
30e5e614a8 Fix build without static libraries
836c4f29a5 Update documentation for heap_checker.html
e47d0d1c51 powerpc: Re-enable VDSO support
0a66dd3a6a linux: add aarch64_ilp32 support.
05dff09663 Fix signature of sbrk.
33ae0ed2ae unbreak compilation on GNU/Linux i386
977e0d4500 Remove not needed header in vdso_support.cc.
36bfa9a404 Enable tcmalloc VDSO support only on x86 to reduce static initializers
1cb5de6db9 Explicitly prevent int overflow
8f63f2bb98 Correctly detect presence of various functions in tcmalloc.h
736648887b Don't test OOM handling of debugallocator
c4a8e00da4 Fix warning about one of CheckAddressBits functions unused
47c99cf492 unbreak printing large span stats
34f78a2dcd bumped version to 2.7rc
db98aac55a Add a central free list for kMaxPages-sized spans
d7be938560 implement more robust detection of sized delete support
f1d3fe4a21 refactored handling of reverse span set iterator for correctness
59c77be0fa Update docs for central page heap to reflect tree
06c9414ec4 Implemented O(log n) searching among large spans
a42e44738a typo in docs/tcmalloc.html
71bf09aabe bumped version to 2.6.3
0bccb5e658 fix malloc fast path for patched windows functions
8b1d13c631 configure.ac: use link check for std::align_val_t
36ab068baa configure.ac: better test for -faligned-new
6a4b079997 bumped version to 2.6.2
2291714518 implement fast-path for memalign/aligned_alloc/tc_new_aligned
8b9728b023 add memalign benchmark to malloc_bench
79c91a9810 always define empty PERFTOOLS_NOTHROW
03da6afff5 unbreak throw declarations on operators new/delete
89fe59c831 Fix OOM handling in fast-path
a29a0cf348 delete-trailing-whitespace on thread_cache.*
e6cd69bdec reintroduce aliasing for aligned delete
fb30c3d435 fully disable aligned new on windows for now
7efb3ecf37 Add support for C++17 operator new/delete for overaligned types.
7a6e25f3b1 Add new statistics for the PageHeap
6e3a702fb9 Fix data race setting size_left_ in ThreadCache::SetMaxSize
235471f965 fix memory leak in Symbolize function
47efdd60f5 Added mising va_end() in TracePrintf function
497b60ef0f Implemented GetProgramInvocationName on FreeBSD
ac072a3fc7 Revert "Ignore current_instance heap allocation when leak sanitizer is enabled"
fb5987d579 Revert "Ensure that lsan flags are appended on all necessary targets"
5815f02105 Use safe getenv for setting up backtrace capturing method
aab4277311 Fixed LTO warning about the mismatch between return values for ProfilingIsEnabledForAllThreads()
d406f22853 implement support for C11 aligned_alloc
92a27e41a1 Fix build on macOS.
e033431e5a include fcntl.h for loff_t definition
e41bc41404 Use ucontext_t instead of struct ucontext
bf840dec04 bumped version to 2.6.1
2d220c7e26 Replace "throw()" by "PERFTOOLS_NOTHROW"
c4de73c0e6 Add PERFTOOLS_THROW where necessary (as detected by GCC).
e5fbd0e24e Rename PERFTOOLS_THROW into PERFTOOLS_NOTHROW.
eeb7b84c20 Register tcmalloc atfork handler as early as possible
208c26caef Add initial syscall support for mips64 32-bit ABI
a3bf61ca81 Ensure that lsan flags are appended on all necessary targets
97646a1932 Add missing NEWS entry for recent 2.6 release
4be05e43a1 bumped version up to 2.6
70a35422b5 Ignore current_instance heap allocation when leak sanitizer is enabled
6eca6c64fa Revert "issue-654: [pprof] handle split text segments"
a495969cb6 update the prev_class_size in each loop, or the min_object_size of tcmalloc.thread will always be 1 when calling GetFreeListSizes
163224d8af Document HEAPPROFILESIGNAL environment variable
5ac82ec5b9 added stacktrace capturing benchmark
c571ae2fc9 2.6rc4
f2bae51e7e Revert "Revert "disable dynamic sized delete support by default""
6426c0cc80 2.6rc3
0c0e2fe43b enable 48-bit page map on msvc as well
83d6818295 speed up 3-level page map access
f7ff175b92 add configure-time warning on unsupported backtrace capturing
cef582350c align fast-path functions only if compiler supports that
bddf862b18 actually support very early freeing of NULL
07a124d8c1 don't use arg-ful constructor attribute for early nallocx test
5346b8a4de don't depend on SIZE_MAX definition in sampler.cc
50125d8f70 2.6rc2
a5e8e42a47 don't link-in libunwind if libunwind.h is missing
e92acdf98d Fix compilation error for powerpc32
b48403a4b0 2.6rc
53f15325d9 fix compilation of tcmalloc_unittest.cc on older llvm-gcc
b1d88662cb change size class to be represented by 32 bit int
991f47a159 change default transfer batch back to 32
7bc34ad1f6 support different number of size classes at runtime
4585b78c8d massage allocation and deallocation fast-path for performance
5964a1d9c9 always inline a number of hot functions
e419b7b9a6 introduce ATTRIBUTE_ALWAYS_INLINE
7d588da7ec synchronized Sampler implementation with Google-internal version
27da4ade70 reduce size of class_to_size_ array
335f09d4e4 use static location for pageheap
6ff332fb51 move size classes map earlier in SizeMap
121b1cb32e slightly faster size class cache
b57c0bad41 init tcmalloc prior to replacing system alloc
71fa9f8730 use 2-level page map for 48-bit addresses
bad70249dd use 48-bit addresses on 64-bit arms too
5f12147c6d use hidden visibility for some key global variables
dfd53da578 set ENOMEM in handle_oom
14fd551072 avoid O(N²) in thread cache creation code
507a105e84 pass original size to DoSampledAllocation
bb77979dea don't declare throw() on malloc funtions since it is faster
89c74cb79c handle duplicate google_malloc frames in malloc hook stack trace
0feb1109ac fix stack trace capturing in debug malloc
0506e965ee replace LIKELY/UNLIKELY with PREDICT_{TRUE,FALSE}
59a4987054 prevent inlining ATTRIBUTE_SECTION functions
ebb575b8a0 Revert "enabled aggressive decommit by default"
b82d89cb7c Revert "disable dynamic sized delete support by default"
fac0bb44d5 Do not depend on memchr in commandlineflags::StringToBool
7d49f015a0 Make GetenvBeforeMain work inside ifunc handler
a2550b6309 turn bench_fastpath_throughput into actual throughput benchmark
b762b1a492 added sized free benchmarks to malloc_bench
71ffc1cd6b added free lists randomization step to malloc_bench
732dfeb83d Run StartStopNoOptionsEmpty profiledata unittest
cbb312fbe8 aggressive decommit: only free necessary regions and fix O(N²)
6d98223a90 don't build with -fno-exceptions
d6a1931cce fixed warning in casting heap of checker's main_thread_counter
5c778701d9 added tcmalloc minimal unittest with ASSERTs checked
a9167617ab drop unused g_load_map variable in patch_functionc.cc
d52e56dcb5 don't compare integer to NULL
bae00c0341 add fake_stacktrace_scope to few msvc projects
79aab4fed4 correctly dllexport nallocx on windows
b010895a08 don't undef PERFTOOLS_DLL_DECL
491b1aca7e don't try to use pthread_atfork on windows
691045b957 suppress warnings from legacy headers while building legacy headers test
22f7ceb97a use unsigned for few flags in mini_disassembler_types.h
9b17a8a5ba remove superfluous size_t value >= 0 check
86ce69d77f Update binary_trees.cc
cd8586ed6c Fix path names in README
98753aa737 test that sized deallocation really works before enabling it
5618ef7850 Don't assume memalign exists in memalign vs nallocx test
bf640cd740 rename sys allocator's sys_alloc symbol to tcmalloc_sys_alloc
069e3b1655 build malloc_bench_shared_full only when full tcmalloc is built
b8f9d0d44f ported nallocx support from Google-internal tcmalloc
b0abefd938 Fix a typo in the page fence flag declaration
855b380006 replace docs by doc
664210ead8 doc -> docs, with symlink
75dc9a6e14 Fix Post(s)cript tyos
dde32f8bbc Fix unaligned memory accesses in debug allocator
02eeed29df Fix redefinition of mmap on aarch64.
c07a15cff4 [windows] patch _free_base as well
acac6af26b Fix finding default zone on macOS sierra
7822b5b0b9 Stop using glibc malloc hooks
c92f0ed089 Remove references to __malloc_initialize_hook
9709eef361 Merge pull request #821 from jtmcdole/patch-1
44f276e132 Rename TCMALLOC_DEBUG to PERFTOOLS_VERBOSE
eb474c995e Summary: support gcc atomic ops on clang too
7f86eab1f3 Recognize .node files as shared libraries
bf8eacce69 Add support for 31-bit s390; merge linux_syscall_support.h changes from upstream.
c54218069b Update README
06f4ce65c2 Small performance tweak: avoid calling time() if we don't need it
db8d483609 Autogenerate ChangeLog from git on make dist
4a13598319 renamed ChangeLog to ChangeLog.old
7852eeb75b Use initial-exec tls for libunwind's recursion flag
a07f9fe75a gerftools -> gperftools in readme
9fd6d26879 added define to enable MADV_FREE usage on Linux
6f7a14f45e Don't use MADV_FREE on Linux
55cf6e6281 Fix symbol resolution on OSX
8e85843622 added simple .travis.yml config
05e40d29c0 Recognize modern Linux ARM
632de2975e bumped version up to 2.5
6682016092 Unbreak profiling with CPUPROFILE_FREQUENCY=1
6ff86ff6a7 bumped version to 2.4.91 for 2.5rc2
782165fa7f build sized delete aliases even when sized-delete is disabled
06811b3ae4 disable dynamic sized delete support by default
d4d99eb608 unbreak compilation with visual studio
126d4582c1 Call function pointers with the right type
e0fa28ef7d Don't shift a type by more than its width
a1c764d263 Initialize counters in test
22123a37c2 Don't overflow a signed integer
66e1e94f38 added minimal "header section" to README
2804b7cfee bumped version to 2.5rc
f47fefbfc1 updated NEWS for 2.5rc
cef6036174 alias same malloc/free variants to their canonical versions
ea8d242061 Re-enable MultipleIdleNonIdlePhases test
c9962f698b added maybe_emergency_malloc.h to Makefile.am
7dd4af6536 don't round up sizes for large allocation when sampling
4f3410e759 enable emergency malloc by default on arm when using libunwind
7f12051dbe implemented emergency malloc
3ee2360250 replaced invalid uses of __THROW
013b82abcf unbreak <malloc.h> inclusion in gperftools/tcmalloc.h
19903e6f15 drop detection of sys/malloc.h and malloc/malloc.h
cdff090ebd Fix several harmless clang warnings
9095ed0840 implemented stacktrace capturing via libgcc's C++ ABI function
728cbe1021 force profiler_unittest to do 'real' work
fff6b4fb88 Extend low-level allocator to support custom pages allocator
32d9926795 added malloc_bench_shared_full
00d8fa1ef8 always use real throw() on operators new/delete
08e034ad59 Detect working ifunc before enabling dynamic sized delete support
a788f354a0 include unistd.h for getpid in thread_lister.c
644a6bdbdb Add support for Linux s390x
bab7753aad Fix typo in heap-checker-death_unittest.sh
17182e1d3c Fix include of malloc_hook_c.h in malloc_hook.h
c69721b2b2 Add support for obtaining cache size of the current thread and softer idling
5ce42e535d Don't always arm the profiling timer.
7f801ea091 Make sure the alias is not removed by link-time optimization when it can prove that it isn't used by the program, as it might still be needed to override the corresponding symbol in shared libraries (or inline assembler for that matter). For example, suppose the program uses malloc and free but not calloc and is statically linked against tcmalloc (built with -flto) and LTO is done. Then before this patch the calloc alias would be deleted by LTO due to not being used, but the malloc/free aliases would be kept because they are used by the program. Suppose the program is dynamically linked with a shared library that allocates memory using calloc and later frees it by calling free. Then calloc will use the libc memory allocator, because the calloc alias was deleted, but free will call into tcmalloc, resulting in a crash.
6b3e6ef5e0 don't retain compatibility with old docdir behavior
ccffcbd9e9 support use of configure --docdir argument
050f2d28be use alias attribute only for elf platforms
07b0b21ddd fix compilation error in spinlock
e14450366a Added better description for GetStats API
64892ae730 lower default transfer batch size down to 512
6fdfc5a7f4 implemented enabling sized-delete support at runtime
c2a79d063c use x86 pause in spin loop
0fb6dd8aa3 added binary_trees benchmark
a8852489e5 drop unsupported allocation sampling code in tcmalloc_minimal
a9db0ae516 implemented (disabled by default) sized delete support
88686972b9 pass -fsized-deallocation to gcc 5
0a18fab3af implemented sized free support via tc_free_sized
464688ab6d speedup free code path by dropping "fast path allowed check"
10f7e20716 added SizeMap::MaybeSizeClass
436e1dea43 slightly faster GetCacheIfPresent
04df911915 tell compiler that non-empty hooks are unlikely
8cc75acd1f correctly test for -Wno-unused-result support
7753d8239b fixed clang warning about shifting negative values
ae09ebb383 Fix tmpdir usage in heap-profiler_unittest.sh
df34e71b57 use $0 when referring to pprof
7773ea64ee Alignment fix to static variables for system allocators
c46eb1f3d2 Fixed printf misuse in pprof - printed string was passed as format. Better use print instead
9bbed8b1a8 Fixed assembler argument passing inside _syscall6 on MIPS - it was causing 'Expression too complex' compilation errors in spinlock
962aa53c55 added more fastpath microbenchmarks
347a830689 Ensure that PPROF_PATH is set for debugallocation_test
a9059b7c30 prevent clang from inlining Mallocer in heap checker unittest
6627f9217d drop cycleclock
f985abc296 amputate unportable and unused stuff from sysinfo
16408eb4d7 amputated wait_cycles accounting in spinlocks
fedceef40c drop cycleclock reference in ThreadCache
d7fdc3fc9d dropped unused and unsupported synchronization profiling facility
3a054d37c1 dropped unused SpinLockWait function
5b62d38329 avoid checking for dup. entries on empty backtrace
7b9ded722e fixed compiler warning in memory_region_map.cc
4194e485cb Don't link libtcmalloc_minimal.so to libpthread.so
121038308d Check if _MSC_VER is defined to avoid warnings
7367322995 Make default config.h work with VS2015
ae0a444db0 Ensure ThreadCache objects are CACHELINE_ALIGNED.
ea0b1d3154 unbreak TestErrno again
e53aef24ad don't try to test memalign on windows
7707582448 Merge pull request #717 from myrsloik/master
9eb63bddfb Use correct mangled new and delete symbols on windows x64
5078abdb33 Don't discard curl options if timeout is not defined.

Change-Id: If80121a97ff4c18289c6ebff7ccea1d1b355ec89
git-subtree-dir: third_party/gperftools
git-subtree-split: c25941200ef4ce39d0774c1332ff7abfbeab7035
Signed-off-by: Brian Silverman <bsilver16834@gmail.com>

commit: 20350acc2cd2f9271064477a1605129ef9585e6c
author: Brian Silverman <bsilver16384@gmail.com> Wed Nov 17 18:19:55 2021 -0800
committer: Brian Silverman <brian.silverman@bluerivertech.com> Wed Nov 17 18:19:55 2021 -0800
tree: d0014a348441d5d70408012fc5e485c35b45220e
parent: 745610d16119f59479f84918a66456ece9d6d461
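
Note on 290b123c01 (atomicops: Remove Acquire_Store / Release_Load): in the hunks of this diff that use MemoryBarrier(), the removed helpers are a plain store followed by a full barrier, and a full barrier followed by a plain load. A rough sketch of the same ordering in standard C++11 atomics is below, for readers checking whether out-of-tree code relied on these helpers. The function names are hypothetical, std::atomic stands in for the volatile pointers used in the tree, and treating MemoryBarrier() as a sequentially consistent fence is an assumption, not something this commit states.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the tree's Atomic32 typedef.
using Atomic32 = std::int32_t;

// Store, then a full fence. Approximates the removed
// `*ptr = value; MemoryBarrier();` (assumes MemoryBarrier() is a full fence).
inline void AcquireStoreSketch(std::atomic<Atomic32>* ptr, Atomic32 value) {
  ptr->store(value, std::memory_order_relaxed);
  std::atomic_thread_fence(std::memory_order_seq_cst);
}

// Full fence, then a load. Approximates the removed
// `MemoryBarrier(); return *ptr;`.
inline Atomic32 ReleaseLoadSketch(const std::atomic<Atomic32>* ptr) {
  std::atomic_thread_fence(std::memory_order_seq_cst);
  return ptr->load(std::memory_order_relaxed);
}
```

Release_Store and Acquire_Load remain in the hunks below as unchanged context, so code that only needs ordinary release/acquire pairing is presumably unaffected.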
diff --git a/src/addressmap-inl.h b/src/addressmap-inl.h
index fd1dc5b..524aff6 100644
--- a/src/addressmap-inl.h
+++ b/src/addressmap-inl.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2005, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/base/arm_instruction_set_select.h b/src/base/arm_instruction_set_select.h
index 6fde685..77ff670 100644
--- a/src/base/arm_instruction_set_select.h
+++ b/src/base/arm_instruction_set_select.h

@@ -1,3 +1,4 @@
+// -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2011, Google Inc.
 // All rights reserved.
 //

diff --git a/src/base/atomicops-internals-arm-generic.h b/src/base/atomicops-internals-arm-generic.h
index d0f9413..cfa6143 100644
--- a/src/base/atomicops-internals-arm-generic.h
+++ b/src/base/atomicops-internals-arm-generic.h

@@ -122,11 +122,6 @@
   pLinuxKernelMemoryBarrier();
 }
 
-inline void Acquire_Store(volatile Atomic32* ptr, Atomic32 value) {
-  *ptr = value;
-  MemoryBarrier();
-}
-
 inline void Release_Store(volatile Atomic32* ptr, Atomic32 value) {
   MemoryBarrier();
   *ptr = value;
@@ -142,11 +137,6 @@
   return value;
 }
 
-inline Atomic32 Release_Load(volatile const Atomic32* ptr) {
-  MemoryBarrier();
-  return *ptr;
-}
-
 
 // 64-bit versions are not implemented yet.
 
@@ -185,10 +175,6 @@
   NotImplementedFatalError("NoBarrier_Store");
 }
 
-inline void Acquire_Store(volatile Atomic64* ptr, Atomic64 value) {
-  NotImplementedFatalError("Acquire_Store64");
-}
-
 inline void Release_Store(volatile Atomic64* ptr, Atomic64 value) {
   NotImplementedFatalError("Release_Store");
 }
@@ -203,11 +189,6 @@
   return 0;
 }
 
-inline Atomic64 Release_Load(volatile const Atomic64* ptr) {
-  NotImplementedFatalError("Atomic64 Release_Load");
-  return 0;
-}
-
 inline Atomic64 Acquire_CompareAndSwap(volatile Atomic64* ptr,
                                        Atomic64 old_value,
                                        Atomic64 new_value) {

diff --git a/src/base/atomicops-internals-arm-v6plus.h b/src/base/atomicops-internals-arm-v6plus.h
index 35f1048..af2920a 100644
--- a/src/base/atomicops-internals-arm-v6plus.h
+++ b/src/base/atomicops-internals-arm-v6plus.h

@@ -136,11 +136,6 @@
   *ptr = value;
 }
 
-inline void Acquire_Store(volatile Atomic32* ptr, Atomic32 value) {
-  *ptr = value;
-  MemoryBarrier();
-}
-
 inline void Release_Store(volatile Atomic32* ptr, Atomic32 value) {
   MemoryBarrier();
   *ptr = value;
@@ -156,11 +151,6 @@
   return value;
 }
 
-inline Atomic32 Release_Load(volatile const Atomic32* ptr) {
-  MemoryBarrier();
-  return *ptr;
-}
-
 // 64-bit versions are only available if LDREXD and STREXD instructions
 // are available.
 #ifdef BASE_ATOMICOPS_HAS_LDREXD_AND_STREXD
@@ -288,11 +278,6 @@
 
 #endif // BASE_ATOMICOPS_HAS_LDREXD_AND_STREXD
 
-inline void Acquire_Store(volatile Atomic64* ptr, Atomic64 value) {
-  NoBarrier_Store(ptr, value);
-  MemoryBarrier();
-}
-
 inline void Release_Store(volatile Atomic64* ptr, Atomic64 value) {
   MemoryBarrier();
   NoBarrier_Store(ptr, value);
@@ -304,11 +289,6 @@
   return value;
 }
 
-inline Atomic64 Release_Load(volatile const Atomic64* ptr) {
-  MemoryBarrier();
-  return NoBarrier_Load(ptr);
-}
-
 inline Atomic64 Acquire_CompareAndSwap(volatile Atomic64* ptr,
                                        Atomic64 old_value,
                                        Atomic64 new_value) {

diff --git a/src/base/atomicops-internals-gcc.h b/src/base/atomicops-internals-gcc.h
index f8d2786..4f81ce3 100644
--- a/src/base/atomicops-internals-gcc.h
+++ b/src/base/atomicops-internals-gcc.h

@@ -57,7 +57,7 @@
                                          Atomic32 old_value,
                                          Atomic32 new_value) {
   Atomic32 prev_value = old_value;
-  __atomic_compare_exchange_n(ptr, &prev_value, new_value, 
+  __atomic_compare_exchange_n(ptr, &prev_value, new_value,
           0, __ATOMIC_RELAXED, __ATOMIC_RELAXED);
   return prev_value;
 }
@@ -81,7 +81,7 @@
                                        Atomic32 old_value,
                                        Atomic32 new_value) {
   Atomic32 prev_value = old_value;
-  __atomic_compare_exchange_n(ptr, &prev_value, new_value, 
+  __atomic_compare_exchange_n(ptr, &prev_value, new_value,
           0, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
   return prev_value;
 }
@@ -90,7 +90,7 @@
                                        Atomic32 old_value,
                                        Atomic32 new_value) {
   Atomic32 prev_value = old_value;
-  __atomic_compare_exchange_n(ptr, &prev_value, new_value, 
+  __atomic_compare_exchange_n(ptr, &prev_value, new_value,
           0, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
   return prev_value;
 }
@@ -99,11 +99,6 @@
   *ptr = value;
 }
 
-inline void Acquire_Store(volatile Atomic32* ptr, Atomic32 value) {
-  *ptr = value;
-  MemoryBarrier();
-}
-
 inline void Release_Store(volatile Atomic32* ptr, Atomic32 value) {
   MemoryBarrier();
   *ptr = value;
@@ -119,18 +114,13 @@
   return value;
 }
 
-inline Atomic32 Release_Load(volatile const Atomic32* ptr) {
-  MemoryBarrier();
-  return *ptr;
-}
-
 // 64-bit versions
 
 inline Atomic64 NoBarrier_CompareAndSwap(volatile Atomic64* ptr,
                                          Atomic64 old_value,
                                          Atomic64 new_value) {
   Atomic64 prev_value = old_value;
-  __atomic_compare_exchange_n(ptr, &prev_value, new_value, 
+  __atomic_compare_exchange_n(ptr, &prev_value, new_value,
           0, __ATOMIC_RELAXED, __ATOMIC_RELAXED);
   return prev_value;
 }
@@ -154,7 +144,7 @@
                                        Atomic64 old_value,
                                        Atomic64 new_value) {
   Atomic64 prev_value = old_value;
-  __atomic_compare_exchange_n(ptr, &prev_value, new_value, 
+  __atomic_compare_exchange_n(ptr, &prev_value, new_value,
           0, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
   return prev_value;
 }
@@ -163,7 +153,7 @@
                                        Atomic64 old_value,
                                        Atomic64 new_value) {
   Atomic64 prev_value = old_value;
-  __atomic_compare_exchange_n(ptr, &prev_value, new_value, 
+  __atomic_compare_exchange_n(ptr, &prev_value, new_value,
           0, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
   return prev_value;
 }
@@ -172,11 +162,6 @@
   *ptr = value;
 }
 
-inline void Acquire_Store(volatile Atomic64* ptr, Atomic64 value) {
-  *ptr = value;
-  MemoryBarrier();
-}
-
 inline void Release_Store(volatile Atomic64* ptr, Atomic64 value) {
   MemoryBarrier();
   *ptr = value;
@@ -192,11 +177,6 @@
   return value;
 }
 
-inline Atomic64 Release_Load(volatile const Atomic64* ptr) {
-  MemoryBarrier();
-  return *ptr;
-}
-
 }  // namespace base::subtle
 }  // namespace base
 

diff --git a/src/base/atomicops-internals-linuxppc.h b/src/base/atomicops-internals-linuxppc.h
index b52fdf0..5c4d03c 100644
--- a/src/base/atomicops-internals-linuxppc.h
+++ b/src/base/atomicops-internals-linuxppc.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2008, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -359,14 +359,6 @@
   *ptr = value;
 }
 
-inline void Acquire_Store(volatile Atomic32 *ptr, Atomic32 value) {
-  *ptr = value;
-  // This can't be _lwsync(); we need to order the immediately
-  // preceding stores against any load that may follow, but lwsync
-  // doesn't guarantee that.
-  _sync();
-}
-
 inline void Release_Store(volatile Atomic32 *ptr, Atomic32 value) {
   _lwsync();
   *ptr = value;
@@ -382,14 +374,6 @@
   return value;
 }
 
-inline Atomic32 Release_Load(volatile const Atomic32 *ptr) {
-  // This can't be _lwsync(); we need to order the immediately
-  // preceding stores against any load that may follow, but lwsync
-  // doesn't guarantee that.
-  _sync();
-  return *ptr;
-}
-
 #ifdef __PPC64__
 
 // 64-bit Versions.
@@ -398,14 +382,6 @@
   *ptr = value;
 }
 
-inline void Acquire_Store(volatile Atomic64 *ptr, Atomic64 value) {
-  *ptr = value;
-  // This can't be _lwsync(); we need to order the immediately
-  // preceding stores against any load that may follow, but lwsync
-  // doesn't guarantee that.
-  _sync();
-}
-
 inline void Release_Store(volatile Atomic64 *ptr, Atomic64 value) {
   _lwsync();
   *ptr = value;
@@ -421,14 +397,6 @@
   return value;
 }
 
-inline Atomic64 Release_Load(volatile const Atomic64 *ptr) {
-  // This can't be _lwsync(); we need to order the immediately
-  // preceding stores against any load that may follow, but lwsync
-  // doesn't guarantee that.
-  _sync();
-  return *ptr;
-}
-
 #endif
 
 }   // namespace base::subtle

diff --git a/src/base/atomicops-internals-macosx.h b/src/base/atomicops-internals-macosx.h
index b5130d4..9a0c00a 100644
--- a/src/base/atomicops-internals-macosx.h
+++ b/src/base/atomicops-internals-macosx.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2006, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -172,11 +172,6 @@
   *ptr = value;
 }
 
-inline void Acquire_Store(volatile Atomic32 *ptr, Atomic32 value) {
-  *ptr = value;
-  MemoryBarrier();
-}
-
 inline void Release_Store(volatile Atomic32 *ptr, Atomic32 value) {
   MemoryBarrier();
   *ptr = value;
@@ -192,11 +187,6 @@
   return value;
 }
 
-inline Atomic32 Release_Load(volatile const Atomic32 *ptr) {
-  MemoryBarrier();
-  return *ptr;
-}
-
 // 64-bit version
 
 inline Atomic64 NoBarrier_CompareAndSwap(volatile Atomic64 *ptr,
@@ -268,11 +258,6 @@
   *ptr = value;
 }
 
-inline void Acquire_Store(volatile Atomic64 *ptr, Atomic64 value) {
-  *ptr = value;
-  MemoryBarrier();
-}
-
 inline void Release_Store(volatile Atomic64 *ptr, Atomic64 value) {
   MemoryBarrier();
   *ptr = value;
@@ -288,11 +273,6 @@
   return value;
 }
 
-inline Atomic64 Release_Load(volatile const Atomic64 *ptr) {
-  MemoryBarrier();
-  return *ptr;
-}
-
 #else
 
 // 64-bit implementation on 32-bit platform
@@ -342,11 +322,6 @@
 #endif
 
 
-inline void Acquire_Store(volatile Atomic64 *ptr, Atomic64 value) {
-  NoBarrier_Store(ptr, value);
-  MemoryBarrier();
-}
-
 inline void Release_Store(volatile Atomic64 *ptr, Atomic64 value) {
   MemoryBarrier();
   NoBarrier_Store(ptr, value);
@@ -358,10 +333,6 @@
   return value;
 }
 
-inline Atomic64 Release_Load(volatile const Atomic64 *ptr) {
-  MemoryBarrier();
-  return NoBarrier_Load(ptr);
-}
 #endif  // __LP64__
 
 }   // namespace base::subtle

diff --git a/src/base/atomicops-internals-mips.h b/src/base/atomicops-internals-mips.h
index 4bfd7f6..58e0f14 100644
--- a/src/base/atomicops-internals-mips.h
+++ b/src/base/atomicops-internals-mips.h

@@ -161,12 +161,6 @@
     return NoBarrier_AtomicExchange(ptr, new_value);
 }
 
-inline void Acquire_Store(volatile Atomic32* ptr, Atomic32 value)
-{
-    *ptr = value;
-    MemoryBarrier();
-}
-
 inline void Release_Store(volatile Atomic32* ptr, Atomic32 value)
 {
     MemoryBarrier();
@@ -185,12 +179,6 @@
     return value;
 }
 
-inline Atomic32 Release_Load(volatile const Atomic32* ptr)
-{
-    MemoryBarrier();
-    return *ptr;
-}
-
 #if (_MIPS_ISA == _MIPS_ISA_MIPS64) || (_MIPS_SIM == _MIPS_SIM_ABI64)
 
 typedef int64_t Atomic64;
@@ -285,12 +273,6 @@
     return NoBarrier_AtomicExchange(ptr, new_value);
 }
 
-inline void Acquire_Store(volatile Atomic64* ptr, Atomic64 value)
-{
-    *ptr = value;
-    MemoryBarrier();
-}
-
 inline void Release_Store(volatile Atomic64* ptr, Atomic64 value)
 {
     MemoryBarrier();
@@ -309,12 +291,6 @@
     return value;
 }
 
-inline Atomic64 Release_Load(volatile const Atomic64* ptr)
-{
-    MemoryBarrier();
-    return *ptr;
-}
-
 #endif
 
 }   // namespace base::subtle

diff --git a/src/base/atomicops-internals-windows.h b/src/base/atomicops-internals-windows.h
index 93ced87..f7c2907 100644
--- a/src/base/atomicops-internals-windows.h
+++ b/src/base/atomicops-internals-windows.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2006, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -188,10 +188,6 @@
   *ptr = value;
 }
 
-inline void Acquire_Store(volatile Atomic32* ptr, Atomic32 value) {
-  Acquire_AtomicExchange(ptr, value);
-}
-
 inline void Release_Store(volatile Atomic32* ptr, Atomic32 value) {
   *ptr = value; // works w/o barrier for current Intel chips as of June 2005
   // See comments in Atomic64 version of Release_Store() below.
@@ -206,11 +202,6 @@
   return value;
 }
 
-inline Atomic32 Release_Load(volatile const Atomic32* ptr) {
-  MemoryBarrier();
-  return *ptr;
-}
-
 // 64-bit operations
 
 #if defined(_WIN64) || defined(__MINGW64__)
@@ -298,11 +289,6 @@
   *ptr = value;
 }
 
-inline void Acquire_Store(volatile Atomic64* ptr, Atomic64 value) {
-  NoBarrier_AtomicExchange(ptr, value);
-              // acts as a barrier in this implementation
-}
-
 inline void Release_Store(volatile Atomic64* ptr, Atomic64 value) {
   *ptr = value; // works w/o barrier for current Intel chips as of June 2005
 
@@ -323,11 +309,6 @@
   return value;
 }
 
-inline Atomic64 Release_Load(volatile const Atomic64* ptr) {
-  MemoryBarrier();
-  return *ptr;
-}
-
 #else  // defined(_WIN64) || defined(__MINGW64__)
 
 // 64-bit low-level operations on 32-bit platform
@@ -393,11 +374,6 @@
   	}
 }
 
-inline void Acquire_Store(volatile Atomic64* ptr, Atomic64 value) {
-  NoBarrier_AtomicExchange(ptr, value);
-              // acts as a barrier in this implementation
-}
-
 inline void Release_Store(volatile Atomic64* ptr, Atomic64 value) {
   NoBarrier_Store(ptr, value);
 }
@@ -419,11 +395,6 @@
   return value;
 }
 
-inline Atomic64 Release_Load(volatile const Atomic64* ptr) {
-  MemoryBarrier();
-  return NoBarrier_Load(ptr);
-}
-
 #endif  // defined(_WIN64) || defined(__MINGW64__)
 
 

diff --git a/src/base/atomicops-internals-x86.cc b/src/base/atomicops-internals-x86.cc
index c3391e7..20073c2 100644
--- a/src/base/atomicops-internals-x86.cc
+++ b/src/base/atomicops-internals-x86.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2007, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/base/atomicops-internals-x86.h b/src/base/atomicops-internals-x86.h
index e441ac7..94c7aac 100644
--- a/src/base/atomicops-internals-x86.h
+++ b/src/base/atomicops-internals-x86.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2006, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -128,11 +128,6 @@
   __asm__ __volatile__("mfence" : : : "memory");
 }
 
-inline void Acquire_Store(volatile Atomic32* ptr, Atomic32 value) {
-  *ptr = value;
-  MemoryBarrier();
-}
-
 #else
 
 inline void MemoryBarrier() {
@@ -144,14 +139,6 @@
   }
 }
 
-inline void Acquire_Store(volatile Atomic32* ptr, Atomic32 value) {
-  if (AtomicOps_Internalx86CPUFeatures.has_sse2) {
-    *ptr = value;
-    __asm__ __volatile__("mfence" : : : "memory");
-  } else {
-    Acquire_AtomicExchange(ptr, value);
-  }
-}
 #endif
 
 inline void Release_Store(volatile Atomic32* ptr, Atomic32 value) {
@@ -171,11 +158,6 @@
   return value;
 }
 
-inline Atomic32 Release_Load(volatile const Atomic32* ptr) {
-  MemoryBarrier();
-  return *ptr;
-}
-
 #if defined(__x86_64__)
 
 // 64-bit low-level operations on 64-bit platform.
@@ -216,11 +198,6 @@
   *ptr = value;
 }
 
-inline void Acquire_Store(volatile Atomic64* ptr, Atomic64 value) {
-  *ptr = value;
-  MemoryBarrier();
-}
-
 inline void Release_Store(volatile Atomic64* ptr, Atomic64 value) {
   ATOMICOPS_COMPILER_BARRIER();
 
@@ -254,11 +231,6 @@
   return value;
 }
 
-inline Atomic64 Release_Load(volatile const Atomic64* ptr) {
-  MemoryBarrier();
-  return *ptr;
-}
-
 #else // defined(__x86_64__)
 
 // 64-bit low-level operations on 32-bit platform.
@@ -333,11 +305,6 @@
                          "mm2", "mm3", "mm4", "mm5", "mm6", "mm7");
 }
 
-inline void Acquire_Store(volatile Atomic64* ptr, Atomic64 value) {
-  NoBarrier_Store(ptr, value);
-  MemoryBarrier();
-}
-
 inline void Release_Store(volatile Atomic64* ptr, Atomic64 value) {
   ATOMICOPS_COMPILER_BARRIER();
   NoBarrier_Store(ptr, value);
@@ -363,11 +330,6 @@
   return value;
 }
 
-inline Atomic64 Release_Load(volatile const Atomic64* ptr) {
-  MemoryBarrier();
-  return NoBarrier_Load(ptr);
-}
-
 #endif // defined(__x86_64__)
 
 inline Atomic64 Acquire_CompareAndSwap(volatile Atomic64* ptr,

diff --git a/src/base/atomicops.h b/src/base/atomicops.h
index be038f3..f1daf3b 100644
--- a/src/base/atomicops.h
+++ b/src/base/atomicops.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2006, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -102,8 +102,14 @@
                      + __GNUC_MINOR__ * 100           \
                      + __GNUC_PATCHLEVEL__)
 
+#define CLANG_VERSION (__clang_major__ * 10000         \
+                       + __clang_minor__ * 100         \
+                       + __clang_patchlevel__)
+
 #if defined(TCMALLOC_PREFER_GCC_ATOMICS) && defined(__GNUC__) && GCC_VERSION >= 40700
 #include "base/atomicops-internals-gcc.h"
+#elif defined(TCMALLOC_PREFER_GCC_ATOMICS) && defined(__clang__) && CLANG_VERSION >= 30400
+#include "base/atomicops-internals-gcc.h"
 #elif defined(__MACH__) && defined(__APPLE__)
 #include "base/atomicops-internals-macosx.h"
 #elif defined(__GNUC__) && defined(ARMV6)
@@ -120,6 +126,8 @@
 #include "base/atomicops-internals-mips.h"
 #elif defined(__GNUC__) && GCC_VERSION >= 40700
 #include "base/atomicops-internals-gcc.h"
+#elif defined(__clang__) && CLANG_VERSION >= 30400
+#include "base/atomicops-internals-gcc.h"
 #else
 #error You need to implement atomic operations for this architecture
 #endif
@@ -197,11 +205,6 @@
       reinterpret_cast<volatile AtomicWordCastType*>(ptr), value);
 }
 
-inline void Acquire_Store(volatile AtomicWord* ptr, AtomicWord value) {
-  return base::subtle::Acquire_Store(
-      reinterpret_cast<volatile AtomicWordCastType*>(ptr), value);
-}
-
 inline void Release_Store(volatile AtomicWord* ptr, AtomicWord value) {
   return base::subtle::Release_Store(
       reinterpret_cast<volatile AtomicWordCastType*>(ptr), value);
@@ -217,11 +220,6 @@
       reinterpret_cast<volatile const AtomicWordCastType*>(ptr));
 }
 
-inline AtomicWord Release_Load(volatile const AtomicWord* ptr) {
-  return base::subtle::Release_Load(
-      reinterpret_cast<volatile const AtomicWordCastType*>(ptr));
-}
-
 }  // namespace base::subtle
 }  // namespace base
 #endif  // AtomicWordCastType
@@ -260,11 +258,9 @@
                                 Atomic32 old_value,
                                 Atomic32 new_value);
 void NoBarrier_Store(volatile Atomic32* ptr, Atomic32 value);
-void Acquire_Store(volatile Atomic32* ptr, Atomic32 value);
 void Release_Store(volatile Atomic32* ptr, Atomic32 value);
 Atomic32 NoBarrier_Load(volatile const Atomic32* ptr);
 Atomic32 Acquire_Load(volatile const Atomic32* ptr);
-Atomic32 Release_Load(volatile const Atomic32* ptr);
 
 // Corresponding operations on Atomic64
 Atomic64 NoBarrier_CompareAndSwap(volatile Atomic64* ptr,
@@ -281,11 +277,9 @@
                                 Atomic64 old_value,
                                 Atomic64 new_value);
 void NoBarrier_Store(volatile Atomic64* ptr, Atomic64 value);
-void Acquire_Store(volatile Atomic64* ptr, Atomic64 value);
 void Release_Store(volatile Atomic64* ptr, Atomic64 value);
 Atomic64 NoBarrier_Load(volatile const Atomic64* ptr);
 Atomic64 Acquire_Load(volatile const Atomic64* ptr);
-Atomic64 Release_Load(volatile const Atomic64* ptr);
 }  // namespace base::subtle
 }  // namespace base
 
@@ -313,10 +307,6 @@
   return base::subtle::Release_CompareAndSwap(ptr, old_value, new_value);
 }
 
-inline void Acquire_Store(volatile AtomicWord* ptr, AtomicWord value) {
-  return base::subtle::Acquire_Store(ptr, value);
-}
-
 inline void Release_Store(volatile AtomicWord* ptr, AtomicWord value) {
   return base::subtle::Release_Store(ptr, value);
 }
@@ -324,10 +314,6 @@
 inline AtomicWord Acquire_Load(volatile const AtomicWord* ptr) {
   return base::subtle::Acquire_Load(ptr);
 }
-
-inline AtomicWord Release_Load(volatile const AtomicWord* ptr) {
-  return base::subtle::Release_Load(ptr);
-}
 #endif  // AtomicWordCastType
 
 // 32-bit Acquire/Release operations to be deprecated.
@@ -342,18 +328,12 @@
                                        Atomic32 new_value) {
   return base::subtle::Release_CompareAndSwap(ptr, old_value, new_value);
 }
-inline void Acquire_Store(volatile Atomic32* ptr, Atomic32 value) {
-  base::subtle::Acquire_Store(ptr, value);
-}
 inline void Release_Store(volatile Atomic32* ptr, Atomic32 value) {
   return base::subtle::Release_Store(ptr, value);
 }
 inline Atomic32 Acquire_Load(volatile const Atomic32* ptr) {
   return base::subtle::Acquire_Load(ptr);
 }
-inline Atomic32 Release_Load(volatile const Atomic32* ptr) {
-  return base::subtle::Release_Load(ptr);
-}
 
 #ifdef BASE_HAS_ATOMIC64
 
@@ -369,10 +349,6 @@
     base::subtle::Atomic64 old_value, base::subtle::Atomic64 new_value) {
   return base::subtle::Release_CompareAndSwap(ptr, old_value, new_value);
 }
-inline void Acquire_Store(
-    volatile base::subtle::Atomic64* ptr, base::subtle::Atomic64 value) {
-  base::subtle::Acquire_Store(ptr, value);
-}
 inline void Release_Store(
     volatile base::subtle::Atomic64* ptr, base::subtle::Atomic64 value) {
   return base::subtle::Release_Store(ptr, value);
@@ -381,10 +357,6 @@
     volatile const base::subtle::Atomic64* ptr) {
   return base::subtle::Acquire_Load(ptr);
 }
-inline base::subtle::Atomic64 Release_Load(
-    volatile const base::subtle::Atomic64* ptr) {
-  return base::subtle::Release_Load(ptr);
-}
 
 #endif  // BASE_HAS_ATOMIC64
 

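Note on the atomicops hunks above: they remove Acquire_Store and Release_Load from every platform header and from atomicops.h, and keep Release_Store and Acquire_Load. The surviving pair is the usual publish/consume pattern. Below is a minimal sketch of that pairing, assuming C++11 std::atomic stands in for the per-platform barriers these headers actually use. The names payload, ready, Publish and TryConsume are hypothetical and are not gperftools symbols.

```cpp
#include <atomic>

// Hypothetical names for illustration only; not part of gperftools.
static int payload = 0;
static std::atomic<int> ready{0};

// Writer side: fill in the data, then release-store the flag.
// This is the role Release_Store plays in the headers above.
inline void Publish(int value) {
  payload = value;
  ready.store(1, std::memory_order_release);
}

// Reader side: acquire-load the flag (the role of Acquire_Load).
// The payload is read only if the flag was observed as set.
inline bool TryConsume(int* out) {
  if (ready.load(std::memory_order_acquire) == 0) {
    return false;
  }
  *out = payload;
  return true;
}
```

std::atomic has no direct counterpart of the two removed operations: store() does not accept memory_order_acquire and load() does not accept memory_order_release.
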
diff --git a/src/base/basictypes.h b/src/base/basictypes.h
index 4779611..ea87a6d 100644
--- a/src/base/basictypes.h
+++ b/src/base/basictypes.h

@@ -83,7 +83,7 @@
 const  int8  kint8min   = (   (  int8) 0x80);
 const  int16 kint16min  = (   ( int16) 0x8000);
 const  int32 kint32min  = (   ( int32) 0x80000000);
-const  int64 kint64min =  ( ((( int64) kint32min) << 32) | 0 );
+const  int64 kint64min =  ( (((uint64) kint32min) << 32) | 0 );
 
 // Define the "portable" printf and scanf macros, if they're not
 // already there (via the inttypes.h we #included above, hopefully).
@@ -117,6 +117,14 @@
 #define PRINTABLE_PTHREAD(pthreadt) pthreadt
 #endif
 
+#if defined(__GNUC__)
+#define PREDICT_TRUE(x) __builtin_expect(!!(x), 1)
+#define PREDICT_FALSE(x) __builtin_expect(!!(x), 0)
+#else
+#define PREDICT_TRUE(x) (x)
+#define PREDICT_FALSE(x) (x)
+#endif
+
 // A macro to disallow the evil copy constructor and operator= functions
 // This should be used in the private: declarations for a class
 #define DISALLOW_EVIL_CONSTRUCTORS(TypeName)    \
@@ -192,6 +200,12 @@
 # define ATTRIBUTE_UNUSED
 #endif
 
+#if defined(HAVE___ATTRIBUTE__) && defined(HAVE_TLS)
+#define ATTR_INITIAL_EXEC __attribute__ ((tls_model ("initial-exec")))
+#else
+#define ATTR_INITIAL_EXEC
+#endif
+
 #define COMPILE_ASSERT(expr, msg)                               \
   typedef CompileAssert<(bool(expr))> msg[bool(expr) ? 1 : -1] ATTRIBUTE_UNUSED
 
@@ -223,6 +237,16 @@
   return dest;
 }
 
+// bit_store<Dest,Source> implements the equivalent of
+// "dest = *reinterpret_cast<Dest*>(&source)".
+//
+// This prevents undefined behavior when the dest pointer is unaligned.
+template <class Dest, class Source>
+inline void bit_store(Dest *dest, const Source *source) {
+  COMPILE_ASSERT(sizeof(Dest) == sizeof(Source), bitcasting_unequal_sizes);
+  memcpy(dest, source, sizeof(Dest));
+}
+
 #ifdef HAVE___ATTRIBUTE__
 # define ATTRIBUTE_WEAK      __attribute__((weak))
 # define ATTRIBUTE_NOINLINE  __attribute__((noinline))
@@ -257,7 +281,7 @@
 //    ATTRIBUTE_SECTION are guaranteed to be between START and STOP.
 
 #if defined(HAVE___ATTRIBUTE__) && defined(__ELF__)
-# define ATTRIBUTE_SECTION(name) __attribute__ ((section (#name)))
+# define ATTRIBUTE_SECTION(name) __attribute__ ((section (#name))) __attribute__((noinline))
 
   // Weak section declaration to be used as a global declaration
   // for ATTRIBUTE_SECTION_START|STOP(name) to compile and link
@@ -357,13 +381,45 @@
 # elif (defined(__aarch64__))
 #   define CACHELINE_ALIGNED __attribute__((aligned(64)))
     // implementation specific, Cortex-A53 and 57 should have 64 bytes
+# elif (defined(__s390__))
+#   define CACHELINE_ALIGNED __attribute__((aligned(256)))
+# elif (defined(__riscv) && __riscv_xlen == 64)
+#   define CACHELINE_ALIGNED __attribute__((aligned(64)))
+# elif (defined(__e2k__))
+#   define CACHELINE_ALIGNED __attribute__((aligned(64)))
 # else
 #   error Could not determine cache line length - unknown architecture
 # endif
 #else
 # define CACHELINE_ALIGNED
-#endif  // defined(HAVE___ATTRIBUTE__) && (__i386__ || __x86_64__)
+#endif  // defined(HAVE___ATTRIBUTE__)
 
+#if defined(HAVE___ATTRIBUTE__ALIGNED_FN)
+#  define CACHELINE_ALIGNED_FN CACHELINE_ALIGNED
+#else
+#  define CACHELINE_ALIGNED_FN
+#endif
+
+// Structure for discovering alignment
+union MemoryAligner {
+  void*  p;
+  double d;
+  size_t s;
+} CACHELINE_ALIGNED;
+
+#if defined(HAVE___ATTRIBUTE__) && defined(__ELF__)
+#define ATTRIBUTE_HIDDEN __attribute__((visibility("hidden")))
+#else
+#define ATTRIBUTE_HIDDEN
+#endif
+
+#if defined(__GNUC__)
+#define ATTRIBUTE_ALWAYS_INLINE __attribute__((always_inline))
+#elif defined(_MSC_VER)
+#define ATTRIBUTE_ALWAYS_INLINE __forceinline
+#else
+#define ATTRIBUTE_ALWAYS_INLINE
+#endif
 
 // The following enum should be used only as a constructor argument to indicate
 // that the variable has static storage class, and that the constructor should

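Note on the bit_store helper added to basictypes.h above: it copies the object representation with memcpy instead of dereferencing a reinterpret_cast pointer. Below is a minimal self-contained sketch of the same shape, assuming static_assert in place of COMPILE_ASSERT and assuming float is a 32-bit IEEE-754 type. FloatBits is a hypothetical helper and is not in the diff.

```cpp
#include <cstdint>
#include <cstring>

// Same shape as the bit_store in the hunk above; static_assert replaces
// COMPILE_ASSERT so that the sketch builds on its own.
template <class Dest, class Source>
inline void bit_store(Dest* dest, const Source* source) {
  static_assert(sizeof(Dest) == sizeof(Source), "bitcasting unequal sizes");
  std::memcpy(dest, source, sizeof(Dest));
}

// Hypothetical helper: return the bit pattern of a float as an integer.
inline uint32_t FloatBits(float f) {
  static_assert(sizeof(float) == sizeof(uint32_t), "assumes 32-bit float");
  uint32_t bits = 0;
  bit_store(&bits, &f);  // byte-wise copy, no typed load or store
  return bits;
}
```

Because memcpy works on bytes, the copy itself places no alignment requirement on the destination, which is the property the comment in the hunk relies on.
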
diff --git a/src/base/commandlineflags.h b/src/base/commandlineflags.h
index f54776a..038a94a 100644
--- a/src/base/commandlineflags.h
+++ b/src/base/commandlineflags.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2005, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -119,7 +119,16 @@
       if (!value) {
         return def;
       }
-      return memchr("tTyY1\0", value[0], 6) != NULL;
+      switch (value[0]) {
+      case 't':
+      case 'T':
+      case 'y':
+      case 'Y':
+      case '1':
+      case '\0':
+        return true;
+      }
+      return false;
     }
 
     inline int StringToInt(const char *value, int def) {

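Note on the commandlineflags.h hunk above: the memchr lookup is replaced by an explicit switch on the first character. Below is a minimal sketch of the resulting behavior, using only the logic visible in the hunk. ParseBoolFlag is a hypothetical name, since the enclosing function's signature is not shown in this diff.

```cpp
// Hypothetical wrapper around the logic visible in the hunk above.
inline bool ParseBoolFlag(const char* value, bool def) {
  if (!value) {
    return def;  // unset: fall back to the default
  }
  switch (value[0]) {
    case 't':
    case 'T':
    case 'y':
    case 'Y':
    case '1':
    case '\0':  // an empty string also counts as true
      return true;
  }
  return false;  // any other first character
}
```

Only the first character is inspected, so "yes", "Y" and "1" are true, "0" and "no" are false, and a set-but-empty value is true regardless of the default.
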
diff --git a/src/base/cycleclock.h b/src/base/cycleclock.h
deleted file mode 100644
index dc2d569..0000000
--- a/src/base/cycleclock.h
+++ /dev/null

@@ -1,173 +0,0 @@
-// -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
-// Copyright (c) 2004, Google Inc.
-// All rights reserved.
-// 
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are
-// met:
-// 
-//     * Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//     * Redistributions in binary form must reproduce the above
-// copyright notice, this list of conditions and the following disclaimer
-// in the documentation and/or other materials provided with the
-// distribution.
-//     * Neither the name of Google Inc. nor the names of its
-// contributors may be used to endorse or promote products derived from
-// this software without specific prior written permission.
-// 
-// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
-// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
-// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
-// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
-// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
-// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
-// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
-// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
-// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
-// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-// ----------------------------------------------------------------------
-// CycleClock
-//    A CycleClock tells you the current time in Cycles.  The "time"
-//    is actually time since power-on.  This is like time() but doesn't
-//    involve a system call and is much more precise.
-//
-// NOTE: Not all cpu/platform/kernel combinations guarantee that this
-// clock increments at a constant rate or is synchronized across all logical
-// cpus in a system.
-//
-// Also, in some out of order CPU implementations, the CycleClock is not 
-// serializing. So if you're trying to count at cycles granularity, your
-// data might be inaccurate due to out of order instruction execution.
-// ----------------------------------------------------------------------
-
-#ifndef GOOGLE_BASE_CYCLECLOCK_H_
-#define GOOGLE_BASE_CYCLECLOCK_H_
-
-#include "base/basictypes.h"   // make sure we get the def for int64
-#include "base/arm_instruction_set_select.h"
-// base/sysinfo.h is really big and we don't want to include it unless
-// it is necessary.
-#if defined(__arm__) || defined(__mips__) || defined(__aarch64__)
-# include "base/sysinfo.h"
-#endif
-#if defined(__MACH__) && defined(__APPLE__)
-# include <mach/mach_time.h>
-#endif
-// For MSVC, we want to use '_asm rdtsc' when possible (since it works
-// with even ancient MSVC compilers), and when not possible the
-// __rdtsc intrinsic, declared in <intrin.h>.  Unfortunately, in some
-// environments, <windows.h> and <intrin.h> have conflicting
-// declarations of some other intrinsics, breaking compilation.
-// Therefore, we simply declare __rdtsc ourselves. See also
-// http://connect.microsoft.com/VisualStudio/feedback/details/262047
-#if defined(_MSC_VER) && !defined(_M_IX86)
-extern "C" uint64 __rdtsc();
-#pragma intrinsic(__rdtsc)
-#endif
-#if defined(ARMV3) || defined(__mips__) || defined(__aarch64__)
-#include <sys/time.h>
-#endif
-
-// NOTE: only i386 and x86_64 have been well tested.
-// PPC, sparc, alpha, and ia64 are based on
-//    http://peter.kuscsik.com/wordpress/?p=14
-// with modifications by m3b.  See also
-//    https://setisvn.ssl.berkeley.edu/svn/lib/fftw-3.0.1/kernel/cycle.h
-struct CycleClock {
-  // This should return the number of cycles since power-on.  Thread-safe.
-  static inline int64 Now() {
-#if defined(__MACH__) && defined(__APPLE__)
-    // this goes at the top because we need ALL Macs, regardless of
-    // architecture, to return the number of "mach time units" that
-    // have passed since startup.  See sysinfo.cc where
-    // InitializeSystemInfo() sets the supposed cpu clock frequency of
-    // macs to the number of mach time units per second, not actual
-    // CPU clock frequency (which can change in the face of CPU
-    // frequency scaling).  Also note that when the Mac sleeps, this
-    // counter pauses; it does not continue counting, nor does it
-    // reset to zero.
-    return mach_absolute_time();
-#elif defined(__i386__)
-    int64 ret;
-    __asm__ volatile ("rdtsc" : "=A" (ret) );
-    return ret;
-#elif defined(__x86_64__) || defined(__amd64__)
-    uint64 low, high;
-    __asm__ volatile ("rdtsc" : "=a" (low), "=d" (high));
-    return (high << 32) | low;
-#elif defined(__powerpc64__) || defined(__ppc64__)
-    uint64 tb;
-    __asm__ volatile (\
-      "mfspr %0, 268"
-      : "=r" (tb));
-    return tb;
-#elif defined(__powerpc__) || defined(__ppc__)
-    // This returns a time-base, which is not always precisely a cycle-count.
-    uint32 tbu, tbl, tmp;
-    __asm__ volatile (\
-      "0:\n"
-      "mftbu %0\n"
-      "mftbl %1\n"
-      "mftbu %2\n"
-      "cmpw %0, %2\n"
-      "bne- 0b"
-      : "=r" (tbu), "=r" (tbl), "=r" (tmp));
-    return (((uint64) tbu << 32) | tbl);
-#elif defined(__sparc__)
-    int64 tick;
-    asm(".byte 0x83, 0x41, 0x00, 0x00");
-    asm("mov   %%g1, %0" : "=r" (tick));
-    return tick;
-#elif defined(__ia64__)
-    int64 itc;
-    asm("mov %0 = ar.itc" : "=r" (itc));
-    return itc;
-#elif defined(_MSC_VER) && defined(_M_IX86)
-    // Older MSVC compilers (like 7.x) don't seem to support the
-    // __rdtsc intrinsic properly, so I prefer to use _asm instead
-    // when I know it will work.  Otherwise, I'll use __rdtsc and hope
-    // the code is being compiled with a non-ancient compiler.
-    _asm rdtsc
-#elif defined(_MSC_VER)
-    return __rdtsc();
-#elif defined(ARMV3) || defined(__aarch64__)
-#if defined(ARMV7)  // V7 is the earliest arch that has a standard cyclecount
-    uint32 pmccntr;
-    uint32 pmuseren;
-    uint32 pmcntenset;
-    // Read the user mode perf monitor counter access permissions.
-    asm volatile ("mrc p15, 0, %0, c9, c14, 0" : "=r" (pmuseren));
-    if (pmuseren & 1) {  // Allows reading perfmon counters for user mode code.
-      asm volatile ("mrc p15, 0, %0, c9, c12, 1" : "=r" (pmcntenset));
-      if (pmcntenset & 0x80000000ul) {  // Is it counting?
-        asm volatile ("mrc p15, 0, %0, c9, c13, 0" : "=r" (pmccntr));
-        // The counter is set up to count every 64th cycle
-        return static_cast<int64>(pmccntr) * 64;  // Should optimize to << 6
-      }
-    }
-#endif
-    struct timeval tv;
-    gettimeofday(&tv, NULL);
-    return static_cast<int64>((tv.tv_sec + tv.tv_usec * 0.000001)
-                              * CyclesPerSecond());
-#elif defined(__mips__)
-    // mips apparently only allows rdtsc for superusers, so we fall
-    // back to gettimeofday.  It's possible clock_gettime would be better.
-    struct timeval tv;
-    gettimeofday(&tv, NULL);
-    return static_cast<int64>((tv.tv_sec + tv.tv_usec * 0.000001)
-                              * CyclesPerSecond());
-#else
-// The soft failover to a generic implementation is automatic only for ARM.
-// For other platforms the developer is expected to make an attempt to create
-// a fast implementation and use generic version if nothing better is available.
-#error You need to define CycleTimer for your O/S and CPU
-#endif
-  }
-};
-
-
-#endif  // GOOGLE_BASE_CYCLECLOCK_H_

diff --git a/src/base/dynamic_annotations.c b/src/base/dynamic_annotations.c
index 87bd2ec..9b966a7 100644
--- a/src/base/dynamic_annotations.c
+++ b/src/base/dynamic_annotations.c

@@ -1,3 +1,4 @@
+// -*- Mode: c; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2008-2009, Google Inc.
  * All rights reserved.
  *
@@ -42,101 +43,6 @@
 #include "base/dynamic_annotations.h"
 #include "getenv_safe.h" // for TCMallocGetenvSafe
 
-#ifdef __GNUC__
-/* valgrind.h uses gcc extensions so it won't build with other compilers */
-# ifdef HAVE_VALGRIND_H    /* prefer the user's copy if they have it */
-#  include <valgrind.h>
-# else                     /* otherwise just use the copy that we have */
-#  include "third_party/valgrind.h"
-# endif
-#endif
-
-/* Compiler-based ThreadSanitizer defines
-   DYNAMIC_ANNOTATIONS_EXTERNAL_IMPL = 1
-   and provides its own definitions of the functions. */
-
-#ifndef DYNAMIC_ANNOTATIONS_EXTERNAL_IMPL
-# define DYNAMIC_ANNOTATIONS_EXTERNAL_IMPL 0
-#endif
-
-/* Each function is empty and called (via a macro) only in debug mode.
-   The arguments are captured by dynamic tools at runtime. */
-
-#if DYNAMIC_ANNOTATIONS_ENABLED == 1 \
-    && DYNAMIC_ANNOTATIONS_EXTERNAL_IMPL == 0
-
-void AnnotateRWLockCreate(const char *file, int line,
-                          const volatile void *lock){}
-void AnnotateRWLockDestroy(const char *file, int line,
-                           const volatile void *lock){}
-void AnnotateRWLockAcquired(const char *file, int line,
-                            const volatile void *lock, long is_w){}
-void AnnotateRWLockReleased(const char *file, int line,
-                            const volatile void *lock, long is_w){}
-void AnnotateBarrierInit(const char *file, int line,
-                         const volatile void *barrier, long count,
-                         long reinitialization_allowed) {}
-void AnnotateBarrierWaitBefore(const char *file, int line,
-                               const volatile void *barrier) {}
-void AnnotateBarrierWaitAfter(const char *file, int line,
-                              const volatile void *barrier) {}
-void AnnotateBarrierDestroy(const char *file, int line,
-                            const volatile void *barrier) {}
-
-void AnnotateCondVarWait(const char *file, int line,
-                         const volatile void *cv,
-                         const volatile void *lock){}
-void AnnotateCondVarSignal(const char *file, int line,
-                           const volatile void *cv){}
-void AnnotateCondVarSignalAll(const char *file, int line,
-                              const volatile void *cv){}
-void AnnotatePublishMemoryRange(const char *file, int line,
-                                const volatile void *address,
-                                long size){}
-void AnnotateUnpublishMemoryRange(const char *file, int line,
-                                  const volatile void *address,
-                                  long size){}
-void AnnotatePCQCreate(const char *file, int line,
-                       const volatile void *pcq){}
-void AnnotatePCQDestroy(const char *file, int line,
-                        const volatile void *pcq){}
-void AnnotatePCQPut(const char *file, int line,
-                    const volatile void *pcq){}
-void AnnotatePCQGet(const char *file, int line,
-                    const volatile void *pcq){}
-void AnnotateNewMemory(const char *file, int line,
-                       const volatile void *mem,
-                       long size){}
-void AnnotateExpectRace(const char *file, int line,
-                        const volatile void *mem,
-                        const char *description){}
-void AnnotateBenignRace(const char *file, int line,
-                        const volatile void *mem,
-                        const char *description){}
-void AnnotateBenignRaceSized(const char *file, int line,
-                             const volatile void *mem,
-                             long size,
-                             const char *description) {}
-void AnnotateMutexIsUsedAsCondVar(const char *file, int line,
-                                  const volatile void *mu){}
-void AnnotateTraceMemory(const char *file, int line,
-                         const volatile void *arg){}
-void AnnotateThreadName(const char *file, int line,
-                        const char *name){}
-void AnnotateIgnoreReadsBegin(const char *file, int line){}
-void AnnotateIgnoreReadsEnd(const char *file, int line){}
-void AnnotateIgnoreWritesBegin(const char *file, int line){}
-void AnnotateIgnoreWritesEnd(const char *file, int line){}
-void AnnotateEnableRaceDetection(const char *file, int line, int enable){}
-void AnnotateNoOp(const char *file, int line,
-                  const volatile void *arg){}
-void AnnotateFlushState(const char *file, int line){}
-
-#endif  /* DYNAMIC_ANNOTATIONS_ENABLED == 1
-    && DYNAMIC_ANNOTATIONS_EXTERNAL_IMPL == 0 */
-
-#if DYNAMIC_ANNOTATIONS_EXTERNAL_IMPL == 0
-
 static int GetRunningOnValgrind(void) {
 #ifdef RUNNING_ON_VALGRIND
   if (RUNNING_ON_VALGRIND) return 1;
@@ -152,28 +58,7 @@
 int RunningOnValgrind(void) {
   static volatile int running_on_valgrind = -1;
   int local_running_on_valgrind = running_on_valgrind;
-  /* C doesn't have thread-safe initialization of statics, and we
-     don't want to depend on pthread_once here, so hack it. */
-  ANNOTATE_BENIGN_RACE(&running_on_valgrind, "safe hack");
   if (local_running_on_valgrind == -1)
     running_on_valgrind = local_running_on_valgrind = GetRunningOnValgrind();
   return local_running_on_valgrind;
 }
-
-#endif  /* DYNAMIC_ANNOTATIONS_EXTERNAL_IMPL == 0 */
-
-/* See the comments in dynamic_annotations.h */
-double ValgrindSlowdown(void) {
-  /* Same initialization hack as in RunningOnValgrind(). */
-  static volatile double slowdown = 0.0;
-  double local_slowdown = slowdown;
-  ANNOTATE_BENIGN_RACE(&slowdown, "safe hack");
-  if (RunningOnValgrind() == 0) {
-    return 1.0;
-  }
-  if (local_slowdown == 0.0) {
-    char *env = getenv("VALGRIND_SLOWDOWN");
-    slowdown = local_slowdown = env ? atof(env) : 50.0;
-  }
-  return local_slowdown;
-}

diff --git a/src/base/dynamic_annotations.h b/src/base/dynamic_annotations.h
index 4669315..af944527 100644
--- a/src/base/dynamic_annotations.h
+++ b/src/base/dynamic_annotations.h

@@ -1,3 +1,4 @@
+/* -*- Mode: c; c-basic-offset: 2; indent-tabs-mode: nil -*- */
 /* Copyright (c) 2008, Google Inc.
  * All rights reserved.
  *
@@ -57,437 +58,9 @@
 #ifndef BASE_DYNAMIC_ANNOTATIONS_H_
 #define BASE_DYNAMIC_ANNOTATIONS_H_
 
-#ifndef DYNAMIC_ANNOTATIONS_ENABLED
-# define DYNAMIC_ANNOTATIONS_ENABLED 0
-#endif
-
-#if DYNAMIC_ANNOTATIONS_ENABLED != 0
-
-  /* -------------------------------------------------------------
-     Annotations useful when implementing condition variables such as CondVar,
-     using conditional critical sections (Await/LockWhen) and when constructing
-     user-defined synchronization mechanisms.
-
-     The annotations ANNOTATE_HAPPENS_BEFORE() and ANNOTATE_HAPPENS_AFTER() can
-     be used to define happens-before arcs in user-defined synchronization
-     mechanisms:  the race detector will infer an arc from the former to the
-     latter when they share the same argument pointer.
-
-     Example 1 (reference counting):
-
-     void Unref() {
-       ANNOTATE_HAPPENS_BEFORE(&refcount_);
-       if (AtomicDecrementByOne(&refcount_) == 0) {
-         ANNOTATE_HAPPENS_AFTER(&refcount_);
-         delete this;
-       }
-     }
-
-     Example 2 (message queue):
-
-     void MyQueue::Put(Type *e) {
-       MutexLock lock(&mu_);
-       ANNOTATE_HAPPENS_BEFORE(e);
-       PutElementIntoMyQueue(e);
-     }
-
-     Type *MyQueue::Get() {
-       MutexLock lock(&mu_);
-       Type *e = GetElementFromMyQueue();
-       ANNOTATE_HAPPENS_AFTER(e);
-       return e;
-     }
-
-     Note: when possible, please use the existing reference counting and message
-     queue implementations instead of inventing new ones. */
-
-  /* Report that wait on the condition variable at address "cv" has succeeded
-     and the lock at address "lock" is held. */
-  #define ANNOTATE_CONDVAR_LOCK_WAIT(cv, lock) \
-    AnnotateCondVarWait(__FILE__, __LINE__, cv, lock)
-
-  /* Report that wait on the condition variable at "cv" has succeeded.  Variant
-     w/o lock. */
-  #define ANNOTATE_CONDVAR_WAIT(cv) \
-    AnnotateCondVarWait(__FILE__, __LINE__, cv, NULL)
-
-  /* Report that we are about to signal on the condition variable at address
-     "cv". */
-  #define ANNOTATE_CONDVAR_SIGNAL(cv) \
-    AnnotateCondVarSignal(__FILE__, __LINE__, cv)
-
-  /* Report that we are about to signal_all on the condition variable at "cv". */
-  #define ANNOTATE_CONDVAR_SIGNAL_ALL(cv) \
-    AnnotateCondVarSignalAll(__FILE__, __LINE__, cv)
-
-  /* Annotations for user-defined synchronization mechanisms. */
-  #define ANNOTATE_HAPPENS_BEFORE(obj) ANNOTATE_CONDVAR_SIGNAL(obj)
-  #define ANNOTATE_HAPPENS_AFTER(obj)  ANNOTATE_CONDVAR_WAIT(obj)
-
-  /* Report that the bytes in the range [pointer, pointer+size) are about
-     to be published safely. The race checker will create a happens-before
-     arc from the call ANNOTATE_PUBLISH_MEMORY_RANGE(pointer, size) to
-     subsequent accesses to this memory.
-     Note: this annotation may not work properly if the race detector uses
-     sampling, i.e. does not observe all memory accesses.
-     */
-  #define ANNOTATE_PUBLISH_MEMORY_RANGE(pointer, size) \
-    AnnotatePublishMemoryRange(__FILE__, __LINE__, pointer, size)
-
-  /* DEPRECATED. Don't use it. */
-  #define ANNOTATE_UNPUBLISH_MEMORY_RANGE(pointer, size) \
-    AnnotateUnpublishMemoryRange(__FILE__, __LINE__, pointer, size)
-
-  /* DEPRECATED. Don't use it. */
-  #define ANNOTATE_SWAP_MEMORY_RANGE(pointer, size)   \
-    do {                                              \
-      ANNOTATE_UNPUBLISH_MEMORY_RANGE(pointer, size); \
-      ANNOTATE_PUBLISH_MEMORY_RANGE(pointer, size);   \
-    } while (0)
-
-  /* Instruct the tool to create a happens-before arc between mu->Unlock() and
-     mu->Lock(). This annotation may slow down the race detector and hide real
-     races. Normally it is used only when it would be difficult to annotate each
-     of the mutex's critical sections individually using the annotations above.
-     This annotation makes sense only for hybrid race detectors. For pure
-     happens-before detectors this is a no-op. For more details see
-     http://code.google.com/p/data-race-test/wiki/PureHappensBeforeVsHybrid . */
-  #define ANNOTATE_PURE_HAPPENS_BEFORE_MUTEX(mu) \
-    AnnotateMutexIsUsedAsCondVar(__FILE__, __LINE__, mu)
-
-  /* Deprecated. Use ANNOTATE_PURE_HAPPENS_BEFORE_MUTEX. */
-  #define ANNOTATE_MUTEX_IS_USED_AS_CONDVAR(mu) \
-    AnnotateMutexIsUsedAsCondVar(__FILE__, __LINE__, mu)
-
-  /* -------------------------------------------------------------
-     Annotations useful when defining memory allocators, or when memory that
-     was protected in one way starts to be protected in another. */
-
-  /* Report that a new memory at "address" of size "size" has been allocated.
-     This might be used when the memory has been retrieved from a free list and
-     is about to be reused, or when a the locking discipline for a variable
-     changes. */
-  #define ANNOTATE_NEW_MEMORY(address, size) \
-    AnnotateNewMemory(__FILE__, __LINE__, address, size)
-
-  /* -------------------------------------------------------------
-     Annotations useful when defining FIFO queues that transfer data between
-     threads. */
-
-  /* Report that the producer-consumer queue (such as ProducerConsumerQueue) at
-     address "pcq" has been created.  The ANNOTATE_PCQ_* annotations
-     should be used only for FIFO queues.  For non-FIFO queues use
-     ANNOTATE_HAPPENS_BEFORE (for put) and ANNOTATE_HAPPENS_AFTER (for get). */
-  #define ANNOTATE_PCQ_CREATE(pcq) \
-    AnnotatePCQCreate(__FILE__, __LINE__, pcq)
-
-  /* Report that the queue at address "pcq" is about to be destroyed. */
-  #define ANNOTATE_PCQ_DESTROY(pcq) \
-    AnnotatePCQDestroy(__FILE__, __LINE__, pcq)
-
-  /* Report that we are about to put an element into a FIFO queue at address
-     "pcq". */
-  #define ANNOTATE_PCQ_PUT(pcq) \
-    AnnotatePCQPut(__FILE__, __LINE__, pcq)
-
-  /* Report that we've just got an element from a FIFO queue at address "pcq". */
-  #define ANNOTATE_PCQ_GET(pcq) \
-    AnnotatePCQGet(__FILE__, __LINE__, pcq)
-
-  /* -------------------------------------------------------------
-     Annotations that suppress errors.  It is usually better to express the
-     program's synchronization using the other annotations, but these can
-     be used when all else fails. */
-
-  /* Report that we may have a benign race at "pointer", with size
-     "sizeof(*(pointer))". "pointer" must be a non-void* pointer.  Insert at the
-     point where "pointer" has been allocated, preferably close to the point
-     where the race happens.  See also ANNOTATE_BENIGN_RACE_STATIC. */
-  #define ANNOTATE_BENIGN_RACE(pointer, description) \
-    AnnotateBenignRaceSized(__FILE__, __LINE__, pointer, \
-                            sizeof(*(pointer)), description)
-
-  /* Same as ANNOTATE_BENIGN_RACE(address, description), but applies to
-     the memory range [address, address+size). */
-  #define ANNOTATE_BENIGN_RACE_SIZED(address, size, description) \
-    AnnotateBenignRaceSized(__FILE__, __LINE__, address, size, description)
-
-  /* Request the analysis tool to ignore all reads in the current thread
-     until ANNOTATE_IGNORE_READS_END is called.
-     Useful to ignore intentional racey reads, while still checking
-     other reads and all writes.
-     See also ANNOTATE_UNPROTECTED_READ. */
-  #define ANNOTATE_IGNORE_READS_BEGIN() \
-    AnnotateIgnoreReadsBegin(__FILE__, __LINE__)
-
-  /* Stop ignoring reads. */
-  #define ANNOTATE_IGNORE_READS_END() \
-    AnnotateIgnoreReadsEnd(__FILE__, __LINE__)
-
-  /* Similar to ANNOTATE_IGNORE_READS_BEGIN, but ignore writes. */
-  #define ANNOTATE_IGNORE_WRITES_BEGIN() \
-    AnnotateIgnoreWritesBegin(__FILE__, __LINE__)
-
-  /* Stop ignoring writes. */
-  #define ANNOTATE_IGNORE_WRITES_END() \
-    AnnotateIgnoreWritesEnd(__FILE__, __LINE__)
-
-  /* Start ignoring all memory accesses (reads and writes). */
-  #define ANNOTATE_IGNORE_READS_AND_WRITES_BEGIN() \
-    do {\
-      ANNOTATE_IGNORE_READS_BEGIN();\
-      ANNOTATE_IGNORE_WRITES_BEGIN();\
-    }while(0)\
-
-  /* Stop ignoring all memory accesses. */
-  #define ANNOTATE_IGNORE_READS_AND_WRITES_END() \
-    do {\
-      ANNOTATE_IGNORE_WRITES_END();\
-      ANNOTATE_IGNORE_READS_END();\
-    }while(0)\
-
-  /* Enable (enable!=0) or disable (enable==0) race detection for all threads.
-     This annotation could be useful if you want to skip expensive race analysis
-     during some period of program execution, e.g. during initialization. */
-  #define ANNOTATE_ENABLE_RACE_DETECTION(enable) \
-    AnnotateEnableRaceDetection(__FILE__, __LINE__, enable)
-
-  /* -------------------------------------------------------------
-     Annotations useful for debugging. */
-
-  /* Request to trace every access to "address". */
-  #define ANNOTATE_TRACE_MEMORY(address) \
-    AnnotateTraceMemory(__FILE__, __LINE__, address)
-
-  /* Report the current thread name to a race detector. */
-  #define ANNOTATE_THREAD_NAME(name) \
-    AnnotateThreadName(__FILE__, __LINE__, name)
-
-  /* -------------------------------------------------------------
-     Annotations useful when implementing locks.  They are not
-     normally needed by modules that merely use locks.
-     The "lock" argument is a pointer to the lock object. */
-
-  /* Report that a lock has been created at address "lock". */
-  #define ANNOTATE_RWLOCK_CREATE(lock) \
-    AnnotateRWLockCreate(__FILE__, __LINE__, lock)
-
-  /* Report that the lock at address "lock" is about to be destroyed. */
-  #define ANNOTATE_RWLOCK_DESTROY(lock) \
-    AnnotateRWLockDestroy(__FILE__, __LINE__, lock)
-
-  /* Report that the lock at address "lock" has been acquired.
-     is_w=1 for writer lock, is_w=0 for reader lock. */
-  #define ANNOTATE_RWLOCK_ACQUIRED(lock, is_w) \
-    AnnotateRWLockAcquired(__FILE__, __LINE__, lock, is_w)
-
-  /* Report that the lock at address "lock" is about to be released. */
-  #define ANNOTATE_RWLOCK_RELEASED(lock, is_w) \
-    AnnotateRWLockReleased(__FILE__, __LINE__, lock, is_w)
-
-  /* -------------------------------------------------------------
-     Annotations useful when implementing barriers.  They are not
-     normally needed by modules that merely use barriers.
-     The "barrier" argument is a pointer to the barrier object. */
-
-  /* Report that the "barrier" has been initialized with initial "count".
-   If 'reinitialization_allowed' is true, initialization is allowed to happen
-   multiple times w/o calling barrier_destroy() */
-  #define ANNOTATE_BARRIER_INIT(barrier, count, reinitialization_allowed) \
-    AnnotateBarrierInit(__FILE__, __LINE__, barrier, count, \
-                        reinitialization_allowed)
-
-  /* Report that we are about to enter barrier_wait("barrier"). */
-  #define ANNOTATE_BARRIER_WAIT_BEFORE(barrier) \
-    AnnotateBarrierWaitBefore(__FILE__, __LINE__, barrier)
-
-  /* Report that we just exited barrier_wait("barrier"). */
-  #define ANNOTATE_BARRIER_WAIT_AFTER(barrier) \
-    AnnotateBarrierWaitAfter(__FILE__, __LINE__, barrier)
-
-  /* Report that the "barrier" has been destroyed. */
-  #define ANNOTATE_BARRIER_DESTROY(barrier) \
-    AnnotateBarrierDestroy(__FILE__, __LINE__, barrier)
-
-  /* -------------------------------------------------------------
-     Annotations useful for testing race detectors. */
-
-  /* Report that we expect a race on the variable at "address".
-     Use only in unit tests for a race detector. */
-  #define ANNOTATE_EXPECT_RACE(address, description) \
-    AnnotateExpectRace(__FILE__, __LINE__, address, description)
-
-  /* A no-op. Insert where you like to test the interceptors. */
-  #define ANNOTATE_NO_OP(arg) \
-    AnnotateNoOp(__FILE__, __LINE__, arg)
-
-  /* Force the race detector to flush its state. The actual effect depends on
-   * the implementation of the detector. */
-  #define ANNOTATE_FLUSH_STATE() \
-    AnnotateFlushState(__FILE__, __LINE__)
-
-
-#else  /* DYNAMIC_ANNOTATIONS_ENABLED == 0 */
-
-  #define ANNOTATE_RWLOCK_CREATE(lock) /* empty */
-  #define ANNOTATE_RWLOCK_DESTROY(lock) /* empty */
-  #define ANNOTATE_RWLOCK_ACQUIRED(lock, is_w) /* empty */
-  #define ANNOTATE_RWLOCK_RELEASED(lock, is_w) /* empty */
-  #define ANNOTATE_BARRIER_INIT(barrier, count, reinitialization_allowed) /* */
-  #define ANNOTATE_BARRIER_WAIT_BEFORE(barrier) /* empty */
-  #define ANNOTATE_BARRIER_WAIT_AFTER(barrier) /* empty */
-  #define ANNOTATE_BARRIER_DESTROY(barrier) /* empty */
-  #define ANNOTATE_CONDVAR_LOCK_WAIT(cv, lock) /* empty */
-  #define ANNOTATE_CONDVAR_WAIT(cv) /* empty */
-  #define ANNOTATE_CONDVAR_SIGNAL(cv) /* empty */
-  #define ANNOTATE_CONDVAR_SIGNAL_ALL(cv) /* empty */
-  #define ANNOTATE_HAPPENS_BEFORE(obj) /* empty */
-  #define ANNOTATE_HAPPENS_AFTER(obj) /* empty */
-  #define ANNOTATE_PUBLISH_MEMORY_RANGE(address, size) /* empty */
-  #define ANNOTATE_UNPUBLISH_MEMORY_RANGE(address, size)  /* empty */
-  #define ANNOTATE_SWAP_MEMORY_RANGE(address, size)  /* empty */
-  #define ANNOTATE_PCQ_CREATE(pcq) /* empty */
-  #define ANNOTATE_PCQ_DESTROY(pcq) /* empty */
-  #define ANNOTATE_PCQ_PUT(pcq) /* empty */
-  #define ANNOTATE_PCQ_GET(pcq) /* empty */
-  #define ANNOTATE_NEW_MEMORY(address, size) /* empty */
-  #define ANNOTATE_EXPECT_RACE(address, description) /* empty */
-  #define ANNOTATE_BENIGN_RACE(address, description) /* empty */
-  #define ANNOTATE_BENIGN_RACE_SIZED(address, size, description) /* empty */
-  #define ANNOTATE_PURE_HAPPENS_BEFORE_MUTEX(mu) /* empty */
-  #define ANNOTATE_MUTEX_IS_USED_AS_CONDVAR(mu) /* empty */
-  #define ANNOTATE_TRACE_MEMORY(arg) /* empty */
-  #define ANNOTATE_THREAD_NAME(name) /* empty */
-  #define ANNOTATE_IGNORE_READS_BEGIN() /* empty */
-  #define ANNOTATE_IGNORE_READS_END() /* empty */
-  #define ANNOTATE_IGNORE_WRITES_BEGIN() /* empty */
-  #define ANNOTATE_IGNORE_WRITES_END() /* empty */
-  #define ANNOTATE_IGNORE_READS_AND_WRITES_BEGIN() /* empty */
-  #define ANNOTATE_IGNORE_READS_AND_WRITES_END() /* empty */
-  #define ANNOTATE_ENABLE_RACE_DETECTION(enable) /* empty */
-  #define ANNOTATE_NO_OP(arg) /* empty */
-  #define ANNOTATE_FLUSH_STATE() /* empty */
-
-#endif  /* DYNAMIC_ANNOTATIONS_ENABLED */
-
-/* Macro definitions for GCC attributes that allow static thread safety
-   analysis to recognize and use some of the dynamic annotations as
-   escape hatches.
-   TODO(lcwu): remove the check for __SUPPORT_DYN_ANNOTATION__ once the
-   default crosstool/GCC supports these GCC attributes.  */
-
-#define ANNOTALYSIS_STATIC_INLINE
-#define ANNOTALYSIS_SEMICOLON_OR_EMPTY_BODY ;
-#define ANNOTALYSIS_IGNORE_READS_BEGIN
-#define ANNOTALYSIS_IGNORE_READS_END
-#define ANNOTALYSIS_IGNORE_WRITES_BEGIN
-#define ANNOTALYSIS_IGNORE_WRITES_END
-#define ANNOTALYSIS_UNPROTECTED_READ
-
-#if defined(__GNUC__) && (!defined(SWIG)) && (!defined(__clang__)) && \
-    defined(__SUPPORT_TS_ANNOTATION__) && defined(__SUPPORT_DYN_ANNOTATION__)
-
-#if DYNAMIC_ANNOTATIONS_ENABLED == 0
-#define ANNOTALYSIS_ONLY 1
-#undef ANNOTALYSIS_STATIC_INLINE
-#define ANNOTALYSIS_STATIC_INLINE static inline
-#undef ANNOTALYSIS_SEMICOLON_OR_EMPTY_BODY
-#define ANNOTALYSIS_SEMICOLON_OR_EMPTY_BODY { (void)file; (void)line; }
-#endif
-
-/* Only emit attributes when annotalysis is enabled. */
-#if defined(__SUPPORT_TS_ANNOTATION__) && defined(__SUPPORT_DYN_ANNOTATION__)
-#undef  ANNOTALYSIS_IGNORE_READS_BEGIN
-#define ANNOTALYSIS_IGNORE_READS_BEGIN  __attribute__ ((ignore_reads_begin))
-#undef  ANNOTALYSIS_IGNORE_READS_END
-#define ANNOTALYSIS_IGNORE_READS_END    __attribute__ ((ignore_reads_end))
-#undef  ANNOTALYSIS_IGNORE_WRITES_BEGIN
-#define ANNOTALYSIS_IGNORE_WRITES_BEGIN __attribute__ ((ignore_writes_begin))
-#undef  ANNOTALYSIS_IGNORE_WRITES_END
-#define ANNOTALYSIS_IGNORE_WRITES_END   __attribute__ ((ignore_writes_end))
-#undef  ANNOTALYSIS_UNPROTECTED_READ
-#define ANNOTALYSIS_UNPROTECTED_READ    __attribute__ ((unprotected_read))
-#endif
-
-#endif // defined(__GNUC__) && (!defined(SWIG)) && (!defined(__clang__))
-
-/* Use the macros above rather than using these functions directly. */
 #ifdef __cplusplus
 extern "C" {
 #endif
-void AnnotateRWLockCreate(const char *file, int line,
-                          const volatile void *lock);
-void AnnotateRWLockDestroy(const char *file, int line,
-                           const volatile void *lock);
-void AnnotateRWLockAcquired(const char *file, int line,
-                            const volatile void *lock, long is_w);
-void AnnotateRWLockReleased(const char *file, int line,
-                            const volatile void *lock, long is_w);
-void AnnotateBarrierInit(const char *file, int line,
-                         const volatile void *barrier, long count,
-                         long reinitialization_allowed);
-void AnnotateBarrierWaitBefore(const char *file, int line,
-                               const volatile void *barrier);
-void AnnotateBarrierWaitAfter(const char *file, int line,
-                              const volatile void *barrier);
-void AnnotateBarrierDestroy(const char *file, int line,
-                            const volatile void *barrier);
-void AnnotateCondVarWait(const char *file, int line,
-                         const volatile void *cv,
-                         const volatile void *lock);
-void AnnotateCondVarSignal(const char *file, int line,
-                           const volatile void *cv);
-void AnnotateCondVarSignalAll(const char *file, int line,
-                              const volatile void *cv);
-void AnnotatePublishMemoryRange(const char *file, int line,
-                                const volatile void *address,
-                                long size);
-void AnnotateUnpublishMemoryRange(const char *file, int line,
-                                  const volatile void *address,
-                                  long size);
-void AnnotatePCQCreate(const char *file, int line,
-                       const volatile void *pcq);
-void AnnotatePCQDestroy(const char *file, int line,
-                        const volatile void *pcq);
-void AnnotatePCQPut(const char *file, int line,
-                    const volatile void *pcq);
-void AnnotatePCQGet(const char *file, int line,
-                    const volatile void *pcq);
-void AnnotateNewMemory(const char *file, int line,
-                       const volatile void *address,
-                       long size);
-void AnnotateExpectRace(const char *file, int line,
-                        const volatile void *address,
-                        const char *description);
-void AnnotateBenignRace(const char *file, int line,
-                        const volatile void *address,
-                        const char *description);
-void AnnotateBenignRaceSized(const char *file, int line,
-                        const volatile void *address,
-                        long size,
-                        const char *description);
-void AnnotateMutexIsUsedAsCondVar(const char *file, int line,
-                                  const volatile void *mu);
-void AnnotateTraceMemory(const char *file, int line,
-                         const volatile void *arg);
-void AnnotateThreadName(const char *file, int line,
-                        const char *name);
-ANNOTALYSIS_STATIC_INLINE
-void AnnotateIgnoreReadsBegin(const char *file, int line)
-    ANNOTALYSIS_IGNORE_READS_BEGIN ANNOTALYSIS_SEMICOLON_OR_EMPTY_BODY
-ANNOTALYSIS_STATIC_INLINE
-void AnnotateIgnoreReadsEnd(const char *file, int line)
-    ANNOTALYSIS_IGNORE_READS_END ANNOTALYSIS_SEMICOLON_OR_EMPTY_BODY
-ANNOTALYSIS_STATIC_INLINE
-void AnnotateIgnoreWritesBegin(const char *file, int line)
-    ANNOTALYSIS_IGNORE_WRITES_BEGIN ANNOTALYSIS_SEMICOLON_OR_EMPTY_BODY
-ANNOTALYSIS_STATIC_INLINE
-void AnnotateIgnoreWritesEnd(const char *file, int line)
-    ANNOTALYSIS_IGNORE_WRITES_END ANNOTALYSIS_SEMICOLON_OR_EMPTY_BODY
-void AnnotateEnableRaceDetection(const char *file, int line, int enable);
-void AnnotateNoOp(const char *file, int line,
-                  const volatile void *arg);
-void AnnotateFlushState(const char *file, int line);
 
 /* Return non-zero value if running under valgrind.
 
@@ -506,122 +79,8 @@
  */
 int RunningOnValgrind(void);
 
-/* ValgrindSlowdown returns:
-    * 1.0, if (RunningOnValgrind() == 0)
-    * 50.0, if (RunningOnValgrind() != 0 && getenv("VALGRIND_SLOWDOWN") == NULL)
-    * atof(getenv("VALGRIND_SLOWDOWN")) otherwise
-   This function can be used to scale timeout values:
-   EXAMPLE:
-   for (;;) {
-     DoExpensiveBackgroundTask();
-     SleepForSeconds(5 * ValgrindSlowdown());
-   }
- */
-double ValgrindSlowdown(void);
-
 #ifdef __cplusplus
 }
 #endif
 
-#if DYNAMIC_ANNOTATIONS_ENABLED != 0 && defined(__cplusplus)
-
-  /* ANNOTATE_UNPROTECTED_READ is the preferred way to annotate racey reads.
-
-     Instead of doing
-        ANNOTATE_IGNORE_READS_BEGIN();
-        ... = x;
-        ANNOTATE_IGNORE_READS_END();
-     one can use
-        ... = ANNOTATE_UNPROTECTED_READ(x); */
-  template <class T>
-  inline T ANNOTATE_UNPROTECTED_READ(const volatile T &x)
-      ANNOTALYSIS_UNPROTECTED_READ {
-    ANNOTATE_IGNORE_READS_BEGIN();
-    T res = x;
-    ANNOTATE_IGNORE_READS_END();
-    return res;
-  }
-  /* Apply ANNOTATE_BENIGN_RACE_SIZED to a static variable. */
-  #define ANNOTATE_BENIGN_RACE_STATIC(static_var, description)        \
-    namespace {                                                       \
-      class static_var ## _annotator {                                \
-       public:                                                        \
-        static_var ## _annotator() {                                  \
-          ANNOTATE_BENIGN_RACE_SIZED(&static_var,                     \
-                                      sizeof(static_var),             \
-            # static_var ": " description);                           \
-        }                                                             \
-      };                                                              \
-      static static_var ## _annotator the ## static_var ## _annotator;\
-    }
-#else /* DYNAMIC_ANNOTATIONS_ENABLED == 0 */
-
-  #define ANNOTATE_UNPROTECTED_READ(x) (x)
-  #define ANNOTATE_BENIGN_RACE_STATIC(static_var, description)  /* empty */
-
-#endif /* DYNAMIC_ANNOTATIONS_ENABLED */
-
-/* Annotalysis, a GCC based static analyzer, is able to understand and use
-   some of the dynamic annotations defined in this file. However, dynamic
-   annotations are usually disabled in the opt mode (to avoid additional
-   runtime overheads) while Annotalysis only works in the opt mode.
-   In order for Annotalysis to use these dynamic annotations when they
-   are disabled, we re-define these annotations here. Note that unlike the
-   original macro definitions above, these macros are expanded to calls to
-   static inline functions so that the compiler will be able to remove the
-   calls after the analysis. */
-
-#ifdef ANNOTALYSIS_ONLY
-
-  #undef ANNOTALYSIS_ONLY
-
-  /* Undefine and re-define the macros that the static analyzer understands. */
-  #undef ANNOTATE_IGNORE_READS_BEGIN
-  #define ANNOTATE_IGNORE_READS_BEGIN()           \
-    AnnotateIgnoreReadsBegin(__FILE__, __LINE__)
-
-  #undef ANNOTATE_IGNORE_READS_END
-  #define ANNOTATE_IGNORE_READS_END()             \
-    AnnotateIgnoreReadsEnd(__FILE__, __LINE__)
-
-  #undef ANNOTATE_IGNORE_WRITES_BEGIN
-  #define ANNOTATE_IGNORE_WRITES_BEGIN()          \
-    AnnotateIgnoreWritesBegin(__FILE__, __LINE__)
-
-  #undef ANNOTATE_IGNORE_WRITES_END
-  #define ANNOTATE_IGNORE_WRITES_END()            \
-    AnnotateIgnoreWritesEnd(__FILE__, __LINE__)
-
-  #undef ANNOTATE_IGNORE_READS_AND_WRITES_BEGIN
-  #define ANNOTATE_IGNORE_READS_AND_WRITES_BEGIN()       \
-    do {                                                 \
-      ANNOTATE_IGNORE_READS_BEGIN();                     \
-      ANNOTATE_IGNORE_WRITES_BEGIN();                    \
-    }while(0)                                            \
-
-  #undef ANNOTATE_IGNORE_READS_AND_WRITES_END
-  #define ANNOTATE_IGNORE_READS_AND_WRITES_END()  \
-    do {                                          \
-      ANNOTATE_IGNORE_WRITES_END();               \
-      ANNOTATE_IGNORE_READS_END();                \
-    }while(0)                                     \
-
-  #if defined(__cplusplus)
-    #undef ANNOTATE_UNPROTECTED_READ
-    template <class T>
-    inline T ANNOTATE_UNPROTECTED_READ(const volatile T &x)
-         ANNOTALYSIS_UNPROTECTED_READ {
-      ANNOTATE_IGNORE_READS_BEGIN();
-      T res = x;
-      ANNOTATE_IGNORE_READS_END();
-      return res;
-    }
-  #endif /* __cplusplus */
-
-#endif /* ANNOTALYSIS_ONLY */
-
-/* Undefine the macros intended only in this file. */
-#undef ANNOTALYSIS_STATIC_INLINE
-#undef ANNOTALYSIS_SEMICOLON_OR_EMPTY_BODY
-
 #endif  /* BASE_DYNAMIC_ANNOTATIONS_H_ */

diff --git a/src/base/elfcore.h b/src/base/elfcore.h
index d9599ed..11c09b4 100644
--- a/src/base/elfcore.h
+++ b/src/base/elfcore.h

@@ -41,13 +41,14 @@
 /* We currently only support x86-32, x86-64, ARM, MIPS, PPC on Linux.
  * Porting to other related platforms should not be difficult.
  */
-#if (defined(__i386__) || defined(__x86_64__) || defined(__ARM_ARCH_3__) || \
+#if (defined(__i386__) || defined(__x86_64__) || defined(__arm__) || \
      defined(__mips__) || defined(__PPC__)) && defined(__linux)
 
+#include "config.h"
+
 #include <stdarg.h>
 #include <stdint.h>
 #include <sys/types.h>
-#include <config.h>
 
 
 /* Define the DUMPER symbol to make sure that there is exactly one
@@ -89,7 +90,7 @@
     uint16_t  ss, __ss;
   #endif
   } i386_regs;
-#elif defined(__ARM_ARCH_3__)
+#elif defined(__arm__)
   typedef struct arm_regs {     /* General purpose registers                 */
     #define BP uregs[11]        /* Frame pointer                             */
     #define SP uregs[13]        /* Stack pointer                             */
@@ -126,274 +127,7 @@
   } ppc_regs;
 #endif
 
-#if defined(__i386__) && defined(__GNUC__)
-  /* On x86 we provide an optimized version of the FRAME() macro, if the
-   * compiler supports a GCC-style asm() directive. This results in somewhat
-   * more accurate values for CPU registers.
-   */
-  typedef struct Frame {
-    struct i386_regs uregs;
-    int              errno_;
-    pid_t            tid;
-  } Frame;
-  #define FRAME(f) Frame f;                                           \
-                   do {                                               \
-                     f.errno_ = errno;                                \
-                     f.tid    = sys_gettid();                         \
-                     __asm__ volatile (                               \
-                       "push %%ebp\n"                                 \
-                       "push %%ebx\n"                                 \
-                       "mov  %%ebx,0(%%eax)\n"                        \
-                       "mov  %%ecx,4(%%eax)\n"                        \
-                       "mov  %%edx,8(%%eax)\n"                        \
-                       "mov  %%esi,12(%%eax)\n"                       \
-                       "mov  %%edi,16(%%eax)\n"                       \
-                       "mov  %%ebp,20(%%eax)\n"                       \
-                       "mov  %%eax,24(%%eax)\n"                       \
-                       "mov  %%ds,%%ebx\n"                            \
-                       "mov  %%ebx,28(%%eax)\n"                       \
-                       "mov  %%es,%%ebx\n"                            \
-                       "mov  %%ebx,32(%%eax)\n"                       \
-                       "mov  %%fs,%%ebx\n"                            \
-                       "mov  %%ebx,36(%%eax)\n"                       \
-                       "mov  %%gs,%%ebx\n"                            \
-                       "mov  %%ebx, 40(%%eax)\n"                      \
-                       "call 0f\n"                                    \
-                     "0:pop %%ebx\n"                                  \
-                       "add  $1f-0b,%%ebx\n"                          \
-                       "mov  %%ebx,48(%%eax)\n"                       \
-                       "mov  %%cs,%%ebx\n"                            \
-                       "mov  %%ebx,52(%%eax)\n"                       \
-                       "pushf\n"                                      \
-                       "pop  %%ebx\n"                                 \
-                       "mov  %%ebx,56(%%eax)\n"                       \
-                       "mov  %%esp,%%ebx\n"                           \
-                       "add  $8,%%ebx\n"                              \
-                       "mov  %%ebx,60(%%eax)\n"                       \
-                       "mov  %%ss,%%ebx\n"                            \
-                       "mov  %%ebx,64(%%eax)\n"                       \
-                       "pop  %%ebx\n"                                 \
-                       "pop  %%ebp\n"                                 \
-                     "1:"                                             \
-                       : : "a" (&f) : "memory");                      \
-                     } while (0)
-  #define SET_FRAME(f,r)                                              \
-                     do {                                             \
-                       errno = (f).errno_;                            \
-                       (r)   = (f).uregs;                             \
-                     } while (0)
-#elif defined(__x86_64__) && defined(__GNUC__)
-  /* The FRAME and SET_FRAME macros for x86_64.  */
-  typedef struct Frame {
-    struct i386_regs uregs;
-    int              errno_;
-    pid_t            tid;
-  } Frame;
-  #define FRAME(f) Frame f;                                           \
-                   do {                                               \
-                     f.errno_ = errno;                                \
-                     f.tid    = sys_gettid();                         \
-                     __asm__ volatile (                               \
-                       "push %%rbp\n"                                 \
-                       "push %%rbx\n"                                 \
-                       "mov  %%r15,0(%%rax)\n"                        \
-                       "mov  %%r14,8(%%rax)\n"                        \
-                       "mov  %%r13,16(%%rax)\n"                       \
-                       "mov  %%r12,24(%%rax)\n"                       \
-                       "mov  %%rbp,32(%%rax)\n"                       \
-                       "mov  %%rbx,40(%%rax)\n"                       \
-                       "mov  %%r11,48(%%rax)\n"                       \
-                       "mov  %%r10,56(%%rax)\n"                       \
-                       "mov  %%r9,64(%%rax)\n"                        \
-                       "mov  %%r8,72(%%rax)\n"                        \
-                       "mov  %%rax,80(%%rax)\n"                       \
-                       "mov  %%rcx,88(%%rax)\n"                       \
-                       "mov  %%rdx,96(%%rax)\n"                       \
-                       "mov  %%rsi,104(%%rax)\n"                      \
-                       "mov  %%rdi,112(%%rax)\n"                      \
-                       "mov  %%ds,%%rbx\n"                            \
-                       "mov  %%rbx,184(%%rax)\n"                      \
-                       "mov  %%es,%%rbx\n"                            \
-                       "mov  %%rbx,192(%%rax)\n"                      \
-                       "mov  %%fs,%%rbx\n"                            \
-                       "mov  %%rbx,200(%%rax)\n"                      \
-                       "mov  %%gs,%%rbx\n"                            \
-                       "mov  %%rbx,208(%%rax)\n"                      \
-                       "call 0f\n"                                    \
-                     "0:pop %%rbx\n"                                  \
-                       "add  $1f-0b,%%rbx\n"                          \
-                       "mov  %%rbx,128(%%rax)\n"                      \
-                       "mov  %%cs,%%rbx\n"                            \
-                       "mov  %%rbx,136(%%rax)\n"                      \
-                       "pushf\n"                                      \
-                       "pop  %%rbx\n"                                 \
-                       "mov  %%rbx,144(%%rax)\n"                      \
-                       "mov  %%rsp,%%rbx\n"                           \
-                       "add  $16,%%ebx\n"                             \
-                       "mov  %%rbx,152(%%rax)\n"                      \
-                       "mov  %%ss,%%rbx\n"                            \
-                       "mov  %%rbx,160(%%rax)\n"                      \
-                       "pop  %%rbx\n"                                 \
-                       "pop  %%rbp\n"                                 \
-                     "1:"                                             \
-                       : : "a" (&f) : "memory");                      \
-                     } while (0)
-  #define SET_FRAME(f,r)                                              \
-                     do {                                             \
-                       errno = (f).errno_;                            \
-                       (f).uregs.fs_base = (r).fs_base;               \
-                       (f).uregs.gs_base = (r).gs_base;               \
-                       (r)   = (f).uregs;                             \
-                     } while (0)
-#elif defined(__ARM_ARCH_3__) && defined(__GNUC__)
-  /* ARM calling conventions are a little more tricky. A little assembly
-   * helps in obtaining an accurate snapshot of all registers.
-   */
-  typedef struct Frame {
-    struct arm_regs arm;
-    int             errno_;
-    pid_t           tid;
-  } Frame;
-  #define FRAME(f) Frame f;                                           \
-                   do {                                               \
-                     long cpsr;                                       \
-                     f.errno_ = errno;                                \
-                     f.tid    = sys_gettid();                         \
-                     __asm__ volatile(                                \
-                       "stmia %0, {r0-r15}\n" /* All integer regs   */\
-                       : : "r"(&f.arm) : "memory");                   \
-                     f.arm.uregs[16] = 0;                             \
-                     __asm__ volatile(                                \
-                       "mrs %0, cpsr\n"       /* Condition code reg */\
-                       : "=r"(cpsr));                                 \
-                     f.arm.uregs[17] = cpsr;                          \
-                   } while (0)
-  #define SET_FRAME(f,r)                                              \
-                     do {                                             \
-                       /* Don't override the FPU status register.   */\
-                       /* Use the value obtained from ptrace(). This*/\
-                       /* works, because our code does not perform  */\
-                       /* any FPU operations, itself.               */\
-                       long fps      = (f).arm.uregs[16];             \
-                       errno         = (f).errno_;                    \
-                       (r)           = (f).arm;                       \
-                       (r).uregs[16] = fps;                           \
-                     } while (0)
-#elif defined(__mips__) && defined(__GNUC__)
-  typedef struct Frame {
-    struct mips_regs mips_regs;
-    int              errno_;
-    pid_t            tid;
-  } Frame;
-  #define MIPSREG(n) ({ register unsigned long r __asm__("$"#n); r; })
-  #define FRAME(f) Frame f = { 0 };                                   \
-                   do {                                               \
-                     unsigned long hi, lo;                            \
-                     register unsigned long pc __asm__("$31");        \
-                     f.mips_regs.uregs[ 0] = MIPSREG( 0);             \
-                     f.mips_regs.uregs[ 1] = MIPSREG( 1);             \
-                     f.mips_regs.uregs[ 2] = MIPSREG( 2);             \
-                     f.mips_regs.uregs[ 3] = MIPSREG( 3);             \
-                     f.mips_regs.uregs[ 4] = MIPSREG( 4);             \
-                     f.mips_regs.uregs[ 5] = MIPSREG( 5);             \
-                     f.mips_regs.uregs[ 6] = MIPSREG( 6);             \
-                     f.mips_regs.uregs[ 7] = MIPSREG( 7);             \
-                     f.mips_regs.uregs[ 8] = MIPSREG( 8);             \
-                     f.mips_regs.uregs[ 9] = MIPSREG( 9);             \
-                     f.mips_regs.uregs[10] = MIPSREG(10);             \
-                     f.mips_regs.uregs[11] = MIPSREG(11);             \
-                     f.mips_regs.uregs[12] = MIPSREG(12);             \
-                     f.mips_regs.uregs[13] = MIPSREG(13);             \
-                     f.mips_regs.uregs[14] = MIPSREG(14);             \
-                     f.mips_regs.uregs[15] = MIPSREG(15);             \
-                     f.mips_regs.uregs[16] = MIPSREG(16);             \
-                     f.mips_regs.uregs[17] = MIPSREG(17);             \
-                     f.mips_regs.uregs[18] = MIPSREG(18);             \
-                     f.mips_regs.uregs[19] = MIPSREG(19);             \
-                     f.mips_regs.uregs[20] = MIPSREG(20);             \
-                     f.mips_regs.uregs[21] = MIPSREG(21);             \
-                     f.mips_regs.uregs[22] = MIPSREG(22);             \
-                     f.mips_regs.uregs[23] = MIPSREG(23);             \
-                     f.mips_regs.uregs[24] = MIPSREG(24);             \
-                     f.mips_regs.uregs[25] = MIPSREG(25);             \
-                     f.mips_regs.uregs[26] = MIPSREG(26);             \
-                     f.mips_regs.uregs[27] = MIPSREG(27);             \
-                     f.mips_regs.uregs[28] = MIPSREG(28);             \
-                     f.mips_regs.uregs[29] = MIPSREG(29);             \
-                     f.mips_regs.uregs[30] = MIPSREG(30);             \
-                     f.mips_regs.uregs[31] = MIPSREG(31);             \
-                     __asm__ volatile ("mfhi %0" : "=r"(hi));         \
-                     __asm__ volatile ("mflo %0" : "=r"(lo));         \
-                     __asm__ volatile ("jal 1f; 1:nop" : "=r"(pc));   \
-                     f.mips_regs.hi       = hi;                       \
-                     f.mips_regs.lo       = lo;                       \
-                     f.mips_regs.cp0_epc  = pc;                       \
-                     f.errno_             = errno;                    \
-                     f.tid                = sys_gettid();             \
-                   } while (0)
-  #define SET_FRAME(f,r)                                              \
-                   do {                                               \
-                     errno       = (f).errno_;                        \
-                     memcpy((r).uregs, (f).mips_regs.uregs,           \
-                            32*sizeof(unsigned long));                \
-                     (r).hi      = (f).mips_regs.hi;                  \
-                     (r).lo      = (f).mips_regs.lo;                  \
-                     (r).cp0_epc = (f).mips_regs.cp0_epc;             \
-                   } while (0)
-#else
-  /* If we do not have a hand-optimized assembly version of the FRAME()
-   * macro, we cannot reliably unroll the stack. So, we show a few additional
-   * stack frames for the coredumper.
-   */
-  typedef struct Frame {
-    pid_t tid;
-  } Frame;
-  #define FRAME(f) Frame f; do { f.tid = sys_gettid(); } while (0)
-  #define SET_FRAME(f,r) do { } while (0)
-#endif
-
-
-/* Internal function for generating a core file. This API can change without
- * notice and is only supposed to be used internally by the core dumper.
- *
- * This function works for both single- and multi-threaded core
- * dumps. If called as
- *
- *   FRAME(frame);
- *   InternalGetCoreDump(&frame, 0, NULL, ap);
- *
- * it creates a core file that only contains information about the
- * calling thread.
- *
- * Optionally, the caller can provide information about other threads
- * by passing their process ids in "thread_pids". The process id of
- * the caller should not be included in this array. All of the threads
- * must have been attached to with ptrace(), prior to calling this
- * function. They will be detached when "InternalGetCoreDump()" returns.
- *
- * This function either returns a file handle that can be read for obtaining
- * a core dump, or "-1" in case of an error. In the latter case, "errno"
- * will be set appropriately.
- *
- * While "InternalGetCoreDump()" is not technically async signal safe, you
- * might be tempted to invoke it from a signal handler. The code goes to
- * great lengths to make a best effort that this will actually work. But in
- * any case, you must make sure that you preserve the value of "errno"
- * yourself. It is guaranteed to be clobbered otherwise.
- *
- * Also, "InternalGetCoreDump" is not strictly speaking re-entrant. Again,
- * it makes a best effort to behave reasonably when called in a multi-
- * threaded environment, but it is ultimately the caller's responsibility
- * to provide locking.
- */
-int InternalGetCoreDump(void *frame, int num_threads, pid_t *thread_pids,
-                        va_list ap
-                     /* const struct CoreDumpParameters *params,
-                        const char *file_name,
-                        const char *PATH
-                      */);
-
-#endif
+#endif  // __linux and various arches
 
 #ifdef __cplusplus
 }

diff --git a/src/base/googleinit.h b/src/base/googleinit.h
index 3ea411a..a290427 100644
--- a/src/base/googleinit.h
+++ b/src/base/googleinit.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2005, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/base/linux_syscall_support.h b/src/base/linux_syscall_support.h
index 56b8fac..d6899b8 100644
--- a/src/base/linux_syscall_support.h
+++ b/src/base/linux_syscall_support.h

@@ -130,11 +130,14 @@
 #ifndef SYS_LINUX_SYSCALL_SUPPORT_H
 #define SYS_LINUX_SYSCALL_SUPPORT_H
 
-/* We currently only support x86-32, x86-64, ARM, MIPS, PPC/PPC64 and Aarch64 on Linux.
+/* We currently only support x86-32, x86-64, ARM, MIPS, PPC/PPC64, Aarch64,
+ * s390, s390x, and riscv64 on Linux.
  * Porting to other related platforms should not be difficult.
  */
 #if (defined(__i386__) || defined(__x86_64__) || defined(__arm__) || \
-     defined(__mips__) || defined(__PPC__) || defined(__aarch64__)) && defined(__linux)
+     defined(__mips__) || defined(__mips64) || defined(__mips64el__) || defined(__PPC__) || \
+     defined(__aarch64__) || defined(__s390__) || defined(__riscv)) \
+  && (defined(__linux))
 
 #ifndef SYS_CPLUSPLUS
 #ifdef __cplusplus
@@ -159,6 +162,7 @@
 #include <unistd.h>
 #include <linux/unistd.h>
 #include <endian.h>
+#include <fcntl.h>
 
 #ifdef __mips__
 /* Include definitions of the ABI currently in use.                          */
@@ -245,7 +249,8 @@
   long               ru_nivcsw;
 };
 
-#if defined(__i386__) || defined(__arm__) || defined(__PPC__)
+#if defined(__i386__) || defined(__arm__) \
+  || defined(__PPC__) || (defined(__s390__) && !defined(__s390x__))
 
 /* include/asm-{arm,i386,mips,ppc}/signal.h                                  */
 struct kernel_old_sigaction {
@@ -259,6 +264,8 @@
 } __attribute__((packed,aligned(4)));
 #elif (defined(__mips__) && _MIPS_SIM == _MIPS_SIM_ABI32)
   #define kernel_old_sigaction kernel_sigaction
+#elif defined(__aarch64__)
+  // No kernel_old_sigaction defined for arm64.
 #endif
 
 /* Some kernel functions (e.g. sigaction() in 2.6.23) require that the
@@ -302,9 +309,9 @@
 #endif
 };
 
-/* include/asm-{arm,i386,mips,ppc}/stat.h                                    */
+/* include/asm-{arm,i386,mips,ppc,s390}/stat.h                               */
 #ifdef __mips__
-#if _MIPS_SIM == _MIPS_SIM_ABI64
+#if (_MIPS_SIM == _MIPS_SIM_ABI64 || _MIPS_SIM == _MIPS_SIM_NABI32)
 struct kernel_stat {
 #else
 struct kernel_stat64 {
@@ -373,7 +380,7 @@
 };
 #endif
 
-/* include/asm-{arm,generic,i386,mips,x86_64,ppc}/stat.h                     */
+/* include/asm-{arm,generic,i386,mips,x86_64,ppc,s390}/stat.h                     */
 #if defined(__i386__) || defined(__arm__)
 struct kernel_stat {
   /* The kernel headers suggest that st_dev and st_rdev should be 32bit
@@ -443,7 +450,8 @@
   unsigned long      __unused5;
   unsigned long      __unused6;
 };
-#elif (defined(__mips__) && _MIPS_SIM != _MIPS_SIM_ABI64)
+#elif defined(__mips__) \
+       && !(_MIPS_SIM == _MIPS_SIM_ABI64 || _MIPS_SIM == _MIPS_SIM_NABI32)
 struct kernel_stat {
   unsigned           st_dev;
   int                st_pad1[3];
@@ -489,6 +497,50 @@
   unsigned int       __unused4;
   unsigned int       __unused5;
 };
+#elif defined(__s390x__)
+struct kernel_stat {
+  unsigned long      st_dev;
+  unsigned long      st_ino;
+  unsigned long      st_nlink;
+  unsigned int       st_mode;
+  unsigned int       st_uid;
+  unsigned int       st_gid;
+  unsigned int       __pad1;
+  unsigned long      st_rdev;
+  unsigned long      st_size;
+  unsigned long      st_atime_;
+  unsigned long      st_atime_nsec_;
+  unsigned long      st_mtime_;
+  unsigned long      st_mtime_nsec_;
+  unsigned long      st_ctime_;
+  unsigned long      st_ctime_nsec_;
+  unsigned long      st_blksize;
+  long               st_blocks;
+  unsigned long      __unused[3];
+};
+#elif defined(__s390__)
+struct kernel_stat {
+  unsigned short     st_dev;
+  unsigned short     __pad1;
+  unsigned long      st_ino;
+  unsigned short     st_mode;
+  unsigned short     st_nlink;
+  unsigned short     st_uid;
+  unsigned short     st_gid;
+  unsigned short     st_rdev;
+  unsigned short     __pad2;
+  unsigned long      st_size;
+  unsigned long      st_blksize;
+  unsigned long      st_blocks;
+  unsigned long      st_atime_;
+  unsigned long      st_atime_nsec_;
+  unsigned long      st_mtime_;
+  unsigned long      st_mtime_nsec_;
+  unsigned long      st_ctime_;
+  unsigned long      st_ctime_nsec_;
+  unsigned long      __unused4;
+  unsigned long      __unused5;
+};
 #endif
 
 
@@ -632,7 +684,7 @@
 #define __NR_getcpu             (__NR_Linux + 312)
 #endif
 /* End of MIPS (old 32bit API) definitions */
-#elif  _MIPS_SIM == _MIPS_SIM_ABI64
+#elif (_MIPS_SIM == _MIPS_SIM_ABI64 || _MIPS_SIM == _MIPS_SIM_NABI32)
 #ifndef __NR_gettid
 #define __NR_gettid             (__NR_Linux + 178)
 #endif
@@ -697,12 +749,211 @@
 #ifndef __NR_getcpu
 #define __NR_getcpu             302
 #endif
-/* End of powerpc defininitions                                              */
+/* End of powerpc definitions                                              */
 #elif defined(__aarch64__)
 #ifndef __NR_fstatat
 #define __NR_fstatat             79
 #endif
-/* End of aarch64 defininitions                                              */
+/* End of aarch64 definitions                                              */
+#elif defined(__s390__)
+#ifndef __NR_quotactl
+#define __NR_quotactl           131
+#endif
+#ifndef __NR_rt_sigreturn
+#define __NR_rt_sigreturn       173
+#endif
+#ifndef __NR_rt_sigaction
+#define __NR_rt_sigaction       174
+#endif
+#ifndef __NR_rt_sigprocmask
+#define __NR_rt_sigprocmask     175
+#endif
+#ifndef __NR_rt_sigpending
+#define __NR_rt_sigpending      176
+#endif
+#ifndef __NR_rt_sigsuspend
+#define __NR_rt_sigsuspend      179
+#endif
+#ifndef __NR_pread64
+#define __NR_pread64            180
+#endif
+#ifndef __NR_pwrite64
+#define __NR_pwrite64           181
+#endif
+#ifndef __NR_getdents64
+#define __NR_getdents64         220
+#endif
+#ifndef __NR_readahead
+#define __NR_readahead          222
+#endif
+#ifndef __NR_setxattr
+#define __NR_setxattr           224
+#endif
+#ifndef __NR_lsetxattr
+#define __NR_lsetxattr          225
+#endif
+#ifndef __NR_getxattr
+#define __NR_getxattr           227
+#endif
+#ifndef __NR_lgetxattr
+#define __NR_lgetxattr          228
+#endif
+#ifndef __NR_listxattr
+#define __NR_listxattr          230
+#endif
+#ifndef __NR_llistxattr
+#define __NR_llistxattr         231
+#endif
+#ifndef __NR_gettid
+#define __NR_gettid             236
+#endif
+#ifndef __NR_tkill
+#define __NR_tkill              237
+#endif
+#ifndef __NR_futex
+#define __NR_futex              238
+#endif
+#ifndef __NR_sched_setaffinity
+#define __NR_sched_setaffinity  239
+#endif
+#ifndef __NR_sched_getaffinity
+#define __NR_sched_getaffinity  240
+#endif
+#ifndef __NR_set_tid_address
+#define __NR_set_tid_address    252
+#endif
+#ifndef __NR_clock_gettime
+#define __NR_clock_gettime      260
+#endif
+#ifndef __NR_clock_getres
+#define __NR_clock_getres       261
+#endif
+#ifndef __NR_statfs64
+#define __NR_statfs64           265
+#endif
+#ifndef __NR_fstatfs64
+#define __NR_fstatfs64          266
+#endif
+#ifndef __NR_ioprio_set
+#define __NR_ioprio_set         282
+#endif
+#ifndef __NR_ioprio_get
+#define __NR_ioprio_get         283
+#endif
+#ifndef __NR_openat
+#define __NR_openat             288
+#endif
+#ifndef __NR_unlinkat
+#define __NR_unlinkat           294
+#endif
+#ifndef __NR_move_pages
+#define __NR_move_pages         310
+#endif
+#ifndef __NR_getcpu
+#define __NR_getcpu             311
+#endif
+#ifndef __NR_fallocate
+#define __NR_fallocate          314
+#endif
+/* Some syscalls are named/numbered differently between s390 and s390x. */
+#ifdef __s390x__
+# ifndef __NR_getrlimit
+# define __NR_getrlimit          191
+# endif
+# ifndef __NR_setresuid
+# define __NR_setresuid          208
+# endif
+# ifndef __NR_getresuid
+# define __NR_getresuid          209
+# endif
+# ifndef __NR_setresgid
+# define __NR_setresgid          210
+# endif
+# ifndef __NR_getresgid
+# define __NR_getresgid          211
+# endif
+# ifndef __NR_setfsuid
+# define __NR_setfsuid           215
+# endif
+# ifndef __NR_setfsgid
+# define __NR_setfsgid           216
+# endif
+# ifndef __NR_fadvise64
+# define __NR_fadvise64          253
+# endif
+# ifndef __NR_newfstatat
+# define __NR_newfstatat         293
+# endif
+#else /* __s390x__ */
+# ifndef __NR_getrlimit
+# define __NR_getrlimit          76
+# endif
+# ifndef __NR_setfsuid
+# define __NR_setfsuid           138
+# endif
+# ifndef __NR_setfsgid
+# define __NR_setfsgid           139
+# endif
+# ifndef __NR_setresuid
+# define __NR_setresuid          164
+# endif
+# ifndef __NR_getresuid
+# define __NR_getresuid          165
+# endif
+# ifndef __NR_setresgid
+# define __NR_setresgid          170
+# endif
+# ifndef __NR_getresgid
+# define __NR_getresgid          171
+# endif
+# ifndef __NR_ugetrlimit
+# define __NR_ugetrlimit         191
+# endif
+# ifndef __NR_mmap2
+# define __NR_mmap2              192
+# endif
+# ifndef __NR_setresuid32
+# define __NR_setresuid32        208
+# endif
+# ifndef __NR_getresuid32
+# define __NR_getresuid32        209
+# endif
+# ifndef __NR_setresgid32
+# define __NR_setresgid32        210
+# endif
+# ifndef __NR_getresgid32
+# define __NR_getresgid32        211
+# endif
+# ifndef __NR_setfsuid32
+# define __NR_setfsuid32         215
+# endif
+# ifndef __NR_setfsgid32
+# define __NR_setfsgid32         216
+# endif
+# ifndef __NR_fadvise64_64
+# define __NR_fadvise64_64       264
+# endif
+# ifndef __NR_fstatat64
+# define __NR_fstatat64          293
+# endif
+#endif /* __s390__ */
+/* End of s390/s390x definitions                                             */
+#elif defined(__riscv)
+# ifndef __NR_gettid
+# define __NR_gettid             178
+# endif
+# ifndef __NR_futex
+# define __NR_futex              422
+# endif
+# ifndef __NR_getdents64
+# define __NR_getdents64         61
+# endif
+# ifndef __NR_openat
+# define __NR_openat             56
+# endif
+# ifndef __NR_fstatat
+# define __NR_fstatat            79
+# endif
 #endif
 
 
@@ -766,7 +1017,7 @@
 
   #undef  LSS_RETURN
   #if (defined(__i386__) || defined(__x86_64__) || defined(__arm__) ||        \
-       defined(__aarch64__))
+       defined(__aarch64__) || defined(__s390__) || defined(__riscv))
   /* Failing system calls return a negative result in the range of
    * -1..-4095. These are "errno" values with the sign inverted.
    */
@@ -831,7 +1082,7 @@
                            "pop %%ebx\n"                                      \
                            CFI_ADJUST_CFA_OFFSET(-4)                          \
                            args                                               \
-                           : "esp", "memory");                                \
+                           : "memory");                                       \
       LSS_RETURN(type,__res)
     #undef  _syscall0
     #define _syscall0(type,name)                                              \
@@ -888,7 +1139,7 @@
                              : "i" (__NR_##name), "ri" ((long)(arg1)),        \
                                "c" ((long)(arg2)), "d" ((long)(arg3)),        \
                                "S" ((long)(arg4)), "D" ((long)(arg5))         \
-                             : "esp", "memory");                              \
+                             : "memory");                                     \
         LSS_RETURN(type,__res);                                               \
       }
     #undef  _syscall6
@@ -910,7 +1161,7 @@
                              : "i" (__NR_##name),  "0" ((long)(&__s)),        \
                                "c" ((long)(arg2)), "d" ((long)(arg3)),        \
                                "S" ((long)(arg4)), "D" ((long)(arg5))         \
-                             : "esp", "memory");                              \
+                             : "memory");                                     \
         LSS_RETURN(type,__res);                                               \
       }
     LSS_INLINE int LSS_NAME(clone)(int (*fn)(void *), void *child_stack,
@@ -996,7 +1247,7 @@
                            : "0"(-EINVAL), "i"(__NR_clone),
                              "m"(fn), "m"(child_stack), "m"(flags), "m"(arg),
                              "m"(parent_tidptr), "m"(newtls), "m"(child_tidptr)
-                           : "esp", "memory", "ecx", "edx", "esi", "edi");
+                           : "memory", "ecx", "edx", "esi", "edi");
       LSS_RETURN(int, __res);
     }
 
@@ -1250,7 +1501,7 @@
                                "d"(LSS_SYSCALL_ARG(parent_tidptr)),
                                "r"(LSS_SYSCALL_ARG(newtls)),
                                "r"(LSS_SYSCALL_ARG(child_tidptr))
-                             : "rsp", "memory", "r8", "r10", "r11", "rcx");
+                             : "memory", "r8", "r10", "r11", "rcx");
       }
       LSS_RETURN(int, __res);
     }
@@ -1582,8 +1833,8 @@
                               ".set reorder\n"                                \
                               : "=&r"(__v0), "+r" (__r7)                      \
                               : "i" (__NR_##name), "r"(__r4), "r"(__r5),      \
-                                "r"(__r6), "r" ((unsigned long)arg5),         \
-                                "r" ((unsigned long)arg6)                     \
+                                "r"(__r6), "m" ((unsigned long)arg5),         \
+                                "m" ((unsigned long)arg6)                     \
                               : MIPS_SYSCALL_CLOBBERS);                       \
         LSS_RETURN(type, __v0, __r7);                                         \
       }
@@ -1972,7 +2223,7 @@
                                 "svc 0x0\n"                                   \
                                 : "=r"(__res_x0)                              \
                                 : "i"(__NR_##name) , ## args                  \
-                                : "memory");                                  \
+                                : "x8", "memory");                            \
           __res = __res_x0;                                                   \
           LSS_RETURN(type, __res)
     #undef _syscall0
@@ -1986,17 +2237,23 @@
         LSS_REG(0, arg1); LSS_BODY(type, name, "r"(__x0));                    \
       }
     #undef _syscall2
-    #define _syscall2(type, name, type1, arg1, type2, arg2)                   \
+    #define _syscall2_long(type, name, svc, type1, arg1, type2, arg2)         \
       type LSS_NAME(name)(type1 arg1, type2 arg2) {                           \
         LSS_REG(0, arg1); LSS_REG(1, arg2);                                   \
-        LSS_BODY(type, name, "r"(__x0), "r"(__x1));                           \
+        LSS_BODY(type, svc, "r"(__x0), "r"(__x1));                            \
       }
+    #define _syscall2(type, name, type1, arg1, type2, arg2)                   \
+            _syscall2_long(type, name, name, type1, arg1, type2, arg2)
     #undef _syscall3
-    #define _syscall3(type, name, type1, arg1, type2, arg2, type3, arg3)      \
+    #define _syscall3_long(type, name, svc, type1, arg1, type2, arg2,         \
+                           type3, arg3)                                       \
       type LSS_NAME(name)(type1 arg1, type2 arg2, type3 arg3) {               \
         LSS_REG(0, arg1); LSS_REG(1, arg2); LSS_REG(2, arg3);                 \
-        LSS_BODY(type, name, "r"(__x0), "r"(__x1), "r"(__x2));                \
+        LSS_BODY(type, svc, "r"(__x0), "r"(__x1), "r"(__x2));                 \
       }
+    #define _syscall3(type, name, type1, arg1, type2, arg2, type3, arg3)      \
+            _syscall3_long(type, name, name, type1, arg1, type2, arg2,        \
+                           type3, arg3)
     #undef _syscall4
     #define _syscall4(type,name,type1,arg1,type2,arg2,type3,arg3,type4,arg4)  \
       type LSS_NAME(name)(type1 arg1, type2 arg2, type3 arg3, type4 arg4) {   \
@@ -2015,15 +2272,19 @@
                              "r"(__x4));                                      \
       }
     #undef _syscall6
-    #define _syscall6(type,name,type1,arg1,type2,arg2,type3,arg3,type4,arg4,  \
-                      type5,arg5,type6,arg6)                                  \
+    #define _syscall6_long(type,name,svc,type1,arg1,type2,arg2,type3,arg3,    \
+                           type4,arg4,type5,arg5,type6,arg6)                  \
       type LSS_NAME(name)(type1 arg1, type2 arg2, type3 arg3, type4 arg4,     \
                           type5 arg5, type6 arg6) {                           \
         LSS_REG(0, arg1); LSS_REG(1, arg2); LSS_REG(2, arg3);                 \
         LSS_REG(3, arg4); LSS_REG(4, arg5); LSS_REG(5, arg6);                 \
-        LSS_BODY(type, name, "r"(__x0), "r"(__x1), "x"(__x2), "r"(__x3),      \
+        LSS_BODY(type, svc, "r"(__x0), "r"(__x1), "x"(__x2), "r"(__x3),       \
                              "r"(__x4), "r"(__x5));                           \
       }
+    #define _syscall6(type,name,type1,arg1,type2,arg2,type3,arg3,type4,arg4,  \
+                      type5,arg5,type6,arg6)                                  \
+            _syscall6_long(type,name,name,type1,arg1,type2,arg2,type3,arg3,   \
+                           type4,arg4,type5,arg5,type6,arg6)
     /* clone function adapted from glibc 2.18 clone.S                       */
     LSS_INLINE int LSS_NAME(clone)(int (*fn)(void *), void *child_stack,
                                    int flags, void *arg, int *parent_tidptr,
@@ -2079,20 +2340,252 @@
                                "r"(__fn), "r"(__stack), "r"(__flags), "r"(__arg),
                                "r"(__ptid), "r"(__tls), "r"(__ctid),
                                "i"(__NR_clone), "i"(__NR_exit)
-                             : "x30", "memory");
+                             : "x8", "x30", "memory");
       }
       LSS_RETURN(int, __res);
     }
+  #elif defined(__s390__)
+    #undef  LSS_REG
+    #define LSS_REG(r, a) register unsigned long __r##r __asm__("r"#r) = (unsigned long) a
+    #undef  LSS_BODY
+    #define LSS_BODY(type, name, args...)                                     \
+        register unsigned long __nr __asm__("r1")                             \
+            = (unsigned long)(__NR_##name);                                   \
+        register long __res_r2 __asm__("r2");                                 \
+        long __res;                                                           \
+        __asm__ __volatile__                                                  \
+            ("svc 0\n\t"                                                      \
+             : "=d"(__res_r2)                                                 \
+             : "d"(__nr), ## args                                             \
+             : "memory");                                                     \
+        __res = __res_r2;                                                     \
+        LSS_RETURN(type, __res)
+    #undef _syscall0
+    #define _syscall0(type, name)                                             \
+       type LSS_NAME(name)(void) {                                            \
+          LSS_BODY(type, name);                                               \
+       }
+    #undef _syscall1
+    #define _syscall1(type, name, type1, arg1)                                \
+       type LSS_NAME(name)(type1 arg1) {                                      \
+          LSS_REG(2, arg1);                                                   \
+          LSS_BODY(type, name, "0"(__r2));                                    \
+       }
+    #undef _syscall2
+    #define _syscall2(type, name, type1, arg1, type2, arg2)                   \
+       type LSS_NAME(name)(type1 arg1, type2 arg2) {                          \
+          LSS_REG(2, arg1); LSS_REG(3, arg2);                                 \
+          LSS_BODY(type, name, "0"(__r2), "d"(__r3));                         \
+       }
+    #undef _syscall3
+    #define _syscall3(type, name, type1, arg1, type2, arg2, type3, arg3)      \
+       type LSS_NAME(name)(type1 arg1, type2 arg2, type3 arg3) {              \
+          LSS_REG(2, arg1); LSS_REG(3, arg2); LSS_REG(4, arg3);               \
+          LSS_BODY(type, name, "0"(__r2), "d"(__r3), "d"(__r4));              \
+       }
+    #undef _syscall4
+    #define _syscall4(type, name, type1, arg1, type2, arg2, type3, arg3,      \
+                                  type4, arg4)                                \
+       type LSS_NAME(name)(type1 arg1, type2 arg2, type3 arg3,                \
+                           type4 arg4) {                                      \
+          LSS_REG(2, arg1); LSS_REG(3, arg2); LSS_REG(4, arg3);               \
+          LSS_REG(5, arg4);                                                   \
+          LSS_BODY(type, name, "0"(__r2), "d"(__r3), "d"(__r4),               \
+                               "d"(__r5));                                    \
+       }
+    #undef _syscall5
+    #define _syscall5(type, name, type1, arg1, type2, arg2, type3, arg3,      \
+                                  type4, arg4, type5, arg5)                   \
+       type LSS_NAME(name)(type1 arg1, type2 arg2, type3 arg3,                \
+                           type4 arg4, type5 arg5) {                          \
+          LSS_REG(2, arg1); LSS_REG(3, arg2); LSS_REG(4, arg3);               \
+          LSS_REG(5, arg4); LSS_REG(6, arg5);                                 \
+          LSS_BODY(type, name, "0"(__r2), "d"(__r3), "d"(__r4),               \
+                               "d"(__r5), "d"(__r6));                         \
+       }
+    #undef _syscall6
+    #define _syscall6(type, name, type1, arg1, type2, arg2, type3, arg3,      \
+                                  type4, arg4, type5, arg5, type6, arg6)      \
+       type LSS_NAME(name)(type1 arg1, type2 arg2, type3 arg3,                \
+                           type4 arg4, type5 arg5, type6 arg6) {              \
+          LSS_REG(2, arg1); LSS_REG(3, arg2); LSS_REG(4, arg3);               \
+          LSS_REG(5, arg4); LSS_REG(6, arg5); LSS_REG(7, arg6);               \
+          LSS_BODY(type, name, "0"(__r2), "d"(__r3), "d"(__r4),               \
+                               "d"(__r5), "d"(__r6), "d"(__r7));              \
+       }
+    LSS_INLINE int LSS_NAME(clone)(int (*fn)(void *), void *child_stack,
+                                   int flags, void *arg, int *parent_tidptr,
+                                   void *newtls, int *child_tidptr) {
+      long __ret;
+      {
+        register int  (*__fn)(void *)    __asm__ ("r1")  = fn;
+        register void  *__cstack         __asm__ ("r2")  = child_stack;
+        register int    __flags          __asm__ ("r3")  = flags;
+        register void  *__arg            __asm__ ("r0")  = arg;
+        register int   *__ptidptr        __asm__ ("r4")  = parent_tidptr;
+        register void  *__newtls         __asm__ ("r6")  = newtls;
+        register int   *__ctidptr        __asm__ ("r5")  = child_tidptr;
+        __asm__ __volatile__ (
+    #ifndef __s390x__
+                                  /* arg already in r0 */
+          "ltr %4, %4\n\t"        /* check fn, which is already in r1 */
+          "jz 1f\n\t"             /* NULL function pointer, return -EINVAL */
+          "ltr %5, %5\n\t"        /* check child_stack, which is already in r2 */
+          "jz 1f\n\t"             /* NULL stack pointer, return -EINVAL */
+                                  /* flags already in r3 */
+                                  /* parent_tidptr already in r4 */
+                                  /* child_tidptr already in r5 */
+                                  /* newtls already in r6 */
+          "svc %2\n\t"            /* invoke clone syscall */
+          "ltr %0,%%r2\n\t"       /* load return code into __ret and test */
+          "jnz 1f\n\t"            /* return to parent if non-zero */
+                                  /* start child thread */
+          "lr %%r2, %7\n\t"       /* set first parameter to void *arg */
+          "ahi %%r15, -96\n\t"    /* make room on the stack for the save area */
+          "xc 0(4,%%r15), 0(%%r15)\n\t"
+          "basr %%r14, %4\n\t"    /* jump to fn */
+          "svc %3\n"              /* invoke exit syscall */
+          "1:\n"
+    #else
+                                  /* arg already in r0 */
+          "ltgr %4, %4\n\t"       /* check fn, which is already in r1 */
+          "jz 1f\n\t"             /* NULL function pointer, return -EINVAL */
+          "ltgr %5, %5\n\t"       /* check child_stack, which is already in r2 */
+          "jz 1f\n\t"             /* NULL stack pointer, return -EINVAL */
+                                  /* flags already in r3 */
+                                  /* parent_tidptr already in r4 */
+                                  /* child_tidptr already in r5 */
+                                  /* newtls already in r6 */
+          "svc %2\n\t"            /* invoke clone syscall */
+          "ltgr %0, %%r2\n\t"     /* load return code into __ret and test */
+          "jnz 1f\n\t"            /* return to parent if non-zero */
+                                  /* start child thread */
+          "lgr %%r2, %7\n\t"      /* set first parameter to void *arg */
+          "aghi %%r15, -160\n\t"  /* make room on the stack for the save area */
+          "xc 0(8,%%r15), 0(%%r15)\n\t"
+          "basr %%r14, %4\n\t"    /* jump to fn */
+          "svc %3\n"              /* invoke exit syscall */
+          "1:\n"
+    #endif
+          : "=r" (__ret)
+          : "0" (-EINVAL), "i" (__NR_clone), "i" (__NR_exit),
+            "d" (__fn), "d" (__cstack), "d" (__flags), "d" (__arg),
+            "d" (__ptidptr), "d" (__newtls), "d" (__ctidptr)
+          : "cc", "r14", "memory"
+        );
+      }
+      LSS_RETURN(int, __ret);
+    }
+  #elif defined(__riscv)
+    #undef LSS_REG
+    #define LSS_REG(r,a) register long __a##r __asm__("a"#r) =       \
+                                 (long)(a)
+
+    #undef  LSS_BODY
+    #define LSS_BODY(type, name, args...)                                     \
+          register long __a7 __asm__("a7") = __NR_##name;                     \
+          long __res;                                                         \
+          __asm__ __volatile__ (                                              \
+                                "scall\n\t"                                   \
+                                : "+r" (__a0)                                 \
+                                : "r" (__a7), ##args                          \
+                                : "memory");                                  \
+          __res = __a0;                                                       \
+          LSS_RETURN(type, __res)
+    #undef _syscall0
+    #define _syscall0(type,name)                                              \
+      type LSS_NAME(name)() {                                                 \
+          register long __a7 __asm__("a7") = __NR_##name;                     \
+          register long __a0 __asm__("a0");                                   \
+          long __res;                                                         \
+          __asm__ __volatile__ (                                              \
+                                "scall\n\t"                                   \
+                                : "=r" (__a0)                                 \
+                                : "r" (__a7)                                  \
+                                : "memory");                                  \
+          __res = __a0;                                                       \
+          LSS_RETURN(type, __res);                                            \
+      }
+    #undef _syscall1
+    #define _syscall1(type, name, type1, arg1)                                \
+      type LSS_NAME(name)(type1 arg1) {                                       \
+        /* There is no need for using a volatile temp.  */                    \
+        LSS_REG(0, arg1);                                                     \
+        LSS_BODY(type, name);                                                 \
+      }
+    #undef _syscall2
+    #define _syscall2(type, name, type1, arg1, type2, arg2)                   \
+      type LSS_NAME(name)(type1 arg1, type2 arg2) {                           \
+        LSS_REG(0, arg1);                                                     \
+        LSS_REG(1, arg2);                                                     \
+        LSS_BODY(type, name, "r"(__a1));                                      \
+      }
+    #undef _syscall3
+    #define _syscall3(type, name, type1, arg1, type2, arg2, type3, arg3)      \
+      type LSS_NAME(name)(type1 arg1, type2 arg2, type3 arg3) {               \
+        LSS_REG(0, arg1);                                                     \
+        LSS_REG(1, arg2);                                                     \
+        LSS_REG(2, arg3);                                                     \
+        LSS_BODY(type, name, "r"(__a1), "r"(__a2));                           \
+      }
+    #undef _syscall4
+    #define _syscall4(type, name, type1, arg1, type2, arg2, type3, arg3,      \
+                      type4, arg4)                                            \
+      type LSS_NAME(name)(type1 arg1, type2 arg2, type3 arg3, type4 arg4) {   \
+        LSS_REG(0, arg1);                                                     \
+        LSS_REG(1, arg2);                                                     \
+        LSS_REG(2, arg3);                                                     \
+        LSS_REG(3, arg4);                                                     \
+        LSS_BODY(type, name, "r"(__a1), "r"(__a2), "r"(__a3));                \
+      }
+    #undef _syscall5
+    #define _syscall5(type,name,type1,arg1,type2,arg2,type3,arg3,type4,arg4,  \
+                      type5,arg5)                                             \
+      type LSS_NAME(name)(type1 arg1, type2 arg2, type3 arg3, type4 arg4,     \
+                          type5 arg5) {                                       \
+        LSS_REG(0, arg1);                                                     \
+        LSS_REG(1, arg2);                                                     \
+        LSS_REG(2, arg3);                                                     \
+        LSS_REG(3, arg4);                                                     \
+        LSS_REG(4, arg5);                                                     \
+        LSS_BODY(type, name, "r"(__a1), "r"(__a2), "r"(__a3), "r"(__a4));     \
+      }
+    #undef _syscall6
+    #define _syscall6(type,name,type1,arg1,type2,arg2,type3,arg3,type4,arg4,  \
+                      type5,arg5,type6,arg6)                                  \
+      type LSS_NAME(name)(type1 arg1, type2 arg2, type3 arg3, type4 arg4,     \
+                          type5 arg5, type6 arg6) {                           \
+        LSS_REG(0, arg1);                                                     \
+        LSS_REG(1, arg2);                                                     \
+        LSS_REG(2, arg3);                                                     \
+        LSS_REG(3, arg4);                                                     \
+        LSS_REG(4, arg5);                                                     \
+        LSS_REG(5, arg6);                                                     \
+        LSS_BODY(type, name, "r"(__a1), "r"(__a2), "r"(__a3), "r"(__a4),      \
+                             "r"(__a5));                                      \
+      }
   #endif
   #define __NR__exit   __NR_exit
   #define __NR__gettid __NR_gettid
   #define __NR__mremap __NR_mremap
   LSS_INLINE _syscall1(int,     close,           int,         f)
   LSS_INLINE _syscall1(int,     _exit,           int,         e)
+#if defined(__aarch64__) && defined (__ILP32__)
+  /* aarch64_ilp32 uses fcntl64 for sys_fcntl() */
+  LSS_INLINE _syscall3_long(int,     fcntl,      fcntl64,     int,         f,
+                       int,            c, long,   a)
+#else
   LSS_INLINE _syscall3(int,     fcntl,           int,         f,
                        int,            c, long,   a)
+#endif
+#if defined(__aarch64__) && defined (__ILP32__)
+  /* aarch64_ilp32 uses fstat64 for sys_fstat() */
+  LSS_INLINE _syscall2_long(int,     fstat,       fstat64,    int,         f,
+                      struct kernel_stat*,   b)
+#else
   LSS_INLINE _syscall2(int,     fstat,           int,         f,
                       struct kernel_stat*,   b)
+#endif
   LSS_INLINE _syscall6(int,     futex,           int*,        a,
                        int,            o, int,    v,
                       struct kernel_timespec*, t,
@@ -2120,6 +2613,10 @@
       _LSS_BODY(3, off_t, lseek, off_t, LSS_SYSCALL_ARG(f), (uint64_t)(o),
                                         LSS_SYSCALL_ARG(w));
     }
+  #elif defined(__aarch64__) && defined (__ILP32__)
+    /* aarch64_ilp32 uses llseek for sys_lseek() */
+    LSS_INLINE _syscall3_long(off_t,   lseek,       llseek,    int,         f,
+                         off_t,          o, int,    w)
   #else
     LSS_INLINE _syscall3(off_t,   lseek,           int,         f,
                          off_t,          o, int,    w)
@@ -2165,18 +2662,11 @@
     LSS_INLINE _syscall3(int, socket,             int,   d,
                          int,                     t, int,       p)
   #endif
-  #if defined(__x86_64__)
-    /* Need to make sure __off64_t isn't truncated to 32-bits under x32.  */
-    LSS_INLINE void* LSS_NAME(mmap)(void *s, size_t l, int p, int f, int d,
-                                    __off64_t o) {
-      LSS_BODY(6, void*, mmap, LSS_SYSCALL_ARG(s), LSS_SYSCALL_ARG(l),
-                               LSS_SYSCALL_ARG(p), LSS_SYSCALL_ARG(f),
-                               LSS_SYSCALL_ARG(d), (uint64_t)(o));
-    }
-
+  #if defined(__x86_64__) || defined(__s390x__)
     LSS_INLINE int LSS_NAME(sigaction)(int signum,
                                        const struct kernel_sigaction *act,
                                        struct kernel_sigaction *oldact) {
+      #if defined(__x86_64__)
       /* On x86_64, the kernel requires us to always set our own
        * SA_RESTORER in order to be able to return from a signal handler.
        * This function must have a "magic" signature that the "gdb"
@@ -2188,10 +2678,10 @@
         a.sa_restorer = LSS_NAME(restore_rt)();
         return LSS_NAME(rt_sigaction)(signum, &a, oldact,
                                       (KERNEL_NSIG+7)/8);
-      } else {
+      } else
+      #endif
         return LSS_NAME(rt_sigaction)(signum, act, oldact,
                                       (KERNEL_NSIG+7)/8);
-      }
     }
 
     LSS_INLINE int LSS_NAME(sigprocmask)(int how,
@@ -2201,11 +2691,8 @@
     }
   #endif
   #if (defined(__aarch64__)) || \
-      (defined(__mips__) && (_MIPS_ISA == _MIPS_ISA_MIPS64))
-    LSS_INLINE _syscall6(void*, mmap,              void*, s,
-                         size_t,                   l, int,               p,
-                         int,                      f, int,               d,
-                         __off64_t,                o)
+      (defined(__mips__) \
+       && (_MIPS_SIM == _MIPS_SIM_ABI64 || _MIPS_SIM == _MIPS_SIM_NABI32))
     LSS_INLINE int LSS_NAME(sigaction)(int signum,
                                        const struct kernel_sigaction *act,
                                        struct kernel_sigaction *oldact) {
@@ -2272,26 +2759,30 @@
     }
   }
 
-  #if defined(__i386__) || \
-      defined(__arm__) || \
-     (defined(__mips__) && _MIPS_SIM == _MIPS_SIM_ABI32) || defined(__PPC__)
+  #if defined(__i386__) ||                                                    \
+      defined(__arm__) ||                                                     \
+     (defined(__mips__) && _MIPS_SIM == _MIPS_SIM_ABI32) ||                   \
+      defined(__PPC__) ||                                                     \
+     (defined(__s390__) && !defined(__s390x__))
     #define __NR__sigaction   __NR_sigaction
     #define __NR__sigprocmask __NR_sigprocmask
     LSS_INLINE _syscall2(int, fstat64,             int, f,
                          struct kernel_stat64 *, b)
     LSS_INLINE _syscall5(int, _llseek,     uint, fd, ulong, hi, ulong, lo,
                          loff_t *, res, uint, wh)
-#ifdef __PPC64__
-    LSS_INLINE _syscall6(void*, mmap,              void*, s,
-                         size_t,                   l, int,               p,
-                         int,                      f, int,               d,
-                         off_t,                    o)
-#else
-    #ifndef __ARM_EABI__
-    /* Not available on ARM EABI Linux.  */
-    LSS_INLINE _syscall1(void*, mmap,              void*, a)
-    #endif
-    LSS_INLINE _syscall6(void*, mmap2,             void*, s,
+#if defined(__s390__) && !defined(__s390x__)
+    /* On s390, mmap2() arguments are passed in memory. */
+    LSS_INLINE void* LSS_NAME(_mmap2)(void *s, size_t l, int p, int f, int d,
+                                      off_t o) {
+      unsigned long buf[6] = { (unsigned long) s, (unsigned long) l,
+                               (unsigned long) p, (unsigned long) f,
+                               (unsigned long) d, (unsigned long) o };
+      LSS_REG(2, buf);
+      LSS_BODY(void*, mmap2, "0"(__r2));
+    }
+#elif !defined(__PPC64__)
+    #define __NR__mmap2 __NR_mmap2
+    LSS_INLINE _syscall6(void*, _mmap2,            void*, s,
                          size_t,                   l, int,               p,
                          int,                      f, int,               d,
                          off_t,                    o)
@@ -2384,16 +2875,58 @@
       return rc;
     }
   #endif
+  #if defined(__i386__) ||                                                    \
+      defined(__ARM_ARCH_3__) || defined(__ARM_EABI__) ||                     \
+     (defined(__mips__) && _MIPS_SIM == _MIPS_SIM_ABI32) ||                   \
+     (defined(__PPC__) && !defined(__PPC64__)) ||                             \
+     (defined(__s390__) && !defined(__s390x__))
+    /* On these architectures, implement mmap() with mmap2(). */
+    LSS_INLINE void* LSS_NAME(mmap)(void *s, size_t l, int p, int f, int d,
+                                    int64_t o) {
+      if (o % 4096) {
+        LSS_ERRNO = EINVAL;
+        return (void *) -1;
+      }
+      return LSS_NAME(_mmap2)(s, l, p, f, d, (o / 4096));
+    }
+  #elif defined(__s390x__)
+    /* On s390x, mmap() arguments are passed in memory. */
+    LSS_INLINE void* LSS_NAME(mmap)(void *s, size_t l, int p, int f, int d,
+                                    int64_t o) {
+      unsigned long buf[6] = { (unsigned long) s, (unsigned long) l,
+                               (unsigned long) p, (unsigned long) f,
+                               (unsigned long) d, (unsigned long) o };
+      LSS_REG(2, buf);
+      LSS_BODY(void*, mmap, "0"(__r2));
+    }
+  #elif defined(__x86_64__)
+    /* Need to make sure __off64_t isn't truncated to 32-bits under x32.  */
+    LSS_INLINE void* LSS_NAME(mmap)(void *s, size_t l, int p, int f, int d,
+                                    int64_t o) {
+      LSS_BODY(6, void*, mmap, LSS_SYSCALL_ARG(s), LSS_SYSCALL_ARG(l),
+                               LSS_SYSCALL_ARG(p), LSS_SYSCALL_ARG(f),
+                               LSS_SYSCALL_ARG(d), (uint64_t)(o));
+    }
+  #elif defined(__aarch64__) && defined (__ILP32__)
+    /* aarch64_ilp32 uses mmap2 for sys_mmap() */
+    LSS_INLINE _syscall6_long(void*, mmap, mmap2, void*, addr, size_t, length,
+                              int, prot, int, flags, int, fd, int64_t, offset)
+  #else
+    /* Remaining 64-bit architectures. */
+    LSS_INLINE _syscall6(void*, mmap, void*, addr, size_t, length, int, prot,
+                         int, flags, int, fd, int64_t, offset)
+  #endif
   #if defined(__i386__) || \
       defined(__PPC__) || \
       (defined(__arm__) && !defined(__ARM_EABI__)) || \
-      (defined(__mips__) && _MIPS_SIM == _MIPS_SIM_ABI32)
+      (defined(__mips__) && _MIPS_SIM == _MIPS_SIM_ABI32) || \
+      defined(__s390__)
 
     /* See sys_socketcall in net/socket.c in kernel source.
      * It de-multiplexes on its first arg and unpacks the arglist
      * array in its second arg.
      */
-    LSS_INLINE _syscall2(long, socketcall, int, c, unsigned long*, a)
+    LSS_INLINE _syscall2(int, socketcall, int, c, unsigned long*, a)
 
     LSS_INLINE int LSS_NAME(socket)(int domain, int type, int protocol) {
       unsigned long args[3] = {

diff --git a/src/base/linuxthreads.cc b/src/base/linuxthreads.cc
index 891e70c..1e7f137 100644
--- a/src/base/linuxthreads.cc
+++ b/src/base/linuxthreads.cc

@@ -333,7 +333,7 @@
     sa.sa_flags      = SA_ONSTACK|SA_SIGINFO|SA_RESETHAND;
     sys_sigaction(sync_signals[sig], &sa, (struct kernel_sigaction *)NULL);
   }
-  
+
   /* Read process directories in /proc/...                                   */
   for (;;) {
     /* Some kernels know about threads, and hide them in "/proc"
@@ -349,7 +349,7 @@
     }
     if (sys_fstat(proc, &proc_sb) < 0)
       goto failure;
-    
+
     /* Since we are suspending threads, we cannot call any libc
      * functions that might acquire locks. Most notably, we cannot
      * call malloc(). So, we have to allocate memory on the stack,
@@ -363,7 +363,7 @@
      */
     if (max_threads < proc_sb.st_nlink + 100)
       max_threads = proc_sb.st_nlink + 100;
-    
+
     /* scope */ {
       pid_t pids[max_threads];
       int   added_entries = 0;
@@ -395,11 +395,11 @@
           if (entry->d_ino != 0) {
             const char *ptr = entry->d_name;
             pid_t pid;
-            
+
             /* Some kernels hide threads by preceding the pid with a '.'     */
             if (*ptr == '.')
               ptr++;
-            
+
             /* If the directory is not numeric, it cannot be a
              * process/thread
              */
@@ -413,7 +413,7 @@
               char fname[entry->d_reclen + 48];
               strcat(strcat(strcpy(fname, "/proc/"),
                             entry->d_name), marker_path);
-              
+
               /* Check if the marker is identical to the one we created      */
               if (sys_stat(fname, &tmp_sb) >= 0 &&
                   marker_sb.st_ino == tmp_sb.st_ino) {
@@ -429,7 +429,7 @@
                     goto next_entry;
                   }
                 }
-                
+
                 /* Check whether data structure needs growing                */
                 if (num_threads >= max_threads) {
                   /* Back to square one, this time with more memory          */

diff --git a/src/base/linuxthreads.h b/src/base/linuxthreads.h
index 16bc8c6..a087628 100644
--- a/src/base/linuxthreads.h
+++ b/src/base/linuxthreads.h

@@ -1,3 +1,4 @@
+// -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2005-2007, Google Inc.
  * All rights reserved.
  *
@@ -37,11 +38,12 @@
 /* Include thread_lister.h to get the interface that we implement for linux.
  */
 
-/* We currently only support x86-32 and x86-64 on Linux. Porting to other
+/* We currently only support certain platforms on Linux. Porting to other
  * related platforms should not be difficult.
  */
-#if (defined(__i386__) || defined(__x86_64__) || defined(__ARM_ARCH_3__) || \
-     defined(__mips__) || defined(__PPC__) || defined(__aarch64__)) && defined(__linux)
+#if (defined(__i386__) || defined(__x86_64__) || defined(__arm__) || \
+     defined(__mips__) || defined(__PPC__) || defined(__aarch64__) ||       \
+     defined(__s390__)) && defined(__linux)
 
 /* Define the THREADS symbol to make sure that there is exactly one core dumper
  * built into the library.

diff --git a/src/base/logging.cc b/src/base/logging.cc
index 761c2fd..52d9bd3 100644
--- a/src/base/logging.cc
+++ b/src/base/logging.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2007, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/base/logging.h b/src/base/logging.h
index a1afe4d..94b9138 100644
--- a/src/base/logging.h
+++ b/src/base/logging.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2005, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/base/low_level_alloc.cc b/src/base/low_level_alloc.cc
index 4d2ae8d..db91155 100644
--- a/src/base/low_level_alloc.cc
+++ b/src/base/low_level_alloc.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2006, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -57,6 +57,9 @@
 
 // A first-fit allocator with amortized logarithmic free() time.
 
+LowLevelAlloc::PagesAllocator::~PagesAllocator() {
+}
+
 // ---------------------------------------------------------------------------
 static const int kMaxLevel = 30;
 
@@ -104,8 +107,7 @@
 
 // Return a random integer n:  p(n)=1/(2**n) if 1 <= n; p(n)=0 if n < 1.
 static int Random() {
-  static int32 r = 1;         // no locking---it's not critical
-  ANNOTATE_BENIGN_RACE(&r, "benign race, not critical.");
+  static uint32 r = 1;         // no locking---it's not critical
   int result = 1;
   while ((((r = r*1103515245 + 12345) >> 30) & 1) == 0) {
     result++;
@@ -196,6 +198,7 @@
                           // (init under mu, then ro)
   size_t min_size;        // smallest allocation block size
                           // (init under mu, then ro)
+  PagesAllocator *allocator;
 };
 
 // The default arena, which is used when 0 is passed instead of an Arena
@@ -208,6 +211,17 @@
 static struct LowLevelAlloc::Arena unhooked_arena;
 static struct LowLevelAlloc::Arena unhooked_async_sig_safe_arena;
 
+namespace {
+
+  class DefaultPagesAllocator : public LowLevelAlloc::PagesAllocator {
+  public:
+    virtual ~DefaultPagesAllocator() {};
+    virtual void *MapPages(int32 flags, size_t size);
+    virtual void UnMapPages(int32 flags, void *addr, size_t size);
+  };
+
+}
+
 // magic numbers to identify allocated and unallocated blocks
 static const intptr_t kMagicAllocated = 0x4c833e95;
 static const intptr_t kMagicUnallocated = ~kMagicAllocated;
@@ -234,7 +248,7 @@
       this->arena_->mu.Lock();
     }
     ~ArenaLock() { RAW_CHECK(this->left_, "haven't left Arena region"); }
-    void Leave() /*UNLOCK_FUNCTION()*/ {
+    void Leave() UNLOCK_FUNCTION() {
       this->arena_->mu.Unlock();
 #if 0
       if (this->mask_valid_) {
@@ -289,12 +303,20 @@
       arena->flags = 0;   // other arenas' flags may be overridden by client,
                           // but unhooked_arena will have 0 in 'flags'.
     }
+    arena->allocator = LowLevelAlloc::GetDefaultPagesAllocator();
   }
 }
 
 // L < meta_data_arena->mu
 LowLevelAlloc::Arena *LowLevelAlloc::NewArena(int32 flags,
                                               Arena *meta_data_arena) {
+  return NewArenaWithCustomAlloc(flags, meta_data_arena, NULL);
+}
+
+// L < meta_data_arena->mu
+LowLevelAlloc::Arena *LowLevelAlloc::NewArenaWithCustomAlloc(int32 flags,
+                                                             Arena *meta_data_arena,
+                                                             PagesAllocator *allocator) {
   RAW_CHECK(meta_data_arena != 0, "must pass a valid arena");
   if (meta_data_arena == &default_arena) {
     if ((flags & LowLevelAlloc::kAsyncSignalSafe) != 0) {
@@ -308,6 +330,9 @@
     new (AllocWithArena(sizeof (*result), meta_data_arena)) Arena(0);
   ArenaInit(result);
   result->flags = flags;
+  if (allocator) {
+    result->allocator = allocator;
+  }
   return result;
 }
 
@@ -458,15 +483,7 @@
       // mmap generous 64K chunks to decrease
       // the chances/impact of fragmentation:
       size_t new_pages_size = RoundUp(req_rnd, arena->pagesize * 16);
-      void *new_pages;
-      if ((arena->flags & LowLevelAlloc::kAsyncSignalSafe) != 0) {
-        new_pages = MallocHook::UnhookedMMap(0, new_pages_size,
-            PROT_WRITE|PROT_READ, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
-      } else {
-        new_pages = mmap(0, new_pages_size,
-            PROT_WRITE|PROT_READ, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
-      }
-      RAW_CHECK(new_pages != MAP_FAILED, "mmap error");
+      void *new_pages = arena->allocator->MapPages(arena->flags, new_pages_size);
       arena->mu.Lock();
       s = reinterpret_cast<AllocList *>(new_pages);
       s->header.size = new_pages_size;
@@ -493,7 +510,6 @@
     section.Leave();
     result = &s->levels;
   }
-  ANNOTATE_NEW_MEMORY(result, request);
   return result;
 }
 
@@ -521,3 +537,44 @@
 LowLevelAlloc::Arena *LowLevelAlloc::DefaultArena() {
   return &default_arena;
 }
+
+static DefaultPagesAllocator *default_pages_allocator;
+static union {
+  char chars[sizeof(DefaultPagesAllocator)];
+  void *ptr;
+} debug_pages_allocator_space;
+
+LowLevelAlloc::PagesAllocator *LowLevelAlloc::GetDefaultPagesAllocator(void) {
+  if (default_pages_allocator) {
+    return default_pages_allocator;
+  }
+  default_pages_allocator = new (debug_pages_allocator_space.chars) DefaultPagesAllocator();
+  return default_pages_allocator;
+}
+
+void *DefaultPagesAllocator::MapPages(int32 flags, size_t size) {
+  void *new_pages;
+  if ((flags & LowLevelAlloc::kAsyncSignalSafe) != 0) {
+    new_pages = MallocHook::UnhookedMMap(0, size,
+                                         PROT_WRITE|PROT_READ,
+                                         MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
+  } else {
+    new_pages = mmap(0, size,
+                     PROT_WRITE|PROT_READ,
+                     MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
+  }
+  RAW_CHECK(new_pages != MAP_FAILED, "mmap error");
+
+  return new_pages;
+}
+
+void DefaultPagesAllocator::UnMapPages(int32 flags, void *region, size_t size) {
+  int munmap_result;
+  if ((flags & LowLevelAlloc::kAsyncSignalSafe) == 0) {
+    munmap_result = munmap(region, size);
+  } else {
+    munmap_result = MallocHook::UnhookedMUnmap(region, size);
+  }
+  RAW_CHECK(munmap_result == 0,
+            "LowLevelAlloc::DeleteArena: munmap failed address");
+}

diff --git a/src/base/low_level_alloc.h b/src/base/low_level_alloc.h
index 4081ff8..406bfff 100644
--- a/src/base/low_level_alloc.h
+++ b/src/base/low_level_alloc.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2006, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -43,6 +43,15 @@
 
 class LowLevelAlloc {
  public:
+  class PagesAllocator {
+  public:
+    virtual ~PagesAllocator();
+    virtual void *MapPages(int32 flags, size_t size) = 0;
+    virtual void UnMapPages(int32 flags, void *addr, size_t size) = 0;
+  };
+
+  static PagesAllocator *GetDefaultPagesAllocator(void);
+
   struct Arena;       // an arena from which memory may be allocated
 
   // Returns a pointer to a block of at least "request" bytes
@@ -90,6 +99,10 @@
   };
   static Arena *NewArena(int32 flags, Arena *meta_data_arena);
 
+  // note: pages allocator will never be destroyed and allocated pages will never be freed
+  // When allocator is NULL, it's same as NewArena
+  static Arena *NewArenaWithCustomAlloc(int32 flags, Arena *meta_data_arena, PagesAllocator *allocator);
+
   // Destroys an arena allocated by NewArena and returns true,
   // provided no allocated blocks remain in the arena.
   // If allocated blocks remain in the arena, does nothing and
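Commentary on the hunks above (not part of the patch): the new `PagesAllocator` interface lets an arena obtain its backing pages from something other than a direct `mmap` call. Below is a minimal sketch of a custom implementation. It assumes a stand-in `PagesAllocator` class that mirrors the two pure-virtual methods declared above, with `int32_t` standing in for gperftools' `int32`. `CountingPagesAllocator` and `mapped_bytes()` are hypothetical names used only for illustration.

```cpp
#include <sys/mman.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Stand-in for LowLevelAlloc::PagesAllocator so the sketch compiles on its
// own; the real class is declared in src/base/low_level_alloc.h.
class PagesAllocator {
 public:
  virtual ~PagesAllocator() {}
  virtual void *MapPages(int32_t flags, size_t size) = 0;
  virtual void UnMapPages(int32_t flags, void *addr, size_t size) = 0;
};

// Hypothetical allocator that tracks how many bytes are currently mapped.
class CountingPagesAllocator : public PagesAllocator {
 public:
  void *MapPages(int32_t /*flags*/, size_t size) override {
    assert(size > 0);
    void *p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    // The caller in low_level_alloc.cc uses the result without checking it,
    // so fail hard here, as the default allocator's RAW_CHECK does.
    if (p == MAP_FAILED) std::abort();
    mapped_bytes_ += size;
    return p;
  }

  void UnMapPages(int32_t /*flags*/, void *addr, size_t size) override {
    if (munmap(addr, size) != 0) std::abort();
    mapped_bytes_ -= size;
  }

  size_t mapped_bytes() const { return mapped_bytes_; }

 private:
  size_t mapped_bytes_ = 0;  // not thread-safe; illustration only
};
```

With the real headers, an object like this would presumably be passed as the third argument of `LowLevelAlloc::NewArenaWithCustomAlloc(flags, meta_data_arena, allocator)`. Per the comment in the hunk, the allocator is never destroyed by the arena, so it should have static or otherwise process-long lifetime.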

diff --git a/src/base/simple_mutex.h b/src/base/simple_mutex.h
index a1886e4..5913b9e 100644
--- a/src/base/simple_mutex.h
+++ b/src/base/simple_mutex.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2007, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -27,7 +27,7 @@
 // THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 // (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 // OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-// 
+//
 // ---
 // Author: Craig Silverstein.
 //

diff --git a/src/base/spinlock.cc b/src/base/spinlock.cc
index 2021fec..d2a9e1c 100644
--- a/src/base/spinlock.cc
+++ b/src/base/spinlock.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2006, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -34,21 +34,14 @@
 
 #include <config.h>
 #include "base/spinlock.h"
-#include "base/synchronization_profiling.h"
 #include "base/spinlock_internal.h"
-#include "base/cycleclock.h"
-#include "base/sysinfo.h"   /* for NumCPUs() */
+#include "base/sysinfo.h"   /* for GetSystemCPUsCount() */
 
 // NOTE on the Lock-state values:
 //
-//   kSpinLockFree represents the unlocked state
-//   kSpinLockHeld represents the locked state with no waiters
-//
-// Values greater than kSpinLockHeld represent the locked state with waiters,
-// where the value is the time the current lock holder had to
-// wait before obtaining the lock.  The kSpinLockSleeper state is a special
-// "locked with waiters" state that indicates that a sleeper needs to
-// be woken, but the thread that just released the lock didn't wait.
+// kSpinLockFree represents the unlocked state
+// kSpinLockHeld represents the locked state with no waiters
+// kSpinLockSleeper represents the locked state with waiters
 
 static int adaptive_spin_count = 0;
 
@@ -60,7 +53,7 @@
   SpinLock_InitHelper() {
     // On multi-cpu machines, spin for longer before yielding
     // the processor or sleeping.  Reduces idle time significantly.
-    if (NumCPUs() > 1) {
+    if (GetSystemCPUsCount() > 1) {
       adaptive_spin_count = 1000;
     }
   }
@@ -71,35 +64,28 @@
 // but nothing lock-intensive should be going on at that time.
 static SpinLock_InitHelper init_helper;
 
+inline void SpinlockPause(void) {
+#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
+  __asm__ __volatile__("rep; nop" : : );
+#endif
+}
+
 }  // unnamed namespace
 
-// Monitor the lock to see if its value changes within some time period
-// (adaptive_spin_count loop iterations).  A timestamp indicating
-// when the thread initially started waiting for the lock is passed in via
-// the initial_wait_timestamp value.  The total wait time in cycles for the
-// lock is returned in the wait_cycles parameter.  The last value read
+// Monitor the lock to see if its value changes within some time
+// period (adaptive_spin_count loop iterations). The last value read
 // from the lock is returned from the method.
-Atomic32 SpinLock::SpinLoop(int64 initial_wait_timestamp,
-                            Atomic32* wait_cycles) {
+Atomic32 SpinLock::SpinLoop() {
   int c = adaptive_spin_count;
   while (base::subtle::NoBarrier_Load(&lockword_) != kSpinLockFree && --c > 0) {
+    SpinlockPause();
   }
-  Atomic32 spin_loop_wait_cycles = CalculateWaitCycles(initial_wait_timestamp);
-  Atomic32 lock_value =
-      base::subtle::Acquire_CompareAndSwap(&lockword_, kSpinLockFree,
-                                           spin_loop_wait_cycles);
-  *wait_cycles = spin_loop_wait_cycles;
-  return lock_value;
+  return base::subtle::Acquire_CompareAndSwap(&lockword_, kSpinLockFree,
+                                              kSpinLockSleeper);
 }
 
 void SpinLock::SlowLock() {
-  // The lock was not obtained initially, so this thread needs to wait for
-  // it.  Record the current timestamp in the local variable wait_start_time
-  // so the total wait time can be stored in the lockword once this thread
-  // obtains the lock.
-  int64 wait_start_time = CycleClock::Now();
-  Atomic32 wait_cycles;
-  Atomic32 lock_value = SpinLoop(wait_start_time, &wait_cycles);
+  Atomic32 lock_value = SpinLoop();
 
   int lock_wait_call_count = 0;
   while (lock_value != kSpinLockFree) {
@@ -114,7 +100,7 @@
                                                         kSpinLockSleeper);
       if (lock_value == kSpinLockHeld) {
         // Successfully transitioned to kSpinLockSleeper.  Pass
-        // kSpinLockSleeper to the SpinLockWait routine to properly indicate
+        // kSpinLockSleeper to the SpinLockDelay routine to properly indicate
         // the last lock_value observed.
         lock_value = kSpinLockSleeper;
       } else if (lock_value == kSpinLockFree) {
@@ -123,7 +109,7 @@
         // this thread obtains the lock.
         lock_value = base::subtle::Acquire_CompareAndSwap(&lockword_,
                                                           kSpinLockFree,
-                                                          wait_cycles);
+                                                          kSpinLockSleeper);
         continue;  // skip the delay at the end of the loop
       }
     }
@@ -133,51 +119,11 @@
                                   ++lock_wait_call_count);
     // Spin again after returning from the wait routine to give this thread
     // some chance of obtaining the lock.
-    lock_value = SpinLoop(wait_start_time, &wait_cycles);
+    lock_value = SpinLoop();
   }
 }
 
-// The wait time for contentionz lock profiling must fit into 32 bits.
-// However, the lower 32-bits of the cycle counter wrap around too quickly
-// with high frequency processors, so a right-shift by 7 is performed to
-// quickly divide the cycles by 128.  Using these 32 bits, reduces the
-// granularity of time measurement to 128 cycles, and loses track
-// of wait time for waits greater than 109 seconds on a 5 GHz machine
-// [(2^32 cycles/5 Ghz)*128 = 109.95 seconds]. Waits this long should be
-// very rare and the reduced granularity should not be an issue given
-// processors in the Google fleet operate at a minimum of one billion
-// cycles/sec.
-enum { PROFILE_TIMESTAMP_SHIFT = 7 };
-
-void SpinLock::SlowUnlock(uint64 wait_cycles) {
-  base::internal::SpinLockWake(&lockword_, false);  // wake waiter if necessary
-
-  // Collect contentionz profile info, expanding the wait_cycles back out to
-  // the full value.  If wait_cycles is <= kSpinLockSleeper, then no wait
-  // was actually performed, so don't record the wait time.  Note, that the
-  // CalculateWaitCycles method adds in kSpinLockSleeper cycles
-  // unconditionally to guarantee the wait time is not kSpinLockFree or
-  // kSpinLockHeld.  The adding in of these small number of cycles may
-  // overestimate the contention by a slight amount 50% of the time.  However,
-  // if this code tried to correct for that addition by subtracting out the
-  // kSpinLockSleeper amount that would underestimate the contention slightly
-  // 50% of the time.  Both ways get the wrong answer, so the code
-  // overestimates to be more conservative. Overestimating also makes the code
-  // a little simpler.
-  //
-  if (wait_cycles > kSpinLockSleeper) {
-    base::SubmitSpinLockProfileData(this,
-                                    wait_cycles << PROFILE_TIMESTAMP_SHIFT);
-  }
-}
-
-inline int32 SpinLock::CalculateWaitCycles(int64 wait_start_time) {
-  int32 wait_cycles = ((CycleClock::Now() - wait_start_time) >>
-                       PROFILE_TIMESTAMP_SHIFT);
-  // The number of cycles waiting for the lock is used as both the
-  // wait_cycles and lock value, so it can't be kSpinLockFree or
-  // kSpinLockHeld.  Make sure the value returned is at least
-  // kSpinLockSleeper.
-  wait_cycles |= kSpinLockSleeper;
-  return wait_cycles;
+void SpinLock::SlowUnlock() {
+  // wake waiter if necessary
+  base::internal::SpinLockWake(&lockword_, false);
 }
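Commentary on the spinlock.cc changes above (not part of the patch): with contention profiling removed, the lock word no longer encodes wait cycles and only takes three values. The sketch below illustrates that simplified state machine using `std::atomic`. It is not the gperftools implementation: `TinySpinLock` is a hypothetical name, the numeric values 0/1/2 for the three states are assumptions (the diff does not show them), and `std::this_thread::yield()` stands in for `SpinLockDelay()`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Illustrative three-state lock; not the gperftools SpinLock.
class TinySpinLock {
 public:
  void Lock() {
    int32_t expected = kFree;
    if (!word_.compare_exchange_strong(expected, kHeld,
                                       std::memory_order_acquire)) {
      SlowLock();
    }
  }

  bool TryLock() {
    int32_t expected = kFree;
    return word_.compare_exchange_strong(expected, kHeld,
                                         std::memory_order_acquire);
  }

  void Unlock() {
    int32_t prev = word_.exchange(kFree, std::memory_order_release);
    assert(prev != kFree);  // unlocking a free lock is a caller bug
    if (prev != kHeld) {
      // Contended: the patched SlowUnlock() calls SpinLockWake() here.
      // This sketch has no sleepers to wake, so it only counts the event.
      ++contended_unlocks_;
    }
  }

  int contended_unlocks() const { return contended_unlocks_.load(); }

 private:
  // Assumed values for the free / held / held-with-waiters states.
  static constexpr int32_t kFree = 0;
  static constexpr int32_t kHeld = 1;
  static constexpr int32_t kSleeper = 2;

  void SlowLock() {
    for (;;) {
      int32_t seen = kFree;
      // Like the new SpinLoop(): a thread that had to wait takes the lock
      // in the "with waiters" state, since others may still be waiting.
      if (word_.compare_exchange_strong(seen, kSleeper,
                                        std::memory_order_acquire)) {
        return;
      }
      if (seen == kHeld) {
        // Announce a waiter so the holder takes the slow unlock path.
        word_.compare_exchange_strong(seen, kSleeper,
                                      std::memory_order_relaxed);
      }
      std::this_thread::yield();  // stand-in for SpinLockDelay()
    }
  }

  std::atomic<int32_t> word_{kFree};
  std::atomic<int> contended_unlocks_{0};
};
```

The uncontended path stays a single compare-and-swap on lock and a single exchange on unlock; only a previous value other than "held" sends `Unlock()` down the slow path, which matches the `prev_value != kSpinLockHeld` test in the patched header.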

diff --git a/src/base/spinlock.h b/src/base/spinlock.h
index 033a75e..118f541 100644
--- a/src/base/spinlock.h
+++ b/src/base/spinlock.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2006, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -63,14 +63,11 @@
   }
 
   // Acquire this SpinLock.
-  // TODO(csilvers): uncomment the annotation when we figure out how to
-  //                 support this macro with 0 args (see thread_annotations.h)
-  inline void Lock() /*EXCLUSIVE_LOCK_FUNCTION()*/ {
+  inline void Lock() EXCLUSIVE_LOCK_FUNCTION() {
     if (base::subtle::Acquire_CompareAndSwap(&lockword_, kSpinLockFree,
                                              kSpinLockHeld) != kSpinLockFree) {
       SlowLock();
     }
-    ANNOTATE_RWLOCK_ACQUIRED(this, 1);
   }
 
   // Try to acquire this SpinLock without blocking and return true if the
@@ -81,24 +78,16 @@
     bool res =
         (base::subtle::Acquire_CompareAndSwap(&lockword_, kSpinLockFree,
                                               kSpinLockHeld) == kSpinLockFree);
-    if (res) {
-      ANNOTATE_RWLOCK_ACQUIRED(this, 1);
-    }
     return res;
   }
 
   // Release this SpinLock, which must be held by the calling thread.
-  // TODO(csilvers): uncomment the annotation when we figure out how to
-  //                 support this macro with 0 args (see thread_annotations.h)
-  inline void Unlock() /*UNLOCK_FUNCTION()*/ {
-    ANNOTATE_RWLOCK_RELEASED(this, 1);
-    uint64 wait_cycles = static_cast<uint64>(
+  inline void Unlock() UNLOCK_FUNCTION() {
+    uint64 prev_value = static_cast<uint64>(
         base::subtle::Release_AtomicExchange(&lockword_, kSpinLockFree));
-    if (wait_cycles != kSpinLockHeld) {
-      // Collect contentionz profile info, and speed the wakeup of any waiter.
-      // The wait_cycles value indicates how long this thread spent waiting
-      // for the lock.
-      SlowUnlock(wait_cycles);
+    if (prev_value != kSpinLockHeld) {
+      // Speed the wakeup of any waiter.
+      SlowUnlock();
     }
   }
 
@@ -118,9 +107,8 @@
   volatile Atomic32 lockword_;
 
   void SlowLock();
-  void SlowUnlock(uint64 wait_cycles);
-  Atomic32 SpinLoop(int64 initial_wait_timestamp, Atomic32* wait_cycles);
-  inline int32 CalculateWaitCycles(int64 wait_start_time);
+  void SlowUnlock();
+  Atomic32 SpinLoop();
 
   DISALLOW_COPY_AND_ASSIGN(SpinLock);
 };
@@ -135,9 +123,7 @@
       : lock_(l) {
     l->Lock();
   }
-  // TODO(csilvers): uncomment the annotation when we figure out how to
-  //                 support this macro with 0 args (see thread_annotations.h)
-  inline ~SpinLockHolder() /*UNLOCK_FUNCTION()*/ { lock_->Unlock(); }
+  inline ~SpinLockHolder() UNLOCK_FUNCTION() { lock_->Unlock(); }
 };
 // Catch bug where variable name is omitted, e.g. SpinLockHolder (&lock);
 #define SpinLockHolder(x) COMPILE_ASSERT(0, spin_lock_decl_missing_var_name)

diff --git a/src/base/spinlock_internal.cc b/src/base/spinlock_internal.cc
index e090f9b..d962971 100644
--- a/src/base/spinlock_internal.cc
+++ b/src/base/spinlock_internal.cc

@@ -57,26 +57,6 @@
 namespace base {
 namespace internal {
 
-// See spinlock_internal.h for spec.
-int32 SpinLockWait(volatile Atomic32 *w, int n,
-                   const SpinLockWaitTransition trans[]) {
-  int32 v;
-  bool done = false;
-  for (int loop = 0; !done; loop++) {
-    v = base::subtle::Acquire_Load(w);
-    int i;
-    for (i = 0; i != n && v != trans[i].from; i++) {
-    }
-    if (i == n) {
-      SpinLockDelay(w, v, loop);     // no matching transition
-    } else if (trans[i].to == v ||   // null transition
-               base::subtle::Acquire_CompareAndSwap(w, v, trans[i].to) == v) {
-      done = trans[i].done;
-    }
-  }
-  return v;
-}
-
 // Return a suggested delay in nanoseconds for iteration number "loop"
 static int SuggestedDelayNS(int loop) {
   // Weak pseudo-random number generator to get some spread between threads

diff --git a/src/base/spinlock_internal.h b/src/base/spinlock_internal.h
index 4d3c17f..aa47e67 100644
--- a/src/base/spinlock_internal.h
+++ b/src/base/spinlock_internal.h

@@ -43,20 +43,6 @@
 namespace base {
 namespace internal {
 
-// SpinLockWait() waits until it can perform one of several transitions from
-// "from" to "to".  It returns when it performs a transition where done==true.
-struct SpinLockWaitTransition {
-  int32 from;
-  int32 to;
-  bool done;
-};
-
-// Wait until *w can transition from trans[i].from to trans[i].to for some i
-// satisfying 0<=i<n && trans[i].done, atomically make the transition,
-// then return the old value of *w.   Make any other atomic tranistions
-// where !trans[i].done, but continue waiting.
-int32 SpinLockWait(volatile Atomic32 *w, int n,
-                   const SpinLockWaitTransition trans[]);
 void SpinLockWake(volatile Atomic32 *w, bool all);
 void SpinLockDelay(volatile Atomic32 *w, int32 value, int loop);
 

diff --git a/src/base/spinlock_linux-inl.h b/src/base/spinlock_linux-inl.h
index aadf62a..ad48aae 100644
--- a/src/base/spinlock_linux-inl.h
+++ b/src/base/spinlock_linux-inl.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2009, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -33,15 +33,19 @@
  */
 
 #include <errno.h>
-#include <sched.h>
-#include <time.h>
 #include <limits.h>
-#include "base/linux_syscall_support.h"
+#include <sched.h>
+#include <sys/syscall.h>
+#include <time.h>
+#include <unistd.h>
 
 #define FUTEX_WAIT 0
 #define FUTEX_WAKE 1
 #define FUTEX_PRIVATE_FLAG 128
 
+// Note: Instead of making direct system calls that are inlined, we rely
+//       on the syscall() function in glibc to do the right thing.
+
 static bool have_futex;
 static int futex_private_flag = FUTEX_PRIVATE_FLAG;
 
@@ -51,10 +55,10 @@
     int x = 0;
     // futexes are ints, so we can use them only when
     // that's the same size as the lockword_ in SpinLock.
-    have_futex = (sizeof (Atomic32) == sizeof (int) &&
-                  sys_futex(&x, FUTEX_WAKE, 1, NULL, NULL, 0) >= 0);
-    if (have_futex &&
-        sys_futex(&x, FUTEX_WAKE | futex_private_flag, 1, NULL, NULL, 0) < 0) {
+    have_futex = (sizeof(Atomic32) == sizeof(int) &&
+                  syscall(__NR_futex, &x, FUTEX_WAKE, 1, NULL, NULL, 0) >= 0);
+    if (have_futex && syscall(__NR_futex, &x, FUTEX_WAKE | futex_private_flag,
+                              1, NULL, NULL, 0) < 0) {
       futex_private_flag = 0;
     }
   }
@@ -78,10 +82,9 @@
     }
     if (have_futex) {
       tm.tv_nsec *= 16;  // increase the delay; we expect explicit wakeups
-      sys_futex(reinterpret_cast<int *>(const_cast<Atomic32 *>(w)),
-                FUTEX_WAIT | futex_private_flag,
-                value, reinterpret_cast<struct kernel_timespec *>(&tm),
-                NULL, 0);
+      syscall(__NR_futex, reinterpret_cast<int*>(const_cast<Atomic32*>(w)),
+              FUTEX_WAIT | futex_private_flag, value,
+              reinterpret_cast<struct kernel_timespec*>(&tm), NULL, 0);
     } else {
       nanosleep(&tm, NULL);
     }
@@ -91,9 +94,8 @@
 
 void SpinLockWake(volatile Atomic32 *w, bool all) {
   if (have_futex) {
-    sys_futex(reinterpret_cast<int *>(const_cast<Atomic32 *>(w)),
-              FUTEX_WAKE | futex_private_flag, all? INT_MAX : 1,
-              NULL, NULL, 0);
+    syscall(__NR_futex, reinterpret_cast<int*>(const_cast<Atomic32*>(w)),
+            FUTEX_WAKE | futex_private_flag, all ? INT_MAX : 1, NULL, NULL, 0);
   }
 }
 

diff --git a/src/base/spinlock_posix-inl.h b/src/base/spinlock_posix-inl.h
index e73a30f..f4d217b 100644
--- a/src/base/spinlock_posix-inl.h
+++ b/src/base/spinlock_posix-inl.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2009, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/base/spinlock_win32-inl.h b/src/base/spinlock_win32-inl.h
index 956b965..05caa54 100644
--- a/src/base/spinlock_win32-inl.h
+++ b/src/base/spinlock_win32-inl.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2009, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/base/stl_allocator.h b/src/base/stl_allocator.h
index 2345f46..94debe8 100644
--- a/src/base/stl_allocator.h
+++ b/src/base/stl_allocator.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2006, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/base/synchronization_profiling.h b/src/base/synchronization_profiling.h
deleted file mode 100644
index b495034..0000000
--- a/src/base/synchronization_profiling.h
+++ /dev/null

@@ -1,51 +0,0 @@
-// -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
-/* Copyright (c) 2010, Google Inc.
- * All rights reserved.
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions are
- * met:
- *
- *     * Redistributions of source code must retain the above copyright
- * notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above
- * copyright notice, this list of conditions and the following disclaimer
- * in the documentation and/or other materials provided with the
- * distribution.
- *     * Neither the name of Google Inc. nor the names of its
- * contributors may be used to endorse or promote products derived from
- * this software without specific prior written permission.
- *
- * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- *
- * ---
- * Author: Chris Ruemmler
- */
-
-#ifndef BASE_AUXILIARY_SYNCHRONIZATION_PROFILING_H_
-#define BASE_AUXILIARY_SYNCHRONIZATION_PROFILING_H_
-
-#include "base/basictypes.h"
-
-namespace base {
-
-// We can do contention-profiling of SpinLocks, but the code is in
-// mutex.cc, which is not always linked in with spinlock.  Hence we
-// provide a weak definition, which are used if mutex.cc isn't linked in.
-
-// Submit the number of cycles the spinlock spent contending.
-ATTRIBUTE_WEAK extern void SubmitSpinLockProfileData(const void *, int64);
-extern void SubmitSpinLockProfileData(const void *contendedlock,
-                                      int64 wait_cycles) {}
-}
-#endif  // BASE_AUXILIARY_SYNCHRONIZATION_PROFILING_H_

diff --git a/src/base/sysinfo.cc b/src/base/sysinfo.cc
index cad751b..669efaf 100644
--- a/src/base/sysinfo.cc
+++ b/src/base/sysinfo.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2006, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -60,7 +60,6 @@
 #include "base/commandlineflags.h"
 #include "base/dynamic_annotations.h"   // for RunningOnValgrind
 #include "base/logging.h"
-#include "base/cycleclock.h"
 
 #ifdef PLATFORM_WINDOWS
 #ifdef MODULEENTRY32
@@ -111,6 +110,40 @@
 //    Some non-trivial getenv-related functions.
 // ----------------------------------------------------------------------
 
+// We reimplement memcmp and friends to avoid depending on any glibc
+// calls too early in the process lifetime. This allows us to use
+// GetenvBeforeMain from inside an ifunc handler.
+static int slow_memcmp(const void *_a, const void *_b, size_t n) {
+  const uint8_t *a = reinterpret_cast<const uint8_t *>(_a);
+  const uint8_t *b = reinterpret_cast<const uint8_t *>(_b);
+  while (n-- != 0) {
+    uint8_t ac = *a++;
+    uint8_t bc = *b++;
+    if (ac != bc) {
+      if (ac < bc) {
+        return -1;
+      }
+      return 1;
+    }
+  }
+  return 0;
+}
+
+static const char *slow_memchr(const char *s, int c, size_t n) {
+  uint8_t ch = static_cast<uint8_t>(c);
+  while (n--) {
+    if (static_cast<uint8_t>(*s++) == ch) {
+      return s - 1;
+    }
+  }
+  return 0;
+}
+
+static size_t slow_strlen(const char *s) {
+  const char *s2 = slow_memchr(s, '\0', static_cast<size_t>(-1));
+  return s2 - s;
+}
+
 // It's not safe to call getenv() in the malloc hooks, because they
 // might be called extremely early, before libc is done setting up
 // correctly.  In particular, the thread library may not be done
@@ -120,15 +153,12 @@
 // /proc/self/environ has a limit of how much data it exports (around
 // 8K), so it's not an ideal solution.
 const char* GetenvBeforeMain(const char* name) {
+  const int namelen = slow_strlen(name);
 #if defined(HAVE___ENVIRON)   // if we have it, it's declared in unistd.h
   if (__environ) {            // can exist but be NULL, if statically linked
-    const int namelen = strlen(name);
     for (char** p = __environ; *p; p++) {
-      if (strlen(*p) < namelen) {
-        continue;
-      }
-      if (!memcmp(*p, name, namelen) && (*p)[namelen] == '=')  // it's a match
-        return *p + namelen+1;                                 // point after =
+      if (!slow_memcmp(*p, name, namelen) && (*p)[namelen] == '=')
+        return *p + namelen+1;
     }
     return NULL;
   }
@@ -156,14 +186,14 @@
     }
     safeclose(fd);
   }
-  const int namelen = strlen(name);
   const char* p = envbuf;
   while (*p != '\0') {    // will happen at the \0\0 that terminates the buffer
     // proc file has the format NAME=value\0NAME=value\0NAME=value\0...
-    const char* endp = (char*)memchr(p, '\0', sizeof(envbuf) - (p - envbuf));
+    const char* endp = (char*)slow_memchr(p, '\0',
+                                          sizeof(envbuf) - (p - envbuf));
     if (endp == NULL)            // this entry isn't NUL terminated
       return NULL;
-    else if (!memcmp(p, name, namelen) && p[namelen] == '=')    // it's a match
+    else if (!slow_memcmp(p, name, namelen) && p[namelen] == '=')    // it's a match
       return p + namelen+1;      // point after =
     p = endp + 1;
   }
@@ -212,327 +242,22 @@
   return true;
 }
 
-// ----------------------------------------------------------------------
-// CyclesPerSecond()
-// NumCPUs()
-//    It's important this not call malloc! -- they may be called at
-//    global-construct time, before we've set up all our proper malloc
-//    hooks and such.
-// ----------------------------------------------------------------------
-
-static double cpuinfo_cycles_per_second = 1.0;  // 0.0 might be dangerous
-static int cpuinfo_num_cpus = 1;  // Conservative guess
-
-void SleepForMilliseconds(int milliseconds) {
-#ifdef PLATFORM_WINDOWS
-  _sleep(milliseconds);   // Windows's _sleep takes milliseconds argument
-#else
-  // Sleep for a few milliseconds
-  struct timespec sleep_time;
-  sleep_time.tv_sec = milliseconds / 1000;
-  sleep_time.tv_nsec = (milliseconds % 1000) * 1000000;
-  while (nanosleep(&sleep_time, &sleep_time) != 0 && errno == EINTR)
-    ;  // Ignore signals and wait for the full interval to elapse.
-#endif
-}
-
-// Helper function estimates cycles/sec by observing cycles elapsed during
-// sleep(). Using small sleep time decreases accuracy significantly.
-static int64 EstimateCyclesPerSecond(const int estimate_time_ms) {
-  assert(estimate_time_ms > 0);
-  if (estimate_time_ms <= 0)
-    return 1;
-  double multiplier = 1000.0 / (double)estimate_time_ms;  // scale by this much
-
-  const int64 start_ticks = CycleClock::Now();
-  SleepForMilliseconds(estimate_time_ms);
-  const int64 guess = int64(multiplier * (CycleClock::Now() - start_ticks));
-  return guess;
-}
-
-// ReadIntFromFile is only called on linux and cygwin platforms.
-#if defined(__linux__) || defined(__CYGWIN__) || defined(__CYGWIN32__)
-// Helper function for reading an int from a file. Returns true if successful
-// and the memory location pointed to by value is set to the value read.
-static bool ReadIntFromFile(const char *file, int *value) {
-  bool ret = false;
-  int fd = open(file, O_RDONLY);
-  if (fd != -1) {
-    char line[1024];
-    char* err;
-    memset(line, '\0', sizeof(line));
-    read(fd, line, sizeof(line) - 1);
-    const int temp_value = strtol(line, &err, 10);
-    if (line[0] != '\0' && (*err == '\n' || *err == '\0')) {
-      *value = temp_value;
-      ret = true;
-    }
-    close(fd);
-  }
-  return ret;
-}
-#endif
-
-// WARNING: logging calls back to InitializeSystemInfo() so it must
-// not invoke any logging code.  Also, InitializeSystemInfo() can be
-// called before main() -- in fact it *must* be since already_called
-// isn't protected -- before malloc hooks are properly set up, so
-// we make an effort not to call any routines which might allocate
-// memory.
-
-static void InitializeSystemInfo() {
-  static bool already_called = false;   // safe if we run before threads
-  if (already_called)  return;
-  already_called = true;
-
-  bool saw_mhz = false;
-
-  if (RunningOnValgrind()) {
-    // Valgrind may slow the progress of time artificially (--scale-time=N
-    // option). We thus can't rely on CPU Mhz info stored in /sys or /proc
-    // files. Thus, actually measure the cps.
-    cpuinfo_cycles_per_second = EstimateCyclesPerSecond(100);
-    saw_mhz = true;
-  }
-
-#if defined(__linux__) || defined(__CYGWIN__) || defined(__CYGWIN32__)
-  char line[1024];
-  char* err;
-  int freq;
-
-  // If the kernel is exporting the tsc frequency use that. There are issues
-  // where cpuinfo_max_freq cannot be relied on because the BIOS may be
-  // exporintg an invalid p-state (on x86) or p-states may be used to put the
-  // processor in a new mode (turbo mode). Essentially, those frequencies
-  // cannot always be relied upon. The same reasons apply to /proc/cpuinfo as
-  // well.
-  if (!saw_mhz &&
-      ReadIntFromFile("/sys/devices/system/cpu/cpu0/tsc_freq_khz", &freq)) {
-      // The value is in kHz (as the file name suggests).  For example, on a
-      // 2GHz warpstation, the file contains the value "2000000".
-      cpuinfo_cycles_per_second = freq * 1000.0;
-      saw_mhz = true;
-  }
-
-  // If CPU scaling is in effect, we want to use the *maximum* frequency,
-  // not whatever CPU speed some random processor happens to be using now.
-  if (!saw_mhz &&
-      ReadIntFromFile("/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq",
-                      &freq)) {
-    // The value is in kHz.  For example, on a 2GHz machine, the file
-    // contains the value "2000000".
-    cpuinfo_cycles_per_second = freq * 1000.0;
-    saw_mhz = true;
-  }
-
-  // Read /proc/cpuinfo for other values, and if there is no cpuinfo_max_freq.
-  const char* pname = "/proc/cpuinfo";
-  int fd = open(pname, O_RDONLY);
-  if (fd == -1) {
-    perror(pname);
-    if (!saw_mhz) {
-      cpuinfo_cycles_per_second = EstimateCyclesPerSecond(1000);
-    }
-    return;          // TODO: use generic tester instead?
-  }
-
-  double bogo_clock = 1.0;
-  bool saw_bogo = false;
-  int num_cpus = 0;
-  line[0] = line[1] = '\0';
-  int chars_read = 0;
-  do {   // we'll exit when the last read didn't read anything
-    // Move the next line to the beginning of the buffer
-    const int oldlinelen = strlen(line);
-    if (sizeof(line) == oldlinelen + 1)    // oldlinelen took up entire line
-      line[0] = '\0';
-    else                                   // still other lines left to save
-      memmove(line, line + oldlinelen+1, sizeof(line) - (oldlinelen+1));
-    // Terminate the new line, reading more if we can't find the newline
-    char* newline = strchr(line, '\n');
-    if (newline == NULL) {
-      const int linelen = strlen(line);
-      const int bytes_to_read = sizeof(line)-1 - linelen;
-      assert(bytes_to_read > 0);  // because the memmove recovered >=1 bytes
-      chars_read = read(fd, line + linelen, bytes_to_read);
-      line[linelen + chars_read] = '\0';
-      newline = strchr(line, '\n');
-    }
-    if (newline != NULL)
-      *newline = '\0';
-
-#if defined(__powerpc__) || defined(__ppc__)
-    // PowerPC cpus report the frequency in "clock" line
-    if (strncasecmp(line, "clock", sizeof("clock")-1) == 0) {
-      const char* freqstr = strchr(line, ':');
-      if (freqstr) {
-	// PowerPC frequencies are only reported as MHz (check 'show_cpuinfo'
-	// function at arch/powerpc/kernel/setup-common.c)
-	char *endp = strstr(line, "MHz");
-	if (endp) {
-	  *endp = 0;
-	  cpuinfo_cycles_per_second = strtod(freqstr+1, &err) * 1000000.0;
-          if (freqstr[1] != '\0' && *err == '\0' && cpuinfo_cycles_per_second > 0)
-            saw_mhz = true;
-	}
-      }
-#else
-    // When parsing the "cpu MHz" and "bogomips" (fallback) entries, we only
-    // accept postive values. Some environments (virtual machines) report zero,
-    // which would cause infinite looping in WallTime_Init.
-    if (!saw_mhz && strncasecmp(line, "cpu MHz", sizeof("cpu MHz")-1) == 0) {
-      const char* freqstr = strchr(line, ':');
-      if (freqstr) {
-        cpuinfo_cycles_per_second = strtod(freqstr+1, &err) * 1000000.0;
-        if (freqstr[1] != '\0' && *err == '\0' && cpuinfo_cycles_per_second > 0)
-          saw_mhz = true;
-      }
-    } else if (strncasecmp(line, "bogomips", sizeof("bogomips")-1) == 0) {
-      const char* freqstr = strchr(line, ':');
-      if (freqstr) {
-        bogo_clock = strtod(freqstr+1, &err) * 1000000.0;
-        if (freqstr[1] != '\0' && *err == '\0' && bogo_clock > 0)
-          saw_bogo = true;
-      }
-#endif
-    } else if (strncasecmp(line, "processor", sizeof("processor")-1) == 0) {
-      num_cpus++;  // count up every time we see an "processor :" entry
-    }
-  } while (chars_read > 0);
-  close(fd);
-
-  if (!saw_mhz) {
-    if (saw_bogo) {
-      // If we didn't find anything better, we'll use bogomips, but
-      // we're not happy about it.
-      cpuinfo_cycles_per_second = bogo_clock;
-    } else {
-      // If we don't even have bogomips, we'll use the slow estimation.
-      cpuinfo_cycles_per_second = EstimateCyclesPerSecond(1000);
-    }
-  }
-  if (cpuinfo_cycles_per_second == 0.0) {
-    cpuinfo_cycles_per_second = 1.0;   // maybe unnecessary, but safe
-  }
-  if (num_cpus > 0) {
-    cpuinfo_num_cpus = num_cpus;
-  }
-
-#elif defined __FreeBSD__
-  // For this sysctl to work, the machine must be configured without
-  // SMP, APIC, or APM support.  hz should be 64-bit in freebsd 7.0
-  // and later.  Before that, it's a 32-bit quantity (and gives the
-  // wrong answer on machines faster than 2^32 Hz).  See
-  //  http://lists.freebsd.org/pipermail/freebsd-i386/2004-November/001846.html
-  // But also compare FreeBSD 7.0:
-  //  http://fxr.watson.org/fxr/source/i386/i386/tsc.c?v=RELENG70#L223
-  //  231         error = sysctl_handle_quad(oidp, &freq, 0, req);
-  // To FreeBSD 6.3 (it's the same in 6-STABLE):
-  //  http://fxr.watson.org/fxr/source/i386/i386/tsc.c?v=RELENG6#L131
-  //  139         error = sysctl_handle_int(oidp, &freq, sizeof(freq), req);
-#if __FreeBSD__ >= 7
-  uint64_t hz = 0;
-#else
-  unsigned int hz = 0;
-#endif
-  size_t sz = sizeof(hz);
-  const char *sysctl_path = "machdep.tsc_freq";
-  if ( sysctlbyname(sysctl_path, &hz, &sz, NULL, 0) != 0 ) {
-    fprintf(stderr, "Unable to determine clock rate from sysctl: %s: %s\n",
-            sysctl_path, strerror(errno));
-    cpuinfo_cycles_per_second = EstimateCyclesPerSecond(1000);
-  } else {
-    cpuinfo_cycles_per_second = hz;
-  }
-  // TODO(csilvers): also figure out cpuinfo_num_cpus
-
-#elif defined(PLATFORM_WINDOWS)
-# pragma comment(lib, "shlwapi.lib")  // for SHGetValue()
-  // In NT, read MHz from the registry. If we fail to do so or we're in win9x
-  // then make a crude estimate.
-  OSVERSIONINFO os;
-  os.dwOSVersionInfoSize = sizeof(os);
-  DWORD data, data_size = sizeof(data);
-  if (GetVersionEx(&os) &&
-      os.dwPlatformId == VER_PLATFORM_WIN32_NT &&
-      SUCCEEDED(SHGetValueA(HKEY_LOCAL_MACHINE,
-                         "HARDWARE\\DESCRIPTION\\System\\CentralProcessor\\0",
-                           "~MHz", NULL, &data, &data_size)))
-    cpuinfo_cycles_per_second = (int64)data * (int64)(1000 * 1000); // was mhz
-  else
-    cpuinfo_cycles_per_second = EstimateCyclesPerSecond(500); // TODO <500?
-
+int GetSystemCPUsCount()
+{
+#if defined(PLATFORM_WINDOWS)
   // Get the number of processors.
   SYSTEM_INFO info;
   GetSystemInfo(&info);
-  cpuinfo_num_cpus = info.dwNumberOfProcessors;
-
-#elif defined(__MACH__) && defined(__APPLE__)
-  // returning "mach time units" per second. the current number of elapsed
-  // mach time units can be found by calling uint64 mach_absolute_time();
-  // while not as precise as actual CPU cycles, it is accurate in the face
-  // of CPU frequency scaling and multi-cpu/core machines.
-  // Our mac users have these types of machines, and accuracy
-  // (i.e. correctness) trumps precision.
-  // See cycleclock.h: CycleClock::Now(), which returns number of mach time
-  // units on Mac OS X.
-  mach_timebase_info_data_t timebase_info;
-  mach_timebase_info(&timebase_info);
-  double mach_time_units_per_nanosecond =
-      static_cast<double>(timebase_info.denom) /
-      static_cast<double>(timebase_info.numer);
-  cpuinfo_cycles_per_second = mach_time_units_per_nanosecond * 1e9;
-
-  int num_cpus = 0;
-  size_t size = sizeof(num_cpus);
-  int numcpus_name[] = { CTL_HW, HW_NCPU };
-  if (::sysctl(numcpus_name, arraysize(numcpus_name), &num_cpus, &size, 0, 0)
-      == 0
-      && (size == sizeof(num_cpus)))
-    cpuinfo_num_cpus = num_cpus;
-
+  return info.dwNumberOfProcessors;
 #else
-  // Generic cycles per second counter
-  cpuinfo_cycles_per_second = EstimateCyclesPerSecond(1000);
+  long rv = sysconf(_SC_NPROCESSORS_ONLN);
+  if (rv < 0) {
+    return 1;
+  }
+  return static_cast<int>(rv);
 #endif
 }
 
-double CyclesPerSecond(void) {
-  InitializeSystemInfo();
-  return cpuinfo_cycles_per_second;
-}
-
-int NumCPUs(void) {
-  InitializeSystemInfo();
-  return cpuinfo_num_cpus;
-}
-
-// ----------------------------------------------------------------------
-// HasPosixThreads()
-//      Return true if we're running POSIX (e.g., NPTL on Linux)
-//      threads, as opposed to a non-POSIX thread library.  The thing
-//      that we care about is whether a thread's pid is the same as
-//      the thread that spawned it.  If so, this function returns
-//      true.
-// ----------------------------------------------------------------------
-bool HasPosixThreads() {
-#if defined(__linux__)
-#ifndef _CS_GNU_LIBPTHREAD_VERSION
-#define _CS_GNU_LIBPTHREAD_VERSION 3
-#endif
-  char buf[32];
-  //  We assume that, if confstr() doesn't know about this name, then
-  //  the same glibc is providing LinuxThreads.
-  if (confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, sizeof(buf)) == 0)
-    return false;
-  return strncmp(buf, "NPTL", 4) == 0;
-#elif defined(PLATFORM_WINDOWS) || defined(__CYGWIN__) || defined(__CYGWIN32__)
-  return false;
-#else  // other OS
-  return true;      //  Assume that everything else has Posix
-#endif  // else OS_LINUX
-}
-
 // ----------------------------------------------------------------------
 
 #if defined __linux__ || defined __FreeBSD__ || defined __sun__ || defined __CYGWIN__ || defined __CYGWIN32__
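
Note on the GetenvBeforeMain change in src/base/sysinfo.cc above: the lookup no longer needs a length check per entry because the byte-wise compare stops at the first mismatch, so an entry shorter than the name is rejected at its NUL terminator before the scan can run past it. The C standard does not promise that early exit for memcmp, which is presumably why the old code called strlen first. A minimal sketch of the idea follows; BytewiseCompare, BytewiseLength, and LookupInEnvBlock are hypothetical names used for illustration, not gperftools functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the patch's slow_memcmp: compares one byte at a
// time and returns as soon as two bytes differ.
static int BytewiseCompare(const void* lhs, const void* rhs, size_t n) {
  const uint8_t* a = static_cast<const uint8_t*>(lhs);
  const uint8_t* b = static_cast<const uint8_t*>(rhs);
  for (; n != 0; --n, ++a, ++b) {
    if (*a != *b) return *a < *b ? -1 : 1;
  }
  return 0;
}

// Hypothetical stand-in for the patch's slow_strlen.
static size_t BytewiseLength(const char* s) {
  size_t len = 0;
  while (s[len] != '\0') ++len;
  return len;
}

// Scans a NULL-terminated array of "NAME=value" strings (the layout of
// environ) and returns a pointer just past the '=' of the matching entry,
// or nullptr when the name is absent.
static const char* LookupInEnvBlock(char* const* env, const char* name) {
  assert(env != nullptr && name != nullptr);
  const size_t namelen = BytewiseLength(name);
  for (char* const* p = env; *p != nullptr; ++p) {
    // The name holds no interior NUL, so a shorter entry mismatches at its
    // own terminator and the compare never reads beyond it.
    if (BytewiseCompare(*p, name, namelen) == 0 && (*p)[namelen] == '=') {
      return *p + namelen + 1;
    }
  }
  return nullptr;
}
```

The sketch uses no libc calls in the lookup path itself, which is the property the patch relies on when the function is reached before libc has finished initializing.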

diff --git a/src/base/sysinfo.h b/src/base/sysinfo.h
index cc5cb74..77d956e 100644
--- a/src/base/sysinfo.h
+++ b/src/base/sysinfo.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2006, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -55,7 +55,7 @@
 // routines that run before main(), when the state required for getenv() may
 // not be set up yet.  In particular, errno isn't set up until relatively late
 // (after the pthreads library has a chance to make it threadsafe), and
-// getenv() doesn't work until then. 
+// getenv() doesn't work until then.
 // On some platforms, this call will utilize the same, static buffer for
 // repeated GetenvBeforeMain() calls. Callers should not expect pointers from
 // this routine to be long lived.
@@ -70,13 +70,7 @@
 // reasons, as documented in sysinfo.cc.  path must have space PATH_MAX.
 extern bool GetUniquePathFromEnv(const char* env_name, char* path);
 
-extern int NumCPUs();
-
-void SleepForMilliseconds(int milliseconds);
-
-// processor cycles per second of each processor.  Thread-safe.
-extern double CyclesPerSecond(void);
-
+extern int GetSystemCPUsCount();
 
 //  Return true if we're running POSIX (e.g., NPTL on Linux) threads,
 //  as opposed to a non-POSIX thread library.  The thing that we care

diff --git a/src/base/thread_annotations.h b/src/base/thread_annotations.h
index f57b299..b68e756 100644
--- a/src/base/thread_annotations.h
+++ b/src/base/thread_annotations.h

@@ -1,10 +1,11 @@
+// -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2008, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -14,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -35,8 +36,8 @@
 // of their multi-threaded code. The annotations can also help program
 // analysis tools to identify potential thread safety issues.
 //
-// The annotations are implemented using GCC's "attributes" extension.
-// Using the macros defined here instead of the raw GCC attributes allows
+// The annotations are implemented using clang's "attributes" extension.
+// Using the macros defined here instead of the raw clang attributes allows
 // for portability and future compatibility.
 //
 // This functionality is not yet fully implemented in perftools,
@@ -46,9 +47,7 @@
 #define BASE_THREAD_ANNOTATIONS_H_
 
 
-#if defined(__GNUC__) \
-  && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4)) \
-  && defined(__SUPPORT_TS_ANNOTATION__) && (!defined(SWIG))
+#if defined(__clang__)
 #define THREAD_ANNOTATION_ATTRIBUTE__(x)   __attribute__((x))
 #else
 #define THREAD_ANNOTATION_ATTRIBUTE__(x)   // no-op
@@ -113,19 +112,19 @@
 
 // The following annotations specify lock and unlock primitives.
 #define EXCLUSIVE_LOCK_FUNCTION(x) \
-  THREAD_ANNOTATION_ATTRIBUTE__(exclusive_lock(x))
+  THREAD_ANNOTATION_ATTRIBUTE__(exclusive_lock_function(x))
 
 #define SHARED_LOCK_FUNCTION(x) \
-  THREAD_ANNOTATION_ATTRIBUTE__(shared_lock(x))
+  THREAD_ANNOTATION_ATTRIBUTE__(shared_lock_function(x))
 
 #define EXCLUSIVE_TRYLOCK_FUNCTION(x) \
-  THREAD_ANNOTATION_ATTRIBUTE__(exclusive_trylock(x))
+  THREAD_ANNOTATION_ATTRIBUTE__(exclusive_trylock_function(x))
 
 #define SHARED_TRYLOCK_FUNCTION(x) \
-  THREAD_ANNOTATION_ATTRIBUTE__(shared_trylock(x))
+  THREAD_ANNOTATION_ATTRIBUTE__(shared_trylock_function(x))
 
 #define UNLOCK_FUNCTION(x) \
-  THREAD_ANNOTATION_ATTRIBUTE__(unlock(x))
+  THREAD_ANNOTATION_ATTRIBUTE__(unlock_function(x))
 
 // An escape hatch for thread safety analysis to ignore the annotated function.
 #define NO_THREAD_SAFETY_ANALYSIS \

diff --git a/src/base/thread_lister.c b/src/base/thread_lister.c
index ca1b2de..2d7beb3 100644
--- a/src/base/thread_lister.c
+++ b/src/base/thread_lister.c

@@ -1,3 +1,4 @@
+/* -*- Mode: c; c-basic-offset: 2; indent-tabs-mode: nil -*- */
 /* Copyright (c) 2005-2007, Google Inc.
  * All rights reserved.
  *
@@ -32,11 +33,17 @@
  */
 
 #include "config.h"
+
+#include "base/thread_lister.h"
+
 #include <stdio.h>         /* needed for NULL on some powerpc platforms (?!) */
+#include <sys/types.h>
+#include <unistd.h>        /* for getpid */
+
 #ifdef HAVE_SYS_PRCTL
 # include <sys/prctl.h>
 #endif
-#include "base/thread_lister.h"
+
 #include "base/linuxthreads.h"
 /* Include other thread listers here that define THREADS macro
  * only when they can provide a good implementation.

diff --git a/src/base/vdso_support.cc b/src/base/vdso_support.cc
index 730df30..e4805e9 100644
--- a/src/base/vdso_support.cc
+++ b/src/base/vdso_support.cc

@@ -1,10 +1,11 @@
+// -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2008, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -14,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -43,7 +44,6 @@
 #include <stddef.h>   // for ptrdiff_t
 
 #include "base/atomicops.h"  // for MemoryBarrier
-#include "base/linux_syscall_support.h"
 #include "base/logging.h"
 #include "base/dynamic_annotations.h"
 #include "base/basictypes.h"  // for COMPILE_ASSERT

diff --git a/src/base/vdso_support.h b/src/base/vdso_support.h
index c1209a4..073386c 100644
--- a/src/base/vdso_support.h
+++ b/src/base/vdso_support.h

@@ -1,3 +1,4 @@
+// -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2008, Google Inc.
 // All rights reserved.
 //
@@ -61,7 +62,11 @@
 
 #ifdef HAVE_ELF_MEM_IMAGE
 
+// Enable VDSO support only for the architectures/operating systems that
+// support it.
+#if defined(__linux__) && (defined(__i386__) || defined(__PPC__))
 #define HAVE_VDSO_SUPPORT 1
+#endif
 
 #include <stdlib.h>     // for NULL
 

diff --git a/src/central_freelist.cc b/src/central_freelist.cc
index 11b190d..d064c2f 100644
--- a/src/central_freelist.cc
+++ b/src/central_freelist.cc

@@ -110,7 +110,6 @@
   if (span->objects == NULL) {
     tcmalloc::DLL_Remove(span);
     tcmalloc::DLL_Prepend(&nonempty_, span);
-    Event(span, 'N', 0);
   }
 
   // The following check is expensive, so it is disabled by default
@@ -129,7 +128,6 @@
   counter_++;
   span->refcount--;
   if (span->refcount == 0) {
-    Event(span, '#', 0);
     counter_ -= ((span->length<<kPageShift) /
                  Static::sizemap()->ByteSizeForClass(span->sizeclass));
     tcmalloc::DLL_Remove(span);
@@ -152,14 +150,14 @@
     int locked_size_class, bool force) {
   static int race_counter = 0;
   int t = race_counter++;  // Updated without a lock, but who cares.
-  if (t >= kNumClasses) {
-    while (t >= kNumClasses) {
-      t -= kNumClasses;
+  if (t >= Static::num_size_classes()) {
+    while (t >= Static::num_size_classes()) {
+      t -= Static::num_size_classes();
     }
     race_counter = t;
   }
   ASSERT(t >= 0);
-  ASSERT(t < kNumClasses);
+  ASSERT(t < Static::num_size_classes());
   if (t == locked_size_class) return false;
   return Static::central_cache()[t].ShrinkCache(locked_size_class, force);
 }
@@ -305,7 +303,6 @@
     // Move to empty list
     tcmalloc::DLL_Remove(span);
     tcmalloc::DLL_Prepend(&empty_, span);
-    Event(span, 'E', 0);
   }
 
   *start = span->objects;
@@ -340,7 +337,7 @@
   // (Instead of being eager, we could just replace any stale info
   // about this span, but that seems to be no better in practice.)
   for (int i = 0; i < npages; i++) {
-    Static::pageheap()->CacheSizeClass(span->start + i, size_class_);
+    Static::pageheap()->SetCachedSizeClass(span->start + i, size_class_);
   }
 
   // Split the block into pieces and add to the free-list

diff --git a/src/central_freelist.h b/src/central_freelist.h
index 4148680..0f66e0c 100644
--- a/src/central_freelist.h
+++ b/src/central_freelist.h

@@ -82,11 +82,11 @@
 
   // Lock/Unlock the internal SpinLock. Used on the pthread_atfork call
   // to set the lock in a consistent state before the fork.
-  void Lock() {
+  void Lock() EXCLUSIVE_LOCK_FUNCTION(lock_) {
     lock_.Lock();
   }
 
-  void Unlock() {
+  void Unlock() UNLOCK_FUNCTION(lock_) {
     lock_.Unlock();
   }
 

diff --git a/src/common.cc b/src/common.cc
index 3b66afe..5e9e11e 100644
--- a/src/common.cc
+++ b/src/common.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2008, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -44,14 +44,14 @@
 // thread and central caches.
 static int32 FLAGS_tcmalloc_transfer_num_objects;
 
-static const int32 kDefaultTransferNumObjecs = 32768;
+static const int32 kDefaultTransferNumObjecs = 32;
 
 // The init function is provided to explicit initialize the variable value
 // from the env. var to avoid C++ global construction that might defer its
 // initialization after a malloc/new call.
 static inline void InitTCMallocTransferNumObjects()
 {
-  if (UNLIKELY(FLAGS_tcmalloc_transfer_num_objects == 0)) {
+  if (FLAGS_tcmalloc_transfer_num_objects == 0) {
     const char *envval = TCMallocGetenvSafe("TCMALLOC_TRANSFER_NUM_OBJ");
     FLAGS_tcmalloc_transfer_num_objects = !envval ? kDefaultTransferNumObjecs :
       strtol(envval, NULL, 10);
@@ -173,14 +173,15 @@
     class_to_size_[sc] = size;
     sc++;
   }
-  if (sc != kNumClasses) {
+  num_size_classes = sc;
+  if (sc > kClassSizesMax) {
     Log(kCrash, __FILE__, __LINE__,
-        "wrong number of size classes: (found vs. expected )", sc, kNumClasses);
+        "too many size classes: (found vs. max)", sc, kClassSizesMax);
   }
 
   // Initialize the mapping arrays
   int next_size = 0;
-  for (int c = 1; c < kNumClasses; c++) {
+  for (int c = 1; c < num_size_classes; c++) {
     const int max_size_in_class = class_to_size_[c];
     for (int s = next_size; s <= max_size_in_class; s += kAlignment) {
       class_array_[ClassIndex(s)] = c;
@@ -191,7 +192,7 @@
   // Double-check sizes just to be safe
   for (size_t size = 0; size <= kMaxSize;) {
     const int sc = SizeClass(size);
-    if (sc <= 0 || sc >= kNumClasses) {
+    if (sc <= 0 || sc >= num_size_classes) {
       Log(kCrash, __FILE__, __LINE__,
           "Bad size class (class, size)", sc, size);
     }
@@ -211,8 +212,23 @@
     }
   }
 
+  // Our fast-path aligned allocation functions rely on 'naturally
+  // aligned' sizes to produce aligned addresses. Let's check if that
+  // holds for size classes that we produced.
+  //
+  // I.e. we're checking that
+  //
+  // align = (1 << shift), malloc(i * align) % align == 0,
+  //
+  // for all align values up to kPageSize.
+  for (size_t align = kMinAlign; align <= kPageSize; align <<= 1) {
+    for (size_t size = align; size < kPageSize; size += align) {
+      CHECK_CONDITION(class_to_size_[SizeClass(size)] % align == 0);
+    }
+  }
+
   // Initialize the num_objects_to_move array.
-  for (size_t cl = 1; cl  < kNumClasses; ++cl) {
+  for (size_t cl = 1; cl  < num_size_classes; ++cl) {
     num_objects_to_move_[cl] = NumMoveSize(ByteSizeForClass(cl));
   }
 }
@@ -220,10 +236,9 @@
 // Metadata allocator -- keeps stats about how many bytes allocated.
 static uint64_t metadata_system_bytes_ = 0;
 static const size_t kMetadataAllocChunkSize = 8*1024*1024;
-static const size_t kMetadataBigAllocThreshold = kMetadataAllocChunkSize / 8;
-// usually malloc uses larger alignments, but because metadata cannot
-// have and fancy simd types, aligning on pointer size seems fine
-static const size_t kMetadataAllignment = sizeof(void *);
+// As ThreadCache objects are allocated with MetaDataAlloc, and also
+// CACHELINE_ALIGNED, we must use the same alignment as TCMalloc_SystemAlloc.
+static const size_t kMetadataAllignment = sizeof(MemoryAligner);
 
 static char *metadata_chunk_alloc_;
 static size_t metadata_chunk_avail_;
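Reviewer note on the natural-alignment check added to SizeMap::Init() in this file: the loop asserts that every multiple of a power-of-two alignment is rounded up to a size class that is itself a multiple of that alignment. The following is a minimal standalone sketch of the same idea, not the gperftools implementation. The table passed in, ToyClassSize, and NaturallyAligned are hypothetical names introduced only for illustration; the real check uses class_to_size_[SizeClass(size)], kMinAlign, and kPageSize.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: round a request up to the smallest class in
// `sizes` (sorted ascending) that can hold it. Returns 0 if no class
// fits; the caller below never asks for more than the largest class.
static size_t ToyClassSize(const size_t* sizes, size_t count, size_t size) {
  for (size_t i = 0; i < count; ++i) {
    if (sizes[i] >= size) return sizes[i];
  }
  return 0;
}

// Mirrors the shape of the loop in the diff: for every power-of-two
// alignment from min_align up to limit, every multiple of that
// alignment below limit must land in a class whose size is also a
// multiple of the alignment.
static bool NaturallyAligned(const size_t* sizes, size_t count,
                             size_t min_align, size_t limit) {
  assert(min_align != 0);  // align <<= 1 would never advance from 0
  for (size_t align = min_align; align <= limit; align <<= 1) {
    for (size_t size = align; size < limit; size += align) {
      if (ToyClassSize(sizes, count, size) % align != 0) return false;
    }
  }
  return true;
}
```

With a toy table such as {8, 16, 40, 64, 128}, a 32-byte request is served from the 40-byte class, so a 32-byte-aligned request could not rely on size alone for its alignment. That is the situation the CHECK_CONDITION in the diff is meant to rule out.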

diff --git a/src/common.h b/src/common.h
index c3484d3..caa3e4a 100644
--- a/src/common.h
+++ b/src/common.h

@@ -44,14 +44,6 @@
 #include "internal_logging.h"  // for ASSERT, etc
 #include "base/basictypes.h"   // for LIKELY, etc
 
-#ifdef HAVE_BUILTIN_EXPECT
-#define LIKELY(x) __builtin_expect(!!(x), 1)
-#define UNLIKELY(x) __builtin_expect(!!(x), 0)
-#else
-#define LIKELY(x) (x)
-#define UNLIKELY(x) (x)
-#endif
-
 // Type that can hold a page number
 typedef uintptr_t PageID;
 
@@ -68,11 +60,8 @@
 // Keep in mind when using the 16 bytes alignment you can have a space
 // waste due alignment of 25%. (eg malloc of 24 bytes will get 32 bytes)
 static const size_t kMinAlign   = 8;
-// Number of classes created until reach page size 128.
-static const size_t kBaseClasses = 16;
 #else
 static const size_t kMinAlign   = 16;
-static const size_t kBaseClasses = 9;
 #endif
 
 // Using large pages speeds up the execution at a cost of larger memory use.
@@ -83,24 +72,20 @@
 // the thread cache allowance to avoid passing more free ranges to and from
 // central lists.  Also, larger pages are less likely to get freed.
 // These two factors cause a bounded increase in memory use.
-#if defined(TCMALLOC_32K_PAGES)
-static const size_t kPageShift  = 15;
-static const size_t kNumClasses = kBaseClasses + 69;
-#elif defined(TCMALLOC_64K_PAGES)
-static const size_t kPageShift  = 16;
-static const size_t kNumClasses = kBaseClasses + 73;
+#if defined(TCMALLOC_PAGE_SIZE_SHIFT)
+static const size_t kPageShift  = TCMALLOC_PAGE_SIZE_SHIFT;
 #else
 static const size_t kPageShift  = 13;
-static const size_t kNumClasses = kBaseClasses + 79;
 #endif
 
+static const size_t kClassSizesMax = 128;
+
 static const size_t kMaxThreadCacheSize = 4 << 20;
 
 static const size_t kPageSize   = 1 << kPageShift;
 static const size_t kMaxSize    = 256 * 1024;
 static const size_t kAlignment  = 8;
-static const size_t kLargeSizeClass = 0;
-// For all span-lengths < kMaxPages we keep an exact-size list.
+// For all span-lengths <= kMaxPages we keep an exact-size list in PageHeap.
 static const size_t kMaxPages = 1 << (20 - kPageShift);
 
 // Default bound on the total amount of thread caches.
@@ -133,14 +118,27 @@
 
 static const Length kMaxValidPages = (~static_cast<Length>(0)) >> kPageShift;
 
-#if defined __x86_64__
-// All current and planned x86_64 processors only look at the lower 48 bits
-// in virtual to physical address translation.  The top 16 are thus unused.
-// TODO(rus): Under what operating systems can we increase it safely to 17?
-// This lets us use smaller page maps.  On first allocation, a 36-bit page map
-// uses only 96 KB instead of the 4.5 MB used by a 52-bit page map.
+#if __aarch64__ || __x86_64__ || _M_AMD64 || _M_ARM64
+// All current x86_64 processors only look at the lower 48 bits in
+// virtual to physical address translation. The top 16 bits are all
+// copies of bit 47, and in practice addresses with bit 47 set are
+// reserved for kernel space. So only 47 bits are actually usable from
+// malloc's perspective. This lets us use faster two-level page maps
+// on this architecture.
+//
+// The story is very similar on 64-bit ARM, except that it has the full
+// 48 bits for user space. Because of that, and because OSes could in
+// principle start giving some highest-bit-set addresses to user space,
+// we don't bother to limit x86 to 47 bits.
+//
+// As of now there are published plans to add more bits to the x86-64
+// virtual address space, but since 48 bits has been the norm for a
+// long time and lots of software relies on it, it will be opt-in from
+// the OS perspective. So we can keep doing "48 bits" at least for now.
 static const int kAddressBits = (sizeof(void*) < 8 ? (8 * sizeof(void*)) : 48);
 #else
+// MIPS and PowerPC have more general hardware, so we have to support
+// full 64-bit addresses.
 static const int kAddressBits = 8 * sizeof(void*);
 #endif
 
@@ -160,13 +158,6 @@
 // Size-class information + mapping
 class SizeMap {
  private:
-  // Number of objects to move between a per-thread list and a central
-  // list in one shot.  We want this to be not too small so we can
-  // amortize the lock overhead for accessing the central list.  Making
-  // it too big may temporarily cause unnecessary memory wastage in the
-  // per-thread free list until the scavenger cleans up the list.
-  int num_objects_to_move_[kNumClasses];
-
   //-------------------------------------------------------------------
   // Mapping from size to size_class and vice versa
   //-------------------------------------------------------------------
@@ -194,27 +185,59 @@
       ((kMaxSize + 127 + (120 << 7)) >> 7) + 1;
   unsigned char class_array_[kClassArraySize];
 
+  static inline size_t SmallSizeClass(size_t s) {
+    return (static_cast<uint32_t>(s) + 7) >> 3;
+  }
+
+  static inline size_t LargeSizeClass(size_t s) {
+    return (static_cast<uint32_t>(s) + 127 + (120 << 7)) >> 7;
+  }
+
+  // If size is no more than kMaxSize, compute index of the
+  // class_array[] entry for it, putting the class index in output
+  // parameter idx and returning true. Otherwise return false.
+  static inline bool ATTRIBUTE_ALWAYS_INLINE ClassIndexMaybe(size_t s,
+                                                             uint32* idx) {
+    if (PREDICT_TRUE(s <= kMaxSmallSize)) {
+      *idx = (static_cast<uint32>(s) + 7) >> 3;
+      return true;
+    } else if (s <= kMaxSize) {
+      *idx = (static_cast<uint32>(s) + 127 + (120 << 7)) >> 7;
+      return true;
+    }
+    return false;
+  }
+
   // Compute index of the class_array[] entry for a given size
-  static inline size_t ClassIndex(int s) {
+  static inline size_t ClassIndex(size_t s) {
     // Use unsigned arithmetic to avoid unnecessary sign extensions.
     ASSERT(0 <= s);
     ASSERT(s <= kMaxSize);
-    if (LIKELY(s <= kMaxSmallSize)) {
-      return (static_cast<uint32_t>(s) + 7) >> 3;
+    if (PREDICT_TRUE(s <= kMaxSmallSize)) {
+      return SmallSizeClass(s);
     } else {
-      return (static_cast<uint32_t>(s) + 127 + (120 << 7)) >> 7;
+      return LargeSizeClass(s);
     }
   }
 
+  // Number of objects to move between a per-thread list and a central
+  // list in one shot.  We want this to be not too small so we can
+  // amortize the lock overhead for accessing the central list.  Making
+  // it too big may temporarily cause unnecessary memory wastage in the
+  // per-thread free list until the scavenger cleans up the list.
+  int num_objects_to_move_[kClassSizesMax];
+
   int NumMoveSize(size_t size);
 
   // Mapping from size class to max size storable in that class
-  size_t class_to_size_[kNumClasses];
+  int32 class_to_size_[kClassSizesMax];
 
   // Mapping from size class to number of pages to allocate at a time
-  size_t class_to_pages_[kNumClasses];
+  size_t class_to_pages_[kClassSizesMax];
 
  public:
+  size_t num_size_classes;
+
   // Constructor should do nothing since we rely on explicit Init()
   // call, which may or may not be called before the constructor runs.
   SizeMap() { }
@@ -222,22 +245,34 @@
   // Initialize the mapping arrays
   void Init();
 
-  inline int SizeClass(int size) {
+  inline int SizeClass(size_t size) {
     return class_array_[ClassIndex(size)];
   }
 
+  // Check if size is small enough to be representable by a size
+  // class, and if it is, put matching size class into *cl. Returns
+  // true iff matching size class was found.
+  inline bool ATTRIBUTE_ALWAYS_INLINE GetSizeClass(size_t size, uint32* cl) {
+    uint32 idx;
+    if (!ClassIndexMaybe(size, &idx)) {
+      return false;
+    }
+    *cl = class_array_[idx];
+    return true;
+  }
+
   // Get the byte-size for a specified class
-  inline size_t ByteSizeForClass(size_t cl) {
+  inline int32 ATTRIBUTE_ALWAYS_INLINE ByteSizeForClass(uint32 cl) {
     return class_to_size_[cl];
   }
 
   // Mapping from size class to max size storable in that class
-  inline size_t class_to_size(size_t cl) {
+  inline int32 class_to_size(uint32 cl) {
     return class_to_size_[cl];
   }
 
   // Mapping from size class to number of pages to allocate at a time
-  inline size_t class_to_pages(size_t cl) {
+  inline size_t class_to_pages(uint32 cl) {
     return class_to_pages_[cl];
   }
 
@@ -246,7 +281,7 @@
   // amortize the lock overhead for accessing the central list.  Making
   // it too big may temporarily cause unnecessary memory wastage in the
   // per-thread free list until the scavenger cleans up the list.
-  inline int num_objects_to_move(size_t cl) {
+  inline int num_objects_to_move(uint32 cl) {
     return num_objects_to_move_[cl];
   }
 };
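Reviewer note on the class_array_ indexing refactored in this header: sizes up to the small-size limit are bucketed in 8-byte steps, larger sizes in 128-byte steps, and the (120 << 7) term shifts the second range so that its first index follows directly after the last small index. The following is a minimal standalone sketch using the same arithmetic as SmallSizeClass, LargeSizeClass, and ClassIndexMaybe. The Toy* names are hypothetical, and kToyMaxSmallSize = 1024 is an assumption, since kMaxSmallSize is not defined in the lines shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: the small-size limit is 1024 bytes; kMaxSmallSize is
// not visible in this part of the diff.
static const size_t kToyMaxSmallSize = 1024;
// Same value as kMaxSize in the diff.
static const size_t kToyMaxSize = 256 * 1024;

// 8-byte granularity: sizes 1..8 -> 1, 9..16 -> 2, ..., 1024 -> 128.
static inline size_t ToySmallIndex(size_t s) {
  return (static_cast<uint32_t>(s) + 7) >> 3;
}

// 128-byte granularity, offset by 120 so that 1025 -> 129.
static inline size_t ToyLargeIndex(size_t s) {
  return (static_cast<uint32_t>(s) + 127 + (120 << 7)) >> 7;
}

// Same contract as ClassIndexMaybe in the diff: write the index and
// return true if s is at most kToyMaxSize, otherwise return false.
static bool ToyClassIndexMaybe(size_t s, uint32_t* idx) {
  assert(idx != nullptr);
  if (s <= kToyMaxSmallSize) {
    *idx = static_cast<uint32_t>(ToySmallIndex(s));
    return true;
  }
  if (s <= kToyMaxSize) {
    *idx = static_cast<uint32_t>(ToyLargeIndex(s));
    return true;
  }
  return false;
}
```

Under the stated assumption, the largest index is 2168 for a 256 KiB request, which agrees with the kClassArraySize expression in the header evaluating to 2169 entries.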

diff --git a/src/config_for_unittests.h b/src/config_for_unittests.h
index 66592a7..12bf614 100644
--- a/src/config_for_unittests.h
+++ b/src/config_for_unittests.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2007, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/debugallocation.cc b/src/debugallocation.cc
index c170bc7..b0f7509 100644
--- a/src/debugallocation.cc
+++ b/src/debugallocation.cc

@@ -109,9 +109,12 @@
             "with a guard page following the allocation (to catch buffer "
             "overruns right when they happen).");
 DEFINE_bool(malloc_page_fence_never_reclaim,
-            EnvToBool("TCMALLOC_PAGE_FRANCE_NEVER_RECLAIM", false),
+            EnvToBool("TCMALLOC_PAGE_FENCE_NEVER_RECLAIM", false),
             "Enables making the virtual address space inaccessible "
             "upon a deallocation instead of returning it and reusing later.");
+DEFINE_bool(malloc_page_fence_readable,
+            EnvToBool("TCMALLOC_PAGE_FENCE_READABLE", false),
+            "Permits reads to the page fence.");
 #else
 DEFINE_bool(malloc_page_fence, false, "Not usable (requires mmap)");
 DEFINE_bool(malloc_page_fence_never_reclaim, false, "Not usable (required mmap)");
@@ -206,9 +209,10 @@
       // Adjust the number of frames to skip (4) if you change the
       // location of this call.
       num_deleter_pcs =
-          GetStackTrace(deleter_pcs,
-                        sizeof(deleter_pcs) / sizeof(deleter_pcs[0]),
-                        4);
+        MallocHook::GetCallerStackTrace(
+          deleter_pcs,
+          sizeof(deleter_pcs) / sizeof(deleter_pcs[0]),
+          4);
       deleter_threadid = pthread_self();
     } else {
       num_deleter_pcs = 0;
@@ -272,8 +276,8 @@
   // We use either do_malloc or mmap to make the actual allocation. In
   // order to remember which one of the two was used for any block, we store an
   // appropriate magic word next to the block.
-  static const int kMagicMalloc = 0xDEADBEEF;
-  static const int kMagicMMap = 0xABCDEFAB;
+  static const size_t kMagicMalloc = 0xDEADBEEF;
+  static const size_t kMagicMMap = 0xABCDEFAB;
 
   // This array will be filled with 0xCD, for use with memcmp.
   static unsigned char kMagicDeletedBuffer[1024];
@@ -299,7 +303,7 @@
   // then come the size2_ and magic2_, or a full page of mprotect-ed memory
   // if the malloc_page_fence feature is enabled.
   size_t size2_;
-  int magic2_;
+  size_t magic2_;
 
  private:  // static data and helpers
 
@@ -342,7 +346,7 @@
 
   bool IsMMapped() const { return kMagicMMap == magic1_; }
 
-  bool IsValidMagicValue(int value) const {
+  bool IsValidMagicValue(size_t value) const {
     return kMagicMMap == value  ||  kMagicMalloc == value;
   }
 
@@ -375,8 +379,8 @@
     return (const size_t*)((char*)&size2_ + size1_);
   }
 
-  int* magic2_addr() { return (int*)(size2_addr() + 1); }
-  const int* magic2_addr() const { return (const int*)(size2_addr() + 1); }
+  size_t* magic2_addr() { return (size_t*)(size2_addr() + 1); }
+  const size_t* magic2_addr() const { return (const size_t*)(size2_addr() + 1); }
 
  private:  // other helpers
 
@@ -394,28 +398,30 @@
     offset_ = 0;
     alloc_type_ = type;
     if (!IsMMapped()) {
-      *magic2_addr() = magic1_;
-      *size2_addr() = size;
+      bit_store(magic2_addr(), &magic1_);
+      bit_store(size2_addr(), &size);
     }
     alloc_map_lock_.Unlock();
     memset(data_addr(), kMagicUninitializedByte, size);
     if (!IsMMapped()) {
-      RAW_CHECK(size1_ == *size2_addr(), "should hold");
-      RAW_CHECK(magic1_ == *magic2_addr(), "should hold");
+      RAW_CHECK(memcmp(&size1_, size2_addr(), sizeof(size1_)) == 0, "should hold");
+      RAW_CHECK(memcmp(&magic1_, magic2_addr(), sizeof(magic1_)) == 0, "should hold");
     }
   }
 
-  size_t CheckAndClear(int type) {
+  size_t CheckAndClear(int type, size_t given_size) {
     alloc_map_lock_.Lock();
     CheckLocked(type);
     if (!IsMMapped()) {
-      RAW_CHECK(size1_ == *size2_addr(), "should hold");
+      RAW_CHECK(memcmp(&size1_, size2_addr(), sizeof(size1_)) == 0, "should hold");
     }
     // record us as deallocated in the map
     alloc_map_->Insert(data_addr(), type | kDeallocatedTypeBit);
     alloc_map_lock_.Unlock();
     // clear us
     const size_t size = real_size();
+    RAW_CHECK(!given_size || given_size == size1_,
+              "right size must be passed to sized delete");
     memset(this, kMagicDeletedByte, size);
     return size;
   }
@@ -449,11 +455,13 @@
                      data_addr());
     }
     if (!IsMMapped()) {
-      if (size1_ != *size2_addr()) {
+      if (memcmp(&size1_, size2_addr(), sizeof(size1_))) {
         RAW_LOG(FATAL, "memory stomping bug: a word after object at %p "
                        "has been corrupted", data_addr());
       }
-      if (!IsValidMagicValue(*magic2_addr())) {
+      size_t addr;
+      bit_store(&addr, magic2_addr());
+      if (!IsValidMagicValue(addr)) {
         RAW_LOG(FATAL, "memory stomping bug: a word after object at %p "
                 "has been corrupted", data_addr());
       }
@@ -498,11 +506,12 @@
     // the address space could take more.
     static size_t max_size_t = ~0;
     if (size > max_size_t - sizeof(MallocBlock)) {
-      RAW_LOG(ERROR, "Massive size passed to malloc: %" PRIuS "", size);
+      RAW_LOG(ERROR, "Massive size passed to malloc: %zu", size);
       return NULL;
     }
     MallocBlock* b = NULL;
     const bool use_malloc_page_fence = FLAGS_malloc_page_fence;
+    const bool malloc_page_fence_readable = FLAGS_malloc_page_fence_readable;
 #ifdef HAVE_MMAP
     if (use_malloc_page_fence) {
       // Put the block towards the end of the page and make the next page
@@ -521,7 +530,8 @@
                 strerror(errno));
       }
       // Mark the page after the block inaccessible
-      if (mprotect(p + (num_pages - 1) * pagesize, pagesize, PROT_NONE)) {
+      if (mprotect(p + (num_pages - 1) * pagesize, pagesize,
+                   PROT_NONE|(malloc_page_fence_readable ? PROT_READ : 0))) {
         RAW_LOG(FATAL, "Guard page setup failed: %s", strerror(errno));
       }
       b = (MallocBlock*) (p + (num_pages - 1) * pagesize - sz);
@@ -543,10 +553,10 @@
     return b;
   }
 
-  void Deallocate(int type) {
+  void Deallocate(int type, size_t given_size) {
     if (IsMMapped()) {  // have to do this before CheckAndClear
 #ifdef HAVE_MMAP
-      int size = CheckAndClear(type);
+      int size = CheckAndClear(type, given_size);
       int pagesize = getpagesize();
       int num_pages = (size + pagesize - 1) / pagesize + 1;
       char* p = (char*) this;
@@ -559,7 +569,7 @@
       }
 #endif
     } else {
-      const size_t size = CheckAndClear(type);
+      const size_t size = CheckAndClear(type, given_size);
       if (FLAGS_malloc_reclaim_memory) {
         // Instead of freeing the block immediately, push it onto a queue of
         // recently freed blocks.  Free only enough blocks to keep from
@@ -615,7 +625,6 @@
         free_queue_lock_.Lock();
       }
     }
-    RAW_CHECK(free_queue_size_ >= 0, "Free queue size went negative!");
     free_queue_lock_.Unlock();
     for (int i = 0; i < num_entries; i++) {
       CheckForDanglingWrites(entries[i]);
@@ -837,8 +846,8 @@
 
 // ========================================================================= //
 
-const int MallocBlock::kMagicMalloc;
-const int MallocBlock::kMagicMMap;
+const size_t MallocBlock::kMagicMalloc;
+const size_t MallocBlock::kMagicMMap;
 
 MallocBlock::AllocMap* MallocBlock::alloc_map_ = NULL;
 SpinLock MallocBlock::alloc_map_lock_(SpinLock::LINKER_INITIALIZED);
@@ -885,6 +894,7 @@
   const char *p = fmt;
   char numbuf[25];
   if (fd < 0) {
+    va_end(ap);
     return;
   }
   numbuf[sizeof(numbuf)-1] = 0;
@@ -998,7 +1008,7 @@
   do {                                                                  \
     if (FLAGS_malloctrace) {                                            \
       SpinLockHolder l(&malloc_trace_lock);                             \
-      TracePrintf(TraceFd(), "%s\t%" PRIuS "\t%p\t%" GPRIuPTHREAD,      \
+      TracePrintf(TraceFd(), "%s\t%zu\t%p\t%" GPRIuPTHREAD,      \
                   name, size, addr, PRINTABLE_PTHREAD(pthread_self())); \
       TraceStack();                                                     \
       TracePrintf(TraceFd(), "\n");                                     \
@@ -1030,11 +1040,11 @@
   return ptr->data_addr();
 }
 
-static inline void DebugDeallocate(void* ptr, int type) {
+static inline void DebugDeallocate(void* ptr, int type, size_t given_size) {
   MALLOC_TRACE("free",
                (ptr != 0 ? MallocBlock::FromRawPointer(ptr)->data_size() : 0),
                ptr);
-  if (ptr)  MallocBlock::FromRawPointer(ptr)->Deallocate(type);
+  if (ptr)  MallocBlock::FromRawPointer(ptr)->Deallocate(type, given_size);
 }
 
 // ========================================================================= //
@@ -1147,8 +1157,8 @@
 
 REGISTER_MODULE_INITIALIZER(debugallocation, {
 #if (__cplusplus >= 201103L)
-    COMPILE_ASSERT(alignof(debug_malloc_implementation_space) >= alignof(DebugMallocImplementation),
-                   debug_malloc_implementation_space_is_not_properly_aligned);
+    static_assert(alignof(decltype(debug_malloc_implementation_space)) >= alignof(DebugMallocImplementation),
+                  "DebugMallocImplementation is expected to need just word alignment");
 #endif
   // Either we or valgrind will control memory management.  We
   // register our extension if we're the winner. Otherwise let
@@ -1210,18 +1220,47 @@
 
 // Exported routines
 
-extern "C" PERFTOOLS_DLL_DECL void* tc_malloc(size_t size) __THROW {
+// frame_forcer and force_frame exist only to prevent calls to
+// DebugDeallocate in tail position from being compiled as tail
+// calls. This is important because stack trace capturing in
+// MallocBlockQueueEntry relies on the google_malloc section being on
+// the stack, and the tc_XXX functions are in that section. So they
+// must call DebugDeallocate rather than jump to it. The force_frame
+// call at the end of such functions prevents the tail call.
+static int frame_forcer;
+static void force_frame() {
+  int dummy = *(int volatile *)&frame_forcer;
+  (void)dummy;
+}
+
+extern "C" PERFTOOLS_DLL_DECL void* tc_malloc(size_t size) PERFTOOLS_NOTHROW {
+  if (ThreadCache::IsUseEmergencyMalloc()) {
+    return tcmalloc::EmergencyMalloc(size);
+  }
   void* ptr = do_debug_malloc_or_debug_cpp_alloc(size);
   MallocHook::InvokeNewHook(ptr, size);
   return ptr;
 }
 
-extern "C" PERFTOOLS_DLL_DECL void tc_free(void* ptr) __THROW {
+extern "C" PERFTOOLS_DLL_DECL void tc_free(void* ptr) PERFTOOLS_NOTHROW {
+  if (tcmalloc::IsEmergencyPtr(ptr)) {
+    return tcmalloc::EmergencyFree(ptr);
+  }
   MallocHook::InvokeDeleteHook(ptr);
-  DebugDeallocate(ptr, MallocBlock::kMallocType);
+  DebugDeallocate(ptr, MallocBlock::kMallocType, 0);
+  force_frame();
 }
 
-extern "C" PERFTOOLS_DLL_DECL void* tc_calloc(size_t count, size_t size) __THROW {
+extern "C" PERFTOOLS_DLL_DECL void tc_free_sized(void *ptr, size_t size) PERFTOOLS_NOTHROW {
+  MallocHook::InvokeDeleteHook(ptr);
+  DebugDeallocate(ptr, MallocBlock::kMallocType, size);
+  force_frame();
+}
+
+extern "C" PERFTOOLS_DLL_DECL void* tc_calloc(size_t count, size_t size) PERFTOOLS_NOTHROW {
+  if (ThreadCache::IsUseEmergencyMalloc()) {
+    return tcmalloc::EmergencyCalloc(count, size);
+  }
   // Overflow check
   const size_t total_size = count * size;
   if (size != 0 && total_size / size != count) return NULL;
@@ -1232,12 +1271,19 @@
   return block;
 }
 
-extern "C" PERFTOOLS_DLL_DECL void tc_cfree(void* ptr) __THROW {
+extern "C" PERFTOOLS_DLL_DECL void tc_cfree(void* ptr) PERFTOOLS_NOTHROW {
+  if (tcmalloc::IsEmergencyPtr(ptr)) {
+    return tcmalloc::EmergencyFree(ptr);
+  }
   MallocHook::InvokeDeleteHook(ptr);
-  DebugDeallocate(ptr, MallocBlock::kMallocType);
+  DebugDeallocate(ptr, MallocBlock::kMallocType, 0);
+  force_frame();
 }
 
-extern "C" PERFTOOLS_DLL_DECL void* tc_realloc(void* ptr, size_t size) __THROW {
+extern "C" PERFTOOLS_DLL_DECL void* tc_realloc(void* ptr, size_t size) PERFTOOLS_NOTHROW {
+  if (tcmalloc::IsEmergencyPtr(ptr)) {
+    return tcmalloc::EmergencyRealloc(ptr, size);
+  }
   if (ptr == NULL) {
     ptr = do_debug_malloc_or_debug_cpp_alloc(size);
     MallocHook::InvokeNewHook(ptr, size);
@@ -1245,7 +1291,7 @@
   }
   if (size == 0) {
     MallocHook::InvokeDeleteHook(ptr);
-    DebugDeallocate(ptr, MallocBlock::kMallocType);
+    DebugDeallocate(ptr, MallocBlock::kMallocType, 0);
     return NULL;
   }
   MallocBlock* old = MallocBlock::FromRawPointer(ptr);
@@ -1270,7 +1316,7 @@
   memcpy(p->data_addr(), ptr, (old_size < size) ? old_size : size);
   MallocHook::InvokeDeleteHook(ptr);
   MallocHook::InvokeNewHook(p->data_addr(), size);
-  DebugDeallocate(ptr, MallocBlock::kMallocType);
+  DebugDeallocate(ptr, MallocBlock::kMallocType, 0);
   MALLOC_TRACE("realloc", p->data_size(), p->data_addr());
   return p->data_addr();
 }
@@ -1279,59 +1325,75 @@
   void* ptr = debug_cpp_alloc(size, MallocBlock::kNewType, false);
   MallocHook::InvokeNewHook(ptr, size);
   if (ptr == NULL) {
-    RAW_LOG(FATAL, "Unable to allocate %" PRIuS " bytes: new failed.", size);
+    RAW_LOG(FATAL, "Unable to allocate %zu bytes: new failed.", size);
   }
   return ptr;
 }
 
-extern "C" PERFTOOLS_DLL_DECL void* tc_new_nothrow(size_t size, const std::nothrow_t&) __THROW {
+extern "C" PERFTOOLS_DLL_DECL void* tc_new_nothrow(size_t size, const std::nothrow_t&) PERFTOOLS_NOTHROW {
   void* ptr = debug_cpp_alloc(size, MallocBlock::kNewType, true);
   MallocHook::InvokeNewHook(ptr, size);
   return ptr;
 }
 
-extern "C" PERFTOOLS_DLL_DECL void tc_delete(void* p) __THROW {
+extern "C" PERFTOOLS_DLL_DECL void tc_delete(void* p) PERFTOOLS_NOTHROW {
   MallocHook::InvokeDeleteHook(p);
-  DebugDeallocate(p, MallocBlock::kNewType);
+  DebugDeallocate(p, MallocBlock::kNewType, 0);
+  force_frame();
+}
+
+extern "C" PERFTOOLS_DLL_DECL void tc_delete_sized(void* p, size_t size) PERFTOOLS_NOTHROW {
+  MallocHook::InvokeDeleteHook(p);
+  DebugDeallocate(p, MallocBlock::kNewType, size);
+  force_frame();
 }
 
 // Some STL implementations explicitly invoke this.
 // It is completely equivalent to a normal delete (delete never throws).
-extern "C" PERFTOOLS_DLL_DECL void tc_delete_nothrow(void* p, const std::nothrow_t&) __THROW {
+extern "C" PERFTOOLS_DLL_DECL void tc_delete_nothrow(void* p, const std::nothrow_t&) PERFTOOLS_NOTHROW {
   MallocHook::InvokeDeleteHook(p);
-  DebugDeallocate(p, MallocBlock::kNewType);
+  DebugDeallocate(p, MallocBlock::kNewType, 0);
+  force_frame();
 }
 
 extern "C" PERFTOOLS_DLL_DECL void* tc_newarray(size_t size) {
   void* ptr = debug_cpp_alloc(size, MallocBlock::kArrayNewType, false);
   MallocHook::InvokeNewHook(ptr, size);
   if (ptr == NULL) {
-    RAW_LOG(FATAL, "Unable to allocate %" PRIuS " bytes: new[] failed.", size);
+    RAW_LOG(FATAL, "Unable to allocate %zu bytes: new[] failed.", size);
   }
   return ptr;
 }
 
 extern "C" PERFTOOLS_DLL_DECL void* tc_newarray_nothrow(size_t size, const std::nothrow_t&)
-    __THROW {
+    PERFTOOLS_NOTHROW {
   void* ptr = debug_cpp_alloc(size, MallocBlock::kArrayNewType, true);
   MallocHook::InvokeNewHook(ptr, size);
   return ptr;
 }
 
-extern "C" PERFTOOLS_DLL_DECL void tc_deletearray(void* p) __THROW {
+extern "C" PERFTOOLS_DLL_DECL void tc_deletearray(void* p) PERFTOOLS_NOTHROW {
   MallocHook::InvokeDeleteHook(p);
-  DebugDeallocate(p, MallocBlock::kArrayNewType);
+  DebugDeallocate(p, MallocBlock::kArrayNewType, 0);
+  force_frame();
+}
+
+extern "C" PERFTOOLS_DLL_DECL void tc_deletearray_sized(void* p, size_t size) PERFTOOLS_NOTHROW {
+  MallocHook::InvokeDeleteHook(p);
+  DebugDeallocate(p, MallocBlock::kArrayNewType, size);
+  force_frame();
 }
 
 // Some STL implementations explicitly invoke this.
 // It is completely equivalent to a normal delete (delete never throws).
-extern "C" PERFTOOLS_DLL_DECL void tc_deletearray_nothrow(void* p, const std::nothrow_t&) __THROW {
+extern "C" PERFTOOLS_DLL_DECL void tc_deletearray_nothrow(void* p, const std::nothrow_t&) PERFTOOLS_NOTHROW {
   MallocHook::InvokeDeleteHook(p);
-  DebugDeallocate(p, MallocBlock::kArrayNewType);
+  DebugDeallocate(p, MallocBlock::kArrayNewType, 0);
+  force_frame();
 }
 
 // This is mostly the same as do_memalign in tcmalloc.cc.
-static void *do_debug_memalign(size_t alignment, size_t size) {
+static void *do_debug_memalign(size_t alignment, size_t size, int type) {
   // Allocate >= size bytes aligned on "alignment" boundary
   // "alignment" is a power of two.
   void *p = 0;
@@ -1341,7 +1403,7 @@
   // a further data_offset bytes for an additional fake header.
   size_t extra_bytes = data_offset + alignment - 1;
   if (size + extra_bytes < size) return NULL;         // Overflow
-  p = DebugAllocate(size + extra_bytes, MallocBlock::kMallocType);
+  p = DebugAllocate(size + extra_bytes, type);
   if (p != 0) {
     intptr_t orig_p = reinterpret_cast<intptr_t>(p);
     // Leave data_offset bytes for fake header, and round up to meet
@@ -1366,16 +1428,21 @@
 struct memalign_retry_data {
   size_t align;
   size_t size;
+  int type;
 };
 
 static void *retry_debug_memalign(void *arg) {
   memalign_retry_data *data = static_cast<memalign_retry_data *>(arg);
-  return do_debug_memalign(data->align, data->size);
+  return do_debug_memalign(data->align, data->size, data->type);
 }
 
+ATTRIBUTE_ALWAYS_INLINE
 inline void* do_debug_memalign_or_debug_cpp_memalign(size_t align,
-                                                     size_t size) {
-  void* p = do_debug_memalign(align, size);
+                                                     size_t size,
+                                                     int type,
+                                                     bool from_operator,
+                                                     bool nothrow) {
+  void* p = do_debug_memalign(align, size, type);
   if (p != NULL) {
     return p;
   }
@@ -1383,26 +1450,27 @@
   struct memalign_retry_data data;
   data.align = align;
   data.size = size;
+  data.type = type;
   return handle_oom(retry_debug_memalign, &data,
-                    false, true);
+                    from_operator, nothrow);
 }
 
-extern "C" PERFTOOLS_DLL_DECL void* tc_memalign(size_t align, size_t size) __THROW {
-  void *p = do_debug_memalign_or_debug_cpp_memalign(align, size);
+extern "C" PERFTOOLS_DLL_DECL void* tc_memalign(size_t align, size_t size) PERFTOOLS_NOTHROW {
+  void *p = do_debug_memalign_or_debug_cpp_memalign(align, size, MallocBlock::kMallocType, false, true);
   MallocHook::InvokeNewHook(p, size);
   return p;
 }
 
 // Implementation taken from tcmalloc/tcmalloc.cc
 extern "C" PERFTOOLS_DLL_DECL int tc_posix_memalign(void** result_ptr, size_t align, size_t size)
-    __THROW {
+    PERFTOOLS_NOTHROW {
   if (((align % sizeof(void*)) != 0) ||
       ((align & (align - 1)) != 0) ||
       (align == 0)) {
     return EINVAL;
   }
 
-  void* result = do_debug_memalign_or_debug_cpp_memalign(align, size);
+  void* result = do_debug_memalign_or_debug_cpp_memalign(align, size, MallocBlock::kMallocType, false, true);
   MallocHook::InvokeNewHook(result, size);
   if (result == NULL) {
     return ENOMEM;
@@ -1412,14 +1480,14 @@
   }
 }
 
-extern "C" PERFTOOLS_DLL_DECL void* tc_valloc(size_t size) __THROW {
+extern "C" PERFTOOLS_DLL_DECL void* tc_valloc(size_t size) PERFTOOLS_NOTHROW {
   // Allocate >= size bytes starting on a page boundary
-  void *p = do_debug_memalign_or_debug_cpp_memalign(getpagesize(), size);
+  void *p = do_debug_memalign_or_debug_cpp_memalign(getpagesize(), size, MallocBlock::kMallocType, false, true);
   MallocHook::InvokeNewHook(p, size);
   return p;
 }
 
-extern "C" PERFTOOLS_DLL_DECL void* tc_pvalloc(size_t size) __THROW {
+extern "C" PERFTOOLS_DLL_DECL void* tc_pvalloc(size_t size) PERFTOOLS_NOTHROW {
   // Round size up to a multiple of pages
   // then allocate memory on a page boundary
   int pagesize = getpagesize();
@@ -1427,31 +1495,93 @@
   if (size == 0) {     // pvalloc(0) should allocate one page, according to
     size = pagesize;   // http://man.free4web.biz/man3/libmpatrol.3.html
   }
-  void *p = do_debug_memalign_or_debug_cpp_memalign(pagesize, size);
+  void *p = do_debug_memalign_or_debug_cpp_memalign(pagesize, size, MallocBlock::kMallocType, false, true);
   MallocHook::InvokeNewHook(p, size);
   return p;
 }
 
+#if defined(ENABLE_ALIGNED_NEW_DELETE)
+
+extern "C" PERFTOOLS_DLL_DECL void* tc_new_aligned(size_t size, std::align_val_t align) {
+  void* result = do_debug_memalign_or_debug_cpp_memalign(static_cast<size_t>(align), size, MallocBlock::kNewType, true, false);
+  MallocHook::InvokeNewHook(result, size);
+  return result;
+}
+
+extern "C" PERFTOOLS_DLL_DECL void* tc_new_aligned_nothrow(size_t size, std::align_val_t align, const std::nothrow_t&) PERFTOOLS_NOTHROW {
+  void* result = do_debug_memalign_or_debug_cpp_memalign(static_cast<size_t>(align), size, MallocBlock::kNewType, true, true);
+  MallocHook::InvokeNewHook(result, size);
+  return result;
+}
+
+extern "C" PERFTOOLS_DLL_DECL void tc_delete_aligned(void* p, std::align_val_t) PERFTOOLS_NOTHROW {
+  tc_delete(p);
+}
+
+extern "C" PERFTOOLS_DLL_DECL void tc_delete_sized_aligned(void* p, size_t size, std::align_val_t align) PERFTOOLS_NOTHROW {
+  // Reproduce actual size calculation done by do_debug_memalign
+  const size_t alignment = static_cast<size_t>(align);
+  const size_t data_offset = MallocBlock::data_offset();
+  const size_t extra_bytes = data_offset + alignment - 1;
+
+  tc_delete_sized(p, size + extra_bytes);
+}
+
+extern "C" PERFTOOLS_DLL_DECL void tc_delete_aligned_nothrow(void* p, std::align_val_t, const std::nothrow_t&) PERFTOOLS_NOTHROW {
+  tc_delete(p);
+}
+
+extern "C" PERFTOOLS_DLL_DECL void* tc_newarray_aligned(size_t size, std::align_val_t align) {
+  void* result = do_debug_memalign_or_debug_cpp_memalign(static_cast<size_t>(align), size, MallocBlock::kArrayNewType, true, false);
+  MallocHook::InvokeNewHook(result, size);
+  return result;
+}
+
+extern "C" PERFTOOLS_DLL_DECL void* tc_newarray_aligned_nothrow(size_t size, std::align_val_t align, const std::nothrow_t& nt) PERFTOOLS_NOTHROW {
+  void* result = do_debug_memalign_or_debug_cpp_memalign(static_cast<size_t>(align), size, MallocBlock::kArrayNewType, true, true);
+  MallocHook::InvokeNewHook(result, size);
+  return result;
+}
+
+extern "C" PERFTOOLS_DLL_DECL void tc_deletearray_aligned(void* p, std::align_val_t) PERFTOOLS_NOTHROW {
+  tc_deletearray(p);
+}
+
+extern "C" PERFTOOLS_DLL_DECL void tc_deletearray_sized_aligned(void* p, size_t size, std::align_val_t align) PERFTOOLS_NOTHROW {
+  // Reproduce actual size calculation done by do_debug_memalign
+  const size_t alignment = static_cast<size_t>(align);
+  const size_t data_offset = MallocBlock::data_offset();
+  const size_t extra_bytes = data_offset + alignment - 1;
+
+  tc_deletearray_sized(p, size + extra_bytes);
+}
+
+extern "C" PERFTOOLS_DLL_DECL void tc_deletearray_aligned_nothrow(void* p, std::align_val_t, const std::nothrow_t&) PERFTOOLS_NOTHROW {
+  tc_deletearray(p);
+}
+
+#endif // defined(ENABLE_ALIGNED_NEW_DELETE)
+
 // malloc_stats just falls through to the base implementation.
-extern "C" PERFTOOLS_DLL_DECL void tc_malloc_stats(void) __THROW {
+extern "C" PERFTOOLS_DLL_DECL void tc_malloc_stats(void) PERFTOOLS_NOTHROW {
   do_malloc_stats();
 }
 
-extern "C" PERFTOOLS_DLL_DECL int tc_mallopt(int cmd, int value) __THROW {
+extern "C" PERFTOOLS_DLL_DECL int tc_mallopt(int cmd, int value) PERFTOOLS_NOTHROW {
   return do_mallopt(cmd, value);
 }
 
 #ifdef HAVE_STRUCT_MALLINFO
-extern "C" PERFTOOLS_DLL_DECL struct mallinfo tc_mallinfo(void) __THROW {
+extern "C" PERFTOOLS_DLL_DECL struct mallinfo tc_mallinfo(void) PERFTOOLS_NOTHROW {
   return do_mallinfo();
 }
 #endif
 
-extern "C" PERFTOOLS_DLL_DECL size_t tc_malloc_size(void* ptr) __THROW {
+extern "C" PERFTOOLS_DLL_DECL size_t tc_malloc_size(void* ptr) PERFTOOLS_NOTHROW {
   return MallocExtension::instance()->GetAllocatedSize(ptr);
 }
 
-extern "C" PERFTOOLS_DLL_DECL void* tc_malloc_skip_new_handler(size_t size) __THROW {
+extern "C" PERFTOOLS_DLL_DECL void* tc_malloc_skip_new_handler(size_t size) PERFTOOLS_NOTHROW {
   void* result = DebugAllocate(size, MallocBlock::kMallocType);
   MallocHook::InvokeNewHook(result, size);
   return result;

diff --git a/src/emergency_malloc.cc b/src/emergency_malloc.cc
new file mode 100644
index 0000000..6c0946a
--- /dev/null
+++ b/src/emergency_malloc.cc

@@ -0,0 +1,169 @@
+// -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
+// Copyright (c) 2014, gperftools Contributors
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+//     * Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//     * Redistributions in binary form must reproduce the above
+// copyright notice, this list of conditions and the following disclaimer
+// in the documentation and/or other materials provided with the
+// distribution.
+//     * Neither the name of Google Inc. nor the names of its
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+
+#include "config.h"
+
+#include "emergency_malloc.h"
+
+#include <errno.h>                      // for ENOMEM, errno
+#include <string.h>                     // for memset, memcpy
+
+#include "base/basictypes.h"
+#include "base/logging.h"
+#include "base/low_level_alloc.h"
+#include "base/spinlock.h"
+#include "internal_logging.h"
+
+
+namespace tcmalloc {
+  __attribute__ ((visibility("internal"))) char *emergency_arena_start;
+  __attribute__ ((visibility("internal"))) uintptr_t emergency_arena_start_shifted;
+
+  static CACHELINE_ALIGNED SpinLock emergency_malloc_lock(base::LINKER_INITIALIZED);
+  static char *emergency_arena_end;
+  static LowLevelAlloc::Arena *emergency_arena;
+
+  class EmergencyArenaPagesAllocator : public LowLevelAlloc::PagesAllocator {
+    ~EmergencyArenaPagesAllocator() {}
+    void *MapPages(int32 flags, size_t size) {
+      char *new_end = emergency_arena_end + size;
+      if (new_end > emergency_arena_start + kEmergencyArenaSize) {
+        RAW_LOG(FATAL, "Unable to allocate %zu bytes in emergency zone.", size);
+      }
+      char *rv = emergency_arena_end;
+      emergency_arena_end = new_end;
+      return static_cast<void *>(rv);
+    }
+    void UnMapPages(int32 flags, void *addr, size_t size) {
+      RAW_LOG(FATAL, "UnMapPages is not implemented for emergency arena");
+    }
+  };
+
+  static union {
+    char bytes[sizeof(EmergencyArenaPagesAllocator)];
+    void *ptr;
+  } pages_allocator_place;
+
+  static void InitEmergencyMalloc(void) {
+    const int32 flags = LowLevelAlloc::kAsyncSignalSafe;
+
+    void *arena = LowLevelAlloc::GetDefaultPagesAllocator()->MapPages(flags, kEmergencyArenaSize * 2);
+
+    uintptr_t arena_ptr = reinterpret_cast<uintptr_t>(arena);
+    uintptr_t ptr = (arena_ptr + kEmergencyArenaSize - 1) & ~(kEmergencyArenaSize-1);
+
+    emergency_arena_end = emergency_arena_start = reinterpret_cast<char *>(ptr);
+    EmergencyArenaPagesAllocator *allocator = new (pages_allocator_place.bytes) EmergencyArenaPagesAllocator();
+    emergency_arena = LowLevelAlloc::NewArenaWithCustomAlloc(0, LowLevelAlloc::DefaultArena(), allocator);
+
+    emergency_arena_start_shifted = reinterpret_cast<uintptr_t>(emergency_arena_start) >> kEmergencyArenaShift;
+
+    uintptr_t head_unmap_size = ptr - arena_ptr;
+    CHECK_CONDITION(head_unmap_size < kEmergencyArenaSize);
+    if (head_unmap_size != 0) {
+      LowLevelAlloc::GetDefaultPagesAllocator()->UnMapPages(flags, arena, ptr - arena_ptr);
+    }
+
+    uintptr_t tail_unmap_size = kEmergencyArenaSize - head_unmap_size;
+    void *tail_start = reinterpret_cast<void *>(arena_ptr + head_unmap_size + kEmergencyArenaSize);
+    LowLevelAlloc::GetDefaultPagesAllocator()->UnMapPages(flags, tail_start, tail_unmap_size);
+  }
+
+  PERFTOOLS_DLL_DECL void *EmergencyMalloc(size_t size) {
+    SpinLockHolder l(&emergency_malloc_lock);
+
+    if (emergency_arena_start == NULL) {
+      InitEmergencyMalloc();
+      CHECK_CONDITION(emergency_arena_start != NULL);
+    }
+
+    void *rv = LowLevelAlloc::AllocWithArena(size, emergency_arena);
+    if (rv == NULL) {
+      errno = ENOMEM;
+    }
+    return rv;
+  }
+
+  PERFTOOLS_DLL_DECL void EmergencyFree(void *p) {
+    SpinLockHolder l(&emergency_malloc_lock);
+    if (emergency_arena_start == NULL) {
+      InitEmergencyMalloc();
+      CHECK_CONDITION(emergency_arena_start != NULL);
+      free(p);
+      return;
+    }
+    CHECK_CONDITION(emergency_arena_start);
+    LowLevelAlloc::Free(p);
+  }
+
+  PERFTOOLS_DLL_DECL void *EmergencyRealloc(void *_old_ptr, size_t new_size) {
+    if (_old_ptr == NULL) {
+      return EmergencyMalloc(new_size);
+    }
+    if (new_size == 0) {
+      EmergencyFree(_old_ptr);
+      return NULL;
+    }
+    SpinLockHolder l(&emergency_malloc_lock);
+    CHECK_CONDITION(emergency_arena_start);
+
+    char *old_ptr = static_cast<char *>(_old_ptr);
+    CHECK_CONDITION(old_ptr <= emergency_arena_end);
+    CHECK_CONDITION(emergency_arena_start <= old_ptr);
+
+    // NOTE: we don't know the previous size of the old_ptr chunk. So instead
+    // of trying to figure out the right size to copy, we just copy the
+    // largest possible size. We don't care about being slow.
+    size_t old_ptr_size = emergency_arena_end - old_ptr;
+    size_t copy_size = (new_size < old_ptr_size) ? new_size : old_ptr_size;
+
+    void *new_ptr = LowLevelAlloc::AllocWithArena(new_size, emergency_arena);
+    if (new_ptr == NULL) {
+      errno = ENOMEM;
+      return NULL;
+    }
+    memcpy(new_ptr, old_ptr, copy_size);
+
+    LowLevelAlloc::Free(old_ptr);
+    return new_ptr;
+  }
+
+  PERFTOOLS_DLL_DECL void *EmergencyCalloc(size_t n, size_t elem_size) {
+    // Overflow check
+    const size_t size = n * elem_size;
+    if (elem_size != 0 && size / elem_size != n) return NULL;
+    void *rv = EmergencyMalloc(size);
+    if (rv != NULL) {
+      memset(rv, 0, size);
+    }
+    return rv;
+  }
+} // namespace tcmalloc

diff --git a/src/emergency_malloc.h b/src/emergency_malloc.h
new file mode 100644
index 0000000..8a82cfc
--- /dev/null
+++ b/src/emergency_malloc.h

@@ -0,0 +1,60 @@
+// -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
+// Copyright (c) 2014, gperftools Contributors
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+//     * Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//     * Redistributions in binary form must reproduce the above
+// copyright notice, this list of conditions and the following disclaimer
+// in the documentation and/or other materials provided with the
+// distribution.
+//     * Neither the name of Google Inc. nor the names of its
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+#ifndef EMERGENCY_MALLOC_H
+#define EMERGENCY_MALLOC_H
+#include "config.h"
+
+#include <stddef.h>
+
+#include "base/basictypes.h"
+#include "common.h"
+
+namespace tcmalloc {
+  static const uintptr_t kEmergencyArenaShift = 20+4; // 16 megs
+  static const uintptr_t kEmergencyArenaSize = 1 << kEmergencyArenaShift;
+
+  extern __attribute__ ((visibility("internal"))) char *emergency_arena_start;
+  extern __attribute__ ((visibility("internal"))) uintptr_t emergency_arena_start_shifted;
+
+  PERFTOOLS_DLL_DECL void *EmergencyMalloc(size_t size);
+  PERFTOOLS_DLL_DECL void EmergencyFree(void *p);
+  PERFTOOLS_DLL_DECL void *EmergencyCalloc(size_t n, size_t elem_size);
+  PERFTOOLS_DLL_DECL void *EmergencyRealloc(void *old_ptr, size_t new_size);
+
+  static inline bool IsEmergencyPtr(const void *_ptr) {
+    uintptr_t ptr = reinterpret_cast<uintptr_t>(_ptr);
+    return PREDICT_FALSE((ptr >> kEmergencyArenaShift) == emergency_arena_start_shifted)
+      && emergency_arena_start_shifted;
+  }
+
+} // namespace tcmalloc
+
+#endif

diff --git a/src/emergency_malloc_for_stacktrace.cc b/src/emergency_malloc_for_stacktrace.cc
new file mode 100644
index 0000000..f1dc35e
--- /dev/null
+++ b/src/emergency_malloc_for_stacktrace.cc

@@ -0,0 +1,48 @@
+// -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
+// Copyright (c) 2014, gperftools Contributors
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+//     * Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//     * Redistributions in binary form must reproduce the above
+// copyright notice, this list of conditions and the following disclaimer
+// in the documentation and/or other materials provided with the
+// distribution.
+//     * Neither the name of Google Inc. nor the names of its
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+#include "emergency_malloc.h"
+#include "thread_cache.h"
+
+namespace tcmalloc {
+  bool EnterStacktraceScope(void);
+  void LeaveStacktraceScope(void);
+}
+
+bool tcmalloc::EnterStacktraceScope(void) {
+  if (ThreadCache::IsUseEmergencyMalloc()) {
+    return false;
+  }
+  ThreadCache::SetUseEmergencyMalloc();
+  return true;
+}
+
+void tcmalloc::LeaveStacktraceScope(void) {
+  ThreadCache::ResetUseEmergencyMalloc();
+}

diff --git a/src/fake_stacktrace_scope.cc b/src/fake_stacktrace_scope.cc
new file mode 100644
index 0000000..ee35a04
--- /dev/null
+++ b/src/fake_stacktrace_scope.cc

@@ -0,0 +1,39 @@
+// -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
+// Copyright (c) 2014, gperftools Contributors
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+//     * Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//     * Redistributions in binary form must reproduce the above
+// copyright notice, this list of conditions and the following disclaimer
+// in the documentation and/or other materials provided with the
+// distribution.
+//     * Neither the name of Google Inc. nor the names of its
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+#include "base/basictypes.h"
+
+namespace tcmalloc {
+  ATTRIBUTE_WEAK bool EnterStacktraceScope(void) {
+    return true;
+  }
+  ATTRIBUTE_WEAK void LeaveStacktraceScope(void) {
+  }
+}

diff --git a/src/getenv_safe.h b/src/getenv_safe.h
index 3b9f4db..59094b1 100644
--- a/src/getenv_safe.h
+++ b/src/getenv_safe.h

@@ -1,11 +1,11 @@
 /* -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
  * Copyright (c) 2014, gperftools Contributors
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -36,7 +36,7 @@
 extern "C" {
 #endif
 
-/* 
+/*
  * This getenv function is safe to call before the C runtime is initialized.
  * On Windows, it utilizes GetEnvironmentVariable() and on unix it uses
  * /proc/self/environ instead calling getenv().  It's intended to be used in
@@ -50,7 +50,7 @@
  * Note that on unix, /proc only has the environment at the time the
  * application was started, so this routine ignores setenv() calls/etc.  Also
  * note it only reads the first 16K of the environment.
- * 
+ *
  * NOTE: this is version of GetenvBeforeMain that's usable from
  * C. Implementation is in sysinfo.cc
  */

diff --git a/src/getpc.h b/src/getpc.h
index 25fee39..9605363 100644
--- a/src/getpc.h
+++ b/src/getpc.h

@@ -56,6 +56,9 @@
 //#define _XOPEN_SOURCE 500
 
 #include <string.h>         // for memcmp
+#ifdef HAVE_ASM_PTRACE_H
+#include <asm/ptrace.h>
+#endif
 #if defined(HAVE_SYS_UCONTEXT_H)
 #include <sys/ucontext.h>
 #elif defined(HAVE_UCONTEXT_H)
@@ -179,7 +182,12 @@
 // configure.ac (or set it manually in your config.h).
 #else
 inline void* GetPC(const ucontext_t& signal_ucontext) {
+#if defined(__s390__) && !defined(__s390x__)
+  // Mask out the AMODE31 bit from the PC recorded in the context.
+  return (void*)((unsigned long)signal_ucontext.PC_FROM_UCONTEXT & 0x7fffffffUL);
+#else
   return (void*)signal_ucontext.PC_FROM_UCONTEXT;   // defined in config.h
+#endif
 }
 
 #endif

diff --git a/src/google/heap-checker.h b/src/google/heap-checker.h
index 7cacf1f..6b9ffe5 100644
--- a/src/google/heap-checker.h
+++ b/src/google/heap-checker.h

@@ -30,7 +30,7 @@
 /* The code has moved to gperftools/.  Use that include-directory for
  * new code.
  */
-#ifdef __GNUC__
+#if defined(__GNUC__) && !defined(GPERFTOOLS_SUPPRESS_LEGACY_WARNING)
 #warning "google/heap-checker.h is deprecated. Use gperftools/heap-checker.h instead"
 #endif
 #include <gperftools/heap-checker.h>

diff --git a/src/google/heap-profiler.h b/src/google/heap-profiler.h
index 3fc26cf..0c46f63 100644
--- a/src/google/heap-profiler.h
+++ b/src/google/heap-profiler.h

@@ -1,10 +1,10 @@
 /* Copyright (c) 2005, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -14,7 +14,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -31,7 +31,7 @@
 /* The code has moved to gperftools/.  Use that include-directory for
  * new code.
  */
-#ifdef __GNUC__
+#if defined(__GNUC__) && !defined(GPERFTOOLS_SUPPRESS_LEGACY_WARNING)
 #warning "google/heap-profiler.h is deprecated. Use gperftools/heap-profiler.h instead"
 #endif
 #include <gperftools/heap-profiler.h>

diff --git a/src/google/malloc_extension.h b/src/google/malloc_extension.h
index 7cacc34..ad34dec 100644
--- a/src/google/malloc_extension.h
+++ b/src/google/malloc_extension.h

@@ -1,10 +1,10 @@
 // Copyright (c) 2005, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -14,7 +14,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -30,7 +30,7 @@
 /* The code has moved to gperftools/.  Use that include-directory for
  * new code.
  */
-#ifdef __GNUC__
+#if defined(__GNUC__) && !defined(GPERFTOOLS_SUPPRESS_LEGACY_WARNING)
 #warning "google/malloc_extension.h is deprecated. Use gperftools/malloc_extension.h instead"
 #endif
 #include <gperftools/malloc_extension.h>

diff --git a/src/google/malloc_extension_c.h b/src/google/malloc_extension_c.h
index f34a835..9141805 100644
--- a/src/google/malloc_extension_c.h
+++ b/src/google/malloc_extension_c.h

@@ -1,10 +1,10 @@
 /* Copyright (c) 2008, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -14,7 +14,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -31,7 +31,7 @@
 /* The code has moved to gperftools/.  Use that include-directory for
  * new code.
  */
-#ifdef __GNUC__
+#if defined(__GNUC__) && !defined(GPERFTOOLS_SUPPRESS_LEGACY_WARNING)
 #warning "google/malloc_extension_c.h is deprecated. Use gperftools/malloc_extension_c.h instead"
 #endif
 #include <gperftools/malloc_extension_c.h>

diff --git a/src/google/malloc_hook.h b/src/google/malloc_hook.h
index 371aba4..416283b 100644
--- a/src/google/malloc_hook.h
+++ b/src/google/malloc_hook.h

@@ -1,10 +1,10 @@
 // Copyright (c) 2005, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -14,7 +14,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -30,7 +30,7 @@
 /* The code has moved to gperftools/.  Use that include-directory for
  * new code.
  */
-#ifdef __GNUC__
+#if defined(__GNUC__) && !defined(GPERFTOOLS_SUPPRESS_LEGACY_WARNING)
 #warning "google/malloc_hook.h is deprecated. Use gperftools/malloc_hook.h instead"
 #endif
 #include <gperftools/malloc_hook.h>

diff --git a/src/google/malloc_hook_c.h b/src/google/malloc_hook_c.h
index f882c16..1fa1a4a 100644
--- a/src/google/malloc_hook_c.h
+++ b/src/google/malloc_hook_c.h

@@ -1,10 +1,10 @@
 /* Copyright (c) 2008, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -14,7 +14,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -31,7 +31,7 @@
 /* The code has moved to gperftools/.  Use that include-directory for
  * new code.
  */
-#ifdef __GNUC__
+#if defined(__GNUC__) && !defined(GPERFTOOLS_SUPPRESS_LEGACY_WARNING)
 #warning "google/malloc_hook_c.h is deprecated. Use gperftools/malloc_hook_c.h instead"
 #endif
 #include <gperftools/malloc_hook_c.h>

diff --git a/src/google/profiler.h b/src/google/profiler.h
index 3674c9e..2f99679 100644
--- a/src/google/profiler.h
+++ b/src/google/profiler.h

@@ -1,10 +1,10 @@
 /* Copyright (c) 2005, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -14,7 +14,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -31,7 +31,7 @@
 /* The code has moved to gperftools/.  Use that include-directory for
  * new code.
  */
-#ifdef __GNUC__
+#if defined(__GNUC__) && !defined(GPERFTOOLS_SUPPRESS_LEGACY_WARNING)
 #warning "google/profiler.h is deprecated. Use gperftools/profiler.h instead"
 #endif
 #include <gperftools/profiler.h>

diff --git a/src/google/stacktrace.h b/src/google/stacktrace.h
index 53d2947..829b303 100644
--- a/src/google/stacktrace.h
+++ b/src/google/stacktrace.h

@@ -1,10 +1,10 @@
 // Copyright (c) 2005, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -14,7 +14,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -30,7 +30,7 @@
 /* The code has moved to gperftools/.  Use that include-directory for
  * new code.
  */
-#ifdef __GNUC__
+#if defined(__GNUC__) && !defined(GPERFTOOLS_SUPPRESS_LEGACY_WARNING)
 #warning "google/stacktrace.h is deprecated. Use gperftools/stacktrace.h instead"
 #endif
 #include <gperftools/stacktrace.h>

diff --git a/src/google/tcmalloc.h b/src/google/tcmalloc.h
index a2db70e..ee8bb15 100644
--- a/src/google/tcmalloc.h
+++ b/src/google/tcmalloc.h

@@ -1,10 +1,10 @@
 /* Copyright (c) 2003, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -14,7 +14,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -31,7 +31,7 @@
 /* The code has moved to gperftools/.  Use that include-directory for
  * new code.
  */
-#ifdef __GNUC__
+#if defined(__GNUC__) && !defined(GPERFTOOLS_SUPPRESS_LEGACY_WARNING)
 #warning "google/tcmalloc.h is deprecated. Use gperftools/tcmalloc.h instead"
 #endif
 #include <gperftools/tcmalloc.h>

diff --git a/src/gperftools/heap-checker.h b/src/gperftools/heap-checker.h
index 5a87d8d..edd6cc7 100644
--- a/src/gperftools/heap-checker.h
+++ b/src/gperftools/heap-checker.h

@@ -34,7 +34,7 @@
 //
 // Module for detecing heap (memory) leaks.
 //
-// For full(er) information, see doc/heap_checker.html
+// For full(er) information, see docs/heap_checker.html
 //
 // This module can be linked into programs with
 // no slowdown caused by this unless you activate the leak-checker:

diff --git a/src/gperftools/heap-profiler.h b/src/gperftools/heap-profiler.h
index 9b67364..f8076e9 100644
--- a/src/gperftools/heap-profiler.h
+++ b/src/gperftools/heap-profiler.h

@@ -1,11 +1,11 @@
-// -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
+/* -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*- */
 /* Copyright (c) 2005, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -33,7 +33,7 @@
  *
  * Module for heap-profiling.
  *
- * For full(er) information, see doc/heapprofile.html
+ * For full(er) information, see docs/heapprofile.html
  *
  * This module can be linked into your program with
  * no slowdown caused by this unless you activate the profiler

diff --git a/src/gperftools/malloc_extension.h b/src/gperftools/malloc_extension.h
index 95b35cb..d203394 100644
--- a/src/gperftools/malloc_extension.h
+++ b/src/gperftools/malloc_extension.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2005, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -107,8 +107,12 @@
   virtual bool MallocMemoryStats(int* blocks, size_t* total,
                                  int histogram[kMallocHistogramSize]);
 
-  // Get a human readable description of the current state of the malloc
-  // data structures.  The state is stored as a null-terminated string
+  // Get a human readable description of the following malloc data structures.
+  // - Total inuse memory by application.
+  // - Free memory (thread, central and page heap).
+  // - Freelist of central cache, each class.
+  // - Page heap freelist.
+  // The state is stored as a null-terminated string
   // in a prefix of "buffer[0,buffer_length-1]".
   // REQUIRES: buffer_length > 0.
   virtual void GetStats(char* buffer, int buffer_length);
@@ -119,6 +123,10 @@
   // therefore be passed to "pprof". This function is equivalent to
   // ReadStackTraces. The main difference is that this function returns
   // serialized data appropriately formatted for use by the pprof tool.
+  //
+  // Since gperftools 2.8, heap samples are no longer de-duplicated by
+  // the library.
+  //
   // NOTE: by default, tcmalloc does not do any heap sampling, and this
   //       function will always return an empty sample.  To get useful
   //       data from GetHeapSample, you must also set the environment
@@ -160,6 +168,14 @@
   //            freed memory regions
   //      This property is not writable.
   //
+  //  "generic.total_physical_bytes"
+  //      Estimate of total bytes of the physical memory usage by the
+  //      allocator ==
+  //            current_allocated_bytes +
+  //            fragmentation +
+  //            metadata
+  //      This property is not writable.
+  //
   // tcmalloc
   // --------
   // "tcmalloc.max_total_thread_cache_bytes"
@@ -391,6 +407,15 @@
   // Like ReadStackTraces(), but returns stack traces that caused growth
   // in the address space size.
   virtual void** ReadHeapGrowthStackTraces();
+
+  // Returns the size in bytes of the calling thread's cache.
+  virtual size_t GetThreadCacheSize();
+
+  // Like MarkThreadIdle, but does not destroy the internal data
+  // structures of the thread cache. When the thread resumes, it will
+  // have an empty cache but will not need to pay to reconstruct the
+  // cache data structures.
+  virtual void MarkThreadTemporarilyIdle();
 };
 
 namespace base {

diff --git a/src/gperftools/malloc_extension_c.h b/src/gperftools/malloc_extension_c.h
index baa013d..a43534f 100644
--- a/src/gperftools/malloc_extension_c.h
+++ b/src/gperftools/malloc_extension_c.h

@@ -1,10 +1,10 @@
 /* Copyright (c) 2008, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -14,7 +14,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -79,6 +79,8 @@
 PERFTOOLS_DLL_DECL void MallocExtension_ReleaseFreeMemory(void);
 PERFTOOLS_DLL_DECL size_t MallocExtension_GetEstimatedAllocatedSize(size_t size);
 PERFTOOLS_DLL_DECL size_t MallocExtension_GetAllocatedSize(const void* p);
+PERFTOOLS_DLL_DECL size_t MallocExtension_GetThreadCacheSize(void);
+PERFTOOLS_DLL_DECL void MallocExtension_MarkThreadTemporarilyIdle(void);
 
 /*
  * NOTE: These enum values MUST be kept in sync with the version in

diff --git a/src/gperftools/malloc_hook.h b/src/gperftools/malloc_hook.h
index 9d56fb1..ab655f6 100644
--- a/src/gperftools/malloc_hook.h
+++ b/src/gperftools/malloc_hook.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2005, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -70,7 +70,7 @@
 #include <stddef.h>
 #include <sys/types.h>
 extern "C" {
-#include <gperftools/malloc_hook_c.h>  // a C version of the malloc_hook interface
+#include "malloc_hook_c.h"  // a C version of the malloc_hook interface
 }
 
 // Annoying stuff for windows -- makes sure clients can import these functions

diff --git a/src/gperftools/malloc_hook_c.h b/src/gperftools/malloc_hook_c.h
index 56337e1..5cee782 100644
--- a/src/gperftools/malloc_hook_c.h
+++ b/src/gperftools/malloc_hook_c.h

@@ -1,10 +1,10 @@
 /* Copyright (c) 2008, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -14,7 +14,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/gperftools/nallocx.h b/src/gperftools/nallocx.h
new file mode 100644
index 0000000..01f874c
--- /dev/null
+++ b/src/gperftools/nallocx.h

@@ -0,0 +1,37 @@
+#ifndef _NALLOCX_H_
+#define _NALLOCX_H_
+#include <stddef.h>
+
+#ifndef PERFTOOLS_DLL_DECL
+# ifdef _WIN32
+#  define PERFTOOLS_DLL_DECL  __declspec(dllimport)
+# else
+#  define PERFTOOLS_DLL_DECL
+# endif
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define MALLOCX_LG_ALIGN(la) ((int)(la))
+
+/*
+ * The nallocx function allocates no memory, but it performs the same size
+ * computation as the malloc function, and returns the real size of the
+ * allocation that would result from the equivalent malloc function call.
+ * nallocx is a malloc extension originally implemented by jemalloc:
+ * http://www.unix.com/man-page/freebsd/3/nallocx/
+ *
+ * Note, we only support MALLOCX_LG_ALIGN flag and nothing else.
+ */
+PERFTOOLS_DLL_DECL size_t nallocx(size_t size, int flags);
+
+/* same as above but never weak */
+PERFTOOLS_DLL_DECL size_t tc_nallocx(size_t size, int flags);
+
+#ifdef __cplusplus
+}   /* extern "C" */
+#endif
+
+#endif /* _NALLOCX_H_ */
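
Note on src/gperftools/nallocx.h: the header says only MALLOCX_LG_ALIGN is supported, so the flags argument is just the base-2 logarithm of the requested alignment. The following is a minimal sketch of building that value, assuming a power-of-two alignment. LgAlignFor and NallocxFlagsFor are hypothetical helpers and are not part of gperftools; the macro is copied from the header so the snippet compiles without the library.

```cpp
#include <cstddef>

// Copied from the new nallocx.h so this sketch compiles without gperftools.
#ifndef MALLOCX_LG_ALIGN
#define MALLOCX_LG_ALIGN(la) ((int)(la))
#endif

// Hypothetical helper: returns log2(alignment) for a power-of-two alignment,
// or -1 if the alignment is zero or not a power of two.
inline int LgAlignFor(std::size_t alignment) {
  if (alignment == 0 || (alignment & (alignment - 1)) != 0) return -1;
  int lg = 0;
  while ((alignment >>= 1) != 0) ++lg;
  return lg;
}

// Hypothetical helper: builds the flags value for nallocx(size, flags).
// Returns 0 (no alignment request) when the alignment is not usable.
inline int NallocxFlagsFor(std::size_t alignment) {
  const int lg = LgAlignFor(alignment);
  return lg < 0 ? 0 : MALLOCX_LG_ALIGN(lg);
}
```

With gperftools linked in, a call such as nallocx(100, NallocxFlagsFor(64)) should then report the size a 64-byte-aligned, 100-byte request would actually occupy, without allocating anything.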

diff --git a/src/gperftools/profiler.h b/src/gperftools/profiler.h
index 2d272d6..89e34a2 100644
--- a/src/gperftools/profiler.h
+++ b/src/gperftools/profiler.h

@@ -1,11 +1,11 @@
-// -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
+/* -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*- */
 /* Copyright (c) 2005, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -33,7 +33,7 @@
  *
  * Module for CPU profiling based on periodic pc-sampling.
  *
- * For full(er) information, see doc/cpuprofile.html
+ * For full(er) information, see docs/cpuprofile.html
  *
  * This module is linked into your program with
  * no slowdown caused by this unless you activate the profiler
@@ -162,6 +162,10 @@
 };
 PERFTOOLS_DLL_DECL void ProfilerGetCurrentState(struct ProfilerState* state);
 
+/* Returns the current stack trace, to be called from a SIGPROF handler. */
+PERFTOOLS_DLL_DECL int ProfilerGetStackTrace(
+    void** result, int max_depth, int skip_count, const void *uc);
+
 #ifdef __cplusplus
 }  // extern "C"
 #endif

diff --git a/src/gperftools/stacktrace.h b/src/gperftools/stacktrace.h
index 2b9c5a1..a0890f4 100644
--- a/src/gperftools/stacktrace.h
+++ b/src/gperftools/stacktrace.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2005, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/gperftools/tcmalloc.h.in b/src/gperftools/tcmalloc.h.in
index d43184d..0c8a3dd 100644
--- a/src/gperftools/tcmalloc.h.in
+++ b/src/gperftools/tcmalloc.h.in

@@ -1,11 +1,11 @@
-// -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
+/* -*- Mode: C; c-basic-offset: 2; indent-tabs-mode: nil -*- */
 /* Copyright (c) 2003, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -36,37 +36,38 @@
 #ifndef TCMALLOC_TCMALLOC_H_
 #define TCMALLOC_TCMALLOC_H_
 
-#include <stddef.h>                     // for size_t
-#ifdef HAVE_SYS_CDEFS_H
-#include <sys/cdefs.h>   // where glibc defines __THROW
+#include <stddef.h>                     /* for size_t */
+#ifdef __cplusplus
+#include <new>                          /* for std::nothrow_t, std::align_val_t */
 #endif
 
-// __THROW is defined in glibc systems.  It means, counter-intuitively,
-// "This function will never throw an exception."  It's an optional
-// optimization tool, but we may need to use it to match glibc prototypes.
-#ifndef __THROW    /* I guess we're not on a glibc system */
-# define __THROW   /* __THROW is just an optimization, so ok to make it "" */
-#endif
-
-// Define the version number so folks can check against it
+/* Define the version number so folks can check against it */
 #define TC_VERSION_MAJOR  @TC_VERSION_MAJOR@
 #define TC_VERSION_MINOR  @TC_VERSION_MINOR@
 #define TC_VERSION_PATCH  "@TC_VERSION_PATCH@"
 #define TC_VERSION_STRING "gperftools @TC_VERSION_MAJOR@.@TC_VERSION_MINOR@@TC_VERSION_PATCH@"
 
-// For struct mallinfo, if it's defined.
-#ifdef HAVE_STRUCT_MALLINFO
-// Malloc can be in several places on older versions of OS X.
-# if defined(HAVE_MALLOC_H)
+/* For struct mallinfo, if it's defined. */
+#if @ac_cv_have_struct_mallinfo@
 # include <malloc.h>
-# elif defined(HAVE_SYS_MALLOC_H)
-# include <sys/malloc.h>
-# elif defined(HAVE_MALLOC_MALLOC_H)
-# include <malloc/malloc.h>
+#endif
+
+#ifndef PERFTOOLS_NOTHROW
+
+#if __cplusplus >= 201103L
+#define PERFTOOLS_NOTHROW noexcept
+#elif defined(__cplusplus)
+#define PERFTOOLS_NOTHROW throw()
+#else
+# ifdef __GNUC__
+#  define PERFTOOLS_NOTHROW __attribute__((__nothrow__))
+# else
+#  define PERFTOOLS_NOTHROW
 # endif
 #endif
 
-// Annoying stuff for windows -- makes sure clients can import these functions
+#endif
+
 #ifndef PERFTOOLS_DLL_DECL
 # ifdef _WIN32
 #   define PERFTOOLS_DLL_DECL  __declspec(dllimport)
@@ -76,60 +77,87 @@
 #endif
 
 #ifdef __cplusplus
-namespace std {
-struct nothrow_t;
-}
-
 extern "C" {
 #endif
-  // Returns a human-readable version string.  If major, minor,
-  // and/or patch are not NULL, they are set to the major version,
-  // minor version, and patch-code (a string, usually "").
+  /*
+   * Returns a human-readable version string.  If major, minor,
+   * and/or patch are not NULL, they are set to the major version,
+   * minor version, and patch-code (a string, usually "").
+   */
   PERFTOOLS_DLL_DECL const char* tc_version(int* major, int* minor,
-                                            const char** patch) __THROW;
+                                            const char** patch) PERFTOOLS_NOTHROW;
 
-  PERFTOOLS_DLL_DECL void* tc_malloc(size_t size) __THROW;
-  PERFTOOLS_DLL_DECL void* tc_malloc_skip_new_handler(size_t size) __THROW;
-  PERFTOOLS_DLL_DECL void tc_free(void* ptr) __THROW;
-  PERFTOOLS_DLL_DECL void* tc_realloc(void* ptr, size_t size) __THROW;
-  PERFTOOLS_DLL_DECL void* tc_calloc(size_t nmemb, size_t size) __THROW;
-  PERFTOOLS_DLL_DECL void tc_cfree(void* ptr) __THROW;
+  PERFTOOLS_DLL_DECL void* tc_malloc(size_t size) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void* tc_malloc_skip_new_handler(size_t size) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_free(void* ptr) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_free_sized(void *ptr, size_t size) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void* tc_realloc(void* ptr, size_t size) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void* tc_calloc(size_t nmemb, size_t size) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_cfree(void* ptr) PERFTOOLS_NOTHROW;
 
   PERFTOOLS_DLL_DECL void* tc_memalign(size_t __alignment,
-                                       size_t __size) __THROW;
+                                       size_t __size) PERFTOOLS_NOTHROW;
   PERFTOOLS_DLL_DECL int tc_posix_memalign(void** ptr,
-                                           size_t align, size_t size) __THROW;
-  PERFTOOLS_DLL_DECL void* tc_valloc(size_t __size) __THROW;
-  PERFTOOLS_DLL_DECL void* tc_pvalloc(size_t __size) __THROW;
+                                           size_t align, size_t size) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void* tc_valloc(size_t __size) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void* tc_pvalloc(size_t __size) PERFTOOLS_NOTHROW;
 
-  PERFTOOLS_DLL_DECL void tc_malloc_stats(void) __THROW;
-  PERFTOOLS_DLL_DECL int tc_mallopt(int cmd, int value) __THROW;
+  PERFTOOLS_DLL_DECL void tc_malloc_stats(void) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL int tc_mallopt(int cmd, int value) PERFTOOLS_NOTHROW;
 #if @ac_cv_have_struct_mallinfo@
-  PERFTOOLS_DLL_DECL struct mallinfo tc_mallinfo(void) __THROW;
+  PERFTOOLS_DLL_DECL struct mallinfo tc_mallinfo(void) PERFTOOLS_NOTHROW;
 #endif
 
-  // This is an alias for MallocExtension::instance()->GetAllocatedSize().
-  // It is equivalent to
-  //    OS X: malloc_size()
-  //    glibc: malloc_usable_size()
-  //    Windows: _msize()
-  PERFTOOLS_DLL_DECL size_t tc_malloc_size(void* ptr) __THROW;
+  /*
+   * This is an alias for MallocExtension::instance()->GetAllocatedSize().
+   * It is equivalent to
+   *    OS X: malloc_size()
+   *    glibc: malloc_usable_size()
+   *    Windows: _msize()
+   */
+  PERFTOOLS_DLL_DECL size_t tc_malloc_size(void* ptr) PERFTOOLS_NOTHROW;
 
 #ifdef __cplusplus
-  PERFTOOLS_DLL_DECL int tc_set_new_mode(int flag) __THROW;
+  PERFTOOLS_DLL_DECL int tc_set_new_mode(int flag) PERFTOOLS_NOTHROW;
   PERFTOOLS_DLL_DECL void* tc_new(size_t size);
   PERFTOOLS_DLL_DECL void* tc_new_nothrow(size_t size,
-                                          const std::nothrow_t&) __THROW;
-  PERFTOOLS_DLL_DECL void tc_delete(void* p) __THROW;
+                                          const std::nothrow_t&) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_delete(void* p) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_delete_sized(void* p, size_t size) PERFTOOLS_NOTHROW;
   PERFTOOLS_DLL_DECL void tc_delete_nothrow(void* p,
-                                            const std::nothrow_t&) __THROW;
+                                            const std::nothrow_t&) PERFTOOLS_NOTHROW;
   PERFTOOLS_DLL_DECL void* tc_newarray(size_t size);
   PERFTOOLS_DLL_DECL void* tc_newarray_nothrow(size_t size,
-                                               const std::nothrow_t&) __THROW;
-  PERFTOOLS_DLL_DECL void tc_deletearray(void* p) __THROW;
+                                               const std::nothrow_t&) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_deletearray(void* p) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_deletearray_sized(void* p, size_t size) PERFTOOLS_NOTHROW;
   PERFTOOLS_DLL_DECL void tc_deletearray_nothrow(void* p,
-                                                 const std::nothrow_t&) __THROW;
+                                                 const std::nothrow_t&) PERFTOOLS_NOTHROW;
+
+#if @ac_cv_have_std_align_val_t@ && __cplusplus >= 201703L
+  PERFTOOLS_DLL_DECL void* tc_new_aligned(size_t size, std::align_val_t al);
+  PERFTOOLS_DLL_DECL void* tc_new_aligned_nothrow(size_t size, std::align_val_t al,
+                                          const std::nothrow_t&) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_delete_aligned(void* p, std::align_val_t al) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_delete_sized_aligned(void* p, size_t size, std::align_val_t al) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_delete_aligned_nothrow(void* p, std::align_val_t al,
+                                            const std::nothrow_t&) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void* tc_newarray_aligned(size_t size, std::align_val_t al);
+  PERFTOOLS_DLL_DECL void* tc_newarray_aligned_nothrow(size_t size, std::align_val_t al,
+                                               const std::nothrow_t&) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_deletearray_aligned(void* p, std::align_val_t al) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_deletearray_sized_aligned(void* p, size_t size, std::align_val_t al) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_deletearray_aligned_nothrow(void* p, std::align_val_t al,
+                                                 const std::nothrow_t&) PERFTOOLS_NOTHROW;
+#endif
 }
 #endif
 
-#endif  // #ifndef TCMALLOC_TCMALLOC_H_
+/* Undefine PERFTOOLS_NOTHROW only when included as a public header (no config.h) */
+#if !defined(GPERFTOOLS_CONFIG_H_)
+
+#undef PERFTOOLS_NOTHROW
+
+#endif /* GPERFTOOLS_CONFIG_H_ */
+
+#endif  /* #ifndef TCMALLOC_TCMALLOC_H_ */
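
Note on src/gperftools/tcmalloc.h.in: this hunk replaces glibc's __THROW with a PERFTOOLS_NOTHROW macro chosen by language level. The sketch below repeats the same selection logic under a hypothetical name (DEMO_NOTHROW) so that it cannot clash with the real macro, and checks its effect at compile time. The demo_* functions are illustrative stand-ins, not gperftools entry points.

```cpp
#include <cstddef>

// Same selection logic as the header, under a hypothetical name.
#if __cplusplus >= 201103L
#define DEMO_NOTHROW noexcept
#elif defined(__cplusplus)
#define DEMO_NOTHROW throw()
#else
# ifdef __GNUC__
#  define DEMO_NOTHROW __attribute__((__nothrow__))
# else
#  define DEMO_NOTHROW
# endif
#endif

extern "C" {
// Stand-in for a tc_* function that promises not to throw.
std::size_t demo_round_up(std::size_t size) DEMO_NOTHROW;
// Stand-in for tc_new-style functions, which are declared without the macro.
std::size_t demo_may_throw(std::size_t size);
}

std::size_t demo_round_up(std::size_t size) DEMO_NOTHROW {
  return (size + 7) & ~static_cast<std::size_t>(7);  // round up to 8 bytes
}

std::size_t demo_may_throw(std::size_t size) { return size; }

#if __cplusplus >= 201103L
// The noexcept operator reflects the exception specification chosen above.
static_assert(noexcept(demo_round_up(1)), "declared with DEMO_NOTHROW");
static_assert(!noexcept(demo_may_throw(1)), "declared without it");
#endif
```

This matches the pattern visible in the hunk, where tc_new and tc_newarray carry no exception specification while the malloc-style and delete-style functions do.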

diff --git a/src/heap-checker-bcad.cc b/src/heap-checker-bcad.cc
index 00efdb7..8b0dbe1 100644
--- a/src/heap-checker-bcad.cc
+++ b/src/heap-checker-bcad.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2005, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/heap-checker.cc b/src/heap-checker.cc
old mode 100755
new mode 100644
index 9c82dea..199fc93
--- a/src/heap-checker.cc
+++ b/src/heap-checker.cc

@@ -131,8 +131,8 @@
 
 // This is the default if you don't link in -lprofiler
 extern "C" {
-ATTRIBUTE_WEAK PERFTOOLS_DLL_DECL bool ProfilingIsEnabledForAllThreads();
-bool ProfilingIsEnabledForAllThreads() { return false; }
+ATTRIBUTE_WEAK PERFTOOLS_DLL_DECL int ProfilingIsEnabledForAllThreads();
+int ProfilingIsEnabledForAllThreads() { return false; }
 }
 
 //----------------------------------------------------------------------
@@ -495,7 +495,7 @@
   InitThreadDisableCounter() {
     perftools_pthread_key_create(&thread_disable_counter_key, NULL);
     // Set up the main thread's value, which we have a special variable for.
-    void* p = (void*)main_thread_counter;   // store the counter directly
+    void* p = (void*)(intptr_t)main_thread_counter;   // store the counter directly
     perftools_pthread_setspecific(thread_disable_counter_key, p);
     use_main_thread_counter = false;
   }
@@ -568,7 +568,7 @@
   if (ptr != NULL) {
     const int counter = get_thread_disable_counter();
     const bool ignore = (counter > 0);
-    RAW_VLOG(16, "Recording Alloc: %p of %" PRIuS "; %d", ptr, size,
+    RAW_VLOG(16, "Recording Alloc: %p of %zu; %d", ptr, size,
              int(counter));
 
     // Fetch the caller's stack trace before acquiring heap_checker_lock.
@@ -588,7 +588,7 @@
         }
       }
     }
-    RAW_VLOG(17, "Alloc Recorded: %p of %" PRIuS "", ptr, size);
+    RAW_VLOG(17, "Alloc Recorded: %p of %zu", ptr, size);
   }
 }
 
@@ -771,14 +771,14 @@
         // and the rest of the region where the stack lives can well
         // contain outdated stack variables which are not live anymore,
         // hence should not be treated as such.
-        RAW_VLOG(11, "Not %s-disabling %" PRIuS " bytes at %p"
+        RAW_VLOG(11, "Not %s-disabling %zu bytes at %p"
                     ": have stack inside: %p",
                     (stack_disable ? "stack" : "range"),
                     info.object_size, ptr, AsPtr(*iter));
         return;
       }
     }
-    RAW_VLOG(11, "%s-disabling %" PRIuS " bytes at %p",
+    RAW_VLOG(11, "%s-disabling %zu bytes at %p",
                 (stack_disable ? "Stack" : "Range"), info.object_size, ptr);
     live_objects->push_back(AllocObject(ptr, info.object_size,
                                         MUST_BE_ON_HEAP));
@@ -1070,7 +1070,7 @@
   if (thread_registers.size()) {
     // Make thread registers be live heap data sources.
     // we rely here on the fact that vector is in one memory chunk:
-    RAW_VLOG(11, "Live registers at %p of %" PRIuS " bytes",
+    RAW_VLOG(11, "Live registers at %p of %zu bytes",
                 &thread_registers[0], thread_registers.size() * sizeof(void*));
     live_objects->push_back(AllocObject(&thread_registers[0],
                                         thread_registers.size() * sizeof(void*),
@@ -1107,7 +1107,7 @@
     for (IgnoredObjectsMap::const_iterator object = ignored_objects->begin();
          object != ignored_objects->end(); ++object) {
       const void* ptr = AsPtr(object->first);
-      RAW_VLOG(11, "Ignored live object at %p of %" PRIuS " bytes",
+      RAW_VLOG(11, "Ignored live object at %p of %zu bytes",
                   ptr, object->second);
       live_objects->
         push_back(AllocObject(ptr, object->second, MUST_BE_ON_HEAP));
@@ -1116,7 +1116,7 @@
       size_t object_size;
       if (!(heap_profile->FindAlloc(ptr, &object_size)  &&
             object->second == object_size)) {
-        RAW_LOG(FATAL, "Object at %p of %" PRIuS " bytes from an"
+        RAW_LOG(FATAL, "Object at %p of %zu bytes from an"
                        " IgnoreObject() has disappeared", ptr, object->second);
       }
     }
@@ -1404,7 +1404,7 @@
       live_object_count += 1;
       live_byte_count += size;
     }
-    RAW_VLOG(13, "Looking for heap pointers in %p of %" PRIuS " bytes",
+    RAW_VLOG(13, "Looking for heap pointers in %p of %zu bytes",
                 object, size);
     const char* const whole_object = object;
     size_t const whole_size = size;
@@ -1475,8 +1475,8 @@
           // a heap object which is in fact leaked.
           // I.e. in very rare and probably not repeatable/lasting cases
           // we might miss some real heap memory leaks.
-          RAW_VLOG(14, "Found pointer to %p of %" PRIuS " bytes at %p "
-                      "inside %p of size %" PRIuS "",
+          RAW_VLOG(14, "Found pointer to %p of %zu bytes at %p "
+                      "inside %p of size %zu",
                       ptr, object_size, object, whole_object, whole_size);
           if (VLOG_IS_ON(15)) {
             // log call stacks to help debug how come something is not a leak
@@ -1523,7 +1523,7 @@
   if (!HaveOnHeapLocked(&ptr, &object_size)) {
     RAW_LOG(ERROR, "No live heap object at %p to ignore", ptr);
   } else {
-    RAW_VLOG(10, "Going to ignore live object at %p of %" PRIuS " bytes",
+    RAW_VLOG(10, "Going to ignore live object at %p of %zu bytes",
                 ptr, object_size);
     if (ignored_objects == NULL)  {
       ignored_objects = new(Allocator::Allocate(sizeof(IgnoredObjectsMap)))
@@ -1550,7 +1550,7 @@
         ignored_objects->erase(object);
         found = true;
         RAW_VLOG(10, "Now not going to ignore live object "
-                    "at %p of %" PRIuS " bytes", ptr, object_size);
+                    "at %p of %zu bytes", ptr, object_size);
       }
     }
     if (!found)  RAW_LOG(FATAL, "Object at %p has not been ignored", ptr);
@@ -1598,8 +1598,8 @@
       const HeapProfileTable::Stats& t = heap_profile->total();
       const size_t start_inuse_bytes = t.alloc_size - t.free_size;
       const size_t start_inuse_allocs = t.allocs - t.frees;
-      RAW_VLOG(10, "Start check \"%s\" profile: %" PRIuS " bytes "
-               "in %" PRIuS " objects",
+      RAW_VLOG(10, "Start check \"%s\" profile: %zu bytes "
+               "in %zu objects",
                name_, start_inuse_bytes, start_inuse_allocs);
     } else {
       RAW_LOG(WARNING, "Heap checker is not active, "
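
Note on src/heap-checker.cc: the InitThreadDisableCounter() hunk routes the counter through intptr_t before converting it to void*. On platforms where int is narrower than a pointer, a direct cast typically draws an int-to-pointer-size warning. The sketch below shows the same round trip in isolation. CounterToSlot and SlotToCounter are hypothetical names; the round trip relies on implementation-defined conversions that behave as expected on mainstream platforms.

```cpp
#include <stdint.h>

// Hypothetical helpers mirroring the cast added in InitThreadDisableCounter():
// a small int counter is stored directly in the void* slot of a
// thread-specific key instead of pointing at heap memory.
inline void* CounterToSlot(int counter) {
  // Widen to intptr_t first so the integer-to-pointer conversion is
  // between types of the same size.
  return reinterpret_cast<void*>(static_cast<intptr_t>(counter));
}

inline int SlotToCounter(void* slot) {
  // Reverse the conversion: pointer -> intptr_t -> int.
  return static_cast<int>(reinterpret_cast<intptr_t>(slot));
}
```

Storing the value in the slot itself avoids any allocation, which matters in code that is itself tracking allocations.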

diff --git a/src/heap-profile-stats.h b/src/heap-profile-stats.h
index ae45d58..1e0359a 100644
--- a/src/heap-profile-stats.h
+++ b/src/heap-profile-stats.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2013, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/heap-profile-table.cc b/src/heap-profile-table.cc
index 7486468..93d592c 100644
--- a/src/heap-profile-table.cc
+++ b/src/heap-profile-table.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2006, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -440,22 +440,20 @@
                                     AllocationMap* allocations) {
   RAW_VLOG(1, "Dumping non-live heap profile to %s", file_name);
   RawFD fd = RawOpenForWriting(file_name);
-  if (fd != kIllegalRawFD) {
-    RawWrite(fd, kProfileHeader, strlen(kProfileHeader));
-    char buf[512];
-    int len = UnparseBucket(total, buf, 0, sizeof(buf), " heapprofile",
-                            NULL);
-    RawWrite(fd, buf, len);
-    const DumpArgs args(fd, NULL);
-    allocations->Iterate<const DumpArgs&>(DumpNonLiveIterator, args);
-    RawWrite(fd, kProcSelfMapsHeader, strlen(kProcSelfMapsHeader));
-    DumpProcSelfMaps(fd);
-    RawClose(fd);
-    return true;
-  } else {
+  if (fd == kIllegalRawFD) {
     RAW_LOG(ERROR, "Failed dumping filtered heap profile to %s", file_name);
     return false;
   }
+  RawWrite(fd, kProfileHeader, strlen(kProfileHeader));
+  char buf[512];
+  int len = UnparseBucket(total, buf, 0, sizeof(buf), " heapprofile", NULL);
+  RawWrite(fd, buf, len);
+  const DumpArgs args(fd, NULL);
+  allocations->Iterate<const DumpArgs&>(DumpNonLiveIterator, args);
+  RawWrite(fd, kProcSelfMapsHeader, strlen(kProcSelfMapsHeader));
+  DumpProcSelfMaps(fd);
+  RawClose(fd);
+  return true;
 }
 
 void HeapProfileTable::CleanupOldProfiles(const char* prefix) {
@@ -551,8 +549,8 @@
   // This is only used by the heap leak checker, but is intimately
   // tied to the allocation map that belongs in this module and is
   // therefore placed here.
-  RAW_LOG(ERROR, "Leak check %s detected leaks of %" PRIuS " bytes "
-          "in %" PRIuS " objects",
+  RAW_LOG(ERROR, "Leak check %s detected leaks of %zu bytes "
+          "in %zu objects",
           checker_name,
           size_t(total_.alloc_size),
           size_t(total_.allocs));
@@ -622,7 +620,7 @@
                                               char* unused) {
   // Perhaps also log the allocation stack trace (unsymbolized)
   // on this line in case somebody finds it useful.
-  RAW_LOG(ERROR, "leaked %" PRIuS " byte object %p", v->bytes, ptr);
+  RAW_LOG(ERROR, "leaked %zu byte object %p", v->bytes, ptr);
 }
 
 void HeapProfileTable::Snapshot::ReportIndividualObjects() {

diff --git a/src/heap-profile-table.h b/src/heap-profile-table.h
index 3c62847..afe1319 100644
--- a/src/heap-profile-table.h
+++ b/src/heap-profile-table.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2006, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/heap-profiler.cc b/src/heap-profiler.cc
old mode 100755
new mode 100644
index 17d8697..47df779
--- a/src/heap-profiler.cc
+++ b/src/heap-profiler.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2005, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -82,8 +82,8 @@
 #endif
 #endif
 
-using STL_NAMESPACE::string;
-using STL_NAMESPACE::sort;
+using std::string;
+using std::sort;
 
 //----------------------------------------------------------------------
 // Flags that control heap-profiling
@@ -272,7 +272,7 @@
     const int64 inuse_bytes = total.alloc_size - total.free_size;
     bool need_to_dump = false;
     char buf[128];
-    int64 current_time = time(NULL);
+
     if (FLAGS_heap_profile_allocation_interval > 0 &&
         total.alloc_size >=
         last_dump_alloc + FLAGS_heap_profile_allocation_interval) {
@@ -293,13 +293,15 @@
       snprintf(buf, sizeof(buf), "%" PRId64 " MB currently in use",
                inuse_bytes >> 20);
       need_to_dump = true;
-    } else if (FLAGS_heap_profile_time_interval > 0 &&
-               current_time - last_dump_time >=
-               FLAGS_heap_profile_time_interval) {
-      snprintf(buf, sizeof(buf), "%" PRId64 " sec since the last dump",
-               current_time - last_dump_time);
-      need_to_dump = true;
-      last_dump_time = current_time;
+    } else if (FLAGS_heap_profile_time_interval > 0 ) {
+      int64 current_time = time(NULL);
+      if (current_time - last_dump_time >=
+          FLAGS_heap_profile_time_interval) {
+        snprintf(buf, sizeof(buf), "%" PRId64 " sec since the last dump",
+                 current_time - last_dump_time);
+        need_to_dump = true;
+        last_dump_time = current_time;
+      }
     }
     if (need_to_dump) {
       DumpProfileLocked(buf);
@@ -358,11 +360,11 @@
 static void MmapHook(const void* result, const void* start, size_t size,
                      int prot, int flags, int fd, off_t offset) {
   if (FLAGS_mmap_log) {  // log it
-    // We use PRIxS not just '%p' to avoid deadlocks
+    // We use PRIxPTR not just '%p' to avoid deadlocks
     // in pretty-printing of NULL as "nil".
     // TODO(maxim): instead should use a safe snprintf reimplementation
     RAW_LOG(INFO,
-            "mmap(start=0x%" PRIxPTR ", len=%" PRIuS ", prot=0x%x, flags=0x%x, "
+            "mmap(start=0x%" PRIxPTR ", len=%zu, prot=0x%x, flags=0x%x, "
             "fd=%d, offset=0x%x) = 0x%" PRIxPTR "",
             (uintptr_t) start, size, prot, flags, fd, (unsigned int) offset,
             (uintptr_t) result);
@@ -376,12 +378,12 @@
                        size_t old_size, size_t new_size,
                        int flags, const void* new_addr) {
   if (FLAGS_mmap_log) {  // log it
-    // We use PRIxS not just '%p' to avoid deadlocks
+    // We use PRIxPTR not just '%p' to avoid deadlocks
     // in pretty-printing of NULL as "nil".
     // TODO(maxim): instead should use a safe snprintf reimplementation
     RAW_LOG(INFO,
-            "mremap(old_addr=0x%" PRIxPTR ", old_size=%" PRIuS ", "
-            "new_size=%" PRIuS ", flags=0x%x, new_addr=0x%" PRIxPTR ") = "
+            "mremap(old_addr=0x%" PRIxPTR ", old_size=%zu, "
+            "new_size=%zu, flags=0x%x, new_addr=0x%" PRIxPTR ") = "
             "0x%" PRIxPTR "",
             (uintptr_t) old_addr, old_size, new_size, flags,
             (uintptr_t) new_addr, (uintptr_t) result);
@@ -393,10 +395,10 @@
 
 static void MunmapHook(const void* ptr, size_t size) {
   if (FLAGS_mmap_log) {  // log it
-    // We use PRIxS not just '%p' to avoid deadlocks
+    // We use PRIxPTR not just '%p' to avoid deadlocks
     // in pretty-printing of NULL as "nil".
     // TODO(maxim): instead should use a safe snprintf reimplementation
-    RAW_LOG(INFO, "munmap(start=0x%" PRIxPTR ", len=%" PRIuS ")",
+    RAW_LOG(INFO, "munmap(start=0x%" PRIxPTR ", len=%zu)",
                   (uintptr_t) ptr, size);
 #ifdef TODO_REENABLE_STACK_TRACING
     DumpStackTrace(1, RawInfoStackDumper, NULL);
@@ -406,7 +408,7 @@
 
 static void SbrkHook(const void* result, ptrdiff_t increment) {
   if (FLAGS_mmap_log) {  // log it
-    RAW_LOG(INFO, "sbrk(inc=%" PRIdS ") = 0x%" PRIxPTR "",
+    RAW_LOG(INFO, "sbrk(inc=%zd) = 0x%" PRIxPTR "",
                   increment, (uintptr_t) result);
 #ifdef TODO_REENABLE_STACK_TRACING
     DumpStackTrace(1, RawInfoStackDumper, NULL);

diff --git a/src/internal_logging.cc b/src/internal_logging.cc
index 4e7fc87..ca1c86e 100644
--- a/src/internal_logging.cc
+++ b/src/internal_logging.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2005, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -45,8 +45,6 @@
 #include "base/logging.h"   // for perftools_vsnprintf
 #include "base/spinlock.h"              // for SpinLockHolder, SpinLock
 
-static const int kLogBufSize = 800;
-
 // Variables for storing crash output.  Allocated statically since we
 // may not be able to heap-allocate while crashing.
 static SpinLock crash_lock(base::LINKER_INITIALIZED);

diff --git a/src/internal_logging.h b/src/internal_logging.h
index 0c300c3..1b0468e 100644
--- a/src/internal_logging.h
+++ b/src/internal_logging.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2005, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/libc_override.h b/src/libc_override.h
index c01a97c..c981c3d 100644
--- a/src/libc_override.h
+++ b/src/libc_override.h

@@ -58,6 +58,14 @@
 #endif
 #include <gperftools/tcmalloc.h>
 
+#if __cplusplus >= 201103L || (defined(_MSC_VER) && _MSC_VER >= 1900)
+#define CPP_NOTHROW noexcept
+#define CPP_BADALLOC
+#else
+#define CPP_NOTHROW throw()
+#define CPP_BADALLOC throw(std::bad_alloc)
+#endif
+
 static void ReplaceSystemAlloc();  // defined in the .h files below
 
 // For windows, there are two ways to get tcmalloc.  If we're

diff --git a/src/libc_override_gcc_and_weak.h b/src/libc_override_gcc_and_weak.h
index 818e43d..bb99b69 100644
--- a/src/libc_override_gcc_and_weak.h
+++ b/src/libc_override_gcc_and_weak.h

@@ -44,6 +44,9 @@
 #endif
 #include <gperftools/tcmalloc.h>
 
+#include "getenv_safe.h" // TCMallocGetenvSafe
+#include "base/commandlineflags.h"
+
 #ifndef __THROW    // I guess we're not on a glibc-like system
 # define __THROW   // __THROW is just an optimization, so ok to make it ""
 #endif
@@ -52,24 +55,157 @@
 # error libc_override_gcc_and_weak.h is for gcc distributions only.
 #endif
 
-#define ALIAS(tc_fn)   __attribute__ ((alias (#tc_fn)))
+#define ALIAS(tc_fn)   __attribute__ ((alias (#tc_fn), used))
 
-void* operator new(size_t size) throw (std::bad_alloc)
-    ALIAS(tc_new);
-void operator delete(void* p) __THROW
-    ALIAS(tc_delete);
-void* operator new[](size_t size) throw (std::bad_alloc)
-    ALIAS(tc_newarray);
-void operator delete[](void* p) __THROW
-    ALIAS(tc_deletearray);
-void* operator new(size_t size, const std::nothrow_t& nt) __THROW
-    ALIAS(tc_new_nothrow);
-void* operator new[](size_t size, const std::nothrow_t& nt) __THROW
-    ALIAS(tc_newarray_nothrow);
-void operator delete(void* p, const std::nothrow_t& nt) __THROW
-    ALIAS(tc_delete_nothrow);
-void operator delete[](void* p, const std::nothrow_t& nt) __THROW
-    ALIAS(tc_deletearray_nothrow);
+void* operator new(size_t size) CPP_BADALLOC  ALIAS(tc_new);
+void operator delete(void* p) CPP_NOTHROW     ALIAS(tc_delete);
+void* operator new[](size_t size) CPP_BADALLOC ALIAS(tc_newarray);
+void operator delete[](void* p) CPP_NOTHROW   ALIAS(tc_deletearray);
+void* operator new(size_t size, const std::nothrow_t& nt) CPP_NOTHROW
+                                              ALIAS(tc_new_nothrow);
+void* operator new[](size_t size, const std::nothrow_t& nt) CPP_NOTHROW
+                                              ALIAS(tc_newarray_nothrow);
+void operator delete(void* p, const std::nothrow_t& nt) CPP_NOTHROW
+                                              ALIAS(tc_delete_nothrow);
+void operator delete[](void* p, const std::nothrow_t& nt) CPP_NOTHROW
+                                              ALIAS(tc_deletearray_nothrow);
+
+#if defined(ENABLE_SIZED_DELETE)
+
+void operator delete(void *p, size_t size) CPP_NOTHROW
+    ALIAS(tc_delete_sized);
+void operator delete[](void *p, size_t size) CPP_NOTHROW
+    ALIAS(tc_deletearray_sized);
+
+#elif defined(ENABLE_DYNAMIC_SIZED_DELETE) && \
+  (__GNUC__ * 100 + __GNUC_MINOR__) >= 405
+
+static void delegate_sized_delete(void *p, size_t s) {
+  (operator delete)(p);
+}
+
+static void delegate_sized_deletearray(void *p, size_t s) {
+  (operator delete[])(p);
+}
+
+extern "C" __attribute__((weak))
+int tcmalloc_sized_delete_enabled(void);
+
+static bool sized_delete_enabled(void) {
+  if (tcmalloc_sized_delete_enabled != 0) {
+    return !!tcmalloc_sized_delete_enabled();
+  }
+
+  const char *flag = TCMallocGetenvSafe("TCMALLOC_ENABLE_SIZED_DELETE");
+  return tcmalloc::commandlineflags::StringToBool(flag, false);
+}
+
+extern "C" {
+
+static void *resolve_delete_sized(void) {
+  if (sized_delete_enabled()) {
+    return reinterpret_cast<void *>(tc_delete_sized);
+  }
+  return reinterpret_cast<void *>(delegate_sized_delete);
+}
+
+static void *resolve_deletearray_sized(void) {
+  if (sized_delete_enabled()) {
+    return reinterpret_cast<void *>(tc_deletearray_sized);
+  }
+  return reinterpret_cast<void *>(delegate_sized_deletearray);
+}
+
+}
+
+void operator delete(void *p, size_t size) CPP_NOTHROW
+  __attribute__((ifunc("resolve_delete_sized")));
+void operator delete[](void *p, size_t size) CPP_NOTHROW
+  __attribute__((ifunc("resolve_deletearray_sized")));
+
+#else /* !ENABLE_SIZED_DELETE && !ENABLE_DYN_SIZED_DELETE */
+
+void operator delete(void *p, size_t size) CPP_NOTHROW
+  ALIAS(tc_delete_sized);
+void operator delete[](void *p, size_t size) CPP_NOTHROW
+  ALIAS(tc_deletearray_sized);
+
+#endif /* !ENABLE_SIZED_DELETE && !ENABLE_DYN_SIZED_DELETE */
+
+#if defined(ENABLE_ALIGNED_NEW_DELETE)
+
+void* operator new(size_t size, std::align_val_t al)
+    ALIAS(tc_new_aligned);
+void operator delete(void* p, std::align_val_t al) CPP_NOTHROW
+    ALIAS(tc_delete_aligned);
+void* operator new[](size_t size, std::align_val_t al)
+    ALIAS(tc_newarray_aligned);
+void operator delete[](void* p, std::align_val_t al) CPP_NOTHROW
+    ALIAS(tc_deletearray_aligned);
+void* operator new(size_t size, std::align_val_t al, const std::nothrow_t& nt) CPP_NOTHROW
+    ALIAS(tc_new_aligned_nothrow);
+void* operator new[](size_t size, std::align_val_t al, const std::nothrow_t& nt) CPP_NOTHROW
+    ALIAS(tc_newarray_aligned_nothrow);
+void operator delete(void* p, std::align_val_t al, const std::nothrow_t& nt) CPP_NOTHROW
+    ALIAS(tc_delete_aligned_nothrow);
+void operator delete[](void* p, std::align_val_t al, const std::nothrow_t& nt) CPP_NOTHROW
+    ALIAS(tc_deletearray_aligned_nothrow);
+
+#if defined(ENABLE_SIZED_DELETE)
+
+void operator delete(void *p, size_t size, std::align_val_t al) CPP_NOTHROW
+    ALIAS(tc_delete_sized_aligned);
+void operator delete[](void *p, size_t size, std::align_val_t al) CPP_NOTHROW
+    ALIAS(tc_deletearray_sized_aligned);
+
+#else /* defined(ENABLE_SIZED_DELETE) */
+
+#if defined(ENABLE_DYNAMIC_SIZED_DELETE) && \
+  (__GNUC__ * 100 + __GNUC_MINOR__) >= 405
+
+static void delegate_sized_aligned_delete(void *p, size_t s, std::align_val_t al) {
+  (operator delete)(p, al);
+}
+
+static void delegate_sized_aligned_deletearray(void *p, size_t s, std::align_val_t al) {
+  (operator delete[])(p, al);
+}
+
+extern "C" {
+
+static void *resolve_delete_sized_aligned(void) {
+  if (sized_delete_enabled()) {
+    return reinterpret_cast<void *>(tc_delete_sized_aligned);
+  }
+  return reinterpret_cast<void *>(delegate_sized_aligned_delete);
+}
+
+static void *resolve_deletearray_sized_aligned(void) {
+  if (sized_delete_enabled()) {
+    return reinterpret_cast<void *>(tc_deletearray_sized_aligned);
+  }
+  return reinterpret_cast<void *>(delegate_sized_aligned_deletearray);
+}
+
+}
+
+void operator delete(void *p, size_t size, std::align_val_t al) CPP_NOTHROW
+  __attribute__((ifunc("resolve_delete_sized_aligned")));
+void operator delete[](void *p, size_t size, std::align_val_t al) CPP_NOTHROW
+  __attribute__((ifunc("resolve_deletearray_sized_aligned")));
+
+#else /* defined(ENABLE_DYN_SIZED_DELETE) */
+
+void operator delete(void *p, size_t size, std::align_val_t al) CPP_NOTHROW
+  ALIAS(tc_delete_sized_aligned);
+void operator delete[](void *p, size_t size, std::align_val_t al) CPP_NOTHROW
+  ALIAS(tc_deletearray_sized_aligned);
+
+#endif /* defined(ENABLE_DYN_SIZED_DELETE) */
+
+#endif /* defined(ENABLE_SIZED_DELETE) */
+
+#endif /* defined(ENABLE_ALIGNED_NEW_DELETE) */
 
 extern "C" {
   void* malloc(size_t size) __THROW               ALIAS(tc_malloc);
@@ -78,6 +214,7 @@
   void* calloc(size_t n, size_t size) __THROW     ALIAS(tc_calloc);
   void cfree(void* ptr) __THROW                   ALIAS(tc_cfree);
   void* memalign(size_t align, size_t s) __THROW  ALIAS(tc_memalign);
+  void* aligned_alloc(size_t align, size_t s) __THROW ALIAS(tc_memalign);
   void* valloc(size_t size) __THROW               ALIAS(tc_valloc);
   void* pvalloc(size_t size) __THROW              ALIAS(tc_pvalloc);
   int posix_memalign(void** r, size_t a, size_t s) __THROW

diff --git a/src/libc_override_glibc.h b/src/libc_override_glibc.h
index b6843e1..3269213 100644
--- a/src/libc_override_glibc.h
+++ b/src/libc_override_glibc.h

@@ -38,9 +38,6 @@
 
 #include <config.h>
 #include <features.h>     // for __GLIBC__
-#ifdef HAVE_SYS_CDEFS_H
-#include <sys/cdefs.h>    // for __THROW
-#endif
 #include <gperftools/tcmalloc.h>
 
 #ifndef __GLIBC__
@@ -89,60 +86,6 @@
 
 #endif  // #if defined(__GNUC__) && !defined(__MACH__)
 
-
-// We also have to hook libc malloc.  While our work with weak symbols
-// should make sure libc malloc is never called in most situations, it
-// can be worked around by shared libraries with the DEEPBIND
-// environment variable set.  The below hooks libc to call our malloc
-// routines even in that situation.  In other situations, this hook
-// should never be called.
-extern "C" {
-static void* glibc_override_malloc(size_t size, const void *caller) {
-  return tc_malloc(size);
-}
-static void* glibc_override_realloc(void *ptr, size_t size,
-                                    const void *caller) {
-  return tc_realloc(ptr, size);
-}
-static void glibc_override_free(void *ptr, const void *caller) {
-  tc_free(ptr);
-}
-static void* glibc_override_memalign(size_t align, size_t size,
-                                     const void *caller) {
-  return tc_memalign(align, size);
-}
-
-// We should be using __malloc_initialize_hook here, like the #if 0
-// code below.  (See http://swoolley.org/man.cgi/3/malloc_hook.)
-// However, this causes weird linker errors with programs that link
-// with -static, so instead we just assign the vars directly at
-// static-constructor time.  That should serve the same effect of
-// making sure the hooks are set before the first malloc call the
-// program makes.
-#if 0
-#include <malloc.h>  // for __malloc_hook, etc.
-void glibc_override_malloc_init_hook(void) {
-  __malloc_hook = glibc_override_malloc;
-  __realloc_hook = glibc_override_realloc;
-  __free_hook = glibc_override_free;
-  __memalign_hook = glibc_override_memalign;
-}
-
-void (* MALLOC_HOOK_MAYBE_VOLATILE __malloc_initialize_hook)(void)
-    = &glibc_override_malloc_init_hook;
-#endif
-
-void* (* MALLOC_HOOK_MAYBE_VOLATILE __malloc_hook)(size_t, const void*)
-    = &glibc_override_malloc;
-void* (* MALLOC_HOOK_MAYBE_VOLATILE __realloc_hook)(void*, size_t, const void*)
-    = &glibc_override_realloc;
-void (* MALLOC_HOOK_MAYBE_VOLATILE __free_hook)(void*, const void*)
-    = &glibc_override_free;
-void* (* MALLOC_HOOK_MAYBE_VOLATILE __memalign_hook)(size_t,size_t, const void*)
-    = &glibc_override_memalign;
-
-}   // extern "C"
-
 // No need to write ReplaceSystemAlloc(); one of the #includes above
 // did it for us.
 

diff --git a/src/libc_override_osx.h b/src/libc_override_osx.h
index b801f22..9d5d611 100644
--- a/src/libc_override_osx.h
+++ b/src/libc_override_osx.h

@@ -211,6 +211,33 @@
   size_t malloc_usable_size(void* p)     { return tc_malloc_size(p); }
 }  // extern "C"
 
+static malloc_zone_t *get_default_zone() {
+   malloc_zone_t **zones = NULL;
+   unsigned int num_zones = 0;
+
+   /*
+    * On OSX 10.12, malloc_default_zone returns a special zone that is not
+    * present in the list of registered zones. That zone uses a "lite zone"
+    * if one is present (apparently enabled when malloc stack logging is
+    * enabled), or the first registered zone otherwise. In practice this
+    * means unless malloc stack logging is enabled, the first registered
+    * zone is the default.
+    * So get the list of zones to get the first one, instead of relying on
+    * malloc_default_zone.
+    */
+   if (KERN_SUCCESS != malloc_get_all_zones(0, NULL, (vm_address_t**) &zones,
+                                            &num_zones)) {
+       /* Reset the value in case the failure happened after it was set. */
+       num_zones = 0;
+   }
+
+   if (num_zones)
+     return zones[0];
+
+   return malloc_default_zone();
+}
+
+
 static void ReplaceSystemAlloc() {
   static malloc_introspection_t tcmalloc_introspection;
   memset(&tcmalloc_introspection, 0, sizeof(tcmalloc_introspection));
@@ -273,7 +300,7 @@
   // zone.  The default zone is then re-registered to ensure that
   // allocations made from it earlier will be handled correctly.
   // Things are not guaranteed to work that way, but it's how they work now.
-  malloc_zone_t *default_zone = malloc_default_zone();
+  malloc_zone_t *default_zone = get_default_zone();
   malloc_zone_unregister(default_zone);
   malloc_zone_register(default_zone);
 }

diff --git a/src/libc_override_redefine.h b/src/libc_override_redefine.h
index a1e50f8..4d61b25 100644
--- a/src/libc_override_redefine.h
+++ b/src/libc_override_redefine.h

@@ -42,49 +42,86 @@
 #ifndef TCMALLOC_LIBC_OVERRIDE_REDEFINE_H_
 #define TCMALLOC_LIBC_OVERRIDE_REDEFINE_H_
 
-#ifdef HAVE_SYS_CDEFS_H
-#include <sys/cdefs.h>    // for __THROW
-#endif
-
-#ifndef __THROW    // I guess we're not on a glibc-like system
-# define __THROW   // __THROW is just an optimization, so ok to make it ""
-#endif
-
 void* operator new(size_t size)                  { return tc_new(size);       }
-void operator delete(void* p) __THROW            { tc_delete(p);              }
+void operator delete(void* p) CPP_NOTHROW        { tc_delete(p);              }
 void* operator new[](size_t size)                { return tc_newarray(size);  }
-void operator delete[](void* p) __THROW          { tc_deletearray(p);         }
-void* operator new(size_t size, const std::nothrow_t& nt) __THROW {
+void operator delete[](void* p) CPP_NOTHROW      { tc_deletearray(p);         }
+void* operator new(size_t size, const std::nothrow_t& nt) CPP_NOTHROW {
   return tc_new_nothrow(size, nt);
 }
-void* operator new[](size_t size, const std::nothrow_t& nt) __THROW {
+void* operator new[](size_t size, const std::nothrow_t& nt) CPP_NOTHROW {
   return tc_newarray_nothrow(size, nt);
 }
-void operator delete(void* ptr, const std::nothrow_t& nt) __THROW {
+void operator delete(void* ptr, const std::nothrow_t& nt) CPP_NOTHROW {
   return tc_delete_nothrow(ptr, nt);
 }
-void operator delete[](void* ptr, const std::nothrow_t& nt) __THROW {
+void operator delete[](void* ptr, const std::nothrow_t& nt) CPP_NOTHROW {
   return tc_deletearray_nothrow(ptr, nt);
 }
+
+#ifdef ENABLE_SIZED_DELETE
+void operator delete(void* p, size_t s) CPP_NOTHROW  { tc_delete_sized(p, s);     }
+void operator delete[](void* p, size_t s) CPP_NOTHROW{ tc_deletearray_sized(p, s);}
+#endif
+
+#if defined(ENABLE_ALIGNED_NEW_DELETE)
+
+void* operator new(size_t size, std::align_val_t al) {
+  return tc_new_aligned(size, al);
+}
+void operator delete(void* p, std::align_val_t al) CPP_NOTHROW {
+  tc_delete_aligned(p, al);
+}
+void* operator new[](size_t size, std::align_val_t al) {
+  return tc_newarray_aligned(size, al);
+}
+void operator delete[](void* p, std::align_val_t al) CPP_NOTHROW {
+  tc_deletearray_aligned(p, al);
+}
+void* operator new(size_t size, std::align_val_t al, const std::nothrow_t& nt) CPP_NOTHROW {
+  return tc_new_aligned_nothrow(size, al, nt);
+}
+void* operator new[](size_t size, std::align_val_t al, const std::nothrow_t& nt) CPP_NOTHROW {
+  return tc_newarray_aligned_nothrow(size, al, nt);
+}
+void operator delete(void* ptr, std::align_val_t al, const std::nothrow_t& nt) CPP_NOTHROW {
+  return tc_delete_aligned_nothrow(ptr, al, nt);
+}
+void operator delete[](void* ptr, std::align_val_t al, const std::nothrow_t& nt) CPP_NOTHROW {
+  return tc_deletearray_aligned_nothrow(ptr, al, nt);
+}
+
+#ifdef ENABLE_SIZED_DELETE
+void operator delete(void* p, size_t s, std::align_val_t al) CPP_NOTHROW {
+  tc_delete_sized_aligned(p, s, al);
+}
+void operator delete[](void* p, size_t s, std::align_val_t al) CPP_NOTHROW {
+  tc_deletearray_sized_aligned(p, s, al);
+}
+#endif
+
+#endif // defined(ENABLE_ALIGNED_NEW_DELETE)
+
 extern "C" {
-  void* malloc(size_t s) __THROW                 { return tc_malloc(s);       }
-  void  free(void* p) __THROW                    { tc_free(p);                }
-  void* realloc(void* p, size_t s) __THROW       { return tc_realloc(p, s);   }
-  void* calloc(size_t n, size_t s) __THROW       { return tc_calloc(n, s);    }
-  void  cfree(void* p) __THROW                   { tc_cfree(p);               }
-  void* memalign(size_t a, size_t s) __THROW     { return tc_memalign(a, s);  }
-  void* valloc(size_t s) __THROW                 { return tc_valloc(s);       }
-  void* pvalloc(size_t s) __THROW                { return tc_pvalloc(s);      }
-  int posix_memalign(void** r, size_t a, size_t s) __THROW {
+  void* malloc(size_t s)                         { return tc_malloc(s);       }
+  void  free(void* p)                            { tc_free(p);                }
+  void* realloc(void* p, size_t s)               { return tc_realloc(p, s);   }
+  void* calloc(size_t n, size_t s)               { return tc_calloc(n, s);    }
+  void  cfree(void* p)                           { tc_cfree(p);               }
+  void* memalign(size_t a, size_t s)             { return tc_memalign(a, s);  }
+  void* aligned_alloc(size_t a, size_t s)        { return tc_memalign(a, s);  }
+  void* valloc(size_t s)                         { return tc_valloc(s);       }
+  void* pvalloc(size_t s)                        { return tc_pvalloc(s);      }
+  int posix_memalign(void** r, size_t a, size_t s)         {
     return tc_posix_memalign(r, a, s);
   }
-  void malloc_stats(void) __THROW                { tc_malloc_stats();         }
-  int mallopt(int cmd, int v) __THROW            { return tc_mallopt(cmd, v); }
+  void malloc_stats(void)                        { tc_malloc_stats();         }
+  int mallopt(int cmd, int v)                    { return tc_mallopt(cmd, v); }
 #ifdef HAVE_STRUCT_MALLINFO
-  struct mallinfo mallinfo(void) __THROW         { return tc_mallinfo();      }
+  struct mallinfo mallinfo(void)                 { return tc_mallinfo();      }
 #endif
-  size_t malloc_size(void* p) __THROW            { return tc_malloc_size(p); }
-  size_t malloc_usable_size(void* p) __THROW     { return tc_malloc_size(p); }
+  size_t malloc_size(void* p)                    { return tc_malloc_size(p); }
+  size_t malloc_usable_size(void* p)             { return tc_malloc_size(p); }
 }  // extern "C"
 
 // No need to do anything at tcmalloc-registration time: we do it all

diff --git a/src/linked_list.h b/src/linked_list.h
index 66a0741..f25b6f8 100644
--- a/src/linked_list.h
+++ b/src/linked_list.h

@@ -50,8 +50,9 @@
 }
 
 inline void SLL_Push(void **list, void *element) {
-  SLL_SetNext(element, *list);
+  void *next = *list;
   *list = element;
+  SLL_SetNext(element, next);
 }
 
 inline void *SLL_Pop(void **list) {
@@ -60,6 +61,17 @@
   return result;
 }
 
+inline bool SLL_TryPop(void **list, void **rv) {
+  void *result = *list;
+  if (!result) {
+    return false;
+  }
+  void *next = SLL_Next(*list);
+  *list = next;
+  *rv = result;
+  return true;
+}
+
 // Remove N elements from a linked list to which head points.  head will be
 // modified to point to the new head.  start and end will point to the first
 // and last nodes of the range.  Note that end will point to NULL after this

diff --git a/src/malloc_extension.cc b/src/malloc_extension.cc
index 4ff719c..68cb98a 100644
--- a/src/malloc_extension.cc
+++ b/src/malloc_extension.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2005, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -53,8 +53,8 @@
 #include "maybe_threads.h"
 #include "base/googleinit.h"
 
-using STL_NAMESPACE::string;
-using STL_NAMESPACE::vector;
+using std::string;
+using std::vector;
 
 static void DumpAddressMap(string* result) {
   *result += "\nMAPPED_LIBRARIES:\n";
@@ -193,6 +193,14 @@
   v->clear();
 }
 
+size_t MallocExtension::GetThreadCacheSize() {
+  return 0;
+}
+
+void MallocExtension::MarkThreadTemporarilyIdle() {
+  // Default implementation does nothing
+}
+
 // The current malloc extension object.
 
 static MallocExtension* current_instance;
@@ -369,6 +377,8 @@
 C_SHIM(ReleaseToSystem, void, (size_t num_bytes), (num_bytes));
 C_SHIM(GetEstimatedAllocatedSize, size_t, (size_t size), (size));
 C_SHIM(GetAllocatedSize, size_t, (const void* p), (p));
+C_SHIM(GetThreadCacheSize, size_t, (void), ());
+C_SHIM(MarkThreadTemporarilyIdle, void, (void), ());
 
 // Can't use the shim here because of the need to translate the enums.
 extern "C"

diff --git a/src/malloc_hook-inl.h b/src/malloc_hook-inl.h
index 9e74ec8..b07704e 100644
--- a/src/malloc_hook-inl.h
+++ b/src/malloc_hook-inl.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2005, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -44,6 +44,8 @@
 #include "base/basictypes.h"
 #include <gperftools/malloc_hook.h>
 
+#include "common.h" // for UNLIKELY
+
 namespace base { namespace internal {
 
 // Capacity of 8 means that HookList is 9 words.
@@ -121,7 +123,7 @@
 }
 
 inline void MallocHook::InvokeNewHook(const void* p, size_t s) {
-  if (!base::internal::new_hooks_.empty()) {
+  if (PREDICT_FALSE(!base::internal::new_hooks_.empty())) {
     InvokeNewHookSlow(p, s);
   }
 }
@@ -132,7 +134,7 @@
 }
 
 inline void MallocHook::InvokeDeleteHook(const void* p) {
-  if (!base::internal::delete_hooks_.empty()) {
+  if (PREDICT_FALSE(!base::internal::delete_hooks_.empty())) {
     InvokeDeleteHookSlow(p);
   }
 }

diff --git a/src/malloc_hook.cc b/src/malloc_hook.cc
index 681d8a2..9d5741e 100644
--- a/src/malloc_hook.cc
+++ b/src/malloc_hook.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2005, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -49,6 +49,7 @@
 #include <algorithm>
 #include "base/logging.h"
 #include "base/spinlock.h"
+#include "maybe_emergency_malloc.h"
 #include "maybe_threads.h"
 #include "malloc_hook-inl.h"
 #include <gperftools/malloc_hook.h>
@@ -491,10 +492,16 @@
 
 
 void MallocHook::InvokeNewHookSlow(const void* p, size_t s) {
+  if (tcmalloc::IsEmergencyPtr(p)) {
+    return;
+  }
   INVOKE_HOOKS(NewHook, new_hooks_, (p, s));
 }
 
 void MallocHook::InvokeDeleteHookSlow(const void* p) {
+  if (tcmalloc::IsEmergencyPtr(p)) {
+    return;
+  }
   INVOKE_HOOKS(DeleteHook, delete_hooks_, (p));
 }
 
@@ -560,6 +567,8 @@
 
 #undef INVOKE_HOOKS
 
+#ifndef NO_TCMALLOC_SAMPLES
+
 DEFINE_ATTRIBUTE_SECTION_VARS(google_malloc);
 DECLARE_ATTRIBUTE_SECTION_VARS(google_malloc);
   // actual functions are in debugallocation.cc or tcmalloc.cc
@@ -605,6 +614,8 @@
   }
 }
 
+#endif // !NO_TCMALLOC_SAMPLES
+
 // We can improve behavior/compactness of this function
 // if we pass a generic test function (with a generic arg)
 // into the implementations for GetStackTrace instead of the skip_count.
@@ -636,6 +647,14 @@
     return 0;
   for (int i = 0; i < depth; ++i) {  // stack[0] is our immediate caller
     if (InHookCaller(stack[i])) {
+      // fast-path to slow-path calls may be implemented by the compiler
+      // as non-tail calls, which puts two frames on the stack trace
+      // inside google_malloc. In that case we skip to the outermost
+      // such frame, since this is where the malloc stack frames
+      // really start.
+      while (i + 1 < depth && InHookCaller(stack[i+1])) {
+        i++;
+      }
       RAW_VLOG(10, "Found hooked allocator at %d: %p <- %p",
                    i, stack[i], stack[i+1]);
       i += 1;  // skip hook caller frame
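The frame-skipping loop added in the hunk above can be illustrated in isolation. This is a minimal sketch, not the gperftools implementation: `FirstCallerFrame` and the `in_hook` vector are hypothetical stand-ins for the surrounding function and for calling `InHookCaller(stack[i])` on each frame.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: in_hook[i] is true when frame i lies inside the
// allocator's hook-caller section (what InHookCaller(stack[i]) reports).
// Returns the index of the first frame belonging to the allocator's caller,
// or in_hook.size() when no hooked frame is found.
std::size_t FirstCallerFrame(const std::vector<bool>& in_hook) {
  const std::size_t depth = in_hook.size();
  for (std::size_t i = 0; i < depth; ++i) {
    if (in_hook[i]) {
      // A fast path that calls its slow path without a tail call leaves
      // two adjacent hooked frames; advance to the outermost one.
      while (i + 1 < depth && in_hook[i + 1]) {
        ++i;
      }
      return i + 1;  // skip the outermost hooked frame itself
    }
  }
  return depth;
}
```

With flags `{false, true, true, false}` the sketch returns 3, whereas stopping at the first hooked frame would return 2 and leave an allocator frame in the reported trace.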

diff --git a/src/malloc_hook_mmap_linux.h b/src/malloc_hook_mmap_linux.h
old mode 100755
new mode 100644
index 0f531db..cbf3782
--- a/src/malloc_hook_mmap_linux.h
+++ b/src/malloc_hook_mmap_linux.h

@@ -40,11 +40,10 @@
 # error Should only be including malloc_hook_mmap_linux.h on linux systems.
 #endif
 
-#include <unistd.h>
-#include <syscall.h>
-#include <sys/mman.h>
 #include <errno.h>
-#include "base/linux_syscall_support.h"
+#include <sys/mman.h>
+#include <sys/syscall.h>
+#include <unistd.h>
 
 // The x86-32 case and the x86-64 case differ:
 // 32b has a mmap2() syscall, 64b does not.
@@ -52,12 +51,23 @@
 
 // I test for 64-bit first so I don't have to do things like
 // '#if (defined(__mips__) && !defined(__MIPS64__))' as a mips32 check.
-#if defined(__x86_64__) || defined(__PPC64__) || defined(__aarch64__) || (defined(_MIPS_SIM) && _MIPS_SIM == _ABI64)
+#if defined(__x86_64__) \
+    || defined(__PPC64__) \
+    || defined(__aarch64__) \
+    || (defined(_MIPS_SIM) && (_MIPS_SIM == _ABI64 || _MIPS_SIM == _ABIN32)) \
+    || defined(__s390__) || (defined(__riscv) && __riscv_xlen == 64) \
+    || defined(__e2k__)
 
 static inline void* do_mmap64(void *start, size_t length,
                               int prot, int flags,
-                              int fd, __off64_t offset) __THROW {
-  return sys_mmap(start, length, prot, flags, fd, offset);
+                              int fd, off64_t offset) __THROW {
+#if defined(__s390__)
+  long args[6] = { (long)start, (long)length, (long)prot, (long)flags,
+                   (long)fd, (long)offset };
+  return (void*)syscall(SYS_mmap, args);
+#else
+  return (void*)syscall(SYS_mmap, start, length, prot, flags, fd, offset);
+#endif
 }
 
 #define MALLOC_HOOK_HAVE_DO_MMAP64 1
@@ -67,7 +77,7 @@
 
 static inline void* do_mmap64(void *start, size_t length,
                               int prot, int flags,
-                              int fd, __off64_t offset) __THROW {
+                              int fd, off64_t offset) __THROW {
   void *result;
 
   // Try mmap2() unless it's not supported
@@ -133,12 +143,13 @@
 // malloc_hook section,
 // so that MallocHook::GetCallerStackTrace can function accurately:
 
-// Make sure mmap doesn't get #define'd away by <sys/mman.h>
+// Make sure mmap64 and mmap don't get #define'd away by <sys/mman.h>
+# undef mmap64
 # undef mmap
 
 extern "C" {
   void* mmap64(void *start, size_t length, int prot, int flags,
-               int fd, __off64_t offset  ) __THROW
+               int fd, off64_t offset  ) __THROW
     ATTRIBUTE_SECTION(malloc_hook);
   void* mmap(void *start, size_t length,int prot, int flags,
              int fd, off_t offset) __THROW
@@ -148,12 +159,12 @@
   void* mremap(void* old_addr, size_t old_size, size_t new_size,
                int flags, ...) __THROW
     ATTRIBUTE_SECTION(malloc_hook);
-  void* sbrk(ptrdiff_t increment) __THROW
+  void* sbrk(intptr_t increment) __THROW
     ATTRIBUTE_SECTION(malloc_hook);
 }
 
 extern "C" void* mmap64(void *start, size_t length, int prot, int flags,
-                        int fd, __off64_t offset) __THROW {
+                        int fd, off64_t offset) __THROW {
   MallocHook::InvokePreMmapHook(start, length, prot, flags, fd, offset);
   void *result;
   if (!MallocHook::InvokeMmapReplacement(
@@ -185,7 +196,7 @@
   MallocHook::InvokeMunmapHook(start, length);
   int result;
   if (!MallocHook::InvokeMunmapReplacement(start, length, &result)) {
-    result = sys_munmap(start, length);
+    result = syscall(SYS_munmap, start, length);
   }
   return result;
 }
@@ -196,17 +207,18 @@
   va_start(ap, flags);
   void *new_address = va_arg(ap, void *);
   va_end(ap);
-  void* result = sys_mremap(old_addr, old_size, new_size, flags, new_address);
+  void* result = (void*)syscall(SYS_mremap, old_addr, old_size, new_size, flags,
+                                new_address);
   MallocHook::InvokeMremapHook(result, old_addr, old_size, new_size, flags,
                                new_address);
   return result;
 }
 
-#ifndef __UCLIBC__
+#ifdef HAVE___SBRK
 // libc's version:
-extern "C" void* __sbrk(ptrdiff_t increment);
+extern "C" void* __sbrk(intptr_t increment);
 
-extern "C" void* sbrk(ptrdiff_t increment) __THROW {
+extern "C" void* sbrk(intptr_t increment) __THROW {
   MallocHook::InvokePreSbrkHook(increment);
   void *result = __sbrk(increment);
   MallocHook::InvokeSbrkHook(result, increment);

diff --git a/src/maybe_emergency_malloc.h b/src/maybe_emergency_malloc.h
new file mode 100644
index 0000000..250ecf0
--- /dev/null
+++ b/src/maybe_emergency_malloc.h

@@ -0,0 +1,55 @@
+// -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
+// Copyright (c) 2014, gperftools Contributors
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+//     * Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//     * Redistributions in binary form must reproduce the above
+// copyright notice, this list of conditions and the following disclaimer
+// in the documentation and/or other materials provided with the
+// distribution.
+//     * Neither the name of Google Inc. nor the names of its
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+#ifndef MAYBE_EMERGENCY_MALLOC_H
+#define MAYBE_EMERGENCY_MALLOC_H
+
+#include "config.h"
+
+#ifdef ENABLE_EMERGENCY_MALLOC
+
+#include "emergency_malloc.h"
+
+#else
+
+namespace tcmalloc {
+  static inline void *EmergencyMalloc(size_t size) {return NULL;}
+  static inline void EmergencyFree(void *p) {}
+  static inline void *EmergencyCalloc(size_t n, size_t elem_size) {return NULL;}
+  static inline void *EmergencyRealloc(void *old_ptr, size_t new_size) {return NULL;}
+
+  static inline bool IsEmergencyPtr(const void *_ptr) {
+    return false;
+  }
+}
+
+#endif // ENABLE_EMERGENCY_MALLOC
+
+#endif

diff --git a/src/maybe_threads.cc b/src/maybe_threads.cc
index 6dd0d8d..f973fbf 100644
--- a/src/maybe_threads.cc
+++ b/src/maybe_threads.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2005, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -48,6 +48,7 @@
 #include <string>
 #include "maybe_threads.h"
 #include "base/basictypes.h"
+#include "base/logging.h"
 
 // __THROW is defined in glibc systems.  It means, counter-intuitively,
 // "This function will never throw an exception."  It's an optional
@@ -68,6 +69,12 @@
       __THROW ATTRIBUTE_WEAK;
   int pthread_once(pthread_once_t *, void (*)(void))
       ATTRIBUTE_WEAK;
+#ifdef HAVE_FORK
+  int pthread_atfork(void (*__prepare) (void),
+                     void (*__parent) (void),
+                     void (*__child) (void))
+    __THROW ATTRIBUTE_WEAK;
+#endif
 }
 
 #define MAX_PERTHREAD_VALS 16
@@ -155,3 +162,16 @@
     return 0;
   }
 }
+
+#ifdef HAVE_FORK
+
+void perftools_pthread_atfork(void (*before)(),
+                              void (*parent_after)(),
+                              void (*child_after)()) {
+  if (pthread_atfork) {
+    int rv = pthread_atfork(before, parent_after, child_after);
+    CHECK(rv == 0);
+  }
+}
+
+#endif

diff --git a/src/maybe_threads.h b/src/maybe_threads.h
index b60f4ef..00f6969 100644
--- a/src/maybe_threads.h
+++ b/src/maybe_threads.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2005, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -51,4 +51,11 @@
 int perftools_pthread_once(pthread_once_t *ctl,
                            void  (*init_routine) (void));
 
+// Our wrapper for pthread_atfork. Does _nothing_ when there are no
+// threads. See static_vars.cc:SetupAtForkLocksHandler for the only
+// user of this.
+void perftools_pthread_atfork(void (*before)(),
+                              void (*parent_after)(),
+                              void (*child_after)());
+
 #endif  /* GOOGLE_MAYBE_THREADS_H_ */

diff --git a/src/memfs_malloc.cc b/src/memfs_malloc.cc
index ce20891..ef0ba5c 100644
--- a/src/memfs_malloc.cc
+++ b/src/memfs_malloc.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2007, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -85,6 +85,10 @@
 DEFINE_bool(memfs_malloc_map_private,
             EnvToBool("TCMALLOC_MEMFS_MAP_PRIVATE", false),
 	    "Use MAP_PRIVATE with mmap");
+DEFINE_bool(memfs_malloc_disable_fallback,
+            EnvToBool("TCMALLOC_MEMFS_DISABLE_FALLBACK", false),
+            "If we run out of hugepage memory, don't fall back to the "
+            "default allocator.");
 
 // Hugetlbfs based allocator for tcmalloc
 class HugetlbSysAllocator: public SysAllocator {
@@ -111,19 +115,23 @@
 
   SysAllocator* fallback_;  // Default system allocator to fall back to.
 };
-static char hugetlb_space[sizeof(HugetlbSysAllocator)];
+static union {
+  char buf[sizeof(HugetlbSysAllocator)];
+  void *ptr;
+} hugetlb_space;
 
 // No locking needed here since we assume that tcmalloc calls
 // us with an internal lock held (see tcmalloc/system-alloc.cc).
 void* HugetlbSysAllocator::Alloc(size_t size, size_t *actual_size,
                                  size_t alignment) {
-  if (failed_) {
+  if (!FLAGS_memfs_malloc_disable_fallback && failed_) {
     return fallback_->Alloc(size, actual_size, alignment);
   }
 
   // We don't respond to allocation requests smaller than big_page_size_ unless
   // the caller is ok to take more than they asked for. Used by MetaDataAlloc.
-  if (actual_size == NULL && size < big_page_size_) {
+  if (!FLAGS_memfs_malloc_disable_fallback &&
+      actual_size == NULL && size < big_page_size_) {
     return fallback_->Alloc(size, actual_size, alignment);
   }
 
@@ -132,13 +140,15 @@
   if (new_alignment < big_page_size_) new_alignment = big_page_size_;
   size_t aligned_size = ((size + new_alignment - 1) /
                          new_alignment) * new_alignment;
-  if (aligned_size < size) {
+  if (!FLAGS_memfs_malloc_disable_fallback && aligned_size < size) {
     return fallback_->Alloc(size, actual_size, alignment);
   }
 
   void* result = AllocInternal(aligned_size, actual_size, new_alignment);
   if (result != NULL) {
     return result;
+  } else if (FLAGS_memfs_malloc_disable_fallback) {
+    return NULL;
   }
   Log(kLog, __FILE__, __LINE__,
       "HugetlbSysAllocator: (failed, allocated)", failed_, hugetlb_base_);
@@ -258,7 +268,8 @@
 REGISTER_MODULE_INITIALIZER(memfs_malloc, {
   if (FLAGS_memfs_malloc_path.length()) {
     SysAllocator* alloc = MallocExtension::instance()->GetSystemAllocator();
-    HugetlbSysAllocator* hp = new (hugetlb_space) HugetlbSysAllocator(alloc);
+    HugetlbSysAllocator* hp =
+      new (hugetlb_space.buf) HugetlbSysAllocator(alloc);
     if (hp->Initialize()) {
       MallocExtension::instance()->SetSystemAllocator(hp);
     }
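The `hugetlb_space` change above wraps the static buffer in a union with a `void*` member, so the storage handed to placement new is at least pointer-aligned. A minimal sketch of the same pattern follows; `Widget`, `widget_space`, and `MakeWidget` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Hypothetical stand-in for the object constructed in static storage.
struct Widget {
  explicit Widget(int v) : next(nullptr), value(v) {}
  Widget* next;
  int value;
};

// The void* member raises the union's alignment to at least alignof(void*);
// a bare char array carries no such guarantee.
static union {
  char buf[sizeof(Widget)];
  void* ptr;
} widget_space;

// Constructs the object in the static buffer without touching the heap.
Widget* MakeWidget(int v) {
  return new (widget_space.buf) Widget(v);
}
```

In C++11 and later, `alignas(Widget) char buf[sizeof(Widget)];` states the same requirement directly; the union form also works with older compilers.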

diff --git a/src/memory_region_map.cc b/src/memory_region_map.cc
old mode 100755
new mode 100644
index e885859..5fb17d3
--- a/src/memory_region_map.cc
+++ b/src/memory_region_map.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2006, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -120,6 +120,7 @@
 
 #include "memory_region_map.h"
 
+#include "base/googleinit.h"
 #include "base/logging.h"
 #include "base/low_level_alloc.h"
 #include "malloc_hook-inl.h"
@@ -162,7 +163,8 @@
 // Simple hook into execution of global object constructors,
 // so that we do not call pthread_self() when it does not yet work.
 static bool libpthread_initialized = false;
-static bool initializer = (libpthread_initialized = true, true);
+REGISTER_MODULE_INITIALIZER(libpthread_initialized_setter,
+                            libpthread_initialized = true);
 
 static inline bool current_thread_is(pthread_t should_be) {
   // Before main() runs, there's only one thread, so we're always that thread
@@ -232,6 +234,9 @@
     memset(bucket_table_, 0, table_bytes);
     num_buckets_ = 0;
   }
+  if (regions_ == NULL) {  // init regions_
+    InitRegionSetLocked();
+  }
   Unlock();
   RAW_VLOG(10, "MemoryRegionMap Init done");
 }
@@ -534,6 +539,15 @@
   }
 }
 
+inline void MemoryRegionMap::InitRegionSetLocked() {
+  RAW_VLOG(12, "Initializing region set");
+  regions_ = regions_rep.region_set();
+  recursive_insert = true;
+  new (regions_) RegionSet();
+  HandleSavedRegionsLocked(&DoInsertRegionLocked);
+  recursive_insert = false;
+}
+
 inline void MemoryRegionMap::InsertRegionLocked(const Region& region) {
   RAW_CHECK(LockIsHeld(), "should be held (by this thread)");
   // We can be called recursively, because RegionSet constructor
@@ -554,12 +568,7 @@
     saved_regions[saved_regions_count++] = region;
   } else {  // not a recusrive call
     if (regions_ == NULL) {  // init regions_
-      RAW_VLOG(12, "Initializing region set");
-      regions_ = regions_rep.region_set();
-      recursive_insert = true;
-      new(regions_) RegionSet();
-      HandleSavedRegionsLocked(&DoInsertRegionLocked);
-      recursive_insert = false;
+      InitRegionSetLocked();
     }
     recursive_insert = true;
     // Do the actual insertion work to put new regions into regions_:
@@ -673,7 +682,7 @@
   uintptr_t start_addr = reinterpret_cast<uintptr_t>(start);
   uintptr_t end_addr = start_addr + size;
   // subtract start_addr, end_addr from all the regions
-  RAW_VLOG(10, "Removing global region %p..%p; have %" PRIuS " regions",
+  RAW_VLOG(10, "Removing global region %p..%p; have %zu regions",
               reinterpret_cast<void*>(start_addr),
               reinterpret_cast<void*>(end_addr),
               regions_->size());
@@ -740,7 +749,7 @@
     }
     ++region;
   }
-  RAW_VLOG(12, "Removed region %p..%p; have %" PRIuS " regions",
+  RAW_VLOG(12, "Removed region %p..%p; have %zu regions",
               reinterpret_cast<void*>(start_addr),
               reinterpret_cast<void*>(end_addr),
               regions_->size());
@@ -763,9 +772,9 @@
                                const void* start, size_t size,
                                int prot, int flags,
                                int fd, off_t offset) {
-  // TODO(maxim): replace all 0x%" PRIxS " by %p when RAW_VLOG uses a safe
+  // TODO(maxim): replace all 0x%" PRIxPTR " by %p when RAW_VLOG uses a safe
   // snprintf reimplementation that does not malloc to pretty-print NULL
-  RAW_VLOG(10, "MMap = 0x%" PRIxPTR " of %" PRIuS " at %" PRIu64 " "
+  RAW_VLOG(10, "MMap = 0x%" PRIxPTR " of %zu at %" PRIu64 " "
               "prot %d flags %d fd %d offs %" PRId64,
               reinterpret_cast<uintptr_t>(result), size,
               reinterpret_cast<uint64>(start), prot, flags, fd,
@@ -776,7 +785,7 @@
 }
 
 void MemoryRegionMap::MunmapHook(const void* ptr, size_t size) {
-  RAW_VLOG(10, "MUnmap of %p %" PRIuS "", ptr, size);
+  RAW_VLOG(10, "MUnmap of %p %zu", ptr, size);
   if (size != 0) {
     RecordRegionRemoval(ptr, size);
   }
@@ -786,8 +795,8 @@
                                  const void* old_addr, size_t old_size,
                                  size_t new_size, int flags,
                                  const void* new_addr) {
-  RAW_VLOG(10, "MRemap = 0x%" PRIxPTR " of 0x%" PRIxPTR " %" PRIuS " "
-              "to %" PRIuS " flags %d new_addr=0x%" PRIxPTR,
+  RAW_VLOG(10, "MRemap = 0x%" PRIxPTR " of 0x%" PRIxPTR " %zu "
+              "to %zu flags %d new_addr=0x%" PRIxPTR,
               (uintptr_t)result, (uintptr_t)old_addr,
                old_size, new_size, flags,
                flags & MREMAP_FIXED ? (uintptr_t)new_addr : 0);
@@ -798,7 +807,7 @@
 }
 
 void MemoryRegionMap::SbrkHook(const void* result, ptrdiff_t increment) {
-  RAW_VLOG(10, "Sbrk = 0x%" PRIxPTR " of %" PRIdS "", (uintptr_t)result, increment);
+  RAW_VLOG(10, "Sbrk = 0x%" PRIxPTR " of %zd", (uintptr_t)result, increment);
   if (result != reinterpret_cast<void*>(-1)) {
     if (increment > 0) {
       void* new_end = sbrk(0);
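The memory_region_map.cc hunks above replace the project-specific `PRIuS` and `PRIdS` macros with the standard `z` length modifier. A small sketch of the standard conversions, using hypothetical helper names that are not part of gperftools:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a size_t with the standard %zu conversion,
// as the updated RAW_VLOG format strings do.
std::string FormatRegionCount(std::size_t count) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "have %zu regions", count);
  return std::string(buf);
}

// Hypothetical helper: %td is the standard conversion for ptrdiff_t.
std::string FormatIncrement(std::ptrdiff_t increment) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "Sbrk of %td", increment);
  return std::string(buf);
}
```

The patch prints the `ptrdiff_t` increment with `%zd`. That works where `ptrdiff_t` and the signed counterpart of `size_t` have the same representation, which is the case on common platforms.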

diff --git a/src/memory_region_map.h b/src/memory_region_map.h
index ec388e1..c21fac3 100644
--- a/src/memory_region_map.h
+++ b/src/memory_region_map.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2006, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -362,6 +362,9 @@
   // table where all buckets eventually should be.
   static void RestoreSavedBucketsLocked();
 
+  // Initialize RegionSet regions_.
+  inline static void InitRegionSetLocked();
+
   // Wrapper around DoInsertRegionLocked
   // that handles the case of recursive allocator calls.
   inline static void InsertRegionLocked(const Region& region);

diff --git a/src/packed-cache-inl.h b/src/packed-cache-inl.h
index 0946260..7c216e5 100644
--- a/src/packed-cache-inl.h
+++ b/src/packed-cache-inl.h

@@ -118,6 +118,7 @@
 #include <stdint.h>                     // for uintptr_t
 #endif
 #include "base/basictypes.h"
+#include "common.h"
 #include "internal_logging.h"
 
 // A safe way of doing "(1 << n) - 1" -- without worrying about overflow
@@ -128,14 +129,16 @@
 
 // The types K and V provide upper bounds on the number of valid keys
 // and values, but we explicitly require the keys to be less than
-// 2^kKeybits and the values to be less than 2^kValuebits.  The size of
-// the table is controlled by kHashbits, and the type of each entry in
-// the cache is T.  See also the big comment at the top of the file.
-template <int kKeybits, typename T>
+// 2^kKeybits and the values to be less than 2^kValuebits.  The size
+// of the table is controlled by kHashbits, and the type of each entry
+// in the cache is uintptr_t (native machine word).  See also the big
+// comment at the top of the file.
+template <int kKeybits>
 class PackedCache {
  public:
+  typedef uintptr_t T;
   typedef uintptr_t K;
-  typedef size_t V;
+  typedef uint32 V;
 #ifdef TCMALLOC_SMALL_BUT_SLOW
   // Decrease the size map cache if running in the small memory mode.
   static const int kHashbits = 12;
@@ -143,15 +146,36 @@
   static const int kHashbits = 16;
 #endif
   static const int kValuebits = 7;
-  static const bool kUseWholeKeys = kKeybits + kValuebits <= 8 * sizeof(T);
+  // one bit after value bits
+  static const int kInvalidMask = 0x80;
 
-  explicit PackedCache(V initial_value) {
-    COMPILE_ASSERT(kKeybits <= sizeof(K) * 8, key_size);
-    COMPILE_ASSERT(kValuebits <= sizeof(V) * 8, value_size);
+  explicit PackedCache() {
+    COMPILE_ASSERT(kKeybits + kValuebits + 1 <= 8 * sizeof(T), use_whole_keys);
     COMPILE_ASSERT(kHashbits <= kKeybits, hash_function);
-    COMPILE_ASSERT(kKeybits - kHashbits + kValuebits <= kTbits,
-                   entry_size_must_be_big_enough);
-    Clear(initial_value);
+    COMPILE_ASSERT(kHashbits >= kValuebits + 1, small_values_space);
+    Clear();
+  }
+
+  bool TryGet(K key, V* out) const {
+    // As with other code in this class, we touch array_ as few times
+    // as we can.  Assuming entries are read atomically then certain
+    // races are harmless.
+    ASSERT(key == (key & kKeyMask));
+    T hash = Hash(key);
+    T expected_entry = key;
+    expected_entry &= ~N_ONES_(T, kHashbits);
+    T entry = array_[hash];
+    entry ^= expected_entry;
+    if (PREDICT_FALSE(entry >= (1 << kValuebits))) {
+      return false;
+    }
+    *out = static_cast<V>(entry);
+    return true;
+  }
+
+  void Clear() {
+    // sets the 'invalid' bit in every byte, including the value byte
+    memset(const_cast<T* >(array_), kInvalidMask, sizeof(array_));
   }
 
   void Put(K key, V value) {
@@ -160,72 +184,25 @@
     array_[Hash(key)] = KeyToUpper(key) | value;
   }
 
-  bool Has(K key) const {
+  void Invalidate(K key) {
     ASSERT(key == (key & kKeyMask));
-    return KeyMatch(array_[Hash(key)], key);
-  }
-
-  V GetOrDefault(K key, V default_value) const {
-    // As with other code in this class, we touch array_ as few times
-    // as we can.  Assuming entries are read atomically (e.g., their
-    // type is uintptr_t on most hardware) then certain races are
-    // harmless.
-    ASSERT(key == (key & kKeyMask));
-    T entry = array_[Hash(key)];
-    return KeyMatch(entry, key) ? EntryToValue(entry) : default_value;
-  }
-
-  void Clear(V value) {
-    ASSERT(value == (value & kValueMask));
-    for (int i = 0; i < 1 << kHashbits; i++) {
-      ASSERT(kUseWholeKeys || KeyToUpper(i) == 0);
-      array_[i] = kUseWholeKeys ? (value | KeyToUpper(i)) : value;
-    }
+    array_[Hash(key)] = KeyToUpper(key) | kInvalidMask;
   }
 
  private:
-  // We are going to pack a value and the upper part of a key (or a
-  // whole key) into an entry of type T.  The UPPER type is for the
-  // upper part of a key, after the key has been masked and shifted
-  // for inclusion in an entry.
-  typedef T UPPER;
-
-  static V EntryToValue(T t) { return t & kValueMask; }
-
-  // If we have space for a whole key, we just shift it left.
-  // Otherwise kHashbits determines where in a K to find the upper
-  // part of the key, and kValuebits determines where in the entry to
-  // put it.
-  static UPPER KeyToUpper(K k) {
-    if (kUseWholeKeys) {
-      return static_cast<T>(k) << kValuebits;
-    } else {
-      const int shift = kHashbits - kValuebits;
-      // Assume kHashbits >= kValuebits.  It'd be easy to lift this assumption.
-      return static_cast<T>(k >> shift) & kUpperMask;
-    }
+  // We just wipe all hash bits out of the key, i.e. clear its lower
+  // kHashbits bits. We rely on the compiler knowing the value of Hash(k).
+  static T KeyToUpper(K k) {
+    return static_cast<T>(k) ^ Hash(k);
   }
 
-  static size_t Hash(K key) {
-    return static_cast<size_t>(key) & N_ONES_(size_t, kHashbits);
+  static T Hash(K key) {
+    return static_cast<T>(key) & N_ONES_(size_t, kHashbits);
   }
 
-  // Does the entry match the relevant part of the given key?
-  static bool KeyMatch(T entry, K key) {
-    return kUseWholeKeys ?
-        (entry >> kValuebits == key) :
-        ((KeyToUpper(key) ^ entry) & kUpperMask) == 0;
-  }
-
-  static const int kTbits = 8 * sizeof(T);
-  static const int kUpperbits = kUseWholeKeys ? kKeybits : kKeybits - kHashbits;
-
   // For masking a K.
   static const K kKeyMask = N_ONES_(K, kKeybits);
 
-  // For masking a T.
-  static const T kUpperMask = N_ONES_(T, kUpperbits) << kValuebits;
-
   // For masking a V or a T.
   static const V kValueMask = N_ONES_(V, kValuebits);
 

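The reworked `PackedCache` above stores the key's upper bits OR-ed with a small value. A lookup then XORs the entry with the expected upper bits: a match leaves only the value, and anything else leaves a number at or above `1 << kValuebits`. The following is a reduced sketch of that idea, not the gperftools class. `TinyPackedCache` and its fixed constants are assumptions chosen for illustration; the table is kept small, and the real class takes the key width as a template parameter.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Illustrative constants only.
constexpr int kHashBits = 8;
constexpr int kValueBits = 7;
constexpr std::uintptr_t kInvalidMask = 0x80;  // one bit above the value bits
constexpr std::uintptr_t kHashMask = (std::uintptr_t{1} << kHashBits) - 1;

class TinyPackedCache {
 public:
  TinyPackedCache() { Clear(); }

  // Filling every byte with 0x80 sets the invalid bit in each entry.
  void Clear() { std::memset(array_, 0x80, sizeof(array_)); }

  void Put(std::uintptr_t key, std::uint32_t value) {
    assert(value < (1u << kValueBits));
    array_[key & kHashMask] = (key & ~kHashMask) | value;
  }

  void Invalidate(std::uintptr_t key) {
    array_[key & kHashMask] = (key & ~kHashMask) | kInvalidMask;
  }

  bool TryGet(std::uintptr_t key, std::uint32_t* out) const {
    // XOR cancels the stored upper key bits when they match the lookup key,
    // leaving just the value; a mismatch or the invalid bit leaves a
    // result that is too large to be a value.
    std::uintptr_t entry = array_[key & kHashMask] ^ (key & ~kHashMask);
    if (entry >= (std::uintptr_t{1} << kValueBits)) return false;
    *out = static_cast<std::uint32_t>(entry);
    return true;
  }

 private:
  std::uintptr_t array_[1 << kHashBits];
};
```

A single comparison covers the three miss cases (never written, invalidated, or occupied by a different key with the same hash), which is why the lookup path needs only one branch.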
diff --git a/src/page_heap.cc b/src/page_heap.cc
index f52ae2a..44ad654 100644
--- a/src/page_heap.cc
+++ b/src/page_heap.cc

@@ -64,14 +64,11 @@
 
 PageHeap::PageHeap()
     : pagemap_(MetaDataAlloc),
-      pagemap_cache_(0),
       scavenge_counter_(0),
       // Start scavenging at kMaxPages list
       release_index_(kMaxPages),
       aggressive_decommit_(false) {
-  COMPILE_ASSERT(kNumClasses <= (1 << PageMapCache::kValuebits), valuebits);
-  DLL_Init(&large_.normal);
-  DLL_Init(&large_.returned);
+  COMPILE_ASSERT(kClassSizesMax <= (1 << PageMapCache::kValuebits), valuebits);
   for (int i = 0; i < kMaxPages; i++) {
     DLL_Init(&free_[i].normal);
     DLL_Init(&free_[i].returned);
@@ -83,15 +80,15 @@
   ASSERT(n > 0);
 
   // Find first size >= n that has a non-empty list
-  for (Length s = n; s < kMaxPages; s++) {
-    Span* ll = &free_[s].normal;
+  for (Length s = n; s <= kMaxPages; s++) {
+    Span* ll = &free_[s - 1].normal;
     // If we're lucky, ll is non-empty, meaning it has a suitable span.
     if (!DLL_IsEmpty(ll)) {
       ASSERT(ll->next->location == Span::ON_NORMAL_FREELIST);
       return Carve(ll->next, n);
     }
     // Alternatively, maybe there's a usable returned span.
-    ll = &free_[s].returned;
+    ll = &free_[s - 1].returned;
     if (!DLL_IsEmpty(ll)) {
       // We did not call EnsureLimit before, to avoid releasing the span
       // that will be taken immediately back.
@@ -169,45 +166,38 @@
 }
 
 Span* PageHeap::AllocLarge(Length n) {
-  // find the best span (closest to n in size).
-  // The following loops implements address-ordered best-fit.
   Span *best = NULL;
+  Span *best_normal = NULL;
 
-  // Search through normal list
-  for (Span* span = large_.normal.next;
-       span != &large_.normal;
-       span = span->next) {
-    if (span->length >= n) {
-      if ((best == NULL)
-          || (span->length < best->length)
-          || ((span->length == best->length) && (span->start < best->start))) {
-        best = span;
-        ASSERT(best->location == Span::ON_NORMAL_FREELIST);
-      }
-    }
+  // Create a Span to use as an upper bound.
+  Span bound;
+  bound.start = 0;
+  bound.length = n;
+
+  // First search the NORMAL spans.
+  SpanSet::iterator place = large_normal_.upper_bound(SpanPtrWithLength(&bound));
+  if (place != large_normal_.end()) {
+    best = place->span;
+    best_normal = best;
+    ASSERT(best->location == Span::ON_NORMAL_FREELIST);
   }
 
-  Span *bestNormal = best;
-
-  // Search through released list in case it has a better fit
-  for (Span* span = large_.returned.next;
-       span != &large_.returned;
-       span = span->next) {
-    if (span->length >= n) {
-      if ((best == NULL)
-          || (span->length < best->length)
-          || ((span->length == best->length) && (span->start < best->start))) {
-        best = span;
-        ASSERT(best->location == Span::ON_RETURNED_FREELIST);
-      }
-    }
+  // Try to find better fit from RETURNED spans.
+  place = large_returned_.upper_bound(SpanPtrWithLength(&bound));
+  if (place != large_returned_.end()) {
+    Span *c = place->span;
+    ASSERT(c->location == Span::ON_RETURNED_FREELIST);
+    if (best_normal == NULL
+        || c->length < best->length
+        || (c->length == best->length && c->start < best->start))
+      best = place->span;
   }
 
-  if (best == bestNormal) {
+  if (best == best_normal) {
     return best == NULL ? NULL : Carve(best, n);
   }
 
-  // best comes from returned list.
+  // best comes from RETURNED set.
 
   if (EnsureLimit(n, false)) {
     return Carve(best, n);
@@ -215,13 +205,13 @@
 
   if (EnsureLimit(n, true)) {
     // best could have been destroyed by coalescing.
-    // bestNormal is not a best-fit, and it could be destroyed as well.
+    // best_normal is not a best-fit, and it could be destroyed as well.
     // We retry, the limit is already ensured:
     return AllocLarge(n);
   }
 
-  // If bestNormal existed, EnsureLimit would succeeded:
-  ASSERT(bestNormal == NULL);
+  // If best_normal existed, EnsureLimit would have succeeded:
+  ASSERT(best_normal == NULL);
   // We are not allowed to take best from returned list.
   return NULL;
 }
@@ -231,12 +221,10 @@
   ASSERT(n < span->length);
   ASSERT(span->location == Span::IN_USE);
   ASSERT(span->sizeclass == 0);
-  Event(span, 'T', n);
 
   const int extra = span->length - n;
   Span* leftover = NewSpan(span->start + n, extra);
   ASSERT(leftover->location == Span::IN_USE);
-  Event(leftover, 'U', extra);
   RecordSpan(leftover);
   pagemap_.set(span->start + n - 1, span); // Update map from pageid to span
   span->length = n;
@@ -245,16 +233,22 @@
 }
 
 void PageHeap::CommitSpan(Span* span) {
+  ++stats_.commit_count;
+
   TCMalloc_SystemCommit(reinterpret_cast<void*>(span->start << kPageShift),
                         static_cast<size_t>(span->length << kPageShift));
   stats_.committed_bytes += span->length << kPageShift;
+  stats_.total_commit_bytes += (span->length << kPageShift);
 }
 
 bool PageHeap::DecommitSpan(Span* span) {
+  ++stats_.decommit_count;
+
   bool rv = TCMalloc_SystemRelease(reinterpret_cast<void*>(span->start << kPageShift),
                                    static_cast<size_t>(span->length << kPageShift));
   if (rv) {
     stats_.committed_bytes -= span->length << kPageShift;
+    stats_.total_decommit_bytes += (span->length << kPageShift);
   }
 
   return rv;
@@ -266,14 +260,12 @@
   const int old_location = span->location;
   RemoveFromFreeList(span);
   span->location = Span::IN_USE;
-  Event(span, 'A', n);
 
   const int extra = span->length - n;
   ASSERT(extra >= 0);
   if (extra > 0) {
     Span* leftover = NewSpan(span->start + n, extra);
     leftover->location = old_location;
-    Event(leftover, 'S', extra);
     RecordSpan(leftover);
 
     // The previous span of |leftover| was just splitted -- no need to
@@ -313,18 +305,35 @@
   span->sizeclass = 0;
   span->sample = 0;
   span->location = Span::ON_NORMAL_FREELIST;
-  Event(span, 'D', span->length);
   MergeIntoFreeList(span);  // Coalesces if possible
   IncrementalScavenge(n);
   ASSERT(stats_.unmapped_bytes+ stats_.committed_bytes==stats_.system_bytes);
   ASSERT(Check());
 }
 
-bool PageHeap::MayMergeSpans(Span *span, Span *other) {
-  if (aggressive_decommit_) {
-    return other->location != Span::IN_USE;
+// Given span we're about to free and other span (still on free list),
+// checks if 'other' span is mergeable with 'span'. If it is, removes
+// other span from free list, performs aggressive decommit if
+// necessary and returns 'other' span. Otherwise 'other' span cannot
+// be merged and is left untouched. In that case NULL is returned.
+Span* PageHeap::CheckAndHandlePreMerge(Span* span, Span* other) {
+  if (other == NULL) {
+    return other;
   }
-  return span->location == other->location;
+  // if we're in aggressive decommit mode and span is decommitted,
+  // then we try to decommit adjacent span.
+  if (aggressive_decommit_ && other->location == Span::ON_NORMAL_FREELIST
+      && span->location == Span::ON_RETURNED_FREELIST) {
+    bool worked = DecommitSpan(other);
+    if (!worked) {
+      return NULL;
+    }
+  } else if (other->location != span->location) {
+    return NULL;
+  }
+
+  RemoveFromFreeList(other);
+  return other;
 }
 
 void PageHeap::MergeIntoFreeList(Span* span) {
@@ -340,15 +349,6 @@
   //
   // The following applies if aggressive_decommit_ is enabled:
   //
-  // Note that the adjacent spans we merge into "span" may come out of a
-  // "normal" (committed) list, and cleanly merge with our IN_USE span, which
-  // is implicitly committed.  If the adjacents spans are on the "returned"
-  // (decommitted) list, then we must get both spans into the same state before
-  // or after we coalesce them.  The current code always decomits. This is
-  // achieved by blindly decommitting the entire coalesced region, which  may
-  // include any combination of committed and decommitted spans, at the end of
-  // the method.
-
   // TODO(jar): "Always decommit" causes some extra calls to commit when we are
   // called in GrowHeap() during an allocation :-/.  We need to eval the cost of
   // that oscillation, and possibly do something to reduce it.
@@ -356,65 +356,60 @@
   // TODO(jar): We need a better strategy for deciding to commit, or decommit,
   // based on memory usage and free heap sizes.
 
-  uint64_t temp_committed = 0;
-
   const PageID p = span->start;
   const Length n = span->length;
-  Span* prev = GetDescriptor(p-1);
-  if (prev != NULL && MayMergeSpans(span, prev)) {
+
+  if (aggressive_decommit_ && span->location == Span::ON_NORMAL_FREELIST) {
+    if (DecommitSpan(span)) {
+      span->location = Span::ON_RETURNED_FREELIST;
+    }
+  }
+
+  Span* prev = CheckAndHandlePreMerge(span, GetDescriptor(p-1));
+  if (prev != NULL) {
     // Merge preceding span into this span
     ASSERT(prev->start + prev->length == p);
     const Length len = prev->length;
-    if (aggressive_decommit_ && prev->location == Span::ON_RETURNED_FREELIST) {
-      // We're about to put the merge span into the returned freelist and call
-      // DecommitSpan() on it, which will mark the entire span including this
-      // one as released and decrease stats_.committed_bytes by the size of the
-      // merged span.  To make the math work out we temporarily increase the
-      // stats_.committed_bytes amount.
-      temp_committed = prev->length << kPageShift;
-    }
-    RemoveFromFreeList(prev);
     DeleteSpan(prev);
     span->start -= len;
     span->length += len;
     pagemap_.set(span->start, span);
-    Event(span, 'L', len);
   }
-  Span* next = GetDescriptor(p+n);
-  if (next != NULL && MayMergeSpans(span, next)) {
+  Span* next = CheckAndHandlePreMerge(span, GetDescriptor(p+n));
+  if (next != NULL) {
     // Merge next span into this span
     ASSERT(next->start == p+n);
     const Length len = next->length;
-    if (aggressive_decommit_ && next->location == Span::ON_RETURNED_FREELIST) {
-      // See the comment below 'if (prev->location ...' for explanation.
-      temp_committed += next->length << kPageShift;
-    }
-    RemoveFromFreeList(next);
     DeleteSpan(next);
     span->length += len;
     pagemap_.set(span->start + span->length - 1, span);
-    Event(span, 'R', len);
   }
 
-  if (aggressive_decommit_) {
-    if (DecommitSpan(span)) {
-      span->location = Span::ON_RETURNED_FREELIST;
-      stats_.committed_bytes += temp_committed;
-    } else {
-      ASSERT(temp_committed == 0);
-    }
-  }
   PrependToFreeList(span);
 }
 
 void PageHeap::PrependToFreeList(Span* span) {
   ASSERT(span->location != Span::IN_USE);
-  SpanList* list = (span->length < kMaxPages) ? &free_[span->length] : &large_;
-  if (span->location == Span::ON_NORMAL_FREELIST) {
+  if (span->location == Span::ON_NORMAL_FREELIST)
     stats_.free_bytes += (span->length << kPageShift);
+  else
+    stats_.unmapped_bytes += (span->length << kPageShift);
+
+  if (span->length > kMaxPages) {
+    SpanSet *set = &large_normal_;
+    if (span->location == Span::ON_RETURNED_FREELIST)
+      set = &large_returned_;
+    std::pair<SpanSet::iterator, bool> p =
+        set->insert(SpanPtrWithLength(span));
+    ASSERT(p.second); // We never have duplicates since span->start is unique.
+    span->SetSpanSetIterator(p.first);
+    return;
+  }
+
+  SpanList* list = &free_[span->length - 1];
+  if (span->location == Span::ON_NORMAL_FREELIST) {
     DLL_Prepend(&list->normal, span);
   } else {
-    stats_.unmapped_bytes += (span->length << kPageShift);
     DLL_Prepend(&list->returned, span);
   }
 }
@@ -426,7 +421,17 @@
   } else {
     stats_.unmapped_bytes -= (span->length << kPageShift);
   }
-  DLL_Remove(span);
+  if (span->length > kMaxPages) {
+    SpanSet *set = &large_normal_;
+    if (span->location == Span::ON_RETURNED_FREELIST)
+      set = &large_returned_;
+    SpanSet::iterator iter = span->ExtractSpanSetIterator();
+    ASSERT(iter->span == span);
+    ASSERT(set->find(SpanPtrWithLength(span)) == iter);
+    set->erase(iter);
+  } else {
+    DLL_Remove(span);
+  }
 }
 
 void PageHeap::IncrementalScavenge(Length n) {
@@ -441,6 +446,8 @@
     return;
   }
 
+  ++stats_.scavenge_count;
+
   Length released_pages = ReleaseAtLeastNPages(1);
 
   if (released_pages == 0) {
@@ -460,8 +467,7 @@
   }
 }
 
-Length PageHeap::ReleaseLastNormalSpan(SpanList* slist) {
-  Span* s = slist->normal.prev;
+Length PageHeap::ReleaseSpan(Span* s) {
   ASSERT(s->location == Span::ON_NORMAL_FREELIST);
 
   if (DecommitSpan(s)) {
@@ -478,21 +484,35 @@
 Length PageHeap::ReleaseAtLeastNPages(Length num_pages) {
   Length released_pages = 0;
 
-  // Round robin through the lists of free spans, releasing the last
-  // span in each list.  Stop after releasing at least num_pages
+  // Round robin through the lists of free spans, releasing a
+  // span from each list.  Stop after releasing at least num_pages
   // or when there is nothing more to release.
   while (released_pages < num_pages && stats_.free_bytes > 0) {
     for (int i = 0; i < kMaxPages+1 && released_pages < num_pages;
          i++, release_index_++) {
+      Span *s;
       if (release_index_ > kMaxPages) release_index_ = 0;
-      SpanList* slist = (release_index_ == kMaxPages) ?
-          &large_ : &free_[release_index_];
-      if (!DLL_IsEmpty(&slist->normal)) {
-        Length released_len = ReleaseLastNormalSpan(slist);
-        // Some systems do not support release
-        if (released_len == 0) return released_pages;
-        released_pages += released_len;
+
+      if (release_index_ == kMaxPages) {
+        if (large_normal_.empty()) {
+          continue;
+        }
+        s = (large_normal_.begin())->span;
+      } else {
+        SpanList* slist = &free_[release_index_];
+        if (DLL_IsEmpty(&slist->normal)) {
+          continue;
+        }
+        s = slist->normal.prev;
       }
+      // TODO(todd) if the remaining number of pages to release
+      // is significantly smaller than s->length, and s is on the
+      // large freelist, should we carve s instead of releasing
+      // the whole thing?
+      Length released_len = ReleaseSpan(s);
+      // Some systems do not support release
+      if (released_len == 0) return released_pages;
+      released_pages += released_len;
     }
   }
   return released_pages;
@@ -522,12 +542,11 @@
   return takenPages + n <= limit;
 }
 
-void PageHeap::RegisterSizeClass(Span* span, size_t sc) {
+void PageHeap::RegisterSizeClass(Span* span, uint32 sc) {
   // Associate span object with all interior pages as well
   ASSERT(span->location == Span::IN_USE);
   ASSERT(GetDescriptor(span->start) == span);
   ASSERT(GetDescriptor(span->start+span->length-1) == span);
-  Event(span, 'C', sc);
   span->sizeclass = sc;
   for (Length i = 1; i < span->length-1; i++) {
     pagemap_.set(span->start+i, span);
@@ -535,9 +554,9 @@
 }
 
 void PageHeap::GetSmallSpanStats(SmallSpanStats* result) {
-  for (int s = 0; s < kMaxPages; s++) {
-    result->normal_length[s] = DLL_Length(&free_[s].normal);
-    result->returned_length[s] = DLL_Length(&free_[s].returned);
+  for (int i = 0; i < kMaxPages; i++) {
+    result->normal_length[i] = DLL_Length(&free_[i].normal);
+    result->returned_length[i] = DLL_Length(&free_[i].returned);
   }
 }
 
@@ -545,12 +564,12 @@
   result->spans = 0;
   result->normal_pages = 0;
   result->returned_pages = 0;
-  for (Span* s = large_.normal.next; s != &large_.normal; s = s->next) {
-    result->normal_pages += s->length;;
+  for (SpanSet::iterator it = large_normal_.begin(); it != large_normal_.end(); ++it) {
+    result->normal_pages += it->length;
     result->spans++;
   }
-  for (Span* s = large_.returned.next; s != &large_.returned; s = s->next) {
-    result->returned_pages += s->length;
+  for (SpanSet::iterator it = large_returned_.begin(); it != large_returned_.end(); ++it) {
+    result->returned_pages += it->length;
     result->spans++;
   }
 }
@@ -616,9 +635,16 @@
   ask = actual_size >> kPageShift;
   RecordGrowth(ask << kPageShift);
 
+  ++stats_.reserve_count;
+  ++stats_.commit_count;
+
   uint64_t old_system_bytes = stats_.system_bytes;
   stats_.system_bytes += (ask << kPageShift);
   stats_.committed_bytes += (ask << kPageShift);
+
+  stats_.total_commit_bytes += (ask << kPageShift);
+  stats_.total_reserve_bytes += (ask << kPageShift);
+
   const PageID p = reinterpret_cast<uintptr_t>(ptr) >> kPageShift;
   ASSERT(p > 0);
 
@@ -651,18 +677,16 @@
 }
 
 bool PageHeap::Check() {
-  ASSERT(free_[0].normal.next == &free_[0].normal);
-  ASSERT(free_[0].returned.next == &free_[0].returned);
   return true;
 }
 
 bool PageHeap::CheckExpensive() {
   bool result = Check();
-  CheckList(&large_.normal, kMaxPages, 1000000000, Span::ON_NORMAL_FREELIST);
-  CheckList(&large_.returned, kMaxPages, 1000000000, Span::ON_RETURNED_FREELIST);
-  for (Length s = 1; s < kMaxPages; s++) {
-    CheckList(&free_[s].normal, s, s, Span::ON_NORMAL_FREELIST);
-    CheckList(&free_[s].returned, s, s, Span::ON_RETURNED_FREELIST);
+  CheckSet(&large_normal_, kMaxPages + 1, Span::ON_NORMAL_FREELIST);
+  CheckSet(&large_returned_, kMaxPages + 1, Span::ON_RETURNED_FREELIST);
+  for (int s = 1; s <= kMaxPages; s++) {
+    CheckList(&free_[s - 1].normal, s, s, Span::ON_NORMAL_FREELIST);
+    CheckList(&free_[s - 1].returned, s, s, Span::ON_RETURNED_FREELIST);
   }
   return result;
 }
@@ -679,4 +703,16 @@
   return true;
 }
 
+bool PageHeap::CheckSet(SpanSet* spanset, Length min_pages, int freelist) {
+  for (SpanSet::iterator it = spanset->begin(); it != spanset->end(); ++it) {
+    Span* s = it->span;
+    CHECK_CONDITION(s->length == it->length);
+    CHECK_CONDITION(s->location == freelist);  // NORMAL or RETURNED
+    CHECK_CONDITION(s->length >= min_pages);
+    CHECK_CONDITION(GetDescriptor(s->start) == s);
+    CHECK_CONDITION(GetDescriptor(s->start+s->length-1) == s);
+  }
+  return true;
+}
+
 }  // namespace tcmalloc
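Review note on the AllocLarge change in src/page_heap.cc: the rewritten search replaces the two linear list walks with one upper_bound() lookup per ordered set. The sketch below illustrates that idea with toy types. ToySpan, ByLengthThenStart, ToySpanSet and BestFit are hypothetical names, and the length-then-start ordering is an assumption, since the real SpanSet comparator is defined outside this part of the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Toy stand-in for a free span (hypothetical, not the real Span).
struct ToySpan {
  uintptr_t start;   // first page id
  size_t length;     // number of pages
};

// Assumed ordering: shorter spans first, ties broken by lower start.
struct ByLengthThenStart {
  bool operator()(const ToySpan& a, const ToySpan& b) const {
    if (a.length != b.length) return a.length < b.length;
    return a.start < b.start;
  }
};

typedef std::set<ToySpan, ByLengthThenStart> ToySpanSet;

// Returns the smallest span with length >= n (lowest start among
// equal lengths), or nullptr if no span is large enough.
inline const ToySpan* BestFit(const ToySpanSet& spans, size_t n) {
  assert(n > 0);
  // A bound with start == 0 sorts before every real span of length n,
  // provided no real span starts at page 0.
  ToySpan bound = {0, n};
  ToySpanSet::const_iterator it = spans.upper_bound(bound);
  return it == spans.end() ? nullptr : &*it;
}
```

Because the bound uses start 0, upper_bound() lands on the first real entry of length n if one exists, and otherwise on the next larger length. This is the same address-ordered best-fit result as the removed loops, found in logarithmic rather than linear time.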

diff --git a/src/page_heap.h b/src/page_heap.h
index 18abed1..bf50394 100644
--- a/src/page_heap.h
+++ b/src/page_heap.h

@@ -83,14 +83,24 @@
 template <int BITS> class MapSelector {
  public:
   typedef TCMalloc_PageMap3<BITS-kPageShift> Type;
-  typedef PackedCache<BITS-kPageShift, uint64_t> CacheType;
 };
 
+#ifndef TCMALLOC_SMALL_BUT_SLOW
+// x86-64 and arm64 use 48 bits of address space, so a two-level
+// map suffices. Since the initial RAM consumption of that mode
+// is a bit on the higher side, we opt out of it in
+// TCMALLOC_SMALL_BUT_SLOW mode.
+template <> class MapSelector<48> {
+ public:
+  typedef TCMalloc_PageMap2<48-kPageShift> Type;
+};
+
+#endif // TCMALLOC_SMALL_BUT_SLOW
+
 // A two-level map for 32-bit machines
 template <> class MapSelector<32> {
  public:
   typedef TCMalloc_PageMap2<32-kPageShift> Type;
-  typedef PackedCache<32-kPageShift, uint16_t> CacheType;
 };
 
 // -------------------------------------------------------------------------
@@ -119,7 +129,7 @@
   // specified size-class.
   // REQUIRES: span was returned by an earlier call to New()
   //           and has not yet been deleted.
-  void RegisterSizeClass(Span* span, size_t sc);
+  void RegisterSizeClass(Span* span, uint32 sc);
 
   // Split an allocated span into two spans: one of length "n" pages
   // followed by another span of length "span->length - n" pages.
@@ -133,7 +143,8 @@
 
   // Return the descriptor for the specified page.  Returns NULL if
   // this PageID was not allocated previously.
-  inline Span* GetDescriptor(PageID p) const {
+  inline ATTRIBUTE_ALWAYS_INLINE
+  Span* GetDescriptor(PageID p) const {
     return reinterpret_cast<Span*>(pagemap_.get(p));
   }
 
@@ -143,18 +154,32 @@
 
   // Page heap statistics
   struct Stats {
-    Stats() : system_bytes(0), free_bytes(0), unmapped_bytes(0), committed_bytes(0) {}
+    Stats() : system_bytes(0), free_bytes(0), unmapped_bytes(0), committed_bytes(0),
+        scavenge_count(0), commit_count(0), total_commit_bytes(0),
+        decommit_count(0), total_decommit_bytes(0),
+        reserve_count(0), total_reserve_bytes(0) {}
     uint64_t system_bytes;    // Total bytes allocated from system
     uint64_t free_bytes;      // Total bytes on normal freelists
     uint64_t unmapped_bytes;  // Total bytes on returned freelists
     uint64_t committed_bytes;  // Bytes committed, always <= system_bytes_.
 
+    uint64_t scavenge_count;   // Number of times the scavenger flushed pages
+
+    uint64_t commit_count;          // Number of virtual memory commits
+    uint64_t total_commit_bytes;    // Bytes committed in lifetime of process
+    uint64_t decommit_count;        // Number of virtual memory decommits
+    uint64_t total_decommit_bytes;  // Bytes decommitted in lifetime of process
+
+    uint64_t reserve_count;         // Number of virtual memory reserves
+    uint64_t total_reserve_bytes;   // Bytes reserved in lifetime of process
   };
   inline Stats stats() const { return stats_; }
 
   struct SmallSpanStats {
     // For each free list of small spans, the length (in spans) of the
     // normal and returned free lists for that size.
+    //
+    // NOTE: index 'i' holds the number of spans of length 'i + 1'.
     int64 normal_length[kMaxPages];
     int64 returned_length[kMaxPages];
   };
@@ -173,6 +198,7 @@
   bool CheckExpensive();
   bool CheckList(Span* list, Length min_pages, Length max_pages,
                  int freelist);  // ON_NORMAL_FREELIST or ON_RETURNED_FREELIST
+  bool CheckSet(SpanSet *s, Length min_pages, int freelist);
 
   // Try to release at least num_pages for reuse by the OS.  Returns
   // the actual number of pages released, which may be less than
@@ -182,15 +208,22 @@
   // smaller released and unreleased ranges.
   Length ReleaseAtLeastNPages(Length num_pages);
 
-  // Return 0 if we have no information, or else the correct sizeclass for p.
   // Reads and writes to pagemap_cache_ do not require locking.
-  // The entries are 64 bits on 64-bit hardware and 16 bits on
-  // 32-bit hardware, and we don't mind raciness as long as each read of
-  // an entry yields a valid entry, not a partially updated entry.
-  size_t GetSizeClassIfCached(PageID p) const {
-    return pagemap_cache_.GetOrDefault(p, 0);
+  bool TryGetSizeClass(PageID p, uint32* out) const {
+    return pagemap_cache_.TryGet(p, out);
   }
-  void CacheSizeClass(PageID p, size_t cl) const { pagemap_cache_.Put(p, cl); }
+  void SetCachedSizeClass(PageID p, uint32 cl) {
+    ASSERT(cl != 0);
+    pagemap_cache_.Put(p, cl);
+  }
+  void InvalidateCachedSizeClass(PageID p) { pagemap_cache_.Invalidate(p); }
+  uint32 GetSizeClassOrZero(PageID p) const {
+    uint32 cached_value;
+    if (!TryGetSizeClass(p, &cached_value)) {
+      cached_value = 0;
+    }
+    return cached_value;
+  }
 
   bool GetAggressiveDecommit(void) {return aggressive_decommit_;}
   void SetAggressiveDecommit(bool aggressive_decommit) {
@@ -222,9 +255,9 @@
 
   // Pick the appropriate map and cache types based on pointer size
   typedef MapSelector<kAddressBits>::Type PageMap;
-  typedef MapSelector<kAddressBits>::CacheType PageMapCache;
-  PageMap pagemap_;
+  typedef PackedCache<kAddressBits - kPageShift> PageMapCache;
   mutable PageMapCache pagemap_cache_;
+  PageMap pagemap_;
 
   // We segregate spans of a given size into two circular linked
   // lists: one for normal spans, and one for spans whose memory
@@ -234,10 +267,16 @@
     Span        returned;
   };
 
-  // List of free spans of length >= kMaxPages
-  SpanList large_;
+  // Sets of spans with length > kMaxPages.
+  //
+  // Rather than using a linked list, we use sets here for efficient
+  // best-fit search.
+  SpanSet large_normal_;
+  SpanSet large_returned_;
 
   // Array mapping from span length to a doubly linked list of free spans
+  //
+  // NOTE: index 'i' stores spans of length 'i + 1'.
   SpanList free_[kMaxPages];
 
   // Statistics on system, free, and unmapped bytes
@@ -287,16 +326,19 @@
   // IncrementalScavenge(n) is called whenever n pages are freed.
   void IncrementalScavenge(Length n);
 
-  // Release the last span on the normal portion of this list.
-  // Return the length of that span or zero if release failed.
-  Length ReleaseLastNormalSpan(SpanList* slist);
+  // Attempts to decommit 's' and move it to the returned freelist.
+  //
+  // Returns the length of the Span or zero if release failed.
+  //
+  // REQUIRES: 's' must be on the NORMAL freelist.
+  Length ReleaseSpan(Span *s);
 
   // Checks if we are allowed to take more memory from the system.
   // If limit is reached and allowRelease is true, tries to release
   // some unused spans.
   bool EnsureLimit(Length n, bool allowRelease = true);
 
-  bool MayMergeSpans(Span *span, Span *other);
+  Span* CheckAndHandlePreMerge(Span *span, Span *other);
 
   // Number of pages to deallocate before doing more scavenging
   int64_t scavenge_counter_;
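Review note on the free_ indexing change in src/page_heap.h: after this patch, free_[i] holds spans of length i + 1, and only spans strictly longer than kMaxPages go to the large sets. The sketch below restates that mapping as a single helper. kToyMaxPages and FreeListIndex are hypothetical names, and the value 128 is an assumption chosen for illustration; the real kMaxPages is defined elsewhere.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for kMaxPages (value assumed for illustration).
const size_t kToyMaxPages = 128;

// Returns the index into the small free-list array for a span of the
// given length, or -1 if the span belongs in the large span sets.
inline int FreeListIndex(size_t length) {
  assert(length >= 1);  // zero-length spans do not exist
  if (length > kToyMaxPages) return -1;
  return static_cast<int>(length - 1);
}
```

With this layout, slot 0 is used rather than wasted, and a span of exactly kMaxPages pages stays in the array instead of the large container. That is why the loops in the patch run to s <= kMaxPages and index with s - 1.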

diff --git a/src/page_heap_allocator.h b/src/page_heap_allocator.h
index 892d1c1..3fecabd 100644
--- a/src/page_heap_allocator.h
+++ b/src/page_heap_allocator.h

@@ -109,6 +109,71 @@
   int inuse_;
 };
 
+// STL-compatible allocator which forwards allocations to a PageHeapAllocator.
+//
+// Like PageHeapAllocator, this requires external synchronization. To prevent multiple
+// separate STLPageHeapAllocator<T> instances from sharing the same underlying PageHeapAllocator<T>,
+// the |LockingTag| template argument should be used. Template instantiations with
+// different locking tags can safely be used concurrently.
+template <typename T, class LockingTag>
+class STLPageHeapAllocator {
+ public:
+  typedef size_t     size_type;
+  typedef ptrdiff_t  difference_type;
+  typedef T*         pointer;
+  typedef const T*   const_pointer;
+  typedef T&         reference;
+  typedef const T&   const_reference;
+  typedef T          value_type;
+
+  template <class T1> struct rebind {
+    typedef STLPageHeapAllocator<T1, LockingTag> other;
+  };
+
+  STLPageHeapAllocator() { }
+  STLPageHeapAllocator(const STLPageHeapAllocator&) { }
+  template <class T1> STLPageHeapAllocator(const STLPageHeapAllocator<T1, LockingTag>&) { }
+  ~STLPageHeapAllocator() { }
+
+  pointer address(reference x) const { return &x; }
+  const_pointer address(const_reference x) const { return &x; }
+
+  size_type max_size() const { return size_t(-1) / sizeof(T); }
+
+  void construct(pointer p, const T& val) { ::new(p) T(val); }
+  void construct(pointer p) { ::new(p) T(); }
+  void destroy(pointer p) { p->~T(); }
+
+  // There's no state, so these allocators are always equal
+  bool operator==(const STLPageHeapAllocator&) const { return true; }
+  bool operator!=(const STLPageHeapAllocator&) const { return false; }
+
+  pointer allocate(size_type n, const void* = 0) {
+    if (!underlying_.initialized) {
+      underlying_.allocator.Init();
+      underlying_.initialized = true;
+    }
+
+    CHECK_CONDITION(n == 1);
+    return underlying_.allocator.New();
+  }
+  void deallocate(pointer p, size_type n) {
+    CHECK_CONDITION(n == 1);
+    underlying_.allocator.Delete(p);
+  }
+
+ private:
+  struct Storage {
+    explicit Storage(base::LinkerInitialized x) {}
+    PageHeapAllocator<T> allocator;
+    bool initialized;
+  };
+  static Storage underlying_;
+};
+
+template<typename T, class LockingTag>
+typename STLPageHeapAllocator<T, LockingTag>::Storage STLPageHeapAllocator<T, LockingTag>::underlying_(base::LINKER_INITIALIZED);
+
 }  // namespace tcmalloc
 
 #endif  // TCMALLOC_PAGE_HEAP_ALLOCATOR_H_
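Review note on STLPageHeapAllocator in src/page_heap_allocator.h: the allocator only accepts n == 1, which is enough for node-based containers such as std::set because they allocate one node at a time. The sketch below shows that pattern in isolation. OneAtATimeAllocator, LiveNodes and IntSet are hypothetical names, and std::malloc/std::free stand in for PageHeapAllocator, which is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <new>
#include <set>

// Counts outstanding allocations across all instantiations (demo only).
inline long& LiveNodes() {
  static long count = 0;
  return count;
}

// Minimal stateless allocator that rejects array allocations.
template <typename T>
struct OneAtATimeAllocator {
  typedef T value_type;

  OneAtATimeAllocator() {}
  template <typename U>
  OneAtATimeAllocator(const OneAtATimeAllocator<U>&) {}

  T* allocate(std::size_t n) {
    assert(n == 1);  // node containers request exactly one object
    void* p = std::malloc(sizeof(T));
    if (p == nullptr) throw std::bad_alloc();
    ++LiveNodes();
    return static_cast<T*>(p);
  }
  void deallocate(T* p, std::size_t n) {
    assert(n == 1);
    --LiveNodes();
    std::free(p);
  }
};

// No per-instance state, so all instances compare equal.
template <typename T, typename U>
bool operator==(const OneAtATimeAllocator<T>&, const OneAtATimeAllocator<U>&) {
  return true;
}
template <typename T, typename U>
bool operator!=(const OneAtATimeAllocator<T>&, const OneAtATimeAllocator<U>&) {
  return false;
}

typedef std::set<int, std::less<int>, OneAtATimeAllocator<int> > IntSet;
```

The container rebinds the allocator to its internal node type, which is why the patch supplies a rebind member and a converting constructor. A contiguous container such as std::vector would trip the n == 1 check as soon as it grew past one element.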

diff --git a/src/pagemap.h b/src/pagemap.h
index dd94423..dfa336c 100644
--- a/src/pagemap.h
+++ b/src/pagemap.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2005, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -89,6 +89,7 @@
 
   // Return the current value for KEY.  Returns NULL if not yet set,
   // or if k is out of range.
+  ATTRIBUTE_ALWAYS_INLINE
   void* get(Number k) const {
     if ((k >> BITS) > 0) {
       return NULL;
@@ -119,19 +120,18 @@
 template <int BITS>
 class TCMalloc_PageMap2 {
  private:
-  // Put 32 entries in the root and (2^BITS)/32 entries in each leaf.
-  static const int ROOT_BITS = 5;
-  static const int ROOT_LENGTH = 1 << ROOT_BITS;
-
-  static const int LEAF_BITS = BITS - ROOT_BITS;
+  static const int LEAF_BITS = (BITS + 1) / 2;
   static const int LEAF_LENGTH = 1 << LEAF_BITS;
 
+  static const int ROOT_BITS = BITS - LEAF_BITS;
+  static const int ROOT_LENGTH = 1 << ROOT_BITS;
+
   // Leaf node
   struct Leaf {
     void* values[LEAF_LENGTH];
   };
 
-  Leaf* root_[ROOT_LENGTH];             // Pointers to 32 child nodes
+  Leaf* root_[ROOT_LENGTH];             // Pointers to child nodes
   void* (*allocator_)(size_t);          // Memory allocator
 
  public:
@@ -142,6 +142,7 @@
     memset(root_, 0, sizeof(root_));
   }
 
+  ATTRIBUTE_ALWAYS_INLINE
   void* get(Number k) const {
     const Number i1 = k >> LEAF_BITS;
     const Number i2 = k & (LEAF_LENGTH-1);
@@ -182,11 +183,13 @@
 
   void PreallocateMoreMemory() {
     // Allocate enough to keep track of all possible pages
-    Ensure(0, 1 << BITS);
+    if (BITS < 20) {
+      Ensure(0, Number(1) << BITS);
+    }
   }
 
   void* Next(Number k) const {
-    while (k < (1 << BITS)) {
+    while (k < (Number(1) << BITS)) {
       const Number i1 = k >> LEAF_BITS;
       Leaf* leaf = root_[i1];
       if (leaf != NULL) {
@@ -226,7 +229,7 @@
     void* values[LEAF_LENGTH];
   };
 
-  Node* root_;                          // Root of radix tree
+  Node  root_;                          // Root of radix tree
   void* (*allocator_)(size_t);          // Memory allocator
 
   Node* NewNode() {
@@ -242,18 +245,19 @@
 
   explicit TCMalloc_PageMap3(void* (*allocator)(size_t)) {
     allocator_ = allocator;
-    root_ = NewNode();
+    memset(&root_, 0, sizeof(root_));
   }
 
+  ATTRIBUTE_ALWAYS_INLINE
   void* get(Number k) const {
     const Number i1 = k >> (LEAF_BITS + INTERIOR_BITS);
     const Number i2 = (k >> LEAF_BITS) & (INTERIOR_LENGTH-1);
     const Number i3 = k & (LEAF_LENGTH-1);
     if ((k >> BITS) > 0 ||
-        root_->ptrs[i1] == NULL || root_->ptrs[i1]->ptrs[i2] == NULL) {
+        root_.ptrs[i1] == NULL || root_.ptrs[i1]->ptrs[i2] == NULL) {
       return NULL;
     }
-    return reinterpret_cast<Leaf*>(root_->ptrs[i1]->ptrs[i2])->values[i3];
+    return reinterpret_cast<Leaf*>(root_.ptrs[i1]->ptrs[i2])->values[i3];
   }
 
   void set(Number k, void* v) {
@@ -261,7 +265,7 @@
     const Number i1 = k >> (LEAF_BITS + INTERIOR_BITS);
     const Number i2 = (k >> LEAF_BITS) & (INTERIOR_LENGTH-1);
     const Number i3 = k & (LEAF_LENGTH-1);
-    reinterpret_cast<Leaf*>(root_->ptrs[i1]->ptrs[i2])->values[i3] = v;
+    reinterpret_cast<Leaf*>(root_.ptrs[i1]->ptrs[i2])->values[i3] = v;
   }
 
   bool Ensure(Number start, size_t n) {
@@ -274,18 +278,18 @@
         return false;
 
       // Make 2nd level node if necessary
-      if (root_->ptrs[i1] == NULL) {
+      if (root_.ptrs[i1] == NULL) {
         Node* n = NewNode();
         if (n == NULL) return false;
-        root_->ptrs[i1] = n;
+        root_.ptrs[i1] = n;
       }
 
       // Make leaf node if necessary
-      if (root_->ptrs[i1]->ptrs[i2] == NULL) {
+      if (root_.ptrs[i1]->ptrs[i2] == NULL) {
         Leaf* leaf = reinterpret_cast<Leaf*>((*allocator_)(sizeof(Leaf)));
         if (leaf == NULL) return false;
         memset(leaf, 0, sizeof(*leaf));
-        root_->ptrs[i1]->ptrs[i2] = reinterpret_cast<Node*>(leaf);
+        root_.ptrs[i1]->ptrs[i2] = reinterpret_cast<Node*>(leaf);
       }
 
       // Advance key past whatever is covered by this leaf node
@@ -301,11 +305,11 @@
     while (k < (Number(1) << BITS)) {
       const Number i1 = k >> (LEAF_BITS + INTERIOR_BITS);
       const Number i2 = (k >> LEAF_BITS) & (INTERIOR_LENGTH-1);
-      if (root_->ptrs[i1] == NULL) {
+      if (root_.ptrs[i1] == NULL) {
         // Advance to next top-level entry
         k = (i1 + 1) << (LEAF_BITS + INTERIOR_BITS);
       } else {
-        Leaf* leaf = reinterpret_cast<Leaf*>(root_->ptrs[i1]->ptrs[i2]);
+        Leaf* leaf = reinterpret_cast<Leaf*>(root_.ptrs[i1]->ptrs[i2]);
         if (leaf != NULL) {
           for (Number i3 = (k & (LEAF_LENGTH-1)); i3 < LEAF_LENGTH; i3++) {
             if (leaf->values[i3] != NULL) {
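
Note on the TCMalloc_PageMap3 hunks above: the root node changes from a heap-allocated `Node*` to a `Node` embedded in the map and zeroed in the constructor, so every lookup saves one pointer dereference. A minimal sketch of the same layout follows. It assumes a hypothetical `TinyPageMap3` class that uses `calloc` in place of the allocator callback; the names and sizes are illustrative and are not the gperftools definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical, simplified three-level radix map with an embedded root.
template <int BITS>
class TinyPageMap3 {
 public:
  typedef std::uintptr_t Number;

  // The root lives inside the object, so it only needs zeroing.
  TinyPageMap3() { std::memset(&root_, 0, sizeof(root_)); }
  TinyPageMap3(const TinyPageMap3&) = delete;
  TinyPageMap3& operator=(const TinyPageMap3&) = delete;

  ~TinyPageMap3() {
    for (int i = 0; i < INTERIOR_LENGTH; i++) {
      Node* mid = root_.ptrs[i];
      if (mid == nullptr) continue;
      for (int j = 0; j < INTERIOR_LENGTH; j++) std::free(mid->ptrs[j]);
      std::free(mid);
    }
  }

  // Returns the stored value, or nullptr if the key is out of range or unset.
  void* get(Number k) const {
    if ((k >> BITS) > 0) return nullptr;
    const Number i1 = k >> (LEAF_BITS + INTERIOR_BITS);
    const Number i2 = (k >> LEAF_BITS) & (INTERIOR_LENGTH - 1);
    const Number i3 = k & (LEAF_LENGTH - 1);
    const Node* mid = root_.ptrs[i1];  // root_.ptrs, not root_->ptrs
    if (mid == nullptr || mid->ptrs[i2] == nullptr) return nullptr;
    return reinterpret_cast<const Leaf*>(mid->ptrs[i2])->values[i3];
  }

  // Stores v under k, creating interior and leaf nodes on demand.
  // Returns false if k is out of range or an allocation fails.
  bool set(Number k, void* v) {
    if ((k >> BITS) > 0) return false;
    const Number i1 = k >> (LEAF_BITS + INTERIOR_BITS);
    const Number i2 = (k >> LEAF_BITS) & (INTERIOR_LENGTH - 1);
    const Number i3 = k & (LEAF_LENGTH - 1);
    if (root_.ptrs[i1] == nullptr) {
      Node* n = static_cast<Node*>(std::calloc(1, sizeof(Node)));
      if (n == nullptr) return false;
      root_.ptrs[i1] = n;
    }
    Node* mid = root_.ptrs[i1];
    if (mid->ptrs[i2] == nullptr) {
      Leaf* leaf = static_cast<Leaf*>(std::calloc(1, sizeof(Leaf)));
      if (leaf == nullptr) return false;
      mid->ptrs[i2] = reinterpret_cast<Node*>(leaf);
    }
    reinterpret_cast<Leaf*>(mid->ptrs[i2])->values[i3] = v;
    return true;
  }

 private:
  static const int INTERIOR_BITS = (BITS + 2) / 3;
  static const int INTERIOR_LENGTH = 1 << INTERIOR_BITS;
  static const int LEAF_BITS = BITS - 2 * INTERIOR_BITS;
  static const int LEAF_LENGTH = 1 << LEAF_BITS;

  struct Node { Node* ptrs[INTERIOR_LENGTH]; };
  struct Leaf { void* values[LEAF_LENGTH]; };

  Node root_;  // embedded root of the radix tree
};
```

With `BITS = 12`, for example, a `TinyPageMap3<12>` accepts keys 0 through 4095. Unlike the patched class, this sketch merges the `Ensure` and `set` steps into a single call.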

diff --git a/src/pprof b/src/pprof
index c0c64bc..3a816c6 100755
--- a/src/pprof
+++ b/src/pprof

@@ -148,13 +148,13 @@
 sub usage_string {
   return <<EOF;
 Usage:
-pprof [options] <program> <profiles>
+$0 [options] <program> <profiles>
    <profiles> is a space separated list of profile names.
-pprof [options] <symbolized-profiles>
+$0 [options] <symbolized-profiles>
    <symbolized-profiles> is a list of profile files where each file contains
    the necessary symbol mappings  as well as profile data (likely generated
    with --raw).
-pprof [options] <profile>
+$0 [options] <profile>
    <profile> is a remote form.  Symbols are obtained from host:port$SYMBOL_PAGE
 
    Each name can be:
@@ -165,9 +165,9 @@
                          $GROWTH_PAGE, $CONTENTION_PAGE, /pprof/wall,
                          $CENSUSPROFILE_PAGE, or /pprof/filteredprofile.
    For instance:
-     pprof http://myserver.com:80$HEAP_PAGE
+     $0 http://myserver.com:80$HEAP_PAGE
    If /<service> is omitted, the service defaults to $PROFILE_PAGE (cpu profiling).
-pprof --symbols <program>
+$0 --symbols <program>
    Maps addresses to symbol names.  In this mode, stdin should be a
    list of library mappings, in the same format as is found in the heap-
    and cpu-profile files (this loosely matches that of /proc/self/maps
@@ -204,7 +204,7 @@
    --disasm=<regexp>   Generate disassembly of matching routines
    --symbols           Print demangled symbol names found at given addresses
    --dot               Generate DOT file to stdout
-   --ps                Generate Postcript to stdout
+   --ps                Generate Postscript to stdout
    --pdf               Generate PDF to stdout
    --svg               Generate SVG to stdout
    --gif               Generate GIF to stdout
@@ -252,29 +252,29 @@
 
 Examples:
 
-pprof /bin/ls ls.prof
+$0 /bin/ls ls.prof
                        Enters "interactive" mode
-pprof --text /bin/ls ls.prof
+$0 --text /bin/ls ls.prof
                        Outputs one line per procedure
-pprof --web /bin/ls ls.prof
+$0 --web /bin/ls ls.prof
                        Displays annotated call-graph in web browser
-pprof --gv /bin/ls ls.prof
+$0 --gv /bin/ls ls.prof
                        Displays annotated call-graph via 'gv'
-pprof --gv --focus=Mutex /bin/ls ls.prof
+$0 --gv --focus=Mutex /bin/ls ls.prof
                        Restricts to code paths including a .*Mutex.* entry
-pprof --gv --focus=Mutex --ignore=string /bin/ls ls.prof
+$0 --gv --focus=Mutex --ignore=string /bin/ls ls.prof
                        Code paths including Mutex but not string
-pprof --list=getdir /bin/ls ls.prof
+$0 --list=getdir /bin/ls ls.prof
                        (Per-line) annotated source listing for getdir()
-pprof --disasm=getdir /bin/ls ls.prof
+$0 --disasm=getdir /bin/ls ls.prof
                        (Per-PC) annotated disassembly for getdir()
 
-pprof http://localhost:1234/
+$0 http://localhost:1234/
                        Enters "interactive" mode
-pprof --text localhost:1234
+$0 --text localhost:1234
                        Outputs one line per procedure for localhost:1234
-pprof --raw localhost:1234 > ./local.raw
-pprof --text ./local.raw
+$0 --raw localhost:1234 > ./local.raw
+$0 --text ./local.raw
                        Fetches a remote profile for later analysis and then
                        analyzes it in text mode.
 EOF
@@ -1302,7 +1302,7 @@
     $filename = "&STDOUT";
   }
   open(CG, ">$filename");
-  printf CG ("events: Hits\n\n");
+  print CG ("events: Hits\n\n");
   foreach my $call ( map { $_->[0] }
                      sort { $a->[1] cmp $b ->[1] ||
                             $a->[2] <=> $b->[2] }
@@ -1318,14 +1318,14 @@
     # TODO(csilvers): for better compression, collect all the
     # caller/callee_files and functions first, before printing
     # anything, and only compress those referenced more than once.
-    printf CG CompressedCGName("fl", $caller_file, \%filename_to_index_map);
-    printf CG CompressedCGName("fn", $caller_function, \%fnname_to_index_map);
+    print CG CompressedCGName("fl", $caller_file, \%filename_to_index_map);
+    print CG CompressedCGName("fn", $caller_function, \%fnname_to_index_map);
     if (defined $6) {
-      printf CG CompressedCGName("cfl", $callee_file, \%filename_to_index_map);
-      printf CG CompressedCGName("cfn", $callee_function, \%fnname_to_index_map);
-      printf CG ("calls=$count $callee_line\n");
+      print CG CompressedCGName("cfl", $callee_file, \%filename_to_index_map);
+      print CG CompressedCGName("cfn", $callee_function, \%fnname_to_index_map);
+      print CG ("calls=$count $callee_line\n");
     }
-    printf CG ("$caller_line $count\n\n");
+    print CG ("$caller_line $count\n\n");
   }
 }
 
@@ -3410,7 +3410,7 @@
 # Add a timeout flat to URL_FETCHER.  Returns a new list.
 sub AddFetchTimeout {
   my $timeout = shift;
-  my @fetcher = shift;
+  my @fetcher = @_;
   if (defined($timeout)) {
     if (join(" ", @fetcher) =~ m/\bcurl -s/) {
       push(@fetcher, "--max-time", sprintf("%d", $timeout));
@@ -4511,7 +4511,6 @@
   my $zero_offset = HexExtend("0");
 
   my $buildvar = "";
-  my $priorlib = "";
   foreach my $l (split("\n", $map)) {
     if ($l =~ m/^\s*build=(.*)$/) {
       $buildvar = $1;
@@ -4521,7 +4520,7 @@
     my $finish;
     my $offset;
     my $lib;
-    if ($l =~ /^($h)-($h)\s+..x.\s+($h)\s+\S+:\S+\s+\d+\s+(.+\.(so|dll|dylib|bundle)((\.\d+)+\w*(\.\d+){0,3})?)$/i) {
+    if ($l =~ /^($h)-($h)\s+..x.\s+($h)\s+\S+:\S+\s+\d+\s+(.+\.(so|dll|dylib|bundle|node)((\.\d+)+\w*(\.\d+){0,3})?)$/i) {
       # Full line from /proc/self/maps.  Example:
       #   40000000-40015000 r-xp 00000000 03:01 12845071   /lib/ld-2.3.2.so
       $start = HexExtend($1);
@@ -4568,16 +4567,7 @@
       }
     }
 
-    # If we find multiple executable segments for a single library, merge them
-    # into a single entry that spans the complete address range.
-    if ($lib eq $priorlib) {
-      my $prior = pop(@{$result});
-      $start = @$prior[1];
-      # TODO $offset may be wrong if .text is not in the final segment.
-    }
-
     push(@{$result}, [$lib, $start, $finish, $offset]);
-    $priorlib = $lib;
   }
 
   # Append special entry for additional library (not relocated)
@@ -5161,7 +5151,7 @@
     }
     print STDERR "If you want to investigate this profile further, you can do:\n";
     print STDERR "\n";
-    print STDERR "  pprof \\\n";
+    print STDERR "  $0 \\\n";
     print STDERR "    $main::prog \\\n";
     print STDERR "    $main::collected_profile\n";
     print STDERR "\n";
@@ -5332,7 +5322,7 @@
   my $demangle_flag = "";
   my $cppfilt_flag = "";
   my $to_devnull = ">$dev_null 2>&1";
-  if (system(ShellEscape($nm, "--demangle", "image") . $to_devnull) == 0) {
+  if (system(ShellEscape($nm, "--demangle", $image) . $to_devnull) == 0) {
     # In this mode, we do "nm --demangle <foo>"
     $demangle_flag = "--demangle";
     $cppfilt_flag = "";

diff --git a/src/profile-handler.cc b/src/profile-handler.cc
index 66c9d74..0db17bb 100644
--- a/src/profile-handler.cc
+++ b/src/profile-handler.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2009, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -49,8 +49,11 @@
 #if HAVE_LINUX_SIGEV_THREAD_ID
 // for timer_{create,settime} and associated typedefs & constants
 #include <time.h>
-// for sys_gettid
-#include "base/linux_syscall_support.h"
+// for sigevent
+#include <signal.h>
+// for SYS_gettid
+#include <sys/syscall.h>
+
 // for perftools_pthread_key_create
 #include "maybe_threads.h"
 #endif
@@ -61,6 +64,18 @@
 #include "base/spinlock.h"
 #include "maybe_threads.h"
 
+// Some Linux systems don't have sigev_notify_thread_id defined in
+// signal.h (despite having SIGEV_THREAD_ID defined) and also lack a
+// working linux/signal.h, so let's work around that. Note: we know that,
+// at least on Linux, sigev_notify_thread_id is a macro.
+//
+// See https://sourceware.org/bugzilla/show_bug.cgi?id=27417 and
+// https://bugzilla.kernel.org/show_bug.cgi?id=200081
+//
+#if __linux__ && HAVE_LINUX_SIGEV_THREAD_ID && !defined(sigev_notify_thread_id)
+#define sigev_notify_thread_id _sigev_un._tid
+#endif
+
 using std::list;
 using std::string;
 
@@ -79,39 +94,44 @@
   void* callback_arg;
 };
 
+// Blocks a signal from being delivered to the current thread while the object
+// is alive. Unblocks it upon destruction.
+class ScopedSignalBlocker {
+ public:
+  ScopedSignalBlocker(int signo) {
+    sigemptyset(&sig_set_);
+    sigaddset(&sig_set_, signo);
+    RAW_CHECK(sigprocmask(SIG_BLOCK, &sig_set_, NULL) == 0,
+              "sigprocmask (block)");
+  }
+  ~ScopedSignalBlocker() {
+    RAW_CHECK(sigprocmask(SIG_UNBLOCK, &sig_set_, NULL) == 0,
+              "sigprocmask (unblock)");
+  }
+
+ private:
+  sigset_t sig_set_;
+};
+
 // This class manages profile timers and associated signal handler. This is a
 // a singleton.
 class ProfileHandler {
  public:
-  // Registers the current thread with the profile handler. On systems which
-  // have a separate interval timer for each thread, this function starts the
-  // timer for the current thread.
-  //
-  // The function also attempts to determine whether or not timers are shared by
-  // all threads in the process.  (With LinuxThreads, and with NPTL on some
-  // Linux kernel versions, each thread has separate timers.)
-  //
-  // Prior to determining whether timers are shared, this function will
-  // unconditionally start the timer.  However, if this function determines
-  // that timers are shared, then it will stop the timer if no callbacks are
-  // currently registered.
+  // Registers the current thread with the profile handler.
   void RegisterThread();
 
   // Registers a callback routine to receive profile timer ticks. The returned
   // token is to be used when unregistering this callback and must not be
-  // deleted by the caller. Registration of the first callback enables the
-  // SIGPROF handler (or SIGALRM if using ITIMER_REAL).
+  // deleted by the caller.
   ProfileHandlerToken* RegisterCallback(ProfileHandlerCallback callback,
                                         void* callback_arg);
 
   // Unregisters a previously registered callback. Expects the token returned
-  // by the corresponding RegisterCallback routine. Unregistering the last
-  // callback disables the SIGPROF handler (or SIGALRM if using ITIMER_REAL).
+  // by the corresponding RegisterCallback routine.
   void UnregisterCallback(ProfileHandlerToken* token)
       NO_THREAD_SAFETY_ANALYSIS;
 
-  // Unregisters all the callbacks, stops the timer if shared, disables the
-  // SIGPROF (or SIGALRM) handler and clears the timer_sharing_ state.
+  // Unregisters all the callbacks and stops the timer(s).
   void Reset();
 
   // Gets the current state of profile handler.
@@ -138,13 +158,18 @@
   // Initializes the ProfileHandler singleton via GoogleOnceInit.
   static void Init();
 
-  // The number of SIGPROF (or SIGALRM for ITIMER_REAL) interrupts received.
+  // Timer state as configured previously.
+  bool timer_running_;
+
+  // The number of profiling signal interrupts received.
   int64 interrupts_ GUARDED_BY(signal_lock_);
 
-  // SIGPROF/SIGALRM interrupt frequency, read-only after construction.
+  // Profiling signal interrupt frequency, read-only after construction.
   int32 frequency_;
 
-  // ITIMER_PROF (which uses SIGPROF), or ITIMER_REAL (which uses SIGALRM)
+  // ITIMER_PROF (which uses SIGPROF), or ITIMER_REAL (which uses SIGALRM).
+  // Translated into an equivalent choice of clock if per_thread_timer_enabled_
+  // is true.
   int timer_type_;
 
   // Signal number for timer signal.
@@ -156,6 +181,7 @@
   // Is profiling allowed at all?
   bool allowed_;
 
+  // Must be false if HAVE_LINUX_SIGEV_THREAD_ID is not defined.
   bool per_thread_timer_enabled_;
 
 #ifdef HAVE_LINUX_SIGEV_THREAD_ID
@@ -164,19 +190,6 @@
   pthread_key_t thread_timer_key;
 #endif
 
-  // Whether or not the threading system provides interval timers that are
-  // shared by all threads in a process.
-  enum {
-    // No timer initialization attempted yet.
-    TIMERS_UNTOUCHED,
-    // First thread has registered and set timer.
-    TIMERS_ONE_SET,
-    // Timers are shared by all threads.
-    TIMERS_SHARED,
-    // Timers are separate in each thread.
-    TIMERS_SEPARATE
-  } timer_sharing_ GUARDED_BY(control_lock_);
-
   // This lock serializes the registration of threads and protects the
   // callbacks_ list below.
   // Locking order:
@@ -203,32 +216,16 @@
   typedef CallbackList::iterator CallbackIterator;
   CallbackList callbacks_ GUARDED_BY(signal_lock_);
 
-  // Starts the interval timer.  If the thread library shares timers between
-  // threads, this function starts the shared timer. Otherwise, this will start
-  // the timer in the current thread.
-  void StartTimer() EXCLUSIVE_LOCKS_REQUIRED(control_lock_);
-
-  // Stops the interval timer. If the thread library shares timers between
-  // threads, this fucntion stops the shared timer. Otherwise, this will stop
-  // the timer in the current thread.
-  void StopTimer() EXCLUSIVE_LOCKS_REQUIRED(control_lock_);
-
-  // Returns true if the profile interval timer is enabled in the current
-  // thread.  This actually checks the kernel's interval timer setting.  (It is
-  // used to detect whether timers are shared or separate.)
-  bool IsTimerRunning() EXCLUSIVE_LOCKS_REQUIRED(control_lock_);
-
-  // Sets the timer interrupt signal handler.
-  void EnableHandler() EXCLUSIVE_LOCKS_REQUIRED(control_lock_);
-
-  // Disables (ignores) the timer interrupt signal.
-  void DisableHandler() EXCLUSIVE_LOCKS_REQUIRED(control_lock_);
+  // Starts or stops the interval timer.
+  // Will ignore any requests to enable or disable when
+  // per_thread_timer_enabled_ is true.
+  void UpdateTimer(bool enable) EXCLUSIVE_LOCKS_REQUIRED(signal_lock_);
 
   // Returns true if the handler is not being used by something else.
   // This checks the kernel's signal handler table.
   bool IsSignalHandlerAvailable();
 
-  // SIGPROF/SIGALRM handler. Iterate over and call all the registered callbacks.
+  // Signal handler. Iterates over and calls all the registered callbacks.
   static void SignalHandler(int sig, siginfo_t* sinfo, void* ucontext);
 
   DISALLOW_COPY_AND_ASSIGN(ProfileHandler);
@@ -240,32 +237,25 @@
 const int32 ProfileHandler::kMaxFrequency;
 const int32 ProfileHandler::kDefaultFrequency;
 
-// If we are LD_PRELOAD-ed against a non-pthreads app, then
-// pthread_once won't be defined.  We declare it here, for that
-// case (with weak linkage) which will cause the non-definition to
-// resolve to NULL.  We can then check for NULL or not in Instance.
-extern "C" int pthread_once(pthread_once_t *, void (*)(void))
-    ATTRIBUTE_WEAK;
+// If we are LD_PRELOAD-ed against a non-pthreads app, then these functions
+// won't be defined.  We declare them here, for that case (with weak linkage)
+// which will cause the non-definition to resolve to NULL.  We can then check
+// for NULL or not in Instance.
+extern "C" {
+int pthread_once(pthread_once_t *, void (*)(void)) ATTRIBUTE_WEAK;
+int pthread_kill(pthread_t thread_id, int signo) ATTRIBUTE_WEAK;
 
 #if HAVE_LINUX_SIGEV_THREAD_ID
-
-// We use weak alias to timer_create to avoid runtime dependency on
-// -lrt and in turn -lpthread.
-//
-// At runtime we detect if timer_create is available and if so we
-// can enable linux-sigev-thread mode of profiling
-extern "C" {
-  int timer_create(clockid_t clockid, struct sigevent *evp,
-                            timer_t *timerid)
-    ATTRIBUTE_WEAK;
-  int timer_delete(timer_t timerid)
-    ATTRIBUTE_WEAK;
-  int timer_settime(timer_t timerid, int flags,
-                    const struct itimerspec *value,
-                    struct itimerspec *ovalue)
-    ATTRIBUTE_WEAK;
+int timer_create(clockid_t clockid, struct sigevent* evp,
+                 timer_t* timerid) ATTRIBUTE_WEAK;
+int timer_delete(timer_t timerid) ATTRIBUTE_WEAK;
+int timer_settime(timer_t timerid, int flags, const struct itimerspec* value,
+                  struct itimerspec* ovalue) ATTRIBUTE_WEAK;
+#endif
 }
 
+#if HAVE_LINUX_SIGEV_THREAD_ID
+
 struct timer_id_holder {
   timer_t timerid;
   timer_id_holder(timer_t _timerid) : timerid(_timerid) {}
@@ -297,7 +287,7 @@
   struct itimerspec its;
   memset(&sevp, 0, sizeof(sevp));
   sevp.sigev_notify = SIGEV_THREAD_ID;
-  sevp._sigev_un._tid = sys_gettid();
+  sevp.sigev_notify_thread_id = syscall(SYS_gettid);
   sevp.sigev_signo = signal_number;
   clockid_t clock = CLOCK_THREAD_CPUTIME_ID;
   if (timer_type == ITIMER_REAL) {
@@ -343,11 +333,11 @@
 }
 
 ProfileHandler::ProfileHandler()
-    : interrupts_(0),
+    : timer_running_(false),
+      interrupts_(0),
       callback_count_(0),
       allowed_(true),
-      per_thread_timer_enabled_(false),
-      timer_sharing_(TIMERS_UNTOUCHED) {
+      per_thread_timer_enabled_(false) {
   SpinLockHolder cl(&control_lock_);
 
   timer_type_ = (getenv("CPUPROFILE_REALTIME") ? ITIMER_REAL : ITIMER_PROF);
@@ -376,7 +366,6 @@
 
   if (per_thread || signal_number) {
     if (timer_create && pthread_once) {
-      timer_sharing_ = TIMERS_SEPARATE;
       CreateThreadTimerKey(&thread_timer_key);
       per_thread_timer_enabled_ = true;
       // Override signal number if requested.
@@ -401,10 +390,12 @@
     return;
   }
 
-  // Ignore signals until we decide to turn profiling on.  (Paranoia;
-  // should already be ignored.)
-  DisableHandler();
-
+  // Install the signal handler.
+  struct sigaction sa;
+  sa.sa_sigaction = SignalHandler;
+  sa.sa_flags = SA_RESTART | SA_SIGINFO;
+  sigemptyset(&sa.sa_mask);
+  RAW_CHECK(sigaction(signal_number_, &sa, NULL) == 0, "sigprof (enable)");
 }
 
 ProfileHandler::~ProfileHandler() {
@@ -423,47 +414,17 @@
     return;
   }
 
-  // We try to detect whether timers are being shared by setting a
-  // timer in the first call to this function, then checking whether
-  // it's set in the second call.
-  //
-  // Note that this detection method requires that the first two calls
-  // to RegisterThread must be made from different threads.  (Subsequent
-  // calls will see timer_sharing_ set to either TIMERS_SEPARATE or
-  // TIMERS_SHARED, and won't try to detect the timer sharing type.)
-  //
-  // Also note that if timer settings were inherited across new thread
-  // creation but *not* shared, this approach wouldn't work.  That's
-  // not an issue for any Linux threading implementation, and should
-  // not be a problem for a POSIX-compliant threads implementation.
-  switch (timer_sharing_) {
-    case TIMERS_UNTOUCHED:
-      StartTimer();
-      timer_sharing_ = TIMERS_ONE_SET;
-      break;
-    case TIMERS_ONE_SET:
-      // If the timer is running, that means that the main thread's
-      // timer setup is seen in this (second) thread -- and therefore
-      // that timers are shared.
-      if (IsTimerRunning()) {
-        timer_sharing_ = TIMERS_SHARED;
-        // If callback is already registered, we have to keep the timer
-        // running.  If not, we disable the timer here.
-        if (callback_count_ == 0) {
-          StopTimer();
-        }
-      } else {
-        timer_sharing_ = TIMERS_SEPARATE;
-        StartTimer();
-      }
-      break;
-    case TIMERS_SHARED:
-      // Nothing needed.
-      break;
-    case TIMERS_SEPARATE:
-      StartTimer();
-      break;
+  // Record the thread identifier and start the timer if profiling is on.
+  ScopedSignalBlocker block(signal_number_);
+  SpinLockHolder sl(&signal_lock_);
+#if HAVE_LINUX_SIGEV_THREAD_ID
+  if (per_thread_timer_enabled_) {
+    StartLinuxThreadTimer(timer_type_, signal_number_, frequency_,
+                          thread_timer_key);
+    return;
   }
+#endif
+  UpdateTimer(callback_count_ > 0);
 }
 
 ProfileHandlerToken* ProfileHandler::RegisterCallback(
@@ -472,17 +433,13 @@
   ProfileHandlerToken* token = new ProfileHandlerToken(callback, callback_arg);
 
   SpinLockHolder cl(&control_lock_);
-  DisableHandler();
   {
+    ScopedSignalBlocker block(signal_number_);
     SpinLockHolder sl(&signal_lock_);
     callbacks_.push_back(token);
+    ++callback_count_;
+    UpdateTimer(true);
   }
-  // Start the timer if timer is shared and this is a first callback.
-  if ((callback_count_ == 0) && (timer_sharing_ == TIMERS_SHARED)) {
-    StartTimer();
-  }
-  ++callback_count_;
-  EnableHandler();
   return token;
 }
 
@@ -492,17 +449,14 @@
        ++it) {
     if ((*it) == token) {
       RAW_CHECK(callback_count_ > 0, "Invalid callback count");
-      DisableHandler();
       {
+        ScopedSignalBlocker block(signal_number_);
         SpinLockHolder sl(&signal_lock_);
         delete *it;
         callbacks_.erase(it);
-      }
-      --callback_count_;
-      if (callback_count_ > 0) {
-        EnableHandler();
-      } else if (timer_sharing_ == TIMERS_SHARED) {
-        StopTimer();
+        --callback_count_;
+        if (callback_count_ == 0)
+          UpdateTimer(false);
       }
       return;
     }
@@ -513,8 +467,8 @@
 
 void ProfileHandler::Reset() {
   SpinLockHolder cl(&control_lock_);
-  DisableHandler();
   {
+    ScopedSignalBlocker block(signal_number_);
     SpinLockHolder sl(&signal_lock_);
     CallbackIterator it = callbacks_.begin();
     while (it != callbacks_.end()) {
@@ -523,96 +477,44 @@
       delete *tmp;
       callbacks_.erase(tmp);
     }
+    callback_count_ = 0;
+    UpdateTimer(false);
   }
-  callback_count_ = 0;
-  if (timer_sharing_ == TIMERS_SHARED) {
-    StopTimer();
-  }
-  timer_sharing_ = TIMERS_UNTOUCHED;
 }
 
 void ProfileHandler::GetState(ProfileHandlerState* state) {
   SpinLockHolder cl(&control_lock_);
-  DisableHandler();
   {
+    ScopedSignalBlocker block(signal_number_);
     SpinLockHolder sl(&signal_lock_);  // Protects interrupts_.
     state->interrupts = interrupts_;
   }
-  if (callback_count_ > 0) {
-    EnableHandler();
-  }
   state->frequency = frequency_;
   state->callback_count = callback_count_;
   state->allowed = allowed_;
 }
 
-void ProfileHandler::StartTimer() {
-  if (!allowed_) {
+void ProfileHandler::UpdateTimer(bool enable) {
+  if (per_thread_timer_enabled_) {
+    // Ignore any attempts to disable it because that's not supported, and it's
+    // always enabled so enabling is always a NOP.
     return;
   }
 
-#if HAVE_LINUX_SIGEV_THREAD_ID
-  if (per_thread_timer_enabled_) {
-    StartLinuxThreadTimer(timer_type_, signal_number_, frequency_, thread_timer_key);
+  if (enable == timer_running_) {
     return;
   }
-#endif
+  timer_running_ = enable;
 
   struct itimerval timer;
-  timer.it_interval.tv_sec = 0;
-  timer.it_interval.tv_usec = 1000000 / frequency_;
+  static const int kMillion = 1000000;
+  int interval_usec = enable ? kMillion / frequency_ : 0;
+  timer.it_interval.tv_sec = interval_usec / kMillion;
+  timer.it_interval.tv_usec = interval_usec % kMillion;
   timer.it_value = timer.it_interval;
   setitimer(timer_type_, &timer, 0);
 }
 
-void ProfileHandler::StopTimer() {
-  if (!allowed_) {
-    return;
-  }
-  if (per_thread_timer_enabled_) {
-    RAW_LOG(FATAL, "StopTimer cannot be called in linux-per-thread-timers mode");
-  }
-
-  struct itimerval timer;
-  memset(&timer, 0, sizeof timer);
-  setitimer(timer_type_, &timer, 0);
-}
-
-bool ProfileHandler::IsTimerRunning() {
-  if (!allowed_) {
-    return false;
-  }
-  if (per_thread_timer_enabled_) {
-    return false;
-  }
-  struct itimerval current_timer;
-  RAW_CHECK(0 == getitimer(timer_type_, &current_timer), "getitimer");
-  return (current_timer.it_value.tv_sec != 0 ||
-          current_timer.it_value.tv_usec != 0);
-}
-
-void ProfileHandler::EnableHandler() {
-  if (!allowed_) {
-    return;
-  }
-  struct sigaction sa;
-  sa.sa_sigaction = SignalHandler;
-  sa.sa_flags = SA_RESTART | SA_SIGINFO;
-  sigemptyset(&sa.sa_mask);
-  RAW_CHECK(sigaction(signal_number_, &sa, NULL) == 0, "sigprof (enable)");
-}
-
-void ProfileHandler::DisableHandler() {
-  if (!allowed_) {
-    return;
-  }
-  struct sigaction sa;
-  sa.sa_handler = SIG_IGN;
-  sa.sa_flags = SA_RESTART;
-  sigemptyset(&sa.sa_mask);
-  RAW_CHECK(sigaction(signal_number_, &sa, NULL) == 0, "sigprof (disable)");
-}
-
 bool ProfileHandler::IsSignalHandlerAvailable() {
   struct sigaction sa;
   RAW_CHECK(sigaction(signal_number_, NULL, &sa) == 0, "is-signal-handler avail");
@@ -632,7 +534,7 @@
   // At this moment, instance_ must be initialized because the handler is
   // enabled in RegisterThread or RegisterCallback only after
   // ProfileHandler::Instance runs.
-  ProfileHandler* instance = ANNOTATE_UNPROTECTED_READ(instance_);
+  ProfileHandler* instance = instance_;
   RAW_CHECK(instance != NULL, "ProfileHandler is not initialized");
   {
     SpinLockHolder sl(&instance->signal_lock_);
@@ -650,24 +552,24 @@
 // executed in the context of the main thread.
 REGISTER_MODULE_INITIALIZER(profile_main, ProfileHandlerRegisterThread());
 
-extern "C" void ProfileHandlerRegisterThread() {
+void ProfileHandlerRegisterThread() {
   ProfileHandler::Instance()->RegisterThread();
 }
 
-extern "C" ProfileHandlerToken* ProfileHandlerRegisterCallback(
+ProfileHandlerToken* ProfileHandlerRegisterCallback(
     ProfileHandlerCallback callback, void* callback_arg) {
   return ProfileHandler::Instance()->RegisterCallback(callback, callback_arg);
 }
 
-extern "C" void ProfileHandlerUnregisterCallback(ProfileHandlerToken* token) {
+void ProfileHandlerUnregisterCallback(ProfileHandlerToken* token) {
   ProfileHandler::Instance()->UnregisterCallback(token);
 }
 
-extern "C" void ProfileHandlerReset() {
+void ProfileHandlerReset() {
   return ProfileHandler::Instance()->Reset();
 }
 
-extern "C" void ProfileHandlerGetState(ProfileHandlerState* state) {
+void ProfileHandlerGetState(ProfileHandlerState* state) {
   ProfileHandler::Instance()->GetState(state);
 }
 
@@ -677,21 +579,21 @@
 // work as well for profiling, and also interferes with alarm().  Because of
 // these issues, unless a specific need is identified, profiler support is
 // disabled under Cygwin.
-extern "C" void ProfileHandlerRegisterThread() {
+void ProfileHandlerRegisterThread() {
 }
 
-extern "C" ProfileHandlerToken* ProfileHandlerRegisterCallback(
+ProfileHandlerToken* ProfileHandlerRegisterCallback(
     ProfileHandlerCallback callback, void* callback_arg) {
   return NULL;
 }
 
-extern "C" void ProfileHandlerUnregisterCallback(ProfileHandlerToken* token) {
+void ProfileHandlerUnregisterCallback(ProfileHandlerToken* token) {
 }
 
-extern "C" void ProfileHandlerReset() {
+void ProfileHandlerReset() {
 }
 
-extern "C" void ProfileHandlerGetState(ProfileHandlerState* state) {
+void ProfileHandlerGetState(ProfileHandlerState* state) {
 }
 
 #endif  // OS_CYGWIN
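
Note on the ScopedSignalBlocker introduced in profile-handler.cc above: instead of toggling the handler between SIG_IGN and SignalHandler, the patch blocks the profiling signal for the duration of each critical section and unblocks it on scope exit. The sketch below shows the same RAII idea in isolation. `SignalBlockGuard`, `CountTick`, `g_ticks` and `TicksSeenWhileBlocked` are hypothetical names for illustration, and plain `assert` stands in for RAW_CHECK.

```cpp
#include <cassert>
#include <cstring>
#include <signal.h>

// Hypothetical stand-in for the patch's ScopedSignalBlocker.
class SignalBlockGuard {
 public:
  explicit SignalBlockGuard(int signo) {
    sigemptyset(&set_);
    sigaddset(&set_, signo);
    int rc = sigprocmask(SIG_BLOCK, &set_, nullptr);
    assert(rc == 0);
    (void)rc;
  }
  ~SignalBlockGuard() {
    // A signal that became pending while blocked is delivered here.
    int rc = sigprocmask(SIG_UNBLOCK, &set_, nullptr);
    assert(rc == 0);
    (void)rc;
  }
  SignalBlockGuard(const SignalBlockGuard&) = delete;
  SignalBlockGuard& operator=(const SignalBlockGuard&) = delete;

 private:
  sigset_t set_;
};

volatile sig_atomic_t g_ticks = 0;

// Illustrative handler: counts deliveries.
void CountTick(int) { g_ticks = g_ticks + 1; }

// Installs CountTick for signo, raises the signal while it is blocked, and
// returns how many handler runs were observed before the guard went away.
int TicksSeenWhileBlocked(int signo) {
  struct sigaction sa;
  std::memset(&sa, 0, sizeof(sa));
  sa.sa_handler = CountTick;
  sigemptyset(&sa.sa_mask);
  sigaction(signo, &sa, nullptr);

  g_ticks = 0;
  int seen_inside;
  {
    SignalBlockGuard guard(signo);
    raise(signo);           // stays pending: the signal is blocked
    seen_inside = g_ticks;  // the handler has not run yet
  }                         // unblocked here, the pending signal is delivered
  return seen_inside;
}
```

In a single-threaded program, `TicksSeenWhileBlocked(SIGUSR1)` should observe no handler run inside the guarded scope and exactly one after it. That deferral is what lets the patched code take signal_lock_ without the handler interrupting the same thread while it holds the lock.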

diff --git a/src/profile-handler.h b/src/profile-handler.h
index 4f96a18..b6bb0a1 100644
--- a/src/profile-handler.h
+++ b/src/profile-handler.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2009, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -32,15 +32,17 @@
  * Author: Nabeel Mian
  *
  * This module manages the cpu profile timers and the associated interrupt
- * handler. When enabled, all registered threads in the program are profiled.
- * (Note: if using linux 2.4 or earlier, you must use the Thread class, in
- * google3/thread, to ensure all threads are profiled.)
+ * handler. When enabled, all threads in the program are profiled.
  *
  * Any component interested in receiving a profile timer interrupt can do so by
  * registering a callback. All registered callbacks must be async-signal-safe.
  *
- * Note: This module requires the sole ownership of ITIMER_PROF timer and the
- * SIGPROF signal.
+ * Note: This module requires the sole ownership of the configured timer and
+ * signal. The timer defaults to ITIMER_PROF, can be changed to ITIMER_REAL by
+ * the environment variable CPUPROFILE_REALTIME, or is changed to a POSIX timer
+ * with CPUPROFILE_PER_THREAD_TIMERS. The signal defaults to SIGPROF/SIGALRM to
+ * match the choice of timer and can be set to an arbitrary value using
+ * CPUPROFILE_TIMER_SIGNAL with CPUPROFILE_PER_THREAD_TIMERS.
  */
 
 #ifndef BASE_PROFILE_HANDLER_H_
@@ -53,11 +55,6 @@
 #endif
 #include "base/basictypes.h"
 
-/* All this code should be usable from within C apps. */
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /* Forward declaration. */
 struct ProfileHandlerToken;
 
@@ -142,8 +139,4 @@
 };
 void ProfileHandlerGetState(struct ProfileHandlerState* state);
 
-#ifdef __cplusplus
-}  /* extern "C" */
-#endif
-
 #endif  /* BASE_PROFILE_HANDLER_H_ */
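
Note on the process-wide timer described in this header: in profile-handler.cc, `UpdateTimer` now derives the `setitimer` interval from the sampling frequency and splits it into seconds and microseconds, and an all-zero interval disarms the timer. The sketch below reproduces that arithmetic in a hypothetical helper, `MakeProfileInterval`, so the split can be checked in isolation. The split matters at low frequencies: at 1 Hz the interval is 1000000 microseconds, which does not fit in `tv_usec` (expected to stay below one million).

```cpp
#include <cassert>
#include <sys/time.h>

// Hypothetical helper mirroring the interval arithmetic in the patch.
// frequency_hz is assumed to be positive.
struct itimerval MakeProfileInterval(bool enable, int frequency_hz) {
  static const int kMillion = 1000000;
  const int interval_usec = enable ? kMillion / frequency_hz : 0;
  struct itimerval timer;
  timer.it_interval.tv_sec = interval_usec / kMillion;
  timer.it_interval.tv_usec = interval_usec % kMillion;
  timer.it_value = timer.it_interval;  // first expiry after one full interval
  return timer;
}
```

The result can be passed to `setitimer(ITIMER_PROF, &timer, 0)` or to `ITIMER_REAL`, matching the two timer types the header comment names.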

diff --git a/src/profiledata.cc b/src/profiledata.cc
index 8b05d3a..7bfb727 100644
--- a/src/profiledata.cc
+++ b/src/profiledata.cc

@@ -192,7 +192,7 @@
   DumpProcSelfMaps(out_);
 
   Reset();
-  fprintf(stderr, "PROFILE: interrupts/evictions/bytes = %d/%d/%" PRIuS "\n",
+  fprintf(stderr, "PROFILE: interrupts/evictions/bytes = %d/%d/%zu\n",
           count_, evictions_, total_bytes_);
 }
 

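Note on the profiledata.cc hunk above: the project-specific `PRIuS` macro is replaced by the standard `%zu` conversion for `size_t`. A minimal sketch of the same format string follows. It assumes a hypothetical helper, `FormatProfileSummary`, that writes to a buffer instead of stderr so the result can be inspected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats the summary line with the standard %zu
// conversion for the size_t argument.
std::string FormatProfileSummary(int interrupts, int evictions,
                                 std::size_t total_bytes) {
  char buf[128];
  std::snprintf(buf, sizeof(buf),
                "PROFILE: interrupts/evictions/bytes = %d/%d/%zu\n",
                interrupts, evictions, total_bytes);
  return std::string(buf);
}
```

`%zu` is part of C99 and C++11, so it needs no configure-time check on toolchains that implement those standards.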
diff --git a/src/profiledata.h b/src/profiledata.h
index 44033f0..a4f0446 100644
--- a/src/profiledata.h
+++ b/src/profiledata.h

@@ -35,7 +35,7 @@
 // Collect profiling data.
 //
 // The profile data file format is documented in
-// doc/cpuprofile-fileformat.html
+// docs/cpuprofile-fileformat.html
 
 
 #ifndef BASE_PROFILEDATA_H_
@@ -101,7 +101,7 @@
     int      frequency_;                  // Sample frequency.
   };
 
-  static const int kMaxStackDepth = 64;  // Max stack depth stored in profile
+  static const int kMaxStackDepth = 254;  // Max stack depth stored in profile
 
   ProfileData();
   ~ProfileData();

diff --git a/src/profiler.cc b/src/profiler.cc
index b862ae6..227deb2 100644
--- a/src/profiler.cc
+++ b/src/profiler.cc

@@ -65,9 +65,6 @@
 #include "base/sysinfo.h"             /* for GetUniquePathFromEnv, etc */
 #include "profiledata.h"
 #include "profile-handler.h"
-#ifdef HAVE_CONFLICT_SIGNAL_H
-#include "conflict-signal.h"          /* used on msvc machines */
-#endif
 
 using std::string;
 
@@ -144,34 +141,31 @@
 // number is defined in the environment variable CPUPROFILESIGNAL.
 static void CpuProfilerSwitch(int signal_number)
 {
-    bool static started = false;
-	static unsigned profile_count = 0;
-    static char base_profile_name[1024] = "\0";
+  static unsigned profile_count;
+  static char base_profile_name[PATH_MAX];
+  static bool started = false;
 
-	if (base_profile_name[0] == '\0') {
-    	if (!GetUniquePathFromEnv("CPUPROFILE", base_profile_name)) {
-        	RAW_LOG(FATAL,"Cpu profiler switch is registered but no CPUPROFILE is defined");
-        	return;
-    	}
-	}
-    if (!started) 
-    {
-    	char full_profile_name[1024];
-
-		snprintf(full_profile_name, sizeof(full_profile_name), "%s.%u",
-                 base_profile_name, profile_count++);
-
-        if(!ProfilerStart(full_profile_name))
-        {
-            RAW_LOG(FATAL, "Can't turn on cpu profiling for '%s': %s\n",
-                    full_profile_name, strerror(errno));
-        }
+  if (base_profile_name[0] == '\0') {
+    if (!GetUniquePathFromEnv("CPUPROFILE", base_profile_name)) {
+      RAW_LOG(FATAL,"Cpu profiler switch is registered but no CPUPROFILE is defined");
+      return;
     }
-    else    
-    {
-        ProfilerStop();
+  }
+
+  if (!started) {
+    char full_profile_name[PATH_MAX + 16];
+
+    snprintf(full_profile_name, sizeof(full_profile_name), "%s.%u",
+             base_profile_name, profile_count++);
+
+    if(!ProfilerStart(full_profile_name)) {
+      RAW_LOG(FATAL, "Can't turn on cpu profiling for '%s': %s\n",
+              full_profile_name, strerror(errno));
     }
-    started = !started;
+  } else {
+    ProfilerStop();
+  }
+  started = !started;
 }
 
 // Profile data structure singleton: Constructor will check to see if
@@ -360,7 +354,7 @@
                                          3, signal_ucontext);
 
     void **used_stack;
-    if (stack[1] == stack[0]) {
+    if (depth > 0 && stack[1] == stack[0]) {
       // in case of non-frame-pointer-based unwinding we will get
       // duplicate of PC in stack[1], which we don't want
       used_stack = stack + 1;
@@ -405,6 +399,11 @@
   CpuProfiler::instance_.GetCurrentState(state);
 }
 
+extern "C" PERFTOOLS_DLL_DECL int ProfilerGetStackTrace(
+    void** result, int max_depth, int skip_count, const void *uc) {
+  return GetStackTraceWithContext(result, max_depth, skip_count, uc);
+}
+
 #else  // OS_CYGWIN
 
 // ITIMER_PROF doesn't work under cygwin.  ITIMER_REAL is available, but doesn't
@@ -423,6 +422,10 @@
 extern "C" void ProfilerGetCurrentState(ProfilerState* state) {
   memset(state, 0, sizeof(*state));
 }
+extern "C" int ProfilerGetStackTrace(
+    void** result, int max_depth, int skip_count, const void *uc) {
+  return 0;
+}
 
 #endif  // OS_CYGWIN
 

diff --git a/src/raw_printer.cc b/src/raw_printer.cc
index 3cf028e..785d473 100644
--- a/src/raw_printer.cc
+++ b/src/raw_printer.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2008, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/raw_printer.h b/src/raw_printer.h
index 9288bb5..5f57bbf 100644
--- a/src/raw_printer.h
+++ b/src/raw_printer.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2008, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/sampler.cc b/src/sampler.cc
old mode 100755
new mode 100644
index cc71112..20bf1ad
--- a/src/sampler.cc
+++ b/src/sampler.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2008, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -57,34 +57,16 @@
 
 namespace tcmalloc {
 
-// Statics for Sampler
-double Sampler::log_table_[1<<kFastlogNumBits];
-
-// Populate the lookup table for FastLog2.
-// This approximates the log2 curve with a step function.
-// Steps have height equal to log2 of the mid-point of the step.
-void Sampler::PopulateFastLog2Table() {
-  for (int i = 0; i < (1<<kFastlogNumBits); i++) {
-    log_table_[i] = (log(1.0 + static_cast<double>(i+0.5)/(1<<kFastlogNumBits))
-                     / log(2.0));
-  }
-}
-
 int Sampler::GetSamplePeriod() {
   return FLAGS_tcmalloc_sample_parameter;
 }
 
 // Run this before using your sampler
-void Sampler::Init(uint32_t seed) {
+void Sampler::Init(uint64_t seed) {
+  ASSERT(seed != 0);
+
   // Initialize PRNG
-  if (seed != 0) {
-    rnd_ = seed;
-  } else {
-    rnd_ = static_cast<uint32_t>(reinterpret_cast<uintptr_t>(this));
-    if (rnd_ == 0) {
-      rnd_ = 1;
-    }
-  }
+  rnd_ = seed;
   // Step it forward 20 times for good measure
   for (int i = 0; i < 20; i++) {
     rnd_ = NextRandom(rnd_);
@@ -93,10 +75,7 @@
   bytes_until_sample_ = PickNextSamplingPoint();
 }
 
-// Initialize the Statics for the Sampler class
-void Sampler::InitStatics() {
-  PopulateFastLog2Table();
-}
+#define MAX_SSIZE (static_cast<ssize_t>(static_cast<size_t>(static_cast<ssize_t>(-1)) >> 1))
 
 // Generates a geometric variable with the specified mean (512K by default).
 // This is done by generating a random number between 0 and 1 and applying
@@ -109,7 +88,17 @@
 // -log_e(q)/m = x
 // log_2(q) * (-log_e(2) * 1/m) = x
 // In the code, q is actually in the range 1 to 2**26, hence the -26 below
-size_t Sampler::PickNextSamplingPoint() {
+ssize_t Sampler::PickNextSamplingPoint() {
+  if (FLAGS_tcmalloc_sample_parameter <= 0) {
+    // In this case, we don't want to sample ever, and the larger a
+    // value we put here, the longer until we hit the slow path
+    // again. However, we have to support the flag changing at
+    // runtime, so pick something reasonably large (to keep overhead
+    // low) but small enough that we'll eventually start to sample
+    // again.
+    return 16 << 20;
+  }
+
   rnd_ = NextRandom(rnd_);
   // Take the top 26 bits as the random number
   // (This plus the 1<<58 sampling bound give a max possible step of
@@ -119,13 +108,27 @@
   // under piii debug for some binaries.
   double q = static_cast<uint32_t>(rnd_ >> (prng_mod_power - 26)) + 1.0;
   // Put the computed p-value through the CDF of a geometric.
-  // For faster performance (save ~1/20th exec time), replace
-  // min(0.0, FastLog2(q) - 26)  by  (Fastlog2(q) - 26.000705)
-  // The value 26.000705 is used rather than 26 to compensate
-  // for inaccuracies in FastLog2 which otherwise result in a
-  // negative answer.
-  return static_cast<size_t>(min(0.0, (FastLog2(q) - 26)) * (-log(2.0)
-                             * FLAGS_tcmalloc_sample_parameter) + 1);
+  double interval =
+      (log2(q) - 26) * (-log(2.0) * FLAGS_tcmalloc_sample_parameter);
+
+  // Very large values of interval overflow ssize_t. If we happen to
+  // hit such an improbable condition, we simply cheat and clamp
+  // interval to the largest supported value.
+  return static_cast<ssize_t>(
+    std::min<double>(interval, static_cast<double>(MAX_SSIZE)));
+}
+
+bool Sampler::RecordAllocationSlow(size_t k) {
+  if (!initialized_) {
+    initialized_ = true;
+    Init(reinterpret_cast<uintptr_t>(this));
+    if (static_cast<size_t>(bytes_until_sample_) >= k) {
+      bytes_until_sample_ -= k;
+      return true;
+    }
+  }
+  bytes_until_sample_ = PickNextSamplingPoint();
+  return FLAGS_tcmalloc_sample_parameter <= 0;
 }
 
 }  // namespace tcmalloc
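
The interval computation in PickNextSamplingPoint can be tried in isolation with the minimal sketch below. It reuses the PRNG constants and the log2 formula from the hunk above; the names NextRandomSketch and SampleIntervalSketch, the int64_t return type, and the 2^62 clamp are assumptions made for illustration and are not part of gperftools.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// One step of the 48-bit linear congruential generator
// (a = 0x5DEECE66D, b = 0xB, modulus 2^48).
inline uint64_t NextRandomSketch(uint64_t rnd) {
  const uint64_t mult = 0x5DEECE66DULL;
  const uint64_t add = 0xB;
  const uint64_t mask = (uint64_t{1} << 48) - 1;
  return (mult * rnd + add) & mask;
}

// Hypothetical helper: maps a 48-bit PRNG state to the number of bytes
// until the next sample, for a positive mean (the sample parameter).
inline int64_t SampleIntervalSketch(uint64_t rnd, int64_t mean_bytes) {
  assert(mean_bytes > 0);
  // Take the top 26 of the 48 state bits and shift into [1, 2^26].
  const double q =
      static_cast<double>(static_cast<uint32_t>(rnd >> (48 - 26))) + 1.0;
  // log2(q) - 26 lies in [-26, 0], so the product is non-negative.
  const double interval =
      (std::log2(q) - 26) * (-std::log(2.0) * mean_bytes);
  // Assumed clamp: 2^62 is exactly representable as a double and fits
  // in int64_t, so the conversion below is always well defined.
  const double kMaxInterval = 4611686018427387904.0;
  return static_cast<int64_t>(std::min(interval, kMaxInterval));
}
```

With q limited to 26 bits, the longest interval this sketch can produce is 26 * ln(2), roughly 18 times the mean.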

diff --git a/src/sampler.h b/src/sampler.h
old mode 100755
new mode 100644
index eb316d7..82b1e67
--- a/src/sampler.h
+++ b/src/sampler.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2008, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -44,6 +44,7 @@
 #include <string.h>                     // for memcpy
 #include "base/basictypes.h"  // for ASSERT
 #include "internal_logging.h"  // for ASSERT
+#include "static_vars.h"
 
 namespace tcmalloc {
 
@@ -80,7 +81,7 @@
 // a geometric with mean tcmalloc_sample_parameter. (ie. the
 // probability that at least one byte in the range is marked). This
 // is accurately given by the CDF of the corresponding exponential
-// distribution : 1 - e^(X/tcmalloc_sample_parameter_)
+// distribution : 1 - e^(-X/tcmalloc_sample_parameter_)
 // Independence of the byte marking ensures independence of
 // the sampling of each allocation.
 //
@@ -100,52 +101,116 @@
 // only result in a single call to PickNextSamplingPoint.
 //-------------------------------------------------------------------
 
+class SamplerTest;
+
 class PERFTOOLS_DLL_DECL Sampler {
  public:
-  // Initialize this sampler.
-  // Passing a seed of 0 gives a non-deterministic
-  // seed value given by casting the object ("this")
-  void Init(uint32_t seed);
-  void Cleanup();
+  constexpr Sampler() {}
 
-  // Record allocation of "k" bytes.  Return true iff allocation
-  // should be sampled
-  bool SampleAllocation(size_t k);
+  // Initialize this sampler.
+  void Init(uint64_t seed);
+
+  // Record allocation of "k" bytes.  Return true if no further work
+  // is needed, and false if the allocation needs to be sampled.
+  bool RecordAllocation(size_t k);
+
+  // Same as above (but faster), except:
+  // a) REQUIRES(k < std::numeric_limits<ssize_t>::max())
+  // b) if this returns false, you must call RecordAllocation
+  //    to confirm whether sampling is truly needed.
+  //
+  // The point of this function is to handle only the common case of
+  // no sampling and let the caller (which is in the malloc fast-path)
+  // "escalate" to the fuller and slower logic only if necessary.
+  bool TryRecordAllocationFast(size_t k);
 
   // Generate a geometric with mean 512K (or FLAG_tcmalloc_sample_parameter)
-  size_t PickNextSamplingPoint();
-
-  // Initialize the statics for the Sampler class
-  static void InitStatics();
+  ssize_t PickNextSamplingPoint();
 
   // Returns the current sample period
-  int GetSamplePeriod();
+  static int GetSamplePeriod();
 
   // The following are public for the purposes of testing
   static uint64_t NextRandom(uint64_t rnd_);  // Returns the next prng value
-  static double FastLog2(const double & d);  // Computes Log2(x) quickly
-  static void PopulateFastLog2Table();  // Populate the lookup table
 
+  // C++03 requires that types stored in TLS be POD.  As a result, you must
+  // initialize these members to {0, 0, false} before using this class!
+  //
+  // TODO(ahh): C++11 support will let us make these private.
+
+  // Bytes until we sample next.
+  //
+  // More specifically when bytes_until_sample_ is X, we can allocate
+  // X bytes without triggering sampling; on the (X+1)th allocated
+  // byte, the containing allocation will be sampled.
+  //
+  // Always non-negative with only very brief exceptions (see
+  // TryRecordAllocationFast), so casting to size_t is ok.
  private:
-  size_t        bytes_until_sample_;    // Bytes until we sample next
-  uint64_t      rnd_;                   // Cheap random number generator
+  friend class SamplerTest;
+  bool RecordAllocationSlow(size_t k);
 
-  // Statics for the fast log
-  // Note that this code may not depend on anything in //util
-  // hence the duplication of functionality here
-  static const int kFastlogNumBits = 10;
-  static const int kFastlogMask = (1 << kFastlogNumBits) - 1;
-  static double log_table_[1<<kFastlogNumBits];  // Constant
+  ssize_t bytes_until_sample_{};
+  uint64_t rnd_{};  // Cheap random number generator
+  bool initialized_{};
 };
 
-inline bool Sampler::SampleAllocation(size_t k) {
-  if (bytes_until_sample_ < k) {
-    bytes_until_sample_ = PickNextSamplingPoint();
-    return true;
+inline bool Sampler::RecordAllocation(size_t k) {
+  // The first time we enter this function we expect bytes_until_sample_
+  // to be zero, and we must call RecordAllocationSlow() to ensure
+  // proper initialization of static vars.
+  ASSERT(Static::IsInited() || bytes_until_sample_ == 0);
+
+  // Note that we have to deal with arbitrarily large values of k
+  // here. Thus we're upcasting bytes_until_sample_ to unsigned rather
+  // than the other way around. And this is why this code cannot be
+  // merged with the TryRecordAllocationFast code below.
+  if (static_cast<size_t>(bytes_until_sample_) < k) {
+    bool result = RecordAllocationSlow(k);
+    ASSERT(Static::IsInited());
+    return result;
   } else {
     bytes_until_sample_ -= k;
+    ASSERT(Static::IsInited());
+    return true;
+  }
+}
+
+inline bool Sampler::TryRecordAllocationFast(size_t k) {
+  // For efficiency reasons, we test bytes_until_sample_ after
+  // decrementing it by k. This allows the compiler to do sub <reg>, <mem>
+  // followed by a conditional jump on sign. But it is correct only if k
+  // is actually smaller than the largest ssize_t value. Otherwise
+  // converting k to a signed value overflows.
+  //
+  // It would be great for the generated code to be sub <reg>, <mem>
+  // followed by a conditional jump on 'carry', which would work for
+  // arbitrary values of k, but there seems to be no way to express
+  // that in C++.
+  //
+  // Our API contract explicitly states that only small values of k
+  // are permitted. And thus it makes sense to assert on that.
+  ASSERT(static_cast<ssize_t>(k) >= 0);
+
+  bytes_until_sample_ -= static_cast<ssize_t>(k);
+  if (PREDICT_FALSE(bytes_until_sample_ < 0)) {
+    // Note, we undo the sampling counter update, since we're not
+    // actually handling the slow path in the "needs sampling" case
+    // (calling RecordAllocationSlow to reset the counter). We do that
+    // in order to avoid non-tail calls in the malloc fast-path. See
+    // also the comments on the declaration inside the Sampler class.
+    //
+    // volatile is used here to improve the compiler's choice of
+    // instructions. We know that this path is very rare and that there
+    // is no need to keep the previous value of bytes_until_sample_ in
+    // a register. This helps the compiler generate a slightly more
+    // efficient sub <reg>, <mem> instruction for the subtraction above.
+    volatile ssize_t *ptr =
+        const_cast<volatile ssize_t *>(&bytes_until_sample_);
+    *ptr += k;
     return false;
   }
+  return true;
 }
 
 // Inline functions which are public for testing purposes
@@ -154,27 +219,14 @@
 // pRNG is: aX+b mod c with a = 0x5DEECE66D, b =  0xB, c = 1<<48
 // This is the lrand64 generator.
 inline uint64_t Sampler::NextRandom(uint64_t rnd) {
-  const uint64_t prng_mult = 0x5DEECE66DLL;
+  const uint64_t prng_mult = 0x5DEECE66DULL;
   const uint64_t prng_add = 0xB;
   const uint64_t prng_mod_power = 48;
   const uint64_t prng_mod_mask =
-                ~((~static_cast<uint64_t>(0)) << prng_mod_power);
+      ~((~static_cast<uint64_t>(0)) << prng_mod_power);
   return (prng_mult * rnd + prng_add) & prng_mod_mask;
 }
 
-// Adapted from //util/math/fastmath.[h|cc] by Noam Shazeer
-// This mimics the VeryFastLog2 code in those files
-inline double Sampler::FastLog2(const double & d) {
-  ASSERT(d>0);
-  COMPILE_ASSERT(sizeof(d) == sizeof(uint64_t), DoubleMustBe64Bits);
-  uint64_t x;
-  memcpy(&x, &d, sizeof(x));   // we depend on the compiler inlining this
-  const uint32_t x_high = x >> 32;
-  const uint32_t y = x_high >> (20 - kFastlogNumBits) & kFastlogMask;
-  const int32_t exponent = ((x_high >> 20) & 0x7FF) - 1023;
-  return exponent + log_table_[y];
-}
-
 }  // namespace tcmalloc
 
 #endif  // TCMALLOC_SAMPLER_H_
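
The split between TryRecordAllocationFast and RecordAllocation can be illustrated with a simplified sketch. CounterSketch and its members are hypothetical names; the lazy initialization done by RecordAllocationSlow is left out, and the next interval is passed in by the caller instead of being drawn from the PRNG.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the Sampler's byte counter.
struct CounterSketch {
  int64_t bytes_until_sample = 0;

  // Fast path: subtract first, test the sign afterwards. Returns true
  // when no sampling is needed. On false the subtraction is undone, so
  // the slower path sees the original counter value.
  bool TryRecordFast(size_t k) {
    assert(static_cast<int64_t>(k) >= 0);  // k must fit the signed type
    bytes_until_sample -= static_cast<int64_t>(k);
    if (bytes_until_sample < 0) {
      bytes_until_sample += static_cast<int64_t>(k);
      return false;
    }
    return true;
  }

  // Full path: works for any k because the comparison happens in the
  // unsigned domain. Returns false when the allocation should be
  // sampled, in which case the counter is reset to next_interval.
  bool Record(size_t k, int64_t next_interval) {
    if (static_cast<uint64_t>(bytes_until_sample) < k) {
      bytes_until_sample = next_interval;
      return false;
    }
    bytes_until_sample -= static_cast<int64_t>(k);
    return true;
  }
};
```

A caller in this sketch would try TryRecordFast first and fall back to Record only when it returns false, mirroring the "escalate" contract described in the header comment.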

diff --git a/src/span.cc b/src/span.cc
index 4d08964..eac43f4 100644
--- a/src/span.cc
+++ b/src/span.cc

@@ -42,23 +42,11 @@
 
 namespace tcmalloc {
 
-#ifdef SPAN_HISTORY
-void Event(Span* span, char op, int v = 0) {
-  span->history[span->nexthistory] = op;
-  span->value[span->nexthistory] = v;
-  span->nexthistory++;
-  if (span->nexthistory == sizeof(span->history)) span->nexthistory = 0;
-}
-#endif
-
 Span* NewSpan(PageID p, Length len) {
   Span* result = Static::span_allocator()->New();
   memset(result, 0, sizeof(*result));
   result->start = p;
   result->length = len;
-#ifdef SPAN_HISTORY
-  result->nexthistory = 0;
-#endif
   return result;
 }
 

diff --git a/src/span.h b/src/span.h
index 83feda1..7068893 100644
--- a/src/span.h
+++ b/src/span.h

@@ -37,39 +37,98 @@
 #define TCMALLOC_SPAN_H_
 
 #include <config.h>
+#include <set>
 #include "common.h"
+#include "base/logging.h"
+#include "page_heap_allocator.h"
 
 namespace tcmalloc {
 
+struct SpanBestFitLess;
+struct Span;
+
+// Store a pointer to a span along with a cached copy of its length.
+// These are used as set elements to improve the performance of
+// comparisons during tree traversal: the lengths are inline with the
+// tree nodes and thus avoid expensive cache misses to dereference
+// the actual Span objects in most cases.
+struct SpanPtrWithLength {
+  explicit SpanPtrWithLength(Span* s);
+
+  Span* span;
+  Length length;
+};
+typedef std::set<SpanPtrWithLength, SpanBestFitLess, STLPageHeapAllocator<SpanPtrWithLength, void> > SpanSet;
+
+// Comparator for best-fit search, with address order as a tie-breaker.
+struct SpanBestFitLess {
+  bool operator()(SpanPtrWithLength a, SpanPtrWithLength b) const;
+};
+
 // Information kept for a span (a contiguous run of pages).
 struct Span {
   PageID        start;          // Starting page number
   Length        length;         // Number of pages in span
   Span*         next;           // Used when in link list
   Span*         prev;           // Used when in link list
-  void*         objects;        // Linked list of free objects
+  union {
+    void* objects;              // Linked list of free objects
+
+    // A span may hold an iterator pointing back at this span's entry
+    // in the SpanSet of large spans. It is used to quickly delete
+    // spans from those sets. span_iter_space is storage for that
+    // iterator, whose lifetime is controlled explicitly.
+    char span_iter_space[sizeof(SpanSet::iterator)];
+  };
   unsigned int  refcount : 16;  // Number of non-free objects
   unsigned int  sizeclass : 8;  // Size-class for small objects (or 0)
   unsigned int  location : 2;   // Is the span on a freelist, and if so, which?
   unsigned int  sample : 1;     // Sampled object?
+  bool          has_span_iter : 1; // Iff span_iter_space has valid
+                                   // iterator. Only for debug builds.
 
-#undef SPAN_HISTORY
-#ifdef SPAN_HISTORY
-  // For debugging, we can keep a log events per span
-  int nexthistory;
-  char history[64];
-  int value[64];
-#endif
+  // Sets iterator stored in span_iter_space.
+  // Requires has_span_iter == 0.
+  void SetSpanSetIterator(const SpanSet::iterator& iter);
+  // Copies out and destroys iterator stored in span_iter_space.
+  SpanSet::iterator ExtractSpanSetIterator();
 
   // What freelist the span is on: IN_USE if on none, or normal or returned
   enum { IN_USE, ON_NORMAL_FREELIST, ON_RETURNED_FREELIST };
 };
 
-#ifdef SPAN_HISTORY
-void Event(Span* span, char op, int v = 0);
-#else
-#define Event(s,o,v) ((void) 0)
-#endif
+inline SpanPtrWithLength::SpanPtrWithLength(Span* s)
+    : span(s),
+      length(s->length) {
+}
+
+inline bool SpanBestFitLess::operator()(SpanPtrWithLength a, SpanPtrWithLength b) const {
+  if (a.length < b.length)
+    return true;
+  if (a.length > b.length)
+    return false;
+  return a.span->start < b.span->start;
+}
+
+inline void Span::SetSpanSetIterator(const SpanSet::iterator& iter) {
+  ASSERT(!has_span_iter);
+  has_span_iter = 1;
+
+  new (span_iter_space) SpanSet::iterator(iter);
+}
+
+inline SpanSet::iterator Span::ExtractSpanSetIterator() {
+  typedef SpanSet::iterator iterator_type;
+
+  ASSERT(has_span_iter);
+  has_span_iter = 0;
+
+  iterator_type* this_iter =
+    reinterpret_cast<iterator_type*>(span_iter_space);
+  iterator_type retval = *this_iter;
+  this_iter->~iterator_type();
+  return retval;
+}
 
 // Allocator/deallocator for spans
 Span* NewSpan(PageID p, Length len);
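
As a rough illustration of how a set ordered by SpanBestFitLess supports best-fit lookup, the sketch below uses stand-in types. SpanSketch, SpanKey, BestFitLess and FindBestFit are hypothetical names, the default std::allocator replaces STLPageHeapAllocator, and the probe-with-start-0 lookup is an assumption about usage that this diff does not show.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

struct SpanSketch {
  size_t start;   // starting page number
  size_t length;  // number of pages
};

// Pointer plus cached length, so most comparisons stay inside the
// tree node and do not dereference the span.
struct SpanKey {
  explicit SpanKey(SpanSketch* s) : span(s), length(s->length) {}
  SpanSketch* span;
  size_t length;
};

// Order by length, with start address as the tie-breaker.
struct BestFitLess {
  bool operator()(const SpanKey& a, const SpanKey& b) const {
    if (a.length != b.length) return a.length < b.length;
    return a.span->start < b.span->start;
  }
};

typedef std::set<SpanKey, BestFitLess> SpanKeySet;

// Returns the smallest span with length >= n (lowest address among
// equal lengths), or nullptr if no span is large enough.
inline SpanSketch* FindBestFit(const SpanKeySet& spans, size_t n) {
  SpanSketch probe = {0, n};  // start 0 sorts first among length-n spans
  SpanKeySet::const_iterator it = spans.lower_bound(SpanKey(&probe));
  return it == spans.end() ? nullptr : it->span;
}
```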

diff --git a/src/stack_trace_table.cc b/src/stack_trace_table.cc
index 1862124..5888dc0 100644
--- a/src/stack_trace_table.cc
+++ b/src/stack_trace_table.cc

@@ -42,27 +42,15 @@
 
 namespace tcmalloc {
 
-bool StackTraceTable::Bucket::KeyEqual(uintptr_t h,
-                                       const StackTrace& t) const {
-  const bool eq = (this->hash == h && this->trace.depth == t.depth);
-  for (int i = 0; eq && i < t.depth; ++i) {
-    if (this->trace.stack[i] != t.stack[i]) {
-      return false;
-    }
-  }
-  return eq;
-}
-
 StackTraceTable::StackTraceTable()
     : error_(false),
       depth_total_(0),
       bucket_total_(0),
-      table_(new Bucket*[kHashTableSize]()) {
-  memset(table_, 0, kHashTableSize * sizeof(Bucket*));
+      head_(nullptr) {
 }
 
 StackTraceTable::~StackTraceTable() {
-  delete[] table_;
+  ASSERT(head_ == nullptr);
 }
 
 void StackTraceTable::AddTrace(const StackTrace& t) {
@@ -70,89 +58,64 @@
     return;
   }
 
-  // Hash function borrowed from base/heap-profile-table.cc
-  uintptr_t h = 0;
-  for (int i = 0; i < t.depth; ++i) {
-    h += reinterpret_cast<uintptr_t>(t.stack[i]);
-    h += h << 10;
-    h ^= h >> 6;
-  }
-  h += h << 3;
-  h ^= h >> 11;
-
-  const int idx = h % kHashTableSize;
-
-  Bucket* b = table_[idx];
-  while (b != NULL && !b->KeyEqual(h, t)) {
-    b = b->next;
-  }
-  if (b != NULL) {
-    b->count++;
-    b->trace.size += t.size;  // keep cumulative size
+  depth_total_ += t.depth;
+  bucket_total_++;
+  Entry* entry = allocator_.allocate(1);
+  if (entry == nullptr) {
+    Log(kLog, __FILE__, __LINE__,
+        "tcmalloc: could not allocate bucket", sizeof(*entry));
+    error_ = true;
   } else {
-    depth_total_ += t.depth;
-    bucket_total_++;
-    b = Static::bucket_allocator()->New();
-    if (b == NULL) {
-      Log(kLog, __FILE__, __LINE__,
-          "tcmalloc: could not allocate bucket", sizeof(*b));
-      error_ = true;
-    } else {
-      b->hash = h;
-      b->trace = t;
-      b->count = 1;
-      b->next = table_[idx];
-      table_[idx] = b;
-    }
+    entry->trace = t;
+    entry->next = head_;
+    head_ = entry;
   }
 }
 
 void** StackTraceTable::ReadStackTracesAndClear() {
-  if (error_) {
-    return NULL;
-  }
+  void** out = nullptr;
 
-  // Allocate output array
   const int out_len = bucket_total_ * 3 + depth_total_ + 1;
-  void** out = new void*[out_len];
-  if (out == NULL) {
-    Log(kLog, __FILE__, __LINE__,
-        "tcmalloc: allocation failed for stack traces",
-        out_len * sizeof(*out));
-    return NULL;
-  }
-
-  // Fill output array
-  int idx = 0;
-  for (int i = 0; i < kHashTableSize; ++i) {
-    Bucket* b = table_[i];
-    while (b != NULL) {
-      out[idx++] = reinterpret_cast<void*>(static_cast<uintptr_t>(b->count));
-      out[idx++] = reinterpret_cast<void*>(b->trace.size);  // cumulative size
-      out[idx++] = reinterpret_cast<void*>(b->trace.depth);
-      for (int d = 0; d < b->trace.depth; ++d) {
-        out[idx++] = b->trace.stack[d];
-      }
-      b = b->next;
+  if (!error_) {
+    // Allocate output array
+    out = new (std::nothrow_t{}) void*[out_len];
+    if (out == nullptr) {
+      Log(kLog, __FILE__, __LINE__,
+          "tcmalloc: allocation failed for stack traces",
+          out_len * sizeof(*out));
     }
   }
-  out[idx++] = NULL;
-  ASSERT(idx == out_len);
+
+  if (out) {
+    // Fill output array
+    int idx = 0;
+    Entry* entry = head_;
+    while (entry != NULL) {
+      out[idx++] = reinterpret_cast<void*>(uintptr_t{1});   // count
+      out[idx++] = reinterpret_cast<void*>(entry->trace.size);  // cumulative size
+      out[idx++] = reinterpret_cast<void*>(entry->trace.depth);
+      for (int d = 0; d < entry->trace.depth; ++d) {
+        out[idx++] = entry->trace.stack[d];
+      }
+      entry = entry->next;
+    }
+    out[idx++] = NULL;
+    ASSERT(idx == out_len);
+  }
 
   // Clear state
   error_ = false;
   depth_total_ = 0;
   bucket_total_ = 0;
+
   SpinLockHolder h(Static::pageheap_lock());
-  for (int i = 0; i < kHashTableSize; ++i) {
-    Bucket* b = table_[i];
-    while (b != NULL) {
-      Bucket* next = b->next;
-      Static::bucket_allocator()->Delete(b);
-      b = next;
-    }
-    table_[i] = NULL;
+  Entry* entry = head_;
+  while (entry != nullptr) {
+    Entry* next = entry->next;
+    allocator_.deallocate(entry, 1);
+    entry = next;
   }
+  head_ = nullptr;
 
   return out;
 }
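
ReadStackTracesAndClear returns a flat array of records, each laid out as count, size, depth, then depth program counters, with a NULL entry terminating the array. A consumer could walk that layout roughly as follows; TraceSummary and SummarizeTraces are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical totals gathered from the flat array.
struct TraceSummary {
  uintptr_t samples;      // sum of the per-record counts
  uintptr_t total_bytes;  // sum of the per-record sizes
  uintptr_t total_frames; // sum of the per-record depths
};

// Walks [count, size, depth, pc_0 .. pc_{depth-1}] records until the
// NULL terminator. A count is never zero, so a NULL in the count slot
// unambiguously marks the end.
inline TraceSummary SummarizeTraces(void* const* entries) {
  TraceSummary s = {0, 0, 0};
  while (*entries != nullptr) {
    const uintptr_t count = reinterpret_cast<uintptr_t>(entries[0]);
    const uintptr_t size = reinterpret_cast<uintptr_t>(entries[1]);
    const uintptr_t depth = reinterpret_cast<uintptr_t>(entries[2]);
    s.samples += count;
    s.total_bytes += size;
    s.total_frames += depth;
    entries += 3 + depth;  // skip the header and the stack frames
  }
  return s;
}
```

After this change every record carries a count of 1, since traces are no longer merged into hash buckets, so samples equals the number of records.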

diff --git a/src/stack_trace_table.h b/src/stack_trace_table.h
index e289771..46b86ba 100644
--- a/src/stack_trace_table.h
+++ b/src/stack_trace_table.h

@@ -41,6 +41,7 @@
 #include <stdint.h>                     // for uintptr_t
 #endif
 #include "common.h"
+#include "page_heap_allocator.h"
 
 namespace tcmalloc {
 
@@ -62,29 +63,21 @@
   void** ReadStackTracesAndClear();
 
   // Exposed for PageHeapAllocator
-  struct Bucket {
-    // Key
-    uintptr_t hash;
-    StackTrace trace;
-
-    // Payload
-    int count;
-    Bucket* next;
-
-    bool KeyEqual(uintptr_t h, const StackTrace& t) const;
-  };
-
   // For testing
   int depth_total() const { return depth_total_; }
   int bucket_total() const { return bucket_total_; }
 
  private:
-  static const int kHashTableSize = 1 << 14; // => table_ is 128k
+  struct Entry {
+    Entry* next;
+    StackTrace trace;
+  };
 
   bool error_;
   int depth_total_;
   int bucket_total_;
-  Bucket** table_;
+  Entry* head_;
+  STLPageHeapAllocator<Entry, void> allocator_;
 };
 
 }  // namespace tcmalloc

diff --git a/src/stacktrace.cc b/src/stacktrace.cc
index 999863c..2a2c648 100644
--- a/src/stacktrace.cc
+++ b/src/stacktrace.cc

@@ -60,6 +60,7 @@
 #include "gperftools/stacktrace.h"
 #include "base/commandlineflags.h"
 #include "base/googleinit.h"
+#include "getenv_safe.h"
 
 
 // we're using plain struct and not class to avoid any possible issues
@@ -90,6 +91,15 @@
 #define HAVE_GST_generic
 #endif
 
+#ifdef HAVE_UNWIND_BACKTRACE
+#define STACKTRACE_INL_HEADER "stacktrace_libgcc-inl.h"
+#define GST_SUFFIX libgcc
+#include "stacktrace_impl_setup-inl.h"
+#undef GST_SUFFIX
+#undef STACKTRACE_INL_HEADER
+#define HAVE_GST_libgcc
+#endif
+
 // libunwind uses __thread so we check for both libunwind.h and
 // __thread support
 #if defined(HAVE_LIBUNWIND_H) && defined(HAVE_TLS)
@@ -110,6 +120,27 @@
 #define HAVE_GST_x86
 #endif // i386 || x86_64
 
+// Sadly, different OSes have very different mcontexts even for
+// identical hardware arch. So keep it linux-only for now.
+#if defined(__GNUC__) && __linux__ && (defined(__x86_64__) || defined(__aarch64__) || defined(__riscv))
+#define STACKTRACE_INL_HEADER "stacktrace_generic_fp-inl.h"
+#define GST_SUFFIX generic_fp
+#include "stacktrace_impl_setup-inl.h"
+#undef GST_SUFFIX
+#undef STACKTRACE_INL_HEADER
+#define HAVE_GST_generic_fp
+
+#undef TCMALLOC_UNSAFE_GENERIC_FP_STACKTRACE
+#define TCMALLOC_UNSAFE_GENERIC_FP_STACKTRACE 1
+
+#define STACKTRACE_INL_HEADER "stacktrace_generic_fp-inl.h"
+#define GST_SUFFIX generic_fp_unsafe
+#include "stacktrace_impl_setup-inl.h"
+#undef GST_SUFFIX
+#undef STACKTRACE_INL_HEADER
+#define HAVE_GST_generic_fp_unsafe
+#endif
+
 #if defined(__ppc__) || defined(__PPC__)
 #if defined(__linux__)
 #define STACKTRACE_INL_HEADER "stacktrace_powerpc-linux-inl.h"
@@ -153,9 +184,18 @@
 #endif
 
 static GetStackImplementation *all_impls[] = {
+#ifdef HAVE_GST_libgcc
+  &impl__libgcc,
+#endif
 #ifdef HAVE_GST_generic
   &impl__generic,
 #endif
+#ifdef HAVE_GST_generic_fp
+  &impl__generic_fp,
+#endif
+#ifdef HAVE_GST_generic_fp_unsafe
+  &impl__generic_fp_unsafe,
+#endif
 #ifdef HAVE_GST_libunwind
   &impl__libunwind,
 #endif
@@ -179,26 +219,34 @@
 
 // ppc and i386 implementations prefer arch-specific asm implementations.
 // arm's asm implementation is broken
-#if defined(__i386__) || defined(__x86_64__) || defined(__ppc__) || defined(__PPC__)
+#if defined(__i386__) || defined(__ppc__) || defined(__PPC__)
 #if !defined(NO_FRAME_POINTER)
 #define TCMALLOC_DONT_PREFER_LIBUNWIND
 #endif
 #endif
 
+static bool get_stack_impl_inited;
+
 #if defined(HAVE_GST_instrument)
 static GetStackImplementation *get_stack_impl = &impl__instrument;
 #elif defined(HAVE_GST_win32)
 static GetStackImplementation *get_stack_impl = &impl__win32;
+#elif defined(HAVE_GST_generic_fp) && !defined(NO_FRAME_POINTER) \
+   && !defined(__riscv) \
+   && (!defined(HAVE_GST_libunwind) || defined(TCMALLOC_DONT_PREFER_LIBUNWIND))
+static GetStackImplementation *get_stack_impl = &impl__generic_fp;
 #elif defined(HAVE_GST_x86) && defined(TCMALLOC_DONT_PREFER_LIBUNWIND)
 static GetStackImplementation *get_stack_impl = &impl__x86;
 #elif defined(HAVE_GST_ppc) && defined(TCMALLOC_DONT_PREFER_LIBUNWIND)
 static GetStackImplementation *get_stack_impl = &impl__ppc;
 #elif defined(HAVE_GST_libunwind)
 static GetStackImplementation *get_stack_impl = &impl__libunwind;
-#elif defined(HAVE_GST_arm)
-static GetStackImplementation *get_stack_impl = &impl__arm;
+#elif defined(HAVE_GST_libgcc)
+static GetStackImplementation *get_stack_impl = &impl__libgcc;
 #elif defined(HAVE_GST_generic)
 static GetStackImplementation *get_stack_impl = &impl__generic;
+#elif defined(HAVE_GST_arm)
+static GetStackImplementation *get_stack_impl = &impl__arm;
 #elif 0
 // This is for the benefit of code analysis tools that may have
 // trouble with the computed #include above.
@@ -217,13 +265,52 @@
   return rv;
 }
 
+static void init_default_stack_impl_inner(void);
+
+namespace tcmalloc {
+  bool EnterStacktraceScope(void);
+  void LeaveStacktraceScope(void);
+}
+
+namespace {
+  using tcmalloc::EnterStacktraceScope;
+  using tcmalloc::LeaveStacktraceScope;
+
+  class StacktraceScope {
+    bool stacktrace_allowed;
+  public:
+    StacktraceScope() {
+      stacktrace_allowed = true;
+      stacktrace_allowed = EnterStacktraceScope();
+    }
+    bool IsStacktraceAllowed() {
+      return stacktrace_allowed;
+    }
+    ~StacktraceScope() {
+      if (stacktrace_allowed) {
+        LeaveStacktraceScope();
+      }
+    }
+  };
+}
+
 PERFTOOLS_DLL_DECL int GetStackFrames(void** result, int* sizes, int max_depth,
                                       int skip_count) {
+  StacktraceScope scope;
+  if (!scope.IsStacktraceAllowed()) {
+    return 0;
+  }
+  init_default_stack_impl_inner();
   return frame_forcer(get_stack_impl->GetStackFramesPtr(result, sizes, max_depth, skip_count));
 }
 
 PERFTOOLS_DLL_DECL int GetStackFramesWithContext(void** result, int* sizes, int max_depth,
                                                  int skip_count, const void *uc) {
+  StacktraceScope scope;
+  if (!scope.IsStacktraceAllowed()) {
+    return 0;
+  }
+  init_default_stack_impl_inner();
   return frame_forcer(get_stack_impl->GetStackFramesWithContextPtr(
                         result, sizes, max_depth,
                         skip_count, uc));
@@ -231,18 +318,56 @@
 
 PERFTOOLS_DLL_DECL int GetStackTrace(void** result, int max_depth,
                                      int skip_count) {
+  StacktraceScope scope;
+  if (!scope.IsStacktraceAllowed()) {
+    return 0;
+  }
+  init_default_stack_impl_inner();
   return frame_forcer(get_stack_impl->GetStackTracePtr(result, max_depth, skip_count));
 }
 
 PERFTOOLS_DLL_DECL int GetStackTraceWithContext(void** result, int max_depth,
                                                 int skip_count, const void *uc) {
+  StacktraceScope scope;
+  if (!scope.IsStacktraceAllowed()) {
+    return 0;
+  }
+  init_default_stack_impl_inner();
   return frame_forcer(get_stack_impl->GetStackTraceWithContextPtr(
                         result, max_depth, skip_count, uc));
 }
 
+// As of this writing, libunwind is completely broken on aarch64, so
+// let's test for this case and fall back to frame pointers (which
+// are nearly, but not quite, perfect).
+ATTRIBUTE_NOINLINE
+static void maybe_convert_libunwind_to_generic_fp() {
+#if defined(HAVE_GST_libunwind) && defined(HAVE_GST_generic_fp)
+  if (get_stack_impl != &impl__libunwind) {
+    return;
+  }
+
+  // Okay we're on libunwind and we have generic_fp, check if
+  // libunwind returns bogus results.
+  void* stack[4];
+  int rv = get_stack_impl->GetStackTracePtr(stack, 4, 0);
+  if (rv > 2) {
+    // Seems fine
+    return;
+  }
+  // Bogus results, so replace libunwind with generic_fp.
+  get_stack_impl = &impl__generic_fp;
+#endif
+}
+
 static void init_default_stack_impl_inner(void) {
-  char *val = getenv("TCMALLOC_STACKTRACE_METHOD");
+  if (get_stack_impl_inited) {
+    return;
+  }
+  get_stack_impl_inited = true;
+  const char *val = TCMallocGetenvSafe("TCMALLOC_STACKTRACE_METHOD");
   if (!val || !*val) {
+    maybe_convert_libunwind_to_generic_fp();
     return;
   }
   for (GetStackImplementation **p = all_impls; *p; p++) {
@@ -255,6 +380,7 @@
   fprintf(stderr, "Unknown or unsupported stacktrace method requested: %s. Ignoring it\n", val);
 }
 
+ATTRIBUTE_NOINLINE
 static void init_default_stack_impl(void) {
   init_default_stack_impl_inner();
   if (EnvToBool("TCMALLOC_STACKTRACE_METHOD_VERBOSE", false)) {

diff --git a/src/stacktrace_generic_fp-inl.h b/src/stacktrace_generic_fp-inl.h
new file mode 100644
index 0000000..d458109
--- /dev/null
+++ b/src/stacktrace_generic_fp-inl.h

@@ -0,0 +1,222 @@
+// -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
+// Copyright (c) 2021, gperftools Contributors
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+//     * Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//     * Redistributions in binary form must reproduce the above
+// copyright notice, this list of conditions and the following disclaimer
+// in the documentation and/or other materials provided with the
+// distribution.
+//     * Neither the name of Google Inc. nor the names of its
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+// This file contains "generic" frame pointer backtracing code. An
+// attempt is made to minimize the amount of arch- or OS-specific
+// code and keep everything as generic as possible. Currently
+// supported are x86-64, aarch64 and riscv.
+#ifndef BASE_STACKTRACE_GENERIC_FP_INL_H_
+#define BASE_STACKTRACE_GENERIC_FP_INL_H_
+
+#if defined(HAVE_SYS_UCONTEXT_H)
+#include <sys/ucontext.h>
+#elif defined(HAVE_UCONTEXT_H)
+#include <ucontext.h>
+#endif
+
+// This is only used on OS-es with mmap support.
+#include <sys/mman.h>
+
+// Set this to 1 to disable "probing" of addresses before they are
+// read. This makes backtracing less safe, but faster.
+#ifndef TCMALLOC_UNSAFE_GENERIC_FP_STACKTRACE
+#define TCMALLOC_UNSAFE_GENERIC_FP_STACKTRACE 0
+#endif
+
+namespace {
+namespace stacktrace_generic_fp {
+
+struct frame {
+  uintptr_t parent;
+  void* pc;
+};
+
+frame* adjust_fp(frame* f) {
+#ifdef __riscv
+  return f - 1;
+#else
+  return f;
+#endif
+}
+
+static bool CheckPageIsReadable(void* ptr, void* checked_ptr) {
+  static uintptr_t pagesize;
+  if (pagesize == 0) {
+    pagesize = getpagesize();
+  }
+
+  uintptr_t addr = reinterpret_cast<uintptr_t>(ptr);
+  uintptr_t parent_frame = reinterpret_cast<uintptr_t>(checked_ptr);
+
+  parent_frame &= ~(pagesize - 1);
+  addr &= ~(pagesize - 1);
+
+  if (parent_frame != 0 && addr == parent_frame) {
+    return true;
+  }
+
+  return (msync(reinterpret_cast<void*>(addr), pagesize, MS_ASYNC) == 0);
+}
+
+ATTRIBUTE_NOINLINE // forces architectures with link register to save it
+int capture(void **result, int max_depth, int skip_count,
+            void* initial_frame, void* const * initial_pc) {
+  int i = 0;
+
+  if (initial_pc != nullptr) {
+    // This is 'with ucontext' case. We take first pc from ucontext
+    // and then skip_count is ignored as we assume that caller only
+    // needed stack trace up to signal handler frame.
+    skip_count = 0;
+    if (max_depth == 0) {
+      return 0;
+    }
+    result[0] = *initial_pc;
+    i++;
+  }
+
+  constexpr uintptr_t kTooSmallAddr = 16 << 10;
+  constexpr uintptr_t kFrameSizeThreshold = 128 << 10;
+
+  // This is simplistic for now. Here we're targeting x86-64, aarch64
+  // and riscv. All have 16-byte stack alignment (even 32-bit
+  // riscv). This can be made more elaborate as we consider more
+  // architectures. Note that this alignment lets us check the
+  // readability of only the f->parent address.
+  constexpr uintptr_t kAlignment = 16;
+
+  uintptr_t initial_frame_addr = reinterpret_cast<uintptr_t>(initial_frame);
+  if ((initial_frame_addr & (kAlignment - 1)) != 0) {
+    return i;
+  }
+  if (initial_frame_addr < kTooSmallAddr) {
+    return i;
+  }
+
+  frame* prev_f = nullptr;
+  frame *f = adjust_fp(reinterpret_cast<frame*>(initial_frame));
+
+  while (i < max_depth) {
+    if (!TCMALLOC_UNSAFE_GENERIC_FP_STACKTRACE
+        && !CheckPageIsReadable(&f->parent, prev_f)) {
+      break;
+    }
+
+    void* pc = f->pc;
+    if (pc == nullptr) {
+      break;
+    }
+
+    if (i >= skip_count) {
+      result[i - skip_count] = pc;
+    }
+
+    i++;
+
+    uintptr_t parent_frame_addr = f->parent;
+    uintptr_t child_frame_addr = reinterpret_cast<uintptr_t>(f);
+
+    if (parent_frame_addr < kTooSmallAddr) {
+      break;
+    }
+    // The stack grows towards smaller addresses, so if the frame
+    // address did not increase (going from child to parent), it is a
+    // bad frame. We also test whether the frame is too big, since
+    // that is another sign of a bad stack frame.
+    if (parent_frame_addr - child_frame_addr > kFrameSizeThreshold) {
+      break;
+    }
+
+    if ((parent_frame_addr & (kAlignment - 1)) != 0) {
+      // not aligned, so we keep it safe and assume frame is bogus
+      break;
+    }
+
+    prev_f = f;
+
+    f = adjust_fp(reinterpret_cast<frame*>(parent_frame_addr));
+  }
+  return i;
+}
+
+}  // namespace stacktrace_generic_fp
+}  // namespace
+
+#endif  // BASE_STACKTRACE_GENERIC_FP_INL_H_
+
+// Note: this part of the file is included several times.
+// Do not put globals below.
+
+// The following 4 functions are generated from the code below:
+//   GetStack{Trace,Frames}()
+//   GetStack{Trace,Frames}WithContext()
+//
+// These functions take the following args:
+//   void** result: the stack-trace, as an array
+//   int* sizes: the size of each stack frame, as an array
+//               (GetStackFrames* only)
+//   int max_depth: the size of the result (and sizes) array(s)
+//   int skip_count: how many stack pointers to skip before storing in result
+//   void* ucp: a ucontext_t* (GetStack{Trace,Frames}WithContext only)
+
+static int GET_STACK_TRACE_OR_FRAMES {
+#if IS_STACK_FRAMES
+  memset(sizes, 0, sizeof(*sizes) * max_depth);
+#endif
+
+  // one for this function
+  skip_count += 1;
+
+  void* const * initial_pc = nullptr;
+  void* initial_frame = __builtin_frame_address(0);
+
+#if IS_WITH_CONTEXT
+  if (ucp) {
+    auto uc = static_cast<const ucontext_t*>(ucp);
+#ifdef __riscv
+    initial_pc = reinterpret_cast<void* const *>(&uc->uc_mcontext.__gregs[REG_PC]);
+    initial_frame = reinterpret_cast<void*>(uc->uc_mcontext.__gregs[REG_S0]);
+#elif defined(__aarch64__)
+    initial_pc = reinterpret_cast<void* const *>(&uc->uc_mcontext.pc);
+    initial_frame = reinterpret_cast<void*>(uc->uc_mcontext.regs[29]);
+#else
+    initial_pc = reinterpret_cast<void* const *>(&uc->uc_mcontext.gregs[REG_RIP]);
+    initial_frame = reinterpret_cast<void*>(uc->uc_mcontext.gregs[REG_RBP]);
+#endif
+  }
+#endif  // IS_WITH_CONTEXT
+
+  int n = stacktrace_generic_fp::capture(result, max_depth, skip_count,
+                                         initial_frame, initial_pc);
+
+  // make sure we don't tail-call capture
+  (void)*(const_cast<void * volatile *>(result));
+  return n;
+}

diff --git a/src/stacktrace_instrument-inl.h b/src/stacktrace_instrument-inl.h
old mode 100755
new mode 100644


diff --git a/src/stacktrace_libgcc-inl.h b/src/stacktrace_libgcc-inl.h
new file mode 100644
index 0000000..ce9cf51
--- /dev/null
+++ b/src/stacktrace_libgcc-inl.h

@@ -0,0 +1,111 @@
+// -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
+// Copyright (c) 2016, gperftools Contributors
+// All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are
+// met:
+//
+//     * Redistributions of source code must retain the above copyright
+// notice, this list of conditions and the following disclaimer.
+//     * Redistributions in binary form must reproduce the above
+// copyright notice, this list of conditions and the following disclaimer
+// in the documentation and/or other materials provided with the
+// distribution.
+//     * Neither the name of Google Inc. nor the names of its
+// contributors may be used to endorse or promote products derived from
+// this software without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+// This file implements backtrace capturing via libgcc's
+// _Unwind_Backtrace. This works in nearly all cases. It can
+// occasionally fail when we try to capture a backtrace from a signal
+// handler (i.e. in the cpu profiler) while some C++ code is throwing
+// an exception.
+
+#ifndef BASE_STACKTRACE_LIBGCC_INL_H_
+#define BASE_STACKTRACE_LIBGCC_INL_H_
+// Note: this file is included into stacktrace.cc more than once.
+// Anything that should only be defined once should be here:
+
+extern "C" {
+#include <assert.h>
+#include <string.h>   // for memset()
+}
+
+#include <unwind.h>
+
+#include "gperftools/stacktrace.h"
+
+struct libgcc_backtrace_data {
+  void **array;
+  int skip;
+  int pos;
+  int limit;
+};
+
+static _Unwind_Reason_Code libgcc_backtrace_helper(struct _Unwind_Context *ctx,
+                                                   void *_data) {
+  libgcc_backtrace_data *data =
+    reinterpret_cast<libgcc_backtrace_data *>(_data);
+
+  if (data->skip > 0) {
+    data->skip--;
+    return _URC_NO_REASON;
+  }
+
+  if (data->pos < data->limit) {
+    void *ip = reinterpret_cast<void *>(_Unwind_GetIP(ctx));
+    data->array[data->pos++] = ip;
+  }
+
+  return _URC_NO_REASON;
+}
+
+#endif  // BASE_STACKTRACE_LIBGCC_INL_H_
+
+// Note: this part of the file is included several times.
+// Do not put globals below.
+
+// The following 4 functions are generated from the code below:
+//   GetStack{Trace,Frames}()
+//   GetStack{Trace,Frames}WithContext()
+//
+// These functions take the following args:
+//   void** result: the stack-trace, as an array
+//   int* sizes: the size of each stack frame, as an array
+//               (GetStackFrames* only)
+//   int max_depth: the size of the result (and sizes) array(s)
+//   int skip_count: how many stack pointers to skip before storing in result
+//   void* ucp: a ucontext_t* (GetStack{Trace,Frames}WithContext only)
+static int GET_STACK_TRACE_OR_FRAMES {
+  libgcc_backtrace_data data;
+  data.array = result;
+  // we're also skipping current and parent's frame
+  data.skip = skip_count + 2;
+  data.pos = 0;
+  data.limit = max_depth;
+
+  _Unwind_Backtrace(libgcc_backtrace_helper, &data);
+
+  if (data.pos > 1 && data.array[data.pos - 1] == NULL)
+    --data.pos;
+
+#if IS_STACK_FRAMES
+  // No implementation for finding out the stack frame sizes.
+  memset(sizes, 0, sizeof(*sizes) * data.pos);
+#endif
+
+  return data.pos;
+}

diff --git a/src/stacktrace_libunwind-inl.h b/src/stacktrace_libunwind-inl.h
index 8a4a731..6f361ec 100644
--- a/src/stacktrace_libunwind-inl.h
+++ b/src/stacktrace_libunwind-inl.h

@@ -47,6 +47,8 @@
 #include <libunwind.h>
 }
 #include "gperftools/stacktrace.h"
+
+#include "base/basictypes.h"
 #include "base/logging.h"
 
 // Sometimes, we can try to get a stack trace from within a stack
@@ -56,7 +58,7 @@
 // recursive request, we'd end up with infinite recursion or deadlock.
 // Luckily, it's safe to ignore those subsequent traces.  In such
 // cases, we return 0 to indicate the situation.
-static __thread int recursive;
+static __thread int recursive ATTR_INITIAL_EXEC;
 
 #if defined(TCMALLOC_ENABLE_UNWIND_FROM_UCONTEXT) && (defined(__i386__) || defined(__x86_64__)) && defined(__GNU_LIBRARY__)
 #define BASE_STACKTRACE_UNW_CONTEXT_IS_UCONTEXT 1

diff --git a/src/stacktrace_powerpc-darwin-inl.h b/src/stacktrace_powerpc-darwin-inl.h
index c4c2edb..3f6c367 100644
--- a/src/stacktrace_powerpc-darwin-inl.h
+++ b/src/stacktrace_powerpc-darwin-inl.h

@@ -1,3 +1,4 @@
+// -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2007, Google Inc.
 // All rights reserved.
 //
@@ -97,7 +98,11 @@
   // different asm syntax.  I don't know quite the best way to discriminate
   // systems using the old as from the new one; I've gone with __APPLE__.
   // TODO(csilvers): use autoconf instead, to look for 'as --version' == 1 or 2
+#ifdef __FreeBSD__
+  __asm__ volatile ("mr %0,1" : "=r" (sp));
+#else
   __asm__ volatile ("mr %0,r1" : "=r" (sp));
+#endif
 
   // On PowerPC, the "Link Register" or "Link Record" (LR), is a stack
   // entry that holds the return address of the subroutine call (what

diff --git a/src/stacktrace_powerpc-inl.h b/src/stacktrace_powerpc-inl.h
index 811d6cc..124bc4e 100644
--- a/src/stacktrace_powerpc-inl.h
+++ b/src/stacktrace_powerpc-inl.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2007, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/stacktrace_powerpc-linux-inl.h b/src/stacktrace_powerpc-linux-inl.h
index 5d16fa1..a301a46 100644
--- a/src/stacktrace_powerpc-linux-inl.h
+++ b/src/stacktrace_powerpc-linux-inl.h

@@ -44,6 +44,7 @@
 
 #include <stdint.h>   // for uintptr_t
 #include <stdlib.h>   // for NULL
+#include <signal.h>  // for siginfo_t
 #include <gperftools/stacktrace.h>
 #include <base/vdso_support.h>
 
@@ -52,7 +53,6 @@
 #elif defined(HAVE_UCONTEXT_H)
 #include <ucontext.h>  // for ucontext_t
 #endif
-typedef ucontext ucontext_t;
 
 // PowerPC64 Little Endian follows BE wrt. backchain, condition register,
 // and LR save area, so no need to adjust the reading struct.
@@ -201,7 +201,7 @@
         struct rt_signal_frame_32 {
           char dummy[64 + 16];
           siginfo_t info;
-          struct ucontext uc;
+          ucontext_t uc;
           // We don't care about the rest, since IP value is at 'uc' field.A
         } *sigframe = reinterpret_cast<rt_signal_frame_32*>(current);
         result[n] = (void*) sigframe->uc.uc_mcontext.uc_regs->gregs[PT_NIP];

diff --git a/src/static_vars.cc b/src/static_vars.cc
index 09d2b59..fef6ed1 100644
--- a/src/static_vars.cc
+++ b/src/static_vars.cc

@@ -43,6 +43,7 @@
 #include "sampler.h"           // for Sampler
 #include "getenv_safe.h"       // TCMallocGetenvSafe
 #include "base/googleinit.h"
+#include "maybe_threads.h"
 
 namespace tcmalloc {
 
@@ -51,31 +52,30 @@
 // sure the central_cache locks remain in a consisten state in the forked
 // version of the thread.
 
-void CentralCacheLockAll()
+void CentralCacheLockAll() NO_THREAD_SAFETY_ANALYSIS
 {
   Static::pageheap_lock()->Lock();
-  for (int i = 0; i < kNumClasses; ++i)
+  for (int i = 0; i < Static::num_size_classes(); ++i)
     Static::central_cache()[i].Lock();
 }
 
-void CentralCacheUnlockAll()
+void CentralCacheUnlockAll() NO_THREAD_SAFETY_ANALYSIS
 {
-  for (int i = 0; i < kNumClasses; ++i)
+  for (int i = 0; i < Static::num_size_classes(); ++i)
     Static::central_cache()[i].Unlock();
   Static::pageheap_lock()->Unlock();
 }
 #endif
 
+bool Static::inited_;
 SpinLock Static::pageheap_lock_(SpinLock::LINKER_INITIALIZED);
 SizeMap Static::sizemap_;
-CentralFreeListPadded Static::central_cache_[kNumClasses];
+CentralFreeListPadded Static::central_cache_[kClassSizesMax];
 PageHeapAllocator<Span> Static::span_allocator_;
 PageHeapAllocator<StackTrace> Static::stacktrace_allocator_;
 Span Static::sampled_objects_;
-PageHeapAllocator<StackTraceTable::Bucket> Static::bucket_allocator_;
 StackTrace* Static::growth_stacks_ = NULL;
-PageHeap* Static::pageheap_ = NULL;
-
+Static::PageHeapStorage Static::pageheap_;
 
 void Static::InitStaticVars() {
   sizemap_.Init();
@@ -83,43 +83,70 @@
   span_allocator_.New(); // Reduce cache conflicts
   span_allocator_.New(); // Reduce cache conflicts
   stacktrace_allocator_.Init();
-  bucket_allocator_.Init();
   // Do a bit of sanitizing: make sure central_cache is aligned properly
   CHECK_CONDITION((sizeof(central_cache_[0]) % 64) == 0);
-  for (int i = 0; i < kNumClasses; ++i) {
+  for (int i = 0; i < num_size_classes(); ++i) {
     central_cache_[i].Init(i);
   }
 
-  // It's important to have PageHeap allocated, not in static storage,
-  // so that HeapLeakChecker does not consider all the byte patterns stored
-  // in is caches as pointers that are sources of heap object liveness,
-  // which leads to it missing some memory leaks.
-  pageheap_ = new (MetaDataAlloc(sizeof(PageHeap))) PageHeap;
+  new (&pageheap_.memory) PageHeap;
+
+#if defined(ENABLE_AGGRESSIVE_DECOMMIT_BY_DEFAULT)
+  const bool kDefaultAggressiveDecommit = true;
+#else
+  const bool kDefaultAggressiveDecommit = false;
+#endif
+
 
   bool aggressive_decommit =
     tcmalloc::commandlineflags::StringToBool(
-      TCMallocGetenvSafe("TCMALLOC_AGGRESSIVE_DECOMMIT"), true);
+      TCMallocGetenvSafe("TCMALLOC_AGGRESSIVE_DECOMMIT"),
+                         kDefaultAggressiveDecommit);
 
-  pageheap_->SetAggressiveDecommit(aggressive_decommit);
+  pageheap()->SetAggressiveDecommit(aggressive_decommit);
+
+  inited_ = true;
 
   DLL_Init(&sampled_objects_);
-  Sampler::InitStatics();
 }
 
+void Static::InitLateMaybeRecursive() {
+#if defined(HAVE_FORK) && defined(HAVE_PTHREAD) \
+  && !defined(__APPLE__) && !defined(TCMALLOC_NO_ATFORK)
+  // OSX has its own way of handling atfork in malloc (see
+  // libc_override_osx.h).
+  //
+  // For other OSes we do pthread_atfork even if standard seemingly
+  // discourages pthread_atfork, asking apps to do only
+  // async-signal-safe calls between fork and exec.
+  //
+  // We're deliberately attempting to register atfork handlers as part
+  // of malloc initialization. So very early. This ensures that our
+  // handler is called last and that means fork will try to grab
+  // tcmalloc locks last avoiding possible issues with many other
+  // locks that are held around calls to malloc. I.e. if we don't do
+  // that, fork() grabbing malloc lock before such other lock would be
+  // prone to deadlock, if some other thread holds other lock and
+  // calls malloc.
+  //
+  // We still leave some way of disabling it via
+  // -DTCMALLOC_NO_ATFORK. It looks like on glibc even with fully
+  // static binaries malloc is really initialized very early. But I
+  // can see how combination of static linking and other libc-s could
+  // be less fortunate and allow some early app constructors to run
+  // before malloc is ever called.
 
-#if defined(HAVE_FORK) && defined(HAVE_PTHREAD)
+  perftools_pthread_atfork(
+    CentralCacheLockAll,    // parent calls before fork
+    CentralCacheUnlockAll,  // parent calls after fork
+    CentralCacheUnlockAll); // child calls after fork
+#endif
 
-static inline
-void SetupAtForkLocksHandler()
-{
-#if !defined(__APPLE__)
-  pthread_atfork(CentralCacheLockAll,    // parent calls before fork
-                 CentralCacheUnlockAll,  // parent calls after fork
-                 CentralCacheUnlockAll); // child calls after fork
+#ifndef NDEBUG
+  // pthread_atfork above may malloc sometimes. Let's ensure we test
+  // that malloc works from here.
+  free(malloc(1));
 #endif
 }
-REGISTER_MODULE_INITIALIZER(tcmalloc_fork_handler, SetupAtForkLocksHandler());
-
-#endif
 
 }  // namespace tcmalloc

diff --git a/src/static_vars.h b/src/static_vars.h
index c662e40..bef0180 100644
--- a/src/static_vars.h
+++ b/src/static_vars.h

@@ -37,6 +37,7 @@
 #define TCMALLOC_STATIC_VARS_H_
 
 #include <config.h>
+#include "base/basictypes.h"
 #include "base/spinlock.h"
 #include "central_freelist.h"
 #include "common.h"
@@ -54,6 +55,7 @@
 
   // Must be called before calling any of the accessors below.
   static void InitStaticVars();
+  static void InitLateMaybeRecursive();
 
   // Central cache -- an array of free-lists, one per size-class.
   // We have a separate lock per free-list to reduce contention.
@@ -61,12 +63,14 @@
 
   static SizeMap* sizemap() { return &sizemap_; }
 
+  static unsigned num_size_classes() { return sizemap_.num_size_classes; }
+
   //////////////////////////////////////////////////////////////////////
   // In addition to the explicit initialization comment, the variables below
   // must be protected by pageheap_lock.
 
   // Page-level allocator.
-  static PageHeap* pageheap() { return pageheap_; }
+  static PageHeap* pageheap() { return reinterpret_cast<PageHeap *>(&pageheap_.memory); }
 
   static PageHeapAllocator<Span>* span_allocator() { return &span_allocator_; }
 
@@ -79,35 +83,42 @@
 
   // State kept for sampled allocations (/pprof/heap support)
   static Span* sampled_objects() { return &sampled_objects_; }
-  static PageHeapAllocator<StackTraceTable::Bucket>* bucket_allocator() {
-    return &bucket_allocator_;
-  }
 
   // Check if InitStaticVars() has been run.
-  static bool IsInited() { return pageheap() != NULL; }
+  static bool IsInited() { return inited_; }
 
  private:
-  static SpinLock pageheap_lock_;
+  // some unit tests depend on this and link to static vars
+  // imperfectly. Thus we keep those unhidden for now. Thankfully
+  // they're not performance-critical.
+  /* ATTRIBUTE_HIDDEN */ static bool inited_;
+  /* ATTRIBUTE_HIDDEN */ static SpinLock pageheap_lock_;
 
   // These static variables require explicit initialization.  We cannot
   // count on their constructors to do any initialization because other
   // static variables may try to allocate memory before these variables
   // can run their constructors.
 
-  static SizeMap sizemap_;
-  static CentralFreeListPadded central_cache_[kNumClasses];
-  static PageHeapAllocator<Span> span_allocator_;
-  static PageHeapAllocator<StackTrace> stacktrace_allocator_;
-  static Span sampled_objects_;
-  static PageHeapAllocator<StackTraceTable::Bucket> bucket_allocator_;
+  ATTRIBUTE_HIDDEN static SizeMap sizemap_;
+  ATTRIBUTE_HIDDEN static CentralFreeListPadded central_cache_[kClassSizesMax];
+  ATTRIBUTE_HIDDEN static PageHeapAllocator<Span> span_allocator_;
+  ATTRIBUTE_HIDDEN static PageHeapAllocator<StackTrace> stacktrace_allocator_;
+  ATTRIBUTE_HIDDEN static Span sampled_objects_;
 
   // Linked list of stack traces recorded every time we allocated memory
   // from the system.  Useful for finding allocation sites that cause
   // increase in the footprint of the system.  The linked list pointer
   // is stored in trace->stack[kMaxStackDepth-1].
-  static StackTrace* growth_stacks_;
+  ATTRIBUTE_HIDDEN static StackTrace* growth_stacks_;
 
-  static PageHeap* pageheap_;
+  // PageHeap uses a constructor for initialization.  Like the members above,
+  // we can't depend on initialization order, so pageheap is new'd
+  // into this buffer.
+  union PageHeapStorage {
+    char memory[sizeof(PageHeap)];
+    uintptr_t extra;  // To force alignment
+  };
+  ATTRIBUTE_HIDDEN static PageHeapStorage pageheap_;
 };
 
 }  // namespace tcmalloc

diff --git a/src/symbolize.cc b/src/symbolize.cc
old mode 100755
new mode 100644
index a27106e..8c94c18
--- a/src/symbolize.cc
+++ b/src/symbolize.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2009, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -60,19 +60,24 @@
 #include "base/commandlineflags.h"
 #include "base/logging.h"
 #include "base/sysinfo.h"
+#if defined(__FreeBSD__)
+#include <sys/sysctl.h>
+#endif
 
 using std::string;
 using tcmalloc::DumpProcSelfMaps;   // from sysinfo.h
 
-
-DEFINE_string(symbolize_pprof,
-              EnvToString("PPROF_PATH", "pprof"),
-              "Path to pprof to call for reporting function names.");
-
-// heap_profile_table_pprof may be referenced after destructors are
+// pprof may be used after destructors are
 // called (since that's when leak-checking is done), so we make
 // a more-permanent copy that won't ever get destroyed.
-static string* g_pprof_path = new string(FLAGS_symbolize_pprof);
+static char* get_pprof_path() {
+  static char* result = ([] () {
+      string pprof_string = EnvToString("PPROF_PATH", "pprof-symbolize");
+      return strdup(pprof_string.c_str());
+    })();
+
+  return result;
+}
 
 // Returns NULL if we're on an OS where we can't get the invocation name.
 // Using a static var is ok because we're not called from a thread.
@@ -94,6 +99,13 @@
       return NULL;
   }
   return program_invocation_name;
+#elif defined(__FreeBSD__)
+  static char program_invocation_name[PATH_MAX];
+  size_t len = sizeof(program_invocation_name);
+  static const int name[4] = { CTL_KERN, KERN_PROC, KERN_PROC_PATHNAME, -1 };
+  if (!sysctl(name, 4, program_invocation_name, &len, NULL, 0))
+    return program_invocation_name;
+  return NULL;
 #else
   return NULL;   // figure out a way to get argv[0]
 #endif
@@ -134,7 +146,7 @@
     PrintError("Cannot figure out the name of this executable (argv0)");
     return 0;
   }
-  if (access(g_pprof_path->c_str(), R_OK) != 0) {
+  if (access(get_pprof_path(), R_OK) != 0) {
     PrintError("Cannot find 'pprof' (is PPROF_PATH set correctly?)");
     return 0;
   }
@@ -196,7 +208,7 @@
       unsetenv("HEAPPROFILE");
       unsetenv("HEAPCHECK");
       unsetenv("PERFTOOLS_VERBOSE");
-      execlp(g_pprof_path->c_str(), g_pprof_path->c_str(),
+      execlp(get_pprof_path(), get_pprof_path(),
              "--symbols", argv0, NULL);
       _exit(3);  // if execvp fails, it's bad news for us
     }
@@ -238,6 +250,7 @@
       }
       write(child_in[1], pprof_buffer, strlen(pprof_buffer));
       close(child_in[1]);             // that's all we need to write
+      delete[] pprof_buffer;
 
       const int kSymbolBufferSize = kSymbolSize * symbolization_table_.size();
       int total_bytes_read = 0;
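
Note on the get_pprof_path() change above: the sketch below shows the same idea in isolation, a function-local static initialized once by an immediately invoked lambda, holding a heap copy that is deliberately never freed so it stays valid while static destructors run. The names tool_path, DEMO_TOOL_PATH and "demo-tool" are hypothetical and are not part of gperftools.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper (not gperftools API): resolves a tool path once and
// returns the same pointer on every later call.
static const char* tool_path() {
  // The lambda runs exactly once, on the first call; C++11 guarantees that
  // initialization of a function-local static is thread-safe.
  static const char* result = ([]() -> const char* {
      const char* env = std::getenv("DEMO_TOOL_PATH");  // assumed variable name
      std::string value = (env != nullptr && *env != '\0') ? env : "demo-tool";
      char* copy = static_cast<char*>(std::malloc(value.size() + 1));
      if (copy == nullptr) return "demo-tool";  // fall back to a literal
      std::memcpy(copy, value.c_str(), value.size() + 1);
      return copy;  // intentionally leaked so it outlives static destructors
    })();
  return result;
}
```

Because the result is a plain pointer rather than an object with a destructor, nothing is torn down at exit, which is the property the comment in the diff asks for.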

diff --git a/src/symbolize.h b/src/symbolize.h
index 728d073..aa0aa33 100644
--- a/src/symbolize.h
+++ b/src/symbolize.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2009, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/system-alloc.cc b/src/system-alloc.cc
old mode 100755
new mode 100644
index e61c087..e84a5f1
--- a/src/system-alloc.cc
+++ b/src/system-alloc.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2005, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -62,6 +62,14 @@
 # define MAP_ANONYMOUS MAP_ANON
 #endif
 
+// Linux added support for MADV_FREE in 4.5 but we aren't ready to use it
+// yet. Among other things, using compile-time detection leads to poor
+// results when compiling on a system with MADV_FREE and running on a
+// system without it. See https://github.com/gperftools/gperftools/issues/780.
+#if defined(__linux__) && defined(MADV_FREE) && !defined(TCMALLOC_USE_MADV_FREE)
+# undef MADV_FREE
+#endif
+
 // MADV_FREE is specifically designed for use by malloc(), but only
 // FreeBSD supports it; in linux we fall back to the somewhat inferior
 // MADV_DONTNEED.
@@ -88,31 +96,18 @@
 using tcmalloc::kLog;
 using tcmalloc::Log;
 
-// Anonymous namespace to avoid name conflicts on "CheckAddressBits".
-namespace {
-
 // Check that no bit is set at position ADDRESS_BITS or higher.
-template <int ADDRESS_BITS> bool CheckAddressBits(uintptr_t ptr) {
-  return (ptr >> ADDRESS_BITS) == 0;
+static bool CheckAddressBits(uintptr_t ptr) {
+  bool always_ok = (kAddressBits == 8 * sizeof(void*));
+  // this is a bit insane but otherwise we get a compiler warning about
+  // shifting right by the word size even if this code is dead :(
+  int shift_bits = always_ok ? 0 : kAddressBits;
+  return always_ok || ((ptr >> shift_bits) == 0);
 }
 
-// Specialize for the bit width of a pointer to avoid undefined shift.
-template <> bool CheckAddressBits<8 * sizeof(void*)>(uintptr_t ptr) {
-  return true;
-}
-
-}  // Anonymous namespace to avoid name conflicts on "CheckAddressBits".
-
 COMPILE_ASSERT(kAddressBits <= 8 * sizeof(void*),
                address_bits_larger_than_pointer_size);
 
-// Structure for discovering alignment
-union MemoryAligner {
-  void*  p;
-  double d;
-  size_t s;
-} CACHELINE_ALIGNED;
-
 static SpinLock spinlock(SpinLock::LINKER_INITIALIZED);
 
 #if defined(HAVE_MMAP) || defined(MADV_FREE)
@@ -121,7 +116,7 @@
 #endif
 
 // The current system allocator
-SysAllocator* sys_alloc = NULL;
+SysAllocator* tcmalloc_sys_alloc = NULL;
 
 // Number of bytes taken from system.
 size_t TCMalloc_SystemTaken = 0;
@@ -153,7 +148,10 @@
   }
   void* Alloc(size_t size, size_t *actual_size, size_t alignment);
 };
-static char sbrk_space[sizeof(SbrkSysAllocator)];
+static union {
+  char buf[sizeof(SbrkSysAllocator)];
+  void *ptr;
+} sbrk_space;
 
 class MmapSysAllocator : public SysAllocator {
 public:
@@ -161,7 +159,10 @@
   }
   void* Alloc(size_t size, size_t *actual_size, size_t alignment);
 };
-static char mmap_space[sizeof(MmapSysAllocator)];
+static union {
+  char buf[sizeof(MmapSysAllocator)];
+  void *ptr;
+} mmap_space;
 
 class DevMemSysAllocator : public SysAllocator {
 public:
@@ -195,7 +196,10 @@
   SysAllocator* allocs_[kMaxAllocators];
   const char* names_[kMaxAllocators];
 };
-static char default_space[sizeof(DefaultSysAllocator)];
+static union {
+  char buf[sizeof(DefaultSysAllocator)];
+  void *ptr;
+} default_space;
 static const char sbrk_name[] = "SbrkSysAllocator";
 static const char mmap_name[] = "MmapSysAllocator";
 
@@ -455,8 +459,8 @@
 
 static bool system_alloc_inited = false;
 void InitSystemAllocators(void) {
-  MmapSysAllocator *mmap = new (mmap_space) MmapSysAllocator();
-  SbrkSysAllocator *sbrk = new (sbrk_space) SbrkSysAllocator();
+  MmapSysAllocator *mmap = new (mmap_space.buf) MmapSysAllocator();
+  SbrkSysAllocator *sbrk = new (sbrk_space.buf) SbrkSysAllocator();
 
   // In 64-bit debug mode, place the mmap allocator first since it
   // allocates pointers that do not fit in 32 bits and therefore gives
@@ -465,7 +469,7 @@
   // likely to look like pointers and therefore the conservative gc in
   // the heap-checker is less likely to misinterpret a number as a
   // pointer).
-  DefaultSysAllocator *sdef = new (default_space) DefaultSysAllocator();
+  DefaultSysAllocator *sdef = new (default_space.buf) DefaultSysAllocator();
   if (kDebugMode && sizeof(void*) > 4) {
     sdef->SetChildAllocator(mmap, 0, mmap_name);
     sdef->SetChildAllocator(sbrk, 1, sbrk_name);
@@ -474,7 +478,7 @@
     sdef->SetChildAllocator(mmap, 1, mmap_name);
   }
 
-  sys_alloc = tc_get_sysalloc_override(sdef);
+  tcmalloc_sys_alloc = tc_get_sysalloc_override(sdef);
 }
 
 void* TCMalloc_SystemAlloc(size_t size, size_t *actual_size,
@@ -497,11 +501,10 @@
     actual_size = &actual_size_storage;
   }
 
-  void* result = sys_alloc->Alloc(size, actual_size, alignment);
+  void* result = tcmalloc_sys_alloc->Alloc(size, actual_size, alignment);
   if (result != NULL) {
     CHECK_CONDITION(
-      CheckAddressBits<kAddressBits>(
-        reinterpret_cast<uintptr_t>(result) + *actual_size - 1));
+      CheckAddressBits(reinterpret_cast<uintptr_t>(result) + *actual_size - 1));
     TCMalloc_SystemTaken += *actual_size;
   }
   return result;
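
Note on the CheckAddressBits() rewrite above: shifting an integer by its full bit width is undefined behavior in C++, which is why the new code substitutes a shift of 0 when the address width equals the pointer width. A minimal standalone sketch of that guard follows; FitsInAddressBits and its runtime bits parameter are illustrative assumptions, whereas the real function reads the compile-time constant kAddressBits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true when no bit at position `bits` or
// higher is set in `ptr`.
static bool FitsInAddressBits(std::uintptr_t ptr, int bits) {
  const int kWordBits = 8 * static_cast<int>(sizeof(std::uintptr_t));
  assert(bits > 0 && bits <= kWordBits);
  // When every bit of the word is a valid address bit there is nothing to
  // check, and shifting by kWordBits would be undefined, so shift by 0.
  const bool always_ok = (bits == kWordBits);
  const int shift_bits = always_ok ? 0 : bits;
  return always_ok || ((ptr >> shift_bits) == 0);
}
```

For example, FitsInAddressBits(0xFFFF, 16) holds, while FitsInAddressBits(0x10000, 16) does not because bit 16 is set.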

diff --git a/src/system-alloc.h b/src/system-alloc.h
index 8233f96..e88948d 100644
--- a/src/system-alloc.h
+++ b/src/system-alloc.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2005, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -84,7 +84,7 @@
 void TCMalloc_SystemCommit(void* start, size_t length);
 
 // The current system allocator.
-extern PERFTOOLS_DLL_DECL SysAllocator* sys_alloc;
+extern PERFTOOLS_DLL_DECL SysAllocator* tcmalloc_sys_alloc;
 
 // Number of bytes taken from system.
 extern PERFTOOLS_DLL_DECL size_t TCMalloc_SystemTaken;

diff --git a/src/tcmalloc.cc b/src/tcmalloc.cc
index b7d1913..9ec663e 100644
--- a/src/tcmalloc.cc
+++ b/src/tcmalloc.cc

@@ -34,7 +34,7 @@
 // A malloc that uses a per-thread cache to satisfy small malloc requests.
 // (The time for malloc/free of a small object drops from 300 ns to 50 ns.)
 //
-// See doc/tcmalloc.html for a high-level
+// See docs/tcmalloc.html for a high-level
 // description of how this malloc works.
 //
 // SYNCHRONIZATION
@@ -88,12 +88,12 @@
 //   goes from about 1100 ns to about 300 ns.
 
 #include "config.h"
+// At least for gcc on Linux/i386 and Linux/amd64 not adding throw()
+// to tc_xxx functions actually ends up generating better code.
+#define PERFTOOLS_NOTHROW
 #include <gperftools/tcmalloc.h>
 
 #include <errno.h>                      // for ENOMEM, EINVAL, errno
-#ifdef HAVE_SYS_CDEFS_H
-#include <sys/cdefs.h>                  // for __THROW
-#endif
 #if defined HAVE_STDINT_H
 #include <stdint.h>
 #elif defined HAVE_INTTYPES_H
@@ -114,6 +114,7 @@
 
 #include <gperftools/malloc_extension.h>
 #include <gperftools/malloc_hook.h>         // for MallocHook
+#include <gperftools/nallocx.h>
 #include "base/basictypes.h"            // for int64
 #include "base/commandlineflags.h"      // for RegisterFlagValidator, etc
 #include "base/dynamic_annotations.h"   // for RunningOnValgrind
@@ -132,16 +133,7 @@
 #include "tcmalloc_guard.h"    // for TCMallocGuard
 #include "thread_cache.h"      // for ThreadCache
 
-#ifdef __clang__
-// clang's apparent focus on code size somehow causes it to ignore
-// normal inline directives even for few functions which inlining is
-// key for performance. In order to get performance of clang's
-// generated code closer to normal, we're forcing inlining via
-// attribute.
-#define ALWAYS_INLINE inline __attribute__((always_inline))
-#else
-#define ALWAYS_INLINE inline
-#endif
+#include "maybe_emergency_malloc.h"
 
 #if (defined(_WIN32) && !defined(__CYGWIN__) && !defined(__CYGWIN32__)) && !defined(WIN32_OVERRIDE_ALLOCATORS)
 # define WIN32_DO_PATCHING 1
@@ -150,20 +142,13 @@
 // Some windows file somewhere (at least on cygwin) #define's small (!)
 #undef small
 
-using STL_NAMESPACE::max;
-using STL_NAMESPACE::numeric_limits;
-using STL_NAMESPACE::vector;
+using std::max;
+using std::min;
+using std::numeric_limits;
+using std::vector;
 
 #include "libc_override.h"
 
-// __THROW is defined in glibc (via <sys/cdefs.h>).  It means,
-// counter-intuitively, "This function will never throw an exception."
-// It's an optional optimization tool, but we may need to use it to
-// match glibc prototypes.
-#ifndef __THROW    // I guess we're not on a glibc system
-# define __THROW   // __THROW is just an optimization, so ok to make it ""
-#endif
-
 using tcmalloc::AlignmentForSize;
 using tcmalloc::kLog;
 using tcmalloc::kCrash;
@@ -177,8 +162,30 @@
 using tcmalloc::Static;
 using tcmalloc::ThreadCache;
 
-DECLARE_int64(tcmalloc_sample_parameter);
 DECLARE_double(tcmalloc_release_rate);
+DECLARE_int64(tcmalloc_heap_limit_mb);
+
+// These common architectures are known to be safe w.r.t. aliasing a function
+// with "extra" unused args to a function with fewer arguments (e.g.
+// tc_delete_nothrow being aliased to tc_delete).
+//
+// The benefit of aliasing is relatively moderate. It reduces instruction
+// cache pressure a bit (not relevant for the largely unused
+// tc_delete_nothrow, but potentially relevant for
+// tc_delete_aligned (or sized)). It also used to be the case that gcc
+// 5+ optimization for merging identical functions kicked in and
+// "screwed" one of the otherwise identical functions with an extra
+// jump. I am not able to reproduce that anymore.
+#if !defined(__i386__) && !defined(__x86_64__) && \
+    !defined(__ppc__) && !defined(__PPC__) && \
+    !defined(__aarch64__) && !defined(__mips__) && !defined(__arm__)
+#undef TCMALLOC_NO_ALIASES
+#define TCMALLOC_NO_ALIASES
+#endif
+
+#if defined(__GNUC__) && defined(__ELF__) && !defined(TCMALLOC_NO_ALIASES)
+#define TC_ALIAS(name) __attribute__((alias(#name)))
+#endif
 
 // For windows, the printf we use to report large allocs is
 // potentially dangerous: it could cause a malloc that would cause an
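
Note on the TC_ALIAS macro added in the hunk above: the sketch below shows, under the same GCC/ELF assumption, how a function that takes an extra, unused argument can be made an alias of a function with fewer arguments, with a plain forwarding body as the fallback. DEMO_ALIAS, demo_release and demo_release_sized are hypothetical names, and the extra-argument call relies on the calling conventions of the architectures listed in the diff.

```cpp
#include <cstddef>

// Same availability test as the TC_ALIAS definition in the diff.
#if defined(__GNUC__) && defined(__ELF__)
#define DEMO_ALIAS(name) __attribute__((alias(#name)))
#endif

static int g_release_calls = 0;  // counts calls, for demonstration only

extern "C" {

// The "real" implementation, taking only the pointer.
void demo_release(void* p) {
  (void)p;
  ++g_release_calls;
}

// Variant with an extra argument that the implementation ignores.
void demo_release_sized(void* p, std::size_t size)
#ifdef DEMO_ALIAS
    DEMO_ALIAS(demo_release);  // both symbols share one function body
#else
{
  (void)size;
  demo_release(p);  // portable fallback: an ordinary forwarding call
}
#endif

}  // extern "C"
```

Calling demo_release_sized(ptr, 16) therefore runs the body of demo_release either way. Recent GCC versions may warn about the mismatched signatures of an alias and its target.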
@@ -213,64 +220,97 @@
 // MallocHook::GetCallerStackTrace can function accurately.
 #ifndef _WIN32   // windows doesn't have attribute_section, so don't bother
 extern "C" {
-  void* tc_malloc(size_t size) __THROW
+  void* tc_malloc(size_t size) PERFTOOLS_NOTHROW
       ATTRIBUTE_SECTION(google_malloc);
-  void tc_free(void* ptr) __THROW
+  void tc_free(void* ptr) PERFTOOLS_NOTHROW
       ATTRIBUTE_SECTION(google_malloc);
-  void* tc_realloc(void* ptr, size_t size) __THROW
+  void tc_free_sized(void* ptr, size_t size) PERFTOOLS_NOTHROW
       ATTRIBUTE_SECTION(google_malloc);
-  void* tc_calloc(size_t nmemb, size_t size) __THROW
+  void* tc_realloc(void* ptr, size_t size) PERFTOOLS_NOTHROW
       ATTRIBUTE_SECTION(google_malloc);
-  void tc_cfree(void* ptr) __THROW
+  void* tc_calloc(size_t nmemb, size_t size) PERFTOOLS_NOTHROW
+      ATTRIBUTE_SECTION(google_malloc);
+  void tc_cfree(void* ptr) PERFTOOLS_NOTHROW
       ATTRIBUTE_SECTION(google_malloc);
 
-  void* tc_memalign(size_t __alignment, size_t __size) __THROW
+  void* tc_memalign(size_t __alignment, size_t __size) PERFTOOLS_NOTHROW
       ATTRIBUTE_SECTION(google_malloc);
-  int tc_posix_memalign(void** ptr, size_t align, size_t size) __THROW
+  int tc_posix_memalign(void** ptr, size_t align, size_t size) PERFTOOLS_NOTHROW
       ATTRIBUTE_SECTION(google_malloc);
-  void* tc_valloc(size_t __size) __THROW
+  void* tc_valloc(size_t __size) PERFTOOLS_NOTHROW
       ATTRIBUTE_SECTION(google_malloc);
-  void* tc_pvalloc(size_t __size) __THROW
+  void* tc_pvalloc(size_t __size) PERFTOOLS_NOTHROW
       ATTRIBUTE_SECTION(google_malloc);
 
-  void tc_malloc_stats(void) __THROW
+  void tc_malloc_stats(void) PERFTOOLS_NOTHROW
       ATTRIBUTE_SECTION(google_malloc);
-  int tc_mallopt(int cmd, int value) __THROW
+  int tc_mallopt(int cmd, int value) PERFTOOLS_NOTHROW
       ATTRIBUTE_SECTION(google_malloc);
 #ifdef HAVE_STRUCT_MALLINFO
-  struct mallinfo tc_mallinfo(void) __THROW
+  struct mallinfo tc_mallinfo(void) PERFTOOLS_NOTHROW
       ATTRIBUTE_SECTION(google_malloc);
 #endif
 
   void* tc_new(size_t size)
       ATTRIBUTE_SECTION(google_malloc);
-  void tc_delete(void* p) __THROW
+  void tc_delete(void* p) PERFTOOLS_NOTHROW
+      ATTRIBUTE_SECTION(google_malloc);
+  void tc_delete_sized(void* p, size_t size) PERFTOOLS_NOTHROW
       ATTRIBUTE_SECTION(google_malloc);
   void* tc_newarray(size_t size)
       ATTRIBUTE_SECTION(google_malloc);
-  void tc_deletearray(void* p) __THROW
+  void tc_deletearray(void* p) PERFTOOLS_NOTHROW
+      ATTRIBUTE_SECTION(google_malloc);
+  void tc_deletearray_sized(void* p, size_t size) PERFTOOLS_NOTHROW
       ATTRIBUTE_SECTION(google_malloc);
 
   // And the nothrow variants of these:
-  void* tc_new_nothrow(size_t size, const std::nothrow_t&) __THROW
+  void* tc_new_nothrow(size_t size, const std::nothrow_t&) PERFTOOLS_NOTHROW
       ATTRIBUTE_SECTION(google_malloc);
-  void* tc_newarray_nothrow(size_t size, const std::nothrow_t&) __THROW
+  void* tc_newarray_nothrow(size_t size, const std::nothrow_t&) PERFTOOLS_NOTHROW
       ATTRIBUTE_SECTION(google_malloc);
   // Surprisingly, standard C++ library implementations use a
   // nothrow-delete internally.  See, eg:
   // http://www.dinkumware.com/manuals/?manual=compleat&page=new.html
-  void tc_delete_nothrow(void* ptr, const std::nothrow_t&) __THROW
+  void tc_delete_nothrow(void* ptr, const std::nothrow_t&) PERFTOOLS_NOTHROW
       ATTRIBUTE_SECTION(google_malloc);
-  void tc_deletearray_nothrow(void* ptr, const std::nothrow_t&) __THROW
+  void tc_deletearray_nothrow(void* ptr, const std::nothrow_t&) PERFTOOLS_NOTHROW
       ATTRIBUTE_SECTION(google_malloc);
 
+#if defined(ENABLE_ALIGNED_NEW_DELETE)
+
+  void* tc_new_aligned(size_t size, std::align_val_t al)
+      ATTRIBUTE_SECTION(google_malloc);
+  void tc_delete_aligned(void* p, std::align_val_t al) PERFTOOLS_NOTHROW
+      ATTRIBUTE_SECTION(google_malloc);
+  void tc_delete_sized_aligned(void* p, size_t size, std::align_val_t al) PERFTOOLS_NOTHROW
+      ATTRIBUTE_SECTION(google_malloc);
+  void* tc_newarray_aligned(size_t size, std::align_val_t al)
+      ATTRIBUTE_SECTION(google_malloc);
+  void tc_deletearray_aligned(void* p, std::align_val_t al) PERFTOOLS_NOTHROW
+      ATTRIBUTE_SECTION(google_malloc);
+  void tc_deletearray_sized_aligned(void* p, size_t size, std::align_val_t al) PERFTOOLS_NOTHROW
+      ATTRIBUTE_SECTION(google_malloc);
+
+  // And the nothrow variants of these:
+  void* tc_new_aligned_nothrow(size_t size, std::align_val_t al, const std::nothrow_t&) PERFTOOLS_NOTHROW
+      ATTRIBUTE_SECTION(google_malloc);
+  void* tc_newarray_aligned_nothrow(size_t size, std::align_val_t al, const std::nothrow_t&) PERFTOOLS_NOTHROW
+      ATTRIBUTE_SECTION(google_malloc);
+  void tc_delete_aligned_nothrow(void* ptr, std::align_val_t al, const std::nothrow_t&) PERFTOOLS_NOTHROW
+      ATTRIBUTE_SECTION(google_malloc);
+  void tc_deletearray_aligned_nothrow(void* ptr, std::align_val_t al, const std::nothrow_t&) PERFTOOLS_NOTHROW
+      ATTRIBUTE_SECTION(google_malloc);
+
+#endif // defined(ENABLE_ALIGNED_NEW_DELETE)
+
   // Some non-standard extensions that we support.
 
   // This is equivalent to
   //    OS X: malloc_size()
   //    glibc: malloc_usable_size()
   //    Windows: _msize()
-  size_t tc_malloc_size(void* p) __THROW
+  size_t tc_malloc_size(void* p) PERFTOOLS_NOTHROW
       ATTRIBUTE_SECTION(google_malloc);
 }  // extern "C"
 #endif  // #ifndef _WIN32
@@ -285,7 +325,11 @@
 // the pagemap cache has a non-zero sizeclass.) This is a cheap (source-editing
 // required) kind of exception handling for these routines.
 namespace {
-void InvalidFree(void* ptr) {
+ATTRIBUTE_NOINLINE void InvalidFree(void* ptr) {
+  if (tcmalloc::IsEmergencyPtr(ptr)) {
+    tcmalloc::EmergencyFree(ptr);
+    return;
+  }
   Log(kCrash, __FILE__, __LINE__, "Attempt to free invalid pointer", ptr);
 }
 
@@ -320,7 +364,7 @@
                          PageHeap::LargeSpanStats* large_spans) {
   r->central_bytes = 0;
   r->transfer_bytes = 0;
-  for (int cl = 0; cl < kNumClasses; ++cl) {
+  for (int cl = 0; cl < Static::num_size_classes(); ++cl) {
     const int length = Static::central_cache()[cl].length();
     const int tc_length = Static::central_cache()[cl].tc_length();
     const size_t cache_overhead = Static::central_cache()[cl].OverheadBytes();
@@ -359,7 +403,7 @@
 // WRITE stats to "out"
 static void DumpStats(TCMalloc_Printer* out, int level) {
   TCMallocStats stats;
-  uint64_t class_count[kNumClasses];
+  uint64_t class_count[kClassSizesMax];
   PageHeap::SmallSpanStats small;
   PageHeap::LargeSpanStats large;
   if (level >= 2) {
@@ -425,18 +469,25 @@
     out->printf("Total size of freelists for per-thread caches,\n");
     out->printf("transfer cache, and central cache, by size class\n");
     out->printf("------------------------------------------------\n");
-    uint64_t cumulative = 0;
-    for (int cl = 0; cl < kNumClasses; ++cl) {
+    uint64_t cumulative_bytes = 0;
+    uint64_t cumulative_overhead = 0;
+    for (uint32 cl = 0; cl < Static::num_size_classes(); ++cl) {
       if (class_count[cl] > 0) {
-        uint64_t class_bytes =
-            class_count[cl] * Static::sizemap()->ByteSizeForClass(cl);
-        cumulative += class_bytes;
-        out->printf("class %3d [ %8" PRIuS " bytes ] : "
-                "%8" PRIu64 " objs; %5.1f MiB; %5.1f cum MiB\n",
-                cl, Static::sizemap()->ByteSizeForClass(cl),
+        size_t cl_size = Static::sizemap()->ByteSizeForClass(cl);
+        const uint64_t class_bytes = class_count[cl] * cl_size;
+        cumulative_bytes += class_bytes;
+        const uint64_t class_overhead =
+            Static::central_cache()[cl].OverheadBytes();
+        cumulative_overhead += class_overhead;
+        out->printf("class %3d [ %8zu bytes ] : "
+                "%8" PRIu64 " objs; %5.1f MiB; %5.1f cum MiB; "
+                "%8.3f overhead MiB; %8.3f cum overhead MiB\n",
+                cl, cl_size,
                 class_count[cl],
                 class_bytes / MiB,
-                cumulative / MiB);
+                cumulative_bytes / MiB,
+                class_overhead / MiB,
+                cumulative_overhead / MiB);
       }
     }
 
@@ -454,9 +505,9 @@
     out->printf("------------------------------------------------\n");
     uint64_t total_normal = 0;
     uint64_t total_returned = 0;
-    for (int s = 0; s < kMaxPages; s++) {
-      const int n_length = small.normal_length[s];
-      const int r_length = small.returned_length[s];
+    for (int s = 1; s <= kMaxPages; s++) {
+      const int n_length = small.normal_length[s - 1];
+      const int r_length = small.returned_length[s - 1];
       if (n_length + r_length > 0) {
         uint64_t n_pages = s * n_length;
         uint64_t r_pages = s * r_length;
@@ -475,8 +526,9 @@
 
     total_normal += large.normal_pages;
     total_returned += large.returned_pages;
-    out->printf(">255   large * %6u spans ~ %6.1f MiB; %6.1f MiB cum"
+    out->printf(">%-5u large * %6u spans ~ %6.1f MiB; %6.1f MiB cum"
                 "; unmapped: %6.1f MiB; %6.1f MiB cum\n",
+                static_cast<unsigned int>(kMaxPages),
                 static_cast<unsigned int>(large.spans),
                 PagesToMiB(large.normal_pages + large.returned_pages),
                 PagesToMiB(total_normal + total_returned),
@@ -630,6 +682,17 @@
     return DumpHeapGrowthStackTraces();
   }
 
+  virtual size_t GetThreadCacheSize() {
+    ThreadCache* tc = ThreadCache::GetCacheIfPresent();
+    if (!tc)
+      return 0;
+    return tc->Size();
+  }
+
+  virtual void MarkThreadTemporarilyIdle() {
+    ThreadCache::BecomeTemporarilyIdle();
+  }
+
   virtual void Ranges(void* arg, RangeFunction func) {
     IterateOverRanges(arg, func);
   }
@@ -656,6 +719,14 @@
       return true;
     }
 
+    if (strcmp(name, "generic.total_physical_bytes") == 0) {
+      TCMallocStats stats;
+      ExtractStats(&stats, NULL, NULL, NULL);
+      *value = stats.pageheap.system_bytes + stats.metadata_bytes -
+               stats.pageheap.unmapped_bytes;
+      return true;
+    }
+
     if (strcmp(name, "tcmalloc.slack_bytes") == 0) {
       // Kept for backwards compatibility.  Now defined externally as:
       //    pageheap_free_bytes + pageheap_unmapped_bytes.
@@ -698,6 +769,54 @@
       return true;
     }
 
+    if (strcmp(name, "tcmalloc.pageheap_committed_bytes") == 0) {
+      SpinLockHolder l(Static::pageheap_lock());
+      *value = Static::pageheap()->stats().committed_bytes;
+      return true;
+    }
+
+    if (strcmp(name, "tcmalloc.pageheap_scavenge_count") == 0) {
+      SpinLockHolder l(Static::pageheap_lock());
+      *value = Static::pageheap()->stats().scavenge_count;
+      return true;
+    }
+
+    if (strcmp(name, "tcmalloc.pageheap_commit_count") == 0) {
+      SpinLockHolder l(Static::pageheap_lock());
+      *value = Static::pageheap()->stats().commit_count;
+      return true;
+    }
+
+    if (strcmp(name, "tcmalloc.pageheap_total_commit_bytes") == 0) {
+      SpinLockHolder l(Static::pageheap_lock());
+      *value = Static::pageheap()->stats().total_commit_bytes;
+      return true;
+    }
+
+    if (strcmp(name, "tcmalloc.pageheap_decommit_count") == 0) {
+      SpinLockHolder l(Static::pageheap_lock());
+      *value = Static::pageheap()->stats().decommit_count;
+      return true;
+    }
+
+    if (strcmp(name, "tcmalloc.pageheap_total_decommit_bytes") == 0) {
+      SpinLockHolder l(Static::pageheap_lock());
+      *value = Static::pageheap()->stats().total_decommit_bytes;
+      return true;
+    }
+
+    if (strcmp(name, "tcmalloc.pageheap_reserve_count") == 0) {
+      SpinLockHolder l(Static::pageheap_lock());
+      *value = Static::pageheap()->stats().reserve_count;
+      return true;
+    }
+
+    if (strcmp(name, "tcmalloc.pageheap_total_reserve_bytes") == 0) {
+      SpinLockHolder l(Static::pageheap_lock());
+      *value = Static::pageheap()->stats().total_reserve_bytes;
+      return true;
+    }
+
     if (strcmp(name, "tcmalloc.max_total_thread_cache_bytes") == 0) {
       SpinLockHolder l(Static::pageheap_lock());
       *value = ThreadCache::overall_thread_cache_size();
@@ -712,10 +831,17 @@
     }
 
     if (strcmp(name, "tcmalloc.aggressive_memory_decommit") == 0) {
+      SpinLockHolder l(Static::pageheap_lock());
       *value = size_t(Static::pageheap()->GetAggressiveDecommit());
       return true;
     }
 
+    if (strcmp(name, "tcmalloc.heap_limit_mb") == 0) {
+      SpinLockHolder l(Static::pageheap_lock());
+      *value = FLAGS_tcmalloc_heap_limit_mb;
+      return true;
+    }
+
     return false;
   }
 
@@ -729,10 +855,17 @@
     }
 
     if (strcmp(name, "tcmalloc.aggressive_memory_decommit") == 0) {
+      SpinLockHolder l(Static::pageheap_lock());
       Static::pageheap()->SetAggressiveDecommit(value != 0);
       return true;
     }
 
+    if (strcmp(name, "tcmalloc.heap_limit_mb") == 0) {
+      SpinLockHolder l(Static::pageheap_lock());
+      FLAGS_tcmalloc_heap_limit_mb = value;
+      return true;
+    }
+
     return false;
   }
 
@@ -744,12 +877,12 @@
 
   virtual SysAllocator* GetSystemAllocator() {
     SpinLockHolder h(Static::pageheap_lock());
-    return sys_alloc;
+    return tcmalloc_sys_alloc;
   }
 
   virtual void SetSystemAllocator(SysAllocator* alloc) {
     SpinLockHolder h(Static::pageheap_lock());
-    sys_alloc = alloc;
+    tcmalloc_sys_alloc = alloc;
   }
 
   virtual void ReleaseToSystem(size_t num_bytes) {
@@ -784,15 +917,7 @@
   virtual double GetMemoryReleaseRate() {
     return FLAGS_tcmalloc_release_rate;
   }
-  virtual size_t GetEstimatedAllocatedSize(size_t size) {
-    if (size <= kMaxSize) {
-      const size_t cl = Static::sizemap()->SizeClass(size);
-      const size_t alloc_size = Static::sizemap()->ByteSizeForClass(cl);
-      return alloc_size;
-    } else {
-      return tcmalloc::pages(size) << kPageShift;
-    }
-  }
+  virtual size_t GetEstimatedAllocatedSize(size_t size);
 
   // This just calls GetSizeWithCallback, but because that's in an
   // unnamed namespace, we need to move the definition below it in the
@@ -810,8 +935,8 @@
     if ((p >> (kAddressBits - kPageShift)) > 0) {
       return kNotOwned;
     }
-    size_t cl = Static::pageheap()->GetSizeClassIfCached(p);
-    if (cl != 0) {
+    uint32 cl;
+    if (Static::pageheap()->TryGetSizeClass(p, &cl)) {
       return kOwned;
     }
     const Span *span = Static::pageheap()->GetDescriptor(p);
@@ -819,19 +944,19 @@
   }
 
   virtual void GetFreeListSizes(vector<MallocExtension::FreeListInfo>* v) {
-    static const char* kCentralCacheType = "tcmalloc.central";
-    static const char* kTransferCacheType = "tcmalloc.transfer";
-    static const char* kThreadCacheType = "tcmalloc.thread";
-    static const char* kPageHeapType = "tcmalloc.page";
-    static const char* kPageHeapUnmappedType = "tcmalloc.page_unmapped";
-    static const char* kLargeSpanType = "tcmalloc.large";
-    static const char* kLargeUnmappedSpanType = "tcmalloc.large_unmapped";
+    static const char kCentralCacheType[] = "tcmalloc.central";
+    static const char kTransferCacheType[] = "tcmalloc.transfer";
+    static const char kThreadCacheType[] = "tcmalloc.thread";
+    static const char kPageHeapType[] = "tcmalloc.page";
+    static const char kPageHeapUnmappedType[] = "tcmalloc.page_unmapped";
+    static const char kLargeSpanType[] = "tcmalloc.large";
+    static const char kLargeUnmappedSpanType[] = "tcmalloc.large_unmapped";
 
     v->clear();
 
     // central class information
     int64 prev_class_size = 0;
-    for (int cl = 1; cl < kNumClasses; ++cl) {
+    for (int cl = 1; cl < Static::num_size_classes(); ++cl) {
       size_t class_size = Static::sizemap()->ByteSizeForClass(cl);
       MallocExtension::FreeListInfo i;
       i.min_object_size = prev_class_size + 1;
@@ -851,7 +976,7 @@
     }
 
     // Add stats from per-thread heaps
-    uint64_t class_count[kNumClasses];
+    uint64_t class_count[kClassSizesMax];
     memset(class_count, 0, sizeof(class_count));
     {
       SpinLockHolder h(Static::pageheap_lock());
@@ -860,7 +985,7 @@
     }
 
     prev_class_size = 0;
-    for (int cl = 1; cl < kNumClasses; ++cl) {
+    for (int cl = 1; cl < Static::num_size_classes(); ++cl) {
       MallocExtension::FreeListInfo i;
       i.min_object_size = prev_class_size + 1;
       i.max_object_size = Static::sizemap()->ByteSizeForClass(cl);
@@ -868,6 +993,8 @@
           class_count[cl] * Static::sizemap()->ByteSizeForClass(cl);
       i.type = kThreadCacheType;
       v->push_back(i);
+
+      prev_class_size = Static::sizemap()->ByteSizeForClass(cl);
     }
 
     // append page heap info
@@ -893,22 +1020,102 @@
     v->push_back(span_info);
 
     // small spans
-    for (int s = 1; s < kMaxPages; s++) {
+    for (int s = 1; s <= kMaxPages; s++) {
       MallocExtension::FreeListInfo i;
       i.max_object_size = (s << kPageShift);
       i.min_object_size = ((s - 1) << kPageShift);
 
       i.type = kPageHeapType;
-      i.total_bytes_free = (s << kPageShift) * small.normal_length[s];
+      i.total_bytes_free = (s << kPageShift) * small.normal_length[s - 1];
       v->push_back(i);
 
       i.type = kPageHeapUnmappedType;
-      i.total_bytes_free = (s << kPageShift) * small.returned_length[s];
+      i.total_bytes_free = (s << kPageShift) * small.returned_length[s - 1];
       v->push_back(i);
     }
   }
 };
 
+static inline ATTRIBUTE_ALWAYS_INLINE
+size_t align_size_up(size_t size, size_t align) {
+  ASSERT(align <= kPageSize);
+  size_t new_size = (size + align - 1) & ~(align - 1);
+  if (PREDICT_FALSE(new_size == 0)) {
+    // Note, new_size == 0 catches both integer overflow and size
+    // being 0.
+    if (size == 0) {
+      new_size = align;
+    } else {
+      new_size = size;
+    }
+  }
+  return new_size;
+}
+
+// Stores in *cl a size class suitable for allocating size bytes with
+// align alignment. Returns true if such a size class exists and false otherwise.
+static bool size_class_with_alignment(size_t size, size_t align, uint32_t* cl) {
+  if (PREDICT_FALSE(align > kPageSize)) {
+    return false;
+  }
+  size = align_size_up(size, align);
+  if (PREDICT_FALSE(!Static::sizemap()->GetSizeClass(size, cl))) {
+    return false;
+  }
+  ASSERT((Static::sizemap()->class_to_size(*cl) & (align - 1)) == 0);
+  return true;
+}
+
+// nallocx slow path. Moved to a separate function because
+// ThreadCache::InitModule is not inlined, which would cause nallocx to
+// become a non-leaf function with a stack frame and stack spills.
+static ATTRIBUTE_NOINLINE size_t nallocx_slow(size_t size, int flags) {
+  if (PREDICT_FALSE(!Static::IsInited())) ThreadCache::InitModule();
+
+  size_t align = static_cast<size_t>(1ull << (flags & 0x3f));
+  uint32 cl;
+  bool ok = size_class_with_alignment(size, align, &cl);
+  if (ok) {
+    return Static::sizemap()->ByteSizeForClass(cl);
+  } else {
+    return tcmalloc::pages(size) << kPageShift;
+  }
+}
+
+// The nallocx function allocates no memory, but it performs the same size
+// computation as the malloc function, and returns the real size of the
+// allocation that would result from the equivalent malloc function call.
+// nallocx is a malloc extension originally implemented by jemalloc:
+// http://www.unix.com/man-page/freebsd/3/nallocx/
+extern "C" PERFTOOLS_DLL_DECL
+size_t tc_nallocx(size_t size, int flags) {
+  if (PREDICT_FALSE(flags != 0)) {
+    return nallocx_slow(size, flags);
+  }
+  uint32 cl;
+  // size class 0 is only possible if malloc is not yet initialized
+  if (Static::sizemap()->GetSizeClass(size, &cl) && cl != 0) {
+    return Static::sizemap()->ByteSizeForClass(cl);
+  } else {
+    return nallocx_slow(size, 0);
+  }
+}
+
+extern "C" PERFTOOLS_DLL_DECL
+size_t nallocx(size_t size, int flags)
+#ifdef TC_ALIAS
+  TC_ALIAS(tc_nallocx);
+#else
+{
+  return nallocx_slow(size, flags);
+}
+#endif
+
+
+size_t TCMallocImplementation::GetEstimatedAllocatedSize(size_t size) {
+  return tc_nallocx(size, 0);
+}
+
 // The constructor allocates an object to ensure that initialization
 // runs before main(), and therefore we do not have a chance to become
 // multi-threaded before initialization.  We also create the TSD key
@@ -966,23 +1173,26 @@
 
 static inline bool CheckCachedSizeClass(void *ptr) {
   PageID p = reinterpret_cast<uintptr_t>(ptr) >> kPageShift;
-  size_t cached_value = Static::pageheap()->GetSizeClassIfCached(p);
-  return cached_value == 0 ||
-      cached_value == Static::pageheap()->GetDescriptor(p)->sizeclass;
+  uint32 cached_value;
+  if (!Static::pageheap()->TryGetSizeClass(p, &cached_value)) {
+    return true;
+  }
+  return cached_value == Static::pageheap()->GetDescriptor(p)->sizeclass;
 }
 
-static inline void* CheckedMallocResult(void *result) {
+static inline ATTRIBUTE_ALWAYS_INLINE void* CheckedMallocResult(void *result) {
   ASSERT(result == NULL || CheckCachedSizeClass(result));
   return result;
 }
 
-static inline void* SpanToMallocResult(Span *span) {
-  Static::pageheap()->CacheSizeClass(span->start, 0);
+static inline ATTRIBUTE_ALWAYS_INLINE void* SpanToMallocResult(Span *span) {
+  Static::pageheap()->InvalidateCachedSizeClass(span->start);
   return
       CheckedMallocResult(reinterpret_cast<void*>(span->start << kPageShift));
 }
 
 static void* DoSampledAllocation(size_t size) {
+#ifndef NO_TCMALLOC_SAMPLES
   // Grab the stack trace outside the heap lock
   StackTrace tmp;
   tmp.depth = GetStackTrace(tmp.stack, tcmalloc::kMaxStackDepth, 1);
@@ -991,13 +1201,13 @@
   SpinLockHolder h(Static::pageheap_lock());
   // Allocate span
   Span *span = Static::pageheap()->New(tcmalloc::pages(size == 0 ? 1 : size));
-  if (UNLIKELY(span == NULL)) {
+  if (PREDICT_FALSE(span == NULL)) {
     return NULL;
   }
 
   // Allocate stack trace
   StackTrace *stack = Static::stacktrace_allocator()->New();
-  if (UNLIKELY(stack == NULL)) {
+  if (PREDICT_FALSE(stack == NULL)) {
     // Sampling failed because of lack of memory
     return span;
   }
@@ -1007,6 +1217,9 @@
   tcmalloc::DLL_Prepend(Static::sampled_objects(), span);
 
   return SpanToMallocResult(span);
+#else
+  abort();
+#endif
 }
 
 namespace {
@@ -1019,6 +1232,16 @@
                  void* retry_arg,
                  bool from_operator,
                  bool nothrow) {
+  // We hit an out-of-memory condition. Usually that means we've
+  // called sbrk or mmap and failed, and thus errno is set. But there
+  // is support for setting up a custom system allocator or a page
+  // heap size limit, in which case errno may remain
+  // untouched.
+  //
+  // So we set errno here. C++ operator new doesn't require ENOMEM to
+  // be set, but doesn't forbid it either (and C++ oom often does
+  // happen with ENOMEM set).
+  errno = ENOMEM;
   if (!from_operator && !tc_new_mode) {
     // we're out of memory in C library function (malloc etc) and no
     // "new mode" forced on us. Just return NULL
@@ -1077,9 +1300,11 @@
 
 // Copy of FLAGS_tcmalloc_large_alloc_report_threshold with
 // automatic increases factored in.
+#ifdef ENABLE_LARGE_ALLOC_REPORT
 static int64_t large_alloc_threshold =
   (kPageSize > FLAGS_tcmalloc_large_alloc_report_threshold
    ? kPageSize : FLAGS_tcmalloc_large_alloc_report_threshold);
+#endif
 
 static void ReportLargeAlloc(Length num_pages, void* result) {
   StackTrace stack;
@@ -1098,36 +1323,9 @@
   write(STDERR_FILENO, buffer, strlen(buffer));
 }
 
-void* do_memalign(size_t align, size_t size);
-
-struct retry_memaligh_data {
-  size_t align;
-  size_t size;
-};
-
-static void *retry_do_memalign(void *arg) {
-  retry_memaligh_data *data = static_cast<retry_memaligh_data *>(arg);
-  return do_memalign(data->align, data->size);
-}
-
-static void *maybe_do_cpp_memalign_slow(size_t align, size_t size) {
-  retry_memaligh_data data;
-  data.align = align;
-  data.size = size;
-  return handle_oom(retry_do_memalign, &data,
-                    false, true);
-}
-
-inline void* do_memalign_or_cpp_memalign(size_t align, size_t size) {
-  void *rv = do_memalign(align, size);
-  if (LIKELY(rv != NULL)) {
-    return rv;
-  }
-  return maybe_do_cpp_memalign_slow(align, size);
-}
-
 // Must be called with the page lock held.
 inline bool should_report_large(Length num_pages) {
+#ifdef ENABLE_LARGE_ALLOC_REPORT
   const int64 threshold = large_alloc_threshold;
   if (threshold > 0 && num_pages >= (threshold >> kPageShift)) {
     // Increase the threshold by 1/8 every time we generate a report.
@@ -1136,18 +1334,24 @@
                              ? threshold + threshold/8 : 8ll<<30);
     return true;
   }
+#endif
   return false;
 }
 
 // Helper for do_malloc().
-inline void* do_malloc_pages(ThreadCache* heap, size_t size) {
+static void* do_malloc_pages(ThreadCache* heap, size_t size) {
   void* result;
   bool report_large;
 
   Length num_pages = tcmalloc::pages(size);
-  size = num_pages << kPageShift;
 
-  if ((FLAGS_tcmalloc_sample_parameter > 0) && heap->SampleAllocation(size)) {
+  // NOTE: we're passing original size here as opposed to rounded-up
+  // size as we do in do_malloc_small. The difference is small here
+  // (at most 4k out of at least 256k). And not rounding up saves us
+  // from possibility of overflow, which rounding up could produce.
+  //
+  // See https://github.com/gperftools/gperftools/issues/723
+  if (heap->SampleAllocation(size)) {
     result = DoSampledAllocation(size);
 
     SpinLockHolder h(Static::pageheap_lock());
@@ -1155,7 +1359,7 @@
   } else {
     SpinLockHolder h(Static::pageheap_lock());
     Span* span = Static::pageheap()->New(num_pages);
-    result = (UNLIKELY(span == NULL) ? NULL : SpanToMallocResult(span));
+    result = (PREDICT_FALSE(span == NULL) ? NULL : SpanToMallocResult(span));
     report_large = should_report_large(num_pages);
   }
 
@@ -1165,53 +1369,57 @@
   return result;
 }
 
-ALWAYS_INLINE void* do_malloc_small(ThreadCache* heap, size_t size) {
-  ASSERT(Static::IsInited());
-  ASSERT(heap != NULL);
-  size_t cl = Static::sizemap()->SizeClass(size);
-  size = Static::sizemap()->class_to_size(cl);
-
-  if (UNLIKELY(FLAGS_tcmalloc_sample_parameter > 0) && heap->SampleAllocation(size)) {
-    return DoSampledAllocation(size);
-  } else {
-    // The common case, and also the simplest.  This just pops the
-    // size-appropriate freelist, after replenishing it if it's empty.
-    return CheckedMallocResult(heap->Allocate(size, cl));
-  }
+static void *nop_oom_handler(size_t size) {
+  return NULL;
 }
 
-ALWAYS_INLINE void* do_malloc(size_t size) {
-  if (ThreadCache::have_tls &&
-      LIKELY(size < ThreadCache::MinSizeForSlowPath())) {
-    return do_malloc_small(ThreadCache::GetCacheWhichMustBePresent(), size);
-  } else if (size <= kMaxSize) {
-    return do_malloc_small(ThreadCache::GetCache(), size);
-  } else {
-    return do_malloc_pages(ThreadCache::GetCache(), size);
+ATTRIBUTE_ALWAYS_INLINE inline void* do_malloc(size_t size) {
+  if (PREDICT_FALSE(ThreadCache::IsUseEmergencyMalloc())) {
+    return tcmalloc::EmergencyMalloc(size);
   }
+
+  // note: it will force initialization of malloc if necessary
+  ThreadCache* cache = ThreadCache::GetCache();
+  uint32 cl;
+
+  ASSERT(Static::IsInited());
+  ASSERT(cache != NULL);
+
+  if (PREDICT_FALSE(!Static::sizemap()->GetSizeClass(size, &cl))) {
+    return do_malloc_pages(cache, size);
+  }
+
+  size_t allocated_size = Static::sizemap()->class_to_size(cl);
+  if (PREDICT_FALSE(cache->SampleAllocation(allocated_size))) {
+    return DoSampledAllocation(size);
+  }
+
+  // The common case, and also the simplest.  This just pops the
+  // size-appropriate freelist, after replenishing it if it's empty.
+  return CheckedMallocResult(cache->Allocate(allocated_size, cl, nop_oom_handler));
 }
 
 static void *retry_malloc(void* size) {
   return do_malloc(reinterpret_cast<size_t>(size));
 }
 
-ALWAYS_INLINE void* do_malloc_or_cpp_alloc(size_t size) {
+ATTRIBUTE_ALWAYS_INLINE inline void* do_malloc_or_cpp_alloc(size_t size) {
   void *rv = do_malloc(size);
-  if (LIKELY(rv != NULL)) {
+  if (PREDICT_TRUE(rv != NULL)) {
     return rv;
   }
   return handle_oom(retry_malloc, reinterpret_cast<void *>(size),
                     false, true);
 }
 
-ALWAYS_INLINE void* do_calloc(size_t n, size_t elem_size) {
+ATTRIBUTE_ALWAYS_INLINE inline void* do_calloc(size_t n, size_t elem_size) {
   // Overflow check
   const size_t size = n * elem_size;
   if (elem_size != 0 && size / elem_size != n) return NULL;
 
   void* result = do_malloc_or_cpp_alloc(size);
   if (result != NULL) {
-    memset(result, 0, size);
+    memset(result, 0, tc_nallocx(size, 0));
   }
   return result;
 }
@@ -1223,97 +1431,105 @@
   }
 }
 
-// Helper for do_free_with_callback(), below.  Inputs:
-//   ptr is object to be freed
-//   invalid_free_fn is a function that gets invoked on certain "bad frees"
-//   heap is the ThreadCache for this thread, or NULL if it isn't known
-//   heap_must_be_valid is whether heap is known to be non-NULL
-//
-// This function may only be used after Static::IsInited() is true.
-//
-// We can usually detect the case where ptr is not pointing to a page that
-// tcmalloc is using, and in those cases we invoke invalid_free_fn.
-//
-// To maximize speed in the common case, we usually get here with
-// heap_must_be_valid being a manifest constant equal to true.
-ALWAYS_INLINE void do_free_helper(void* ptr,
-                                  void (*invalid_free_fn)(void*),
-                                  ThreadCache* heap,
-                                  bool heap_must_be_valid) {
-  ASSERT((Static::IsInited() && heap != NULL) || !heap_must_be_valid);
-  if (!heap_must_be_valid && !Static::IsInited()) {
-    // We called free() before malloc().  This can occur if the
-    // (system) malloc() is called before tcmalloc is loaded, and then
-    // free() is called after tcmalloc is loaded (and tc_free has
-    // replaced free), but before the global constructor has run that
-    // sets up the tcmalloc data structures.
-    free_null_or_invalid(ptr, invalid_free_fn);
-    return;
+static ATTRIBUTE_NOINLINE void do_free_pages(Span* span, void* ptr) {
+  SpinLockHolder h(Static::pageheap_lock());
+  if (span->sample) {
+    StackTrace* st = reinterpret_cast<StackTrace*>(span->objects);
+    tcmalloc::DLL_Remove(span);
+    Static::stacktrace_allocator()->Delete(st);
+    span->objects = NULL;
   }
-  Span* span = NULL;
-  const PageID p = reinterpret_cast<uintptr_t>(ptr) >> kPageShift;
-  size_t cl = Static::pageheap()->GetSizeClassIfCached(p);
-  if (UNLIKELY(cl == 0)) {
-    span = Static::pageheap()->GetDescriptor(p);
-    if (UNLIKELY(!span)) {
-      // span can be NULL because the pointer passed in is NULL or invalid
-      // (not something returned by malloc or friends), or because the
-      // pointer was allocated with some other allocator besides
-      // tcmalloc.  The latter can happen if tcmalloc is linked in via
-      // a dynamic library, but is not listed last on the link line.
-      // In that case, libraries after it on the link line will
-      // allocate with libc malloc, but free with tcmalloc's free.
-      free_null_or_invalid(ptr, invalid_free_fn);
-      return;
-    }
-    cl = span->sizeclass;
-    Static::pageheap()->CacheSizeClass(p, cl);
-  }
-  ASSERT(ptr != NULL);
-  if (LIKELY(cl != 0)) {
-    ASSERT(!Static::pageheap()->GetDescriptor(p)->sample);
-    if (heap_must_be_valid || heap != NULL) {
-      heap->Deallocate(ptr, cl);
-    } else {
-      // Delete directly into central cache
-      tcmalloc::SLL_SetNext(ptr, NULL);
-      Static::central_cache()[cl].InsertRange(ptr, ptr, 1);
-    }
-  } else {
-    SpinLockHolder h(Static::pageheap_lock());
-    ASSERT(reinterpret_cast<uintptr_t>(ptr) % kPageSize == 0);
-    ASSERT(span != NULL && span->start == p);
-    if (span->sample) {
-      StackTrace* st = reinterpret_cast<StackTrace*>(span->objects);
-      tcmalloc::DLL_Remove(span);
-      Static::stacktrace_allocator()->Delete(st);
-      span->objects = NULL;
-    }
-    Static::pageheap()->Delete(span);
-  }
+  Static::pageheap()->Delete(span);
 }
 
+#ifndef NDEBUG
+// Note: with sized deletes we have no means to support the win32
+// behavior where we detect "not ours" pointers and delegate them to
+// native memory management. This is because sized deletes bypass
+// the addr -> size class checks. So in this validation code we
+// also assume that sized delete is always used with "our" pointers.
+bool ValidateSizeHint(void* ptr, size_t size_hint) {
+  const PageID p = reinterpret_cast<uintptr_t>(ptr) >> kPageShift;
+  Span* span  = Static::pageheap()->GetDescriptor(p);
+  uint32 cl = 0;
+  Static::sizemap()->GetSizeClass(size_hint, &cl);
+  return (span->sizeclass == cl);
+}
+#endif
+
 // Helper for the object deletion (free, delete, etc.).  Inputs:
 //   ptr is object to be freed
 //   invalid_free_fn is a function that gets invoked on certain "bad frees"
 //
 // We can usually detect the case where ptr is not pointing to a page that
 // tcmalloc is using, and in those cases we invoke invalid_free_fn.
-ALWAYS_INLINE void do_free_with_callback(void* ptr,
-                                         void (*invalid_free_fn)(void*)) {
-  ThreadCache* heap = NULL;
-  if (LIKELY(ThreadCache::IsFastPathAllowed())) {
-    heap = ThreadCache::GetCacheWhichMustBePresent();
-    do_free_helper(ptr, invalid_free_fn, heap, true);
-  } else {
-    heap = ThreadCache::GetCacheIfPresent();
-    do_free_helper(ptr, invalid_free_fn, heap, false);
+ATTRIBUTE_ALWAYS_INLINE inline
+void do_free_with_callback(void* ptr,
+                           void (*invalid_free_fn)(void*),
+                           bool use_hint, size_t size_hint) {
+  ThreadCache* heap = ThreadCache::GetCacheIfPresent();
+
+  const PageID p = reinterpret_cast<uintptr_t>(ptr) >> kPageShift;
+  uint32 cl;
+
+  ASSERT(!use_hint || ValidateSizeHint(ptr, size_hint));
+
+  if (!use_hint || PREDICT_FALSE(!Static::sizemap()->GetSizeClass(size_hint, &cl))) {
+    // if we're in sized delete, but size is too large, no need to
+    // probe size cache
+    bool cache_hit = !use_hint && Static::pageheap()->TryGetSizeClass(p, &cl);
+    if (PREDICT_FALSE(!cache_hit)) {
+      Span* span  = Static::pageheap()->GetDescriptor(p);
+      if (PREDICT_FALSE(!span)) {
+        // span can be NULL because the pointer passed in is NULL or invalid
+        // (not something returned by malloc or friends), or because the
+        // pointer was allocated with some other allocator besides
+        // tcmalloc.  The latter can happen if tcmalloc is linked in via
+        // a dynamic library, but is not listed last on the link line.
+        // In that case, libraries after it on the link line will
+        // allocate with libc malloc, but free with tcmalloc's free.
+        free_null_or_invalid(ptr, invalid_free_fn);
+        return;
+      }
+      cl = span->sizeclass;
+      if (PREDICT_FALSE(cl == 0)) {
+        ASSERT(reinterpret_cast<uintptr_t>(ptr) % kPageSize == 0);
+        ASSERT(span != NULL && span->start == p);
+        do_free_pages(span, ptr);
+        return;
+      }
+      if (!use_hint) {
+        Static::pageheap()->SetCachedSizeClass(p, cl);
+      }
+    }
   }
+
+  if (PREDICT_TRUE(heap != NULL)) {
+    ASSERT(Static::IsInited());
+    // We've hit an initialized thread cache, so we're done.
+    heap->Deallocate(ptr, cl);
+    return;
+  }
+
+  if (PREDICT_FALSE(!Static::IsInited())) {
+    // If free was called very early, we could have missed the case
+    // of an invalid or nullptr free, because probing the size class
+    // cache could return a bogus result (cl = 0 as of this
+    // writing). But there is no way we could be dealing with a ptr
+    // we've allocated, since a successful malloc implies IsInited,
+    // so we can just call the "invalid free" handling code.
+    free_null_or_invalid(ptr, invalid_free_fn);
+    return;
+  }
+
+  // Otherwise, delete directly into central cache
+  tcmalloc::SLL_SetNext(ptr, NULL);
+  Static::central_cache()[cl].InsertRange(ptr, ptr, 1);
 }
 
 // The default "do_free" that uses the default callback.
-ALWAYS_INLINE void do_free(void* ptr) {
-  return do_free_with_callback(ptr, &InvalidFree);
+ATTRIBUTE_ALWAYS_INLINE inline void do_free(void* ptr) {
+  return do_free_with_callback(ptr, &InvalidFree, false, 0);
 }
 
 // NOTE: some logic here is duplicated in GetOwnership (above), for
@@ -1323,25 +1539,31 @@
   if (ptr == NULL)
     return 0;
   const PageID p = reinterpret_cast<uintptr_t>(ptr) >> kPageShift;
-  size_t cl = Static::pageheap()->GetSizeClassIfCached(p);
-  if (cl != 0) {
+  uint32 cl;
+  if (Static::pageheap()->TryGetSizeClass(p, &cl)) {
     return Static::sizemap()->ByteSizeForClass(cl);
-  } else {
-    const Span *span = Static::pageheap()->GetDescriptor(p);
-    if (UNLIKELY(span == NULL)) {  // means we do not own this memory
-      return (*invalid_getsize_fn)(ptr);
-    } else if (span->sizeclass != 0) {
-      Static::pageheap()->CacheSizeClass(p, span->sizeclass);
-      return Static::sizemap()->ByteSizeForClass(span->sizeclass);
-    } else {
-      return span->length << kPageShift;
-    }
   }
+
+  const Span *span = Static::pageheap()->GetDescriptor(p);
+  if (PREDICT_FALSE(span == NULL)) {  // means we do not own this memory
+    return (*invalid_getsize_fn)(ptr);
+  }
+
+  if (span->sizeclass != 0) {
+    return Static::sizemap()->ByteSizeForClass(span->sizeclass);
+  }
+
+  if (span->sample) {
+    size_t orig_size = reinterpret_cast<StackTrace*>(span->objects)->size;
+    return tc_nallocx(orig_size, 0);
+  }
+
+  return span->length << kPageShift;
 }
 
 // This lets you call back to a given function pointer if ptr is invalid.
 // It is used primarily by windows code which wants a specialized callback.
-ALWAYS_INLINE void* do_realloc_with_callback(
+ATTRIBUTE_ALWAYS_INLINE inline void* do_realloc_with_callback(
     void* old_ptr, size_t new_size,
     void (*invalid_free_fn)(void*),
     size_t (*invalid_get_size_fn)(const void*)) {
@@ -1354,7 +1576,9 @@
   //    . If we need to grow, grow to max(new_size, old_size * 1.X)
   //    . Don't shrink unless new_size < old_size * 0.Y
   // X and Y trade-off time for wasted space.  For now we do 1.25 and 0.5.
-  const size_t lower_bound_to_grow = old_size + old_size / 4ul;
+  const size_t min_growth = min(old_size / 4,
+      (std::numeric_limits<size_t>::max)() - old_size);  // Avoid overflow.
+  const size_t lower_bound_to_grow = old_size + min_growth;
   const size_t upper_bound_to_shrink = old_size / 2ul;
   if ((new_size > old_size) || (new_size < upper_bound_to_shrink)) {
     // Need to reallocate.
@@ -1367,7 +1591,7 @@
       // Either new_size is not a tiny increment, or last do_malloc failed.
       new_ptr = do_malloc_or_cpp_alloc(new_size);
     }
-    if (UNLIKELY(new_ptr == NULL)) {
+    if (PREDICT_FALSE(new_ptr == NULL)) {
       return NULL;
     }
     MallocHook::InvokeNewHook(new_ptr, new_size);
@@ -1376,7 +1600,7 @@
     // We could use a variant of do_free() that leverages the fact
     // that we already know the sizeclass of old_ptr.  The benefit
     // would be small, so don't bother.
-    do_free_with_callback(old_ptr, invalid_free_fn);
+    do_free_with_callback(old_ptr, invalid_free_fn, false, 0);
     return new_ptr;
   } else {
     // We still need to call hooks to report the updated size:
@@ -1386,69 +1610,29 @@
   }
 }
 
-ALWAYS_INLINE void* do_realloc(void* old_ptr, size_t new_size) {
+ATTRIBUTE_ALWAYS_INLINE inline void* do_realloc(void* old_ptr, size_t new_size) {
   return do_realloc_with_callback(old_ptr, new_size,
                                   &InvalidFree, &InvalidGetSizeForRealloc);
 }
 
-// For use by exported routines below that want specific alignments
-//
-// Note: this code can be slow for alignments > 16, and can
-// significantly fragment memory.  The expectation is that
-// memalign/posix_memalign/valloc/pvalloc will not be invoked very
-// often.  This requirement simplifies our implementation and allows
-// us to tune for expected allocation patterns.
-void* do_memalign(size_t align, size_t size) {
+static ATTRIBUTE_ALWAYS_INLINE inline
+void* do_memalign_pages(size_t align, size_t size) {
   ASSERT((align & (align - 1)) == 0);
-  ASSERT(align > 0);
+  ASSERT(align > kPageSize);
   if (size + align < size) return NULL;         // Overflow
 
-  // Fall back to malloc if we would already align this memory access properly.
-  if (align <= AlignmentForSize(size)) {
-    void* p = do_malloc(size);
-    ASSERT((reinterpret_cast<uintptr_t>(p) % align) == 0);
-    return p;
-  }
-
-  if (UNLIKELY(Static::pageheap() == NULL)) ThreadCache::InitModule();
+  if (PREDICT_FALSE(Static::pageheap() == NULL)) ThreadCache::InitModule();
 
   // Allocate at least one byte to avoid boundary conditions below
   if (size == 0) size = 1;
 
-  if (size <= kMaxSize && align < kPageSize) {
-    // Search through acceptable size classes looking for one with
-    // enough alignment.  This depends on the fact that
-    // InitSizeClasses() currently produces several size classes that
-    // are aligned at powers of two.  We will waste time and space if
-    // we miss in the size class array, but that is deemed acceptable
-    // since memalign() should be used rarely.
-    int cl = Static::sizemap()->SizeClass(size);
-    while (cl < kNumClasses &&
-           ((Static::sizemap()->class_to_size(cl) & (align - 1)) != 0)) {
-      cl++;
-    }
-    if (cl < kNumClasses) {
-      ThreadCache* heap = ThreadCache::GetCache();
-      size = Static::sizemap()->class_to_size(cl);
-      return CheckedMallocResult(heap->Allocate(size, cl));
-    }
-  }
-
   // We will allocate directly from the page heap
   SpinLockHolder h(Static::pageheap_lock());
 
-  if (align <= kPageSize) {
-    // Any page-level allocation will be fine
-    // TODO: We could put the rest of this page in the appropriate
-    // TODO: cache but it does not seem worth it.
-    Span* span = Static::pageheap()->New(tcmalloc::pages(size));
-    return UNLIKELY(span == NULL) ? NULL : SpanToMallocResult(span);
-  }
-
   // Allocate extra pages and carve off an aligned portion
   const Length alloc = tcmalloc::pages(size + align);
   Span* span = Static::pageheap()->New(alloc);
-  if (UNLIKELY(span == NULL)) return NULL;
+  if (PREDICT_FALSE(span == NULL)) return NULL;
 
   // Skip starting portion so that we end up aligned
   Length skip = 0;
@@ -1510,15 +1694,6 @@
 }
 #endif  // HAVE_STRUCT_MALLINFO
 
-inline void* cpp_alloc(size_t size, bool nothrow) {
-  void* p = do_malloc(size);
-  if (LIKELY(p)) {
-    return p;
-  }
-  return handle_oom(retry_malloc, reinterpret_cast<void *>(size),
-                    true, nothrow);
-}
-
 }  // end unnamed namespace
 
 // As promised, the definition of this function, declared above.
@@ -1541,7 +1716,7 @@
 //-------------------------------------------------------------------
 
 extern "C" PERFTOOLS_DLL_DECL const char* tc_version(
-    int* major, int* minor, const char** patch) __THROW {
+    int* major, int* minor, const char** patch) PERFTOOLS_NOTHROW {
   if (major) *major = TC_VERSION_MAJOR;
   if (minor) *minor = TC_VERSION_MINOR;
   if (patch) *patch = TC_VERSION_PATCH;
@@ -1553,12 +1728,16 @@
 // If flag is 1, calls to malloc will behave like calls to new,
 // and the std_new_handler will be invoked on failure.
 // Returns the previous mode.
-extern "C" PERFTOOLS_DLL_DECL int tc_set_new_mode(int flag) __THROW {
+extern "C" PERFTOOLS_DLL_DECL int tc_set_new_mode(int flag) PERFTOOLS_NOTHROW {
   int old_mode = tc_new_mode;
   tc_new_mode = flag;
   return old_mode;
 }
 
+extern "C" PERFTOOLS_DLL_DECL int tc_query_new_mode() PERFTOOLS_NOTHROW {
+  return tc_new_mode;
+}
+
 #ifndef TCMALLOC_USING_DEBUGALLOCATION  // debugallocation.cc defines its own
 
 // CAVEAT: The code structure below ensures that MallocHook methods are always
@@ -1566,31 +1745,259 @@
 //         heap-checker.cc depends on this to start a stack trace from
 //         the call to the (de)allocation function.
 
-extern "C" PERFTOOLS_DLL_DECL void* tc_malloc(size_t size) __THROW {
-  void* result = do_malloc_or_cpp_alloc(size);
-  MallocHook::InvokeNewHook(result, size);
-  return result;
-}
+namespace tcmalloc {
 
-extern "C" PERFTOOLS_DLL_DECL void tc_free(void* ptr) __THROW {
+
+static ATTRIBUTE_SECTION(google_malloc)
+void invoke_hooks_and_free(void *ptr) {
   MallocHook::InvokeDeleteHook(ptr);
   do_free(ptr);
 }
 
+ATTRIBUTE_SECTION(google_malloc)
+void* cpp_throw_oom(size_t size) {
+  return handle_oom(retry_malloc, reinterpret_cast<void *>(size),
+                    true, false);
+}
+
+ATTRIBUTE_SECTION(google_malloc)
+void* cpp_nothrow_oom(size_t size) {
+  return handle_oom(retry_malloc, reinterpret_cast<void *>(size),
+                    true, true);
+}
+
+ATTRIBUTE_SECTION(google_malloc)
+void* malloc_oom(size_t size) {
+  return handle_oom(retry_malloc, reinterpret_cast<void *>(size),
+                    false, true);
+}
+
+// tcmalloc::allocate_full_XXX is called by fast-path malloc when some
+// complex handling is needed (such as fetching object from central
+// freelist or malloc sampling). It contains all 'operator new' logic,
+// as opposed to malloc_fast_path which only deals with important
+// subset of cases.
+//
+// Note that this is under tcmalloc namespace so that pprof
+// can automatically filter it out of growthz/heapz profiles.
+//
+// We have a slightly fancy setup because we need to call hooks from a
+// function in the 'google_malloc' section, and we cannot place a template
+// into this section. Hence the 3 separate functions 'built' by macros.
+//
+// Also note that we're carefully orchestrating for
+// MallocHook::GetCallerStackTrace to work even if compiler isn't
+// optimizing tail calls (e.g. -O0 is given). We still require
+// ATTRIBUTE_ALWAYS_INLINE to work for that case, but it was seen to
+// work for -O0 -fno-inline across both GCC and clang. I.e. in this
+// case we'll get stack frame for tc_new, followed by stack frame for
+// allocate_full_cpp_throw_oom, followed by hooks machinery and user
+// code's stack frames. So GetCallerStackTrace will find 2
+// subsequent stack frames in google_malloc section and correctly
+// 'cut' stack trace just before tc_new.
+template <void* OOMHandler(size_t)>
+ATTRIBUTE_ALWAYS_INLINE inline
+static void* do_allocate_full(size_t size) {
+  void* p = do_malloc(size);
+  if (PREDICT_FALSE(p == NULL)) {
+    p = OOMHandler(size);
+  }
+  MallocHook::InvokeNewHook(p, size);
+  return CheckedMallocResult(p);
+}
+
+#define AF(oom) \
+  ATTRIBUTE_SECTION(google_malloc)   \
+  void* allocate_full_##oom(size_t size) {   \
+    return do_allocate_full<oom>(size);     \
+  }
+
+AF(cpp_throw_oom)
+AF(cpp_nothrow_oom)
+AF(malloc_oom)
+
+#undef AF
+
+template <void* OOMHandler(size_t)>
+static ATTRIBUTE_ALWAYS_INLINE inline void* dispatch_allocate_full(size_t size) {
+  if (OOMHandler == cpp_throw_oom) {
+    return allocate_full_cpp_throw_oom(size);
+  }
+  if (OOMHandler == cpp_nothrow_oom) {
+    return allocate_full_cpp_nothrow_oom(size);
+  }
+  ASSERT(OOMHandler == malloc_oom);
+  return allocate_full_malloc_oom(size);
+}
+
+struct retry_memalign_data {
+  size_t align;
+  size_t size;
+};
+
+static void *retry_do_memalign(void *arg) {
+  retry_memalign_data *data = static_cast<retry_memalign_data *>(arg);
+  return do_memalign_pages(data->align, data->size);
+}
+
+static ATTRIBUTE_SECTION(google_malloc)
+void* memalign_pages(size_t align, size_t size,
+                     bool from_operator, bool nothrow) {
+  void *rv = do_memalign_pages(align, size);
+  if (PREDICT_FALSE(rv == NULL)) {
+    retry_memalign_data data;
+    data.align = align;
+    data.size = size;
+    rv = handle_oom(retry_do_memalign, &data,
+                    from_operator, nothrow);
+  }
+  MallocHook::InvokeNewHook(rv, size);
+  return CheckedMallocResult(rv);
+}
+
+} // namespace tcmalloc
+
+// This is a quick, fast-path-only implementation of malloc/new. It
+// checks whether more complex handling is needed (such as a
+// pageheap allocation or sampling) and only performs the
+// allocation if none of those uncommon conditions hold. When we
+// hit one of those odd cases it simply tail-calls one of the
+// tcmalloc::allocate_full_XXX functions defined above.
+//
+// This approach was found to be quite effective. Generated code for
+// tc_{new,malloc} either succeeds quickly or tail-calls
+// allocate_full. The terseness of the source and the lack of
+// non-tail calls enable the compiler to produce better code. The
+// produced code is also short enough for effortless human
+// comprehension, which itself led to the elimination of various
+// checks that were not necessary for the fast path.
+template <void* OOMHandler(size_t)>
+ATTRIBUTE_ALWAYS_INLINE inline
+static void * malloc_fast_path(size_t size) {
+  if (PREDICT_FALSE(!base::internal::new_hooks_.empty())) {
+    return tcmalloc::dispatch_allocate_full<OOMHandler>(size);
+  }
+
+  ThreadCache *cache = ThreadCache::GetFastPathCache();
+
+  if (PREDICT_FALSE(cache == NULL)) {
+    return tcmalloc::dispatch_allocate_full<OOMHandler>(size);
+  }
+
+  uint32 cl;
+  if (PREDICT_FALSE(!Static::sizemap()->GetSizeClass(size, &cl))) {
+    return tcmalloc::dispatch_allocate_full<OOMHandler>(size);
+  }
+
+  size_t allocated_size = Static::sizemap()->ByteSizeForClass(cl);
+
+  if (PREDICT_FALSE(!cache->TryRecordAllocationFast(allocated_size))) {
+    return tcmalloc::dispatch_allocate_full<OOMHandler>(size);
+  }
+
+  return CheckedMallocResult(cache->Allocate(allocated_size, cl, OOMHandler));
+}
+
+template <void* OOMHandler(size_t)>
+ATTRIBUTE_ALWAYS_INLINE inline
+static void* memalign_fast_path(size_t align, size_t size) {
+  if (PREDICT_FALSE(align > kPageSize)) {
+    if (OOMHandler == tcmalloc::cpp_throw_oom) {
+      return tcmalloc::memalign_pages(align, size, true, false);
+    } else if (OOMHandler == tcmalloc::cpp_nothrow_oom) {
+      return tcmalloc::memalign_pages(align, size, true, true);
+    } else {
+      ASSERT(OOMHandler == tcmalloc::malloc_oom);
+      return tcmalloc::memalign_pages(align, size, false, true);
+    }
+  }
+
+  // Everything with alignment <= kPageSize we can easily delegate to
+  // regular malloc
+
+  return malloc_fast_path<OOMHandler>(align_size_up(size, align));
+}
+
+extern "C" PERFTOOLS_DLL_DECL CACHELINE_ALIGNED_FN
+void* tc_malloc(size_t size) PERFTOOLS_NOTHROW {
+  return malloc_fast_path<tcmalloc::malloc_oom>(size);
+}
+
+static ATTRIBUTE_ALWAYS_INLINE inline
+void free_fast_path(void *ptr) {
+  if (PREDICT_FALSE(!base::internal::delete_hooks_.empty())) {
+    tcmalloc::invoke_hooks_and_free(ptr);
+    return;
+  }
+  do_free(ptr);
+}
+
+extern "C" PERFTOOLS_DLL_DECL CACHELINE_ALIGNED_FN
+void tc_free(void* ptr) PERFTOOLS_NOTHROW {
+  free_fast_path(ptr);
+}
+
+extern "C" PERFTOOLS_DLL_DECL CACHELINE_ALIGNED_FN
+void tc_free_sized(void *ptr, size_t size) PERFTOOLS_NOTHROW {
+  if (PREDICT_FALSE(!base::internal::delete_hooks_.empty())) {
+    tcmalloc::invoke_hooks_and_free(ptr);
+    return;
+  }
+#ifndef NO_TCMALLOC_SAMPLES
+  // If ptr is kPageSize-aligned, it could be a sampled allocation,
+  // so we don't trust the size hint and just do a plain free. This
+  // also handles nullptr for us.
+  if (PREDICT_FALSE((reinterpret_cast<uintptr_t>(ptr) & (kPageSize-1)) == 0)) {
+    tc_free(ptr);
+    return;
+  }
+#else
+  if (!ptr) {
+    return;
+  }
+#endif
+  do_free_with_callback(ptr, &InvalidFree, true, size);
+}
+
+#ifdef TC_ALIAS
+
+extern "C" PERFTOOLS_DLL_DECL void tc_delete_sized(void *p, size_t size) PERFTOOLS_NOTHROW
+  TC_ALIAS(tc_free_sized);
+extern "C" PERFTOOLS_DLL_DECL void tc_deletearray_sized(void *p, size_t size) PERFTOOLS_NOTHROW
+  TC_ALIAS(tc_free_sized);
+
+#else
+
+extern "C" PERFTOOLS_DLL_DECL void tc_delete_sized(void *p, size_t size) PERFTOOLS_NOTHROW {
+  tc_free_sized(p, size);
+}
+extern "C" PERFTOOLS_DLL_DECL void tc_deletearray_sized(void *p, size_t size) PERFTOOLS_NOTHROW {
+  tc_free_sized(p, size);
+}
+
+#endif
+
 extern "C" PERFTOOLS_DLL_DECL void* tc_calloc(size_t n,
-                                              size_t elem_size) __THROW {
+                                              size_t elem_size) PERFTOOLS_NOTHROW {
+  if (ThreadCache::IsUseEmergencyMalloc()) {
+    return tcmalloc::EmergencyCalloc(n, elem_size);
+  }
   void* result = do_calloc(n, elem_size);
   MallocHook::InvokeNewHook(result, n * elem_size);
   return result;
 }
 
-extern "C" PERFTOOLS_DLL_DECL void tc_cfree(void* ptr) __THROW {
-  MallocHook::InvokeDeleteHook(ptr);
-  do_free(ptr);
+extern "C" PERFTOOLS_DLL_DECL void tc_cfree(void* ptr) PERFTOOLS_NOTHROW
+#ifdef TC_ALIAS
+TC_ALIAS(tc_free);
+#else
+{
+  free_fast_path(ptr);
 }
+#endif
 
 extern "C" PERFTOOLS_DLL_DECL void* tc_realloc(void* old_ptr,
-                                               size_t new_size) __THROW {
+                                               size_t new_size) PERFTOOLS_NOTHROW {
   if (old_ptr == NULL) {
     void* result = do_malloc_or_cpp_alloc(new_size);
     MallocHook::InvokeNewHook(result, new_size);
@@ -1601,85 +2008,95 @@
     do_free(old_ptr);
     return NULL;
   }
+  if (PREDICT_FALSE(tcmalloc::IsEmergencyPtr(old_ptr))) {
+    return tcmalloc::EmergencyRealloc(old_ptr, new_size);
+  }
   return do_realloc(old_ptr, new_size);
 }
 
-extern "C" PERFTOOLS_DLL_DECL void* tc_new(size_t size) {
-  void* p = cpp_alloc(size, false);
-  // We keep this next instruction out of cpp_alloc for a reason: when
-  // it's in, and new just calls cpp_alloc, the optimizer may fold the
-  // new call into cpp_alloc, which messes up our whole section-based
-  // stacktracing (see ATTRIBUTE_SECTION, above).  This ensures cpp_alloc
-  // isn't the last thing this fn calls, and prevents the folding.
-  MallocHook::InvokeNewHook(p, size);
-  return p;
+extern "C" PERFTOOLS_DLL_DECL CACHELINE_ALIGNED_FN
+void* tc_new(size_t size) {
+  return malloc_fast_path<tcmalloc::cpp_throw_oom>(size);
 }
 
-extern "C" PERFTOOLS_DLL_DECL void* tc_new_nothrow(size_t size, const std::nothrow_t&) __THROW {
-  void* p = cpp_alloc(size, true);
-  MallocHook::InvokeNewHook(p, size);
-  return p;
+extern "C" PERFTOOLS_DLL_DECL CACHELINE_ALIGNED_FN
+void* tc_new_nothrow(size_t size, const std::nothrow_t&) PERFTOOLS_NOTHROW {
+  return malloc_fast_path<tcmalloc::cpp_nothrow_oom>(size);
 }
 
-extern "C" PERFTOOLS_DLL_DECL void tc_delete(void* p) __THROW {
-  MallocHook::InvokeDeleteHook(p);
-  do_free(p);
+extern "C" PERFTOOLS_DLL_DECL void tc_delete(void* p) PERFTOOLS_NOTHROW
+#ifdef TC_ALIAS
+TC_ALIAS(tc_free);
+#else
+{
+  free_fast_path(p);
 }
+#endif
 
 // Standard C++ library implementations define and use this
 // (via ::operator delete(ptr, nothrow)).
 // But it's really the same as normal delete, so we just do the same thing.
-extern "C" PERFTOOLS_DLL_DECL void tc_delete_nothrow(void* p, const std::nothrow_t&) __THROW {
-  MallocHook::InvokeDeleteHook(p);
+extern "C" PERFTOOLS_DLL_DECL void tc_delete_nothrow(void* p, const std::nothrow_t&) PERFTOOLS_NOTHROW
+{
+  if (PREDICT_FALSE(!base::internal::delete_hooks_.empty())) {
+    tcmalloc::invoke_hooks_and_free(p);
+    return;
+  }
   do_free(p);
 }
 
-extern "C" PERFTOOLS_DLL_DECL void* tc_newarray(size_t size) {
-  void* p = cpp_alloc(size, false);
-  // We keep this next instruction out of cpp_alloc for a reason: when
-  // it's in, and new just calls cpp_alloc, the optimizer may fold the
-  // new call into cpp_alloc, which messes up our whole section-based
-  // stacktracing (see ATTRIBUTE_SECTION, above).  This ensures cpp_alloc
-  // isn't the last thing this fn calls, and prevents the folding.
-  MallocHook::InvokeNewHook(p, size);
-  return p;
+extern "C" PERFTOOLS_DLL_DECL void* tc_newarray(size_t size)
+#ifdef TC_ALIAS
+TC_ALIAS(tc_new);
+#else
+{
+  return malloc_fast_path<tcmalloc::cpp_throw_oom>(size);
 }
+#endif
 
 extern "C" PERFTOOLS_DLL_DECL void* tc_newarray_nothrow(size_t size, const std::nothrow_t&)
-    __THROW {
-  void* p = cpp_alloc(size, true);
-  MallocHook::InvokeNewHook(p, size);
-  return p;
+    PERFTOOLS_NOTHROW
+#ifdef TC_ALIAS
+TC_ALIAS(tc_new_nothrow);
+#else
+{
+  return malloc_fast_path<tcmalloc::cpp_nothrow_oom>(size);
 }
+#endif
 
-extern "C" PERFTOOLS_DLL_DECL void tc_deletearray(void* p) __THROW {
-  MallocHook::InvokeDeleteHook(p);
-  do_free(p);
+extern "C" PERFTOOLS_DLL_DECL void tc_deletearray(void* p) PERFTOOLS_NOTHROW
+#ifdef TC_ALIAS
+TC_ALIAS(tc_free);
+#else
+{
+  free_fast_path(p);
 }
+#endif
 
-extern "C" PERFTOOLS_DLL_DECL void tc_deletearray_nothrow(void* p, const std::nothrow_t&) __THROW {
-  MallocHook::InvokeDeleteHook(p);
-  do_free(p);
+extern "C" PERFTOOLS_DLL_DECL void tc_deletearray_nothrow(void* p, const std::nothrow_t&) PERFTOOLS_NOTHROW
+#ifdef TC_ALIAS
+TC_ALIAS(tc_delete_nothrow);
+#else
+{
+  free_fast_path(p);
 }
+#endif
 
-extern "C" PERFTOOLS_DLL_DECL void* tc_memalign(size_t align,
-                                                size_t size) __THROW {
-  void* result = do_memalign_or_cpp_memalign(align, size);
-  MallocHook::InvokeNewHook(result, size);
-  return result;
+extern "C" PERFTOOLS_DLL_DECL CACHELINE_ALIGNED_FN
+void* tc_memalign(size_t align, size_t size) PERFTOOLS_NOTHROW {
+  return memalign_fast_path<tcmalloc::malloc_oom>(align, size);
 }
 
 extern "C" PERFTOOLS_DLL_DECL int tc_posix_memalign(
-    void** result_ptr, size_t align, size_t size) __THROW {
+    void** result_ptr, size_t align, size_t size) PERFTOOLS_NOTHROW {
   if (((align % sizeof(void*)) != 0) ||
       ((align & (align - 1)) != 0) ||
       (align == 0)) {
     return EINVAL;
   }
 
-  void* result = do_memalign_or_cpp_memalign(align, size);
-  MallocHook::InvokeNewHook(result, size);
-  if (UNLIKELY(result == NULL)) {
+  void* result = tc_memalign(align, size);
+  if (PREDICT_FALSE(result == NULL)) {
     return ENOMEM;
   } else {
     *result_ptr = result;
@@ -1687,47 +2104,119 @@
   }
 }
 
-static size_t pagesize = 0;
+#if defined(ENABLE_ALIGNED_NEW_DELETE)
 
-extern "C" PERFTOOLS_DLL_DECL void* tc_valloc(size_t size) __THROW {
-  // Allocate page-aligned object of length >= size bytes
-  if (pagesize == 0) pagesize = getpagesize();
-  void* result = do_memalign_or_cpp_memalign(pagesize, size);
-  MallocHook::InvokeNewHook(result, size);
-  return result;
+extern "C" PERFTOOLS_DLL_DECL void* tc_new_aligned(size_t size, std::align_val_t align) {
+  return memalign_fast_path<tcmalloc::cpp_throw_oom>(static_cast<size_t>(align), size);
 }
 
-extern "C" PERFTOOLS_DLL_DECL void* tc_pvalloc(size_t size) __THROW {
+extern "C" PERFTOOLS_DLL_DECL void* tc_new_aligned_nothrow(size_t size, std::align_val_t align, const std::nothrow_t&) PERFTOOLS_NOTHROW {
+  return memalign_fast_path<tcmalloc::cpp_nothrow_oom>(static_cast<size_t>(align), size);
+}
+
+extern "C" PERFTOOLS_DLL_DECL void tc_delete_aligned(void* p, std::align_val_t) PERFTOOLS_NOTHROW
+{
+  free_fast_path(p);
+}
+
+// There is no easy way to obtain the actual size used by do_memalign to allocate aligned storage, so for now
+// just ignore the size. It might become useful in the future.
+extern "C" PERFTOOLS_DLL_DECL void tc_delete_sized_aligned(void* p, size_t size, std::align_val_t align) PERFTOOLS_NOTHROW
+{
+  free_fast_path(p);
+}
+
+extern "C" PERFTOOLS_DLL_DECL void tc_delete_aligned_nothrow(void* p, std::align_val_t, const std::nothrow_t&) PERFTOOLS_NOTHROW
+{
+  free_fast_path(p);
+}
+
+extern "C" PERFTOOLS_DLL_DECL void* tc_newarray_aligned(size_t size, std::align_val_t align)
+#ifdef TC_ALIAS
+TC_ALIAS(tc_new_aligned);
+#else
+{
+  return memalign_fast_path<tcmalloc::cpp_throw_oom>(static_cast<size_t>(align), size);
+}
+#endif
+
+extern "C" PERFTOOLS_DLL_DECL void* tc_newarray_aligned_nothrow(size_t size, std::align_val_t align, const std::nothrow_t& nt) PERFTOOLS_NOTHROW
+#ifdef TC_ALIAS
+TC_ALIAS(tc_new_aligned_nothrow);
+#else
+{
+  return memalign_fast_path<tcmalloc::cpp_nothrow_oom>(static_cast<size_t>(align), size);
+}
+#endif
+
+extern "C" PERFTOOLS_DLL_DECL void tc_deletearray_aligned(void* p, std::align_val_t) PERFTOOLS_NOTHROW
+#ifdef TC_ALIAS
+TC_ALIAS(tc_delete_aligned);
+#else
+{
+  free_fast_path(p);
+}
+#endif
+
+// There is no easy way to obtain the actual size used by do_memalign to allocate aligned storage, so for now
+// just ignore the size. It might become useful in the future.
+extern "C" PERFTOOLS_DLL_DECL void tc_deletearray_sized_aligned(void* p, size_t size, std::align_val_t align) PERFTOOLS_NOTHROW
+#ifdef TC_ALIAS
+TC_ALIAS(tc_delete_sized_aligned);
+#else
+{
+  free_fast_path(p);
+}
+#endif
+
+extern "C" PERFTOOLS_DLL_DECL void tc_deletearray_aligned_nothrow(void* p, std::align_val_t, const std::nothrow_t&) PERFTOOLS_NOTHROW
+#ifdef TC_ALIAS
+TC_ALIAS(tc_delete_aligned_nothrow);
+#else
+{
+  free_fast_path(p);
+}
+#endif
+
+#endif // defined(ENABLE_ALIGNED_NEW_DELETE)
+
+static size_t pagesize = 0;
+
+extern "C" PERFTOOLS_DLL_DECL void* tc_valloc(size_t size) PERFTOOLS_NOTHROW {
+  // Allocate page-aligned object of length >= size bytes
+  if (pagesize == 0) pagesize = getpagesize();
+  return tc_memalign(pagesize, size);
+}
+
+extern "C" PERFTOOLS_DLL_DECL void* tc_pvalloc(size_t size) PERFTOOLS_NOTHROW {
   // Round up size to a multiple of pagesize
   if (pagesize == 0) pagesize = getpagesize();
   if (size == 0) {     // pvalloc(0) should allocate one page, according to
     size = pagesize;   // http://man.free4web.biz/man3/libmpatrol.3.html
   }
   size = (size + pagesize - 1) & ~(pagesize - 1);
-  void* result = do_memalign_or_cpp_memalign(pagesize, size);
-  MallocHook::InvokeNewHook(result, size);
-  return result;
+  return tc_memalign(pagesize, size);
 }
 
-extern "C" PERFTOOLS_DLL_DECL void tc_malloc_stats(void) __THROW {
+extern "C" PERFTOOLS_DLL_DECL void tc_malloc_stats(void) PERFTOOLS_NOTHROW {
   do_malloc_stats();
 }
 
-extern "C" PERFTOOLS_DLL_DECL int tc_mallopt(int cmd, int value) __THROW {
+extern "C" PERFTOOLS_DLL_DECL int tc_mallopt(int cmd, int value) PERFTOOLS_NOTHROW {
   return do_mallopt(cmd, value);
 }
 
 #ifdef HAVE_STRUCT_MALLINFO
-extern "C" PERFTOOLS_DLL_DECL struct mallinfo tc_mallinfo(void) __THROW {
+extern "C" PERFTOOLS_DLL_DECL struct mallinfo tc_mallinfo(void) PERFTOOLS_NOTHROW {
   return do_mallinfo();
 }
 #endif
 
-extern "C" PERFTOOLS_DLL_DECL size_t tc_malloc_size(void* ptr) __THROW {
+extern "C" PERFTOOLS_DLL_DECL size_t tc_malloc_size(void* ptr) PERFTOOLS_NOTHROW {
   return MallocExtension::instance()->GetAllocatedSize(ptr);
 }
 
-extern "C" PERFTOOLS_DLL_DECL void* tc_malloc_skip_new_handler(size_t size)  __THROW {
+extern "C" PERFTOOLS_DLL_DECL void* tc_malloc_skip_new_handler(size_t size)  PERFTOOLS_NOTHROW {
   void* result = do_malloc(size);
   MallocHook::InvokeNewHook(result, size);
   return result;
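
Reviewer note: several hunks above rely on power-of-two bit arithmetic: the alignment validation in tc_posix_memalign, the page rounding in tc_pvalloc, and the page-alignment test in tc_free_sized. The following is a minimal standalone sketch of those three checks, not code from the patch. The helper names and kExamplePageSize are hypothetical; the real kPageSize is a tcmalloc build-time constant and may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical page size used only for this illustration.
constexpr std::size_t kExamplePageSize = 8192;

// True if value is a nonzero power of two. This is the
// (align & (align - 1)) test from tc_posix_memalign plus its zero check.
constexpr bool IsPowerOfTwo(std::size_t value) {
  return value != 0 && (value & (value - 1)) == 0;
}

// Mirrors the argument validation in tc_posix_memalign: the alignment
// must be a power of two and a multiple of sizeof(void*).
constexpr bool IsValidPosixMemalignAlignment(std::size_t align) {
  return IsPowerOfTwo(align) && (align % sizeof(void*)) == 0;
}

// Rounds size up to a multiple of pagesize, as tc_pvalloc does.
// The mask trick is only correct when pagesize is a power of two.
inline std::size_t RoundUpToPage(std::size_t size, std::size_t pagesize) {
  assert(IsPowerOfTwo(pagesize));
  return (size + pagesize - 1) & ~(pagesize - 1);
}

// Mirrors the test in tc_free_sized: a page-aligned pointer has all low
// bits clear. A null pointer also passes, which is why that branch
// handles nullptr without a separate check.
inline bool IsPageAligned(const void* ptr) {
  return (reinterpret_cast<std::uintptr_t>(ptr) & (kExamplePageSize - 1)) == 0;
}
```

For instance, RoundUpToPage(4097, 4096) yields 8192, and IsPageAligned(nullptr) is true.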

diff --git a/src/tcmalloc.h b/src/tcmalloc.h
index 2d64f4e..016c805 100644
--- a/src/tcmalloc.h
+++ b/src/tcmalloc.h

@@ -53,18 +53,18 @@
 # define __THROW   // __THROW is just an optimization, so ok to make it ""
 #endif
 
-#if !HAVE_CFREE_SYMBOL
+#if !HAVE_DECL_CFREE
 extern "C" void cfree(void* ptr) __THROW;
 #endif
-#if !HAVE_POSIX_MEMALIGN_SYMBOL
+#if !HAVE_DECL_POSIX_MEMALIGN
 extern "C" int posix_memalign(void** ptr, size_t align, size_t size) __THROW;
 #endif
-#if !HAVE_MEMALIGN_SYMBOL
+#if !HAVE_DECL_MEMALIGN
 extern "C" void* memalign(size_t __alignment, size_t __size) __THROW;
 #endif
-#if !HAVE_VALLOC_SYMBOL
+#if !HAVE_DECL_VALLOC
 extern "C" void* valloc(size_t __size) __THROW;
 #endif
-#if !HAVE_PVALLOC_SYMBOL
+#if !HAVE_DECL_PVALLOC
 extern "C" void* pvalloc(size_t __size) __THROW;
 #endif

diff --git a/src/tests/addressmap_unittest.cc b/src/tests/addressmap_unittest.cc
index a847dd6..45781f3 100644
--- a/src/tests/addressmap_unittest.cc
+++ b/src/tests/addressmap_unittest.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2005, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -34,6 +34,7 @@
 #include <stdlib.h>   // for rand()
 #include <vector>
 #include <set>
+#include <random>
 #include <algorithm>
 #include <utility>
 #include "addressmap-inl.h"
@@ -47,7 +48,7 @@
 using std::make_pair;
 using std::vector;
 using std::set;
-using std::random_shuffle;
+using std::shuffle;
 
 struct UniformRandomNumberGenerator {
   size_t Uniform(size_t max_size) {
@@ -91,7 +92,9 @@
     RAW_LOG(INFO, "Iteration %d/%d...\n", x, FLAGS_iters);
 
     // Permute pointers to get rid of allocation order issues
-    random_shuffle(ptrs_and_sizes.begin(), ptrs_and_sizes.end());
+    std::random_device rd;
+    std::mt19937 g(rd());
+    shuffle(ptrs_and_sizes.begin(), ptrs_and_sizes.end(), g);
 
     AddressMap<ValueT> map(malloc, free);
     const ValueT* result;
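
Reviewer note: the hunk above replaces std::random_shuffle (deprecated in C++14, removed in C++17) with std::shuffle and an explicit engine. The following is a minimal sketch of the same pattern, assuming a caller-supplied seed instead of std::random_device so that a failing test run could be reproduced. ShuffledCopy is a hypothetical helper, not part of the test.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Returns a shuffled copy of `in`. The seed is passed in explicitly so
// the same seed gives the same order on a given standard library.
inline std::vector<int> ShuffledCopy(const std::vector<int>& in,
                                     unsigned seed) {
  std::vector<int> out = in;
  std::mt19937 gen(seed);  // explicit engine, as std::shuffle requires
  std::shuffle(out.begin(), out.end(), gen);
  // Sanity check: shuffling only reorders elements.
  assert(std::is_permutation(out.begin(), out.end(), in.begin()));
  return out;
}
```

Seeding from std::random_device, as the patch does, gives a different permutation on each run. A fixed seed trades that variety for reproducibility.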

diff --git a/src/tests/atomicops_unittest.cc b/src/tests/atomicops_unittest.cc
index aa82a6b..22be839 100644
--- a/src/tests/atomicops_unittest.cc
+++ b/src/tests/atomicops_unittest.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2006, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -104,11 +104,6 @@
   base::subtle::NoBarrier_Store(&value, kVal2);
   ASSERT_EQ(kVal2, value);
 
-  base::subtle::Acquire_Store(&value, kVal1);
-  ASSERT_EQ(kVal1, value);
-  base::subtle::Acquire_Store(&value, kVal2);
-  ASSERT_EQ(kVal2, value);
-
   base::subtle::Release_Store(&value, kVal1);
   ASSERT_EQ(kVal1, value);
   base::subtle::Release_Store(&value, kVal2);
@@ -133,11 +128,6 @@
   ASSERT_EQ(kVal1, base::subtle::Acquire_Load(&value));
   value = kVal2;
   ASSERT_EQ(kVal2, base::subtle::Acquire_Load(&value));
-
-  value = kVal1;
-  ASSERT_EQ(kVal1, base::subtle::Release_Load(&value));
-  value = kVal2;
-  ASSERT_EQ(kVal2, base::subtle::Release_Load(&value));
 }
 
 template <class AtomicType>

diff --git a/src/tests/current_allocated_bytes_test.cc b/src/tests/current_allocated_bytes_test.cc
index eaa6a7b..49b7dc3 100644
--- a/src/tests/current_allocated_bytes_test.cc
+++ b/src/tests/current_allocated_bytes_test.cc

@@ -46,12 +46,12 @@
 #include <gperftools/malloc_extension.h>
 #include "base/logging.h"
 
-const char kCurrent[] = "generic.current_allocated_bytes";
-
 int main() {
   // We don't do accounting right when using debugallocation.cc, so
   // turn off the test then.  TODO(csilvers): get this working too.
 #ifdef NDEBUG
+  static const char kCurrent[] = "generic.current_allocated_bytes";
+
   size_t before_bytes, after_bytes;
   MallocExtension::instance()->GetNumericProperty(kCurrent, &before_bytes);
   free(malloc(200));

diff --git a/src/tests/debugallocation_test.cc b/src/tests/debugallocation_test.cc
index d935dbb..1e45db6 100644
--- a/src/tests/debugallocation_test.cc
+++ b/src/tests/debugallocation_test.cc

@@ -38,6 +38,7 @@
 #include "gperftools/malloc_extension.h"
 #include "gperftools/tcmalloc.h"
 #include "base/logging.h"
+#include "tests/testutil.h"
 
 using std::vector;
 
@@ -91,7 +92,7 @@
 
   // Allocate with malloc.
   {
-    int* x = static_cast<int*>(malloc(sizeof(*x)));
+    int* x = static_cast<int*>(noopt(malloc(sizeof(*x))));
     IF_DEBUG_EXPECT_DEATH(delete x, "mismatch.*being dealloc.*delete");
     IF_DEBUG_EXPECT_DEATH(delete [] x, "mismatch.*being dealloc.*delete *[[]");
     // Should work fine.
@@ -100,8 +101,8 @@
 
   // Allocate with new.
   {
-    int* x = new int;
-    int* y = new int;
+    int* x = noopt(new int);
+    int* y = noopt(new int);
     IF_DEBUG_EXPECT_DEATH(free(x), "mismatch.*being dealloc.*free");
     IF_DEBUG_EXPECT_DEATH(delete [] x, "mismatch.*being dealloc.*delete *[[]");
     delete x;
@@ -110,8 +111,8 @@
 
   // Allocate with new[].
   {
-    int* x = new int[1];
-    int* y = new int[1];
+    int* x = noopt(new int[1]);
+    int* y = noopt(new int[1]);
     IF_DEBUG_EXPECT_DEATH(free(x), "mismatch.*being dealloc.*free");
     IF_DEBUG_EXPECT_DEATH(delete x, "mismatch.*being dealloc.*delete");
     delete [] x;
@@ -120,8 +121,8 @@
 
   // Allocate with new(nothrow).
   {
-    int* x = new(std::nothrow) int;
-    int* y = new(std::nothrow) int;
+    int* x = noopt(new (std::nothrow) int);
+    int* y = noopt(new (std::nothrow) int);
     IF_DEBUG_EXPECT_DEATH(free(x), "mismatch.*being dealloc.*free");
     IF_DEBUG_EXPECT_DEATH(delete [] x, "mismatch.*being dealloc.*delete *[[]");
     delete x;
@@ -130,8 +131,8 @@
 
   // Allocate with new(nothrow)[].
   {
-    int* x = new(std::nothrow) int[1];
-    int* y = new(std::nothrow) int[1];
+    int* x = noopt(new (std::nothrow) int[1]);
+    int* y = noopt(new (std::nothrow) int[1]);
     IF_DEBUG_EXPECT_DEATH(free(x), "mismatch.*being dealloc.*free");
     IF_DEBUG_EXPECT_DEATH(delete x, "mismatch.*being dealloc.*delete");
     delete [] x;
@@ -141,13 +142,13 @@
 #endif  // #ifdef OS_MACOSX
 
 TEST(DebugAllocationTest, DoubleFree) {
-  int* pint = new int;
+  int* pint = noopt(new int);
   delete pint;
   IF_DEBUG_EXPECT_DEATH(delete pint, "has been already deallocated");
 }
 
 TEST(DebugAllocationTest, StompBefore) {
-  int* pint = new int;
+  int* pint = noopt(new int);
 #ifndef NDEBUG   // don't stomp memory if we're not in a position to detect it
   pint[-1] = 5;
   IF_DEBUG_EXPECT_DEATH(delete pint, "a word before object");
@@ -155,7 +156,7 @@
 }
 
 TEST(DebugAllocationTest, StompAfter) {
-  int* pint = new int;
+  int* pint = noopt(new int);
 #ifndef NDEBUG   // don't stomp memory if we're not in a position to detect it
   pint[1] = 5;
   IF_DEBUG_EXPECT_DEATH(delete pint, "a word after object");
@@ -164,10 +165,10 @@
 
 TEST(DebugAllocationTest, FreeQueueTest) {
   // Verify that the allocator doesn't return blocks that were recently freed.
-  int* x = new int;
+  int* x = noopt(new int);
   int* old_x = x;
   delete x;
-  x = new int;
+  x = noopt(new int);
   #if 1
     // This check should not be read as a universal guarantee of behavior.  If
     // other threads are executing, it would be theoretically possible for this
@@ -191,12 +192,12 @@
   // safe.  When debugging, we expect the (trashed) deleted block to be on the
   // list of recently-freed blocks, so the following 'new' will be safe.
 #if 1
-  int* x = new int;
+  int* x = noopt(new int);
   delete x;
   int poisoned_x_value = *x;
   *x = 1;  // a dangling write.
 
-  char* s = new char[FLAGS_max_free_queue_size];
+  char* s = noopt(new char[FLAGS_max_free_queue_size]);
   // When we delete s, we push the storage that was previously allocated to x
   // off the end of the free queue.  At that point, the write to that memory
   // will be detected.
@@ -210,7 +211,7 @@
 }
 
 TEST(DebugAllocationTest, DanglingWriteAtExitTest) {
-  int *x = new int;
+  int *x = noopt(new int);
   delete x;
   int old_x_value = *x;
   *x = 1;
@@ -221,7 +222,7 @@
 }
 
 TEST(DebugAllocationTest, StackTraceWithDanglingWriteAtExitTest) {
-  int *x = new int;
+  int *x = noopt(new int);
   delete x;
   int old_x_value = *x;
   *x = 1;
@@ -244,13 +245,13 @@
   FLAGS_max_free_queue_size = 0;
   // Force a round-trip through the queue management code so that the
   // new size is seen and the queue of recently-freed blocks is flushed.
-  free(malloc(1));
+  free(noopt(malloc(1)));
   FLAGS_max_free_queue_size = 1048576;
 #endif
 
   // Free something and check that it disappears from allocated bytes
   // immediately.
-  char* p = new char[1000];
+  char* p = noopt(new char[1000]);
   size_t after_malloc = CurrentlyAllocatedBytes();
   delete[] p;
   size_t after_free = CurrentlyAllocatedBytes();
@@ -263,12 +264,12 @@
   // exactly requested size, since debug_allocation doesn't allow users
   // to write more than that.
   for (int i = 0; i < 10; ++i) {
-    void *p = malloc(i);
+    void *p = noopt(malloc(i));
     EXPECT_EQ(i, MallocExtension::instance()->GetAllocatedSize(p));
     free(p);
   }
 #endif
-  void* a = malloc(1000);
+  void* a = noopt(malloc(1000));
   EXPECT_GE(MallocExtension::instance()->GetAllocatedSize(a), 1000);
   // This is just a sanity check.  If we allocated too much, alloc is broken
   EXPECT_LE(MallocExtension::instance()->GetAllocatedSize(a), 5000);
@@ -285,7 +286,7 @@
 
 #ifndef NDEBUG
 
-  a = malloc(kTooBig);
+  a = noopt(malloc(noopt(kTooBig)));
   EXPECT_EQ(NULL, a);
 
   // kAlsoTooBig is small enough not to get caught by debugallocation's check,
@@ -293,7 +294,7 @@
   // a non-const variable. See kTooBig for more details.
   size_t kAlsoTooBig = kTooBig - 1024;
 
-  a = malloc(kAlsoTooBig);
+  a = noopt(malloc(noopt(kAlsoTooBig)));
   EXPECT_EQ(NULL, a);
 #endif
 }
@@ -307,7 +308,7 @@
   EXPECT_NE(p, NULL);
   memcpy(stuff, p, sizeof(stuff));
 
-  p = realloc(p, sizeof(stuff) + 10);
+  p = noopt(realloc(p, sizeof(stuff) + 10));
   EXPECT_NE(p, NULL);
 
   int rv = memcmp(stuff, p, sizeof(stuff));
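
Reviewer note: the noopt() wrappers added throughout this test come from tests/testutil.h, which is not shown in this diff. Their apparent purpose is to keep the compiler from eliding or folding allocation calls whose results it can otherwise reason about. The following is a hypothetical stand-in that illustrates one way to get that effect for scalar and pointer values. HideFromOptimizer is an assumed name and is not the testutil.h implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: passes a scalar or pointer through a volatile
// object. The volatile store and load must be performed, so the
// optimizer cannot assume the returned value equals the argument.
template <typename T>
T HideFromOptimizer(T value) {
  volatile T slot = value;
  return slot;
}

// Example use: the allocation result is opaque to the optimizer, so the
// malloc/free pair is less likely to be removed as dead code.
inline bool MallocRoundTrip(std::size_t bytes) {
  void* p = HideFromOptimizer(std::malloc(HideFromOptimizer(bytes)));
  bool ok = (p != nullptr);
  assert(ok || bytes > 0);  // a small request is expected to succeed
  std::free(p);
  return ok;
}
```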

diff --git a/src/tests/debugallocation_test.sh b/src/tests/debugallocation_test.sh
index faa6c79..0f94ad0 100755
--- a/src/tests/debugallocation_test.sh
+++ b/src/tests/debugallocation_test.sh

@@ -33,6 +33,9 @@
 # Author: Craig Silverstein
 
 BINDIR="${BINDIR:-.}"
+# We expect PPROF_PATH to be set in the environment.
+# If not, we set it to some reasonable value
+export PPROF_PATH="${PPROF_PATH:-$BINDIR/src/pprof}"
 
 if [ "x$1" = "x-h" -o "x$1" = "x--help" ]; then
   echo "USAGE: $0 [unittest dir]"

diff --git a/src/tests/frag_unittest.cc b/src/tests/frag_unittest.cc
index c4016f9..6d2619f 100644
--- a/src/tests/frag_unittest.cc
+++ b/src/tests/frag_unittest.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2003, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/tests/heap-checker-death_unittest.sh b/src/tests/heap-checker-death_unittest.sh
index 752a7ad..69db0c9 100755
--- a/src/tests/heap-checker-death_unittest.sh
+++ b/src/tests/heap-checker-death_unittest.sh

@@ -157,7 +157,7 @@
 
 # Test that we produce a reasonable textual leak report.
 Test 60 1 "MakeALeak" "" \
-          HEAP_CHECKER_TEST_TEST_LEAK=1 HEAP_CHECK_TEST_NO_THREADS=1 \
+          HEAP_CHECKER_TEST_TEST_LEAK=1 HEAP_CHECKER_TEST_NO_THREADS=1 \
   || exit 10
 
 # Test that very early log messages are present and controllable:

diff --git a/src/tests/heap-checker_unittest.cc b/src/tests/heap-checker_unittest.cc
index 8c8f865..b4bcb52 100644
--- a/src/tests/heap-checker_unittest.cc
+++ b/src/tests/heap-checker_unittest.cc

@@ -787,9 +787,9 @@
 static void DirectTestSTLAlloc(Alloc allocator, const char* name) {
   HeapLeakChecker check((string("direct_stl-") + name).c_str());
   static const int kSize = 1000;
-  typename Alloc::pointer ptrs[kSize];
+  typename Alloc::value_type* ptrs[kSize];
   for (int i = 0; i < kSize; ++i) {
-    typename Alloc::pointer p = allocator.allocate(i*3+1);
+    typename Alloc::value_type* p = allocator.allocate(i*3+1);
     HeapLeakChecker::IgnoreObject(p);
     // This will crash if p is not known to heap profiler:
     // (i.e. STL's "allocator" does not have a direct hook to heap profiler)
@@ -1298,7 +1298,7 @@
 #endif
 
 // to trick complier into preventing inlining
-static void* (*mmapper_addr)(uintptr_t* addr) = &Mmapper;
+static void* (* volatile mmapper_addr)(uintptr_t* addr) = &Mmapper;
 
 // TODO(maxim): copy/move this to memory_region_map_unittest
 // TODO(maxim): expand this test to include mmap64, mremap and sbrk calls.
@@ -1338,8 +1338,8 @@
   return r;
 }
 
-// to trick complier into preventing inlining
-static void* (*mallocer_addr)(uintptr_t* addr) = &Mallocer;
+// to trick compiler into preventing inlining
+static void* (* volatile mallocer_addr)(uintptr_t* addr) = &Mallocer;
 
 // non-static for friendship with HeapProfiler
 // TODO(maxim): expand this test to include
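
Reviewer note: the two hunks above add volatile to the function pointers used to defeat inlining. Without it, a static pointer that is never reassigned can be constant-folded, and the call may then be inlined anyway. The following is a minimal sketch of the idiom with hypothetical names (AddOne, add_one_addr), not code from the test.

```cpp
#include <cassert>

static int AddOne(int x) { return x + 1; }

// The pointer object itself is volatile (note the placement after '*'),
// so each use must reload it and the compiler cannot assume it still
// points at AddOne.
static int (* volatile add_one_addr)(int) = &AddOne;

inline int CallThroughVolatilePointer(int x) {
  int (*fn)(int) = add_one_addr;  // one volatile load
  assert(fn != nullptr);
  return fn(x);                   // indirect call through the loaded value
}
```

Writing `volatile int (*p)(int)` would instead qualify the return type and leave the pointer itself foldable, which is why the patch places the qualifier after the asterisk.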

diff --git a/src/tests/heap-profiler_unittest.cc b/src/tests/heap-profiler_unittest.cc
index 3317813..addb5f1 100644
--- a/src/tests/heap-profiler_unittest.cc
+++ b/src/tests/heap-profiler_unittest.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2005, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/tests/heap-profiler_unittest.sh b/src/tests/heap-profiler_unittest.sh
index b4c2e9f..91af04f 100755
--- a/src/tests/heap-profiler_unittest.sh
+++ b/src/tests/heap-profiler_unittest.sh

@@ -54,14 +54,11 @@
 
 HEAP_PROFILER="${1:-$BINDIR/heap-profiler_unittest}"
 PPROF="${2:-$PPROF_PATH}"
-TEST_TMPDIR=/tmp/heap_profile_info
+TEST_TMPDIR=`mktemp -d /tmp/heap-profiler_unittest.XXXXXX`
 
 # It's meaningful to the profiler, so make sure we know its state
 unset HEAPPROFILE
 
-rm -rf "$TEST_TMPDIR"
-mkdir "$TEST_TMPDIR" || exit 2
-
 num_failures=0
 
 # Given one profile (to check the contents of that profile) or two
@@ -140,7 +137,7 @@
 # testing of the HeapProfileStart/Stop functionality.
 $HEAP_PROFILER >"$TEST_TMPDIR/output2" 2>&1
 
-rm -rf $TMPDIR      # clean up
+rm -rf $TEST_TMPDIR      # clean up
 
 if [ $num_failures = 0 ]; then
   echo "PASS"

diff --git a/src/tests/low_level_alloc_unittest.cc b/src/tests/low_level_alloc_unittest.cc
index e3cb555..0474441 100644
--- a/src/tests/low_level_alloc_unittest.cc
+++ b/src/tests/low_level_alloc_unittest.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2006, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/tests/malloc_extension_c_test.c b/src/tests/malloc_extension_c_test.c
index 278fdb7..2868b9c 100644
--- a/src/tests/malloc_extension_c_test.c
+++ b/src/tests/malloc_extension_c_test.c

@@ -1,11 +1,11 @@
 /* -*- Mode: C; c-basic-offset: 2; indent-tabs-mode: nil -*- */
 /* Copyright (c) 2009, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/tests/malloc_extension_test.cc b/src/tests/malloc_extension_test.cc
index 31c4968..6570772 100644
--- a/src/tests/malloc_extension_test.cc
+++ b/src/tests/malloc_extension_test.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2008, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/tests/markidle_unittest.cc b/src/tests/markidle_unittest.cc
index 827609f..829c503 100644
--- a/src/tests/markidle_unittest.cc
+++ b/src/tests/markidle_unittest.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2003, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -93,9 +93,26 @@
   CHECK_LE(post_idle, original);
 
   // Log after testing because logging can allocate heap memory.
-  VLOG(0, "Original usage: %" PRIuS "\n", original);
-  VLOG(0, "Post allocation: %" PRIuS "\n", post_allocation);
-  VLOG(0, "Post idle: %" PRIuS "\n", post_idle);
+  VLOG(0, "Original usage: %zu\n", original);
+  VLOG(0, "Post allocation: %zu\n", post_allocation);
+  VLOG(0, "Post idle: %zu\n", post_idle);
+}
+
+static void TestTemporarilyIdleUsage() {
+  const size_t original = MallocExtension::instance()->GetThreadCacheSize();
+
+  TestAllocation();
+  const size_t post_allocation = MallocExtension::instance()->GetThreadCacheSize();
+  CHECK_GT(post_allocation, original);
+
+  MallocExtension::instance()->MarkThreadIdle();
+  const size_t post_idle = MallocExtension::instance()->GetThreadCacheSize();
+  CHECK_EQ(post_idle, 0);
+
+  // Log after testing because logging can allocate heap memory.
+  VLOG(0, "Original usage: %zu\n", original);
+  VLOG(0, "Post allocation: %zu\n", post_allocation);
+  VLOG(0, "Post idle: %zu\n", post_idle);
 }
 
 int main(int argc, char** argv) {
@@ -103,6 +120,7 @@
   RunThread(&TestAllocation);
   RunThread(&MultipleIdleCalls);
   RunThread(&MultipleIdleNonIdlePhases);
+  RunThread(&TestTemporarilyIdleUsage);
 
   printf("PASS\n");
   return 0;

diff --git a/src/tests/memalign_unittest.cc b/src/tests/memalign_unittest.cc
index 309a3df..035f709 100644
--- a/src/tests/memalign_unittest.cc
+++ b/src/tests/memalign_unittest.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2004, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/tests/packed-cache_test.cc b/src/tests/packed-cache_test.cc
index befbd77..3984594 100644
--- a/src/tests/packed-cache_test.cc
+++ b/src/tests/packed-cache_test.cc

@@ -35,24 +35,43 @@
 #include "base/logging.h"
 #include "packed-cache-inl.h"
 
-static const int kHashbits = PackedCache<64, uint64>::kHashbits;
+static const int kHashbits = PackedCache<20>::kHashbits;
+
+template <int kKeybits>
+static size_t MustGet(const PackedCache<kKeybits>& cache, uintptr_t key) {
+  uint32 rv;
+  CHECK(cache.TryGet(key, &rv));
+  return rv;
+}
+
+template <int kKeybits>
+static size_t Has(const PackedCache<kKeybits>& cache, uintptr_t key) {
+  uint32 dummy;
+  return cache.TryGet(key, &dummy);
+}
 
 // A basic sanity test.
 void PackedCacheTest_basic() {
-  PackedCache<32, uint32> cache(0);
-  CHECK_EQ(cache.GetOrDefault(0, 1), 0);
+  PackedCache<20> cache;
+
+  CHECK(!Has(cache, 0));
   cache.Put(0, 17);
-  CHECK(cache.Has(0));
-  CHECK_EQ(cache.GetOrDefault(0, 1), 17);
+  CHECK(Has(cache, 0));
+  CHECK_EQ(MustGet(cache, 0), 17);
+
   cache.Put(19, 99);
-  CHECK(cache.Has(0) && cache.Has(19));
-  CHECK_EQ(cache.GetOrDefault(0, 1), 17);
-  CHECK_EQ(cache.GetOrDefault(19, 1), 99);
+  CHECK_EQ(MustGet(cache, 0), 17);
+  CHECK_EQ(MustGet(cache, 19), 99);
+
   // Knock <0, 17> out by using a conflicting key.
   cache.Put(1 << kHashbits, 22);
-  CHECK(!cache.Has(0));
-  CHECK_EQ(cache.GetOrDefault(0, 1), 1);
-  CHECK_EQ(cache.GetOrDefault(1 << kHashbits, 1), 22);
+  CHECK(!Has(cache, 0));
+  CHECK_EQ(MustGet(cache, 1 << kHashbits), 22);
+
+  cache.Invalidate(19);
+  CHECK(!Has(cache, 19));
+  CHECK(!Has(cache, 0));
+  CHECK(Has(cache, 1 << kHashbits));
 }
 
 int main(int argc, char **argv) {

diff --git a/src/tests/page_heap_test.cc b/src/tests/page_heap_test.cc
index e82a1da..3caacc0 100644
--- a/src/tests/page_heap_test.cc
+++ b/src/tests/page_heap_test.cc

@@ -6,9 +6,13 @@
 // be found in the LICENSE file.
 
 #include "config_for_unittests.h"
+
+#include <stdio.h>
+
+#include <memory>
+
 #include "page_heap.h"
 #include "system-alloc.h"
-#include <stdio.h>
 #include "base/logging.h"
 #include "common.h"
 
@@ -39,33 +43,63 @@
 }
 
 static void TestPageHeap_Stats() {
-  tcmalloc::PageHeap* ph = new tcmalloc::PageHeap();
+  std::unique_ptr<tcmalloc::PageHeap> ph(new tcmalloc::PageHeap());
 
   // Empty page heap
-  CheckStats(ph, 0, 0, 0);
+  CheckStats(ph.get(), 0, 0, 0);
 
   // Allocate a span 's1'
   tcmalloc::Span* s1 = ph->New(256);
-  CheckStats(ph, 256, 0, 0);
+  CheckStats(ph.get(), 256, 0, 0);
 
   // Split span 's1' into 's1', 's2'.  Delete 's2'
   tcmalloc::Span* s2 = ph->Split(s1, 128);
   ph->Delete(s2);
-  CheckStats(ph, 256, 128, 0);
+  CheckStats(ph.get(), 256, 128, 0);
 
   // Unmap deleted span 's2'
   ph->ReleaseAtLeastNPages(1);
-  CheckStats(ph, 256, 0, 128);
+  CheckStats(ph.get(), 256, 0, 128);
 
   // Delete span 's1'
   ph->Delete(s1);
-  CheckStats(ph, 256, 128, 128);
+  CheckStats(ph.get(), 256, 128, 128);
+}
 
-  delete ph;
+// The number of kMaxPages-sized Spans we will allocate and free during the
+// tests.
+// We will also do twice this many kMaxPages/2-sized ones.
+static constexpr int kNumberMaxPagesSpans = 10;
+
+// Allocates all the last-level page tables we will need. Doing this before
+// calculating the base heap usage is necessary, because otherwise if any of
+// these are allocated during the main test it will throw the heap usage
+// calculations off and cause the test to fail.
+static void AllocateAllPageTables() {
+  // Make a separate PageHeap from the main test so the test can start without
+  // any pages in the lists.
+  std::unique_ptr<tcmalloc::PageHeap> ph(new tcmalloc::PageHeap());
+  tcmalloc::Span *spans[kNumberMaxPagesSpans * 2];
+  for (int i = 0; i < kNumberMaxPagesSpans; ++i) {
+    spans[i] = ph->New(kMaxPages);
+    EXPECT_NE(spans[i], NULL);
+  }
+  for (int i = 0; i < kNumberMaxPagesSpans; ++i) {
+    ph->Delete(spans[i]);
+  }
+  for (int i = 0; i < kNumberMaxPagesSpans * 2; ++i) {
+    spans[i] = ph->New(kMaxPages >> 1);
+    EXPECT_NE(spans[i], NULL);
+  }
+  for (int i = 0; i < kNumberMaxPagesSpans * 2; ++i) {
+    ph->Delete(spans[i]);
+  }
 }
 
 static void TestPageHeap_Limit() {
-  tcmalloc::PageHeap* ph = new tcmalloc::PageHeap();
+  AllocateAllPageTables();
+
+  std::unique_ptr<tcmalloc::PageHeap> ph(new tcmalloc::PageHeap());
 
   CHECK_EQ(kMaxPages, 1 << (20 - kPageShift));
 
@@ -77,25 +111,26 @@
     while((s = ph->New(kMaxPages)) == NULL) {
       FLAGS_tcmalloc_heap_limit_mb++;
     }
-    FLAGS_tcmalloc_heap_limit_mb += 9;
+    FLAGS_tcmalloc_heap_limit_mb += kNumberMaxPagesSpans - 1;
     ph->Delete(s);
     // We are [10, 11) mb from the limit now.
   }
 
   // Test AllocLarge and GrowHeap first:
   {
-    tcmalloc::Span * spans[10];
-    for (int i=0; i<10; ++i) {
+    tcmalloc::Span * spans[kNumberMaxPagesSpans];
+    for (int i=0; i<kNumberMaxPagesSpans; ++i) {
       spans[i] = ph->New(kMaxPages);
       EXPECT_NE(spans[i], NULL);
     }
     EXPECT_EQ(ph->New(kMaxPages), NULL);
 
-    for (int i=0; i<10; i += 2) {
+    for (int i=0; i<kNumberMaxPagesSpans; i += 2) {
       ph->Delete(spans[i]);
     }
 
-    tcmalloc::Span *defragmented = ph->New(5 * kMaxPages);
+    tcmalloc::Span *defragmented =
+        ph->New(kNumberMaxPagesSpans / 2 * kMaxPages);
 
     if (HaveSystemRelease) {
       // EnsureLimit should release deleted normal spans
@@ -109,15 +144,15 @@
       EXPECT_TRUE(ph->CheckExpensive());
     }
 
-    for (int i=1; i<10; i += 2) {
+    for (int i=1; i<kNumberMaxPagesSpans; i += 2) {
       ph->Delete(spans[i]);
     }
   }
 
   // Once again, testing small lists this time (twice smaller spans):
   {
-    tcmalloc::Span * spans[20];
-    for (int i=0; i<20; ++i) {
+    tcmalloc::Span * spans[kNumberMaxPagesSpans * 2];
+    for (int i=0; i<kNumberMaxPagesSpans * 2; ++i) {
       spans[i] = ph->New(kMaxPages >> 1);
       EXPECT_NE(spans[i], NULL);
     }
@@ -125,12 +160,12 @@
     tcmalloc::Span * lastHalf = ph->New(kMaxPages >> 1);
     EXPECT_EQ(ph->New(kMaxPages >> 1), NULL);
 
-    for (int i=0; i<20; i += 2) {
+    for (int i=0; i<kNumberMaxPagesSpans * 2; i += 2) {
       ph->Delete(spans[i]);
     }
 
-    for(Length len = kMaxPages >> 2; len < 5 * kMaxPages; len = len << 1)
-    {
+    for (Length len = kMaxPages >> 2;
+         len < kNumberMaxPagesSpans / 2 * kMaxPages; len = len << 1) {
       if(len <= kMaxPages >> 1 || HaveSystemRelease) {
         tcmalloc::Span *s = ph->New(len);
         EXPECT_NE(s, NULL);
@@ -140,7 +175,7 @@
 
     EXPECT_TRUE(ph->CheckExpensive());
 
-    for (int i=1; i<20; i += 2) {
+    for (int i=1; i<kNumberMaxPagesSpans * 2; i += 2) {
       ph->Delete(spans[i]);
     }
 
@@ -148,8 +183,6 @@
       ph->Delete(lastHalf);
     }
   }
-
-  delete ph;
 }
 
 }  // namespace

diff --git a/src/tests/pagemap_unittest.cc b/src/tests/pagemap_unittest.cc
index 88d46e7..71a94dc 100644
--- a/src/tests/pagemap_unittest.cc
+++ b/src/tests/pagemap_unittest.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2003, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/tests/profile-handler_unittest.cc b/src/tests/profile-handler_unittest.cc
index 2984d0d..a8afbca 100644
--- a/src/tests/profile-handler_unittest.cc
+++ b/src/tests/profile-handler_unittest.cc

@@ -8,13 +8,6 @@
 //
 //
 // This file contains the unit tests for profile-handler.h interface.
-//
-// It is linked into three separate unit tests:
-//     profile-handler_unittest tests basic functionality
-//     profile-handler_disable_test tests that the profiler
-//         is disabled with --install_signal_handlers=false
-//     profile-handler_conflict_test tests that the profiler
-//         is disabled when a SIGPROF handler is registered before InitGoogle.
 
 #include "config.h"
 #include "profile-handler.h"
@@ -33,12 +26,6 @@
 DEFINE_bool(test_profiler_enabled, true,
             "expect profiler to be enabled during tests");
 
-// Should we look at the kernel signal handler settings during the test?
-// Not if we're in conflict_test, because we can't distinguish its nop
-// handler from the real one.
-DEFINE_bool(test_profiler_signal_handler, true,
-            "check profiler signal handler during tests");
-
 namespace {
 
 // TODO(csilvers): error-checking on the pthreads routines
@@ -81,11 +68,8 @@
 // reset.
 int kTimerResetInterval = 5000000;
 
-// Whether each thread has separate timers.
 static bool linux_per_thread_timers_mode_ = false;
-static bool timer_separate_ = false;
 static int timer_type_ = ITIMER_PROF;
-static int signal_number_ = SIGPROF;
 
 // Delays processing by the specified number of nano seconds. 'delay_ns'
 // must be less than the number of nano seconds in a second (1000000000).
@@ -110,51 +94,6 @@
           current_timer.it_value.tv_usec != 0);
 }
 
-class VirtualTimerGetterThread : public Thread {
- public:
-  VirtualTimerGetterThread() {
-    memset(&virtual_timer_, 0, sizeof virtual_timer_);
-  }
-  struct itimerval virtual_timer_;
-
- private:
-  void Run() {
-    CHECK_EQ(0, getitimer(ITIMER_VIRTUAL, &virtual_timer_));
-  }
-};
-
-// This function checks whether the timers are shared between thread. This
-// function spawns a thread, so use it carefully when testing thread-dependent
-// behaviour.
-static bool threads_have_separate_timers() {
-  struct itimerval new_timer_val;
-
-  // Enable the virtual timer in the current thread.
-  memset(&new_timer_val, 0, sizeof new_timer_val);
-  new_timer_val.it_value.tv_sec = 1000000;  // seconds
-  CHECK_EQ(0, setitimer(ITIMER_VIRTUAL, &new_timer_val, NULL));
-
-  // Spawn a thread, get the virtual timer's value there.
-  VirtualTimerGetterThread thread;
-  thread.SetJoinable(true);
-  thread.Start();
-  thread.Join();
-
-  // Disable timer here.
-  memset(&new_timer_val, 0, sizeof new_timer_val);
-  CHECK_EQ(0, setitimer(ITIMER_VIRTUAL, &new_timer_val, NULL));
-
-  bool target_timer_enabled = (thread.virtual_timer_.it_value.tv_sec != 0 ||
-                               thread.virtual_timer_.it_value.tv_usec != 0);
-  if (!target_timer_enabled) {
-    LOG(INFO, "threads have separate timers");
-    return true;
-  } else {
-    LOG(INFO, "threads have shared timers");
-    return false;
-  }
-}
-
 // Dummy worker thread to accumulate cpu time.
 class BusyThread : public Thread {
  public:
@@ -181,16 +120,12 @@
   void Run() {
     while (!stop_work()) {
     }
-    // If timers are separate, check that timer is enabled for this thread.
-    EXPECT_TRUE(linux_per_thread_timers_mode_ || !timer_separate_ || IsTimerEnabled());
   }
 };
 
 class NullThread : public Thread {
  private:
   void Run() {
-    // If timers are separate, check that timer is enabled for this thread.
-    EXPECT_TRUE(linux_per_thread_timers_mode_ || !timer_separate_ || IsTimerEnabled());
   }
 };
 
@@ -205,45 +140,34 @@
 class ProfileHandlerTest {
  protected:
 
-  // Determines whether threads have separate timers.
+  // Determines the timer type.
   static void SetUpTestCase() {
     timer_type_ = (getenv("CPUPROFILE_REALTIME") ? ITIMER_REAL : ITIMER_PROF);
-    signal_number_ = (getenv("CPUPROFILE_REALTIME") ? SIGALRM : SIGPROF);
 
-    timer_separate_ = threads_have_separate_timers();
 #if HAVE_LINUX_SIGEV_THREAD_ID
     linux_per_thread_timers_mode_ = (getenv("CPUPROFILE_PER_THREAD_TIMERS") != NULL);
     const char *signal_number = getenv("CPUPROFILE_TIMER_SIGNAL");
     if (signal_number) {
-      signal_number_ = strtol(signal_number, NULL, 0);
+      //signal_number_ = strtol(signal_number, NULL, 0);
       linux_per_thread_timers_mode_ = true;
+      Delay(kTimerResetInterval);
     }
 #endif
-    Delay(kTimerResetInterval);
   }
 
   // Sets up the profile timers and SIGPROF/SIGALRM handler in a known state.
   // It does the following:
-  // 1. Unregisters all the callbacks, stops the timer (if shared) and
-  //    clears out timer_sharing state in the ProfileHandler. This clears
-  //    out any state left behind by the previous test or during module
-  //    initialization when the test program was started.
-  // 2. Spawns two threads which will be registered with the ProfileHandler.
-  //    At this time ProfileHandler knows if the timers are shared.
+  // 1. Unregisters all the callbacks, stops the timer and clears out
+  //    timer_sharing state in the ProfileHandler. This clears out any state
+  //    left behind by the previous test or during module initialization when
+  //    the test program was started.
   // 3. Starts a busy worker thread to accumulate CPU usage.
   virtual void SetUp() {
     // Reset the state of ProfileHandler between each test. This unregisters
-    // all callbacks, stops timer (if shared) and clears timer sharing state.
+    // all callbacks and stops the timer.
     ProfileHandlerReset();
     EXPECT_EQ(0, GetCallbackCount());
     VerifyDisabled();
-    // ProfileHandler requires at least two threads to be registerd to determine
-    // whether timers are shared.
-    RegisterThread();
-    RegisterThread();
-    // Now that two threads are started, verify that the signal handler is
-    // disabled and the timers are correctly enabled/disabled.
-    VerifyDisabled();
     // Start worker to accumulate cpu usage.
     StartWorker();
   }
@@ -254,15 +178,6 @@
     StopWorker();
   }
 
-  // Starts a no-op thread that gets registered with the ProfileHandler. Waits
-  // for the thread to stop.
-  void RegisterThread() {
-    NullThread t;
-    t.SetJoinable(true);
-    t.Start();
-    t.Join();
-  }
-
   // Starts a busy worker thread to accumulate cpu time. There should be only
   // one busy worker running. This is required for the case where there are
   // separate timers for each thread.
@@ -282,14 +197,6 @@
     delete busy_worker_;
   }
 
-  // Checks whether SIGPROF/SIGALRM signal handler is enabled.
-  bool IsSignalEnabled() {
-    struct sigaction sa;
-    CHECK_EQ(sigaction(signal_number_, NULL, &sa), 0);
-    return ((sa.sa_handler == SIG_IGN) || (sa.sa_handler == SIG_DFL)) ?
-        false : true;
-  }
-
   // Gets the number of callbacks registered with the ProfileHandler.
   uint32 GetCallbackCount() {
     ProfileHandlerState state;
@@ -311,10 +218,6 @@
     EXPECT_GT(GetCallbackCount(), 0);
     // Check that the profile timer is enabled.
     EXPECT_EQ(FLAGS_test_profiler_enabled, linux_per_thread_timers_mode_ || IsTimerEnabled());
-    // Check that the signal handler is enabled.
-    if (FLAGS_test_profiler_signal_handler) {
-      EXPECT_EQ(FLAGS_test_profiler_enabled, IsSignalEnabled());
-    }
     uint64 interrupts_before = GetInterruptCount();
     // Sleep for a bit and check that tick counter is making progress.
     int old_tick_count = tick_counter;
@@ -337,38 +240,18 @@
     Delay(kSleepInterval);
     int new_tick_count = tick_counter;
     EXPECT_EQ(old_tick_count, new_tick_count);
-    // If no callbacks, signal handler and shared timer should be disabled.
+    // If no callbacks, timer should be disabled.
     if (GetCallbackCount() == 0) {
-      if (FLAGS_test_profiler_signal_handler) {
-        EXPECT_FALSE(IsSignalEnabled());
-      }
-      if (!linux_per_thread_timers_mode_) {
-        if (timer_separate_) {
-          EXPECT_TRUE(IsTimerEnabled());
-        } else {
-          EXPECT_FALSE(IsTimerEnabled());
-        }
-      }
+      EXPECT_FALSE(IsTimerEnabled());
     }
   }
 
-  // Verifies that the SIGPROF/SIGALRM interrupt handler is disabled and the
-  // timer, if shared, is disabled. Expects the worker to be running.
+  // Verifies that the timer is disabled. Expects the worker to be running.
   void VerifyDisabled() {
-    // Check that the signal handler is disabled.
-    if (FLAGS_test_profiler_signal_handler) {
-      EXPECT_FALSE(IsSignalEnabled());
-    }
     // Check that the callback count is 0.
     EXPECT_EQ(0, GetCallbackCount());
-    // Check that the timer is disabled if shared, enabled otherwise.
-    if (!linux_per_thread_timers_mode_) {
-      if (timer_separate_) {
-        EXPECT_TRUE(IsTimerEnabled());
-      } else {
-        EXPECT_FALSE(IsTimerEnabled());
-      }
-    }
+    // Check that the timer is disabled.
+    EXPECT_FALSE(IsTimerEnabled());
     // Verify that the ProfileHandler is not accumulating profile ticks.
     uint64 interrupts_before = GetInterruptCount();
     Delay(kSleepInterval);
@@ -435,14 +318,14 @@
 // Verifies that multiple callbacks can be registered.
 TEST_F(ProfileHandlerTest, MultipleCallbacks) {
   // Register first callback.
-  int first_tick_count;
+  int first_tick_count = 0;
   ProfileHandlerToken* token1 = RegisterCallback(&first_tick_count);
   // Check that callback was registered correctly.
   VerifyRegistration(first_tick_count);
   EXPECT_EQ(1, GetCallbackCount());
 
   // Register second callback.
-  int second_tick_count;
+  int second_tick_count = 0;
   ProfileHandlerToken* token2 = RegisterCallback(&second_tick_count);
   // Check that callback was registered correctly.
   VerifyRegistration(second_tick_count);
@@ -460,31 +343,31 @@
   VerifyUnregistration(second_tick_count);
   EXPECT_EQ(0, GetCallbackCount());
 
-  // Verify that the signal handler and timers are correctly disabled.
-  VerifyDisabled();
+  // Verify that the timer is correctly disabled.
+  if (!linux_per_thread_timers_mode_) VerifyDisabled();
 }
 
 // Verifies ProfileHandlerReset
 TEST_F(ProfileHandlerTest, Reset) {
   // Verify that the profile timer interrupt is disabled.
-  VerifyDisabled();
-  int first_tick_count;
+  if (!linux_per_thread_timers_mode_) VerifyDisabled();
+  int first_tick_count = 0;
   RegisterCallback(&first_tick_count);
   VerifyRegistration(first_tick_count);
   EXPECT_EQ(1, GetCallbackCount());
 
   // Register second callback.
-  int second_tick_count;
+  int second_tick_count = 0;
   RegisterCallback(&second_tick_count);
   VerifyRegistration(second_tick_count);
   EXPECT_EQ(2, GetCallbackCount());
 
   // Reset the profile handler and verify that callback were correctly
-  // unregistered and timer/signal are disabled.
+  // unregistered and the timer is disabled.
   ProfileHandlerReset();
   VerifyUnregistration(first_tick_count);
   VerifyUnregistration(second_tick_count);
-  VerifyDisabled();
+  if (!linux_per_thread_timers_mode_) VerifyDisabled();
 }
 
 // Verifies that ProfileHandler correctly handles a case where a callback was
@@ -492,30 +375,20 @@
 TEST_F(ProfileHandlerTest, RegisterCallbackBeforeThread) {
   // Stop the worker.
   StopWorker();
-  // Unregister all existing callbacks, stop the timer (if shared), disable
-  // the signal handler and reset the timer sharing state in the Profile
-  // Handler.
+  // Unregister all existing callbacks and stop the timer.
   ProfileHandlerReset();
   EXPECT_EQ(0, GetCallbackCount());
   VerifyDisabled();
 
-  // Start the worker. At this time ProfileHandler doesn't know if timers are
-  // shared as only one thread has registered so far.
+  // Start the worker.
   StartWorker();
-  // Register a callback and check that profile ticks are being delivered.
-  int tick_count;
+  // Register a callback and check that profile ticks are being delivered and
+  // the timer is enabled.
+  int tick_count = 0;
   RegisterCallback(&tick_count);
   EXPECT_EQ(1, GetCallbackCount());
   VerifyRegistration(tick_count);
-
-  // Register a second thread and verify that timer and signal handler are
-  // correctly enabled.
-  RegisterThread();
-  EXPECT_EQ(1, GetCallbackCount());
   EXPECT_EQ(FLAGS_test_profiler_enabled, linux_per_thread_timers_mode_ || IsTimerEnabled());
-  if (FLAGS_test_profiler_signal_handler) {
-    EXPECT_EQ(FLAGS_test_profiler_enabled, IsSignalEnabled());
-  }
 }
 
 }  // namespace

diff --git a/src/tests/profiledata_unittest.cc b/src/tests/profiledata_unittest.cc
index 972c1b0..3286b9c 100644
--- a/src/tests/profiledata_unittest.cc
+++ b/src/tests/profiledata_unittest.cc

@@ -366,6 +366,7 @@
     RUN(CollectTwoMatching);
     RUN(CollectTwoFlush);
     RUN(StartResetRestart);
+    RUN(StartStopNoOptionsEmpty);
     return 0;
   }
 };

diff --git a/src/tests/profiler_unittest.cc b/src/tests/profiler_unittest.cc
index 321f848..4c814c0 100644
--- a/src/tests/profiler_unittest.cc
+++ b/src/tests/profiler_unittest.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2005, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -46,7 +46,7 @@
 #include "base/simple_mutex.h"
 #include "tests/testutil.h"
 
-static int result = 0;
+static volatile int result = 0;
 static int g_iters = 0;   // argv[1]
 
 Mutex mutex(Mutex::LINKER_INITIALIZED);

diff --git a/src/tests/realloc_unittest.cc b/src/tests/realloc_unittest.cc
index e3d7b59..a4ea17c 100644
--- a/src/tests/realloc_unittest.cc
+++ b/src/tests/realloc_unittest.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2004, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/tests/sampler_test.cc b/src/tests/sampler_test.cc
old mode 100755
new mode 100644
index cd64b0f..4095d6a
--- a/src/tests/sampler_test.cc
+++ b/src/tests/sampler_test.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2008, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -48,7 +48,7 @@
 #include <algorithm>
 #include <vector>
 #include <string>
-#include <cmath>
+#include <math.h>
 #include "base/logging.h"
 #include "base/commandlineflags.h"
 #include "sampler.h"       // The Sampler class being tested
@@ -325,28 +325,6 @@
 }
 
 
-// Test for Fastlog2 code
-// We care about the percentage error because we're using this
-// for choosing step sizes, so "close" is relative to the size of
-// the step we would get if we used the built-in log function
-TEST(Sampler, FastLog2) {
-  tcmalloc::Sampler sampler;
-  sampler.Init(1);
-  double max_ratio_error = 0;
-  for (double d = -1021.9; d < 1; d+= 0.13124235) {
-    double e = pow(2.0, d);
-    double truelog = log(e) / log(2.0);  // log_2(e)
-    double fastlog = sampler.FastLog2(e);
-    max_ratio_error = max(max_ratio_error,
-                          max(truelog/fastlog-1, fastlog/truelog-1));
-    CHECK_LE(max_ratio_error, 0.01);
-        //        << StringPrintf("d = %f, e=%f, truelog = %f, fastlog= %f\n",
-        //                        d, e, truelog, fastlog);
-  }
-  LOG(INFO) << StringPrintf("Fastlog2: max_ratio_error = %f\n",
-                            max_ratio_error);
-}
-
 // Futher tests
 
 bool CheckMean(size_t mean, int num_samples) {
@@ -392,11 +370,11 @@
   int num_iters = 128*4*8;
   // Allocate in mixed chunks
   for (int i = 0; i < num_iters; i++) {
-    if (sampler.SampleAllocation(size_big)) {
+    if (!sampler.RecordAllocation(size_big)) {
       counter_big += 1;
     }
     for (int i = 0; i < 129; i++) {
-      if (sampler.SampleAllocation(size_small)) {
+      if (!sampler.RecordAllocation(size_small)) {
         counter_small += 1;
       }
     }
@@ -540,12 +518,9 @@
     uint64_t largest_prng_value = (static_cast<uint64_t>(1)<<48) - 1;
     double q = (largest_prng_value >> (prng_mod_power - 26)) + 1.0;
     LOG(INFO) << StringPrintf("q = %f\n", q);
-    LOG(INFO) << StringPrintf("FastLog2(q) = %f\n", sampler.FastLog2(q));
     LOG(INFO) << StringPrintf("log2(q) = %f\n", log(q)/log(2.0));
-    // Replace min(sampler.FastLog2(q) - 26, 0.0) with
-    // (sampler.FastLog2(q) - 26.000705) when using that optimization
     uint64_t smallest_sample_step
-        = static_cast<uint64_t>(min(sampler.FastLog2(q) - 26, 0.0)
+        = static_cast<uint64_t>(min(log2(q) - 26, 0.0)
                                 * sample_scaling + 1);
     LOG(INFO) << "Smallest sample step is " << smallest_sample_step;
     uint64_t cutoff = static_cast<uint64_t>(10)
@@ -558,10 +533,8 @@
     uint64_t smallest_prng_value = 0;
     q = (smallest_prng_value >> (prng_mod_power - 26)) + 1.0;
     LOG(INFO) << StringPrintf("q = %f\n", q);
-    // Replace min(sampler.FastLog2(q) - 26, 0.0) with
-    // (sampler.FastLog2(q) - 26.000705) when using that optimization
     uint64_t largest_sample_step
-        = static_cast<uint64_t>(min(sampler.FastLog2(q) - 26, 0.0)
+        = static_cast<uint64_t>(min(log2(q) - 26, 0.0)
                                 * sample_scaling + 1);
     LOG(INFO) << "Largest sample step is " << largest_sample_step;
     CHECK_LE(largest_sample_step, one<<63);
@@ -604,7 +577,7 @@
     CHECK_GE(q, 0); // << rnd << "  " << prng_mod_power;
   }
   // Test some potentially out of bounds value for rnd
-  for (int i = 1; i <= 66; i++) {
+  for (int i = 1; i <= 63; i++) {
     rnd = one << i;
     double q = (rnd >> (prng_mod_power - 26)) + 1.0;
     LOG(INFO) << "rnd = " << rnd << " i=" << i << " q=" << q;

diff --git a/src/tests/simple_compat_test.cc b/src/tests/simple_compat_test.cc
index 5dbfd7a..24583a0 100644
--- a/src/tests/simple_compat_test.cc
+++ b/src/tests/simple_compat_test.cc

@@ -38,6 +38,9 @@
 
 #include <stddef.h>
 #include <stdio.h>
+
+#define GPERFTOOLS_SUPPRESS_LEGACY_WARNING
+
 #include <google/heap-checker.h>
 #include <google/heap-profiler.h>
 #include <google/malloc_extension.h>

diff --git a/src/tests/stack_trace_table_test.cc b/src/tests/stack_trace_table_test.cc
index 3cacd2d..393ebbe 100644
--- a/src/tests/stack_trace_table_test.cc
+++ b/src/tests/stack_trace_table_test.cc

@@ -70,18 +70,10 @@
   AddTrace(&table, t2);
   CHECK_EQ(table.depth_total(), 4);
   CHECK_EQ(table.bucket_total(), 2);
-  static const uintptr_t k3[] = {1, 1024, 2, 1, 2, 1,  512, 2, 2, 1, 0};
+  static const uintptr_t k3[] = {1, 512, 2, 2, 1, 1, 1024, 2, 1, 2, 0};
   CheckTracesAndReset(&table, k3, ARRAYSIZE(k3));
 
-  // Table w/ 2 x t1, 1 x t2
-  AddTrace(&table, t1);
-  AddTrace(&table, t2);
-  AddTrace(&table, t1);
-  CHECK_EQ(table.depth_total(), 4);
-  CHECK_EQ(table.bucket_total(), 2);
-  static const uintptr_t k4[] = {2, 2048, 2, 1, 2, 1,  512, 2, 2, 1, 0};
-  CheckTracesAndReset(&table, k4, ARRAYSIZE(k4));
-
+  // Table w/ t1, t3
   // Same stack as t1, but w/ different size
   tcmalloc::StackTrace t3;
   t3.size = static_cast<uintptr_t>(2);
@@ -89,12 +81,11 @@
   t3.stack[0] = reinterpret_cast<void*>(1);
   t3.stack[1] = reinterpret_cast<void*>(2);
 
-  // Table w/ t1, t3
   AddTrace(&table, t1);
   AddTrace(&table, t3);
-  CHECK_EQ(table.depth_total(), 2);
-  CHECK_EQ(table.bucket_total(), 1);
-  static const uintptr_t k5[] = {2, 1026, 2, 1, 2, 0};
+  CHECK_EQ(table.depth_total(), 4);
+  CHECK_EQ(table.bucket_total(), 2);
+  static const uintptr_t k5[] = {1, 2, 2, 1, 2, 1, 1024, 2, 1, 2, 0};
   CheckTracesAndReset(&table, k5, ARRAYSIZE(k5));
 
   puts("PASS");

diff --git a/src/tests/stacktrace_unittest.cc b/src/tests/stacktrace_unittest.cc
index 3c9f735..e55a632 100644
--- a/src/tests/stacktrace_unittest.cc
+++ b/src/tests/stacktrace_unittest.cc

@@ -1,10 +1,11 @@
+// -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2005, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -14,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -33,9 +34,18 @@
 #endif
 #include <stdio.h>
 #include <stdlib.h>
+
+// On these architectures we can and should test that backtracing
+// from a ucontext captured in a signal handler works.
+#if __GNUC__ && __linux__ && (__x86_64__ || __aarch64__ || __riscv)
+#include <signal.h>
+#define TEST_UCONTEXT_BITS 1
+#endif
+
 #include "base/commandlineflags.h"
 #include "base/logging.h"
 #include <gperftools/stacktrace.h>
+#include "tests/testutil.h"
 
 namespace {
 
@@ -96,6 +106,7 @@
 #define ADJUST_ADDRESS_RANGE_FROM_RA(prange) do { } while (0)
 #endif  // __GNUC__
 
+
 //-----------------------------------------------------------------------//
 
 void CheckRetAddrIsInFunction(void *ret_addr, const AddressRange &range)
@@ -106,24 +117,105 @@
 
 //-----------------------------------------------------------------------//
 
+#if TEST_UCONTEXT_BITS
+
+struct get_stack_trace_args {
+  int *size_ptr;
+  void **result;
+  int max_depth;
+  uintptr_t where;
+} gst_args;
+
+static
+void SignalHandler(int dummy, siginfo_t *si, void* ucv) {
+  auto uc = static_cast<ucontext_t*>(ucv);
+
+#ifdef __riscv
+  uc->uc_mcontext.__gregs[REG_PC] = gst_args.where;
+#elif __aarch64__
+  uc->uc_mcontext.pc = gst_args.where;
+#else
+  uc->uc_mcontext.gregs[REG_RIP] = gst_args.where;
+#endif
+
+  *gst_args.size_ptr = GetStackTraceWithContext(
+    gst_args.result,
+    gst_args.max_depth,
+    2,
+    uc);
+}
+
+int ATTRIBUTE_NOINLINE CaptureLeafUContext(void **stack, int stack_len) {
+  INIT_ADDRESS_RANGE(CheckStackTraceLeaf, start, end, &expected_range[0]);
+  DECLARE_ADDRESS_LABEL(start);
+
+  int size;
+
+  printf("Capturing stack trace from signal's ucontext\n");
+  struct sigaction sa;
+  memset(&sa, 0, sizeof(sa));
+  sa.sa_sigaction = SignalHandler;
+  sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
+  int rv = sigaction(SIGSEGV, &sa, nullptr);
+  CHECK(rv == 0);
+
+  gst_args.size_ptr = &size;
+  gst_args.result = stack;
+  gst_args.max_depth = stack_len;
+  gst_args.where = reinterpret_cast<uintptr_t>(noopt(&&after));
+
+  // Now "write" to a null pointer to trigger SIGSEGV and run the signal
+  // handler. The handler sets the PC to the `after` label, as if we had
+  // jumped there directly.
+  *noopt(reinterpret_cast<void**>(0)) = 0;
+  // This is not reached, but gcc behaves oddly if we never actually
+  // use a computed goto.
+  static void* jump_target = &&after;
+  goto *noopt(&jump_target);
+
+after:
+  printf("Obtained %d stack frames.\n", size);
+  CHECK_GE(size, 1);
+  CHECK_LE(size, stack_len);
+
+  DECLARE_ADDRESS_LABEL(end);
+
+  return size;
+}
+
+#endif  // TEST_UCONTEXT_BITS
+
+int ATTRIBUTE_NOINLINE CaptureLeafPlain(void **stack, int stack_len) {
+  INIT_ADDRESS_RANGE(CheckStackTraceLeaf, start, end, &expected_range[0]);
+  DECLARE_ADDRESS_LABEL(start);
+
+  int size = GetStackTrace(stack, stack_len, 0);
+
+  printf("Obtained %d stack frames.\n", size);
+  CHECK_GE(size, 1);
+  CHECK_LE(size, stack_len);
+
+  DECLARE_ADDRESS_LABEL(end);
+
+  return size;
+}
+
 void ATTRIBUTE_NOINLINE CheckStackTrace(int);
-void ATTRIBUTE_NOINLINE CheckStackTraceLeaf(void) {
-  const int STACK_LEN = 10;
+
+int (*leaf_capture_fn)(void**, int) = CaptureLeafPlain;
+
+void ATTRIBUTE_NOINLINE CheckStackTraceLeaf(int i) {
+  const int STACK_LEN = 20;
   void *stack[STACK_LEN];
   int size;
 
   ADJUST_ADDRESS_RANGE_FROM_RA(&expected_range[1]);
-  INIT_ADDRESS_RANGE(CheckStackTraceLeaf, start, end, &expected_range[0]);
-  DECLARE_ADDRESS_LABEL(start);
-  size = GetStackTrace(stack, STACK_LEN, 0);
-  printf("Obtained %d stack frames.\n", size);
-  CHECK_GE(size, 1);
-  CHECK_LE(size, STACK_LEN);
+
+  size = leaf_capture_fn(stack, STACK_LEN);
 
 #ifdef HAVE_EXECINFO_H
   {
     char **strings = backtrace_symbols(stack, size);
-    printf("Obtained %d stack frames.\n", size);
     for (int i = 0; i < size; i++)
       printf("%s %p\n", strings[i], stack[i]);
     printf("CheckStackTrace() addr: %p\n", &CheckStackTrace);
@@ -131,14 +223,18 @@
   }
 #endif
 
-  for (int i = 0; i < BACKTRACE_STEPS; i++) {
+  for (int i = 0, j = 0; i < BACKTRACE_STEPS; i++, j++) {
+    if (i == 1 && j == 1) {
+      // stack[1] is expected to be in CheckStackTraceLeaf itself, for
+      // which we don't establish bounds, so skip it.
+      j++;
+    }
     printf("Backtrace %d: expected: %p..%p  actual: %p ... ",
-           i, expected_range[i].start, expected_range[i].end, stack[i]);
+           i, expected_range[i].start, expected_range[i].end, stack[j]);
     fflush(stdout);
-    CheckRetAddrIsInFunction(stack[i], expected_range[i]);
+    CheckRetAddrIsInFunction(stack[j], expected_range[i]);
     printf("OK\n");
   }
-  DECLARE_ADDRESS_LABEL(end);
 }
 
 //-----------------------------------------------------------------------//
@@ -149,7 +245,7 @@
   INIT_ADDRESS_RANGE(CheckStackTrace4, start, end, &expected_range[1]);
   DECLARE_ADDRESS_LABEL(start);
   for (int j = i; j >= 0; j--)
-    CheckStackTraceLeaf();
+    CheckStackTraceLeaf(j);
   DECLARE_ADDRESS_LABEL(end);
 }
 void ATTRIBUTE_NOINLINE CheckStackTrace3(int i) {
@@ -179,8 +275,9 @@
 void ATTRIBUTE_NOINLINE CheckStackTrace(int i) {
   INIT_ADDRESS_RANGE(CheckStackTrace, start, end, &expected_range[5]);
   DECLARE_ADDRESS_LABEL(start);
-  for (int j = i; j >= 0; j--)
+  for (int j = i; j >= 0; j--) {
     CheckStackTrace1(j);
+  }
   DECLARE_ADDRESS_LABEL(end);
 }
 
@@ -190,5 +287,12 @@
 int main(int argc, char ** argv) {
   CheckStackTrace(0);
   printf("PASS\n");
+
+#if TEST_UCONTEXT_BITS
+  leaf_capture_fn = CaptureLeafUContext;
+  CheckStackTrace(0);
+  printf("PASS\n");
+#endif  // TEST_UCONTEXT_BITS
+
   return 0;
 }

diff --git a/src/tests/system-alloc_unittest.cc b/src/tests/system-alloc_unittest.cc
index 4a5f7c0..fd199e2 100644
--- a/src/tests/system-alloc_unittest.cc
+++ b/src/tests/system-alloc_unittest.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2007, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -32,7 +32,9 @@
 // Author: Arun Sharma
 
 #include "config_for_unittests.h"
+
 #include "system-alloc.h"
+
 #include <stdio.h>
 #if defined HAVE_STDINT_H
 #include <stdint.h>             // to get uintptr_t
@@ -40,11 +42,15 @@
 #include <inttypes.h>           // another place uintptr_t might be defined
 #endif
 #include <sys/types.h>
+
 #include <algorithm>
 #include <limits>
-#include "base/logging.h"               // for Check_GEImpl, Check_LTImpl, etc
-#include <gperftools/malloc_extension.h>    // for MallocExtension::instance
-#include "common.h"                     // for kAddressBits
+
+#include "base/logging.h"                // for Check_GEImpl, Check_LTImpl, etc
+#include "common.h"                      // for kAddressBits
+#include "gperftools/malloc_extension.h" // for MallocExtension::instance
+#include "gperftools/tcmalloc.h"
+#include "tests/testutil.h"
 
 class ArraySysAllocator : public SysAllocator {
 public:
@@ -101,7 +107,7 @@
 
   // An allocation size that is likely to trigger the system allocator.
   // XXX: this is implementation specific.
-  char *p = new char[1024 * 1024];
+  char *p = noopt(new char[1024 * 1024]);
   delete [] p;
 
   // Make sure that our allocator was invoked.
@@ -136,12 +142,12 @@
   // disable this test.
   // The weird parens are to avoid macro-expansion of 'max' on windows.
   const size_t kHugeSize = (std::numeric_limits<size_t>::max)() / 2;
-  void* p1 = malloc(kHugeSize);
-  void* p2 = malloc(kHugeSize);
+  void* p1 = noopt(malloc(kHugeSize));
+  void* p2 = noopt(malloc(kHugeSize));
   CHECK(p2 == NULL);
   if (p1 != NULL) free(p1);
 
-  char* q = new char[1024];
+  char* q = noopt(new char[1024]);
   CHECK(q != NULL);
   delete [] q;
 }

diff --git a/src/tests/tcmalloc_large_unittest.cc b/src/tests/tcmalloc_large_unittest.cc
index ff22007..02b8569 100644
--- a/src/tests/tcmalloc_large_unittest.cc
+++ b/src/tests/tcmalloc_large_unittest.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2005, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -42,19 +42,21 @@
 #include <set>                          // for set, etc
 
 #include "base/logging.h"               // for operator<<, CHECK, etc
+#include "gperftools/tcmalloc.h"
+#include "tests/testutil.h"
 
 using std::set;
 
 // Alloc a size that should always fail.
 
 void TryAllocExpectFail(size_t size) {
-  void* p1 = malloc(size);
+  void* p1 = noopt(malloc(size));
   CHECK(p1 == NULL);
 
-  void* p2 = malloc(1);
+  void* p2 = noopt(malloc(1));
   CHECK(p2 != NULL);
 
-  void* p3 = realloc(p2, size);
+  void* p3 = noopt(realloc(p2, size));
   CHECK(p3 == NULL);
 
   free(p2);
@@ -64,24 +66,23 @@
 // If it does work, touch some pages.
 
 void TryAllocMightFail(size_t size) {
-  unsigned char* p = static_cast<unsigned char*>(malloc(size));
-  if ( p != NULL ) {
-    unsigned char volatile* vp = p;  // prevent optimizations
+  unsigned char* p = static_cast<unsigned char*>(noopt(malloc(size)));
+  if (p != NULL) {
     static const size_t kPoints = 1024;
 
     for ( size_t i = 0; i < kPoints; ++i ) {
-      vp[i * (size / kPoints)] = static_cast<unsigned char>(i);
+      p[i * (size / kPoints)] = static_cast<unsigned char>(i);
     }
 
     for ( size_t i = 0; i < kPoints; ++i ) {
-      CHECK(vp[i * (size / kPoints)] == static_cast<unsigned char>(i));
+      CHECK(p[i * (size / kPoints)] == static_cast<unsigned char>(i));
     }
 
-    vp[size-1] = 'M';
-    CHECK(vp[size-1] == 'M');
+    p[size-1] = 'M';
+    CHECK(p[size-1] == 'M');
   }
 
-  free(p);
+  free(noopt(p));
 }
 
 int main (int argc, char** argv) {
@@ -103,7 +104,7 @@
 
   // Grab some memory so that some later allocations are guaranteed to fail.
   printf("Test small malloc\n");
-  void* p_small = malloc(4*1048576);
+  void* p_small = noopt(malloc(4*1048576));
   CHECK(p_small != NULL);
 
   // Test sizes up near the maximum size_t.

diff --git a/src/tests/tcmalloc_unittest.cc b/src/tests/tcmalloc_unittest.cc
index 69698bc..658772f 100644
--- a/src/tests/tcmalloc_unittest.cc
+++ b/src/tests/tcmalloc_unittest.cc

@@ -69,7 +69,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include <stdio.h>
-#if defined HAVE_STDINT_H
+#ifdef HAVE_STDINT_H
 #include <stdint.h>        // for intptr_t
 #endif
 #include <sys/types.h>     // for size_t
@@ -91,6 +91,7 @@
 #include "base/simple_mutex.h"
 #include "gperftools/malloc_hook.h"
 #include "gperftools/malloc_extension.h"
+#include "gperftools/nallocx.h"
 #include "gperftools/tcmalloc.h"
 #include "thread_cache.h"
 #include "system-alloc.h"
@@ -135,14 +136,35 @@
 #else
 static bool kOSSupportsMemalign = true;
 static inline void* Memalign(size_t align, size_t size) {
-  return memalign(align, size);
+  return noopt(memalign(align, noopt(size)));
 }
 static inline int PosixMemalign(void** ptr, size_t align, size_t size) {
-  return posix_memalign(ptr, align, size);
+  return noopt(posix_memalign(ptr, align, noopt(size)));
 }
 
 #endif
 
+#if defined(ENABLE_ALIGNED_NEW_DELETE)
+
+#define OVERALIGNMENT 64
+
+struct overaligned_type
+{
+#if defined(__GNUC__)
+  __attribute__((__aligned__(OVERALIGNMENT)))
+#elif defined(_MSC_VER)
+  __declspec(align(OVERALIGNMENT))
+#else
+  alignas(OVERALIGNMENT)
+#endif
+  unsigned char data[OVERALIGNMENT * 2]; // make the object size different from
+                                         // alignment to make sure the correct
+                                         // values are passed to the new/delete
+                                         // implementation functions
+};
+
+#endif // defined(ENABLE_ALIGNED_NEW_DELETE)
+
 // On systems (like freebsd) that don't define MAP_ANONYMOUS, use the old
 // form of the name instead.
 #ifndef MAP_ANONYMOUS
@@ -158,6 +180,37 @@
 DECLARE_int32(max_free_queue_size);     // in debugallocation.cc
 DECLARE_int64(tcmalloc_sample_parameter);
 
+struct OOMAbleSysAlloc : public SysAllocator {
+  SysAllocator *child;
+  int simulate_oom;
+
+  void* Alloc(size_t size, size_t* actual_size, size_t alignment) {
+    if (simulate_oom) {
+      return NULL;
+    }
+    return child->Alloc(size, actual_size, alignment);
+  }
+};
+
+static union {
+  char buf[sizeof(OOMAbleSysAlloc)];
+  void *ptr;
+} test_sys_alloc_space;
+
+static OOMAbleSysAlloc* get_test_sys_alloc() {
+  return reinterpret_cast<OOMAbleSysAlloc*>(&test_sys_alloc_space);
+}
+
+void setup_oomable_sys_alloc() {
+  SysAllocator *def = MallocExtension::instance()->GetSystemAllocator();
+
+  OOMAbleSysAlloc *alloc = get_test_sys_alloc();
+  new (alloc) OOMAbleSysAlloc;
+  alloc->child = def;
+
+  MallocExtension::instance()->SetSystemAllocator(alloc);
+}
+
 namespace testing {
 
 static const int FLAGS_numtests = 50000;
@@ -319,7 +372,7 @@
         }
       }
     }
-    return malloc(size);
+    return noopt(malloc(size));
   }
 
  private:
@@ -549,7 +602,7 @@
 }
 
 static void TryHugeAllocation(size_t s, AllocatorState* rnd) {
-  void* p = rnd->alloc(s);
+  void* p = rnd->alloc(noopt(s));
   CHECK(p == NULL);   // huge allocation s should fail!
 }
 
@@ -581,7 +634,7 @@
 static void TestCalloc(size_t n, size_t s, bool ok) {
   char* p = reinterpret_cast<char*>(calloc(n, s));
   if (FLAGS_verbose)
-    fprintf(LOGSTREAM, "calloc(%" PRIxS ", %" PRIxS "): %p\n", n, s, p);
+    fprintf(LOGSTREAM, "calloc(%zx, %zx): %p\n", n, s, p);
   if (!ok) {
     CHECK(p == NULL);  // calloc(n, s) should not succeed
   } else {
@@ -608,16 +661,16 @@
   int deltas[] = { 1, -2, 4, -8, 16, -32, 64, -128 };
 
   for (int s = 0; s < sizeof(start_sizes)/sizeof(*start_sizes); ++s) {
-    void* p = malloc(start_sizes[s]);
+    void* p = noopt(malloc(start_sizes[s]));
     CHECK(p);
     // The larger the start-size, the larger the non-reallocing delta.
     for (int d = 0; d < (s+1) * 2; ++d) {
-      void* new_p = realloc(p, start_sizes[s] + deltas[d]);
+      void* new_p = noopt(realloc(p, start_sizes[s] + deltas[d]));
       CHECK(p == new_p);  // realloc should not allocate new memory
     }
     // Test again, but this time reallocing smaller first.
     for (int d = 0; d < s*2; ++d) {
-      void* new_p = realloc(p, start_sizes[s] - deltas[d]);
+      void* new_p = noopt(realloc(p, start_sizes[s] - deltas[d]));
       CHECK(p == new_p);  // realloc should not allocate new memory
     }
     free(p);
@@ -626,12 +679,13 @@
 #endif
 }
 
-static void TestNewHandler() throw (std::bad_alloc) {
+static void TestNewHandler() {
   ++news_handled;
   throw std::bad_alloc();
 }
 
 static void TestOneNew(void* (*func)(size_t)) {
+  func = noopt(func);
   // success test
   try {
     void* ptr = (*func)(kNotTooBig);
@@ -676,6 +730,7 @@
 }
 
 static void TestOneNothrowNew(void* (*func)(size_t, const std::nothrow_t&)) {
+  func = noopt(func);
   // success test
   try {
     void* ptr = (*func)(kNotTooBig, std::nothrow);
@@ -725,9 +780,9 @@
 // that we used the tcmalloc version of the call, and not the libc.
 // Note the ... in the hook signature: we don't care what arguments
 // the hook takes.
-#define MAKE_HOOK_CALLBACK(hook_type)                                   \
+#define MAKE_HOOK_CALLBACK(hook_type, ...)                              \
   static volatile int g_##hook_type##_calls = 0;                                 \
-  static void IncrementCallsTo##hook_type(...) {                        \
+  static void IncrementCallsTo##hook_type(__VA_ARGS__) {                \
     g_##hook_type##_calls++;                                            \
   }                                                                     \
   static void Verify##hook_type##WasCalled() {                          \
@@ -744,12 +799,14 @@
   }
 
 // We do one for each hook typedef in malloc_hook.h
-MAKE_HOOK_CALLBACK(NewHook);
-MAKE_HOOK_CALLBACK(DeleteHook);
-MAKE_HOOK_CALLBACK(MmapHook);
-MAKE_HOOK_CALLBACK(MremapHook);
-MAKE_HOOK_CALLBACK(MunmapHook);
-MAKE_HOOK_CALLBACK(SbrkHook);
+MAKE_HOOK_CALLBACK(NewHook, const void*, size_t);
+MAKE_HOOK_CALLBACK(DeleteHook, const void*);
+MAKE_HOOK_CALLBACK(MmapHook, const void*, const void*, size_t, int, int, int,
+                   off_t);
+MAKE_HOOK_CALLBACK(MremapHook, const void*, const void*, size_t, size_t, int,
+                   const void*);
+MAKE_HOOK_CALLBACK(MunmapHook, const void *, size_t);
+MAKE_HOOK_CALLBACK(SbrkHook, const void *, ptrdiff_t);
 
 static void TestAlignmentForSize(int size) {
   fprintf(LOGSTREAM, "Testing alignment of malloc(%d)\n", size);
@@ -900,8 +957,8 @@
   AggressiveDecommitChanger disabler(0);
 
   static const int MB = 1048576;
-  void* a = malloc(MB);
-  void* b = malloc(MB);
+  void* a = noopt(malloc(MB));
+  void* b = noopt(malloc(MB));
   MallocExtension::instance()->ReleaseFreeMemory();
   size_t starting_bytes = GetUnmappedBytes();
 
@@ -937,7 +994,7 @@
   MallocExtension::instance()->ReleaseFreeMemory();
   EXPECT_EQ(starting_bytes + 2*MB, GetUnmappedBytes());
 
-  a = malloc(MB);
+  a = noopt(malloc(MB));
   free(a);
   EXPECT_EQ(starting_bytes + MB, GetUnmappedBytes());
 
@@ -962,8 +1019,8 @@
   AggressiveDecommitChanger enabler(1);
 
   static const int MB = 1048576;
-  void* a = malloc(MB);
-  void* b = malloc(MB);
+  void* a = noopt(malloc(MB));
+  void* b = noopt(malloc(MB));
 
   size_t starting_bytes = GetUnmappedBytes();
 
@@ -984,7 +1041,7 @@
   MallocExtension::instance()->ReleaseFreeMemory();
   EXPECT_EQ(starting_bytes + 2*MB, GetUnmappedBytes());
 
-  a = malloc(MB);
+  a = noopt(malloc(MB));
   free(a);
 
   EXPECT_EQ(starting_bytes + 2*MB, GetUnmappedBytes());
@@ -1009,19 +1066,19 @@
 
   g_old_handler = std::set_new_handler(&OnNoMemory);
   g_no_memory = false;
-  void* ret = malloc(kTooBig);
+  void* ret = noopt(malloc(noopt(kTooBig)));
   EXPECT_EQ(NULL, ret);
   EXPECT_TRUE(g_no_memory);
 
   g_old_handler = std::set_new_handler(&OnNoMemory);
   g_no_memory = false;
-  ret = calloc(1, kTooBig);
+  ret = noopt(calloc(1, noopt(kTooBig)));
   EXPECT_EQ(NULL, ret);
   EXPECT_TRUE(g_no_memory);
 
   g_old_handler = std::set_new_handler(&OnNoMemory);
   g_no_memory = false;
-  ret = realloc(NULL, kTooBig);
+  ret = noopt(realloc(nullptr, noopt(kTooBig)));
   EXPECT_EQ(NULL, ret);
   EXPECT_TRUE(g_no_memory);
 
@@ -1048,13 +1105,16 @@
 }
 
 static void TestErrno(void) {
-  errno = 0;
-  void* ret = memalign(128, kTooBig);
-  EXPECT_EQ(NULL, ret);
-  EXPECT_EQ(ENOMEM, errno);
+  void* ret;
+  if (kOSSupportsMemalign) {
+    errno = 0;
+    ret = Memalign(128, kTooBig);
+    EXPECT_EQ(NULL, ret);
+    EXPECT_EQ(ENOMEM, errno);
+  }
 
   errno = 0;
-  ret = malloc(kTooBig);
+  ret = noopt(malloc(noopt(kTooBig)));
   EXPECT_EQ(NULL, ret);
   EXPECT_EQ(ENOMEM, errno);
 
@@ -1064,12 +1124,89 @@
   EXPECT_EQ(ENOMEM, errno);
 }
 
+
+#ifndef DEBUGALLOCATION
+// Ensure that nallocx works before main.
+struct GlobalNallocx {
+  GlobalNallocx() { CHECK_GT(nallocx(99, 0), 99); }
+} global_nallocx;
+
+#if defined(__GNUC__)
+
+static void check_global_nallocx() __attribute__((constructor));
+static void check_global_nallocx() { CHECK_GT(nallocx(99, 0), 99); }
+
+#endif // __GNUC__
+
+static void TestNAllocX() {
+  for (size_t size = 0; size <= (1 << 20); size += 7) {
+    size_t rounded = nallocx(size, 0);
+    ASSERT_GE(rounded, size);
+    void* ptr = malloc(size);
+    ASSERT_EQ(rounded, MallocExtension::instance()->GetAllocatedSize(ptr));
+    free(ptr);
+  }
+}
+
+static void TestNAllocXAlignment() {
+  for (size_t size = 0; size <= (1 << 20); size += 7) {
+    for (size_t align = 0; align < 10; align++) {
+      size_t rounded = nallocx(size, MALLOCX_LG_ALIGN(align));
+      ASSERT_GE(rounded, size);
+      ASSERT_EQ(rounded % (1 << align), 0);
+      void* ptr = tc_memalign(1 << align, size);
+      ASSERT_EQ(rounded, MallocExtension::instance()->GetAllocatedSize(ptr));
+      free(ptr);
+    }
+  }
+}
+
+static int saw_new_handler_runs;
+static void* volatile oom_test_last_ptr;
+
+static void test_new_handler() {
+  get_test_sys_alloc()->simulate_oom = false;
+  void *ptr = oom_test_last_ptr;
+  oom_test_last_ptr = NULL;
+  ::operator delete[](ptr);
+  saw_new_handler_runs++;
+}
+
+static ATTRIBUTE_NOINLINE void TestNewOOMHandling() {
+  // The debug allocator does internal allocations and crashes when one
+  // of them fails, so this test is not built under DEBUGALLOCATION.
+  setup_oomable_sys_alloc();
+
+  std::new_handler old = std::set_new_handler(test_new_handler);
+  get_test_sys_alloc()->simulate_oom = true;
+
+  ASSERT_EQ(saw_new_handler_runs, 0);
+
+  for (int i = 0; i < 10240; i++) {
+    oom_test_last_ptr = noopt(new char [512]);
+    ASSERT_NE(oom_test_last_ptr, NULL);
+    if (saw_new_handler_runs) {
+      break;
+    }
+  }
+
+  ASSERT_GE(saw_new_handler_runs, 1);
+
+  get_test_sys_alloc()->simulate_oom = false;
+  std::set_new_handler(old);
+}
+#endif  // !DEBUGALLOCATION
+
 static int RunAllTests(int argc, char** argv) {
   // Optional argv[1] is the seed
   AllocatorState rnd(argc > 1 ? atoi(argv[1]) : 100);
 
   SetTestResourceLimit();
 
+#ifndef DEBUGALLOCATION
+  TestNewOOMHandling();
+#endif
+
   // TODO(odo):  This test has been disabled because it is only by luck that it
   // does not result in fragmentation.  When tcmalloc makes an allocation which
   // spans previously unused leaves of the pagemap it will allocate and fill in
@@ -1131,6 +1268,23 @@
     std::stable_sort(v.begin(), v.end());
   }
 
+#ifdef ENABLE_SIZED_DELETE
+  {
+    fprintf(LOGSTREAM, "Testing large sized delete is not crashing\n");
+    // Large sized delete case; see
+    // https://github.com/gperftools/gperftools/issues/1254
+    std::vector<char*> addresses;
+    constexpr int kSizedDepth = 1024;
+    addresses.reserve(kSizedDepth);
+    for (int i = 0; i < kSizedDepth; i++) {
+      addresses.push_back(noopt(new char[12686]));
+    }
+    for (int i = 0; i < kSizedDepth; i++) {
+      ::operator delete[](addresses[i], 12686);
+    }
+  }
+#endif
+
   // Test each of the memory-allocation functions once, just as a sanity-check
   fprintf(LOGSTREAM, "Sanity-testing all the memory allocation functions\n");
   {
@@ -1191,59 +1345,136 @@
     VerifyDeleteHookWasCalled();
 #endif
 
-    p1 = valloc(60);
+    p1 = noopt(valloc(60));
     CHECK(p1 != NULL);
     VerifyNewHookWasCalled();
     free(p1);
     VerifyDeleteHookWasCalled();
 
-    p1 = pvalloc(70);
+    p1 = noopt(pvalloc(70));
     CHECK(p1 != NULL);
     VerifyNewHookWasCalled();
     free(p1);
     VerifyDeleteHookWasCalled();
 
-    char* p2 = new char;
+    char* p2 = noopt(new char);
     CHECK(p2 != NULL);
     VerifyNewHookWasCalled();
     delete p2;
     VerifyDeleteHookWasCalled();
 
-    p2 = new char[100];
+    p2 = noopt(new char[100]);
     CHECK(p2 != NULL);
     VerifyNewHookWasCalled();
     delete[] p2;
     VerifyDeleteHookWasCalled();
 
-    p2 = new(std::nothrow) char;
+    p2 = noopt(new (std::nothrow) char);
     CHECK(p2 != NULL);
     VerifyNewHookWasCalled();
     delete p2;
     VerifyDeleteHookWasCalled();
 
-    p2 = new(std::nothrow) char[100];
+    p2 = noopt(new (std::nothrow) char[100]);
     CHECK(p2 != NULL);
     VerifyNewHookWasCalled();
     delete[] p2;
     VerifyDeleteHookWasCalled();
 
     // Another way of calling operator new
-    p2 = static_cast<char*>(::operator new(100));
+    p2 = noopt(static_cast<char*>(::operator new(100)));
     CHECK(p2 != NULL);
     VerifyNewHookWasCalled();
     ::operator delete(p2);
     VerifyDeleteHookWasCalled();
 
     // Try to call nothrow's delete too.  Compilers use this.
-    p2 = static_cast<char*>(::operator new(100, std::nothrow));
+    p2 = noopt(static_cast<char*>(::operator new(100, std::nothrow)));
     CHECK(p2 != NULL);
     VerifyNewHookWasCalled();
     ::operator delete(p2, std::nothrow);
     VerifyDeleteHookWasCalled();
 
+#ifdef ENABLE_SIZED_DELETE
+    p2 = noopt(new char);
+    CHECK(p2 != NULL);
+    VerifyNewHookWasCalled();
+    ::operator delete(p2, sizeof(char));
+    VerifyDeleteHookWasCalled();
+
+    p2 = noopt(new char[100]);
+    CHECK(p2 != NULL);
+    VerifyNewHookWasCalled();
+    ::operator delete[](p2, sizeof(char) * 100);
+    VerifyDeleteHookWasCalled();
+#endif
+
+#if defined(ENABLE_ALIGNED_NEW_DELETE)
+
+    overaligned_type* poveraligned = noopt(new overaligned_type);
+    CHECK(poveraligned != NULL);
+    CHECK((((size_t)poveraligned) % OVERALIGNMENT) == 0u);
+    VerifyNewHookWasCalled();
+    delete poveraligned;
+    VerifyDeleteHookWasCalled();
+
+    poveraligned = noopt(new overaligned_type[10]);
+    CHECK(poveraligned != NULL);
+    CHECK((((size_t)poveraligned) % OVERALIGNMENT) == 0u);
+    VerifyNewHookWasCalled();
+    delete[] poveraligned;
+    VerifyDeleteHookWasCalled();
+
+    poveraligned = noopt(new(std::nothrow) overaligned_type);
+    CHECK(poveraligned != NULL);
+    CHECK((((size_t)poveraligned) % OVERALIGNMENT) == 0u);
+    VerifyNewHookWasCalled();
+    delete poveraligned;
+    VerifyDeleteHookWasCalled();
+
+    poveraligned = noopt(new(std::nothrow) overaligned_type[10]);
+    CHECK(poveraligned != NULL);
+    CHECK((((size_t)poveraligned) % OVERALIGNMENT) == 0u);
+    VerifyNewHookWasCalled();
+    delete[] poveraligned;
+    VerifyDeleteHookWasCalled();
+
+    // Another way of calling operator new
+    p2 = noopt(static_cast<char*>(::operator new(100, std::align_val_t(OVERALIGNMENT))));
+    CHECK(p2 != NULL);
+    CHECK((((size_t)p2) % OVERALIGNMENT) == 0u);
+    VerifyNewHookWasCalled();
+    ::operator delete(p2, std::align_val_t(OVERALIGNMENT));
+    VerifyDeleteHookWasCalled();
+
+    p2 = noopt(static_cast<char*>(::operator new(100, std::align_val_t(OVERALIGNMENT), std::nothrow)));
+    CHECK(p2 != NULL);
+    CHECK((((size_t)p2) % OVERALIGNMENT) == 0u);
+    VerifyNewHookWasCalled();
+    ::operator delete(p2, std::align_val_t(OVERALIGNMENT), std::nothrow);
+    VerifyDeleteHookWasCalled();
+
+#ifdef ENABLE_SIZED_DELETE
+    poveraligned = noopt(new overaligned_type);
+    CHECK(poveraligned != NULL);
+    CHECK((((size_t)poveraligned) % OVERALIGNMENT) == 0u);
+    VerifyNewHookWasCalled();
+    ::operator delete(poveraligned, sizeof(overaligned_type), std::align_val_t(OVERALIGNMENT));
+    VerifyDeleteHookWasCalled();
+
+    poveraligned = noopt(new overaligned_type[10]);
+    CHECK(poveraligned != NULL);
+    CHECK((((size_t)poveraligned) % OVERALIGNMENT) == 0u);
+    VerifyNewHookWasCalled();
+    ::operator delete[](poveraligned, sizeof(overaligned_type) * 10, std::align_val_t(OVERALIGNMENT));
+    VerifyDeleteHookWasCalled();
+#endif
+
+#endif // defined(ENABLE_ALIGNED_NEW_DELETE)
+
     // Try strdup(), which the system allocates but we must free.  If
     // all goes well, libc will use our malloc!
-    p2 = strdup("test");
+    p2 = noopt(strdup("in memory of James Golick"));
     CHECK(p2 != NULL);
     VerifyNewHookWasCalled();
     free(p2);
@@ -1279,9 +1510,9 @@
     VerifyMunmapHookWasCalled();
     close(fd);
 #else   // this is just to quiet the compiler: make sure all fns are called
-    IncrementCallsToMmapHook();
-    IncrementCallsToMunmapHook();
-    IncrementCallsToMremapHook();
+    IncrementCallsToMmapHook(NULL, NULL, 0, 0, 0, 0, 0);
+    IncrementCallsToMunmapHook(NULL, 0);
+    IncrementCallsToMremapHook(NULL, NULL, 0, 0, 0, NULL);
     VerifyMmapHookWasCalled();
     VerifyMremapHookWasCalled();
     VerifyMunmapHookWasCalled();
@@ -1289,7 +1520,7 @@
 
     // Test sbrk
     SetSbrkHook();
-#if defined(HAVE_SBRK) && defined(__linux) && \
+#if defined(HAVE___SBRK) && defined(__linux) && \
        (defined(__i386__) || defined(__x86_64__))
     p1 = sbrk(8192);
     CHECK(p1 != NULL);
@@ -1302,7 +1533,7 @@
     CHECK(p1 != NULL);
     CHECK_EQ(g_SbrkHook_calls, 0);
 #else   // this is just to quiet the compiler: make sure all fns are called
-    IncrementCallsToSbrkHook();
+    IncrementCallsToSbrkHook(NULL, 0);
     VerifySbrkHookWasCalled();
 #endif
 
@@ -1384,11 +1615,16 @@
   // Check that large allocations fail with NULL instead of crashing
 #ifndef DEBUGALLOCATION    // debug allocation takes forever for huge allocs
   fprintf(LOGSTREAM, "Testing out of memory\n");
+  size_t old_limit;
+  CHECK(MallocExtension::instance()->GetNumericProperty("tcmalloc.heap_limit_mb", &old_limit));
+  // Don't exercise more than 1 gig, no need to.
+  CHECK(MallocExtension::instance()->SetNumericProperty("tcmalloc.heap_limit_mb", 1 << 10));
   for (int s = 0; ; s += (10<<20)) {
     void* large_object = rnd.alloc(s);
     if (large_object == NULL) break;
     free(large_object);
   }
+  CHECK(MallocExtension::instance()->SetNumericProperty("tcmalloc.heap_limit_mb", old_limit));
 #endif
 
   TestHugeThreadCache();
@@ -1398,6 +1634,12 @@
   TestSetNewMode();
   TestErrno();
 
+// GetAllocatedSize under DEBUGALLOCATION returns the size that we asked for.
+#ifndef DEBUGALLOCATION
+  TestNAllocX();
+  TestNAllocXAlignment();
+#endif
+
   return 0;
 }
 

diff --git a/src/tests/tcmalloc_unittest.sh b/src/tests/tcmalloc_unittest.sh
index 755241e..0e7996a 100755
--- a/src/tests/tcmalloc_unittest.sh
+++ b/src/tests/tcmalloc_unittest.sh

@@ -69,12 +69,16 @@
 run_check_transfer_num_obj "40"
 run_check_transfer_num_obj "4096"
 
-echo -n "Testing $TCMALLOC_UNITTEST with TCMALLOC_AGGRESSIVE_DECOMMIT=f ... "
+echo -n "Testing $TCMALLOC_UNITTEST with TCMALLOC_AGGRESSIVE_DECOMMIT=t ... "
 
-TCMALLOC_AGGRESSIVE_DECOMMIT=f run_unittest
+TCMALLOC_AGGRESSIVE_DECOMMIT=t run_unittest
 
 echo -n "Testing $TCMALLOC_UNITTEST with TCMALLOC_HEAP_LIMIT_MB=512 ... "
 
 TCMALLOC_HEAP_LIMIT_MB=512 run_unittest
 
+echo -n "Testing $TCMALLOC_UNITTEST with TCMALLOC_ENABLE_SIZED_DELETE=t ... "
+
+TCMALLOC_ENABLE_SIZED_DELETE=t run_unittest
+
 echo "PASS"

diff --git a/src/tests/testutil.cc b/src/tests/testutil.cc
index c2c71cb..e5faa65 100644
--- a/src/tests/testutil.cc
+++ b/src/tests/testutil.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2007, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/tests/testutil.h b/src/tests/testutil.h
index 071a209..dc1db9b 100644
--- a/src/tests/testutil.h
+++ b/src/tests/testutil.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2007, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -59,4 +59,15 @@
 // out job limits.
 void SetTestResourceLimit();
 
+static void (* volatile noopt_helper)(void *) = +[] (void* dummy) {};
+
+// This function forces the compiler to forget what it knows about
+// the value of 'val'. This keeps the compiler from optimizing out
+// new/delete pairs in our unit tests.
+template <typename T>
+T noopt(T val) {
+  noopt_helper(&val);
+  return val;
+}
+
 #endif  // TCMALLOC_TOOLS_TESTUTIL_H_

diff --git a/src/tests/thread_dealloc_unittest.cc b/src/tests/thread_dealloc_unittest.cc
index 97615cd..770a760 100644
--- a/src/tests/thread_dealloc_unittest.cc
+++ b/src/tests/thread_dealloc_unittest.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2004, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/third_party/valgrind.h b/src/third_party/valgrind.h
deleted file mode 100644
index 577c59a..0000000
--- a/src/third_party/valgrind.h
+++ /dev/null

@@ -1,3924 +0,0 @@
-/* -*- c -*-
-   ----------------------------------------------------------------
-
-   Notice that the following BSD-style license applies to this one
-   file (valgrind.h) only.  The rest of Valgrind is licensed under the
-   terms of the GNU General Public License, version 2, unless
-   otherwise indicated.  See the COPYING file in the source
-   distribution for details.
-
-   ----------------------------------------------------------------
-
-   This file is part of Valgrind, a dynamic binary instrumentation
-   framework.
-
-   Copyright (C) 2000-2008 Julian Seward.  All rights reserved.
-
-   Redistribution and use in source and binary forms, with or without
-   modification, are permitted provided that the following conditions
-   are met:
-
-   1. Redistributions of source code must retain the above copyright
-      notice, this list of conditions and the following disclaimer.
-
-   2. The origin of this software must not be misrepresented; you must 
-      not claim that you wrote the original software.  If you use this 
-      software in a product, an acknowledgment in the product 
-      documentation would be appreciated but is not required.
-
-   3. Altered source versions must be plainly marked as such, and must
-      not be misrepresented as being the original software.
-
-   4. The name of the author may not be used to endorse or promote 
-      products derived from this software without specific prior written 
-      permission.
-
-   THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS
-   OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
-   WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-   ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
-   DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
-   DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
-   GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
-   INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
-   WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
-   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-
-   ----------------------------------------------------------------
-
-   Notice that the above BSD-style license applies to this one file
-   (valgrind.h) only.  The entire rest of Valgrind is licensed under
-   the terms of the GNU General Public License, version 2.  See the
-   COPYING file in the source distribution for details.
-
-   ---------------------------------------------------------------- 
-*/
-
-
-/* This file is for inclusion into client (your!) code.
-
-   You can use these macros to manipulate and query Valgrind's 
-   execution inside your own programs.
-
-   The resulting executables will still run without Valgrind, just a
-   little bit more slowly than they otherwise would, but otherwise
-   unchanged.  When not running on valgrind, each client request
-   consumes very few (eg. 7) instructions, so the resulting performance
-   loss is negligible unless you plan to execute client requests
-   millions of times per second.  Nevertheless, if that is still a
-   problem, you can compile with the NVALGRIND symbol defined (gcc
-   -DNVALGRIND) so that client requests are not even compiled in.  */
-
-#ifndef __VALGRIND_H
-#define __VALGRIND_H
-
-#include <stdarg.h>
-
-/* Nb: this file might be included in a file compiled with -ansi.  So
-   we can't use C++ style "//" comments nor the "asm" keyword (instead
-   use "__asm__"). */
-
-/* Derive some tags indicating what the target platform is.  Note
-   that in this file we're using the compiler's CPP symbols for
-   identifying architectures, which are different to the ones we use
-   within the rest of Valgrind.  Note, __powerpc__ is active for both
-   32 and 64-bit PPC, whereas __powerpc64__ is only active for the
-   latter (on Linux, that is). */
-#undef PLAT_x86_linux
-#undef PLAT_amd64_linux
-#undef PLAT_ppc32_linux
-#undef PLAT_ppc64_linux
-#undef PLAT_ppc32_aix5
-#undef PLAT_ppc64_aix5
-
-#if !defined(_AIX) && defined(__i386__)
-#  define PLAT_x86_linux 1
-#elif !defined(_AIX) && defined(__x86_64__)
-#  define PLAT_amd64_linux 1
-#elif !defined(_AIX) && defined(__powerpc__) && !defined(__powerpc64__)
-#  define PLAT_ppc32_linux 1
-#elif !defined(_AIX) && defined(__powerpc__) && defined(__powerpc64__)
-#  define PLAT_ppc64_linux 1
-#elif defined(_AIX) && defined(__64BIT__)
-#  define PLAT_ppc64_aix5 1
-#elif defined(_AIX) && !defined(__64BIT__)
-#  define PLAT_ppc32_aix5 1
-#endif
-
-
-/* If we're not compiling for our target platform, don't generate
-   any inline asms.  */
-#if !defined(PLAT_x86_linux) && !defined(PLAT_amd64_linux) \
-    && !defined(PLAT_ppc32_linux) && !defined(PLAT_ppc64_linux) \
-    && !defined(PLAT_ppc32_aix5) && !defined(PLAT_ppc64_aix5)
-#  if !defined(NVALGRIND)
-#    define NVALGRIND 1
-#  endif
-#endif
-
-
-/* ------------------------------------------------------------------ */
-/* ARCHITECTURE SPECIFICS for SPECIAL INSTRUCTIONS.  There is nothing */
-/* in here of use to end-users -- skip to the next section.           */
-/* ------------------------------------------------------------------ */
-
-#if defined(NVALGRIND)
-
-/* Define NVALGRIND to completely remove the Valgrind magic sequence
-   from the compiled code (analogous to NDEBUG's effects on
-   assert()) */
-#define VALGRIND_DO_CLIENT_REQUEST(                               \
-        _zzq_rlval, _zzq_default, _zzq_request,                   \
-        _zzq_arg1, _zzq_arg2, _zzq_arg3, _zzq_arg4, _zzq_arg5)    \
-   {                                                              \
-      (_zzq_rlval) = (_zzq_default);                              \
-   }
-
-#else  /* ! NVALGRIND */
-
-/* The following defines the magic code sequences which the JITter
-   spots and handles magically.  Don't look too closely at them as
-   they will rot your brain.
-
-   The assembly code sequences for all architectures is in this one
-   file.  This is because this file must be stand-alone, and we don't
-   want to have multiple files.
-
-   For VALGRIND_DO_CLIENT_REQUEST, we must ensure that the default
-   value gets put in the return slot, so that everything works when
-   this is executed not under Valgrind.  Args are passed in a memory
-   block, and so there's no intrinsic limit to the number that could
-   be passed, but it's currently five.
-   
-   The macro args are: 
-      _zzq_rlval    result lvalue
-      _zzq_default  default value (result returned when running on real CPU)
-      _zzq_request  request code
-      _zzq_arg1..5  request params
-
-   The other two macros are used to support function wrapping, and are
-   a lot simpler.  VALGRIND_GET_NR_CONTEXT returns the value of the
-   guest's NRADDR pseudo-register and whatever other information is
-   needed to safely run the call original from the wrapper: on
-   ppc64-linux, the R2 value at the divert point is also needed.  This
-   information is abstracted into a user-visible type, OrigFn.
-
-   VALGRIND_CALL_NOREDIR_* behaves the same as the following on the
-   guest, but guarantees that the branch instruction will not be
-   redirected: x86: call *%eax, amd64: call *%rax, ppc32/ppc64:
-   branch-and-link-to-r11.  VALGRIND_CALL_NOREDIR is just text, not a
-   complete inline asm, since it needs to be combined with more magic
-   inline asm stuff to be useful.
-*/
-
-/* ------------------------- x86-linux ------------------------- */
-
-#if defined(PLAT_x86_linux)
-
-typedef
-   struct { 
-      unsigned int nraddr; /* where's the code? */
-   }
-   OrigFn;
-
-#define __SPECIAL_INSTRUCTION_PREAMBLE                            \
-                     "roll $3,  %%edi ; roll $13, %%edi\n\t"      \
-                     "roll $29, %%edi ; roll $19, %%edi\n\t"
-
-#define VALGRIND_DO_CLIENT_REQUEST(                               \
-        _zzq_rlval, _zzq_default, _zzq_request,                   \
-        _zzq_arg1, _zzq_arg2, _zzq_arg3, _zzq_arg4, _zzq_arg5)    \
-  { volatile unsigned int _zzq_args[6];                           \
-    volatile unsigned int _zzq_result;                            \
-    _zzq_args[0] = (unsigned int)(_zzq_request);                  \
-    _zzq_args[1] = (unsigned int)(_zzq_arg1);                     \
-    _zzq_args[2] = (unsigned int)(_zzq_arg2);                     \
-    _zzq_args[3] = (unsigned int)(_zzq_arg3);                     \
-    _zzq_args[4] = (unsigned int)(_zzq_arg4);                     \
-    _zzq_args[5] = (unsigned int)(_zzq_arg5);                     \
-    __asm__ volatile(__SPECIAL_INSTRUCTION_PREAMBLE               \
-                     /* %EDX = client_request ( %EAX ) */         \
-                     "xchgl %%ebx,%%ebx"                          \
-                     : "=d" (_zzq_result)                         \
-                     : "a" (&_zzq_args[0]), "0" (_zzq_default)    \
-                     : "cc", "memory"                             \
-                    );                                            \
-    _zzq_rlval = _zzq_result;                                     \
-  }
-
-#define VALGRIND_GET_NR_CONTEXT(_zzq_rlval)                       \
-  { volatile OrigFn* _zzq_orig = &(_zzq_rlval);                   \
-    volatile unsigned int __addr;                                 \
-    __asm__ volatile(__SPECIAL_INSTRUCTION_PREAMBLE               \
-                     /* %EAX = guest_NRADDR */                    \
-                     "xchgl %%ecx,%%ecx"                          \
-                     : "=a" (__addr)                              \
-                     :                                            \
-                     : "cc", "memory"                             \
-                    );                                            \
-    _zzq_orig->nraddr = __addr;                                   \
-  }
-
-#define VALGRIND_CALL_NOREDIR_EAX                                 \
-                     __SPECIAL_INSTRUCTION_PREAMBLE               \
-                     /* call-noredir *%EAX */                     \
-                     "xchgl %%edx,%%edx\n\t"
-#endif /* PLAT_x86_linux */
-
-/* ------------------------ amd64-linux ------------------------ */
-
-#if defined(PLAT_amd64_linux)
-
-typedef
-   struct { 
-      unsigned long long int nraddr; /* where's the code? */
-   }
-   OrigFn;
-
-#define __SPECIAL_INSTRUCTION_PREAMBLE                            \
-                     "rolq $3,  %%rdi ; rolq $13, %%rdi\n\t"      \
-                     "rolq $61, %%rdi ; rolq $51, %%rdi\n\t"
-
-#define VALGRIND_DO_CLIENT_REQUEST(                               \
-        _zzq_rlval, _zzq_default, _zzq_request,                   \
-        _zzq_arg1, _zzq_arg2, _zzq_arg3, _zzq_arg4, _zzq_arg5)    \
-  { volatile unsigned long long int _zzq_args[6];                 \
-    volatile unsigned long long int _zzq_result;                  \
-    _zzq_args[0] = (unsigned long long int)(_zzq_request);        \
-    _zzq_args[1] = (unsigned long long int)(_zzq_arg1);           \
-    _zzq_args[2] = (unsigned long long int)(_zzq_arg2);           \
-    _zzq_args[3] = (unsigned long long int)(_zzq_arg3);           \
-    _zzq_args[4] = (unsigned long long int)(_zzq_arg4);           \
-    _zzq_args[5] = (unsigned long long int)(_zzq_arg5);           \
-    __asm__ volatile(__SPECIAL_INSTRUCTION_PREAMBLE               \
-                     /* %RDX = client_request ( %RAX ) */         \
-                     "xchgq %%rbx,%%rbx"                          \
-                     : "=d" (_zzq_result)                         \
-                     : "a" (&_zzq_args[0]), "0" (_zzq_default)    \
-                     : "cc", "memory"                             \
-                    );                                            \
-    _zzq_rlval = _zzq_result;                                     \
-  }
-
-#define VALGRIND_GET_NR_CONTEXT(_zzq_rlval)                       \
-  { volatile OrigFn* _zzq_orig = &(_zzq_rlval);                   \
-    volatile unsigned long long int __addr;                       \
-    __asm__ volatile(__SPECIAL_INSTRUCTION_PREAMBLE               \
-                     /* %RAX = guest_NRADDR */                    \
-                     "xchgq %%rcx,%%rcx"                          \
-                     : "=a" (__addr)                              \
-                     :                                            \
-                     : "cc", "memory"                             \
-                    );                                            \
-    _zzq_orig->nraddr = __addr;                                   \
-  }
-
-#define VALGRIND_CALL_NOREDIR_RAX                                 \
-                     __SPECIAL_INSTRUCTION_PREAMBLE               \
-                     /* call-noredir *%RAX */                     \
-                     "xchgq %%rdx,%%rdx\n\t"
-#endif /* PLAT_amd64_linux */
-
-/* ------------------------ ppc32-linux ------------------------ */
-
-#if defined(PLAT_ppc32_linux)
-
-typedef
-   struct { 
-      unsigned int nraddr; /* where's the code? */
-   }
-   OrigFn;
-
-#define __SPECIAL_INSTRUCTION_PREAMBLE                            \
-                     "rlwinm 0,0,3,0,0  ; rlwinm 0,0,13,0,0\n\t"  \
-                     "rlwinm 0,0,29,0,0 ; rlwinm 0,0,19,0,0\n\t"
-
-#define VALGRIND_DO_CLIENT_REQUEST(                               \
-        _zzq_rlval, _zzq_default, _zzq_request,                   \
-        _zzq_arg1, _zzq_arg2, _zzq_arg3, _zzq_arg4, _zzq_arg5)    \
-                                                                  \
-  {          unsigned int  _zzq_args[6];                          \
-             unsigned int  _zzq_result;                           \
-             unsigned int* _zzq_ptr;                              \
-    _zzq_args[0] = (unsigned int)(_zzq_request);                  \
-    _zzq_args[1] = (unsigned int)(_zzq_arg1);                     \
-    _zzq_args[2] = (unsigned int)(_zzq_arg2);                     \
-    _zzq_args[3] = (unsigned int)(_zzq_arg3);                     \
-    _zzq_args[4] = (unsigned int)(_zzq_arg4);                     \
-    _zzq_args[5] = (unsigned int)(_zzq_arg5);                     \
-    _zzq_ptr = _zzq_args;                                         \
-    __asm__ volatile("mr 3,%1\n\t" /*default*/                    \
-                     "mr 4,%2\n\t" /*ptr*/                        \
-                     __SPECIAL_INSTRUCTION_PREAMBLE               \
-                     /* %R3 = client_request ( %R4 ) */           \
-                     "or 1,1,1\n\t"                               \
-                     "mr %0,3"     /*result*/                     \
-                     : "=b" (_zzq_result)                         \
-                     : "b" (_zzq_default), "b" (_zzq_ptr)         \
-                     : "cc", "memory", "r3", "r4");               \
-    _zzq_rlval = _zzq_result;                                     \
-  }
-
-#define VALGRIND_GET_NR_CONTEXT(_zzq_rlval)                       \
-  { volatile OrigFn* _zzq_orig = &(_zzq_rlval);                   \
-    unsigned int __addr;                                          \
-    __asm__ volatile(__SPECIAL_INSTRUCTION_PREAMBLE               \
-                     /* %R3 = guest_NRADDR */                     \
-                     "or 2,2,2\n\t"                               \
-                     "mr %0,3"                                    \
-                     : "=b" (__addr)                              \
-                     :                                            \
-                     : "cc", "memory", "r3"                       \
-                    );                                            \
-    _zzq_orig->nraddr = __addr;                                   \
-  }
-
-#define VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                   \
-                     __SPECIAL_INSTRUCTION_PREAMBLE               \
-                     /* branch-and-link-to-noredir *%R11 */       \
-                     "or 3,3,3\n\t"
-#endif /* PLAT_ppc32_linux */
-
-/* ------------------------ ppc64-linux ------------------------ */
-
-#if defined(PLAT_ppc64_linux)
-
-typedef
-   struct { 
-      unsigned long long int nraddr; /* where's the code? */
-      unsigned long long int r2;  /* what tocptr do we need? */
-   }
-   OrigFn;
-
-#define __SPECIAL_INSTRUCTION_PREAMBLE                            \
-                     "rotldi 0,0,3  ; rotldi 0,0,13\n\t"          \
-                     "rotldi 0,0,61 ; rotldi 0,0,51\n\t"
-
-#define VALGRIND_DO_CLIENT_REQUEST(                               \
-        _zzq_rlval, _zzq_default, _zzq_request,                   \
-        _zzq_arg1, _zzq_arg2, _zzq_arg3, _zzq_arg4, _zzq_arg5)    \
-                                                                  \
-  {          unsigned long long int  _zzq_args[6];                \
-    register unsigned long long int  _zzq_result __asm__("r3");   \
-    register unsigned long long int* _zzq_ptr __asm__("r4");      \
-    _zzq_args[0] = (unsigned long long int)(_zzq_request);        \
-    _zzq_args[1] = (unsigned long long int)(_zzq_arg1);           \
-    _zzq_args[2] = (unsigned long long int)(_zzq_arg2);           \
-    _zzq_args[3] = (unsigned long long int)(_zzq_arg3);           \
-    _zzq_args[4] = (unsigned long long int)(_zzq_arg4);           \
-    _zzq_args[5] = (unsigned long long int)(_zzq_arg5);           \
-    _zzq_ptr = _zzq_args;                                         \
-    __asm__ volatile(__SPECIAL_INSTRUCTION_PREAMBLE               \
-                     /* %R3 = client_request ( %R4 ) */           \
-                     "or 1,1,1"                                   \
-                     : "=r" (_zzq_result)                         \
-                     : "0" (_zzq_default), "r" (_zzq_ptr)         \
-                     : "cc", "memory");                           \
-    _zzq_rlval = _zzq_result;                                     \
-  }
-
-#define VALGRIND_GET_NR_CONTEXT(_zzq_rlval)                       \
-  { volatile OrigFn* _zzq_orig = &(_zzq_rlval);                   \
-    register unsigned long long int __addr __asm__("r3");         \
-    __asm__ volatile(__SPECIAL_INSTRUCTION_PREAMBLE               \
-                     /* %R3 = guest_NRADDR */                     \
-                     "or 2,2,2"                                   \
-                     : "=r" (__addr)                              \
-                     :                                            \
-                     : "cc", "memory"                             \
-                    );                                            \
-    _zzq_orig->nraddr = __addr;                                   \
-    __asm__ volatile(__SPECIAL_INSTRUCTION_PREAMBLE               \
-                     /* %R3 = guest_NRADDR_GPR2 */                \
-                     "or 4,4,4"                                   \
-                     : "=r" (__addr)                              \
-                     :                                            \
-                     : "cc", "memory"                             \
-                    );                                            \
-    _zzq_orig->r2 = __addr;                                       \
-  }
-
-#define VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                   \
-                     __SPECIAL_INSTRUCTION_PREAMBLE               \
-                     /* branch-and-link-to-noredir *%R11 */       \
-                     "or 3,3,3\n\t"
-
-#endif /* PLAT_ppc64_linux */
-
-/* ------------------------ ppc32-aix5 ------------------------- */
-
-#if defined(PLAT_ppc32_aix5)
-
-typedef
-   struct { 
-      unsigned int nraddr; /* where's the code? */
-      unsigned int r2;  /* what tocptr do we need? */
-   }
-   OrigFn;
-
-#define __SPECIAL_INSTRUCTION_PREAMBLE                            \
-                     "rlwinm 0,0,3,0,0  ; rlwinm 0,0,13,0,0\n\t"  \
-                     "rlwinm 0,0,29,0,0 ; rlwinm 0,0,19,0,0\n\t"
-
-#define VALGRIND_DO_CLIENT_REQUEST(                               \
-        _zzq_rlval, _zzq_default, _zzq_request,                   \
-        _zzq_arg1, _zzq_arg2, _zzq_arg3, _zzq_arg4, _zzq_arg5)    \
-                                                                  \
-  {          unsigned int  _zzq_args[7];                          \
-    register unsigned int  _zzq_result;                           \
-    register unsigned int* _zzq_ptr;                              \
-    _zzq_args[0] = (unsigned int)(_zzq_request);                  \
-    _zzq_args[1] = (unsigned int)(_zzq_arg1);                     \
-    _zzq_args[2] = (unsigned int)(_zzq_arg2);                     \
-    _zzq_args[3] = (unsigned int)(_zzq_arg3);                     \
-    _zzq_args[4] = (unsigned int)(_zzq_arg4);                     \
-    _zzq_args[5] = (unsigned int)(_zzq_arg5);                     \
-    _zzq_args[6] = (unsigned int)(_zzq_default);                  \
-    _zzq_ptr = _zzq_args;                                         \
-    __asm__ volatile("mr 4,%1\n\t"                                \
-                     "lwz 3, 24(4)\n\t"                           \
-                     __SPECIAL_INSTRUCTION_PREAMBLE               \
-                     /* %R3 = client_request ( %R4 ) */           \
-                     "or 1,1,1\n\t"                               \
-                     "mr %0,3"                                    \
-                     : "=b" (_zzq_result)                         \
-                     : "b" (_zzq_ptr)                             \
-                     : "r3", "r4", "cc", "memory");               \
-    _zzq_rlval = _zzq_result;                                     \
-  }
-
-#define VALGRIND_GET_NR_CONTEXT(_zzq_rlval)                       \
-  { volatile OrigFn* _zzq_orig = &(_zzq_rlval);                   \
-    register unsigned int __addr;                                 \
-    __asm__ volatile(__SPECIAL_INSTRUCTION_PREAMBLE               \
-                     /* %R3 = guest_NRADDR */                     \
-                     "or 2,2,2\n\t"                               \
-                     "mr %0,3"                                    \
-                     : "=b" (__addr)                              \
-                     :                                            \
-                     : "r3", "cc", "memory"                       \
-                    );                                            \
-    _zzq_orig->nraddr = __addr;                                   \
-    __asm__ volatile(__SPECIAL_INSTRUCTION_PREAMBLE               \
-                     /* %R3 = guest_NRADDR_GPR2 */                \
-                     "or 4,4,4\n\t"                               \
-                     "mr %0,3"                                    \
-                     : "=b" (__addr)                              \
-                     :                                            \
-                     : "r3", "cc", "memory"                       \
-                    );                                            \
-    _zzq_orig->r2 = __addr;                                       \
-  }
-
-#define VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                   \
-                     __SPECIAL_INSTRUCTION_PREAMBLE               \
-                     /* branch-and-link-to-noredir *%R11 */       \
-                     "or 3,3,3\n\t"
-
-#endif /* PLAT_ppc32_aix5 */
-
-/* ------------------------ ppc64-aix5 ------------------------- */
-
-#if defined(PLAT_ppc64_aix5)
-
-typedef
-   struct { 
-      unsigned long long int nraddr; /* where's the code? */
-      unsigned long long int r2;  /* what tocptr do we need? */
-   }
-   OrigFn;
-
-#define __SPECIAL_INSTRUCTION_PREAMBLE                            \
-                     "rotldi 0,0,3  ; rotldi 0,0,13\n\t"          \
-                     "rotldi 0,0,61 ; rotldi 0,0,51\n\t"
-
-#define VALGRIND_DO_CLIENT_REQUEST(                               \
-        _zzq_rlval, _zzq_default, _zzq_request,                   \
-        _zzq_arg1, _zzq_arg2, _zzq_arg3, _zzq_arg4, _zzq_arg5)    \
-                                                                  \
-  {          unsigned long long int  _zzq_args[7];                \
-    register unsigned long long int  _zzq_result;                 \
-    register unsigned long long int* _zzq_ptr;                    \
-    _zzq_args[0] = (unsigned int long long)(_zzq_request);        \
-    _zzq_args[1] = (unsigned int long long)(_zzq_arg1);           \
-    _zzq_args[2] = (unsigned int long long)(_zzq_arg2);           \
-    _zzq_args[3] = (unsigned int long long)(_zzq_arg3);           \
-    _zzq_args[4] = (unsigned int long long)(_zzq_arg4);           \
-    _zzq_args[5] = (unsigned int long long)(_zzq_arg5);           \
-    _zzq_args[6] = (unsigned int long long)(_zzq_default);        \
-    _zzq_ptr = _zzq_args;                                         \
-    __asm__ volatile("mr 4,%1\n\t"                                \
-                     "ld 3, 48(4)\n\t"                            \
-                     __SPECIAL_INSTRUCTION_PREAMBLE               \
-                     /* %R3 = client_request ( %R4 ) */           \
-                     "or 1,1,1\n\t"                               \
-                     "mr %0,3"                                    \
-                     : "=b" (_zzq_result)                         \
-                     : "b" (_zzq_ptr)                             \
-                     : "r3", "r4", "cc", "memory");               \
-    _zzq_rlval = _zzq_result;                                     \
-  }
-
-#define VALGRIND_GET_NR_CONTEXT(_zzq_rlval)                       \
-  { volatile OrigFn* _zzq_orig = &(_zzq_rlval);                   \
-    register unsigned long long int __addr;                       \
-    __asm__ volatile(__SPECIAL_INSTRUCTION_PREAMBLE               \
-                     /* %R3 = guest_NRADDR */                     \
-                     "or 2,2,2\n\t"                               \
-                     "mr %0,3"                                    \
-                     : "=b" (__addr)                              \
-                     :                                            \
-                     : "r3", "cc", "memory"                       \
-                    );                                            \
-    _zzq_orig->nraddr = __addr;                                   \
-    __asm__ volatile(__SPECIAL_INSTRUCTION_PREAMBLE               \
-                     /* %R3 = guest_NRADDR_GPR2 */                \
-                     "or 4,4,4\n\t"                               \
-                     "mr %0,3"                                    \
-                     : "=b" (__addr)                              \
-                     :                                            \
-                     : "r3", "cc", "memory"                       \
-                    );                                            \
-    _zzq_orig->r2 = __addr;                                       \
-  }
-
-#define VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                   \
-                     __SPECIAL_INSTRUCTION_PREAMBLE               \
-                     /* branch-and-link-to-noredir *%R11 */       \
-                     "or 3,3,3\n\t"
-
-#endif /* PLAT_ppc64_aix5 */
-
-/* Insert assembly code for other platforms here... */
-
-#endif /* NVALGRIND */
-
-
-/* ------------------------------------------------------------------ */
-/* PLATFORM SPECIFICS for FUNCTION WRAPPING.  This is all very        */
-/* ugly.  It's the least-worst tradeoff I can think of.               */
-/* ------------------------------------------------------------------ */
-
-/* This section defines magic (a.k.a. appalling-hack) macros for doing
-   guaranteed-no-redirection calls, so as to get from function
-   wrappers to the functions they are wrapping.  The whole point is to
-   construct standard call sequences, but to do the call itself with a
-   special no-redirect call pseudo-instruction that the JIT
-   understands and handles specially.  This section is long and
-   repetitious, and I can't see a way to make it shorter.
-
-   The naming scheme is as follows:
-
-      CALL_FN_{W,v}_{v,W,WW,WWW,WWWW,5W,6W,7W,etc}
-
-   'W' stands for "word" and 'v' for "void".  Hence there are
-   different macros for calling arity 0, 1, 2, 3, 4, etc, functions,
-   and for each, the possibility of returning a word-typed result, or
-   no result.
-*/
-
-/* Use these to write the name of your wrapper.  NOTE: duplicates
-   VG_WRAP_FUNCTION_Z{U,Z} in pub_tool_redir.h. */
-
-#define I_WRAP_SONAME_FNNAME_ZU(soname,fnname)                    \
-   _vgwZU_##soname##_##fnname
-
-#define I_WRAP_SONAME_FNNAME_ZZ(soname,fnname)                    \
-   _vgwZZ_##soname##_##fnname
-
-/* Use this macro from within a wrapper function to collect the
-   context (address and possibly other info) of the original function.
-   Once you have that you can then use it in one of the CALL_FN_
-   macros.  The type of the argument _lval is OrigFn. */
-#define VALGRIND_GET_ORIG_FN(_lval)  VALGRIND_GET_NR_CONTEXT(_lval)
-
-/* Derivatives of the main macros below, for calling functions
-   returning void. */
-
-#define CALL_FN_v_v(fnptr)                                        \
-   do { volatile unsigned long _junk;                             \
-        CALL_FN_W_v(_junk,fnptr); } while (0)
-
-#define CALL_FN_v_W(fnptr, arg1)                                  \
-   do { volatile unsigned long _junk;                             \
-        CALL_FN_W_W(_junk,fnptr,arg1); } while (0)
-
-#define CALL_FN_v_WW(fnptr, arg1,arg2)                            \
-   do { volatile unsigned long _junk;                             \
-        CALL_FN_W_WW(_junk,fnptr,arg1,arg2); } while (0)
-
-#define CALL_FN_v_WWW(fnptr, arg1,arg2,arg3)                      \
-   do { volatile unsigned long _junk;                             \
-        CALL_FN_W_WWW(_junk,fnptr,arg1,arg2,arg3); } while (0)
-
-/* ------------------------- x86-linux ------------------------- */
-
-#if defined(PLAT_x86_linux)
-
-/* These regs are trashed by the hidden call.  No need to mention eax
-   as gcc can already see that; listing it also causes gcc to bomb. */
-#define __CALLER_SAVED_REGS /*"eax"*/ "ecx", "edx"
-
-/* These CALL_FN_ macros assume that on x86-linux, sizeof(unsigned
-   long) == 4. */
-
-#define CALL_FN_W_v(lval, orig)                                   \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[1];                          \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      __asm__ volatile(                                           \
-         "movl (%%eax), %%eax\n\t"  /* target->%eax */            \
-         VALGRIND_CALL_NOREDIR_EAX                                \
-         : /*out*/   "=a" (_res)                                  \
-         : /*in*/    "a" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_W(lval, orig, arg1)                             \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[2];                          \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)(arg1);                         \
-      __asm__ volatile(                                           \
-         "pushl 4(%%eax)\n\t"                                     \
-         "movl (%%eax), %%eax\n\t"  /* target->%eax */            \
-         VALGRIND_CALL_NOREDIR_EAX                                \
-         "addl $4, %%esp\n"                                       \
-         : /*out*/   "=a" (_res)                                  \
-         : /*in*/    "a" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_WW(lval, orig, arg1,arg2)                       \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3];                          \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)(arg1);                         \
-      _argvec[2] = (unsigned long)(arg2);                         \
-      __asm__ volatile(                                           \
-         "pushl 8(%%eax)\n\t"                                     \
-         "pushl 4(%%eax)\n\t"                                     \
-         "movl (%%eax), %%eax\n\t"  /* target->%eax */            \
-         VALGRIND_CALL_NOREDIR_EAX                                \
-         "addl $8, %%esp\n"                                       \
-         : /*out*/   "=a" (_res)                                  \
-         : /*in*/    "a" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_WWW(lval, orig, arg1,arg2,arg3)                 \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[4];                          \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)(arg1);                         \
-      _argvec[2] = (unsigned long)(arg2);                         \
-      _argvec[3] = (unsigned long)(arg3);                         \
-      __asm__ volatile(                                           \
-         "pushl 12(%%eax)\n\t"                                    \
-         "pushl 8(%%eax)\n\t"                                     \
-         "pushl 4(%%eax)\n\t"                                     \
-         "movl (%%eax), %%eax\n\t"  /* target->%eax */            \
-         VALGRIND_CALL_NOREDIR_EAX                                \
-         "addl $12, %%esp\n"                                      \
-         : /*out*/   "=a" (_res)                                  \
-         : /*in*/    "a" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_WWWW(lval, orig, arg1,arg2,arg3,arg4)           \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[5];                          \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)(arg1);                         \
-      _argvec[2] = (unsigned long)(arg2);                         \
-      _argvec[3] = (unsigned long)(arg3);                         \
-      _argvec[4] = (unsigned long)(arg4);                         \
-      __asm__ volatile(                                           \
-         "pushl 16(%%eax)\n\t"                                    \
-         "pushl 12(%%eax)\n\t"                                    \
-         "pushl 8(%%eax)\n\t"                                     \
-         "pushl 4(%%eax)\n\t"                                     \
-         "movl (%%eax), %%eax\n\t"  /* target->%eax */            \
-         VALGRIND_CALL_NOREDIR_EAX                                \
-         "addl $16, %%esp\n"                                      \
-         : /*out*/   "=a" (_res)                                  \
-         : /*in*/    "a" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_5W(lval, orig, arg1,arg2,arg3,arg4,arg5)        \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[6];                          \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)(arg1);                         \
-      _argvec[2] = (unsigned long)(arg2);                         \
-      _argvec[3] = (unsigned long)(arg3);                         \
-      _argvec[4] = (unsigned long)(arg4);                         \
-      _argvec[5] = (unsigned long)(arg5);                         \
-      __asm__ volatile(                                           \
-         "pushl 20(%%eax)\n\t"                                    \
-         "pushl 16(%%eax)\n\t"                                    \
-         "pushl 12(%%eax)\n\t"                                    \
-         "pushl 8(%%eax)\n\t"                                     \
-         "pushl 4(%%eax)\n\t"                                     \
-         "movl (%%eax), %%eax\n\t"  /* target->%eax */            \
-         VALGRIND_CALL_NOREDIR_EAX                                \
-         "addl $20, %%esp\n"                                      \
-         : /*out*/   "=a" (_res)                                  \
-         : /*in*/    "a" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_6W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6)   \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[7];                          \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)(arg1);                         \
-      _argvec[2] = (unsigned long)(arg2);                         \
-      _argvec[3] = (unsigned long)(arg3);                         \
-      _argvec[4] = (unsigned long)(arg4);                         \
-      _argvec[5] = (unsigned long)(arg5);                         \
-      _argvec[6] = (unsigned long)(arg6);                         \
-      __asm__ volatile(                                           \
-         "pushl 24(%%eax)\n\t"                                    \
-         "pushl 20(%%eax)\n\t"                                    \
-         "pushl 16(%%eax)\n\t"                                    \
-         "pushl 12(%%eax)\n\t"                                    \
-         "pushl 8(%%eax)\n\t"                                     \
-         "pushl 4(%%eax)\n\t"                                     \
-         "movl (%%eax), %%eax\n\t"  /* target->%eax */            \
-         VALGRIND_CALL_NOREDIR_EAX                                \
-         "addl $24, %%esp\n"                                      \
-         : /*out*/   "=a" (_res)                                  \
-         : /*in*/    "a" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_7W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,   \
-                                 arg7)                            \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[8];                          \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)(arg1);                         \
-      _argvec[2] = (unsigned long)(arg2);                         \
-      _argvec[3] = (unsigned long)(arg3);                         \
-      _argvec[4] = (unsigned long)(arg4);                         \
-      _argvec[5] = (unsigned long)(arg5);                         \
-      _argvec[6] = (unsigned long)(arg6);                         \
-      _argvec[7] = (unsigned long)(arg7);                         \
-      __asm__ volatile(                                           \
-         "pushl 28(%%eax)\n\t"                                    \
-         "pushl 24(%%eax)\n\t"                                    \
-         "pushl 20(%%eax)\n\t"                                    \
-         "pushl 16(%%eax)\n\t"                                    \
-         "pushl 12(%%eax)\n\t"                                    \
-         "pushl 8(%%eax)\n\t"                                     \
-         "pushl 4(%%eax)\n\t"                                     \
-         "movl (%%eax), %%eax\n\t"  /* target->%eax */            \
-         VALGRIND_CALL_NOREDIR_EAX                                \
-         "addl $28, %%esp\n"                                      \
-         : /*out*/   "=a" (_res)                                  \
-         : /*in*/    "a" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_8W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,   \
-                                 arg7,arg8)                       \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[9];                          \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)(arg1);                         \
-      _argvec[2] = (unsigned long)(arg2);                         \
-      _argvec[3] = (unsigned long)(arg3);                         \
-      _argvec[4] = (unsigned long)(arg4);                         \
-      _argvec[5] = (unsigned long)(arg5);                         \
-      _argvec[6] = (unsigned long)(arg6);                         \
-      _argvec[7] = (unsigned long)(arg7);                         \
-      _argvec[8] = (unsigned long)(arg8);                         \
-      __asm__ volatile(                                           \
-         "pushl 32(%%eax)\n\t"                                    \
-         "pushl 28(%%eax)\n\t"                                    \
-         "pushl 24(%%eax)\n\t"                                    \
-         "pushl 20(%%eax)\n\t"                                    \
-         "pushl 16(%%eax)\n\t"                                    \
-         "pushl 12(%%eax)\n\t"                                    \
-         "pushl 8(%%eax)\n\t"                                     \
-         "pushl 4(%%eax)\n\t"                                     \
-         "movl (%%eax), %%eax\n\t"  /* target->%eax */            \
-         VALGRIND_CALL_NOREDIR_EAX                                \
-         "addl $32, %%esp\n"                                      \
-         : /*out*/   "=a" (_res)                                  \
-         : /*in*/    "a" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_9W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,   \
-                                 arg7,arg8,arg9)                  \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[10];                         \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)(arg1);                         \
-      _argvec[2] = (unsigned long)(arg2);                         \
-      _argvec[3] = (unsigned long)(arg3);                         \
-      _argvec[4] = (unsigned long)(arg4);                         \
-      _argvec[5] = (unsigned long)(arg5);                         \
-      _argvec[6] = (unsigned long)(arg6);                         \
-      _argvec[7] = (unsigned long)(arg7);                         \
-      _argvec[8] = (unsigned long)(arg8);                         \
-      _argvec[9] = (unsigned long)(arg9);                         \
-      __asm__ volatile(                                           \
-         "pushl 36(%%eax)\n\t"                                    \
-         "pushl 32(%%eax)\n\t"                                    \
-         "pushl 28(%%eax)\n\t"                                    \
-         "pushl 24(%%eax)\n\t"                                    \
-         "pushl 20(%%eax)\n\t"                                    \
-         "pushl 16(%%eax)\n\t"                                    \
-         "pushl 12(%%eax)\n\t"                                    \
-         "pushl 8(%%eax)\n\t"                                     \
-         "pushl 4(%%eax)\n\t"                                     \
-         "movl (%%eax), %%eax\n\t"  /* target->%eax */            \
-         VALGRIND_CALL_NOREDIR_EAX                                \
-         "addl $36, %%esp\n"                                      \
-         : /*out*/   "=a" (_res)                                  \
-         : /*in*/    "a" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_10W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,  \
-                                  arg7,arg8,arg9,arg10)           \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[11];                         \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)(arg1);                         \
-      _argvec[2] = (unsigned long)(arg2);                         \
-      _argvec[3] = (unsigned long)(arg3);                         \
-      _argvec[4] = (unsigned long)(arg4);                         \
-      _argvec[5] = (unsigned long)(arg5);                         \
-      _argvec[6] = (unsigned long)(arg6);                         \
-      _argvec[7] = (unsigned long)(arg7);                         \
-      _argvec[8] = (unsigned long)(arg8);                         \
-      _argvec[9] = (unsigned long)(arg9);                         \
-      _argvec[10] = (unsigned long)(arg10);                       \
-      __asm__ volatile(                                           \
-         "pushl 40(%%eax)\n\t"                                    \
-         "pushl 36(%%eax)\n\t"                                    \
-         "pushl 32(%%eax)\n\t"                                    \
-         "pushl 28(%%eax)\n\t"                                    \
-         "pushl 24(%%eax)\n\t"                                    \
-         "pushl 20(%%eax)\n\t"                                    \
-         "pushl 16(%%eax)\n\t"                                    \
-         "pushl 12(%%eax)\n\t"                                    \
-         "pushl 8(%%eax)\n\t"                                     \
-         "pushl 4(%%eax)\n\t"                                     \
-         "movl (%%eax), %%eax\n\t"  /* target->%eax */            \
-         VALGRIND_CALL_NOREDIR_EAX                                \
-         "addl $40, %%esp\n"                                      \
-         : /*out*/   "=a" (_res)                                  \
-         : /*in*/    "a" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_11W(lval, orig, arg1,arg2,arg3,arg4,arg5,       \
-                                  arg6,arg7,arg8,arg9,arg10,      \
-                                  arg11)                          \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[12];                         \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)(arg1);                         \
-      _argvec[2] = (unsigned long)(arg2);                         \
-      _argvec[3] = (unsigned long)(arg3);                         \
-      _argvec[4] = (unsigned long)(arg4);                         \
-      _argvec[5] = (unsigned long)(arg5);                         \
-      _argvec[6] = (unsigned long)(arg6);                         \
-      _argvec[7] = (unsigned long)(arg7);                         \
-      _argvec[8] = (unsigned long)(arg8);                         \
-      _argvec[9] = (unsigned long)(arg9);                         \
-      _argvec[10] = (unsigned long)(arg10);                       \
-      _argvec[11] = (unsigned long)(arg11);                       \
-      __asm__ volatile(                                           \
-         "pushl 44(%%eax)\n\t"                                    \
-         "pushl 40(%%eax)\n\t"                                    \
-         "pushl 36(%%eax)\n\t"                                    \
-         "pushl 32(%%eax)\n\t"                                    \
-         "pushl 28(%%eax)\n\t"                                    \
-         "pushl 24(%%eax)\n\t"                                    \
-         "pushl 20(%%eax)\n\t"                                    \
-         "pushl 16(%%eax)\n\t"                                    \
-         "pushl 12(%%eax)\n\t"                                    \
-         "pushl 8(%%eax)\n\t"                                     \
-         "pushl 4(%%eax)\n\t"                                     \
-         "movl (%%eax), %%eax\n\t"  /* target->%eax */            \
-         VALGRIND_CALL_NOREDIR_EAX                                \
-         "addl $44, %%esp\n"                                      \
-         : /*out*/   "=a" (_res)                                  \
-         : /*in*/    "a" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_12W(lval, orig, arg1,arg2,arg3,arg4,arg5,       \
-                                  arg6,arg7,arg8,arg9,arg10,      \
-                                  arg11,arg12)                    \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[13];                         \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)(arg1);                         \
-      _argvec[2] = (unsigned long)(arg2);                         \
-      _argvec[3] = (unsigned long)(arg3);                         \
-      _argvec[4] = (unsigned long)(arg4);                         \
-      _argvec[5] = (unsigned long)(arg5);                         \
-      _argvec[6] = (unsigned long)(arg6);                         \
-      _argvec[7] = (unsigned long)(arg7);                         \
-      _argvec[8] = (unsigned long)(arg8);                         \
-      _argvec[9] = (unsigned long)(arg9);                         \
-      _argvec[10] = (unsigned long)(arg10);                       \
-      _argvec[11] = (unsigned long)(arg11);                       \
-      _argvec[12] = (unsigned long)(arg12);                       \
-      __asm__ volatile(                                           \
-         "pushl 48(%%eax)\n\t"                                    \
-         "pushl 44(%%eax)\n\t"                                    \
-         "pushl 40(%%eax)\n\t"                                    \
-         "pushl 36(%%eax)\n\t"                                    \
-         "pushl 32(%%eax)\n\t"                                    \
-         "pushl 28(%%eax)\n\t"                                    \
-         "pushl 24(%%eax)\n\t"                                    \
-         "pushl 20(%%eax)\n\t"                                    \
-         "pushl 16(%%eax)\n\t"                                    \
-         "pushl 12(%%eax)\n\t"                                    \
-         "pushl 8(%%eax)\n\t"                                     \
-         "pushl 4(%%eax)\n\t"                                     \
-         "movl (%%eax), %%eax\n\t"  /* target->%eax */            \
-         VALGRIND_CALL_NOREDIR_EAX                                \
-         "addl $48, %%esp\n"                                      \
-         : /*out*/   "=a" (_res)                                  \
-         : /*in*/    "a" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#endif /* PLAT_x86_linux */
-
-/* ------------------------ amd64-linux ------------------------ */
-
-#if defined(PLAT_amd64_linux)
-
-/* ARGREGS: rdi rsi rdx rcx r8 r9 (the rest on stack in R-to-L order) */
-
-/* These regs are trashed by the hidden call. */
-#define __CALLER_SAVED_REGS /*"rax",*/ "rcx", "rdx", "rsi",       \
-                            "rdi", "r8", "r9", "r10", "r11"
-
-/* These CALL_FN_ macros assume that on amd64-linux, sizeof(unsigned
-   long) == 8. */
-
-/* NB 9 Sept 07.  There is a nasty kludge here in all these CALL_FN_
-   macros.  In order not to trash the stack redzone, we need to drop
-   %rsp by 128 before the hidden call, and restore afterwards.  The
-   nastiness is that it is only by luck that the stack still appears
-   to be unwindable during the hidden call - since then the behaviour
-   of any routine using this macro does not match what the CFI data
-   says.  Sigh.
-
-   Why is this important?  Imagine that a wrapper has a stack
-   allocated local, and passes to the hidden call, a pointer to it.
-   Because gcc does not know about the hidden call, it may allocate
-   that local in the redzone.  Unfortunately the hidden call may then
-   trash it before it comes to use it.  So we must step clear of the
-   redzone, for the duration of the hidden call, to make it safe.
-
-   Probably the same problem afflicts the other redzone-style ABIs too
-   (ppc64-linux, ppc32-aix5, ppc64-aix5); but for those, the stack is
-   self describing (none of this CFI nonsense) so at least messing
-   with the stack pointer doesn't give a danger of non-unwindable
-   stack. */
-
-#define CALL_FN_W_v(lval, orig)                                   \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[1];                          \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      __asm__ volatile(                                           \
-         "subq $128,%%rsp\n\t"                                    \
-         "movq (%%rax), %%rax\n\t"  /* target->%rax */            \
-         VALGRIND_CALL_NOREDIR_RAX                                \
-         "addq $128,%%rsp\n\t"                                    \
-         : /*out*/   "=a" (_res)                                  \
-         : /*in*/    "a" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_W(lval, orig, arg1)                             \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[2];                          \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)(arg1);                         \
-      __asm__ volatile(                                           \
-         "subq $128,%%rsp\n\t"                                    \
-         "movq 8(%%rax), %%rdi\n\t"                               \
-         "movq (%%rax), %%rax\n\t"  /* target->%rax */            \
-         VALGRIND_CALL_NOREDIR_RAX                                \
-         "addq $128,%%rsp\n\t"                                    \
-         : /*out*/   "=a" (_res)                                  \
-         : /*in*/    "a" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_WW(lval, orig, arg1,arg2)                       \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3];                          \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)(arg1);                         \
-      _argvec[2] = (unsigned long)(arg2);                         \
-      __asm__ volatile(                                           \
-         "subq $128,%%rsp\n\t"                                    \
-         "movq 16(%%rax), %%rsi\n\t"                              \
-         "movq 8(%%rax), %%rdi\n\t"                               \
-         "movq (%%rax), %%rax\n\t"  /* target->%rax */            \
-         VALGRIND_CALL_NOREDIR_RAX                                \
-         "addq $128,%%rsp\n\t"                                    \
-         : /*out*/   "=a" (_res)                                  \
-         : /*in*/    "a" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_WWW(lval, orig, arg1,arg2,arg3)                 \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[4];                          \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)(arg1);                         \
-      _argvec[2] = (unsigned long)(arg2);                         \
-      _argvec[3] = (unsigned long)(arg3);                         \
-      __asm__ volatile(                                           \
-         "subq $128,%%rsp\n\t"                                    \
-         "movq 24(%%rax), %%rdx\n\t"                              \
-         "movq 16(%%rax), %%rsi\n\t"                              \
-         "movq 8(%%rax), %%rdi\n\t"                               \
-         "movq (%%rax), %%rax\n\t"  /* target->%rax */            \
-         VALGRIND_CALL_NOREDIR_RAX                                \
-         "addq $128,%%rsp\n\t"                                    \
-         : /*out*/   "=a" (_res)                                  \
-         : /*in*/    "a" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_WWWW(lval, orig, arg1,arg2,arg3,arg4)           \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[5];                          \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)(arg1);                         \
-      _argvec[2] = (unsigned long)(arg2);                         \
-      _argvec[3] = (unsigned long)(arg3);                         \
-      _argvec[4] = (unsigned long)(arg4);                         \
-      __asm__ volatile(                                           \
-         "subq $128,%%rsp\n\t"                                    \
-         "movq 32(%%rax), %%rcx\n\t"                              \
-         "movq 24(%%rax), %%rdx\n\t"                              \
-         "movq 16(%%rax), %%rsi\n\t"                              \
-         "movq 8(%%rax), %%rdi\n\t"                               \
-         "movq (%%rax), %%rax\n\t"  /* target->%rax */            \
-         VALGRIND_CALL_NOREDIR_RAX                                \
-         "addq $128,%%rsp\n\t"                                    \
-         : /*out*/   "=a" (_res)                                  \
-         : /*in*/    "a" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_5W(lval, orig, arg1,arg2,arg3,arg4,arg5)        \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[6];                          \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)(arg1);                         \
-      _argvec[2] = (unsigned long)(arg2);                         \
-      _argvec[3] = (unsigned long)(arg3);                         \
-      _argvec[4] = (unsigned long)(arg4);                         \
-      _argvec[5] = (unsigned long)(arg5);                         \
-      __asm__ volatile(                                           \
-         "subq $128,%%rsp\n\t"                                    \
-         "movq 40(%%rax), %%r8\n\t"                               \
-         "movq 32(%%rax), %%rcx\n\t"                              \
-         "movq 24(%%rax), %%rdx\n\t"                              \
-         "movq 16(%%rax), %%rsi\n\t"                              \
-         "movq 8(%%rax), %%rdi\n\t"                               \
-         "movq (%%rax), %%rax\n\t"  /* target->%rax */            \
-         VALGRIND_CALL_NOREDIR_RAX                                \
-         "addq $128,%%rsp\n\t"                                    \
-         : /*out*/   "=a" (_res)                                  \
-         : /*in*/    "a" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_6W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6)   \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[7];                          \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)(arg1);                         \
-      _argvec[2] = (unsigned long)(arg2);                         \
-      _argvec[3] = (unsigned long)(arg3);                         \
-      _argvec[4] = (unsigned long)(arg4);                         \
-      _argvec[5] = (unsigned long)(arg5);                         \
-      _argvec[6] = (unsigned long)(arg6);                         \
-      __asm__ volatile(                                           \
-         "subq $128,%%rsp\n\t"                                    \
-         "movq 48(%%rax), %%r9\n\t"                               \
-         "movq 40(%%rax), %%r8\n\t"                               \
-         "movq 32(%%rax), %%rcx\n\t"                              \
-         "movq 24(%%rax), %%rdx\n\t"                              \
-         "movq 16(%%rax), %%rsi\n\t"                              \
-         "movq 8(%%rax), %%rdi\n\t"                               \
-         "movq (%%rax), %%rax\n\t"  /* target->%rax */            \
-         VALGRIND_CALL_NOREDIR_RAX                                \
-         "addq $128,%%rsp\n\t"                                    \
-         : /*out*/   "=a" (_res)                                  \
-         : /*in*/    "a" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_7W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,   \
-                                 arg7)                            \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[8];                          \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)(arg1);                         \
-      _argvec[2] = (unsigned long)(arg2);                         \
-      _argvec[3] = (unsigned long)(arg3);                         \
-      _argvec[4] = (unsigned long)(arg4);                         \
-      _argvec[5] = (unsigned long)(arg5);                         \
-      _argvec[6] = (unsigned long)(arg6);                         \
-      _argvec[7] = (unsigned long)(arg7);                         \
-      __asm__ volatile(                                           \
-         "subq $128,%%rsp\n\t"                                    \
-         "pushq 56(%%rax)\n\t"                                    \
-         "movq 48(%%rax), %%r9\n\t"                               \
-         "movq 40(%%rax), %%r8\n\t"                               \
-         "movq 32(%%rax), %%rcx\n\t"                              \
-         "movq 24(%%rax), %%rdx\n\t"                              \
-         "movq 16(%%rax), %%rsi\n\t"                              \
-         "movq 8(%%rax), %%rdi\n\t"                               \
-         "movq (%%rax), %%rax\n\t"  /* target->%rax */            \
-         VALGRIND_CALL_NOREDIR_RAX                                \
-         "addq $8, %%rsp\n"                                       \
-         "addq $128,%%rsp\n\t"                                    \
-         : /*out*/   "=a" (_res)                                  \
-         : /*in*/    "a" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_8W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,   \
-                                 arg7,arg8)                       \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[9];                          \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)(arg1);                         \
-      _argvec[2] = (unsigned long)(arg2);                         \
-      _argvec[3] = (unsigned long)(arg3);                         \
-      _argvec[4] = (unsigned long)(arg4);                         \
-      _argvec[5] = (unsigned long)(arg5);                         \
-      _argvec[6] = (unsigned long)(arg6);                         \
-      _argvec[7] = (unsigned long)(arg7);                         \
-      _argvec[8] = (unsigned long)(arg8);                         \
-      __asm__ volatile(                                           \
-         "subq $128,%%rsp\n\t"                                    \
-         "pushq 64(%%rax)\n\t"                                    \
-         "pushq 56(%%rax)\n\t"                                    \
-         "movq 48(%%rax), %%r9\n\t"                               \
-         "movq 40(%%rax), %%r8\n\t"                               \
-         "movq 32(%%rax), %%rcx\n\t"                              \
-         "movq 24(%%rax), %%rdx\n\t"                              \
-         "movq 16(%%rax), %%rsi\n\t"                              \
-         "movq 8(%%rax), %%rdi\n\t"                               \
-         "movq (%%rax), %%rax\n\t"  /* target->%rax */            \
-         VALGRIND_CALL_NOREDIR_RAX                                \
-         "addq $16, %%rsp\n"                                      \
-         "addq $128,%%rsp\n\t"                                    \
-         : /*out*/   "=a" (_res)                                  \
-         : /*in*/    "a" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_9W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,   \
-                                 arg7,arg8,arg9)                  \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[10];                         \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)(arg1);                         \
-      _argvec[2] = (unsigned long)(arg2);                         \
-      _argvec[3] = (unsigned long)(arg3);                         \
-      _argvec[4] = (unsigned long)(arg4);                         \
-      _argvec[5] = (unsigned long)(arg5);                         \
-      _argvec[6] = (unsigned long)(arg6);                         \
-      _argvec[7] = (unsigned long)(arg7);                         \
-      _argvec[8] = (unsigned long)(arg8);                         \
-      _argvec[9] = (unsigned long)(arg9);                         \
-      __asm__ volatile(                                           \
-         "subq $128,%%rsp\n\t"                                    \
-         "pushq 72(%%rax)\n\t"                                    \
-         "pushq 64(%%rax)\n\t"                                    \
-         "pushq 56(%%rax)\n\t"                                    \
-         "movq 48(%%rax), %%r9\n\t"                               \
-         "movq 40(%%rax), %%r8\n\t"                               \
-         "movq 32(%%rax), %%rcx\n\t"                              \
-         "movq 24(%%rax), %%rdx\n\t"                              \
-         "movq 16(%%rax), %%rsi\n\t"                              \
-         "movq 8(%%rax), %%rdi\n\t"                               \
-         "movq (%%rax), %%rax\n\t"  /* target->%rax */            \
-         VALGRIND_CALL_NOREDIR_RAX                                \
-         "addq $24, %%rsp\n"                                      \
-         "addq $128,%%rsp\n\t"                                    \
-         : /*out*/   "=a" (_res)                                  \
-         : /*in*/    "a" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_10W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,  \
-                                  arg7,arg8,arg9,arg10)           \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[11];                         \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)(arg1);                         \
-      _argvec[2] = (unsigned long)(arg2);                         \
-      _argvec[3] = (unsigned long)(arg3);                         \
-      _argvec[4] = (unsigned long)(arg4);                         \
-      _argvec[5] = (unsigned long)(arg5);                         \
-      _argvec[6] = (unsigned long)(arg6);                         \
-      _argvec[7] = (unsigned long)(arg7);                         \
-      _argvec[8] = (unsigned long)(arg8);                         \
-      _argvec[9] = (unsigned long)(arg9);                         \
-      _argvec[10] = (unsigned long)(arg10);                       \
-      __asm__ volatile(                                           \
-         "subq $128,%%rsp\n\t"                                    \
-         "pushq 80(%%rax)\n\t"                                    \
-         "pushq 72(%%rax)\n\t"                                    \
-         "pushq 64(%%rax)\n\t"                                    \
-         "pushq 56(%%rax)\n\t"                                    \
-         "movq 48(%%rax), %%r9\n\t"                               \
-         "movq 40(%%rax), %%r8\n\t"                               \
-         "movq 32(%%rax), %%rcx\n\t"                              \
-         "movq 24(%%rax), %%rdx\n\t"                              \
-         "movq 16(%%rax), %%rsi\n\t"                              \
-         "movq 8(%%rax), %%rdi\n\t"                               \
-         "movq (%%rax), %%rax\n\t"  /* target->%rax */            \
-         VALGRIND_CALL_NOREDIR_RAX                                \
-         "addq $32, %%rsp\n"                                      \
-         "addq $128,%%rsp\n\t"                                    \
-         : /*out*/   "=a" (_res)                                  \
-         : /*in*/    "a" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_11W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,  \
-                                  arg7,arg8,arg9,arg10,arg11)     \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[12];                         \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)(arg1);                         \
-      _argvec[2] = (unsigned long)(arg2);                         \
-      _argvec[3] = (unsigned long)(arg3);                         \
-      _argvec[4] = (unsigned long)(arg4);                         \
-      _argvec[5] = (unsigned long)(arg5);                         \
-      _argvec[6] = (unsigned long)(arg6);                         \
-      _argvec[7] = (unsigned long)(arg7);                         \
-      _argvec[8] = (unsigned long)(arg8);                         \
-      _argvec[9] = (unsigned long)(arg9);                         \
-      _argvec[10] = (unsigned long)(arg10);                       \
-      _argvec[11] = (unsigned long)(arg11);                       \
-      __asm__ volatile(                                           \
-         "subq $128,%%rsp\n\t"                                    \
-         "pushq 88(%%rax)\n\t"                                    \
-         "pushq 80(%%rax)\n\t"                                    \
-         "pushq 72(%%rax)\n\t"                                    \
-         "pushq 64(%%rax)\n\t"                                    \
-         "pushq 56(%%rax)\n\t"                                    \
-         "movq 48(%%rax), %%r9\n\t"                               \
-         "movq 40(%%rax), %%r8\n\t"                               \
-         "movq 32(%%rax), %%rcx\n\t"                              \
-         "movq 24(%%rax), %%rdx\n\t"                              \
-         "movq 16(%%rax), %%rsi\n\t"                              \
-         "movq 8(%%rax), %%rdi\n\t"                               \
-         "movq (%%rax), %%rax\n\t"  /* target->%rax */            \
-         VALGRIND_CALL_NOREDIR_RAX                                \
-         "addq $40, %%rsp\n"                                      \
-         "addq $128,%%rsp\n\t"                                    \
-         : /*out*/   "=a" (_res)                                  \
-         : /*in*/    "a" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_12W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,  \
-                                arg7,arg8,arg9,arg10,arg11,arg12) \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[13];                         \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)(arg1);                         \
-      _argvec[2] = (unsigned long)(arg2);                         \
-      _argvec[3] = (unsigned long)(arg3);                         \
-      _argvec[4] = (unsigned long)(arg4);                         \
-      _argvec[5] = (unsigned long)(arg5);                         \
-      _argvec[6] = (unsigned long)(arg6);                         \
-      _argvec[7] = (unsigned long)(arg7);                         \
-      _argvec[8] = (unsigned long)(arg8);                         \
-      _argvec[9] = (unsigned long)(arg9);                         \
-      _argvec[10] = (unsigned long)(arg10);                       \
-      _argvec[11] = (unsigned long)(arg11);                       \
-      _argvec[12] = (unsigned long)(arg12);                       \
-      __asm__ volatile(                                           \
-         "subq $128,%%rsp\n\t"                                    \
-         "pushq 96(%%rax)\n\t"                                    \
-         "pushq 88(%%rax)\n\t"                                    \
-         "pushq 80(%%rax)\n\t"                                    \
-         "pushq 72(%%rax)\n\t"                                    \
-         "pushq 64(%%rax)\n\t"                                    \
-         "pushq 56(%%rax)\n\t"                                    \
-         "movq 48(%%rax), %%r9\n\t"                               \
-         "movq 40(%%rax), %%r8\n\t"                               \
-         "movq 32(%%rax), %%rcx\n\t"                              \
-         "movq 24(%%rax), %%rdx\n\t"                              \
-         "movq 16(%%rax), %%rsi\n\t"                              \
-         "movq 8(%%rax), %%rdi\n\t"                               \
-         "movq (%%rax), %%rax\n\t"  /* target->%rax */            \
-         VALGRIND_CALL_NOREDIR_RAX                                \
-         "addq $48, %%rsp\n"                                      \
-         "addq $128,%%rsp\n\t"                                    \
-         : /*out*/   "=a" (_res)                                  \
-         : /*in*/    "a" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#endif /* PLAT_amd64_linux */
-
-/* ------------------------ ppc32-linux ------------------------ */
-
-#if defined(PLAT_ppc32_linux)
-
-/* This is useful for finding out about the on-stack stuff:
-
-   extern int f9  ( int,int,int,int,int,int,int,int,int );
-   extern int f10 ( int,int,int,int,int,int,int,int,int,int );
-   extern int f11 ( int,int,int,int,int,int,int,int,int,int,int );
-   extern int f12 ( int,int,int,int,int,int,int,int,int,int,int,int );
-
-   int g9 ( void ) {
-      return f9(11,22,33,44,55,66,77,88,99);
-   }
-   int g10 ( void ) {
-      return f10(11,22,33,44,55,66,77,88,99,110);
-   }
-   int g11 ( void ) {
-      return f11(11,22,33,44,55,66,77,88,99,110,121);
-   }
-   int g12 ( void ) {
-      return f12(11,22,33,44,55,66,77,88,99,110,121,132);
-   }
-*/
-
-/* ARGREGS: r3 r4 r5 r6 r7 r8 r9 r10 (the rest on stack somewhere) */
-
-/* These regs are trashed by the hidden call. */
-#define __CALLER_SAVED_REGS                                       \
-   "lr", "ctr", "xer",                                            \
-   "cr0", "cr1", "cr2", "cr3", "cr4", "cr5", "cr6", "cr7",        \
-   "r0", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9", "r10",   \
-   "r11", "r12", "r13"
-
-/* These CALL_FN_ macros assume that on ppc32-linux, 
-   sizeof(unsigned long) == 4. */
-
-#define CALL_FN_W_v(lval, orig)                                   \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[1];                          \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         "lwz 11,0(11)\n\t"  /* target->r11 */                    \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr %0,3"                                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_W(lval, orig, arg1)                             \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[2];                          \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)arg1;                           \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         "lwz 3,4(11)\n\t"   /* arg1->r3 */                       \
-         "lwz 11,0(11)\n\t"  /* target->r11 */                    \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr %0,3"                                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_WW(lval, orig, arg1,arg2)                       \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3];                          \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)arg1;                           \
-      _argvec[2] = (unsigned long)arg2;                           \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         "lwz 3,4(11)\n\t"   /* arg1->r3 */                       \
-         "lwz 4,8(11)\n\t"                                        \
-         "lwz 11,0(11)\n\t"  /* target->r11 */                    \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr %0,3"                                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_WWW(lval, orig, arg1,arg2,arg3)                 \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[4];                          \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)arg1;                           \
-      _argvec[2] = (unsigned long)arg2;                           \
-      _argvec[3] = (unsigned long)arg3;                           \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         "lwz 3,4(11)\n\t"   /* arg1->r3 */                       \
-         "lwz 4,8(11)\n\t"                                        \
-         "lwz 5,12(11)\n\t"                                       \
-         "lwz 11,0(11)\n\t"  /* target->r11 */                    \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr %0,3"                                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_WWWW(lval, orig, arg1,arg2,arg3,arg4)           \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[5];                          \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)arg1;                           \
-      _argvec[2] = (unsigned long)arg2;                           \
-      _argvec[3] = (unsigned long)arg3;                           \
-      _argvec[4] = (unsigned long)arg4;                           \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         "lwz 3,4(11)\n\t"   /* arg1->r3 */                       \
-         "lwz 4,8(11)\n\t"                                        \
-         "lwz 5,12(11)\n\t"                                       \
-         "lwz 6,16(11)\n\t"  /* arg4->r6 */                       \
-         "lwz 11,0(11)\n\t"  /* target->r11 */                    \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr %0,3"                                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_5W(lval, orig, arg1,arg2,arg3,arg4,arg5)        \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[6];                          \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)arg1;                           \
-      _argvec[2] = (unsigned long)arg2;                           \
-      _argvec[3] = (unsigned long)arg3;                           \
-      _argvec[4] = (unsigned long)arg4;                           \
-      _argvec[5] = (unsigned long)arg5;                           \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         "lwz 3,4(11)\n\t"   /* arg1->r3 */                       \
-         "lwz 4,8(11)\n\t"                                        \
-         "lwz 5,12(11)\n\t"                                       \
-         "lwz 6,16(11)\n\t"  /* arg4->r6 */                       \
-         "lwz 7,20(11)\n\t"                                       \
-         "lwz 11,0(11)\n\t"  /* target->r11 */                    \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr %0,3"                                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_6W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6)   \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[7];                          \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)arg1;                           \
-      _argvec[2] = (unsigned long)arg2;                           \
-      _argvec[3] = (unsigned long)arg3;                           \
-      _argvec[4] = (unsigned long)arg4;                           \
-      _argvec[5] = (unsigned long)arg5;                           \
-      _argvec[6] = (unsigned long)arg6;                           \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         "lwz 3,4(11)\n\t"   /* arg1->r3 */                       \
-         "lwz 4,8(11)\n\t"                                        \
-         "lwz 5,12(11)\n\t"                                       \
-         "lwz 6,16(11)\n\t"  /* arg4->r6 */                       \
-         "lwz 7,20(11)\n\t"                                       \
-         "lwz 8,24(11)\n\t"                                       \
-         "lwz 11,0(11)\n\t"  /* target->r11 */                    \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr %0,3"                                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_7W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,   \
-                                 arg7)                            \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[8];                          \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)arg1;                           \
-      _argvec[2] = (unsigned long)arg2;                           \
-      _argvec[3] = (unsigned long)arg3;                           \
-      _argvec[4] = (unsigned long)arg4;                           \
-      _argvec[5] = (unsigned long)arg5;                           \
-      _argvec[6] = (unsigned long)arg6;                           \
-      _argvec[7] = (unsigned long)arg7;                           \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         "lwz 3,4(11)\n\t"   /* arg1->r3 */                       \
-         "lwz 4,8(11)\n\t"                                        \
-         "lwz 5,12(11)\n\t"                                       \
-         "lwz 6,16(11)\n\t"  /* arg4->r6 */                       \
-         "lwz 7,20(11)\n\t"                                       \
-         "lwz 8,24(11)\n\t"                                       \
-         "lwz 9,28(11)\n\t"                                       \
-         "lwz 11,0(11)\n\t"  /* target->r11 */                    \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr %0,3"                                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_8W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,   \
-                                 arg7,arg8)                       \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[9];                          \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)arg1;                           \
-      _argvec[2] = (unsigned long)arg2;                           \
-      _argvec[3] = (unsigned long)arg3;                           \
-      _argvec[4] = (unsigned long)arg4;                           \
-      _argvec[5] = (unsigned long)arg5;                           \
-      _argvec[6] = (unsigned long)arg6;                           \
-      _argvec[7] = (unsigned long)arg7;                           \
-      _argvec[8] = (unsigned long)arg8;                           \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         "lwz 3,4(11)\n\t"   /* arg1->r3 */                       \
-         "lwz 4,8(11)\n\t"                                        \
-         "lwz 5,12(11)\n\t"                                       \
-         "lwz 6,16(11)\n\t"  /* arg4->r6 */                       \
-         "lwz 7,20(11)\n\t"                                       \
-         "lwz 8,24(11)\n\t"                                       \
-         "lwz 9,28(11)\n\t"                                       \
-         "lwz 10,32(11)\n\t" /* arg8->r10 */                      \
-         "lwz 11,0(11)\n\t"  /* target->r11 */                    \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr %0,3"                                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_9W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,   \
-                                 arg7,arg8,arg9)                  \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[10];                         \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)arg1;                           \
-      _argvec[2] = (unsigned long)arg2;                           \
-      _argvec[3] = (unsigned long)arg3;                           \
-      _argvec[4] = (unsigned long)arg4;                           \
-      _argvec[5] = (unsigned long)arg5;                           \
-      _argvec[6] = (unsigned long)arg6;                           \
-      _argvec[7] = (unsigned long)arg7;                           \
-      _argvec[8] = (unsigned long)arg8;                           \
-      _argvec[9] = (unsigned long)arg9;                           \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         "addi 1,1,-16\n\t"                                       \
-         /* arg9 */                                               \
-         "lwz 3,36(11)\n\t"                                       \
-         "stw 3,8(1)\n\t"                                         \
-         /* args1-8 */                                            \
-         "lwz 3,4(11)\n\t"   /* arg1->r3 */                       \
-         "lwz 4,8(11)\n\t"                                        \
-         "lwz 5,12(11)\n\t"                                       \
-         "lwz 6,16(11)\n\t"  /* arg4->r6 */                       \
-         "lwz 7,20(11)\n\t"                                       \
-         "lwz 8,24(11)\n\t"                                       \
-         "lwz 9,28(11)\n\t"                                       \
-         "lwz 10,32(11)\n\t" /* arg8->r10 */                      \
-         "lwz 11,0(11)\n\t"  /* target->r11 */                    \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "addi 1,1,16\n\t"                                        \
-         "mr %0,3"                                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_10W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,  \
-                                  arg7,arg8,arg9,arg10)           \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[11];                         \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)arg1;                           \
-      _argvec[2] = (unsigned long)arg2;                           \
-      _argvec[3] = (unsigned long)arg3;                           \
-      _argvec[4] = (unsigned long)arg4;                           \
-      _argvec[5] = (unsigned long)arg5;                           \
-      _argvec[6] = (unsigned long)arg6;                           \
-      _argvec[7] = (unsigned long)arg7;                           \
-      _argvec[8] = (unsigned long)arg8;                           \
-      _argvec[9] = (unsigned long)arg9;                           \
-      _argvec[10] = (unsigned long)arg10;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         "addi 1,1,-16\n\t"                                       \
-         /* arg10 */                                              \
-         "lwz 3,40(11)\n\t"                                       \
-         "stw 3,12(1)\n\t"                                        \
-         /* arg9 */                                               \
-         "lwz 3,36(11)\n\t"                                       \
-         "stw 3,8(1)\n\t"                                         \
-         /* args1-8 */                                            \
-         "lwz 3,4(11)\n\t"   /* arg1->r3 */                       \
-         "lwz 4,8(11)\n\t"                                        \
-         "lwz 5,12(11)\n\t"                                       \
-         "lwz 6,16(11)\n\t"  /* arg4->r6 */                       \
-         "lwz 7,20(11)\n\t"                                       \
-         "lwz 8,24(11)\n\t"                                       \
-         "lwz 9,28(11)\n\t"                                       \
-         "lwz 10,32(11)\n\t" /* arg8->r10 */                      \
-         "lwz 11,0(11)\n\t"  /* target->r11 */                    \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "addi 1,1,16\n\t"                                        \
-         "mr %0,3"                                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_11W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,  \
-                                  arg7,arg8,arg9,arg10,arg11)     \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[12];                         \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)arg1;                           \
-      _argvec[2] = (unsigned long)arg2;                           \
-      _argvec[3] = (unsigned long)arg3;                           \
-      _argvec[4] = (unsigned long)arg4;                           \
-      _argvec[5] = (unsigned long)arg5;                           \
-      _argvec[6] = (unsigned long)arg6;                           \
-      _argvec[7] = (unsigned long)arg7;                           \
-      _argvec[8] = (unsigned long)arg8;                           \
-      _argvec[9] = (unsigned long)arg9;                           \
-      _argvec[10] = (unsigned long)arg10;                         \
-      _argvec[11] = (unsigned long)arg11;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         "addi 1,1,-32\n\t"                                       \
-         /* arg11 */                                              \
-         "lwz 3,44(11)\n\t"                                       \
-         "stw 3,16(1)\n\t"                                        \
-         /* arg10 */                                              \
-         "lwz 3,40(11)\n\t"                                       \
-         "stw 3,12(1)\n\t"                                        \
-         /* arg9 */                                               \
-         "lwz 3,36(11)\n\t"                                       \
-         "stw 3,8(1)\n\t"                                         \
-         /* args1-8 */                                            \
-         "lwz 3,4(11)\n\t"   /* arg1->r3 */                       \
-         "lwz 4,8(11)\n\t"                                        \
-         "lwz 5,12(11)\n\t"                                       \
-         "lwz 6,16(11)\n\t"  /* arg4->r6 */                       \
-         "lwz 7,20(11)\n\t"                                       \
-         "lwz 8,24(11)\n\t"                                       \
-         "lwz 9,28(11)\n\t"                                       \
-         "lwz 10,32(11)\n\t" /* arg8->r10 */                      \
-         "lwz 11,0(11)\n\t"  /* target->r11 */                    \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "addi 1,1,32\n\t"                                        \
-         "mr %0,3"                                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_12W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,  \
-                                arg7,arg8,arg9,arg10,arg11,arg12) \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[13];                         \
-      volatile unsigned long _res;                                \
-      _argvec[0] = (unsigned long)_orig.nraddr;                   \
-      _argvec[1] = (unsigned long)arg1;                           \
-      _argvec[2] = (unsigned long)arg2;                           \
-      _argvec[3] = (unsigned long)arg3;                           \
-      _argvec[4] = (unsigned long)arg4;                           \
-      _argvec[5] = (unsigned long)arg5;                           \
-      _argvec[6] = (unsigned long)arg6;                           \
-      _argvec[7] = (unsigned long)arg7;                           \
-      _argvec[8] = (unsigned long)arg8;                           \
-      _argvec[9] = (unsigned long)arg9;                           \
-      _argvec[10] = (unsigned long)arg10;                         \
-      _argvec[11] = (unsigned long)arg11;                         \
-      _argvec[12] = (unsigned long)arg12;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         "addi 1,1,-32\n\t"                                       \
-         /* arg12 */                                              \
-         "lwz 3,48(11)\n\t"                                       \
-         "stw 3,20(1)\n\t"                                        \
-         /* arg11 */                                              \
-         "lwz 3,44(11)\n\t"                                       \
-         "stw 3,16(1)\n\t"                                        \
-         /* arg10 */                                              \
-         "lwz 3,40(11)\n\t"                                       \
-         "stw 3,12(1)\n\t"                                        \
-         /* arg9 */                                               \
-         "lwz 3,36(11)\n\t"                                       \
-         "stw 3,8(1)\n\t"                                         \
-         /* args1-8 */                                            \
-         "lwz 3,4(11)\n\t"   /* arg1->r3 */                       \
-         "lwz 4,8(11)\n\t"                                        \
-         "lwz 5,12(11)\n\t"                                       \
-         "lwz 6,16(11)\n\t"  /* arg4->r6 */                       \
-         "lwz 7,20(11)\n\t"                                       \
-         "lwz 8,24(11)\n\t"                                       \
-         "lwz 9,28(11)\n\t"                                       \
-         "lwz 10,32(11)\n\t" /* arg8->r10 */                      \
-         "lwz 11,0(11)\n\t"  /* target->r11 */                    \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "addi 1,1,32\n\t"                                        \
-         "mr %0,3"                                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[0])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#endif /* PLAT_ppc32_linux */
-
-/* ------------------------ ppc64-linux ------------------------ */
-
-#if defined(PLAT_ppc64_linux)
-
-/* ARGREGS: r3 r4 r5 r6 r7 r8 r9 r10 (the rest on stack somewhere) */
-
-/* These regs are trashed by the hidden call. */
-#define __CALLER_SAVED_REGS                                       \
-   "lr", "ctr", "xer",                                            \
-   "cr0", "cr1", "cr2", "cr3", "cr4", "cr5", "cr6", "cr7",        \
-   "r0", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9", "r10",   \
-   "r11", "r12", "r13"
-
-/* These CALL_FN_ macros assume that on ppc64-linux, sizeof(unsigned
-   long) == 8. */
-
-#define CALL_FN_W_v(lval, orig)                                   \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+0];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1] = (unsigned long)_orig.r2;                       \
-      _argvec[2] = (unsigned long)_orig.nraddr;                   \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         "std 2,-16(11)\n\t"  /* save tocptr */                   \
-         "ld   2,-8(11)\n\t"  /* use nraddr's tocptr */           \
-         "ld  11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "ld 2,-16(11)" /* restore tocptr */                      \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_W(lval, orig, arg1)                             \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+1];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         "std 2,-16(11)\n\t"  /* save tocptr */                   \
-         "ld   2,-8(11)\n\t"  /* use nraddr's tocptr */           \
-         "ld   3, 8(11)\n\t"  /* arg1->r3 */                      \
-         "ld  11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "ld 2,-16(11)" /* restore tocptr */                      \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_WW(lval, orig, arg1,arg2)                       \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+2];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         "std 2,-16(11)\n\t"  /* save tocptr */                   \
-         "ld   2,-8(11)\n\t"  /* use nraddr's tocptr */           \
-         "ld   3, 8(11)\n\t"  /* arg1->r3 */                      \
-         "ld   4, 16(11)\n\t" /* arg2->r4 */                      \
-         "ld  11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "ld 2,-16(11)" /* restore tocptr */                      \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_WWW(lval, orig, arg1,arg2,arg3)                 \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+3];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         "std 2,-16(11)\n\t"  /* save tocptr */                   \
-         "ld   2,-8(11)\n\t"  /* use nraddr's tocptr */           \
-         "ld   3, 8(11)\n\t"  /* arg1->r3 */                      \
-         "ld   4, 16(11)\n\t" /* arg2->r4 */                      \
-         "ld   5, 24(11)\n\t" /* arg3->r5 */                      \
-         "ld  11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "ld 2,-16(11)" /* restore tocptr */                      \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_WWWW(lval, orig, arg1,arg2,arg3,arg4)           \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+4];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      _argvec[2+4] = (unsigned long)arg4;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         "std 2,-16(11)\n\t"  /* save tocptr */                   \
-         "ld   2,-8(11)\n\t"  /* use nraddr's tocptr */           \
-         "ld   3, 8(11)\n\t"  /* arg1->r3 */                      \
-         "ld   4, 16(11)\n\t" /* arg2->r4 */                      \
-         "ld   5, 24(11)\n\t" /* arg3->r5 */                      \
-         "ld   6, 32(11)\n\t" /* arg4->r6 */                      \
-         "ld  11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "ld 2,-16(11)" /* restore tocptr */                      \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_5W(lval, orig, arg1,arg2,arg3,arg4,arg5)        \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+5];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      _argvec[2+4] = (unsigned long)arg4;                         \
-      _argvec[2+5] = (unsigned long)arg5;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         "std 2,-16(11)\n\t"  /* save tocptr */                   \
-         "ld   2,-8(11)\n\t"  /* use nraddr's tocptr */           \
-         "ld   3, 8(11)\n\t"  /* arg1->r3 */                      \
-         "ld   4, 16(11)\n\t" /* arg2->r4 */                      \
-         "ld   5, 24(11)\n\t" /* arg3->r5 */                      \
-         "ld   6, 32(11)\n\t" /* arg4->r6 */                      \
-         "ld   7, 40(11)\n\t" /* arg5->r7 */                      \
-         "ld  11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "ld 2,-16(11)" /* restore tocptr */                      \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_6W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6)   \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+6];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      _argvec[2+4] = (unsigned long)arg4;                         \
-      _argvec[2+5] = (unsigned long)arg5;                         \
-      _argvec[2+6] = (unsigned long)arg6;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         "std 2,-16(11)\n\t"  /* save tocptr */                   \
-         "ld   2,-8(11)\n\t"  /* use nraddr's tocptr */           \
-         "ld   3, 8(11)\n\t"  /* arg1->r3 */                      \
-         "ld   4, 16(11)\n\t" /* arg2->r4 */                      \
-         "ld   5, 24(11)\n\t" /* arg3->r5 */                      \
-         "ld   6, 32(11)\n\t" /* arg4->r6 */                      \
-         "ld   7, 40(11)\n\t" /* arg5->r7 */                      \
-         "ld   8, 48(11)\n\t" /* arg6->r8 */                      \
-         "ld  11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "ld 2,-16(11)" /* restore tocptr */                      \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_7W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,   \
-                                 arg7)                            \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+7];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      _argvec[2+4] = (unsigned long)arg4;                         \
-      _argvec[2+5] = (unsigned long)arg5;                         \
-      _argvec[2+6] = (unsigned long)arg6;                         \
-      _argvec[2+7] = (unsigned long)arg7;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         "std 2,-16(11)\n\t"  /* save tocptr */                   \
-         "ld   2,-8(11)\n\t"  /* use nraddr's tocptr */           \
-         "ld   3, 8(11)\n\t"  /* arg1->r3 */                      \
-         "ld   4, 16(11)\n\t" /* arg2->r4 */                      \
-         "ld   5, 24(11)\n\t" /* arg3->r5 */                      \
-         "ld   6, 32(11)\n\t" /* arg4->r6 */                      \
-         "ld   7, 40(11)\n\t" /* arg5->r7 */                      \
-         "ld   8, 48(11)\n\t" /* arg6->r8 */                      \
-         "ld   9, 56(11)\n\t" /* arg7->r9 */                      \
-         "ld  11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "ld 2,-16(11)" /* restore tocptr */                      \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_8W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,   \
-                                 arg7,arg8)                       \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+8];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      _argvec[2+4] = (unsigned long)arg4;                         \
-      _argvec[2+5] = (unsigned long)arg5;                         \
-      _argvec[2+6] = (unsigned long)arg6;                         \
-      _argvec[2+7] = (unsigned long)arg7;                         \
-      _argvec[2+8] = (unsigned long)arg8;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         "std 2,-16(11)\n\t"  /* save tocptr */                   \
-         "ld   2,-8(11)\n\t"  /* use nraddr's tocptr */           \
-         "ld   3, 8(11)\n\t"  /* arg1->r3 */                      \
-         "ld   4, 16(11)\n\t" /* arg2->r4 */                      \
-         "ld   5, 24(11)\n\t" /* arg3->r5 */                      \
-         "ld   6, 32(11)\n\t" /* arg4->r6 */                      \
-         "ld   7, 40(11)\n\t" /* arg5->r7 */                      \
-         "ld   8, 48(11)\n\t" /* arg6->r8 */                      \
-         "ld   9, 56(11)\n\t" /* arg7->r9 */                      \
-         "ld  10, 64(11)\n\t" /* arg8->r10 */                     \
-         "ld  11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "ld 2,-16(11)" /* restore tocptr */                      \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_9W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,   \
-                                 arg7,arg8,arg9)                  \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+9];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      _argvec[2+4] = (unsigned long)arg4;                         \
-      _argvec[2+5] = (unsigned long)arg5;                         \
-      _argvec[2+6] = (unsigned long)arg6;                         \
-      _argvec[2+7] = (unsigned long)arg7;                         \
-      _argvec[2+8] = (unsigned long)arg8;                         \
-      _argvec[2+9] = (unsigned long)arg9;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         "std 2,-16(11)\n\t"  /* save tocptr */                   \
-         "ld   2,-8(11)\n\t"  /* use nraddr's tocptr */           \
-         "addi 1,1,-128\n\t"  /* expand stack frame */            \
-         /* arg9 */                                               \
-         "ld  3,72(11)\n\t"                                       \
-         "std 3,112(1)\n\t"                                       \
-         /* args1-8 */                                            \
-         "ld   3, 8(11)\n\t"  /* arg1->r3 */                      \
-         "ld   4, 16(11)\n\t" /* arg2->r4 */                      \
-         "ld   5, 24(11)\n\t" /* arg3->r5 */                      \
-         "ld   6, 32(11)\n\t" /* arg4->r6 */                      \
-         "ld   7, 40(11)\n\t" /* arg5->r7 */                      \
-         "ld   8, 48(11)\n\t" /* arg6->r8 */                      \
-         "ld   9, 56(11)\n\t" /* arg7->r9 */                      \
-         "ld  10, 64(11)\n\t" /* arg8->r10 */                     \
-         "ld  11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "ld 2,-16(11)\n\t" /* restore tocptr */                  \
-         "addi 1,1,128"     /* restore frame */                   \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_10W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,  \
-                                  arg7,arg8,arg9,arg10)           \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+10];                       \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      _argvec[2+4] = (unsigned long)arg4;                         \
-      _argvec[2+5] = (unsigned long)arg5;                         \
-      _argvec[2+6] = (unsigned long)arg6;                         \
-      _argvec[2+7] = (unsigned long)arg7;                         \
-      _argvec[2+8] = (unsigned long)arg8;                         \
-      _argvec[2+9] = (unsigned long)arg9;                         \
-      _argvec[2+10] = (unsigned long)arg10;                       \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         "std 2,-16(11)\n\t"  /* save tocptr */                   \
-         "ld   2,-8(11)\n\t"  /* use nraddr's tocptr */           \
-         "addi 1,1,-128\n\t"  /* expand stack frame */            \
-         /* arg10 */                                              \
-         "ld  3,80(11)\n\t"                                       \
-         "std 3,120(1)\n\t"                                       \
-         /* arg9 */                                               \
-         "ld  3,72(11)\n\t"                                       \
-         "std 3,112(1)\n\t"                                       \
-         /* args1-8 */                                            \
-         "ld   3, 8(11)\n\t"  /* arg1->r3 */                      \
-         "ld   4, 16(11)\n\t" /* arg2->r4 */                      \
-         "ld   5, 24(11)\n\t" /* arg3->r5 */                      \
-         "ld   6, 32(11)\n\t" /* arg4->r6 */                      \
-         "ld   7, 40(11)\n\t" /* arg5->r7 */                      \
-         "ld   8, 48(11)\n\t" /* arg6->r8 */                      \
-         "ld   9, 56(11)\n\t" /* arg7->r9 */                      \
-         "ld  10, 64(11)\n\t" /* arg8->r10 */                     \
-         "ld  11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "ld 2,-16(11)\n\t" /* restore tocptr */                  \
-         "addi 1,1,128"     /* restore frame */                   \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_11W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,  \
-                                  arg7,arg8,arg9,arg10,arg11)     \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+11];                       \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      _argvec[2+4] = (unsigned long)arg4;                         \
-      _argvec[2+5] = (unsigned long)arg5;                         \
-      _argvec[2+6] = (unsigned long)arg6;                         \
-      _argvec[2+7] = (unsigned long)arg7;                         \
-      _argvec[2+8] = (unsigned long)arg8;                         \
-      _argvec[2+9] = (unsigned long)arg9;                         \
-      _argvec[2+10] = (unsigned long)arg10;                       \
-      _argvec[2+11] = (unsigned long)arg11;                       \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         "std 2,-16(11)\n\t"  /* save tocptr */                   \
-         "ld   2,-8(11)\n\t"  /* use nraddr's tocptr */           \
-         "addi 1,1,-144\n\t"  /* expand stack frame */            \
-         /* arg11 */                                              \
-         "ld  3,88(11)\n\t"                                       \
-         "std 3,128(1)\n\t"                                       \
-         /* arg10 */                                              \
-         "ld  3,80(11)\n\t"                                       \
-         "std 3,120(1)\n\t"                                       \
-         /* arg9 */                                               \
-         "ld  3,72(11)\n\t"                                       \
-         "std 3,112(1)\n\t"                                       \
-         /* args1-8 */                                            \
-         "ld   3, 8(11)\n\t"  /* arg1->r3 */                      \
-         "ld   4, 16(11)\n\t" /* arg2->r4 */                      \
-         "ld   5, 24(11)\n\t" /* arg3->r5 */                      \
-         "ld   6, 32(11)\n\t" /* arg4->r6 */                      \
-         "ld   7, 40(11)\n\t" /* arg5->r7 */                      \
-         "ld   8, 48(11)\n\t" /* arg6->r8 */                      \
-         "ld   9, 56(11)\n\t" /* arg7->r9 */                      \
-         "ld  10, 64(11)\n\t" /* arg8->r10 */                     \
-         "ld  11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "ld 2,-16(11)\n\t" /* restore tocptr */                  \
-         "addi 1,1,144"     /* restore frame */                   \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_12W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,  \
-                                arg7,arg8,arg9,arg10,arg11,arg12) \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+12];                       \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      _argvec[2+4] = (unsigned long)arg4;                         \
-      _argvec[2+5] = (unsigned long)arg5;                         \
-      _argvec[2+6] = (unsigned long)arg6;                         \
-      _argvec[2+7] = (unsigned long)arg7;                         \
-      _argvec[2+8] = (unsigned long)arg8;                         \
-      _argvec[2+9] = (unsigned long)arg9;                         \
-      _argvec[2+10] = (unsigned long)arg10;                       \
-      _argvec[2+11] = (unsigned long)arg11;                       \
-      _argvec[2+12] = (unsigned long)arg12;                       \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         "std 2,-16(11)\n\t"  /* save tocptr */                   \
-         "ld   2,-8(11)\n\t"  /* use nraddr's tocptr */           \
-         "addi 1,1,-144\n\t"  /* expand stack frame */            \
-         /* arg12 */                                              \
-         "ld  3,96(11)\n\t"                                       \
-         "std 3,136(1)\n\t"                                       \
-         /* arg11 */                                              \
-         "ld  3,88(11)\n\t"                                       \
-         "std 3,128(1)\n\t"                                       \
-         /* arg10 */                                              \
-         "ld  3,80(11)\n\t"                                       \
-         "std 3,120(1)\n\t"                                       \
-         /* arg9 */                                               \
-         "ld  3,72(11)\n\t"                                       \
-         "std 3,112(1)\n\t"                                       \
-         /* args1-8 */                                            \
-         "ld   3, 8(11)\n\t"  /* arg1->r3 */                      \
-         "ld   4, 16(11)\n\t" /* arg2->r4 */                      \
-         "ld   5, 24(11)\n\t" /* arg3->r5 */                      \
-         "ld   6, 32(11)\n\t" /* arg4->r6 */                      \
-         "ld   7, 40(11)\n\t" /* arg5->r7 */                      \
-         "ld   8, 48(11)\n\t" /* arg6->r8 */                      \
-         "ld   9, 56(11)\n\t" /* arg7->r9 */                      \
-         "ld  10, 64(11)\n\t" /* arg8->r10 */                     \
-         "ld  11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "ld 2,-16(11)\n\t" /* restore tocptr */                  \
-         "addi 1,1,144"     /* restore frame */                   \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#endif /* PLAT_ppc64_linux */
-
-/* ------------------------ ppc32-aix5 ------------------------- */
-
-#if defined(PLAT_ppc32_aix5)
-
-/* ARGREGS: r3 r4 r5 r6 r7 r8 r9 r10 (the rest on stack somewhere) */
-
-/* These regs are trashed by the hidden call. */
-#define __CALLER_SAVED_REGS                                       \
-   "lr", "ctr", "xer",                                            \
-   "cr0", "cr1", "cr2", "cr3", "cr4", "cr5", "cr6", "cr7",        \
-   "r0", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9", "r10",   \
-   "r11", "r12", "r13"
-
-/* Expand the stack frame, copying enough info that unwinding
-   still works.  Trashes r3. */
-
-#define VG_EXPAND_FRAME_BY_trashes_r3(_n_fr)                      \
-         "addi 1,1,-" #_n_fr "\n\t"                               \
-         "lwz  3," #_n_fr "(1)\n\t"                               \
-         "stw  3,0(1)\n\t"
-
-#define VG_CONTRACT_FRAME_BY(_n_fr)                               \
-         "addi 1,1," #_n_fr "\n\t"
-
-/* These CALL_FN_ macros assume that on ppc32-aix5, sizeof(unsigned
-   long) == 4. */
-
-#define CALL_FN_W_v(lval, orig)                                   \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+0];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1] = (unsigned long)_orig.r2;                       \
-      _argvec[2] = (unsigned long)_orig.nraddr;                   \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         VG_EXPAND_FRAME_BY_trashes_r3(512)                       \
-         "stw  2,-8(11)\n\t"  /* save tocptr */                   \
-         "lwz  2,-4(11)\n\t"  /* use nraddr's tocptr */           \
-         "lwz 11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "lwz 2,-8(11)\n\t" /* restore tocptr */                  \
-         VG_CONTRACT_FRAME_BY(512)                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_W(lval, orig, arg1)                             \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+1];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         VG_EXPAND_FRAME_BY_trashes_r3(512)                       \
-         "stw  2,-8(11)\n\t"  /* save tocptr */                   \
-         "lwz  2,-4(11)\n\t"  /* use nraddr's tocptr */           \
-         "lwz  3, 4(11)\n\t"  /* arg1->r3 */                      \
-         "lwz 11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "lwz 2,-8(11)\n\t" /* restore tocptr */                  \
-         VG_CONTRACT_FRAME_BY(512)                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_WW(lval, orig, arg1,arg2)                       \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+2];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         VG_EXPAND_FRAME_BY_trashes_r3(512)                       \
-         "stw  2,-8(11)\n\t"  /* save tocptr */                   \
-         "lwz  2,-4(11)\n\t"  /* use nraddr's tocptr */           \
-         "lwz  3, 4(11)\n\t"  /* arg1->r3 */                      \
-         "lwz  4, 8(11)\n\t"  /* arg2->r4 */                      \
-         "lwz 11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "lwz 2,-8(11)\n\t" /* restore tocptr */                  \
-         VG_CONTRACT_FRAME_BY(512)                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_WWW(lval, orig, arg1,arg2,arg3)                 \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+3];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         VG_EXPAND_FRAME_BY_trashes_r3(512)                       \
-         "stw  2,-8(11)\n\t"  /* save tocptr */                   \
-         "lwz  2,-4(11)\n\t"  /* use nraddr's tocptr */           \
-         "lwz  3, 4(11)\n\t"  /* arg1->r3 */                      \
-         "lwz  4, 8(11)\n\t"  /* arg2->r4 */                      \
-         "lwz  5, 12(11)\n\t" /* arg3->r5 */                      \
-         "lwz 11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "lwz 2,-8(11)\n\t" /* restore tocptr */                  \
-         VG_CONTRACT_FRAME_BY(512)                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_WWWW(lval, orig, arg1,arg2,arg3,arg4)           \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+4];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      _argvec[2+4] = (unsigned long)arg4;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         VG_EXPAND_FRAME_BY_trashes_r3(512)                       \
-         "stw  2,-8(11)\n\t"  /* save tocptr */                   \
-         "lwz  2,-4(11)\n\t"  /* use nraddr's tocptr */           \
-         "lwz  3, 4(11)\n\t"  /* arg1->r3 */                      \
-         "lwz  4, 8(11)\n\t"  /* arg2->r4 */                      \
-         "lwz  5, 12(11)\n\t" /* arg3->r5 */                      \
-         "lwz  6, 16(11)\n\t" /* arg4->r6 */                      \
-         "lwz 11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "lwz 2,-8(11)\n\t" /* restore tocptr */                  \
-         VG_CONTRACT_FRAME_BY(512)                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_5W(lval, orig, arg1,arg2,arg3,arg4,arg5)        \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+5];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      _argvec[2+4] = (unsigned long)arg4;                         \
-      _argvec[2+5] = (unsigned long)arg5;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         VG_EXPAND_FRAME_BY_trashes_r3(512)                       \
-         "stw  2,-8(11)\n\t"  /* save tocptr */                   \
-         "lwz  2,-4(11)\n\t"  /* use nraddr's tocptr */           \
-         "lwz  3, 4(11)\n\t"  /* arg1->r3 */                      \
-         "lwz  4, 8(11)\n\t" /* arg2->r4 */                       \
-         "lwz  5, 12(11)\n\t" /* arg3->r5 */                      \
-         "lwz  6, 16(11)\n\t" /* arg4->r6 */                      \
-         "lwz  7, 20(11)\n\t" /* arg5->r7 */                      \
-         "lwz 11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "lwz 2,-8(11)\n\t" /* restore tocptr */                  \
-         VG_CONTRACT_FRAME_BY(512)                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_6W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6)   \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+6];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      _argvec[2+4] = (unsigned long)arg4;                         \
-      _argvec[2+5] = (unsigned long)arg5;                         \
-      _argvec[2+6] = (unsigned long)arg6;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         VG_EXPAND_FRAME_BY_trashes_r3(512)                       \
-         "stw  2,-8(11)\n\t"  /* save tocptr */                   \
-         "lwz  2,-4(11)\n\t"  /* use nraddr's tocptr */           \
-         "lwz  3, 4(11)\n\t"  /* arg1->r3 */                      \
-         "lwz  4, 8(11)\n\t"  /* arg2->r4 */                      \
-         "lwz  5, 12(11)\n\t" /* arg3->r5 */                      \
-         "lwz  6, 16(11)\n\t" /* arg4->r6 */                      \
-         "lwz  7, 20(11)\n\t" /* arg5->r7 */                      \
-         "lwz  8, 24(11)\n\t" /* arg6->r8 */                      \
-         "lwz 11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "lwz 2,-8(11)\n\t" /* restore tocptr */                  \
-         VG_CONTRACT_FRAME_BY(512)                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_7W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,   \
-                                 arg7)                            \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+7];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      _argvec[2+4] = (unsigned long)arg4;                         \
-      _argvec[2+5] = (unsigned long)arg5;                         \
-      _argvec[2+6] = (unsigned long)arg6;                         \
-      _argvec[2+7] = (unsigned long)arg7;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         VG_EXPAND_FRAME_BY_trashes_r3(512)                       \
-         "stw  2,-8(11)\n\t"  /* save tocptr */                   \
-         "lwz  2,-4(11)\n\t"  /* use nraddr's tocptr */           \
-         "lwz  3, 4(11)\n\t"  /* arg1->r3 */                      \
-         "lwz  4, 8(11)\n\t"  /* arg2->r4 */                      \
-         "lwz  5, 12(11)\n\t" /* arg3->r5 */                      \
-         "lwz  6, 16(11)\n\t" /* arg4->r6 */                      \
-         "lwz  7, 20(11)\n\t" /* arg5->r7 */                      \
-         "lwz  8, 24(11)\n\t" /* arg6->r8 */                      \
-         "lwz  9, 28(11)\n\t" /* arg7->r9 */                      \
-         "lwz 11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "lwz 2,-8(11)\n\t" /* restore tocptr */                  \
-         VG_CONTRACT_FRAME_BY(512)                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_8W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,   \
-                                 arg7,arg8)                       \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+8];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      _argvec[2+4] = (unsigned long)arg4;                         \
-      _argvec[2+5] = (unsigned long)arg5;                         \
-      _argvec[2+6] = (unsigned long)arg6;                         \
-      _argvec[2+7] = (unsigned long)arg7;                         \
-      _argvec[2+8] = (unsigned long)arg8;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         VG_EXPAND_FRAME_BY_trashes_r3(512)                       \
-         "stw  2,-8(11)\n\t"  /* save tocptr */                   \
-         "lwz  2,-4(11)\n\t"  /* use nraddr's tocptr */           \
-         "lwz  3, 4(11)\n\t"  /* arg1->r3 */                      \
-         "lwz  4, 8(11)\n\t"  /* arg2->r4 */                      \
-         "lwz  5, 12(11)\n\t" /* arg3->r5 */                      \
-         "lwz  6, 16(11)\n\t" /* arg4->r6 */                      \
-         "lwz  7, 20(11)\n\t" /* arg5->r7 */                      \
-         "lwz  8, 24(11)\n\t" /* arg6->r8 */                      \
-         "lwz  9, 28(11)\n\t" /* arg7->r9 */                      \
-         "lwz 10, 32(11)\n\t" /* arg8->r10 */                     \
-         "lwz 11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "lwz 2,-8(11)\n\t" /* restore tocptr */                  \
-         VG_CONTRACT_FRAME_BY(512)                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_9W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,   \
-                                 arg7,arg8,arg9)                  \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+9];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      _argvec[2+4] = (unsigned long)arg4;                         \
-      _argvec[2+5] = (unsigned long)arg5;                         \
-      _argvec[2+6] = (unsigned long)arg6;                         \
-      _argvec[2+7] = (unsigned long)arg7;                         \
-      _argvec[2+8] = (unsigned long)arg8;                         \
-      _argvec[2+9] = (unsigned long)arg9;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         VG_EXPAND_FRAME_BY_trashes_r3(512)                       \
-         "stw  2,-8(11)\n\t"  /* save tocptr */                   \
-         "lwz  2,-4(11)\n\t"  /* use nraddr's tocptr */           \
-         VG_EXPAND_FRAME_BY_trashes_r3(64)                        \
-         /* arg9 */                                               \
-         "lwz 3,36(11)\n\t"                                       \
-         "stw 3,56(1)\n\t"                                        \
-         /* args1-8 */                                            \
-         "lwz  3, 4(11)\n\t"  /* arg1->r3 */                      \
-         "lwz  4, 8(11)\n\t"  /* arg2->r4 */                      \
-         "lwz  5, 12(11)\n\t" /* arg3->r5 */                      \
-         "lwz  6, 16(11)\n\t" /* arg4->r6 */                      \
-         "lwz  7, 20(11)\n\t" /* arg5->r7 */                      \
-         "lwz  8, 24(11)\n\t" /* arg6->r8 */                      \
-         "lwz  9, 28(11)\n\t" /* arg7->r9 */                      \
-         "lwz 10, 32(11)\n\t" /* arg8->r10 */                     \
-         "lwz 11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "lwz 2,-8(11)\n\t" /* restore tocptr */                  \
-         VG_CONTRACT_FRAME_BY(64)                                 \
-         VG_CONTRACT_FRAME_BY(512)                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_10W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,  \
-                                  arg7,arg8,arg9,arg10)           \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+10];                       \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      _argvec[2+4] = (unsigned long)arg4;                         \
-      _argvec[2+5] = (unsigned long)arg5;                         \
-      _argvec[2+6] = (unsigned long)arg6;                         \
-      _argvec[2+7] = (unsigned long)arg7;                         \
-      _argvec[2+8] = (unsigned long)arg8;                         \
-      _argvec[2+9] = (unsigned long)arg9;                         \
-      _argvec[2+10] = (unsigned long)arg10;                       \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         VG_EXPAND_FRAME_BY_trashes_r3(512)                       \
-         "stw  2,-8(11)\n\t"  /* save tocptr */                   \
-         "lwz  2,-4(11)\n\t"  /* use nraddr's tocptr */           \
-         VG_EXPAND_FRAME_BY_trashes_r3(64)                        \
-         /* arg10 */                                              \
-         "lwz 3,40(11)\n\t"                                       \
-         "stw 3,60(1)\n\t"                                        \
-         /* arg9 */                                               \
-         "lwz 3,36(11)\n\t"                                       \
-         "stw 3,56(1)\n\t"                                        \
-         /* args1-8 */                                            \
-         "lwz  3, 4(11)\n\t"  /* arg1->r3 */                      \
-         "lwz  4, 8(11)\n\t"  /* arg2->r4 */                      \
-         "lwz  5, 12(11)\n\t" /* arg3->r5 */                      \
-         "lwz  6, 16(11)\n\t" /* arg4->r6 */                      \
-         "lwz  7, 20(11)\n\t" /* arg5->r7 */                      \
-         "lwz  8, 24(11)\n\t" /* arg6->r8 */                      \
-         "lwz  9, 28(11)\n\t" /* arg7->r9 */                      \
-         "lwz 10, 32(11)\n\t" /* arg8->r10 */                     \
-         "lwz 11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "lwz 2,-8(11)\n\t" /* restore tocptr */                  \
-         VG_CONTRACT_FRAME_BY(64)                                 \
-         VG_CONTRACT_FRAME_BY(512)                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_11W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,  \
-                                  arg7,arg8,arg9,arg10,arg11)     \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+11];                       \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      _argvec[2+4] = (unsigned long)arg4;                         \
-      _argvec[2+5] = (unsigned long)arg5;                         \
-      _argvec[2+6] = (unsigned long)arg6;                         \
-      _argvec[2+7] = (unsigned long)arg7;                         \
-      _argvec[2+8] = (unsigned long)arg8;                         \
-      _argvec[2+9] = (unsigned long)arg9;                         \
-      _argvec[2+10] = (unsigned long)arg10;                       \
-      _argvec[2+11] = (unsigned long)arg11;                       \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         VG_EXPAND_FRAME_BY_trashes_r3(512)                       \
-         "stw  2,-8(11)\n\t"  /* save tocptr */                   \
-         "lwz  2,-4(11)\n\t"  /* use nraddr's tocptr */           \
-         VG_EXPAND_FRAME_BY_trashes_r3(72)                        \
-         /* arg11 */                                              \
-         "lwz 3,44(11)\n\t"                                       \
-         "stw 3,64(1)\n\t"                                        \
-         /* arg10 */                                              \
-         "lwz 3,40(11)\n\t"                                       \
-         "stw 3,60(1)\n\t"                                        \
-         /* arg9 */                                               \
-         "lwz 3,36(11)\n\t"                                       \
-         "stw 3,56(1)\n\t"                                        \
-         /* args1-8 */                                            \
-         "lwz  3, 4(11)\n\t"  /* arg1->r3 */                      \
-         "lwz  4, 8(11)\n\t"  /* arg2->r4 */                      \
-         "lwz  5, 12(11)\n\t" /* arg3->r5 */                      \
-         "lwz  6, 16(11)\n\t" /* arg4->r6 */                      \
-         "lwz  7, 20(11)\n\t" /* arg5->r7 */                      \
-         "lwz  8, 24(11)\n\t" /* arg6->r8 */                      \
-         "lwz  9, 28(11)\n\t" /* arg7->r9 */                      \
-         "lwz 10, 32(11)\n\t" /* arg8->r10 */                     \
-         "lwz 11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "lwz 2,-8(11)\n\t" /* restore tocptr */                  \
-         VG_CONTRACT_FRAME_BY(72)                                 \
-         VG_CONTRACT_FRAME_BY(512)                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_12W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,  \
-                                arg7,arg8,arg9,arg10,arg11,arg12) \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+12];                       \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      _argvec[2+4] = (unsigned long)arg4;                         \
-      _argvec[2+5] = (unsigned long)arg5;                         \
-      _argvec[2+6] = (unsigned long)arg6;                         \
-      _argvec[2+7] = (unsigned long)arg7;                         \
-      _argvec[2+8] = (unsigned long)arg8;                         \
-      _argvec[2+9] = (unsigned long)arg9;                         \
-      _argvec[2+10] = (unsigned long)arg10;                       \
-      _argvec[2+11] = (unsigned long)arg11;                       \
-      _argvec[2+12] = (unsigned long)arg12;                       \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         VG_EXPAND_FRAME_BY_trashes_r3(512)                       \
-         "stw  2,-8(11)\n\t"  /* save tocptr */                   \
-         "lwz  2,-4(11)\n\t"  /* use nraddr's tocptr */           \
-         VG_EXPAND_FRAME_BY_trashes_r3(72)                        \
-         /* arg12 */                                              \
-         "lwz 3,48(11)\n\t"                                       \
-         "stw 3,68(1)\n\t"                                        \
-         /* arg11 */                                              \
-         "lwz 3,44(11)\n\t"                                       \
-         "stw 3,64(1)\n\t"                                        \
-         /* arg10 */                                              \
-         "lwz 3,40(11)\n\t"                                       \
-         "stw 3,60(1)\n\t"                                        \
-         /* arg9 */                                               \
-         "lwz 3,36(11)\n\t"                                       \
-         "stw 3,56(1)\n\t"                                        \
-         /* args1-8 */                                            \
-         "lwz  3, 4(11)\n\t"  /* arg1->r3 */                      \
-         "lwz  4, 8(11)\n\t"  /* arg2->r4 */                      \
-         "lwz  5, 12(11)\n\t" /* arg3->r5 */                      \
-         "lwz  6, 16(11)\n\t" /* arg4->r6 */                      \
-         "lwz  7, 20(11)\n\t" /* arg5->r7 */                      \
-         "lwz  8, 24(11)\n\t" /* arg6->r8 */                      \
-         "lwz  9, 28(11)\n\t" /* arg7->r9 */                      \
-         "lwz 10, 32(11)\n\t" /* arg8->r10 */                     \
-         "lwz 11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "lwz 2,-8(11)\n\t" /* restore tocptr */                  \
-         VG_CONTRACT_FRAME_BY(72)                                 \
-         VG_CONTRACT_FRAME_BY(512)                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#endif /* PLAT_ppc32_aix5 */
-
-/* ------------------------ ppc64-aix5 ------------------------- */
-
-#if defined(PLAT_ppc64_aix5)
-
-/* ARGREGS: r3 r4 r5 r6 r7 r8 r9 r10 (the rest on stack somewhere) */
-
-/* These regs are trashed by the hidden call. */
-#define __CALLER_SAVED_REGS                                       \
-   "lr", "ctr", "xer",                                            \
-   "cr0", "cr1", "cr2", "cr3", "cr4", "cr5", "cr6", "cr7",        \
-   "r0", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9", "r10",   \
-   "r11", "r12", "r13"
-
-/* Expand the stack frame, copying enough info that unwinding
-   still works.  Trashes r3. */
-
-#define VG_EXPAND_FRAME_BY_trashes_r3(_n_fr)                      \
-         "addi 1,1,-" #_n_fr "\n\t"                               \
-         "ld   3," #_n_fr "(1)\n\t"                               \
-         "std  3,0(1)\n\t"
-
-#define VG_CONTRACT_FRAME_BY(_n_fr)                               \
-         "addi 1,1," #_n_fr "\n\t"
-
-/* These CALL_FN_ macros assume that on ppc64-aix5, sizeof(unsigned
-   long) == 8. */
-
-#define CALL_FN_W_v(lval, orig)                                   \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+0];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1] = (unsigned long)_orig.r2;                       \
-      _argvec[2] = (unsigned long)_orig.nraddr;                   \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         VG_EXPAND_FRAME_BY_trashes_r3(512)                       \
-         "std  2,-16(11)\n\t" /* save tocptr */                   \
-         "ld   2,-8(11)\n\t"  /* use nraddr's tocptr */           \
-         "ld  11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "ld 2,-16(11)\n\t" /* restore tocptr */                  \
-         VG_CONTRACT_FRAME_BY(512)                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_W(lval, orig, arg1)                             \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+1];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         VG_EXPAND_FRAME_BY_trashes_r3(512)                       \
-         "std  2,-16(11)\n\t" /* save tocptr */                   \
-         "ld   2,-8(11)\n\t"  /* use nraddr's tocptr */           \
-         "ld   3, 8(11)\n\t"  /* arg1->r3 */                      \
-         "ld  11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "ld 2,-16(11)\n\t" /* restore tocptr */                  \
-         VG_CONTRACT_FRAME_BY(512)                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_WW(lval, orig, arg1,arg2)                       \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+2];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         VG_EXPAND_FRAME_BY_trashes_r3(512)                       \
-         "std  2,-16(11)\n\t" /* save tocptr */                   \
-         "ld   2,-8(11)\n\t"  /* use nraddr's tocptr */           \
-         "ld   3, 8(11)\n\t"  /* arg1->r3 */                      \
-         "ld   4, 16(11)\n\t" /* arg2->r4 */                      \
-         "ld  11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "ld  2,-16(11)\n\t" /* restore tocptr */                 \
-         VG_CONTRACT_FRAME_BY(512)                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_WWW(lval, orig, arg1,arg2,arg3)                 \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+3];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         VG_EXPAND_FRAME_BY_trashes_r3(512)                       \
-         "std  2,-16(11)\n\t" /* save tocptr */                   \
-         "ld   2,-8(11)\n\t"  /* use nraddr's tocptr */           \
-         "ld   3, 8(11)\n\t"  /* arg1->r3 */                      \
-         "ld   4, 16(11)\n\t" /* arg2->r4 */                      \
-         "ld   5, 24(11)\n\t" /* arg3->r5 */                      \
-         "ld  11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "ld  2,-16(11)\n\t" /* restore tocptr */                 \
-         VG_CONTRACT_FRAME_BY(512)                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_WWWW(lval, orig, arg1,arg2,arg3,arg4)           \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+4];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      _argvec[2+4] = (unsigned long)arg4;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         VG_EXPAND_FRAME_BY_trashes_r3(512)                       \
-         "std  2,-16(11)\n\t" /* save tocptr */                   \
-         "ld   2,-8(11)\n\t"  /* use nraddr's tocptr */           \
-         "ld   3, 8(11)\n\t"  /* arg1->r3 */                      \
-         "ld   4, 16(11)\n\t" /* arg2->r4 */                      \
-         "ld   5, 24(11)\n\t" /* arg3->r5 */                      \
-         "ld   6, 32(11)\n\t" /* arg4->r6 */                      \
-         "ld  11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "ld  2,-16(11)\n\t" /* restore tocptr */                 \
-         VG_CONTRACT_FRAME_BY(512)                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_5W(lval, orig, arg1,arg2,arg3,arg4,arg5)        \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+5];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      _argvec[2+4] = (unsigned long)arg4;                         \
-      _argvec[2+5] = (unsigned long)arg5;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         VG_EXPAND_FRAME_BY_trashes_r3(512)                       \
-         "std  2,-16(11)\n\t" /* save tocptr */                   \
-         "ld   2,-8(11)\n\t"  /* use nraddr's tocptr */           \
-         "ld   3, 8(11)\n\t"  /* arg1->r3 */                      \
-         "ld   4, 16(11)\n\t" /* arg2->r4 */                      \
-         "ld   5, 24(11)\n\t" /* arg3->r5 */                      \
-         "ld   6, 32(11)\n\t" /* arg4->r6 */                      \
-         "ld   7, 40(11)\n\t" /* arg5->r7 */                      \
-         "ld  11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "ld  2,-16(11)\n\t" /* restore tocptr */                 \
-         VG_CONTRACT_FRAME_BY(512)                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_6W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6)   \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+6];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      _argvec[2+4] = (unsigned long)arg4;                         \
-      _argvec[2+5] = (unsigned long)arg5;                         \
-      _argvec[2+6] = (unsigned long)arg6;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         VG_EXPAND_FRAME_BY_trashes_r3(512)                       \
-         "std  2,-16(11)\n\t" /* save tocptr */                   \
-         "ld   2,-8(11)\n\t"  /* use nraddr's tocptr */           \
-         "ld   3, 8(11)\n\t"  /* arg1->r3 */                      \
-         "ld   4, 16(11)\n\t" /* arg2->r4 */                      \
-         "ld   5, 24(11)\n\t" /* arg3->r5 */                      \
-         "ld   6, 32(11)\n\t" /* arg4->r6 */                      \
-         "ld   7, 40(11)\n\t" /* arg5->r7 */                      \
-         "ld   8, 48(11)\n\t" /* arg6->r8 */                      \
-         "ld  11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "ld  2,-16(11)\n\t" /* restore tocptr */                 \
-         VG_CONTRACT_FRAME_BY(512)                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_7W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,   \
-                                 arg7)                            \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+7];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      _argvec[2+4] = (unsigned long)arg4;                         \
-      _argvec[2+5] = (unsigned long)arg5;                         \
-      _argvec[2+6] = (unsigned long)arg6;                         \
-      _argvec[2+7] = (unsigned long)arg7;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         VG_EXPAND_FRAME_BY_trashes_r3(512)                       \
-         "std  2,-16(11)\n\t" /* save tocptr */                   \
-         "ld   2,-8(11)\n\t"  /* use nraddr's tocptr */           \
-         "ld   3, 8(11)\n\t"  /* arg1->r3 */                      \
-         "ld   4, 16(11)\n\t" /* arg2->r4 */                      \
-         "ld   5, 24(11)\n\t" /* arg3->r5 */                      \
-         "ld   6, 32(11)\n\t" /* arg4->r6 */                      \
-         "ld   7, 40(11)\n\t" /* arg5->r7 */                      \
-         "ld   8, 48(11)\n\t" /* arg6->r8 */                      \
-         "ld   9, 56(11)\n\t" /* arg7->r9 */                      \
-         "ld  11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "ld  2,-16(11)\n\t" /* restore tocptr */                 \
-         VG_CONTRACT_FRAME_BY(512)                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_8W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,   \
-                                 arg7,arg8)                       \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+8];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      _argvec[2+4] = (unsigned long)arg4;                         \
-      _argvec[2+5] = (unsigned long)arg5;                         \
-      _argvec[2+6] = (unsigned long)arg6;                         \
-      _argvec[2+7] = (unsigned long)arg7;                         \
-      _argvec[2+8] = (unsigned long)arg8;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         VG_EXPAND_FRAME_BY_trashes_r3(512)                       \
-         "std  2,-16(11)\n\t" /* save tocptr */                   \
-         "ld   2,-8(11)\n\t"  /* use nraddr's tocptr */           \
-         "ld   3, 8(11)\n\t"  /* arg1->r3 */                      \
-         "ld   4, 16(11)\n\t" /* arg2->r4 */                      \
-         "ld   5, 24(11)\n\t" /* arg3->r5 */                      \
-         "ld   6, 32(11)\n\t" /* arg4->r6 */                      \
-         "ld   7, 40(11)\n\t" /* arg5->r7 */                      \
-         "ld   8, 48(11)\n\t" /* arg6->r8 */                      \
-         "ld   9, 56(11)\n\t" /* arg7->r9 */                      \
-         "ld  10, 64(11)\n\t" /* arg8->r10 */                     \
-         "ld  11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "ld  2,-16(11)\n\t" /* restore tocptr */                 \
-         VG_CONTRACT_FRAME_BY(512)                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_9W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,   \
-                                 arg7,arg8,arg9)                  \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+9];                        \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      _argvec[2+4] = (unsigned long)arg4;                         \
-      _argvec[2+5] = (unsigned long)arg5;                         \
-      _argvec[2+6] = (unsigned long)arg6;                         \
-      _argvec[2+7] = (unsigned long)arg7;                         \
-      _argvec[2+8] = (unsigned long)arg8;                         \
-      _argvec[2+9] = (unsigned long)arg9;                         \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         VG_EXPAND_FRAME_BY_trashes_r3(512)                       \
-         "std  2,-16(11)\n\t" /* save tocptr */                   \
-         "ld   2,-8(11)\n\t"  /* use nraddr's tocptr */           \
-         VG_EXPAND_FRAME_BY_trashes_r3(128)                       \
-         /* arg9 */                                               \
-         "ld  3,72(11)\n\t"                                       \
-         "std 3,112(1)\n\t"                                       \
-         /* args1-8 */                                            \
-         "ld   3, 8(11)\n\t"  /* arg1->r3 */                      \
-         "ld   4, 16(11)\n\t" /* arg2->r4 */                      \
-         "ld   5, 24(11)\n\t" /* arg3->r5 */                      \
-         "ld   6, 32(11)\n\t" /* arg4->r6 */                      \
-         "ld   7, 40(11)\n\t" /* arg5->r7 */                      \
-         "ld   8, 48(11)\n\t" /* arg6->r8 */                      \
-         "ld   9, 56(11)\n\t" /* arg7->r9 */                      \
-         "ld  10, 64(11)\n\t" /* arg8->r10 */                     \
-         "ld  11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "ld  2,-16(11)\n\t" /* restore tocptr */                 \
-         VG_CONTRACT_FRAME_BY(128)                                \
-         VG_CONTRACT_FRAME_BY(512)                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_10W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,  \
-                                  arg7,arg8,arg9,arg10)           \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+10];                       \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      _argvec[2+4] = (unsigned long)arg4;                         \
-      _argvec[2+5] = (unsigned long)arg5;                         \
-      _argvec[2+6] = (unsigned long)arg6;                         \
-      _argvec[2+7] = (unsigned long)arg7;                         \
-      _argvec[2+8] = (unsigned long)arg8;                         \
-      _argvec[2+9] = (unsigned long)arg9;                         \
-      _argvec[2+10] = (unsigned long)arg10;                       \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         VG_EXPAND_FRAME_BY_trashes_r3(512)                       \
-         "std  2,-16(11)\n\t" /* save tocptr */                   \
-         "ld   2,-8(11)\n\t"  /* use nraddr's tocptr */           \
-         VG_EXPAND_FRAME_BY_trashes_r3(128)                       \
-         /* arg10 */                                              \
-         "ld  3,80(11)\n\t"                                       \
-         "std 3,120(1)\n\t"                                       \
-         /* arg9 */                                               \
-         "ld  3,72(11)\n\t"                                       \
-         "std 3,112(1)\n\t"                                       \
-         /* args1-8 */                                            \
-         "ld   3, 8(11)\n\t"  /* arg1->r3 */                      \
-         "ld   4, 16(11)\n\t" /* arg2->r4 */                      \
-         "ld   5, 24(11)\n\t" /* arg3->r5 */                      \
-         "ld   6, 32(11)\n\t" /* arg4->r6 */                      \
-         "ld   7, 40(11)\n\t" /* arg5->r7 */                      \
-         "ld   8, 48(11)\n\t" /* arg6->r8 */                      \
-         "ld   9, 56(11)\n\t" /* arg7->r9 */                      \
-         "ld  10, 64(11)\n\t" /* arg8->r10 */                     \
-         "ld  11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "ld  2,-16(11)\n\t" /* restore tocptr */                 \
-         VG_CONTRACT_FRAME_BY(128)                                \
-         VG_CONTRACT_FRAME_BY(512)                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_11W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,  \
-                                  arg7,arg8,arg9,arg10,arg11)     \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+11];                       \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      _argvec[2+4] = (unsigned long)arg4;                         \
-      _argvec[2+5] = (unsigned long)arg5;                         \
-      _argvec[2+6] = (unsigned long)arg6;                         \
-      _argvec[2+7] = (unsigned long)arg7;                         \
-      _argvec[2+8] = (unsigned long)arg8;                         \
-      _argvec[2+9] = (unsigned long)arg9;                         \
-      _argvec[2+10] = (unsigned long)arg10;                       \
-      _argvec[2+11] = (unsigned long)arg11;                       \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         VG_EXPAND_FRAME_BY_trashes_r3(512)                       \
-         "std  2,-16(11)\n\t" /* save tocptr */                   \
-         "ld   2,-8(11)\n\t"  /* use nraddr's tocptr */           \
-         VG_EXPAND_FRAME_BY_trashes_r3(144)                       \
-         /* arg11 */                                              \
-         "ld  3,88(11)\n\t"                                       \
-         "std 3,128(1)\n\t"                                       \
-         /* arg10 */                                              \
-         "ld  3,80(11)\n\t"                                       \
-         "std 3,120(1)\n\t"                                       \
-         /* arg9 */                                               \
-         "ld  3,72(11)\n\t"                                       \
-         "std 3,112(1)\n\t"                                       \
-         /* args1-8 */                                            \
-         "ld   3, 8(11)\n\t"  /* arg1->r3 */                      \
-         "ld   4, 16(11)\n\t" /* arg2->r4 */                      \
-         "ld   5, 24(11)\n\t" /* arg3->r5 */                      \
-         "ld   6, 32(11)\n\t" /* arg4->r6 */                      \
-         "ld   7, 40(11)\n\t" /* arg5->r7 */                      \
-         "ld   8, 48(11)\n\t" /* arg6->r8 */                      \
-         "ld   9, 56(11)\n\t" /* arg7->r9 */                      \
-         "ld  10, 64(11)\n\t" /* arg8->r10 */                     \
-         "ld  11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "ld  2,-16(11)\n\t" /* restore tocptr */                 \
-         VG_CONTRACT_FRAME_BY(144)                                \
-         VG_CONTRACT_FRAME_BY(512)                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#define CALL_FN_W_12W(lval, orig, arg1,arg2,arg3,arg4,arg5,arg6,  \
-                                arg7,arg8,arg9,arg10,arg11,arg12) \
-   do {                                                           \
-      volatile OrigFn        _orig = (orig);                      \
-      volatile unsigned long _argvec[3+12];                       \
-      volatile unsigned long _res;                                \
-      /* _argvec[0] holds current r2 across the call */           \
-      _argvec[1]   = (unsigned long)_orig.r2;                     \
-      _argvec[2]   = (unsigned long)_orig.nraddr;                 \
-      _argvec[2+1] = (unsigned long)arg1;                         \
-      _argvec[2+2] = (unsigned long)arg2;                         \
-      _argvec[2+3] = (unsigned long)arg3;                         \
-      _argvec[2+4] = (unsigned long)arg4;                         \
-      _argvec[2+5] = (unsigned long)arg5;                         \
-      _argvec[2+6] = (unsigned long)arg6;                         \
-      _argvec[2+7] = (unsigned long)arg7;                         \
-      _argvec[2+8] = (unsigned long)arg8;                         \
-      _argvec[2+9] = (unsigned long)arg9;                         \
-      _argvec[2+10] = (unsigned long)arg10;                       \
-      _argvec[2+11] = (unsigned long)arg11;                       \
-      _argvec[2+12] = (unsigned long)arg12;                       \
-      __asm__ volatile(                                           \
-         "mr 11,%1\n\t"                                           \
-         VG_EXPAND_FRAME_BY_trashes_r3(512)                       \
-         "std  2,-16(11)\n\t" /* save tocptr */                   \
-         "ld   2,-8(11)\n\t"  /* use nraddr's tocptr */           \
-         VG_EXPAND_FRAME_BY_trashes_r3(144)                       \
-         /* arg12 */                                              \
-         "ld  3,96(11)\n\t"                                       \
-         "std 3,136(1)\n\t"                                       \
-         /* arg11 */                                              \
-         "ld  3,88(11)\n\t"                                       \
-         "std 3,128(1)\n\t"                                       \
-         /* arg10 */                                              \
-         "ld  3,80(11)\n\t"                                       \
-         "std 3,120(1)\n\t"                                       \
-         /* arg9 */                                               \
-         "ld  3,72(11)\n\t"                                       \
-         "std 3,112(1)\n\t"                                       \
-         /* args1-8 */                                            \
-         "ld   3, 8(11)\n\t"  /* arg1->r3 */                      \
-         "ld   4, 16(11)\n\t" /* arg2->r4 */                      \
-         "ld   5, 24(11)\n\t" /* arg3->r5 */                      \
-         "ld   6, 32(11)\n\t" /* arg4->r6 */                      \
-         "ld   7, 40(11)\n\t" /* arg5->r7 */                      \
-         "ld   8, 48(11)\n\t" /* arg6->r8 */                      \
-         "ld   9, 56(11)\n\t" /* arg7->r9 */                      \
-         "ld  10, 64(11)\n\t" /* arg8->r10 */                     \
-         "ld  11, 0(11)\n\t"  /* target->r11 */                   \
-         VALGRIND_BRANCH_AND_LINK_TO_NOREDIR_R11                  \
-         "mr 11,%1\n\t"                                           \
-         "mr %0,3\n\t"                                            \
-         "ld  2,-16(11)\n\t" /* restore tocptr */                 \
-         VG_CONTRACT_FRAME_BY(144)                                \
-         VG_CONTRACT_FRAME_BY(512)                                \
-         : /*out*/   "=r" (_res)                                  \
-         : /*in*/    "r" (&_argvec[2])                            \
-         : /*trash*/ "cc", "memory", __CALLER_SAVED_REGS          \
-      );                                                          \
-      lval = (__typeof__(lval)) _res;                             \
-   } while (0)
-
-#endif /* PLAT_ppc64_aix5 */
-
-
-/* ------------------------------------------------------------------ */
-/* ARCHITECTURE INDEPENDENT MACROS for CLIENT REQUESTS.               */
-/*                                                                    */
-/* ------------------------------------------------------------------ */
-
-/* Some request codes.  There are many more of these, but most are not
-   exposed to end-user view.  These are the public ones, all of the
-   form 0x1000 + small_number.
-
-   Core ones are in the range 0x00000000--0x0000ffff.  The non-public
-   ones start at 0x2000.
-*/
-
-/* These macros are used by tools -- they must be public, but don't
-   embed them into other programs. */
-#define VG_USERREQ_TOOL_BASE(a,b) \
-   ((unsigned int)(((a)&0xff) << 24 | ((b)&0xff) << 16))
-#define VG_IS_TOOL_USERREQ(a, b, v) \
-   (VG_USERREQ_TOOL_BASE(a,b) == ((v) & 0xffff0000))
-
-/* !! ABIWARNING !! ABIWARNING !! ABIWARNING !! ABIWARNING !! 
-   This enum comprises an ABI exported by Valgrind to programs
-   which use client requests.  DO NOT CHANGE THE ORDER OF THESE
-   ENTRIES, NOR DELETE ANY -- add new ones at the end. */
-typedef
-   enum { VG_USERREQ__RUNNING_ON_VALGRIND  = 0x1001,
-          VG_USERREQ__DISCARD_TRANSLATIONS = 0x1002,
-
-          /* These allow any function to be called from the simulated
-             CPU but run on the real CPU.  Nb: the first arg passed to
-             the function is always the ThreadId of the running
-             thread!  So CLIENT_CALL0 actually requires a 1 arg
-             function, etc. */
-          VG_USERREQ__CLIENT_CALL0 = 0x1101,
-          VG_USERREQ__CLIENT_CALL1 = 0x1102,
-          VG_USERREQ__CLIENT_CALL2 = 0x1103,
-          VG_USERREQ__CLIENT_CALL3 = 0x1104,
-
-          /* Can be useful in regression testing suites -- eg. can
-             send Valgrind's output to /dev/null and still count
-             errors. */
-          VG_USERREQ__COUNT_ERRORS = 0x1201,
-
-          /* These are useful and can be interpreted by any tool that
-             tracks malloc() et al, by using vg_replace_malloc.c. */
-          VG_USERREQ__MALLOCLIKE_BLOCK = 0x1301,
-          VG_USERREQ__FREELIKE_BLOCK   = 0x1302,
-          /* Memory pool support. */
-          VG_USERREQ__CREATE_MEMPOOL   = 0x1303,
-          VG_USERREQ__DESTROY_MEMPOOL  = 0x1304,
-          VG_USERREQ__MEMPOOL_ALLOC    = 0x1305,
-          VG_USERREQ__MEMPOOL_FREE     = 0x1306,
-          VG_USERREQ__MEMPOOL_TRIM     = 0x1307,
-          VG_USERREQ__MOVE_MEMPOOL     = 0x1308,
-          VG_USERREQ__MEMPOOL_CHANGE   = 0x1309,
-          VG_USERREQ__MEMPOOL_EXISTS   = 0x130a,
-
-          /* Allow printfs to valgrind log. */
-          VG_USERREQ__PRINTF           = 0x1401,
-          VG_USERREQ__PRINTF_BACKTRACE = 0x1402,
-
-          /* Stack support. */
-          VG_USERREQ__STACK_REGISTER   = 0x1501,
-          VG_USERREQ__STACK_DEREGISTER = 0x1502,
-          VG_USERREQ__STACK_CHANGE     = 0x1503
-   } Vg_ClientRequest;
-
-#if !defined(__GNUC__)
-#  define __extension__ /* */
-#endif
-
-/* Returns the number of Valgrinds this code is running under.  That
-   is, 0 if running natively, 1 if running under Valgrind, 2 if
-   running under Valgrind which is running under another Valgrind,
-   etc. */
-#define RUNNING_ON_VALGRIND  __extension__                        \
-   ({unsigned int _qzz_res;                                       \
-    VALGRIND_DO_CLIENT_REQUEST(_qzz_res, 0 /* if not */,          \
-                               VG_USERREQ__RUNNING_ON_VALGRIND,   \
-                               0, 0, 0, 0, 0);                    \
-    _qzz_res;                                                     \
-   })
-
-
-/* Discard translation of code in the range [_qzz_addr .. _qzz_addr +
-   _qzz_len - 1].  Useful if you are debugging a JITter or some such,
-   since it provides a way to make sure valgrind will retranslate the
-   invalidated area.  Returns no value. */
-#define VALGRIND_DISCARD_TRANSLATIONS(_qzz_addr,_qzz_len)         \
-   {unsigned int _qzz_res;                                        \
-    VALGRIND_DO_CLIENT_REQUEST(_qzz_res, 0,                       \
-                               VG_USERREQ__DISCARD_TRANSLATIONS,  \
-                               _qzz_addr, _qzz_len, 0, 0, 0);     \
-   }
-
-
-/* These requests are for getting Valgrind itself to print something.
-   Possibly with a backtrace.  This is a really ugly hack. */
-
-#if defined(NVALGRIND)
-
-#  define VALGRIND_PRINTF(...)
-#  define VALGRIND_PRINTF_BACKTRACE(...)
-
-#else /* NVALGRIND */
-
-/* Modern GCC will optimize the static routine out if unused,
-   and unused attribute will shut down warnings about it.  */
-static int VALGRIND_PRINTF(const char *format, ...)
-   __attribute__((format(__printf__, 1, 2), __unused__));
-static int
-VALGRIND_PRINTF(const char *format, ...)
-{
-   unsigned long _qzz_res;
-   va_list vargs;
-   va_start(vargs, format);
-   VALGRIND_DO_CLIENT_REQUEST(_qzz_res, 0, VG_USERREQ__PRINTF,
-                              (unsigned long)format, (unsigned long)vargs, 
-                              0, 0, 0);
-   va_end(vargs);
-   return (int)_qzz_res;
-}
-
-static int VALGRIND_PRINTF_BACKTRACE(const char *format, ...)
-   __attribute__((format(__printf__, 1, 2), __unused__));
-static int
-VALGRIND_PRINTF_BACKTRACE(const char *format, ...)
-{
-   unsigned long _qzz_res;
-   va_list vargs;
-   va_start(vargs, format);
-   VALGRIND_DO_CLIENT_REQUEST(_qzz_res, 0, VG_USERREQ__PRINTF_BACKTRACE,
-                              (unsigned long)format, (unsigned long)vargs, 
-                              0, 0, 0);
-   va_end(vargs);
-   return (int)_qzz_res;
-}
-
-#endif /* NVALGRIND */
-
-
-/* These requests allow control to move from the simulated CPU to the
-   real CPU, calling an arbitary function.
-   
-   Note that the current ThreadId is inserted as the first argument.
-   So this call:
-
-     VALGRIND_NON_SIMD_CALL2(f, arg1, arg2)
-
-   requires f to have this signature:
-
-     Word f(Word tid, Word arg1, Word arg2)
-
-   where "Word" is a word-sized type.
-
-   Note that these client requests are not entirely reliable.  For example,
-   if you call a function with them that subsequently calls printf(),
-   there's a high chance Valgrind will crash.  Generally, your prospects of
-   these working are made higher if the called function does not refer to
-   any global variables, and does not refer to any libc or other functions
-   (printf et al).  Any kind of entanglement with libc or dynamic linking is
-   likely to have a bad outcome, for tricky reasons which we've grappled
-   with a lot in the past.
-*/
-#define VALGRIND_NON_SIMD_CALL0(_qyy_fn)                          \
-   __extension__                                                  \
-   ({unsigned long _qyy_res;                                      \
-    VALGRIND_DO_CLIENT_REQUEST(_qyy_res, 0 /* default return */,  \
-                               VG_USERREQ__CLIENT_CALL0,          \
-                               _qyy_fn,                           \
-                               0, 0, 0, 0);                       \
-    _qyy_res;                                                     \
-   })
-
-#define VALGRIND_NON_SIMD_CALL1(_qyy_fn, _qyy_arg1)               \
-   __extension__                                                  \
-   ({unsigned long _qyy_res;                                      \
-    VALGRIND_DO_CLIENT_REQUEST(_qyy_res, 0 /* default return */,  \
-                               VG_USERREQ__CLIENT_CALL1,          \
-                               _qyy_fn,                           \
-                               _qyy_arg1, 0, 0, 0);               \
-    _qyy_res;                                                     \
-   })
-
-#define VALGRIND_NON_SIMD_CALL2(_qyy_fn, _qyy_arg1, _qyy_arg2)    \
-   __extension__                                                  \
-   ({unsigned long _qyy_res;                                      \
-    VALGRIND_DO_CLIENT_REQUEST(_qyy_res, 0 /* default return */,  \
-                               VG_USERREQ__CLIENT_CALL2,          \
-                               _qyy_fn,                           \
-                               _qyy_arg1, _qyy_arg2, 0, 0);       \
-    _qyy_res;                                                     \
-   })
-
-#define VALGRIND_NON_SIMD_CALL3(_qyy_fn, _qyy_arg1, _qyy_arg2, _qyy_arg3) \
-   __extension__                                                  \
-   ({unsigned long _qyy_res;                                      \
-    VALGRIND_DO_CLIENT_REQUEST(_qyy_res, 0 /* default return */,  \
-                               VG_USERREQ__CLIENT_CALL3,          \
-                               _qyy_fn,                           \
-                               _qyy_arg1, _qyy_arg2,              \
-                               _qyy_arg3, 0);                     \
-    _qyy_res;                                                     \
-   })
-
-
-/* Counts the number of errors that have been recorded by a tool.  Nb:
-   the tool must record the errors with VG_(maybe_record_error)() or
-   VG_(unique_error)() for them to be counted. */
-#define VALGRIND_COUNT_ERRORS                                     \
-   __extension__                                                  \
-   ({unsigned int _qyy_res;                                       \
-    VALGRIND_DO_CLIENT_REQUEST(_qyy_res, 0 /* default return */,  \
-                               VG_USERREQ__COUNT_ERRORS,          \
-                               0, 0, 0, 0, 0);                    \
-    _qyy_res;                                                     \
-   })
-
-/* Mark a block of memory as having been allocated by a malloc()-like
-   function.  `addr' is the start of the usable block (ie. after any
-   redzone) `rzB' is redzone size if the allocator can apply redzones;
-   use '0' if not.  Adding redzones makes it more likely Valgrind will spot
-   block overruns.  `is_zeroed' indicates if the memory is zeroed, as it is
-   for calloc().  Put it immediately after the point where a block is
-   allocated. 
-   
-   If you're using Memcheck: If you're allocating memory via superblocks,
-   and then handing out small chunks of each superblock, if you don't have
-   redzones on your small blocks, it's worth marking the superblock with
-   VALGRIND_MAKE_MEM_NOACCESS when it's created, so that block overruns are
-   detected.  But if you can put redzones on, it's probably better to not do
-   this, so that messages for small overruns are described in terms of the
-   small block rather than the superblock (but if you have a big overrun
-   that skips over a redzone, you could miss an error this way).  See
-   memcheck/tests/custom_alloc.c for an example.
-
-   WARNING: if your allocator uses malloc() or 'new' to allocate
-   superblocks, rather than mmap() or brk(), this will not work properly --
-   you'll likely get assertion failures during leak detection.  This is
-   because Valgrind doesn't like seeing overlapping heap blocks.  Sorry.
-
-   Nb: block must be freed via a free()-like function specified
-   with VALGRIND_FREELIKE_BLOCK or mismatch errors will occur. */
-#define VALGRIND_MALLOCLIKE_BLOCK(addr, sizeB, rzB, is_zeroed)    \
-   {unsigned int _qzz_res;                                        \
-    VALGRIND_DO_CLIENT_REQUEST(_qzz_res, 0,                       \
-                               VG_USERREQ__MALLOCLIKE_BLOCK,      \
-                               addr, sizeB, rzB, is_zeroed, 0);   \
-   }
-
-/* Mark a block of memory as having been freed by a free()-like function.
-   `rzB' is redzone size;  it must match that given to
-   VALGRIND_MALLOCLIKE_BLOCK.  Memory not freed will be detected by the leak
-   checker.  Put it immediately after the point where the block is freed. */
-#define VALGRIND_FREELIKE_BLOCK(addr, rzB)                        \
-   {unsigned int _qzz_res;                                        \
-    VALGRIND_DO_CLIENT_REQUEST(_qzz_res, 0,                       \
-                               VG_USERREQ__FREELIKE_BLOCK,        \
-                               addr, rzB, 0, 0, 0);               \
-   }
-
-/* Create a memory pool. */
-#define VALGRIND_CREATE_MEMPOOL(pool, rzB, is_zeroed)             \
-   {unsigned int _qzz_res;                                        \
-    VALGRIND_DO_CLIENT_REQUEST(_qzz_res, 0,                       \
-                               VG_USERREQ__CREATE_MEMPOOL,        \
-                               pool, rzB, is_zeroed, 0, 0);       \
-   }
-
-/* Destroy a memory pool. */
-#define VALGRIND_DESTROY_MEMPOOL(pool)                            \
-   {unsigned int _qzz_res;                                        \
-    VALGRIND_DO_CLIENT_REQUEST(_qzz_res, 0,                       \
-                               VG_USERREQ__DESTROY_MEMPOOL,       \
-                               pool, 0, 0, 0, 0);                 \
-   }
-
-/* Associate a piece of memory with a memory pool. */
-#define VALGRIND_MEMPOOL_ALLOC(pool, addr, size)                  \
-   {unsigned int _qzz_res;                                        \
-    VALGRIND_DO_CLIENT_REQUEST(_qzz_res, 0,                       \
-                               VG_USERREQ__MEMPOOL_ALLOC,         \
-                               pool, addr, size, 0, 0);           \
-   }
-
-/* Disassociate a piece of memory from a memory pool. */
-#define VALGRIND_MEMPOOL_FREE(pool, addr)                         \
-   {unsigned int _qzz_res;                                        \
-    VALGRIND_DO_CLIENT_REQUEST(_qzz_res, 0,                       \
-                               VG_USERREQ__MEMPOOL_FREE,          \
-                               pool, addr, 0, 0, 0);              \
-   }
-
-/* Disassociate any pieces outside a particular range. */
-#define VALGRIND_MEMPOOL_TRIM(pool, addr, size)                   \
-   {unsigned int _qzz_res;                                        \
-    VALGRIND_DO_CLIENT_REQUEST(_qzz_res, 0,                       \
-                               VG_USERREQ__MEMPOOL_TRIM,          \
-                               pool, addr, size, 0, 0);           \
-   }
-
-/* Resize and/or move a piece associated with a memory pool. */
-#define VALGRIND_MOVE_MEMPOOL(poolA, poolB)                       \
-   {unsigned int _qzz_res;                                        \
-    VALGRIND_DO_CLIENT_REQUEST(_qzz_res, 0,                       \
-                               VG_USERREQ__MOVE_MEMPOOL,          \
-                               poolA, poolB, 0, 0, 0);            \
-   }
-
-/* Resize and/or move a piece associated with a memory pool. */
-#define VALGRIND_MEMPOOL_CHANGE(pool, addrA, addrB, size)         \
-   {unsigned int _qzz_res;                                        \
-    VALGRIND_DO_CLIENT_REQUEST(_qzz_res, 0,                       \
-                               VG_USERREQ__MEMPOOL_CHANGE,        \
-                               pool, addrA, addrB, size, 0);      \
-   }
-
-/* Return 1 if a mempool exists, else 0. */
-#define VALGRIND_MEMPOOL_EXISTS(pool)                             \
-   ({unsigned int _qzz_res;                                       \
-    VALGRIND_DO_CLIENT_REQUEST(_qzz_res, 0,                       \
-                               VG_USERREQ__MEMPOOL_EXISTS,        \
-                               pool, 0, 0, 0, 0);                 \
-    _qzz_res;                                                     \
-   })
-
-/* Mark a piece of memory as being a stack. Returns a stack id. */
-#define VALGRIND_STACK_REGISTER(start, end)                       \
-   ({unsigned int _qzz_res;                                       \
-    VALGRIND_DO_CLIENT_REQUEST(_qzz_res, 0,                       \
-                               VG_USERREQ__STACK_REGISTER,        \
-                               start, end, 0, 0, 0);              \
-    _qzz_res;                                                     \
-   })
-
-/* Unmark the piece of memory associated with a stack id as being a
-   stack. */
-#define VALGRIND_STACK_DEREGISTER(id)                             \
-   {unsigned int _qzz_res;                                        \
-    VALGRIND_DO_CLIENT_REQUEST(_qzz_res, 0,                       \
-                               VG_USERREQ__STACK_DEREGISTER,      \
-                               id, 0, 0, 0, 0);                   \
-   }
-
-/* Change the start and end address of the stack id. */
-#define VALGRIND_STACK_CHANGE(id, start, end)                     \
-   {unsigned int _qzz_res;                                        \
-    VALGRIND_DO_CLIENT_REQUEST(_qzz_res, 0,                       \
-                               VG_USERREQ__STACK_CHANGE,          \
-                               id, start, end, 0, 0);             \
-   }
-
-
-#undef PLAT_x86_linux
-#undef PLAT_amd64_linux
-#undef PLAT_ppc32_linux
-#undef PLAT_ppc64_linux
-#undef PLAT_ppc32_aix5
-#undef PLAT_ppc64_aix5
-
-#endif   /* __VALGRIND_H */

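Note on the request-code layout described in the removed valgrind.h: core requests live in 0x0000-0xffff, while tool requests carry a two-character tag in the top 16 bits. The sketch below restates VG_USERREQ_TOOL_BASE and VG_IS_TOOL_USERREQ as constexpr functions so the bit layout can be checked at compile time. The names ToolBase and IsToolRequest and the tag ('M', 'C') are illustrative assumptions, not part of the removed header.

```cpp
// Sketch only: mirrors the arithmetic of the two removed macros.
#include <cassert>
#include <cstdint>

// Pack a two-character tool tag into the top 16 bits of a request code.
// (Hypothetical helper; same expression as VG_USERREQ_TOOL_BASE.)
constexpr std::uint32_t ToolBase(char a, char b) {
  return (static_cast<std::uint32_t>(a) & 0xffu) << 24 |
         (static_cast<std::uint32_t>(b) & 0xffu) << 16;
}

// True when request code v carries the tag (a, b) in its top 16 bits.
// (Hypothetical helper; same comparison as VG_IS_TOOL_USERREQ.)
constexpr bool IsToolRequest(char a, char b, std::uint32_t v) {
  return ToolBase(a, b) == (v & 0xffff0000u);
}

// 'M' is 0x4d and 'C' is 0x43, so the tag occupies bits 31..16.
static_assert(ToolBase('M', 'C') == 0x4d430000u, "tag sits in the top 16 bits");
// The low 16 bits are free for the tool's own request numbers.
static_assert(IsToolRequest('M', 'C', 0x4d430001u), "low bits are ignored");
// Core codes such as 0x1001 have a zero tag and match no tool.
static_assert(!IsToolRequest('M', 'C', 0x1001u), "core request is not a tool request");
```

Because the comparison masks off the low 16 bits, a tool can add new request numbers without affecting how requests are routed to it.
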
diff --git a/src/thread_cache.cc b/src/thread_cache.cc
index 444a09f..21d0f8e 100644
--- a/src/thread_cache.cc
+++ b/src/thread_cache.cc

@@ -70,8 +70,7 @@
 ThreadCache* ThreadCache::next_memory_steal_ = NULL;
 #ifdef HAVE_TLS
 __thread ThreadCache::ThreadLocalData ThreadCache::threadlocal_data_
-    ATTR_INITIAL_EXEC
-    = {0, 0};
+    ATTR_INITIAL_EXEC CACHELINE_ALIGNED;
 #endif
 bool ThreadCache::tsd_inited_ = false;
 pthread_key_t ThreadCache::heap_key_;
@@ -84,7 +83,7 @@
   if (max_size_ == 0) {
     // There isn't enough memory to go around.  Just give the minimum to
     // this thread.
-    max_size_ = kMinThreadCacheSize;
+    SetMaxSize(kMinThreadCacheSize);
 
     // Take unclaimed_cache_space_ negative.
     unclaimed_cache_space_ -= kMinThreadCacheSize;
@@ -95,8 +94,8 @@
   prev_ = NULL;
   tid_  = tid;
   in_setspecific_ = false;
-  for (size_t cl = 0; cl < kNumClasses; ++cl) {
-    list_[cl].Init();
+  for (uint32 cl = 0; cl < Static::num_size_classes(); ++cl) {
+    list_[cl].Init(Static::sizemap()->class_to_size(cl));
   }
 
   uint32_t sampler_seed;
@@ -106,7 +105,7 @@
 
 void ThreadCache::Cleanup() {
   // Put unused memory back into central cache
-  for (int cl = 0; cl < kNumClasses; ++cl) {
+  for (uint32 cl = 0; cl < Static::num_size_classes(); ++cl) {
     if (list_[cl].length() > 0) {
       ReleaseToCentralCache(&list_[cl], cl, list_[cl].length());
     }
@@ -115,7 +114,8 @@
 
 // Remove some objects of class "cl" from central cache and add to thread heap.
 // On success, return the first object for immediate use; otherwise return NULL.
-void* ThreadCache::FetchFromCentralCache(size_t cl, size_t byte_size) {
+void* ThreadCache::FetchFromCentralCache(uint32 cl, int32_t byte_size,
+                                         void *(*oom_handler)(size_t size)) {
   FreeList* list = &list_[cl];
   ASSERT(list->empty());
   const int batch_size = Static::sizemap()->num_objects_to_move(cl);
@@ -125,7 +125,12 @@
   int fetch_count = Static::central_cache()[cl].RemoveRange(
       &start, &end, num_to_move);
 
-  ASSERT((start == NULL) == (fetch_count == 0));
+  if (fetch_count == 0) {
+    ASSERT(start == NULL);
+    return oom_handler(byte_size);
+  }
+  ASSERT(start != NULL);
+
   if (--fetch_count >= 0) {
     size_ += byte_size * fetch_count;
     list->PushRange(fetch_count, SLL_Next(start), end);
@@ -152,7 +157,9 @@
   return start;
 }
 
-void ThreadCache::ListTooLong(FreeList* list, size_t cl) {
+void ThreadCache::ListTooLong(FreeList* list, uint32 cl) {
+  size_ += list->object_size();
+
   const int batch_size = Static::sizemap()->num_objects_to_move(cl);
   ReleaseToCentralCache(list, cl, batch_size);
 
@@ -174,10 +181,14 @@
       list->set_length_overages(0);
     }
   }
+
+  if (PREDICT_FALSE(size_ > max_size_)) {
+    Scavenge();
+  }
 }
 
 // Remove some objects of class "cl" from thread heap and add to central cache
-void ThreadCache::ReleaseToCentralCache(FreeList* src, size_t cl, int N) {
+void ThreadCache::ReleaseToCentralCache(FreeList* src, uint32 cl, int N) {
   ASSERT(src == &list_[cl]);
   if (N > src->length()) N = src->length();
   size_t delta_bytes = N * Static::sizemap()->ByteSizeForClass(cl);
@@ -205,8 +216,7 @@
   // that situation by dropping L/2 nodes from the free list.  This
   // may not release much memory, but if so we will call scavenge again
   // pretty soon and the low-water marks will be high on that call.
-  //int64 start = CycleClock::Now();
-  for (int cl = 0; cl < kNumClasses; cl++) {
+  for (int cl = 0; cl < Static::num_size_classes(); cl++) {
     FreeList* list = &list_[cl];
     const int lowmark = list->lowwatermark();
     if (lowmark > 0) {
@@ -241,7 +251,7 @@
   if (unclaimed_cache_space_ > 0) {
     // Possibly make unclaimed_cache_space_ negative.
     unclaimed_cache_space_ -= kStealAmount;
-    max_size_ += kStealAmount;
+    SetMaxSize(max_size_ + kStealAmount);
     return;
   }
   // Don't hold pageheap_lock too long.  Try to steal from 10 other
@@ -259,8 +269,8 @@
         next_memory_steal_->max_size_ <= kMinThreadCacheSize) {
       continue;
     }
-    next_memory_steal_->max_size_ -= kStealAmount;
-    max_size_ += kStealAmount;
+    next_memory_steal_->SetMaxSize(next_memory_steal_->max_size_ - kStealAmount);
+    SetMaxSize(max_size_ + kStealAmount);
 
     next_memory_steal_ = next_memory_steal_->next_;
     return;
@@ -268,12 +278,15 @@
 }
 
 int ThreadCache::GetSamplePeriod() {
-  return sampler_.GetSamplePeriod();
+  return Sampler::GetSamplePeriod();
 }
 
 void ThreadCache::InitModule() {
-  SpinLockHolder h(Static::pageheap_lock());
-  if (!phinited) {
+  {
+    SpinLockHolder h(Static::pageheap_lock());
+    if (phinited) {
+      return;
+    }
     const char *tcb = TCMallocGetenvSafe("TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES");
     if (tcb) {
       set_overall_thread_cache_size(strtoll(tcb, NULL, 10));
@@ -282,6 +295,10 @@
     threadcache_allocator.Init();
     phinited = 1;
   }
+
+  // Do the "late" part of initialization without holding the lock, since
+  // there is a chance it will recurse into malloc.
+  Static::InitLateMaybeRecursive();
 }
 
 void ThreadCache::InitTSD() {
@@ -303,8 +320,35 @@
 }
 
 ThreadCache* ThreadCache::CreateCacheIfNecessary() {
+  if (!tsd_inited_) {
+#ifndef NDEBUG
+    // Test that freeing nullptr very early works.
+    free(NULL);
+#endif
+
+    InitModule();
+  }
+
   // Initialize per-thread data if necessary
   ThreadCache* heap = NULL;
+
+  bool seach_condition = true;
+#ifdef HAVE_TLS
+  static __thread ThreadCache** current_heap_ptr ATTR_INITIAL_EXEC;
+  if (tsd_inited_) {
+    // In the most common case we avoid the expensive linear search
+    // through all heaps (see below). Working TLS enables faster
+    // protection from malloc recursion in pthread_setspecific.
+    seach_condition = false;
+
+    if (current_heap_ptr != NULL) {
+      // we're being recursively called by pthread_setspecific below.
+      return *current_heap_ptr;
+    }
+    current_heap_ptr = &heap;
+  }
+#endif
+
   {
     SpinLockHolder h(Static::pageheap_lock());
     // On some old glibc's, and on freebsd's libc (as of freebsd 8.1),
@@ -328,10 +372,12 @@
     // This may be a recursive malloc call from pthread_setspecific()
     // In that case, the heap for this thread has already been created
     // and added to the linked list.  So we search for that first.
-    for (ThreadCache* h = thread_heaps_; h != NULL; h = h->next_) {
-      if (h->tid_ == me) {
-        heap = h;
-        break;
+    if (seach_condition) {
+      for (ThreadCache* h = thread_heaps_; h != NULL; h = h->next_) {
+        if (h->tid_ == me) {
+          heap = h;
+          break;
+        }
       }
     }
 
@@ -348,10 +394,13 @@
 #ifdef HAVE_TLS
     // Also keep a copy in __thread for faster retrieval
     threadlocal_data_.heap = heap;
-    SetMinSizeForSlowPath(kMaxSize + 1);
+    threadlocal_data_.fast_path_heap = heap;
 #endif
     heap->in_setspecific_ = false;
   }
+#ifdef HAVE_TLS
+  current_heap_ptr = NULL;
+#endif
   return heap;
 }
 
@@ -384,7 +433,7 @@
 #ifdef HAVE_TLS
   // Also update the copy in __thread
   threadlocal_data_.heap = NULL;
-  SetMinSizeForSlowPath(0);
+  threadlocal_data_.fast_path_heap = NULL;
 #endif
   heap->in_setspecific_ = false;
   if (GetThreadHeap() == heap) {
@@ -397,6 +446,12 @@
   DeleteCache(heap);
 }
 
+void ThreadCache::BecomeTemporarilyIdle() {
+  ThreadCache* heap = GetCacheIfPresent();
+  if (heap)
+    heap->Cleanup();
+}
+
 void ThreadCache::DestroyThreadCache(void* ptr) {
   // Note that "ptr" cannot be NULL since pthread promises not
   // to invoke the destructor on NULL values, but for safety,
@@ -405,7 +460,7 @@
 #ifdef HAVE_TLS
   // Prevent fast path of GetThreadHeap() from returning heap.
   threadlocal_data_.heap = NULL;
-  SetMinSizeForSlowPath(0);
+  threadlocal_data_.fast_path_heap = NULL;
 #endif
   DeleteCache(reinterpret_cast<ThreadCache*>(ptr));
 }
@@ -443,7 +498,7 @@
     // Increasing the total cache size should not circumvent the
     // slow-start growth of max_size_.
     if (ratio < 1.0) {
-        h->max_size_ = static_cast<size_t>(h->max_size_ * ratio);
+      h->SetMaxSize(h->max_size_ * ratio);
     }
     claimed += h->max_size_;
   }
@@ -455,7 +510,7 @@
   for (ThreadCache* h = thread_heaps_; h != NULL; h = h->next_) {
     *total_bytes += h->Size();
     if (class_count) {
-      for (int cl = 0; cl < kNumClasses; ++cl) {
+      for (int cl = 0; cl < Static::num_size_classes(); ++cl) {
         class_count[cl] += h->freelist_length(cl);
       }
     }

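Note on the recursion guard added to ThreadCache::CreateCacheIfNecessary() above: a thread-local pointer to the caller's local `heap` variable lets a re-entrant call (for example, malloc invoked from inside pthread_setspecific) return the heap that is already under construction instead of searching the list of all heaps. The following is a minimal standalone sketch of that idea, not the tcmalloc code. Heap, RegisterHeap, CreateHeapIfNecessary and g_heaps_created are hypothetical names, and the re-entry is simulated by a direct call.

```cpp
// Sketch only: a thread-local re-entrancy guard in the style of the patch.
#include <cassert>
#include <cstddef>

struct Heap { int id; };

static int g_heaps_created = 0;  // how many heaps were actually built

Heap* CreateHeapIfNecessary(bool simulate_reentry);

// Hypothetical stand-in for pthread_setspecific(): on some libcs it may
// allocate, which re-enters the creation routine on the same thread.
static void RegisterHeap(Heap* heap, bool simulate_reentry) {
  if (simulate_reentry) {
    Heap* seen = CreateHeapIfNecessary(false);  // the recursive call
    assert(seen == heap);  // the guard hands back the in-progress heap
    (void)seen;
  }
}

Heap* CreateHeapIfNecessary(bool simulate_reentry) {
  // Points at the outer call's local `heap` while creation is in progress.
  static thread_local Heap** current_heap_ptr = nullptr;

  Heap* heap = nullptr;
  if (current_heap_ptr != nullptr) {
    // Re-entered during registration: reuse the outer call's heap.
    return *current_heap_ptr;
  }
  current_heap_ptr = &heap;

  heap = new Heap{++g_heaps_created};    // caller owns the result
  RegisterHeap(heap, simulate_reentry);  // may recurse into this function

  current_heap_ptr = nullptr;  // creation finished, drop the guard
  return heap;
}
```

Storing a pointer to the local variable, rather than the heap itself, means the guard is armed before the heap exists and automatically reflects whatever the outer call has assigned by the time the recursion happens.
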
diff --git a/src/thread_cache.h b/src/thread_cache.h
index 81a020e..f8be152 100644
--- a/src/thread_cache.h
+++ b/src/thread_cache.h

@@ -43,6 +43,7 @@
 #include <stdint.h>                     // for uint32_t, uint64_t
 #endif
 #include <sys/types.h>                  // for ssize_t
+#include "base/commandlineflags.h"
 #include "common.h"
 #include "linked_list.h"
 #include "maybe_threads.h"
@@ -57,6 +58,8 @@
 #include "sampler.h"           // for Sampler
 #include "static_vars.h"       // for Static
 
+DECLARE_int64(tcmalloc_sample_parameter);
+
 namespace tcmalloc {
 
 //-------------------------------------------------------------------
@@ -71,23 +74,19 @@
   enum { have_tls = false };
 #endif
 
-  // All ThreadCache objects are kept in a linked list (for stats collection)
-  ThreadCache* next_;
-  ThreadCache* prev_;
-
   void Init(pthread_t tid);
   void Cleanup();
 
   // Accessors (mostly just for printing stats)
-  int freelist_length(size_t cl) const { return list_[cl].length(); }
+  int freelist_length(uint32 cl) const { return list_[cl].length(); }
 
   // Total byte size in cache
   size_t Size() const { return size_; }
 
   // Allocate an object of the given size and class. The size given
   // must be the same as the size of the class in the size map.
-  void* Allocate(size_t size, size_t cl);
-  void Deallocate(void* ptr, size_t size_class);
+  void* Allocate(size_t size, uint32 cl, void *(*oom_handler)(size_t size));
+  void Deallocate(void* ptr, uint32 size_class);
 
   void Scavenge();
 
@@ -97,18 +96,21 @@
   // should be sampled
   bool SampleAllocation(size_t k);
 
+  bool TryRecordAllocationFast(size_t k);
+
   static void         InitModule();
   static void         InitTSD();
   static ThreadCache* GetThreadHeap();
   static ThreadCache* GetCache();
   static ThreadCache* GetCacheIfPresent();
+  static ThreadCache* GetFastPathCache();
   static ThreadCache* GetCacheWhichMustBePresent();
   static ThreadCache* CreateCacheIfNecessary();
   static void         BecomeIdle();
-  static size_t       MinSizeForSlowPath();
-  static void         SetMinSizeForSlowPath(size_t size);
-
-  static bool IsFastPathAllowed() { return MinSizeForSlowPath() != 0; }
+  static void         BecomeTemporarilyIdle();
+  static void         SetUseEmergencyMalloc();
+  static void         ResetUseEmergencyMalloc();
+  static bool         IsUseEmergencyMalloc();
 
   // Return the number of thread heaps in use.
   static inline int HeapsInUse();
@@ -150,13 +152,16 @@
     uint16_t length_overages_;
 #endif
 
+    int32_t size_;
+
    public:
-    void Init() {
+    void Init(size_t size) {
       list_ = NULL;
       length_ = 0;
       lowater_ = 0;
       max_length_ = 1;
       length_overages_ = 0;
+      size_ = size;
     }
 
     // Return current length of list
@@ -164,6 +169,10 @@
       return length_;
     }
 
+    int32_t object_size() const {
+      return size_;
+    }
+
     // Return the maximum length of the list.
     size_t max_length() const {
       return max_length_;
@@ -193,9 +202,11 @@
     int lowwatermark() const { return lowater_; }
     void clear_lowwatermark() { lowater_ = length_; }
 
-    void Push(void* ptr) {
+    uint32_t Push(void* ptr) {
+      uint32_t length = length_ + 1;
       SLL_Push(&list_, ptr);
-      length_++;
+      length_ = length;
+      return length;
     }
 
     void* Pop() {
@@ -205,6 +216,15 @@
       return SLL_Pop(&list_);
     }
 
+    bool TryPop(void **rv) {
+      if (SLL_TryPop(&list_, rv)) {
+        length_--;
+        if (PREDICT_FALSE(length_ < lowater_)) lowater_ = length_;
+        return true;
+      }
+      return false;
+    }
+
     void* Next() {
       return SLL_Next(&list_);
     }
@@ -224,14 +244,19 @@
 
   // Gets and returns an object from the central cache, and, if possible,
   // also adds some objects of that size class to this thread cache.
-  void* FetchFromCentralCache(size_t cl, size_t byte_size);
+  void* FetchFromCentralCache(uint32 cl, int32_t byte_size,
+                              void *(*oom_handler)(size_t size));
+
+  void ListTooLong(void* ptr, uint32 cl);
 
   // Releases some number of items from src.  Adjusts the list's max_length
   // to eventually converge on num_objects_to_move(cl).
-  void ListTooLong(FreeList* src, size_t cl);
+  void ListTooLong(FreeList* src, uint32 cl);
 
   // Releases N items from this thread cache.
-  void ReleaseToCentralCache(FreeList* src, size_t cl, int N);
+  void ReleaseToCentralCache(FreeList* src, uint32 cl, int N);
+
+  void SetMaxSize(int32 new_max_size);
 
   // Increase max_size_ by reducing unclaimed_cache_space_ or by
   // reducing the max_size_ of some other thread.  In both cases,
@@ -252,24 +277,15 @@
   // Since we don't really use dlopen in google code -- and using dlopen
   // on a malloc replacement is asking for trouble in any case -- that's
   // a good tradeoff for us.
-#ifdef HAVE___ATTRIBUTE__
-#define ATTR_INITIAL_EXEC __attribute__ ((tls_model ("initial-exec")))
-#else
-#define ATTR_INITIAL_EXEC
-#endif
-
 #ifdef HAVE_TLS
   struct ThreadLocalData {
+    ThreadCache* fast_path_heap;
     ThreadCache* heap;
-    // min_size_for_slow_path is 0 if heap is NULL or kMaxSize + 1 otherwise.
-    // The latter is the common case and allows allocation to be faster
-    // than it would be otherwise: typically a single branch will
-    // determine that the requested allocation is no more than kMaxSize
-    // and we can then proceed, knowing that global and thread-local tcmalloc
-    // state is initialized.
-    size_t min_size_for_slow_path;
+    bool use_emergency_malloc;
   };
-  static __thread ThreadLocalData threadlocal_data_ ATTR_INITIAL_EXEC;
+  static __thread ThreadLocalData threadlocal_data_
+    CACHELINE_ALIGNED ATTR_INITIAL_EXEC;
+
 #endif
 
   // Thread-specific key.  Initialization here is somewhat tricky
@@ -277,7 +293,7 @@
   // is in a good enough state to handle pthread_keycreate().
   // Therefore, we use TSD keys only after tsd_inited is set to true.
   // Until then, we use a slow path to get the heap object.
-  static bool tsd_inited_;
+  static ATTRIBUTE_HIDDEN bool tsd_inited_;
   static pthread_key_t heap_key_;
 
   // Linked list of heap objects.  Protected by Static::pageheap_lock.
@@ -306,14 +322,14 @@
   // This class is laid out with the most frequently used fields
   // first so that hot elements are placed on the same cache line.
 
-  size_t        size_;                  // Combined size of data
-  size_t        max_size_;              // size_ > max_size_ --> Scavenge()
+  FreeList      list_[kClassSizesMax];     // Array indexed by size-class
+
+  int32         size_;                     // Combined size of data
+  int32         max_size_;                 // size_ > max_size_ --> Scavenge()
 
   // We sample allocations, biased by the size of the allocation
   Sampler       sampler_;               // A sampler
 
-  FreeList      list_[kNumClasses];     // Array indexed by size-class
-
   pthread_t     tid_;                   // Which thread owns it
   bool          in_setspecific_;        // In call to pthread_setspecific?
 
@@ -326,6 +342,12 @@
   static void DeleteCache(ThreadCache* heap);
   static void RecomputePerThreadCacheSize();
 
+public:
+
+  // All ThreadCache objects are kept in a linked list (for stats collection)
+  ThreadCache* next_;
+  ThreadCache* prev_;
+
   // Ensure that this class is cacheline-aligned. This is critical for
   // performance, as false sharing would negate many of the benefits
   // of a per-thread cache.
@@ -341,44 +363,45 @@
   return threadcache_allocator.inuse();
 }
 
-inline bool ThreadCache::SampleAllocation(size_t k) {
-  return sampler_.SampleAllocation(k);
-}
-
-inline void* ThreadCache::Allocate(size_t size, size_t cl) {
-  ASSERT(size <= kMaxSize);
-  ASSERT(size == Static::sizemap()->ByteSizeForClass(cl));
-
+inline ATTRIBUTE_ALWAYS_INLINE void* ThreadCache::Allocate(
+  size_t size, uint32 cl, void *(*oom_handler)(size_t size)) {
   FreeList* list = &list_[cl];
-  if (UNLIKELY(list->empty())) {
-    return FetchFromCentralCache(cl, size);
+
+#ifdef NO_TCMALLOC_SAMPLES
+  size = list->object_size();
+#endif
+
+  ASSERT(size <= kMaxSize);
+  ASSERT(size != 0);
+  ASSERT(size == 0 || size == Static::sizemap()->ByteSizeForClass(cl));
+
+  void* rv;
+  if (!list->TryPop(&rv)) {
+    return FetchFromCentralCache(cl, size, oom_handler);
   }
   size_ -= size;
-  return list->Pop();
+  return rv;
 }
 
-inline void ThreadCache::Deallocate(void* ptr, size_t cl) {
+inline ATTRIBUTE_ALWAYS_INLINE void ThreadCache::Deallocate(void* ptr, uint32 cl) {
+  ASSERT(list_[cl].max_length() > 0);
   FreeList* list = &list_[cl];
-  size_ += Static::sizemap()->ByteSizeForClass(cl);
-  ssize_t size_headroom = max_size_ - size_ - 1;
 
   // This catches back-to-back frees of allocs in the same size
   // class. A more comprehensive (and expensive) test would be to walk
   // the entire freelist. But this might be enough to find some bugs.
   ASSERT(ptr != list->Next());
 
-  list->Push(ptr);
-  ssize_t list_headroom =
-      static_cast<ssize_t>(list->max_length()) - list->length();
+  uint32_t length = list->Push(ptr);
 
-  // There are two relatively uncommon things that require further work.
-  // In the common case we're done, and in that case we need a single branch
-  // because of the bitwise-or trick that follows.
-  if (UNLIKELY((list_headroom | size_headroom) < 0)) {
-    if (list_headroom < 0) {
-      ListTooLong(list, cl);
-    }
-    if (size_ >= max_size_) Scavenge();
+  if (PREDICT_FALSE(length > list->max_length())) {
+    ListTooLong(list, cl);
+    return;
+  }
+
+  size_ += list->object_size();
+  if (PREDICT_FALSE(size_ > max_size_)) {
+    Scavenge();
   }
 }
 
@@ -403,12 +426,14 @@
 }
 
 inline ThreadCache* ThreadCache::GetCache() {
+#ifdef HAVE_TLS
+  ThreadCache* ptr = GetThreadHeap();
+#else
   ThreadCache* ptr = NULL;
-  if (!tsd_inited_) {
-    InitModule();
-  } else {
+  if (PREDICT_TRUE(tsd_inited_)) {
     ptr = GetThreadHeap();
   }
+#endif
   if (ptr == NULL) ptr = CreateCacheIfNecessary();
   return ptr;
 }
@@ -417,24 +442,69 @@
 // because we may be in the thread destruction code and may have
 // already cleaned up the cache for this thread.
 inline ThreadCache* ThreadCache::GetCacheIfPresent() {
-  if (!tsd_inited_) return NULL;
+#ifndef HAVE_TLS
+  if (PREDICT_FALSE(!tsd_inited_)) return NULL;
+#endif
   return GetThreadHeap();
 }
 
-inline size_t ThreadCache::MinSizeForSlowPath() {
-#ifdef HAVE_TLS
-  return threadlocal_data_.min_size_for_slow_path;
+inline ThreadCache* ThreadCache::GetFastPathCache() {
+#ifndef HAVE_TLS
+  return GetCacheIfPresent();
 #else
-  return 0;
+  return threadlocal_data_.fast_path_heap;
 #endif
 }
 
-inline void ThreadCache::SetMinSizeForSlowPath(size_t size) {
+inline void ThreadCache::SetUseEmergencyMalloc() {
 #ifdef HAVE_TLS
-  threadlocal_data_.min_size_for_slow_path = size;
+  threadlocal_data_.fast_path_heap = NULL;
+  threadlocal_data_.use_emergency_malloc = true;
 #endif
 }
 
+inline void ThreadCache::ResetUseEmergencyMalloc() {
+#ifdef HAVE_TLS
+  ThreadCache *heap = threadlocal_data_.heap;
+  threadlocal_data_.fast_path_heap = heap;
+  threadlocal_data_.use_emergency_malloc = false;
+#endif
+}
+
+inline bool ThreadCache::IsUseEmergencyMalloc() {
+#if defined(HAVE_TLS) && defined(ENABLE_EMERGENCY_MALLOC)
+  return PREDICT_FALSE(threadlocal_data_.use_emergency_malloc);
+#else
+  return false;
+#endif
+}
+
+inline void ThreadCache::SetMaxSize(int32 new_max_size) {
+  max_size_ = new_max_size;
+}
+
+#ifndef NO_TCMALLOC_SAMPLES
+
+inline bool ThreadCache::SampleAllocation(size_t k) {
+  return !sampler_.RecordAllocation(k);
+}
+
+inline bool ThreadCache::TryRecordAllocationFast(size_t k) {
+  return sampler_.TryRecordAllocationFast(k);
+}
+
+#else
+
+inline bool ThreadCache::SampleAllocation(size_t k) {
+  return false;
+}
+
+inline bool ThreadCache::TryRecordAllocationFast(size_t k) {
+  return true;
+}
+
+#endif
+
 }  // namespace tcmalloc
 
 #endif  // TCMALLOC_THREAD_CACHE_H_
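Note on the thread_cache.h hunks above: the reworked fast path has Push() return the new list length and TryPop() combine the emptiness check with the pop, so Deallocate() can decide between ListTooLong() and Scavenge() with one comparison each. The sketch below illustrates that control flow only. It assumes hypothetical stand-ins (MiniFreeList, DeallocateSketch, FreeResult) that are not part of gperftools, and it reports which slow path would be taken instead of calling ListTooLong() or Scavenge().

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the patched FreeList; not the gperftools class.
class MiniFreeList {
 public:
  void Init(int32_t object_size) {
    head_ = nullptr;
    length_ = 0;
    lowater_ = 0;
    max_length_ = 1;
    size_ = object_size;
  }

  // Push returns the new length, so the caller does not re-read length().
  uint32_t Push(void* ptr) {
    uint32_t length = length_ + 1;
    *static_cast<void**>(ptr) = head_;  // next pointer lives inside the object
    head_ = ptr;
    length_ = length;
    return length;
  }

  // TryPop reports an empty list through its return value.
  bool TryPop(void** rv) {
    if (head_ == nullptr) return false;
    *rv = head_;
    head_ = *static_cast<void**>(head_);
    length_--;
    if (length_ < lowater_) lowater_ = length_;
    return true;
  }

  uint32_t max_length() const { return max_length_; }
  int32_t object_size() const { return size_; }

 private:
  void* head_;
  uint32_t length_;
  uint32_t lowater_;
  uint32_t max_length_;
  int32_t size_;
};

// Hypothetical result codes naming the path the real code would take.
enum class FreeResult { kCached, kListTooLong, kCacheTooBig };

// Mirrors the order of checks in the patched Deallocate(): the length check
// comes first and returns before the per-thread byte count is updated.
FreeResult DeallocateSketch(MiniFreeList* list, void* ptr,
                            int32_t* cache_size, int32_t max_cache_size) {
  uint32_t length = list->Push(ptr);
  if (length > list->max_length()) {
    return FreeResult::kListTooLong;  // real code: ListTooLong(list, cl)
  }
  *cache_size += list->object_size();
  if (*cache_size > max_cache_size) {
    return FreeResult::kCacheTooBig;  // real code: Scavenge()
  }
  return FreeResult::kCached;
}
```

With max_length_ starting at 1, as in Init() above, the first free of a size class is cached and the second one takes the ListTooLong path, leaving the byte count untouched.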

diff --git a/src/windows/CMakeLists.txt b/src/windows/CMakeLists.txt
new file mode 100644
index 0000000..2e83497
--- /dev/null
+++ b/src/windows/CMakeLists.txt

@@ -0,0 +1,10 @@
+add_executable(addr2line-pdb addr2line-pdb.c)
+target_link_libraries(addr2line-pdb dbghelp)
+
+add_executable(nm-pdb nm-pdb.c)
+target_link_libraries(nm-pdb dbghelp)
+
+#enable_language(ASM)
+#add_executable(preamble_patcher_test preamble_patcher_test.cc shortproc.asm)
+#target_link_libraries(preamble_patcher_test tcmalloc_minimal)
+#add_test(preamble_patcher_test preamble_patcher_test)
\ No newline at end of file

diff --git a/src/windows/addr2line-pdb.c b/src/windows/addr2line-pdb.c
index 5c65a03..88d207b 100644
--- a/src/windows/addr2line-pdb.c
+++ b/src/windows/addr2line-pdb.c

@@ -1,10 +1,11 @@
+/* -*- Mode: c; c-basic-offset: 2; indent-tabs-mode: nil -*- */
 /* Copyright (c) 2008, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -14,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -35,9 +36,17 @@
  * c:\websymbols without asking.
  */
 
+#ifndef WIN32_LEAN_AND_MEAN
 #define WIN32_LEAN_AND_MEAN
+#endif
+
+#ifndef _CRT_SECURE_NO_WARNINGS
 #define _CRT_SECURE_NO_WARNINGS
+#endif
+
+#ifndef _CRT_SECURE_NO_DEPRECATE
 #define _CRT_SECURE_NO_DEPRECATE
+#endif
 
 #include <stdio.h>
 #include <stdlib.h>
@@ -49,8 +58,8 @@
 #define WEBSYM "SRV*c:\\websymbols*http://msdl.microsoft.com/download/symbols"
 
 void usage() {
-  fprintf(stderr, "usage: "
-          "addr2line-pdb [-f|--functions] [-C|--demangle] [-e filename]\n");
+  fprintf(stderr, "usage: addr2line-pdb "
+          "[-f|--functions] [-C|--demangle] [-e|--exe filename]\n");
   fprintf(stderr, "(Then list the hex addresses on stdin, one per line)\n");
 }
 
@@ -73,7 +82,8 @@
     } else if (strcmp(argv[i], "--demangle") == 0 ||
                strcmp(argv[i], "-C") == 0) {
       symopts |= SYMOPT_UNDNAME;
-    } else if (strcmp(argv[i], "-e") == 0) {
+    } else if (strcmp(argv[i], "--exe") == 0 ||
+               strcmp(argv[i], "-e") == 0) {
       if (i + 1 >= argc) {
         fprintf(stderr, "FATAL ERROR: -e must be followed by a filename\n");
         return 1;
@@ -93,7 +103,7 @@
 
   if (!SymInitialize(process, NULL, FALSE)) {
     error = GetLastError();
-    fprintf(stderr, "SymInitialize returned error : %d\n", error);
+    fprintf(stderr, "SymInitialize returned error : %lu\n", error);
     return 1;
   }
 
@@ -107,13 +117,13 @@
     strcat(search, ";" WEBSYM);
   } else {
     error = GetLastError();
-    fprintf(stderr, "SymGetSearchPath returned error : %d\n", error);
+    fprintf(stderr, "SymGetSearchPath returned error : %lu\n", error);
     rv = 1;                   /* An error, but not a fatal one */
     strcpy(search, WEBSYM);   /* Use a default value */
   }
   if (!SymSetSearchPath(process, search)) {
     error = GetLastError();
-    fprintf(stderr, "SymSetSearchPath returned error : %d\n", error);
+    fprintf(stderr, "SymSetSearchPath returned error : %lu\n", error);
     rv = 1;                   /* An error, but not a fatal one */
   }
 
@@ -122,7 +132,7 @@
   if (!module_base) {
     /* SymLoadModuleEx failed */
     error = GetLastError();
-    fprintf(stderr, "SymLoadModuleEx returned error : %d for %s\n",
+    fprintf(stderr, "SymLoadModuleEx returned error : %lu for %s\n",
             error, filename);
     SymCleanup(process);
     return 1;
@@ -133,25 +143,35 @@
     /* GNU addr2line seems to just do a strtol and ignore any
      * weird characters it gets, so we will too.
      */
-    unsigned __int64 addr = _strtoui64(buf, NULL, 16);
+    unsigned __int64 reladdr = _strtoui64(buf, NULL, 16);
     ULONG64 buffer[(sizeof(SYMBOL_INFO) +
                     MAX_SYM_NAME*sizeof(TCHAR) +
                     sizeof(ULONG64) - 1)
                    / sizeof(ULONG64)];
+    memset(buffer, 0, sizeof(buffer));
     PSYMBOL_INFO pSymbol = (PSYMBOL_INFO)buffer;
     IMAGEHLP_LINE64 line;
     DWORD dummy;
+
+    // Just ignore overflow. In an overflow scenario, the resulting address
+    // will be lower than module_base which hasn't been mapped by any prior
+    // SymLoadModuleEx() command. This will cause SymFromAddr() and
+    // SymGetLineFromAddr64() both to return failures and print the correct
+    // ?? and ??:0 message variant.
+    ULONG64 absaddr = reladdr + module_base;
+
     pSymbol->SizeOfStruct = sizeof(SYMBOL_INFO);
-    pSymbol->MaxNameLen = MAX_SYM_NAME;
+    // MaxNameLen does not include the null-terminating character.
+    pSymbol->MaxNameLen = MAX_SYM_NAME - 1;
     if (print_function_name) {
-      if (SymFromAddr(process, (DWORD64)addr, NULL, pSymbol)) {
+      if (SymFromAddr(process, (DWORD64)absaddr, NULL, pSymbol)) {
         printf("%s\n", pSymbol->Name);
       } else {
         printf("??\n");
       }
     }
     line.SizeOfStruct = sizeof(IMAGEHLP_LINE64);
-    if (SymGetLineFromAddr64(process, (DWORD64)addr, &dummy, &line)) {
+    if (SymGetLineFromAddr64(process, (DWORD64)absaddr, &dummy, &line)) {
       printf("%s:%d\n", line.FileName, (int)line.LineNumber);
     } else {
       printf("??:0\n");
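Note on the addr2line-pdb.c hunk above: input addresses are now treated as offsets relative to the loaded module and rebased with `reladdr + module_base`, relying on unsigned wraparound so that an overflowing input lands below module_base and the symbol lookups fail cleanly. The sketch below shows only that arithmetic. It assumes a hypothetical helper named RebaseAddress and uses the portable std::strtoull in place of _strtoui64; no DbgHelp calls are involved.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: parse a hex offset and rebase it onto module_base.
// Unsigned addition wraps modulo 2^64, which is well defined in C and C++.
uint64_t RebaseAddress(const char* hex_text, uint64_t module_base) {
  uint64_t reladdr = std::strtoull(hex_text, nullptr, 16);
  return reladdr + module_base;
}
```

A wrapped result is smaller than module_base, which is the condition the patch comment relies on to get the `??` and `??:0` output. Text with no valid hex digits parses as 0 and therefore maps to module_base itself.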

diff --git a/src/windows/config.h b/src/windows/config.h
index 9976457..bd520e4 100644
--- a/src/windows/config.h
+++ b/src/windows/config.h

@@ -1,3 +1,4 @@
+/* -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*- */
 /* A manual version of config.h fit for windows machines.
  *
  * Use of this source code is governed by a BSD-style license that can
@@ -15,230 +16,237 @@
 
 #ifndef GOOGLE_PERFTOOLS_WINDOWS_CONFIG_H_
 #define GOOGLE_PERFTOOLS_WINDOWS_CONFIG_H_
+/* used by tcmalloc.h */
+#define GPERFTOOLS_CONFIG_H_
 
-/* define this if you are linking tcmalloc statically and overriding the
- * default allocators.
- * For instructions on how to use this mode, see
- * http://groups.google.com/group/google-perftools/browse_thread/thread/41cd3710af85e57b
- */
-#undef WIN32_OVERRIDE_ALLOCATORS
+/* Enable aggressive decommit by default */
+/* #undef ENABLE_AGGRESSIVE_DECOMMIT_BY_DEFAULT */
 
-/* Define to 1 if your libc has a snprintf implementation */
-#undef HAVE_SNPRINTF
+/* Build new/delete operators for overaligned types */
+/* #undef ENABLE_ALIGNED_NEW_DELETE */
 
-/* Define to 1 if compiler supports __builtin_stack_pointer */
-#undef HAVE_BUILTIN_STACK_POINTER
+/* Build runtime detection for sized delete */
+/* #undef ENABLE_DYNAMIC_SIZED_DELETE */
 
-/* Define to 1 if you have the <conflict-signal.h> header file. */
-#undef HAVE_CONFLICT_SIGNAL_H
+/* Report large allocation */
+/* #undef ENABLE_LARGE_ALLOC_REPORT */
+
+/* Build sized deletion operators */
+/* #undef ENABLE_SIZED_DELETE */
+
+/* Define to 1 if you have the <asm/ptrace.h> header file. */
+/* #undef HAVE_ASM_PTRACE_H */
 
 /* Define to 1 if you have the <cygwin/signal.h> header file. */
-#undef HAVE_CYGWIN_SIGNAL_H
+/* #undef HAVE_CYGWIN_SIGNAL_H */
+
+/* Define to 1 if you have the declaration of `backtrace', and to 0 if you
+   don't. */
+/* #undef HAVE_DECL_BACKTRACE */
 
 /* Define to 1 if you have the declaration of `cfree', and to 0 if you don't.
    */
-#undef HAVE_DECL_CFREE
+#define HAVE_DECL_CFREE 0
 
 /* Define to 1 if you have the declaration of `memalign', and to 0 if you
    don't. */
-#undef HAVE_DECL_MEMALIGN
+#define HAVE_DECL_MEMALIGN 0
+
+/* Define to 1 if you have the declaration of `nanosleep', and to 0 if you
+   don't. */
+#define HAVE_DECL_NANOSLEEP 0
 
 /* Define to 1 if you have the declaration of `posix_memalign', and to 0 if
    you don't. */
-#undef HAVE_DECL_POSIX_MEMALIGN
+#define HAVE_DECL_POSIX_MEMALIGN 0
 
 /* Define to 1 if you have the declaration of `pvalloc', and to 0 if you
    don't. */
-#undef HAVE_DECL_PVALLOC
+#define HAVE_DECL_PVALLOC 0
 
-/* Define to 1 if you have the declaration of `uname', and to 0 if you don't.
+/* Define to 1 if you have the declaration of `sleep', and to 0 if you don't.
    */
-#undef HAVE_DECL_UNAME
+#define HAVE_DECL_SLEEP 0
 
 /* Define to 1 if you have the declaration of `valloc', and to 0 if you don't.
    */
-#undef HAVE_DECL_VALLOC
+#define HAVE_DECL_VALLOC 0
 
 /* Define to 1 if you have the <dlfcn.h> header file. */
-#undef HAVE_DLFCN_H
+/* #undef HAVE_DLFCN_H */
 
 /* Define to 1 if the system has the type `Elf32_Versym'. */
-#undef HAVE_ELF32_VERSYM
+/* #undef HAVE_ELF32_VERSYM */
 
 /* Define to 1 if you have the <execinfo.h> header file. */
-#undef HAVE_EXECINFO_H
+/* #undef HAVE_EXECINFO_H */
 
 /* Define to 1 if you have the <fcntl.h> header file. */
-#undef HAVE_FCNTL_H
+#define HAVE_FCNTL_H 1
 
 /* Define to 1 if you have the <features.h> header file. */
-#undef HAVE_FEATURES_H
+/* #undef HAVE_FEATURES_H */
+
+/* Define to 1 if you have the `fork' function. */
+/* #undef HAVE_FORK */
 
 /* Define to 1 if you have the `geteuid' function. */
-#undef HAVE_GETEUID
-
-/* Define to 1 if you have the `getpagesize' function. */
-#define HAVE_GETPAGESIZE 1   /* we define it in windows/port.cc */
+/* #undef HAVE_GETEUID */
 
 /* Define to 1 if you have the <glob.h> header file. */
-#undef HAVE_GLOB_H
+/* #undef HAVE_GLOB_H */
 
 /* Define to 1 if you have the <grp.h> header file. */
-#undef HAVE_GRP_H
+/* #undef HAVE_GRP_H */
 
 /* Define to 1 if you have the <inttypes.h> header file. */
-#undef HAVE_INTTYPES_H
+#if defined(_MSC_VER) && _MSC_VER >= 1900
+#define HAVE_INTTYPES_H 1
+#endif
 
 /* Define to 1 if you have the <libunwind.h> header file. */
-#undef HAVE_LIBUNWIND_H
+/* #undef HAVE_LIBUNWIND_H */
 
 /* Define to 1 if you have the <linux/ptrace.h> header file. */
-#undef HAVE_LINUX_PTRACE_H
+/* #undef HAVE_LINUX_PTRACE_H */
+
+/* Define if this is Linux that has SIGEV_THREAD_ID */
+/* #undef HAVE_LINUX_SIGEV_THREAD_ID */
 
 /* Define to 1 if you have the <malloc.h> header file. */
 #define HAVE_MALLOC_H 1
 
-/* Define to 1 if you have the <malloc/malloc.h> header file. */
-#undef HAVE_MALLOC_MALLOC_H
-
 /* Define to 1 if you have the <memory.h> header file. */
-#undef HAVE_MEMORY_H
+#define HAVE_MEMORY_H 1
 
 /* Define to 1 if you have a working `mmap' system call. */
-#undef HAVE_MMAP
-
-/* define if the compiler implements namespaces */
-#define HAVE_NAMESPACES 1
+/* #undef HAVE_MMAP */
 
 /* Define to 1 if you have the <poll.h> header file. */
-#undef HAVE_POLL_H
+/* #undef HAVE_POLL_H */
 
 /* define if libc has program_invocation_name */
-#undef HAVE_PROGRAM_INVOCATION_NAME
+/* #undef HAVE_PROGRAM_INVOCATION_NAME */
 
 /* Define if you have POSIX threads libraries and header files. */
-#undef HAVE_PTHREAD
+/* #undef HAVE_PTHREAD */
+
+/* Defined to 1 if pthread symbols are exposed even without including
+   pthread.h */
+/* #undef HAVE_PTHREAD_DESPITE_ASKING_FOR */
 
 /* Define to 1 if you have the <pwd.h> header file. */
-#undef HAVE_PWD_H
+/* #undef HAVE_PWD_H */
 
 /* Define to 1 if you have the `sbrk' function. */
-#undef HAVE_SBRK
+/* #undef HAVE_SBRK */
 
 /* Define to 1 if you have the <sched.h> header file. */
-#undef HAVE_SCHED_H
+/* #undef HAVE_SCHED_H */
 
 /* Define to 1 if you have the <stdint.h> header file. */
-#undef HAVE_STDINT_H
+#define HAVE_STDINT_H 1
 
 /* Define to 1 if you have the <stdlib.h> header file. */
 #define HAVE_STDLIB_H 1
 
 /* Define to 1 if you have the <strings.h> header file. */
-#undef HAVE_STRINGS_H
+/* #undef HAVE_STRINGS_H */
 
 /* Define to 1 if you have the <string.h> header file. */
 #define HAVE_STRING_H 1
 
 /* Define to 1 if the system has the type `struct mallinfo'. */
-#undef HAVE_STRUCT_MALLINFO
+/* #undef HAVE_STRUCT_MALLINFO */
 
 /* Define to 1 if you have the <sys/cdefs.h> header file. */
-#undef HAVE_SYS_CDEFS_H
-
-/* Define to 1 if you have the <sys/malloc.h> header file. */
-#undef HAVE_SYS_MALLOC_H
-
-/* Define to 1 if you have the <sys/param.h> header file. */
-#undef HAVE_SYS_PARAM_H
+/* #undef HAVE_SYS_CDEFS_H */
 
 /* Define to 1 if you have the <sys/prctl.h> header file. */
-#undef HAVE_SYS_PRCTL_H
+/* #undef HAVE_SYS_PRCTL_H */
 
 /* Define to 1 if you have the <sys/resource.h> header file. */
-#undef HAVE_SYS_RESOURCE_H
+/* #undef HAVE_SYS_RESOURCE_H */
 
 /* Define to 1 if you have the <sys/socket.h> header file. */
-#undef HAVE_SYS_SOCKET_H
+/* #undef HAVE_SYS_SOCKET_H */
 
 /* Define to 1 if you have the <sys/stat.h> header file. */
 #define HAVE_SYS_STAT_H 1
 
 /* Define to 1 if you have the <sys/syscall.h> header file. */
-#undef HAVE_SYS_SYSCALL_H
+/* #undef HAVE_SYS_SYSCALL_H */
 
 /* Define to 1 if you have the <sys/types.h> header file. */
 #define HAVE_SYS_TYPES_H 1
 
-/* <sys/ucontext.h> is broken on redhat 7 */
-#undef HAVE_SYS_UCONTEXT_H
+/* Define to 1 if you have the <sys/ucontext.h> header file. */
+/* #undef HAVE_SYS_UCONTEXT_H */
 
 /* Define to 1 if you have the <sys/wait.h> header file. */
-#undef HAVE_SYS_WAIT_H
+/* #undef HAVE_SYS_WAIT_H */
 
 /* Define to 1 if compiler supports __thread */
 #define HAVE_TLS 1
 
 /* Define to 1 if you have the <ucontext.h> header file. */
-#undef HAVE_UCONTEXT_H
+/* #undef HAVE_UCONTEXT_H */
 
 /* Define to 1 if you have the <unistd.h> header file. */
-#undef HAVE_UNISTD_H
+/* #undef HAVE_UNISTD_H */
+
+/* Whether <unwind.h> contains _Unwind_Backtrace */
+/* #undef HAVE_UNWIND_BACKTRACE */
 
 /* Define to 1 if you have the <unwind.h> header file. */
-#undef HAVE_UNWIND_H
-
-/* Define to 1 if you have the <valgrind.h> header file. */
-#undef HAVE_VALGRIND_H
+/* #undef HAVE_UNWIND_H */
 
 /* define if your compiler has __attribute__ */
-#undef HAVE___ATTRIBUTE__
+/* #undef HAVE___ATTRIBUTE__ */
+
+/* define if your compiler supports alignment of functions */
+/* #undef HAVE___ATTRIBUTE__ALIGNED_FN */
 
 /* Define to 1 if compiler supports __environ */
-#undef HAVE___ENVIRON
+/* #undef HAVE___ENVIRON */
 
-/* Define to 1 if the system has the type `__int64'. */
-#define HAVE___INT64 1
+/* Define to 1 if you have the `__sbrk' function. */
+/* #undef HAVE___SBRK */
 
 /* prefix where we look for installed files */
-#undef INSTALL_PREFIX
+/* #undef INSTALL_PREFIX */
 
 /* Define to 1 if int32_t is equivalent to intptr_t */
-#undef INT32_EQUALS_INTPTR
+#ifndef _WIN64
+#define INT32_EQUALS_INTPTR 1
+#endif
 
-/* Define to the sub-directory in which libtool stores uninstalled libraries.
-   */
-#undef LT_OBJDIR
-
-/* Define to 'volatile' if __malloc_hook is declared volatile */
-#undef MALLOC_HOOK_MAYBE_VOLATILE
-
-/* Define to 1 if your C compiler doesn't accept -c and -o together. */
-#undef NO_MINUS_C_MINUS_O
+/* Define to the sub-directory where libtool stores uninstalled libraries. */
+/* #undef LT_OBJDIR */
 
 /* Name of package */
 #define PACKAGE "gperftools"
 
 /* Define to the address where bug reports for this package should be sent. */
-#define PACKAGE_BUGREPORT "opensource@google.com"
+#define PACKAGE_BUGREPORT "gperftools@googlegroups.com"
 
 /* Define to the full name of this package. */
 #define PACKAGE_NAME "gperftools"
 
 /* Define to the full name and version of this package. */
-#define PACKAGE_STRING "gperftools 2.4"
+#define PACKAGE_STRING "gperftools 2.9.1"
 
 /* Define to the one symbol short name of this package. */
 #define PACKAGE_TARNAME "gperftools"
 
 /* Define to the home page for this package. */
-#undef PACKAGE_URL
+#define PACKAGE_URL ""
 
 /* Define to the version of this package. */
-#define PACKAGE_VERSION "2.4"
+#define PACKAGE_VERSION "2.9.1"
 
 /* How to access the PC from a struct ucontext */
-#undef PC_FROM_UCONTEXT
+/* #undef PC_FROM_UCONTEXT */
 
 /* Always the empty-string on non-windows systems. On windows, should be
    "__declspec(dllexport)". This way, when we compile the dll, we export our
@@ -246,20 +254,11 @@
    used internally, to compile the DLL, and every DLL source file #includes
    "config.h" before anything else. */
 #ifndef PERFTOOLS_DLL_DECL
-# define PERFTOOLS_IS_A_DLL  1   /* not set if you're statically linking */
-# define PERFTOOLS_DLL_DECL  __declspec(dllexport)
-# define PERFTOOLS_DLL_DECL_FOR_UNITTESTS  __declspec(dllimport)
+# define PERFTOOLS_IS_A_DLL 1   /* not set if you're statically linking */
+# define PERFTOOLS_DLL_DECL __declspec(dllexport)
+# define PERFTOOLS_DLL_DECL_FOR_UNITTESTS __declspec(dllimport)
 #endif
 
-/* printf format code for printing a size_t and ssize_t */
-#define PRIdS  "Id"
-
-/* printf format code for printing a size_t and ssize_t */
-#define PRIuS  "Iu"
-
-/* printf format code for printing a size_t and ssize_t */
-#define PRIxS  "Ix"
-
 /* Mark the systems where we know it's bad if pthreads runs too
    early before main (before threads are initialized, presumably).  */
 #ifdef __FreeBSD__
@@ -268,28 +267,25 @@
 
 /* Define to necessary symbol if this constant uses a non-standard name on
    your system. */
-#undef PTHREAD_CREATE_JOINABLE
+/* #undef PTHREAD_CREATE_JOINABLE */
 
 /* Define to 1 if you have the ANSI C header files. */
 #define STDC_HEADERS 1
 
-/* the namespace where STL code like vector<> is defined */
-#define STL_NAMESPACE  std
+/* Define 8 bytes of allocation alignment for tcmalloc */
+/* #undef TCMALLOC_ALIGN_8BYTES */
+
+/* Define internal page size for tcmalloc as the number of bits to left-shift */
+/* #undef TCMALLOC_PAGE_SIZE_SHIFT */
 
 /* Version number of package */
-#undef VERSION
+#define VERSION "2.9.1"
 
 /* C99 says: define this to get the PRI... macros from stdint.h */
 #ifndef __STDC_FORMAT_MACROS
 # define __STDC_FORMAT_MACROS 1
 #endif
 
-/* Define to `__inline__' or `__inline' if that's what the C compiler
-   calls it, or to nothing if 'inline' is not supported under any name.  */
-#ifndef __cplusplus
-#undef inline
-#endif
-
 // ---------------------------------------------------------------------
 // Extra stuff not found in config.h.in
 

diff --git a/src/windows/get_mangled_names.cc b/src/windows/get_mangled_names.cc
index 08bd03b..fd6424b 100644
--- a/src/windows/get_mangled_names.cc
+++ b/src/windows/get_mangled_names.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2008, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -27,7 +27,7 @@
 // THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 // (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 // OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-// 
+//
 // ---
 // Author: Craig Silverstein (opensource@google.com)
 

diff --git a/src/windows/google/tcmalloc.h b/src/windows/google/tcmalloc.h
index c7db631..075482e 100644
--- a/src/windows/google/tcmalloc.h
+++ b/src/windows/google/tcmalloc.h

@@ -1,10 +1,10 @@
 /* Copyright (c) 2003, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -14,7 +14,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR

diff --git a/src/windows/gperftools/tcmalloc.h b/src/windows/gperftools/tcmalloc.h
index 9ba79a9..5116b29 100644
--- a/src/windows/gperftools/tcmalloc.h
+++ b/src/windows/gperftools/tcmalloc.h

@@ -1,11 +1,11 @@
-// -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
+// -*- Mode: C; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2003, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -30,33 +30,39 @@
  *
  * ---
  * Author: Sanjay Ghemawat <opensource@google.com>
- *         .h.in file by Craig Silverstein <opensource@google.com>
+ *         .h file by Craig Silverstein <opensource@google.com>
  */
 
 #ifndef TCMALLOC_TCMALLOC_H_
 #define TCMALLOC_TCMALLOC_H_
 
-#include <stddef.h>                     // for size_t
-#ifdef HAVE_SYS_CDEFS_H
-#include <sys/cdefs.h>   // where glibc defines __THROW
+#include <stddef.h>                     /* for size_t */
+#ifdef __cplusplus
+#include <new>                          /* for std::nothrow_t, std::align_val_t */
 #endif
 
-// __THROW is defined in glibc systems.  It means, counter-intuitively,
-// "This function will never throw an exception."  It's an optional
-// optimization tool, but we may need to use it to match glibc prototypes.
-#ifndef __THROW    /* I guess we're not on a glibc system */
-# define __THROW   /* __THROW is just an optimization, so ok to make it "" */
-#endif
-
-// Define the version number so folks can check against it
+/* Define the version number so folks can check against it */
 #define TC_VERSION_MAJOR  2
-#define TC_VERSION_MINOR  4
-#define TC_VERSION_PATCH  ""
-#define TC_VERSION_STRING "gperftools 2.4"
+#define TC_VERSION_MINOR  9
+#define TC_VERSION_PATCH  ".1"
+#define TC_VERSION_STRING "gperftools 2.9.1"
 
-#include <stdlib.h>   // for struct mallinfo, if it's defined
+#ifndef PERFTOOLS_NOTHROW
 
-// Annoying stuff for windows -- makes sure clients can import these functions
+#if __cplusplus >= 201103L
+#define PERFTOOLS_NOTHROW noexcept
+#elif defined(__cplusplus)
+#define PERFTOOLS_NOTHROW throw()
+#else
+# ifdef __GNUC__
+#  define PERFTOOLS_NOTHROW __attribute__((__nothrow__))
+# else
+#  define PERFTOOLS_NOTHROW
+# endif
+#endif
+
+#endif
+
 #ifndef PERFTOOLS_DLL_DECL
 # ifdef _WIN32
 #   define PERFTOOLS_DLL_DECL  __declspec(dllimport)
@@ -66,60 +72,84 @@
 #endif
 
 #ifdef __cplusplus
-namespace std {
-struct nothrow_t;
-}
-
 extern "C" {
 #endif
-  // Returns a human-readable version string.  If major, minor,
-  // and/or patch are not NULL, they are set to the major version,
-  // minor version, and patch-code (a string, usually "").
+  /*
+   * Returns a human-readable version string.  If major, minor,
+   * and/or patch are not NULL, they are set to the major version,
+   * minor version, and patch-code (a string, usually "").
+   */
   PERFTOOLS_DLL_DECL const char* tc_version(int* major, int* minor,
-                                            const char** patch) __THROW;
+                                            const char** patch) PERFTOOLS_NOTHROW;
 
-  PERFTOOLS_DLL_DECL void* tc_malloc(size_t size) __THROW;
-  PERFTOOLS_DLL_DECL void* tc_malloc_skip_new_handler(size_t size) __THROW;
-  PERFTOOLS_DLL_DECL void tc_free(void* ptr) __THROW;
-  PERFTOOLS_DLL_DECL void* tc_realloc(void* ptr, size_t size) __THROW;
-  PERFTOOLS_DLL_DECL void* tc_calloc(size_t nmemb, size_t size) __THROW;
-  PERFTOOLS_DLL_DECL void tc_cfree(void* ptr) __THROW;
+  PERFTOOLS_DLL_DECL void* tc_malloc(size_t size) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void* tc_malloc_skip_new_handler(size_t size) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_free(void* ptr) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_free_sized(void *ptr, size_t size) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void* tc_realloc(void* ptr, size_t size) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void* tc_calloc(size_t nmemb, size_t size) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_cfree(void* ptr) PERFTOOLS_NOTHROW;
 
   PERFTOOLS_DLL_DECL void* tc_memalign(size_t __alignment,
-                                       size_t __size) __THROW;
+                                       size_t __size) PERFTOOLS_NOTHROW;
   PERFTOOLS_DLL_DECL int tc_posix_memalign(void** ptr,
-                                           size_t align, size_t size) __THROW;
-  PERFTOOLS_DLL_DECL void* tc_valloc(size_t __size) __THROW;
-  PERFTOOLS_DLL_DECL void* tc_pvalloc(size_t __size) __THROW;
+                                           size_t align, size_t size) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void* tc_valloc(size_t __size) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void* tc_pvalloc(size_t __size) PERFTOOLS_NOTHROW;
 
-  PERFTOOLS_DLL_DECL void tc_malloc_stats(void) __THROW;
-  PERFTOOLS_DLL_DECL int tc_mallopt(int cmd, int value) __THROW;
-#if 0
-  PERFTOOLS_DLL_DECL struct mallinfo tc_mallinfo(void) __THROW;
-#endif
+  PERFTOOLS_DLL_DECL void tc_malloc_stats(void) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL int tc_mallopt(int cmd, int value) PERFTOOLS_NOTHROW;
 
-  // This is an alias for MallocExtension::instance()->GetAllocatedSize().
-  // It is equivalent to
-  //    OS X: malloc_size()
-  //    glibc: malloc_usable_size()
-  //    Windows: _msize()
-  PERFTOOLS_DLL_DECL size_t tc_malloc_size(void* ptr) __THROW;
+  /*
+   * This is an alias for MallocExtension::instance()->GetAllocatedSize().
+   * It is equivalent to
+   *    OS X: malloc_size()
+   *    glibc: malloc_usable_size()
+   *    Windows: _msize()
+   */
+  PERFTOOLS_DLL_DECL size_t tc_malloc_size(void* ptr) PERFTOOLS_NOTHROW;
 
 #ifdef __cplusplus
-  PERFTOOLS_DLL_DECL int tc_set_new_mode(int flag) __THROW;
+  PERFTOOLS_DLL_DECL int tc_set_new_mode(int flag) PERFTOOLS_NOTHROW;
   PERFTOOLS_DLL_DECL void* tc_new(size_t size);
   PERFTOOLS_DLL_DECL void* tc_new_nothrow(size_t size,
-                                          const std::nothrow_t&) __THROW;
-  PERFTOOLS_DLL_DECL void tc_delete(void* p) __THROW;
+                                          const std::nothrow_t&) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_delete(void* p) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_delete_sized(void* p, size_t size) PERFTOOLS_NOTHROW;
   PERFTOOLS_DLL_DECL void tc_delete_nothrow(void* p,
-                                            const std::nothrow_t&) __THROW;
+                                            const std::nothrow_t&) PERFTOOLS_NOTHROW;
   PERFTOOLS_DLL_DECL void* tc_newarray(size_t size);
   PERFTOOLS_DLL_DECL void* tc_newarray_nothrow(size_t size,
-                                               const std::nothrow_t&) __THROW;
-  PERFTOOLS_DLL_DECL void tc_deletearray(void* p) __THROW;
+                                               const std::nothrow_t&) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_deletearray(void* p) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_deletearray_sized(void* p, size_t size) PERFTOOLS_NOTHROW;
   PERFTOOLS_DLL_DECL void tc_deletearray_nothrow(void* p,
-                                                 const std::nothrow_t&) __THROW;
+                                                 const std::nothrow_t&) PERFTOOLS_NOTHROW;
+
+#if defined(__cpp_aligned_new) || (defined(_MSVC_LANG) && _MSVC_LANG > 201402L)
+  PERFTOOLS_DLL_DECL void* tc_new_aligned(size_t size, std::align_val_t al);
+  PERFTOOLS_DLL_DECL void* tc_new_aligned_nothrow(size_t size, std::align_val_t al,
+                                          const std::nothrow_t&) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_delete_aligned(void* p, std::align_val_t al) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_delete_sized_aligned(void* p, size_t size, std::align_val_t al) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_delete_aligned_nothrow(void* p, std::align_val_t al,
+                                            const std::nothrow_t&) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void* tc_newarray_aligned(size_t size, std::align_val_t al);
+  PERFTOOLS_DLL_DECL void* tc_newarray_aligned_nothrow(size_t size, std::align_val_t al,
+                                               const std::nothrow_t&) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_deletearray_aligned(void* p, std::align_val_t al) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_deletearray_sized_aligned(void* p, size_t size, std::align_val_t al) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_deletearray_aligned_nothrow(void* p, std::align_val_t al,
+                                                 const std::nothrow_t&) PERFTOOLS_NOTHROW;
+#endif
 }
 #endif
 
-#endif  // #ifndef TCMALLOC_TCMALLOC_H_
+/* We're only un-defining for public */
+#if !defined(GPERFTOOLS_CONFIG_H_)
+
+#undef PERFTOOLS_NOTHROW
+
+#endif /* GPERFTOOLS_CONFIG_H_ */
+
+#endif  /* #ifndef TCMALLOC_TCMALLOC_H_ */
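
Reviewer note on the hunk above: the header now selects its own no-throw marker (noexcept for C++11 and later, throw() for older C++, the GCC nothrow attribute for C) instead of relying on glibc's __THROW, and it undefines the macro again for public consumers. The sketch below mirrors that selection logic in isolation. DEMO_NOTHROW, demo_alloc and demo_roundtrip are hypothetical names chosen for illustration; they are not part of gperftools, and the body simply forwards to malloc rather than to tcmalloc.

```cpp
// Illustrative sketch only: DEMO_NOTHROW, demo_alloc and demo_roundtrip
// are hypothetical names, not gperftools symbols.
#include <stddef.h>  /* for size_t */
#include <stdlib.h>  /* for malloc, free */

// Same three-way selection as PERFTOOLS_NOTHROW in the patch.
#if __cplusplus >= 201103L
#define DEMO_NOTHROW noexcept
#elif defined(__cplusplus)
#define DEMO_NOTHROW throw()
#else
# ifdef __GNUC__
#  define DEMO_NOTHROW __attribute__((__nothrow__))
# else
#  define DEMO_NOTHROW
# endif
#endif

// Declaration in the style of the header: C linkage plus the marker.
extern "C" void* demo_alloc(size_t size) DEMO_NOTHROW;

// Stand-in definition; forwards to malloc (an assumption for this sketch).
extern "C" void* demo_alloc(size_t size) DEMO_NOTHROW {
  return malloc(size);
}

#if __cplusplus >= 201103L
// With C++11 or later the marker is a real noexcept specifier, so the
// compiler can verify it at compile time.
static_assert(noexcept(demo_alloc(1)), "demo_alloc should be noexcept");
#endif

// Allocate and release one block; returns true if allocation succeeded.
inline bool demo_roundtrip() {
  void* p = demo_alloc(16);
  bool ok = (p != 0);
  free(p);
  return ok;
}

// Mirror the header's cleanup: do not leak the helper macro to includers.
#undef DEMO_NOTHROW
```

One design consequence worth noting: because the macro is undefined at the end of the public header, client code cannot rely on PERFTOOLS_NOTHROW being available after including it; only builds that define GPERFTOOLS_CONFIG_H_ keep it.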

diff --git a/src/windows/gperftools/tcmalloc.h.in b/src/windows/gperftools/tcmalloc.h.in
index 7458de1..adb7962 100644
--- a/src/windows/gperftools/tcmalloc.h.in
+++ b/src/windows/gperftools/tcmalloc.h.in

@@ -1,11 +1,11 @@
-// -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
+// -*- Mode: C; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2003, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -30,33 +30,39 @@
  *
  * ---
  * Author: Sanjay Ghemawat <opensource@google.com>
- *         .h.in file by Craig Silverstein <opensource@google.com>
+ *         .h file by Craig Silverstein <opensource@google.com>
  */
 
 #ifndef TCMALLOC_TCMALLOC_H_
 #define TCMALLOC_TCMALLOC_H_
 
-#include <stddef.h>                     // for size_t
-#ifdef HAVE_SYS_CDEFS_H
-#include <sys/cdefs.h>   // where glibc defines __THROW
+#include <stddef.h>                     /* for size_t */
+#ifdef __cplusplus
+#include <new>                          /* for std::nothrow_t, std::align_val_t */
 #endif
 
-// __THROW is defined in glibc systems.  It means, counter-intuitively,
-// "This function will never throw an exception."  It's an optional
-// optimization tool, but we may need to use it to match glibc prototypes.
-#ifndef __THROW    /* I guess we're not on a glibc system */
-# define __THROW   /* __THROW is just an optimization, so ok to make it "" */
-#endif
-
-// Define the version number so folks can check against it
+/* Define the version number so folks can check against it */
 #define TC_VERSION_MAJOR  @TC_VERSION_MAJOR@
 #define TC_VERSION_MINOR  @TC_VERSION_MINOR@
 #define TC_VERSION_PATCH  "@TC_VERSION_PATCH@"
 #define TC_VERSION_STRING "gperftools @TC_VERSION_MAJOR@.@TC_VERSION_MINOR@@TC_VERSION_PATCH@"
 
-#include <stdlib.h>   // for struct mallinfo, if it's defined
+#ifndef PERFTOOLS_NOTHROW
 
-// Annoying stuff for windows -- makes sure clients can import these functions
+#if __cplusplus >= 201103L
+#define PERFTOOLS_NOTHROW noexcept
+#elif defined(__cplusplus)
+#define PERFTOOLS_NOTHROW throw()
+#else
+# ifdef __GNUC__
+#  define PERFTOOLS_NOTHROW __attribute__((__nothrow__))
+# else
+#  define PERFTOOLS_NOTHROW
+# endif
+#endif
+
+#endif
+
 #ifndef PERFTOOLS_DLL_DECL
 # ifdef _WIN32
 #   define PERFTOOLS_DLL_DECL  __declspec(dllimport)
@@ -66,60 +72,84 @@
 #endif
 
 #ifdef __cplusplus
-namespace std {
-struct nothrow_t;
-}
-
 extern "C" {
 #endif
-  // Returns a human-readable version string.  If major, minor,
-  // and/or patch are not NULL, they are set to the major version,
-  // minor version, and patch-code (a string, usually "").
+  /*
+   * Returns a human-readable version string.  If major, minor,
+   * and/or patch are not NULL, they are set to the major version,
+   * minor version, and patch-code (a string, usually "").
+   */
   PERFTOOLS_DLL_DECL const char* tc_version(int* major, int* minor,
-                                            const char** patch) __THROW;
+                                            const char** patch) PERFTOOLS_NOTHROW;
 
-  PERFTOOLS_DLL_DECL void* tc_malloc(size_t size) __THROW;
-  PERFTOOLS_DLL_DECL void* tc_malloc_skip_new_handler(size_t size) __THROW;
-  PERFTOOLS_DLL_DECL void tc_free(void* ptr) __THROW;
-  PERFTOOLS_DLL_DECL void* tc_realloc(void* ptr, size_t size) __THROW;
-  PERFTOOLS_DLL_DECL void* tc_calloc(size_t nmemb, size_t size) __THROW;
-  PERFTOOLS_DLL_DECL void tc_cfree(void* ptr) __THROW;
+  PERFTOOLS_DLL_DECL void* tc_malloc(size_t size) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void* tc_malloc_skip_new_handler(size_t size) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_free(void* ptr) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_free_sized(void *ptr, size_t size) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void* tc_realloc(void* ptr, size_t size) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void* tc_calloc(size_t nmemb, size_t size) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_cfree(void* ptr) PERFTOOLS_NOTHROW;
 
   PERFTOOLS_DLL_DECL void* tc_memalign(size_t __alignment,
-                                       size_t __size) __THROW;
+                                       size_t __size) PERFTOOLS_NOTHROW;
   PERFTOOLS_DLL_DECL int tc_posix_memalign(void** ptr,
-                                           size_t align, size_t size) __THROW;
-  PERFTOOLS_DLL_DECL void* tc_valloc(size_t __size) __THROW;
-  PERFTOOLS_DLL_DECL void* tc_pvalloc(size_t __size) __THROW;
+                                           size_t align, size_t size) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void* tc_valloc(size_t __size) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void* tc_pvalloc(size_t __size) PERFTOOLS_NOTHROW;
 
-  PERFTOOLS_DLL_DECL void tc_malloc_stats(void) __THROW;
-  PERFTOOLS_DLL_DECL int tc_mallopt(int cmd, int value) __THROW;
-#if 0
-  PERFTOOLS_DLL_DECL struct mallinfo tc_mallinfo(void) __THROW;
-#endif
+  PERFTOOLS_DLL_DECL void tc_malloc_stats(void) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL int tc_mallopt(int cmd, int value) PERFTOOLS_NOTHROW;
 
-  // This is an alias for MallocExtension::instance()->GetAllocatedSize().
-  // It is equivalent to
-  //    OS X: malloc_size()
-  //    glibc: malloc_usable_size()
-  //    Windows: _msize()
-  PERFTOOLS_DLL_DECL size_t tc_malloc_size(void* ptr) __THROW;
+  /*
+   * This is an alias for MallocExtension::instance()->GetAllocatedSize().
+   * It is equivalent to
+   *    OS X: malloc_size()
+   *    glibc: malloc_usable_size()
+   *    Windows: _msize()
+   */
+  PERFTOOLS_DLL_DECL size_t tc_malloc_size(void* ptr) PERFTOOLS_NOTHROW;
 
 #ifdef __cplusplus
-  PERFTOOLS_DLL_DECL int tc_set_new_mode(int flag) __THROW;
+  PERFTOOLS_DLL_DECL int tc_set_new_mode(int flag) PERFTOOLS_NOTHROW;
   PERFTOOLS_DLL_DECL void* tc_new(size_t size);
   PERFTOOLS_DLL_DECL void* tc_new_nothrow(size_t size,
-                                          const std::nothrow_t&) __THROW;
-  PERFTOOLS_DLL_DECL void tc_delete(void* p) __THROW;
+                                          const std::nothrow_t&) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_delete(void* p) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_delete_sized(void* p, size_t size) PERFTOOLS_NOTHROW;
   PERFTOOLS_DLL_DECL void tc_delete_nothrow(void* p,
-                                            const std::nothrow_t&) __THROW;
+                                            const std::nothrow_t&) PERFTOOLS_NOTHROW;
   PERFTOOLS_DLL_DECL void* tc_newarray(size_t size);
   PERFTOOLS_DLL_DECL void* tc_newarray_nothrow(size_t size,
-                                               const std::nothrow_t&) __THROW;
-  PERFTOOLS_DLL_DECL void tc_deletearray(void* p) __THROW;
+                                               const std::nothrow_t&) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_deletearray(void* p) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_deletearray_sized(void* p, size_t size) PERFTOOLS_NOTHROW;
   PERFTOOLS_DLL_DECL void tc_deletearray_nothrow(void* p,
-                                                 const std::nothrow_t&) __THROW;
+                                                 const std::nothrow_t&) PERFTOOLS_NOTHROW;
+
+#if defined(__cpp_aligned_new) || (defined(_MSVC_LANG) && _MSVC_LANG > 201402L)
+  PERFTOOLS_DLL_DECL void* tc_new_aligned(size_t size, std::align_val_t al);
+  PERFTOOLS_DLL_DECL void* tc_new_aligned_nothrow(size_t size, std::align_val_t al,
+                                          const std::nothrow_t&) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_delete_aligned(void* p, std::align_val_t al) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_delete_sized_aligned(void* p, size_t size, std::align_val_t al) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_delete_aligned_nothrow(void* p, std::align_val_t al,
+                                            const std::nothrow_t&) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void* tc_newarray_aligned(size_t size, std::align_val_t al);
+  PERFTOOLS_DLL_DECL void* tc_newarray_aligned_nothrow(size_t size, std::align_val_t al,
+                                               const std::nothrow_t&) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_deletearray_aligned(void* p, std::align_val_t al) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_deletearray_sized_aligned(void* p, size_t size, std::align_val_t al) PERFTOOLS_NOTHROW;
+  PERFTOOLS_DLL_DECL void tc_deletearray_aligned_nothrow(void* p, std::align_val_t al,
+                                                 const std::nothrow_t&) PERFTOOLS_NOTHROW;
+#endif
 }
 #endif
 
-#endif  // #ifndef TCMALLOC_TCMALLOC_H_
+/* We're only un-defining for public */
+#if !defined(GPERFTOOLS_CONFIG_H_)
+
+#undef PERFTOOLS_NOTHROW
+
+#endif /* GPERFTOOLS_CONFIG_H_ */
+
+#endif  /* #ifndef TCMALLOC_TCMALLOC_H_ */

diff --git a/src/windows/ia32_modrm_map.cc b/src/windows/ia32_modrm_map.cc
index f1f1906..817ac43 100644
--- a/src/windows/ia32_modrm_map.cc
+++ b/src/windows/ia32_modrm_map.cc

@@ -1,10 +1,11 @@
+// -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2007, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -14,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -31,8 +32,8 @@
  * Author: Joi Sigurdsson
  *
  * Table of relevant information about how to decode the ModR/M byte.
- * Based on information in the IA-32 Intel® Architecture
- * Software Developers Manual Volume 2: Instruction Set Reference.
+ * Based on information in the IA-32 Intel® Architecture
+ * Software Developer's Manual Volume 2: Instruction Set Reference.
  */
 
 #include "mini_disassembler.h"
@@ -49,7 +50,7 @@
   /* r/m == 100 */ { false, false, OS_ZERO },
   /* r/m == 101 */ { false, false, OS_ZERO },
   /* r/m == 110 */ { true, false, OS_WORD },
-  /* r/m == 111 */ { false, false, OS_ZERO }, 
+  /* r/m == 111 */ { false, false, OS_ZERO },
 // mod == 01
   /* r/m == 000 */ { true, false, OS_BYTE },
   /* r/m == 001 */ { true, false, OS_BYTE },
@@ -58,7 +59,7 @@
   /* r/m == 100 */ { true, false, OS_BYTE },
   /* r/m == 101 */ { true, false, OS_BYTE },
   /* r/m == 110 */ { true, false, OS_BYTE },
-  /* r/m == 111 */ { true, false, OS_BYTE }, 
+  /* r/m == 111 */ { true, false, OS_BYTE },
 // mod == 10
   /* r/m == 000 */ { true, false, OS_WORD },
   /* r/m == 001 */ { true, false, OS_WORD },
@@ -67,7 +68,7 @@
   /* r/m == 100 */ { true, false, OS_WORD },
   /* r/m == 101 */ { true, false, OS_WORD },
   /* r/m == 110 */ { true, false, OS_WORD },
-  /* r/m == 111 */ { true, false, OS_WORD }, 
+  /* r/m == 111 */ { true, false, OS_WORD },
 // mod == 11
   /* r/m == 000 */ { false, false, OS_ZERO },
   /* r/m == 001 */ { false, false, OS_ZERO },
@@ -88,7 +89,7 @@
   /* r/m == 100 */ { false, true, OS_ZERO },
   /* r/m == 101 */ { true, false, OS_DOUBLE_WORD },
   /* r/m == 110 */ { false, false, OS_ZERO },
-  /* r/m == 111 */ { false, false, OS_ZERO }, 
+  /* r/m == 111 */ { false, false, OS_ZERO },
 // mod == 01
   /* r/m == 000 */ { true, false, OS_BYTE },
   /* r/m == 001 */ { true, false, OS_BYTE },
@@ -97,7 +98,7 @@
   /* r/m == 100 */ { true, true, OS_BYTE },
   /* r/m == 101 */ { true, false, OS_BYTE },
   /* r/m == 110 */ { true, false, OS_BYTE },
-  /* r/m == 111 */ { true, false, OS_BYTE }, 
+  /* r/m == 111 */ { true, false, OS_BYTE },
 // mod == 10
   /* r/m == 000 */ { true, false, OS_DOUBLE_WORD },
   /* r/m == 001 */ { true, false, OS_DOUBLE_WORD },
@@ -106,7 +107,7 @@
   /* r/m == 100 */ { true, true, OS_DOUBLE_WORD },
   /* r/m == 101 */ { true, false, OS_DOUBLE_WORD },
   /* r/m == 110 */ { true, false, OS_DOUBLE_WORD },
-  /* r/m == 111 */ { true, false, OS_DOUBLE_WORD }, 
+  /* r/m == 111 */ { true, false, OS_DOUBLE_WORD },
 // mod == 11
   /* r/m == 000 */ { false, false, OS_ZERO },
   /* r/m == 001 */ { false, false, OS_ZERO },

diff --git a/src/windows/ia32_opcode_map.cc b/src/windows/ia32_opcode_map.cc
index ba6a79e..9d54f6b 100644
--- a/src/windows/ia32_opcode_map.cc
+++ b/src/windows/ia32_opcode_map.cc

@@ -1,10 +1,11 @@
+// -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2007, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -14,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -30,8 +31,8 @@
  * ---
  * Author: Joi Sigurdsson
  *
- * Opcode decoding maps.  Based on the IA-32 Intel® Architecture
- * Software Developers Manual Volume 2: Instruction Set Reference.  Idea
+ * Opcode decoding maps.  Based on the IA-32 Intel® Architecture
+ * Software Developer's Manual Volume 2: Instruction Set Reference.  Idea
  * for how to lay out the tables in memory taken from the implementation
  * in the Bastard disassembly environment.
  */
@@ -294,10 +295,10 @@
   /* 0xD5 */ { 0, IT_GENERIC, AM_I | OT_B, AM_NOT_USED, AM_NOT_USED, "aad", false, /* F2h */ { 0 }, /* F3h */ { 0 }, /* 66h */ { 0 } },
   /* 0xD6 */ { 0, IT_UNUSED, AM_NOT_USED, AM_NOT_USED, AM_NOT_USED, 0, false, /* F2h */ { 0 }, /* F3h */ { 0 }, /* 66h */ { 0 } },
   /* 0xD7 */ { 0, IT_GENERIC, AM_NOT_USED, AM_NOT_USED, AM_NOT_USED, "xlat", false, /* F2h */ { 0 }, /* F3h */ { 0 }, /* 66h */ { 0 } },
-  
+
   // The following 8 lines would be references to the FPU tables, but we currently
   // do not support the FPU instructions in this disassembler.
-  
+
   /* 0xD8 */ { 0, IT_UNKNOWN, AM_NOT_USED, AM_NOT_USED, AM_NOT_USED, 0, false, /* F2h */ { 0 }, /* F3h */ { 0 }, /* 66h */ { 0 } },
   /* 0xD9 */ { 0, IT_UNKNOWN, AM_NOT_USED, AM_NOT_USED, AM_NOT_USED, 0, false, /* F2h */ { 0 }, /* F3h */ { 0 }, /* 66h */ { 0 } },
   /* 0xDA */ { 0, IT_UNKNOWN, AM_NOT_USED, AM_NOT_USED, AM_NOT_USED, 0, false, /* F2h */ { 0 }, /* F3h */ { 0 }, /* 66h */ { 0 } },
@@ -306,8 +307,8 @@
   /* 0xDD */ { 0, IT_UNKNOWN, AM_NOT_USED, AM_NOT_USED, AM_NOT_USED, 0, false, /* F2h */ { 0 }, /* F3h */ { 0 }, /* 66h */ { 0 } },
   /* 0xDE */ { 0, IT_UNKNOWN, AM_NOT_USED, AM_NOT_USED, AM_NOT_USED, 0, false, /* F2h */ { 0 }, /* F3h */ { 0 }, /* 66h */ { 0 } },
   /* 0xDF */ { 0, IT_UNKNOWN, AM_NOT_USED, AM_NOT_USED, AM_NOT_USED, 0, false, /* F2h */ { 0 }, /* F3h */ { 0 }, /* 66h */ { 0 } },
-  
-  
+
+
   /* 0xE0 */ { 0, IT_JUMP, AM_J | OT_B, AM_NOT_USED, AM_NOT_USED, "loopnz", false, /* F2h */ { 0 }, /* F3h */ { 0 }, /* 66h */ { 0 } },
   /* 0xE1 */ { 0, IT_JUMP, AM_J | OT_B, AM_NOT_USED, AM_NOT_USED, "loopz", false, /* F2h */ { 0 }, /* F3h */ { 0 }, /* 66h */ { 0 } },
   /* 0xE2 */ { 0, IT_JUMP, AM_J | OT_B, AM_NOT_USED, AM_NOT_USED, "loop", false, /* F2h */ { 0 }, /* F3h */ { 0 }, /* 66h */ { 0 } },
@@ -619,7 +620,7 @@
     /* F3h */ { 0 },
     /* 66h */ { 0, IT_GENERIC, AM_V | OT_DQ, AM_W | OT_DQ, AM_NOT_USED, "pcmpeqd" } },
   /* 0x77 */ { 0, IT_GENERIC, AM_NOT_USED, AM_NOT_USED, AM_NOT_USED, "emms", false, /* F2h */ { 0 }, /* F3h */ { 0 }, /* 66h */ { 0 } },
-  
+
   // The following six opcodes are escapes into the MMX stuff, which this disassembler does not support.
   /* 0x78 */ { 0, IT_UNKNOWN, AM_NOT_USED, AM_NOT_USED, AM_NOT_USED, 0, false, /* F2h */ { 0 }, /* F3h */ { 0 }, /* 66h */ { 0 } },
   /* 0x79 */ { 0, IT_UNKNOWN, AM_NOT_USED, AM_NOT_USED, AM_NOT_USED, 0, false, /* F2h */ { 0 }, /* F3h */ { 0 }, /* 66h */ { 0 } },
@@ -627,7 +628,7 @@
   /* 0x7B */ { 0, IT_UNKNOWN, AM_NOT_USED, AM_NOT_USED, AM_NOT_USED, 0, false, /* F2h */ { 0 }, /* F3h */ { 0 }, /* 66h */ { 0 } },
   /* 0x7C */ { 0, IT_UNKNOWN, AM_NOT_USED, AM_NOT_USED, AM_NOT_USED, 0, false, /* F2h */ { 0 }, /* F3h */ { 0 }, /* 66h */ { 0 } },
   /* 0x7D */ { 0, IT_UNKNOWN, AM_NOT_USED, AM_NOT_USED, AM_NOT_USED, 0, false, /* F2h */ { 0 }, /* F3h */ { 0 }, /* 66h */ { 0 } },
-  
+
   /* 0x7E */ { 0, IT_GENERIC, AM_E | OT_D, AM_P | OT_D, AM_NOT_USED, "movd", true,
     /* F2h */ { 0 },
     /* F3h */ { 0, IT_GENERIC, AM_V | OT_Q, AM_W | OT_Q, AM_NOT_USED, "movq" },
@@ -1192,27 +1193,27 @@
   /*  1 */ {s_opcode_byte_after_0f, 0, 0xff, 0, 0xff},
   // Start of tables for opcodes using ModR/M bits as extension
   /*  2 */ {s_opcode_byte_after_80, 3, 0x07, 0, 0x07},
-  /*  3 */ {s_opcode_byte_after_81, 3, 0x07, 0, 0x07}, 
-  /*  4 */ {s_opcode_byte_after_82, 3, 0x07, 0, 0x07}, 
-  /*  5 */ {s_opcode_byte_after_83, 3, 0x07, 0, 0x07}, 
-  /*  6 */ {s_opcode_byte_after_c0, 3, 0x07, 0, 0x07}, 
-  /*  7 */ {s_opcode_byte_after_c1, 3, 0x07, 0, 0x07}, 
-  /*  8 */ {s_opcode_byte_after_d0, 3, 0x07, 0, 0x07}, 
-  /*  9 */ {s_opcode_byte_after_d1, 3, 0x07, 0, 0x07}, 
-  /* 10 */ {s_opcode_byte_after_d2, 3, 0x07, 0, 0x07}, 
-  /* 11 */ {s_opcode_byte_after_d3, 3, 0x07, 0, 0x07}, 
-  /* 12 */ {s_opcode_byte_after_f6, 3, 0x07, 0, 0x07}, 
-  /* 13 */ {s_opcode_byte_after_f7, 3, 0x07, 0, 0x07}, 
-  /* 14 */ {s_opcode_byte_after_fe, 3, 0x07, 0, 0x01}, 
-  /* 15 */ {s_opcode_byte_after_ff, 3, 0x07, 0, 0x07}, 
-  /* 16 */ {s_opcode_byte_after_0f00, 3, 0x07, 0, 0x07}, 
-  /* 17 */ {s_opcode_byte_after_0f01, 3, 0x07, 0, 0x07}, 
-  /* 18 */ {s_opcode_byte_after_0f18, 3, 0x07, 0, 0x07}, 
-  /* 19 */ {s_opcode_byte_after_0f71, 3, 0x07, 0, 0x07}, 
-  /* 20 */ {s_opcode_byte_after_0f72, 3, 0x07, 0, 0x07}, 
-  /* 21 */ {s_opcode_byte_after_0f73, 3, 0x07, 0, 0x07}, 
-  /* 22 */ {s_opcode_byte_after_0fae, 3, 0x07, 0, 0x07}, 
-  /* 23 */ {s_opcode_byte_after_0fba, 3, 0x07, 0, 0x07}, 
+  /*  3 */ {s_opcode_byte_after_81, 3, 0x07, 0, 0x07},
+  /*  4 */ {s_opcode_byte_after_82, 3, 0x07, 0, 0x07},
+  /*  5 */ {s_opcode_byte_after_83, 3, 0x07, 0, 0x07},
+  /*  6 */ {s_opcode_byte_after_c0, 3, 0x07, 0, 0x07},
+  /*  7 */ {s_opcode_byte_after_c1, 3, 0x07, 0, 0x07},
+  /*  8 */ {s_opcode_byte_after_d0, 3, 0x07, 0, 0x07},
+  /*  9 */ {s_opcode_byte_after_d1, 3, 0x07, 0, 0x07},
+  /* 10 */ {s_opcode_byte_after_d2, 3, 0x07, 0, 0x07},
+  /* 11 */ {s_opcode_byte_after_d3, 3, 0x07, 0, 0x07},
+  /* 12 */ {s_opcode_byte_after_f6, 3, 0x07, 0, 0x07},
+  /* 13 */ {s_opcode_byte_after_f7, 3, 0x07, 0, 0x07},
+  /* 14 */ {s_opcode_byte_after_fe, 3, 0x07, 0, 0x01},
+  /* 15 */ {s_opcode_byte_after_ff, 3, 0x07, 0, 0x07},
+  /* 16 */ {s_opcode_byte_after_0f00, 3, 0x07, 0, 0x07},
+  /* 17 */ {s_opcode_byte_after_0f01, 3, 0x07, 0, 0x07},
+  /* 18 */ {s_opcode_byte_after_0f18, 3, 0x07, 0, 0x07},
+  /* 19 */ {s_opcode_byte_after_0f71, 3, 0x07, 0, 0x07},
+  /* 20 */ {s_opcode_byte_after_0f72, 3, 0x07, 0, 0x07},
+  /* 21 */ {s_opcode_byte_after_0f73, 3, 0x07, 0, 0x07},
+  /* 22 */ {s_opcode_byte_after_0fae, 3, 0x07, 0, 0x07},
+  /* 23 */ {s_opcode_byte_after_0fba, 3, 0x07, 0, 0x07},
   /* 24 */ {s_opcode_byte_after_0fc7, 3, 0x07, 0, 0x01}
 };
 

diff --git a/src/windows/mingw.h b/src/windows/mingw.h
index 0586e62..542f9ae 100644
--- a/src/windows/mingw.h
+++ b/src/windows/mingw.h

@@ -1,11 +1,11 @@
 /* -*- Mode: C; c-basic-offset: 2; indent-tabs-mode: nil -*- */
 /* Copyright (c) 2007, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -54,8 +54,6 @@
 # define _WIN32_WINNT 0x0501
 #endif
 
-#define HAVE_SNPRINTF 1
-
 // Some mingw distributions have a pthreads wrapper, but it doesn't
 // work as well as native windows spinlocks (at least for us).  So
 // pretend the pthreads wrapper doesn't exist, even when it does.
@@ -63,6 +61,8 @@
 #undef HAVE_PTHREAD
 #endif
 
+#undef HAVE_FORK
+
 #define HAVE_PID_T
 
 #include "windows/port.h"

diff --git a/src/windows/mini_disassembler.cc b/src/windows/mini_disassembler.cc
index 0c62004..35d7a9d 100644
--- a/src/windows/mini_disassembler.cc
+++ b/src/windows/mini_disassembler.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2007, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -323,7 +323,7 @@
           // floating point
           succeeded = false;
           break;
-        case OT_V: // Word, doubleword or quadword, depending on operand-size 
+        case OT_V: // Word, doubleword or quadword, depending on operand-size
                    // attribute.
           if (operand_is_64_bits_ && flag_operand & AM_I &&
               flag_operand & IOS_64)

diff --git a/src/windows/mini_disassembler.h b/src/windows/mini_disassembler.h
index 93bdc06..8b3e4ba 100644
--- a/src/windows/mini_disassembler.h
+++ b/src/windows/mini_disassembler.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2007, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -73,7 +73,7 @@
 // Disassemble() method.
 //
 // If you would like to extend this disassembler, please refer to the
-// IA-32 Intel® Architecture Software Developers Manual Volume 2:
+// IA-32 Intel® Architecture Software Developer's Manual Volume 2:
 // Instruction Set Reference for information about operand decoding
 // etc.
 class PERFTOOLS_DLL_DECL MiniDisassembler {

diff --git a/src/windows/mini_disassembler_types.h b/src/windows/mini_disassembler_types.h
index 06d4755..97da92d 100644
--- a/src/windows/mini_disassembler_types.h
+++ b/src/windows/mini_disassembler_types.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2007, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -165,7 +165,7 @@
   InstructionType type_;
 
   // Description of the type of the dest, src and aux operands,
-  // put together from enOperandType, enAddressingMethod and 
+  // put together from enOperandType, enAddressingMethod and
   // enImmediateOperandSize flags.
   int flag_dest_;
   int flag_source_;
@@ -188,9 +188,9 @@
   // Description of the type of the dest, src and aux operands,
   // put together from an enOperandType flag and an enAddressingMethod
   // flag.
-  int flag_dest_;
-  int flag_source_;
-  int flag_aux_;
+  unsigned flag_dest_;
+  unsigned flag_source_;
+  unsigned flag_aux_;
 
   // We indicate the mnemonic for debugging purposes
   const char* mnemonic_;

diff --git a/src/windows/nm-pdb.c b/src/windows/nm-pdb.c
index 95a080d..9f6f431 100644
--- a/src/windows/nm-pdb.c
+++ b/src/windows/nm-pdb.c

@@ -1,6 +1,7 @@
+/* -*- Mode: c; c-basic-offset: 2; indent-tabs-mode: nil -*- */
 /* Copyright (c) 2008, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:

diff --git a/src/windows/override_functions.cc b/src/windows/override_functions.cc
index e7917d3..8afb851 100644
--- a/src/windows/override_functions.cc
+++ b/src/windows/override_functions.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2007, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -30,7 +30,7 @@
 //
 // ---
 // Author: Mike Belshe
-// 
+//
 // To link tcmalloc into a EXE or DLL statically without using the patching
 // facility, we can take a stock libcmt and remove all the allocator functions.
 // When we relink the EXE/DLL with the modified libcmt and tcmalloc, a few
@@ -52,26 +52,70 @@
 
 #include "tcmalloc.cc"
 
-extern "C" void* _recalloc(void* p, size_t n, size_t size) {
-  void* result = realloc(p, n * size);
-  memset(result, 0, n * size);
-  return result;
+extern "C" {
+
+void* _malloc_base(size_t size) {
+  return malloc(size);
 }
 
-extern "C" void* _calloc_impl(size_t n, size_t size) {
+void _free_base(void* p) {
+  free(p);
+}
+
+void* _calloc_base(size_t n, size_t size) {
   return calloc(n, size);
 }
 
-extern "C" size_t _msize(void* p) {
+void* _recalloc(void* old_ptr, size_t n, size_t size) {
+  // Ensure that (n * size) does not overflow
+  if (!(n == 0 || (std::numeric_limits<size_t>::max)() / n >= size)) {
+    errno = ENOMEM;
+    return NULL;
+  }
+
+  const size_t old_size = tc_malloc_size(old_ptr);
+  const size_t new_size = n * size;
+
+  void* new_ptr = realloc(old_ptr, new_size);
+
+  // If the reallocation succeeded and the new block is larger, zero-fill the
+  // new bytes:
+  if (new_ptr != NULL && new_size > old_size) {
+    memset(static_cast<char*>(new_ptr) + old_size, 0, tc_nallocx(new_size, 0) - old_size);
+  }
+
+  return new_ptr;
+}
+
+void* _calloc_impl(size_t n, size_t size) {
+  return calloc(n, size);
+}
+
+size_t _msize(void* p) {
   return MallocExtension::instance()->GetAllocatedSize(p);
 }
 
-extern "C" intptr_t _get_heap_handle() {
+HANDLE __acrt_heap = nullptr;
+
+bool __acrt_initialize_heap() {
+  new TCMallocGuard();
+  return true;
+}
+
+bool __acrt_uninitialize_heap(bool) {
+  return true;
+}
+
+intptr_t _get_heap_handle() {
   return 0;
 }
 
+HANDLE __acrt_getheap() {
+  return __acrt_heap;
+}
+
 // The CRT heap initialization stub.
-extern "C" int _heap_init() {
+int _heap_init() {
   // We intentionally leak this object.  It lasts for the process
   // lifetime.  Trying to teardown at _heap_term() is so late that
   // you can't do anything useful anyway.
@@ -80,13 +124,25 @@
 }
 
 // The CRT heap cleanup stub.
-extern "C" void _heap_term() {
+void _heap_term() {
 }
 
-extern "C" int _set_new_mode(int flag) {
+// We set this to 1 because part of the CRT uses a check of _crtheap != 0
+// to test whether the CRT has been initialized.  Once we've ripped out
+// the allocators from libcmt, we need to provide this definition so that
+// the rest of the CRT is still usable.
+void* _crtheap = reinterpret_cast<void*>(1);
+
+int _set_new_mode(int flag) {
   return tc_set_new_mode(flag);
 }
 
+int _query_new_mode() {
+  return tc_query_new_mode();
+}
+
+}  // extern "C"
+
 #ifndef NDEBUG
 #undef malloc
 #undef free
@@ -115,9 +171,3 @@
   return calloc(n, size);
 }
 #endif  // NDEBUG
-
-// We set this to 1 because part of the CRT uses a check of _crtheap != 0
-// to test whether the CRT has been initialized.  Once we've ripped out
-// the allocators from libcmt, we need to provide this definition so that
-// the rest of the CRT is still usable.
-extern "C" void* _crtheap = reinterpret_cast<void*>(1);
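
Note on the `_recalloc` hunk above: the new version rejects requests where `n * size` would overflow and zero-fills only the bytes beyond the old block, rather than clearing the whole buffer as the removed version did. A minimal portable sketch of that idea follows. It assumes the caller tracks the old size itself; `recalloc_sketch` and its `old_size` parameter are hypothetical and not part of gperftools, which queries `tc_malloc_size` instead.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <limits>

// Hypothetical helper, not part of gperftools: resize a block to n * size
// bytes and zero any newly exposed bytes. old_size is supplied by the caller
// because standard C++ has no portable way to query a block's size.
void* recalloc_sketch(void* old_ptr, std::size_t old_size,
                      std::size_t n, std::size_t size) {
  // Reject requests whose byte count would overflow size_t.
  if (n != 0 && (std::numeric_limits<std::size_t>::max)() / n < size) {
    errno = ENOMEM;
    return nullptr;
  }
  const std::size_t new_size = n * size;
  void* new_ptr = std::realloc(old_ptr, new_size);
  // Zero only the tail that did not exist in the old block, so the
  // caller's existing data is preserved.
  if (new_ptr != nullptr && new_size > old_size) {
    std::memset(static_cast<char*>(new_ptr) + old_size, 0,
                new_size - old_size);
  }
  return new_ptr;
}
```

The division-based check avoids computing `n * size` before it is known to be safe, which is the same shape as the guard in the hunk.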

diff --git a/src/windows/patch_functions.cc b/src/windows/patch_functions.cc
index ff1bec7..a2d0a03 100644
--- a/src/windows/patch_functions.cc
+++ b/src/windows/patch_functions.cc

@@ -1,10 +1,11 @@
+// -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2007, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -14,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -101,6 +102,16 @@
 // These are hard-coded, unfortunately. :-( They are also probably
 // compiler specific.  See get_mangled_names.cc, in this directory,
 // for instructions on how to update these names for your compiler.
+#ifdef _WIN64
+const char kMangledNew[] = "??2@YAPEAX_K@Z";
+const char kMangledNewArray[] = "??_U@YAPEAX_K@Z";
+const char kMangledDelete[] = "??3@YAXPEAX@Z";
+const char kMangledDeleteArray[] = "??_V@YAXPEAX@Z";
+const char kMangledNewNothrow[] = "??2@YAPEAX_KAEBUnothrow_t@std@@@Z";
+const char kMangledNewArrayNothrow[] = "??_U@YAPEAX_KAEBUnothrow_t@std@@@Z";
+const char kMangledDeleteNothrow[] = "??3@YAXPEAXAEBUnothrow_t@std@@@Z";
+const char kMangledDeleteArrayNothrow[] = "??_V@YAXPEAXAEBUnothrow_t@std@@@Z";
+#else
 const char kMangledNew[] = "??2@YAPAXI@Z";
 const char kMangledNewArray[] = "??_U@YAPAXI@Z";
 const char kMangledDelete[] = "??3@YAXPAX@Z";
@@ -109,6 +120,7 @@
 const char kMangledNewArrayNothrow[] = "??_U@YAPAXIABUnothrow_t@std@@@Z";
 const char kMangledDeleteNothrow[] = "??3@YAXPAXABUnothrow_t@std@@@Z";
 const char kMangledDeleteArrayNothrow[] = "??_V@YAXPAXABUnothrow_t@std@@@Z";
+#endif
 
 // This is an unused but exported symbol that we can use to tell the
 // MSVC linker to bring in libtcmalloc, via the /INCLUDE linker flag.
@@ -183,6 +195,8 @@
     k_Msize, k_Expand,
     // A MS CRT "internal" function, implemented using _calloc_impl
     k_CallocCrt,
+    // Underlying deallocation functions called by CRT internal functions or operator delete
+    kFreeBase, kFreeDbg,
     kNumFunctions
   };
 
@@ -265,6 +279,8 @@
 
   static void* Perftools_malloc(size_t size) __THROW;
   static void Perftools_free(void* ptr) __THROW;
+  static void Perftools_free_base(void* ptr) __THROW;
+  static void Perftools_free_dbg(void* ptr, int block_use) __THROW;
   static void* Perftools_realloc(void* ptr, size_t size) __THROW;
   static void* Perftools_calloc(size_t nmemb, size_t size) __THROW;
   static void* Perftools_new(size_t size);
@@ -406,7 +422,7 @@
   NULL,  // kMangledNewArrayNothrow,
   NULL,  // kMangledDeleteNothrow,
   NULL,  // kMangledDeleteArrayNothrow,
-  "_msize", "_expand", "_calloc_crt",
+  "_msize", "_expand", "_calloc_crt", "_free_base", "_free_dbg"
 };
 
 // For mingw, I can't patch the new/delete here, because the
@@ -438,6 +454,8 @@
   (GenericFnPtr)&::_msize,
   (GenericFnPtr)&::_expand,
   (GenericFnPtr)&::calloc,
+  (GenericFnPtr)&::free,
+  (GenericFnPtr)&::free
 };
 
 template<int T> GenericFnPtr LibcInfoWithPatchFunctions<T>::origstub_fn_[] = {
@@ -461,6 +479,8 @@
   (GenericFnPtr)&Perftools__msize,
   (GenericFnPtr)&Perftools__expand,
   (GenericFnPtr)&Perftools_calloc,
+  (GenericFnPtr)&Perftools_free_base,
+  (GenericFnPtr)&Perftools_free_dbg
 };
 
 /*static*/ WindowsInfo::FunctionInfo WindowsInfo::function_info_[] = {
@@ -791,9 +811,7 @@
 
 template<int T>
 void* LibcInfoWithPatchFunctions<T>::Perftools_malloc(size_t size) __THROW {
-  void* result = do_malloc_or_cpp_alloc(size);
-  MallocHook::InvokeNewHook(result, size);
-  return result;
+  return malloc_fast_path<tcmalloc::malloc_oom>(size);
 }
 
 template<int T>
@@ -803,7 +821,28 @@
   // allocated by tcmalloc.  Note it calls the origstub_free from
   // *this* templatized instance of LibcInfo.  See "template
   // trickiness" above.
-  do_free_with_callback(ptr, (void (*)(void*))origstub_fn_[kFree]);
+  do_free_with_callback(ptr, (void (*)(void*))origstub_fn_[kFree], false, 0);
+}
+
+template<int T>
+void LibcInfoWithPatchFunctions<T>::Perftools_free_base(void* ptr) __THROW{
+  MallocHook::InvokeDeleteHook(ptr);
+  // This calls the windows free if do_free decides ptr was not
+  // allocated by tcmalloc.  Note it calls the origstub_free from
+  // *this* templatized instance of LibcInfo.  See "template
+  // trickiness" above.
+  do_free_with_callback(ptr, (void(*)(void*))origstub_fn_[kFreeBase], false, 0);
+}
+
+template<int T>
+void LibcInfoWithPatchFunctions<T>::Perftools_free_dbg(void* ptr, int block_use) __THROW {
+  MallocHook::InvokeDeleteHook(ptr);
+  // The windows _free_dbg is called if ptr isn't owned by tcmalloc.
+  if (MallocExtension::instance()->GetOwnership(ptr) == MallocExtension::kOwned) {
+    do_free(ptr);
+  } else {
+    reinterpret_cast<void (*)(void*, int)>(origstub_fn_[kFreeDbg])(ptr, block_use);
+  }
 }
 
 template<int T>
@@ -817,7 +856,7 @@
   if (new_size == 0) {
     MallocHook::InvokeDeleteHook(old_ptr);
     do_free_with_callback(old_ptr,
-                          (void (*)(void*))origstub_fn_[kFree]);
+                          (void (*)(void*))origstub_fn_[kFree], false, 0);
     return NULL;
   }
   return do_realloc_with_callback(
@@ -836,58 +875,50 @@
 
 template<int T>
 void* LibcInfoWithPatchFunctions<T>::Perftools_new(size_t size) {
-  void* p = cpp_alloc(size, false);
-  MallocHook::InvokeNewHook(p, size);
-  return p;
+  return malloc_fast_path<tcmalloc::cpp_throw_oom>(size);
 }
 
 template<int T>
 void* LibcInfoWithPatchFunctions<T>::Perftools_newarray(size_t size) {
-  void* p = cpp_alloc(size, false);
-  MallocHook::InvokeNewHook(p, size);
-  return p;
+  return malloc_fast_path<tcmalloc::cpp_throw_oom>(size);
 }
 
 template<int T>
 void LibcInfoWithPatchFunctions<T>::Perftools_delete(void *p) {
   MallocHook::InvokeDeleteHook(p);
-  do_free_with_callback(p, (void (*)(void*))origstub_fn_[kFree]);
+  do_free_with_callback(p, (void (*)(void*))origstub_fn_[kFree], false, 0);
 }
 
 template<int T>
 void LibcInfoWithPatchFunctions<T>::Perftools_deletearray(void *p) {
   MallocHook::InvokeDeleteHook(p);
-  do_free_with_callback(p, (void (*)(void*))origstub_fn_[kFree]);
+  do_free_with_callback(p, (void (*)(void*))origstub_fn_[kFree], false, 0);
 }
 
 template<int T>
 void* LibcInfoWithPatchFunctions<T>::Perftools_new_nothrow(
     size_t size, const std::nothrow_t&) __THROW {
-  void* p = cpp_alloc(size, true);
-  MallocHook::InvokeNewHook(p, size);
-  return p;
+  return malloc_fast_path<tcmalloc::cpp_nothrow_oom>(size);
 }
 
 template<int T>
 void* LibcInfoWithPatchFunctions<T>::Perftools_newarray_nothrow(
     size_t size, const std::nothrow_t&) __THROW {
-  void* p = cpp_alloc(size, true);
-  MallocHook::InvokeNewHook(p, size);
-  return p;
+  return malloc_fast_path<tcmalloc::cpp_nothrow_oom>(size);
 }
 
 template<int T>
 void LibcInfoWithPatchFunctions<T>::Perftools_delete_nothrow(
     void *p, const std::nothrow_t&) __THROW {
   MallocHook::InvokeDeleteHook(p);
-  do_free_with_callback(p, (void (*)(void*))origstub_fn_[kFree]);
+  do_free_with_callback(p, (void (*)(void*))origstub_fn_[kFree], false, 0);
 }
 
 template<int T>
 void LibcInfoWithPatchFunctions<T>::Perftools_deletearray_nothrow(
     void *p, const std::nothrow_t&) __THROW {
   MallocHook::InvokeDeleteHook(p);
-  do_free_with_callback(p, (void (*)(void*))origstub_fn_[kFree]);
+  do_free_with_callback(p, (void (*)(void*))origstub_fn_[kFree], false, 0);
 }
 
 
@@ -971,16 +1002,6 @@
               lpBaseAddress);
 }
 
-// g_load_map holds a copy of windows' refcount for how many times
-// each currently loaded module has been loaded and unloaded.  We use
-// it as an optimization when the same module is loaded more than
-// once: as long as the refcount stays above 1, we don't need to worry
-// about patching because it's already patched.  Likewise, we don't
-// need to unpatch until the refcount drops to 0.  load_map is
-// maintained in LoadLibraryExW and FreeLibrary, and only covers
-// modules explicitly loaded/freed via those interfaces.
-static std::map<HMODULE, int>* g_load_map = NULL;
-
 HMODULE WINAPI WindowsInfo::Perftools_LoadLibraryExW(LPCWSTR lpFileName,
                                                      HANDLE hFile,
                                                      DWORD dwFlags) {
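
Note on the `Perftools_free_dbg` hunk above: the patched function checks whether tcmalloc owns the pointer and, if not, forwards to the original CRT routine. The sketch below illustrates that ownership-based dispatch with a toy arena. Every name in it (`ArenaAlloc`, `ArenaOwns`, `DispatchFree`, the counters) is hypothetical; gperftools makes the decision with `MallocExtension::GetOwnership`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

namespace sketch {

// Toy stand-in for an allocator that can tell which pointers it owns.
alignas(std::max_align_t) unsigned char g_arena[1024];
std::size_t g_used = 0;
int g_arena_frees = 0;     // frees routed to the toy allocator
int g_fallback_frees = 0;  // frees routed to the original free

// Bump-allocates from the arena; returns nullptr when it is exhausted.
void* ArenaAlloc(std::size_t size) {
  const std::size_t align = alignof(std::max_align_t);
  size = (size + align - 1) / align * align;  // keep blocks aligned
  if (size > sizeof(g_arena) - g_used) return nullptr;
  void* p = g_arena + g_used;
  g_used += size;
  return p;
}

// True when p points into the arena. Addresses are compared as integers
// because relational comparison of unrelated pointers is unspecified.
bool ArenaOwns(const void* p) {
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
  const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(g_arena);
  return addr >= base && addr < base + sizeof(g_arena);
}

// Frees p with whichever allocator handed it out.
void DispatchFree(void* p, void (*original_free)(void*)) {
  if (p == nullptr) return;
  if (ArenaOwns(p)) {
    ++g_arena_frees;  // a bump arena has nothing to release per block
  } else {
    ++g_fallback_frees;
    original_free(p);
  }
}

}  // namespace sketch
```

Passing the fallback as a function pointer mirrors how the hunk looks up the original routine in `origstub_fn_`.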

diff --git a/src/windows/port.cc b/src/windows/port.cc
index 76224a2..e73c508 100644
--- a/src/windows/port.cc
+++ b/src/windows/port.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2007, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -85,7 +85,8 @@
 
 // Windows doesn't support pthread_key_create's destr_function, and in
 // fact it's a bit tricky to get code to run when a thread exits.  This
-// is cargo-cult magic from http://www.codeproject.com/threads/tls.asp.
+// is cargo-cult magic from https://www.codeproject.com/Articles/8113/Thread-Local-Storage-The-C-Way
+// and http://lallouslab.net/2017/05/30/using-cc-tls-callbacks-in-visual-studio-with-your-32-or-64bits-programs/.
 // This code is for VC++ 7.1 and later; VC++ 6.0 support is possible
 // but more busy-work -- see the webpage for how to do it.  If all
 // this fails, we could use DllMain instead.  The big problem with
@@ -147,8 +148,11 @@
 
 // extern "C" suppresses C++ name mangling so we know the symbol names
 // for the linker /INCLUDE:symbol pragmas above.
+// Note that for some unknown reason, the extern "C" {} construct is ignored
+// by the MSVC VS2017 compiler (at least) when a const modifier is used
+#if defined(_M_IX86)
 extern "C" {
-// This tells the linker to run these functions.
+// In x86, the PE loader looks for callbacks in a data segment
 #pragma data_seg(push, old_seg)
 #pragma data_seg(".CRT$XLB")
 void (NTAPI *p_thread_callback_tcmalloc)(
@@ -157,6 +161,16 @@
 int (*p_process_term_tcmalloc)(void) = on_process_term;
 #pragma data_seg(pop, old_seg)
 }  // extern "C"
+#elif defined(_M_X64)
+// In x64, the PE loader looks for callbacks in a constant segment
+#pragma const_seg(push, oldseg)
+#pragma const_seg(".CRT$XLB")
+extern "C" void (NTAPI * const p_thread_callback_tcmalloc)(
+	HINSTANCE h, DWORD dwReason, PVOID pv) = on_tls_callback;
+#pragma const_seg(".CRT$XTU")
+extern "C" int (NTAPI * const p_process_term_tcmalloc)(void) = on_process_term;
+#pragma const_seg(pop, oldseg)
+#endif
 
 #else  // #ifdef _MSC_VER  [probably msys/mingw]
 

diff --git a/src/windows/port.h b/src/windows/port.h
index 0350f45..29c6cb9 100644
--- a/src/windows/port.h
+++ b/src/windows/port.h

@@ -64,7 +64,7 @@
 #include <assert.h>
 #include <stdlib.h>          /* for rand, srand, _strtoxxx */
 
-#if _MSC_VER >= 1900
+#if defined(_MSC_VER) && _MSC_VER >= 1900
 #define _TIMESPEC_DEFINED
 #include <time.h>
 #endif
@@ -102,20 +102,9 @@
 /* ----------------------------------- BASIC TYPES */
 
 #ifndef HAVE_STDINT_H
-#ifndef HAVE___INT64    /* we need to have all the __intX names */
 # error  Do not know how to set up type aliases.  Edit port.h for your system.
 #endif
 
-typedef __int8 int8_t;
-typedef __int16 int16_t;
-typedef __int32 int32_t;
-typedef __int64 int64_t;
-typedef unsigned __int8 uint8_t;
-typedef unsigned __int16 uint16_t;
-typedef unsigned __int32 uint32_t;
-typedef unsigned __int64 uint64_t;
-#endif  /* #ifndef HAVE_STDINT_H */
-
 /* I guess MSVC's <types.h> doesn't include ssize_t by default? */
 #ifdef _MSC_VER
 typedef intptr_t ssize_t;
@@ -329,17 +318,7 @@
 }
 #endif
 
-#ifndef HAVE_SNPRINTF
-inline int snprintf(char *str, size_t size, const char *format, ...) {
-  va_list ap;
-  int r;
-  va_start(ap, format);
-  r = perftools_vsnprintf(str, size, format, ap);
-  va_end(ap);
-  return r;
-}
-#endif
-
+#ifndef HAVE_INTTYPES_H
 #define PRIx64  "I64x"
 #define SCNx64  "I64x"
 #define PRId64  "I64d"
@@ -352,6 +331,7 @@
 # define PRIuPTR "lu"
 # define PRIxPTR "lx"
 #endif
+#endif
 
 /* ----------------------------------- FILE IO */
 
@@ -461,7 +441,7 @@
 #endif
 
 #ifndef __MINGW32__
-#if _MSC_VER < 1800
+#if defined(_MSC_VER) && _MSC_VER < 1800
 inline long long int strtoll(const char *nptr, char **endptr, int base) {
     return _strtoi64(nptr, endptr, base);
 }
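
Note on the `defined(_MSC_VER) &&` guards in the port.h hunks above: inside `#if`, an identifier that is not defined as a macro is replaced by 0. For the `< 1800` test this makes the unguarded condition true on a compiler that does not define `_MSC_VER`. For the `>= 1900` test the result was already false, so the guard there presumably only avoids warnings such as `-Wundef`. The sketch below shows the behaviour with a hypothetical macro name, `SKETCH_COMPILER_VER`, that is deliberately never defined.

```cpp
// SKETCH_COMPILER_VER is a hypothetical macro that is never defined; it
// stands in for _MSC_VER on a non-MSVC compiler.

// Unguarded: the undefined identifier is replaced by 0, so 0 < 1800 holds.
#if SKETCH_COMPILER_VER < 1800
constexpr bool kUnguardedBranchTaken = true;
#else
constexpr bool kUnguardedBranchTaken = false;
#endif

// Guarded: defined() is evaluated first, so the branch is skipped.
#if defined(SKETCH_COMPILER_VER) && SKETCH_COMPILER_VER < 1800
constexpr bool kGuardedBranchTaken = true;
#else
constexpr bool kGuardedBranchTaken = false;
#endif

static_assert(kUnguardedBranchTaken, "undefined macro compares as 0");
static_assert(!kGuardedBranchTaken, "guard excludes the undefined case");
```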

diff --git a/src/windows/preamble_patcher.cc b/src/windows/preamble_patcher.cc
index ec05537..5b5fc35 100644
--- a/src/windows/preamble_patcher.cc
+++ b/src/windows/preamble_patcher.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2007, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -365,8 +365,8 @@
           IsNearRelativeJump(preamble_stub + preamble_bytes, cur_bytes) ||
           IsNearAbsoluteCall(preamble_stub + preamble_bytes, cur_bytes) ||
           IsNearRelativeCall(preamble_stub + preamble_bytes, cur_bytes)) {
-        jump_ret = PatchNearJumpOrCall(preamble_stub + preamble_bytes, 
-                                       cur_bytes, target + target_bytes, 
+        jump_ret = PatchNearJumpOrCall(preamble_stub + preamble_bytes,
+                                       cur_bytes, target + target_bytes,
                                        &jump_bytes, MAX_PREAMBLE_STUB_SIZE);
       }
       if (jump_ret == SIDESTEP_JUMP_INSTRUCTION) {
@@ -510,7 +510,7 @@
         reinterpret_cast<__int64>(target) - val > INT_MAX) {
         // We're further than 2GB from the target
       break;
-    } else if (val <= NULL) {
+    } else if (val <= 0) {
       // Less than 0
       break;
     }

diff --git a/src/windows/preamble_patcher.h b/src/windows/preamble_patcher.h
index 76f158a..701e570 100644
--- a/src/windows/preamble_patcher.h
+++ b/src/windows/preamble_patcher.h

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2007, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -111,7 +111,7 @@
 // MyTypesafeFuncPtr original_func_stub;
 // int MyTypesafeFunc(int x) { return x + 1; }
 // int HookMyTypesafeFunc(int x) { return 1 + original_func_stub(x); }
-// 
+//
 // void MyPatchInitializingFunction() {
 //   original_func_stub = PreamblePatcher::Patch(
 //              MyTypesafeFunc, HookMyTypesafeFunc);
@@ -311,9 +311,9 @@
   }
 
   // Allocates a block of memory of size MAX_PREAMBLE_STUB_SIZE that is as
-  // close (within 2GB) as possible to target.  This is done to ensure that 
-  // we can perform a relative jump from target to a trampoline if the 
-  // replacement function is > +-2GB from target.  This means that we only need 
+  // close (within 2GB) as possible to target.  This is done to ensure that
+  // we can perform a relative jump from target to a trampoline if the
+  // replacement function is > +-2GB from target.  This means that we only need
   // to patch 5 bytes in the target function.
   //
   // @param target    Pointer to target function.
@@ -346,7 +346,7 @@
   // head of a linked list of pages used to allocate blocks that are within
   // 2GB of the target.
   static PreamblePage* preamble_pages_;
-  
+
   // Page granularity
   static long granularity_;
 
@@ -436,9 +436,9 @@
   // target_function, we get to the address stop, we return
   // immediately, the address that jumps to stop_before.
   //
-  // @param stop_before_trampoline  When following JMP instructions from 
+  // @param stop_before_trampoline  When following JMP instructions from
   // target_function, stop before a trampoline is detected.  See comment in
-  // PreamblePatcher::RawPatchWithStub for more information.  This parameter 
+  // PreamblePatcher::RawPatchWithStub for more information.  This parameter
   // has no effect in 32-bit mode.
   //
   // @return Either target_function (the input parameter), or if
@@ -492,7 +492,7 @@
   static bool IsNearRelativeJump(unsigned char* target,
                                  unsigned int instruction_size);
 
-  // Helper routine that determines if a target instruction is a near 
+  // Helper routine that determines if a target instruction is a near
   // absolute call.
   //
   // @param target            Pointer to instruction.
@@ -503,7 +503,7 @@
   static bool IsNearAbsoluteCall(unsigned char* target,
                                  unsigned int instruction_size);
 
-  // Helper routine that determines if a target instruction is a near 
+  // Helper routine that determines if a target instruction is a near
   // absolute call.
   //
   // @param target            Pointer to instruction.
@@ -590,7 +590,7 @@
                                            unsigned char* target,
                                            unsigned int* target_bytes,
                                            unsigned int target_size);
-  
+
   // Helper routine that patches a 64-bit MOV instruction with a RIP-relative
   // displacement.  The target buffer must be within 2GB of the source.
   //

diff --git a/src/windows/preamble_patcher_test.cc b/src/windows/preamble_patcher_test.cc
index e4605c6..f3e0511 100644
--- a/src/windows/preamble_patcher_test.cc
+++ b/src/windows/preamble_patcher_test.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2011, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -341,7 +341,7 @@
 bool UnitTests() {
   return TestPatchWithPreambleNearRelativeCall() &&
       TestPatchWithPreambleAbsoluteJump() &&
-      TestPatchWithPreambleNearRelativeCondJump() && 
+      TestPatchWithPreambleNearRelativeCondJump() &&
       TestPatchWithPreambleShortCondJump() &&
       TestDisassembler() && TestPatchWithLongJump() &&
       TestPatchUsingDynamicStub() && PatchThenUnpatch() &&

diff --git a/src/windows/preamble_patcher_with_stub.cc b/src/windows/preamble_patcher_with_stub.cc
index 23f9d3a..d2c896c 100644
--- a/src/windows/preamble_patcher_with_stub.cc
+++ b/src/windows/preamble_patcher_with_stub.cc

@@ -1,11 +1,11 @@
 // -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 /* Copyright (c) 2007, Google Inc.
  * All rights reserved.
- * 
+ *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are
  * met:
- * 
+ *
  *     * Redistributions of source code must retain the above copyright
  * notice, this list of conditions and the following disclaimer.
  *     * Redistributions in binary form must reproduce the above
@@ -15,7 +15,7 @@
  *     * Neither the name of Google Inc. nor the names of its
  * contributors may be used to endorse or promote products derived from
  * this software without specific prior written permission.
- * 
+ *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -124,7 +124,7 @@
           - reinterpret_cast<__int64>(target) - 5;
       if (trampoline_offset > INT_MAX || trampoline_offset < INT_MIN) {
         // We're screwed.
-        SIDESTEP_ASSERT(false 
+        SIDESTEP_ASSERT(false
                        && "Preamble stub is too far from target to patch.");
         return SIDESTEP_UNEXPECTED;
       }
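
Note on the `trampoline_offset` check in the hunk above: a 5-byte near relative JMP stores a signed 32-bit displacement measured from the end of the instruction, which is why the code subtracts 5 and compares against `INT_MIN` and `INT_MAX`. The sketch below restates that range test in isolation. `FitsInRel32` is a hypothetical helper, not a gperftools function, and it takes addresses as plain integers so it can be exercised without touching executable memory.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: reports whether a 5-byte near relative JMP
// (opcode E9 followed by rel32) placed at `from` can reach `to`.
bool FitsInRel32(std::uint64_t from, std::uint64_t to) {
  // Unsigned subtraction wraps in a well-defined way; converting the result
  // to a signed 64-bit value recovers the signed distance, assuming the
  // usual two's-complement conversion.
  const std::int64_t offset = static_cast<std::int64_t>(to - from - 5);
  return offset >= std::numeric_limits<std::int32_t>::min() &&
         offset <= std::numeric_limits<std::int32_t>::max();
}
```

When this test fails, a direct 5-byte patch cannot reach its destination, which is the case the assertion message in the hunk reports.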

diff --git a/src/windows/system-alloc.cc b/src/windows/system-alloc.cc
index 9537745..bdd0392 100644
--- a/src/windows/system-alloc.cc
+++ b/src/windows/system-alloc.cc

@@ -1,10 +1,11 @@
+// -*- Mode: C++; c-basic-offset: 2; indent-tabs-mode: nil -*-
 // Copyright (c) 2013, Google Inc.
 // All rights reserved.
-// 
+//
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions are
 // met:
-// 
+//
 //     * Redistributions of source code must retain the above copyright
 // notice, this list of conditions and the following disclaimer.
 //     * Redistributions in binary form must reproduce the above
@@ -14,7 +15,7 @@
 //     * Neither the name of Google Inc. nor the names of its
 // contributors may be used to endorse or promote products derived from
 // this software without specific prior written permission.
-// 
+//
 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
@@ -46,7 +47,7 @@
 static SpinLock spinlock(SpinLock::LINKER_INITIALIZED);
 
 // The current system allocator declaration
-SysAllocator* sys_alloc = NULL;
+SysAllocator* tcmalloc_sys_alloc = NULL;
 // Number of bytes taken from system.
 size_t TCMalloc_SystemTaken = 0;
 
@@ -121,7 +122,7 @@
 static bool system_alloc_inited = false;
 void InitSystemAllocators(void) {
   VirtualSysAllocator *alloc = new (virtual_space) VirtualSysAllocator();
-  sys_alloc = tc_get_sysalloc_override(alloc);
+  tcmalloc_sys_alloc = tc_get_sysalloc_override(alloc);
 }
 
 extern PERFTOOLS_DLL_DECL
@@ -134,7 +135,7 @@
     system_alloc_inited = true;
   }
 
-  void* result = sys_alloc->Alloc(size, actual_size, alignment);
+  void* result = tcmalloc_sys_alloc->Alloc(size, actual_size, alignment);
   if (result != NULL) {
     if (actual_size) {
       TCMalloc_SystemTaken += *actual_size;
commit	20350acc2cd2f9271064477a1605129ef9585e6c
author	Brian Silverman <bsilver16384@gmail.com>	Wed Nov 17 18:19:55 2021 -0800
committer	Brian Silverman <brian.silverman@bluerivertech.com>	Wed Nov 17 18:19:55 2021 -0800
tree	d0014a348441d5d70408012fc5e485c35b45220e
parent	745610d16119f59479f84918a66456ece9d6d461