Date: Tue, 31 Jul 2018 10:25:07 -0400 From: Rich Felker <dalias@...c.org> To: musl@...ts.openwall.com Subject: Re: malloc implementation survey: omalloc On Tue, Jul 31, 2018 at 09:49:30AM +0200, Markus Wichmann wrote: > > It ran well on everything from a arm cortex-m0 to an intel core i7. The > > code compiled to 504 .text bytes, on cortex-m0, iirc. > > > > I wrote it originally for using on the kernel-side of an rtos, but it could > > easily be extended to make a syscall when a process runs out of ram. > > > > Obviously, a shortcoming is that the memory blocks must be PoT and there is > > the potential for fragmentation. Otherwise though, the meta-information is > > intrinsic within the pointer, which obliviates the need for external > > storage. > > > > That does sound interesting, but I don't really follow. The meta-info > which we need to save is the length of the object returned. How can you > encode that in the pointer without any external storage? Using internal > storage? Yeah, that is where we are right now, and that means there's > only one buffer overflow between us and oblivion at any given time. This isn't the case with a minimally reasonable design, including musl's current one. The chunk header and footer must match; if they don't, it's proof of UB and the allocator terminates the program. The current implementation is not hardened against replacing real headers with apparently valid, matching ones, but this hardening can be achieved by xor'ing them with a canary. One could even make an array of canaries, indexed by some bits or a hash of the pointer, so that stealing the canary for one chunk doesn't let you forge another one. In many ways, internal storage of metadata is *more hardened* because the allocator is able to catch overflows this way. With no internal data for the allocator to consistency-check, attackers can just target application data in adjacent chunks and don't have to worry that the allocator will catch the attack and terminate. > What I am getting at is this: We can't just benchmark a few algorithms > and stick with the fastest one. Nor is a simple algorithmic analysis > enough, since often enough, simple lists can be replaced with more > complicated containers yielding better performance for some loads. It's > complicated... Because fast and not wasting memory or producing catastrophic fragmentation are usually mutually exclusive. Eliminating runaway waste and fragmentation without a major performance regression is the primary goal. Being faster is a secondary goal. I know the "main source" of bad behavior in the current design can be eliminated by getting rid of the fine-grained locks and switching to a global lock. It can probably also be solved by holding multiple locks at once and a complex lock order protocol, but it's not clear that wouldn't be slower than just using a global lock. Major new direction in design has a better chance of finding something where locality of access needed for allocator operations is not a significant tradeoff with fragmentation or over-allocation. Rich
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.