Date: Wed, 28 Oct 2015 22:10:14 -0400
From: Rich Felker <dalias@...c.org>
To: nommu@...mu.org
Cc: musl@...ts.openwall.com
Subject: Behavior of mmap on Linux/nommu, & musl dynamic linker

Presently musl's dynamic linker is not behaving entirely correctly on
nommu systems, and I think I understand the issues, but

Most of this applies only to non-FDPIC (plain ELF with constant
displacement between segments) loading, so I'll start with that part:

What we generally do to map libraries in this case on systems with MMU
is start out with one large mmap, starting at the beginning of the
lowest-address PT_LOAD segment, using the permissions of that segment,
but whose length is the total amount of address space that needs to be
reserved. (Note that some of this mapping may be past the end of the
file, in which case access may SIGBUS, but we never intend to access
it so that doesn't matter.) After that, we mmap additional segments
over top of parts of that address range using MAP_FIXED. This yields a
minimum number of mmap calls/changes to the vm layout of the process
and thus very efficient loading for small libraries where syscall time
dominates relocation time.

Unfortunately, MAP_FIXED is not accepted at all on Linux/nommu; it
unconditionally produces EINVAL. In principle it should be possible to
use MAP_FIXED like this to replace parts of private mappings, but it's
no more efficient than simply using read/memcpy to replace the data.
So what musl is doing right now is handling the failure of MAP_FIXED
with EINVAL by using read to load the file contents at the appropriate
addresses within the range obtained by the first mmap call.
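
To make the sequence above concrete, here is a minimal sketch in C of
the reserve-then-overlay pattern with the read fallback. It is an
illustration only, not musl's actual code: place_segment and demo_load
are hypothetical names, and a two-page temporary file stands in for a
library with two PT_LOAD segments. On a system with MMU the MAP_FIXED
path is taken; the read path is only reached if MAP_FIXED fails with
EINVAL.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: place one later segment inside a range that was
 * already reserved by a single large mmap. Returns 0 on success. */
static int place_segment(unsigned char *dst, size_t len, int prot,
                         int fd, off_t off)
{
	/* Preferred path (MMU): replace part of the reservation in place. */
	if (mmap(dst, len, prot, MAP_PRIVATE | MAP_FIXED, fd, off) != MAP_FAILED)
		return 0;
	if (errno != EINVAL)
		return -1;

	/* Fallback when MAP_FIXED yields EINVAL: copy the file contents
	 * into the existing range with read. The destination must be
	 * memory that can actually be written. */
	if (lseek(fd, off, SEEK_SET) < 0)
		return -1;
	while (len) {
		ssize_t r = read(fd, dst, len);
		if (r < 0) {
			if (errno == EINTR) continue;
			return -1;
		}
		if (r == 0) {		/* past end of file: zero the rest */
			memset(dst, 0, len);
			break;
		}
		dst += r;
		len -= (size_t)r;
	}
	return 0;
}

/* Demo: returns 0 if both "segments" end up at the expected offsets. */
int demo_load(void)
{
	size_t pg = (size_t)sysconf(_SC_PAGESIZE);
	char path[] = "/tmp/maptestXXXXXX";
	int fd = mkstemp(path);
	if (fd < 0) return -1;
	unlink(path);

	/* Page 0 is filled with 'A', page 1 with 'B'. */
	unsigned char *buf = malloc(2 * pg);
	if (!buf) { close(fd); return -1; }
	memset(buf, 'A', pg);
	memset(buf + pg, 'B', pg);
	ssize_t w = write(fd, buf, 2 * pg);
	free(buf);
	if (w != (ssize_t)(2 * pg)) { close(fd); return -1; }

	/* One large mapping reserves three pages; the last one lies past
	 * the end of the file and is never accessed. */
	unsigned char *base = mmap(0, 3 * pg, PROT_READ, MAP_PRIVATE, fd, 0);
	if (base == MAP_FAILED) { close(fd); return -1; }

	/* Overlay the second "segment" on the middle of the reservation. */
	int rc = place_segment(base + pg, pg, PROT_READ | PROT_WRITE,
	                       fd, (off_t)pg);
	int ok = rc == 0 && base[0] == 'A' && base[pg] == 'B';

	munmap(base, 3 * pg);
	close(fd);
	return ok ? 0 : -1;
}
```

Calling demo_load() from a test program should return 0 on an ordinary
Linux system. Note that the fallback in this sketch writes into the
range returned by the first mmap, which is exactly the step whose
validity is in question when that range may be shared.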

Now here's where we hit the next problem: for non-writable private
mappings, Linux/nommu sets the VM_MAYSHARE flag (indicated in
/proc/%d/maps with a lowercase 's' instead of 'p') and in principle
the file operations backend is allowed to assign an address range
that's actually shared with other processes mapping the file (or the
actual rom/cache/whatever copy of the file).

This behavior actually justifies the choice to disallow MAP_FIXED; if
the address range obtained for a non-writable private map is actually
shared memory that other processes may be using for their own
non-writable private or shared maps of the same file, then using
MAP_FIXED to reassign the address range to different use is not
possible.

What's also likely not valid is the way musl is using read to fill in
the additional segments' contents. Since the first segment is
generally text (non-writable), the map returned by mmap could
potentially be a shared map, in which case we would clobber memory
shared with other processes. In practice, this isn't happening, but
I'm not sure why. The following comment in mm/nommu.c may suggest a
reason:

	/* if we want to share, we need to check for regions created by other
	 * mmap() calls that overlap with our proposed mapping
	 * - we can only share with a superset match on most regular files
	 * - shared mappings on character devices and memory backed files are
	 *   permitted to overlap inexactly as far as we are concerned for in
	 *   these cases, sharing is handled in the driver or filesystem rather
	 *   than here
	 */

For libraries without debug info or with large bss, the total mapping
length is likely to be larger than the total file length, in which
case the tests for shareability may always fail -- I haven't actually
checked this because the logic is complex and hard to follow, but it
seems plausible. However obviously something needs to be changed.
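
One way to observe the 's' versus 'p' distinction mentioned above is
to look up the mapping that contains a given address in
/proc/self/maps and report the fourth character of its permissions
field. The following is a rough sketch in C; mapping_share_flag is a
hypothetical helper name, and the parsing assumes the usual
"start-end perms ..." line layout of that file.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 's' or 'p' for the mapping containing
 * addr, or 0 if no mapping is found or the file cannot be read. */
int mapping_share_flag(const void *addr)
{
	FILE *f = fopen("/proc/self/maps", "r");
	if (!f) return 0;

	uintptr_t a = (uintptr_t)addr;
	char line[512];
	int at_line_start = 1;
	int flag = 0;

	while (fgets(line, sizeof line, f)) {
		/* Skip the tail of a line that did not fit in the buffer. */
		int starts = at_line_start;
		at_line_start = strchr(line, '\n') != NULL;
		if (!starts) continue;

		unsigned long lo, hi;
		char perms[5];
		if (sscanf(line, "%lx-%lx %4s", &lo, &hi, perms) != 3)
			continue;
		if (strlen(perms) == 4 && a >= lo && a < hi) {
			flag = perms[3];	/* 's' = shared, 'p' = private */
			break;
		}
	}
	fclose(f);
	return flag;
}
```

Passing the base address returned by the first mmap of a library to
this helper would show whether that particular map was flagged as
shareable. It only reports the flag as shown in the maps file; it says
nothing about whether the memory is in fact shared with another
process.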

What I think I should do is detect the failure of MAP_FIXED the first
time it fails, unmap the (now-useless) initial map, and switch to
using private anonymous maps followed by read for setting up any
address ranges that need writable subranges. uClibc uses a slightly
simpler approach without first trying MAP_FIXED and falling back (just
allocating anonymous memory to begin with), but musl supports both
mmu-ful and nommu runtime environments, and dropping support for
shared text and COW on mmu-ful runtime environments is not a
reasonable option.

Only one small part of all this applies to musl's FDPIC ELF loader:
bss allocation. When we map a writable PT_LOAD segment with bss
(p_memsz>p_filesz), the initial mmap is potentially larger than the
file, and a second mmap with MAP_FIXED is used to replace the bss part
of the mapping with anonymous zero pages. This operation fails on
nommu right now, and we fall back to memset, which would potentially
SIGBUS in a runtime environment with mmu. So there is an assumption
encoded here that nommu does not SIGBUS, that the full mapping length
requested actually has memory underlying it. This seems reasonable,
but I'd welcome feedback if anyone has good reason to disagree.

Rich