Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date: Sun, 24 Jul 2011 14:33:25 +0400
From: Vasiliy Kulikov <segoon@...nwall.com>
To: musl <musl@...ts.openwall.com>
Subject: holywar: malloc() vs. OOM

Rich,

This is more a question about your malloc() failure policy for musl than
an actual proposal.

When brk() or mmap() fails, libc usually returns NULL to the program.
If the program wants to gracefully handle OOM, it may do it.  If not, it
will likely generate SIGSEGV and will be killed without any problem.

However, there are potential issues with this behaviour:

1) If the program doesn't handle OOM at all, it can lead to security
problems.

 a) if NULL page is not mmap'ed (the case of all nonroot apps and most
of root apps), the page starting from vm.mmap_min_addr still may present
in the process' vm.  For some distros only one page was guarded this way
in the past.  So, if the allocation is bigger than ~4-64kb, and the
write begins from the end of the page, then some bad things may happen
before SIGSEGV (the worst case is privilege escalation).  This is a
patological case, I didn't see such cases myself, but it's possible in
theory.

 b) if NULL page is mmap'ed, the application might not identify OOM at
all as the page is mmap'ed and SIGSEGV is not sent.  (Yes, apps mmap'ing
NULL page must handle OOM, but see (2).)

2) If the program handle OOM, it might do it very bad way.  The OOM
handling code path is almost always not tested and contain bugs.  Even the
kernel, which obviously must handle OOM, doesn't properly handle it
(I found bugs in OOM handling code much more often than in other error
handling code) because this code is not tested.  DBUS daemon, which is
closely connected with init in modern distros, must not fail on OOM by
design (otherwise init would fail and the whole system would
hang/reboot), and it took much time to remove silent bugs in this code:

(http://blog.ometer.com/2008/02/04/out-of-memory-handling-d-bus-experience/)


In theory, these are bugs of applications and not of libc, and they
should be fully handled in programs, not in libc.  Period.

But looking at the problem from the pragmatic point of view we'll see
that libc is actually the easiest place where the problem may be 
workarounded (not fixed, surely).  The workaround would be simply
raising SIGKILL if malloc() fails (either because of brk() or mmap()).
For the rare programs craving to handle OOM such code should be used:

#define _OOM_MAY_FAIL_
#include <stdlib.h>

Then the workaround is disabled.


Probably I overestimate the importance of OOM errors, and (1) in
particular.   However, I think it is worth discussing.


Thanks,

-- 
Vasiliy

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.