oss-security - Re: [PATCH] exec argument expansion can inappropriately trigger OOM-killer

Follow @Openwall on Twitter for new release announcements and other news

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20100830194806.GC25870@openwall.com>
Date: Mon, 30 Aug 2010 23:48:06 +0400
From: Solar Designer <solar@...nwall.com>
To: Roland McGrath <roland@...hat.com>
Cc: Kees Cook <kees.cook@...onical.com>, linux-kernel@...r.kernel.org,
	oss-security@...ts.openwall.com, Al Viro <viro@...iv.linux.org.uk>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Oleg Nesterov <oleg@...hat.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Neil Horman <nhorman@...driver.com>, linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH] exec argument expansion can inappropriately trigger OOM-killer

On Mon, Aug 30, 2010 at 03:06:16AM -0700, Roland McGrath wrote:
> And I say, if your userland process could really allocate another 200GB,
> then more power to you, you can do it with an exec too.  If you could do
> the same with a userland stack allocation, and spend all that time in
> strlen calls and then memcpy, you can do it inside execve too.  If it
> takes days, that's what you asked for, and it's your process.  It just
> ought to be every bit (or near enough) as preemptible and interruptible
> as that normal userland activity ought to be.

This makes sense to me.  However, introducing a new preemption point
may violate assumptions under which the code was written and reviewed
in the past.  In the worst case, we'd introduce/expose race conditions
allowing for privilege escalation.

> So, perhaps we want this (count already has a cond_resched in its loop):

Good point re: count() already having this (I think it did not in 2.2).

> @@ -400,6 +403,10 @@ static int copy_strings(int argc, const 
>  		int len;
>  		unsigned long pos;
>  
> +		if (signal_pending(current))
> +			return -ERESTARTNOINTR;
> +		cond_resched();

So, in current kernels, you're making it possible for more kinds of
things to change after prepare_binprm() but before
search_binary_handler().  We'd need to check for possible implications
of this.

I must admit I am not familiar with what additional kinds of things may
change when execution is preempted.  This made a significant difference
in some much older kernels (many years ago), but now that the kernel
makes a lot less use of locking most things may be changed by another
CPU even without preemption.  So does anyone have a list of what
additional risks we're exposed to, if any, when we allow preemption in
current kernels?

> Has someone reported this BUG_ON failure mode with a reproducer?

64bit_dos.c was supposed to be the reproducer, and I managed to get it
to work (as I've documented in another message earlier today).  The
prerequisites appeared to be (some of these might be specific to my
tests, though):

- 64-bit kernel with 32-bit userland support (e.g., CONFIG_IA32_EMULATION);
- 64-bit build of 64bit_dos.c;
- 32-bit build of the target program;
- no dynamic linking in the target program;
- "ulimit -s unlimited" before running the reproducer program;
- over 3 GB of RAM in the system.

> [...]  Rather than better enabling OOM killing, I think what really
> makes sense is for the nascent mm to be marked such that allocations in
> it (they'll be from get_arg_page->get_user_pages->handle_mm_fault) just
> fail with ENOMEM before it resorts to the OOM killer (or perhaps even to
> very aggressive pageout).  That should percolate back to the execve just
> failing with ENOMEM, which is nicer than OOM kill even if the OOM killer
> actually does pick exactly and only the right target.

I agree.

Thanks,

Alexander

Please check out the Open Source Software Security Wiki, which is counterpart to this mailing list.

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.