Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 8 Dec 2010 10:34:38 -0500
From: Nelson Elhage <>
Subject: Re: kernel: Dangerous interaction between
 clear_child_tid, set_fs(), and kernel oopses

On Wed, Dec 08, 2010 at 07:51:18AM +0300, Solar Designer wrote:
> Nelson, Dan, Steve -
> It's been a few days, so I'll over-quote a little bit.  Please see below:
> On Thu, Dec 02, 2010 at 12:21:14AM -0500, Nelson Elhage wrote:
> > I've discovered an interesting interaction in the Linux kernel between the
> > clear_child_tid feature of clone(2), and the set_fs() function used internally
> > in the kernel to temporarily disable access_ok() checking of userspace pointers.
> > 
> > Under some (not totally uncommon) circumstances, it is possible for a user to
> > leverage this interaction to turn a kernel oops or BUG() into a write of an
> > integer 0 to a user-controlled address in kernel memory.
> > 
> > I'm not sure if this merits a CVE or not; It is (as far as I can tell) only a
> > problem in the presence of another security bug, but it potentially makes a
> > large class of bugs significantly more dangerous (DoS -> privesc).
> > 
> > Reference:
> >
> To me, things like this are more important than individual NULL pointer
> dereference bugs or the like.  So if those get CVEs, this one definitely
> should as well.

Yeah, as you saw, Dan requested a CVE separately and this is CVE-2010-4258.

> Nelson - why are you proposing adding set_fs(USER_DS); not to the very
> beginning of do_exit(), but below a few calls/checks?  I don't think
> there's any performance improvement from that, and it feels
> "theoretically safer" to return to the sane/safe state as soon as
> possible.  I am currently looking at do_exit() in OpenVZ's RHEL5-based
> 2.6.18-194.26.1.el5.028stab079.1 - it does a bit more work before
> reaching the place you patch.  So I am tempted to introduce
> set_fs(USER_DS); as the very first statement in do_exit() instead.

I put the set_fs() after the in_interrupt() check, since set_fs() frobs the
current thread_info, and IIUC, we aren't guaranteed to have one on an interrupt
stack. So I wanted to preserve that check/immediate panic(), rather than
possible triggering a recursive fault or other weird behavior. Other than that,
I stuck it as early as possible.

If I'm wrong about it possibly failing on an interrupt stack, then yeah, it
might make sense to put it even earlier. Or to rearrange things so that the flow
is "check interrupt -> set_fs() -> everything else".

> Did you check whether 2.4 kernels are affected as well?

I have not. My man pages claim that CLONE_CHILD_CLEARTID is new since Linux
2.5.49, so that specific hole probably isn't there, though.

- Nelson

> Thanks,
> Alexander

Powered by blists - more mailing lists

Please check out the Open Source Software Security Wiki, which is counterpart to this mailing list.

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.