Date: Mon, 1 Jan 2018 19:15:50 -0800
From: John Reiser <>
Subject: Re: [PATCH] Add comments to i386 assembly source

On 01/01/2018 13:49 UTC, Rich Felker wrote:
> On Mon, Jan 01, 2018 at 02:57:02PM -0800, John Reiser wrote:
>> There's a bug.  clone() is a user-level function that can be used
>> independently of the musl internal implementation of threads.
>> Thus when clone() in musl/src/linux/clone.c calls
>>          return __syscall_ret(__clone(func, stack, flags, arg, ptid, tls, ctid));
>> then the i386 implementation of __clone has no guarantee about
>> the value in %gs, and it is a bug to assume that (%gs >> 3)
>> fits in 8 bits.
> The ABI is that at function call or any time a signal could be
> received, %gs must always be a valid segment register value reflecting
> the current thread's thread pointer. If this is violated, the program
> has undefined behavior.

More than one segment descriptor can designate the same subset
of the linear address space.  Duplicate the segment descriptor
into a slot whose selector index (selector >> 3) is >= 256, and load
%gs with that duplicate selector before calling clone().

>> The code in musl/src/thread/i386/clone.s wastes up to 12 bytes
>> when aligning the new stack, by aligning before [pre-]allocating
>> space for the one argument to the thread function.
> I suspect the initial value happens to be aligned anyway in which case
> reserving 16 bytes and aligning to 16 is the same as reserving 4 and
> aligning to 16. If you think it's not, I don't mind changing if you
> can do careful testing to make sure it doesn't introduce any bugs.

This is another bug!  Consider the valid code:
	void **lo_stack = malloc(5 * sizeof(void *));
	/* malloc() guarantees 16-byte alignment of lo_stack */
	clone(func, &lo_stack[5], ...);

then __clone() does:
	and $-16,%ecx  /* &lo_stack[4] */
	sub $ 16,%ecx  /* &lo_stack[0] */
	mov %ecx,%esp  /* new thread: implicit action of __NR_clone system call */
	call *%eax  /* OUT-OF-BOUNDS:  lo_stack[-1] = return address */

Thus, starting the thread function has scribbled outside the allocated area,
even though lo_stack[] is large enough for the call when the code I showed
is used instead:
	lea -NBPW(arg2),%ecx  /* &lo_stack[4] */
	and $-16,%ecx  /* still &lo_stack[4] */
	mov %ecx,%esp  /* new thread: implicit action of __NR_clone system call */
	call *%eax  /* lo_stack[3] = return address */

The danger is not "new bugs", but rather the exposure of latent bugs that
were masked by the less strict old code.  For instance, if the thread
function actually has two formal parameters, or if it uses va_arg()
to read beyond the first actual argument, then the tighter code is more
likely to expose the problem.

