|
|
Message-ID: <22b1db07-aa95-ff84-cef0-708f6f968c45@virtuozzo.com>
Date: Thu, 28 Jun 2018 16:08:29 +0300
From: Vasily Averin <vvs@...tuozzo.com>
To: Solar Designer <solar@...nwall.com>, owl-dev@...ts.openwall.com
Subject: Re: 32-bit syscall breakage in -431 kernel with KAISER
On 06/26/2018 10:13 PM, Solar Designer wrote:
> On Tue, Jun 26, 2018 at 08:00:49PM +0200, Pavel Kankovsky wrote:
>> It turns out the handler is invoked with an unexpected stack frame in the
>> "trampoline stack" (the top of the "trampoline stack", as stored in
>> TSS.rsp0, is 0xffff810001a36278 in this example):
>>
>> (gdb) x/6g $rsp
>> 0xffff810001a36248: 0x00000000b7f81ea1 0x0000000000000073
>> 0xffff810001a36258: 0x0000000000000246 0x00000000bfa78274
>> 0xffff810001a36268: 0x000000000000007b 0x000000000000007b
>>
>> There should be five entries (rip at the bottom, cs, eflags, rsp, and ss
>> at the top) only but the actual stack contains one mysterious and bogus
>> extra entry (the other 0x000000000000007b) at the top, shifting everything
>> else down by 8 bytes.
>>
>> This stack frame (including the bogus extra entry) is copied to the
>> regular kernel stack. As far as I can tell, the same extra entry appears
>> in other interrupt handlers, e.g. common_interrupt.
>>
>> Functions that get an explicit pointer to struct pt_regs are not affected
>> but anything that expects to find pt_regs at the top of the stack (e.g.
>> compat_alloc_user_space via task_pt_regs(current)) breaks.
>>
>> It seems the extra entry is added because the top of the trampoline stack
>> is not aligned to 16 bytes and the CPU does not like it and enforces the
>> alignment, shifting the whole stack down wrt. the expected layout as a
>> result.
>
> AMD64 Architecture Programmer's Manual, Volume 2, Section 8.9 mentions
> 16-byte alignment:
>
> "Interrupt-Stack Alignment. In legacy mode, the interrupt-stack pointer
> can be aligned at any address boundary. Long mode, however, aligns the
> stack on a 16-byte boundary. This alignment is performed by the
> processor in hardware before pushing items onto the stack frame. The
> previous RSP is saved unconditionally on the new stack by the interrupt
> mechanism. A subsequent IRET instruction automatically restores the
> previous RSP."
>
> "Although the RSP alignment is always performed in long mode, it is only
> of consequence when the interrupted program is already running at CPL=0,
> and it is generally used only within the operating-system kernel. The
> operating system should put 16-byte aligned RSP values in the TSS for
> interrupts that change privilege levels."
>
>> I have modified struct tss_struct in include/asm-x86_64/processor.h to
>> include an extra "unsigned long stack_padding" between stack_canary and
>> stack to make stack 16-byte aligned...
>>
>>
>> ---snip---
>> --- include/asm-x86_64/processor.h.orig 2018-06-26 12:11:17 +0000
>> +++ include/asm-x86_64/processor.h 2018-06-26 16:50:55 +0000
>> @@ -269,6 +269,7 @@
>> */
>> #ifndef __GENKSYMS__
>> unsigned long stack_canary;
>> + unsigned long stack_padding;
>> unsigned long stack[64];
>> #endif
>> } __attribute__((packed, __aligned__(PAGE_SIZE)));
>> ---snip---
>>
>>
>> ...and lo! 32-bit ifconfig works as expected.
>
> Wow. But per my review of the full struct tss_struct, the stack[] field
> offset is:
>
> 4+8*5+4*2+2*2+1025*8+8 = 8264
Alexander,
seems you're wrong
in my version of rhel5-based -123.1 kernel
crash> tss_struct -o
struct tss_struct {
[0x0] u32 reserved1;
[0x4] u64 rsp0;
[0xc] u64 rsp1;
[0x14] u64 rsp2;
[0x1c] u64 reserved2;
[0x24] u64 ist[7];
[0x5c] u32 reserved3;
[0x60] u32 reserved4;
[0x64] u16 reserved5;
[0x66] u16 io_bitmap_base;
[0x68] unsigned long io_bitmap[1025];
[0x2070] unsigned long stack_canary;
[0x2078] unsigned long stack[64];
}
SIZE: 0x3000
crash> tss_struct -od
struct tss_struct {
[0] u32 reserved1;
[4] u64 rsp0;
[12] u64 rsp1;
[20] u64 rsp2;
[28] u64 reserved2;
[36] u64 ist[7];
[92] u32 reserved3;
[96] u32 reserved4;
[100] u16 reserved5;
[102] u16 io_bitmap_base;
[104] unsigned long io_bitmap[1025];
[8304] unsigned long stack_canary;
[8312] unsigned long stack[64];
}
SIZE: 12288
Seems you missed that 'ist' filed is an array
>
> So it's not 16-byte aligned. But that's the start of stack[], and we
> need the stack pointer to be aligned. Yet I suppose aligning stack[]
> itself is in fact the most appropriate fix.
>
> Reviewing the diffs between e.g. -423 and -431, I see this stack[] in
> struct tss is in fact newly added:
>
> @@ -245,7 +248,29 @@
> * 8 bytes, for an extra "long" of ~0UL
> */
> unsigned long io_bitmap[IO_BITMAP_LONGS + 1];
> -} __attribute__((packed)) ____cacheline_aligned;
> + /*
> + *
> + * The Intel SDM says (Volume 3, 7.2.1):
> + *
> + * Avoid placing a page boundary in the part of the TSS that the
> + * processor reads during a task switch (the first 104 bytes). The
> + * processor may not correctly perform address translations if a
> + * boundary occurs in this area. During a task switch, the processor
> + * reads and writes into the first 104 bytes of each TSS (using
> + * contiguous physical addresses beginning with the physical address
> + * of the first byte of the TSS). So, after TSS access begins, if
> + * part of the 104 bytes is not physically contiguous, the processor
> + * will access incorrect information without generating a page-fault
> + * exception.
> + *
> + * There are also a lot of errata involving the TSS spanning a page
> + * boundary. Assert that we're not doing that.
> + */
> +#ifndef __GENKSYMS__
> + unsigned long stack_canary;
> + unsigned long stack[64];
> +#endif
> +} __attribute__((packed, __aligned__(PAGE_SIZE)));
>
> Do you think this is a RHEL5 bug?
>
> Thanks,
>
> Alexander
>
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.