lkrg-users - Re: LKRG 0.7 CI & ED bypass

Message-Id: <1E86562D-62B5-41E9-BE0A-DD6B443A2A9D@gmail.com>
Date: Fri, 26 Jul 2019 21:03:35 +0400
From: Ilya Matveychikov <matvejchikov@...il.com>
To: lkrg-users@...ts.openwall.com
Subject: Re: LKRG 0.7 CI & ED bypass



> On Jul 26, 2019, at 8:31 PM, Adam Zabrocki <pi3@....com.pl> wrote:
> 
> Hi,
> 
> I managed to fix the PoC and reproduce the issue. The original PoC generates a
> fatal exception (on my VMs), most likely because of a #PF during a user-mode
> page reference. Since the int3 instruction generates a kprobe exception, we get
> a #PF inside int3 and end up with a fatal exception. Nevertheless, I managed to
> fix the PoC so that no #PF is generated at all, and then I reproduced the entire
> scenario. Moreover, I've improved the PoC in various ways so that it works on
> SMEP machines as well. However, this PoC does not leave the machine in a stable
> state and has some limitations:

Where is this modified PoC available?

> 
> - if SMEP is enabled, it works around 60%-70% of the time (at least on my
> various test machines). LKRG has a chance to detect it, or it may generate
> other types of crashes. The 60%-70% figure may differ depending on the
> environment, so I would not make strong assumptions about it. In any case, it
> does not work reliably every time.
> - 'text_mutex' is never released (to block CI) and the machine is very slow:
>    a. All of my machines are stuck with 99.9+% CPU usage, e.g. %Cpu(s):  0.0
> us,100.0 sys
>    b. Some of my machines are spitting OOM, depending on how overloaded the
> machine is
>    c. You can't unload any kernel module
>    d. If you try to load any kernel module, the machine will freeze
>    e. None of the kernel functionality which relies on that lock will work,
> e.g. tracing, perf, etc.
> - The kernel is trying to recover from the 'bad state' and trying to kill
> 'stuck' threads. The logs are spammed with e.g.:
> 
>    Jul 25 12:10:47 pi3-ubuntu kernel: INFO: task kworker/u480:1:47 blocked for more than 120 seconds.
>    Jul 25 12:10:47 pi3-ubuntu kernel:       Tainted: G           OE   4.8.0-53-generic #56~16.04.1-Ubuntu
>    Jul 25 12:10:47 pi3-ubuntu kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>    Jul 25 12:10:47 pi3-ubuntu kernel: kworker/u480:1  D ffff8a2dff777cf8     0    47      2 0x00000000
>    Jul 25 12:10:47 pi3-ubuntu kernel: Workqueue: events_unbound p_check_integrity [p_lkrg]
>    Jul 25 12:10:47 pi3-ubuntu kernel:  ffff8a2dff777cf8 ffff8a2dff4d56c0 ffffffff8d60d500 ffff8a2dff4d4c40
>    Jul 25 12:10:47 pi3-ubuntu kernel:  0000000000000286 ffff8a2dff778000 ffffffff8d649da4 ffff8a2dff4d4c40
>    Jul 25 12:10:47 pi3-ubuntu kernel:  00000000ffffffff ffffffff8d649da8 ffff8a2dff777d10 ffffffff8d096045
>    Jul 25 12:10:47 pi3-ubuntu kernel: Call Trace:
>    Jul 25 12:10:47 pi3-ubuntu kernel:  [<ffffffff8d096045>] schedule+0x35/0x80
>    Jul 25 12:10:47 pi3-ubuntu kernel:  [<ffffffff8d0962ee>] schedule_preempt_disabled+0xe/0x10
>    Jul 25 12:10:47 pi3-ubuntu kernel:  [<ffffffff8d097f49>] __mutex_lock_slowpath+0xb9/0x130
>    Jul 25 12:10:47 pi3-ubuntu kernel:  [<ffffffff8d097fdf>] mutex_lock+0x1f/0x30
>    Jul 25 12:10:47 pi3-ubuntu kernel:  [<ffffffffc06d9c52>] p_check_integrity+0xe2/0x1360 [p_lkrg]
>    Jul 25 12:10:47 pi3-ubuntu kernel:  [<ffffffff8c89d89b>] process_one_work+0x16b/0x4a0
>    Jul 25 12:10:47 pi3-ubuntu kernel:  [<ffffffff8c89dc1b>] worker_thread+0x4b/0x500
>    Jul 25 12:10:47 pi3-ubuntu kernel:  [<ffffffff8c89dbd0>] ? process_one_work+0x4a0/0x4a0
>    Jul 25 12:10:47 pi3-ubuntu kernel:  [<ffffffff8c89dbd0>] ? process_one_work+0x4a0/0x4a0
>    Jul 25 12:10:47 pi3-ubuntu kernel:  [<ffffffff8c8a3fb8>] kthread+0xd8/0xf0
>    Jul 25 12:10:47 pi3-ubuntu kernel:  [<ffffffff8d09aa9f>] ret_from_fork+0x1f/0x40
>    Jul 25 12:10:47 pi3-ubuntu kernel:  [<ffffffff8c8a3ee0>] ? kthread_create_on_node+0x1e0/0x1e0
> 
>   a. Depending on the kernel configuration, it might happen more or less often.
> You can configure the machine not to generate those messages.
>   b. The machine can also be configured to invoke panic() if a task is 'stuck'
> / hung as in this situation. It is controlled by the
> "/proc/sys/kernel/hung_task_panic" interface. Some distros enable panic on
> hung tasks by default.
> 
> - If you do not restore the mutexes to a valid state, your machine will
> eventually crash (it is on a slow DoS path). You can also see it in the
> process list (a lot of tasks):
>    2176 root      20   0       0      0      0 R   8.6  0.0   2:29.08 kworker/u480:5
>    2185 root      20   0       0      0      0 R   8.6  0.0   1:06.75 kworker/u480:11
>       6 root      20   0       0      0      0 R   8.3  0.0   2:46.26 kworker/u480:0
>    2178 root      20   0       0      0      0 R   8.3  0.0   2:16.42 kworker/u480:6
>    2182 root      20   0       0      0      0 R   8.3  0.0   1:38.66 kworker/u480:8
>    2190 root      20   0       0      0      0 R   8.3  0.0   0:54.86 kworker/u480:15
>    2200 root      20   0       0      0      0 R   8.3  0.0   0:46.68 kworker/u480:25
>    2207 root      20   0       0      0      0 R   8.3  0.0   0:36.62 kworker/u480:27
>    2212 root      20   0       0      0      0 R   8.3  0.0   0:17.43 kworker/u480:32
>    2213 root      20   0       0      0      0 R   8.3  0.0   0:27.97 kworker/u480:33
>    ...
>    ...
>    2221 root      20   0       0      0      0 R   7.0  0.0   0:14.28 kworker/u480:41
>    2233 root      20   0       0      0      0 R   7.0  0.0   0:10.17 kworker/u480:43 
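
The two hung-task settings named in the quoted list above ("hung_task_timeout_secs" and "hung_task_panic") can be inspected from user space. The following is only a minimal sketch in C, not part of LKRG or of the PoC; the helper names are hypothetical. It assumes the usual semantics of these sysctls: a timeout of 0 disables the hung-task check, and a non-zero hung_task_panic makes the kernel panic when a hung task is detected.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse the leading decimal integer from the text
 * of a sysctl file. Returns 0 on success, -1 if no number is found. */
static int parse_sysctl_value(const char *text, long *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || errno != 0)
        return -1;
    *out = value;
    return 0;
}

/* Hypothetical helper: read one integer sysctl, for example
 * "/proc/sys/kernel/hung_task_panic". Returns 0 on success, -1 on error. */
static int read_sysctl_value(const char *path, long *out)
{
    char buf[64];
    char *line;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    line = fgets(buf, sizeof buf, f);
    fclose(f);
    return line ? parse_sysctl_value(buf, out) : -1;
}

/* Assumption: the machine panics on a hung task only if the panic flag
 * is non-zero and the detector itself is enabled (timeout > 0). */
static int hung_task_would_panic(long panic_flag, long timeout_secs)
{
    return panic_flag != 0 && timeout_secs > 0;
}
```

A caller would read both files with read_sysctl_value() and pass the results to hung_task_would_panic(); on kernels built without the hung-task detector the files are absent and the read simply fails.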
> 

^^^ The lack of proper cleanup afterwards (unlocking text_mutex) was mentioned in
my original message. Obviously, it's wrong to leave it locked forever. But you've
got the idea of how it might be used, even though this vector is considered "known".


> We were aware of the possibility of attacking the synchronization mechanism,
> as it is documented (e.g. here
> https://www.openwall.com/presentations/CONFidence2018-LKRG-Under-The-Hood/slide-39.html). 
> How the machine reacts to that type of attack matches what I saw during
> early LKRG development.
> 
> LKRG's CI should verify the SMEP / WP CPU bits, but currently it does not.
> That is wrong, so I've prepared a simple patch which verifies the critical CPU
> bits on every CPU core, whenever CI is invoked and before any mutex/spinlock
> is taken:
> 
> https://bitbucket.org/Adam_pi3/lkrg-main/commits/13a9b5c3a93549b5f0ac1f8317ced3baefbfa501
> 
> This patch always stops the current PoC (on machines with SMEP).
> As a workaround you can also enable /proc/sys/kernel/hung_task_panic and tune
> the timeout value.
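
The per-core check of the SMEP / WP bits described in the quoted text can be illustrated with a small user-space sketch in C. This is not the code from the linked patch. The struct and function names are hypothetical, and the register values are passed in as plain integers, whereas a kernel module would read CR0 and CR4 on each core. The only facts relied on are the architectural bit positions: WP is bit 16 of CR0 and SMEP is bit 20 of CR4.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural bit positions on x86. */
#define CR0_WP_BIT   (UINT64_C(1) << 16)  /* CR0.WP: write protect */
#define CR4_SMEP_BIT (UINT64_C(1) << 20)  /* CR4.SMEP */

/* Hypothetical snapshot of the control registers of one CPU core. */
struct cpu_cr_snapshot {
    uint64_t cr0;
    uint64_t cr4;
};

/* True if the critical bits are still set on this core. SMEP is only
 * required when the CPU supports it (smep_supported). */
static bool critical_bits_intact(const struct cpu_cr_snapshot *cpu,
                                 bool smep_supported)
{
    if (!(cpu->cr0 & CR0_WP_BIT))
        return false;
    if (smep_supported && !(cpu->cr4 & CR4_SMEP_BIT))
        return false;
    return true;
}

/* Return the index of the first core whose critical bits were cleared,
 * or -1 if every core looks sane. */
static int first_tampered_cpu(const struct cpu_cr_snapshot *cpus, size_t n,
                              bool smep_supported)
{
    for (size_t i = 0; i < n; i++) {
        if (!critical_bits_intact(&cpus[i], smep_supported))
            return (int)i;
    }
    return -1;
}
```

Running such a check before taking any lock matters here because the PoC never releases 'text_mutex': a check placed after the lock acquisition would never execute.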
> 

Do you really think SMEP can somehow prevent exploitation? All this dancing
around SMEP bypass detection is worth nothing in the end, since one can build a pure
ROP exploit without executing a single byte of user-space code from the kernel.

Ilya