Date: Wed, 18 Mar 2020 04:39:39 +0100
From: Adam Zabrocki <pi3@....com.pl>
To: lkrg-users@...ts.openwall.com
Subject: Re: lkrg is freezing whole server during boot.
 Kernel-5.5.9

Hi,

It looks like you have a running task that the kernel is unable to freeze:

    "Freezing of tasks failed after 20.002 seconds (1 tasks refusing to freeze, 
    wq_busy=0)"

It looks like you are hitting a kernel problem rather than an LKRG one.
However, you should try to find out which task is problematic and why it
can't be frozen. Maybe it is hitting some bug?
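
One way to find the task is to scan the kernel log for the task header line
that follows each "Freezing of tasks failed" message. The sketch below is only
an illustration: the function name and the regular expression are assumptions
based on the log format quoted further down, and it expects unwrapped log
lines (for example straight from the journal), not the line-wrapped copy in
this mail.

```python
import re

# Header line printed for each task that could not be frozen, e.g.
#   "... kernel: nft             D    0   922      1 0x00004004"
# Fields: command name, state, free stack, PID, parent PID, flags.
# The pattern is an assumption derived from the log quoted in this thread.
TASK_RE = re.compile(
    r"kernel:\s+(?P<comm>\S+)\s+(?P<state>[A-Z])\s+\d+\s+"
    r"(?P<pid>\d+)\s+\d+\s+0x[0-9a-fA-F]+\s*$"
)


def refusing_tasks(log_text):
    """Return sorted (comm, pid, state) tuples for tasks that refused to freeze."""
    found = set()
    after_failure = False
    for line in log_text.splitlines():
        if "Freezing of tasks failed" in line:
            # Task dumps follow this message.
            after_failure = True
            continue
        if "Restarting tasks" in line:
            # End of the dump for this freeze attempt.
            after_failure = False
            continue
        if after_failure:
            match = TASK_RE.search(line)
            if match:
                found.add(
                    (match.group("comm"), int(match.group("pid")), match.group("state"))
                )
    return sorted(found)
```

Feeding it the saved kernel log as one string returns each offending task
once, even when the same failure repeats on every freeze attempt.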

From the stack trace, though, I can see that it might be related to the
"nf_tables_set.ko" module. Can you load LKRG after this module is loaded? Or,
put the other way around, can you guarantee the module is loaded before you
try to load LKRG?
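
A boot script could check that ordering before loading LKRG. The following is
a minimal sketch, not part of LKRG: the helper names are hypothetical, and the
list of required modules is an assumption taken from this thread (the stack
trace shows nf_tables and nfnetlink). It relies only on the first field of
each /proc/modules line being the module name.

```python
def loaded_modules(proc_modules_text):
    """Parse the content of /proc/modules into a set of module names."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            # The first whitespace-separated field is the module name.
            names.add(fields[0])
    return names


def missing_modules(required, proc_modules_text):
    """Return the required modules that are not loaded yet, in the given order."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in required if name not in loaded]
```

For example, read /proc/modules, call
missing_modules(["nf_tables_set", "nf_tables", "nfnetlink"], text), and load
LKRG only when the returned list is empty.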


Thanks,
Adam


On Wed, Mar 18, 2020 at 01:39:18AM +0100, bryn1u85 . wrote:
> Hey guys,
> 
> I have a strange problem with LKRG. It compiled fine from the newest source
> on gitlab. After that I loaded LKRG and it worked well too, but after a
> reboot my server froze. I investigated and found something: the log shows
> that freezing of tasks fails after 20 seconds, over and over.
> 
> Mar 18 00:56:46 localhost.localdomain kernel: Freezing user space processes
> ...
> *Mar 18 00:56:46 localhost.localdomain kernel: Freezing of tasks failed
> after 20.002 seconds (1 tasks refusing to freeze, wq_busy=0):*
> Mar 18 00:56:46 localhost.localdomain kernel: nft             D    0   922
>      1 0x00004004
> Mar 18 00:56:46 localhost.localdomain kernel: Call Trace:
> Mar 18 00:56:46 localhost.localdomain kernel:  ? __schedule+0x2e4/0x780
> Mar 18 00:56:46 localhost.localdomain kernel:  ?
> insn_get_modrm.part.0+0x5c/0xe0
> Mar 18 00:56:46 localhost.localdomain kernel:  schedule+0x50/0xc0
> Mar 18 00:56:46 localhost.localdomain kernel:  schedule_timeout+0x20a/0x300
> Mar 18 00:56:46 localhost.localdomain kernel:  ? cpumask_next+0x1b/0x20
> Mar 18 00:56:46 localhost.localdomain kernel:
>  wait_for_completion+0x119/0x160
> Mar 18 00:56:46 localhost.localdomain kernel:  ? wake_up_q+0xa0/0xa0
> Mar 18 00:56:46 localhost.localdomain kernel:  __wait_rcu_gp+0x139/0x140
> Mar 18 00:56:46 localhost.localdomain kernel:  synchronize_rcu+0x68/0x70
> Mar 18 00:56:46 localhost.localdomain kernel:  ? __call_rcu+0x4e0/0x4e0
> Mar 18 00:56:46 localhost.localdomain kernel:  ?
> __bpf_trace_rcu_utilization+0x20/0x20
> Mar 18 00:56:46 localhost.localdomain kernel:
>  __nf_tables_abort+0x1f2/0x8f0 [nf_tables]
> Mar 18 00:56:46 localhost.localdomain kernel:  nf_tables_abort+0x1a/0x40
> [nf_tables]
> Mar 18 00:56:46 localhost.localdomain kernel:
>  nfnetlink_rcv_batch+0x4a1/0x700 [nfnetlink]
> Mar 18 00:56:46 localhost.localdomain kernel:  ?
> __nla_validate_parse+0x53/0x8a0
> Mar 18 00:56:46 localhost.localdomain kernel:  ? security_capable+0x42/0x60
> Mar 18 00:56:46 localhost.localdomain kernel:  nfnetlink_rcv+0x117/0x163
> [nfnetlink]
> Mar 18 00:56:46 localhost.localdomain kernel:  netlink_unicast+0x194/0x230
> Mar 18 00:56:46 localhost.localdomain kernel:  netlink_sendmsg+0x232/0x470
> Mar 18 00:56:46 localhost.localdomain kernel:  sock_sendmsg+0x61/0x70
> Mar 18 00:56:46 localhost.localdomain kernel:  ____sys_sendmsg+0x207/0x250
> Mar 18 00:56:46 localhost.localdomain kernel:  ___sys_sendmsg+0x8c/0xd0
> Mar 18 00:56:46 localhost.localdomain kernel:  __sys_sendmsg+0x5c/0xa0
> Mar 18 00:56:46 localhost.localdomain kernel:  do_syscall_64+0x74/0x433
> Mar 18 00:56:46 localhost.localdomain kernel:
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> Mar 18 00:56:46 localhost.localdomain kernel: RIP: 0033:0x7fd5804ccbb7
> Mar 18 00:57:06 localhost.localdomain kernel: Code: Bad RIP value.
> Mar 18 00:57:06 localhost.localdomain kernel: RSP: 002b:00007ffd1a4ff278
> EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> Mar 18 00:57:06 localhost.localdomain kernel: RAX: ffffffffffffffda RBX:
> 0000000000000001 RCX: 00007fd5804ccbb7
> Mar 18 00:57:06 localhost.localdomain kernel: RDX: 0000000000000000 RSI:
> 00007ffd1a500300 RDI: 0000000000000003
> Mar 18 00:57:06 localhost.localdomain kernel: RBP: 00007ffd1a500400 R08:
> 0000000000000000 R09: 00007ffd1a4ff280
> Mar 18 00:57:06 localhost.localdomain kernel: R10: 00007fd580694398 R11:
> 0000000000000246 R12: 0000000000000003
> Mar 18 00:57:06 localhost.localdomain kernel: R13: 0000000000000016 R14:
> 00007ffd1a4ff290 R15: 00007ffd1a500460
> Mar 18 00:57:06 localhost.localdomain kernel: OOM killer enabled.
> Mar 18 00:57:06 localhost.localdomain kernel: Restarting tasks ... done.
> Mar 18 00:57:06 localhost.localdomain kernel: Freezing user space processes
> ...
> *Mar 18 00:57:06 localhost.localdomain kernel: Freezing of tasks failed
> after 20.008 seconds (1 tasks refusing to freeze, wq_busy=0):*
> Mar 18 00:57:06 localhost.localdomain kernel: nft             D    0   922
>      1 0x00004004
> Mar 18 00:57:06 localhost.localdomain kernel: Call Trace:
> Mar 18 00:57:06 localhost.localdomain kernel:  ? __schedule+0x2e4/0x780
> Mar 18 00:57:06 localhost.localdomain kernel:  ?
> insn_get_modrm.part.0+0x5c/0xe0
> Mar 18 00:57:06 localhost.localdomain kernel:  schedule+0x50/0xc0
> Mar 18 00:57:06 localhost.localdomain kernel:  schedule_timeout+0x20a/0x300
> Mar 18 00:57:06 localhost.localdomain kernel:  ? cpumask_next+0x1b/0x20
> Mar 18 00:57:06 localhost.localdomain kernel:
>  wait_for_completion+0x119/0x160
> Mar 18 00:57:06 localhost.localdomain kernel:  ? wake_up_q+0xa0/0xa0
> Mar 18 00:57:06 localhost.localdomain kernel:  __wait_rcu_gp+0x139/0x140
> Mar 18 00:57:06 localhost.localdomain kernel:  synchronize_rcu+0x68/0x70
> Mar 18 00:57:06 localhost.localdomain kernel:  ? __call_rcu+0x4e0/0x4e0
> Mar 18 00:57:06 localhost.localdomain kernel:  ?
> __bpf_trace_rcu_utilization+0x20/0x20
> Mar 18 00:57:06 localhost.localdomain kernel:
>  __nf_tables_abort+0x1f2/0x8f0 [nf_tables]
> Mar 18 00:57:06 localhost.localdomain kernel:  ?
> nf_tables_newtable+0x3fd/0x580 [nf_tables]
> Mar 18 00:57:06 localhost.localdomain kernel:  nf_tables_abort+0x1a/0x40
> [nf_tables]
> Mar 18 00:57:06 localhost.localdomain kernel:
>  nfnetlink_rcv_batch+0x4a1/0x700 [nfnetlink]
> Mar 18 00:57:06 localhost.localdomain kernel:  ?
> __nla_validate_parse+0x53/0x8a0
> Mar 18 00:57:06 localhost.localdomain kernel:  ? security_capable+0x42/0x60
> Mar 18 00:57:06 localhost.localdomain kernel:  nfnetlink_rcv+0x117/0x163
> [nfnetlink]
> Mar 18 00:57:06 localhost.localdomain kernel:  netlink_unicast+0x194/0x230
> Mar 18 00:57:06 localhost.localdomain kernel:  netlink_sendmsg+0x232/0x470
> Mar 18 00:57:06 localhost.localdomain kernel:  sock_sendmsg+0x61/0x70
> Mar 18 00:57:06 localhost.localdomain kernel:  ____sys_sendmsg+0x207/0x250
> Mar 18 00:57:06 localhost.localdomain kernel:  ___sys_sendmsg+0x8c/0xd0
> Mar 18 00:57:06 localhost.localdomain kernel:  __sys_sendmsg+0x5c/0xa0
> Mar 18 00:57:06 localhost.localdomain kernel:  do_syscall_64+0x74/0x433
> Mar 18 00:57:06 localhost.localdomain kernel:
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> Mar 18 00:57:06 localhost.localdomain kernel: RIP: 0033:0x7fd5804ccbb7
> Mar 18 00:57:06 localhost.localdomain kernel: Code: Bad RIP value.
> Mar 18 00:57:06 localhost.localdomain kernel: RSP: 002b:00007ffd1a4ff278
> EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> Mar 18 00:57:06 localhost.localdomain kernel: RAX: ffffffffffffffda RBX:
> 0000000000000001 RCX: 00007fd5804ccbb7
> Mar 18 00:57:06 localhost.localdomain kernel: RDX: 0000000000000000 RSI:
> 00007ffd1a500300 RDI: 0000000000000003
> Mar 18 00:57:06 localhost.localdomain kernel: RBP: 00007ffd1a500400 R08:
> 0000000000000000 R09: 00007ffd1a4ff280
> Mar 18 00:57:06 localhost.localdomain kernel: R10: 00007fd580694398 R11:
> 0000000000000246 R12: 0000000000000003
> Mar 18 00:57:06 localhost.localdomain kernel: R13: 0000000000000016 R14:
> 00007ffd1a4ff290 R15: 00007ffd1a500460
> Mar 18 00:57:06 localhost.localdomain kernel: OOM killer enabled.
> Mar 18 00:57:06 localhost.localdomain kernel: Restarting tasks ... done.
> Mar 18 00:57:06 localhost.localdomain kernel: Freezing user space processes
> ...
> 
> This freeze loops again and again, as I said, every 20 seconds. The server
> can't finish the boot process. Can someone help? When I disable LKRG,
> everything goes back to normal.
> 
> Best regards,
> Michał

-- 
pi3 (pi3ki31ny) - pi3 (at) itsec pl
http://pi3.com.pl
