Message-ID: <20210816184105.GA2071@pi3.com.pl>
Date: Mon, 16 Aug 2021 20:41:05 +0200
From: Adam Zabrocki <pi3@....com.pl>
To: lkrg-users@...ts.openwall.com
Subject: Re: deadlock happen on p_rb_hash[i].p_lock.lock

Hi Ethan,
I took a look at the stack traces. Since LKRG 0.9 we no longer have
p_ttwu_do_wakeup_entry; the hook for this function was removed. In fact, that's
one of the reasons I would suggest updating LKRG...

- Adam
On Fri, Aug 13, 2021 at 03:33:40PM +0800, youyan wrote:
> Hi Adam,
> The deadlock issue is hard to reproduce; it needs dozens of machines and weeks. At the same time, the machine has been mass-produced, so
> I cannot switch to the new LKRG code before a full verification test.
> I captured the following function ftrace on my machine. Could you help me review it? Is there a situation that may cause a deadlock, given that p_cmp_tasks has already taken the read side of the rwlock and another CPU wants the rwlock for writing? Thanks!
>
> 1) awbctrl-3361 => kworker-3331
> ------------------------------------------
>
>
> 1) | p_cmp_tasks [sidkm]() {
> 1) ==========> |
> 1) | gic_handle_irq() {
> 1) | handle_IPI() {
> 1) | irq_enter() {
> 1) 0.808 us | rcu_irq_enter();
> 1) 0.230 us | preempt_count_add();
> 1) 6.307 us | }
> 1) | __wake_up() {
> 1) | __wake_up_common_lock() {
> 1) | _raw_spin_lock_irqsave() {
> 1) 0.539 us | preempt_count_add();
> 1) 0.307 us | do_raw_spin_lock();
> 1) 4.731 us | }
> 1) | __wake_up_common() {
> 1) | autoremove_wake_function() {
> 1) | default_wake_function() {
> 1) | try_to_wake_up() {
> 1) | _raw_spin_lock_irqsave() {
> 1) 0.230 us | preempt_count_add();
> 1) 0.462 us | do_raw_spin_lock();
> 1) 4.461 us | }
> 1) | select_task_rq_fair() {
> 1) 0.231 us | __rcu_read_lock();
> 1) 0.270 us | idle_cpu();
> 1) 0.269 us | target_load();
> 1) 0.269 us | source_load();
> 1) 0.346 us | task_h_load();
> 1) 0.231 us | idle_cpu();
> 1) 0.385 us | idle_cpu();
> 1) 0.269 us | idle_cpu();
> 1) 0.385 us | idle_cpu();
> 1) 0.230 us | __rcu_read_unlock();
> 1) 0.230 us | __rcu_read_lock();
> 1) 0.230 us | __rcu_read_unlock();
> 1) 0.231 us | nohz_balance_exit_idle();
> 1) + 31.231 us | }
> 1) 0.308 us | cpus_share_cache();
> 1) | _raw_spin_lock() {
> 1) 0.230 us | preempt_count_add();
> 1) 0.231 us | do_raw_spin_lock();
> 1) 4.346 us | }
> 1) 0.423 us | update_rq_clock();
> 1) | ttwu_do_activate() {
> 1) | activate_task() {
> 1) | psi_task_change() {
> 1) 0.539 us | record_times();
> 1) 3.154 us | }
> 1) | enqueue_task_fair() {
> 1) | update_curr() {
> 1) 0.269 us | update_min_vruntime();
> 1) | cpuacct_charge() {
> 1) 0.577 us | __rcu_read_lock();
> 1) 0.231 us | __rcu_read_unlock();
> 1) 5.346 us | }
> 1) 9.885 us | }
> 1) 0.346 us | __update_load_avg_se();
> 1) 0.385 us | __update_load_avg_cfs_rq();
> 1) 0.231 us | update_cfs_shares();
> 1) 0.346 us | account_entity_enqueue();
> 1) 0.269 us | check_spread();
> 1) 0.231 us | __rcu_read_lock();
> 1) 0.231 us | __rcu_read_unlock();
> 1) 0.231 us | hrtick_update();
> 1) + 30.462 us | }
> 1) + 37.692 us | }
> 1) | optimized_callback() {
> 1) | opt_pre_handler() {
> 1) | pre_handler_kretprobe() {
> 1) | _raw_spin_lock_irqsave() {
> 1) 0.231 us | preempt_count_add();
> 1) 0.461 us | do_raw_spin_lock();
> 1) 4.693 us | } /* _raw_spin_lock_irqsave */
> 1) | _raw_spin_unlock_irqrestore() {
> 1) 0.307 us | do_raw_spin_unlock();
> 1) 0.270 us | preempt_count_sub();
> 1) 4.461 us | }
> 1) | p_ttwu_do_wakeup_entry [sidkm]() {
> 1) | _raw_read_trylock() {
> 1) 0.231 us | preempt_count_add();
> 1) 0.539 us | do_raw_read_trylock();
> 1) 4.769 us | }
> 1) | p_ed_validate_from_running [sidkm]() {
> 1) | p_validate_task_from_running [sidkm]() {
> 1) 0.231 us | __rcu_read_lock();
> 1) 0.538 us | p_rb_find_ed_pid [sidkm]();
> 1) | p_cmp_tasks [sidkm]() {
> 1) 0.577 us | p_ed_pcfi_validate_sp [sidkm]();
> 1) | p_cmp_creds [sidkm]() {
>
> Thanks and best regards,
> Ethan
>
> At 2021-07-16 01:39:45, "Adam Zabrocki" <pi3@....com.pl> wrote:
> >Can you try LKRG from git TOT?
> >
> >On Thu, Jul 15, 2021 at 08:52:49PM +0800, youyan wrote:
> >> Hi all,
> >> I am sorry, I did not notice that pictures cannot be displayed directly on the mailing list. I will also describe it in words:
> >> cpu0 and cpu1 wait for the lock, which is held on cpu2.
> >> cpu2 waits for kretprobe_table_locks[hash].lock, which is held by cpu3.
> >> cpu3 waits for p_rb_hash[i].p_lock.lock.
> >> The value of p_rb_hash[i].p_lock.lock is 0x01, which means this lock is held through a read lock.
> >>
> >> On 2021-07-15 20:20:50, "youyan" <hyouyan@....com> wrote:
> >>
> >> Hi all,
> >> I met a deadlock issue: p_rb_hash[i].p_lock.lock is not unlocked. The LKRG version is 0.8, the software is Android 10, and the hardware is a Unisoc SL8541E.
> >> The following pictures show the Trace32 stack callbacks and registers:
> >> 1: cpu0
> >> 2: cpu1
> >> 3: cpu2
> >> 4: cpu3
> >> In the situation above, I think some code takes read_lock on p_rb_hash[i].p_lock.lock but does not unlock it, or some code running after the lock is taken may cause a schedule. Going through the LKRG code, I cannot find code matching either situation.
> >> Reproducing this issue needs at least two weeks.
> >> Has anybody met a similar issue?
> >>
> >> Thanks and best regards,
> >> Ethan
--
pi3 (pi3ki31ny) - pi3 (at) itsec pl
http://pi3.com.pl