Date: Fri, 5 Feb 2021 19:49:18 +0100
From: Adam Zabrocki <pi3@....com.pl>
To: lkrg-users@...ts.openwall.com
Subject: Re: feedback about lkrg porting to qcm2150&SL8541E
 android 10

Hi Ethan,

Thanks for your report and feedback! Please find my comments inline below.

On Fri, Feb 05, 2021 at 05:06:13PM +0800, youyan wrote:
> Hi admins,
> Thank you for supporting my port of LKRG to Android, and special thanks to Adam. After a few months of stability testing, LKRG now runs well on my Android device.
> I would like to report the issues I met during porting and stability testing, together with the fixes I used. They may not be the best approach and are offered for reference only.
> 1: Freezing userspace times out and leads to app ANRs
> (1) In some situations a thread blocks all signals (for example, the qcom TEE driver uses sigprocmask(SIG_SETMASK, &new_sigset, &old_sigset)).
> (2) The qcom TEE driver must wait for the qcom TEE user application to send a notification before it restores the signal mask (sigprocmask(SIG_SETMASK, &old_sigset, NULL);).
> (3) When the lkrg module is loaded with insmod, the code runs P_SYM(p_freeze_processes)(), which freezes the qcom TEE user application.
> (4) In this situation freezing processes times out. The freeze timeout is 20s, but the Android ANR timeout is 5s, so the freeze timeout causes some processes to crash.
> 
> 
>     My fix:
>      Before freezing processes, start an hrtimer whose handler runs 500ms later. The handler checks whether freezing has succeeded and, if not, cancels the freeze:
>     if (freeze_successful == 0) {
>             p_print_log(P_LKRG_CRIT, "freeze proccess has some problem  00...\n");
>             /* report a wakeup event so the pending freeze is abandoned */
>             pm_system_wakeup();
>             /* re-arm the timer to check again in another 500ms */
>             hrtimer_forward_now(timer, ms_to_ktime(500));
>             ret = HRTIMER_RESTART;
>     }
> 

When this situation happens, do you retry the freeze or do you bail out?
Did you consider synchronizing with the TEE state before enforcing the 'freeze'?
E.g., if you know when it is 'safe' to execute the 'freeze', perform the
initialization only then?
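
For readers unfamiliar with the mask/restore pattern quoted in (1) and (2), here
is a minimal userspace sketch of it. It is only an analogue: the TEE driver does
this inside the kernel, and the function name blocked_signal_is_deferred() is
made up for this illustration. It shows that a signal raised while everything is
blocked stays pending until the old mask is restored.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t delivered;

static void on_usr1(int sig)
{
    (void)sig;
    delivered = 1;
}

/*
 * Hypothetical helper: returns 1 if SIGUSR1 stayed pending while all
 * signals were blocked and was delivered once the old mask was restored,
 * 0 if not, and -1 on error.
 */
int blocked_signal_is_deferred(void)
{
    struct sigaction sa;
    sigset_t unblock, new_sigset, old_sigset, pending;
    int held;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    /* Start from a state where SIGUSR1 is deliverable. */
    sigemptyset(&unblock);
    sigaddset(&unblock, SIGUSR1);
    if (sigprocmask(SIG_UNBLOCK, &unblock, NULL) != 0)
        return -1;

    /* Same call shape as in the report: block everything, save old mask. */
    delivered = 0;
    sigfillset(&new_sigset);
    if (sigprocmask(SIG_SETMASK, &new_sigset, &old_sigset) != 0)
        return -1;

    /* The signal is generated but cannot be delivered yet. */
    raise(SIGUSR1);
    sigemptyset(&pending);
    sigpending(&pending);
    held = (delivered == 0) && (sigismember(&pending, SIGUSR1) == 1);

    /* Restoring the old mask lets the pending signal through. */
    if (sigprocmask(SIG_SETMASK, &old_sigset, NULL) != 0)
        return -1;

    return held && (delivered == 1);
}
```

In a single-threaded program, calling blocked_signal_is_deferred() should return
1. Whether this is exactly what keeps the freezer from making progress on the
device is as reported above, not something this sketch proves.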

> 
>     2: More than 40 threads exiting at once causes a kernel crash
>      (1) During system boot, or when the system is in an abnormal state, a lot of threads exit at the same time.
>      (2) The number of exiting threads exceeds 40.
>      (3) While a thread is running the exit code (do_exit), LKRG is checking all processes (p_cmp_tasks) at the same time, which may cause a kernel NULL pointer crash.
> 
> 
>    My fix:
>    Temporarily increase the kretprobe maxactive.


In fact, I've changed the exit() logic. If you have the opportunity, could you
try the latest LKRG from the GitHub repo and verify whether you still see the
same issue?
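
To make the maxactive limit above concrete, here is a toy userspace model, not
kernel code. The field names maxactive and nmissed mirror struct kretprobe, but
the toy_* types and functions are invented for this sketch. The idea: a
kretprobe can track at most maxactive in-flight calls of the probed function,
and any entry beyond that is counted as missed, so its handlers do not run.

```c
#include <stddef.h>

/* Toy stand-in for the bookkeeping part of struct kretprobe. */
struct toy_kretprobe {
    int maxactive; /* how many in-flight calls can be tracked */
    int active;    /* calls currently being tracked */
    int nmissed;   /* entries that found no free slot */
};

/* Function entry: returns 1 if the call is tracked, 0 if it is missed. */
static int toy_entry(struct toy_kretprobe *rp)
{
    if (rp->active >= rp->maxactive) {
        rp->nmissed++;
        return 0;
    }
    rp->active++;
    return 1;
}

/* Function return of a tracked call: frees its slot. */
static void toy_return(struct toy_kretprobe *rp)
{
    if (rp->active > 0)
        rp->active--;
}

/*
 * Simulate `concurrent` threads sitting inside the probed function at the
 * same time, then let them all return. Returns the number of missed entries.
 */
int toy_missed_for_burst(int maxactive, int concurrent)
{
    struct toy_kretprobe rp = { maxactive, 0, 0 };
    int tracked = 0;
    int i;

    for (i = 0; i < concurrent; i++)
        tracked += toy_entry(&rp);
    for (i = 0; i < tracked; i++)
        toy_return(&rp);

    return rp.nmissed;
}
```

With a limit of 40, a burst of 45 overlapping exits leaves 5 of them untracked
in this model, while a limit of 64 leaves none. The numbers 45 and 64 are
arbitrary examples. How untracked exits lead to the reported crash depends on
the LKRG exit logic and is not modelled here.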

>    3: CONFIG_OPTPROBES=y makes insmod of the lkrg module slower
>         When the kernel config has CONFIG_OPTPROBES=y, loading the lkrg module with insmod takes more time.
>    My fix:
>         Before insmod of lkrg, turn off optimization by echoing 0 to /proc/sys/debug/kprobes-optimization.
> 

Right, optimized kprobes were broken in the Linux kernel for some time. We
reported the problem and got OPT kprobes fixed in mainline. You can read more
about that here:

http://blog.pi3.com.pl/?p=831

It is worth adding that if you don't have FTRACE compiled in, you shouldn't
have OPT kprobes. I'm not sure whether that is something acceptable from your
point of view.
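
The workaround quoted above (echo 0 to /proc/sys/debug/kprobes-optimization
before insmod) can also be done from a small C helper, sketched below. The
function name set_kprobes_optimization() is an assumption of this example; only
the sysctl path comes from the report. The path is passed in as a parameter so
the helper can be tried against an ordinary file first.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write "0\n" or "1\n" to the given sysctl file.
 * On a real system the path would be /proc/sys/debug/kprobes-optimization
 * and the caller needs root. Returns 0 on success, -1 on failure.
 */
int set_kprobes_optimization(const char *path, int enabled)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;

    ok = (fputs(enabled ? "1\n" : "0\n", f) >= 0);

    /* fclose() flushes the buffer, so a rejected write shows up here. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

A caller would run set_kprobes_optimization("/proc/sys/debug/kprobes-optimization", 0)
before loading the module, and could pass 1 afterwards to restore the setting.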

> 
>     4: Calculating over kernel text and ro data with IRQs locked may keep some threads or interrupts from being processed in time
>     Calculating over kernel text and ro data needs about 100ms on qcm2150&SL8541E, and locking IRQs for 100ms may keep some threads or interrupts from being processed in time.
>    My fix:
>        Temporarily disable the calculation over kernel text and ro data.

Could you elaborate on how you do it?

>     5: mutex_lock() leads to a kernel BUG report and crash
>       When the kernel config has CONFIG_DEBUG_ATOMIC_SLEEP=y, scheduling in atomic context may make the kernel report a BUG and crash, for example when turning off SELinux (setenforce 0).
>     My fix:
>       Not a very good way for now: I wrote some mutex code myself whose functions simply do not schedule in atomic context.
> 
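
The replacement lock described in item 5 is not shown, so what follows is only
a guess at the general idea: a lock that never sleeps. It is written as a
userspace sketch with C11 atomics, and all toy_* names are made up here. In
kernel code the usual non-sleeping options would be a spinlock or
mutex_trylock() rather than a hand-written lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy lock that busy-waits instead of sleeping. */
struct toy_spinlock {
    atomic_flag locked;
};

#define TOY_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once; returns true if the lock was taken. Never blocks. */
bool toy_trylock(struct toy_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

/* Spin until the lock is taken. Burns CPU, but never calls a scheduler. */
void toy_lock(struct toy_spinlock *l)
{
    while (!toy_trylock(l))
        ; /* busy-wait */
}

/* Release the lock. */
void toy_unlock(struct toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

The trade-off is the usual one: a spinning lock is safe to take where sleeping
is forbidden, but the critical section must be short, because every waiter
burns CPU for as long as the lock is held.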

Thanks for all the useful information.
I'm wondering whether you could share your diff, so that maybe some of the
solutions can be merged into the LKRG repo.

Thanks,
Adam


> 
> thanks and best regards
> ethan

-- 
pi3 (pi3ki31ny) - pi3 (at) itsec pl
http://pi3.com.pl
