Date: Sun, 21 Jun 2015 00:10:12 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: ignore or limit Idle=Y for non-CPU-only?

magnum - what do you think of the below?

I think that for now we should simply make Idle=Y ignored when running
OpenCL/CUDA formats, just like it is already ignored for OpenMP.

On Tue, Jun 02, 2015 at 03:56:14AM +0300, Solar Designer wrote:
> magnum, all -
>
> For a few years now, JtR's default is Idle=Y. This works well when
> targeting the host's CPUs only and no synchronization is needed.
>
> We already have logic in place to ignore Idle=Y when using OpenMP and
> the thread count is greater than 1, because in that case yielding CPU
> impacts more than just the current thread (it may also unnecessarily
> make other threads wait when they reach the end of a parallel region).
>
> Now, there's a similar issue when targeting non-CPU devices. When we
> yield CPU (because there's other demand for CPU on the host, whether
> from another instance of JtR or from something else), we additionally
> risk having an external device stall waiting for input from that CPU.
>
> I found that when I use both GPUs and CPUs on a machine at once, with
> multiple instances of john, I end up editing john.conf to set Idle=Y in
> the CPU-using instances and Idle=N in the GPU using ones. When I forget
> to do that, my GPU usage percentage drops.
>
> Should we possibly ignore Idle=Y when running OpenCL and CUDA formats?
>
> There's another aspect here, though. When targeting NVIDIA GPUs, we
> often end up having a thread busily looping on the CPU. This is
> described e.g. here:
>
> https://devtalk.nvidia.com/default/topic/494659/execute-kernels-without-100-cpu-busy-wait-/
>
> an old thread, but I don't know if things have improved since. I've
> seen the busy loop issue recently.
>
> So to avoid wasting that CPU core for potential concurrent CPU-using
> instances of john, maybe we can check if the target device is an NVIDIA
> card and if so only partially ignore Idle=Y: do invoke nice(20), but
> don't use SCHED_IDLE and don't invoke sched_yield(). Unfortunately,
> this would still cause some reduction in GPU usage when there's a
> concurrent CPU-using john - just not as much reduction as the current
> idle.c code causes.
>
> Or maybe we need to make Idle tri-state? If so, what exactly would the
> three states correspond to?
>
> Ideally, the default would normally not need to be adjusted and would
> result in near-optimal behavior.
>
> Alexander
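
For illustration, the decision discussed in this message could be sketched in C roughly as below. This is only a sketch under stated assumptions, not the code in idle.c: the names idle_action, device_kind, idle_actions() and idle_apply_startup() are hypothetical, and the CPU-only branch assumes that Idle=Y currently means nice(20) plus SCHED_IDLE plus sched_yield(), which is inferred from the quoted text rather than checked against the source. The NVIDIA branch follows the "partially ignore" idea (nice only), other GPU devices follow the "simply ignore" idea, and the OpenMP branch mirrors the existing behavior described above.

```c
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/* All names below are hypothetical; none are taken from idle.c. */
enum idle_action {
	IDLE_ACT_NONE  = 0,
	IDLE_ACT_NICE  = 1 << 0,	/* lower priority via nice(20) */
	IDLE_ACT_SCHED = 1 << 1,	/* request SCHED_IDLE where available */
	IDLE_ACT_YIELD = 1 << 2		/* call sched_yield() between work units */
};

enum device_kind { DEV_CPU, DEV_GPU_NVIDIA, DEV_GPU_OTHER };

/* Decide which parts of Idle=Y to honor for this run. */
unsigned int idle_actions(int idle_cfg, int omp_threads, enum device_kind dev)
{
	if (!idle_cfg)
		return IDLE_ACT_NONE;		/* Idle=N: nothing to do */
	if (omp_threads > 1)
		return IDLE_ACT_NONE;		/* yielding would stall other threads */

	switch (dev) {
	case DEV_GPU_NVIDIA:
		/* Host thread may busy-loop: be nice, but never yield. */
		return IDLE_ACT_NICE;
	case DEV_GPU_OTHER:
		/* Avoid starving the device of input: ignore Idle=Y. */
		return IDLE_ACT_NONE;
	case DEV_CPU:
	default:
		/* Assumed current CPU-only behavior: all three actions. */
		return IDLE_ACT_NICE | IDLE_ACT_SCHED | IDLE_ACT_YIELD;
	}
}

/* Apply the one-time startup actions; yielding is left to the main loop. */
void idle_apply_startup(unsigned int actions)
{
	if (actions & IDLE_ACT_NICE) {
		errno = 0;	/* nice() may legitimately return -1 */
		if (nice(20) == -1 && errno != 0)
			perror("nice");
	}
#ifdef SCHED_IDLE
	if (actions & IDLE_ACT_SCHED) {
		struct sched_param param = { 0 };	/* priority must be 0 */
		if (sched_setscheduler(0, SCHED_IDLE, &param) != 0)
			perror("sched_setscheduler");
	}
#endif
}
```

A caller would compute the mask once at startup, pass it to idle_apply_startup(), and then call sched_yield() in its main loop only when IDLE_ACT_YIELD is set. A tri-state Idle setting could map onto the same mask, for example "N" to none, "nice" to IDLE_ACT_NICE only, and "Y" to all three, but that mapping is just one possible answer to the question raised above.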