john-dev - Re: OpenCL kernel max running time vs. "ASIC hang"


Message-ID: <CALaL1qA=5CNgu1FLMdRKvKwdmQuiq+7pd0WFmeq1a3pocSc57w@mail.gmail.com>
Date: Mon, 25 Jun 2012 21:14:21 -0700
From: Bit Weasil <bitweasil@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: OpenCL kernel max running time vs. "ASIC hang"

There's an overhead to kernel launches, yes.

It's up to you.  I've not found long kernel launches to be reliable on any
platform.  For CUDA, the kernel-killer kicks in unless the box is headless.
For OpenCL with AMD cards, it's unreliable due to ASIC hangs.  I've just
not been able to make it work.

Keep it as an option - I can run long-duration kernels on all my code, by
setting the "steps per invocation" to the total number.  I just don't do it
most of the time.
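
As a rough illustration of the "steps per invocation" idea, the sketch below
picks how many iterations to run per kernel launch so that each launch stays
under a time budget. It is not code from John the Ripper or any other tool:
the names launch_plan, plan_launches and steps_in_launch are hypothetical, and
it assumes run time scales linearly with the iteration count, which a real
kernel may not do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper types and functions, for illustration only. */
typedef struct {
    uint32_t steps_per_invocation; /* iterations done by one kernel launch */
    uint32_t invocations;          /* number of launches needed in total */
    double   ms_per_invocation;    /* estimated run time of a full launch */
} launch_plan;

/*
 * total_steps: total iteration count (e.g. 256K rounds of SHA-1)
 * total_ms:    measured time for all iterations in a single launch
 * budget_ms:   maximum time we want any single launch to take
 * Assumes time is proportional to the number of iterations.
 */
launch_plan plan_launches(uint32_t total_steps, double total_ms,
                          double budget_ms)
{
    launch_plan p;
    double ms_per_step, steps;

    assert(total_steps > 0 && total_ms > 0.0 && budget_ms > 0.0);

    ms_per_step = total_ms / total_steps;
    steps = budget_ms / ms_per_step;

    if (steps >= (double)total_steps)
        p.steps_per_invocation = total_steps;  /* one long launch */
    else if (steps < 1.0)
        p.steps_per_invocation = 1;            /* cannot go below one step */
    else
        p.steps_per_invocation = (uint32_t)steps;

    /* Round up so the last, possibly shorter, launch is counted. */
    p.invocations = total_steps / p.steps_per_invocation +
                    (total_steps % p.steps_per_invocation != 0);
    p.ms_per_invocation = p.steps_per_invocation * ms_per_step;
    return p;
}

/* Iterations the launch with the given index (from 0) should perform. */
uint32_t steps_in_launch(const launch_plan *p, uint32_t total_steps,
                         uint32_t index)
{
    uint64_t done = (uint64_t)index * p->steps_per_invocation;

    if (done >= total_steps)
        return 0;
    if (total_steps - done < p->steps_per_invocation)
        return (uint32_t)(total_steps - done);
    return p->steps_per_invocation;
}
```

The host side would then loop over the launches, passing each one its step
count from steps_in_launch() and leaving the intermediate state on the GPU
between launches. Setting budget_ms very high collapses the plan to a single
long launch, which is the "total number" setting mentioned above.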

On Mon, Jun 25, 2012 at 7:04 PM, SAYANTAN DATTA <std2048@...il.com> wrote:

>
>
> On Tue, Jun 26, 2012 at 4:57 AM, Solar Designer <solar@...nwall.com> wrote:
>
>> On Tue, Jun 26, 2012 at 01:06:08AM +0200, magnum wrote:
>> > On 2012-06-26 00:27, Solar Designer wrote:
>> > >I discussed this matter with Bit Weasil on IRC a few days ago.
>> > >According to him, we shouldn't be trying to spend more than 200 ms per
>> > >OpenCL kernel invocation, or we'll face random "ASIC hang" issues on AMD
>> [...]
>>
>> > That's not an easy goal with slow formats. For RAR, with 256K rounds of
>> > SHA-1, I currently don't get much below 2000ms on 7790, and that's with
>> > GWS that produces a 40% slower c/s than what we currently use. For best
>> > c/s we exceed 9 seconds. Then again, my code is made by a newbie. Making
>> > it 10x faster would be nice for sure. But even Milen said his RAR kernel
>> > ran for 2-3 seconds a while ago.
>>
>> I understand that reducing the amount of parallelism in a kernel
>> invocation slows things down, but why not reduce the amount of work per
>> kernel invocation by other means - specifically, in your example, why
>> not reduce the number of SHA-1 iterations per kernel invocation?  We may
>> invoke the kernel more than once from one crypt_all() call,
>> sequentially.  For example, the 256k may be achieved by 256 invocations
>> of a kernel doing 1k iterations.  This would bring the 9 seconds down to
>> 35 ms per kernel invocation.  Perhaps the intermediate results can even
>> stay in the GPU between those invocations.
>>
>> Have you considered that?
>>
>> Alexander
>>
>
> Wouldn't calling clEnqueueNDRangeKernel too many times cause a
> performance hit?  What about pausing execution for some time at the
> user's request?  Say we press 'P', which pauses execution so that the
> user can perform graphics-oriented tasks, and then resume execution
> when he is finished.
>
> Regards,
> Sayantan
>
>
