john-dev - Re: OpenCL kernel max running time vs. "ASIC hang"

Message-ID: <CABh=JREnNW1vXXqzWs-VKJNtibPra18OKR21nG6z4C-MR1j=eA@mail.gmail.com>
Date: Tue, 26 Jun 2012 18:19:38 +0300
From: Milen Rangelov <gat3way@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: OpenCL kernel max running time vs. "ASIC hang"

Hello Bitweasil,

It is generally the same for me: increasing the NDRange increases the chance
that I trigger an ASIC hang. It is also a fact that I did not trigger it with
older driver versions. Yet even with much smaller NDRanges I can occasionally
hit the problem; it just takes more time to reproduce. This happens for me
only with a specific type of kernel, one that involves loops and lots of
memory accesses. In fact, I have kernels that may take longer to execute (e.g.
sha512 as opposed to "old" zip encryption) but do not cause ASIC hangs
(at least I have not noticed any yet).

At the moment I am not sure what the problem is, and bearing in mind
what a mess my iterated kernels are, I am more inclined to blame my code. I
hope you are right about execution time, though: it would be easy to fix,
except for stuff like rar or sha512crypt, where reducing the NDRange would
lead to appalling occupancy and the kernel itself would need to be split into
several parts, keeping intermediate data in global memory.
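
To make the "split the kernel, keep intermediate data in global memory" idea
concrete, here is a rough host-side sketch in plain Python. It is not OpenCL and
not code from any of the tools discussed here: the `states` list merely stands in
for a global memory buffer, `run_chunk` stands in for one kernel invocation, and
all names (`run_chunk`, `iterate_split`, `rounds_per_launch`) are made up for
illustration. Plain iterated SHA-512 is used as a placeholder for a real
iterated hash such as sha512crypt.

```python
import hashlib


def run_chunk(states, steps):
    """Stand-in for ONE kernel invocation (hypothetical, not real OpenCL).

    Each list slot plays the role of one work-item's state in global memory.
    The "kernel" loads the state, advances it by `steps` rounds, and stores
    it back so the next invocation can pick up where this one stopped.
    """
    for i, state in enumerate(states):
        for _ in range(steps):
            state = hashlib.sha512(state).digest()
        states[i] = state


def iterate_split(inputs, total_rounds, rounds_per_launch):
    """Run `total_rounds` rounds per input, at most `rounds_per_launch` at a time.

    Returns the final states and the number of "launches" that were needed.
    """
    # Initial state for every candidate, kept in the "global memory" buffer.
    states = [hashlib.sha512(x).digest() for x in inputs]
    done = 0
    launches = 0
    while done < total_rounds:
        steps = min(rounds_per_launch, total_rounds - done)
        run_chunk(states, steps)  # one short "kernel" run
        done += steps
        launches += 1
    return states, launches


if __name__ == "__main__":
    candidates = [b"password", b"letmein"]
    one_shot, n1 = iterate_split(candidates, 5000, 5000)
    chunked, n2 = iterate_split(candidates, 5000, 256)
    print(n1, n2, one_shot == chunked)
```

The point of the sketch is only that the result does not depend on the chunk
size, so the time per invocation can be tuned without touching the NDRange. The
price is the extra load and store of each state on every launch.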

On Tue, Jun 26, 2012 at 5:44 PM, Bitweasil <bitweasil@...il.com> wrote:

> ...was everyone on this list except me? Hi, gw.
>
> From what I've seen, I don't think writing out of bounds is responsible.
> If I simply change my runtime on my rainbow table generate kernels, I can
> trigger the bug or go for days without it happening. The only difference is
> how many steps it runs at a time. It's a very simple kernel, and the hangs
> were happening at the 10-15% done mark - long before it is anywhere near
> the bounds of an array. I've definitely gone romping through memory in the
> past, and I know it causes weird behavior, but my table gen kernels are the
> simplest and oldest kernels I have, and the steps per invocation value
> changes nothing about their memory access.
>
> I'd love to be proved wrong, though! If it were kernel bugs, that's a
> whole lot easier to deal with!
>
>
> On Jun 26, 2012, at 0:02, Milen Rangelov <gat3way@...il.com> wrote:
>
>
> On Tue, Jun 26, 2012 at 2:54 AM, magnum <john.magnum@...hmail.com> wrote:
>
>>  Yes only now, and I was thinking more like 16K rounds x16. Not sure I
>> want to go there unless AMD says somewhere that this is actually a design
>> limit. Maybe they do? I would like to know the exact specified limit.
>>
>
>
> In fact I've tried this and it did not help. At present I am changing my
> whole architecture and refactoring plugins, working currently on "fast"
> ones. So I don't have much time to play with heavyweight kernels like rar
> or wpa. Anyway, I am becoming more and more confident that the ASIC hangs
> are actually a result of kernel bugs, e.g. out-of-bounds writes or
> something like that.
>
>
