john-dev - Re: LWS and GWS auto-tuning



Message-ID: <2b66fc15a5faaffd97961028ee3ded51@smtp.hushmail.com>
Date: Wed, 26 Aug 2015 20:56:18 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: LWS and GWS auto-tuning

On 2015-08-26 08:59, Solar Designer wrote:
> What LWS is being used for the first pass at GWS tuning now?  Where is
> it set in code now?

Unless set earlier, it's set in opencl-autotune.h under 'if 
(need_best_lws)'. Latest commit put some alternatives there, in a chain 
of #if's.

> I think we should add reporting of the tentative LWS to the "Calculating
> best global worksize (GWS)" lines.

Done. What we still don't know (and never will) is what was actually 
used when we passed an LWS of 0 (which translates to sending a NULL 
pointer, a special case that lets the OpenCL implementation pick the 
local work size itself).
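
To illustrate the LWS-of-0 convention, here is a minimal sketch. The helper 
name lws_arg is hypothetical and not taken from the JtR tree; it only shows 
how a stored LWS of 0 could be mapped to the NULL pointer that 
clEnqueueNDRangeKernel() accepts for its local_work_size argument.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from the JtR source): turn a stored LWS value
 * into the pointer that would be passed as the local_work_size argument
 * of clEnqueueNDRangeKernel(). An LWS of 0 becomes NULL, in which case
 * the OpenCL implementation picks the work-group size and does not
 * report back which size it used.
 */
static const size_t *lws_arg(const size_t *lws)
{
    return (lws && *lws) ? lws : NULL;
}
```

A caller would pass lws_arg(&local_work_size) in place of &local_work_size 
when enqueueing the kernel.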

> If the tuned LWS happens to be the same as what was used during the
> first pass at GWS tuning, then the second pass should start where the
> first pass left off (only test higher GWS than the first pass reached).

Unless it was 0, yes. I might look into that.
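
A rough sketch of that rule, under stated assumptions: the function and 
parameter names are hypothetical, and the doubling step between GWS 
candidates is an assumption made for illustration, not a description of the 
actual tuning loop.

```c
#include <stddef.h>

/*
 * Hypothetical sketch (names are not from the JtR source): choose the GWS
 * at which the second tuning pass should begin.
 *
 *   first_lws            LWS used during the first GWS pass (0 = NULL/driver's choice)
 *   tuned_lws            LWS chosen by LWS tuning
 *   first_pass_last_gws  highest GWS the first pass tested
 *   min_gws              GWS a fresh pass would start from
 */
static size_t second_pass_start_gws(size_t first_lws, size_t tuned_lws,
                                    size_t first_pass_last_gws,
                                    size_t min_gws)
{
    /* With LWS 0 the driver chose the size, so we cannot know it matched. */
    if (first_lws == 0 || first_lws != tuned_lws)
        return min_gws;

    /* Same LWS: only test GWS values above what the first pass reached.
     * Assumes candidates are tried in doubling steps. */
    return first_pass_last_gws * 2;
}
```

For example, if both passes use an LWS of 64 and the first pass stopped at a 
GWS of 8192, this sketch would resume at 16384; with a first-pass LWS of 0 it 
starts over from min_gws.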

> Maybe the initial LWS should be based on
> ocl_device_list[sequence_nr].cores_per_MP unless a given format requests
> otherwise - e.g., bcrypt-opencl would, for specific GPUs.

Current code just queries CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE 
and will pick 32 for both nvidia and AMD. It picks (under OSX driver) 1 
for my CPU and 8 for my Intel HD4000.

> BTW, bcrypt-opencl now bypasses auto-tuning.  Maybe it shouldn't (except
> for exact GPUs it's fully aware of), but should instead provide hints.
> The same probably applies to many other formats.

I think it does auto-tune: most of Sayantan's formats implement their own 
auto-tuning, often without a hint about what's going on.

> How do other password crackers approach this issue?  For example, I
> don't recall hearing of oclHashcat doing any auto-tuning.  In cryptocoin
> miners, there's an "intensity" setting, which I guess adjusts GWS.
> IIRC, oclHashcat has something like it too.  But I think these programs
> use some nearly-optimal settings even when the user hasn't increased the
> default intensity - so how do they manage?

I think oclHashcat simply has hard-coded figures for every device/device 
class (btw it even has a precompiled kernel for every device class, 
though they might be built from far fewer source files). I'm not 
interested in going that way at all. Although we're nowhere near 
perfection, I wouldn't be surprised if we're competing well when it 
comes to *generic* shared code for auto-tuning a wide range of devices 
and kernels with very different properties, especially since one of our 
requirements is speed (as in quick tuning).

magnum
