Date: Wed, 26 Aug 2015 20:56:18 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: LWS and GWS auto-tuning

On 2015-08-26 08:59, Solar Designer wrote:
> What LWS is being used for the first pass at GWS tuning now? Where is
> it set in code now?

Unless set earlier, it's set in opencl-autotune.h under 'if
(need_best_lws)'. The latest commit put some alternatives there, in a
chain of #if's.

> I think we should add reporting of the tentative LWS to the "Calculating
> best global worksize (GWS)" lines.

Done. What we still don't know (and never will) is what was actually
used when we used an LWS of 0 (which translates to sending a NULL
pointer, which is special).

> If the tuned LWS happens to be the same as what was used during the
> first pass at GWS tuning, then the second pass should start where the
> first pass left off (only test higher GWS than the first pass reached).

Unless it was 0, yes. I might look into that.

> Maybe the initial LWS should be based on
> ocl_device_list[sequence_nr].cores_per_MP unless a given format requests
> otherwise - e.g., bcrypt-opencl would, for specific GPUs.

Current code just queries CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE
and will pick 32 for both nvidia and AMD. It picks (under the OSX driver)
1 for my CPU and 8 for my Intel HD4000.

> BTW, bcrypt-opencl now bypasses auto-tuning. Maybe it shouldn't (except
> for exact GPUs it's fully aware of), but should instead provide hints.
> The same probably applies to many other formats.

I think it auto-tunes: Most of Sayantan's formats implement their own
auto-tune, often without a hint about what's going on.

> How do other password crackers approach this issue? For example, I
> don't recall hearing of oclHashcat doing any auto-tuning. In cryptocoin
> miners, there's an "intensity" setting, which I guess adjusts GWS.
> IIRC, oclHashcat has something like it too. But I think these programs
> use some nearly-optimal settings even when the user hasn't increased the
> default intensity - so how do they manage?

I think oclHashcat simply has hard-coded figures for every device/device
class (btw it even has a precompiled kernel for every device class,
though they might be built from far fewer source files). I'm not
interested in going that way at all.

Although we're nowhere near perfection, I wouldn't be surprised if
we're competing well when it comes to *generic* shared code for
auto-tuning a wide range of devices and kernels with very different
properties. Especially since one of our requirements is speed (as in
quick tuning).

magnum
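
A minimal sketch of the second-pass idea discussed above: if the tuned LWS equals the tentative LWS from the first GWS pass, resume above the highest GWS already tested, unless that LWS was 0. Every name here (lws_arg, second_pass_start_gws, min_gws) is hypothetical and none of it is taken from opencl-autotune.h. The doubling step between GWS candidates is also an assumption made only for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical helper: map the "LWS of 0" convention to the
 * local_work_size argument of clEnqueueNDRangeKernel(). A NULL pointer
 * lets the OpenCL implementation pick the work-group size, so the size
 * actually used is not known to the caller.
 */
const size_t *lws_arg(const size_t *lws)
{
    return (*lws != 0) ? lws : NULL;
}

/*
 * Hypothetical helper: choose the first GWS to test in the second pass.
 *
 * tentative_lws       LWS used during the first GWS pass (0 = NULL pointer)
 * tuned_lws           LWS selected by LWS tuning after the first pass
 * first_pass_last_gws highest GWS the first pass tested
 * min_gws             GWS the tuner would normally start from
 */
size_t second_pass_start_gws(size_t tentative_lws, size_t tuned_lws,
                             size_t first_pass_last_gws, size_t min_gws)
{
    /*
     * With an LWS of 0 the real work-group size is unknown, and with a
     * changed LWS the earlier timings no longer apply. Start over.
     */
    if (tentative_lws == 0 || tuned_lws != tentative_lws)
        return min_gws;

    /*
     * Same LWS: skip what was already measured and only test higher
     * GWS values. Doubling is an assumed step size for this sketch.
     */
    return first_pass_last_gws * 2;
}
```

For example, with a tentative and tuned LWS of 64 and a first pass that stopped at GWS 4096, this sketch would resume at 8192; with a tentative LWS of 0 it would fall back to min_gws.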