john-dev - Re: OpenCL KPC and LWS [was: Recent github patches]

Message-ID: <4F4413BF.1030704@linuxasylum.net>
Date: Tue, 21 Feb 2012 22:59:27 +0100
From: Samuele Giovanni Tonon <samu@...uxasylum.net>
To: john-dev@...ts.openwall.com
Subject: Re: OpenCL KPC and LWS [was: Recent github patches]

On 02/21/12 20:49, magnum wrote:
> On 02/20/2012 11:39 PM, Samuele Giovanni Tonon wrote:
>> On 02/20/12 20:38, magnum wrote:
>>> Anyway, I tried find_best_kpc and it picks very small numbers (like
>>> 69632) and end up a lot slower than just going with the default 2M. I
>>> also tried manually setting 4M and that worked fine and was faster than
>>> 2M. Maybe the find_best could be enhanced somehow.
>>
>> this is quite strange, find_best_kpc should get the faster KPC no matter
>> what,  could you give me some details:
>> format used, GPU card, LWS and some output ?
> 
> Here's some output. This is ssha on a GTX 280 but I saw similar issues
> with the 9600GT as well as with other formats.
> 
> First, a default run:
> 
> $ ../run/john -test -form:ssha-opencl
> OpenCL Platforms: 1
> OpenCL Platform: <<<NVIDIA CUDA>>> 1 device(s), using device: <<<GeForce
> GTX 280>>>
> Compilation log:
> 
> Max Group Work Size 512 Optimal local work size 32
> (to avoid this test on next run do export LWS=32)
> Local work size (LWS) 32, Keys per crypt (KPC) 2097152
> Benchmarking: Netscape LDAP SSHA OPENCL [salted SHA-1]... DONE
> Many salts:     54973K c/s real, 27486K c/s virtual
> Only one salt:  32896K c/s real, 16448K c/s virtual
> 
> OK, so it picked LWS 32 and the default KPC is 2M. Then I ask for
> auto-tuning KPC:
> 
> $ KPC=0 ../run/john -test -form:ssha-opencl
> OpenCL Platforms: 1
> OpenCL Platform: <<<NVIDIA CUDA>>> 1 device(s), using device: <<<GeForce
> GTX 280>>>
> Compilation log:
> 
> Max Group Work Size 512 Optimal local work size 32
> (to avoid this test on next run do export LWS=32)
> Calculating best keys per crypt, this will take a while Optimal keys per
> crypt 98304
> (to avoid this test on next run do export KPC=98304)
> Local work size (LWS) 32, Keys per crypt (KPC) 98304
> Benchmarking: Netscape LDAP SSHA OPENCL [salted SHA-1]... DONE
> Many salts:     45907K c/s real, 23542K c/s virtual
> Only one salt:  25657K c/s real, 12958K c/s virtual
> 
> It picks a very low number and performance drops. Now I try manually
> setting KPC to 1.5M instead:
> 
> $ KPC=$((3<<19)) ../run/john -test -form:ssha-opencl
> OpenCL Platforms: 1
> OpenCL Platform: <<<NVIDIA CUDA>>> 1 device(s), using device: <<<GeForce
> GTX 280>>>
> Compilation log:
> 
> Max Group Work Size 512 Optimal local work size 64
> (to avoid this test on next run do export LWS=64)
> Local work size (LWS) 64, Keys per crypt (KPC) 1572864
> Benchmarking: Netscape LDAP SSHA OPENCL [salted SHA-1]... DONE
> Many salts:     59177K c/s real, 29155K c/s virtual
> Only one salt:  34603K c/s real, 17215K c/s virtual
> 
> This is ~10% faster than the 2M above BUT note that LWS happened to end
> up as 64 this time. Running the exact same command a couple of times, I
> sometimes get the following instead:
> 
> $ KPC=$((3<<19)) ../run/john -test -form:ssha-opencl
> OpenCL Platforms: 1
> OpenCL Platform: <<<NVIDIA CUDA>>> 1 device(s), using device: <<<GeForce
> GTX 280>>>
> Compilation log:
> 
> Max Group Work Size 512 Optimal local work size 32
> (to avoid this test on next run do export LWS=32)
> Local work size (LWS) 32, Keys per crypt (KPC) 1572864
> Benchmarking: Netscape LDAP SSHA OPENCL [salted SHA-1]... DONE
> Many salts:     53970K c/s real, 26853K c/s virtual
> Only one salt:  32955K c/s real, 16399K c/s virtual
> 
> Here, LWS was 32 and performance was worse than with KPC=2M.
> 
> So the main issue is that auto KPC does not pick a good number. The LWS
> fluctuations might be due to normal variations between runs. I should
> have recorded the figures for KPC=2M and LWS=64 but I missed that.

Looks like a chicken-and-egg problem: when LWS is tested I use the default
KPC=2M, and once LWS is settled I use the best LWS I just detected to test
KPC; luksas already reported this kind of problem, but I thought we were
safe since the best LWS is usually rather obvious.
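
To make the dependency concrete, here is a minimal Python sketch (not the
actual find_best_kpc code) comparing tuning LWS and KPC one after the other
with tuning them jointly. The function names, the bench(lws, kpc) callback
and the TOY_SPEEDS table are all hypothetical; the speeds are invented for
illustration and are not measurements from the runs quoted above.

```python
from itertools import product

# Hypothetical throughput table (arbitrary units), invented for illustration.
# It is built so that the best LWS depends on which KPC is in use.
TOY_SPEEDS = {
    (32, 1 << 20): 50, (32, 2 << 20): 55,
    (64, 1 << 20): 60, (64, 2 << 20): 52,
}

def toy_bench(lws, kpc):
    """Stand-in for a real benchmark run; returns a speed, higher is better."""
    return TOY_SPEEDS[(lws, kpc)]

def tune_sequential(bench, lws_values, kpc_values, default_kpc):
    """Pick LWS at a fixed default KPC, then pick KPC at that LWS."""
    best_lws = max(lws_values, key=lambda lws: bench(lws, default_kpc))
    best_kpc = max(kpc_values, key=lambda kpc: bench(best_lws, kpc))
    return best_lws, best_kpc

def tune_joint(bench, lws_values, kpc_values):
    """Benchmark every (LWS, KPC) pair and keep the fastest one."""
    return max(product(lws_values, kpc_values), key=lambda pair: bench(*pair))

if __name__ == "__main__":
    lws_values = [32, 64]
    kpc_values = [1 << 20, 2 << 20]
    print("sequential:", tune_sequential(toy_bench, lws_values, kpc_values, 2 << 20))
    print("joint:     ", tune_joint(toy_bench, lws_values, kpc_values))
```

With this toy table the sequential search settles on LWS 32 because that
wins at the default KPC, and never tries the faster (64, 1M) pair. A joint
search costs more benchmark runs, so whether it is worth it is a judgment
call.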

I will make some changes to print the LWS timings during a debug session, so
you can tell me what numbers are behind those different LWS values.
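
As a rough idea of what such debug output could look like, here is a small
Python sketch, assuming a hypothetical run_kernel(lws) callable that executes
one timed pass at the given local work size. It repeats each measurement and
reports min, median and max, which should help tell a real LWS difference
from normal variation between runs. None of these names come from the john
source.

```python
import statistics
import time

def time_lws(run_kernel, lws_values, repeats=5):
    """Time run_kernel(lws) several times per LWS; return (min, median, max) in seconds."""
    results = {}
    for lws in lws_values:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_kernel(lws)  # hypothetical: one benchmark pass at this LWS
            samples.append(time.perf_counter() - start)
        results[lws] = (min(samples), statistics.median(samples), max(samples))
    return results

def format_report(results):
    """Render one line per LWS with timings in milliseconds."""
    return "\n".join(
        f"LWS {lws:4d}: min {lo * 1e3:.3f} ms  median {med * 1e3:.3f} ms  max {hi * 1e3:.3f} ms"
        for lws, (lo, med, hi) in sorted(results.items())
    )

if __name__ == "__main__":
    # Dummy workload standing in for a real kernel launch.
    print(format_report(time_lws(lambda lws: sum(range(100000)), [32, 64, 128])))
```

If the min-to-max spread for one LWS overlaps the spread for another, the
"optimal" LWS can flip between runs, as it did between 32 and 64 above.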

Thanks for the report!

Cheers
Samuele