john-dev - OpenCL KPC and LWS [was: Recent github patches]

Message-ID: <692e4c781efd06ec7db9214b5edaa3ed@smtp.hushmail.com>
Date: Tue, 21 Feb 2012 20:49:06 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: OpenCL KPC and LWS [was: Recent github patches]

On 02/20/2012 11:39 PM, Samuele Giovanni Tonon wrote:
> On 02/20/12 20:38, magnum wrote:
>> Anyway, I tried find_best_kpc and it picks very small numbers (like
>> 69632) and end up a lot slower than just going with the default 2M. I
>> also tried manually setting 4M and that worked fine and was faster than
>> 2M. Maybe the find_best could be enhanced somehow.
> 
> this is quite strange, find_best_kpc should get the faster KPC no matter
> what,  could you give me some details:
> format used, GPU card, LWS and some output ?

Here's some output. This is ssha on a GTX 280 but I saw similar issues
with the 9600GT as well as with other formats.

First, a default run:

$ ../run/john -test -form:ssha-opencl
OpenCL Platforms: 1
OpenCL Platform: <<<NVIDIA CUDA>>> 1 device(s), using device: <<<GeForce GTX 280>>>
Compilation log:

Max Group Work Size 512 Optimal local work size 32
(to avoid this test on next run do export LWS=32)
Local work size (LWS) 32, Keys per crypt (KPC) 2097152
Benchmarking: Netscape LDAP SSHA OPENCL [salted SHA-1]... DONE
Many salts:     54973K c/s real, 27486K c/s virtual
Only one salt:  32896K c/s real, 16448K c/s virtual

OK, so it picked LWS 32 and the default KPC is 2M. Then I ask for
auto-tuning KPC:

$ KPC=0 ../run/john -test -form:ssha-opencl
OpenCL Platforms: 1
OpenCL Platform: <<<NVIDIA CUDA>>> 1 device(s), using device: <<<GeForce GTX 280>>>
Compilation log:

Max Group Work Size 512 Optimal local work size 32
(to avoid this test on next run do export LWS=32)
Calculating best keys per crypt, this will take a while Optimal keys per
crypt 98304
(to avoid this test on next run do export KPC=98304)
Local work size (LWS) 32, Keys per crypt (KPC) 98304
Benchmarking: Netscape LDAP SSHA OPENCL [salted SHA-1]... DONE
Many salts:     45907K c/s real, 23542K c/s virtual
Only one salt:  25657K c/s real, 12958K c/s virtual

It picks a very low number (98304) and performance drops by about 16%
with many salts and 22% with one salt, compared to the default run. Now
I try manually setting KPC to 1.5M instead:

$ KPC=$((3<<19)) ../run/john -test -form:ssha-opencl
OpenCL Platforms: 1
OpenCL Platform: <<<NVIDIA CUDA>>> 1 device(s), using device: <<<GeForce GTX 280>>>
Compilation log:

Max Group Work Size 512 Optimal local work size 64
(to avoid this test on next run do export LWS=64)
Local work size (LWS) 64, Keys per crypt (KPC) 1572864
Benchmarking: Netscape LDAP SSHA OPENCL [salted SHA-1]... DONE
Many salts:     59177K c/s real, 29155K c/s virtual
Only one salt:  34603K c/s real, 17215K c/s virtual

This is roughly 5-8% faster than the 2M run above (59177K vs 54973K with
many salts, 34603K vs 32896K with one salt), BUT note that LWS happened
to end up as 64 this time. Running the exact same command a couple of
times, I sometimes get the following instead:

$ KPC=$((3<<19)) ../run/john -test -form:ssha-opencl
OpenCL Platforms: 1
OpenCL Platform: <<<NVIDIA CUDA>>> 1 device(s), using device: <<<GeForce GTX 280>>>
Compilation log:

Max Group Work Size 512 Optimal local work size 32
(to avoid this test on next run do export LWS=32)
Local work size (LWS) 32, Keys per crypt (KPC) 1572864
Benchmarking: Netscape LDAP SSHA OPENCL [salted SHA-1]... DONE
Many salts:     53970K c/s real, 26853K c/s virtual
Only one salt:  32955K c/s real, 16399K c/s virtual

Here, LWS was 32 and performance was no better than with KPC=2M: about
2% slower with many salts and practically the same with one salt.
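
The relative differences between the runs above can be checked with a
small helper. This is only a sketch: pct_change is a name made up for
this mail, it assumes an awk is installed, and the figures fed to it are
the "real" c/s numbers copied from the output above.

```shell
#!/bin/sh
# pct_change BASE NEW
# Print the change from BASE to NEW as a signed percentage with one
# decimal place. Hypothetical helper, not part of john.
pct_change() {
    awk -v b="$1" -v n="$2" 'BEGIN { printf "%+.1f%%\n", (n - b) * 100 / b }'
}

# Baseline is the default run (LWS 32, KPC 2097152).
echo "auto KPC 98304,  many salts: $(pct_change 54973 45907)"
echo "auto KPC 98304,  one salt:   $(pct_change 32896 25657)"
echo "KPC 1.5M LWS 64, many salts: $(pct_change 54973 59177)"
echo "KPC 1.5M LWS 64, one salt:   $(pct_change 32896 34603)"
echo "KPC 1.5M LWS 32, many salts: $(pct_change 54973 53970)"
echo "KPC 1.5M LWS 32, one salt:   $(pct_change 32896 32955)"
```

These are single benchmark runs, so differences of a percent or two
are probably within the noise.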

So the main issue is that auto KPC does not pick a good number. The LWS
fluctuations might be due to normal variations between runs. I should
have recorded the figures for KPC=2M and LWS=64 but I missed that.
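
To fill in the missing figures, both values could be pinned through the
LWS and KPC environment variables so that neither is auto-tuned between
runs. A rough sketch, not something I have run: the JOHN and FORMAT
variables and the sweep function are made up here, and the list of KPC
values is just the ones discussed in this thread (1.5M, 2M and 4M).

```shell
#!/bin/sh
# JOHN and FORMAT default to the command line used in this mail; both
# variables are conveniences introduced for this sketch.
JOHN=${JOHN:-../run/john}
FORMAT=${FORMAT:-ssha-opencl}

# sweep: benchmark every LWS/KPC combination with both values fixed.
sweep() {
    for lws in 32 64; do
        # 3<<19 = 1572864 (1.5M), 1<<21 = 2097152 (2M), 1<<22 = 4194304 (4M)
        for kpc in $((3<<19)) $((1<<21)) $((1<<22)); do
            echo "== LWS=$lws KPC=$kpc"
            # Keep only the two result lines of the benchmark output.
            LWS=$lws KPC=$kpc "$JOHN" -test -form:"$FORMAT" 2>&1 |
                grep -E 'Many salts|Only one salt'
        done
    done
}

# Call "sweep" to run all six combinations.
```

Running each combination several times would also show how much of the
LWS 32 vs 64 difference is plain run-to-run variation.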

magnum