Date: Wed, 26 Aug 2015 11:33:19 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: LWS and GWS auto-tuning

On Tue, Aug 25, 2015 at 08:36:44PM +0200, magnum wrote:
> Worst/best 10 for Tahiti (oldoffice failing):

> Ratio:	0.90959 real, 0.76098 virtual	descrypt-opencl, traditional crypt(3):Only one salt
> Ratio:	0.93767 real, 0.93587 virtual	strip-opencl, STRIP Password Manager:Raw
> Ratio:	0.94268 real, 1.06452 virtual	ssha-opencl, Netscape LDAP {SSHA}:Many salts
> Ratio:	0.94619 real, 1.00000 virtual	sha256crypt-opencl, crypt(3) $5$ (rounds=5000):Raw

Out of these, the first 3 remain the same for me, and sha256crypt-opencl
actually gets auto-tuned better now.

Old:

[solar@...er run]$ time ./john -test -form=sha256crypt-opencl -v=4
Device 0: Tahiti [AMD Radeon HD 7900 Series]
Benchmarking: sha256crypt-opencl, crypt(3) $5$ (rounds=5000) [SHA256 OpenCL]... Calculating best global worksize (GWS); max. 4s total for crypt_all()
gws:      1024        1889 c/s     9445000 rounds/s 541.875ms per crypt_all()!
gws:      2048        4931 c/s    24655000 rounds/s 415.304ms per crypt_all()!
gws:      4096        8352 c/s    41760000 rounds/s 490.413ms per crypt_all()+
gws:      8192       12427 c/s    62135000 rounds/s 659.195ms per crypt_all()+
gws:     16384       22380 c/s   111900000 rounds/s 732.058ms per crypt_all()+
gws:     32768       22443 c/s   112215000 rounds/s    1.459s per crypt_all()
gws:     65536       28263 c/s   141315000 rounds/s    2.318s per crypt_all()+
gws:    131072       29923 c/s   149615000 rounds/s    4.380s per crypt_all() - too slow
Max local worksize 256, Local worksize (LWS) 256, global worksize (GWS) 65536
DONE
Speed for cost 1 (iteration count) of 5000
Raw:    25700 c/s real, 2184K c/s virtual


real    0m12.138s
user    0m0.396s
sys     0m0.751s

New:

[solar@...er run]$ time ./john -test -form=sha256crypt-opencl -v=4
Device 0: Tahiti [AMD Radeon HD 7900 Series]
Benchmarking: sha256crypt-opencl, crypt(3) $5$ (rounds=5000) [SHA256 OpenCL]... Calculating best global worksize (GWS); max. 2s total for crypt_all()
gws:      1024        1788 c/s     8940000 rounds/s 572.529ms per crypt_all()!
gws:      2048        4928 c/s    24640000 rounds/s 415.537ms per crypt_all()!
gws:      4096        8350 c/s    41750000 rounds/s 490.537ms per crypt_all()+
gws:      8192       12418 c/s    62090000 rounds/s 659.676ms per crypt_all()+
gws:     16384       22401 c/s   112005000 rounds/s 731.391ms per crypt_all()+
gws:     32768       22392 c/s   111960000 rounds/s    1.463s per crypt_all()
gws:     65536       27499 c/s   137495000 rounds/s    2.383s per crypt_all() - too slow
Calculating best local worksize (LWS)
Testing LWS=64 GWS=16384 ... 50.580ms+
Testing LWS=128 GWS=16384 ... 50.796ms
Testing LWS=192 GWS=16320 ... 50.181ms+
Testing LWS=256 GWS=16384 ... 50.658ms
Calculating best global worksize (GWS); max. 4s total for crypt_all()
gws:      6144       10079 c/s    50395000 rounds/s 609.575ms per crypt_all()!
gws:     12288       18821 c/s    94105000 rounds/s 652.854ms per crypt_all()+
gws:     24576       29143 c/s   145715000 rounds/s 843.278ms per crypt_all()+
gws:     49152       30982 c/s   154910000 rounds/s    1.586s per crypt_all()+
gws:     98304       34272 c/s   171360000 rounds/s    2.868s per crypt_all()+
gws:    196608       36327 c/s   181635000 rounds/s    5.412s per crypt_all() - too slow
Local worksize (LWS) 192, global worksize (GWS) 98304
DONE
Speed for cost 1 (iteration count) of 5000
Raw:    27459 c/s real, 3276K c/s virtual


real    0m14.208s
user    0m0.475s
sys     0m0.865s

However, the differences during LWS tuning (about 1% between the fastest
and slowest LWS in the run above) might be too small to be reliable.
In fact, another run does show worse auto-tuning:

[solar@...er run]$ time ./john -test -form=sha256crypt-opencl -v=4
Device 0: Tahiti [AMD Radeon HD 7900 Series]
Benchmarking: sha256crypt-opencl, crypt(3) $5$ (rounds=5000) [SHA256 OpenCL]... Calculating best global worksize (GWS); max. 2s total for crypt_all()
gws:      1024        1993 c/s     9965000 rounds/s 513.606ms per crypt_all()!
gws:      2048        4918 c/s    24590000 rounds/s 416.360ms per crypt_all()!
gws:      4096        8346 c/s    41730000 rounds/s 490.734ms per crypt_all()+
gws:      8192       12419 c/s    62095000 rounds/s 659.625ms per crypt_all()+
gws:     16384       22365 c/s   111825000 rounds/s 732.555ms per crypt_all()+
gws:     32768       22631 c/s   113155000 rounds/s    1.447s per crypt_all()+
gws:     65536       28402 c/s   142010000 rounds/s    2.307s per crypt_all() - too slow
Calculating best local worksize (LWS)
Testing LWS=64 GWS=32768 ... 76.825ms+
Testing LWS=128 GWS=32768 ... 76.901ms
Testing LWS=192 GWS=32640 ... 80.392ms
Testing LWS=256 GWS=32768 ... 77.425ms
Calculating best global worksize (GWS); max. 4s total for crypt_all()
gws:      2048        5334 c/s    26670000 rounds/s 383.886ms per crypt_all()!
gws:      4096        8862 c/s    44310000 rounds/s 462.146ms per crypt_all()+
gws:      8192       12420 c/s    62100000 rounds/s 659.536ms per crypt_all()+
gws:     16384       22905 c/s   114525000 rounds/s 715.285ms per crypt_all()+
gws:     32768       24439 c/s   122195000 rounds/s    1.340s per crypt_all()+
gws:     65536       30948 c/s   154740000 rounds/s    2.117s per crypt_all()+
gws:    131072       34028 c/s   170140000 rounds/s    3.851s per crypt_all()+
gws:    262144       38105 c/s   190525000 rounds/s    6.879s per crypt_all() - too slow
Local worksize (LWS) 64, global worksize (GWS) 131072
DONE
Speed for cost 1 (iteration count) of 5000
Raw:    24317 c/s real, 4369K c/s virtual


real    0m18.643s
user    0m0.517s
sys     0m0.876s

I think starting with a better (queried) LWS for the first pass at GWS
tuning would prevent this.
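
To illustrate the concern about LWS timings that differ by less than the
measurement noise, here is a minimal Python sketch. It is not John the
Ripper's auto-tuning code. The function name pick_lws, the 2% margin, and
the use of 256 as the starting LWS (the value the "Old" run ended up with)
are assumptions made for illustration; the timings are copied from the two
"New" runs above.

```python
def pick_lws(timings_ms, default_lws, margin=0.02):
    """Return the LWS to use, given {lws: duration_ms} measurements.

    A candidate replaces the default only if it is faster than the default
    by more than `margin` (a fraction of the default's duration). Smaller
    differences are treated as measurement noise and ignored.
    """
    best_lws = default_lws
    best_ms = timings_ms[default_lws]
    threshold = best_ms * (1.0 - margin)  # must beat the default by > margin
    for lws, ms in sorted(timings_ms.items()):
        if ms < threshold and ms < best_ms:
            best_lws, best_ms = lws, ms
    return best_lws


# "Testing LWS=..." timings copied from the two runs (milliseconds).
run_a = {64: 50.580, 128: 50.796, 192: 50.181, 256: 50.658}
run_b = {64: 76.825, 128: 76.901, 192: 80.392, 256: 77.425}

for name, run in (("run A", run_a), ("run B", run_b)):
    spread = (max(run.values()) - min(run.values())) / min(run.values())
    print(f"{name}: spread {spread:.1%}, chosen LWS {pick_lws(run, 256)}")
```

With this margin, neither run has a candidate that beats LWS=256 by more
than 2%, so both keep 256 instead of switching to 192 in one run and 64 in
the other. The right margin would have to be determined from repeated
measurements on the actual device.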

BTW, why are the c/s rates reported during auto-tuning so different from
the final ones here?
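
As context for that comparison, the figures on each "gws:" line appear to
be related as follows: c/s is GWS divided by the crypt_all() duration, and
rounds/s is c/s multiplied by the iteration count (5000). This relation is
inferred from the printed numbers only, not from the source code, and the
helper name below is hypothetical. The sketch checks it against three lines
copied from the logs above.

```python
ROUNDS = 5000  # from "Speed for cost 1 (iteration count) of 5000"


def tuning_cps(gws, seconds):
    """Candidates per second implied by one crypt_all() call of `gws` items."""
    return gws / seconds


# (GWS, crypt_all() duration in seconds, logged c/s, logged rounds/s)
samples = [
    (1024, 0.541875, 1889, 9445000),
    (65536, 2.318, 28263, 141315000),
    (98304, 2.868, 34272, 171360000),
]

for gws, seconds, logged_cps, logged_rps in samples:
    cps = tuning_cps(gws, seconds)
    # The displayed duration is rounded, so allow a small relative error.
    rel_err = abs(cps - logged_cps) / logged_cps
    print(f"gws={gws}: computed {cps:.0f} c/s, logged {logged_cps} c/s, "
          f"relative error {rel_err:.4%}, "
          f"rounds/s matches: {logged_cps * ROUNDS == logged_rps}")
```

So the rates printed during tuning reflect a single timed crypt_all() call
at that GWS, whereas the final "real" figure comes from the benchmark
itself (for example, 28263 c/s at GWS 65536 during tuning versus 25700 c/s
real in the "Old" run).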

Alexander
