john-dev - Re: PHC: Lyra2 on GPU

Message-ID: <CAKGDhHUM3AA_zfnbuQ3LS40r8xov3sD+qdBsq31xCKY2C-uKng@mail.gmail.com>
Date: Mon, 6 Jul 2015 16:56:11 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Lyra2 on GPU

2015-07-05 9:53 GMT+02:00 Solar Designer <solar@...nwall.com>:
> Agnieszka,
>
> On Sat, Jul 04, 2015 at 02:04:26AM +0200, Agnieszka Bielec wrote:
>> my optimizations are based on transferring one table to local memory and
>> copying small portions of global memory into local buffers. I didn't
>> see any sense in coalescing, so I didn't try it
>
> Please also try going in the opposite direction: keep more stuff in
> global memory, reduce use of local memory per instance to the point
> where you can use a lot higher GWS - like 20480 (10x higher than what's
> auto-tuned now) or even higher.  This may result in a speedup through
> hiding of global memory access latencies due to the greater concurrency.

This is my first version. I'm including results for costs (t, m) of
16:16, 1:20 and 1:28. Benchmarking doesn't work well in my old version,
so I'm setting GWS manually there. Note that I get
CL_INVALID_BUFFER_SIZE for GWS=8192 at cost 16:16. That's 3 GB.
I said earlier that I'm using local memory, but I meant __private.
Sorry if that caused confusion.
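
The 3 GB figure can be checked with a little arithmetic. The sketch below is only an illustration, not code from the format: the function names are made up, and the 96-byte block size (12 64-bit words) is an assumption inferred from the "memory per hash" values in the logs below, which it reproduces for m = 16, 20 and 28 with c = 256. It also assumes one device buffer holds the matrices of all GWS work-items. If that holds, GWS=8192 at m=16 asks for exactly 3 GiB in one allocation, and clCreateBuffer reports CL_INVALID_BUFFER_SIZE when the requested size exceeds the device's CL_DEVICE_MAX_MEM_ALLOC_SIZE.

```python
# Sketch only: names are hypothetical; BLOCK_BYTES is inferred from the logs.
BLOCK_BYTES = 96  # assumed: 12 64-bit words per Lyra2 block


def memory_per_hash_bytes(m_cost, columns=256):
    """Size of one hash's memory matrix: m_cost rows x columns blocks."""
    return m_cost * columns * BLOCK_BYTES


def device_buffer_bytes(gws, m_cost, columns=256):
    """Total size if a single buffer holds the matrices of all gws work-items."""
    return gws * memory_per_hash_bytes(m_cost, columns)


if __name__ == "__main__":
    for m in (16, 20, 28):
        print(f"m={m}: {memory_per_hash_bytes(m) / 1024:.2f} kB per hash")
    for gws in (1024, 8192):
        print(f"GWS={gws}, m=16: {device_buffer_bytes(gws, 16) / 2**30:.3f} GiB")
```

With these assumptions, GWS=1024 at m=16 needs 384 MiB, while GWS=8192 needs eight times that.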

 [a@...er run]$ ./john --test --format=lyra2-opencl --cost=16:16,16:16
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use
only)]... Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 384.00 kB
Local worksize (LWS) 64, global worksize (GWS) 1024
DONE
Speed for cost 1 (t) of 16, cost 2 (m) of 16, cost 3 (c) of 256, cost 4 (p) of 2
Raw:    1932 c/s real, 51200 c/s virtual


[a@...er run]$ GWS=1024 ./john --test --format=lyra2-old-pencl --cost=16:16,16:16
Benchmarking: Lyra2-old-pencl [Lyra2 OpenCL (inefficient, development
use only)]... Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 384.00 kB
Local worksize (LWS) 64, global worksize (GWS) 1024
DONE
Speed for cost 1 (t) of 16, cost 2 (m) of 16, cost 3 (c) of 256, cost 4 (p) of 2
Raw:    769 c/s real, 34133 c/s virtual


GWS=8192 ./john --test --format=lyra2-old-pencl --cost=16:16,16:16
Benchmarking: Lyra2-old-pencl [Lyra2 OpenCL (inefficient, development
use only)]... Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 384.00 kB
OpenCL error (CL_INVALID_BUFFER_SIZE) in file
(opencl_lyra2_old_fmt_plug.c) at line (170) - (Error creating device
buffer)


[a@...er run]$ ./john --test --format=lyra2-opencl --cost=1:1,20:20
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use
only)]... Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 480.00 kB
Local worksize (LWS) 64, global worksize (GWS) 1024
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 20, cost 3 (c) of 256, cost 4 (p) of 2
Raw:    9660 c/s real, 78769 c/s virtual

[a@...er run]$ ./john --test --format=lyra2-old-pencl --cost=1:1,20:20
Benchmarking: Lyra2-old-pencl [Lyra2 OpenCL (inefficient, development
use only)]... Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 480.00 kB
Local worksize (LWS) 64, global worksize (GWS) 256
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 20, cost 3 (c) of 256, cost 4 (p) of 2
Raw:    1969 c/s real, 51200 c/s virtual

[a@...er run]$ GWS=512 ./john --test --format=lyra2-old-pencl --cost=1:1,20:20
Benchmarking: Lyra2-old-pencl [Lyra2 OpenCL (inefficient, development
use only)]... Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 480.00 kB
Local worksize (LWS) 64, global worksize (GWS) 512
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 20, cost 3 (c) of 256, cost 4 (p) of 2
Raw:    3318 c/s real, 51200 c/s virtual

[a@...er run]$ GWS=1024 ./john --test --format=lyra2-old-pencl --cost=1:1,20:20
Benchmarking: Lyra2-old-pencl [Lyra2 OpenCL (inefficient, development
use only)]... Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 480.00 kB
Local worksize (LWS) 64, global worksize (GWS) 1024
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 20, cost 3 (c) of 256, cost 4 (p) of 2
Raw:    3938 c/s real, 51200 c/s virtual

[a@...er run]$ GWS=2048 ./john --test --format=lyra2-old-pencl --cost=1:1,20:20
Benchmarking: Lyra2-old-pencl [Lyra2 OpenCL (inefficient, development
use only)]... Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 480.00 kB
Local worksize (LWS) 64, global worksize (GWS) 2048
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 20, cost 3 (c) of 256, cost 4 (p) of 2
Raw:    2178 c/s real, 51200 c/s virtual


[a@...er run]$ ./john --test --format=lyra2-opencl --cost=1:1,28:28
Benchmarking: Lyra2-opencl [Lyra2 OpenCL (inefficient, development use
only)]... Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 672.00 kB
Local worksize (LWS) 64, global worksize (GWS) 1024
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 28, cost 3 (c) of 256, cost 4 (p) of 2
Raw:    7123 c/s real, 51200 c/s virtual


[a@...er run]$ GWS=1024 ./john --test --format=lyra2-old-pencl --cost=1:1,28:28
Benchmarking: Lyra2-old-pencl [Lyra2 OpenCL (inefficient, development
use only)]... Device 0: Tahiti [AMD Radeon HD 7900 Series]
memory per hash : 672.00 kB
Local worksize (LWS) 64, global worksize (GWS) 1024
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 28, cost 3 (c) of 256, cost 4 (p) of 2
Raw:    2718 c/s real, 51200 c/s virtual
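
To compare the two versions at a glance, the "real" c/s values above can be turned into speedup ratios. This is a small sketch: the numbers are copied from the runs above at GWS=1024 (for t=1, m=20 that is also the best of the GWS values tried for the old version), while the dictionary layout and function name are just for illustration.

```python
# (new c/s, old c/s) "real" values copied from the benchmark output, GWS=1024.
RESULTS = {
    "t=16, m=16": (1932, 769),
    "t=1, m=20": (9660, 3938),
    "t=1, m=28": (7123, 2718),
}


def speedup(new_cps, old_cps):
    """Ratio of new-version to old-version throughput."""
    return new_cps / old_cps


if __name__ == "__main__":
    for cost, (new, old) in RESULTS.items():
        print(f"{cost}: {speedup(new, old):.2f}x")
```

All three ratios come out close to 2.5x in favour of the new version.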