Date: Sat, 4 Jul 2015 12:54:20 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Lyra2 on GPU

On Sat, Jul 04, 2015 at 02:04:26AM +0200, Agnieszka Bielec wrote:
> I received results:
> 
> [a@...er run]$ ./john --test --format=lyra2-opencl --dev=5
> Benchmarking: Lyra2-opencl, Lyra2 [Lyra2 Sponge OpenCL (inefficient,
> development use only)]... Device 5: GeForce GTX TITAN
> Local worksize (LWS) 64, global worksize (GWS) 2048
> DONE
> Speed for cost 1 (t) of 8, cost 2 (m) of 8, cost 3 (c) of 256, cost 4 (p) of 2
> Raw:    6023 c/s real, 5965 c/s virtual
> 
> [a@...er run]$ ./john --test --format=lyra2-opencl
> Benchmarking: Lyra2-opencl, Lyra2 [Lyra2 Sponge OpenCL (inefficient,
> development use only)]... Device 0: Tahiti [AMD Radeon HD 7900 Series]
> Local worksize (LWS) 64, global worksize (GWS) 2048
> DONE
> Speed for cost 1 (t) of 8, cost 2 (m) of 8, cost 3 (c) of 256, cost 4 (p) of 2
> Raw:    7447 c/s real, 51200 c/s virtual
> 
> Before the optimizations, the speed was 1k c/s.

Cool.  And these are much better than what you were getting with the Lyra2
authors' CUDA code, right?

Are these higher speeds reproducible on actual cracking runs?  Please test.

> My optimizations are based on transferring one table to local memory and
> copying small portions of global memory into local buffers.  I didn't
> see any sense in coalescing and I didn't try it.

OK.

Is the "copying small portions of global memory into local buffers" like
prefetching?  Or are those small portions more frequently accessed than
the rest?  In other words, why is this optimization effective for Lyra2?
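
To make the question concrete, here is a rough sketch of the distinction I
have in mind.  It is plain C on the CPU, not Lyra2 and not your kernel:
ROW_WORDS, read_global(), sum_direct() and sum_staged() are made-up names,
the arithmetic is a placeholder, and local_buf merely stands in for OpenCL
local memory.  All it does is count reads from "global" memory with and
without staging a small portion first.

```c
#include <stddef.h>

/* Hypothetical size of one "small portion", in words. */
#define ROW_WORDS 16

/* Counts every read that goes to the simulated global memory. */
static unsigned long global_reads;

static unsigned long read_global(const unsigned long *g, size_t i)
{
    global_reads++;
    return g[i];
}

/* Every pass reads the portion straight from global memory. */
unsigned long sum_direct(const unsigned long *row, int passes)
{
    unsigned long acc = 0;
    for (int p = 0; p < passes; p++)
        for (size_t i = 0; i < ROW_WORDS; i++)
            acc += read_global(row, i) ^ (unsigned long)p;
    return acc;
}

/* The portion is copied once into a small buffer (standing in for
 * local memory); all passes then read the buffer instead. */
unsigned long sum_staged(const unsigned long *row, int passes)
{
    unsigned long local_buf[ROW_WORDS];
    unsigned long acc = 0;
    for (size_t i = 0; i < ROW_WORDS; i++)
        local_buf[i] = read_global(row, i);
    for (int p = 0; p < passes; p++)
        for (size_t i = 0; i < ROW_WORDS; i++)
            acc += local_buf[i] ^ (unsigned long)p;
    return acc;
}
```

With passes equal to 1, both variants issue the same ROW_WORDS global reads,
so in this model the copy could only help if it behaves like prefetching
(overlapping with other work) or changes the access pattern.  With more
passes, the staged variant still issues only ROW_WORDS global reads while
the direct one issues passes * ROW_WORDS, which would be the "more
frequently accessed" case.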

Alexander
