john-dev - Re: PHC: Lyra2 on GPU

Message-ID: <20150704095420.GA22777@openwall.com>
Date: Sat, 4 Jul 2015 12:54:20 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Lyra2 on GPU

On Sat, Jul 04, 2015 at 02:04:26AM +0200, Agnieszka Bielec wrote:
> I received results:
> 
> [a@...er run]$ ./john --test --format=lyra2-opencl --dev=5
> Benchmarking: Lyra2-opencl, Lyra2 [Lyra2 Sponge OpenCL (inefficient,
> development use only)]... Device 5: GeForce GTX TITAN
> Local worksize (LWS) 64, global worksize (GWS) 2048
> DONE
> Speed for cost 1 (t) of 8, cost 2 (m) of 8, cost 3 (c) of 256, cost 4 (p) of 2
> Raw:    6023 c/s real, 5965 c/s virtual
> 
> [a@...er run]$ ./john --test --format=lyra2-opencl
> Benchmarking: Lyra2-opencl, Lyra2 [Lyra2 Sponge OpenCL (inefficient,
> development use only)]... Device 0: Tahiti [AMD Radeon HD 7900 Series]
> Local worksize (LWS) 64, global worksize (GWS) 2048
> DONE
> Speed for cost 1 (t) of 8, cost 2 (m) of 8, cost 3 (c) of 256, cost 4 (p) of 2
> Raw:    7447 c/s real, 51200 c/s virtual
> 
> Before the optimizations, the speed was 1k c/s.

Cool.  And these are much better than what you were getting with Lyra2
authors' CUDA code, right?

Are these higher speeds reproducible on actual cracking runs?  Please test.

> My optimizations are based on transferring one table to local memory and
> copying small portions of global memory into local buffers. I didn't
> see any sense in coalescing, and I didn't try it.

OK.

Is the "copying small portions of global memory into local buffers" like
prefetching?  Or are those small portions more frequently accessed than
the rest?  In other words, why is this optimization effective for Lyra2?
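The two cases in the question can be told apart by counting global memory reads. The toy model below is a hypothetical sketch, not the lyra2-opencl kernel. `CountingMemory`, `mix_direct`, and `mix_staged` are made-up names, and the "mixing" step is an arbitrary stand-in for real sponge work. When a chunk is reused several times, staging it in a local buffer cuts global reads from `n * passes` to `n`. When it is used only once, the read count is unchanged, so any speedup would have to come from prefetching or latency hiding, which this model does not capture.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


class CountingMemory:
    """Hypothetical stand-in for GPU global memory: counts every word read."""

    def __init__(self, words):
        self.words = list(words)
        self.reads = 0

    def __getitem__(self, i):
        self.reads += 1
        return self.words[i]


def mix_direct(mem, start, n, passes):
    """Re-read the same n-word chunk from 'global' memory on every pass."""
    acc = 0
    for p in range(passes):
        for i in range(n):
            acc = (acc + (mem[start + i] ^ p)) & MASK64
    return acc


def mix_staged(mem, start, n, passes):
    """Copy the chunk once into a small 'local' buffer, then reuse it."""
    local_buf = [mem[start + i] for i in range(n)]  # n global reads in total
    acc = 0
    for p in range(passes):
        for i in range(n):
            acc = (acc + (local_buf[i] ^ p)) & MASK64
    return acc
```

For example, with a 12-word chunk used 4 times, both functions return the same value, but `mix_direct` performs 48 global reads while `mix_staged` performs 12. With a single pass, both perform 12.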

Alexander