Date: Tue, 2 Jun 2015 00:18:37 +0200
From: Lukas Odzioba <>
Subject: Re: PHC: Parallel in OpenCL

2015-05-31 13:39 GMT+02:00 Agnieszka Bielec <>:
> none@...e ~/Desktop/parallel/run $ ./john --test --format=parallel-opencl
> Device 0: GeForce GTX 960M
> Many salts:     37236 c/s real, 37236 c/s virtual
> [ run]$ ./john --test --format=parallel-opencl --dev=5
> Device 5: GeForce GTX TITAN
> Many salts:     40206 c/s real, 40454 c/s virtual
> GCN without "add 0" optimization
> [ run]$ ./john --test --format=parallel-opencl --dev=1
> Many salts:     45093 c/s real, 4915K c/s virtual
> GCN with unrolling one loop
> [ run]$ ./john --test --format=parallel-opencl --dev=1
> Many salts:     27536 c/s real, 3276K c/s virtual

On one hand you have great results on the mobile GPU, while on the
"proper" ones the results are similar.
On GCN it looks to me like you have hit a local maximum, and without a
major code reorganization "against the current" it will be hard to jump
out of this hole. Code size might be a major limiting factor here, so we
can try to simplify the current code, or split the computation into two
separate kernels (rough sketch below). Another approach would be moving
some of the initialization code to the host side and limiting the code
size that way.
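
Here is a minimal sketch of what such a two-kernel split could look
like; parallel_state, parallel_init and parallel_crypt are made-up
names and the loop bodies are only placeholders, not the format's real
code:

typedef struct {
    ulong h[8];              /* state carried between the two kernels */
} parallel_state;

/* Kernel 1: only the (large) setup code lives here, so the hot kernel
   below stays small enough for the instruction cache. */
__kernel void parallel_init(__global const uchar *keys,
                            __global const uint *key_len,
                            __global parallel_state *st)
{
    uint gid = get_global_id(0);

    for (uint i = 0; i < 8; i++)        /* stand-in for the real setup */
        st[gid].h[i] = (i < key_len[gid]) ? keys[gid * 64 + i] : 0;
}

/* Kernel 2: only the main loop; it picks the state up from global
   memory where kernel 1 left it. */
__kernel void parallel_crypt(__global parallel_state *st,
                             __global ulong *out)
{
    uint gid = get_global_id(0);
    parallel_state s = st[gid];

    for (uint r = 0; r < 1024; r++)     /* stand-in for the real loop */
        s.h[r & 7] = rotate(s.h[r & 7], (ulong)13) + r;
    out[gid] = s.h[0];
}

The host would then enqueue parallel_init once per batch followed by
parallel_crypt, instead of one big kernel doing both.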

Optimizing ALU operations might not give results because of some other
limitation like memory bandwidth or code cache size, and here we may
well be hitting both. As I said on IRC, I would try to simplify it even
if you don't see results at first sight.
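
For example (hedged sketch only, mix_step is a stand-in for one round
of the real computation), manually unrolling a loop duplicates the
round body in the instruction stream, while the rolled form does the
same ALU work in a fraction of the code size:

ulong mix_step(ulong v, uint i)
{
    return rotate(v, (ulong)(i + 1)) ^ (v >> 3);
}

__kernel void hot_loop(__global ulong *state)
{
    uint gid = get_global_id(0);
    ulong v = state[gid];

    /* unrolled by hand:
       v = mix_step(v, 0); v = mix_step(v, 1); ... v = mix_step(v, 7);
       eight copies of the round body end up in the binary */

    /* rolled: same work, much smaller code; the pragma (a vendor
       extension on NVIDIA/AMD) asks the compiler not to unroll it */
    #pragma unroll 1
    for (uint i = 0; i < 8; i++)
        v = mix_step(v, i);

    state[gid] = v;
}

Whether the smaller binary actually helps depends on where the real
bottleneck is, which is why I would measure rather than guess.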

One more question: what was the code size of the very first general
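
By the way, if you want a concrete number for the compiled code size,
querying the program binary size from the host is one rough proxy (on
some drivers the "binary" is intermediate code, so it is only an
approximation). A minimal sketch in plain C; print_binary_size is just
an illustrative helper name:

#include <stdio.h>
#include <CL/cl.h>

static void print_binary_size(cl_program prog)
{
    size_t sizes[16];
    size_t ret = 0;

    if (clGetProgramInfo(prog, CL_PROGRAM_BINARY_SIZES,
                         sizeof(sizes), sizes, &ret) == CL_SUCCESS) {
        size_t ndev = ret / sizeof(size_t);

        for (size_t i = 0; i < ndev; i++)
            printf("device %zu: binary size %zu bytes\n", i, sizes[i]);
    }
}

Call it right after clBuildProgram() on the program object that holds
the kernel.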
