Date: Mon, 28 Jan 2013 08:38:14 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Proposed optimizations to pwsafe

On 28 Jan, 2013, at 2:09 , Brian Wallace <nightstrike9809@...il.com> wrote:

> When I applied the opencl optimization, I only saw minor improvements compared to the CUDA improvements.  I found that was kind of weird, because it was basically the same changes to the code.


This is not exactly related to your changes, but I have always had problems with #pragma unroll on nvidia: it does not seem to have any effect. For the current pwsafe OpenCL kernel, I get the same size binary if I comment out the "#pragma unroll". And if I leave it in but add a "#pragma OPENCL EXTENSION cl_nv_pragma_unroll : enable", there is still no difference in size. That pragma is just a request; maybe the compiler opts to ignore it in this case (and in all other cases I've tried), and/or it is unrolling even without the pragma.

Oh, I just tried and the same applies to pwsafe-cuda. If I remove the #pragma unrolls I get the same speed.


On another note, you might notice that the OpenCL format self-tests a lot faster than CUDA, even when using the same workgroup sizes. It's 15 seconds vs. a minute and a half(!) on my laptop, even though the OpenCL format also runs device tuning in that time. This is because of the first line in crypt_all(), which adjusts the global work size down to the count argument. I think we do this in all but Sayantan's OpenCL formats. The same should be done in all CUDA formats if possible, but I'm not sure how to do it. If you can fix this, it would be great. It should also come with decreasing MIN_KEYS_PER_CRYPT to THREADS. In addition to speeding up initialization a lot, it helps Single mode.
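
A rough sketch of the host-side arithmetic for the CUDA case, in plain C so it can be tried without a GPU. This is only an assumption about how it could be done, not what any existing format does: the helper names are made up, and THREADS here is a stand-in value for the format's threads-per-block constant. The idea is to round count up to a whole number of blocks instead of always launching the maximum.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the format's threads-per-block constant (assumed value). */
#define THREADS 128

/* Hypothetical helper: number of blocks needed to cover `count`
 * candidates, rounded up so a partial block still gets launched. */
size_t blocks_for_count(size_t count, size_t threads)
{
    assert(threads > 0);
    return (count + threads - 1) / threads;
}

/* Hypothetical helper: total number of threads actually launched,
 * i.e. `count` rounded up to the next multiple of `threads`. */
size_t launch_size_for_count(size_t count, size_t threads)
{
    return blocks_for_count(count, threads) * threads;
}
```

With this rounding the smallest non-empty launch is one block of THREADS threads, which is presumably why MIN_KEYS_PER_CRYPT would drop to THREADS. A kernel launched this way would also need to skip threads whose index is at or beyond count; that guard is assumed here and not shown.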

magnum
