Date: Mon, 12 Nov 2012 19:50:58 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Clear keys

On 12 Nov, 2012, at 13:33 , Claudio André <claudioandre.br@...il.com> wrote:
> Hi, I tried to use clear_keys (as in commit ba10ced)
> 
> But it hurts performance. Did I understand something wrong?

Probably not. I was going to bring this up too. Many parameters are involved.

I did this for ntlmv2-opencl because it runs at speeds where this matters (currently ~18M c/s at best). My original code did a blind 64-byte memset per key, first thing in set_key(). I changed it to memset the whole array at once in clear_keys(), and it got a little faster on all hardware I have tested, but not a lot faster. I guess the original method makes better use of the cache (after that memset, the cache is warm for the succeeding Unicode translation), while the huge memset benefits more from SSE/AVX/etc. optimized code.
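
To make the two variants concrete, here is a minimal host-side sketch, not the actual ntlmv2-opencl code. The names key_buf, KEY_SLOT_BYTES, MAX_KEYS, set_key_blind(), clear_keys_bulk() and set_key_after_bulk() are all hypothetical, the batch size is kept tiny, and a plain byte copy stands in for the Unicode translation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KEY_SLOT_BYTES 64   /* per-key slot size, matching the 64-byte memset */
#define MAX_KEYS 1024       /* hypothetical batch size, small for the sketch */

/* Hypothetical host-side key buffer that would later be sent to the GPU. */
unsigned char key_buf[MAX_KEYS * KEY_SLOT_BYTES];

/* Variant A: blind per-key clear as the first step of set_key().
 * The slot is cache-warm when the key is written right afterwards. */
void set_key_blind(const char *key, size_t index)
{
    unsigned char *slot = &key_buf[index * KEY_SLOT_BYTES];
    size_t len = strlen(key);

    assert(index < MAX_KEYS);
    if (len > KEY_SLOT_BYTES)
        len = KEY_SLOT_BYTES;          /* truncate over-long keys */
    memset(slot, 0, KEY_SLOT_BYTES);   /* clear the whole slot, dirty or not */
    memcpy(slot, key, len);            /* stand-in for the Unicode translation */
}

/* Variant B: one big memset per batch in clear_keys() ... */
void clear_keys_bulk(void)
{
    memset(key_buf, 0, sizeof(key_buf));
}

/* ... so that set_key() only has to write the key itself. */
void set_key_after_bulk(const char *key, size_t index)
{
    size_t len = strlen(key);

    assert(index < MAX_KEYS);
    if (len > KEY_SLOT_BYTES)
        len = KEY_SLOT_BYTES;
    memcpy(&key_buf[index * KEY_SLOT_BYTES], key, len);
}
```

Variant A touches each slot twice in quick succession, while variant B clears the full buffer in one pass before any key is written.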

Anyway, that is a whopping 128 MB memset when running on the 7970 and normally less than 1/3 of each key will actually be dirty. So I thought there ought to be a better way. Just for the hell of it I tried making a trivial clear_keys kernel but no matter how fast that is, the transfer time makes it useless (unless I start juggling with two buffers).

Then I moved the clearing back to set_key(), but in a way that clears one 32-bit word at a time and stops when it reaches already clear memory. This is faster on most gear I've tried (including Bull w/ GTX570) but has a huge impact on the Tahiti (a.k.a. 7970) that I really can't explain (speed drops to about 25% of the original blind memset), even though it will normally not have to clean more than one word, if that. So for now I settled for the clear_keys() memset. I guess I could adapt to the hardware and choose between these methods at runtime.
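
The word-at-a-time idea can be sketched roughly as follows. This is an illustration under assumptions, not the code from the tree: key_words, SLOT_WORDS, MAX_KEYS and set_key_lazy() are hypothetical names, keys are copied as plain bytes instead of being translated to Unicode, and the buffer is assumed to start out zeroed. It relies on keys containing no NUL bytes, so the non-zero words of a slot are always contiguous from the start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOT_WORDS 16   /* 16 x 32-bit words = 64 bytes per key slot */
#define MAX_KEYS 1024   /* hypothetical batch size, small for the sketch */

/* Hypothetical key buffer; static storage, so it starts out all zero. */
uint32_t key_words[MAX_KEYS * SLOT_WORDS];

/* Write the new key, then clear only what a longer previous key left
 * behind, stopping at the first word that is already zero. */
void set_key_lazy(const char *key, size_t index)
{
    uint32_t *slot = &key_words[index * SLOT_WORDS];
    size_t len = strlen(key);
    size_t i = 0, pos = 0;

    assert(index < MAX_KEYS);
    if (len > SLOT_WORDS * 4)
        len = SLOT_WORDS * 4;          /* truncate over-long keys */

    /* Pack four key bytes per word; the last word is zero padded. */
    while (pos < len) {
        uint32_t w = 0;
        size_t chunk = (len - pos < 4) ? len - pos : 4;

        memcpy(&w, key + pos, chunk);
        slot[i++] = w;
        pos += chunk;
    }

    /* Clear stale words; a zero word means the rest is already clean. */
    while (i < SLOT_WORDS && slot[i] != 0)
        slot[i++] = 0;
}
```

When the new key is at least as long as the previous one in that slot, the second loop exits on its first test, which is the "not more than one word, if that" case.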

It could be that the self test vectors happen to produce worse results (very long keys mixed with shorter) with that last experiment than what would be the normal case in real life. I might do more experiments later.

magnum