Date: Tue, 1 Feb 2011 10:54:17 +0200 From: Milen Rangelov <gat3way@...il.com> To: john-users@...ts.openwall.com Subject: Re: FreeBSD crypt() / MD5-crypt implementation question Hello, The "andnot" optimization won't work on GPUs as they don't have a PANDN-like instruction in their ISA - neither AMD, nor Nvidia. I've been playing around with this and found out the xor versions to be faster. Speaking about optimizations, on 5xxx ATIs there is a single "leftrotate" instruction (bitalign) that is 3 times faster than the "two-shifts, one or" version. Newer Nvidia hardware has fused SHL+ADD instruction which also helps, but not as much as bitalign on ATI. AMD ISA has BFI_INT which is a single instruction that does practically the exact same thing as the F and G transformations. Unfortunately, AMD did not expose BFI_INT to IL and it's rather useless at present. That would give a nice 15% speed boost for MD5 (perhaps even more if hash reversal is performed, which unfortunately is not possible for md5crypt). On Tue, Feb 1, 2011 at 12:41 AM, Freddie Witherden <freddie@...herden.org>wrote: > On 31/01/11 22:04, Simon Marechal wrote: > > Le 31/01/2011 16:32, Freddie Witherden a écrit : > >> implementation is not ideal (I am reasonably confident that you can get > >> 20% out of it without too much work) it does perform better than the > > > > That would be quite a nice improvement. The MD5 body function is pretty > > tight (performs like barswf, and I know nothing that is faster), but the > > "dispatch" part is far from optimal. According to my last profiling it > > only accounted for 15% of the processing time. I'm not sure where the > > low hanging fruits are in this patch ... > > Interesting, as I've found it to perform quite a bit worse than BarsWF. > In the MD5 code I've been poking about with (which is--essentially--the > same as that in JtR) around ~4% can be gained from 0 input optimisation > for inputs less than 224-bits. This is similar to what BarsWF does, > although far less aggressive, given that MD5 crypt strings tend to be > longer. > > I've also had some luck changing the definitions of the F and G > functions from their xor forms to their andnot forms. (Although this is > quite likely to be dependant on the individual instruction latencies.) > > While I have not really toyed with the dispatch function I gleefully > assumed after a cursory glance that with a bit of TLC a good few percent > could be squeezed out of it. Perhaps I was being a bit too optimistic. > > Polemically yours, Freddie. > >
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.