Date: Sun, 23 Aug 2015 08:53:05 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

Agnieszka,

There might also be room for improvement of Argon2 performance on GPUs through special handling of BLAKE2b's 64-bit operations. See:

http://hashcat.net/forum/archive/index.php?thread-3422.html

"All the 64-bit based algorithms like SHA512, Keccak etc dropped in performance with each new driver a little bit. So it was hard to notice. GPUs instructions operate still on 32-bit only, so the 64-bit mode is emulated. But the way how it is emulated was somehow broken. I was able to pinpoint the problem where the biggest drop came from and I managed to workaround it. For NVidia it took me a little PTX hack, for AMD luckily there was no binary hack required."

Unfortunately, atom doesn't go into further detail there (but we could try asking him).

I guess the approach amounts to explicitly building 64-bit addition out of 32-bit additions. Maybe having it split like that right away (rather than only in the PTX or IL to ISA translation) is somehow friendlier to current compilers.

I guess this is part of why oclHashcat is faster than JtR at SHA-512 based hashes (per further announcements, oclHashcat's performance at those has been improved way further since that old forum posting above).

In a vectorized kernel, we'd switch from ulong2 to uint4.

Alexander
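To make the guess above about building 64-bit addition out of 32-bit additions concrete, here is a rough sketch in plain C rather than OpenCL or PTX. It is not atom's workaround, whose details are unknown; it only illustrates the general technique of keeping each 64-bit word as two 32-bit halves. It also covers BLAKE2b's rotation amounts (32, 24, 16 and 63), since those are the other 64-bit operations in its G function. The type and function names (`u64pair`, `add64`, `rotr_small`, `u32x4`, `add64x2`) are hypothetical, and the `u32x4` layout `(lo0, hi0, lo1, hi1)` is an assumption about how two ulong2 lanes might be packed into a uint4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: a 64-bit word held as two 32-bit halves, roughly
   what a GPU with 32-bit ALUs works on when it emulates a ulong. */
typedef struct {
    uint32_t lo, hi;
} u64pair;

/* Pack and unpack, so results can be compared against native uint64_t. */
static u64pair split64(uint64_t v) {
    u64pair p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

static uint64_t join64(u64pair p) {
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* 64-bit addition built from two 32-bit additions plus a carry. */
static u64pair add64(u64pair a, u64pair b) {
    u64pair r;
    r.lo = a.lo + b.lo;            /* low halves, wraps modulo 2^32 */
    uint32_t carry = r.lo < a.lo;  /* 1 if the low addition overflowed */
    r.hi = a.hi + b.hi + carry;    /* high halves plus the carry */
    return r;
}

/* Rotate right by 32: just swap the halves, no shifts needed. */
static u64pair rotr32(u64pair a) {
    u64pair r = { a.hi, a.lo };
    return r;
}

/* Rotate right by 0 < n < 32 (BLAKE2b uses 24 and 16) with 32-bit shifts. */
static u64pair rotr_small(u64pair a, unsigned n) {
    assert(n > 0 && n < 32);  /* shifting a uint32_t by 32 is undefined */
    u64pair r;
    r.lo = (a.lo >> n) | (a.hi << (32 - n));
    r.hi = (a.hi >> n) | (a.lo << (32 - n));
    return r;
}

/* Rotate right by 63, i.e. rotate left by 1. */
static u64pair rotr63(u64pair a) {
    u64pair r;
    r.lo = (a.lo << 1) | (a.hi >> 31);
    r.hi = (a.hi << 1) | (a.lo >> 31);
    return r;
}

/* Two 64-bit lanes in a uint4-like vector, assumed layout (lo0, hi0, lo1, hi1). */
typedef struct {
    uint32_t x, y, z, w;
} u32x4;

/* Lane-wise 64-bit addition; carries never cross from lane 0 into lane 1. */
static u32x4 add64x2(u32x4 a, u32x4 b) {
    u64pair l0 = add64((u64pair){ a.x, a.y }, (u64pair){ b.x, b.y });
    u64pair l1 = add64((u64pair){ a.z, a.w }, (u64pair){ b.z, b.w });
    u32x4 r = { l0.lo, l0.hi, l1.lo, l1.hi };
    return r;
}
```

For example, `add64x2` applied to `{0xFFFFFFFF, 0, 1, 2}` and `{1, 0, 3, 4}` yields `{0, 1, 4, 6}`: the overflow of lane 0's low half carries into lane 0's high half, and lane 1 is unaffected. In an actual OpenCL kernel, one would presumably express the same split directly on `uint4` components and let the compiler (or hand-written PTX) map the carry onto the hardware's add-with-carry instructions.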