Date: Thu, 8 Oct 2015 00:37:44 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Cc: Roman Rusakov <rusakovster@...il.com>, deeplearningjohndoe@...il.com
Subject: Re: nVidia Maxwell support (especially descrypt)?

DeepLearningJohnDoe - thank you for your work in this area, and we'd
appreciate any comments you might have on the below.

On Wed, Oct 07, 2015 at 06:54:20PM +0200, magnum wrote:
> >On Wed, Oct 7, 2015 at 8:44 AM, Solar Designer <solar@...nwall.com> wrote:
> >>And of course we'll also need to include some LOP3.LUT S-boxes.
> >>If Roman's are still unreleased (except for S4), then Janet's.
[...]
> I implemented this in 9c82bcc, using DeepLearningJohnDoe's (a.k.a.
> Janet's) S-boxes except for s4.

Are you getting better speeds with Roman's S4?

> Boost appears to be in the order of 10% for LM, 20% for DES.

Confirmed, on Titan X against the same 10 descrypt hashes (10 different
salts) as yesterday:

0g 0:00:03:10 2.04% (ETA: 02:51:06) 0g/s 22303Kp/s 226641Kc/s 226641KC/s GPU:67C util:100% fan:26% aacxytna..aacxytna

This is now roughly the same speed as Tahiti. Titan X has got to be
better than that. Maybe that split of the S-box "lookups" across 4
work-items is key to better performance (more work done per registers
consumed). Sayantan, please look into that.

I'd run on many more salts to reduce the key setup overhead, but then
the kernel build time becomes large and distorts the reported c/s
figures too much for quick runs like this. Maybe we need to reset the
timer to zero once the kernels are built, and/or maybe we need to add
computation and reporting of instantaneous speeds (not just the
all-time averages).

> Is there any special place to look for more of Roman's work?

No. We need to ask Roman. I sort of just did, by CC'ing him.

BTW, our current opencl_sboxes.h defaults to using nonstd.c derived
expressions when !HAVE_LUT3.
Maybe it should also have an option for using sboxes-s.c derived
expressions, which are supposed to be faster on AMD GPUs.

> BTW we now also use LOP3.LUT for many MD4, MD5 and SHA-2 OpenCL formats.
> Some driver bug prevented me from using it in SHA-1 with nvidia 352.39
> (the code is there, just disabled) and md5crypt disables it because of
> performance regression (still to be investigated). Some formats show a
> fine boost but none as much as DEScrypt.

... with our guess on why the boost is lower being that LOP3.LUT was
often used anyway, introduced in the PTX to ISA translation.

Thank you all for working on this!

Alexander
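
For readers less familiar with the LOP3.LUT instruction discussed in this
thread: it evaluates an arbitrary 3-input bitwise function in one
operation, selected by an 8-bit truth table. That is why S-box
expressions rewritten for it need fewer instructions than ones built from
2-input gates. The following is only a minimal portable C sketch of the
semantics, not code from John the Ripper. The names lop3_emulated,
LOP3_LUT_EXAMPLE and lop3_self_check are hypothetical, and the sample
function a ^ (b & ~c) is chosen purely for illustration. The truth table
is derived in the usual way, by applying the desired function to the
constants 0xF0, 0xCC and 0xAA.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Software model of a 3-input bitwise LUT operation (hypothetical helper,
 * for illustration only). For every bit position, the input bits
 * (a, b, c) form the index a*4 + b*2 + c into the 8-bit table `lut`,
 * and the selected table bit becomes the output bit.
 */
static uint32_t lop3_emulated(uint32_t a, uint32_t b, uint32_t c, uint8_t lut)
{
    uint32_t r = 0;
    for (int i = 0; i < 8; i++) {
        if (lut & (1u << i)) {
            /* Mask of bit positions whose (a, b, c) bits equal index i */
            uint32_t ma = (i & 4) ? a : ~a;
            uint32_t mb = (i & 2) ? b : ~b;
            uint32_t mc = (i & 1) ? c : ~c;
            r |= ma & mb & mc;
        }
    }
    return r;
}

/*
 * Truth table for the sample function f(a, b, c) = a ^ (b & ~c),
 * obtained by evaluating f on the constants 0xF0, 0xCC, 0xAA.
 */
#define LOP3_LUT_EXAMPLE ((uint8_t)(0xF0 ^ (0xCC & ~0xAA)))

/* Checks that the table-driven result matches the plain expression. */
static void lop3_self_check(void)
{
    uint32_t a = 0x01234567u, b = 0x89ABCDEFu, c = 0xDEADBEEFu;
    assert(lop3_emulated(a, b, c, LOP3_LUT_EXAMPLE) == (a ^ (b & ~c)));
}
```

Calling lop3_self_check() from a test program exercises the model. On
hardware with LOP3.LUT, the whole loop above corresponds to a single
instruction with the table passed as an immediate.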