Date: Fri, 09 Nov 2012 20:00:50 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Split kernel for OpenCL WPA-PSK

Solar, how about upgrading Bull to the driver versions recommended by HashCat? Those are Catalyst 12.8 and nvidia 304.32. Perhaps you could opt to use the versions (if not the actual packages) supplied with Ubuntu 12.10: Catalyst 12.9 and nvidia 304.43. See below.

On 11/08/2012 08:09 PM, magnum wrote:
> On 8 Nov, 2012, at 19:12 , magnum <john.magnum@...hmail.com> wrote:
>> Using device 0: Tahiti
>> Local worksize (LWS) 192, Global worksize (GWS) 196608
>> Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
>> Raw: 66197 c/s real, 137970 c/s virtual
>
>> This code too does over 2.1 billion SHA1/second, but CPU post-processing nearly halves the speed (without OMP). So I'm in the process of moving all of that post-processing to GPU. It's just a couple HMACs more, so I hope to exceed 120K c/s with that in place.
>
> Lol, while digging into that post-processing, I found out that the (CPU side) prf_512() function of wpapsk.h did four times more work than needed. It produced an 80-byte key, of which only 16 bytes were needed. Just with this fix, the Tahiti figure went up another 35%:
>
> Using device 0: Tahiti
> Local worksize (LWS) 256, Global worksize (GWS) 262144
> Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
> Raw: 89164 c/s real, 296207 c/s virtual
>
> This will affect CUDA too. Still, I'm proceeding with implementing all of that post-processing on GPU.

Done, but not committed yet. It now does NO post-processing on CPU:

Using device 0: Tahiti
Local worksize (LWS) 128, Global worksize (GWS) 262144
Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
Raw: 129453 c/s real, 52428K c/s virtual

This is a tad faster than HashCat unless Atom has tweaked it since the figure I found (otherwise he will now... OTOH I'm not done yet =). Jumbo-7 does 42723 c/s out of the box.

However, I now hit another nvidia bug (on the old 295.49):

Using device 0: GeForce GTX 570
Compilation log: ptxas application ptx input, line 11; error : Module-scoped variables in .local state space are not allowed with ABI
ptxas fatal : Ptx assembly aborted due to errors
Error building kernel. Returned build code: -42. DEVICE_INFO=130
OpenCL error (CL_INVALID_BINARY) in file (common-opencl.c) at line (151) - (clBuildProgram failed.)

It works fine with Fermi and Kepler with other versions of the driver, and even with the *same* version of the driver but using a 9600GT. It also works fine with AMD or Intel CPU drivers. I get the same error with GPG after some performance tweaks. Haven't found a workaround yet.

I'm SICK of chasing driver bugs. Perhaps I should learn CUDA instead. Where do I start?

If I only could figure out how to compile clcc on Bull. I have no chance to look at the referenced Ptx assembly. Or is there a way to keep temp files?

magnum
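
A note on the prf_512() observation quoted above: the 80-versus-16-byte arithmetic can be illustrated with a small Python sketch of the 802.11i-style PRF. This is not the wpapsk.h code. The names wpa_prf and hmac_rounds are hypothetical, and the sketch assumes the usual construction in which each round is HMAC-SHA1 over label, a zero byte, the address/nonce data and a one-byte counter. Each round yields 20 bytes, so a 64-byte PTK takes four rounds (80 bytes generated), while the leading 16 bytes need only one.

```python
import hashlib
import hmac

# Label used for pairwise key expansion in the 802.11i PRF (assumed here).
LABEL = b"Pairwise key expansion"
SHA1_LEN = 20  # bytes produced by one HMAC-SHA1 call


def hmac_rounds(out_len: int) -> int:
    """Number of HMAC-SHA1 calls needed to produce out_len bytes."""
    return -(-out_len // SHA1_LEN)  # ceiling division


def wpa_prf(pmk: bytes, data: bytes, out_len: int) -> bytes:
    """Hypothetical PRF sketch: concatenate HMAC-SHA1 rounds, then truncate."""
    out = b""
    for counter in range(hmac_rounds(out_len)):
        msg = LABEL + b"\x00" + data + bytes([counter])
        out += hmac.new(pmk, msg, hashlib.sha1).digest()
    return out[:out_len]


if __name__ == "__main__":
    pmk = bytes(32)   # placeholder 32-byte PMK (PBKDF2 output)
    data = bytes(76)  # placeholder: two MAC addresses plus two nonces
    full = wpa_prf(pmk, data, 64)   # four HMAC rounds
    short = wpa_prf(pmk, data, 16)  # one HMAC round
    print(hmac_rounds(64), hmac_rounds(16), full[:16] == short)
```

Because the first round does not depend on the later ones, truncating the request to 16 bytes gives the same leading bytes at a quarter of the HMAC cost, which is consistent with the speedup reported above.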