Date: Mon, 24 Aug 2015 21:09:14 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: interleaving on GPUs

On 2015-08-24 09:08, magnum wrote:
> On 2015-08-24 04:44, Solar Designer wrote:
>> On Sun, Aug 23, 2015 at 11:19:08PM +0200, magnum wrote:
(...)
>>> So there is indeed a speedup for MD4 and MD5 but not for SHA-1 in
>>> this case.
>>
>> Cool! If the loss on your laptop for MD4 and MD5 is less than the gain
>> on TITAN, then can we make this the default?
>
> I'll have a look at that. Since we see a significant loss for SHA-1 it
> will be per format. PBKDF2-HMAC-MD4/5 are contrived ones. We should add
> vector support to more formats.

I did some more tests and ended up enabling 2x vector on Kepler for
PBKDF2-HMAC-MD4/5, NTLMv2 and sha1crypt. All the other formats that are
already vector-capable got worse performance:

Ratio:  0.55688 real, 0.53111 virtual   office2013-opencl, MS Office 2013 (100,000 iterations):Raw
Ratio:  0.73330 real, 0.51310 virtual   encfs-opencl, EncFS:Raw
Ratio:  0.83659 real, 0.94903 virtual   krb5pa-sha1-opencl, Kerberos 5 AS-REQ Pre-Auth etype 17/18:Raw
Ratio:  0.84023 real, 0.88025 virtual   RAKP-opencl, IPMI 2.0 RAKP (RMCP+):Many salts
Ratio:  0.84828 real, 0.84460 virtual   PBKDF2-HMAC-SHA1-opencl:Raw
Ratio:  0.88036 real, 0.88123 virtual   wpapsk-opencl, WPA/WPA2 PSK:Raw
Ratio:  0.90116 real, 0.91801 virtual   RAKP-opencl, IPMI 2.0 RAKP (RMCP+):Only one salt
Ratio:  0.94701 real, 0.90975 virtual   office2007-opencl, MS Office 2007 (50,000 iterations):Raw
Ratio:  0.96478 real, 1.05856 virtual   office2010-opencl, MS Office 2010 (100,000 iterations):Raw
Ratio:  1.00629 real, 1.02604 virtual   ntlmv2-opencl, NTLMv2 C/R:Only one salt
Ratio:  1.07017 real, 1.05947 virtual   ntlmv2-opencl, NTLMv2 C/R:Many salts
Ratio:  1.13467 real, 1.14177 virtual   PBKDF2-HMAC-MD4-opencl:Raw
Ratio:  1.16411 real, 1.17945 virtual   PBKDF2-HMAC-MD5-opencl:Raw
Ratio:  1.25004 real, 1.25691 virtual   sha1crypt-opencl, (NetBSD):Raw

Curiously enough, PBKDF*1*-HMAC-SHA-1 (sha1crypt) shows a 25% boost
while all other iterated SHA-1 formats show a regression, including
PBKDF2. The theoretical difference is minuscule (an XOR that PBKDF1
lacks), but in these implementations it also affects register pressure,
because less state needs to be recorded (for split kernels). I wonder
if I can optimize that shared PBKDF2 a little.

I guess this also suggests we should implement vectorizing for raw MD4,
MD5 and SHA-1. And who knows, maybe SHA-256 too.

magnum
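
To illustrate the PBKDF1 vs. PBKDF2 remark in the message above, here is
a minimal host-side sketch in Python. It is not the OpenCL kernel code
under discussion. The function names pbkdf2_block and chained_hmac are
hypothetical, and chained_hmac only models the PBKDF1-style chaining the
message attributes to sha1crypt; it does not reproduce sha1crypt's actual
message layout or output encoding. The point is only that the PBKDF2
loop carries an extra 20-byte accumulator that the chained loop does not.

```python
import hashlib
import hmac


def pbkdf2_block(password: bytes, salt: bytes, iterations: int) -> bytes:
    """First 20-byte output block of PBKDF2-HMAC-SHA1."""
    # U_1 = HMAC(password, salt || INT(1))
    u = hmac.new(password, salt + (1).to_bytes(4, "big"), hashlib.sha1).digest()
    acc = u  # extra state: running XOR of every U_j so far
    for _ in range(iterations - 1):
        u = hmac.new(password, u, hashlib.sha1).digest()
        # This XOR (and the accumulator it updates) is what the
        # chained variant below does not need.
        acc = bytes(a ^ b for a, b in zip(acc, u))
    return acc


def chained_hmac(password: bytes, first_message: bytes, iterations: int) -> bytes:
    """PBKDF1-style chaining: only the latest HMAC output is kept."""
    u = hmac.new(password, first_message, hashlib.sha1).digest()
    for _ in range(iterations - 1):
        u = hmac.new(password, u, hashlib.sha1).digest()
    return u
```

For the same iteration count both loops run the same number of HMAC
calls, so any performance gap between them would come from the extra
live state and XOR rather than from the hash work itself, which matches
the register-pressure explanation suggested in the message.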