Date: Mon, 24 Aug 2015 21:09:14 +0200
From: magnum <>
Subject: Re: interleaving on GPUs

On 2015-08-24 09:08, magnum wrote:
> On 2015-08-24 04:44, Solar Designer wrote:
>> On Sun, Aug 23, 2015 at 11:19:08PM +0200, magnum wrote:
>>> So there is indeed a speedup for MD4 and MD5 but not for SHA-1 in
>>> this case.
>> Cool!  If the loss on your laptop for MD4 and MD5 is less than the gain
>> on TITAN, then can we make this the default?
> I'll have a look at that. Since we see a significant loss for SHA-1 it
> will be per format. PBKDF2-HMAC-MD4/5 are contrived ones. We should add
> vector support to more formats.

I did some more tests and ended up enabling 2x vector for Kepler for 
PBKDF2-HMAC-MD4/5, NTLMv2 and sha1crypt. All others (of the already 
vector capable) got worse performance:

Ratio:	0.55688 real, 0.53111 virtual	office2013-opencl, MS Office 2013 (100,000 iterations):Raw
Ratio:	0.73330 real, 0.51310 virtual	encfs-opencl, EncFS:Raw
Ratio:	0.83659 real, 0.94903 virtual	krb5pa-sha1-opencl, Kerberos 5 AS-REQ Pre-Auth etype 17/18:Raw
Ratio:	0.84023 real, 0.88025 virtual	RAKP-opencl, IPMI 2.0 RAKP (RMCP+):Many salts
Ratio:	0.84828 real, 0.84460 virtual	PBKDF2-HMAC-SHA1-opencl:Raw
Ratio:	0.88036 real, 0.88123 virtual	wpapsk-opencl, WPA/WPA2 PSK:Raw
Ratio:	0.90116 real, 0.91801 virtual	RAKP-opencl, IPMI 2.0 RAKP (RMCP+):Only one salt
Ratio:	0.94701 real, 0.90975 virtual	office2007-opencl, MS Office 2007 (50,000 iterations):Raw
Ratio:	0.96478 real, 1.05856 virtual	office2010-opencl, MS Office 2010 (100,000 iterations):Raw
Ratio:	1.00629 real, 1.02604 virtual	ntlmv2-opencl, NTLMv2 C/R:Only one salt
Ratio:	1.07017 real, 1.05947 virtual	ntlmv2-opencl, NTLMv2 C/R:Many salts
Ratio:	1.13467 real, 1.14177 virtual	PBKDF2-HMAC-MD4-opencl:Raw
Ratio:	1.16411 real, 1.17945 virtual	PBKDF2-HMAC-MD5-opencl:Raw
Ratio:	1.25004 real, 1.25691 virtual	sha1crypt-opencl, (NetBSD):Raw
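
For reading the list above, here is a small sketch of the per-format decision. It assumes each ratio is the 2x vector speed divided by the scalar speed (so values above 1.0 are gains) and that the "real" column is the one that decides; both are assumptions, as is the helper name faster_with_vectors. The short format labels are abbreviated from the benchmark lines.

```python
# (format label, real ratio, virtual ratio), values copied from the list above.
RATIOS = [
    ("office2013-opencl", 0.55688, 0.53111),
    ("encfs-opencl", 0.73330, 0.51310),
    ("krb5pa-sha1-opencl", 0.83659, 0.94903),
    ("RAKP-opencl (many salts)", 0.84023, 0.88025),
    ("PBKDF2-HMAC-SHA1-opencl", 0.84828, 0.84460),
    ("wpapsk-opencl", 0.88036, 0.88123),
    ("RAKP-opencl (one salt)", 0.90116, 0.91801),
    ("office2007-opencl", 0.94701, 0.90975),
    ("office2010-opencl", 0.96478, 1.05856),
    ("ntlmv2-opencl (one salt)", 1.00629, 1.02604),
    ("ntlmv2-opencl (many salts)", 1.07017, 1.05947),
    ("PBKDF2-HMAC-MD4-opencl", 1.13467, 1.14177),
    ("PBKDF2-HMAC-MD5-opencl", 1.16411, 1.17945),
    ("sha1crypt-opencl", 1.25004, 1.25691),
]


def faster_with_vectors(rows):
    """Return the labels whose real-time ratio is above 1.0 (assumed: a gain)."""
    return [name for name, real, _virtual in rows if real > 1.0]


if __name__ == "__main__":
    for name in faster_with_vectors(RATIOS):
        print(name)
```

Under those assumptions the selection matches the formats that got 2x vector enabled: PBKDF2-HMAC-MD4/5, NTLMv2 and sha1crypt. office2010 is left out because only its virtual ratio is above 1.0.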

Curiously enough, PBKDF*1*-HMAC-SHA-1 (sha1crypt) shows a 25% boost 
while all other iterated SHA-1 formats show regression, including 
PBKDF2. The theoretical difference is minuscule (an xor missing in 
PBKDF1) but in these implementations it also affects register pressure 
due to less state needing to be recorded (for split kernels). I wonder 
if I can optimize that shared PBKDF2 a little.
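
To make the "missing xor" concrete, here is a minimal host-side sketch in Python, not the OpenCL kernel code. pbkdf2_block follows the standard PBKDF2 definition for one output block; chained_hmac is a simplified PBKDF1-style chain and does not reproduce sha1crypt's real salt/magic encoding. The function names are hypothetical.

```python
import hashlib
import hmac


def hmac_sha1(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha1).digest()


def pbkdf2_block(password: bytes, salt: bytes, iterations: int,
                 block_index: int = 1) -> bytes:
    """One 20-byte PBKDF2-HMAC-SHA1 block: T = U1 xor U2 xor ... xor Uc."""
    u = hmac_sha1(password, salt + block_index.to_bytes(4, "big"))
    t = u  # extra accumulator that must be kept alongside the chain value
    for _ in range(iterations - 1):
        u = hmac_sha1(password, u)
        t = bytes(a ^ b for a, b in zip(t, u))
    return t


def chained_hmac(password: bytes, first_message: bytes,
                 iterations: int) -> bytes:
    """PBKDF1-style chain (simplified): only the latest U is kept, no xor."""
    u = hmac_sha1(password, first_message)
    for _ in range(iterations - 1):
        u = hmac_sha1(password, u)
    return u


if __name__ == "__main__":
    print(pbkdf2_block(b"password", b"salt", 1000).hex())
    print(chained_hmac(b"password", b"salt", 1000).hex())
```

The PBKDF2 loop carries two 20-byte values (u and t) from one iteration to the next, while the chained variant carries only u. In a split kernel that extra accumulator is additional state to save and restore between kernel invocations.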

I guess this also suggests we should implement vectorizing for raw MD4, 
MD5 and SHA-1. And who knows, maybe SHA-256 too.

