Date: Thu, 23 Jul 2015 04:00:03 +0200
From: Solar Designer
Subject: Re: PHC: yescrypt on GPU

On Thu, Jul 23, 2015 at 01:33:26AM +0200, magnum wrote:
> On 2015-07-23 00:36, Agnieszka Bielec wrote:
> >has anyone idea why copying parts of memory from __global to __private
> >makes my code slower when there are different passwords and faster
> >where all passwords are the same?

Why faster for same passwords:

This is puzzling, but my guess (which could well be wrong) is that the
remaining global memory accesses have better locality of reference
(resulting in better cache hit rate) and/or coalescing potential than
all of them did before you moved some to private memory.  In other
words, you moved the "bad" ones to private and kept the "good" ones in
global.  But they are only "good" when the passwords are the same (and I
guess the salts as well, or at most a few different salts), so this is
of no practical use.
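
To illustrate this guess, here is a toy model in Python (not OpenCL, and
not yescrypt).  It assumes an interleaved global memory layout, where
work-item "lane" keeps word j of its data at address
(j * GROUP_SIZE + lane) * WORD_BYTES, and it counts how many distinct
memory lines one group of work-items touches in a single
password-dependent read.  GROUP_SIZE, WORD_BYTES, LINE_BYTES, NUM_BLOCKS,
and the block_index() helper are all hypothetical values and names picked
for illustration.

```python
import hashlib

GROUP_SIZE = 32       # assumed number of work-items issuing loads together
WORD_BYTES = 4        # assumed size of each per-work-item load
LINE_BYTES = 128      # assumed cache line / memory transaction size
NUM_BLOCKS = 1 << 14  # assumed number of indexable blocks per work-item


def block_index(password, salt, step):
    """Toy stand-in for a password-dependent lookup index (not yescrypt)."""
    h = hashlib.sha256(salt + password + step.to_bytes(4, "little")).digest()
    return int.from_bytes(h[:4], "little") % NUM_BLOCKS


def lines_touched(passwords, salt=b"salt", step=0):
    """Count distinct memory lines read by one group in one step."""
    lines = set()
    for lane, password in enumerate(passwords):
        j = block_index(password, salt, step)
        # Interleaved layout (assumption): neighbouring lanes are adjacent
        # in memory only when they use the same index j.
        address = (j * GROUP_SIZE + lane) * WORD_BYTES
        lines.add(address // LINE_BYTES)
    return len(lines)


same = [b"password"] * GROUP_SIZE
different = [b"password%d" % i for i in range(GROUP_SIZE)]
print("same passwords:     ", lines_touched(same))
print("different passwords:", lines_touched(different))
```

With identical passwords and salts every lane computes the same index, so
the whole group reads one line.  With different passwords the indices
diverge and the group touches many lines, which is the case that matters
in practice.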

Why slower for different passwords:

I guess your LWS and/or GWS became lower, presumably because each
work-item now needs more private memory (registers).
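
A rough back-of-the-envelope sketch of why that could happen, in Python.
All numbers here are made up for illustration (register file size,
per-work-item register counts, hardware limit); real values depend on the
device and on what the compiler does with the private array, which may
also be spilled rather than kept in registers.

```python
def max_resident_work_items(regs_per_cu, regs_per_item, hw_limit):
    """Upper bound on work-items resident on one compute unit when the
    register file is the limiting resource (simplified model)."""
    return min(hw_limit, regs_per_cu // regs_per_item)


REGS_PER_CU = 65536   # assumed 32-bit registers per compute unit
HW_LIMIT = 2048       # assumed hardware cap on resident work-items
BASE_REGS = 40        # assumed registers per work-item before the change
PRIVATE_BYTES = 256   # assumed size of the data copied to __private

# If the private copy is held in registers, it costs one 32-bit register
# per 4 bytes on top of what the kernel already used.
extra_regs = PRIVATE_BYTES // 4

before = max_resident_work_items(REGS_PER_CU, BASE_REGS, HW_LIMIT)
after = max_resident_work_items(REGS_PER_CU, BASE_REGS + extra_regs, HW_LIMIT)
print("resident work-items before:", before)
print("resident work-items after: ", after)
```

Fewer resident work-items means less latency hiding, and auto-tuning would
then tend to settle on a lower LWS and/or GWS.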

> >I did in lyra2 something very
> >similar, maybe my code is too big and I have to do split kernels?

Splitting the kernel may be a good idea anyway, but it is most likely
unrelated to this specific issue.

> Are there differences in length distribution in the two cases?

This should be irrelevant.  The PHC finalists process the plaintext
password into a hash early on, and do not use the plaintext password
frequently.  They are not like e.g. md5crypt in this respect.
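
As a toy illustration of the difference (Python, using SHA-256 as a
stand-in; this is not any PHC finalist's actual algorithm, and toy_kdf()
is a hypothetical name): the plaintext is consumed once to produce a
fixed-size state, and the expensive loop only ever sees that state, so the
per-iteration work does not depend on password length.

```python
import hashlib


def toy_kdf(password, salt, rounds=1000):
    """Hash the password once, then iterate on the fixed-size state only."""
    # The only place the plaintext password is read.
    state = hashlib.sha256(salt + password).digest()
    for i in range(rounds):
        # Each round's input is 32 + 4 bytes regardless of password length.
        state = hashlib.sha256(state + i.to_bytes(4, "little")).digest()
    return state


short = toy_kdf(b"abc", b"salt")
long_ = toy_kdf(b"a" * 100, b"salt")
print(len(short), len(long_))
```

md5crypt, by contrast, feeds the password back in throughout its main
loop, so password length affects the cost of every iteration there.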

> If not, 
> Maybe in the slow case they end up spilling to local memory due to 
> harder register pressure.

Maybe.  This is a possibility with any changes to a kernel.

