Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 1 Apr 2015 10:29:09 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: [GSoC] John the Ripper support for PHC finalists

Hi Agnieszka,

Thank you for your work on this project so far!

On Tue, Mar 31, 2015 at 04:34:58PM +0200, Agnieszka Bielec wrote:
> POMELO will never be fast on GPU because __local and __private memory
> are limited

It is expected that 7 out of 9 PHC finalists are GPU-unfriendly.
However, we need to try anyway, and determine those PHC finalists' cost
settings where attacks on them with GPUs and CPUs become same speed
per-chip.  With even lower cost settings (likely way lower than those
PHC finalists are normally intended to be used with), GPUs may win.
Where this threshold is might affect which PHC finalists are eventually
selected as winners, so the sooner we obtain such results, the better.

For example, a PHC finalist with a CPU vs. GPU parity threshold at 1 MB
is worse (in this respect) than one with the threshold at 10 KB.

Use of large amounts of memory, beyond the local memory size, does not
necessarily prevent this threshold from being at multi-megabyte levels,
as seen for scrypt here:

http://www.openwall.com/lists/crypt-dev/2014/03/13/1

On the other hand, GPU resistance may happen to be achieved even at very
low memory usage, such as bcrypt's 4 KB:

http://www.openwall.com/presentations/Passwords14-Energy-Efficient-Cracking/slide-44.html
http://www.openwall.com/presentations/Passwords14-Energy-Efficient-Cracking/slide-45.html

As you can see, for scrypt with r=1 the threshold appears to be around
16 MB.  That's with a TMTO factor of 11 (the attacker gets to tune the
TMTO factor optimally for themselves).  If a PHC finalist that you
experiment with is not susceptible to efficient TMTO, which they try not
to be, we might get something like 16/11 = ~1.5 MB for the threshold
_if_ the scheme is otherwise scrypt-like.  Since there are many other
differences from scrypt, beyond TMTO resistance, it is expected that
actual thresholds will be at very different levels, hopefully lower than
this ~1.5 MB estimate.  Another reason why the ~1.5 MB estimate would
not actually hold true even for an scrypt-like with TMTO resistance is
that when TMTO is exploited (such as in the 16 MB benchmark mentioned
above) the attacker pays for the reduction in memory needs by extra
computation.  So when TMTO is not exploited (such as because it can't
reasonably be), we get to do less computation.  This suggests that the
without-TMTO threshold could as well be somewhat higher than ~1.5 MB.

Your task is to produce optimized code and test inputs that will let us
determine exactly where this threshold is, for each PHC finalist.

BTW, note that it is quite possible that different optimizations
(possibly even entirely different OpenCL kernels) might be needed for
different settings.  For example, with scrypt, besides tuning of the
TMTO factor, whether you'd optimally keep X (a buffer holding the
current block) in local or global memory might vary by the value of r
(for r=1 it's optimal to keep X in local memory, but starting with some
threshold this is probably no longer so as it starts to limit the
concurrency too much).  In fact, this same issue will likely arise when
you approach implementing yescrypt.

It might happen that an 8th finalist, Makwa, is also GPU-unfriendly at
some settings.  If so, that would be an interesting and useful result.
Makwa is not explicitly intended to be GPU-unfriendly.

And in case you were wondering, the PHC finalist that is expected to be
GPU-friendly is Parallel.  As currently defined, it uses SHA-512, which
is not the best fit for GPUs, but it's by far not the worst either.

Thanks again,

Alexander

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.