Date: Sun, 2 Sep 2012 19:13:37 +0400
From: Solar Designer <solar@...nwall.com>
To: crypt-dev@...ts.openwall.com
Subject: Re: using scrypt for user authentication

Since this thread is so old, I'll over-quote, then add the new stuff
below the quote:

On Tue, Aug 07, 2012 at 10:54:36AM +0200, Pavel Kankovsky wrote:
> On Thu, 12 May 2011, Solar Designer wrote:
> 
> > 1. Use such settings that scrypt doesn't use more than, say, 1 MB of
> > memory.  Is 1 MB way too low?  Is scrypt at this setting significantly
> > better than bcrypt or not?
> 
> According to Colin Percival's BSDCan2009 paper the amortized cost (chip 
> area times time) of scrypt is (at least) 1024 N^2 r^2 p s t where 
> parameters N and r determine the size of memory (1024 N r + O(r) bits), p 
> is a parallelization parameter and s and t are unit costs of storage and
> computation.
> 
> The paper claims the cost of scrypt with (N, r, p) = (2^14, 8, 1) is 
> approximately 35 times higher than the cost of bcrypt with cost = 11 while 
> the time needed to compute both of those functions on a general-purpose 
> CPU is comparable. These ratios are probably quite stable even when
> hardware evolves and unit costs (s, t) change.
> 
> The aforementioned parameters (N = 2^14, r = 8) correspond to 16 MiB of RAM 
> if my calculation is correct. In order to reduce memory consumption to
> 1 MiB you would have to reduce the product of N and r 16-fold. p can be 
> increased from 1 to 16 now but the overall cost would still be reduced by 
> a factor of 16 because its dependence on N and r is quadratic.
> 
> Such a change would degrade the strength of scrypt almost to the level of 
> bcrypt. On customized hardware. On the other hand, it would probably use 
> enough memory and memory bandwidth to choke GPUs and other hardware that 
> has not been explicitly designed to crack it.
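
The quoted formulas are easy to sanity-check numerically.  Below is
just arithmetic (not an implementation of scrypt): the 1024*N*r-bit
memory figure and the N^2 * r^2 * p cost proportionality are taken
directly from the quote, with the unit costs s and t dropped since
they cancel in the ratio:

```python
# Sanity-check the quoted scrypt formulas: memory = 1024*N*r bits,
# area-time cost proportional to N^2 * r^2 * p (unit costs s, t omitted).

def scrypt_memory_bytes(N, r):
    return 1024 * N * r // 8  # 1024*N*r bits -> bytes

def relative_cost(N, r, p):
    return N * N * r * r * p  # proportional only

# The paper's reference parameters (N, r, p) = (2^14, 8, 1):
ref_mem = scrypt_memory_bytes(2**14, 8)
print(ref_mem // 2**20, "MiB")  # -> 16 MiB

# Cutting N*r 16-fold to fit in 1 MiB, even with p raised from 1 to 16,
# still divides the (quadratic) cost by 16:
ratio = relative_cost(2**14, 8, 1) / relative_cost(2**14 // 16, 8, 16)
print(ratio)  # -> 16.0
```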

We got a curious real-world data point due to Litecoin using scrypt with
(2^10, 1, 1), which corresponds to 128 KiB.  I guess the (misguided?)
intent was to use L2 cache instead of RAM.  IIRC, scrypt was designed
not to bump into RAM bandwidth on typical computers anyway, so there was
little point in limiting it to just L2 cache.  Well, perhaps it'd bump
into RAM bandwidth with many CPU cores running instances of scrypt at once.
(How many concurrent instances would this take?)
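
To put rough numbers on that question: applying the same 1024*N*r-bit
formula to Litecoin's parameters, plus a back-of-the-envelope estimate
of concurrent instances versus RAM bandwidth.  The per-instance hash
rate and the bandwidth figure below are illustrative assumptions, not
measurements:

```python
# Litecoin's scrypt parameters and when many concurrent instances
# might start to stress RAM bandwidth.  Memory formula is from the
# quote; the hash-rate and bandwidth numbers are assumptions.

def scrypt_memory_bytes(N, r):
    return 1024 * N * r // 8

mem = scrypt_memory_bytes(2**10, 1)
print(mem // 1024, "KiB")  # -> 128 KiB

# SMix touches its lookup table twice: N sequential block writes, then
# N random block reads, 128*r bytes per block -> roughly 2x the table
# size of memory traffic per hash (if none of it stays cached).
traffic_per_hash = 2 * mem  # 256 KiB

# Hypothetical figures: 10,000 hashes/s per instance, 20 GB/s of RAM
# bandwidth.  Instances needed to saturate the bus:
hash_rate = 10_000
bandwidth = 20 * 10**9
instances = bandwidth / (traffic_per_hash * hash_rate)
print(round(instances, 2))  # on the order of 8 under these assumptions
```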

A result is that GPU miners for Litecoin now exist, and they outperform
CPUs roughly by a factor of 10 per-chip:

https://github.com/litecoin-project/litecoin/wiki/Mining-hardware-comparison

Yes, they do optionally use the time-memory tradeoff previously
mentioned here:

http://www.openwall.com/lists/crypt-dev/2011/07/01/1
http://www.openwall.com/lists/crypt-dev/2011/07/01/2

although according to the documentation for cgminer "performance peaks
at a gap of 2", which corresponds to very light use of recomputation
(avoiding storage of every other lookup table element).  Thus, perhaps
part of the 10x speedup comes not from the GPUs' computing power, but
rather from GPU cards' memory subsystem being on par with or faster than
CPUs' L2 cache for scrypt's access pattern (entire cache lines being
requested) and from the greater concurrency of such accesses.
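
The "gap" tradeoff can be modeled directly: store only every g-th
entry of scrypt's lookup table and recompute the skipped entries on
demand from the nearest stored predecessor.  A toy model (not real
scrypt; mix() here is a stand-in for BlockMix) counting the extra work:

```python
# Toy model of the scrypt time-memory tradeoff: keep every gap-th
# lookup table entry V[0], V[gap], V[2*gap], ... and recompute skipped
# ones.  Since V[i+1] is derived from V[i], serving a lookup of V[j]
# costs (j mod gap) recomputations of the mixing function.

def extra_work(gap, indices):
    """Total recomputations needed to serve the given lookups."""
    return sum(j % gap for j in indices)

N = 1024
# Averaged over all possible indices, a gap of 2 costs only half a
# recomputation per lookup -- "very light use of recomputation":
avg = extra_work(2, range(N)) / N
print(avg)  # -> 0.5 extra mixing calls per lookup on average
```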

For comparison, current implementations of bcrypt on CPU and GPU achieve
roughly the same speed per-chip (no advantage from GPUs yet, except that
it may be cheaper to build hobbyist multi-GPU than multi-CPU systems).

Thus, we can see that scrypt with improper settings can be worse
than bcrypt for a very realistic attack.  (For attacks with ASICs,
things may be different, but attacks with GPUs are relevant too.)

Curiously, bcrypt uses only 4 KiB, which is less than Litecoin scrypt's
128 KiB.  However, bcrypt's use of L1 (rather than L2) cache on CPUs
provides an advantage to CPU implementations, and its access pattern
(random 32-bit accesses) makes attempts to use L2+ cache or RAM (for
many concurrent instances of bcrypt) very inefficient.  On GPUs, best
performance at bcrypt is currently achieved by using local memory (very
limited in size) and keeping many of the execution units idle (since
there's not enough local memory to use them all).  Trying to use global
memory (partially cached) slows things down.  (A local+global mixed
implementation might be a little bit faster than local alone, though.
This was briefly discussed on john-dev.)  Credit: the experiments with
bcrypt on GPU were performed by Sayantan Datta.
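
For reference, bcrypt's 4 KiB figure comes from Blowfish's
key-dependent S-boxes, which are also where the random 32-bit accesses
go (standard Blowfish layout; just arithmetic):

```python
# bcrypt's working set is Blowfish's key schedule: four S-boxes of
# 256 32-bit entries each, plus an 18-entry 32-bit P-array.
sbox_bytes = 4 * 256 * 4   # -> 4096 bytes = 4 KiB of S-boxes
parray_bytes = 18 * 4      # -> 72 bytes
print(sbox_bytes // 1024, "KiB")  # -> 4 KiB

# Each Blowfish round performs four random 32-bit S-box lookups --
# the access pattern that fits in L1 cache and wastes wider fetches.
litecoin_scrypt = 128 * 1024
print(litecoin_scrypt // sbox_bytes)  # -> 32x smaller than 128 KiB
```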

It is plausible that scrypt at 1 MiB is comparable to bcrypt in terms of
attacks with GPUs, but this is not certain.  scrypt at 1 MiB might
well be somewhat weaker than bcrypt in this respect.

Overall, I think scrypt at 1 MiB is not a good replacement for bcrypt
currently.  We need bigger settings.  (And if we go for much bigger
settings, this may imply a need to limit concurrency when scrypt is used
on authentication servers.)

We might also want to have a tradeoff-defeating variation of scrypt,
as Anthony Ferrara suggested, or in some other form.  It appears that
this would make
scrypt maybe a factor of 2 more resistant to attacks with current GPUs
at Litecoin's low settings and perhaps a lot more at bigger settings
(where the total GPU card's RAM size is more easily hit if the tradeoff
is not used in an attack).  Maybe this property should be configurable
(a Boolean parameter, to be encoded along with the resulting encrypted
data or password hash).

As usual, I'd appreciate any comments.

Alexander
