Date: Wed, 22 Jan 2014 05:58:17 +0400 From: Solar Designer <solar@...nwall.com> To: crypt-dev@...ts.openwall.com Subject: Re: scrypt parallelism vs. area-time vs. TMTO resistance On Thu, Dec 19, 2013 at 06:08:53AM +0400, Solar Designer wrote: > Setting scrypt's p > 1 introduces parallelism at (almost) the highest > level. This has the advantage of needing to synchronize the threads > just once (before the final PBKDF2), but it results in much lower > area-time cost for the attacker than we could achieve by having the > parallelism at a slightly lower level, inside SMix. > > Specifically, in SMix's first loop, we may have the threads each write > to its own portion of V, and in SMix's second loop we may have them all > read from the entire V. This requires that we synchronize the threads > just one more time: between the two loops in SMix. > > For classic scrypt, the running time is 2*N (assuming that we have > enough computing resources to make full use of the parallelism, and > omitting r from this discussion for now). For the revision proposed > above, the running time is only 2*N/p: the first loop fills all N > elements of V in N/p time, and the second loop performs a total of N > lookups (as usual) and accesses 63% of V elements (as usual) in N/p time. > > Now given the same p and the same target running time, we can set N to > be p times higher. > > If for classic scrypt the area-time cost of attack is > O(old_N^2 * p * r^y) , then for the proposed revision it is > O(new_N^2 * r^y). I was wrong. I guess I temporarily forgot that area*time is not the same as memory*computation. The attacker cares about how much memory is tied up for how long. When we increase parallelism at any level within a hash computation, it may be made use of to reduce the total running time (the "how long"), even if the number of memory lookups and crypto primitive computations stays the same. For high N, small p, and an attacker with ASICs, the area cost of extra crypto cores is negligible. Thus, for that scenario, the proposed revision's AT cost is only O(new_N^2 / p * r^y). There was no "/ p" for classic scrypt's SMix AT cost because there's no parallelism in classic scrypt at that level. There was "* p" for classic scrypt's AT cost because either that much more memory is tied up for the same duration or the original (for p=1) amount of memory is tied up for p times longer (because that smaller memory allocation has to be used and reused sequentially, regardless of how many more free crypto cores the attacker could have), or it can be some inbetween combination. > If we maintain the same running time, then > new_N = old_N * p, giving the new area-time cost O(old_N^2 * p^2 * r^y). With the above correction, this changes to O(old_N^2 * p * r^y), which is the same as classic scrypt's. > This is a major improvement: we went from p to p^2 for a factor in the > attacker's cost Unfortunately, we did not, at least not for the most capable attackers with ASICs. However, the change may still be helpful in that it limits the attacker's flexibility. Only attackers with sufficient supply of free crypto cores maintain the classic scrypt's attack cost, and even those attackers are forced to keep more memory per (faster) instance to achieve good efficiency (unlike with classic scrypt, where they have more of a choice). This more-memory-per-scrypt-instance setup might turn out to be slightly less cost-effective than the smaller-memory setup (classic scrypt's), such as because of increased average distance between crypto cores and the now-larger memory and because of extra routing and logic required to deliver data to any of the cores rather than only to a core tied to a specific memory block. If so, there's an attack cost increase by a small factor. And if not, there's no attack cost decrease (since that sort of setup was possible anyway). > While p > 1 is currently most relevant to (offline) uses of the KDF, we > may also look at what happens on authentication servers doing password > hashing and needing to achieve a certain throughput. With classic > scrypt, going from p = 1 (with N and r tuned to achieve the desired > throughput) to p > 1 (with N decreased accordingly) weakens the hashes > (in area-time terms) by a factor of p. With the proposed revision, this > (theoretically) does not affect security, so the decision may be made > based on other criteria (including possibly providing lower latency for > the users by making full use of the available computing resources even > when the server is under less than the maximum supported load). This > may be most relevant for possible defensive use of GPUs and such, where > the amount of parallelism available from concurrent authentication > attempts may be way below what's needed to optimally utilize the device, > even on busy authentication servers (need tens of thousands of > concurrent threads for current GPUs, and this keeps growing). Unfortunately, the same AT cost reduction by a factor of p happens for the proposed revision. Thus, as long as we're maximizing the die area via the memory portion (rather than via the crypto cores), the defender should prefer relatively few fast cores (to keep N or N/p high) over many slower cores (which would require lower N or N/p, for the classic and revised scrypt, respectively). If we're considering attackers with ASICs, then it's only when die area consumed by the crypto cores themselves would be comparable to that consumed by memory that it becomes reasonable to go for using many cores per (possibly revised) scrypt instance instead of running many independent parallel instances (if we have the parallelism from concurrent authentication attempts). > Sounds great so far? It does to me. It was too good to be true, even. Didn't anyone spot the error? Alexander
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.