Date: Sun, 7 Oct 2012 23:54:53 +0400
From: Solar Designer <>
Subject: Re: Password hashing at scale (for Internet companies with millions of users) - YaC 2012 slides

On Fri, Oct 05, 2012 at 06:46:09PM +0530, Sayantan Datta wrote:
> I don't understand how the lack of parallelism can affect the defenders? Or
> maybe I don't understand how the defense mechanism works!!

When the hashing method lacks parallelism and you don't have any
parallelism externally either (you only have one password to hash with
one salt value - to authenticate the one user currently logging in), you
(the defender) might not be able to use your CPU fully.  However, the
attacker will be able to use an equivalent CPU more fully, by bringing
parallelism from high level (multiple candidate passwords to test) down
to instruction level.

For example, with PBKDF2-HMAC-SHA-1, a defender might spend 100 ms to
validate a password - with SHA-1 implemented using 32-bit integer
instructions only (can't use SIMD because of the lack of parallelism).
However, an attacker having the same type of CPU will be able to test
e.g. 16 passwords per 100 ms on the same quad-core CPU - using four
32-bit elements per 128-bit SIMD vector and four cores.
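The arithmetic behind that 16x figure can be spelled out directly (a sketch using the illustrative numbers above, not a benchmark):

```python
# Hypothetical figures from the example above: a quad-core CPU with
# 128-bit SIMD, hashing with 32-bit operations (as in SHA-1).
simd_lanes = 128 // 32   # 32-bit elements per 128-bit SIMD vector -> 4
cores = 4                # quad-core CPU

# The defender validates one password per 100 ms, scalar, one core.
defender_per_100ms = 1

# The attacker fills every SIMD lane on every core with an
# independent candidate password.
attacker_per_100ms = simd_lanes * cores

print(attacker_per_100ms)  # 16 candidate passwords per 100 ms
```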

The actual speedup might be somewhat different since the CPU might have
a different number of 32-bit ALUs than vector units per core, and the
throughput of these might differ too.  (In terms of attacker's code
optimization, the attacker takes advantage of the extra execution units
by mixing instructions corresponding to different instances of the hash
function being computed, and/or by running multiple threads per core if
the CPU supports SMT.)
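To illustrate the instruction-mixing idea (a toy sketch; the round function below is a placeholder, not real SHA-1):

```python
# Two independent hash chains - standing in for two candidate
# passwords - whose steps are interleaved.  Because the two updates
# in each loop iteration have no data dependency on each other, a
# superscalar CPU can issue them to separate execution units in the
# same cycle, whereas a single chain is a serial dependency chain.
def step(x):
    # placeholder round function standing in for one hash round
    return (x * 2654435761 + 1) & 0xFFFFFFFF

a, b = 1, 2            # two independent instances
for _ in range(1000):
    a = step(a)        # instance 1: no dependency on b
    b = step(b)        # instance 2: no dependency on a
```

The results are identical to computing each chain sequentially; only the instruction schedule changes, which is why this is purely an attacker-side (or high-load defender-side) optimization.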

Now, the defender might in fact have some parallelism, due to multiple
nearly concurrent authentication attempts (for the same or different
target accounts).  In practice, this lets defenders benefit from
multiple CPU cores and SMT, but usually not from SIMD, nor from other
optimizations to be made within one thread (such as the instruction
mixing I mentioned).  Those optimizations are tricky and risky for
defenders, and they require synchronization of different authentication
attempts (delaying processing of some in order to process them in
parallel with others).  It is a lot better to design the hashing method
such that it has sufficient parallelism in it, and make use of that
parallelism for defense without having to combine processing for
different authentication attempts at a low level.
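A sketch of what "parallelism designed into the hashing method" can look like (the scheme and names below are purely illustrative, not a real or recommended construction):

```python
# Illustrative only: several independent sub-chains are computed for
# ONE password+salt pair and combined at the end.  A defender can run
# the lanes in SIMD or in threads even when authenticating a single
# user, without combining different authentication attempts.
import hashlib

def parallel_kdf(password: bytes, salt: bytes,
                 lanes: int = 4, rounds: int = 1000) -> bytes:
    # distinct lane index makes each sub-chain independent
    states = [hashlib.sha256(password + salt + bytes([i])).digest()
              for i in range(lanes)]
    for _ in range(rounds):
        # the lane updates have no cross-lane dependencies
        states = [hashlib.sha256(s).digest() for s in states]
    # a final combining step ties all lanes into one result
    return hashlib.sha256(b"".join(states)).digest()
```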

With possible use of GPUs for defense, the parallelism available from
nearly concurrent authentication attempts is simply insufficient even at
the busiest authentication servers - e.g., a server might receive "only"
500 authentication attempts per 100 ms, but efficient use of a GPU may
require thousands of work-items.
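The occupancy gap is easy to see numerically (the 500 figure is from the text above; the GPU work-item count is an assumed round number, since the exact requirement varies by GPU):

```python
auth_attempts_per_100ms = 500   # busiest-server figure from the text
gpu_work_items_needed = 8192    # assumption: "thousands" of work-items
                                # needed to keep a GPU busy

occupancy = auth_attempts_per_100ms / gpu_work_items_needed
# even a busy server fills well under a tenth of the GPU
assert occupancy < 0.1
```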

I hope this is clearer now.

