Date: Mon, 8 Oct 2012 06:44:03 +0530
From: Sayantan Datta <>
Subject: Re: Password hashing at scale (for Internet companies
 with millions of users) - YaC 2012 slides

On Mon, Oct 8, 2012 at 1:24 AM, Solar Designer <> wrote:

> On Fri, Oct 05, 2012 at 06:46:09PM +0530, Sayantan Datta wrote:
> > I don't understand how the lack of parallelism can affect the defenders?
> > Or maybe I don't understand how the defense mechanism works!!
> When the hashing method lacks parallelism and you don't have any
> parallelism externally either (you only have one password to hash with
> one salt value - to authenticate the one user currently logging in), you
> (the defender) might not be able to use your CPU fully.  However, the
> attacker will be able to use an equivalent CPU more fully, by bringing
> parallelism from high level (multiple candidate passwords to test) down
> to instruction level.
> For example, with PBKDF2-HMAC-SHA-1, a defender might spend 100 ms to
> validate a password - with SHA-1 implemented using 32-bit integer
> instructions only (can't use SIMD because of the lack of parallelism).
> However, an attacker having the same type of CPU will be able to test
> e.g. 16 passwords per 100 ms on the same quad-core CPU - using four
> 32-bit elements per 128-bit SIMD vector and four cores.

So the lack of parallelism affects the defenders by not letting them fully
utilize the CPU resources (vector units).
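To make sure I follow, here is a back-of-the-envelope sketch of the numbers
you gave (all figures are the illustrative ones from your message, not
benchmarks):

```python
# Defender/attacker throughput gap for PBKDF2-HMAC-SHA-1, as described above.
# Figures are the illustrative ones from the quoted text, not measurements.

defender_time_ms = 100  # one scalar (32-bit integer) password validation
simd_lanes = 4          # 32-bit elements per 128-bit SIMD vector
cores = 4               # quad-core CPU

# Defender: no parallelism available, so one hash per 100 ms on one core.
defender_hashes_per_100ms = 1

# Attacker: brings high-level parallelism (many candidate passwords) down
# to SIMD lanes, and uses all cores.
attacker_hashes_per_100ms = simd_lanes * cores

print(attacker_hashes_per_100ms)  # 16 candidate passwords per 100 ms
print(attacker_hashes_per_100ms // defender_hashes_per_100ms)  # 16x advantage
```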

> The actual speedup might be somewhat different since the CPU might have
> a different number of 32-bit ALUs than vector units per core, and the
> throughput of these might be different too.  (In terms of attacker's
> code optimization, the availability of extra execution units is taken
> advantage of by mixing instructions corresponding to different
> instances of the hash function being computed, or/and by running
> multiple threads per core if the CPU supports SMT.)

By ALUs you mean the scalar units, right? Maybe we could mix bcrypt with
MD5 and get awesome resource utilization, at least on paper.
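If I understood the instruction-mixing idea correctly, it is roughly this
(a conceptual sketch with a toy round function, not real SHA-1; the names
are mine):

```python
# Sketch of instruction mixing: interleave two independent hash instances
# in one loop body.  On a real CPU with spare execution units, the two
# independent dependency chains (s0 and s1) can execute in parallel at the
# instruction level.  toy_round is a stand-in, not a real hash round.

MASK = 0xFFFFFFFF  # keep state in 32 bits

def toy_round(state, block):
    # stand-in round: 32-bit rotate-left by 5, then xor in the block word
    return ((state << 5 | state >> 27) & MASK) ^ block

def hash_one(blocks, iv=0x67452301):
    # one instance: a single serial dependency chain
    s = iv
    for b in blocks:
        s = toy_round(s, b)
    return s

def hash_two_interleaved(blocks0, blocks1, iv=0x67452301):
    # two instances mixed in one loop body: each iteration issues
    # independent work for both chains
    s0 = s1 = iv
    for b0, b1 in zip(blocks0, blocks1):
        s0 = toy_round(s0, b0)
        s1 = toy_round(s1, b1)
    return s0, s1
```

The interleaved version computes exactly the same two results as running
each instance separately; only the instruction scheduling changes.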

> Now, the defender might in fact have some parallelism, due to multiple
> nearly concurrent authentication attempts (for the same or different
> target accounts).  In practice, this lets defenders benefit from
> multiple CPU cores and SMT, but usually not from SIMD, nor from other
> optimizations to be made within one thread (such as the instruction
> mixing I mentioned).  Those optimizations are tricky and risky for
> defenders, and they require synchronization of different authentication
> attempts (delaying processing of some in order to process them in
> parallel with others).  It is a lot better to design the hashing method
> such that it has sufficient parallelism in it, and make use of that
> parallelism for defense without having to combine processing for
> different authentication attempts at a low level.

Why would those optimizations become risky for defenders? I mean, could
the optimizations you mentioned give the attacker any added advantage?

> With possible use of GPUs for defense, the parallelism available from
> nearly concurrent authentication attempts is simply insufficient even at
> the busiest authentication servers - e.g., a server might receive "only"
> 500 authentication attempts per 100 ms, but efficient use of a GPU may
> require thousands of work-items.
> I hope this is clearer now.
> Alexander
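That occupancy gap can be put in rough numbers (the 500-attempts figure is
from your message; the exact work-item requirement is my assumption, since
you only said "thousands"):

```python
# Rough GPU occupancy arithmetic for the point above.  The 500 per 100 ms
# figure is from the quoted text; 4096 work-items is an assumed round
# number standing in for "thousands".

auth_attempts_per_100ms = 500      # busiest authentication server
workitems_for_efficiency = 4096    # assumed minimum for efficient GPU use

occupancy = auth_attempts_per_100ms / workitems_for_efficiency
print(f"{occupancy:.1%} of the required work-items have real work")
```

So even the busiest server fills only a small fraction of the work-items an
efficient GPU kernel would want.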

