john-users - Re: Password hashing at scale (for Internet companies with millions of users)

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+TsHUA0Vr0qAgh1meOgJ-BQ+fOPwJvK9puYnsGszeSKE6GO5A@mail.gmail.com>
Date: Mon, 8 Oct 2012 06:44:03 +0530
From: Sayantan Datta <std2048@...il.com>
To: john-users@...ts.openwall.com
Subject: Re: Password hashing at scale (for Internet companies
 with millions of users) - YaC 2012 slides

On Mon, Oct 8, 2012 at 1:24 AM, Solar Designer <solar@...nwall.com> wrote:

> On Fri, Oct 05, 2012 at 06:46:09PM +0530, Sayantan Datta wrote:
> > I don't understand how the lack of parallelism can affect the defenders?
> Or
> > maybe I don't understand how the defense mechanism works!!
>
> When the hashing method lacks parallelism and you don't have any
> parallelism externally either (you only have one password to hash with
> one salt value - to authenticate the one user currently logging in), you
> (the defender) might not be able to use your CPU fully.  However, the
> attacker will be able to use an equivalent CPU more fully, by bringing
> parallelism from high level (multiple candidate passwords to test) down
> to instruction level.
>
> For example, with PBKDF2-HMAC-SHA-1, a defender might spend 100 ms to
> validate a password - with SHA-1 implemented using 32-bit integer
> instructions only (can't use SIMD because of the lack of parallelism).
> However, an attacker having the same type of CPU will be able to test
> e.g. 16 passwords per 100 ms on the same quad-core CPU - using four
> 32-bit elements per 128-bit SIMD vector and four cores.
>

So the lack of parallelism affects the defenders by not letting them fully
utilize the cpu resources(vector units).

>
> The actual speedup might be somewhat different since the CPU might have
> a different number of 32-bit ALUs than vector units per core, and the
> throughput of these might be different too.  (In terms of attacker's
> code optimization, the availability of extra execution units is taken
> advantage of by mixing instructions corresponding to different
> instances of the hash function being computed, or/and by running
> multiple threads per core if the CPU supports SMT.)
>

By ALUs you mean the scalar units. Right? Maybe we could mix bcrypt with
md5 and get awesome resource utilization...at least on paper.


> Now, the defender might in fact have some parallelism, due to multiple
> nearly concurrent authentication attempts (for the same or different
> target accounts).  In practice, this lets defenders benefit from
> multiple CPU cores and SMT, but usually not from SIMD, nor from other
> optimizations to be made within one thread (such as the instruction
> mixing I mentioned).  Those optimizations are tricky and risky for
> defenders, and they require synchronization of different authentication
> attempts (delaying processing of some in order to process them in
> parallel with others).  It is a lot better to design the hashing method
> such that it has sufficient parallelism in it, and make use of that
> parallelism for defense without having to combine processing for
> different authentication attempts at a low level.
>

Why would the optimizations become risky for defenders? I mean , does/could
those optimizations you mentioned give the attacker any added advantage?

>
> With possible use of GPUs for defense, the parallelism available from
> nearly concurrent authentication attempts is simply insufficient even at
> the busiest authentication servers - e.g., a server might receive "only"
> 500 authentication attempts per 100 ms, but efficient use of a GPU may
> require thousands of work-items.
>
> I hope this is clearer now.
>
> Alexander
>

Regards,
Sayantan
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.