Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Tue, 5 Apr 2011 06:52:16 +0400
From: Solar Designer <solar@...nwall.com>
To: crypt-dev@...ts.openwall.com
Subject: project rationale

Hi,

I finally got around to describing the project in more detail.

I'll over-quote a little bit for people who have joined after I posted
the below, then add new content below the quote:

On Mon, Mar 28, 2011 at 10:12:07PM +0400, Solar Designer wrote:
> This is our new mailing list for discussing the project described as a
> GSoC 2011 "idea" as follows:
> 
> http://openwall.info/wiki/ideas
> 
> # New password hashing method for servers (full scope exceeds one summer,
> so the task may be split in two or three)
> 
>     * New crypt(3) flavor using concepts of scrypt (not only iterations,
> but also parallelism and memory), optionally making use of AES-NI
>     * GPU or/and FPGA accelerated password hashing on servers (to better
> compete with similarly accelerated or distributed offline password hash
> cracking), optional local parameterization on specific hardware
> (parameter unreadable from host OS)
> 
> I suggest that we start by discussing "high-level" stuff (such as the
> "context" for this project: what already exists and why there's room for
> something new), then go into detail.

What we have:

Purposefully slow password hashing methods with configurable iteration
counts (which are stored along with the hashes).  As far as I'm aware,
this was first introduced with BSDI's extended DES-based crypt(3)
hashes, and on modern systems we use bcrypt and SHA-crypt instead.

We also have scrypt, which defines a key derivation function, but not a
crypt(3) salt and return value string syntax yet.  The scrypt KDF has
some advantages over bcrypt: it can be configured to use large amounts
of memory and/or contain a lot of parallelism (for just one instance).

Why is parallelism in a password hashing method desirable:

Offline password cracking (against leaked or stolen hashes) has a lot of
potential for parallelism: different candidate passwords may be hashed
in parallel.  Thus an attacker is generally able to utilize the parallel
processing capabilities of their hardware.  (There are some exceptions
to this - e.g., Blowfish's S-box lookups typically can't be efficiently
implemented on SIMD architectures - but an attacker can deal with this
by picking suitable hardware, such as an FPGA instead of a GPU.)

However, a server doing authentication might only have one instance of a
user's password to check at a given time.  If the hashing method lacks
parallelism, only a subset of the server CPUs' execution units will be
usable for the task.

The administrator has to set the iteration count (for newly set/changed
passwords) such that password hashes are computed in reasonable time
(say, in 10 ms).  If only 1/10th of the execution units of a given CPU
type are usable to hash a single password (because of data dependencies),
this means that an attacker can try candidate passwords against hashes
of this type on a comparable CPU 10 times faster (by exploiting the
extra parallelism available due to having multiple candidate passwords).

(In practice, for bcrypt the speedup only available to password cracking
is currently around 80% on Intel Core 2'ish CPUs in 64-bit mode.
That's for just one CPU core and without making use of SIMD extensions,
because Blowfish is not well suited for those, whereas an FPGA or ASIC
implementation "comparable in size" to the server's CPU could achieve a
lot of further speedup.)

By including a lot of inherent parallelism in one instance of password
hash computation, we make this parallelism available to both password
validation on the server and password cracking.  Thus, we let the server
administrator configure the parameters such that hashes are still
computed in reasonable time (say, the same 10 ms), but this process
makes more complete use of the server's resources.  (In simple cases,
this would not require additional configuration beyond setting a higher
iteration count, where each iteration would automatically make use of a
larger number of CPU execution units and/or wider SIMD vectors to run
faster.)  Then the attacker will only be able to check fewer candidate
passwords in parallel (or will have to throw more resources at the attack).

Why it may be desirable to use more memory:

Similarly to a hashing method that lacks inherent parallelism, one that
only needs a little bit of memory is not making use of all server
resources that it reasonably could.  This allows for smaller and cheaper
specialized implementations (that don't include extra memory), thereby
letting an attacker build and run a larger number of those in parallel
(in an offline attack).

By allowing the administrator to configure password hashes (for newly
set/changed passwords) to require a substantial amount of memory for
their computation, we increase the cost of each instance of a
specialized implementation that an attacker would use, thereby
decreasing the number of those that will run in parallel.

Why not just use scrypt:

Actually, we might build upon scrypt.  This needs further research -
scalability to wider SIMD vectors, GPU implementations (to make use of
GPUs in servers doing authentication), FPGA implementations.  We could
see whether scrypt is usable as-is, whether it needs changes, or whether
we need something different/new.

Then, one common complaint regarding bcrypt was that it relied on a
cipher (Blowfish) that was not NIST-approved.  In fact, this was cited
as the primary reason to create SHA-crypt.  scrypt uses Salsa20, which
is likely to result in similar difficulties in getting it accepted, even
though IIRC it also uses SHA-256, which we can refer to.

So I think it could be better to design a similar KDF based on a SHA-2
hash or on AES, although this is not trivial to do when we want to have
near-optimal implementations for very different machine word and SIMD
vector widths.

Local parameterization:

Offline attacks on stolen/leaked hashes may be defeated when hash
computation involves a local parameter (a secret value with high
guessing entropy, such that it is infeasible to brute-force it) and this
parameter is not stolen/leaked.  With a custom FPGA board for use in
servers, we may have a little bit of storage unreadable from the
host system, yet usable by the FPGA logic.  A company could program the
board with a certain local parameter value (and store a backup copy of
this value elsewhere, e.g. printed on paper and locked in a safe), then
put the board into their authentication server.  If the server is ever
compromised over a network, the intruder would only be able to capture
plaintext passwords of users authenticating until the compromise is
detected and dealt with.  The intruder won't be able to mount a
successful offline attack on all of the hashes, which would otherwise
result in compromise of a much larger percentage of accounts, longer
recovery from the compromise, and/or in increased burden on customer
support (for having a larger number of accounts locked and passwords
changed in an emergency).

I think the above should be enough to get us started.  I'd appreciate
any comments and questions.

Thanks,

Alexander

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux - Powered by OpenVZ