Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 12 Jul 2011 12:16:48 +0400
From: Solar Designer <>
Subject: Re: Some thoughts on hard to compute hashes

Hi Anthony,

I'm sorry for not commenting on this sooner.

On Fri, Jul 01, 2011 at 06:36:11PM -0400, Anthony Ferrara wrote:
> So that started me thinking.  What can a general purpose CPU do
> efficiently that neither an ASIC nor a GPU could do in an efficient
> manner (at a reasonable cost at least)?  And the answer that I came up
> with was branching logic.  Modern CPUs have lots of die space devoted
> to branching prediction and prefetching.  Therefore, a CPU should be
> more efficient (from my limited view at least) at running a hashing
> algorithm with significant branching than either a GPU or an
> inexpensive ASIC (as the intelligent branching logic adds complexity
> and significant numbers of transistors to the design).

You're right about GPUs, but probably mostly not about ASICs.  I'd
expect an ASIC not to have to implement the equivalent of a CPU's branch
predictor/prefetcher, but instead just have a hard-coded state machine
specific for your algorithm.  On the other hand, if your code is large
and its different portions are dissimilar, then you might force an
optimal ASIC to effectively be a special-purpose CPU.  The specific
example you provided doesn't do that, though.

> Just wondering if this idea is interesting enough to do some testing
> and modeling of, or if it's been done before and is not worth it (or
> it's not worth it for other reasons).

It's been known before that branching is more unfriendly to stream
processors such as in GPUs than it is to CPUs.  I don't know if it's
been proposed for a KDF or a password hashing method before or not.

My concern is that it's somewhat unfriendly to CPUs as well, so we might
not be making optimal use of the CPUs' execution units (leaving them
idle while code is being fetched into L1 cache).  If we consider attacks
with GPUs only (and not ASICs), then we could fit all the code in L1
cache and still achieve GPU-unfriendliness with branches.  We'd have to
include some amount of parallelism, though (enough for CPUs), and we'd
rely on the CPU's branch predictor to be no worse than some minimum we
design for.  Also, we'd need to keep the number of different branch
targets small enough that we don't fill up the Pentium 4 trace cache or
the like.

Overall, yes, this makes sense.

We've been considering another way of GPU-unfriendliness so far -
arbitrary memory accesses, like in Blowfish.  I think both approaches
may be combined.


Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.