Date: Fri, 1 Jul 2011 18:36:11 -0400
From: Anthony Ferrara <ircmaxell@...il.com>
To: crypt-dev@...ts.openwall.com
Subject: Some thoughts on hard to compute hashes

Hello all,

Let me preface this by saying that I'm no expert in cryptography. I'm
pretty good with the math and the algorithms behind the ciphers, but I've
never significantly studied them. I understand how and why they work, but
I'm not versed in the intricacies of their design or testing. So I'm
posting this question to see if the thought is even worth pursuing.

I was conversing with Solardiz on a Drupal topic
(http://drupal.org/node/1201444#comment-4675994) and we were discussing
scrypt hashing and how there are really two forms of the algorithm: one
that uses a lot of memory, and one that uses a lot of logic (as a
tradeoff). The logic-heavy version, while significantly more complicated
and slower, could potentially be run on a massively parallel GPU (not by
parallelizing the algorithm itself, just by running multiple hashes at
the same time), while the memory-bound version is not particularly suited
to high parallelism due to its memory demands.

That started me thinking: what can a general-purpose CPU do efficiently
that neither an ASIC nor a GPU could do in an efficient manner (at a
reasonable cost, at least)? The answer I came up with was branching
logic. Modern CPUs have lots of die space devoted to branch prediction
and prefetching. Therefore, a CPU should be more efficient (from my
limited view at least) at running a hashing algorithm with significant
branching than either a GPU or an inexpensive ASIC (since the
intelligent branching logic adds complexity and a significant number of
transistors to the design).

So, if we take a memory-hard algorithm such as the one used by scrypt
and add branching logic inside the smix() function (for example), we
could theoretically make the algorithm favor general-purpose CPUs. For
example, the current smix function is:

    X <-- B
    For i = 0 ... N - 1
        V_i <-- X
        X <-- H(X)
    For i = 0 ... N - 1
        j <-- Integerify(X) mod N
        X <-- H(X xor V_j)

What if we changed the second loop to something such as:

    For i = 0 ... N - 1
        j <-- Integerify(X) mod N
        If j > (N / B)
            X <-- H(X xor V_j)
        Elseif j > (N / 2B)
            V_i <-- H(X xor V_j)
        Elseif j > (N / 4B)
            V_j <-- H(X xor V_i)
        Else
            X <-- H(X xor V_i)

Here B is a "branching factor" with 2^N > 4B > 4. Adjusting the
branching factor adjusts the probability that each branch gets executed
(B=2 gives 50%, 25%, 12.5%, 12.5%). Increasing this factor should change
the processor's ability to optimize the branches (B=2 would favor
prefetching all branches, while large B values would favor prediction).
The exact contents of the branches are open; I just wanted something
meaningful to demonstrate the concept.

I'm just wondering whether this idea is interesting enough to do some
testing and modeling on, or whether it's been done before and isn't
worth it (or isn't worth it for other reasons).

Thanks,

Anthony
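P.S. To make the control flow concrete, here is a toy Python sketch of the branched second loop. SHA-256 stands in for scrypt's H, 32-byte strings stand in for the real block size, and the names (branched_smix, BRANCH, etc.) are purely illustrative; this is not the real smix, just the branching structure described above.

```python
# Toy, non-cryptographic sketch of the branched smix variant.
# ASSUMPTIONS: SHA-256 replaces scrypt's BlockMix H, blocks are 32 bytes,
# and N is small -- this only demonstrates the data-dependent branching.
import hashlib

def H(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(p ^ q for p, q in zip(a, b))

def integerify(x: bytes) -> int:
    # Interpret the last 8 bytes of the block as a little-endian integer.
    return int.from_bytes(x[-8:], "little")

def branched_smix(block: bytes, n: int, branch: int) -> bytes:
    x = block
    v = []
    for _ in range(n):           # first loop: fill V, unchanged from scrypt
        v.append(x)
        x = H(x)
    for i in range(n):           # second loop: data-dependent branching
        j = integerify(x) % n
        if j > n // branch:               # ~1 - 1/B of the time
            x = H(xor(x, v[j]))
        elif j > n // (2 * branch):       # ~1/2B of the time
            v[i] = H(xor(x, v[j]))
        elif j > n // (4 * branch):       # ~1/4B of the time
            v[j] = H(xor(x, v[i]))
        else:                             # ~1/4B of the time
            x = H(xor(x, v[i]))
    return x

out = branched_smix(b"\x00" * 32, n=1024, branch=2)
print(out.hex())
```

Since which branch is taken depends on the evolving hash state, the branch pattern is effectively unpredictable ahead of time, which is the property the argument above relies on.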