crypt-dev - Re: alternative approach

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110511190714.GA11467@openwall.com>
Date: Wed, 11 May 2011 23:07:14 +0400
From: Solar Designer <solar@...nwall.com>
To: crypt-dev@...ts.openwall.com
Subject: Re: alternative approach

Hi Yuri,

Is my understanding of your reply correct that you agree it is a good
idea to use the "alternative approach", but you propose that when doing
so we limit ourselves to "well-known and acceptable hashes/ciphers" as
our building blocks?

On Wed, May 11, 2011 at 11:22:38AM -0300, Yuri Gonzaga wrote:
> It is complicated to proposed a new method that doesn't use a combination of
> well-known and acceptable hashes/ciphers because of the lack of
> certification and acceptance since it take some time to happens, only after
> cientific and comercial communities appreciation.
> Thus, it is better to our project to use validated hashes/ciphers in order
> to achieve acceptance.

Right.  However, most modern cryptographic primitives were designed for
efficient software implementation (as well as hardware implementation).
So I'm afraid there isn't a truly software-unfriendly modern hash/cipher
that has received much scrutiny and government certification.

Blowfish (and perhaps Twofish) might be the closest we can get - it has
received a lot of scrutiny, but no government certification.  It is
friendly to 32-bit CPUs, but not to typical SIMD architectures (which
lack vectorized array lookups, for good reasons).

Thus, we may choose to use a component based on Blowfish (such as
Eksblowfish or the full bcrypt) or even one based on our own Blowfish
variation (more unfriendly to software implementation and requiring less
logic in hardware), but include proof that cryptographic security of our
entire KDF is not dependent on that of this component.  That is, even if
a gaping cryptographic weakness is found in this component, the worst
impact would be faster testing of a candidate key in a brute-force
attack - partially or fully bypassing computation of this "non-crypto"
component.  Of course, we wouldn't expect such bypass to be actually
practical, which is why we'd include this component.

(We may use a similar approach to gain acceptance for a variation of
scrypt, formally relying on SHA-256 and PBKDF2 rather than on its
Salsa20/8 component.)

> Also, the time of GSoC will be better spent if we
> focus in implement these blocks instead of proposing a new method.

Maybe, or maybe not.  We definitely need to have something usable and
useful by the end of summer.  We've already decided to include an FPGA
implementation and an optional FPGA-based component in our KDF.  No
matter what we choose, if we implement this at all and it works, it will
be "usable".  As to it being "useful", it will need to be somehow better
than what exists now, for some uses and in some aspects.  I see two ways
to make it useful in this sense:

1. Have our KDF, when running on one CPU plus one FPGA, be significantly
more resistant against offline brute-force attacks by just CPUs (no
FPGAs) and/or by GPUs, compared to an existing KDF implemented on a CPU
only.  By "significantly" I mean something like a 10x difference in
attack speed (assuming fixed $ cost of the attacker's equipment), and
preferably more than 10x.

2. Include and use a host-unreadable local parameter on the FPGA board.
For this, we need to figure out if Pico Computing's boards are readily
usable to achieve this task or not...  If not yet, then maybe we won't
have this by the end of summer (although it'd be a pity).

If we're 100% confident that we'd achieve one of these by the end of
summer, then the other is optional (for the summer).  However, at least
one of these is a must to achieve by the end of the summer.

Now, while we could wish FPGAs were some magic high-performance beasts,
definitely much faster than CPUs when properly programmed, they are not.
If we pick a software-friendly cryptographic primitive, it could turn
out that our optimized implementation of it on a typical server CPU
achieves comparable performance to our optimized implementation on our
FPGA board (one we could offer for installation into servers).

Here are some performance numbers for DES, which was originally designed
for hardware implementation, but then had efficient software
implementations developed for it:

http://www.sump.org/projects/password/

The performance number given for "AMD64 3000+" is incorrect, please
disregard it.  See the comments (some of them posted by me) for correct
performance numbers for CPU implementations.

This shows that XC3S1000-4 achieves a speed similar to a quad-core CPU
of 2008.  Core i7-2600 (this year's quad-core CPU) is twice faster - it
does over 20M of DES-based crypt(3) keys per second with my current code.

Thus, apparently XC3S5000-4, which is reported to theoretically achieve
45 MKeys/sec in a table on that web page, is twice faster than a modern
quad-core CPU.  However, for our project a mere 2x performance gain of
FPGA vs. CPU is not good enough to achieve the #1 goal mentioned above.

(I don't know how FPGAs on Pico Computing's boards meant for production
use in servers compare to a XC3S5000-4.  I'd appreciate info on that.)

Overall, the above makes me think that we may in fact need to design and
implement a software-unfriendly component.  Definitely more than DES
(which can be made to use arbitrarily wide SIMD vectors in software with
a bitslice implementation), and perhaps more than Blowfish.  With
original Blowfish, we might achieve a 4x "boost" in software
unfriendliness vs. DES (since it only uses 32-bit operations, whereas
modern CPUs can do 128-bit almost as fast), which might give us a total
"FPGA boost" vs. CPUs of 8x (relying on speeds for XC3S5000-4 vs. Core
i7-2600 mentioned above).  However, I'd prefer something better than 8x
(although 8x is already reasonable for our project), and I am not
certain of it (full/original Blowfish probably consumes more FPGA logic
than DES does, which I did not take into account in these calculations,
so we'd actually fit fewer Blowfish cores in our FPGA chip; also, there
are differences in pipelining ability, which is nearly perfect for DES).

> So, my suggestion is that you have to define 3 components: one for CPU,
> other for GPU and another for FPGA.

Maybe.  I suggest that we omit a GPU-specific component for now, for a
variety of reasons.

> The result of each component will be built in a final result. For example,
> do a XOR operation using 3 results

Yes, however XOR only works well when the components' outputs are of the
same size and all are "random enough" - otherwise the combined output
might reveal some bits of individual outputs, which would allow for
faster brute-force attacks (on those revealed outputs separately).
Thus, we'd want to feed the components' outputs into a generic KDF
(relatively fast).  We may even include the original password and salt
along with those component outputs.  This gives us proof that the
security of our entire construct depends on that final KDF only.

> or use one output as input in the next component.

This reduces parallelism for actual use while not affecting parallelism
available for attacks.  Not good.

> Finally, as you are the expert in this area, I think you should establish
> these details so I can go on with the project implementation.

Yes, I intend to define the project more specifically, but I am trying
to get you involved in the design decisions and to let you learn new
things.  Also, some design decisions depend on what they'd mean for
implementation.  So I might ask you to try implementing different things
before we decide on which of those to use.

Thanks,

Alexander
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.