crypt-dev - Re: lyra

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140121163215.GA31555@openwall.com>
Date: Tue, 21 Jan 2014 20:32:15 +0400
From: Solar Designer <solar@...nwall.com>
To: Marcos Simplicio <mjunior@...c.usp.br>
Cc: crypt-dev@...ts.openwall.com, Leonardo Almeida <leocalm@...il.com>,
	"eandrade@...c.usp.br" <eandrade@...c.usp.br>
Subject: Re: lyra

Hi Marcos,

Thank you for your comments, and I'm sorry for my delayed response.
Please see inline:

On Tue, Jan 14, 2014 at 12:49:39PM -0200, Marcos Simplicio wrote:
> I saw the discussion below and thought I could offer some pointers on
> Lyra's design (please see http://eprint.iacr.org/2014/030 or the official
> publication at http://link.springer.com/journal/13389 for all details).

While I wrote the message you're replying to based solely on your
presentation slides (not the paper), I've since found the paper as well,
and added some comments/corrections here:

http://www.openwall.com/lists/crypt-dev/2014/01/13/1

> Please let us know if you find any flaw in our security claims! :-)

The lack of normalization to same running time in your comparisons
against scrypt is a flaw, I think.  However:

> > 2. The claimed higher than scrypt's memory usage for same running time
> > would be very nice.  Whether it is actually achieved, I am not sure.
> > Where would it come from?  scrypt spends half its time on filling
> > memory, and it uses Salsa20/8 for that.  Is it claimed that BLAKE2 is
> > somehow a lot faster than Salsa20/8?  I'd expect these to be roughly
> > similar speed.  Maybe the speedup the authors observed was due to
> > implementation detail (comparison against less-than-optimal scrypt
> > implementation and build) rather than due to the design of Lyra.
> 
> The reason is that we adopt an Alred-like approach, using 1-round Blake2b
> instead of 10-round Blake2b. Since we are interested in diffusion rather
> than cryptographic resistance to inversion or collisions, this seems to be
> a good approach for KDFs. After all, the attacker has no control over the
> inputs provided to the hash... We are open to other people's opinions on
> the matter, though :)

This explains it, and the approach makes sense to me.  We're also likely
to move to something quicker than Salsa20/8 in escrypt.

Unfortunately, this makes normalization to same running time ambiguous,
or you may do it in two different ways: for actual performance achieved
on a real-world system and thus considering the move from Salsa20/8 to
1-round Blake2b, and separately for a theoretical system that computes
either of these primitives with the same latency.  The latter comparison
is relevant in showing the advantage (or lack thereof) of your other
changes, relative to scrypt.  I am mostly interested in this theoretical
comparison, because it'd reflect the less trivial ways in which Lyra
differs from scrypt.

As to "the attacker has no control over the inputs" and not needing
"cryptographic resistance to inversion or collisions", these statements
are only partially valid (in some contexts and to some extent).  The
attacker definitely does have control over new inputs when attempting a
collision against previously computed hashes, and such collisions, if
possible for the entire KDF, would ease practically relevant attacks.
Then, short cycles are undesirable (need to be very unlikely), since
when they do occur the attacker is able to compute the KDF quicker, and
this is related to having some collision resistance (just not to the
same extent normally expected from full-round-count crypto hashes).

> > 3. The claimed TMTO resistance would also be very nice, although it's
> > arguably even nicer when it's optional (like we have in the current
> > development escrypt tree), because there are some defender's uses for
> > scrypt's (deliberate) TMTO-friendliness too.
> 
> I'm not sure whether this solves your point or not, but the TMTO
> resistance is configurable/optional: T is fully configurable; if you want
> the same memory usage with a lower T, but wants to raise the processing
> time, just use 2-round Blake2b (or higher).

You have writes into M in the Wandering phase, regardless of the value
of T (although with a lower T there are fewer such writes).  These
writes introduce some TMTO unfriendliness as compared to scrypt, even at
T=1.  This is both good (when TMTO unfriendliness is desired) and bad
(because it is not optional and not always desired).

> > 4. The summary table on slide 28/36 (page 31 in the PDF) gives different
> > meaning to R for scrypt vs. Lyra.  To compare the "Intermediate states"
> > and "Memory-free" times for scrypt vs. Lyra, they first need to be
> > normalized for same "Sequential (Default)" running time.  Let's use Rs
> > to refer to scrypt's R, and Rl to Lyra's.  We have Rl=Rs/T, so that
> > O(Rl*T) would match scrypt's O(Rs) (normalization for same running time).
> > For memory-free running times, this gives us O(Rs^2) for scrypt and
> > O((Rs/T)^(T+1)) for Lyra.  This is still reasonably good for Lyra, but
> > not as good as the slide makes it appear at first.
> 
> Normalization is indeed a complicating factor when comparing KDFs, and we
> are working on make ours clearer. At first sight (but I haven't check all
> details), we actually have something like Rl/C = Rs/r. Maybe I'm wrong by
> a constant factor, but I'm pretty sure there is no T in this equation
> because T has nothing to do with memory usage...
> 
> > 5. It appears that higher T, as required for Lyra's TMTO resistance,
> > would have an undesirable side-effect of reducing Lyra's memory usage.
> > In fact, from that same summary table the "Sequential (Default)" memory
> > usage for scrypt is O(Rs), but for Lyra it's only O(Rl*C), which means
> > O(Rs/T*C).  The area-time cost of attacks decreases accordingly.
> > I think this is not an acceptable tradeoff in KDF design; it's no good
> > to pay with so much lower memory usage merely for higher TMTO resistance.
> 
> I'm not sure I followed your reasoning correctly, but I can assure that
> this O(Rs/T*C)  is incorrect, since a higher T will always (1) leads to a
> higher memory usage in attacks

Always?  I think not.  There's no point in using more memory than the
defender's, so once T is high enough that an attacker finds it optimal
not to use TMTO, further T increases are of no help to the defender (on
the contrary, they help the attacker - see below).

Further, it might be the case that T=1 is optimal, in that it'd be
enough to make it unreasonable for likely attackers to use TMTO.  As I
mentioned above, you do have some TMTO discouragement even at T=1.

> and (2) does not influence the memory usage
> for legitimate users (only processing time).

Exactly, and that's the problem!

"Does not influence the memory usage" and increases "only processing
time" implies that it reduces memory usage at same processing time.

> An attacker can either store
> sponge states (which increase in number with T, until it reaches T*R) or

Why would any attacker, under any scenario want to store more than R
sponge states (including the overwritten ones)?  I think they would not,
and you confirmed that above in "does not influence the memory usage"
(even though you were referring to the defender's).

Or by "store" do you mean the number of store operations, rather than
the number of storage locations? 

> store nothing and simply recompute sponge states when they are required
> (which places T at the exponent,

Yes.

> leading to the Rl^{T+1} processing cost).

I'm not sure this exact formula is correct, because the Rows Loop
updates random rows, so some rows get updated more than T times (and
some fewer than T times, or none at all).  For the rows that get updated
more than T times, you have a higher exponent there, and they probably
dominate the total running time.  Thus, I think Rl^{T+1} is an
under-estimate.

Overall, I think the values of T recommended in your paper are too high.
I think potentially optimal values are in the range of
slightly-higher-than-0 (Bill chose 1% for NOELKDF, which means T=0.01,
albeit along with revised memory filling loop to have TMTO resistance
via it as well) to 3, and they don't even have to be whole numbers (if
you join the T and R loops into one, turning T into a multiplier for the
iteration count).  As Lyra is currently defined, T=1 may well be optimal.

> Well, the problem with insecure primitives is that they are (or should?)
> likely to be discontinued in future releases of crypto libraries. Hence,
> depending on them is not the best policy.

OK.  In scrypt's case, though, Salsa20/8 core generally has to be custom
code bundled with the scrypt implementation anyway, and SHA-256 is still
considered secure at this time, with no intent to discontinue it.

> In the article this is much
> clearer that in the presentation (I hope this removes the impression of
> FUD, especially because scrypt's security claims do not depend on
> Salsa20/8...)

OK.

> And we never claimed that SHA-1 is used in scrypt, but in PBKDF only. ;)

Oh, that's not how I read the slide.  "The hash function SHA-1 adopted
by the PBKDF2 algorithm and the hash function Salsa20/8 adopted by the
Scrypt algorithm have known vulnerabilities" reads as implying the use
of PBKDF2 in scrypt itself (because there is, in fact, such use).  Yes,
PBKDF2 is commonly used with SHA-1, but newer uses of PBKDF2 generally
use it along with SHA-256 or SHA-512.  I suggest that you split this
bullet point in two to avoid confusion, and clarify that PBKDF2 is not
defined as being based on SHA-1 (rather, PBKDF2 is defined as a
higher-level algorithm that can use any underlying crypto hash function).

I hope this helps, and thanks again for your comments on mine.

Alexander
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.