Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Thu, 23 Jan 2014 20:37:04 -0200 (BRDT)
From: "Marcos Simplicio" <mjunior@...c.usp.br>
To: crypt-dev@...ts.openwall.com
Cc: "Leonardo Almeida" <leocalm@...il.com>,
 "eandrade@...c.usp.br" <eandrade@...c.usp.br>
Subject: Re: lyra

Hi, Alexander.

> Hi Marcos,
>
> Thank you for your comments, and I'm sorry for my delayed response.

My thanks to you too. It is always good to have feedback and new inputs on
our algorithms. :)

BTW, I'm currently on vacation and my wife is insisting in keeping me away
from the Internet, so, whenever she succeeds, I'm likely to be late in my
responses too...

> Please see inline:
>
> On Tue, Jan 14, 2014 at 12:49:39PM -0200, Marcos Simplicio wrote:
>> I saw the discussion below and thought I could offer some pointers on
>> Lyra's design (please see http://eprint.iacr.org/2014/030 or the
>> official
>> publication at http://link.springer.com/journal/13389 for all details).
>
> While I wrote the message you're replying to based solely on your
> presentation slides (not the paper), I've since found the paper as well,
> and added some comments/corrections here:
>
> http://www.openwall.com/lists/crypt-dev/2014/01/13/1
>
>> Please let us know if you find any flaw in our security claims! :-)
>
> The lack of normalization to same running time in your comparisons
> against scrypt is a flaw, I think.  However:

Same running time considering that we use the same underlying hash (as
below), right? We will try to take this into account on our new version of
Lyra (ongoing work...)

>
>> > 2. The claimed higher than scrypt's memory usage for same running time
>> > would be very nice.  Whether it is actually achieved, I am not sure.
>> > Where would it come from?  scrypt spends half its time on filling
>> > memory, and it uses Salsa20/8 for that.  Is it claimed that BLAKE2 is
>> > somehow a lot faster than Salsa20/8?  I'd expect these to be roughly
>> > similar speed.  Maybe the speedup the authors observed was due to
>> > implementation detail (comparison against less-than-optimal scrypt
>> > implementation and build) rather than due to the design of Lyra.
>>
>> The reason is that we adopt an Alred-like approach, using 1-round
>> Blake2b
>> instead of 10-round Blake2b. Since we are interested in diffusion rather
>> than cryptographic resistance to inversion or collisions, this seems to
>> be
>> a good approach for KDFs. After all, the attacker has no control over
>> the
>> inputs provided to the hash... We are open to other people's opinions on
>> the matter, though :)
>
> This explains it, and the approach makes sense to me.  We're also likely
> to move to something quicker than Salsa20/8 in escrypt.
>
> Unfortunately, this makes normalization to same running time ambiguous,
> or you may do it in two different ways: for actual performance achieved
> on a real-world system and thus considering the move from Salsa20/8 to
> 1-round Blake2b, and separately for a theoretical system that computes
> either of these primitives with the same latency.  The latter comparison
> is relevant in showing the advantage (or lack thereof) of your other
> changes, relative to scrypt.  I am mostly interested in this theoretical
> comparison, because it'd reflect the less trivial ways in which Lyra
> differs from scrypt.

In the article we went with the first approach, although indeed the lack
of SSE2 optimizations lead to a less practical comparison. We are now more
focused on implementation issues, so I hope this gets solved soon.

A theoretical comparison would probably be focused on the Wandering's
phase read+write approach, which increases resistance to TMTO. However,
only combined with the fast computation of hashes that it makes sense,
since then one can (1) raise T to a achieve high TMTO resistance while not
spending too much processing time, or (2) with a reasonably small T,
achieve a reasonably good TMTO resistance and low processing time. In
scrypt, if we increase the size of the loop in which random memory
positions are read, this does not lead to an increased TMTO.


>
> As to "the attacker has no control over the inputs" and not needing
> "cryptographic resistance to inversion or collisions", these statements
> are only partially valid (in some contexts and to some extent).  The
> attacker definitely does have control over new inputs when attempting a
> collision against previously computed hashes, and such collisions, if
> possible for the entire KDF, would ease practically relevant attacks.
> Then, short cycles are undesirable (need to be very unlikely), since
> when they do occur the attacker is able to compute the KDF quicker, and
> this is related to having some collision resistance (just not to the
> same extent normally expected from full-round-count crypto hashes).

OK, the attacker can control the password (and possibly salt) passed to
the sponge at the beginning of the Setup phase, but the control ends
there: after that input is processed with a full-round Blake2b, the
sponge's internal state (normally 256-512 bits) is highly random unless
Blake2b itself has security vulnerabilities and, thus, cannot be modeled
as a random oracle; similarly, the memory matrix's cells computed during
Setup are determined by that password and cannot be changed later by the
attacker (well, unless he/she is not interested in getting the correct
derived key...)

The point here is that, if the attacker could change these values at will,
then creating collisions should not be so hard, as 1-round Blake2b is not
expected to be collision-resistant, but creating such collisions by
changing rows or the sponge state is useless for attacks against KDFs (at
least as far as I can tell).

If collisions do occur by chance, then I agree that the whole KDF can be
computed much faster than expected. That is exactly reason why we believe
a reduced-round hash is a good primitive to work with: after several
rounds, these collisions come basically from the birthday paradox and can,
thus, be avoided with a reasonable choice of the state lengths.

>
>> > 3. The claimed TMTO resistance would also be very nice, although it's
>> > arguably even nicer when it's optional (like we have in the current
>> > development escrypt tree), because there are some defender's uses for
>> > scrypt's (deliberate) TMTO-friendliness too.
>>
>> I'm not sure whether this solves your point or not, but the TMTO
>> resistance is configurable/optional: T is fully configurable; if you
>> want
>> the same memory usage with a lower T, but wants to raise the processing
>> time, just use 2-round Blake2b (or higher).
>
> You have writes into M in the Wandering phase, regardless of the value
> of T (although with a lower T there are fewer such writes).  These
> writes introduce some TMTO unfriendliness as compared to scrypt, even at
> T=1.  This is both good (when TMTO unfriendliness is desired) and bad
> (because it is not optional and not always desired).

Hum... you make me think of a scenario in which the memory usage is
already so small that the attacker is unlikely to need to explore TMTO. Is
that what you mean? Are there other scenarios? Seriously, I have not
thought of that until I read your phrase :)

>
>> > 4. The summary table on slide 28/36 (page 31 in the PDF) gives
>> different
>> > meaning to R for scrypt vs. Lyra.  To compare the "Intermediate
>> states"
>> > and "Memory-free" times for scrypt vs. Lyra, they first need to be
>> > normalized for same "Sequential (Default)" running time.  Let's use Rs
>> > to refer to scrypt's R, and Rl to Lyra's.  We have Rl=Rs/T, so that
>> > O(Rl*T) would match scrypt's O(Rs) (normalization for same running
>> time).
>> > For memory-free running times, this gives us O(Rs^2) for scrypt and
>> > O((Rs/T)^(T+1)) for Lyra.  This is still reasonably good for Lyra, but
>> > not as good as the slide makes it appear at first.
>>
>> Normalization is indeed a complicating factor when comparing KDFs, and
>> we
>> are working on make ours clearer. At first sight (but I haven't check
>> all
>> details), we actually have something like Rl/C = Rs/r. Maybe I'm wrong
>> by
>> a constant factor, but I'm pretty sure there is no T in this equation
>> because T has nothing to do with memory usage...
>>
>> > 5. It appears that higher T, as required for Lyra's TMTO resistance,
>> > would have an undesirable side-effect of reducing Lyra's memory usage.
>> > In fact, from that same summary table the "Sequential (Default)"
>> memory
>> > usage for scrypt is O(Rs), but for Lyra it's only O(Rl*C), which means
>> > O(Rs/T*C).  The area-time cost of attacks decreases accordingly.
>> > I think this is not an acceptable tradeoff in KDF design; it's no good
>> > to pay with so much lower memory usage merely for higher TMTO
>> resistance.
>>
>> I'm not sure I followed your reasoning correctly, but I can assure that
>> this O(Rs/T*C)  is incorrect, since a higher T will always (1) leads to
>> a
>> higher memory usage in attacks
>
> Always?  I think not.  There's no point in using more memory than the
> defender's, so once T is high enough that an attacker finds it optimal
> not to use TMTO, further T increases are of no help to the defender (on
> the contrary, they help the attacker - see below).
>
> Further, it might be the case that T=1 is optimal, in that it'd be
> enough to make it unreasonable for likely attackers to use TMTO.  As I
> mentioned above, you do have some TMTO discouragement even at T=1.

OK, I think I need some rephrasing here: a higher T will always lead to a
less attractive TMTO. Of course, attacks that do not explore TMTO will not
use more memory if one raises T (it will only affect processing time). If
the user thinks O(N^2) or O(N^3) is enough, then indeed T=1 or T=2 should
be enough. This makes me think of the "optional TMTO resistance" discussed
above.


>
>> and (2) does not influence the memory usage
>> for legitimate users (only processing time).
>
> Exactly, and that's the problem!
>
> "Does not influence the memory usage" and increases "only processing
> time" implies that it reduces memory usage at same processing time.
>

I fully agree with the implication. On the other hand, it appears to me
that in a normalization the value of T would appear in the denominator
only if Lyra was slower than scrypt: if we fix the processing time, than
we get the same memory usage with scrypt and Lyra with some T > 1. In
other words, fixing T=2 would still lead to higher memory usage, not to a
lower memory usage as implied by the "O(Rs/T*C)" normalization.

>> An attacker can either store
>> sponge states (which increase in number with T, until it reaches T*R) or
>
> Why would any attacker, under any scenario want to store more than R
> sponge states (including the overwritten ones)?  I think they would not,
> and you confirmed that above in "does not influence the memory usage"
> (even though you were referring to the defender's).
>

Because sponge states are usually smaller than rows of the memory matrix.
In our tests, for example, a row takes 64*512 bits (i.e., C*bitrate),
while a state takes 1024 bits (the sponge's width). If all sponge states
are stored, then the algorithm can be run with a less unfriendly TMTO.

Also, the "does not influence the memory usage" I mentioned refers to a
legitimate user, not to attackers. The higher T, the higher will be the
memory usage of attacks for achieving a same processing time (this is a
direct result of TMTO unfriendliness). Or are you proposing a
normalization both concerning attackers and defenders?

> Or by "store" do you mean the number of store operations, rather than
> the number of storage locations?

Nope. I mean "number of storage locations" (i.e., memory usage). See above
the reason.

>
>> store nothing and simply recompute sponge states when they are required
>> (which places T at the exponent,
>
> Yes.
>
>> leading to the Rl^{T+1} processing cost).
>
> I'm not sure this exact formula is correct, because the Rows Loop
> updates random rows, so some rows get updated more than T times (and
> some fewer than T times, or none at all).  For the rows that get updated
> more than T times, you have a higher exponent there, and they probably
> dominate the total running time.  Thus, I think Rl^{T+1} is an
> under-estimate.
>
> Overall, I think the values of T recommended in your paper are too high.
> I think potentially optimal values are in the range of
> slightly-higher-than-0 (Bill chose 1% for NOELKDF, which means T=0.01,
> albeit along with revised memory filling loop to have TMTO resistance
> via it as well) to 3, and they don't even have to be whole numbers (if
> you join the T and R loops into one, turning T into a multiplier for the
> iteration count).  As Lyra is currently defined, T=1 may well be optimal.
>

I fully agree that our analysis is quite conservative, which is good from
a security perspective but definitely not so much for performance. Your
T*R approach is indeed interesting and just a small tweak: we decided not
to provide it in the algorithm just because it would make the analysis
more cumbersome, but maybe we were too picky...

>> Well, the problem with insecure primitives is that they are (or should?)
>> likely to be discontinued in future releases of crypto libraries. Hence,
>> depending on them is not the best policy.
>
> OK.  In scrypt's case, though, Salsa20/8 core generally has to be custom
> code bundled with the scrypt implementation anyway, and SHA-256 is still
> considered secure at this time, with no intent to discontinue it.
>
>> In the article this is much
>> clearer that in the presentation (I hope this removes the impression of
>> FUD, especially because scrypt's security claims do not depend on
>> Salsa20/8...)
>
> OK.
>
>> And we never claimed that SHA-1 is used in scrypt, but in PBKDF only. ;)
>
> Oh, that's not how I read the slide.  "The hash function SHA-1 adopted
> by the PBKDF2 algorithm and the hash function Salsa20/8 adopted by the
> Scrypt algorithm have known vulnerabilities" reads as implying the use
> of PBKDF2 in scrypt itself (because there is, in fact, such use).  Yes,
> PBKDF2 is commonly used with SHA-1, but newer uses of PBKDF2 generally
> use it along with SHA-256 or SHA-512.  I suggest that you split this
> bullet point in two to avoid confusion, and clarify that PBKDF2 is not
> defined as being based on SHA-1 (rather, PBKDF2 is defined as a
> higher-level algorithm that can use any underlying crypto hash function).

OK, now I see that it does open way for misinterpretations... The
correction was very welcome :)

>
> I hope this helps, and thanks again for your comments on mine.
>
> Alexander
>


Powered by blists - more mailing lists

Your e-mail address:

Powered by Openwall GNU/*/Linux - Powered by OpenVZ