Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sun, 17 Mar 2013 08:03:21 +0400
From: Solar Designer <>
Subject: scrypt TMTO defeater

Colin, Anthony, all -

The message I am following up on is the one where I described an attack
on Anthony's scrypt time-memory tradeoff defeater:

I'll skip most of its content now, and only comment on my attempt to
actually implement a time-memory tradeoff defeater of this sort (despite
of the possible attack).

On Mon, Sep 10, 2012 at 06:27:44AM +0400, Solar Designer wrote:
> For context, here's Anthony Ferrara's comment that I am referring to:
> Here's an attack on Anthony's tradeoff defeater ("simply writing to V_j
> in the second loop"):

[... lots of text skipped ...]

> Does this mean that Anthony's tradeoff defeater is not worthwhile?  Not
> quite.  The cold memory is nevertheless accessed pretty often - just not
> as often as the hot portion.  So there is some reduction in attack speed
> due to the defeater even in this special case.  Then, if we try to
> reduce our hot memory needs by more than a factor of 2, the corresponding
> revisions of the attack described above become a lot less efficient
> (mostly in terms of frequency of accesses to the cold portion).
> Yet we could enhance the tradeoff defeater to make the attack less
> efficient: write into more than one V element per loop iteration and/or
> write into such V elements that more of them are written to (perhaps
> 100% instead of just 63% by loop end).

So I've tried to implement an enhancement like what I described above:
writing not only to V_j, but also to V_i (with another value).
Unfortunately, it turns out that BlockMix's shuffling(*) gets in the way
of efficient implementations, and _possibly_ the inefficiency mostly hurts
cleaner CPU code, as opposed to hacks and ASICs.  Specifically, when the
V element we write into is or might be the same as V_j, in a clean
implementation we have to postpone those writes until the BlockMix
completes, and this means re-reading the same values from memory
(hopefully, from L1 cache) rather than simply storing register values
into two locations at once (such as X and V_j).  In order to do the
latter without introducing extra checks, I had to limit the writes to
elements that are definitely not same as V_j; I chose (~j & (N - 1)) for
their indices.

(*) which I am still looking for an explanation of:

I've attached a diff between two of my code revisions.  escrypt-23
implements the official scrypt, with only some optimizations to
crypto_scrypt-sse.c.  escrypt-24 renames the function to escrypt and
adds a Boolean defeat_tmto parameter.  When it is zero, the function
returns the same values as the official scrypt does.  When it is
non-zero, the trivial TMTO defeater as described above is enabled.
It is implemented in all three versions of the code (-ref, -nosse,
-sse), although only the -sse revision is (partially) optimized (reuses
register values instead of copying memory).  (Checking defeat_tmto
outside of the loop and then using one of two specialized loop versions
is an obvious further optimization.)

I may consider adding a TMTO defeater by means of changes to the first
loop as well - an approach needed for my "ROM-port hard functions" idea
anyway, because the ROM can't be written to in the second loop.

I'd appreciate any comments.

For testing:
MD5 of output of "tests" with defeat_tmto = 0:
MD5 of output of "tests" with defeat_tmto = 1:


View attachment "escrypt-defeat-tmto.diff" of type "text/plain" (12660 bytes)

Download attachment "escrypt-0.0.23.tar.gz" of type "application/x-gzip" (10127 bytes)

Download attachment "escrypt-0.0.24.tar.gz" of type "application/x-gzip" (10329 bytes)

Powered by blists - more mailing lists

Your e-mail address:

Powered by Openwall GNU/*/Linux - Powered by OpenVZ