Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 20 Nov 2013 09:57:57 +0400
From: Solar Designer <>
Subject: Re: scrypt TMTO defeater


Attached is a tiny C program demonstrating a time-memory tradeoff still
practical with Anthony Ferrara's proposed change to scrypt.  The
tradeoff is a lot less beneficial to an attacker, if at all (and in
fewer cases) than that possible with the original scrypt.

The basic idea is trivial: instead of storing some of the intermediate
computation results, store the paths leading to them from other
intermediate results.  So we replace large values with smaller indices.

As implemented, the program uses two TMTO factors: 5 for SMix's first
loop (tunable) and 2 for SMix's second loop (embedded in the algorithm).
This results in an overall reduction of modified scrypt's memory usage
to 70% of full (as 20% by first loop and separate 50% by second loop),
plus a little bit of overhead for pointers, indices, and counters (2%
total at scrypt's typical r=8 and assuming 32-bit size of these pointer
and integer values - although they can as well be smaller).  The
performance hit is less than a factor of 2.  By increasing the TMTO
factor for the first loop, the total memory usage may be brought to just
a little over 50% of full - but the performance hit would be much higher.

If someone generalizes this algorithm to higher TMTO factors for the
second loop, I'd appreciate that.  I think this is possible, but it may
be more tricky and I don't know what the costs would be (I am curious).

Here's sample output of the program:

$ ./sim-tmto 
scrypt classic:
B' = 8c126669
t_cost = 512
m_cost = 256
scrypt TMTO = 5:
B' = 8c126669
t_cost = 1010
m_cost = 52
scrypt ircmaxell:
B' = 24efce90
t_cost = 512
m_cost = 256
scrypt ircmaxell TMTO1 = 5, TMTO2 = 2:
B' = 24efce90
t_cost = 978
m_cost = 180 elements + 256 indices (363 alloc + 256 ptrs) + 512 counters

Assuming element size 1024 bytes (which it is in scrypt with r=8) and
32-bit pointers and integers, and including even the memory allocation
fragmentation overhead (which is avoidable), we get:

(180*1024 + (363+256+512)*4) / (256*1024) = 72% memory usage
978/512 = 191% CPU time usage (counting only BlockMix invocations)

Overall area-time is increased by a factor of 1.376 - so not beneficial
for ASICs at all (but could be e.g. for GPUs).  In the case of original
scrypt, overall area-time is instead decreased by up to a constant
factor (as discussed in here earlier), so benefits ASICs.

Indeed, this is not a proof that there's no more efficient TMTO for this
modified scrypt.  There might be.


P.S. Quoted below is a previous message I had posted to this thread,
describing other TMTO attacks on the same modified scrypt.  Those were
less efficient, and I never tried implementing them.

On Mon, Mar 18, 2013 at 02:28:02AM +0400, Solar Designer wrote:
> Here's another attack on this kind of TMTO defeater:
> On Mon, Sep 10, 2012 at 06:27:44AM +0400, Solar Designer wrote:
> > For context, here's Anthony Ferrara's comment that I am referring to:
> > 
> >
> > 
> > Here's an attack on Anthony's tradeoff defeater ("simply writing to V_j
> > in the second loop"):
> > 
> > Assume that we don't want to use the tradeoff to its fullest, but rather
> > just halve our memory needs.  Without the tradeoff defeater, this is
> > accomplished by storing every other V_j in an array that is twice
> > smaller than original.  The odd-indexed V_j's are then recomputed as
> > needed from the preceding values that we did store.
> > 
> > Now, with the tradeoff defeater we can't recompute these values like
> > that because the preceding value might have been modified by prior
> > iterations of the second loop.  Or can we?  Actually, quite often we
> > can: by the time the second loop terminates, it will have modified only
> > about 63% of V elements.  And only 39% when we're at N/2 iterations.
> > 
> > Of course, "quite often" is not sufficient for proper computation of the
> > entire scrypt, so we do need to store the entire V array (not just its
> > half) somewhere.
> ... and then I went on to describe the hot/cold approach:
> However, since we only write into about 63% of V elements, we only have
> to store this many.  We may index them via a hash table (yes, this means
> we'll need some more memory) and we'll also need to store some elements
> of the original V array (as computed by SMix's first loop) - e.g., every
> 8th element.  This way, we'll reduce our memory needs to about 76% of
> original (not counting memory consumed by the hash table, which might
> not be much) and pay for it by doing 8x more computation in SMix's
> second loop, or 4.5x more computation total (not including hash table
> lookups overhead).  In some cases, this tradeoff may be worthwhile (when
> the attacker is constrained to a device with a disproportionate amount
> of computing power for the device's memory size and/or bandwidth).
> Additionally, if the attacker runs multiple such instances in parallel
> and (de)synchronizes them in a smart way, their average memory usage per
> instance will be less than the 76% figure.  If we consider SMix's second
> loop only, it may be about 52% of original.  If we consider both loops
> and assume they take equal time to run, it may be 32% of original.  So
> the attacker would reduce memory needs by a factor of 3 with only a 4.5x
> increase in computation.  This may be worthwhile on even more devices
> than the example above.
> The (de)synchronization trick is also applicable to the original scrypt
> (although it makes less of a difference there), where SMix's first loop
> does not need all of the memory at once.  Only SMix's second loop was
> included in Colin's attack cost estimates, but for practical purposes I
> like to consider both loops.
> Alexander

View attachment "sim-tmto.c" of type "text/x-c" (3267 bytes)

Powered by blists - more mailing lists

Your e-mail address:

Powered by Openwall GNU/*/Linux - Powered by OpenVZ