Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sat, 7 Sep 2013 04:42:57 +0400
From: Solar Designer <>
Subject: Re: Parallella: Litecoin mining


On Fri, Sep 06, 2013 at 05:47:06AM +0100, Rafael Waldo Delgado Doblas wrote:
> Today I have implemented a new TMTO. Now the first "NO_TMTO_GROUPS"
> elements from V that the (position%TMTO_RATIO) is equal to (TMTO_RATIO - 1)
> will be stored, in addition of any element that (position%TMTO_RATIO) is
> equal to 0. With this TMTO implementation we can handle tmto ratios between
> 5 and 4. I have chosen (TMTO_RATIO - 1) because this elements should be
> harder to recompute than (TMTO_RATIO - 2),(TMTO_RATIO - 3)...(2),(1).
> The maximum value for NO_TMTO_GROUPS at the moment is 11.

It's very nice that you're experimenting with this and that you finally
have a better understanding of how flexible scrypt's TMTO is.

Your approach above might be suboptimal, though, and your rationale for
choosing (TMTO_RATIO - 1) is flawed.  Yes, those elements are harder to
recompute, but on the other hand e.g. (TMTO_RATIO - 2) would be
beneficial to have twice more often (when you need these exact elements
as well as when you need the next elements).  So, if you stick with this
general approach (which might as well be OK), the exact choice of which
additional elements to store (as room permits) is not as trivial, and
you may need to do some math (for the expected number of recomputations
with the different ones of the possible choices) and/or testing.

When I say that the approach "might be suboptimal", I mean that I'd
expect alternating 4's and 5's to be more optimal (in terms of the
number of recomputations) than all 5's with some of those 5's split in
two (by storing a mid-point, like you do).

As I had pointed out before, the first one of SMix's two loops, which
accesses the elements sequentially, does not even need to use the modulo
division - it could instead maintain an extra counter variable.  This
works great for whole numbers like 5, and it's also easy to implement
for specific easy numbers like 4.5 (simply alternate between 4 and 5 for
the wraparound point for that variable).  When you need to support an
arbitrary (compile-time configurable) number between 4 and 5, it becomes
trickier - but not much so.  What you'd have is analogous to drawing an
arbitrary diagonal line in 2D computer graphics - and there are
well-known efficient algorithms for doing that, which don't involve
integer division.  With SMix's second loop, things might be trickier.

I am not suggesting that you search for and implement the most optimal
solution to this now, though - you've already over-spent time on this
little project, and GSoC is coming to an end soon.  I merely point these
things out for your better understanding.

There's not that much difference between ~4.5 and 5 in terms of overall
performance, even if the ~4.5 is implemented optimally.  In fact, even
if we could do TMTO_RATIO = 4, that would be a 1.75x slowdown compared
to no TMTO, whereas TMTO_RATIO = 5 is a 2x slowdown.  Thus, TMTO_RATIO = 4
is only 12.5% faster, and 4.5 is somewhere inbetween (~6% faster than 5).

Also, beware of running onto the stack.  You don't get any warning when
that happens - you simply get incorrect computation result (maybe not
every time, even!) or a segfault (if you're lucky!)  Since stack usage
may vary from build to build (e.g., with different e-gcc versions and/or
options), we should not distribute a version that uses the absolute
maximum amount of memory it can, unless the stack problem is somehow
dealt with (e.g., a sufficiently exhaustive runtime self-test on every
program invocation, making sure that the very last element of V was made
use of in SMix's second loop, which does not always happen).

A self-test at program start is a good idea anyway.


Powered by blists - more mailing lists

Your e-mail address:

Powered by Openwall GNU/*/Linux - Powered by OpenVZ