Date: Sun, 25 Aug 2013 03:36:06 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: Litecoin mining

Rafael,

On Fri, Aug 23, 2013 at 02:31:46AM +0100, Rafael Waldo Delgado Doblas wrote:
> // This function approximation works fine up to a = 32771
> #define DIVTMTO(a) ((10923 * (a))>>16) // If TMTO_RATIO changes you need redefine this macro

Good.

Note that you don't have to perform the division, not even in this
optimized fashion, in the first one of SMix's two loops, because it
accesses the elements sequentially.  You may instead introduce an extra
loop counter variable, which you'd reset back to zero when it hits
TMTO_RATIO.  That said, on Epiphany the multiplication might be free,
because IMUL is an FPU instruction, and the FPU is idle most of the time.
So it is unclear which approach to handling this in the first loop is
faster.
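
To illustrate the counter idea, here is a minimal sketch, not taken from
the actual code.  It assumes TMTO_RATIO is 6 (which is what the 10923
constant suggests), and the names count_mismatches, slot, and phase are
made up for this example.  It only checks that a counter reset at
TMTO_RATIO produces the same index as DIVTMTO while i advances
sequentially:

```c
#include <assert.h>

#define TMTO_RATIO 6	/* assumed: 10923 is roughly 65536 / 6 */
#define DIVTMTO(a) ((10923 * (a)) >> 16)

/*
 * Walk i = 0 .. n-1 the way a sequential loop would, tracking the
 * stored-element index with a counter instead of a division.
 * Returns how many times the counter-based index differs from DIVTMTO(i).
 */
static unsigned int count_mismatches(unsigned int n)
{
	unsigned int i, slot = 0, phase = 0, mismatches = 0;

	/* Stay inside the range where the multiply-and-shift is exact */
	assert(n <= 32768);

	for (i = 0; i < n; i++) {
		if (slot != DIVTMTO(i))
			mismatches++;
		/* phase == 0 is where a real loop would store into V[slot] */
		if (++phase == TMTO_RATIO) {
			phase = 0;
			slot++;
		}
	}
	return mismatches;
}
```

Whether the extra compare-and-reset beats a multiplication issued to an
otherwise idle FPU would have to be measured on the actual hardware.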

> #define DIV2(a) ((a)>>1)
> #define MOD2(a) ((a) - (DIV2(a) << 1)) // This can be optimised in ASM using carry
> 
> #define DIV8(a) ((a)>>3)
> #define MOD8(a) ((a) - (DIV8(a) << 3)) // This can be optimised in ASM using carry

These are ridiculous.  MOD2 is simply "& 1", and MOD8 is "& 7".  I'm
sure the compiler already performed those optimizations for "% 8",
although I don't mind being explicit with "& 7".
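
For reference, a small sketch of the mask-based versions, along with a
brute-force check against both "%" and the subtract-and-shift form.  The
check_masks function is only for illustration, and the equivalence with
"%" holds for unsigned (or non-negative) operands:

```c
#include <assert.h>

#define DIV2(a) ((a) >> 1)
#define MOD2(a) ((a) & 1)
#define DIV8(a) ((a) >> 3)
#define MOD8(a) ((a) & 7)

/* Compare the masks with "%" and with the subtract-and-shift variants. */
static void check_masks(void)
{
	unsigned int a;

	for (a = 0; a < 65536; a++) {
		assert(MOD2(a) == a % 2);
		assert(MOD2(a) == a - (DIV2(a) << 1));
		assert(MOD8(a) == a % 8);
		assert(MOD8(a) == a - (DIV8(a) << 3));
	}
}
```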

> The performance still the same but now I drop almost 1K.

Good.

> I'm going to check the segfault.

Any luck?  How are you debugging it?

On Sat, Aug 24, 2013 at 01:33:12AM +0100, Rafael Waldo Delgado Doblas wrote:
> It means the core memory is used only up to the address 0000167F. That
> means that I have around 27KB free. I guess that I can run TMTO 5 now or at
> least I'm close.

I took a look at your committed code - it tries to use TMTO 5, but it
just gets stuck somewhere.  So I've just spent an hour playing around
with it, optimizing its memory usage.  Please see the attached patch.
With this patch, the code + read-only data size is reduced by about 1700
bytes, and it appears to work, but when I enable the debugging output
in driver-epiphany.c, the hashes computed on ARM and Epiphany don't
match.  Moreover, they don't match even if I change TMTO to 6 (and
adjust DIVTMTO accordingly).  My guess is that you introduced a bug
somewhere, so I am leaving it up to you to debug it. ;-)  It is, of
course, also possible that the bug is in my patch.

Please note that when there's little memory free, the stack might be
overwriting other data.  This is why I tried TMTO 6 (but it didn't help).
I suggest that you debug this at TMTO 6 (or higher) initially, and only
when you get that working, proceed to set TMTO 5 (now that the reduced
code size permits that).

BTW, it'd be nice if you introduced a way to easily enable/disable the
debugging output in driver-epiphany.c, e.g. via #define DEBUG_EPI.
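
Something along these lines would do.  This is only a sketch: the
EPI_DEBUG macro and the hashes_match helper are hypothetical names, not
what driver-epiphany.c currently has, and the output format is made up:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Uncomment, or build with -DDEBUG_EPI, to enable the debugging output. */
/* #define DEBUG_EPI */

#ifdef DEBUG_EPI
#define EPI_DEBUG(...) fprintf(stderr, __VA_ARGS__)
#else
#define EPI_DEBUG(...) ((void)0)
#endif

/*
 * Hypothetical helper: compare the hash computed on ARM with the one
 * computed on Epiphany.  Returns 1 if they match, 0 otherwise, and
 * prints both byte by byte only when DEBUG_EPI is defined.
 */
static int hashes_match(const unsigned char *arm, const unsigned char *epi,
    size_t len)
{
	size_t i;

	assert(arm != NULL && epi != NULL);

	if (memcmp(arm, epi, len) == 0)
		return 1;

	EPI_DEBUG("hash mismatch, ARM vs. Epiphany:\n");
	for (i = 0; i < len; i++)
		EPI_DEBUG("%02x %02x\n", arm[i], epi[i]);
	return 0;
}
```

With DEBUG_EPI undefined, the EPI_DEBUG calls expand to nothing, so the
normal build pays no cost for them.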

Alexander
