Date: Wed, 4 Sep 2013 04:54:55 +0100
From: Rafael Waldo Delgado Doblas <lord.rafa@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: Litecoin mining

Hello Alexander,

2013/9/4 Rafael Waldo Delgado Doblas <lord.rafa@...il.com>

> In addition, as you asked, this is the work I did today:
>
> I implemented a couple of salsa20_8 asm versions:
> The first one has the loops "Bor[i] = Bout[i] = (B[i] ^ Bx[i]);" and
> "Bout[i] += Bor[i];" rolled and uses the imadd instruction. It saves about
> 250 B, but performance drops by almost 0.5 khash/s.
> The second one keeps the loops unrolled and uses the imadd instruction. It
> saves only 50 B, but performance also drops by almost 0.5 khash/s.
>
> At this point it looks like the imadd instruction is too slow to be useful,
> but rolling the loops could be worthwhile.
>

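For reference, the two loops quoted above look roughly like this in C when
rolled. This is only a sketch: the function names and the 16-word block size
(the standard Salsa20 block) are assumptions for illustration, not taken from
the actual source.

```c
#include <stdint.h>

#define SALSA_WORDS 16  /* assumed: one Salsa20 block is 16 32-bit words */

/* Rolled form of "Bor[i] = Bout[i] = (B[i] ^ Bx[i]);" (hypothetical name). */
void xor_blocks(uint32_t Bout[SALSA_WORDS], uint32_t Bor[SALSA_WORDS],
                const uint32_t B[SALSA_WORDS], const uint32_t Bx[SALSA_WORDS])
{
    for (int i = 0; i < SALSA_WORDS; i++)
        Bor[i] = Bout[i] = B[i] ^ Bx[i];
}

/* Rolled form of "Bout[i] += Bor[i];" (hypothetical name). */
void add_saved(uint32_t Bout[SALSA_WORDS], const uint32_t Bor[SALSA_WORDS])
{
    for (int i = 0; i < SALSA_WORDS; i++)
        Bout[i] += Bor[i];
}
```

The unrolled variants repeat each loop body 16 times, which is where the code
size difference comes from.
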
BTW, I was debugging with e-gdb and found that even if I use the imadd
instruction, the binary image uses fmadd. This is not nice at all, because
it uses floating-point math. In this real scenario, starting from:

r44            0x80     128    1.79366203e-43
r61            0x1e     30    4.20389539e-44
r60            0x0      0    0

fmadd r60,r61,r44

gives the following erroneous result:

r44            0x80     128    1.79366203e-43
r61            0x1e     30    4.20389539e-44
r60            0x0      0    0

Instead of the correct one:

r44            0x80     128    1.79366203e-43
r61            0x1e     30    4.20389539e-44
r60            0xF00      3840    ???????
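
The difference between the two results can be reproduced on a host machine
with a small C model. This is a sketch under assumptions: both instructions
are modelled as rd = rd + rn * rm, the function names are made up for
illustration, and the host uses IEEE 754 single precision for float. The
integer values 0x1e and 0x80, read as floats, are tiny denormals, so their
product underflows to zero.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float, the way a floating-point
   instruction sees a register that actually holds an integer. */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Reinterpret a float as its raw 32-bit pattern. */
static uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Integer multiply-add model: rd = rd + rn * rm (hypothetical name). */
uint32_t imadd_model(uint32_t rd, uint32_t rn, uint32_t rm)
{
    return rd + rn * rm;
}

/* Floating-point multiply-add model applied to the same register bits
   (hypothetical name). */
uint32_t fmadd_model(uint32_t rd, uint32_t rn, uint32_t rm)
{
    float r = bits_to_float(rd) + bits_to_float(rn) * bits_to_float(rm);
    return float_to_bits(r);
}
```

With rd = 0, rn = 0x1e and rm = 0x80, imadd_model returns 0xF00 (3840) while
fmadd_model returns 0x0, which matches the two register dumps.
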

Is there any way to use integer math? If not, there is no way to get an
improvement with this instruction.

Regards,
Rafael.
