john-dev - Re: Parallella: Litecoin mining

Message-ID: <20130907001937.GB8393@openwall.com>
Date: Sat, 7 Sep 2013 04:19:37 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: Litecoin mining

Rafael,

On Fri, Sep 06, 2013 at 04:49:25PM +0100, Rafael Waldo Delgado Doblas wrote:
> 2013/9/6 Yaniv Sapir <yaniv@...pteva.com>
> 
> > There is no way to do that, since the instruction op-codes are either
> > 32-bit or 16-bit wide, and you have to leave a few bits for the code
> > itself...
> 
> Thank you for your answer, now this looks really clear.

To add to Yaniv's answer:

Although there's no way to fit a 32-bit immediate constant in an
instruction, you can quickly load a 32-bit value (a constant, if you
like) into a register with one instruction, or even two 32-bit values
into two registers with one instruction, by using the memory load
instructions (LDR, LDRD).  The prerequisites are that the value(s) must
reside in local memory, the address must be properly aligned (4-byte for
LDR, 8-byte for LDRD), and if you're loading two values at once, they
must be adjacent in memory and the target registers must be adjacent and
"aligned" too (an even/odd register pair).  Moreover, you need to have a
close enough local memory address already loaded into a register (so
that the displacement relative to that address will fit in the
instruction).
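
To illustrate the layout requirements, here is a minimal C sketch.  The
table name k_pairs and the function mix_pair are made up for this
example, and whether the compiler actually emits a single LDRD for the
two adjacent words is up to the compiler; the code only arranges the
data so that it could.

```c
#include <stdint.h>

/* Hypothetical constant table.  The 8-byte alignment (C11 _Alignas) and
 * the even/odd pairing of entries are what would allow two adjacent
 * 32-bit words to be fetched with one doubleword load. */
static _Alignas(8) const uint32_t k_pairs[4] = {
    0x67452301u, 0xefcdab89u,   /* pair 0 */
    0x98badcfeu, 0x10325476u    /* pair 1 */
};

/* Uses two adjacent constants.  Only one base address (k_pairs) has to
 * be in a register; each pair is a small displacement away from it. */
uint32_t mix_pair(uint32_t x, unsigned pair_index)
{
    const uint32_t *p = &k_pairs[2 * pair_index];
    uint32_t lo = p[0];     /* these two reads are adjacent and */
    uint32_t hi = p[1];     /* 8-byte aligned as a pair */
    return (x + lo) ^ hi;
}
```

It would be worth checking the disassembly to see which load
instructions the compiler picked.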

BTW, it's the same with other typical RISC archs.  32-bit instruction
size is very common, so a 32-bit constant can be loaded with either two
instructions (each encoding a portion of the constant) or a memory load
instruction (hopefully, accessing an L1 cache if present), and often the
latter is faster (e.g., this is why JtR's arch.h sets MD5_IMM to 0 on
RISC archs - to use arrays with the constants instead of using immediate
values in the code).
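
The general idea of such a compile-time switch can be sketched as
follows.  This is not JtR's actual code: the macro USE_IMM, the table
ac_table, and the function add_constants are hypothetical names for this
example, and only the two MD5 round constants are real.

```c
#include <stdint.h>

#ifndef USE_IMM
#define USE_IMM 0   /* hypothetical switch, similar in spirit to MD5_IMM */
#endif

#if USE_IMM
/* Constants appear as immediate values in the code. */
#define AC1 0xd76aa478u
#define AC2 0xe8c7b756u
#else
/* Constants live in an array and are fetched with memory loads.
 * Note: an optimizing compiler may still fold a const table like this
 * one; a real implementation may need to take care to prevent that. */
static const uint32_t ac_table[2] = { 0xd76aa478u, 0xe8c7b756u };
#define AC1 ac_table[0]
#define AC2 ac_table[1]
#endif

/* The computation is written once, in terms of AC1 and AC2. */
uint32_t add_constants(uint32_t a, uint32_t b)
{
    return (a + AC1) ^ (b + AC2);
}
```

Either setting gives the same results; only the generated instructions
differ.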

> Maybe you can tell
> me why there is no way to use the registers sequentially in a loop.

What do you mean by that?

> I checked some disassembled code, and every time there is a sequential
> access to an array in a loop, the generated code has a lot of load and
> store instructions, whereas the unrolled version uses only registers.
> I'm just curious about this.

Obviously, using the registers is faster, because they can be directly
encoded in instructions that perform computation.

Are you trying to ask why the non-unrolled loops are unable to index
registers as if they were arrays?  Unfortunately, the Epiphany
architecture - like most others - lacks indexed access to registers.
The arch reference almost implies that such a capability is present,
but as Yaniv pointed out on this list before, this is not actually the
case:

http://www.openwall.com/lists/john-dev/2013/07/25/27

So we live with the usual limitations that most other archs have.
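
In C terms, the difference looks roughly like this.  The function names
are made up for this example.  In the loop form, x[i] has a run-time
index, so unless the compiler unrolls the loop itself, x has to live in
memory.  In the unrolled form, every element is a separate variable with
a name known at compile time, so each can be assigned its own register.

```c
#include <stdint.h>

/* Rotate left by one bit. */
static inline uint32_t rotl1(uint32_t v)
{
    return (v << 1) | (v >> 31);
}

/* Loop form: x[] is indexed at run time, which registers can't do. */
uint32_t sum_rotl_loop(const uint32_t in[4])
{
    uint32_t x[4], acc = 0;
    int i;

    for (i = 0; i < 4; i++)
        x[i] = in[i];
    for (i = 0; i < 4; i++)
        acc = rotl1(acc) + x[i];
    return acc;
}

/* Unrolled form: same computation, but each element has a fixed name. */
uint32_t sum_rotl_unrolled(const uint32_t in[4])
{
    uint32_t x0 = in[0], x1 = in[1], x2 = in[2], x3 = in[3];
    uint32_t acc = 0;

    acc = rotl1(acc) + x0;
    acc = rotl1(acc) + x1;
    acc = rotl1(acc) + x2;
    acc = rotl1(acc) + x3;
    return acc;
}
```

Both functions compute the same value; only the second can be kept
entirely in registers without help from the compiler's own unrolling.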

> BTW I have run a test and it finally finds a share, but unfortunately
> there is something wrong because it was rejected; at least there wasn't
> a segfault.
> Rejected 00000000 Diff 0/63 EPI 0  (target-miss)

"Diff 0/63" means that the share had a difficulty of 0, whereas the
required minimum is currently 63.  I think this is why it was a
"target-miss".  The first number in "Diff 0/63" should be at least 63.
You need to find out why it's zero in your case, and correct the root
cause.
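
One way to narrow this down might be to recheck each candidate hash on
the host before submitting it.  Below is a minimal sketch of such a
check.  The function name and the word order (eight 32-bit words, with
word 7 the most significant) are assumptions for this example, not taken
from the miner's code, so adjust them to match the actual data layout.

```c
#include <stdint.h>

/* Returns 1 if hash <= target, 0 otherwise.  Both are treated as
 * 256-bit numbers stored as eight 32-bit words, with word 7 assumed to
 * be the most significant. */
int hash_meets_target(const uint32_t hash[8], const uint32_t target[8])
{
    int i;

    for (i = 7; i >= 0; i--) {
        if (hash[i] < target[i])
            return 1;   /* strictly below the target */
        if (hash[i] > target[i])
            return 0;   /* above the target: a "target-miss" */
    }
    return 1;           /* exactly equal still meets the target */
}
```

If the hash computed on the host for the same input passes this check
while the one returned by the Epiphany code fails it, the problem is
likely in the device code or in how its result is read back.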

Alexander