Date: Thu, 29 Aug 2013 06:06:03 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: Litecoin mining

On Thu, Aug 29, 2013 at 02:55:25AM +0100, Rafael Waldo Delgado Doblas wrote:
> Maybe is because the memcpy it's stored in the external memory

Oh, of course.

> > > I'm checking the bug in the host code
> >
> > Do you mean the occasional segfault? How are you debugging it, exactly?
> > (I asked you this question a few days ago, but you did not answer.)
>
> Sorry I didn't see it. Yes I mean the occasional segfault and yes it was
> produced when a share is find, I think that I fixed it, but I'm testing it
> right now.

Can you briefly describe how you debug(ged) it?

> I was thinking in replace the rotates by IMADD and LSR as you said
> before,using inline asm.

OK, that's a good start. Please give this a try.

I'd expect this change to result in slight code size reduction (each
rotate will be 2 instructions instead of 3... unless an instruction is
wasted on a MOV or similar), but as to speed I have no specific
expectations, because with just one rotate operation (two instructions)
implemented in inline asm, performance will depend heavily on whether
IMADD's result is attempted to be used by subsequent instructions 4+
cycles later, or sooner than that.

Unfortunately, gcc inline asm does not let us specify that an output
register is better not accessed for a certain number of cycles. gcc has
this sort of info for instructions that it generates itself, but a piece
of inline asm is opaque to it (so it won't notice the IMADD in there,
even if otherwise it is aware that an IMADD has 4-cycle latency).

This is a reason why, after this experiment, we'll likely need to
convert more code to asm (perhaps the Salsa20/8 core).

Alexander
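
For reference, the rotate replacement discussed above can be sketched in portable C. This is only an illustration of the arithmetic identity, not the inline asm from the thread: a left rotate by n is normally shift, shift, OR (3 operations), but since the two shifted halves never overlap in any bit position, the OR can become an add, and the left shift can become a multiply by 2^n, so one multiply-add absorbs both. The function names (rotl32_ref, rotl32_madd, rotl32_selfcheck) are made up for this sketch, and the mapping of the last two steps onto Epiphany's LSR and IMADD (with 2^n preloaded in a register) is an assumption based on the message, not verified compiler output.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference rotate-left: LSL, LSR, ORR (three operations).
 * Valid for 1 <= n <= 31; n == 0 would shift by 32, which is undefined. */
static uint32_t rotl32_ref(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* Same rotate as one shift plus one multiply-add.
 * Assumption: on the target this would become LSR followed by IMADD,
 * with the constant (1 << n) already held in a register. */
static uint32_t rotl32_madd(uint32_t x, unsigned n)
{
    uint32_t acc = x >> (32 - n);   /* low n bits of the result */
    acc += x * ((uint32_t)1 << n);  /* x * 2^n == x << n (mod 2^32); no carries, bits are disjoint */
    return acc;
}

/* Hypothetical self-check: compare both versions for the Salsa20
 * rotate counts (7, 9, 13, 18) on a few sample words.
 * Returns 1 if every pair matches, 0 otherwise. */
static int rotl32_selfcheck(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0x80000001u, 0xdeadbeefu, 0xffffffffu
    };
    static const unsigned counts[] = { 7, 9, 13, 18 };
    size_t i, j;

    for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        for (j = 0; j < sizeof(counts) / sizeof(counts[0]); j++)
            if (rotl32_madd(samples[i], counts[j]) !=
                rotl32_ref(samples[i], counts[j]))
                return 0;
    return 1;
}
```

The C version says nothing about scheduling. The latency concern in the message (whether the multiply-add result is consumed too soon) only shows up once the multiply-add is a real instruction that the compiler cannot see into.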