Date: Mon, 22 Jul 2013 00:04:33 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt

On Sun, Jul 21, 2013 at 07:21:27PM +0200, Katja Malvoni wrote:
> In internal.ldf all code is in local memory; in fast.ldf user code and
> stack are placed in local memory while standard libraries are in external
> SRAM.

I think you mean external DRAM.

OK, this explains it.  I did not realize that memcpy() was not in local
memory.

> > So I don't buy it when you say that BF_ROUND and BF_encrypt are already
> > optimal.
[...]

> I agree now, I was looking at the whole thing from a completely wrong
> perspective.  I didn't see that using IADD doesn't mean it will be
> faster than ADD; it means the IALU will be free for other instructions.

I took another look at sections "7.9 Pipeline Description" and "7.10
Dual-Issue Scheduling Rules" in epiphany_arch_reference_3.12.12.18.pdf,
and from those it appears that we'll need to interleave 2+ instances of
bcrypt if we're to use IADD and possibly IMADD efficiently, because FPU
instructions take 3 cycles more to complete than IALU ones do.  (This is
extra latency only; it does not mean that these instructions are any
slower in terms of throughput - it's 1 instruction per cycle in terms
of throughput for IALU and FPU, so 2 per cycle total.)
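
To illustrate what interleaving two instances means at the source level,
here is a minimal sketch in plain C.  It is not the actual john-dev code:
the bf_ctx layout and all function names are hypothetical, and the test
data is arbitrary rather than real Blowfish initial state.  The two adds in
each round are the candidates for IADD; whether a compiler would actually
emit IADD for them is an assumption, and hand-written assembly may be
needed.

```c
#include <stdint.h>

/* Hypothetical per-instance state; not the layout used in the real code. */
typedef struct {
    uint32_t P[18];
    uint32_t S[4][256];
} bf_ctx;

/* Standard Blowfish F function for one instance. */
static uint32_t bf_f(const bf_ctx *c, uint32_t x)
{
    return ((c->S[0][x >> 24] + c->S[1][(x >> 16) & 0xff])
        ^ c->S[2][(x >> 8) & 0xff]) + c->S[3][x & 0xff];
}

/* Reference: encrypt one block (L, R) with a single instance. */
static void bf_encrypt1(const bf_ctx *c, uint32_t *l, uint32_t *r)
{
    uint32_t L = *l, R = *r, t;
    int i;

    for (i = 0; i < 16; i++) {
        L ^= c->P[i];
        R ^= bf_f(c, L);
        t = L; L = R; R = t;
    }
    t = L; L = R; R = t; /* undo the final swap */
    R ^= c->P[16];
    L ^= c->P[17];
    *l = L; *r = R;
}

/*
 * Two independent instances, a and b, with each step of a placed next to
 * the same step of b.  While an add for a is still in flight, the next
 * instruction can work on b instead of waiting for a's result.
 */
static void bf_encrypt2(const bf_ctx *a, const bf_ctx *b,
    uint32_t *la, uint32_t *ra, uint32_t *lb, uint32_t *rb)
{
    uint32_t La = *la, Ra = *ra, Lb = *lb, Rb = *rb, ta, tb;
    int i;

    for (i = 0; i < 16; i++) {
        La ^= a->P[i];                    Lb ^= b->P[i];
        ta = a->S[0][La >> 24];           tb = b->S[0][Lb >> 24];
        ta += a->S[1][(La >> 16) & 0xff]; tb += b->S[1][(Lb >> 16) & 0xff];
        ta ^= a->S[2][(La >> 8) & 0xff];  tb ^= b->S[2][(Lb >> 8) & 0xff];
        ta += a->S[3][La & 0xff];         tb += b->S[3][Lb & 0xff];
        Ra ^= ta;                         Rb ^= tb;
        ta = La; La = Ra; Ra = ta;        tb = Lb; Lb = Rb; Rb = tb;
    }
    ta = La; La = Ra; Ra = ta;            tb = Lb; Lb = Rb; Rb = tb;
    Ra ^= a->P[16];                       Rb ^= b->P[16];
    La ^= a->P[17];                       Lb ^= b->P[17];
    *la = La; *ra = Ra; *lb = Lb; *rb = Rb;
}

/* Fill a context with LCG output: arbitrary test data only. */
static void bf_fill(bf_ctx *c, uint32_t seed)
{
    int i, j;

    for (i = 0; i < 18; i++)
        c->P[i] = seed = seed * 1664525u + 1013904223u;
    for (i = 0; i < 4; i++)
        for (j = 0; j < 256; j++)
            c->S[i][j] = seed = seed * 1664525u + 1013904223u;
}

/* Returns 1 if the interleaved version matches two separate runs. */
static int bf_selfcheck(void)
{
    static bf_ctx a, b;
    uint32_t l1 = 0x01234567, r1 = 0x89abcdef;
    uint32_t l2 = 0xdeadbeef, r2 = 0x0badf00d;
    uint32_t la = l1, ra = r1, lb = l2, rb = r2;

    bf_fill(&a, 1);
    bf_fill(&b, 2);
    bf_encrypt1(&a, &l1, &r1);
    bf_encrypt1(&b, &l2, &r2);
    bf_encrypt2(&a, &b, &la, &ra, &lb, &rb);
    return la == l1 && ra == r1 && lb == l2 && rb == r2;
}
```

The point of the layout is only that neighboring statements belong to
different instances and thus have no data dependency on each other; the
results are the same as running the two instances one after the other.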

When we don't use the FPU at all, we're only capable of issuing one
instruction per cycle.  So using IADD and possibly IMADD is our only
hope of achieving dual-issue.
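
As a back-of-the-envelope illustration of the issue limits just described,
the following C helper computes an idealized lower bound on cycles.  It is
a sketch under strong assumptions: it ignores dependency stalls, the extra
FPU latency, and every other scheduling rule, and the function name and any
instruction counts fed to it are hypothetical.

```c
/*
 * Idealized issue model: at most one IALU and one FPU instruction can be
 * issued per cycle, so the lower bound on cycles is the larger of the two
 * counts.  With n_fpu == 0 this reduces to one instruction per cycle.
 */
static unsigned issue_cycles(unsigned n_ialu, unsigned n_fpu)
{
    return n_ialu > n_fpu ? n_ialu : n_fpu;
}
```

For example, under this model 12 instructions that all go to the IALU need
at least issue_cycles(12, 0) = 12 cycles, whereas moving 4 of them to the
FPU gives issue_cycles(8, 4) = 8.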

Alexander
