Date: Fri, 17 May 2013 23:15:00 +0400 From: Solar Designer <solar@...nwall.com> To: john-dev@...ts.openwall.com Subject: Re: Parallella: bcrypt Hi Katja, Thank you for bringing this topic up. On Fri, May 17, 2013 at 06:44:53PM +0200, Katja Malvoni wrote: > I read the Epiphany documentation and went trough bcrypt implementation. > And now I am thinking about possible approach. At the first sight, it seems > that having one bcrypt instance per a core could work. Since each core has > 32 KB of memory divided into 4 memory banks, memory isn't a problem. Correct. We should implement one or maybe two bcrypt instances per core. (We might need two to hide instruction latencies yet fully use the dual-issue capability, although I think that for the current Epiphany chips one instance will be enough for that.) Some other data for bcrypt (such as the tiny P-box) and code will also occupy a portion of the 32 KB local memory. Overall, though, 32 KB is more than enough, and much of this memory will likely remain unused even with the instances of Blowfish fully unrolled (for their 16 rounds). > What > worries me is how big impact will have overhead of writing S-boxes in every > core local memory bcrypt is slow to compute. We're talking speeds on the order of 100 per second per core, or less (depending on bcrypt's cost setting, which is configurable per hash). Thus, the overhead and delay needed to transfer the 4 KB of data to/from Epiphany cores is likely small, even if you do it sequentially for the 16 or 64 cores and don't start computation until all of these transfers have completed. (As a further optimization, you may implement asynchronous transfers and keep the cores busy computing on previously transferred data while new transfers are in progress, but I don't expect this to matter much. My gut feeling is that it might provide a speedup of a few percent when bcrypt is used with relatively low cost settings, and less than that with high cost settings.) This is assuming that you only implement bcrypt's most costly loop on Epiphany, leaving the rest for host CPU. If you implement full bcrypt on Epiphany, the data transfer sizes would be even smaller (about 100 bytes in each direction per bcrypt computation), and you could keep the initial values for the S-boxes in each core's local memory (there's enough of it to hold an extra 4 KB). > and will those 16 instances running on 1 GHz processors > be enough to still provide speedup when compared to CPU. We don't expect the current Epiphany chips to outperform current desktop CPUs in terms of raw speed at bcrypt. At best, we'll reach similar performance, and it will likely take the 64-core Epiphany to do this. However, our experiments with this may help shape up future generations of Epiphany, including specifically for symmetric crypto use. Besides, future Epiphany chips may have more cores too (such as 1024 "soon" and 4096 eventually). Also, clusters of Parallella boards may be easier and cheaper to build (once the boards are easily available in sufficient quantities) than clusters of traditional computers, and they will almost certainly be more energy-efficient. > In worst case it would take 16*512 cycles just to transfer S-boxes. This may seem like a lot, but for bcrypt it is not. At 8k cycles, you'll be able to do this 100k times per second. Given that you might only be able to compute about 100 bcrypts per second per core, this translates to like a 0.1% overhead, which is acceptable. > What is the overhead in case that S-boxes are hard coded in the device > code? What do you mean by this? > Should I try this approach of one instance per a core Yes, please. > And am I missing something in my approach? You seem to be focusing on the host / Epiphany communication overhead, but for bcrypt it's likely not where most room for optimization is. (It would be for other hash types - for faster to compute ones.) Alexander
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.