Date: Sat, 28 Sep 2013 09:30:24 +0200
From: Katja Malvoni <kmalvoni@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Katja's weekly report #15

Hi Alexander,

I'm sorry about the delay. I moved to a new place and had some problems with my internet connection, but now it's all sorted out.

On Thu, Sep 26, 2013 at 5:01 AM, Solar Designer <solar@...nwall.com> wrote:
> > Accomplishments:
> > 1. Updated wiki page
>
> Thanks! As I had mentioned, we/you need to get the page at
> http://openwall.info/wiki/john/development/Parallella linked from some
> other wiki page(s), such as from john/development or/and from john.

I added a link on john/development.

> > 2. Fixed bug so that bcrypt on FPGA doesn't fail self test on first run
>
> Great. What was the bug?

I should have phrased this differently. When I started using true dual-port RAM to store the S-box, the bug disappeared. I don't know exactly what it was, and I changed a large portion of the code, so I can't point to a specific part of it.

> > 3. Partially optimized bcrypt on FPGA
> > - using true dual port RAM for Sbox with two cycle latency. In
> > simulation I have it with 1 cycle latency, 3 cycles per BF_ROUND and
> > 1709766 cycles in total but it doesn't work on ZedBoard.
>
> 3 cycles per BF_ROUND sounds just right to me. I assume it's one cycle
> to fetch first two S-box elements, another cycle to fetch the other two,
> and a third cycle to process these fetched values and compute the next
> set of S-box indices, for the next round. Correct?

That is correct.

> Can you perhaps reduce this further, to two cycles per Blowfish round
> (for most rounds), by fetching the next round's first two S-box elements
> during the current round's "computation" cycle?

I think I can. I stopped optimizing it further when I noticed that I couldn't get the current code working on ZedBoard.

> [...]
>
> Does the above sound right to you?

It does. The only thing that worries me a bit is adding more bcrypt cores. At the moment I have two ideas.
The first is to connect all additional cores to the same AXI bus and then use software registers to synchronise reading and writing. I think this approach could have a large communication overhead. Instead of software registers, additional logic could be used to distribute data to the cores and start the computation. I will probably try both variants and see which one performs better. The second idea is to create one shared BRAM per core, but I don't think I can do that without creating one DMA per core and several AXI buses, and that would waste too many resources.

> > 3. Replace mmap() calls in BF_fpga.c with proper drivers
>
> What would those proper drivers be? UIO, as I mentioned here? -
>
> http://www.openwall.com/lists/john-dev/2013/06/04/2

My idea was to follow the example from Xilinx: http://www.xilinx.com/support/documentation/sw_manuals/xilinx14_4/ug873-zynq-ctt.pdf, Chapter 9. In that manual they use kernel modules.

Katja
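The S-box access pattern behind the "3 cycles per BF_ROUND" figure can be illustrated with the standard Blowfish round. The C sketch below is not code from BF_fpga.c or the FPGA design, and the names bf_ctx, bf_f, bf_encrypt, block_cycles and demo_ctx are hypothetical. Each round's F function needs four S-box reads. With a true dual-port RAM serving two reads per cycle, two fetch cycles plus one compute cycle give 3 cycles per round. The block_cycles model is a simplification and rests on an assumption: with the proposed overlap, only the first round of a block still costs 3 cycles and the remaining 15 cost 2.

```c
#include <stdint.h>

/* Blowfish state: four 256-entry S-boxes and the 18-entry P-array.
   In bcrypt these are filled by the expensive key setup. */
typedef struct {
    uint32_t S[4][256];
    uint32_t P[18];
} bf_ctx;

/* Blowfish F function: one S-box lookup per byte of x.
   On dual-port RAM two lookups fit in one cycle, e.g. S0/S1 in the
   first fetch cycle and S2/S3 in the second (assumed grouping). */
uint32_t bf_f(const bf_ctx *c, uint32_t x)
{
    uint32_t b0 = x >> 24, b1 = (x >> 16) & 0xff;
    uint32_t b2 = (x >> 8) & 0xff, b3 = x & 0xff;
    return ((c->S[0][b0] + c->S[1][b1]) ^ c->S[2][b2]) + c->S[3][b3];
}

/* Standard 16-round Blowfish block encryption. */
void bf_encrypt(const bf_ctx *c, uint32_t *xl, uint32_t *xr)
{
    uint32_t L = *xl, R = *xr, t;
    for (int i = 0; i < 16; i++) {
        L ^= c->P[i];
        R ^= bf_f(c, L);   /* next round's indices depend on this result */
        t = L; L = R; R = t;
    }
    t = L; L = R; R = t;   /* undo the final swap */
    R ^= c->P[16];
    L ^= c->P[17];
    *xl = L;
    *xr = R;
}

/* Hypothetical cycle model for one 16-round block:
   no overlap -> 3 cycles every round;
   overlap    -> 3 cycles for the first round, 2 for the other 15. */
unsigned block_cycles(int overlapped)
{
    return overlapped ? 3u + 2u * 15u : 3u * 16u;
}

/* Hypothetical test helper: S[i][j] = (i << 8) | j, P all zero. */
const bf_ctx *demo_ctx(void)
{
    static bf_ctx c;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 256; j++)
            c.S[i][j] = ((uint32_t)i << 8) | (uint32_t)j;
    return &c;
}
```

Under that assumption, a 16-round block drops from 48 to 33 cycles, a reduction of about 31%. The real gain depends on how the hardware handles the fact that the next round's S-box indices come from the current round's result.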
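The quoted question suggests UIO as an alternative to both the raw mmap() calls and a custom kernel module. Here is a minimal sketch of that route for comparison. It assumes the AXI peripheral is bound to a generic UIO driver (for example uio_pdrv_genirq via the device tree) and appears as /dev/uio0. The helpers uio_map_regs and uio_wait_irq are hypothetical and are not part of BF_fpga.c. UIO selects memory region N through an mmap() offset of N times the page size, so region 0 uses offset 0. A process waits for an interrupt with a blocking 4-byte read().

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map memory region 0 of a UIO device such as "/dev/uio0".
   On success stores the open descriptor in *fd_out and returns a
   pointer to the register window; returns NULL on failure. */
volatile uint32_t *uio_map_regs(const char *dev, size_t len, int *fd_out)
{
    int fd = open(dev, O_RDWR);
    if (fd < 0)
        return NULL;
    /* Offset 0 selects map 0; map N would use N * getpagesize(). */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return (volatile uint32_t *)p;
}

/* Re-enable the IRQ by writing 1 (needs driver irqcontrol support,
   which uio_pdrv_genirq provides), then block until the next
   interrupt. read() stores the total event count in *count.
   Returns 0 on success, -1 on error. */
int uio_wait_irq(int fd, uint32_t *count)
{
    int32_t enable = 1;
    if (write(fd, &enable, sizeof enable) != (ssize_t)sizeof enable)
        return -1;
    if (read(fd, count, sizeof *count) != (ssize_t)sizeof *count)
        return -1;
    return 0;
}
```

A caller would map the register window once, write inputs through the returned pointer, call uio_wait_irq() to wait for completion, and release everything with munmap() and close(). UIO by itself does not allocate DMA buffers, which is one reason a dedicated kernel module, as in the Xilinx example, may still be preferable for the DMA path.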