Date: Sat, 28 Sep 2013 09:30:24 +0200
From: Katja Malvoni <kmalvoni@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Katja's weekly report #15

Hi Alexander,

I'm sorry about the delay. I moved to a new place and had some problems
with my internet connection, but it's all sorted out now.

On Thu, Sep 26, 2013 at 5:01 AM, Solar Designer <solar@...nwall.com> wrote:

> > Accomplishments:
> > 1. Updated wiki page
>
> Thanks!  As I had mentioned, we/you need to get the page at
> http://openwall.info/wiki/john/development/Parallella linked from some
> other wiki page(s), such as from john/development or/and from john.
>

I added a link on john/development.


>
> > 2. Fixed bug so that bcrypt on FPGA doesn't fail self test on first run
>
> Great.  What was the bug?
>

I should have phrased this differently: the bug disappeared when I started
using true dual-port RAM for storing the S-boxes. I don't know exactly what
it was, and I changed a large portion of the code, so I can't point to a
specific part of it.


> > 3. Partially optimized bcrypt on FPGA
> >        - using true dual port RAM for Sbox with two cycle latency. In
> > simulation I have it with 1 cycle latency, 3 cycles per BF_ROUND and
> > 1709766 cycles in total but it doesn't work on ZedBoard.
>
> 3 cycles per BF_ROUND sounds just right to me.  I assume it's one cycle
> to fetch first two S-box elements, another cycle to fetch the other two,
> and a third cycle to process these fetched values and compute the next
> set of S-box indices, for the next round.  Correct?
>

That is correct.
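
The three-cycle schedule confirmed above can be sketched in software as
follows. This is only an illustration, not the actual FPGA design: the
S-box contents are dummy values rather than the real Blowfish constants,
the names bf_f, bf_f_three_cycles and dummy_sboxes are hypothetical, and
which two S-boxes are read in which cycle is an assumption (the thread only
says two elements are fetched per cycle).

```python
MASK32 = 0xFFFFFFFF


def split_bytes(x):
    """Split a 32-bit word into the four 8-bit S-box indices."""
    return (x >> 24) & 0xFF, (x >> 16) & 0xFF, (x >> 8) & 0xFF, x & 0xFF


def bf_f(S, x):
    """Reference Blowfish F function: ((S0[a] + S1[b]) ^ S2[c]) + S3[d]."""
    a, b, c, d = split_bytes(x)
    return ((((S[0][a] + S[1][b]) & MASK32) ^ S[2][c]) + S[3][d]) & MASK32


def bf_f_three_cycles(S, x):
    """Same F function, split into the three cycles discussed above."""
    a, b, c, d = split_bytes(x)
    cycles = 0

    # Cycle 1: the two RAM ports fetch the first two S-box elements.
    # (Assumed pairing: S0 and S1.)
    s0, s1 = S[0][a], S[1][b]
    cycles += 1

    # Cycle 2: the two ports fetch the other two elements.
    s2, s3 = S[2][c], S[3][d]
    cycles += 1

    # Cycle 3: combine the fetched values. In a full round this result
    # is XORed into the other half, which yields the next round's indices.
    result = ((((s0 + s1) & MASK32) ^ s2) + s3) & MASK32
    cycles += 1

    return result, cycles


def dummy_sboxes():
    """Four 256-entry tables of placeholder values (NOT Blowfish constants)."""
    return [[(i * 0x9E3779B1 + box * 0x01000193) & MASK32 for i in range(256)]
            for box in range(4)]
```

The point of the split is that with two read ports, four lookups need at
least two fetch cycles, so the third cycle is the only one not bound by
memory access.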


> Can you perhaps reduce this further, to two cycles per Blowfish round
> (for most rounds), by fetching the next round's first two S-box elements
> during the current round's "computation" cycle?


I think I can. I stopped optimizing it further when I noticed that I can't
get the current code working on the ZedBoard.
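
To get a rough idea of what going from three to two cycles per round could
save, the round count of one bcrypt computation can be estimated as below.
This is a sketch under assumptions: it supposes the 1709766-cycle figure is
for cost 5 ("$2a$05$"), which is not stated in this thread, it counts only
Blowfish rounds and ignores all other overhead, and the function names are
hypothetical.

```python
ROUNDS_PER_ENCRYPTION = 16


def blowfish_encryptions(cost):
    """Number of Blowfish block encryptions in one bcrypt computation."""
    # One key expansion rewrites 18 P-array words and 4 * 256 S-box words,
    # two words per block encryption: 9 + 512 = 521 encryptions.
    per_expansion = (18 + 4 * 256) // 2
    # Initial expansion, then 2^cost iterations of two expansions each.
    setup = per_expansion * (1 + 2 * (1 << cost))
    # Final stage: the 3-block magic string is encrypted 64 times.
    final = 64 * 3
    return setup + final


def round_cycles(cost, cycles_per_round):
    """Cycles spent in Blowfish rounds only, with no other overhead."""
    return blowfish_encryptions(cost) * ROUNDS_PER_ENCRYPTION * cycles_per_round


if __name__ == "__main__":
    for cycles_per_round in (3, 2):
        print(cycles_per_round, round_cycles(5, cycles_per_round))
```

Under these assumptions the model gives 1634736 cycles at three cycles per
round, which is in the same range as the 1709766 cycles seen in simulation.
At two cycles per round it gives 1089824, which is a lower bound, since the
suggestion above applies to most rounds rather than all of them.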


> [...]
>
> Does the above sound right to you?
>

It does. The only thing that worries me a bit is adding more bcrypt cores.
At the moment I have two ideas. The first is to connect all additional
cores to the same AXI bus and then use software registers to synchronise
reading and writing. I think this approach could have a large communication
overhead. Instead of software registers, additional logic could be used to
distribute data to the cores and start computation. I will probably try
both variants and see which one performs better. The second idea is to
create one shared BRAM per core, but I don't think I can do that without
creating one DMA per core and a few AXI buses. That approach would waste
too many resources.


> > 3. Replace mmap() calls in BF_fpga.c with proper drivers
>
> What would those proper drivers be?  UIO, as I mentioned here? -
>
> http://www.openwall.com/lists/john-dev/2013/06/04/2
>

My idea was to follow the example from Xilinx -
http://www.xilinx.com/support/documentation/sw_manuals/xilinx14_4/ug873-zynq-ctt.pdf
(Chapter 9). In this manual they are using modules.

Katja
