Date: Sat, 28 Sep 2013 09:30:24 +0200
From: Katja Malvoni <>
Subject: Re: Katja's weekly report #15

Hi Alexander,

Sorry about the delay. I moved to a new place and had some problems with my
internet connection, but it's all sorted out now.

On Thu, Sep 26, 2013 at 5:01 AM, Solar Designer <> wrote:

> > Accomplishments:
> > 1. Updated wiki page
> Thanks!  As I had mentioned, we/you need to get the page at
> linked from some
> other wiki page(s), such as from john/development or/and from john.

I added a link on john/development.

> > 2. Fixed bug so that bcrypt on FPGA doesn't fail self test on first run
> Great.  What was the bug?

I should have phrased this differently: the bug disappeared when I started
using true dual-port RAM to store the S-boxes. I don't know exactly what it
was, and since I changed a large portion of the code, I can't point to a
specific part of it.

> > 3. Partially optimized bcrypt on FPGA
> >        - using true dual port RAM for Sbox with two cycle latency. In
> > simulation I have it with 1 cycle latency, 3 cycles per BF_ROUND and
> > 1709766 cycles in total but it doesn't work on ZedBoard.
> 3 cycles per BF_ROUND sounds just right to me.  I assume it's one cycle
> to fetch first two S-box elements, another cycle to fetch the other two,
> and a third cycle to process these fetched values and compute the next
> set of S-box indices, for the next round.  Correct?

That is correct.
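To make the 3-cycle schedule concrete, here is a minimal C sketch of the Blowfish F function and one round, with comments marking which lookups would fall into which cycle. The bf_sboxes struct and the pairing of S0/S1 in cycle 1 and S2/S3 in cycle 2 are assumptions for illustration, not necessarily how the FPGA design groups them:

```c
#include <stdint.h>

/* Hypothetical container for the four 256-entry Blowfish S-boxes. */
typedef struct {
    uint32_t s[4][256];
} bf_sboxes;

/* Blowfish F function, annotated with the assumed 3-cycle hardware schedule. */
uint32_t bf_f(const bf_sboxes *S, uint32_t x)
{
    /* Cycle 1 (assumed): fetch two words, one per port of a true dual-port RAM. */
    uint32_t a = S->s[0][x >> 24];
    uint32_t b = S->s[1][(x >> 16) & 0xff];
    /* Cycle 2 (assumed): fetch the other two words. */
    uint32_t c = S->s[2][(x >> 8) & 0xff];
    uint32_t d = S->s[3][x & 0xff];
    /* Cycle 3: combine. The result is XORed into the other half, which
     * (after the next subkey) supplies the next round's S-box indices. */
    return ((a + b) ^ c) + d;
}

/* One Blowfish round: mix subkey p into L, then XOR F(L) into R. */
void bf_round(const bf_sboxes *S, uint32_t p, uint32_t *L, uint32_t *R)
{
    *L ^= p;
    *R ^= bf_f(S, *L);
}
```

Because the next round's indices depend on the cycle-3 result, overlapping rounds requires knowing those indices early. That is what the prefetch question below is about.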

> Can you perhaps reduce this further, to two cycles per Blowfish round
> (for most rounds), by fetching the next round's first two S-box elements
> during the current round's "computation" cycle?

I think I can. I stopped optimizing it further when I noticed that I can't
get the current code working on ZedBoard.
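A back-of-the-envelope C sketch of what the 1709766-cycle figure implies and what 2 cycles per round might save. It assumes the simulated hash uses cost 5 (as in John's usual $2a$05$ bcrypt self-test hashes), that the FPGA runs the whole hash including the final 64 x 3 encryptions of "OrpheanBeholderScryDoubt", and that with prefetching only the first of the 16 rounds still takes 3 cycles. All of these are assumptions, not measurements:

```c
#include <stdint.h>

/* Blowfish encryptions per ExpandKey: 18 P words + 4 * 256 S-box words,
 * filled two 32-bit words per encryption. */
#define BF_ENC_PER_EXPANDKEY ((18 + 4 * 256) / 2) /* = 521 */

/* Encryptions in one bcrypt hash: one salted ExpandKey, then 2^cost
 * iterations of ExpandKey(key) + ExpandKey(salt), then 64 encryptions
 * of each of the three 64-bit blocks of the magic string. */
uint64_t bcrypt_encryptions(unsigned cost)
{
    uint64_t expandkeys = 1 + 2 * ((uint64_t)1 << cost);
    return expandkeys * BF_ENC_PER_EXPANDKEY + 64 * 3;
}

/* Cycles for the 16 rounds of one encryption when the first round takes
 * `first` cycles and each remaining round takes `steady` cycles
 * (hypothetical model of the prefetch scheme). */
unsigned round_cycles(unsigned first, unsigned steady)
{
    return first + 15 * steady;
}
```

Under these assumptions, 1709766 / bcrypt_encryptions(5) = 1709766 / 34057, or about 50.2 cycles per encryption: round_cycles(3, 3) = 48 plus roughly 2 cycles of per-encryption overhead. With round_cycles(3, 2) = 33, the same model gives about 35 x 34057, or roughly 1.19 million cycles per hash.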

> [...]
> Does the above sound right to you?

It does. The only thing that worries me a bit is adding more bcrypt cores.
At the moment I have two ideas. The first one is to connect all additional
cores to the same AXI bus and then use software registers to synchronise
reading and writing. I think this approach could have a large
communication overhead. Instead of software registers, additional logic
could be used to distribute data to the cores and start computation. I will
probably try both approaches and see which one performs better. The second
idea is to create one shared BRAM per core, but I don't think I can do that
without creating one DMA per core and several AXI buses. That approach
would waste too many resources.
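A rough host-side C sketch of the first idea, synchronising several cores through software registers. It assumes a hypothetical AXI-Lite register block shared by all cores, with a START word and a DONE word that each hold one bit per core. The register names, offsets, and bit layout are invented for illustration:

```c
#include <stdint.h>

/* Hypothetical word offsets in a shared AXI-Lite control block. */
enum {
    REG_START = 0, /* host writes one bit per core to start it */
    REG_DONE  = 1  /* hardware sets one bit per core when it finishes */
};

/* Start the cores selected by `mask` on data already written to shared BRAM. */
void cores_start(volatile uint32_t *regs, uint32_t mask)
{
    regs[REG_START] = mask;
}

/* Nonzero once every core in `mask` has reported completion. */
int cores_done(volatile const uint32_t *regs, uint32_t mask)
{
    return (regs[REG_DONE] & mask) == mask;
}

/* Busy-wait until the selected cores finish. Every poll is a bus read,
 * which is where the communication overhead comes from. */
void cores_wait(volatile const uint32_t *regs, uint32_t mask)
{
    while (!cores_done(regs, mask))
        ; /* spin */
}
```

On the board, `regs` would point at the mapped register block. Replacing the polling with extra logic that distributes data and starts the cores, as mentioned above, would remove most of these round trips.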

> > 3. Replace mmap() calls in BF_fpga.c with proper drivers
> What would those proper drivers be?  UIO, as I mentioned here? -

My idea was to follow the example from Xilinx -
9. That manual uses kernel modules.
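For comparison with the kernel-module route, here is a minimal C sketch of the UIO option, which could replace the raw mmap() calls in BF_fpga.c. It assumes the bcrypt core's registers are exposed as memory map 0 of a UIO device such as /dev/uio0 (a hypothetical node name), served by a driver that implements interrupt control (for example uio_pdrv_genirq). Under UIO, map N is selected through an mmap() offset of N times the page size, and read() on the device blocks until an interrupt arrives:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map `len` bytes of UIO memory region `map_index` from `dev_path`.
 * On success, stores the open descriptor in *fd_out; returns NULL on error. */
volatile uint32_t *uio_map(const char *dev_path, size_t len, int map_index, int *fd_out)
{
    int fd = open(dev_path, O_RDWR);
    if (fd < 0) { perror("open"); return NULL; }
    /* UIO convention: region N lives at offset N * page size. */
    off_t offset = (off_t)map_index * (off_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
    if (p == MAP_FAILED) { perror("mmap"); close(fd); return NULL; }
    *fd_out = fd;
    return (volatile uint32_t *)p;
}

/* Re-enable the interrupt, then block until it fires.
 * Returns the UIO interrupt count, or -1 on error. */
int uio_wait_irq(int fd)
{
    int32_t enable = 1, count;
    if (write(fd, &enable, sizeof enable) != (ssize_t)sizeof enable) return -1;
    if (read(fd, &count, sizeof count) != (ssize_t)sizeof count) return -1;
    return (int)count;
}
```

With this approach, the driver side can be a device-tree entry for a generic UIO driver instead of a custom module. A custom module, as in the Xilinx example, gives more control over DMA and buffer handling.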

