john-dev - Re: Parallella: bcrypt

Message-ID: <20130820175926.GA16553@openwall.com>
Date: Tue, 20 Aug 2013 21:59:26 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt

Katja, Yaniv -

On Tue, Aug 20, 2013 at 05:24:17PM +0200, Katja Malvoni wrote:
> Hm... If I understood them correctly, Yaniv's postings were about the case when
> there are multiple write (or read) calls from the host, while we have only one,
> which transfers the whole inputs structure. From this code
> https://github.com/adapteva/epiphany-libs/blob/master/src/e-hal/src/epiphany-hal.c
> it seems that the whole buffer is transferred at once.

My understanding was/is that the re-ordering may happen in hardware
regardless of whether it's multiple or just one write or read call.

> From the postings I got the
> impression that the only thing which can be done is to check whether the data is
> transferred correctly before writing the start flag. So I changed the code to do
> that - removed the start arrays from the inputs structure, put only one start
> array in the shared_buffer structure, and added a check of the transferred data.
> That didn't help. Am I missing something here?

How did you implement the "check of transferred data"?

What we could do is treat the channel as unreliable and use e.g.
sequence numbers combined with a checksum like CRC-32 on each transfer.
Both sides (Epiphany and host) could use this protocol.  They'd wait
until the sequence number is right and the checksum matches the data.
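
A minimal C sketch of the sequence number + CRC-32 idea, under stated assumptions: `msg_t`, `msg_send()`, `msg_try_recv()` and `PAYLOAD_WORDS` are hypothetical names (not part of e-hal), the CRC is the common bitwise CRC-32 (reflected polynomial 0xEDB88320), and the shared buffer is modeled as ordinary memory. On Parallella it would instead be the external-memory region reached via e_write()/e_read() or direct pointers. The CRC covers the sequence number as well as the payload, so a fresh sequence number sitting next to stale data is rejected.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAYLOAD_WORDS 16   /* hypothetical payload size */

/* Hypothetical layout: the CRC is last and covers everything before it. */
typedef struct {
    uint32_t seq;
    uint32_t payload[PAYLOAD_WORDS];
    uint32_t crc;
} msg_t;

/* Bitwise CRC-32 (IEEE, reflected 0xEDB88320): slow but tiny, fine for a sketch. */
static uint32_t crc32_calc(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
    }
    return ~crc;
}

/* Sender: build a sealed local copy, then store it into the shared buffer. */
static void msg_send(volatile msg_t *shared, uint32_t seq, const uint32_t *data)
{
    msg_t m;
    m.seq = seq;
    memcpy(m.payload, data, sizeof m.payload);
    m.crc = crc32_calc(&m, offsetof(msg_t, crc));

    shared->seq = m.seq;
    for (size_t i = 0; i < PAYLOAD_WORDS; i++)
        shared->payload[i] = m.payload[i];
    shared->crc = m.crc;
}

/* Receiver: snapshot the buffer and accept it only if the sequence number is
 * the expected one and the CRC matches; otherwise the caller polls again. */
static int msg_try_recv(const volatile msg_t *shared, uint32_t want_seq, uint32_t *out)
{
    msg_t m;
    m.seq = shared->seq;
    for (size_t i = 0; i < PAYLOAD_WORDS; i++)
        m.payload[i] = shared->payload[i];
    m.crc = shared->crc;

    if (m.seq != want_seq || m.crc != crc32_calc(&m, offsetof(msg_t, crc)))
        return 0;
    memcpy(out, m.payload, sizeof m.payload);
    return 1;
}
```

Both directions (host to Epiphany and back) could use the same pair of functions, each side incrementing its expected sequence number after every accepted message.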

This feels like a weird thing to do for communication within one system,
where there ought to be a reliable way to communicate without having to
build it ourselves like that.  Yet if nothing else works (and is expected
to work reliably for specific reasons) then we may resort to doing this.

> I managed to get a version which hasn't failed in over 50 runs (code is
> attached). I did three things - added a CHECK macro on the host side, transferred
> one salt per core, and copied keys from external memory only when they change.
> If I change any of those, it misses some passwords.

That's a weird solution, and one that does not make me comfortable about
its reliability, given how very fragile you say it is.  It's more like a
workaround for current Parallella systems, which I guess may break again
on a future meant-to-be-compatible revision.  If we have to do things
like that, I'd rather see us do the sequence number + checksum thing,
which is a theoretically sound approach with well-understood properties.

[...]
> But there is something about this scenario that I don't understand. If the host
> reads inputs from external memory and proceeds with writing start only when the
> read data is equal to the written data, how is it possible that Epiphany reads
> wrong values from that same location in external memory?

When you use e_write(), does it go through host CPU's caches?  Can
e_read() possibly read from there?  Even if not caches, then perhaps at
least write buffers.

In epiphany-hal.c, functions ee_write*() and ee_read*() ultimately
write and read, respectively, through the host's address space, and there's no
indication that these addresses would somehow be excluded from caching
on the host (our ARM CPU _does_ have caches, unlike Epiphany).  I guess
the CPU can commit those cache lines to external memory any time it
likes, and not necessarily in order.  Similarly, its write buffers might
not be committed (to L2 cache?) right away and in order.
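
For illustration only, a C11 sketch of the usual "payload, barrier, then flag" publication order on the host side. `shared_t`, `host_publish()`, `reader_try_consume()` and `NWORDS` are hypothetical names; the reader is written in the same portable C purely for symmetry and is not Epiphany code. On ARMv7, GCC and Clang compile `atomic_thread_fence(memory_order_seq_cst)` to a DMB instruction, which orders the CPU's stores (including draining them past the store buffer) as seen by coherent observers. It does not write dirty cache lines back to memory for an agent that is not cache-coherent with the CPU, so whether Epiphany sees the data in order also depends on how the region is mapped on the host, which this sketch does not address.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NWORDS 8   /* hypothetical payload size */

/* Hypothetical shared layout: payload words followed by the start flag. */
typedef struct {
    volatile uint32_t data[NWORDS];
    volatile uint32_t start;
} shared_t;

/* Host side: store the payload, then a full fence, then the start flag. */
static void host_publish(shared_t *sh, const uint32_t *src)
{
    for (size_t i = 0; i < NWORDS; i++)
        sh->data[i] = src[i];
    /* Orders the payload stores before the flag store for coherent observers;
       it does NOT clean cache lines out to a non-coherent agent. */
    atomic_thread_fence(memory_order_seq_cst);
    sh->start = 1;
}

/* Reader side (portable C, for symmetry only): check the flag, fence so the
   payload loads are not performed before the flag load, then copy the data.
   Returns 1 if the payload was consumed, 0 if the flag was not yet set. */
static int reader_try_consume(shared_t *sh, uint32_t *dst)
{
    if (sh->start != 1)
        return 0;
    atomic_thread_fence(memory_order_seq_cst);
    for (size_t i = 0; i < NWORDS; i++)
        dst[i] = sh->data[i];
    return 1;
}
```

Note that a host-side read-back check (write, read back, compare) can succeed straight from the CPU's own caches or store buffer, which is why it says nothing about what Epiphany will see in external memory.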

... I've just found the relevant ARM Cortex-A9 documentation:

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0388e/Caccifbd.html

They call these things "store" rather than "write" buffers (it's "write"
buffers in some other CPUs' docs), but overall I think I was right.

"The Cortex-A9 CPU has a store buffer with four 64-bit slots with data
merging capability."

> I added 32 bytes of padding -

Apparently, this is exactly the size needed to neutralize the effect of
store buffers (but not caches).
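
The arithmetic behind the 32-byte figure is 4 slots × 64 bits = 256 bits = 32 bytes. As a small C fragment, with padding sized from those constants rather than a magic number: `padded_inputs_t` and its fields are a hypothetical layout, since the quoted message does not say where the padding was placed.

```c
#include <stdint.h>

/* Figures from the Cortex-A9 documentation quoted above. */
#define A9_STORE_BUFFER_SLOTS     4
#define A9_STORE_BUFFER_SLOT_BITS 64
#define A9_STORE_BUFFER_BYTES \
    (A9_STORE_BUFFER_SLOTS * A9_STORE_BUFFER_SLOT_BITS / 8)   /* 4 * 64 / 8 = 32 */

_Static_assert(A9_STORE_BUFFER_BYTES == 32, "4 slots x 64 bits = 32 bytes");

/* Hypothetical layout: padding between the payload and the start flag. */
typedef struct {
    uint32_t salt[4];                      /* placeholder payload */
    uint8_t  pad[A9_STORE_BUFFER_BYTES];   /* one store buffer's worth */
    uint32_t start;
} padded_inputs_t;
```

Padding of this kind only affects the store buffer; it does nothing about cache lines, which is consistent with the "but not caches" caveat.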

> If I always read keys from external memory, then
> it seems to be working. But if keys are read only when they change, then it
> cracks around 25000 passwords.

This is puzzling.  It suggests that some further action is attempted too
soon when the keys are not re-read... but the only other thing you're
reading is the salt, and you read it before the keys, right?  Puzzling.

Alexander