Date: Sun, 7 Jul 2013 21:00:43 +0200
From: Katja Malvoni <kmalvoni@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt

Hi Yaniv,
On Sun, Jul 7, 2013 at 8:04 PM, Yaniv Sapir <yaniv@...pteva.com> wrote:

> Katja,
>
> I played with your code a little bit - here's a few things to consider:
>
> 1. *RACE CONDITIONS* - [...]

I noticed this a few hours ago and removed it.


> 2. There's another point to take care of - but it is NOT related to the
> programs' synching problem: I checked the offsetof() operator on the
> mailbox structure in both host and device. Many of the structure members
> are arrays of ints. The compiler sometimes aligns these members in memory
> on a double-word boundary. However, the "done" member is a single word, so
> if you put an array after it, there might be a "hole" in the structure due
> to the following member's alignment. The problem is that e-gcc and the ARM
> gcc do not act the same all the time. Specifically, here, e-gcc leaves a
> "hole", while the ARM gcc does not.
>
> What you can do is one of two things:
>
> 2.a) Add a dummy int member after "done", forcing gcc to leave a hole as
> well, so the next member is aligned the same way on both compilers.
> 2.b) Use the "packed" attribute on the structure definition. (
> http://gcc.gnu.org/onlinedocs/gcc-3.1/gcc/Variable-Attributes.html) - the
> downside might be less optimal memory accesses, but it is not a real
> problem right now.
>

Changed this as well - a minimal sketch of the resulting layout is below.
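
For reference, a minimal sketch of the padded layout (the member names match
the mailbox fields mentioned in this thread, but the array sizes are
illustrative, not the exact ones from my code):

#include <stddef.h> /* offsetof, for checking the layout */

typedef struct {
    int start[16];     /* per-core start flags */
    int core_done[16]; /* per-core done flags  */
    int done;          /* single word          */
    int pad;           /* 2.a: dummy member so the next array stays
                          double-word aligned under both e-gcc and gcc */
    unsigned int hash[16][6];
} mailbox_t;           /* 2.b alternative: drop "pad" and declare the
                          struct __attribute__((packed)) instead */

Printing offsetof() for each member from both the host and the device
programs, as you did, is a quick way to confirm the two compilers now agree
on the layout.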

>
> 3. As far as I can tell - there is no computational dependency between the
> cores (am I right?). So, it is not really necessary that core #0 is
> managing all the stuff. Why not let the host manage the program flow? Let
> each core negotiate with the host instead of core #0. This way, if a core
> fails or takes longer to compute, it does not kill all other cores. That
> said - if all cores do similar work, then it is probably somewhat faster to
> manage the work with a single core. I suggest that once the host-driven
> program is robust enough, you get to the 2nd order optimizations like this.
>

I first started by signalling start to each core independently, but some
cores weren't getting the signal, so I changed my approach to make it work.
A rough sketch of what I mean is below.
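
The per-core handshake I first tried looks roughly like this on the host
side (the flag offsets are placeholders for wherever the mailbox ends up in
core-local memory; e_write()/e_read() are the usual eHAL calls):

#include <e-hal.h>

#define START_OFF 0x6000 /* assumed local address of the start flag */
#define DONE_OFF  0x6004 /* assumed local address of the done flag  */

static void run_group(e_epiphany_t *dev, int rows, int cols)
{
    int one = 1, done;

    /* Signal start to every core independently. */
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            e_write(dev, r, c, START_OFF, &one, sizeof(one));

    /* Poll each core's own done flag, so a core that fails or runs
       long stalls only itself rather than the whole group. */
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            do {
                e_read(dev, r, c, DONE_OFF, &done, sizeof(done));
            } while (!done);
}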


> ... so I went on and commented out the "outbuf.start[corenum] = 0" line.
> This is what I get (when using only the first 2 rows - i.e., only 8 cores -
> b/c I use a faulty chip):
>
> eCore 0x808 (0, 0): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc,
> 74420600
> eCore 0x809 (0, 1): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc,
> 74420600
> eCore 0x80a (0, 2): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc,
> 74420600
> eCore 0x80b (0, 3): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc,
> 74420600
> eCore 0x848 (1, 0): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc,
> 74420600
> eCore 0x849 (1, 1): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc,
> 74420600
> eCore 0x84a (1, 2): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc,
> 74420600
> eCore 0x84b (1, 3): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc,
> 74420600
> Execution time - Epiphany: 19.040000 ms
> done = 8
> core_done[ 0] = 1     test[ 0] = 16     ciphertext[0] = P
> core_done[ 1] = 1     test[ 1] = 16     ciphertext[1] = P
> core_done[ 2] = 1     test[ 2] = 16     ciphertext[2] = P
> core_done[ 3] = 1     test[ 3] = 16     ciphertext[3] = P
> core_done[ 4] = 1     test[ 4] = 16     ciphertext[4] = P
> core_done[ 5] = 1     test[ 5] = 16     ciphertext[5] = P
> core_done[ 6] = 1     test[ 6] = 16     ciphertext[6] = P
> core_done[ 7] = 1     test[ 7] = 16     ciphertext[7] = P
>
>
> does this make sense?
>
> Yaniv.
>

It does - these are correct results.
I did two things: I commented out outbuf.start[corenum] = 0 and
outbuf.core_done[corenum] = 0, and I put done at the end of the structure
(full.zip). With these changes I get wrong results for some cores - the key
and hash aren't transferred correctly.

I also tried your advice from the previous email: I removed all the bcrypt
stuff and tested only the communication (minimal.zip). Host-to-core
signalling works fine, but transfers to a core's local memory don't - the
key and hash aren't transferred correctly, and the cores with wrong values
aren't always the same; they change from run to run. What can cause this?
In the crypt_all() function in parallella_bf_fmt.c
(https://github.com/kmalvoni/JohnTheRipper/tree/master) I transfer the
setting and key in the same manner (sketched below) and it works. The
transfer problems only appeared once I started changing things around that
implementation.
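
By "the same manner" I mean a host-side pattern roughly like this (the
offsets and sizes are illustrative, not the exact crypt_all() code):

#include <e-hal.h>
#include <string.h>

#define SETTING_OFF 0x4000 /* assumed local address of the setting buffer */
#define KEY_OFF     0x4040 /* assumed local address of the key buffer     */
#define START_OFF   0x6000 /* assumed local address of the start flag     */

static void send_work(e_epiphany_t *dev, int rows, int cols,
                      const char *setting, char keys[][72])
{
    int one = 1, idx = 0;

    /* Copy the payload into each core's local memory first... */
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++, idx++) {
            e_write(dev, r, c, SETTING_OFF, setting, strlen(setting) + 1);
            e_write(dev, r, c, KEY_OFF, keys[idx], 72); /* bcrypt keys are
                                                           at most 72 bytes */
        }

    /* ...and only then raise the start flags, so no core can start
       computing before its key and setting are fully in place. */
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            e_write(dev, r, c, START_OFF, &one, sizeof(one));
}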


Download attachment "minimal.zip" of type "application/zip" (5248 bytes)

Download attachment "full.zip" of type "application/zip" (14683 bytes)
