Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
Date: Sun, 7 Jul 2013 14:04:09 -0400
From: Yaniv Sapir <>
Subject: Re: Parallella: bcrypt


I played with your code a little bit - here are a few things to consider:

1. *RACE CONDITIONS* - these are some of the nastiest and hardest to track
parallel programming pitfalls. In the latest code you attached, I see at
least one possible race: the start signal. Note that at the beginning of
the device program, you set *outbuf.start[corenum]* to *0*. Then, if it is
core #0, you set the start signal of all cores to 16. Then, all cores poll
on their respective start signal, waiting for core #0 to write that 16. So
far, so good.... or, is it?

What would happen if core #1 gets its SYNC signal from the loader (i.e.,
the signal that actually makes the core start running after the load)
**after** core #0 wrote that 16 to core #1's mailbox? What would happen is
that the beginning of core #1's program would actually **erase** core
#0's signal! Thus core #1 would be stuck in the polling loop forever.

Now - although generally I'd consider this bad practice - you can rely on
the fact that static variables are initialized during load time, so you
*know* that outbuf.start[i] is 0 when the program starts running. This way
you will not have the above race.

One more thing to keep in mind - all cores share the same mailbox space.
Once a core program is loaded, its image will erase whatever was in the
mailbox. If a core was running at the time another core was loaded, then
the running core's data would have been erased by the loader. This is not a
problem here because the SYNC signals are sent by e_load_group() only after
all cores were loaded.

2. There's another point to take care of - but it is NOT related to the
program synching problem: I checked the offsetof() operator on the mailbox
structure in both host and device. Many of the structure members are arrays
of ints. The compiler sometimes aligns these members in memory on a
double-word boundary. However, the "done" member is a single word, so if
you put an array after it, there might be a "hole" in the structure due to
the following member's alignment. The problem is that e-gcc and the ARM
gcc do not act the same all the time. Specifically, here, e-gcc leaves a
"hole", while the ARM gcc does not.

What you can do is one of two things:

2.a) Add a dummy int member after "done", this way forcing gcc to leave a
hole, so the next member is aligned the same way on both sides.
2.b) Use the "packed" attribute on the structure definition (the downside
might be less optimal memory accesses, but that is not a real problem
right now).

3. As far as I can tell - there is no computational dependency between the
cores (am I right?). So, it is not really necessary that core #0 is
managing all the stuff. Why not let the host manage the program flow? Let
each core negotiate with the host instead of core #0. This way, if a core
fails or takes longer to compute, it does not kill all other cores. That
said - if all cores do similar work, then it is probably somewhat faster to
manage the work with a single core. I suggest that once the host-driven
program is robust enough, you get to the 2nd order optimizations like this.

4. For optimization's sake - your programs use external memory for passing
messages between cores. This will impact performance, as external access
is much slower than internal - especially for core #0, which actually
polls on all the other cores' signals.

... so I went on and commented out the "outbuf.start[corenum] = 0" line.
This is what I get (when using only the first 2 rows - i.e., only 8 cores -
b/c I use a faulty chip):

eCore 0x808 (0, 0): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc,
eCore 0x809 (0, 1): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc,
eCore 0x80a (0, 2): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc,
eCore 0x80b (0, 3): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc,
eCore 0x848 (1, 0): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc,
eCore 0x849 (1, 1): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc,
eCore 0x84a (1, 2): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc,
eCore 0x84b (1, 3): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc,
Execution time - Epiphany: 19.040000 ms
done = 8
core_done[ 0] = 1     test[ 0] = 16     ciphertext[0] = P
core_done[ 1] = 1     test[ 1] = 16     ciphertext[1] = P
core_done[ 2] = 1     test[ 2] = 16     ciphertext[2] = P
core_done[ 3] = 1     test[ 3] = 16     ciphertext[3] = P
core_done[ 4] = 1     test[ 4] = 16     ciphertext[4] = P
core_done[ 5] = 1     test[ 5] = 16     ciphertext[5] = P
core_done[ 6] = 1     test[ 6] = 16     ciphertext[6] = P
core_done[ 7] = 1     test[ 7] = 16     ciphertext[7] = P

does this make sense?


On Sun, Jul 7, 2013 at 5:45 AM, Katja Malvoni <> wrote:

> On Sun, Jul 7, 2013 at 12:23 AM, Yaniv Sapir <> wrote:
>> Katja,
>> I think that I see some potential problem(s), but before getting into it,
>> can you please attach your build and run commands (scripts?). It is always
>> best to attach an SSCCE (
>> { I am sorry if you already sent these previously, if so, I lost track of
>> it. }
>> Yaniv.
> Okay, they are attached
> Katja

Yaniv Sapir
Adapteva Inc.
1666 Massachusetts Ave, Suite 14
Lexington, MA 02420
Phone: (781)-328-0513 (x104)
CONFIDENTIALITY NOTICE: This e-mail may contain information
that is confidential and proprietary to Adapteva, and Adapteva hereby
designates the information in this e-mail as confidential. The information
is intended only for the use of the individual or entity named above. If
you are not the intended recipient, you are hereby notified that any
disclosure, distribution or use of any of the information contained in
this transmission is strictly prohibited and that you should immediately
destroy this e-mail and its contents and notify Adapteva.

