Date: Mon, 8 Jul 2013 20:28:33 +0200
From: Katja Malvoni <>
Subject: Re: Parallella: bcrypt

Hi Yaniv,

On Mon, Jul 8, 2013 at 5:38 AM, Yaniv Sapir <> wrote:

> Katja,
> It is a little bit hard to follow your question - I hope I get it right:
>  I changed one thing in the code I sent in previous emails and now it
>> works. I did something you recommended not to do - I used e_write and after
>> it I used e_load_group(). Now both minimal and full code work.
> That's great.

It is and it's not... With this approach I have to load the image in every
iteration of the loop, so I can't implement your suggestion to have a server on

>> But I put it after e_writes to local memory.
> What is the "it" you refer to? What did you put after the e_write()?

"It" is e_load_group(), so I have e_load_group() after the two e_write() calls
used to transfer the key and hash to the core's local memory.

>> If I put e_load after writing to shared dram then it doesn't work.
> Probably b/c you overrun whatever is in the DRAM with the
> initialization values of the external objects that are written by the
> e_load().

Hm... Does that mean that e_load() puts zeroes in shared dram if a variable
is declared as static? If so, then my whole shared buffer is filled with
zeroes. How can I then read garbage from that buffer?

>> On the other hand, if it's before e_writes to core's local memory then data
>> in local memory isn't correct for some of the cores.
> If you launch the program (E_TRUE input to e_load()) before writing the
> data, there surely is a situation where you are actually processing garbage
> (GIGO)....
>> I got it working in one more way. If I start cores using e_start() after
>> e_write() (attached code) then it also works.
> .... which is why this method works. You load, then write data, then start
> program!
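
The ordering described above (load, then write data, then start) can be
illustrated with a small host-only model. This is only a sketch under
assumptions: sim_core, sim_load, sim_write and sim_start are hypothetical
stand-ins and not the e-hal API, and the model makes deterministic what is a
timing-dependent race on real hardware.

```c
#include <stdint.h>

#define NCORES 16

/* Hypothetical stand-in for one eCore; this is NOT the e-hal API. */
typedef struct {
    uint32_t input;    /* models key/hash in the core's local memory */
    uint32_t output;   /* models the result the core produces */
    int      started;  /* nonzero once the program is running */
} sim_core;

/* Models loading the image: data is reset to its initial values.
   If start_now is set, the core runs at once on whatever input it has. */
static void sim_load(sim_core *c, int start_now)
{
    c->input = 0;
    c->output = 0;
    c->started = 0;
    if (start_now) {
        c->started = 1;
        c->output = c->input + 1;   /* consumes the not-yet-written input */
    }
}

/* Models a host write of input data into the core's local memory. */
static void sim_write(sim_core *c, uint32_t value)
{
    c->input = value;
}

/* Models an explicit start; it has no effect if the core already ran. */
static void sim_start(sim_core *c)
{
    if (!c->started) {
        c->started = 1;
        c->output = c->input + 1;
    }
}

/* Returns how many cores computed value + 1 for the given ordering. */
int count_correct(int start_at_load, uint32_t value)
{
    sim_core cores[NCORES];
    int good = 0;
    for (int i = 0; i < NCORES; i++) {
        sim_load(&cores[i], start_at_load);
        sim_write(&cores[i], value);
        sim_start(&cores[i]);
        if (cores[i].output == value + 1)
            good++;
    }
    return good;
}
```

In this model, count_correct(0, 41) counts every core as correct because the
input is written before the start, while count_correct(1, 41) counts none
because each core has already consumed its input by the time the write lands.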
So I created one more really minimal test case; it's attached. I load the
program and then do only one write to shared memory.
I ran it many times. I have two scenarios. The first one is when I use
"while(result.core_done[0] == 0)". In that case there is some garbage
(different from zero) in result.core_done[0], so the data is read from the
shared memory immediately. What I read in that case is some garbage for the
result.core_done array, while the start array has correct values for all the
cores (in this example that's 16). The second one is when I use sleep(1) (or
any longer or shorter (usleep) interval): I read only zeroes, because all
cores are in an infinite loop - the whole start array is filled with zeroes.
The only explanation I have for this is that the whole start array gets
overwritten. But only e_load can overwrite it, since I don't do any writes to
the start array except when the host writes 16 to every array location. Cores
do not write to that array. I checked all offsets: for the attached code, both
the cores and the host return exactly the same number for every variable in
the data structure.
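
In the first scenario above, the wait loop exits on any nonzero garbage in
core_done[0]. One possible way to make the host-side check less sensitive to
that, sketched here under assumptions, is to clear the flags before the cores
run and to accept only an exact sentinel value as "done". The shared_buf
layout and DONE_MAGIC below are hypothetical and not taken from the attached
code; on the real system the snapshot would be a host-side copy of the shared
buffer (for example one filled with e_read()).

```c
#include <stdint.h>

#define NCORES     16
#define DONE_MAGIC 0x600DD0DEu   /* hypothetical sentinel value */

/* Hypothetical mirror of the shared structure discussed above. */
typedef struct {
    uint32_t start[NCORES];
    uint32_t core_done[NCORES];
} shared_buf;

/* Host side: zero every done flag before the cores are started, so that
   stale contents of the shared buffer cannot be mistaken for a result. */
void clear_done(shared_buf *buf)
{
    for (int i = 0; i < NCORES; i++)
        buf->core_done[i] = 0;
}

/* Count the cores whose flag equals the sentinel exactly. Any other
   nonzero value is treated as garbage and not as completion. */
int count_done(const shared_buf *snapshot)
{
    int done = 0;
    for (int i = 0; i < NCORES; i++)
        if (snapshot->core_done[i] == DONE_MAGIC)
            done++;
    return done;
}
```

The host would then wait until count_done() reaches the number of cores in
use instead of testing a single flag against zero. This does not explain why
the start array gets overwritten; it only keeps garbage from ending the wait
early.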

On Sun, Jul 7, 2013 at 8:04 PM, Yaniv Sapir <> wrote:
... so I went on and commented out the "outbuf.start[corenum] = 0" line.
This is what I get (when using only the first 2 rows - i.e., only 8 cores -
b/c I use a faulty chip):

eCore 0x808 (0, 0): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc,
eCore 0x809 (0, 1): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc,
eCore 0x80a (0, 2): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc,
eCore 0x80b (0, 3): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc,
eCore 0x848 (1, 0): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc,
eCore 0x849 (1, 1): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc,
eCore 0x84a (1, 2): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc,
eCore 0x84b (1, 3): 1bb69143, f9a8d304, c8d23d99, ab049a77, a68e2ccc,
Execution time - Epiphany: 19.040000 ms
done = 8
core_done[ 0] = 1     test[ 0] = 16     ciphertext[0] = P
core_done[ 1] = 1     test[ 1] = 16     ciphertext[1] = P
core_done[ 2] = 1     test[ 2] = 16     ciphertext[2] = P
core_done[ 3] = 1     test[ 3] = 16     ciphertext[3] = P
core_done[ 4] = 1     test[ 4] = 16     ciphertext[4] = P
core_done[ 5] = 1     test[ 5] = 16     ciphertext[5] = P
core_done[ 6] = 1     test[ 6] = 16     ciphertext[6] = P
core_done[ 7] = 1     test[ 7] = 16     ciphertext[7] = P

does this make sense?

I tried this and it doesn't work for me; I don't get correct results. I
took the code I sent you (
and did only that: commented out "outbuf.start[corenum] = 0". Some cores
return wrong results. Even worse, the cores that return wrong results are
different on every run, and the wrong results are also different. The output
that I get is attached. Could you please check that you changed only that?

And one more question - can a core halt itself, and if so, how? Since writing
to the core's local memory and then starting cores using e_start works, I could
have an infinite loop in which the first instruction is halt and, after all
writes are done, the host starts the cores with e_start. After one iteration
the core halts itself and gets new data. If this is a possible scenario, what
would happen if a core is halted and data transfers aren't executed completely
(by data transfer I mean transfer of the result from local memory to shared

Thank you,



View attachment "output.txt" of type "text/plain" (6153 bytes)

Download attachment "" of type "application/zip" (2930 bytes)
