john-dev - Re: Parallella: bcrypt

Message-ID: <CAFYn=yAsg50iLjuY21FZ8XyXGPa1UXv3fUrAdeWxowaGHQMZog@mail.gmail.com>
Date: Thu, 4 Jul 2013 23:59:53 -0400
From: Yaniv Sapir <yaniv@...pteva.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt

Katja,

From a quick glance at the *host* program you attached, I spotted
a few things that *may* be a problem:

1. You use e_write() to write data to the cores (at addresses 0x7700 and
0x7800). Then, you load the program code itself using e_load(). What you
need to realize is that e_load() may overwrite the previously written data
with some program data. In particular, if you defined global (static)
buffers in your program for the key and the hash at those addresses, then
they are initialized with zeros (per the C standard), and these zeros are
part of the program image, written at load time.

An easy way to verify this is to look at the *.srec file and see whether
those addresses are written or not (for reference, the SREC format is
described on Wikipedia; look at the S3 records).
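
To make the SREC check concrete, here is a minimal sketch in C that tests
whether a single S3 record writes into a given address range. The function
names (hex_field, s3_record_touches) are made up for this illustration, the
checksum is not verified, and whether the addresses in your *.srec are
core-local or global is something you would need to check against your own
file:

```c
#include <assert.h>
#include <string.h>

/* Parse n hex digits starting at s; returns -1 on a non-hex character. */
static long long hex_field(const char *s, int n)
{
    long long v = 0;
    for (int i = 0; i < n; i++) {
        char c = s[i];
        int d;
        if (c >= '0' && c <= '9')      d = c - '0';
        else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else return -1;
        v = (v << 4) | d;
    }
    return v;
}

/*
 * S3 record layout: "S3", 2 hex digits of byte count, 8 hex digits of
 * address, the data bytes, 2 hex digits of checksum. The byte count covers
 * address + data + checksum, so the data length is count - 5.
 *
 * Returns 1 if the record writes any byte in [lo, hi), 0 if it does not
 * (or is some other record type), and -1 if the line is malformed.
 */
int s3_record_touches(const char *line,
                      unsigned long long lo, unsigned long long hi)
{
    assert(line != NULL);
    size_t len = strlen(line);
    if (len < 4 || line[0] != 'S')
        return -1;
    if (line[1] != '3')
        return 0;                       /* S0, S7, ... carry no S3 data */
    long long count = hex_field(line + 2, 2);
    if (count < 5 || len < 4 + 2 * (size_t)count)
        return -1;
    long long addr = hex_field(line + 4, 8);
    if (addr < 0)
        return -1;
    unsigned long long start = (unsigned long long)addr;
    unsigned long long end = start + (unsigned long long)(count - 5);
    return end > start && start < hi && end > lo;
}
```

Feeding every line of the file (read with fgets(), for example) through
s3_record_touches(line, 0x7700, 0x7800) would tell you whether the loader
writes anything into the first buffer.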

So, you probably want to reorder the sequence of things - first, load the
program, then write the data and then run the program. As you guessed, the
last parameter to the e_load() function is a switch telling the loader
whether to run the program after loading or not. In case you want to run
the program separately from the load, pass E_FALSE to e_load() and then use
e_start().
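
To illustrate why the order matters, here is a small host-only simulation
in C. It does not call the real library: sim_load(), sim_write() and
sim_start() are hypothetical stand-ins for e_load() with E_FALSE, e_write()
and e_start(), and the 16-byte key buffer size is an assumption made for the
sketch:

```c
#include <assert.h>
#include <string.h>

#define CORE_MEM_SIZE 0x8000u   /* 32 KB of local memory per core */
#define KEY_ADDR      0x7700u   /* address used in the host program */
#define KEY_SIZE      16u       /* hypothetical buffer size */

typedef struct {
    unsigned char mem[CORE_MEM_SIZE];
    int running;
} sim_core_t;

/* Stand-in for e_load(..., E_FALSE). A real image also carries code; here
 * only the zero-initialized static key buffer in the image matters. */
void sim_load(sim_core_t *core)
{
    memset(core->mem + KEY_ADDR, 0, KEY_SIZE);
    core->running = 0;
}

/* Stand-in for e_write(): copies host data into core memory. */
void sim_write(sim_core_t *core, unsigned addr, const void *buf, unsigned size)
{
    assert(addr + size <= CORE_MEM_SIZE);
    memcpy(core->mem + addr, buf, size);
}

/* Stand-in for e_start(). */
void sim_start(sim_core_t *core)
{
    core->running = 1;
}

/* Returns 1 if the key is still intact when the program starts running. */
int key_survives(int write_before_load)
{
    static sim_core_t core;
    const unsigned char key[KEY_SIZE] = "0123456789abcde";

    memset(&core, 0xFF, sizeof core);       /* arbitrary prior contents */
    if (write_before_load) {
        sim_write(&core, KEY_ADDR, key, KEY_SIZE);
        sim_load(&core);                    /* image zeros wipe the key */
    } else {
        sim_load(&core);
        sim_write(&core, KEY_ADDR, key, KEY_SIZE);
    }
    sim_start(&core);
    return memcmp(core.mem + KEY_ADDR, key, KEY_SIZE) == 0;
}
```

In this model key_survives(1) reports that the key is lost, while
key_survives(0), the load / write / start order, keeps it.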


2. You have a while loop polling the result.done signal. However, when the
while() statement first tests the signal's value, it is used
uninitialized, so it contains the last round's result. Either set it to 0
before entering the loop, or use a "do {} while();" construct instead.
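
A small simulation of the stale-flag problem, as a sketch only. The device
here is a hypothetical model (sim_dev_t, sim_kick(), sim_read_result() are
invented names) that publishes its result after three host polls;
sim_read_result() stands in for the e_read() of the result structure:

```c
#include <assert.h>

typedef struct { int done; int value; } result_t;

/* Hypothetical device model: publishes its result after a few host polls. */
typedef struct {
    result_t shared;      /* result structure in core memory */
    int polls_left;
    int next_value;
} sim_dev_t;

/* Start a new round; the device-side flag is cleared, as after a reload. */
void sim_kick(sim_dev_t *d, int value, int latency)
{
    assert(latency > 0);
    d->shared.done = 0;
    d->next_value = value;
    d->polls_left = latency;
}

/* Stand-in for e_read() of the result; also advances the device one step. */
result_t sim_read_result(sim_dev_t *d)
{
    if (d->polls_left > 0 && --d->polls_left == 0) {
        d->shared.value = d->next_value;
        d->shared.done = 1;
    }
    return d->shared;
}

/* One round of the host loop. host_copy persists between rounds. */
int run_round(sim_dev_t *d, result_t *host_copy, int value, int reset_local_flag)
{
    sim_kick(d, value, 3);
    if (reset_local_flag)
        host_copy->done = 0;            /* the fix: clear before polling */
    while (!host_copy->done)            /* stale done==1 skips the loop */
        *host_copy = sim_read_result(d);
    return host_copy->value;
}

/* Runs two rounds (values 11, then 22) and returns the second result. */
int second_round_value(int reset_local_flag)
{
    sim_dev_t dev = {{0, 0}, 0, 0};
    result_t host_copy = {0, 0};
    run_round(&dev, &host_copy, 11, 1);
    return run_round(&dev, &host_copy, 22, reset_local_flag);
}
```

Without the reset, the second round never enters the loop and hands back the
first round's value; with it, the host waits for the new result.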


3. Your e_open() call is inside the "a" loop, while the e_close() call is
outside. You can safely move e_open() outside of the loop.


4. You reload the program in each iteration of the loop. Is this really
necessary? Does the program change between iterations? Also, note that
loading a program while a previous process is still running on the core
(i.e., the core is not in IDLE mode) is unsafe and can cause harmful
results. My guess is that when you use the usleep() call, you are lucky
enough that the previous run has concluded and the program has returned
from main(), eventually calling the TRAP instruction. In order to make sure
that the core is in IDLE state, either reset the system, or reset the core
itself.

However, while e_reset_system() is quite robust, it contains a 1 second
delay after issuing the reset command. OTOH, e_reset_core() is less robust
and can have side effects if returning memory transactions (as a result of
an LDR instruction) are on their way to the core when the reset is
performed. So care should be taken here.


The way I suggest doing stuff like this would be to write the device
program as a "server" program. When started, it waits for a signal from the
host to start processing the data. Once done, it signals the host that the
result is ready. Then, it loops back to the "wait for signal" part. This
way, the program is run once, saving the repeated load time, and saving the
headache of resetting the cores before each round.
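
A rough sketch of that "server" structure, simulated in one process so it
can be compiled anywhere. The mailbox layout, the names (mailbox_t,
device_poll_once(), host_request()) and the doubling "work" are all
placeholders invented for the sketch, not part of any SDK. On real hardware
the mailbox would live in core memory, the device function would be the body
of a while(1) loop in main(), and the host would use e_write() / e_read()
instead of touching the structure directly:

```c
#include <assert.h>

/* Hypothetical mailbox shared between host and device. */
typedef struct {
    volatile int go;      /* host sets to 1 to request work */
    volatile int done;    /* device sets to 1 when output is valid */
    int input;
    int output;
} mailbox_t;

static mailbox_t g_mailbox;   /* simulation only: lives in host memory */

/* One pass of the device-side server loop body. */
void device_poll_once(mailbox_t *mb)
{
    if (!mb->go)
        return;                     /* still waiting for the host */
    mb->go = 0;                     /* acknowledge the request */
    mb->output = mb->input * 2;     /* placeholder for the real work */
    mb->done = 1;                   /* signal the host */
}

/* Host side: post one request and wait for the result. */
int host_request(int input)
{
    mailbox_t *mb = &g_mailbox;
    assert(!mb->go);                /* previous request was consumed */
    mb->done = 0;                   /* clear the flag before signalling */
    mb->input = input;
    mb->go = 1;                     /* raise the request once input is set */
    while (!mb->done)
        device_poll_once(mb);       /* simulation only: the device would be
                                       running on its own on real hardware */
    return mb->output;
}
```

Each call to host_request() reuses the already loaded program, so there is no
reload and no reset between rounds.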

As to your question - generally speaking, e_read() and e_write() are
not synchronous in the sense that there is no guarantee that RAW (Read
After Write) transactions will be in order. That is, if you write data to
an address on the core and then read from the same address, there is
no guarantee that you will read the new value. Because the two transactions
arrive at the core via two different networks, it is possible that the
write transaction gets delayed for some reason, and that the read
transaction reaches the core first.
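
A toy model of that read-after-write hazard, under loudly stated
assumptions: the link below (sim_link_t, sim_write_word(), sim_read_word())
is invented for the sketch, and the fixed delay of two transactions is
arbitrary. It only shows that a single read-back can return the old value,
and that a bounded poll for the expected value is one defensive option if a
read-back is needed at all:

```c
#include <assert.h>

#define WRITE_DELAY 2   /* hypothetical: write lands two transactions later */

typedef struct {
    int cell;             /* the word in core memory */
    int pending_value;    /* write still in flight */
    int pending_ticks;    /* 0 means nothing in flight */
} sim_link_t;

/* Stand-in for e_write(): the value does not land immediately. */
void sim_write_word(sim_link_t *l, int v)
{
    l->pending_value = v;
    l->pending_ticks = WRITE_DELAY;
}

/* Stand-in for e_read(): returns the current word, then time advances. */
int sim_read_word(sim_link_t *l)
{
    int v = l->cell;
    if (l->pending_ticks > 0 && --l->pending_ticks == 0)
        l->cell = l->pending_value;
    return v;
}

/* Write 42, read back once: may see the old value (0 in this model). */
int naive_readback(void)
{
    sim_link_t link = {0, 0, 0};
    sim_write_word(&link, 42);
    return sim_read_word(&link);
}

/* Write 42, then poll up to max_tries times. Returns the number of reads
 * needed to see the new value, or -1 if it never showed up. */
int polled_readback(int max_tries)
{
    sim_link_t link = {0, 0, 0};
    assert(max_tries > 0);
    sim_write_word(&link, 42);
    for (int tries = 1; tries <= max_tries; tries++)
        if (sim_read_word(&link) == 42)
            return tries;
    return -1;
}
```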

Yaniv.



On Thu, Jul 4, 2013 at 9:42 PM, Katja Malvoni <kmalvoni@...il.com> wrote:

> Hello,
>
> I fixed bug in Epiphany bcrypt implementation and with Lukas's help, it is
> integrated in JtR - https://github.com/kmalvoni/JohnTheRipper/tree/master
> But there is still one problem. It passes the test when using usleep(20000)
> for waiting, but it doesn't pass when a busy wait is used. The busy wait
> solution works when there is only one load to Epiphany. If the code is in a
> loop, then the results aren't correct. In the case of JtR, the test fails on
> get_hash[0](1). The attached code is the bcrypt implementation outside JtR.
> It has the same problem. When there is only one test vector, it works. When
> it's in a loop (like in the attached code), it always returns the correct
> output for the first test vectors. It seems like the others are never sent
> to the device. But when the while loop is replaced with usleep(20000),
> everything works fine.
> Connected to that, Yaniv, are e_read() and e_write() synchronous?
> And what happens if parameter in e_load_group() is false? How to start
> execution on eCore, using e_start()?
>
> Thanks,
>
> Katja
>



-- 
===========================================================
Yaniv Sapir
Adapteva Inc.
1666 Massachusetts Ave, Suite 14
Lexington, MA 02420
Phone: (781)-328-0513 (x104)
Email: yaniv@...pteva.com
Web: www.adapteva.com
============================================================
CONFIDENTIALITY NOTICE: This e-mail may contain information that is
confidential and proprietary to Adapteva, and Adapteva hereby designates
the information in this e-mail as confidential. The information is intended
only for the use of the individual or entity named above. If you are not
the intended recipient, you are hereby notified that any disclosure,
copying, distribution or use of any of the information contained in this
transmission is strictly prohibited and that you should immediately destroy
this e-mail and its contents and notify Adapteva.
==============================================================
