Date: Sat, 9 Nov 2013 22:09:03 +0100
From: Katja Malvoni <kmalvoni@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: bcrypt-parallella on 64-core (was: Katja's weekly
 report #13)

Hi Alexander,

On Wed, Oct 30, 2013 at 4:28 PM, Solar Designer <solar@...nwall.com> wrote:

> On Wed, Oct 30, 2013 at 01:55:51PM +0100, Katja Malvoni wrote:
> > Another idea I have is to use local memory instead of external - with
> > this approach I would avoid external loads completely.
>
> Right, that's what you need to do in order to use all cores despite of
> external loads to some cores being broken.
>
> > But this approach failed
> > with E16 because stack would overwrite part of S-box in some cases. But
> > format would usually pass self test. I have that code somewhere so I'll
> > modify optimized code accordingly and try this approach, it might work
> > (at least for self test so that we can see performance).
>
> I think it should be fairly easy to avoid the stack overwrite issue.
> I think we didn't approach the full 32 KB usage closely enough for this
> to be a difficult problem.  We're using 8 KB for the S-boxes, some
> kilobytes more for code, and perhaps under 1 KB for other misc. data and
> under 1 KB for stack (unless you inadvertently(?) place something large
> on the stack).
>

I tried this and it "works": performance on E64 is 4691 c/s.
But if I run the self-test again, it fails on get_hash[0](1).

In local memory, this is how the expanded key should be stored (this is for
core (0, 0)):
[0x000049a8] = 0x552a5500
[0x000049ac] = 0x552a5500
[0x000049b0] = 0x552a5500
[0x000049b4] = 0x552a5500
[0x000049b8] = 0x552a5500
[0x000049bc] = 0x552a5500
[0x000049c0] = 0x552a5500
[0x000049c4] = 0x552a5500
[0x000049c8] = 0x552a5500
[0x000049cc] = 0x552a5500
[0x000049d0] = 0x552a5500
[0x000049d4] = 0x552a5500
[0x000049d8] = 0x552a5500
[0x000049dc] = 0x552a5500
[0x000049e0] = 0x552a5500
[0x000049e4] = 0x552a5500
[0x000049e8] = 0x552a5500
[0x000049ec] = 0x552a5500

But for core (0, 1) on the second run it looks like this:
[0x000049a8] = 0x552a552a
[0x000049ac] = 0x00552a55
[0x000049b0] = 0x2a00552a
[0x000049b4] = 0x552a0055
[0x000049b8] = 0x2a552a00
[0x000049bc] = 0x552a552a
[0x000049c0] = 0x00552a55
[0x000049c4] = 0x2a00552a
[0x000049c8] = 0x552a0055
[0x000049cc] = 0x2a552a00
[0x000049d0] = 0x552a552a
[0x000049d4] = 0x00552a55
[0x000049d8] = 0x2a00552a
[0x000049dc] = 0x552a0055
[0x000049e0] = 0x2a552a00
[0x000049e4] = 0x552a552a
[0x000049e8] = 0x00552a55
[0x000049ec] = 0x2a00552a
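
One way to sanity-check the two dumps is to regenerate them from candidate
keys. The sketch below is not the host or Epiphany code; it assumes the
expanded key is built the usual bcrypt way, by cycling the key bytes plus a
terminating NUL into 18 big-endian 32-bit words. The function name
expand_key and the candidate keys "U*U" and "U*U*" are assumptions chosen
because 0x55 and 0x2a are ASCII 'U' and '*'. Under those assumptions, "U*U"
reproduces the first dump and "U*U*" reproduces the second one, so the extra
zeros would be consistent with the expanded key of a different candidate
rather than with random corruption.

```python
def expand_key(key, words=18):
    """Cycle key bytes plus a NUL terminator into big-endian 32-bit words.

    Hypothetical helper for illustration; it mirrors the usual bcrypt
    key-cycling scheme, not the actual bcrypt-parallella source.
    """
    stream = key + b"\x00"      # the terminating NUL is part of the cycle
    out = []
    pos = 0
    for _ in range(words):
        word = 0
        for _ in range(4):      # pack 4 bytes, most significant first
            word = (word << 8) | stream[pos]
            pos = (pos + 1) % len(stream)
        out.append(word)
    return out


def dump(key, base=0x49a8):
    """Format the expanded key like the memory dumps in this message."""
    return ["[0x%08x] = 0x%08x" % (base + 4 * i, w)
            for i, w in enumerate(expand_key(key))]


if __name__ == "__main__":
    for candidate in (b"U*U", b"U*U*"):
        print(candidate)
        print("\n".join(dump(candidate)))
```

If this reading is right, comparing the key index used by each core on the
first and second run would be a reasonable next thing to check.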

If I change the host code to store it at another memory location, then again
it passes the self-test only once. The Epiphany code doesn't modify the
expanded key, so I don't see how these extra zeros end up at those memory
locations.
I tried this on the zed system (E16) and the same thing happens. With the same
code, performance on the E16 system is 1194 c/s because I'm transferring the
whole inputs structure regardless of whether the keys or salt changed. The
only change I made to run the code on E16 was to set EPIPHANY_CORES to 16.

Katja
