Date: Sat, 9 Nov 2013 22:09:03 +0100
From: Katja Malvoni <kmalvoni@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: bcrypt-parallella on 64-core (was: Katja's weekly report #13)

Hi Alexander,

On Wed, Oct 30, 2013 at 4:28 PM, Solar Designer <solar@...nwall.com> wrote:

> On Wed, Oct 30, 2013 at 01:55:51PM +0100, Katja Malvoni wrote:
> > Another idea I have is to use local memory instead of external - with this
> > approach I would avoid external loads completely.
>
> Right, that's what you need to do in order to use all cores despite of
> external loads to some cores being broken.
>
> > But this approach failed
> > with E16 because stack would overwrite part of S-box in some cases. But
> > format would usually pass self test. I have that code somewhere so I'll
> > modify optimized code accordingly and try this approach, it might work (at
> > least for self test so that we can see performance).
>
> I think it should be fairly easy to avoid the stack overwrite issue.
> I think we didn't approach the full 32 KB usage closely enough for this
> to be a difficult problem. We're using 8 KB for the S-boxes, some
> kilobytes more for code, and perhaps under 1 KB for other misc. data and
> under 1 KB for stack (unless you inadvertently(?) place something large
> on the stack).

I tried this and it "works". Performance on E64 is 4691 c/s. But if I run self test again, it fails on get_hash(1).

In local memory, this is how the expanded key should be stored (this is for core 0, 0):

[0x000049a8] = 0x552a5500
[0x000049ac] = 0x552a5500
[0x000049b0] = 0x552a5500
[0x000049b4] = 0x552a5500
[0x000049b8] = 0x552a5500
[0x000049bc] = 0x552a5500
[0x000049c0] = 0x552a5500
[0x000049c4] = 0x552a5500
[0x000049c8] = 0x552a5500
[0x000049cc] = 0x552a5500
[0x000049d0] = 0x552a5500
[0x000049d4] = 0x552a5500
[0x000049d8] = 0x552a5500
[0x000049dc] = 0x552a5500
[0x000049e0] = 0x552a5500
[0x000049e4] = 0x552a5500
[0x000049e8] = 0x552a5500
[0x000049ec] = 0x552a5500

But for core 0, 1 on the second run it looks like this:

[0x000049a8] = 0x552a552a
[0x000049ac] = 0x00552a55
[0x000049b0] = 0x2a00552a
[0x000049b4] = 0x552a0055
[0x000049b8] = 0x2a552a00
[0x000049bc] = 0x552a552a
[0x000049c0] = 0x00552a55
[0x000049c4] = 0x2a00552a
[0x000049c8] = 0x552a0055
[0x000049cc] = 0x2a552a00
[0x000049d0] = 0x552a552a
[0x000049d4] = 0x00552a55
[0x000049d8] = 0x2a00552a
[0x000049dc] = 0x552a0055
[0x000049e0] = 0x2a552a00
[0x000049e4] = 0x552a552a
[0x000049e8] = 0x00552a55
[0x000049ec] = 0x2a00552a

If I change the host code to store the expanded key at a different memory location, then again it passes the self test only once. The Epiphany code doesn't modify the expanded key, so I don't see how these extra zeros end up at those memory locations.

I tried this on the Zed system (E16) and the same thing happens. With the same code, performance on the E16 system is 1194 c/s because I'm transferring the whole inputs structure no matter whether keys or salt changed. The only change I made to run the code on E16 was to set EPIPHANY_CORES to 16.

Katja
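
A note on the two dumps above: read as big-endian bytes, the first is the sequence 55 2a 55 00 repeated, i.e. the key "U*U" followed by its terminating NUL, and the second is 55 2a 55 2a 00 repeated, which is what the key "U*U*" would give. The following is only a minimal sketch for checking that, assuming the expanded key is formed the way standard bcrypt key setup does it (key bytes cycled including the NUL, packed big-endian into 18 words). expand_key is a hypothetical helper name used for illustration, not a function from the JtR or Epiphany sources.

```python
def expand_key(key, words=18):
    """Pack key bytes cyclically, including the terminating NUL,
    into big-endian 32-bit words (assumed bcrypt-style key setup)."""
    data = key + b"\x00"          # the NUL terminator is part of the cycle
    out = []
    pos = 0
    for _ in range(words):
        w = 0
        for _ in range(4):
            w = ((w << 8) | data[pos]) & 0xFFFFFFFF
            pos = (pos + 1) % len(data)
        out.append(w)
    return out


if __name__ == "__main__":
    # Print the 18 words in the same layout as the dumps, using the
    # base address 0x49a8 taken from the dumps above.
    for key in (b"U*U", b"U*U*"):
        print("key:", key)
        for i, w in enumerate(expand_key(key)):
            print("[0x%08x] = 0x%08x" % (0x49a8 + 4 * i, w))
```

If that assumption holds, the contents seen on core 0, 1 would be a well-formed expanded key for a different candidate rather than stray zeros, so it may be worth checking which key each core receives on the second run before suspecting the stack.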