Date: Thu, 27 Jun 2013 16:54:31 +0200
From: Katja Malvoni <kmalvoni@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt

Hi Alexander,

On Thu, Jun 27, 2013 at 4:40 PM, Solar Designer <solar@...nwall.com> wrote:

> Katja,
>
> On Mon, Jun 24, 2013 at 04:54:45PM +0200, Katja Malvoni wrote:
> > On Tue, May 28, 2013 at 1:58 AM, Solar Designer <solar@...nwall.com> wrote:
> > > On Sun, May 26, 2013 at 07:37:55PM -0400, Yaniv Sapir wrote:
> > > > -mfp-mode=int        # this sets the FPU mode to integer. However, please
> > > > make sure that the generated code does not re-program the CONFIG register
> > > > before every integer operation
> > >
> > > Let's definitely try this.  I was afraid we'd have to resort to assembly
> > > code to use the FPU in integer mode - it's great news to me that we seem
> > > not to have to.
> >
> > Unfortunately, this doesn't help a lot... Execution time with -O2 is
> > 45.969000 ms and with -mfp-mode=int is 45.951000 ms. I checked the generated
> > assembly code, and it seems that the CONFIG register isn't re-programmed
> > before every integer operation.
>
> ... but are there uses of the IADD instruction (the one implemented on
> the FPU) at all, or only plain ADD (the one implemented on IALU)?
>

In the whole disassembly, only ADD is used.

> Can you show us a piece of disassembly - e.g., for one Blowfish round?

Here it is:
00000234 <_BF_encrypt>:
 234:    d54c 4400     ldr r22,[sp,+0x2]
 238:    a01b 4009     add r21,r0,72
 23c:    1feb 4002     mov r16,0xff
 240:    20ef 4002     mov r17,r0
 244:    854c 2a00     ldr r12,[r17],+0x2
 248:    860f 208a     eor r12,r1,r12
 24c:    920f 4406     lsr r20,r12,0x10
 250:    510f 4406     lsr r18,r12,0x8
 254:    330f 0406     lsr r1,r12,0x18
 258:    905f 490a     and r20,r20,r16
 25c:    485f 490a     and r18,r18,r16
 260:    911b 4822     add r20,r20,274
 264:    251b 0002     add r1,r1,18
 268:    705f 450a     and r19,r12,r16
 26c:    905f 4806     lsl r20,r20,0x2
 270:    2456          lsl r1,r1,0x2
 272:    491b 4842     add r18,r18,530
 276:    485f 4806     lsl r18,r18,0x2
 27a:    6d1b 4862     add r19,r19,786
 27e:    8249 4100     ldr r20,[r0,+r20]
 282:    20c1          ldr r1,[r0,r1]
 284:    6c5f 4806     lsl r19,r19,0x2
 288:    4149 4100     ldr r18,[r0,+r18]
 28c:    309f 080a     add r1,r20,r1
 290:    61c9 4100     ldr r19,[r0,+r19]
 294:    250f 010a     eor r1,r1,r18
 298:    44cc 4900     ldr r18,[r17,-0x1]
 29c:    259f 010a     add r1,r1,r19
 2a0:    250f 010a     eor r1,r1,r18
 2a4:    488a          eor r2,r2,r1
 2a6:    6a0f 4006     lsr r19,r2,0x10
 2aa:    490f 4006     lsr r18,r2,0x8
 2ae:    6c5f 490a     and r19,r19,r16
 2b2:    2b06          lsr r1,r2,0x18
 2b4:    485f 490a     and r18,r18,r16
 2b8:    6d1b 4822     add r19,r19,274
 2bc:    251b 0002     add r1,r1,18
 2c0:    6c5f 4806     lsl r19,r19,0x2
 2c4:    2456          lsl r1,r1,0x2
 2c6:    491b 4842     add r18,r18,530
 2ca:    485f 4806     lsl r18,r18,0x2
 2ce:    61c9 4100     ldr r19,[r0,+r19]
 2d2:    20c1          ldr r1,[r0,r1]
 2d4:    4149 4100     ldr r18,[r0,+r18]
 2d8:    2c9f 080a     add r1,r19,r1
 2dc:    250f 010a     eor r1,r1,r18
 2e0:    485f 410a     and r18,r2,r16
 2e4:    491b 4862     add r18,r18,786
 2e8:    485f 4806     lsl r18,r18,0x2
 2ec:    6149 4100     ldr r19,[r0,+r18]
 2f0:    454c 4a00     ldr r18,[r17],+0x2
 2f4:    259f 010a     add r1,r1,r19
 2f8:    250f 010a     eor r1,r1,r18
 2fc:    908f 240a     eor r12,r12,r1
 300:    26bf 090a     sub r1,r17,r21
 304:    a410          bne 24c <_BF_encrypt+0x18>
 306:    20cc 0002     ldr r1,[r0,+0x11]
 30a:    8cdc 2000     str r12,[r3,+0x1]
 30e:    288a          eor r1,r2,r1
 310:    2c54          str r1,[r3]
 312:    6c1b 0001     add r3,r3,8
 316:    59bf 080a     sub r2,r22,r3
 31a:    50ef 0402     mov r2,r12
 31e:    9120          bgtu 240 <_BF_encrypt+0xc>
 320:    04e2          mov r0,r1
 322:    194f 0402     rts
 326:    01a2          nop
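
For readers following the listing: the loop between 0x24c and 0x304 appears to cover two Blowfish rounds, and the constants 18, 274, 530 and 786 look like word offsets of the four S-boxes placed right after an 18-word P-array. A minimal C sketch of that structure, assuming this layout, is below. The names bf_ctx_sketch, bf_f and bf_encrypt_sketch are made up for illustration and are not taken from the JtR source. The sketch encrypts a single block, whereas the outer loop in the listing (the branch back to 0x240) keeps chaining blocks into the output buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout, inferred from the offsets in the listing:
 * P-array at word 0, S-boxes at words 18, 274, 530 and 786. */
typedef struct {
    uint32_t P[18];
    uint32_t S[4][256];
} bf_ctx_sketch;

_Static_assert(offsetof(bf_ctx_sketch, S) == 18 * sizeof(uint32_t),
               "S-boxes are expected to start at word 18");

/* Blowfish F function: four S-box lookups combined with add, xor, add.
 * All additions are 32-bit and wrap, like the plain ADD in the listing. */
static uint32_t bf_f(const bf_ctx_sketch *ctx, uint32_t x)
{
    uint32_t a = ctx->S[0][x >> 24];
    uint32_t b = ctx->S[1][(x >> 16) & 0xff];
    uint32_t c = ctx->S[2][(x >> 8) & 0xff];
    uint32_t d = ctx->S[3][x & 0xff];
    return ((a + b) ^ c) + d;
}

/* Encrypt one 64-bit block held in *L and *R.
 * Each loop iteration does two rounds, so eight iterations give all 16. */
static void bf_encrypt_sketch(const bf_ctx_sketch *ctx,
                              uint32_t *L, uint32_t *R)
{
    uint32_t l = *L ^ ctx->P[0];
    uint32_t r = *R;

    for (int i = 1; i <= 15; i += 2) {
        r ^= bf_f(ctx, l) ^ ctx->P[i];
        l ^= bf_f(ctx, r) ^ ctx->P[i + 1];
    }

    /* Final swap and last subkey, as in the code after the inner loop. */
    *L = r ^ ctx->P[17];
    *R = l;
}
```

In this form every round has two 32-bit additions in F, next to the index arithmetic for the four loads. Those two additions are the natural candidates for IADD if the compiler can be persuaded to issue it.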


Execution time when using all 16 cores is 294.676000 ms

Katja
