Date: Tue, 3 Sep 2013 00:57:56 +0100
From: Rafael Waldo Delgado Doblas <>
Subject: Rafael's weekly report #12


1. Debugged epiphany-scrypt and driver-epiphany.
2. Replaced memcpy, which improved performance slightly.
3. Started writing inline asm.

1. Continue with the asm implementation.

Well, I started with inline asm, but the results weren't better than C.
This is because the C compiler optimizes accesses to the Bout vector by
keeping its elements in general-purpose registers, so each use of the R
macro compiles to:

7b0:   879f e10a       add r60,r1,r23
7b4:   b32f fc06       lsr r61,r60,0x19
7b8:   90ff fc06       lsl r60,r60,0x7
7bc:   967f ff8a       orr r60,r61,r60
7c0:   4a0f 6f8a       eor r26,r26,r60
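
For reference, those five instructions match one Salsa20 step of the form
x ^= R(a + b, 7). A minimal C sketch, assuming the usual 32-bit rotate-left
definition of R (salsa_step7 is a hypothetical name, not taken from
epiphany-scrypt):

```c
#include <stdint.h>

/* 32-bit rotate-left, assumed to match the R macro in the report. */
#define R(a, b) (((a) << (b)) | ((a) >> (32 - (b))))

/* One step: add two words, rotate left by 7, xor into the target. */
uint32_t salsa_step7(uint32_t target, uint32_t x, uint32_t y)
{
    uint32_t sum = x + y;       /* add            */
    return target ^ R(sum, 7);  /* lsr 25, lsl 7, orr, then eor */
}
```

The rotate alone costs three instructions (lsr, lsl, orr), which is the part
the inline asm version tries to shorten.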

However, when I define the R function with inline asm:
    __asm__("LSR r60, %1, %3\n\tIMADD r60, %1, %4\n\t EOR %0, %1, r60" \
        : "=r"(b)\
        : "r"(a), "r"(b), "n"(c), "r"(d) \
        : "memory", "%r60" \

This optimization is not performed, which adds 2 more instructions:

788:       4a4c c001       ldr r50,[r2,+0xc]
78c:       284c c000       ldr r49,[r2,+0x0]
790:       289f db0a       add r49,r50,r49
794:       84ef f806       lsr r60,r49,0x7
798:       863f f887       fmadd r60,r49,r12
79c:       260f db8a       eor r49,r49,r60
7a0:       2a5c c000       str r49,[r2,+0x4]
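
One possible factor, not confirmed in this report: a "memory" clobber tells
GCC that the asm may touch arbitrary memory, so values cached in registers
are written back before the statement and reloaded after it. The sketch
below only contrasts the two operand lists; the empty template is a
placeholder for the Epiphany instructions, and both function names are
hypothetical:

```c
#include <stdint.h>

/* Register operands only: the compiler remains free to keep nearby
 * array elements in registers across the asm statement. */
uint32_t step_regs_only(uint32_t b, uint32_t a)
{
    uint32_t t = a;
    __asm__("" : "+r"(t));              /* "+r": t is read and written */
    return b ^ t;
}

/* Same operands plus a "memory" clobber: the compiler must assume
 * memory changed, so cached values are stored and reloaded. */
uint32_t step_with_memory_clobber(uint32_t b, uint32_t a)
{
    uint32_t t = a;
    __asm__("" : "+r"(t) : : "memory");
    return b ^ t;
}
```

Both functions return the same value; any difference would show up only in
the code generated around the call sites.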

BTW, fmadd/imadd cannot work with immediate values, so we will need to add
4 extra movs in order to store the multiplication constants in registers.
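
The LSR + IMADD pair appears to compute the rotate as a shift plus a
multiply-accumulate: rotl(x, k) = (x >> (32 - k)) + x * 2^k mod 2^32. A
sketch in C, assuming IMADD accumulates rd += rn * rm (function names are
hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Conventional rotate-left: two shifts and an or. */
uint32_t rotl_shift_or(uint32_t x, unsigned k)
{
    assert(k > 0 && k < 32);
    return (x << k) | (x >> (32 - k));
}

/* Rotate-left as a shift plus a multiply-accumulate. The shifted-out
 * high bits and x * 2^k share no set bits, so adding equals or-ing. */
uint32_t rotl_shift_madd(uint32_t x, unsigned k)
{
    assert(k > 0 && k < 32);
    uint32_t acc = x >> (32 - k);     /* LSR                          */
    uint32_t mul = (uint32_t)1 << k;  /* constant held in a register  */
    acc += x * mul;                   /* multiply-accumulate, mod 2^32 */
    return acc;
}
```

Salsa20 rotates by 7, 9, 13 and 18, so the multipliers would be 1<<7, 1<<9,
1<<13 and 1<<18, which would account for the 4 movs.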

I will write a new ASM file with the salsa20_8 implementation, using the
optimized C version as a base.



