Date: Sat, 23 May 2015 19:19:28 +0300
From: Solar Designer <>
Subject: Re: interleaving in SHA256 & SHA512

On Sat, May 23, 2015 at 02:27:47PM +0300, Aleksey Cherepanov wrote:
> I count instructions and bytes of code with the following 2 commands:
> objdump -d JohnTheRipper/src/rawSHA512_my_fmt_plug.o | sed -ne '/<crypt_all>/,/^$/ p' > asm && wc -l asm
> perl -pe 's/[^\t]*\t//; s/\t.*//' asm | tail -n +2 | perl -pe 's/\s+//g' | perl -lne 'print(length($_) / 2, " bytes of code")'

For code size, you may want to keep the relevant function in a separate
source file, producing a separate .o file, and simply use the size(1)
command on the .o file.  In fact, simply try:

size rawSHA512_my_fmt_plug.o

and see how its "text" column compares to the output of your Perl
pipeline above.
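
As a rough sketch of how to pull just the code size out of that output
(the helper name text_bytes is made up for illustration, and this
assumes the default Berkeley output format, where the first column of
the second line is "text"):

```shell
# text_bytes: read size(1) output in Berkeley format on stdin and print
# the value of the "text" column.
# Hypothetical helper for illustration, not part of John the Ripper.
text_bytes() {
    # Line 1 is the header (text data bss dec hex filename),
    # line 2 holds the numbers for the first object file.
    awk 'NR == 2 { print $1 }'
}

# Usage (assumes the .o file exists in the current directory):
#   size rawSHA512_my_fmt_plug.o | text_bytes
```

As far as I know, GNU size in this format also counts read-only data
in the "text" column, so treat the number as an upper bound on the code
size rather than an exact figure.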

> It's on core i7 950, with 64kb L1 cache. So there should be only 32kb
> of cache for code.

Yes, modern Intel CPUs have 32 KB of L1 cache for code and 32 KB for
data per core.  Both are shared between the 2 threads running on a core,
so you should target up to 16 KB for code and 16 KB for data.
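
The arithmetic behind that target can be sketched as below.  The names
L1I_BUDGET and fits_budget are invented for this example, and the
32 KB and 2-thread figures are simply the ones quoted above:

```shell
# Per-thread L1 instruction cache budget, in bytes:
# 32 KB of L1 code cache divided between 2 threads on a core.
L1I_BUDGET=$((32 * 1024 / 2))

# fits_budget BYTES: succeed (exit status 0) if BYTES of code fit
# within the per-thread budget, fail otherwise.
# Hypothetical helper for illustration.
fits_budget() {
    [ "$1" -le "$L1I_BUDGET" ]
}

# Usage, with a code size you measured yourself:
#   fits_budget 12000 && echo "fits" || echo "too large"
```

This only compares one number against the budget.  It says nothing
about other code that competes for the same cache while your loop runs.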

As I told you via jabber, it is also possible to execute unrolled code
at full speed out of L2 cache if you're very careful about instruction
size - but you won't achieve that with gcc.  For Haswell, you need to
stay at <= 16 bytes per 3 instructions, which is doable with careful
choice of registers+offsets for the "memory" operands (actually using
them as your extended virtual register file, giving up to 80 "registers").
I didn't test this on older CPUs.  It might or might not be similar on
Sandy/Ivy Bridge.  I think this approach only makes much sense for
bitslicing, and we should in fact explore it a bit later.
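
To get a first idea of how far a given function is from that limit, one
option is to compute the average instruction length from objdump -d
output.  The sketch below is only that: insn_stats is a made-up helper,
it assumes the usual tab-separated objdump layout (address, hex bytes,
mnemonic), and it treats "16 bytes per 3 instructions" as an average
over the whole function, which is weaker than the actual requirement.

```shell
# insn_stats: read "objdump -d" output for one function on stdin and
# print "<instructions> <bytes> <ok|too-long>".
# Hypothetical helper; "ok" means the AVERAGE is <= 16 bytes per
# 3 instructions, which is an assumption made for this sketch.
insn_stats() {
    awk -F'\t' '
        # Disassembly lines start with an address field like "  1a:".
        NF >= 2 && $1 ~ /^ *[0-9a-f]+:$/ {
            # Field 2 holds the hex bytes, one token per byte.
            bytes += split($2, b, " ")
            # Continuation lines of long instructions have no mnemonic,
            # so they add bytes but not instructions.
            if (NF >= 3 && $3 != "") insns++
        }
        END {
            if (insns == 0)
                print "0 0 n/a"
            else
                printf "%d %d %s\n", insns, bytes,
                    (3 * bytes <= 16 * insns ? "ok" : "too-long")
        }'
}

# Usage, reusing the sed range from the quoted command:
#   objdump -d rawSHA512_my_fmt_plug.o |
#       sed -ne '/<crypt_all>/,/^$/ p' | insn_stats
```

Unlike wc -l, this does not count the continuation lines that objdump
prints for long instructions as separate instructions.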

