Date: Sun, 21 Jul 2013 15:40:30 +0200
From: Katja Malvoni <kmalvoni@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt

Hi Alexander,

Since the Epiphany code became much smaller when I integrated it with JtR, I tried using internal.ldf instead of fast.ldf, and the code now fits in local memory. Speed is 932 c/s (sometimes 934 c/s).

The BF_fmt port was slow because I didn't copy the salt into local memory; once I did that, speed was 790 c/s. Then I used internal.ldf and got a speed of 932 c/s.

If I try to interleave two instances of bcrypt, the code no longer fits in local memory. At the moment, interleaving two instances of bcrypt doesn't work - it fails self-test on get_hash(1). Should I pursue this approach further or not?

On Sun, Jul 21, 2013 at 2:25 AM, Solar Designer <solar@...nwall.com> wrote:
> Yes, I also think that copying within local memory is faster. However,
> we may want to optimize the memcpy(). The existing implementation is
> too generic - it doesn't use the 64-bit dual-register load/store
> instructions, it has little unrolling, and it includes support for sizes
> that are not a multiple of 4. (This is from a quick glance at
> "e-objdump -d parallella_e_bcrypt.elf".) Can you try creating a simpler
> specialized implementation instead, which would use the ldrd/strd insns
> and greater unrolling (e.g., 32 ldrd/strd pairs times 16 loop iterations,
> for a total of 512 x 64-bit data copies, or 4096 bytes total)? Also,
> re-order the instructions such that the store is not attempted
> immediately after its corresponding load, to hide its latency - e.g.:
> [...]

Ok, I'll try this.

Katja
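As a rough illustration of the specialized copy being discussed - 512 x 64-bit copies for a 4096-byte block, with unrolling and with loads scheduled ahead of their matching stores - here is a minimal portable C sketch. The function name `copy4096` and the unroll factor of 8 are assumptions for illustration only; the real gain on Epiphany would come from the compiler (or hand-written assembly) emitting ldrd/strd pairs for the 64-bit accesses, which plain C cannot guarantee.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of a fixed-size copy: 512 x 64-bit elements
 * (4096 bytes total), unrolled by 8. All eight loads are issued
 * before any of the corresponding stores, so no store immediately
 * follows its own load - mirroring the latency-hiding reordering
 * suggested in the quoted message. Assumes dst and src are 8-byte
 * aligned and do not overlap. */
static void copy4096(uint64_t *dst, const uint64_t *src)
{
    for (int i = 0; i < 512; i += 8) {
        uint64_t a = src[i + 0], b = src[i + 1];
        uint64_t c = src[i + 2], d = src[i + 3];
        uint64_t e = src[i + 4], f = src[i + 5];
        uint64_t g = src[i + 6], h = src[i + 7];
        dst[i + 0] = a; dst[i + 1] = b;
        dst[i + 2] = c; dst[i + 3] = d;
        dst[i + 4] = e; dst[i + 5] = f;
        dst[i + 6] = g; dst[i + 7] = h;
    }
}
```

A generic memcpy() must handle arbitrary lengths and alignments; fixing the size at compile time removes those branches and lets the whole loop body become straight-line dual-word transfers.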