Date: Mon, 1 Oct 2012 10:56:33 +0400
From: Solar Designer <solar@...nwall.com>
To: Colin Percival <cperciva@...snap.com>
Cc: crypt-dev@...ts.openwall.com
Subject: Re: using scrypt for user authentication

On Sun, Sep 30, 2012 at 03:25:16PM +0400, Solar Designer wrote:
> Trying to see how much further we can go by replacing Salsa20/8 with
> something else (or simply reducing the number of rounds), I tried
> placing a "return 0;" at the start of salsa20_8(). With this, I am able
> to fit 64 MB in 100 ms. Not that much of a change. :-(

Correction: once I've excluded some overhead and system time from my
measurements, I am able to go to 256 MB with salsa20_8() NOP'ed out, yet
stay under 100 ms. I am not sure if avoiding the system time would be
possible in actual use, though - I just presume that it might be.

This is (2^18, 8, 1) with salsa20_8() NOP'ed out on E5649 (2.53 GHz):

real    0m0.240s
user    0m0.080s
sys     0m0.159s

At 2 rounds of Salsa20 (instead of 8), I am still only able to go for
64 MB:

(2^16, 8, 1), 2 rounds, same CPU:

real    0m0.124s
user    0m0.087s
sys     0m0.037s

(2^15, 8, 1), 8 rounds (default), same CPU:

real    0m0.121s
user    0m0.100s
sys     0m0.020s

I also did some testing with other values of r, with higher p, with
"#pragma omp parallel for" for the p loop, and with concurrent instances
(without parallelization within the instances) - for up to 24 (this
machine has two E5649's, so 12 cores, 24 logical). As expected,
scalability is quite poor, except at relatively low working set sizes
(such as fitting in the 12 MB L3 cache). I don't have time to properly
process and post all of the results right now.

I was compiling the SSE2 intrinsics code for x86-64 using gcc 4.6.2 with
"-march=native -O3 -fomit-frame-pointer".

Alexander
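For reference, the memory sizes quoted above follow from the scrypt (N, r, p) parameters: the main lookup array V holds N blocks of 128 * r bytes each. The sketch below just does that arithmetic for the three parameter sets in this message. The function name scrypt_v_bytes is made up for illustration, and the much smaller 128 * r * p byte input/output buffer is ignored, so treat the figures as approximate working set sizes rather than exact allocations.

```python
def scrypt_v_bytes(n: int, r: int) -> int:
    """Approximate size in bytes of scrypt's V array: N blocks of 128*r bytes.

    Hypothetical helper for illustration only. It ignores the small
    128*r*p byte B buffer and any allocator or alignment overhead.
    """
    return 128 * r * n


# Parameter sets (N, r, p) mentioned in the message.
for log2_n, r, p in [(18, 8, 1), (16, 8, 1), (15, 8, 1)]:
    mib = scrypt_v_bytes(2 ** log2_n, r) // (1024 * 1024)
    print(f"(2^{log2_n}, {r}, {p}): {mib} MiB")
```

With r = 8 each block is 1 KiB, so the array size in KiB equals N: 2^18 gives 256 MiB and 2^16 gives 64 MiB, matching the figures in the text, and the default-rounds run at 2^15 corresponds to 32 MiB.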