Date: Sat, 5 Sep 2015 05:47:18 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: MD5 on XOP, NEON, AltiVec

On Sat, Sep 05, 2015 at 05:25:16AM +0300, Solar Designer wrote:
> Here's what we had last year:
>
> Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 XOP 8x]... (8xOMP) DONE
> Raw:	201472 c/s real, 25152 c/s virtual
>
> Here's what we have now:
>
> Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 XOP 4x2]... (8xOMP) DONE
> Raw:	150272 c/s real, 18784 c/s virtual
>
> I tried looking at "objdump -d sse-intrinsics.o" in the old build vs.
> "objdump -d simd-intrinsics.o" in the current version, and I don't see
> any obvious problem.  Moreover, raw-md5 hasn't regressed, and I think
> both it and md5crypt share the SIMDmd5body() function.  At this point,
> my best guess is we might be getting unaligned buffers.

Guess not confirmed.  We use buffers on the stack, and they are properly
aligned for 128-bit SIMD.  This is unreliable for AVX2 and above, though.

Disabling the "#if __SSE4_1__ || __MIC__" block in SIMDmd5body()
improves performance slightly:

Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 XOP 4x2]... (8xOMP) DONE
Raw:	156160 c/s real, 19520 c/s virtual

Perhaps there are other changes like this causing regressions as well.
We'll need to bisect the changes.  magnum, will you do that?

> Once we figure this out and fix it, we'll need to revise MD5_I in
> simd-intrinsics.c to use my newly found expression with vcmov() on XOP,
> and the obvious expression with OR-NOT on NEON and AltiVec (IIRC, those
> archs have OR-NOT, which might be lower latency than select).

Alexander
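
For reference, the kind of alignment check discussed above (stack buffers being fine for 128-bit SIMD but not necessarily for 256-bit vectors) could be sketched roughly as follows.  This is a minimal illustration and not code from the JtR tree: is_aligned() and stack_buffer_check() are hypothetical names, and the sketch assumes a C11 compiler that honors alignas(32) on automatic variables.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from JtR): returns 1 if p is a multiple of
 * `bytes`, where `bytes` must be a nonzero power of two. */
static int is_aligned(const void *p, size_t bytes)
{
    assert(bytes != 0 && (bytes & (bytes - 1)) == 0);
    return ((uintptr_t)p & (bytes - 1)) == 0;
}

/* Hypothetical check: declare a stack buffer with an explicit 32-byte
 * alignment request and report whether it satisfies both the 16-byte
 * (128-bit SIMD) and the 32-byte (256-bit SIMD) requirement.
 * Assumption: the compiler supports over-aligned automatic variables. */
static int stack_buffer_check(void)
{
    alignas(32) unsigned char buf[64];

    return is_aligned(buf, 16) && is_aligned(buf, 32);
}
```

Without the explicit alignas(32), a buffer that happens to pass the 16-byte test may or may not pass the 32-byte one, so calling is_aligned(buf, 32) on the actual buffers is one way to test the guess for wider vectors.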
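
To illustrate the MD5_I point in the quoted text, here is a scalar sketch comparing the OR-NOT form of MD5's I function (as defined in RFC 1321) with one possible select-based rewrite.  The select form shown is only an equivalent expression I am assuming for illustration; it is not necessarily the expression the message refers to.  bitselect() is a hypothetical scalar stand-in for a vector select, and its argument order is an assumption, not that of vcmov() in JtR.

```c
#include <assert.h>
#include <stdint.h>

/* OR-NOT form from RFC 1321: I(x, y, z) = y ^ (x | ~z). */
static uint32_t md5_i_ornot(uint32_t x, uint32_t y, uint32_t z)
{
    return y ^ (x | ~z);
}

/* Hypothetical scalar model of a bitwise select: result takes bits of
 * `a` where `mask` is 1 and bits of `b` where `mask` is 0.  The argument
 * order is an assumption made for this sketch. */
static uint32_t bitselect(uint32_t a, uint32_t b, uint32_t mask)
{
    return (a & mask) | (b & ~mask);
}

/* Select-based form, using x | ~z == select(x, all-ones, z). */
static uint32_t md5_i_select(uint32_t x, uint32_t y, uint32_t z)
{
    return y ^ bitselect(x, 0xffffffffu, z);
}

/* Both forms are purely bitwise, so inputs whose bit columns cover all
 * eight (x, y, z) combinations are enough to compare them. */
static int md5_i_forms_agree(void)
{
    const uint32_t x = 0xf0f0f0f0u, y = 0xccccccccu, z = 0xaaaaaaaau;

    return md5_i_ornot(x, y, z) == md5_i_select(x, y, z);
}
```

Whether the select form or the OR-NOT form is faster depends on the instruction latencies of the target, which this scalar sketch does not model.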