Date: Sat, 5 Sep 2015 05:47:18 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: MD5 on XOP, NEON, AltiVec

On Sat, Sep 05, 2015 at 05:25:16AM +0300, Solar Designer wrote:
> Here's what we had last year:
> 
> Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 XOP 8x]... (8xOMP) DONE
> Raw:    201472 c/s real, 25152 c/s virtual
> 
> Here's what we have now:
> 
> Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 XOP 4x2]... (8xOMP) DONE
> Raw:    150272 c/s real, 18784 c/s virtual
> 
> I tried looking at "objdump -d sse-intrinsics.o" in the old build vs.
> "objdump -d simd-intrinsics.o" in the current version, and I don't see
> any obvious problem.  Moreover, raw-md5 hasn't regressed, and I think
> both it and md5crypt share the SIMDmd5body() function.  At this point,
> my best guess is we might be getting unaligned buffers.

That guess is not confirmed.  We use buffers on the stack, and they are
properly aligned for 128-bit SIMD.  Relying on stack alignment like this
is unreliable for AVX2 and above, though.
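
For reference, an alignment check of this kind can be as simple as the
following.  This is only a minimal sketch: is_aligned() is a hypothetical
helper named here for illustration, not a function from the tree, and it
assumes the alignment is a power of two (16 for 128-bit SIMD, 32 for AVX2).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if p is aligned to `align` bytes,
 * 0 otherwise.  `align` is assumed to be a power of two.
 */
static int is_aligned(const void *p, size_t align)
{
	return ((uintptr_t)p & (align - 1)) == 0;
}
```

A buffer could then be checked with is_aligned(buf, 16) before it is
handed to the SIMD code, for example under a debug-only assertion.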

Disabling the "#if __SSE4_1__ || __MIC__" block in SIMDmd5body()
improves performance slightly:

Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 XOP 4x2]... (8xOMP) DONE
Raw:    156160 c/s real, 19520 c/s virtual

Perhaps there are other changes like this causing regressions as well.
We'll need to bisect the changes.  magnum, will you do that?

> Once we figure this out and fix it, we'll need to revise MD5_I in
> simd-intrinsics.c to use my newly found expression with vcmov() on XOP,
> and the obvious expression with OR-NOT on NEON and AltiVec (IIRC, those
> archs have OR-NOT, which might be lower latency than select).
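
To illustrate the two styles of expression for MD5's I function, here is
a scalar sketch.  The OR-NOT form is the definition from RFC 1321.  The
select-based form is just one identity that happens to hold; whether it
matches the vcmov() expression mentioned above is not stated here.  The
names md5_i_ornot(), md5_i_select() and bitselect() are made up for this
example, and bitselect() assumes the usual bitwise-select semantics of
(a & mask) | (b & ~mask).

```c
#include <stdint.h>

/* RFC 1321 definition: I(x, y, z) = y ^ (x | ~z), i.e. XOR with an OR-NOT */
static uint32_t md5_i_ornot(uint32_t x, uint32_t y, uint32_t z)
{
	return y ^ (x | ~z);
}

/*
 * Assumed select semantics: take bits of a where mask is 1,
 * and bits of b where mask is 0.
 */
static uint32_t bitselect(uint32_t a, uint32_t b, uint32_t mask)
{
	return (a & mask) | (b & ~mask);
}

/*
 * Select-based form: selecting between x and all-ones under mask z
 * gives (x & z) | ~z, which equals x | ~z bit for bit.
 */
static uint32_t md5_i_select(uint32_t x, uint32_t y, uint32_t z)
{
	return y ^ bitselect(x, 0xffffffffU, z);
}
```

Both forms cost two operations per vector once the all-ones constant is
kept in a register, so which one is faster would come down to the latency
of select vs. OR-NOT on the target.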

Alexander
