Date: Sat, 5 Sep 2015 05:25:16 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: MD5 on XOP, NEON, AltiVec

magnum, Lei -

Here's what we had last year:

Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 XOP 8x]... (8xOMP) DONE
Raw:    201472 c/s real, 25152 c/s virtual

Here's what we have now:

Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 XOP 4x2]... (8xOMP) DONE
Raw:    150272 c/s real, 18784 c/s virtual

I tried looking at "objdump -d sse-intrinsics.o" in the old build vs.
"objdump -d simd-intrinsics.o" in the current version, and I don't see
any obvious problem.  Moreover, raw-md5 hasn't regressed, and I think
both it and md5crypt share the SIMDmd5body() function.  At this point,
my best guess is we might be getting unaligned buffers.
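One way to test the alignment guess is to check the buffer pointers at run time. The following is a minimal sketch, not code from the tree: the helper name is_aligned_to() is hypothetical, and the 16-byte figure assumes 128-bit vectors as in the "128/128" benchmark labels above.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of John the Ripper): returns nonzero if
 * pointer p is a multiple of align.  align must be a power of two,
 * e.g. 16 for 128-bit SIMD loads and stores.
 */
static int is_aligned_to(const void *p, size_t align)
{
	return ((uintptr_t)p & (uintptr_t)(align - 1)) == 0;
}
```

Calling is_aligned_to(ptr, 16) on the input and output pointers just before the SIMD code runs, and logging any failure, would show whether the md5crypt path is handing over misaligned buffers.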

Once we figure this out and fix it, we'll need to revise MD5_I in
simd-intrinsics.c to use my newly found expression with vcmov() on XOP,
and the obvious expression with OR-NOT on NEON and AltiVec (IIRC, those
archs have OR-NOT, which might be lower latency than select).
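For reference, the MD5 I function is defined in RFC 1321 as y ^ (x | ~z). The scalar sketch below shows the OR-NOT form next to one possible form built on a bitwise select. The function names are hypothetical, sel() only models what a vcmov()-style select computes, and the select form is an illustration that is not necessarily the newly found expression mentioned above.

```c
#include <stdint.h>

/* Bitwise select: takes bits from a where c is 1, from b where c is 0. */
static uint32_t sel(uint32_t a, uint32_t b, uint32_t c)
{
	return (a & c) | (b & ~c);
}

/* MD5 I per RFC 1321, written with OR-NOT: y ^ (x | ~z). */
static uint32_t md5_i_ornot(uint32_t x, uint32_t y, uint32_t z)
{
	return y ^ (x | ~z);
}

/*
 * The same function via select (illustrative assumption):
 * sel(x, all-ones, z) = (x & z) | ~z = x | ~z, so XOR with y gives I.
 */
static uint32_t md5_i_select(uint32_t x, uint32_t y, uint32_t z)
{
	return y ^ sel(x, 0xffffffffU, z);
}
```

Both forms take two operations per lane once the all-ones constant is in a register, so which one is faster on a given architecture depends on the latency of its OR-NOT and select instructions.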

Alexander
