Date: Wed, 9 Sep 2015 23:43:41 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: SHA-1 H()

On Sep 8, 2015, at 4:47 PM, Solar Designer <solar@...nwall.com> wrote:
> 
> Lei,
> 
> On Tue, Sep 08, 2015 at 03:04:57PM +0800, Lei Zhang wrote:
>> On Sep 2, 2015, at 11:20 PM, Solar Designer <solar@...nwall.com> wrote:
>>> 
>>> Lei, will you test/benchmark on NEON and AltiVec once magnum commits the
>>> fixes, please?
>> 
>> On AltiVec (4xOMP):
> 
> Is this 4 threads likely across different CPU cores?

I think so. The benchmark results fluctuated too much when I used the maximum number of hardware threads, so I switched to a smaller number of threads, without binding them to specific cores.


> What we need for benchmarking is the maximum number of threads supported
> in hardware on a certain number of CPU cores (on 1 core is OK if you
> can't reliably use the entire machine's cores).  So on POWER8 I guess
> you'll run 8 threads all locked to one physical CPU core.  You should be
> able to do that with OpenMP env vars (affinity).

I'll post the updated results later.


>> On NEON (2xOMP):
>> 
>> [before]
>> pbkdf2-sha1:	578 c/s real, 289 c/s virtual
>> pbkdf2-sha256:	276 c/s real, 138 c/s virtual
>> pbkdf2-sha512:	125 c/s real, 62.7 c/s virtual
>> 
>> [after]
>> pbkdf2-sha1:	501 c/s real, 250 c/s virtual
>> pbkdf2-sha256:	276 c/s real, 138 c/s virtual
>> pbkdf2-sha512:	125 c/s real, 62.7 c/s virtual
>> 
>> There's no significant change on Altivec,
> 
> OK, but you need to run 8 threads/core benchmarks.

Why? Our ZedBoard has only two cores.


>> while SHA1 somehow gets slower on NEON.
> 
> It might need higher interleaving factor now.  You haven't even tried
> introducing interleaving for these archs, have you?  (I don't recall.)

No, I haven't. I'll put this on my todo list.


> Also, as I suggested in the "MD5 on XOP, NEON, AltiVec" thread:
> 
> "[...] we'll need to revise MD5_I in simd-intrinsics.c to use [...]
> the obvious expression with OR-NOT on NEON and AltiVec (IIRC, those
> archs have OR-NOT, which might be lower latency than select)."

I just checked the manuals. NEON does support OR-NOT (a | ~b), but AltiVec seems to support only NOT-OR (~(a | b)). So perhaps only NEON can benefit from this optimization.
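
To make the difference concrete, here is a rough scalar C sketch of the three ways to compute MD5's I(x, y, z) = y ^ (x | ~z). This is not the actual simd-intrinsics.c code: the function names are made up for illustration, and the select form is only my assumption of how a select-based version would be written. Each scalar operator stands in for one vector instruction (for example, OR-NOT would be vornq_u32 on NEON and NOR would be vec_nor on AltiVec).

```c
#include <assert.h>
#include <stdint.h>

/* Direct form: one OR-NOT plus one XOR on an arch that has OR-NOT. */
static uint32_t md5_i_ornot(uint32_t x, uint32_t y, uint32_t z)
{
    return y ^ (x | ~z);
}

/* Bitwise select: take bits of a where mask is 1, bits of b elsewhere. */
static uint32_t bitselect(uint32_t mask, uint32_t a, uint32_t b)
{
    return (a & mask) | (b & ~mask);
}

/* Select form (assumed): x | ~z == select(z, x, all-ones). */
static uint32_t md5_i_select(uint32_t x, uint32_t y, uint32_t z)
{
    return y ^ bitselect(z, x, 0xFFFFFFFFu);
}

/* NOR-only form: ~z has to be built as nor(z, z), costing an extra op. */
static uint32_t nor32(uint32_t a, uint32_t b)
{
    return ~(a | b);
}

static uint32_t md5_i_nor(uint32_t x, uint32_t y, uint32_t z)
{
    return y ^ (x | nor32(z, z));
}

/* Compare all three forms on pseudo-random inputs; returns 1 if they agree. */
static int md5_i_forms_agree(void)
{
    uint32_t s = 0x12345678u;
    for (int i = 0; i < 1000; i++) {
        /* simple LCG, just to get varied bit patterns */
        uint32_t x = (s = s * 1664525u + 1013904223u);
        uint32_t y = (s = s * 1664525u + 1013904223u);
        uint32_t z = (s = s * 1664525u + 1013904223u);
        uint32_t ref = md5_i_ornot(x, y, z);
        if (md5_i_select(x, y, z) != ref || md5_i_nor(x, y, z) != ref)
            return 0;
    }
    return 1;
}
```

With only NOR available, the complement of z needs its own instruction, so the direct expression takes three operations instead of two. That is why I would expect the gain, if any, only on NEON.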


Lei
