john-dev - Re: SHA-1 H()

Message-Id: <3465D09F-8C78-4120-8C7A-6A9E75712E9E@gmail.com>
Date: Wed, 9 Sep 2015 23:43:41 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: SHA-1 H()

On Sep 8, 2015, at 4:47 PM, Solar Designer <solar@...nwall.com> wrote:
> 
> Lei,
> 
> On Tue, Sep 08, 2015 at 03:04:57PM +0800, Lei Zhang wrote:
>> On Sep 2, 2015, at 11:20 PM, Solar Designer <solar@...nwall.com> wrote:
>>> 
>>> Lei, will you test/benchmark on NEON and AltiVec once magnum commits the
>>> fixes, please?
>> 
>> On AltiVec (4xOMP):
> 
> Is this 4 threads likely across different CPU cores?

I think so. The benchmark results fluctuated too much when I used the maximum number of hardware threads, so I switched to a smaller number of threads, though without binding them to specific cores.


> What we need for benchmarking is the maximum number of threads supported
> in hardware on a certain number of CPU cores (on 1 core is OK if you
> can't reliably use the entire machine's cores).  So on POWER8 I guess
> you'll run 8 threads all locked to one physical CPU core.  You should be
> able to do that with OpenMP env vars (affinity).

I'll post the updated results later.


>> On NEON (2xOMP):
>> 
>> [before]
>> pbkdf2-sha1:	578 c/s real, 289 c/s virtual
>> pbkdf2-sha256:	276 c/s real, 138 c/s virtual
>> pbkdf2-sha512:	125 c/s real, 62.7 c/s virtual
>> 
>> [after]
>> pbkdf2-sha1:	501 c/s real, 250 c/s virtual
>> pbkdf2-sha256:	276 c/s real, 138 c/s virtual
>> pbkdf2-sha512:	125 c/s real, 62.7 c/s virtual
>> 
>> There's no significant change on Altivec,
> 
> OK, but you need to run 8 threads/core benchmarks.

Why? Our ZedBoard has only two cores.


>> while SHA1 somehow gets slower on NEON.
> 
> It might need higher interleaving factor now.  You haven't even tried
> introducing interleaving for these archs, have you?  (I don't recall.)

No, I haven't. I'll put this on my todo list.


> Also, as I suggested in the "MD5 on XOP, NEON, AltiVec" thread:
> 
> "[...] we'll need to revise MD5_I in simd-intrinsics.c to use [...]
> the obvious expression with OR-NOT on NEON and AltiVec (IIRC, those
> archs have OR-NOT, which might be lower latency than select)."

I just checked the manuals. NEON does support OR-NOT, but AltiVec seems to support only NOT-OR (~(a|b)), so perhaps only NEON can benefit from this optimization.


Lei
