Date: Fri, 13 Jul 2012 11:21:29 +0400
From: Solar Designer
Cc: Tavis Ormandy
Subject: Re: Rotate and bitselect investigation

magnum, Tavis -

On Mon, Jul 09, 2012 at 10:30:29AM +0400, Solar Designer wrote:
> On Mon, Jul 09, 2012 at 10:15:54AM +0530, Sayantan Datta wrote:
> > F(x,y,z) ((x & y) | (z & (x | y)))==F(x,y,z) (bitselect(x, y, z) ^
> > bitselect(x, (uint)0, y))
> Wow.  I wonder if this trick for SHA-1 was known at all.  Not to us, it
> seems.  The second bitselect() is essentially an and-not, so the speed
> might be better if it's written as such (if there's an and-not
> instruction).  Also, I guess this change should hurt on NVIDIA (does
> it?), so you'll need to wrap it in some #ifdef.
> Anyway, I've just tried it on CPU (XOP).  Patch attached.  Here are the
> speeds (best of several invocations in each case):
> Before:
> Benchmarking: Raw SHA-1 (pwlen <= 15) [128/128 XOP intrinsics 4x]... DONE
> Raw:    28925K c/s real, 28925K c/s virtual
> After:
> Benchmarking: Raw SHA-1 (pwlen <= 15) [128/128 XOP intrinsics 4x]... DONE
> Raw:    28435K c/s real, 28435K c/s virtual

On another build (same machine), the patched version is faster.  So I
guess it depends on code placement in caches and such.  The code also
becomes a bit smaller (rawSHA1_ng_fmt.o's .text shrinks from 9179 to
9115 bytes).  So I think we should apply the patch from my previous
posting on this as-is.  There's a clear speedup for sse-intrinsics.c's
SHA-1.

magnum - please commit.  I think this can be in the fixes branch as well
(trivial change in terms of possible breakage).
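
For reference, here is a minimal sketch (not the attached patch) that
checks the identity quoted above in plain C.  The names bitselect_u32,
maj_classic, maj_bitselect, maj_andnot and check_identity are made up
for this illustration; bitselect_u32 only emulates the OpenCL
bitselect() semantics on a single 32-bit word.  Because every function
here is purely bitwise, one set of inputs whose bit lanes cover all 8
combinations of (x, y, z) should be enough to compare them.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates OpenCL bitselect(a, b, c) on one 32-bit word: each result
 * bit comes from b where c has a 1 bit, and from a where c has a 0 bit. */
uint32_t bitselect_u32(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* Classic form of the SHA-1 majority function, as quoted above. */
uint32_t maj_classic(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (z & (x | y));
}

/* Form from the quoted message: two bitselects combined with XOR. */
uint32_t maj_bitselect(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect_u32(x, y, z) ^ bitselect_u32(x, 0, y);
}

/* Same, with the second bitselect written as an explicit and-not. */
uint32_t maj_andnot(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect_u32(x, y, z) ^ (x & ~y);
}

/* The three constants form a truth table: across each group of 8 bit
 * positions, (x, y, z) takes every one of its 8 possible values. */
void check_identity(void)
{
    const uint32_t x = 0xF0F0F0F0u, y = 0xCCCCCCCCu, z = 0xAAAAAAAAu;

    assert(maj_bitselect(x, y, z) == maj_classic(x, y, z));
    assert(maj_andnot(x, y, z) == maj_classic(x, y, z));
}
```

Calling check_identity() from a small main() is enough to exercise it.
Whether either rewritten form is actually faster depends on the target
having a select and/or and-not instruction, as discussed above.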


