Date: Wed, 21 Mar 2012 06:47:28 +0400
From: Solar Designer <>
Subject: SSSE3 PSHUFB (was: AMD Bulldozer and XOP)

On Thu, Mar 15, 2012 at 11:46:09AM +0200, Milen Rangelov wrote:
> Actually the quoted improvement percentage was not correct. I did some more
> improvements (like e.g using SSE3 shuffle to speed up the byte order
> reversals in SHA1 and optimizing a bit the early checks).

Regarding "using SSE3 shuffle to speed up the byte order reversals", do
you actually mean SSSE3 (not SSE3) and its PSHUFB instruction?

Is this something you'd possibly be willing to implement and contribute
for JtR as well?  Sounds like a good way for you to get involved in our
project more directly. ;-)
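
For reference, here is a minimal sketch of what a PSHUFB-based byte
order reversal for SHA-1 input words might look like.  This is not
Milen's code and not what JtR currently does; the function name
bswap32x4 is made up for illustration, and the scalar fallback is only
there so that the example also builds without -mssse3.

```c
#include <stdint.h>
#ifdef __SSSE3__
#include <tmmintrin.h>  /* SSSE3 intrinsics, including _mm_shuffle_epi8 */
#endif

/*
 * Reverse the byte order within each of the four 32-bit words held in a
 * 16-byte block, in place (hypothetical helper, for illustration only).
 */
static void bswap32x4(uint8_t block[16])
{
#ifdef __SSSE3__
    /*
     * _mm_set_epi8 takes its arguments from byte 15 down to byte 0, so
     * this mask makes result byte 0 come from source byte 3, result
     * byte 1 from source byte 2, and so on for each 32-bit word.
     */
    const __m128i mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10, 11,
                                      4, 5, 6, 7, 0, 1, 2, 3);
    __m128i v = _mm_loadu_si128((const __m128i *)block);
    v = _mm_shuffle_epi8(v, mask);  /* a single PSHUFB does all 4 words */
    _mm_storeu_si128((__m128i *)block, v);
#else
    /* Scalar fallback with the same result */
    for (int w = 0; w < 16; w += 4) {
        uint8_t t0 = block[w], t1 = block[w + 1];
        block[w] = block[w + 3];
        block[w + 1] = block[w + 2];
        block[w + 2] = t1;
        block[w + 3] = t0;
    }
#endif
}
```

Without SSSE3, the same reversal on SSE2 typically takes several shifts,
ANDs, and ORs per vector, which is presumably where the speedup comes
from.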

BTW, I just realized how powerful PSHUFB is.  It's not just a shuffle.
It's 16 parallel lookups into a 16-entry byte table, each indexed by
4 bits, usable e.g. for 16 parallel 4-to-4 S-box lookups.  It could even
compete with bitslice DES.  Even if it loses to bitslice DES in terms of
speed, it could allow for a very fast non-bitslice DES or 3DES
implementation, where we readily have 8 6-to-4 S-box lookups (or 32
4-to-4 lookups) to make in just one instance.  Such an implementation
would be usable e.g. to encrypt just one data stream sequentially while
meeting an existing standard, where a bitslice implementation would not
be usable.  We have no such task in JtR currently, but I imagine that
it'd be helpful e.g. in some IPsec implementation.  We could try it for
DES and for Lotus5.
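
To illustrate the "16 parallel lookups" view, here is a rough sketch
that applies one 4-to-4 table to the low nibbles of 16 bytes with a
single PSHUFB.  The table is the first row of DES S-box S1, used purely
as an example; the names sbox_row and sbox16 are made up, and this is
nowhere near a complete DES round (no expansion, no row selection by
the outer two bits, no permutation).

```c
#include <stdint.h>
#ifdef __SSSE3__
#include <tmmintrin.h>  /* SSSE3 intrinsics, including _mm_shuffle_epi8 */
#endif

/* Row 0 of DES S-box S1, viewed as a 4-bit to 4-bit lookup table */
static const uint8_t sbox_row[16] = {
    14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7
};

/*
 * out[i] = sbox_row[in[i] & 0x0F] for all 16 bytes at once
 * (hypothetical helper, for illustration only).
 */
static void sbox16(const uint8_t in[16], uint8_t out[16])
{
#ifdef __SSSE3__
    __m128i table = _mm_loadu_si128((const __m128i *)sbox_row);
    __m128i idx = _mm_loadu_si128((const __m128i *)in);
    /*
     * PSHUFB zeroes a result byte when bit 7 of its index byte is set
     * and otherwise uses only the low 4 bits, so mask the indices first.
     */
    idx = _mm_and_si128(idx, _mm_set1_epi8(0x0F));
    /* The table is the "data" operand, the indices are the "mask" */
    _mm_storeu_si128((__m128i *)out, _mm_shuffle_epi8(table, idx));
#else
    /* Scalar model of the same 16 lookups */
    for (int i = 0; i < 16; i++)
        out[i] = sbox_row[in[i] & 0x0F];
#endif
}
```

A full 6-to-4 S-box would need four such tables plus a way to select
among the four results based on the remaining two input bits, which is
where the "32 4-to-4 lookups" count comes from.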

Also, a new KDF built upon PSHUFB would be friendly to recent CPUs with
that instruction, but GPU-unfriendly (until/unless a GPU has something
similar - which I am not aware of).  So it could be used to slow down
GPU-based offline attacks.


