john-dev - Re: "AVX 4x" instead of "AVX 8x"

Message-ID: <b63620363c0e1446ca855c18b2a1edc4@smtp.hushmail.com>
Date: Wed, 02 Oct 2013 18:56:17 +0200
From: magnum <john.magnum@...hmail.com>
To: "john-dev@...ts.openwall.com" <john-dev@...ts.openwall.com>
Subject: Re: "AVX 4x" instead of "AVX 8x"

On 2013-10-02 09:26, Dhiru Kholia wrote:
> well.openwall.net shows "AVX 8x" when bench-marking the RAKP format.
>
> However, my Haswell system running Fedora 20 shows "AVX 4x". Pretty
> weird, right?
>
> Any insights into what is going on here?

It's not weird at all :)  I'll send this to john-dev in case someone 
else has missed this. The number is simply how many keys are processed 
by a single call to the SSE2 SHA1 function.

There is a chain of #ifdefs in x86-64.h (or x86-sse.h for 32-bit) that 
chooses SHA1_SSE_PARA (and similar macros for MD4 and MD5) depending on 
which compiler you use and which version, because different compilers 
perform better at different values.
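
A rough sketch of what such a chain can look like. The compiler-detection 
macros are the standard predefined ones, but the version cut-off and the 
value picked in each branch are placeholders I made up for illustration, 
not the ones in x86-64.h, and sha1_sse_para() is a hypothetical helper:

```c
/* Illustrative only: branch values and the gcc version test are
 * placeholders, not the real contents of x86-64.h. */
#ifndef SHA1_SSE_PARA
#if defined(__INTEL_COMPILER)
#define SHA1_SSE_PARA 2   /* placeholder value for icc */
#elif defined(__clang__)
#define SHA1_SSE_PARA 1   /* placeholder value for clang */
#elif defined(__GNUC__) && \
      (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6))
#define SHA1_SSE_PARA 2   /* placeholder value for newer gcc */
#else
#define SHA1_SSE_PARA 1   /* conservative fallback */
#endif
#endif

/* Hypothetical helper: reports the value the preprocessor settled on. */
static int sha1_sse_para(void)
{
    return SHA1_SSE_PARA;
}
```

icc and clang are tested before gcc because both also define __GNUC__.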

SHA1_SSE_PARA controls how much interleaving sse-intrinsics.c builds 
into the SSE2/AVX/XOP SHA1 function. This interleaving hides latency if 
the compiler can schedule instructions well. Intel's icc is very good at 
this.
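
To illustrate the idea (this is a toy model, not the code in 
sse-intrinsics.c): below, a 4-lane "vector" is modelled with a plain 
struct, toy_step() is a made-up dependent step rather than a real SHA-1 
round, and PARA stands in for SHA1_SSE_PARA. All names are hypothetical.

```c
#include <stdint.h>

#define MMX_COEF 4   /* 32-bit lanes per 128-bit vector */
#define PARA     2   /* stand-in for SHA1_SSE_PARA */

/* Models one SIMD register holding MMX_COEF independent keys. */
typedef struct { uint32_t lane[MMX_COEF]; } vec4;

static uint32_t rotl32(uint32_t x, int n)
{
    return (x << n) | (x >> (32 - n));
}

/* Toy step (not SHA-1): the output feeds the next step, so consecutive
 * steps on the same vector form a dependency chain. */
static vec4 toy_step(vec4 v, uint32_t k)
{
    for (int i = 0; i < MMX_COEF; i++)
        v.lane[i] = rotl32(v.lane[i], 5) + k;
    return v;
}

/* Runs `rounds` steps on PARA independent vectors. The inner loop puts
 * PARA unrelated steps back to back, so there is other work available
 * while one chain waits on its previous result. */
static void toy_rounds(vec4 state[PARA], int rounds)
{
    for (int r = 0; r < rounds; r++)
        for (int p = 0; p < PARA; p++)
            state[p] = toy_step(state[p], (uint32_t)r);
}
```

The result for each vector is the same as if it had been processed on its 
own; only the order in which the independent steps are issued changes.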

So for SHA1_SSE_PARA=1 you get 4x and for SHA1_SSE_PARA=2 you get 8x 
(SSE2, AVX and XOP are all 4 wide [we call that MMX_COEF]; multiplied by 
a SHA1_SSE_PARA of 2, that gives 8x).
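
The arithmetic, spelled out as a small sketch. keys_per_call() and 
format_label() are hypothetical helpers written for this mail; how john 
actually builds the benchmark string is not shown here.

```c
#include <stdio.h>

/* Keys handled by one call: vector width times interleave factor. */
static int keys_per_call(int mmx_coef, int sse_para)
{
    return mmx_coef * sse_para;
}

/* Hypothetical helper: builds a label in the style of "AVX 8x". */
static void format_label(char *buf, size_t len, const char *isa,
                         int mmx_coef, int sse_para)
{
    snprintf(buf, len, "%s %dx", isa, keys_per_call(mmx_coef, sse_para));
}
```

With MMX_COEF=4, format_label(buf, sizeof buf, "AVX", 4, 1) gives "AVX 4x" 
and format_label(buf, sizeof buf, "AVX", 4, 2) gives "AVX 8x".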

If you try a new compiler (e.g. a future gcc-4.9) you can run "make 
testpara" or "make testpara-native" and see what extra #ifdefs we should 
use in x86-64.h.

Future AVX2 functions might get an MMX_COEF of 8 instead of 4. Then 
you'll get "AVX2 16x" provided you use SHA1_SSE_PARA=2.

magnum
