Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 26 Dec 2011 22:15:56 +0400
From: Solar Designer <>
Subject: Re: MD5 intrinsics compile-time condition

On Sat, Dec 24, 2011 at 03:34:11AM +0100, magnum wrote:
> On 12/23/2011 04:49 PM, Solar Designer wrote:
> >Apparently, the condition that enables the use of intrinsics is not the
> >same for md5 vs. dynamic_27 and 28, and apparently it is non-optimal for
> >md5 for certain gcc version(s) (I guess Apple's gcc 4.2).
> You introduced it, on purpose :)  It started here:

Sure.  My latest "complaint" is not about having this condition, but
about it apparently being inconsistent for different formats
representing the same hash type.

> ...then it was tweaked over time (search list for MD5_in_sse_intrinsics) 
> and today it looks like this:
> #if !defined(MD5_in_sse_intrinsics) && defined(__GNUC__) && \
>     (__GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 4)) && \
>     !defined(USING_ICC_S_FILE)
> #undef MD5_SSE_PARA
> #endif
> I can't find any note of why/when it was changed from 4.0 to 4.4 but 
> j5c4 had 4.4.

I changed it to 4.4 shortly before the contest because someone on
john-users reported a performance regression with gcc 4.3.x when
compiling with the intrinsics.

> Anyways, I *guess* we can drop that whole test, and do someting like 
> this in the arch.h's:
>  #elif defined(__GNUC__) && (__GNUC__ == 4 && __GNUC_MINOR__ == 5)
>  #define MD5_SSE_PARA                   2
>  #define MD5_N_STR                      "8x"
> -#elif defined(__GNUC__)
> +#elif defined(__GNUC__) && (__GNUC__ >= 4 || (__GNUC_MINOR__ == 4 && 
> __GNUC_MINOR__ > 5))

That's a weird check.

>  #define MD5_SSE_PARA                   3
>  #define MD5_N_STR                      "12x"
> +#elif defined(__GNUC__)
> +#define MD5_SSE_PARA                   1
> +#define MD5_N_STR                      "4x"
>  #else
>  #define MD5_SSE_PARA                   3
>  #define MD5_N_STR                      "12x"
> The current code picks PARA 3 (12x) for any gcc other than 4.5. I 
> recently tweaked those tests after empirical tests with 4.4, 4.5 and 4.6 
> (and clang and icc) - the versions that were available in my Ubuntu repo 
> at the time. I suppose PARA 1 (4x) would be the safe choice for any 
> untested version and it should always be faster than disabling SSE.

Maybe.  We need to test with pre-4.0 versions of gcc as well.

> I can do this change, but I will probably not find time to actually test 
> it on ancient compilers. If someone else can produce test results for 
> para 1, 2 and 3 for versions of gcc older than 4.4 and running on intel, 
> we can put additional clauses for them instead. Otherwise this change 
> may be detrimental for other intrinsics formats with some versions of 
> gcc. The optimal para's for MD4 and SHA1 should ideally also be tested. 
> Also, all tests should be separate for 32-bit and 64-bit...

Yes, this is a lot of testing.

> Like I said in the 
> optimal solution would be build-time checking.

We could do that, but a drawback of build-time benchmarking is that
builds may be inconsistent, such as depending on system load.  I'd be
happier to have this tested with a range of gcc versions for both 32-
and 64-bit at least on a Core 2'ish CPU, so we'd have hard-coded
settings for each range of gcc versions (similar to what we have now).

> Here are some test 
> results that illustrates how important the PARA setting is (each figure 
> is geometrical mean for 10 runs iirc):



Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.