Date: Sat, 24 Dec 2011 03:34:11 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: MD5 intrinsics compile-time condition

On 12/23/2011 04:49 PM, Solar Designer wrote:
> Apparently, the condition that enables the use of intrinsics is not the
> same for md5 vs. dynamic_27 and 28, and apparently it is non-optimal for
> md5 for certain gcc version(s) (I guess Apple's gcc 4.2).

You introduced it, on purpose :)  It started here:
http://www.openwall.com/lists/john-dev/2011/06/08/13

...then it was tweaked over time (search list for MD5_in_sse_intrinsics) 
and today it looks like this:

#if !defined(MD5_in_sse_intrinsics) && defined(__GNUC__) && \
     (__GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 4)) && \
     !defined(USING_ICC_S_FILE)
#undef MD5_SSE_PARA
#endif

I can't find any note of why/when it was changed from 4.0 to 4.4 but 
j5c4 had 4.4.

Anyways, I *guess* we can drop that whole test, and do something like 
this in the arch.h's:

  #elif defined(__GNUC__) && (__GNUC__ == 4 && __GNUC_MINOR__ == 5)
  #define MD5_SSE_PARA                   2
  #define MD5_N_STR                      "8x"
-#elif defined(__GNUC__)
+#elif defined(__GNUC__) && \
+     (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4))
  #define MD5_SSE_PARA                   3
  #define MD5_N_STR                      "12x"
+#elif defined(__GNUC__)
+#define MD5_SSE_PARA                   1
+#define MD5_N_STR                      "4x"
  #else
  #define MD5_SSE_PARA                   3
  #define MD5_N_STR                      "12x"

The current code picks PARA 3 (12x) for any gcc other than 4.5. I 
recently tweaked those tests after empirical tests with 4.4, 4.5 and 4.6 
(and clang and icc) - the versions that were available in my Ubuntu repo 
at the time. I suppose PARA 1 (4x) would be the safe choice for any 
untested version and it should always be faster than disabling SSE.
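To make the intended selection easier to check, here is a minimal sketch that mirrors the proposed #elif chain as a plain C function. The function name md5_para_for_gcc is hypothetical (it is not in the tree), and the sketch assumes the intent is PARA 2 for gcc 4.5, PARA 3 for 4.4 and for 4.6 and newer, and PARA 1 for anything older:

```c
#include <assert.h>

/* Hypothetical helper, not part of the JtR sources: mirrors the proposed
 * arch.h #elif chain so the version logic can be checked in isolation.
 * Returns the MD5_SSE_PARA value that would be picked for a given gcc. */
static int md5_para_for_gcc(int major, int minor)
{
    assert(major > 0 && minor >= 0);

    if (major == 4 && minor == 5)
        return 2;   /* gcc 4.5: "8x" */
    if (major > 4 || (major == 4 && minor >= 4))
        return 3;   /* gcc 4.4, 4.6 and newer: "12x" (assumed intent) */
    return 1;       /* older, untested gcc: "4x", still uses SSE */
}
```

For example, md5_para_for_gcc(4, 2) gives 1 here, which is the Apple gcc 4.2 case that currently gets MD5_SSE_PARA undefined.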

I can make this change, but I will probably not find time to actually test 
it on ancient compilers. If someone else can produce test results for 
PARA 1, 2 and 3 for versions of gcc older than 4.4 running on Intel, 
we can add clauses for them instead. Otherwise this change 
may be detrimental for other intrinsics formats with some versions of 
gcc. The optimal PARA values for MD4 and SHA1 should ideally also be tested. 
Also, all tests should be done separately for 32-bit and 64-bit...

Like I said in http://www.openwall.com/lists/john-dev/2011/12/11/4 the 
optimal solution would be build-time checking. Here are some test 
results that illustrate how important the PARA setting is (each figure 
is the geometric mean of 10 runs, iirc):

== icc_64_Q9550_md5 ==
PARA 3: 32058 real, 31994 virtual
PARA 4: 29443 real, 29443 virtual
PARA 2: 27142 real, 27142 virtual
PARA 5: 25399 real, 25399 virtual
PARA 1: 18265 real, 18265 virtual
PARA 6: 7013 real, 6985 virtual
PARA 7: 6231 real, 6231 virtual
PARA 8: 6043 real, 6031 virtual

== gcc-4.6_64_Q9550_md5 ==
PARA 4: 27445 real, 27554 virtual
PARA 3: 26783 real, 26836 virtual
PARA 2: 26080 real, 26080 virtual
PARA 1: 17294 real, 17363 virtual
PARA 5: 14320 real, 14291 virtual
PARA 6: 5877 real, 5889 virtual
PARA 7: 5262 real, 5273 virtual
PARA 8: 5031 real, 5031 virtual

== gcc-4.5_64_Q9550_md5 ==
PARA 2: 18528 real, 18528 virtual
PARA 3: 16480 real, 16513 virtual
PARA 4: 13638 real, 13638 virtual
PARA 1: 13273 real, 13300 virtual
PARA 6: 4416 real, 4389 virtual
PARA 5: 4308 real, 4317 virtual
PARA 8: 4063 real, 4087 virtual
PARA 7: 3910 real, 3902 virtual

== gcc-4.4_64_Q9550_md5 ==
PARA 3: 24996 real, 25147 virtual
PARA 2: 19603 real, 19642 virtual
PARA 4: 18221 real, 18221 virtual
PARA 1: 17014 real, 17048 virtual
PARA 5: 8023 real, 8023 virtual
PARA 6: 5480 real, 5458 virtual
PARA 7: 5253 real, 5232 virtual
PARA 8: 5067 real, 5047 virtual

Worse yet, the optimal setting for intel is not optimal for AMD:

== gcc-4.4_64_AMD_md5 ==
PARA 5: 18543 real, 18506 virtual
PARA 4: 17961 real, 17961 virtual
PARA 6: 17851 real, 17851 virtual
PARA 7: 16945 real, 16979 virtual
PARA 3: 15523 real, 15523 virtual
PARA 8: 14860 real, 14890 virtual
PARA 2: 13455 real, 13455 virtual
PARA 1: 8779 real, 8779 virtual

Using Intel's best values on AMD is less detrimental than the other 
way round (in general, a lower value is safer than a higher one).
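The asymmetry can be put in numbers with a small sketch using the gcc-4.4 "real" figures from the two tables above. The helper names best_para and percent_of_peak are made up for this illustration, and the percentages are integer-rounded down:

```c
#include <assert.h>
#include <stddef.h>

/* "real" figures copied from the gcc-4.4 tables, indexed by PARA - 1. */
static const long intel_q9550[8] =
    {17014, 19603, 24996, 18221, 8023, 5480, 5253, 5067};
static const long amd[8] =
    {8779, 13455, 15523, 17961, 18543, 17851, 16945, 14860};

/* Hypothetical helper: return the PARA (1-based) with the highest figure. */
static int best_para(const long *score, size_t n)
{
    size_t best = 0;

    assert(n > 0);
    for (size_t i = 1; i < n; i++)
        if (score[i] > score[best])
            best = i;
    return (int)best + 1;
}

/* Hypothetical helper: percentage of a CPU's peak speed that is kept
 * when it runs with the given PARA (integer percent, rounded down). */
static int percent_of_peak(const long *score, size_t n, int para)
{
    long peak = score[best_para(score, n) - 1];

    assert(para >= 1 && (size_t)para <= n);
    return (int)(score[para - 1] * 100 / peak);
}
```

With these figures, best_para() gives 3 for the Q9550 and 5 for the AMD box. Running the AMD box at PARA 3 keeps 83% of its peak (percent_of_peak(amd, 8, 3)), while running the Q9550 at PARA 5 keeps only 32% (percent_of_peak(intel_q9550, 8, 5)).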

Actually, the figures also depend on the exact CPU model (like Q9550 
vs P8600), but to a lesser degree than Intel vs AMD.

magnum
