Date: Sat, 7 Jul 2012 13:37:00 -0500
From: "jfoug" <>
To: <>
Subject: make-generic aligned vs unaligned speeds, and generic vs SSE2i speeds.

>From: jfoug []
>There are 2 patches here.
>2. I added the allows_unaligned logic to detect.c  (in the 'unaligned'
>patch above).
>The 2nd, I believe, should be in bleeding, and magnum-jumbo.  I will
>post timing differences shortly on a mag-bleeding build, where the
>aligned vs unaligned is the only difference.  I also plan on comparing
>to sse2i build (to see if what non-optimal formats there are in that

Benching on my older Core 2 showed little difference between the
ALLOWS-UNALIGNED and !ALLOWS-UNALIGNED 'generic' builds; some, but not a lot.
It is sort of hard to tell when I am VPN'd in and have to use screen
'fitting' over the VPN. I would not consider a 2% (or even 3%) difference out
of the ordinary, especially when I have to bring up the screen every couple
of minutes and at least wiggle the mouse to keep it from going into screen
saver mode. So the bench testing is not ideal on my current setup, but it
'does' work well enough to show there is not that big of a difference.
However, I do feel that we should 'honor' allows-unaligned in generic builds
if the compiler defines a known flag for a known Intel/AMD CPU that is
capable of unaligned (out-of-alignment) access.

Number of benchmarks:           203
Minimum:                        0.97033 real, 0.97314 virtual
Maximum:                        1.16071 real, 1.16837 virtual
Median:                         0.99931 real, 0.99845 virtual
Median absolute deviation:      0.00375 real, 0.00490 virtual
Geometric mean:                 1.00095 real, 1.00046 virtual
Geometric standard deviation:   1.01710 real, 1.01749 virtual

Now, I also compared generic to a 32-bit SSE2i build. Here are the problem
items (ratios below 1.0, i.e. formats that run slower in the SSE2i build than
in generic); I am going to look into them to see why they are so much slower.

Ratio:  0.90214 real, 0.90313 virtual   CRC-32:Only one salt
Ratio:  0.97073 real, 0.97016 virtual   Kerberos v5 TGT 3DES:Raw
Ratio:  0.89424 real, 0.89681 virtual   dynamic_16: md5(md5(md5($p).$s).$s2):Many salts
Ratio:  0.96105 real, 0.95752 virtual   dynamic_1003
Ratio:  0.91862 real, 0.91647 virtual   dynamic_16: md5(md5(md5($p).$s).$s2):Only one salt
Ratio:  0.97320 real, 0.97388 virtual   EPiServer salted SHA-1/SHA-256:Raw
Ratio:  0.89469 real, 0.89307 virtual   dynamic_15: md5($u.md5($p).$s):Many salts
Ratio:  0.89227 real, 0.89040 virtual   dynamic_15: md5($u.md5($p).$s):Only one salt

Some of them are formats which may bounce into and out of SSE2. For some of
them it may simply be faster to rewrite them to declare themselves not-SSE2,
which will cause the oSSL (or md5.c) code to be used from the start. The
EPiServer and Kerberos results were likely within the bounds of 'normal'
fluctuation in this runtime environment, but I left them here anyway.

Also note, dyna_15 and dyna_16 are totally fabricated formats, put in there
simply to exercise the $u and $s2 'features'. It could easily be that the
functions I have used require a different script for SSE vs non-SSE, which
I have had to do for a format or two already. And crc32 is really only a
'play-toy' type format, used to exercise the FMT_NOT_EXACT flag
(even though it 'could' be used for some things).

On the other side, some formats were 7x faster on my system with SSE2i
than with generic. Overall it was:
SSE2i vs generic-align0
Number of benchmarks:           203
Minimum:                        0.89227 real, 0.89040 virtual
Maximum:                        7.06252 real, 7.06982 virtual
Median:                         1.26075 real, 1.26075 virtual
Median absolute deviation:      0.36847 real, 0.37035 virtual
Geometric mean:                 1.70610 real, 1.70507 virtual
Geometric standard deviation:   1.77154 real, 1.77143 virtual

