Date: Sat, 7 Jul 2012 13:37:00 -0500
From: "jfoug" <jfoug@....net>
To: <john-dev@...ts.openwall.com>
Subject: make-generic aligned vs unaligned speeds, and generic vs SS2i speeds.

>From: jfoug [mailto:jfoug@....net]
>There are 2 patches here.
>
>2. I added the allows_unaligned logic to detect.c (in the 'unaligned'
>patch above).
>
>The 2nd, I believe, should be in bleeding, and magnum-jumbo. I will
>post timing differences shortly on a mag-bleeding build, where the
>aligned vs unaligned is the only difference. I also plan on comparing
>to sse2i build (to see if what non-optimal formats there are in that
>build).

Benching on my older core2 showed little difference between the
ALLOW-ALIGNED and !ALLOW-ALIGNED 'generic' builds. Some, but not a lot.
It is sort of hard to tell when I am VPN'd in, and have to VPN using
screen 'fitting'. I would not consider 2% (even 3%) to be out of the
ordinary, especially when I have to show the screen every couple of
minutes and at least wiggle the mouse to keep from going into screen
saver mode. So the bench testing is not ideal on the setup I have right
now, but it 'does' work well enough to show there is not that big of a
difference. However, I do feel that we should 'honor' the allows-aligned
in generic builds if the compiler uses a known flag of a known
'intel/AMD' out-of-align capable CPU.

Number of benchmarks:           203
Minimum:                        0.97033 real, 0.97314 virtual
Maximum:                        1.16071 real, 1.16837 virtual
Median:                         0.99931 real, 0.99845 virtual
Median absolute deviation:      0.00375 real, 0.00490 virtual
Geometric mean:                 1.00095 real, 1.00046 virtual
Geometric standard deviation:   1.01710 real, 1.01749 virtual

Now, I also compared generic to the 32 bit sse2i build. Here are the
problem items. I am going to look into them, to see why they are so
much slower.

Ratio:  0.90214 real, 0.90313 virtual   CRC-32:Only one salt
Ratio:  0.97073 real, 0.97016 virtual   Kerberos v5 TGT 3DES:Raw
Ratio:  0.89424 real, 0.89681 virtual   dynamic_16: md5(md5(md5($p).$s).$s2):Many salts
Ratio:  0.96105 real, 0.95752 virtual   dynamic_1003 md5(md5($p).md5($p)):Raw
Ratio:  0.91862 real, 0.91647 virtual   dynamic_16: md5(md5(md5($p).$s).$s2):Only one salt
Ratio:  0.97320 real, 0.97388 virtual   EPiServer salted SHA-1/SHA-256:Raw
Ratio:  0.89469 real, 0.89307 virtual   dynamic_15: md5($u.md5($p).$s):Many salts
Ratio:  0.89227 real, 0.89040 virtual   dynamic_15: md5($u.md5($p).$s):Only one salt

Some of them are formats which may bounce into and out of SSE2. For
some of them it may simply be faster to re-write them to say not-SSE2,
which will cause the oSSL (or md5.c) code to be used from the start.
The EPiServer/Kerberos results were likely within the bounds of 'normal'
fluctuation on this runtime environment, but I left them here anyway.
Also note, dyna_15 and dyna_16 are totally fabricated formats, put in
there simply to exercise the $u and $s2 'features'. It could easily be
that the functions I have used require a different script for SSE vs
non-SSE, which I have had to do for a format or 2 already. And crc32
is only a 'play-toy' type format in reality, used to exercise the
FMT_NOT_EXACT flags (even though it 'could' be used for some things).

On the other side, there were some formats 7x faster on my system for
SSE2i than generic. Overall, for SSE2i vs generic-align0, it was:

Number of benchmarks:           203
Minimum:                        0.89227 real, 0.89040 virtual
Maximum:                        7.06252 real, 7.06982 virtual
Median:                         1.26075 real, 1.26075 virtual
Median absolute deviation:      0.36847 real, 0.37035 virtual
Geometric mean:                 1.70610 real, 1.70507 virtual
Geometric standard deviation:   1.77154 real, 1.77143 virtual

Jim.
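
For anyone wanting to re-derive summary lines like the ones above from
their own per-format ratios, here is a minimal Python sketch. It is not
the tool that produced the numbers in this message. The function name
summarize() and the sample ratios are hypothetical, and the geometric
standard deviation uses the population form (divide by n), which is an
assumption about how the figures above were computed.

```python
import math
import statistics


def summarize(ratios):
    """Summarize speed ratios (new speed / old speed) for a set of benchmarks.

    Hypothetical helper, not part of John the Ripper. The geometric standard
    deviation uses the population form (divide by n); that is an assumption.
    """
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty list of positive numbers")

    # Geometric statistics are ordinary mean/deviation taken on log(ratio).
    logs = [math.log(r) for r in ratios]
    mean_log = sum(logs) / len(logs)
    var_log = sum((x - mean_log) ** 2 for x in logs) / len(logs)

    med = statistics.median(ratios)
    # Median absolute deviation: median distance of each ratio from the median.
    mad = statistics.median([abs(r - med) for r in ratios])

    return {
        "count": len(ratios),
        "min": min(ratios),
        "max": max(ratios),
        "median": med,
        "mad": mad,
        "geo_mean": math.exp(mean_log),
        "geo_sd": math.exp(math.sqrt(var_log)),
    }


if __name__ == "__main__":
    # Made-up ratios, for illustration only (not taken from the runs above).
    sample = [0.5, 1.0, 2.0]
    for name, value in summarize(sample).items():
        print(f"{name}: {value}")
```

The geometric mean is the natural average here because the inputs are
ratios: a format that runs 2x faster and one that runs 2x slower cancel
out to 1.0, which an arithmetic mean would not show.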