Date: Thu, 10 Nov 2011 09:48:01 -0600
From: "jfoug" <>
To: <>
Subject: RE: more targets using sse-intrinsics.S

Windows is not a straightforward build.  I have gotten it built, and it passes -test=0.  I am also building a 'pre .S' sse-intrinsics.c build with cygwin (x86-32), and will compare speeds.

Here are the changes needed for cygwin building (I am thinking about a perl script to run during the make, for the win32-cygwin-x86-sse2i build):

1. all lines that are .type need to be commented out
2. all lines that are .size need to be commented out
3. a couple of .section lines at the end of the file need to be commented out
4. a #ifdef UNDERSCORES section needs to be added at the top of the file

	.file "sse-intrinsics.c"
#define memcpy        _memcpy
#define memset        _memset
#define strlen        _strlen
#define MD5_Init      _MD5_Init
#define MD5_Update    _MD5_Update
#define MD5_Final     _MD5_Final
#define SSEmd5body    _SSEmd5body
#define SSESHA1body   _SSESHA1body
#define SSEmd4body    _SSEmd4body
#define md5cryptsse   _md5cryptsse
# -- Begin  sse_debug
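
The four steps above could be automated along these lines. This is only a rough sketch in Python (not the perl script mentioned above, which does not exist yet). Assumptions that are mine and not from the build: the function name patch_for_cygwin, the "tail" parameter that decides what counts as "end of file" for step 3, the use of /* */ comments, and the exact placement of the #ifdef UNDERSCORES / #endif guard right after the .file line. The symbol list is the one shown above.

```python
import re

# Symbols from the #define list above. The UNDERSCORES guard name comes from
# step 4; wrapping the defines in #ifdef/#endif right after .file is an assumption.
SYMBOLS = [
    "memcpy", "memset", "strlen", "MD5_Init", "MD5_Update", "MD5_Final",
    "SSEmd5body", "SSESHA1body", "SSEmd4body", "md5cryptsse",
]

TYPE_OR_SIZE = re.compile(r"^\s*\.(type|size)\b")   # steps 1 and 2
SECTION = re.compile(r"^\s*\.section\b")            # step 3
FILE_DIRECTIVE = re.compile(r"^\s*\.file\b")


def comment_out(line):
    """Wrap one assembly line in a C-style comment."""
    return "/* " + line.rstrip("\n") + " */\n"


def underscore_block():
    """Build the step 4 header: one #define per symbol inside the guard."""
    block = ["#ifdef UNDERSCORES\n"]
    block.extend("#define %-13s _%s\n" % (s, s) for s in SYMBOLS)
    block.append("#endif\n")
    return block


def patch_for_cygwin(lines, tail=10):
    """Return a patched copy of `lines` (newline-terminated strings).

    `tail` is a hypothetical knob: only .section lines within the last
    `tail` lines of the file are treated as the step 3 lines.
    """
    out = []
    first_tail = len(lines) - tail
    header_done = False
    for i, line in enumerate(lines):
        if TYPE_OR_SIZE.match(line) or (i >= first_tail and SECTION.match(line)):
            out.append(comment_out(line))
            continue
        out.append(line)
        if not header_done and FILE_DIRECTIVE.match(line):
            out.extend(underscore_block())
            header_done = True
    if not header_done:
        # No .file line found: fall back to putting the header first.
        out = underscore_block() + out
    return out
```

Usage would be something like reading sse-intrinsics-32.S with readlines(), passing the list to patch_for_cygwin(), and writing the result to the file the cygwin target assembles. Whether the tail window catches exactly the right .section lines would need checking against the real icc output.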

Here are some timings, testing the original md5-mmx.S, an intrinsic build using sse-intrinsics.c, and one using a patched sse-intrinsics-32.S.

***Timings md5-mmx.S  (original Bartavelle .S code).
dynamic_0: md5($p)  (raw-md5)    [32x4 .S]  9925K
dynamic_1: md5($p.$s)  (joomla)  [32x4 .S]  Many:  8285K 1salt:  5552K
dynamic_2: md5(md5($p))  (e107)  [32x4 .S]  5502K
dynamic_3: md5(md5(md5($p)))     [32x4 .S]  3792K
dynamic_4: md5($s.$p)  (OSC)     [32x4 .S]  Many:  9108K 1salt:  5620K
dynamic_5: md5($s.$p.$s)         [32x4 .S]  Many:  7348K 1salt:  4966K
FreeBSD MD5 [32/32]                          6796 c/s
Raw SHA-1 [4x]                              7568K c/s

***Timings sse-intrinsics.c
dynamic_0: md5($p)  (raw-md5)    [16x4x2]  9412K
dynamic_1: md5($p.$s)  (joomla)  [16x4x2]  Many:  8243K 1salt:  5594K
dynamic_2: md5(md5($p))  (e107)  [16x4x2]  5363K
dynamic_3: md5(md5(md5($p)))     [16x4x2]  3727K
dynamic_4: md5($s.$p)  (OSC)     [16x4x2]  Many:  9311K 1salt:  5739K
dynamic_5: md5($s.$p.$s)         [16x4x2]  Many:  7655K 1salt:  5097K
FreeBSD MD5 [8x]                           14480 c/s
Raw SHA-1 [8x]                             5495K c/s

***Timings sse-intrinsics-32.S  (patched for Win32)
dynamic_0: md5($p)  (raw-md5)    [10x4x3]  11449K
dynamic_1: md5($p.$s)  (joomla)  [10x4x3]  Many:  9953K 1salt:  6270K
dynamic_2: md5(md5($p))  (e107)  [10x4x3]  6763K
dynamic_3: md5(md5(md5($p)))     [10x4x3]  4785K
dynamic_4: md5($s.$p)  (OSC)     [10x4x3]  Many: 11447K 1salt:  6455K
dynamic_5: md5($s.$p.$s)         [10x4x3]  Many:  9123K 1salt:  5677K
FreeBSD MD5 [12x]                          21042 c/s
Raw SHA-1 [8x]                             7079K c/s

So, compared with the sse-intrinsics.c build, the intrinsic-32.S is about 22% faster for dynamic, about 45% faster for the Crypt(3) MD5, and about 29% faster for raw-SHA1 (7079K vs 5495K).
Compared with the original md5-mmx.S build, however, the intrinsic-32.S is about 20-22% faster for dynamic, about 3 times as fast for the Crypt(3) MD5 (only supports SSE2i; 21042 vs 6796 c/s), and about 6-7% slower for raw-SHA1.
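
For anyone wanting to recheck the percentages, here is a small sketch in Python that recomputes them from a few rows of the three tables above. The figures are copied from the tables (in thousands of c/s for dynamic_0 and raw SHA-1, plain c/s for FreeBSD MD5); the names TIMINGS, percent_faster and compare are just illustrative.

```python
# Figures copied from the timing tables above. dynamic_0 and Raw SHA-1 are in
# K c/s, FreeBSD MD5 is in c/s; only ratios within one row are compared.
TIMINGS = {
    "md5-mmx.S":           {"dynamic_0": 9925,  "FreeBSD MD5": 6796,  "Raw SHA-1": 7568},
    "sse-intrinsics.c":    {"dynamic_0": 9412,  "FreeBSD MD5": 14480, "Raw SHA-1": 5495},
    "sse-intrinsics-32.S": {"dynamic_0": 11449, "FreeBSD MD5": 21042, "Raw SHA-1": 7079},
}


def percent_faster(new, old):
    """Percent change in speed going from `old` to `new` (negative = slower)."""
    return (new / old - 1.0) * 100.0


def compare(baseline, candidate="sse-intrinsics-32.S"):
    """Return {test name: percent faster} for candidate relative to baseline."""
    base = TIMINGS[baseline]
    cand = TIMINGS[candidate]
    return {name: percent_faster(cand[name], base[name]) for name in base}


if __name__ == "__main__":
    for baseline in ("sse-intrinsics.c", "md5-mmx.S"):
        print("sse-intrinsics-32.S vs", baseline)
        for name, pct in compare(baseline).items():
            print("  %-12s %+7.1f%%" % (name, pct))
```

Only dynamic_0 is included for the dynamic formats; the other dynamic rows can be added to the dictionaries the same way.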

All in all, this is not a bad way to proceed.  These are very preliminary tests; I have not had time to dig in too deep.


>From: magnum []
>2011-11-09 22:57, magnum wrote:
>> 2011-11-09 18:47, magnum wrote:
>>> 2. Should we support this for 32-bit at all? I suppose I can cross
>>> compile a 32-bit .S file with icc (haven't tried it) but I have no
>>> idea if it will perform better or worse than gcc on a 32-bit machine. I
>>> suppose this should be verified on a machine that is really 32-bit.
>A somewhat experimental patch 0004 is now uploaded. It adds a 32-bit
>version of the intrinsics asm file generated by icc, and modifies all
>sse2i targets to utilize it.
>Works fine (and does give a performance boost) on linux-x86-64-32-sse2i
>test target. I'd be interested to hear results from testing on Windows.
