Date: Thu, 10 Nov 2011 09:48:01 -0600
From: "jfoug" <jfoug@....net>
To: <john-dev@...ts.openwall.com>
Subject: RE: more targets using sse-intrinsics.S

Windows is not a straightforward build. I have gotten it built, and it
passes -test=0. I am building a 'pre .S' sse-intrinsics.c build with
cygwin (x86-32), and will compare speeds.

Here are the changes needed for cygwin building (I am thinking about a
perl script to run during the make, for the win32-cygwin-x86-sse2i build):

1. All lines that are .type need to be commented out.
2. All lines that are .size need to be commented out.
3. A couple of .section lines at the end of the file need to be commented out.
4. An #ifdef UNDERSCORES section needs to be added at the top of the file.

	.file "sse-intrinsics.c"
#ifdef UNDERSCORES
#define memcpy _memcpy
#define memset _memset
#define strlen _strlen
#define MD5_Init _MD5_Init
#define MD5_Update _MD5_Update
#define MD5_Final _MD5_Final
#define SSEmd5body _SSEmd5body
#define SSESHA1body _SSESHA1body
#define SSEmd4body _SSEmd4body
#define md5cryptsse _md5cryptsse
#endif
	.text
..TXTST0:
# -- Begin sse_debug
....

Here are some timings, testing the original md5-mmx.S, an intrinsic build
using sse-intrinsics.c, and one using a patched sse-intrinsics-32.S.

***Timings md5-mmx.S (original Bartavelle .S code).
dynamic_0: md5($p) (raw-md5)       [32x4 .S]  9925K
dynamic_1: md5($p.$s) (joomla)     [32x4 .S]  Many: 8285K  1salt: 5552K
dynamic_2: md5(md5($p)) (e107)     [32x4 .S]  5502K
dynamic_3: md5(md5(md5($p)))       [32x4 .S]  3792K
dynamic_4: md5($s.$p) (OSC)        [32x4 .S]  Many: 9108K  1salt: 5620K
dynamic_5: md5($s.$p.$s)           [32x4 .S]  Many: 7348K  1salt: 4966K
FreeBSD MD5                        [32/32]    6796 c/s
Raw SHA-1                          [4x]       7568K c/s

***Timings sse-intrinsics.c
dynamic_0: md5($p) (raw-md5)       [16x4x2]   9412K
dynamic_1: md5($p.$s) (joomla)     [16x4x2]   Many: 8243K  1salt: 5594K
dynamic_2: md5(md5($p)) (e107)     [16x4x2]   5363K
dynamic_3: md5(md5(md5($p)))       [16x4x2]   3727K
dynamic_4: md5($s.$p) (OSC)        [16x4x2]   Many: 9311K  1salt: 5739K
dynamic_5: md5($s.$p.$s)           [16x4x2]   Many: 7655K  1salt: 5097K
FreeBSD MD5                        [8x]       14480 c/s
Raw SHA-1                          [8x]       5495K c/s

***Timings sse-intrinsics-32.S (patched for Win32)
dynamic_0: md5($p) (raw-md5)       [10x4x3]   11449K
dynamic_1: md5($p.$s) (joomla)     [10x4x3]   Many: 9953K  1salt: 6270K
dynamic_2: md5(md5($p)) (e107)     [10x4x3]   6763K
dynamic_3: md5(md5(md5($p)))       [10x4x3]   4785K
dynamic_4: md5($s.$p) (OSC)        [10x4x3]   Many: 11447K  1salt: 6455K
dynamic_5: md5($s.$p.$s)           [10x4x3]   Many: 9123K  1salt: 5677K
FreeBSD MD5                        [12x]      21042
Raw SHA-1                          [8x]       7079K c/s

So, compared with the sse-intrinsics.c build, the intrinsic-32.S is about
22% faster for dynamic, about 45% faster for the Crypt(3) MD5, and about
22% faster for raw-SHA1. Compared with the original md5-mmx.S, however, the
intrinsic-32.S is about 20-22% faster for dynamic, about 300% faster for
the Crypt(3) MD5 (only supports SSE2i), and about 6-7% slower for raw-SHA1.

All in all, this is not a bad way to proceed. These are some very
preliminary tests. I have not had time to dig in very deep.

Jim.

>From: magnum [mailto:john.magnum@...hmail.com]
>
>2011-11-09 22:57, magnum wrote:
>> 2011-11-09 18:47, magnum wrote:
>>> 2. Should we support this for 32-bit at all? I suppose I can cross
>>> compile a 32-bit .S file with icc (haven't tried it) but I have no idea
>>> if it will perform better or worse than gcc on a 32-bit machine.
>>> I suppose this should be verified on a machine that is really 32-bit.
>
>A somewhat experimental patch 0004 is now uploaded. It adds a 32-bit
>version of the intrinsics asm file generated by icc, and modifies all
>sse2i targets to utilize it.
>
>Works fine (and does give a performance boost) on linux-x86-64-32-sse2i
>test target. I'd be interested to hear results from testing on Windows.
>
>magnum
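
The four cygwin changes listed in the message (comment out .type, comment out .size, comment out the trailing .section lines, add the UNDERSCORES block) could be scripted during the make. Below is a minimal sketch in Python rather than the perl the message has in mind, and it is not the script that was actually used. Several things are assumptions made for illustration: the function names (patch_asm, comment_out, underscores_block), the use of C-style /* */ comments to disable a line, placing the block right after the first .file line as in the excerpt, and treating "end of file" as the last `tail` lines.

```python
# Symbols copied from the #ifdef UNDERSCORES block quoted in the message.
SYMBOLS = [
    "memcpy", "memset", "strlen",
    "MD5_Init", "MD5_Update", "MD5_Final",
    "SSEmd5body", "SSESHA1body", "SSEmd4body", "md5cryptsse",
]


def underscores_block(symbols=SYMBOLS):
    """Build the preprocessor block that maps each symbol to its _prefixed name."""
    lines = ["#ifdef UNDERSCORES"]
    lines += [f"#define {name} _{name}" for name in symbols]
    lines.append("#endif")
    return lines


def comment_out(line):
    """Disable one directive line. The /* */ style is an assumption, not from the message."""
    return f"/* {line.strip()} */"


def patch_asm(text, tail=10):
    """Apply the four changes to the text of an icc-generated .S file.

    `tail` is a hypothetical knob: only .section lines within the last
    `tail` lines are commented out, as a stand-in for "at end of file".
    """
    src = text.splitlines()
    first_tail = max(0, len(src) - tail)
    out = []
    inserted = False
    for index, line in enumerate(src):
        stripped = line.strip()
        directive = stripped.split(None, 1)[0] if stripped else ""
        if directive in (".type", ".size"):
            out.append(comment_out(line))          # changes 1 and 2
        elif directive == ".section" and index >= first_tail:
            out.append(comment_out(line))          # change 3
        else:
            out.append(line)
        if not inserted and directive == ".file":
            out.extend(underscores_block())        # change 4
            inserted = True
    if not inserted:
        # No .file line found: fall back to the very top of the file.
        out = underscores_block() + out
    return "\n".join(out) + "\n"
```

A make rule would read the generated .S file, pass its text through patch_asm, and write the result under the name used by the cygwin target. The value of `tail` would need checking against the real file, since the message does not say which .section lines are affected.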
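
The percentage comparisons in the message can be recomputed from the posted c/s figures. The following is a small sketch using three rows copied from the timing tables; the dictionary layout, the short build keys ("mmx", "c", "s32") and the helper name pct_faster are made up for this example, and the FreeBSD MD5 figure for the patched build is assumed to be in c/s like the other two.

```python
# c/s figures copied from the three timing tables (K taken as thousands).
# Keys: "mmx" = md5-mmx.S, "c" = sse-intrinsics.c, "s32" = sse-intrinsics-32.S.
TIMINGS = {
    "dynamic_0 (raw-md5)": {"mmx": 9925e3, "c": 9412e3, "s32": 11449e3},
    "FreeBSD MD5":         {"mmx": 6796.0, "c": 14480.0, "s32": 21042.0},
    "Raw SHA-1":           {"mmx": 7568e3, "c": 5495e3, "s32": 7079e3},
}


def pct_faster(new, old):
    """Percent by which `new` exceeds `old`; a negative value means slower."""
    return (new / old - 1.0) * 100.0


if __name__ == "__main__":
    for name, row in TIMINGS.items():
        vs_c = pct_faster(row["s32"], row["c"])
        vs_mmx = pct_faster(row["s32"], row["mmx"])
        print(f"{name:22s} vs .c: {vs_c:+6.1f}%   vs mmx.S: {vs_mmx:+6.1f}%")
```

On these rows, dynamic_0 comes out a little under 22% faster and FreeBSD MD5 about 45% faster than the sse-intrinsics.c build, while raw SHA-1 is between 6% and 7% slower than md5-mmx.S. Note that "percent faster" and "times as fast" differ by 100 points, which matters for the Crypt(3) MD5 comparison against md5-mmx.S.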