Date: Thu, 25 Apr 2013 01:12:19 +0200
From: magnum <john.magnum@...hmail.com>
To: "john-dev@...ts.openwall.com" <john-dev@...ts.openwall.com>
Subject: ICC performance regression

Jim and I have independently rebuilt the intrinsics .S files with the latest icc (13.1.1), with BAD results: there are huge performance regressions.

We could try to find an older version (the last files were built with 12.1.4), but even so, gcc seems to have gotten so much better that the incentive for maintaining the pre-built files is gone now:


Old pre-built files, icc 12.1.4:
Benchmarking: Raw MD4 [128/128 SSE2 intrinsics 12x]... DONE
Raw:	41129K c/s real, 41129K c/s virtual

Benchmarking: Raw MD5 [128/128 SSE2 intrinsics 12x]... DONE
Raw:	31789K c/s real, 31789K c/s virtual

Benchmarking: Raw SHA-1 [128/128 SSE2 intrinsics 8x]... DONE
Raw:	15239K c/s real, 15239K c/s virtual

Benchmarking: FreeBSD MD5 [128/128 SSE2 intrinsics 12x]... DONE
Raw:	39204 c/s real, 39204 c/s virtual


New pre-built files, icc 13.1.1:
Benchmarking: Raw MD4 [128/128 SSE2 intrinsics 12x]... DONE
Raw:	28670K c/s real, 28670K c/s virtual

Benchmarking: Raw MD5 [128/128 SSE2 intrinsics 12x]... DONE
Raw:	20144K c/s real, 20144K c/s virtual

Benchmarking: Raw SHA-1 [128/128 SSE2 intrinsics 8x]... DONE
Raw:	14799K c/s real, 14799K c/s virtual

Benchmarking: crypt-MD5 [128/128 SSE2 intrinsics 12x]... DONE
Raw:	21996 c/s real, 21996 c/s virtual


gcc 4.7.2, plain -64 target:
Benchmarking: Raw MD4 [128/128 SSE2 intrinsics 12x]... DONE
Raw:	42395K c/s real, 42395K c/s virtual

Benchmarking: Raw MD5 [128/128 SSE2 intrinsics 12x]... DONE
Raw:	30036K c/s real, 30036K c/s virtual

Benchmarking: Raw SHA-1 [128/128 SSE2 intrinsics 4x]... DONE
Raw:	17231K c/s real, 17231K c/s virtual

Benchmarking: crypt-MD5 [128/128 SSE2 intrinsics 12x]... DONE
Raw:	35952 c/s real, 35952 c/s virtual


...not to mention that when not using the pre-built files, you can use a -native, -xop or -avx target and get even better results:


gcc 4.7.2, -native target:
Benchmarking: Raw MD4 [128/128 AVX intrinsics 12x]... DONE
Raw:	44632K c/s real, 44632K c/s virtual

Benchmarking: Raw MD5 [128/128 AVX intrinsics 12x]... DONE
Raw:	29950K c/s real, 29950K c/s virtual

Benchmarking: Raw SHA-1 [128/128 AVX intrinsics 4x]... DONE
Raw:	19419K c/s real, 19419K c/s virtual

Benchmarking: crypt-MD5 [128/128 AVX intrinsics 12x]... DONE
Raw:	36936 c/s real, 36936 c/s virtual
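
To put the numbers above side by side, here is a rough sketch that computes the percent change of each build against the old icc 12.1.4 files. The figures are copied from the output quoted in this message; the build labels and helper names are made up for illustration, and it assumes that "FreeBSD MD5" in the old run is the same format the other runs report as "crypt-MD5".

```python
# Benchmark figures copied from the output quoted in this message.
# Raw MD4 / MD5 / SHA-1 are in K c/s; crypt-MD5 is in plain c/s.
# Assumption: "FreeBSD MD5" in the icc 12.1.4 run is the same format
# that the other runs report as "crypt-MD5".
RESULTS = {
    "icc 12.1.4": {
        "Raw MD4": 41129, "Raw MD5": 31789, "Raw SHA-1": 15239, "crypt-MD5": 39204,
    },
    "icc 13.1.1": {
        "Raw MD4": 28670, "Raw MD5": 20144, "Raw SHA-1": 14799, "crypt-MD5": 21996,
    },
    "gcc 4.7.2 -64": {
        "Raw MD4": 42395, "Raw MD5": 30036, "Raw SHA-1": 17231, "crypt-MD5": 35952,
    },
    "gcc 4.7.2 -native": {
        "Raw MD4": 44632, "Raw MD5": 29950, "Raw SHA-1": 19419, "crypt-MD5": 36936,
    },
}

BASELINE = "icc 12.1.4"  # hypothetical label for the old pre-built files


def percent_change(new, old):
    """Relative change of new versus old, in percent (negative = slower)."""
    return (new - old) / old * 100.0


def relative_changes(results, baseline):
    """Map each non-baseline build to {format: percent change vs baseline}."""
    base = results[baseline]
    return {
        build: {fmt: percent_change(speed, base[fmt]) for fmt, speed in speeds.items()}
        for build, speeds in results.items()
        if build != baseline
    }


if __name__ == "__main__":
    for build, changes in relative_changes(RESULTS, BASELINE).items():
        print(build)
        for fmt, pct in changes.items():
            print(f"  {fmt:10s} {pct:+6.1f}%")
```

On these figures, the icc 13.1.1 build comes out roughly 30% slower than the icc 12.1.4 build on Raw MD4 and over 40% slower on crypt-MD5, while both gcc builds are either ahead of the old icc numbers or within about 10% of them. Note that the SHA-1 runs use different interleave factors (8x for icc, 4x for gcc), so that row is not a like-for-like comparison.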


Personally, I think we should just drop the -i targets, but Jim will investigate some more.

magnum
