Date: Wed, 10 Jun 2015 09:36:51 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Interleaving of intrinsics


> On Jun 9, 2015, at 8:46 PM, Lei Zhang <zhanglei.april@...il.com> wrote:
> 
> I tried to see the 'size' of sse-intrinsics.o under different interleaving factors and compiled by clang and icc respectively.
> 
> lei-mac:src lei$ size clang/*
> __TEXT	__DATA	__OBJC	others	dec	hex
> 122863	0	0	26572	149435	247bb	clang/x1.o
> 127951	0	0	28699	156650	263ea	clang/x2.o
> 128479	0	0	28614	157093	265a5	clang/x3.o
> 127679	0	0	28527	156206	2622e	clang/x4.o
> 
> lei-mac:src lei$ size icc/*
> __TEXT	__DATA	__OBJC	others	dec	hex
> 102084	7545	0	50442	160071	27147	icc/x1.o
> 113012	9799	0	49375	172186	2a09a	icc/x2.o
> 113348	9799	0	51275	174422	2a956	icc/x3.o
> 114740	9799	0	53235	177774	2b66e	icc/x4.o

I forgot to mention that the interleaving factor I experimented with is SIMD_PARA_SHA256.

The corresponding performance of pbkdf2-hmac-sha256 is:

[clang]
x1
Raw:	289 c/s real, 289 c/s virtual
x2
Raw:	271 c/s real, 271 c/s virtual
x3
Raw:	273 c/s real, 273 c/s virtual
x4
Raw:	269 c/s real, 269 c/s virtual

[icc]
x1
Raw:	300 c/s real, 300 c/s virtual
x2
Raw:	235 c/s real, 235 c/s virtual
x3
Raw:	242 c/s real, 242 c/s virtual
x4
Raw:	226 c/s real, 226 c/s virtual

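The degradation from x1 to x2 is much more noticeable for icc (300 to 235 c/s, about 22%) than for clang (289 to 271 c/s, about 6%). icc's __TEXT also grows more over that step (about 11%, vs. about 4% for clang), so it looks like icc is indeed more aggressive about unrolling.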
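For anyone who wants to redo the arithmetic on the x1 -> x2 step, here is a minimal sketch in C. The helper name pct_change is made up for this illustration and is not part of the JtR tree; the inputs are just the figures quoted above.

```c
#include <assert.h>

/*
 * Hypothetical helper (not from John the Ripper): relative change from
 * 'before' to 'after', expressed in percent. Negative means a decrease.
 */
static double pct_change(double before, double after)
{
    assert(before != 0.0);   /* a zero baseline has no relative change */
    return (after - before) / before * 100.0;
}
```

With the numbers above, pct_change(300, 235) comes out at roughly -21.7 (icc speed), pct_change(289, 271) at roughly -6.2 (clang speed), pct_change(102084, 113012) at roughly 10.7 (icc __TEXT) and pct_change(122863, 127951) at roughly 4.1 (clang __TEXT).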

OTOH, when interleaving is increased from x2 to x4, the size of the text segment changes much less than it does from x1 to x2. I don't know why.

> interleaving	loops unrolled
> --------------------------------------
> x1			215
> x2			225
> x3			225
> x4			225

I think 225 - 215 = 10 corresponds to the number of unrolled SHA256_PARA_DOs, which is a bit fewer than the number of SHA256_PARA_DOs actually used in the source code. I manually compared icc's report against the source code, and confirmed that a few SHA256_PARA_DOs are indeed not unrolled. This again suggests that manual unrolling may be needed. Or maybe tweaking the compiler flags can help.
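To make "manual unrolling" concrete, here is a rough sketch of the two forms. It is not the actual code from sse-intrinsics.c: PARA and PARA_DO are hypothetical stand-ins for SIMD_PARA_SHA256 and SHA256_PARA_DO, and plain uint32_t values stand in for the SIMD vectors so that it compiles anywhere. The loop form depends on the compiler unrolling the short loop; the second form spells the iterations out by hand.

```c
#include <assert.h>
#include <stdint.h>

#define PARA 2   /* hypothetical stand-in for SIMD_PARA_SHA256 */

/* Loop form of the interleaving macro (an assumed shape, for illustration). */
#define PARA_DO(i) for ((i) = 0; (i) < PARA; ++(i))

/* Rotate right; n must be in 1..31. */
static inline uint32_t ror32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32 - n));
}

/* SHA-256 small sigma0 over PARA independent inputs, loop form.
 * Whether this loop gets unrolled is up to the compiler. */
static void sigma0_loop(uint32_t out[PARA], const uint32_t in[PARA])
{
    unsigned i;

    PARA_DO(i)
        out[i] = ror32(in[i], 7) ^ ror32(in[i], 18) ^ (in[i] >> 3);
}

/* Same computation, unrolled by hand for PARA == 2. */
static void sigma0_unrolled(uint32_t out[PARA], const uint32_t in[PARA])
{
    static_assert(PARA == 2, "hand-unrolled body is written for PARA == 2");

    out[0] = ror32(in[0], 7) ^ ror32(in[0], 18) ^ (in[0] >> 3);
    out[1] = ror32(in[1], 7) ^ ror32(in[1], 18) ^ (in[1] >> 3);
}
```

Both functions compute the same thing; the only difference is who does the unrolling. The obvious cost of the manual form is that it needs one variant per interleaving factor.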


Lei
