Date: Tue, 14 Jul 2015 09:56:33 +0800
From: Lei Zhang <>
Subject: Re: Interleaving of intrinsics

> On Jun 23, 2015, at 2:03 AM, Solar Designer <> wrote:
> One thing that is clear is that non-fully-unrolled *_PARA_DO are not
> acceptable.  If there are not enough registers for fully unrolling
> these without incurring spilling, then the interleaving factor should be
> smaller.  On MIC, there should be enough registers for the interleaving
> factors considered above (up to 5x).

The only mechanism I can find to control the unrolling of a specific loop is '#pragma unroll (n)', which supposedly tells the compiler to unroll the loop by a factor of exactly n. I just tried it on MD5_PARA_DO with icc, clang and gcc, in that order. Below are the sizes of the text segment before and after adding this directive, listed by interleaving factor. All compilers were invoked with -O2.

icc:
factor	before	after
x1	118220	118220
x2	132564	132268
x3	138276	146220
x4	152420	164732

clang:
factor	before	after
x1	117562	117562
x2	124658	125602
x3	131042	133954
x4	136882	143170

gcc:
factor	before	after
x1	124897	124897
x2	124471	124471
x3	131537	131537
x4	138291	138291

It seems icc and clang honored this directive, but gcc simply ignored it: its sizes are identical before and after.
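
For illustration, here is a minimal sketch of how such a directive sits in front of a loop. The function step_para and the macro PARA are hypothetical stand-ins, not the actual MD5_PARA_DO code. As far as I know, gcc treats a plain '#pragma unroll' as an unknown pragma and only accepts its own spelling, '#pragma GCC unroll n', from GCC 8 on, so the sketch selects the spelling per compiler and falls back to no directive at all.

```c
#include <stdint.h>

/* Hypothetical interleaving factor; must match the literal in the pragmas. */
#define PARA 4

/*
 * Toy stand-in for one step of a *_PARA_DO loop: the same operation is
 * applied to PARA independent lanes, so full unrolling lets the compiler
 * interleave the lanes' instructions.
 */
void step_para(uint32_t a[PARA], const uint32_t b[PARA])
{
	int i;

	/* clang and icc define __GNUC__ too, so test for them first. */
#if defined(__clang__) || defined(__INTEL_COMPILER)
#pragma unroll (4)
#elif defined(__GNUC__) && __GNUC__ >= 8
#pragma GCC unroll 4
#endif
	for (i = 0; i < PARA; i++)
		a[i] += (b[i] << 7) | (b[i] >> 25);	/* add b[i] rotated left by 7 */
}
```

Whether the loop really ends up fully unrolled still has to be checked in the generated code, or indirectly through the text segment size as above.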

