Date: Tue, 14 Jul 2015 12:11:16 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Interleaving of intrinsics


> On Jun 23, 2015, at 2:03 AM, Solar Designer <solar@...nwall.com> wrote:
> 
> One thing that is clear is that non-fully-unrolled *_PARA_DO are not
> acceptable.  If there are not enough registers for fully unrolling
> these without incurring spilling, then the interleaving factor should be
> smaller.  On MIC, there should be enough registers for the interleaving
> factors considered above (up to 5x).

I've just manually unrolled SHA256_STEP and SHA512_STEP and compared their performance against the auto-unrolled versions, using magnum's testpara.pl. The figures below were obtained on my laptop (the formats are pbkdf2-*):

[auto]
hash\para  |       1  |       2  |       3  |       4  |       5  |
-----------|----------|----------|----------|----------|----------|
sha256     |  **4020**|    3760  |    3924  |    3801  |    3940  |
sha512     |  **1624**|    1092  |    1413  |    1409  |    1435  |

[manual]
hash\para  |       1  |       2  |       3  |       4  |       5  |
-----------|----------|----------|----------|----------|----------|
sha256     |  **4144**|    1888  |    1817  |    1837  |    1821  |
sha512     |  **1646**|     748  |     708  |     720  |     722  |
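The size of the drop can be quantified by dividing each figure by the x1 figure in its row. Below is a minimal C sketch of that arithmetic; the array and function names are illustrative only, and the numbers are simply the ones from the two tables above (in whatever speed units testpara.pl reports).

```c
#include <assert.h>

/* Speeds copied from the [auto] and [manual] tables above,
   indexed by interleaving factor minus one (para 1..5). */
static const int auto_sha256[5]   = {4020, 3760, 3924, 3801, 3940};
static const int auto_sha512[5]   = {1624, 1092, 1413, 1409, 1435};
static const int manual_sha256[5] = {4144, 1888, 1817, 1837, 1821};
static const int manual_sha512[5] = {1646,  748,  708,  720,  722};

/* Speed at interleaving factor `para` as a whole-number percentage of
   the x1 speed (integer division, so the result is rounded down). */
static int percent_of_x1(const int speeds[5], int para)
{
    assert(para >= 1 && para <= 5);
    return speeds[para - 1] * 100 / speeds[0];
}
```

With these numbers, manual sha256 and manual sha512 at x2 both come out at 45% of their x1 speed, whereas auto sha256 at x2 keeps 93% and auto sha512 at x2 keeps 67%.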

With manual unrolling, performance drops drastically going from interleaving x1 to x2 (to less than half of the x1 speed), but hardly changes beyond that. BTW, I didn't change the original tmps array; only the loop is manually unrolled here.
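To illustrate what "unrolling the loop while keeping the tmps array" means, here is a minimal sketch in plain C. It is not the actual SHA256_STEP code: PARA, PARA_DO, step_loop and step_unrolled are hypothetical names, plain 32-bit scalars stand in for SIMD vectors, and only a Sigma1-plus-constant fragment of a SHA-256 round is shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical interleaving factor; stands in for a build-time PARA setting. */
#define PARA 2

/* Hypothetical lane loop, modeled on (not copied from) the *_PARA_DO macros. */
#define PARA_DO(i) for ((i) = 0; (i) < PARA; (i)++)

/* C11 static_assert (from assert.h): the unrolled version below is
   written out for exactly two lanes. */
static_assert(PARA == 2, "step_unrolled assumes PARA == 2");

static uint32_t rotr32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32 - n));
}

/* SHA-256 Sigma1 on a plain 32-bit word. */
static uint32_t sigma1(uint32_t x)
{
    return rotr32(x, 6) ^ rotr32(x, 11) ^ rotr32(x, 25);
}

/* Looped form: one pass over the PARA lanes, results stored in tmps[]. */
static void step_loop(uint32_t tmps[PARA], const uint32_t e[PARA], uint32_t k)
{
    int i;
    PARA_DO(i)
        tmps[i] = sigma1(e[i]) + k;
}

/* Manually unrolled form: the lane loop is written out by hand,
   but tmps[] is still an array indexed per lane. */
static void step_unrolled(uint32_t tmps[PARA], const uint32_t e[PARA], uint32_t k)
{
    tmps[0] = sigma1(e[0]) + k;
    tmps[1] = sigma1(e[1]) + k;
}
```

Both forms compute the same values; the difference is only in how the code is presented to the compiler, which then decides scheduling and register allocation for each form.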


Lei
