john-dev - Re: Interleaving of intrinsics

Message-ID: <513a4b9826006c9659f32134e87004a6@smtp.hushmail.com>
Date: Wed, 10 Jun 2015 19:19:58 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Interleaving of intrinsics

On 2015-06-10 17:59, Lei Zhang wrote:
> I further did some investigation into the asm code generated under x1
> & x2 (SIMD_PARA_SHA256) by icc on my laptop (AVX). In SSESHA256body,
> there're about 200 vmovdqu instructions generated under x1, and the
> number is 260 under x2. Most of the vmovdqu instructions seem to be
> used for loading & storing xmm registers, only a few for
> inter-register moving. I think it's likely those additional vmovdqu
> instructions under x2 are for register spilling.

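So we get 30% more loads/stores for 100% more work done. That should be
a win! But this assumes there are no actual loops in the code: these are
static instruction counts, and they only match what gets executed if the
body is straight-line code.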
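
To make the arithmetic explicit, here is a rough sketch using the approximate vmovdqu counts quoted above (200 under x1, 260 under x2). The helper name moves_per_unit is made up for illustration, and the model assumes every counted instruction executes exactly once per call, i.e. fully unrolled code with no loops.

```python
# Approximate static vmovdqu counts in SSESHA256body, taken from the
# quoted message (icc, AVX). They are not measured execution counts.
X1_MOVES = 200  # SIMD_PARA_SHA256 = 1
X2_MOVES = 260  # SIMD_PARA_SHA256 = 2


def moves_per_unit(static_moves: int, para: int) -> float:
    """Load/store instructions per unit of work.

    Hypothetical helper: assumes straight-line code, so each counted
    instruction runs once per call, and that a call does `para` units
    of work (one per interleaved SIMD vector).
    """
    if para <= 0:
        raise ValueError("para must be positive")
    return static_moves / para


# Relative growth in load/store instructions going from x1 to x2.
extra = (X2_MOVES - X1_MOVES) / X1_MOVES

# Load/store instructions per unit of work at each interleave factor.
per_unit_x1 = moves_per_unit(X1_MOVES, 1)
per_unit_x2 = moves_per_unit(X2_MOVES, 2)

# Fraction of load/store traffic saved per unit of work under x2.
saving = 1 - per_unit_x2 / per_unit_x1

print(f"extra moves: {extra:.0%}, saving per unit of work: {saving:.0%}")
```

Under that assumption, x2 costs about 130 moves per unit of work against 200 for x1. If the compiler keeps real loops, the spills may sit inside a loop body and the static counts no longer tell us much.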

magnum
