Date: Wed, 10 Jun 2015 19:19:58 +0200
From: magnum <>
Subject: Re: Interleaving of intrinsics

On 2015-06-10 17:59, Lei Zhang wrote:
> I further did some investigation into the asm code generated under x1
> & x2 (SIMD_PARA_SHA256) by icc on my laptop (AVX). In SSESHA256body,
> there're about 200 vmovdqu instructions generated under x1, and the
> number is 260 under x2. Most of the vmovdqu instructions seem to be
> used for loading & storing xmm registers, only a few for
> inter-register moving. I think it's likely those additional vmovdqu
> instructions under x2 are for register spilling.

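So we get 30% more loads/stores for 100% more work done. That should be
a win! But this assumes the code is fully unrolled: if there are actual
loops, the static instruction count doesn't tell us how many of those
moves actually execute.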
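
For reference, the arithmetic behind that estimate can be sketched in a few lines of C. This is only a back-of-the-envelope helper, not anything from the JtR tree: the function names are made up for illustration, and the inputs are the approximate static vmovdqu counts quoted above (200 at x1, 260 at x2), which only reflect executed instructions if the body is straight-line code.

```c
/* Hypothetical helpers (not part of JtR) for comparing static
 * instruction counts between two SIMD_PARA_SHA256 settings. */

/* Percentage increase in instruction count going from x1 to x2. */
static double pct_increase(int count_x1, int count_x2)
{
    return 100.0 * (count_x2 - count_x1) / count_x1;
}

/* Instructions per interleaved SIMD vector at a given para setting. */
static double per_vector(int count, int para)
{
    return (double)count / para;
}
```

With the quoted counts, pct_increase(200, 260) gives 30, and per_vector() gives 200 moves per vector at x1 versus 130 at x2, i.e. fewer loads/stores per unit of work even with the suspected spilling.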

