Date: Thu, 11 Jun 2015 09:30:35 +0200
From: magnum <>
Subject: Re: Interleaving of intrinsics

On 2015-06-11 04:11, Lei Zhang wrote:
> I manually checked the report icc generated for the interleaving x2
> build. By checking the line numbers of the unrolled loops in the
> report, I can tell whether a specific loop in the source is unrolled.
> There are 13 instances of SHA256_PARA_DO in SSESHA256body. According
> to icc's report, 10 of them are fully unrolled.
> In addition, there are 64 instances of SHA256_STEP, each of which
> invokes SHA256_PARA_DO. But according to the report, none of them
> are unrolled.
> So in total there are 13 + 64 = 77 loops contributed by
> SHA256_PARA_DO, but only 10 of them are unrolled. That doesn't look
> good.
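To make the loop counting concrete, here is a toy C sketch of how a PARA_DO-style macro produces one small loop per expansion. Everything in it is hypothetical: `PARA`, `PARA_DO`, `STEP`, `toy_body` and `check_toy_body` are stand-ins loosely modeled on the names above, not the actual John the Ripper macros, and plain `uint32_t` lanes replace SIMD vectors so it compiles anywhere. Each `PARA_DO` and each `STEP` expansion is a separate `PARA`-iteration loop, and the compiler decides per loop whether to unroll it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical interleaving factor; 2 matches the "x2" build discussed. */
#define PARA 2

/* Hypothetical PARA_DO-style macro: each expansion is a separate small
   loop over the interleaved lanes, and the compiler decides per loop
   whether to fully unroll it. */
#define PARA_DO(x) for ((x) = 0; (x) < PARA; (x)++)

/* Hypothetical STEP macro that expands to its own PARA_DO loop, so every
   STEP written out in the source adds one more loop. */
#define STEP(state, w)                               \
    do {                                             \
        unsigned int k_;                             \
        PARA_DO(k_)                                  \
            (state)[k_] = ((state)[k_] << 1) ^ (w);  \
    } while (0)

/* Toy body: one direct PARA_DO loop plus three STEP expansions, i.e.
   four separate loops (the report above counts 13 + 64 = 77 in
   SSESHA256body). */
static void toy_body(uint32_t state[PARA])
{
    unsigned int i;

    PARA_DO(i)
        state[i] = 0x6a09e667u + i;

    STEP(state, 1u);
    STEP(state, 2u);
    STEP(state, 3u);
}

/* Self-check: shifting distributes over XOR, so the three steps reduce
   to (init << 3) ^ (1 << 2) ^ (2 << 1) ^ 3, which is (init << 3) ^ 3. */
static void check_toy_body(void)
{
    uint32_t state[PARA];
    unsigned int i;

    toy_body(state);
    PARA_DO(i)
        assert(state[i] == ((uint32_t)((0x6a09e667u + i) << 3) ^ 3u));
}
```

Matching the line numbers in a compiler optimization report against each `STEP` line of something like this is one way to see which individual loops were unrolled.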

Now we're getting somewhere. What if you build the "unrolled" topic
branch instead, using para 2? (I don't think I've added code for higher
para yet.) That code is manually unrolled. How many vmovdqu instructions
can you see in it? Do you see any other differences compared to the
bleeding code at the same para?
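For comparison, a minimal sketch of what "manually unrolled" means at para 2, assuming the same loop-over-lanes structure: `mix_loop` leaves the lane loop to the compiler, while `mix_unrolled2` writes both lanes out by hand. The names, the toy rotate-and-add update (`mix1`) and the constants are illustrative only, not code from the "unrolled" branch, and plain `uint32_t` lanes again stand in for vectors. `forms_agree` checks that both forms compute the same result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical para factor; the hand-unrolled form only covers 2. */
#define PARA 2
#define PARA_DO(x) for ((x) = 0; (x) < PARA; (x)++)

/* The hand-unrolled variant below is only correct for two lanes. */
static_assert(PARA == 2, "mix_unrolled2 assumes para 2");

/* Toy lane update: rotate left by 7, then add a word. */
static uint32_t mix1(uint32_t a, uint32_t w)
{
    return ((a << 7) | (a >> 25)) + w;
}

/* Loop form: the compiler must choose to unroll the lane loop. */
static void mix_loop(uint32_t a[PARA], const uint32_t w[PARA])
{
    unsigned int k;

    PARA_DO(k)
        a[k] = mix1(a[k], w[k]);
}

/* Manually unrolled form: each lane is written out explicitly, so
   there is no lane loop left for the compiler to unroll. */
static void mix_unrolled2(uint32_t a[2], const uint32_t w[2])
{
    a[0] = mix1(a[0], w[0]);
    a[1] = mix1(a[1], w[1]);
}

/* Runs both forms for 64 rounds from the same start state and
   returns 1 if every lane ends up identical. */
static int forms_agree(void)
{
    uint32_t x[PARA] = { 0x6a09e667u, 0xbb67ae85u };
    uint32_t y[PARA];
    const uint32_t w[PARA] = { 0x428a2f98u, 0x71374491u };
    int r;

    memcpy(y, x, sizeof x);
    for (r = 0; r < 64; r++) {
        mix_loop(x, w);
        mix_unrolled2(y, w);
    }
    return memcmp(x, y, sizeof x) == 0;
}
```

Because the unrolled form is tied to two lanes, each higher para value would need its own hand-written variant; the `static_assert` makes a mismatch fail at compile time instead of silently skipping lanes.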

