Date: Thu, 11 Jun 2015 09:30:35 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Interleaving of intrinsics

On 2015-06-11 04:11, Lei Zhang wrote:
> I manually checked the report given by icc under interleaving x2. By
> checking the line number of the unrolled loops in the report, I can
> tell if a specific loop in the source is unrolled.
>
> There're 13 instances of SHA256_PARA_DO in SSESHA256body. According
> to icc's report, 10 of them are fully unrolled.
>
> In addition, there're 64 instances of SHA256_STEP, which in turn
> invokes SHA256_PARA_DO. But none of them are unrolled according to
> the report.
>
> So there're in total 13 + 64 = 77 loops contributed by
> SHA256_PARA_DO, but only 10 of them are unrolled. That doesn't look
> good.

Now we're getting somewhere. What if you build the "unrolled" topic
branch instead, using para 2? (I think I didn't add code for higher
para yet.) That one is manually unrolled. How many vmovdqu can you see
in that? Do you see other differences compared to the bleeding code
(at the same para)?

magnum
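
For readers following along, here is a minimal sketch of the difference being discussed: a "para" loop that relies on the compiler to unroll it, next to the same work written out by hand. It is not the actual John the Ripper code. PARA, PARA_DO, sigma0_loop and sigma0_unrolled are hypothetical names, plain uint32_t stands in for SIMD vectors, and the sketch assumes a PARA_DO-style macro is just a counted for loop.

```c
#include <stdint.h>

/* Hypothetical stand-in for the interleaving factor ("para 2"). */
#define PARA 2

/* Assumed shape of a PARA_DO-style macro: a plain counted loop
 * that the compiler may or may not unroll. */
#define PARA_DO(i) for ((i) = 0; (i) < PARA; (i)++)

/* Rotate right; n must be in 1..31 here. */
static uint32_t rotr32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32 - n));
}

/* Loop form: SHA-256 small sigma0 applied to each interleaved lane. */
void sigma0_loop(uint32_t out[PARA], const uint32_t in[PARA])
{
    unsigned i;
    PARA_DO(i)
        out[i] = rotr32(in[i], 7) ^ rotr32(in[i], 18) ^ (in[i] >> 3);
}

/* Manually unrolled form for PARA == 2: same result, no loop left
 * for the compiler to decide about. */
void sigma0_unrolled(uint32_t out[PARA], const uint32_t in[PARA])
{
    out[0] = rotr32(in[0], 7) ^ rotr32(in[0], 18) ^ (in[0] >> 3);
    out[1] = rotr32(in[1], 7) ^ rotr32(in[1], 18) ^ (in[1] >> 3);
}
```

Both functions compute the same values. The point of comparing them is only the generated code: with the loop form you depend on the compiler's unrolling decision at each of the many macro sites, while the unrolled form removes that variable.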