Date: Tue, 9 Jun 2015 00:24:49 +0800
From: Lei Zhang <>
Subject: Re: Interleaving of intrinsics

> On Jun 6, 2015, at 7:47 PM, Solar Designer <> wrote:
> Your use of VTune appears to be similar to use of gprof.  If you use
> VTune at all, I'd expect you to profile things such as cache misses and
> pipeline stalls, as well as utilization of the CPU's execution units.
> Things that only the CPU vendor's profiler is capable of.

I played with VTune for a while and gathered some more statistics. There are so many micro-architecture metrics that it's a bit overwhelming. I picked a few metrics that VTune marked as non-optimal for some interleaving factors and list them here:
(figures followed by * are marked as non-optimal by VTune; x1/2/3/4 denote interleaving factors)

Configurations: icc, non-OpenMP, Linux VM, --test=20 --format=pbkdf2-hmac-sha256

Filled Pipeline Slots -> Retirement
x1	0.613*
x2	0.658*
x3	0.620*
x4	0.592
(This metric represents a fraction of slots during which CPU was retiring uOps not originated from the Microcode Sequencer)

Unfilled Pipeline Slots -> Back-End Bound
x1	0.355*
x2	0.246*
x3	0.342*
x4	0.338*
(Identify slots where no uOps are delivered due to a lack of required resources for accepting more uOps in the back-end of pipeline)

Unfilled Pipeline Slots -> Front-End Bound -> Cache Misses
x1	0.004
x2	0.003
x3	0.024*
x4	0.018*
(A proportion of instruction fetches are missing in the instruction cache)
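
To read the first two tables side by side, here is a small Python sketch with the figures above copied in by hand. It assumes both metrics are fractions of the same total of pipeline slots, so 1 minus their sum is what is left for everything else (front-end stalls, bad speculation, and any retirement not counted in the first table). That assumption and all names in the script are mine, not something VTune outputs.

```python
# Figures copied by hand from the tables above (fractions of pipeline slots).
retirement = {"x1": 0.613, "x2": 0.658, "x3": 0.620, "x4": 0.592}
back_end_bound = {"x1": 0.355, "x2": 0.246, "x3": 0.342, "x4": 0.338}


def remaining_slots(retiring, back_end):
    """Fraction of slots covered by neither figure, per interleaving factor.

    Assumes both inputs are fractions of the same slot total.
    """
    return {k: round(1.0 - retiring[k] - back_end[k], 3) for k in retiring}


remainder = remaining_slots(retirement, back_end_bound)
best_retirement = max(retirement, key=retirement.get)
lowest_back_end = min(back_end_bound, key=back_end_bound.get)

for factor in sorted(retirement):
    print(factor, retirement[factor], back_end_bound[factor], remainder[factor])
print("highest retirement:", best_retirement)
print("lowest back-end bound:", lowest_back_end)
```

On these figures x2 has both the highest retirement fraction and the lowest back-end bound fraction. This only compares slot fractions; it says nothing by itself about the actual c/s rates.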

The full reports are attached and contain more detailed metrics. Some near-zero figures might be inaccurate, though: for those, VTune reports that too few samples were collected.

Honestly, I don't have a very solid understanding of micro-architecture and couldn't interpret many of those metrics. Maybe you can get some hints out of them.


View attachment "report-x1.txt" of type "text/plain" (2123 bytes)

View attachment "report-x2.txt" of type "text/plain" (2123 bytes)

View attachment "report-x3.txt" of type "text/plain" (2122 bytes)

View attachment "report-x4.txt" of type "text/plain" (2123 bytes)
