Date: Wed, 26 Aug 2015 10:52:12 +0300
From: Solar Designer <>
Subject: Re: LWS and GWS auto-tuning

On Tue, Aug 25, 2015 at 08:36:44PM +0200, magnum wrote:
> Worst/best 10 for Tahiti (oldoffice failing):
> Ratio:	0.55217 real, 0.56253 virtual	keychain-opencl, Mac OS X Keychain:Raw
> Ratio:	0.69896 real, 0.70076 virtual	agilekeychain-opencl, 1Password Agile Keychain:Raw

There's no slowdown for these two.  I think you saw slowdown because
these formats use OpenMP (in addition to OpenCL) and you didn't set
GOMP_CPU_AFFINITY=0-31.  With this setting, I am getting stable speeds,
and the auto-tuning results and the speeds are the same for old and new
auto-tuning code.
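
For reference, here is a minimal sketch of pinning the OpenMP threads before benchmarking. It assumes GNU libgomp and a host with 32 logical CPUs, as on super. The helper name omp_affinity_range is made up for illustration and is not part of john.

```shell
# Hypothetical helper: print a GOMP_CPU_AFFINITY range covering
# logical CPUs 0..N-1 for a given thread count N.
omp_affinity_range() {
    n=$1
    echo "0-$((n - 1))"
}

# Bind libgomp's threads one per logical CPU (32 assumed here).
GOMP_CPU_AFFINITY=$(omp_affinity_range 32)
export GOMP_CPU_AFFINITY

# Then benchmark as usual in the same shell, e.g.:
#   ./john --test --format=keychain-opencl
```

Setting the variable only in the shell used for the benchmark keeps the binding from leaking into other sessions.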

Without that setting, OpenMP speed at full thread count fluctuates badly
on super.  Unfortunately, we "can't" make this setting the default on
super because it's undesirable e.g. when running two instances of john,
each with a lower OMP_NUM_THREADS.  Or when running an OpenMP-enabled
build with --fork=32 (which disables use of OpenMP, yet IIRC OpenMP's
initialization does the CPU binding anyway... so all 32 child processes
end up fighting for logical CPU 0 only).
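
One way to avoid a global default is to set the binding per instance. This is only a sketch under my own assumptions: the helper instance_env is hypothetical, and it splits the logical CPUs into equal contiguous ranges, which may or may not match the physical core layout (check with lscpu first).

```shell
# Hypothetical helper: print the environment assignments for instance I
# (0-based) out of K equal-sized instances on a host with TOTAL logical CPUs.
instance_env() {
    i=$1 k=$2 total=$3
    per=$((total / k))          # threads (and CPUs) per instance
    first=$((i * per))          # first logical CPU for this instance
    last=$((first + per - 1))   # last logical CPU for this instance
    echo "OMP_NUM_THREADS=$per GOMP_CPU_AFFINITY=$first-$last"
}

instance_env 0 2 32   # prints: OMP_NUM_THREADS=16 GOMP_CPU_AFFINITY=0-15
instance_env 1 2 32   # prints: OMP_NUM_THREADS=16 GOMP_CPU_AFFINITY=16-31
```

Each john command can then be prefixed with `env $(instance_env 0 2 32)` or `env $(instance_env 1 2 32)`, so that the two instances do not share a binding. For a --fork run, the variable would be left unset instead.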

BTW, when running newer Linux kernels on similar hardware, the issue
doesn't arise (well, mostly).  We're running a RHEL6'ish distro on super
primarily for Xeon Phi support.

I've just added "export GOMP_SPINCOUNT=10000" to /etc/profile.d/
on super, which isn't nearly as good as GOMP_CPU_AFFINITY, but doesn't
have the bad side-effects for the scenarios I mentioned above.  It should
greatly reduce the fluctuation of OpenMP benchmarks most of the time.
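
A possible variant of such a profile snippet, as a sketch only: unlike the plain export above, it sets the value only when the user has not already chosen one. The file name is not given in this message, so none is assumed here.

```shell
# Sketch of a profile.d-style snippet (file name intentionally unspecified).
# Use 10000 as the default spin count, but keep any value the user
# already has in the environment.
: "${GOMP_SPINCOUNT:=10000}"
export GOMP_SPINCOUNT
```

A user who wants a different spin count for one session can export GOMP_SPINCOUNT before this snippet is sourced, or simply override it afterwards.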

