Date: Sun, 1 Jan 2012 10:11:55 +0400
From: Solar Designer <>
Subject: Re: DES with OpenMP

On Sat, Dec 31, 2011 at 08:32:43PM +0000, Alex Sicamiotis wrote:
> I read what you said about icc improving per core speed, well I hadn't thought of the difference because non-openMP builds produce similar results due to the asm code. I just did a run of the openMP build with OMP_NUM_THREADS=1. It seems you are right. I got 4527k vs 4330-4350 of the best GCC/ICC/Open64 builds I have.... this is +4% in single core cracking speed. Not bad at all. Apparently icc is better even than the asm code. So its efficiency is actually 8707 / (4527X2) = ~96.1%.

Thanks for the info.  Yes, icc is really good.  Also, you probably have
it tuned for your specific CPU model, whereas in the supplied assembly
code I couldn't reasonably focus on just one CPU model.
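
For reference, the efficiency figure quoted above is just the multi-thread
rate divided by the single-thread rate times the thread count.  A minimal
sketch of that arithmetic follows; the function name omp_efficiency is made
up for illustration, and the inputs are the k c/s figures from the quoted
text:

```c
/*
 * Scaling efficiency of an OpenMP build: the rate achieved with all threads
 * divided by what perfect scaling of the 1-thread rate would give.
 * omp_efficiency is a hypothetical helper, not part of John the Ripper.
 */
double omp_efficiency(double multi_cs, double single_cs, int threads)
{
    return multi_cs / (single_cs * threads);
}
```

With the reported numbers, omp_efficiency(8707.0, 4527.0, 2) comes out at
roughly 0.96.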

> Btw, I also found very small gains by tweaking the linux kernel... I'm using opensuse and opensuse has two kernels... desktop (low latency, more timeslicing) and server (higher latency, more processing throughput). Server gave ~+30k c/s peak relative to desktop, and a custom built kernel for my cpu, less debug code etc, gave another 10k c/s relative to Server. I was kind of disappointed though. 5 or 6 years ago, in the Athlon XP days, latency was more critical for performance. IIRC, I was around 920k c/s with low-latency kernel, 970-980k c/s with server kernel and >1.020.000 in win32 with cygwin - but the system was clearly less responsive than even the server kernel of linux. I later read somewhere that XP uses a timer frequency of 100Hz (linux server kernel = 250, linux desktop kernel = 1000Hz). The increased hz frequency is more disruptive, and increased timeslicing leads to more cache-misses. Somehow, today's kernels only show marginal differences :|

I think the differences you observed with the Athlon were caused by
something else.  In fact, even the +30k c/s that you report now sounds
excessive to me.  Back in the 1990s with 100 to 200 MHz original Pentium
CPUs, I measured a difference of around 1% between 100 Hz and 1000 Hz
timer frequency in Linux (custom patch for Linux 2.0.x kernels).  This
should be like 0.1% with a 1 GHz CPU.
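
The scaling behind that estimate can be sketched as below.  This is only a
rough model, and it rests on an assumption that is not measured here: that
each timer interrupt costs about the same number of CPU cycles regardless of
clock rate (cache effects are ignored).  The function names are made up for
illustration.

```c
/*
 * Rough model of timer interrupt overhead.  ASSUMPTION: a fixed cost in CPU
 * cycles per timer tick, independent of clock rate.  Both helpers are
 * hypothetical, for illustration only.
 */

/* Cycles per tick implied by a measured overhead fraction. */
double cycles_per_tick(double overhead, double extra_ticks_per_sec,
                       double cpu_hz)
{
    return overhead * cpu_hz / extra_ticks_per_sec;
}

/* Overhead fraction for a given per-tick cost on a given CPU. */
double timer_overhead(double extra_ticks_per_sec, double cycles,
                      double cpu_hz)
{
    return extra_ticks_per_sec * cycles / cpu_hz;
}
```

Going from 100 Hz to 1000 Hz adds 900 ticks per second.  A 1% cost at
100 MHz then implies about 1100 cycles per tick, and feeding that back in
for a 1 GHz CPU gives about 0.1%.  Starting from 200 MHz instead gives
about 0.2%.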

My guess is that you have some user-space processes running - perhaps
they're part of a GUI desktop.

As to the Cygwin build, clearly it was different from the Linux build in
several ways (relative placement of variables and pieces of code, etc. -
maybe more lucky in your case).

> For the time being, I'm absolutely "settled" with the icc openmp version. It practically eliminates the need for two sessions of john, except when I'm running KDE desktop. Then the speed falls from 8600 k c/s to 8200-8300 k c/s, and ~7500k c/s when having mp3s playing or stuff. That's when I prefer 2 concurrent non-openMP versions. Even a 2% load from mp3s, plasma and stuff is very disruptive to openMP, despite nice values of minus 11 to minus 15 in a "server" kernel.

Yes, OpenMP is very sensitive to other load.  You may try to mitigate
this to some extent by tweaking the GOMP_SPINCOUNT environment variable.

You may also try adding schedule(dynamic) to the relevant #pragma line.
This may make things slightly slower on an otherwise idle system, but
slightly faster when there's other load.
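
To illustrate where the clause goes, here is a minimal sketch.  It is not
the actual loop from John the Ripper: mix32 and process_all are made-up
names standing in for the per-item work and the parallel loop.  With
schedule(dynamic), a thread that gets delayed by other load simply takes
fewer iterations, instead of holding up the rest with a fixed share of the
work.

```c
/*
 * Sketch of an OpenMP loop using schedule(dynamic).  mix32 and process_all
 * are hypothetical stand-ins, not code from John the Ripper.  Build with
 * OpenMP enabled (e.g. gcc -fopenmp); without it the pragma is ignored and
 * the loop runs serially with the same results.
 */

/* Stand-in for one independent unit of work. */
static unsigned int mix32(unsigned int x)
{
    x ^= x >> 16;
    x *= 0x45d9f3bU;
    x ^= x >> 16;
    return x;
}

/* Each iteration is independent, so any schedule gives the same output. */
void process_all(const unsigned int *in, unsigned int *out, int count)
{
    int i;

#pragma omp parallel for schedule(dynamic)
    for (i = 0; i < count; i++)
        out[i] = mix32(in[i]);
}
```

The default chunk size for schedule(dynamic) is one iteration.  If the
per-iteration work is small, a larger chunk such as schedule(dynamic, 16)
may reduce scheduling overhead; the right value would have to be found by
benchmarking.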

