Date: Sun, 16 Aug 2015 23:04:17 +0300
From: Solar Designer <>
Subject: Re: FMT_OMP_BAD

On Sun, Aug 16, 2015 at 08:05:42PM +0300, Solar Designer wrote:
> In this test, I compared an OpenMP-enabled build at different thread
> counts (1 vs. 10).  It would also be relevant to compare these against a
> non-OpenMP build.  For some formats, it may show substantially different
> numbers than we're seeing for 1 thread in an OpenMP-enabled build (and
> this will suggest there's a need for optimization for the slower one of
> these two cases).  In fact, for deciding on where to add
> FAST_FORMATS_OMP checks, a comparison of non-OpenMP vs. 10 threads would
> be more relevant than the above comparison for 1 vs. 10 threads.

I just ran non-OpenMP benchmarks as well.  Here's the comparison of
non-OpenMP vs. 1 thread on super:

Number of benchmarks:           403
Minimum:                        0.55939 real, 0.55939 virtual
Maximum:                        1.12820 real, 1.12820 virtual
Median:                         0.99948 real, 0.99922 virtual
Median absolute deviation:      0.01257 real, 0.01305 virtual
Geometric mean:                 0.98630 real, 0.98600 virtual
Geometric standard deviation:   1.05860 real, 1.05886 virtual

Median and mean are pretty close to 1.0, which is good.  However, there
are some outliers.  I've attached the output of:

./relbench -v a0 a1 | grep ^Ratio: | sort -nk2 > nonvsomp.txt

where a0 was non-OpenMP, and a1 was OpenMP with 1 thread.

The worst performance impact of enabling OpenMP is seen for:

Ratio:  0.55939 real, 0.55939 virtual   dynamic_1400:Raw
Ratio:  0.64531 real, 0.64531 virtual   dynamic_1401:Only one salt

This is followed by my bitslice DES formats, for which the impact is
20%.  That's pretty bad.  They start collecting much larger groups of
candidate passwords when OpenMP is enabled, and this may result in a
higher cache miss rate.  That doesn't explain why the performance impact
is roughly the same regardless of iteration count, though.  There must
be something else.  I'll need to check whether this same performance
impact is seen in the core tree.

The highest performance improvement is seen for:

Ratio:  1.12820 real, 1.12820 virtual   Fortigate, FortiOS:Many salts
Ratio:  1.07760 real, 1.07760 virtual   kwallet, KDE KWallet:Raw
Ratio:  1.07405 real, 1.07405 virtual   EPI, EPiServer SID:Only one salt
Ratio:  1.07247 real, 1.07247 virtual   SSHA512, LDAP:Many salts
Ratio:  1.07125 real, 1.07125 virtual   Tiger:Raw
Ratio:  1.06730 real, 1.06730 virtual   sapb, SAP CODVN B (BCODE):Only one salt
Ratio:  1.06550 real, 1.05489 virtual   dahua, "MD5 based authentication" Dahua:Raw
Ratio:  1.06022 real, 1.06022 virtual   has-160:Raw

which might suggest that these formats need an equivalent of OMP_SCALE,
and thus a higher max_keys_per_crypt, even for non-OpenMP builds.
Someone might want to explore this.

Overall, I am relieved these performance differences aren't much worse.


View attachment "nonvsomp.txt" of type "text/plain" (25851 bytes)
