john-users - Re: comparing two benchmarks

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20111107220005.GA3087@openwall.com>
Date: Tue, 8 Nov 2011 02:00:05 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: comparing two benchmarks

On Mon, Nov 07, 2011 at 01:10:23AM -0800, firstname lastname wrote:
> Thanks Alex for this Perl Script. Will go through the code and try to understand it. Also, will try out the script and see what it does! :)

Attached is second revision of the script.

The idea to calculate and report the usual standard deviation wasn't a
very good one for two reasons.  First, the value being reported was
still not exactly standard deviation under the usual definition (even
after my fix) because it was relative to the geometric mean rather than
to the expected value (which is the same as the arithmetic mean when the
weights are equal).  Second, the usual standard deviation isn't a good
measure to consider in this context (even if calculated properly).  The
script will now report the geometric standard deviation instead.  It
will also report several other measures.  Sample output:

$ ./relbench.pl asis asis  
Number of benchmarks:           158
Minimum:                        1.00000 real, 1.00000 virtual
Maximum:                        1.00000 real, 1.00000 virtual
Median:                         1.00000 real, 1.00000 virtual
Median absolute deviation:      0.00000 real, 0.00000 virtual
Geometric mean:                 1.00000 real, 1.00000 virtual
Geometric standard deviation:   1.00000 real, 1.00000 virtual

$ ./relbench.pl asis Os
Number of benchmarks:           158
Minimum:                        0.52807 real, 0.52807 virtual
Maximum:                        1.12544 real, 1.12007 virtual
Median:                         0.92055 real, 0.91886 virtual
Median absolute deviation:      0.05072 real, 0.05383 virtual
Geometric mean:                 0.91469 real, 0.91521 virtual
Geometric standard deviation:   1.09400 real, 1.09357 virtual

The relevant concepts are explained here:

http://en.wikipedia.org/wiki/Median
http://en.wikipedia.org/wiki/Median_absolute_deviation
http://en.wikipedia.org/wiki/Geometric_mean
http://en.wikipedia.org/wiki/Geometric_standard_deviation

Here's why we should in fact use the geometric mean (and not the
arithmetic mean or some other mean):

http://en.wikipedia.org/wiki/Geometric_mean#Properties

"... This makes the geometric mean the only correct mean when averaging
normalized results, that is results that are presented as ratios to
reference values.  This is the case when presenting computer performance
with respect to a reference computer ..."
"... using the arithmetic or harmonic mean would change the ranking of
the results depending on what is used as a reference."
"In all cases, the ranking given by the geometric mean stays the same as
the one obtained with unnormalized values."

We have to use normalized values because the hashes and ciphers being
benchmarked have very different performance.

As to the median and the median absolute deviation, these are more
robust when there are outliers:

http://en.wikipedia.org/wiki/Robust_statistics#Examples_of_robust_and_non-robust_statistics
http://en.wikipedia.org/wiki/Outlier

I recommend that we primarily use the geometric mean, but also look at
the median as a sanity check.  When they're very different from each
other (not the case in sample output above), this indicates that there's
at least one outlier that we may want to exclude from the script's input
(and consider it separately).  Here's an example:

$ ./relbench.pl asis hacked
Number of benchmarks:           158
Minimum:                        1.00000 real, 1.00000 virtual
Maximum:                        1000.00000 real, 1.00000 virtual
Median:                         1.00000 real, 1.00000 virtual
Median absolute deviation:      0.00000 real, 0.00000 virtual
Geometric mean:                 1.04469 real, 1.00000 virtual
Geometric standard deviation:   1.72946 real, 1.00000 virtual

In this case, I edited the output of one of the 158 benchmarks making it
appear 1000 times faster (in terms of "real c/s" only).  The geometric
mean shows this as a 4.5% overall speedup and it also shows relatively
high geometric standard deviation.  The medians simply ignore it (as an
outlier).

Alexander

View attachment "relbench.pl" of type "text/plain" (3181 bytes)
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.