Date: Wed, 26 Oct 2011 01:49:35 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: new DES key setup

magnum, Erik -

Thank you for testing!

There's 1.7.8.5 in CVS now, with some very minor changes over 1.7.8.4:
restored portability to ancient systems (it compiles with gcc 2.7.2.3 on
Slackware 3.x again) and the addition of -Os to OPT_INLINE (although one
has to remove both -Os and -lcrypt when compiling on those ancient systems).

-Os appears to deal with the performance regression we saw with gcc 4.6 -
not only on x86-64/SSE2 builds, but also on several others.  I was not
able to trace this to a specific -f* option or to a parameter.  For
example, these commands:

gcc -Os -Q --help=optimizers
gcc -O2 -finline-functions -Q --help=optimizers

produce identical output for me.  This is consistent with the gcc source
code, but not with the documentation, nor with the optimizations I
actually see in the generated code, so something else must be relevant
that is not shown with "-Q --help=optimizers".
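
For anyone who wants to repeat the comparison without eyeballing two long
listings, here is a minimal Python sketch.  The function names are made up
for illustration, and the parser assumes the usual "-foption [state]" line
layout of the "-Q --help=optimizers" listing.  It can only report
differences that gcc itself prints, so an empty result does not prove that
two sets of flags optimize identically.

```python
import subprocess


def query_optimizers(flags, gcc="gcc"):
    """Return the text printed by `gcc <flags> -Q --help=optimizers`."""
    cmd = [gcc, *flags, "-Q", "--help=optimizers"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def parse_optimizers(text):
    """Map each listed option to its reported state, e.g. '[enabled]'."""
    states = {}
    for line in text.splitlines():
        parts = line.split()
        # Option lines start with '-'; the header line and blanks do not.
        if parts and parts[0].startswith("-"):
            states[parts[0]] = " ".join(parts[1:])
    return states


def diff_optimizers(text_a, text_b):
    """Return {option: (state_a, state_b)} for options whose state differs."""
    a = parse_optimizers(text_a)
    b = parse_optimizers(text_b)
    return {
        opt: (a.get(opt), b.get(opt))
        for opt in sorted(a.keys() | b.keys())
        if a.get(opt) != b.get(opt)
    }
```

With gcc installed, diff_optimizers(query_optimizers(["-Os"]),
query_optimizers(["-O2", "-finline-functions"])) would list any options
reported differently by the two commands above.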

Anyway, here are the new numbers for Core i7-2600K 3.4 GHz (turbo up to
3.8 GHz when only one core is in use, 3.5 GHz when all four are in use
and the CPU is not overheating), Ubuntu 11.10, "gcc version 4.6.1
(Ubuntu/Linaro 4.6.1-9ubuntu3)":

OpenMP, 8 threads:

Benchmarking: Traditional DES [128/128 BS AVX-16]... DONE
Many salts:     22773K c/s real, 2857K c/s virtual
Only one salt:  18284K c/s real, 2282K c/s virtual

Benchmarking: LM DES [128/128 BS AVX-16]... DONE
Raw:    88670K c/s real, 11125K c/s virtual

4 threads:

Benchmarking: LM DES [128/128 BS AVX-16]... DONE
Raw:    110428K c/s real, 27815K c/s virtual

(Limiting the number of threads to 4 helps only with LM.)

Non-OpenMP:

Benchmarking: Traditional DES [128/128 BS AVX-16]... DONE
Many salts:     5805K c/s real, 5864K c/s virtual
Only one salt:  5507K c/s real, 5507K c/s virtual

Benchmarking: LM DES [128/128 BS AVX-16]... DONE
Raw:    70803K c/s real, 71519K c/s virtual
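
To put the OpenMP and non-OpenMP figures side by side, a small Python
sketch like the following can pull the "real" speeds out of the benchmark
lines and compute the ratio.  The regular expression and function names
are illustrative only, and it assumes the "<label>: <N>K c/s real, <N>K c/s
virtual" layout shown above.

```python
import re

# Matches lines such as "Many salts:     22773K c/s real, 2857K c/s virtual".
LINE = re.compile(
    r"^(?P<label>[^:]+):\s+(?P<real>\d+)K c/s real, (?P<virt>\d+)K c/s virtual"
)


def real_speeds(text):
    """Return {label: real speed in K c/s} parsed from benchmark output."""
    speeds = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            speeds[match.group("label")] = int(match.group("real"))
    return speeds


def speedups(parallel_text, serial_text):
    """Return {label: parallel real speed / serial real speed}."""
    par = real_speeds(parallel_text)
    ser = real_speeds(serial_text)
    return {label: par[label] / ser[label] for label in par if label in ser}


# Traditional DES figures copied from this message.
omp_8_threads = """\
Benchmarking: Traditional DES [128/128 BS AVX-16]... DONE
Many salts:     22773K c/s real, 2857K c/s virtual
Only one salt:  18284K c/s real, 2282K c/s virtual
"""
non_omp = """\
Benchmarking: Traditional DES [128/128 BS AVX-16]... DONE
Many salts:     5805K c/s real, 5864K c/s virtual
Only one salt:  5507K c/s real, 5507K c/s virtual
"""

for label, ratio in speedups(omp_8_threads, non_omp).items():
    print(f"{label}: {ratio:.2f}x")
```

Keep in mind that the turbo clock differs between the one-core and
all-cores cases, so these ratios mix clock rate and thread scaling.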

On Tue, Oct 25, 2011 at 01:32:58AM +0200, magnum wrote:
> 2011-10-25 01:23, magnum wrote:
> > I have no turbo modes confusing stuff: It scales to 89% in "many salts"
> > and 92% in "one salt".
> 
> To be correct I think I should have said 88% in "many" and 86% in "one"
> salt. That is:
> 
> Non-OMP build:
> Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
> Many salts:	2781K c/s real, 2787K c/s virtual
> Only one salt:	2677K c/s real, 2682K c/s virtual
> 
> OMP build ran on 2 cores:
> Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
> Many salts:	4903K c/s real, 2471K c/s virtual
> Only one salt:	4597K c/s real, 2314K c/s virtual
> 
> 
> 4903/(2787*2)
> .87961966271977036239
> 
> 4597/(2682*2)
> .85700969425801640566

It depends on what you take for 100%.  Maybe your system was under some
light load from other processes - e.g., GUI desktop apps such as a clock
or a load monitor.  You could run two instances of the non-OMP build in
parallel (with a script) and add up their speeds instead of merely
multiplying one instance's speed by two.  Not that this matters much.
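
As a rough illustration of how the choice of baseline moves the result,
here is a Python sketch using only the "many salts" figures quoted above.
The function name is made up for this example.  It takes a list of
baseline speeds so that the speeds of two non-OMP instances measured in
parallel could be passed in directly; no such measurement appears in this
thread, so the example only varies between the virtual and real
single-instance speeds.

```python
def scaling_efficiency(parallel_speed, baseline_speeds):
    """Parallel speed divided by the summed speed of the baseline runs."""
    return parallel_speed / sum(baseline_speeds)


# Figures in K c/s from the quoted message, "many salts".
omp_real = 4903        # OMP build on 2 cores, real
single_virtual = 2787  # non-OMP build, virtual
single_real = 2781     # non-OMP build, real

# Baseline: one instance's virtual speed multiplied by two,
# as in the quoted calculation.
print(f"{scaling_efficiency(omp_real, [single_virtual] * 2):.1%}")

# Baseline: one instance's real (wall-clock) speed multiplied by two.
print(f"{scaling_efficiency(omp_real, [single_real] * 2):.1%}")
```

With two non-OMP instances run side by side, the call would instead be
scaling_efficiency(omp_real, [speed_a, speed_b]), where speed_a and
speed_b are the two measured speeds.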

Thanks again,

Alexander
