Date: Sun, 16 May 2010 20:03:11 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: OpenMP benchmarks on UltraSPARC T2

Hi,

For the curious, here are benchmarks of 1.7.5-omp-2 on UltraSPARC T2
(quad-core, 8 threads per core).  The system:

$ uname -a
SunOS host 5.10 Generic_142900-10 sun4v sparc SUNW,SPARC-Enterprise-T5120

"/usr/sbin/psrinfo -v" reports 32 "virtual processors", all of which are
"online".  "/usr/platform/sun4v/sbin/prtdiag -v" also reports all 32,
but somehow only 28 of them are reported as "on-line".  I did not look
into this discrepancy.  Both report the clock rate as 1165 MHz.

The compiler:

$ cc -V
cc: Sun C 5.9 SunOS_sparc Patch 124867-14 2010/03/30

The default BF_mt of 24 (in BF_std.h) obviously would not use more than
24 threads, so I edited it to be 32.  Then I built with:

gmake solaris-sparc64-cc -j32

One thread (but OpenMP-capable build):

$ ../run/john -te -fo=bf
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw:    96.7 c/s real, 96.6 c/s virtual

4, 8, 16, and 32 threads:

$ OMP_NUM_THREADS=4 ../run/john -te -fo=bf 
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw:    373 c/s real, 96.5 c/s virtual

$ OMP_NUM_THREADS=8 ../run/john -te -fo=bf 
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw:    393 c/s real, 70.8 c/s virtual

$ OMP_NUM_THREADS=16 ../run/john -te -fo=bf
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw:    397 c/s real, 28.6 c/s virtual

$ OMP_NUM_THREADS=32 ../run/john -te -fo=bf
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw:    596 c/s real, 19.0 c/s virtual

This scales pretty well (considering that the CPU is only a quad-core
with SMT, not a true 32-core part).

For comparison, a non-OpenMP build does:

Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw:    110 c/s real, 110 c/s virtual

So we're getting a 5.42x speedup by going with the OpenMP build and
running 32 threads.  That's not bad for a quad-core with SMT.
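
The speedup arithmetic above can be reproduced with a short Python sketch.
The c/s figures are copied from the benchmark output in this message; the
names (OPENMP_CPS, NON_OPENMP_CPS, speedup, scaling_table) are made up for
illustration and are not part of John the Ripper.

```python
# Throughput figures (c/s real) copied from the benchmark output in this post.
OPENMP_CPS = {1: 96.7, 4: 373.0, 8: 393.0, 16: 397.0, 32: 596.0}
NON_OPENMP_CPS = 110.0  # single process, non-OpenMP build


def speedup(cps, baseline_cps):
    """Return how many times faster cps is than baseline_cps."""
    return cps / baseline_cps


def scaling_table(results, baseline_cps):
    """Return (threads, c/s, speedup) rows sorted by thread count."""
    return [(n, cps, speedup(cps, baseline_cps))
            for n, cps in sorted(results.items())]


if __name__ == "__main__":
    for n, cps, s in scaling_table(OPENMP_CPS, NON_OPENMP_CPS):
        print(f"{n:2d} threads: {cps:6.1f} c/s, {s:.2f}x the non-OpenMP build")
```

Passing 96.7 instead of NON_OPENMP_CPS as the baseline gives the scaling
relative to one thread of the OpenMP build rather than the non-OpenMP one.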

Surprisingly, running 32 separate instances of the non-OpenMP build
(started with a script at almost the same time) results in only 18.0 c/s
per process, or 576 c/s total.  So the efficiency of the OpenMP build
with 32 threads, measured against these 32 separate processes (596 c/s
vs. 576 c/s), is 103%.  Maybe the OpenMP build results in more efficient
usage of the shared caches (a few mostly-read-only data structures may
be shared), which more than compensates for the performance hit of the
multi-threaded code (the speed reduction from 110 c/s to 96.7 c/s for a
single thread).
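
For anyone repeating the comparison, here is a rough Python sketch of how
the per-process results could be summed and the efficiency computed.  It
is not the script used for the run above; the function names are
hypothetical, and the parser assumes "Raw:" lines with plain numbers, as
in the output shown in this message.

```python
import re

# Matches the "Raw:" line printed by the benchmark, for example
# "Raw:    96.7 c/s real, 96.6 c/s virtual".
# Assumption: plain numbers, as in this post (no unit suffix).
RAW_LINE = re.compile(r"Raw:\s+([\d.]+)\s+c/s real")


def real_cps(output):
    """Extract the 'c/s real' figure from one benchmark's output."""
    match = RAW_LINE.search(output)
    if match is None:
        raise ValueError("no 'Raw:' line found")
    return float(match.group(1))


def aggregate_cps(outputs):
    """Sum the 'c/s real' figures over several concurrent processes."""
    return sum(real_cps(text) for text in outputs)


def efficiency(openmp_cps, separate_total_cps):
    """Ratio of one OpenMP run to the total of the separate processes."""
    return openmp_cps / separate_total_cps


if __name__ == "__main__":
    # Illustrative input: 32 processes at 18.0 c/s real each (from the post).
    # The "virtual" figure here is a placeholder, not a measured value.
    outputs = ["Raw:\t18.0 c/s real, 18.0 c/s virtual"] * 32
    total = aggregate_cps(outputs)
    print(f"total: {total:.0f} c/s, efficiency: {efficiency(596.0, total):.1%}")
```

In practice the list would be filled from the captured output of each
instance rather than from a repeated sample line.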

I've also tried increasing BF_mt to 96.  This resulted in the following
performance numbers for 32, 48, and 96 threads:

$ OMP_NUM_THREADS=32 ../run/john -te -fo=bf
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw:    601 c/s real, 19.0 c/s virtual

$ OMP_NUM_THREADS=48 ../run/john -te -fo=bf
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw:    601 c/s real, 19.0 c/s virtual

$ OMP_NUM_THREADS=96 ../run/john -te -fo=bf
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw:    602 c/s real, 19.0 c/s virtual

That's 104.5% efficiency (602 c/s vs. 576 c/s for the 32 separate
processes), and 5.47x the speed of a single non-OpenMP process (110 c/s).

Overall, this is indeed not a fast machine, but it is good for OpenMP
performance testing.

Alexander
