Date: Sun, 16 May 2010 20:03:11 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: OpenMP benchmarks on UltraSPARC T2
Hi,
For the curious, here are benchmarks of 1.7.5-omp-2 on UltraSPARC T2
(quad-core, 8 threads per core). The system:
$ uname -a
SunOS host 5.10 Generic_142900-10 sun4v sparc SUNW,SPARC-Enterprise-T5120
"/usr/sbin/psrinfo -v" reports 32 "virtual processors", all of which are
"online". "/usr/platform/sun4v/sbin/prtdiag -v" also reports all 32,
but somehow only 28 of them are reported as "on-line". I did not look
into this discrepancy. Both report the clock rate as 1165 MHz.
The compiler:
$ cc -V
cc: Sun C 5.9 SunOS_sparc Patch 124867-14 2010/03/30
The default BF_mt of 24 (in BF_std.h) obviously would not use more than
24 threads, so I edited it to be 32. Then I built with:
gmake solaris-sparc64-cc -j32
One thread (but OpenMP-capable build):
$ ../run/john -te -fo=bf
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw: 96.7 c/s real, 96.6 c/s virtual
4, 8, 16, and 32 threads:
$ OMP_NUM_THREADS=4 ../run/john -te -fo=bf
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw: 373 c/s real, 96.5 c/s virtual
$ OMP_NUM_THREADS=8 ../run/john -te -fo=bf
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw: 393 c/s real, 70.8 c/s virtual
$ OMP_NUM_THREADS=16 ../run/john -te -fo=bf
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw: 397 c/s real, 28.6 c/s virtual
$ OMP_NUM_THREADS=32 ../run/john -te -fo=bf
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw: 596 c/s real, 19.0 c/s virtual
This scales pretty well, considering that the CPU is only quad-core with
SMT rather than a true 32-core.
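
To put the scaling in numbers, here is a small sketch that divides each "real" c/s figure above by the single-thread result of the same OpenMP-capable build. The function name scaling_table and the dictionary layout are only illustrative; the figures themselves are copied from the runs above.

```python
# "real" c/s figures copied from the benchmark runs above, keyed by
# OMP_NUM_THREADS; the 1-thread entry is the OpenMP-capable build.
real_cps = {1: 96.7, 4: 373.0, 8: 393.0, 16: 397.0, 32: 596.0}


def scaling_table(results, base_threads=1):
    """Return (threads, speedup, speedup per thread) rows, sorted by threads."""
    base = results[base_threads]
    rows = []
    for threads in sorted(results):
        speedup = results[threads] / base
        rows.append((threads, speedup, speedup / threads))
    return rows


for threads, speedup, per_thread in scaling_table(real_cps):
    print(f"{threads:2d} threads: {speedup:.2f}x, {per_thread:.0%} per thread")
```

Relative to one OpenMP thread this gives roughly 3.86x at 4 threads and 6.16x at 32 threads, which is what one might expect from four physical cores with the remaining gain coming from SMT.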
For comparison, a non-OpenMP build does:
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw: 110 c/s real, 110 c/s virtual
So we're getting a 5.42x speedup by going with the OpenMP build and
running 32 threads. That's not bad for a quad-core with SMT.
Surprisingly, running 32 separate instances of the non-OpenMP build
(started with a script at almost the same time) results in only 18.0 c/s
per process, or 576 c/s total. So the efficiency of the 32-thread OpenMP
build, measured against those 32 processes (596 c/s vs. 576 c/s), is
103%. Maybe the OpenMP build results in more efficient usage of the
shared caches (a few mostly-read-only data structures may be shared),
which more than compensates for the performance hit of the
multi-threaded code (the speed reduction from 110 c/s to 96.7 c/s for a
single thread).
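
For reference, the two ratios quoted above can be reproduced with a few lines of Python. This is just the arithmetic; the function and variable names are made up for illustration, and "efficiency" here means the OpenMP throughput divided by the combined throughput of the separate non-OpenMP processes.

```python
def speedup(parallel_cps, baseline_cps):
    """Throughput of the parallel run relative to a single baseline process."""
    return parallel_cps / baseline_cps


def efficiency(parallel_cps, per_process_cps, processes):
    """Parallel throughput relative to N independent processes run at once."""
    return parallel_cps / (per_process_cps * processes)


openmp_32_threads = 596.0       # c/s real, OpenMP build, OMP_NUM_THREADS=32
non_openmp_single = 110.0       # c/s real, non-OpenMP build, one process
non_openmp_per_process = 18.0   # c/s per process with 32 processes running

print(f"speedup:    {speedup(openmp_32_threads, non_openmp_single):.2f}x")
print(f"efficiency: {efficiency(openmp_32_threads, non_openmp_per_process, 32):.1%}")
```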
I've also tried increasing BF_mt to 96. This resulted in the following
performance numbers for 32, 48, and 96 threads:
$ OMP_NUM_THREADS=32 ../run/john -te -fo=bf
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw: 601 c/s real, 19.0 c/s virtual
$ OMP_NUM_THREADS=48 ../run/john -te -fo=bf
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw: 601 c/s real, 19.0 c/s virtual
$ OMP_NUM_THREADS=96 ../run/john -te -fo=bf
Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw: 602 c/s real, 19.0 c/s virtual
That's 104.5% efficiency (relative to the 576 c/s total of 32 separate
processes), and 5.47x the speed of the single-threaded non-OpenMP build
(110 c/s).
Overall, this is not a fast machine, but it is good for OpenMP
performance testing.
Alexander