Date: Thu, 6 Dec 2018 04:08:57 +0100
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: MPI Benchmark

Hi Nicholas,

On Mon, Dec 03, 2018 at 04:16:47PM -0600, Nicholas McCollum wrote:
> I noticed on the benchmarking page, the quote, "That said, *if in doubt
> whether your benchmark results are of value, please do submit them*,"...

Right.  Thank you for posting these results.

> I have a small cluster sitting idle at the moment, so I thought I would run
> JtR across 792 cores of Skylake-SP.  I figured since the largest on the
> list was 384 cores of Nehalem series, it might be interesting.  I also have
> a 192 core (8x Xeon Platinum 8160, 6TB DDR4) OpenMP machine available for
> benchmarking if anyone is interested.

Yes, that would be interesting to see, too.

> I downloaded the github bleeding version and compiled JtR with OpenMPI
> 3.1.1 and GCC 8.2.0 with MPI support and verified that it did compile with
> AVX512 instructions.  Nodes are running CentOS 7.5.

It appears that you either have Hyperthreading disabled or at least
don't run enough processes to use it.  I'd also be interested in seeing
results with Hyperthreading in use, so 1584 on your MPI cluster and 384
on your OpenMP machine.

> I thought I would submit the results to the community.  I'm sure that this
> could be improved somewhat, and I am open to recompiling or tweaking if
> anyone is interested.

I'd start by comparing these against a single core run and a single node
run.  Need to see what scales and what does not.  You can use the
relbench script to compare benchmark outputs.

> This is 22 nodes of dual Xeon Gold 6150's with 12x 8GB DDR4 2666Mhz.
>
> MPI in use, disabling OMP (see doc/README.mpi)
> Node numbers 1-792 of 792 (MPI)
> Benchmarking: descrypt, traditional crypt(3) [DES 512/512 AVX-512]...
> (792xMPI) DONE
> Many salts:	12458M c/s real, 12583M c/s virtual
> Only one salt:	7761M c/s real, 7839M c/s virtual

This is pretty good speed.  12458/792 = 15.73M c/s per core, which is
better than we see e.g. on i7-4770K AVX2 cores (IIRC, 11M to 12M).
AVX-512 should have helped, but the lower clock rate hurt.  Overall,
this may very well be close to optimal (perhaps enabling/using HT will
provide a few percent speedup here).  It's roughly on par with 10 modern
GPUs.

> Benchmarking: bsdicrypt, BSDI crypt(3) ("_J9..", 725 iterations) [DES
> 512/512 AVX-512]... (792xMPI) DONE
> Speed for cost 1 (iteration count) of 725
> Many salts:	401779K c/s real, 405837K c/s virtual
> Only one salt:	321118K c/s real, 324362K c/s virtual

This is consistent with the above (is expected to be ~29 times lower).

> Benchmarking: md5crypt, crypt(3) $1$ [MD5 512/512 AVX512BW 16x3]...
> (792xMPI) DONE
> Raw:	7277K c/s real, 7277K c/s virtual

This is ~15 times lower than expected for your hardware, and is less
than one modern GPU.

> Benchmarking: bcrypt ("$2a$05", 32 iterations) [Blowfish 32/64 X2]...
> (792xMPI) DONE
> Speed for cost 1 (iteration count) of 32
> Raw:	186748 c/s real, 186748 c/s virtual

This is ~5 times lower than expected for your hardware, yet corresponds
to several modern GPUs... or less than two ZTEX 1.15y FPGA boards.
Also, this is tuned for use of HT, but you have it disabled - this alone
costs about 1/3 of performance at this test.

And so on.  I don't know why some hash types scaled well, but others
poorly.  It'd be helpful to figure this out and perhaps fix something.

> All 408 formats passed self-tests!

At least we have this. :-)

Thanks again,

Alexander
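
The per-core arithmetic used in the replies above (e.g. 12458/792 = 15.73M c/s) can be repeated for the other formats with a short script. This is only a minimal sketch, not part of John the Ripper: the helper names parse_cps and per_process are made up for illustration, and the parser assumes the "<number>[K|M] c/s real" line shape seen in the quoted output, with K and M taken as decimal thousands and millions.

```python
import re

# Assumption: K and M are decimal multipliers, as the 12458/792 = 15.73M
# calculation in the message implies.
SUFFIX = {"": 1, "K": 1_000, "M": 1_000_000}


def parse_cps(line):
    """Return the 'real' c/s figure from one benchmark line as an integer."""
    match = re.search(r"(\d+)([KM]?) c/s real", line)
    if match is None:
        raise ValueError(f"no 'c/s real' figure in: {line!r}")
    return int(match.group(1)) * SUFFIX[match.group(2)]


def per_process(line, processes):
    """Divide the aggregate 'real' c/s by the number of MPI processes."""
    return parse_cps(line) / processes


# 792 MPI processes, one per physical core, as in the quoted run.
PROCESSES = 792

# Lines copied from the quoted benchmark output.
RESULTS = {
    "descrypt (many salts)": "Many salts: 12458M c/s real, 12583M c/s virtual",
    "bsdicrypt (many salts)": "Many salts: 401779K c/s real, 405837K c/s virtual",
    "md5crypt": "Raw: 7277K c/s real, 7277K c/s virtual",
    "bcrypt": "Raw: 186748 c/s real, 186748 c/s virtual",
}

for name, line in RESULTS.items():
    print(f"{name}: {per_process(line, PROCESSES):,.0f} c/s per process")
```

Running the same calculation on a single-core and a single-node benchmark gives the baseline needed to see which formats scale and which do not.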