Date: Sun, 26 Apr 2015 00:49:26 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: [GSoC] John the Ripper support for PHC finalists

On Sat, Apr 25, 2015 at 10:35:51PM +0200, Frank Dittrich wrote:
> After pulling the latest changes up to commit a988a38c, I did reset
> BENCHMARK_LENGTH back to 0 to get these numbers.
>
> On one of my systems (i5-4570 CPU (4 physical, 4 logical cores, AVX2
> build), the difference for --costs=2:2,2:2 is hard to notice.
> For --cost=0:0,0:0, it is 1.3%, for --costs=2:2,0:0, it is about 0.5%,
> for --costs=0:0,2:2 it is about 0.25%.
>
> On my laptop with Core(TM) i7-2820QM CPU (4 physical, 8 logical cores,
> SSE2 build), I get a 13% difference for --costs=0:0,0:0.
> The difference is about 9% for --costs=2:2,0:0, or --costs=0:0,2:2, and
> for --costs=2:2,2:2 it is about 8%.
> On this laptop, I tried to use short --test times, to avoid throttling
> So I used several runs of --test=1, but I'm afraid that throttling
> interfered nevertheless.

Any idea why the 8-thread CPU and build is impacted much more?

Are we possibly exceeding L1 data cache size?  It's the same for both
CPUs (32 KB), but each thread gets only half as much when you have
2 threads/core.

The core appears to allocate 16 buffers of (PLAINTEXT_LENGTH + 1) bytes
and 16 of BINARY_SIZE bytes per thread.  That's:

16*((125+1)+257) = 6128 bytes

Hmm, looks like it should fit either way.  The overhead on top of that
shouldn't be that much.  Yet you could try halving OMP_SCALE to test.

> After resetting BENCHMARK_LENGTH to -1, I get results for "Raw" that are
> similar to the "Many salts" case in my previous tests with
> BENCHMARK_LENGTH 0.

On second thought, I think we should keep it at 0.  I was wrong.  At
low cost settings, this format isn't slow enough for the difference to
be negligible.

On a related note, maybe we should change md5crypt-opencl's
BENCHMARK_LENGTH to 0 too, since it gets to pretty high speeds on GPU,
and the set_key() overhead on CPU and keys transfer may play a role.
In fact, it includes "if (new_keys)" there specifically to optimize the
"many salts" case.

Oh, and pomelo-opencl needs this optimization too.  Right now, it
includes partial keys transfer in set_key() instead, followed by
transfer of remaining keys in crypt_all().  This was reasonable in the
fast and saltless opencl_mysqlsha1_fmt_plug.c, which pomelo-opencl is
based on, but since POMELO is salted we need to check whether the keys
have changed before transferring the remaining keys in crypt_all().
Given the existing code, maybe this is as simple as setting
"key_offset = key_idx;" right after the keys have been transferred in
crypt_all(), so that subsequent crypt_all() calls will skip that.

Alexander