Date: Sun, 20 Nov 2011 05:35:23 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: OpenMP not using all threads

On Sat, Nov 19, 2011 at 07:55:50PM -0500, Stephen Reese wrote:
> I had a feeling that the 32-bit architecture might be an issue as I
> noticed that "OpenMP example" was only twice as fast (32-bit OpenMP)
> instead of four times (64-bit OpenMP).
> http://openwall.info/wiki/internal/gcc-local-build#OpenMP-example.
> Though OpenMP example is four times as fast neither the CVS nor
> stable/patch versions of John would provide the 4x speed-up I was
> hoping for even on the 64-bit. Maybe XEN and the other respective
> hosts across the multiple Linodes I am testing are causing roughly a
> 45 - 60% slowdown from a bare-metal instance but not affecting the
> "OpenMP Example".

It appears that you simply have unstable system performance (changing
over time as load from other VMs changes).

> root@:~# time ./loop2
> 615e5600
> real 0m2.229s
> user 0m2.226s
> sys 0m0.002s
> root@:~# time ./loop
> 615e5600
> real 0m0.333s
> user 0m1.313s
> sys 0m0.003s

This would be a 7x speedup if it were for real, but notice how the user
time decreased as well - indicating that load from other VMs probably
halved between these two invocations. You'll need many more invocations
of your benchmarks to see the overall difference between the different
builds despite the changing load.

> What I am trying to achieve: I have 42 DES passwords and three
> Linodes. Password list is currently split-up so each host has 12
> entries and are running in incremental mode. Is there a better way,
> such as specifying a thread per instance on a single host?
>
> Is there a performance/time benefit in splitting up the password list
> amongst multiple hosts or is one host going to achieve the same
> results as the three?

This depends on the hashes per salt ratio. You didn't mention how many
different salts you have. Is it 42 hashes with 42 different salts?

Anyhow, you may achieve a very slight increase in c/s rate (due to lower
key setup overhead) by not splitting your 42 hashes (have all nodes load
all 42), but instead splitting the candidate password space. However,
this improvement would probably be negated by the slightly less optimal
order in which candidate passwords would be tested then (e.g., you'd
split by length: 0-6, 7, 8). So continuing like you have started is
fine. 3*12 is 36, not 42, though.

Also note that OpenMP generally performs poorly when the system is under
other load. In your case, this other load comes from other VMs. Even a
10% load from other processes/VMs may result in a 50% slowdown of your
task with OpenMP, unfortunately. And it can be even worse than that:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706

(yes, you may try the GOMP_SPINCOUNT workarounds from there).

As an alternative, you may try an MPI build of -jumbo, even across all
three of your Linodes.

Or as a simpler alternative, yes, you may choose to use many instances
of non-OpenMP builds. Then other load will have less of an effect, but
the key setup overhead will increase. The CVS version and the
-fast-des-key-setup-3 patch (your choice) reduce the key setup overhead,
though, making it almost negligible. In the 1.7.8 release, it's about
10% when cracking just one DES-based crypt(3) hash. With the newer code
or the patch, it reduces to about 3%. You probably lose a lot more than
that to OpenMP's unfriendliness to system load, so you'll improve things
overall by going for separate processes.

Alexander
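
To illustrate the point about needing many more benchmark invocations on a host with changing load, here is a minimal Python sketch. It is not part of John the Ripper or of the original test programs; the helper names (time_command, summarize, speedup) are made up for this example. It repeats a command several times, reports the spread of wall-clock times, and redoes the arithmetic for the two `time` outputs quoted above.

```python
import statistics
import subprocess
import time


def time_command(argv, runs=10):
    """Run argv `runs` times; return wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Min, median and max of a list of durations."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


def speedup(baseline_seconds, other_seconds):
    """How many times faster `other` is than `baseline`."""
    return baseline_seconds / other_seconds


if __name__ == "__main__":
    # Figures copied from the quoted `time` output in this thread.
    real_serial, user_serial = 2.229, 2.226    # ./loop2
    real_openmp, user_openmp = 0.333, 1.313    # ./loop
    print("apparent wall-clock speedup: %.2fx"
          % speedup(real_serial, real_openmp))
    # Total CPU time should not shrink just because threads were added,
    # so a ratio well below 1 hints that the host load changed.
    print("user time ratio (OpenMP / serial): %.2f"
          % (user_openmp / user_serial))
```

For example, summarize(time_command(["./loop"], runs=20)) compared against the same call for ./loop2 gives a steadier picture than a single pair of runs; comparing the minimums or medians is one reasonable choice when the noise comes from other VMs.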