Date: Sat, 8 May 2010 18:56:20 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: OpenMP

Hi,

Some of you might recall my past comments on OpenMP:

http://www.openwall.com/lists/john-users/2006/05/23/1
http://openwall.info/wiki/internal/gcc-local-build#Application-to-John-the-Ripper

While this is not the most efficient way to parallelize JtR, it does
have its advantages - most notably, simplicity of code changes (when we
talk about a single hash type) and ease of use.

So I have sort of temporarily "given up" (procrastinating a "proper"
parallel processing implementation for JtR) and implemented OpenMP
support for one of the hash types.

Attached is a patch against JtR 1.7.5 to crack OpenBSD bcrypt hashes
fast. ;-)  I've also uploaded the patch to the wiki:

http://openwall.info/wiki/john/patches

It uses OpenMP - tested with gcc 4.5.0 (on Linux) and Sun Studio 12 (on
Solaris).  The only new requirement is an OpenMP-capable C compiler,
such as recent gcc.  Overall, this is easier and more reliable to use
than "MPI-patched" builds of JtR are, but drawbacks do exist as well
(per-hash-type code and a performance hit - see below).

I've measured the efficiency (vs. multiple separate-process instances of
JtR) to be as high as 98.5% for a build with gcc on otherwise idle Linux
systems with Intel CPUs.  With Sun Studio, Solaris, and Opterons, it was
down to between 78% and 91% (I did not investigate why).  Multi-threaded
code involves more complicated addressing modes and synchronization
between the threads, so some performance hit (vs. multiple separate
processes) is to be expected - but the ease of use is great.

Core i7 920 2.67 GHz:

Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... DONE
Raw:	3698 c/s real, 462 c/s virtual

The first number is actual speed, the second is per-thread (8 threads,
so the benchmark uses roughly 8 times more CPU time than real time).
That's 5.27x the speed of a non-threaded build (which does 702 c/s) -
not bad for a quad-core with SMT.

Dual Xeon X5460 3.16 GHz:

Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... DONE
Raw:	6528 c/s real, 816 c/s virtual

Now this is a "true" 8-core system, and we get 7.91x the speed of a
non-threaded build (which does 825 c/s).

In both cases, each thread was computing 2 hashes in parallel (for
greater instruction-level parallelism), so there were a total of 16
hashes being computed in parallel.

With gcc/Linux, the number of threads to run equals the number of
logical CPUs by default (e.g., 8 on a Core i7 with Hyperthreading).  It
may be adjusted with the OMP_NUM_THREADS environment variable.  With Sun
Studio, setting this variable is mandatory (otherwise only a single
thread is run).  For example:

OMP_NUM_THREADS=4 ../run/john --test=1 --format=bf

It may also make sense to set OMP_WAIT_POLICY=PASSIVE to free up a
little bit of CPU time (make it idle).  At least in theory, this may
improve the real c/s rate on CPUs with SMT, but make it a little bit
worse on others.

Here's how to quickly build gcc 4.5.0 as a user:

http://openwall.info/wiki/internal/gcc-local-build

The first (simpler) build shown on this wiki page will do.

Have fun.  As usual, any feedback is welcome.

Alexander

View attachment "john-1.7.5-omp-1.diff" of type "text/plain" (12310 bytes)
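
For readers who have not used OpenMP, the general shape of a
per-hash-type change like the one described above can be sketched as
follows.  This is NOT the attached patch and it does not compute bcrypt:
toy_hash_pair(), toy_crypt_all(), TOY_ROUNDS and KEYS_PER_CALL are
hypothetical names made up for illustration, and the "hash" is a cheap
stand-in.  The sketch only assumes that candidate keys are buffered in an
array and that each loop iteration handles 2 keys at once, mirroring the
"2 hashes per thread" arrangement mentioned above.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ROUNDS	1024	/* hypothetical cost setting, not bcrypt's */
#define KEYS_PER_CALL	2	/* keys processed together by one iteration */

/*
 * Hypothetical stand-in for an expensive hash (NOT bcrypt).  The two
 * lanes form independent dependency chains, so the CPU can overlap
 * them (instruction-level parallelism).
 */
static void toy_hash_pair(const uint32_t in[2], uint32_t out[2])
{
	uint32_t a = in[0], b = in[1];
	int i;

	for (i = 0; i < TOY_ROUNDS; i++) {
		a = (a ^ (a >> 15)) * 2654435761u + (uint32_t)i;
		b = (b ^ (b >> 15)) * 2654435761u + (uint32_t)i;
	}
	out[0] = a;
	out[1] = b;
}

/*
 * Hash n buffered keys.  With OpenMP enabled, iterations of the loop
 * are divided among threads; each iteration reads and writes only its
 * own pair of array slots, so no locking is needed.
 */
void toy_crypt_all(const uint32_t *keys, uint32_t *hashes, int n)
{
	int i;

	assert(n % KEYS_PER_CALL == 0);
#ifdef _OPENMP
#pragma omp parallel for
#endif
	for (i = 0; i < n; i += KEYS_PER_CALL)
		toy_hash_pair(&keys[i], &hashes[i]);
}
```

With gcc, OpenMP is enabled by compiling with -fopenmp; without that
option the pragma is skipped and the same loop simply runs in a single
thread.  The number of buffered keys needs to be at least
KEYS_PER_CALL times the thread count for every thread to have work,
which for 8 threads gives the 16 hashes in flight mentioned above.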