
Message-ID: <20100508145620.GA7669@openwall.com>
Date: Sat, 8 May 2010 18:56:20 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: OpenMP

Hi,

Some of you might recall my past comments on OpenMP:

http://www.openwall.com/lists/john-users/2006/05/23/1
http://openwall.info/wiki/internal/gcc-local-build#Application-to-John-the-Ripper

While this is not the most efficient way to parallelize JtR, it does
have its advantages - most notably, the code changes are simple (for a
single hash type) and the result is easy to use.  So I have temporarily
"given up" on a "proper" parallel processing implementation for JtR
(that is, postponed it) and implemented OpenMP support for one of the
hash types.

Attached is a patch against JtR 1.7.5 to crack OpenBSD bcrypt hashes
fast. ;-)  I've also uploaded the patch to the wiki:

http://openwall.info/wiki/john/patches

It uses OpenMP - tested with gcc 4.5.0 (on Linux) and Sun Studio 12 (on
Solaris).  The only new requirement is an OpenMP-capable C compiler,
such as a recent gcc.  Overall, this is easier and more reliable to use
than "MPI-patched" builds of JtR, but there are drawbacks as well (the
code is per-hash-type and there is a performance hit - see below).

I've measured the efficiency (vs. multiple separate-process instances of
JtR) to be as high as 98.5% for a build with gcc on otherwise idle Linux
systems with Intel CPUs.  With Sun Studio, Solaris, and Opterons, it was
down to between 78% and 91% (I did not investigate why).  Multi-threaded
code involves more complicated addressing modes and synchronization
between the threads, so some performance hit (vs. multiple separate
processes) is to be expected - but the ease of use is great.

Core i7 920 2.67 GHz:

Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... DONE
Raw:    3698 c/s real, 462 c/s virtual

The first number is the actual speed, the second is per-thread (8
threads, so the benchmark uses roughly 8 times as much CPU time as real
time).

That's 5.27x the speed of a non-threaded build (which does 702 c/s) -
not bad for a quad-core with SMT.

Dual Xeon X5460 3.16 GHz:

Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... DONE
Raw:    6528 c/s real, 816 c/s virtual

Now this is a "true" 8-core system, and we get 7.91x the speed of a
non-threaded build (which does 825 c/s).
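
To make the arithmetic behind these figures explicit, here is a minimal
sketch in C.  The function names are made up for illustration and are
not part of JtR or the patch; the per-thread figure assumes all threads
are kept fully busy, which is only approximately true.

```c
#include <assert.h>

/* Hypothetical helper: ratio of the threaded build's real c/s
 * to the c/s of a non-threaded build on the same machine. */
double speedup(double threaded_cps, double single_cps)
{
    assert(single_cps > 0.0);
    return threaded_cps / single_cps;
}

/* Hypothetical helper: approximate per-thread ("virtual") c/s,
 * assuming every thread is busy for the whole benchmark. */
double per_thread_cps(double real_cps, int threads)
{
    assert(threads > 0);
    return real_cps / threads;
}
```

With the numbers above, speedup(3698, 702) is about 5.27 and
speedup(6528, 825) is about 7.91, while per_thread_cps(6528, 8) is 816.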

In both cases, each thread was computing 2 hashes in parallel (for
greater instruction-level parallelism), so there were a total of 16
hashes being computed in parallel.
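
For those unfamiliar with OpenMP, the general shape of such a loop can
be sketched as follows.  This is not the code from the patch: toy_hash()
is a made-up stand-in (it is not bcrypt), hash_all() is a hypothetical
name, and the two calls per iteration only illustrate the "2 hashes per
thread" layout, not the actual instruction-level interleaving.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for an expensive hash function; NOT bcrypt. */
static uint32_t toy_hash(uint32_t x)
{
    for (int i = 0; i < 1000; i++) {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
    }
    return x;
}

/* Hash n candidates.  Each loop iteration handles two independent
 * candidates, and OpenMP distributes the iterations across threads.
 * Without OpenMP support the pragma is ignored and the loop runs
 * serially with the same results. */
void hash_all(const uint32_t *in, uint32_t *out, size_t n)
{
    assert(in != NULL && out != NULL);
    long pairs = (long)(n / 2);

#pragma omp parallel for
    for (long i = 0; i < pairs; i++) {
        out[2 * i] = toy_hash(in[2 * i]);
        out[2 * i + 1] = toy_hash(in[2 * i + 1]);
    }

    /* Handle the leftover candidate when n is odd. */
    if (n & 1)
        out[n - 1] = toy_hash(in[n - 1]);
}
```

With gcc, this would be built with -fopenmp to actually enable the
threads.  Each iteration writes only to its own output slots, so the
loop body needs no explicit synchronization.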

With gcc/Linux, the number of threads to run equals the number of
logical CPUs by default (e.g., 8 on a Core i7 with Hyperthreading).
It may be adjusted with the OMP_NUM_THREADS environment variable.
With Sun Studio, setting this variable is mandatory (otherwise only a
single thread is run).  For example:

OMP_NUM_THREADS=4 ../run/john --test=1 --format=bf
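
To check what thread count a given compiler and environment would
actually use, a small helper along these lines may be handy.  It is a
sketch only: max_threads() is a made-up name and is not part of JtR,
although omp_get_max_threads() is a standard OpenMP runtime call.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: the number of threads an OpenMP parallel
 * region would use by default, or 1 if built without OpenMP. */
int max_threads(void)
{
#ifdef _OPENMP
    int n = omp_get_max_threads();
#else
    int n = 1;
#endif
    assert(n >= 1);
    return n;
}
```

When built with OpenMP enabled (e.g., gcc -fopenmp) and run with
OMP_NUM_THREADS=4 in the environment, this would normally return 4.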

It may also make sense to set OMP_WAIT_POLICY=PASSIVE so that waiting
threads sleep rather than spin, which frees up a little bit of CPU
time.  At least in theory, this may improve the real c/s rate on CPUs
with SMT, but make it a little bit worse on others.

Here's how to quickly build gcc 4.5.0 as a user:

http://openwall.info/wiki/internal/gcc-local-build

The first (simpler) build shown on this wiki page will do.

Have fun.  As usual, any feedback is welcome.

Alexander

Attachment: "john-1.7.5-omp-1.diff" of type "text/plain" (12310 bytes)