john-dev - Re: 1.7.9-jumbo-7

Message-ID: <20120920113554.GA28820@openwall.com>
Date: Thu, 20 Sep 2012 15:35:54 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: 1.7.9-jumbo-7

On Thu, Sep 20, 2012 at 11:46:28AM +0200, magnum wrote:
> I tested this patch on OSX 10.8.1:
> 
> macosx-x86-64 builds and runs fine, and CUDA does too.
> 
> OpenCL builds fine, but a number of formats fail at run-time (bf, mscash2, nt, rar, raw-sha512, sha512crypt, wpapsk and xsha512).

bf-opencl may require a smaller WORK_GROUP_SIZE in opencl_bf_std.h.
With the default, you're probably exceeding the available local memory
size on your GPU.

> Maybe we should add a note in doc/BUGS stating that some OpenCL formats are known not to work on OSX yet. I do think most or even all problems are due to Apple driver bugs.

Here's my current BUGS:

---
	Known issues with using this release.

Not working on big-endian CPU architectures (these formats fail
self-test on big-endian CPUs):
* mssql05
* office
* rar
(x86 and x86-64 are little-endian, so they are not affected.)

Not working on HD 4000 series and older ATI GPUs (these formats need
byte-addressable store, which is only present in HD 5000 series and
newer ATI/AMD GPUs):
* sha512crypt-opencl
* wpapsk-opencl

Many OpenCL formats fail at runtime on Mac OS X (whereas CUDA ones work
fine).  We've seen these fail on Mac OS X 10.8.1: bf-opencl,
mscash2-opencl, nt-opencl, rar, raw-sha512-opencl, sha512crypt-opencl,
wpapsk-opencl, and xsha512-opencl.  We suspect that this may be caused
by driver bugs.  The same formats work fine on Linux.

In GPU-enabled builds, running "john --test" (with no --format
restriction) will eventually fail (before it has a chance to test all
formats).  This is because GPU resources allocated by one format are
currently not freed before proceeding to test another format (they're
only freed when John exits).  We're going to correct this in a future
release.  Meanwhile, please test GPU-enabled formats one by one, e.g.
with "john --test --format=mscash2-opencl", etc.
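
One way to follow the advice above is a small wrapper that starts a
separate john process per format, so that GPU resources are released
when each process exits.  This is only a sketch: the ./john path, the
helper names, and the format list passed in are assumptions; only the
--test and --format= options come from the text above.

```python
import subprocess


def build_test_commands(formats, john="./john"):
    """Return one 'john --test --format=NAME' argv list per format."""
    return [[john, "--test", f"--format={name}"] for name in formats]


def run_tests(formats, john="./john"):
    """Run each self-test in its own process; return {format: exit code}.

    A fresh process per format means GPU resources are released on exit.
    """
    results = {}
    for name, argv in zip(formats, build_test_commands(formats, john)):
        results[name] = subprocess.run(argv).returncode
    return results
```

For example, run_tests(["mscash2-opencl", "nt-opencl"]) would invoke
john twice.  Whether a non-zero exit code reliably signals a failed
self-test is an assumption to verify against your build.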

Some OpenCL-enabled formats (for "slow" hashes and non-hashes) may
sometimes trigger "ASIC hang" errors as reported by AMD/ATI GPU drivers,
requiring a system reboot to regain access to the GPU.  For example, on
HD 7970 this problem is known to occur with sha512crypt-opencl, but is
known not to occur with mscash2-opencl.  Our current understanding is
that this has to do with OpenCL kernel running time and watchdog timers.
We're working on reducing kernel run times to avoid such occurrences in
the future.

All CUDA formats substantially benefit from compile-time tuning.
README-CUDA includes some info on this.  In short, on GTX 400 series and
newer NVIDIA cards, you'll likely want to change "-arch sm_10" to "-arch
sm_20" or greater (as appropriate for your GPU) on the NVCC_FLAGS line
in Makefile.  You'll also want to tune BLOCKS and THREADS for the
specific format you're interested in.  These are typically specified in
cuda_*.h files.  README-CUDA includes a handful of pre-tuned settings.
It is not unusual to obtain e.g. a 3x speedup (compared to the generic
defaults) with this sort of tuning.

Some OpenCL formats benefit from compile-time tuning, too.  For example,
bf-opencl is pre-tuned for HD 7970 cards, and will need to be re-tuned
for other cards (adjust WORK_GROUP_SIZE in opencl_bf_std.h and
opencl/bf_kernel.cl; you may also adjust MULTIPLIER).  In fact, on
smaller GPUs this specific format might not work at all until
WORK_GROUP_SIZE is reduced.  Most OpenCL formats may benefit from tuning
of KEYS_PER_CRYPT, although higher values, while generally increasing
the c/s rate, may create usability issues (more work lost on
interrupted/restored sessions, less optimal order of candidate passwords
being tested).

Even though wpapsk-cuda and wpapsk-opencl primarily use the GPU, they
also do a (small, but not negligible) portion of the computation on CPU
and thus they substantially benefit from OpenMP-enabled builds.  We
intend to reduce their use of CPU in a future version.

Interrupting a cracking session that uses an ATI/AMD GPU with Ctrl-C
often results in:
	../../../thread/semaphore.cpp:87: sem_wait() failed
	Aborted
When this happens, the john.pot and .log files are not updated with the
latest cracked passwords.  To mitigate this, reduce the Save setting in
john.conf from the default of 600 seconds to a lower value (e.g., 60).

With GPU-enabled formats (and sometimes with OpenMP on CPU as well), the
number of candidate passwords being tested concurrently can be very
large (thousands).  When the format is of a "slow" type (such as an
iterated hash) and the number of different salts is large, interrupting
and restoring a session may result in a lot of work being re-done (many
minutes or even hours).  It is easy to see if a given session is going
to be affected by this or not: watch the range of candidate passwords
being tested as included in the status line printed on a keypress.  If
this range does not change for a long while, the session is going to be
affected since interrupting and restoring it will retry the entire
range, for all salts, including for salts that already had the range
tested against them.
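
As a rough illustration of the scale involved, the worst case can be
estimated as the number of candidates in the range times the number of
salts, divided by the hash computation rate.  The function name and the
figures below are hypothetical, and the result is an upper bound for an
interruption that occurs just before the range is finished.

```python
def redone_work_seconds(batch_size, salt_count, hashes_per_second):
    """Worst-case time to redo one candidate range against all salts."""
    if hashes_per_second <= 0:
        raise ValueError("hashes_per_second must be positive")
    return batch_size * salt_count / hashes_per_second


# Hypothetical figures: 65536 candidates per range, 10000 salts,
# 20000 hash computations per second.
seconds = redone_work_seconds(65536, 10000, 20000)
print(f"up to {seconds / 3600:.1f} hours of work redone")
```

With these made-up numbers the estimate is about 9 hours, which is in
line with the "many minutes or even hours" mentioned above.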

"Single crack" mode is relatively inefficient with GPU-enabled formats
(and sometimes with OpenMP on CPU as well), because it might not be able
to produce enough candidate passwords per target salt to fully utilize a
GPU, as well as because its ordering of candidate passwords from most
likely to least likely is lost when the format is only able to test a
large number of passwords concurrently (before proceeding to doing the
same for another salt).  You may reasonably start with quick "single
crack" mode runs on CPU (possibly without much use of OpenMP) and only
after that proceed to GPU-enabled formats (or to heavier use of OpenMP,
beyond a few CPU cores), locking those runs to specific cracking modes
other than "single crack".

Some formats lack proper binary_hash() functions, resulting in duplicate
hashes (if any) not being eliminated at loading and sometimes also in
slower cracking (when the number of hashes per salt is large).  When
this happens, the following message is printed:
	Warning: excessive partial hash collisions detected
	(cause: the "format" lacks proper binary_hash() function definitions)
Known to be affected are: bfegg, dominosec, md5crypt-cuda, phpass-cuda,
hmac-*, sip, vnc.
Also theoretically present, but less likely to be triggered in practice,
are similar issues in: dmd5, krb4, krb5, skey, pwsafe-cuda, keepass,
keychain, mozilla, mskrb5, odf, office, pwsafe-opencl, pdf, rar, ssh, zip.
---

Alexander