Date: Tue, 24 May 2011 04:24:26 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Lukas's Status Report - #2 of 15

Lukas,

Thank you for the status update.

On Mon, May 23, 2011 at 10:02:43PM +0200, Łukasz Odzioba wrote:
> I am truly disappointed by the performance of the cryptsha256cuda
> patch.  It is 8-10x slower than it should be, and only a few times
> faster than the CPU version.
> I tried some tricks, but without significant results.  There must be a
> bottleneck in local memory load/store operations, and I want to try
> the NVIDIA profiler to seek those out and destroy them.  Now I am
> moving to MD5-based crypt rather than the planned SHA-512-based crypt
> patch because it is easier and I don't want to get into an infinite
> optimization loop during premature SHA-512 coding.

Your decision to implement MD5-based crypt before proceeding with
further work on SHA-crypt sounds right to me.  This should be easier,
yet very useful.

BTW, the SHA-256 based flavor of SHA-crypt suddenly became of some
practical relevance:

http://www.dragonflybsd.org/release210/

"The default password hash is now sha256"

Other systems that use SHA-crypt tend to use its SHA-512 based flavor
(which makes better use of 64-bit CPUs, so I think DragonFly's choice
was wrong).

Anyway, I took a brief look at your john-1.7.7-cryptsha256cuda-0.diff.
Here are some minor comments:

You should be able to drop validchar().  Instead, use atoi64[] from
common.[ch], like some other formats do.

You have 5 test vectors, of which 4 use the default 5000 rounds, but one
uses 1000 rounds.  I suggest that we standardize on 5000 for benchmarks.
However, since you need to test support for other values, you may keep
the average at 5000.  For example, you may have one at 1000, another at
9000, and the rest at the default of 5000.

...It'd be interesting to see whether this results in a performance hit
compared to having all at 5000 rounds.  Having those hashes take
non-equal time to compute might keep more GPU resources idle... so you'd
need to consider this special case in your optimizations.  Meanwhile,
you could narrow the range - e.g., use 4999 and 5001 for those
non-default test values - such that we get benchmarks that represent the
default of 5000 quite accurately.

Thanks again,

Alexander
