Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
Date: Sun, 15 Apr 2012 05:30:38 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: MSCash2 OpenCL (was: OpenCL tests on HD 7970)

Hi Sayantan,

On Fri, Apr 13, 2012 at 10:44:41AM +0530, SAYANTAN DATTA wrote:
> I have posted my final performance update(+ 13%)  to magnum.   It would be
> really great if you could test them on 7970 and post the results.

It became a lot slower:

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=mscash2-opencl -pla=1
OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
Using device 0: Tahiti
Benchmarking: MSCASH2-OPENCL [PBKDF2_HMAC_SHA1]... DONE
Raw:    36781 c/s real, 52459 c/s virtual

GPU load is now reported at 94%, so it is probably not a good indicator
of performance after all.  I am also able to get it to 99% by
simultaneously running two instances of JtR on the 7970, but the
cumulative speed does not improve much (46k c/s with the version above,
60k c/s with your previous code version; both are still slower than the
75k c/s with one instance of the previous version).
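A rough Python sketch of the arithmetic behind these comparisons, using the figures from this message. The 46k, 60k and 75k values are the rounded numbers quoted in the text, so the ratios are only approximate.

```python
# Benchmark figures from this message, in c/s (candidate passwords per second).
new_one = 36781   # new code, one JtR instance on the 7970
new_two = 46000   # new code, two instances combined (rounded figure)
old_one = 75000   # previous code, one instance (rounded figure)
old_two = 60000   # previous code, two instances combined (rounded figure)

def ratio(a, b):
    """Return a / b rounded to two decimal places."""
    return round(a / b, 2)

print("new vs. previous, one instance:", ratio(new_one, old_one))   # 0.49
print("new code, two vs. one instance:", ratio(new_two, new_one))   # 1.25
print("previous code, two vs. one instance:", ratio(old_two, old_one))  # 0.8
```

So the new code runs at roughly half the previous speed with a single instance, and a second instance recovers only part of that loss.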

For the sake of completeness:

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=mscash2-opencl
OpenCL platform 0: NVIDIA CUDA, 1 device(s).
Using device 0: GeForce GTX 570
Benchmarking: MSCASH2-OPENCL [PBKDF2_HMAC_SHA1]... DONE
Raw:    13631 c/s real, 13631 c/s virtual

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=mscash2-opencl -pla=1 -dev=1
OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
Using device 1: AMD FX(tm)-8120 Eight-Core Processor
Benchmarking: MSCASH2-OPENCL [PBKDF2_HMAC_SHA1]... DONE
Raw:    624 c/s real, 78.4 c/s virtual

CPU benchmark with old version (that did 75k c/s on 7970):

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=mscash2-opencl
OpenCL platform 0: AMD Accelerated Parallel Processing, 1 device(s).
Using device 0: AMD FX(tm)-8120 Eight-Core Processor
Benchmarking: MSCASH2-OPENCL [PBKDF2_HMAC_SHA1]... DONE
Raw:    642 c/s real, 90.7 c/s virtual

For reference, CPU non-OpenCL benchmarks:

-64i:

One core (4.5 GHz):

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=mscash2
Benchmarking: M$ Cache Hash 2 (DCC2) [SSE2i 8x]... DONE
Raw:    1291 c/s real, 1291 c/s virtual

OpenMP (something like 3.7 GHz, bumps into 125 W TDP):

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=mscash2
Benchmarking: M$ Cache Hash 2 (DCC2) [SSE2i 8x]... (8xOMP) DONE
Raw:    3584 c/s real, 446 c/s virtual

-xop:

One core:

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=mscash2
Benchmarking: M$ Cache Hash 2 (DCC2) [SSE2i 8x]... DONE
Raw:    1784 c/s real, 1784 c/s virtual

OpenMP:

user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=mscash2
Benchmarking: M$ Cache Hash 2 (DCC2) [SSE2i 8x]... (8xOMP) DONE
Raw:    4928 c/s real, 612 c/s virtual

With 8 independent processes, I am getting 720 c/s per process, for a
total of 5760 c/s (so our OpenMP parallelization for MSCash2 is not
perfect; we need to improve it).
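The OpenMP shortfall can be put as a single number. A minimal Python sketch using the -xop figures above, under the assumption that the 8 independent processes are a fair "ideal" baseline (both setups load all 8 cores of the same CPU):

```python
# -xop build figures from this message, in c/s.
openmp_8_threads = 4928   # one process with 8 OpenMP threads
per_process = 720         # each of 8 independent single-threaded processes
processes = 8

# Combined throughput of the independent processes, used as the ideal baseline.
independent_total = per_process * processes

# Fraction of that baseline the OpenMP build achieves.
openmp_efficiency = openmp_8_threads / independent_total

print(independent_total)            # 5760
print(round(openmp_efficiency, 3))  # 0.856
```

In other words, the OpenMP build reaches about 86% of what 8 separate processes deliver.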

Comparing the best CPU vs. GPU benchmarks, we achieve a 13x speedup by
going from the XOP build with 8 independent processes on the overclocked
FX-8120 to your previous version of the OpenCL code on the 7970 (at
stock clocks so far).
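The 13x figure follows directly from the numbers above. A Python sketch of the calculation, where 75k c/s is the rounded figure quoted for the previous OpenCL code. For comparison it also includes the new code's 36781 c/s (a derived ratio, not one stated in the message):

```python
# Best GPU result: previous OpenCL code, one instance, 7970 at stock clocks (rounded).
gpu_previous = 75000
# New OpenCL code, one instance, on the same card.
gpu_new = 36781
# Best CPU result: -xop build, 8 independent processes on the overclocked FX-8120.
cpu_best = 8 * 720

speedup = gpu_previous / cpu_best   # previous OpenCL code vs. best CPU
new_vs_cpu = gpu_new / cpu_best     # new OpenCL code vs. best CPU

print(round(speedup, 1))     # 13.0
print(round(new_vs_cpu, 1))  # 6.4
```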

Thanks,

Alexander
