Date: Wed, 28 Mar 2012 04:28:49 +0400
From: Solar Designer
Subject: Re: patch(using shared memory)

On Tue, Mar 27, 2012 at 11:26:58PM +0800, myrice wrote:
> I used shared memory (as Lukas's comments list as a to-do).
> There is still room for improvement. I think the sha256 access patterns
> have bank conflicts.
> Overall speedup is ~6% in sha256 and 8% in sha224.
> =====Before===============
> Benchmarking: raw-sha256-cuda [SHA256]... DONE
> Raw: 1979K c/s real, 1998K c/s virtual
> Average: 1933.3 c/s real, 1965.6 c/s virtual
> ============After=================
> Benchmarking: raw-sha256-cuda [SHA256]... DONE
> Raw: 2062K c/s real, 2085K c/s virtual
> Average: 2048.6 c/s real, 2080.0 c/s virtual
> Speedup: ~6%
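
On the bank conflict remark quoted above: a rough way to check an access
pattern is to count how many threads of a warp land in the same bank.  The
sketch below is host-side C arithmetic, not code from the patch.  It
assumes 32 banks of 32-bit words (compute capability 2.x, which a GTX 580
is) and a hypothetical layout where thread t reads word
t * stride_words + offset; max_bank_conflict() is a made-up name.

```c
#include <assert.h>

/* Assumption: 32 shared-memory banks, each one 32-bit word wide
 * (compute capability 2.x), and 32 threads per warp. */
#define NUM_BANKS 32
#define WARP_SIZE 32

/* Hypothetical layout: thread t reads word (t * stride_words + offset).
 * Returns the largest number of threads in one warp that hit the same
 * bank, i.e. the degree of serialization (1 means conflict-free). */
int max_bank_conflict(int stride_words, int offset)
{
    int hits[NUM_BANKS] = {0};
    int worst = 0;

    assert(stride_words > 0 && offset >= 0);

    for (int t = 0; t < WARP_SIZE; t++) {
        int bank = (t * stride_words + offset) % NUM_BANKS;
        if (++hits[bank] > worst)
            worst = hits[bank];
    }
    return worst;
}
```

Under these assumptions, a per-thread stride of 16 words (one 64-byte
SHA-256 block per thread, if the data is laid out that way) gives a 16-way
conflict, while padding the stride to 17 words makes the accesses
conflict-free.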

That's nice, but this is still awfully slow.  In fact, even the
benchmarks we have on the wiki somehow show higher speeds, even though
you have a faster card (GTX-580, right?)

    * C-01: i3 2100, 4GB 1333MHz, GeForce 9800GT, slackware 13.1 32bit
    * C-03: C2Duo P7350 2GHz,GF 9600m
    * C-04: 9800GTX
    * C-06: GTX 460 1024M

Benchmarking: SHA256CUDA [SHA256] DONE

    * C-01: Raw:  5734K c/s real,  5745K c/s virtual
    * C-03: Raw:  1795K c/s real,  1795K c/s virtual
    * C-04: Raw:  4456K c/s real,  4412K c/s virtual
    * C-06: Raw: 10443K c/s real, 10527K c/s virtual

(This is for an older revision of Lukas' code.)

Here's what I am getting on CPU with OpenSSL calls:

Benchmarking: Raw SHA-256 [32/64]... DONE
Raw:    1565K c/s real, 1565K c/s virtual

Benchmarking: Raw SHA-256 [32/64]... (8xOMP) DONE
Raw:    6342K c/s real, 791325 c/s virtual

The formats interface bottleneck is somewhere above 50M c/s.  Actually,
--format=dummy shows it at around 130M c/s on Core i7-2600, which is
what you said you use, but indeed interfacing to the GPU takes time.
With Samuele's fast hash implementations in OpenCL and running on GPU,
we're getting close to 50M c/s.  So you also need to get close to that.
This is a good thing for you to attempt.
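
To put the numbers in this thread side by side, here is a small C sketch
of the arithmetic.  The function names are made up for illustration; the
inputs are the c/s figures quoted above.

```c
#include <assert.h>

/* Percentage gain of `after` over `before` (both in the same unit). */
double speedup_percent(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}

/* How many times faster `current` must become to reach `target`. */
double factor_to_target(double current, double target)
{
    assert(current > 0.0);
    return target / current;
}
```

Fed the "Raw" lines (1979K before, 2062K after), speedup_percent() gives
roughly 4.2%; fed the "Average" lines (1933.3 and 2048.6), it gives
roughly 6.0%.  factor_to_target(2062e3, 50e6) comes out at about 24, which
is the gap between the patched code and the 50M c/s mark.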

(And once you get there, you'd need to somehow demonstrate that your
code would be even faster without the interface bottleneck - e.g., by
starting to implement candidate password generation and hash comparison
on GPU in whatever quick way you can for the demo.)
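
As a rough illustration of that idea, here is a plain C sketch in which
each candidate index is turned into a password, hashed, and compared in
one place, so nothing crosses the formats interface per candidate.  On a
GPU each loop iteration would be one thread.  Everything here is
hypothetical: the names, the fixed 3-character lowercase keyspace, and in
particular toy_hash(), which is 32-bit FNV-1a standing in for SHA-256 only
to keep the sketch short.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CAND_LEN 3                  /* fixed candidate length for the demo */
static const char charset[] = "abcdefghijklmnopqrstuvwxyz";

/* Stand-in for SHA-256: 32-bit FNV-1a.  NOT a cryptographic hash. */
uint32_t toy_hash(const char *s, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

/* Turn a candidate index into a CAND_LEN-character string: the index is
 * read as a base-26 number, least significant character first. */
void index_to_candidate(uint32_t idx, char out[CAND_LEN + 1])
{
    const uint32_t n = sizeof(charset) - 1;
    for (int i = 0; i < CAND_LEN; i++) {
        out[i] = charset[idx % n];
        idx /= n;
    }
    out[CAND_LEN] = '\0';
}

/* Generate, hash, and compare every candidate in the keyspace.
 * Returns 1 and fills `found` on the first match, 0 if none matches. */
int search_keyspace(uint32_t target, char found[CAND_LEN + 1])
{
    const uint32_t n = sizeof(charset) - 1;
    uint32_t total = 1;
    char cand[CAND_LEN + 1];

    assert(found != NULL);

    for (int i = 0; i < CAND_LEN; i++)
        total *= n;                 /* n^CAND_LEN candidates */

    for (uint32_t idx = 0; idx < total; idx++) {
        index_to_candidate(idx, cand);
        if (toy_hash(cand, CAND_LEN) == target) {
            memcpy(found, cand, sizeof(cand));
            return 1;
        }
    }
    return 0;
}
```

Calling search_keyspace(toy_hash("gpu", 3), found) walks the keyspace and
stops at the first candidate whose hash matches the target.  For a demo,
only the index range and the match result would need to pass between host
and device.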
