Date: Fri, 31 Oct 2014 18:58:20 +0530
From: Sayantan Datta <>
To: john-dev <>
Subject: Re: descrypt speed

On Fri, Oct 31, 2014 at 6:14 PM, Solar Designer <> wrote:

> Besides latency, PCIe bandwidth may also be a bottleneck in low lane
> count, older PCIe revision slots.  My guess is that magnum's GTX 980
> might be in a faster PCIe slot than Royce's GTX 970.
> Sayantan, how much data does descrypt-opencl transfer over PCIe per
> candidate, in each direction?  Actually, only the higher of the two
> numbers should matter in terms of a possible bandwidth bottleneck, since
> PCIe is full-duplex.
> PCIe 2.x is 500 MB/s per lane, 3.0 is 985 MB/s per lane.  If we're
> transferring 8 bytes per candidate password (for max length, and no
> separator char since not necessary for this length), then 100M c/s gives
> us 800 MB/s - should fit in 1 lane with PCIe 3.0, but not with PCIe 2.x.
> I doubt Royce uses a 1-lane slot, though - must be 4, 8, or 16 if it's a
> typical motherboard with no PCIe extender in use.  But are we possibly
> transferring much more data?  Can we reduce it to 8 bytes/password (max
> for either direction)?
> Alexander
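
As a rough check of the arithmetic quoted above, here is a minimal Python
sketch.  The per-lane figures are the ones given in the quoted message; the
function names are made up for illustration, and the 8 bytes per candidate
is the assumed best case, not what descrypt-opencl currently transfers.

```python
import math

# Per-lane PCIe throughput in MB/s, taken from the quoted message.
LANE_MB_PER_S = {"2.x": 500, "3.0": 985}


def required_mb_per_s(cands_per_sec, bytes_per_cand=8):
    """Host-to-GPU bandwidth needed to feed candidates at the given rate."""
    return cands_per_sec * bytes_per_cand / 1e6


def lanes_needed(cands_per_sec, pcie_rev, bytes_per_cand=8):
    """Smallest whole number of lanes that covers the required bandwidth."""
    need = required_mb_per_s(cands_per_sec, bytes_per_cand)
    return math.ceil(need / LANE_MB_PER_S[pcie_rev])


if __name__ == "__main__":
    rate = 100_000_000  # 100M c/s, as in the quoted example
    print(required_mb_per_s(rate), "MB/s")
    for rev in ("2.x", "3.0"):
        print("PCIe", rev, "lanes needed:", lanes_needed(rate, rev))
```

At 100M c/s this gives 800 MB/s, which needs two PCIe 2.x lanes but only
one PCIe 3.0 lane, in line with the estimate above.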

We're transferring around 1.1 MB to the GPU with every kernel call. We
transfer back one 4-byte integer per salt, which I think should be
insignificant.  We transfer the hashes back only if we crack something.
Ideally, we'd want kernel execution time to far exceed the PCIe transfer
time.  When the transfer time and kernel execution time are comparable,
pipelining or forking should help.  At the extreme end, like what you said,
when transfer time far exceeds kernel execution time, pipelining or forking
may not prove beneficial.
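
To put rough numbers on that, here is a small Python sketch of an idealized
model.  It assumes perfect overlap of transfer and kernel execution, so a
pipelined batch costs max(transfer, kernel) instead of their sum.  The link
bandwidth and the kernel times are hypothetical values picked for
illustration, not measurements, and the function names are made up.

```python
def transfer_ms(payload_mb, link_mb_per_s):
    """Time in milliseconds to move one batch over the link."""
    return payload_mb / link_mb_per_s * 1000.0


def overlap_speedup(t_xfer_ms, t_kernel_ms):
    """Best-case gain from overlapping transfer with kernel execution.

    Serial cost per batch is t_xfer + t_kernel; with ideal overlap it
    drops to max(t_xfer, t_kernel).
    """
    return (t_xfer_ms + t_kernel_ms) / max(t_xfer_ms, t_kernel_ms)


if __name__ == "__main__":
    # 1.1 MB per kernel call; 2000 MB/s assumes a 4-lane PCIe 2.x slot.
    t_x = transfer_ms(1.1, 2000.0)
    # Hypothetical kernel times: much longer, equal, and much shorter.
    for t_k in (t_x * 20, t_x, t_x / 20):
        print(f"xfer {t_x:.3f} ms, kernel {t_k:.3f} ms, "
              f"speedup {overlap_speedup(t_x, t_k):.2f}x")
```

In this model the gain peaks at 2x when the two times are equal and falls
toward 1x as either one dominates, so overlap buys little once the transfer
is the bottleneck.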

