Date: Tue, 26 Jun 2012 04:56:45 +0400
From: Solar Designer <>
Subject: phpass OpenCL and CUDA

Lukas -

The reverted opencl/ currently in magnum-jumbo looks a
bit dirtier than the newer version did.  Perhaps you can re-apply some
minor changes to it, without going all the way for a vectorized
implementation yet (so that you don't reintroduce whatever problem that
had)?  Specifically:

1. There's a function called "cuda_md5" in the OpenCL code - rename it.

2. Make use of rotate() and bitselect().
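
To illustrate item 2, here is a minimal sketch of how the MD5 round functions F and G map onto bitselect(), and how a step's rotation maps onto rotate().  It is plain C so that it can be checked on the host: rotate32() and bitselect32() are hypothetical stand-ins that mirror the 32-bit semantics of the OpenCL built-ins, and md5_step() is an illustrative helper, not code taken from the phpass kernel.

```c
#include <stdint.h>

/* Host-side stand-ins (hypothetical names) for the OpenCL built-ins,
 * restricted to 32-bit unsigned values.  In a kernel these would be
 * dropped and rotate() / bitselect() used directly. */
static uint32_t rotate32(uint32_t v, uint32_t n)
{
    n &= 31;                                  /* rotate count is taken modulo 32 */
    return (v << n) | (v >> ((32 - n) & 31)); /* left rotate, safe for n == 0 */
}

static uint32_t bitselect32(uint32_t a, uint32_t b, uint32_t c)
{
    /* Per bit: take a where c is 0, take b where c is 1. */
    return (a & ~c) | (b & c);
}

/* MD5 F and G written with explicit AND/OR/NOT. */
static uint32_t md5_f_plain(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (~x & z);
}

static uint32_t md5_g_plain(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & z) | (y & ~z);
}

/* The same functions expressed as a single bit-select each. */
static uint32_t md5_f_sel(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect32(z, y, x);  /* x chooses between y (bit set) and z */
}

static uint32_t md5_g_sel(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect32(y, x, z);  /* z chooses between x (bit set) and y */
}

/* One MD5 step (illustrative helper): a = b + rotl(a + f + m + k, s),
 * where f is the round function output, m the message word and k the
 * round constant. */
static uint32_t md5_step(uint32_t a, uint32_t b, uint32_t f,
                         uint32_t m, uint32_t k, uint32_t s)
{
    return b + rotate32(a + f + m + k, s);
}
```

The point of the built-ins is that the compiler may be able to map each of them to a single instruction where the hardware has one, rather than having to recognize the AND/OR/shift pattern itself.  Whether that recovers any of the lost speed would need to be benchmarked.
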

The speeds are now down to:

OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
Using device 0: Tahiti
Max local work size 256
Optimal local work size = 32
Benchmarking: phpass MD5 ($P$9 length 8) [OpenCL]... DONE
Raw:    606441 c/s real, 2926K c/s virtual

OpenCL platform 0: NVIDIA CUDA, 1 device(s).
Using device 0: GeForce GTX 570
Compilation log: 
ptxas info    : Compiling entry function 'phpass' for 'sm_20'
ptxas info    : Function properties for phpass
 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 35 registers, 44 bytes cmem[0]
Max local work size 896
Optimal local work size = 32
Benchmarking: phpass MD5 ($P$9 length 8) [OpenCL]... DONE
Raw:    302400 c/s real, 300833 c/s virtual

Previously, the speed on the 7970 (Tahiti) was about 1050K c/s.

The CUDA code on the GTX 570 achieves:

Benchmarking: phpass MD5 ($P$9 lengths 1 to 15) [CUDA]... DONE
Raw:    510171 c/s real, 507581 c/s virtual

(in a default build).  IIRC, previously, this was 600k to 730k c/s
depending on settings.  Did you have to revert anything in the CUDA
code, too?

Are we releasing with these lower speeds?


