Date: Sun, 4 Mar 2012 08:37:58 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: CUDA & OpenCL status

On Sun, Mar 04, 2012 at 02:47:03AM +0100, magnum wrote:
> On 03/04/2012 01:26 AM, Solar Designer wrote:
> > Benchmarking: PHPASS-OPENCL [PORTABLE-MD5]... ../../../thread/semaphore.cpp:87: sem_wait() failed
> > Aborted (core dumped)
> 
> This is a very common problem. Try using APP AMD 2.5 instead and I think
> you'll get rid of it.

It turns out to be a poor interaction between John's use of timers and
a library's.  I first saw hints of this with strace and ltrace, then
recompiled with OS_TIMER set to 0 in x86-64.h, and the problem went away:

user@...l:~/john/magnum-jumbo/src$ LD_LIBRARY_PATH=/usr/lib ../run/john -te -fo=phpass-opencl
OpenCL Platforms: 1
OpenCL Platform: <<<AMD Accelerated Parallel Processing>>> 2 device(s), using device: <<<Tahiti>>>
Optimal Group work Size = 128
Benchmarking: PHPASS-OPENCL [PORTABLE-MD5]... DONE
Raw:    943896 c/s real, 1500K c/s virtual

user@...l:~/john/magnum-jumbo/src$ LD_LIBRARY_PATH=/usr/lib ../run/john -te -fo=phpass-opencl
OpenCL Platforms: 1
OpenCL Platform: <<<AMD Accelerated Parallel Processing>>> 2 device(s), using device: <<<Tahiti>>>
Optimal Group work Size = 256
Benchmarking: PHPASS-OPENCL [PORTABLE-MD5]... DONE
Raw:    936345 c/s real, 1481K c/s virtual

user@...l:~/john/magnum-jumbo/src$ LD_LIBRARY_PATH=/usr/lib ../run/john -te -fo=phpass-opencl
OpenCL Platforms: 1
OpenCL Platform: <<<AMD Accelerated Parallel Processing>>> 2 device(s), using device: <<<Tahiti>>>
Optimal Group work Size = 256
Benchmarking: PHPASS-OPENCL [PORTABLE-MD5]... DONE
Raw:    936345 c/s real, 1481K c/s virtual

With optimized F() and G():

#define F(x, y, z) bitselect((z), (y), (x))
#define G(x, y, z) bitselect((y), (x), (z))

user@...l:~/john/magnum-jumbo/src$ LD_LIBRARY_PATH=/usr/lib ../run/john -te -fo=phpass-opencl 
OpenCL Platforms: 1
OpenCL Platform: <<<AMD Accelerated Parallel Processing>>> 2 device(s), using device: <<<Tahiti>>>
Optimal Group work Size = 256
Benchmarking: PHPASS-OPENCL [PORTABLE-MD5]... DONE
Raw:    959370 c/s real, 1444K c/s virtual
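
The bitselect forms of F() and G() can be checked against the usual MD5
definitions on the host.  What follows is only a sketch: bitselect32() is
a plain C stand-in written for this illustration (per the Khronos page,
bitselect(a, b, c) takes each result bit from b where c has a 1 and from
a otherwise), and f_ref(), g_ref(), f_sel(), g_sel() and check_fg() are
hypothetical names, not anything from the phpass-opencl kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side stand-in for OpenCL's bitselect(a, b, c) on 32-bit words:
 * result bit = bit of b where c is 1, bit of a where c is 0. */
uint32_t bitselect32(uint32_t a, uint32_t b, uint32_t c)
{
	return (a & ~c) | (b & c);
}

/* Usual MD5 round functions written with plain logic operations. */
uint32_t f_ref(uint32_t x, uint32_t y, uint32_t z)
{
	return (x & y) | (~x & z);
}

uint32_t g_ref(uint32_t x, uint32_t y, uint32_t z)
{
	return (x & z) | (y & ~z);
}

/* Same argument order as the F() and G() macros quoted above. */
uint32_t f_sel(uint32_t x, uint32_t y, uint32_t z)
{
	return bitselect32(z, y, x);
}

uint32_t g_sel(uint32_t x, uint32_t y, uint32_t z)
{
	return bitselect32(y, x, z);
}

/* F and G are purely bitwise, so one triple of words whose bit columns
 * run through all 8 (x, y, z) combinations is an exhaustive check. */
void check_fg(void)
{
	const uint32_t x = 0xF0F0F0F0u, y = 0xCCCCCCCCu, z = 0xAAAAAAAAu;

	assert(f_sel(x, y, z) == f_ref(x, y, z));
	assert(g_sel(x, y, z) == g_ref(x, y, z));
}
```

Calling check_fg() from a small main() is enough to exercise it.  This
only confirms that the two forms compute the same values; whether the
bitselect form is faster depends on the device and compiler.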

I've also tried:

#define ROTATE_LEFT(x, s) rotate((x), (uint32_t)(s))

which works, but does not obviously improve performance here.

http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/bitselect.html
http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/rotate.html
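
For reference, rotate() with a 32-bit operand is a left rotation with the
count taken modulo 32.  The sketch below emulates it in plain C and
compares it with the shift-and-OR form that portable MD5 code commonly
uses; that this is the form the ROTATE_LEFT macro replaced is an
assumption, and rotate32(), rotate_left_ref() and check_rotate() are
names made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side stand-in for OpenCL's rotate(x, s) on 32-bit words:
 * rotate left, with the count reduced modulo 32. */
uint32_t rotate32(uint32_t x, uint32_t s)
{
	s &= 31;
	return (x << s) | (x >> ((32 - s) & 31));
}

/* Common shift-and-OR left rotation; only valid for 0 < s < 32,
 * which covers every MD5 shift amount. */
uint32_t rotate_left_ref(uint32_t x, uint32_t s)
{
	return (x << s) | (x >> (32 - s));
}

/* Compare the two forms over every count the reference form accepts. */
void check_rotate(void)
{
	const uint32_t x = 0x12345678u;
	uint32_t s;

	for (s = 1; s < 32; s++)
		assert(rotate32(x, s) == rotate_left_ref(x, s));
}
```

Since both forms give the same result for these counts, any difference
would be in the generated code only, and a compiler that already
recognizes the shift-and-OR idiom would leave nothing to gain.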

Alexander
