Date: Sun, 26 Apr 2015 15:28:37 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: [GSoC] John the Ripper support for PHC finalists

2015-04-25 22:28 GMT+02:00 Agnieszka Bielec <bielecagnieszka8@...il.com>:
> 2015-04-25 21:20 GMT+02:00 Solar Designer <solar@...nwall.com>:
>> I think you should use a vector data type instead of the separate
>> work-items, much like the AVX2 code on CPU uses __m256i and doesn't
>> rely on the compiler's auto-vectorization.  Granted, auto-vectorization
>> is much more common with OpenCL than with C, but you can help the
>> compiler by doing a part of it explicitly anyway (and then it'd be the
>> compiler's job to combine these narrow SIMD portions into possibly wider
>> SIMD that the hardware might need).  I think you need ulong4 there:
>>
>> https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/vectorDataTypes.html
>
> Sorry, I didn't test this. I have now noticed that in function H,
> index_global and index_local are always divisible by 4, so I can use
> this with coalescing.
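
To illustrate the idea quoted above: when an index is always divisible by 4,
four consecutive 64-bit words can be handled as one 4-lane value, which is
what ulong4 expresses in OpenCL. Below is a minimal sketch in plain C, not the
actual kernel code. The names u64x4, load4, store4, mix_scalar and mix_vec4,
the add-and-rotate step, and the rotation count 17 are all hypothetical and
chosen only for illustration; they are not taken from POMELO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for OpenCL's ulong4: four 64-bit lanes. */
typedef struct { uint64_t s[4]; } u64x4;

/* Rotate left; r must be in 1..63. */
static uint64_t rotl64(uint64_t x, unsigned r)
{
    return (x << r) | (x >> (64 - r));
}

/* Load/store four consecutive words as one 4-lane value. */
static u64x4 load4(const uint64_t *p)
{
    u64x4 v;
    for (int k = 0; k < 4; k++)
        v.s[k] = p[k];
    return v;
}

static void store4(uint64_t *p, u64x4 v)
{
    for (int k = 0; k < 4; k++)
        p[k] = v.s[k];
}

/* Scalar reference: one word per step (illustrative mixing step only). */
static void mix_scalar(uint64_t *state, const uint64_t *in, size_t n)
{
    for (size_t i = 0; i < n; i++)
        state[i] = rotl64(state[i] + in[i], 17);
}

/* Same computation, four words per step; needs n divisible by 4. */
static void mix_vec4(uint64_t *state, const uint64_t *in, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        u64x4 a = load4(state + i);
        u64x4 b = load4(in + i);
        for (int k = 0; k < 4; k++)
            a.s[k] = rotl64(a.s[k] + b.s[k], 17);
        store4(state + i, a);
    }
}
```

Both functions should produce identical output for any n divisible by 4. In
OpenCL the inner lane loop would be a single operation on a ulong4, and the
four-word load is the access that can be coalesced.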

Results with the vectorized version:

[a@...er run]$ ./john --test --format=pomelo-opencl --cost=2:2,2:2 --dev=1
Benchmarking: pomelo-opencl, POMELO [POMELO OpenCL (inefficient,
development use only)]... Device 1: Tahiti [AMD Radeon HD 7900 Series]
Local worksize (LWS) 64, global worksize (GWS) 4096
DONE
Speed for cost 1 (N) of 2, cost 2 (r) of 2
Raw:    87487 c/s real, 9011K c/s virtual

[a@...er run]$ ./john --test --format=pomelo-opencl --cost=2:2,2:2 --dev=5
Benchmarking: pomelo-opencl, POMELO [POMELO OpenCL (inefficient,
development use only)]... Device 5: GeForce GTX TITAN
Local worksize (LWS) 64, global worksize (GWS) 4096
DONE
Speed for cost 1 (N) of 2, cost 2 (r) of 2
Raw:    59650 c/s real, 59650 c/s virtual

Results with the previous version:

[a@...er run]$ ./john --test --format=pomelo-opencl --cost=2:2,2:2 --dev=5
Benchmarking: pomelo-opencl, POMELO [POMELO OpenCL (inefficient,
development use only)]... Device 5: GeForce GTX TITAN
Local worksize (LWS) 64, global worksize (GWS) 8192
DONE
Speed for cost 1 (N) of 2, cost 2 (r) of 2
Raw:    82671 c/s real, 82671 c/s virtual

[a@...er run]$ ./john --test --format=pomelo-opencl --cost=2:2,2:2 --dev=1
Benchmarking: pomelo-opencl, POMELO [POMELO OpenCL (inefficient,
development use only)]... Device 1: Tahiti [AMD Radeon HD 7900 Series]
Local worksize (LWS) 64, global worksize (GWS) 2048
DONE
Speed for cost 1 (N) of 2, cost 2 (r) of 2
Raw:    77053 c/s real, 3891K c/s virtual
