Date: Sat, 15 Aug 2015 13:38:33 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

2015-08-15 9:12 GMT+02:00 magnum <john.magnum@...hmail.com>:
> On 2015-08-14 20:44, Solar Designer wrote:
>>
>> On Fri, Aug 14, 2015 at 08:40:28PM +0200, Agnieszka Bielec wrote:
>>>
>>> this shows that there is a bug in auto tune or in my configuration (
>>> but if there is a bug in my configuration there is also in auto tune,
>>> even if I configured something wrong john shouldn't show that GWS=1024
>>> when GWS=256)
>>> but I don't have this problem on my laptop (another or it's just only
>>> that first call of crypt_all() is just slower)
>>
>>
>> OK, you and magnum need to figure this out and fix whatever bug there is.
>
>
> I doubt it's a bug in shared code. Agnieszka, you need to establish what
> happens and why. It's just a matter of adding a bunch of debug prints.

The first issue is MEM_SIZE.

I ran some tests on my laptop (NVIDIA) and on super (AMD).

my laptop, MEM_SIZE

none@...e ~/Desktop/r/run $ ./john --test --format=argon2i-opencl --v=4
Benchmarking: argon2i-opencl [Blake2 OpenCL]...
memory per hash : 1.46 MB
Device 0: GeForce GTX 960M
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=131090 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=256 -DSALT_SIZE=64
-DPLAINTEXT_LENGTH=32
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:       256        1358 c/s        1358 rounds/s 188.400ms per crypt_all()!
gws:       512        1475 c/s        1475 rounds/s 346.913ms per crypt_all()+
Local worksize (LWS) 64, global worksize (GWS) 512
using different password for benchmarking
DONE
Speed for cost 1 (t) of 3, cost 2 (m) of 1500, cost 3 (l) of 1
aaa Many salts: 1462 c/s real, 1476 c/s virtual
zzzzz Only one salt:    1462 c/s real, 1449 c/s virtual
___

my laptop, MEM_SIZE/4

none@...e ~/Desktop/r/run $ ./john --test --format=argon2i-opencl --v=4
Benchmarking: argon2i-opencl [Blake2 OpenCL]...
memory per hash : 1.46 MB
Device 0: GeForce GTX 960M
Options used: -I ./kernels -cl-mad-enable -cl-nv-verbose -D__GPU__
-DDEVICE_INFO=131090 -DDEV_VER_MAJOR=352 -DDEV_VER_MINOR=21
-D_OPENCL_COMPILER -DBINARY_SIZE=256 -DSALT_SIZE=64
-DPLAINTEXT_LENGTH=32
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:       256        1366 c/s        1366 rounds/s 187.301ms per crypt_all()!
gws:       512        1476 c/s        1476 rounds/s 346.862ms per crypt_all()+
gws:      1024        1900 c/s        1900 rounds/s 538.875ms per crypt_all()+
Local worksize (LWS) 64, global worksize (GWS) 1024
using different password for benchmarking
DONE
Speed for cost 1 (t) of 3, cost 2 (m) of 1500, cost 3 (l) of 1
aaa Many salts: 1914 c/s real, 1914 c/s virtual
zzzzz Only one salt:    1896 c/s real, 1896 c/s virtual

___

super AMD, MEM_SIZE

[a@...er run]$ ./john --test --format=argon2i-opencl --v=4
Benchmarking: argon2i-opencl [Blake2 OpenCL]...
memory per hash : 1.46 MB
Device 0: Tahiti [AMD Radeon HD 7900 Series]
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=138
-DDEV_VER_MAJOR=1800 -DDEV_VER_MINOR=5 -D_OPENCL_COMPILER
-DBINARY_SIZE=256 -DSALT_SIZE=64 -DPLAINTEXT_LENGTH=32
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:       256         388 c/s         388 rounds/s 659.713ms per crypt_all()!
gws:       512         719 c/s         719 rounds/s 711.542ms per crypt_all()+
gws:      1024        1309 c/s        1309 rounds/s 782.178ms per crypt_all()+
Local worksize (LWS) 64, global worksize (GWS) 1024
using different password for benchmarking
DONE
Speed for cost 1 (t) of 3, cost 2 (m) of 1500, cost 3 (l) of 1
aaa Many salts: 530 c/s real, 102400 c/s virtual
zzzzz Only one salt:    525 c/s real, 102400 c/s virtual

___

super AMD, MEM_SIZE/4

[a@...er run]$ ./john --test --format=argon2i-opencl --v=4
Benchmarking: argon2i-opencl [Blake2 OpenCL]...
memory per hash : 1.46 MB
Device 0: Tahiti [AMD Radeon HD 7900 Series]
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=138
-DDEV_VER_MAJOR=1800 -DDEV_VER_MINOR=5 -D_OPENCL_COMPILER
-DBINARY_SIZE=256 -DSALT_SIZE=64 -DPLAINTEXT_LENGTH=32
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:       256         389 c/s         389 rounds/s 657.904ms per crypt_all()!
gws:       512         719 c/s         719 rounds/s 711.460ms per crypt_all()+
gws:      1024        1302 c/s        1302 rounds/s 786.136ms per crypt_all()+
OpenCL error (CL_INVALID_BUFFER_SIZE) in file
(opencl_argon2i_fmt_plug.c) at line (118) - (Error creating device
buffer)

___
according to this link:
https://devtalk.nvidia.com/default/topic/496980/cl_device_max_mem_alloc_size-looking-for-an-device-with-1gb/
"As has been pointed out in, e.g., this thread,
CL_DEVICE_MAX_MEM_ALLOC_SIZE should be the maximum size of memory
objects. The OpenCL specs demand that it is at least a quarter of the
total memory (which I find a severe restriction). However, NVIDIAs
(and Apple's) OpenCL implementations always return exactly that
quarter, even if you can create larger memory objects in practice, so
this looks more like a misinterpretation of the specs or a kind of
bug. "

some info here:
https://devtalk.nvidia.com/default/topic/478783/cuda-programming-and-performance/cl_device_max_mem_alloc_size-incorrect-/

The tests show that AMD treats it differently than NVIDIA. I couldn't
find anything on the internet about how exactly AMD treats
CL_DEVICE_MAX_MEM_ALLOC_SIZE. I only found this:
https://community.amd.com/thread/152028 but that link may not be
relevant.

This bug will be hard to fix because we don't know how a given device
behaves, and that behavior can always change in the future.
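
One conservative option is to cap GWS so that a single buffer never
exceeds the reported limit. Below is a minimal sketch of that
arithmetic only, not what the format currently does. The helper name
max_gws_for_alloc is hypothetical, and max_alloc is assumed to be the
value that clGetDeviceInfo() returns for CL_DEVICE_MAX_MEM_ALLOC_SIZE.
Since some drivers allow more than they report, this bound may leave
speed on the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of John the Ripper): return the largest
 * power-of-two global work size whose single buffer of
 * gws * bytes_per_hash bytes still fits in max_alloc bytes, capped at
 * gws_cap. Returns 0 if not even one hash fits.
 *
 * max_alloc is assumed to be the value reported for
 * CL_DEVICE_MAX_MEM_ALLOC_SIZE, so the result is a conservative bound.
 */
static size_t max_gws_for_alloc(uint64_t max_alloc, uint64_t bytes_per_hash,
                                size_t gws_cap)
{
    uint64_t fit;
    size_t gws = 1;

    assert(bytes_per_hash > 0);

    fit = max_alloc / bytes_per_hash;   /* hashes that fit in one buffer */
    if (fit == 0 || gws_cap == 0)
        return 0;

    /* Double while the next power of two still fits and is within the cap. */
    while (gws <= gws_cap / 2 && (uint64_t)gws * 2 <= fit)
        gws *= 2;

    return gws;
}
```

For illustration only: with an assumed limit of 768 MiB and 1500 KiB
per hash (which would match the 1.46 MB printed above), only 524 hashes
fit in one buffer, so the helper returns 512.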

The second bug may be linked to the choice between MEM_SIZE and MEM_SIZE/4.
