Date: Sun, 24 Nov 2013 12:47:51 +0100
From: Lukas Odzioba <lukas.odzioba@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: md5crypt-opencl

2013/6/12 Lukas Odzioba <lukas.odzioba@...il.com>:
> 2013/6/12 Solar Designer <solar@...nwall.com>:
>> You also said it was not
>> reliable enough to be committed - so perhaps debug it, make it work
>> reliably, then commit? Or alternatively start from scratch (as there
>> are several possible approaches to try) and achieve better performance.
>
> After 20 of June I will not have studies nor job on my head.
> In the first week after that I want to add pbkdf2-sha512 support for
> opencl and after that I will be happy to work on md5crypt speed.

Just for the record, I wasn't able to make the mentioned version of the
code reliable, so I decided to add the 8-buffer mod to the current working
code. Unfortunately, after this modification the code is no longer stable,
but the speeds are much better.

Solar stated this on the private list:

2013/11/24 Solar Designer <solar@...nwall.com>:
> I guess you're still quite far from reaching hashcat's md5crypt speed on
> AMD GPUs, though - so perhaps focus on just stability and better speeds
> than you had before, and only then compare against hashcat.

This is exactly what I would like to achieve now.

> BTW, I guess one of the reasons hashcat/md5crypt/AMD is so good is that
> hashcat groups candidate passwords by length. When there are different
> length passwords being tested on a GPU at the same time, you're probably
> wasting some local memory on supporting excess lengths for the shorter
> ones of the passwords. You could try implementing something similar -
> e.g. per-length OpenCL kernel invocations out of one crypt_all(), or
> simply detection of and optimization for the special case when all
> passwords in a crypt_all() are same length (maybe you do this already?)

I don't do that yet, but I like the idea of per-length kernel invocation.
I guess this will make the code less readable/modifiable, and that's why
it is too early for such a modification.

So what's my current status? I have code that passes the self-test
(saltlen=8, pass=8). The c/s results are the following:

570:    1.1k c/s
7970:   1.6k c/s
Titan:  2.2k c/s

I think that we could get some more on AMD by using LDS for at least the
most commonly used buffers (all 8 won't fit in local memory with a
reasonable LWS).

Unfortunately, this code does not pass TS. On the 7970 it is unable to
crack any password longer than 8 characters; on the 570 it cracks such
passwords, but this leads to some kind of hang (I did not investigate
that).

The main difference between the nv/amd kernels is the PUTCHAR definition.
If I remember correctly, on the 5850 and 6950 I got the best result with
forced byte_addressable_store and this:

-//md5_digest(&ctxs[cid], (uint*)ctx_out, &ctxs_buflen[cid]);
+md5_digest(&ctxs[cid], (uint*)ctx_out, &ctxs_buflen[cid]);
-md5_digest(&ctxs[cid], alt_result, &ctxs_buflen[cid]);
-buf_update(ctx_out, (uchar *)alt_result, 16);

I attached my current kernel; maybe someone can point out what is broken
here. You can test it by dropping it into current jumbo. I'll try to debug
it anyway.

Thanks,
Lukas

Download attachment "cryptmd5.cl" of type "application/octet-stream" (11504 bytes)
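
For reference, the per-length grouping idea quoted above could start from a host-side step like the one below. This is only a minimal sketch and is not taken from the attached kernel or from jumbo: group_by_length(), MAX_LEN and the keys/order/start arrays are hypothetical names, and the 15-character limit is an assumption made for the example. It sorts candidate indices by password length with a stable counting sort, so that each length ends up as one contiguous batch.

```c
#include <assert.h>
#include <string.h>

#define MAX_LEN 15  /* assumed maximum password length (hypothetical) */

/*
 * Group candidate indices by password length (stable counting sort).
 *
 * On return, order[] holds the indices 0..count-1 sorted by length, and
 * the candidates of length L are order[start[L]] .. order[start[L+1]-1].
 * Keys longer than MAX_LEN are placed in the last bucket; a real format
 * would reject or truncate them before this point.
 */
static void group_by_length(const char *const *keys, int count,
                            int *order, int start[MAX_LEN + 2])
{
    int fill[MAX_LEN + 1];
    int len, i;

    assert(count >= 0);

    /* Pass 1: count how many keys have each length. */
    memset(start, 0, sizeof(int) * (MAX_LEN + 2));
    for (i = 0; i < count; i++) {
        len = (int)strlen(keys[i]);
        if (len > MAX_LEN)
            len = MAX_LEN;
        start[len + 1]++;
    }

    /* Prefix sum: start[L] becomes the first slot of the length-L batch. */
    for (len = 0; len <= MAX_LEN; len++)
        start[len + 1] += start[len];

    /* Pass 2: drop each index into the next free slot of its batch. */
    for (len = 0; len <= MAX_LEN; len++)
        fill[len] = start[len];
    for (i = 0; i < count; i++) {
        len = (int)strlen(keys[i]);
        if (len > MAX_LEN)
            len = MAX_LEN;
        order[fill[len]++] = i;
    }
}
```

With the batches delimited this way, crypt_all() could in principle launch one kernel invocation per non-empty batch (for example one clEnqueueNDRangeKernel call each), and the special case where all passwords have the same length shows up as a single non-empty batch. Whether the extra launches pay off against the local memory saved would have to be measured.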