Date: Sun, 24 Nov 2013 12:47:51 +0100
From: Lukas Odzioba <>
Subject: Re: md5crypt-opencl

2013/6/12 Lukas Odzioba <>:
> 2013/6/12 Solar Designer <>:
>> You also said it was not
>> reliable enough to be committed - so perhaps debug it, make it work
>> reliably, then commit?  Or alternatively start from scratch (as there
>> are several possible approaches to try) and achieve better performance.
> After 20 of June I will not have studies nor job on my head.
> In the first week after that I want to add pbkdf2-sha512 support for
> opencl and after that I will be happy to work on md5crypt speed.

Just for the record, I wasn't able to make the mentioned version of the
code reliable, so I decided to add the 8-buffer mod to the current working
code. Unfortunately, after this modification the code is no longer stable,
but speeds are much better.

Solar stated this on private list:
2013/11/24 Solar Designer <>:
> I guess you're still quite far from reaching hashcat's md5crypt speed on
> AMD GPUs, though - so perhaps focus on just stability and better speeds
> than you had before, and only then compare against hashcat.

This is exactly what I would like to achieve now.

> BTW, I guess one of the reasons hashcat/md5crypt/AMD is so good is that
> hashcat groups candidate passwords by length.  When there are different
> length passwords being tested on a GPU at the same time, you're probably
> wasting some local memory on supporting excess lengths for the shorter
> ones of the passwords.  You could try implementing something similar -
> e.g. per-length OpenCL kernel invocations out of one crypt_all(), or
> simply detection of and optimization for the special case when all
> passwords in a crypt_all() are same length (maybe you do this already?)

I don't do that yet, but I like the idea of per-length kernel
invocation. I guess this will make the code less readable/modifiable, and
that's why it is too early for such a modification.
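
For reference, the host-side part of the per-length idea can be sketched in
plain C as a counting sort of candidate indices by length. This is only a
sketch under assumptions: MAX_LEN, group_by_length() and nonempty_groups()
are names made up here, not anything from jumbo, and the actual kernel
launch per group is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEN 15 /* assumed maximum password length for this sketch */

/*
 * Counting sort of candidate indices by password length (hypothetical helper).
 * order[] receives the indices grouped by length. On return, start[len] is the
 * offset in order[] of the first candidate of length len, and
 * start[MAX_LEN + 1] equals count.
 */
void group_by_length(const char *const *keys, size_t count,
                     size_t *order, size_t start[MAX_LEN + 2])
{
    size_t fill[MAX_LEN + 1];
    size_t i, len;

    memset(start, 0, (MAX_LEN + 2) * sizeof(start[0]));

    /* Histogram: start[len + 1] counts candidates of length len. */
    for (i = 0; i < count; i++) {
        len = strlen(keys[i]);
        assert(len <= MAX_LEN);
        start[len + 1]++;
    }

    /* Prefix sum turns the counts into group offsets. */
    for (len = 0; len <= MAX_LEN; len++)
        start[len + 1] += start[len];

    /* Scatter the indices into their groups, keeping the original order. */
    for (len = 0; len <= MAX_LEN; len++)
        fill[len] = start[len];
    for (i = 0; i < count; i++)
        order[fill[strlen(keys[i])]++] = i;
}

/* Number of non-empty groups, i.e. kernel invocations one crypt_all() would need. */
size_t nonempty_groups(const size_t start[MAX_LEN + 2])
{
    size_t len, groups = 0;

    for (len = 0; len <= MAX_LEN; len++)
        if (start[len + 1] > start[len])
            groups++;
    return groups;
}
```

Each group order[start[len]] .. order[start[len + 1] - 1] could then go to one
kernel invocation specialized for that length. When nonempty_groups() returns
1, this is the "all passwords are the same length" special case.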

So here is my current status: I have code that passes the self-test
(saltlen=8,pass=8). The c/s results are the following:

570: 1.1k c/s
7970: 1.6k c/s
Titan: 2.2k c/s

I think that we could get some more on AMD by using LDS for at least the
most commonly used buffers (all 8 won't fit in local memory with a
reasonable LWS).
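
The arithmetic behind that can be sketched as below. The numbers in the usage
note (32 KiB of local memory per work-group, 64-byte buffers) are assumptions
for illustration only, not measurements from the attached kernel, and both
function names are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Local memory needed when every work-item keeps nbuf buffers of buf_bytes each. */
size_t lds_bytes_needed(size_t lws, size_t nbuf, size_t buf_bytes)
{
    return lws * nbuf * buf_bytes;
}

/*
 * Largest number of per-work-item buffers, capped at nbuf_max, that fit in
 * lds_bytes of local memory for a given local work size (LWS).
 */
size_t max_buffers_in_lds(size_t lds_bytes, size_t lws, size_t buf_bytes,
                          size_t nbuf_max)
{
    size_t per_buffer = lws * buf_bytes; /* one buffer for the whole work-group */
    size_t n;

    assert(per_buffer > 0);
    n = lds_bytes / per_buffer;
    return n < nbuf_max ? n : nbuf_max;
}
```

With those assumed numbers and LWS=128, all 8 buffers would need
lds_bytes_needed(128, 8, 64) = 65536 bytes, while
max_buffers_in_lds(32768, 128, 64, 8) gives 4, so only about half of the
buffers could live in LDS at that LWS.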

Unfortunately, this code does not pass TS. On 7970 it is unable to
crack any password longer than 8 characters; on 570 it cracks such
passwords, but this leads to some kind of hang (I did not investigate
that). The main difference between the nv/amd kernels is the PUTCHAR
definition.
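
For readers who have not seen the two flavours of PUTCHAR, here is a minimal
plain-C sketch of the usual pair: a word-based read-modify-write store and a
direct byte store. The names PUTCHAR_WORD and PUTCHAR_BYTE are made up for
this sketch, and the definitions in the attached kernel may differ.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Word-based variant: read-modify-write of the 32-bit word that holds byte
 * number `index`, with byte 0 in the low bits of each word. This needs no
 * byte-addressable stores.
 */
#define PUTCHAR_WORD(buf, index, val) \
    ((buf)[(index) >> 2] = \
        ((buf)[(index) >> 2] & ~(0xffU << (((index) & 3) << 3))) | \
        ((uint32_t)(uint8_t)(val) << (((index) & 3) << 3)))

/*
 * Byte-addressable variant: a plain byte store through a byte pointer.
 * In OpenCL this relies on byte-addressable store support on the device.
 */
#define PUTCHAR_BYTE(buf, index, val) \
    (((uint8_t *)(buf))[(index)] = (uint8_t)(val))
```

The two produce the same memory contents only on a little-endian device.
Since the word variant touches the whole 32-bit word on every store, a wrong
offset or buffer length can corrupt neighbouring bytes differently from the
byte variant, which is one place to look when the kernels disagree.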

If I remember correctly, on 5850 and 6950 the best result I got was with
forced byte_addressable_store and this:
-//md5_digest(&ctxs[cid], (uint*)ctx_out, &ctxs_buflen[cid]);
+md5_digest(&ctxs[cid], (uint*)ctx_out, &ctxs_buflen[cid]);
-md5_digest(&ctxs[cid], alt_result, &ctxs_buflen[cid]);
-buf_update(ctx_out, (uchar *)alt_result, 16);

I attached my current kernel; maybe someone can point out what is
broken here. You can test it by dropping it into current jumbo. I'll try
to debug it anyway.


Download attachment "" of type "application/octet-stream" (11504 bytes)
