Date: Sun, 24 Nov 2013 12:47:51 +0100
From: Lukas Odzioba <>
Subject: Re: md5crypt-opencl

2013/6/12 Lukas Odzioba <>:
> 2013/6/12 Solar Designer <>:
>> You also said it was not
>> reliable enough to be committed - so perhaps debug it, make it work
>> reliably, then commit?  Or alternatively start from scratch (as there
>> are several possible approaches to try) and achieve better performance.
> After 20 of June I will not have studies nor job on my head.
> In the first week after that I want to add pbkdf2-sha512 support for
> opencl and after that I will be happy to work on md5crypt speed.

Just for the record, I wasn't able to make the mentioned version of the
code reliable, so I decided to add the 8-buffer mod to the current
working code. Unfortunately, after this modification the code is no
longer stable, but the speeds are much better.

Solar stated this on a private list:
2013/11/24 Solar Designer <>:
> I guess you're still quite far from reaching hashcat's md5crypt speed on
> AMD GPUs, though - so perhaps focus on just stability and better speeds
> than you had before, and only then compare against hashcat.

This is exactly what I would like to achieve now.

> BTW, I guess one of the reasons hashcat/md5crypt/AMD is so good is that
> hashcat groups candidate passwords by length.  When there are different
> length passwords being tested on a GPU at the same time, you're probably
> wasting some local memory on supporting excess lengths for the shorter
> ones of the passwords.  You could try implementing something similar -
> e.g. per-length OpenCL kernel invocations out of one crypt_all(), or
> simply detection of and optimization for the special case when all
> passwords in a crypt_all() are same length (maybe you do this already?)

I don't do that yet, but I like the idea of per-length kernel
invocation. I guess this will make the code less readable/modifiable,
and that's why it is too early for such a modification.
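
To make the per-length idea concrete, here is a minimal host-side
sketch in C of grouping candidates by length before launching kernels.
It is not taken from the attached kernel or from jumbo: the names
group_by_length, MAX_LEN, order and start are hypothetical, and the
length limit of 15 is an assumption chosen only for illustration.

```c
#include <stddef.h>
#include <string.h>

#define MAX_LEN 15	/* assumed plaintext length limit (illustrative) */

/*
 * Counting sort of candidate indices by password length.
 * order[] receives the candidate indices grouped by length.
 * start[len] is the offset in order[] of the first candidate of that
 * length, and start[MAX_LEN + 1] equals n.
 * Returns 0 on success, -1 if a candidate is longer than MAX_LEN.
 */
static int group_by_length(const char *const *keys, size_t n,
                           size_t *order, size_t start[MAX_LEN + 2])
{
	size_t count[MAX_LEN + 1] = {0};
	size_t next[MAX_LEN + 1];
	size_t i, len;

	/* Pass 1: count how many candidates have each length. */
	for (i = 0; i < n; i++) {
		len = strlen(keys[i]);
		if (len > MAX_LEN)
			return -1;
		count[len]++;
	}

	/* Prefix sums give the start offset of each length bucket. */
	start[0] = 0;
	for (len = 0; len <= MAX_LEN; len++) {
		start[len + 1] = start[len] + count[len];
		next[len] = start[len];
	}

	/* Pass 2: place each candidate index into its bucket. */
	for (i = 0; i < n; i++)
		order[next[strlen(keys[i])]++] = i;

	return 0;
}
```

With this layout, crypt_all() could issue one kernel invocation per
non-empty bucket, covering start[len + 1] - start[len] candidates
beginning at offset start[len], so every work-item in an invocation
sees the same length. The special case of all passwords having the
same length shows up as exactly one non-empty bucket.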

So, my current status: I have code that passes the self-test
(saltlen=8, pass=8). The c/s results are the following:

570:   1.1k c/s
7970:  1.6k c/s
Titan: 2.2k c/s

I think that we could get some more on AMD by using LDS for at least
the most commonly used buffers (all 8 won't fit in local memory with a
reasonable LWS).

Unfortunately, this code does not pass TS. On 7970 it is unable to
crack any password longer than 8 characters; on 570 it cracks such
passwords, but this leads to some kind of hang (I did not investigate
that). The main difference between the nv/amd kernels is the PUTCHAR
definition.
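
For readers without the attachment, here is a sketch in plain C of the
two usual ways such a macro can be written. These are assumptions
about what the two definitions typically look like, not copies of the
ones in my kernel: the names PUTCHAR_BYTE, PUTCHAR_WORD and
putchar_variants_agree are hypothetical. The word variant assumes a
little-endian layout, and the comparison helper only makes sense on a
little-endian host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-addressable variant: store the byte directly. */
#define PUTCHAR_BYTE(buf, index, val) \
	(((uint8_t *)(buf))[(index)] = (uint8_t)(val))

/*
 * Word-only variant: read the containing 32-bit word, clear the
 * target byte lane, then OR in the new byte (little-endian lanes).
 */
#define PUTCHAR_WORD(buf, index, val) \
	((buf)[(index) >> 2] = \
	    ((buf)[(index) >> 2] & ~(0xffU << (((index) & 3) << 3))) | \
	    ((uint32_t)(uint8_t)(val) << (((index) & 3) << 3)))

/*
 * Write the same bytes with both variants into two 64-byte buffers
 * and report whether the buffers end up identical.
 * Returns 1 if they agree, 0 if they differ or len is too large.
 */
static int putchar_variants_agree(const char *s, size_t len)
{
	uint32_t a[16] = {0}, b[16] = {0};
	size_t i;

	if (len > sizeof(a))
		return 0;
	for (i = 0; i < len; i++) {
		PUTCHAR_BYTE(a, i, s[i]);
		PUTCHAR_WORD(b, i, s[i]);
	}
	return memcmp(a, b, sizeof(a)) == 0;
}
```

A host-side check like this, run with inputs longer than 8 bytes,
could help rule the macro itself in or out as the cause of the
failures above before looking at the rest of the kernel.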

If I remember correctly, on 5850 and 6950 the best result I got was
with forced byte_addressable_store and this:
-//md5_digest(&ctxs[cid], (uint*)ctx_out, &ctxs_buflen[cid]);
+md5_digest(&ctxs[cid], (uint*)ctx_out, &ctxs_buflen[cid]);
-md5_digest(&ctxs[cid], alt_result, &ctxs_buflen[cid]);
-buf_update(ctx_out, (uchar *)alt_result, 16);

I attached my current kernel; maybe someone can point out what is
broken here. You can test it by dropping it into current jumbo. I'll
try to debug it anyway.


[ CONTENT OF TYPE application/octet-stream SKIPPED ]
