Date: Wed, 11 Sep 2013 22:44:27 +0200
From: magnum <john.magnum@...hmail.com>
To: "john-dev@...ts.openwall.com" <john-dev@...ts.openwall.com>
Subject: PUTCHAR() macro

Sayantan,

I made this change to the RAR OpenCL format:

+ #if no_byte_addressable(DEVICE_INFO)
   #define PUTCHAR(buf, index, val) (buf)[(index)>>2] = ((buf)[(index)>>2] & ~(0xffU << (((index) & 3) << 3))) + ((val) << (((index) & 3) << 3))
+ #else
+ #define PUTCHAR(buf, index, val) ((uchar*)(buf))[(index)] = (val)
+ #endif

This means we use byte-addressed stores when the device allows them, instead of the bit-flogging macro. This alone gave RAR a 33% boost with recent AMD drivers, and it benefits nvidia with every driver I know of. However, when I tested it with your mask-mode raw-md5, it *halves* the speed on AMD (13.4 driver on Bull). It might be that the 32-bit macro is better for global memory, or something like that. Good to know for future tweaking.

I will make this change to most PUTCHAR macros in bleeding (some already have it), but leave AMD alone until we test it more, like this:

- #if no_byte_addressable(DEVICE_INFO)
+ #if gpu_amd(DEVICE_INFO) || no_byte_addressable(DEVICE_INFO)
...


magnum
