Date: Sat, 26 Jan 2013 23:46:23 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Proposed optimizations to pwsafe

Thanks Milen. Brian, to use this selectively in our OpenCL formats, you should do this in the kernel:

	#include "opencl_device_info.h"

	#if gpu_amd(DEVICE_INFO)
	#define USE_BITSELECT
	#endif

	...

	#ifdef USE_BITSELECT
	#define Ch(x,y,z) (bitselect(z,y,x))
	...
	#else
	#define Ch(x, y, z) (z ^ (x & (y ^ z)))
	...
	#endif

Just so you don't spend time reinventing the wheel.
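
If you want to convince yourself that the bitselect forms match the portable ones before touching a kernel, a host-side check is enough. The following is a minimal sketch in plain C, not code from our tree: bitselect() here is only a stand-in that follows the OpenCL definition (take the bit from b where c is 1, from a where c is 0), and the names ch_plain, maj_plain, ch_bitselect, maj_bitselect and count_mismatches are made up for illustration. Since every operation involved is bitwise, checking the 8 possible input-bit combinations (broadcast across the whole word) covers all cases.

```c
#include <stdint.h>

/* Host-side stand-in for OpenCL's bitselect(a, b, c) (assumption: mirrors the
 * OpenCL definition): result bit is taken from b where c has a 1 bit,
 * and from a where c has a 0 bit. */
static uint32_t bitselect(uint32_t a, uint32_t b, uint32_t c)
{
	return (a & ~c) | (b & c);
}

/* Portable forms, as in the current pwsafe kernel. */
static uint32_t ch_plain(uint32_t x, uint32_t y, uint32_t z)
{
	return z ^ (x & (y ^ z));
}

static uint32_t maj_plain(uint32_t x, uint32_t y, uint32_t z)
{
	return (y & z) | (x & (y | z));
}

/* bitselect forms, as proposed by Milen. */
static uint32_t ch_bitselect(uint32_t x, uint32_t y, uint32_t z)
{
	return bitselect(z, y, x);
}

static uint32_t maj_bitselect(uint32_t x, uint32_t y, uint32_t z)
{
	return bitselect(y, x, z ^ y);
}

/* Hypothetical helper: returns how many of the 8 input-bit combinations
 * give different results for the plain and bitselect forms. */
static int count_mismatches(void)
{
	int bad = 0;

	for (unsigned i = 0; i < 8; i++) {
		/* Expand each of the three low bits of i to an all-0 or all-1 word. */
		uint32_t x = (i & 4) ? 0xffffffffu : 0u;
		uint32_t y = (i & 2) ? 0xffffffffu : 0u;
		uint32_t z = (i & 1) ? 0xffffffffu : 0u;

		if (ch_plain(x, y, z) != ch_bitselect(x, y, z))
			bad++;
		if (maj_plain(x, y, z) != maj_bitselect(x, y, z))
			bad++;
	}
	return bad;
}
```

Called from a small main(), count_mismatches() should return 0. In a real kernel you would of course use the built-in bitselect, not this stand-in.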

We probably need to check all our SHA-2 OpenCL kernels for this. I thought we already had, but I just copy'n'pasted code (as always :), so my formats probably lack it if the "templates" do.

magnum


On 26 Jan, 2013, at 23:35, Milen Rangelov <gat3way@...il.com> wrote:

> Just a side note: I had a look at your OpenCL pwsafe code and there are some obvious optimizations that can be done. Some are minor, but the most important is the following. You have this:
> 
> 
> #define Ch(x, y, z) (z ^ (x & (y ^ z)))
> #define Maj(x, y, z) ((y & z) | (x & (y | z)))
> 
> If you replace those by:
> 
> #define Ch(x,y,z) (bitselect(z,y,x))
> #define Maj(x,y,z) (bitselect(y, x,(z^y)))
> 
> You are effectively using just 1 ALU operation per Ch instead of 3, and 2 ALU ops per Maj instead of 4.
> 
> SHA-256 has 64 rounds per block, each with one Ch and one Maj, so you save 4 * 64 = 256 ALU ops per SHA-256 block. bitselect is mapped to the hardware instruction BFI_INT. This applies to AMD hardware only, not NVIDIA.
> 
> Hope that helps :)
> 
> 

