Date: Sat, 14 Apr 2012 12:59:51 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Find best LWS in OpenCL formats

On 04/14/2012 02:05 AM, Lukas Odzioba wrote:
> Currently, OpenCL LWS testing takes ages on GPU devices because testing
> starts from 1 thread and goes up to the maximum value for the particular
> device/thread.
> I think we should change it to start from 32 for GPUs; it simply
> does not make sense to use small values.
> For example, for wpapsk this change makes the -test run much shorter (46s
> reduced to 6s).
> 
> ///Find best local work size
> 	my_work_group = 1;
> 	if(device_type==CL_DEVICE_TYPE_GPU) my_work_group=32;
> 	for (; (int) my_work_group <= (int) max_group_size;
> 	    my_work_group *= 2) {
> 		(...)
> 

To be more future-proof, it might be better to start at
CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE (this will be 32 for
current NVIDIA).
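
As a rough illustration of the suggestion above, here is a minimal sketch
of the doubling search with a configurable starting point. It assumes the
start value has already been obtained from clGetKernelWorkGroupInfo() with
CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE; the function name
lws_candidates() and its parameters are hypothetical and not taken from
the John the Ripper source.

```c
#include <stddef.h>

/*
 * Fill out[] with the local work sizes to benchmark: start, 2*start, ...
 * up to and including max. Returns the number of candidates written.
 * In real host code, start and max would come from the OpenCL runtime;
 * here they are plain parameters (an assumption of this sketch) so that
 * it runs without a device.
 */
size_t lws_candidates(size_t start, size_t max, size_t *out, size_t cap)
{
    size_t n = 0;
    size_t lws;

    if (start == 0)
        start = 1; /* defensive: a zero start would never advance */

    for (lws = start; lws <= max && n < cap; lws *= 2) {
        out[n++] = lws;
        if (lws > max / 2)
            break; /* the next doubling would pass max (or overflow) */
    }
    return n;
}
```

With a maximum of 1024, starting at 32 leaves 6 sizes to test instead of
the 11 tried when starting at 1.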

Also, the global work size should be lower for CPU. I have experimented
with limiting the time for a run. This too is good for CPU. If the run time
exceeds 10 seconds (for RAR, possibly much less for most formats), we
should stop trying even higher numbers. There really is no point in
trying a work size of 1024 on a dual-core CPU :)
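
The time limit described above could look roughly like the sketch below.
All names here (find_best_work_size(), bench_fn, fake_bench()) are
hypothetical, and fake_bench() is only a stand-in for an actual timed
kernel run; the real benchmark code in each format will differ.

```c
#include <stddef.h>

/* Callback that runs one benchmark and returns its duration in seconds. */
typedef double (*bench_fn)(size_t work_size, void *ctx);

/*
 * Try work sizes start, 2*start, ... up to max and return the fastest.
 * Once a single run takes longer than limit_s, stop trying larger sizes.
 */
size_t find_best_work_size(size_t start, size_t max, double limit_s,
                           bench_fn run, void *ctx)
{
    size_t best = 0, ws;
    double best_time = 0.0;

    if (start == 0)
        start = 1;

    for (ws = start; ws <= max; ws *= 2) {
        double t = run(ws, ctx);

        if (best == 0 || t < best_time) {
            best = ws;
            best_time = t;
        }
        if (t > limit_s)
            break; /* too slow already: larger sizes are not worth trying */
        if (ws > max / 2)
            break; /* the next doubling would pass max (or overflow) */
    }
    return best;
}

/* Stand-in benchmark with a made-up cost curve; ctx counts the calls. */
double fake_bench(size_t work_size, void *ctx)
{
    int *calls = (int *)ctx;

    if (calls)
        ++*calls;
    return 64.0 / (double)work_size + (double)work_size / 64.0;
}
```

A call such as find_best_work_size(32, 4096, 10.0, fake_bench, &calls)
stops as soon as one run of the made-up benchmark exceeds the 10 second
limit, instead of continuing all the way to 4096.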

BTW I believe CL_KERNEL_WORK_GROUP_SIZE is a better maximum than
CL_DEVICE_MAX_WORK_GROUP_SIZE.

magnum
