Date: Wed, 25 Apr 2012 21:15:35 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: New RAR OpenCL kernel

On 04/25/2012 06:05 AM, Claudio André wrote:
>
>> That is odd. I saw similar things when experimenting with Loveland and
>> Cedar cards. Even using really low figures like 16, I never got rid of
>> register spill. I must have hit a per-thread max rather than a total max.
>
> If you look at the profile output, you will see 18 ScratchRegs used in
> the previous version. This should be 0.
> I saw this happen in my code once, but I did not understand why. Looking
> at what I did, it seems the compiler made some "optimization". The
> "optimization" could be better than my original code, but I'm not sure
> about it. To me it was a compiler decision made when my code was not
> very good (for the GPU arch).

Here's a similar issue: Today I noticed that when I enable shared memory
on nvidia (supposedly for decreasing GPR pressure by a whopping 40
registers) the final GPR use *increased* by 2 instead, doh!

>> I think we should come up with a couple of -Ddefines that are
>> automagically added by common-opencl at (JIT-)build time, depending on
>> device. I think we could use these or more:
>>
>> -DAMD or -DNVIDIA for starters.
>> And perhaps -DGCN, -DFERMI, I'm not sure. I know Milen uses -DOLD_ATI for
>> 4xxx (btw I just re-read everything he ever wrote to this list and it
>> was well worth the time)
>
> I'm not sure how these "defines" are going to be set, but they are going
> to be useful. AMD has some code with which you can get the GPU family,
> so we can get it and use/adapt it. See page 2-7 in

Great, I'll have a look. Today I realised that the simplest case (just
AMD vs nvidia) needs nothing more than this, which I added to my kernel:

#ifdef cl_nv_pragma_unroll
+ #define NVIDIA
#pragma OPENCL EXTENSION cl_nv_pragma_unroll : enable
#endif

...simple as that :) then further down I just #ifdef NVIDIA ... #else
... #endif for the architecture-specific things.
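
For the host-side variant discussed above (common-opencl adding a -D define at JIT build time), a minimal sketch in C might look like the following. The helper names vendor_define and make_build_options are hypothetical, and the vendor substrings it matches are assumptions about what CL_DEVICE_VENDOR returns on typical drivers; the resulting string would be passed as the options argument of clBuildProgram.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: map a CL_DEVICE_VENDOR string to a -D option.
 * The substrings matched here are assumptions, not a complete list. */
const char *vendor_define(const char *vendor)
{
    if (strstr(vendor, "NVIDIA"))
        return "-DNVIDIA";
    if (strstr(vendor, "Advanced Micro Devices") || strstr(vendor, "AMD"))
        return "-DAMD";
    return ""; /* unknown vendor: add nothing */
}

/* Hypothetical helper: append the vendor define to existing build options.
 * Returns 0 on success, -1 if the output buffer is too small. */
int make_build_options(char *out, size_t outlen,
                       const char *base, const char *vendor)
{
    const char *def = vendor_define(vendor);
    const char *sep = (*base && *def) ? " " : "";
    int n = snprintf(out, outlen, "%s%s%s", base, sep, def);

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

The kernel can then test #ifdef NVIDIA or #ifdef AMD without relying on an extension macro being defined. Finer-grained defines such as -DGCN or -DFERMI would need a device-family lookup that this sketch does not attempt.
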
> I experimented a lot with my own find_best_KPC (it follows Samuele's
> ideas). It is not deterministic. If I am using a clean, unused GPU (no X
> on it), I will try to improve it (if possible), but in a live
> environment the results were OK.
>
> I haven't seen bad results from the find-LWS and KPC routines, only
> suboptimal ones. With suboptimal numbers in mind, I did some experiments
> and selected what was best for the test I did. We could recommend
> something like this.
> For LWS, we can always start at the work group size multiple. Have you
> tried using this constraint?

Yes, I start on CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, double for
each pass, and end at CL_KERNEL_WORK_GROUP_SIZE (as opposed to
CL_DEVICE_MAX_WORK_GROUP_SIZE which is useless).

The problem is that for the LWS enumeration to be correct, the global
work size used in that loop must suit the device: If I use too low a
value, a powerful card will not show any difference in speed between LWS
32 and 1024. But if I use too high a value, a weak card will take
minutes to go through the loop! This is why I factored in the number of
SPs, but it is not really the perfect solution.

The KPC test is easier. It is correct for a given LWS, no problem there
except how to know when to stop. Also, to be really sure you pick the
absolute best, this loop should start at LWS, decrease with LWS and end
when it's going downhill. But to be faster, it's better to start at LWS
and just double. This is more of a design decision.

> I'll email the developer. Something is wrong.
>
> OpenCL GPU device #0 not found, CPU is used

As an alternative benchmark I tried oclhashcat today with Cedar, 9600GT
and GTX580, for raw SHA-1. The Cedar and 9600GT were almost equal and
they were about 1/10 of the GTX580. Using my RAR, the 9600GT is also
1/10 of the GTX580 but the Cedar is less than 1/100. I think my code is
actually pretty decent for big and small nvidia's (for a n00b's first
project) but it's nowhere near good enough for AMD.

magnum
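
The LWS enumeration described above (start at the preferred multiple, double each pass, stop at the kernel's work group limit) can be sketched roughly as below. This is an illustration under stated assumptions, not the code from the RAR format: find_best_lws, time_fn and fake_timer are hypothetical names, and the timing callback stands in for a real kernel launch measured at a suitable global work size.

```c
#include <stddef.h>

/* Hypothetical callback: run the kernel once at the given local work size
 * and return the elapsed time in seconds. */
typedef double (*time_fn)(size_t lws, void *ctx);

/* Try lws = multiple, 2*multiple, 4*multiple, ... up to max_lws and return
 * the fastest one. multiple would come from
 * CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE and max_lws from
 * CL_KERNEL_WORK_GROUP_SIZE. Returns 0 if no candidate fits. */
size_t find_best_lws(size_t multiple, size_t max_lws, time_fn run, void *ctx)
{
    size_t best = 0;
    double best_time = 0.0;

    if (multiple == 0)
        return 0;
    for (size_t lws = multiple; lws <= max_lws; lws *= 2) {
        double t = run(lws, ctx);

        if (best == 0 || t < best_time) {
            best = lws;
            best_time = t;
        }
        if (lws > max_lws / 2)
            break; /* doubling again would exceed the limit */
    }
    return best;
}

/* Stand-in timer for experiments: pretends the size pointed to by ctx is
 * the optimum and that time grows with distance from it. */
double fake_timer(size_t lws, void *ctx)
{
    size_t optimum = *(size_t *)ctx;

    return 1.0 + (double)(lws > optimum ? lws - optimum : optimum - lws);
}
```

With a real timer, the result is only meaningful if the global work size is large enough to load the device, which is exactly the tuning problem described above. A KPC pass could reuse the same loop shape with a different stop condition.
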