Date: Wed, 02 May 2012 08:30:59 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: New RAR OpenCL kernel

On 04/29/2012 02:11 AM, magnum wrote:
> I'm currently trying to vectorize the rar format because I reckon it
> might be a good thing even on scalar platforms because I do 4x the work
> for 1x the potential branches and stuff. Does this make sense?

Vectorization now seems to work fine on AMD. On CPU, Intel fails to make
it SSE2. AMD succeeds:

OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
Using device 1: AMD FX(tm)-8120 Eight-Core Processor
Compilation log: /tmp/OCLat3GEL.cl(253): warning: unknown attribute "vec_type_hint"
__kernel void __attribute__((vec_type_hint(uint4))) SetCryptKeys(

This is odd. Both AMD and nvidia emit this warning - I can't see what I
am doing wrong. It works fine on the Intel SDK (though Intel stupidly
fails to vectorize, which is also odd).

Note: OpenCL device is CPU. A non-OpenCL build may be faster.
Local worksize (LWS) 8, Global worksize 1024, KPC 4096
Benchmarking: RAR3 (4 characters) [OpenCL]... DONE
Raw: 947 c/s real, 118 c/s virtual

I might have to remove that remark about a CPU build being faster... this
is on Bull's 8 cores. I suppose this is faster than OMP.

However, it fails to even build for nvidia, with weird warnings that
might suggest there are bugs in the nvidia API.

Speed on HD 7970 indicated > 17000 c/s, but an ASIC hang stopped me from
further testing.

magnum
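
To illustrate the "4x the work for 1x the potential branches" idea from the quoted message, here is a minimal sketch in plain C rather than OpenCL C. Everything in it is an assumption for illustration: `lanes4` is a hypothetical stand-in for OpenCL's `uint4`, and `step`, `mix_scalar` and `mix_vec4` are toy functions that have nothing to do with the real RAR key derivation or the `SetCryptKeys` kernel. The only point is that the four-lane version evaluates the loop condition once per round for four candidates instead of once per candidate.

```c
#include <stdint.h>

/* Hypothetical stand-in for OpenCL's uint4: four 32-bit lanes. */
typedef struct { uint32_t s[4]; } lanes4;

/* Toy round function (NOT the RAR algorithm): rotate left by 5, add constants. */
static uint32_t step(uint32_t x, uint32_t i)
{
    return ((x << 5) | (x >> 27)) + 0x9E3779B9u + i;
}

/* Scalar version: one loop branch per round for ONE candidate. */
static uint32_t mix_scalar(uint32_t x, uint32_t rounds)
{
    for (uint32_t i = 0; i < rounds; i++)
        x = step(x, i);
    return x;
}

/* Four-lane version: the same loop branch now covers FOUR candidates.
 * The lane updates are written out explicitly, so there is no per-lane
 * branching; a vectorizing compiler may map them to one SIMD operation. */
static lanes4 mix_vec4(lanes4 v, uint32_t rounds)
{
    for (uint32_t i = 0; i < rounds; i++) {
        v.s[0] = step(v.s[0], i);
        v.s[1] = step(v.s[1], i);
        v.s[2] = step(v.s[2], i);
        v.s[3] = step(v.s[3], i);
    }
    return v;
}
```

Calling `mix_vec4` on four inputs should give, lane by lane, the same values as four separate `mix_scalar` calls. Whether this actually runs faster depends on the compiler and device, as the Intel versus AMD results above suggest.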