Date: Sat, 11 Aug 2012 20:17:44 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: bf-opencl fails self-test on CPU

On Sat, Aug 11, 2012 at 01:12:16PM +0530, Sayantan Datta wrote:
> On Sat, Aug 11, 2012 at 8:17 AM, Solar Designer <solar@...nwall.com> wrote:
> > Any idea why bf-opencl fails self-test on CPU (with AMD's SDK)? Will it
> > succeed with some other settings in opencl_bf_std.h maybe?
>
> I guess it is due to the lack of LDS on CPU. I'm not sure though but I'll
> find out.

Yes, please. I'd expect that if we're exhausting some resource, we'd
get a compile-time or a runtime error rather than a self-test failure.

> Also is it necessary to run the bf-opencl on CPU? We might need a
> little modified kernel for that.

Ideally, we should be able to run the exact same OpenCL code on CPU as
well, although this would not be expected to deliver optimal
performance. We'd do it just for more extensive testing of the code.
We've already seen that e.g. uses of uninitialized array elements are
not always detected in individual builds/tests, so doing more kinds of
builds may be more likely to expose bugs.

We also need bf-opencl working on future Intel CPUs with AVX2 (where
this might be faster than the existing CPU/OpenMP code) and on Intel MIC
architecture coprocessors (there's no OpenCL for those yet, but it is
expected to become available). In this context, it is good news that
bf-opencl works with Intel's SDK already (as per magnum's message). So
currently the problem is just with AMD's SDK when the target is CPU,
which is less relevant - yet it could help possibly find and fix a bug.

Then, we know what near-optimal performance on current CPUs is - so we
have this target for performance when optimizing your OpenCL code on
CPU. You may, for example, try the two hashes at a time approach (BF_X2
in the C code) and see if it helps on CPU and/or GPU. I'd expect that
it'd only help on CPU currently, but who knows.

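To illustrate what "two hashes at a time" means, here is a rough sketch
in C. It is not taken from the John the Ripper source: toy_ctx,
toy_init, encrypt_x1, encrypt_x2 and bf_x2_matches_x1 are hypothetical
names, and the tables are filled by a simple LCG rather than by the real
Blowfish key setup. It only shows the general idea of interleaving two
independent Blowfish-style Feistel computations in one loop, so that the
S-box lookups of one instance can overlap with those of the other.

```c
#include <assert.h>
#include <stdint.h>

#define ROUNDS 16

/* Per-hash state shaped like Blowfish's: four 256-entry S-boxes and an
 * 18-word P-array. The contents here are toy values (assumption). */
typedef struct {
    uint32_t S[4][256];
    uint32_t P[ROUNDS + 2];
} toy_ctx;

/* Fill the tables with an LCG. Illustration only, NOT Blowfish key setup. */
static void toy_init(toy_ctx *c, uint32_t seed)
{
    uint32_t x = seed;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 256; j++) {
            x = x * 1664525u + 1013904223u;
            c->S[i][j] = x;
        }
    for (int i = 0; i < ROUNDS + 2; i++) {
        x = x * 1664525u + 1013904223u;
        c->P[i] = x;
    }
}

/* Blowfish-style F function: four byte-indexed S-box lookups. */
#define F(c, x) \
    ((((c)->S[0][(x) >> 24] + (c)->S[1][((x) >> 16) & 0xff]) ^ \
      (c)->S[2][((x) >> 8) & 0xff]) + (c)->S[3][(x) & 0xff])

/* One instance at a time: each round depends on the previous one. */
static void encrypt_x1(const toy_ctx *c, uint32_t *L, uint32_t *R)
{
    uint32_t l = *L, r = *R, t;
    for (int i = 0; i < ROUNDS; i++) {
        l ^= c->P[i];
        r ^= F(c, l);
        t = l; l = r; r = t;
    }
    t = l; l = r; r = t;            /* undo the last swap */
    r ^= c->P[ROUNDS];
    l ^= c->P[ROUNDS + 1];
    *L = l; *R = r;
}

/* Two instances interleaved: the a and b chains are independent, so the
 * CPU may overlap the lookups of one with the arithmetic of the other. */
static void encrypt_x2(const toy_ctx *a, uint32_t *La, uint32_t *Ra,
                       const toy_ctx *b, uint32_t *Lb, uint32_t *Rb)
{
    uint32_t la = *La, ra = *Ra, lb = *Lb, rb = *Rb, t;
    for (int i = 0; i < ROUNDS; i++) {
        la ^= a->P[i];              lb ^= b->P[i];
        ra ^= F(a, la);             rb ^= F(b, lb);
        t = la; la = ra; ra = t;    t = lb; lb = rb; rb = t;
    }
    t = la; la = ra; ra = t;        t = lb; lb = rb; rb = t;
    ra ^= a->P[ROUNDS];             rb ^= b->P[ROUNDS];
    la ^= a->P[ROUNDS + 1];         lb ^= b->P[ROUNDS + 1];
    *La = la; *Ra = ra; *Lb = lb; *Rb = rb;
}

/* Returns 1 if the interleaved version gives the same results as two
 * separate one-at-a-time calls, 0 otherwise. */
int bf_x2_matches_x1(void)
{
    static toy_ctx a, b;            /* about 4 KB each, kept off the stack */
    uint32_t la = 0x01234567, ra = 0x89abcdef;
    uint32_t lb = 0xdeadbeef, rb = 0x0badf00d;
    uint32_t xa_l = la, xa_r = ra, xb_l = lb, xb_r = rb;

    toy_init(&a, 1);
    toy_init(&b, 2);

    encrypt_x1(&a, &la, &ra);
    encrypt_x1(&b, &lb, &rb);
    encrypt_x2(&a, &xa_l, &xa_r, &b, &xb_l, &xb_r);

    return la == xa_l && ra == xa_r && lb == xb_l && rb == xb_r;
}
```

The results are the same either way; only the instruction scheduling
differs. Whether the interleaved form is actually faster would depend on
register pressure and on whether both sets of S-boxes stay in fast
memory, which is presumably why the outcome on GPU is less certain.
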
> Here's one out of topic question:
> In an sse2(with omp) build the non-opencl cpu version of bf scores around
> 5300 c/s on fx 8120 . However I found that using the same build on i5 2500k
> ,the cpu version benches at around 3300 c/s. Does that mean bulldozer is
> really that much better than SB in this test ?

Yes, Bulldozer is about the most suitable CPU for this task currently
(if we're talking stock clock rates), although faster Sandy Bridge CPUs
are not that much slower - e.g., Core i7-2600K (at stock clocks) with
Hyperthreading enabled does 4800 c/s. I think i5-2500K lacks
Hyperthreading.

However, I think Sandy Bridge CPUs have more overclocking potential, so
with a maximum stable overclock I think i7-2600K would outperform
FX-8120 at this test. My guess is that it'd get to 6000+ c/s vs.
overclocked FX-8120's 5650 c/s. I don't know if e.g. FX-8170 would be
any faster than overclocked FX-8120 or not - I suspect not.

6-core Intel CPUs are slightly faster (e.g., 10800 c/s on two E5-2630 at
stock clocks, meaning 5400 c/s per chip), but that's a different
category.

With AVX2 and proper code for it (maybe OpenCL with auto-vectorization
by Intel's SDK, maybe intrinsics, maybe assembly), this should change,
making all of the speeds above appear low.

Alexander