Date: Sat, 11 Aug 2012 21:59:16 +0530
From: Sayantan Datta <>
Subject: Re: bf-opencl fails self-test on CPU

On Sat, Aug 11, 2012 at 9:47 PM, Solar Designer <> wrote:

> On Sat, Aug 11, 2012 at 01:12:16PM +0530, Sayantan Datta wrote:
> > On Sat, Aug 11, 2012 at 8:17 AM, Solar Designer <> wrote:
> > > Any idea why bf-opencl fails self-test on CPU (with AMD's SDK)?  Will
> > > it succeed with some other settings in opencl_bf_std.h maybe?
> >
> > I guess it is due to the lack of LDS on CPU.  I'm not sure though but
> > I'll find out.
> Yes, please.  I'd expect that if we're exhausting some resource, we'd
> get a compile-time or a runtime error rather than a self-test failure.
> > Also is it necessary to run the bf-opencl on CPU? We might need a
> > little modified kernel for that.
> Ideally, we should be able to run the exact same OpenCL code on CPU as
> well, although this would not be expected to deliver optimal
> performance.  We'd do it just for more extensive testing of the code.
> We've already seen that e.g. uses of uninitialized array elements are
> not always detected in individual builds/tests, so doing more kinds of
> builds may be more likely to expose bugs.
> We also need bf-opencl working on future Intel CPUs with AVX2 (where
> this might be faster than the existing CPU/OpenMP code) and on Intel MIC
> architecture coprocessors (there's no OpenCL for those yet, but it is
> expected to become available).  In this context, it is good news that
> bf-opencl works with Intel's SDK already (as per magnum's message).
> So currently the problem is just with AMD's SDK when the target is CPU,
> which is less relevant - yet it could help possibly find and fix a bug.
> Then, we know what near-optimal performance on current CPUs is - so we
> have this target for performance when optimizing your OpenCL code on CPU.
> You may, for example, try the two hashes at a time approach (BF_X2 in
> the C code) and see if it helps on CPU and/or GPU.  I'd expect that it'd
> only help on CPU currently, but who knows.
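
The "two hashes at a time" idea mentioned above can be sketched in plain C as follows. This is only an illustration of interleaving two independent Blowfish encryptions so that a CPU has more independent work per round; it is not the BF_X2 code from the tree. The names bf_ctx, bf_fill_demo, bf_encrypt_x1, bf_encrypt_x2 and bf_x2_matches_x1 are hypothetical, and the tables are filled with arbitrary filler values rather than the real Blowfish constants or key schedule.

```c
#include <assert.h>
#include <stdint.h>

/* Per-hash Blowfish state: each candidate has its own key-dependent
 * P-array and S-boxes, so two hashes need two separate contexts. */
typedef struct {
    uint32_t P[18];
    uint32_t S[4][256];
} bf_ctx;

/* Fill a context with deterministic filler (an LCG).  ASSUMPTION: this
 * stands in for the real Blowfish init constants and key setup. */
static void bf_fill_demo(bf_ctx *ctx, uint32_t seed)
{
    assert(ctx != 0);
    uint32_t x = seed;
    for (int i = 0; i < 18; i++) {
        x = x * 1664525u + 1013904223u;
        ctx->P[i] = x;
    }
    for (int b = 0; b < 4; b++)
        for (int i = 0; i < 256; i++) {
            x = x * 1664525u + 1013904223u;
            ctx->S[b][i] = x;
        }
}

/* Blowfish F function: four S-box lookups combined with add/xor/add. */
#define BF_F(c, x) \
    ((((c)->S[0][(x) >> 24] + (c)->S[1][((x) >> 16) & 0xff]) ^ \
      (c)->S[2][((x) >> 8) & 0xff]) + (c)->S[3][(x) & 0xff])

/* Reference: encrypt one 64-bit block with one context. */
static void bf_encrypt_x1(const bf_ctx *c, uint32_t *L, uint32_t *R)
{
    uint32_t l = *L, r = *R, t;
    for (int i = 0; i < 16; i++) {
        l ^= c->P[i];
        r ^= BF_F(c, l);
        t = l; l = r; r = t;
    }
    t = l; l = r; r = t;            /* undo the final swap */
    r ^= c->P[16];
    l ^= c->P[17];
    *L = l; *R = r;
}

/* Two at a time: the same rounds, with the two independent instances
 * interleaved statement by statement.  LR = { La, Ra, Lb, Rb }. */
static void bf_encrypt_x2(const bf_ctx *a, const bf_ctx *b, uint32_t LR[4])
{
    uint32_t la = LR[0], ra = LR[1], lb = LR[2], rb = LR[3], t;
    for (int i = 0; i < 16; i++) {
        la ^= a->P[i];          lb ^= b->P[i];
        ra ^= BF_F(a, la);      rb ^= BF_F(b, lb);
        t = la; la = ra; ra = t;
        t = lb; lb = rb; rb = t;
    }
    t = la; la = ra; ra = t;    t = lb; lb = rb; rb = t;
    ra ^= a->P[16];             rb ^= b->P[16];
    la ^= a->P[17];             lb ^= b->P[17];
    LR[0] = la; LR[1] = ra; LR[2] = lb; LR[3] = rb;
}

/* Returns 1 if the interleaved version agrees with two separate calls. */
static int bf_x2_matches_x1(void)
{
    static bf_ctx a, b;
    bf_fill_demo(&a, 1u);
    bf_fill_demo(&b, 2u);
    uint32_t L1 = 0x01234567u, R1 = 0x89abcdefu;
    uint32_t L2 = 0xdeadbeefu, R2 = 0x0badf00du;
    uint32_t lr[4] = { L1, R1, L2, R2 };
    bf_encrypt_x1(&a, &L1, &R1);
    bf_encrypt_x1(&b, &L2, &R2);
    bf_encrypt_x2(&a, &b, lr);
    return lr[0] == L1 && lr[1] == R1 && lr[2] == L2 && lr[3] == R2;
}
```

Calling bf_x2_matches_x1() checks that interleaving changes only the instruction order and not the results. Whether the interleaved form is actually faster depends on the target and has to be measured, as noted above.
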
> > Here's one off-topic question:
> > In an SSE2 (with OpenMP) build, the non-OpenCL CPU version of bf scores
> > around 5300 c/s on an FX-8120.  However, the same build on an i5-2500K
> > benches at around 3300 c/s.  Does that mean Bulldozer is really that much
> > better than SB in this test?
> Yes, Bulldozer is about the most suitable CPU for this task currently
> (if we're talking stock clock rates), although faster Sandy Bridge CPUs
> are not that much slower - e.g., Core i7-2600K (at stock clocks) with
> Hyperthreading enabled does 4800 c/s.  I think i5-2500K lacks
> Hyperthreading.  However, I think Sandy Bridge CPUs have more
> overclocking potential, so with a maximum stable overclock I think
> i7-2600K would outperform FX-8120 at this test.  My guess is that it'd
> get to 6000+ c/s vs. overclocked FX-8120's 5650 c/s.  I don't know if
> e.g. FX-8170 would be any faster than overclocked FX-8120 or not - I
> suspect not.
> 6-core Intel CPUs are slightly faster (e.g., 10800 c/s on two E5-2630 at
> stock clocks, meaning 5400 c/s per chip), but that's a different category.
> With AVX2 and proper code for it (maybe OpenCL with auto-vectorization
> by Intel's SDK, maybe intrinsics, maybe assembly), this should change,
> making all of the speeds above appear low.
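
For reference, the arithmetic behind the figures quoted above can be written out as a small C sketch. The helper names per_chip and ratio are hypothetical; the numbers are only the ones given in this thread.

```c
/* Speed per chip for a multi-socket result, in c/s. */
static double per_chip(double total_cps, int chips)
{
    return total_cps / chips;
}

/* How many times faster speed `a` is than speed `b`. */
static double ratio(double a, double b)
{
    return a / b;
}

/* Figures from the thread (c/s, stock clocks):
 *   per_chip(10800.0, 2)  -> 5400 per E5-2630
 *   ratio(5300.0, 3300.0) -> about 1.6 (FX-8120 vs. i5-2500K)
 *   ratio(4800.0, 3300.0) -> about 1.45 (i7-2600K with HT vs. i5-2500K)
 */
```
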
> Alexander

I will make a CPU-optimized kernel targeting Intel CPUs.

