Date: Tue, 10 Jul 2012 08:19:02 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: bf_kernel.cl

On Tue, Jul 10, 2012 at 09:23:11AM +0530, Sayantan Datta wrote:
> On Tue, Jul 10, 2012 at 8:16 AM, Solar Designer <solar@...nwall.com> wrote:
>
> > Shouldn't we expect more like a 50% improvement, based on the speeds for
> > the implementation using global memory that you had before? Compared to
> > your LDS-using implementation, we're adding uses of computing and memory
> > resources that would otherwise be completely idle.
>
> We are still not capable of utilizing 100% of the hardware.

Of course not. But I don't see what prevents us from achieving the
combined speed of your global-memory-using and your LDS-using
implementations.

Yes, the former tried to use all SIMDs (if I understand correctly), even
though it kept them stalled waiting for data most of the time, but can't
we achieve roughly the same speed with fewer SIMDs (such as with just two
per CU, which we're not using for LDS), since the task is memory speed
bound anyway?

I think we'll saturate the 384-bit bus even with just two SIMDs per CU,
or even with just one per CU, for that matter (so we may save some
electricity and heat dissipation by leaving one SIMD per CU completely
unused).

Am I missing something?

Alexander
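
The reasoning above can be pictured with a toy model: if the global-memory-using code is capped by the bus, then running it on fewer SIMDs costs nothing as long as those SIMDs still request more data than the bus delivers, and its speed simply adds to that of the LDS-using code. This is only a sketch. Every number in it is a made-up placeholder in arbitrary units, not a measurement, and the function names and the assumption of four SIMDs per CU are illustrative.

```python
# Toy model: a memory-bound kernel's speed is capped by the bus, so SIMDs
# beyond the saturation point add nothing. All values are hypothetical
# placeholders in arbitrary units, not benchmark results.

def memory_bound_speed(simds_per_cu, per_simd_demand, bus_limit_per_cu):
    """Per-CU speed of the global-memory-using code: demand capped by the bus."""
    return min(simds_per_cu * per_simd_demand, bus_limit_per_cu)

def combined_speed(lds_speed, simds_for_global, per_simd_demand, bus_limit_per_cu):
    """LDS-using code on some SIMDs plus global-memory-using code on the others."""
    return lds_speed + memory_bound_speed(
        simds_for_global, per_simd_demand, bus_limit_per_cu)

BUS_LIMIT = 1.0        # hypothetical per-CU share of bus throughput
PER_SIMD_DEMAND = 0.8  # hypothetical demand of one SIMD if it never stalled
LDS_SPEED = 2.0        # hypothetical speed of the LDS-using code alone

# Assumed layout: four SIMDs per CU, two of them left free by the LDS code.
all_four = memory_bound_speed(4, PER_SIMD_DEMAND, BUS_LIMIT)
just_two = memory_bound_speed(2, PER_SIMD_DEMAND, BUS_LIMIT)
total = combined_speed(LDS_SPEED, 2, PER_SIMD_DEMAND, BUS_LIMIT)

print(all_four, just_two, total)
```

With these placeholder values the bus is the bottleneck in both the four-SIMD and two-SIMD cases, so the combined figure is just the sum of the two implementations' speeds. The model ignores latency hiding and contention between the two kernels, which is exactly where the real answer to "Am I missing something?" might lie.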