Date: Tue, 10 Jul 2012 08:19:02 +0400
From: Solar Designer <>
Subject: Re:

On Tue, Jul 10, 2012 at 09:23:11AM +0530, Sayantan Datta wrote:
> On Tue, Jul 10, 2012 at 8:16 AM, Solar Designer <> wrote:
> > Shouldn't we expect more like a 50% improvement, based on the speeds for
> > the implementation using global memory that you had before?  Compared to
> > your LDS-using implementation, we're adding uses of computing and memory
> > resources that would otherwise be completely idle.
> We are still not capable of utilizing 100% of the hardware.

Of course not.  But I don't see what prevents us from achieving the
combined speed of your global-memory-using and your LDS-using
implementations.  Yes, the former tried to use all SIMDs (if I
understand correctly), even though it kept them stalled waiting for
data most of the time.  But since the task is memory speed bound
anyway, can't we achieve roughly the same speed with fewer SIMDs, such
as with just the two per CU that we're not using for LDS?  I think
we'll saturate the 384-bit bus even with just two SIMDs per CU, or even
with just one per CU, for that matter (in which case we may save some
electricity and heat dissipation by leaving one SIMD per CU completely
unused).
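
To put the expectation above in numbers, here is a rough sketch.  The
speeds and the memory transfer rate are made-up placeholders, not
measurements, and the function names are mine.  It also assumes the
ideal case where the two kernels don't contend for anything, which
may well not hold in practice.

```python
def combined_speed(lds_speed, gmem_speed):
    """Ideal-case estimate: the LDS-using and global-memory-using
    kernels run side by side without slowing each other down
    (an assumption, not a measured result)."""
    return lds_speed + gmem_speed


def improvement_percent(lds_speed, gmem_speed):
    """Gain over the LDS-only implementation, in percent."""
    return 100.0 * gmem_speed / lds_speed


def peak_bandwidth_gb_s(bus_bits, gigatransfers_per_s):
    """Theoretical peak memory bandwidth in GB/s for a given bus width."""
    return bus_bits / 8 * gigatransfers_per_s


# Hypothetical placeholder speeds in c/s.  They are chosen only so that
# the global-memory kernel runs at half the speed of the LDS kernel,
# which is what a "50% improvement" would imply.
LDS_SPEED = 4000.0
GMEM_SPEED = 2000.0

print("combined c/s:", combined_speed(LDS_SPEED, GMEM_SPEED))
print("improvement %:", improvement_percent(LDS_SPEED, GMEM_SPEED))

# 384-bit bus as in the text; 5.5 GT/s is an assumed effective rate.
print("peak GB/s:", peak_bandwidth_gb_s(384, 5.5))
```

If the global-memory kernel really is bound by that peak bandwidth,
then its speed should not depend much on how many SIMDs per CU issue
the loads, as long as there are enough of them to keep the bus busy.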

Am I missing something?

