Date: Sat, 21 Apr 2012 03:39:49 +0300
From: Milen Rangelov <gat3way@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: cl_khr_byte_addressable_store

Yes, there are no native 8-bit writes; the hardware writes 32 bits at a
time. When you write a char to local or global memory using the
extension, the compiler does some extra work behind the scenes. uchar
writes to global memory go through AMD's complete path rather than the
faster fast path. Writes to local memory get (I guess) an implicit
barrier, even if each work-item writes only to its own part of local
memory, because the compiler can't be smart enough to figure that out.
It's not that you can't use byte-addressable stores, but they are slow
because the compiler can't make assumptions about your code. Fortunately,
bitwise macros work well on both vendors' hardware, even with vectorized
code.

Anyway, that's just my opinion; I am not trying to impose it. The best
thing is to code several versions and profile/benchmark them.



On Sat, Apr 21, 2012 at 2:26 AM, magnum <john.magnum@...hmail.com> wrote:

> On 04/21/2012 01:03 AM, Solar Designer wrote:
> > On Sat, Apr 21, 2012 at 12:45:56AM +0200, magnum wrote:
> >> Then I'm afraid you lost me. Just how should I approach this? Should I
> >> do two separate kernels or should I try some kind of bit-flipping
> >> madness that just might work on both AMD and nvidia?
> >
> > I can't speak for Milen, but I guess that to write a byte you need to
> > read a naturally aligned 4-byte word, mask out the original byte in it,
> > OR in your new byte value, and write that word back.  Of course, this is
> > non-atomic, but you should not be accessing nearby bytes from another
> > thread anyway.
> >
> > An obvious optimization would be to combine multiple byte writes
> > together such that you read/write fewer words (such as one per 4 bytes).
>
> Yes, thanks. I already do things similar to what you say for performance
> reasons, but the non-aligned cases will get nasty (or tedious at the very
> least) if I am never allowed to write an unaligned byte. I am really
> surprised by this limitation; these were not the obstacles I was
> picturing when I got into this game.
>
> The older I get, the older I become :)
>
> magnum
>
>
