Date: Sat, 21 Apr 2012 01:26:21 +0200
From: magnum <>
Subject: Re: cl_khr_byte_addressable_store

On 04/21/2012 01:03 AM, Solar Designer wrote:
> On Sat, Apr 21, 2012 at 12:45:56AM +0200, magnum wrote:
>> Then I'm afraid you lost me. Just how should I approach this? Should I
>> do two separate kernels or should I try some kind of bit-flipping
>> madness that just might work on both AMD and nvidia?
> I can't speak for Milen, but I guess that to write a byte you need to
> read a naturally aligned 4-byte word, mask out the original byte in it,
> OR in your new byte value, and write that word back.  Of course, this is
> non-atomic, but you should not be accessing nearby bytes from another
> thread anyway.
> An obvious optimization would be to combine multiple byte writes
> together such that you read/write fewer words (such as one per 4 bytes).

Yes, thanks. I already do things similar to what you say for performance
reasons, but the non-aligned cases will get nasty (or tedious at the very
least) if I am never allowed to write an unaligned byte. I am really
surprised by this limitation; this was not the kind of obstacle I was
picturing when I got into this game.
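
For reference, the read-modify-write byte store described in the quoted
text could look roughly like the sketch below. It is plain C rather than
OpenCL C, the helper name put_byte is hypothetical, and it assumes that
byte 0 sits in the low 8 bits of each 32-bit word (little-endian layout).
As noted above, it is not atomic, so two threads must not touch bytes of
the same word.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: store one byte into a buffer that may only be
 * written as naturally aligned 32-bit words.
 * Assumption: byte 0 of a word is its least significant 8 bits.
 */
static void put_byte(uint32_t *buf, size_t index, uint8_t value)
{
    size_t word = index >> 2;                    /* which 32-bit word */
    unsigned shift = (unsigned)(index & 3) * 8;  /* bit offset inside it */

    uint32_t w = buf[word];                      /* read the aligned word */
    w &= ~((uint32_t)0xff << shift);             /* mask out the old byte */
    w |= (uint32_t)value << shift;               /* OR in the new byte */
    buf[word] = w;                               /* write the word back */
}
```

Combining up to four such byte updates into a single word before storing
it would give the one-read/one-write-per-word optimization mentioned in
the quote.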

The older I get, the older I become :)

