Date: Sat, 21 Apr 2012 13:03:55 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: cl_khr_byte_addressable_store

I'll try it out. I would get rid of the initial 16 endian swaps in
sha_block() as a bonus, I suppose that is a good thing.

magnum

On 04/21/2012 02:39 AM, Milen Rangelov wrote:
> Yes, there are natively no 8-bit writes, it's 32-bit. When you actually
> write a char to local or global memory, using the extension, the compiler
> does some things behind the scenes. uchar writes to global memory use
> complete path. Writes to local memory have (I guess) implicit barrier even
> if each workitem writes to its part of local memory because the compiler
> can't be smart enough to figure that out. Not that you can't use byte
> addressable stores, but it's slow because the compiler can't make the
> assumptions about your code. Fortunately though, bitwise macros work good
> for both hardware vendors, even with vectorized code.
>
> Anyway that's just my opinion, I am not trying to impose it. The best thing
> is to code several versions and profile/benchmark them.
>
>
>
> On Sat, Apr 21, 2012 at 2:26 AM, magnum <john.magnum@...hmail.com> wrote:
>
>> On 04/21/2012 01:03 AM, Solar Designer wrote:
>>> On Sat, Apr 21, 2012 at 12:45:56AM +0200, magnum wrote:
>>>> Then I'm afraid you lost me. Just how should I approach this? Should I
>>>> do two separate kernels or should I try some kind of bit-flipping
>>>> madness that just might work on both AMD and nvidia?
>>>
>>> I can't speak for Milen, but I guess that to write a byte you need to
>>> read a naturally aligned 4-byte word, mask out the original byte in it,
>>> OR in your new byte value, and write that word back. Of course, this is
>>> non-atomic, but you should not be accessing nearby bytes from another
>>> thread anyway.
>>>
>>> An obvious optimization would be to combine multiple byte writes
>>> together such that you read/write fewer words (such as one per 4 bytes).
>>
>> Yes, thanks. I already do things similar to what you say for performance
>> reasons but the non-aligned cases will get nasty (or tedious at the very
>> least) if I am not allowed to ever write an unaligned byte. I am really
>> surprised by this limitation, this was not the obstacles I was picturing
>> when I got into this game.
>>
>> The older I get, the older I become :)
>>
>> magnum
>>
>>
>
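
The read-modify-write byte store described in the quoted text above (read
the aligned 4-byte word, mask out the old byte, OR in the new one, write the
word back) can be sketched in plain C as follows. This is only an
illustration under stated assumptions: the helper names put_byte_le and
put_byte_be are hypothetical and do not appear in the thread, the buffer is
assumed to be addressed only as 32-bit words, and this is host-side C rather
than an actual OpenCL kernel. The big-endian variant stores bytes in the
order a SHA-1 style block function reads its words, which is presumably why
the initial endian swaps in sha_block() could then be dropped.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the thread): store one byte at byte offset
 * `index` in a buffer that may only be accessed as 32-bit words.
 * Byte 0 of each word is the least significant byte (little-endian layout). */
static void put_byte_le(uint32_t *buf, size_t index, uint8_t val)
{
    unsigned shift = (unsigned)(index & 3) << 3;  /* 0, 8, 16 or 24 */
    uint32_t word = buf[index >> 2];              /* read the aligned word */

    word &= ~((uint32_t)0xff << shift);           /* mask out the old byte */
    word |= (uint32_t)val << shift;               /* OR in the new byte */
    buf[index >> 2] = word;                       /* write the word back */
}

/* Same idea, but byte 0 of each word is the most significant byte
 * (big-endian layout within the word). */
static void put_byte_be(uint32_t *buf, size_t index, uint8_t val)
{
    unsigned shift = (unsigned)(3 - (index & 3)) << 3;  /* 24, 16, 8 or 0 */
    uint32_t word = buf[index >> 2];

    word &= ~((uint32_t)0xff << shift);
    word |= (uint32_t)val << shift;
    buf[index >> 2] = word;
}
```

As noted in the thread, this sequence is not atomic, so it is only safe when
no other thread touches bytes in the same word. Whether it beats the
byte-addressable store extension on a given device is something to
benchmark, not assume.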