john-dev - Re: cl_khr_byte_addressable



Message-ID: <d6840bd2d02a915a0a98e97bd16596e0@smtp.hushmail.com>
Date: Sat, 21 Apr 2012 13:03:55 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: cl_khr_byte_addressable_store

I'll try it out. As a bonus, I would get rid of the initial 16 endian
swaps in sha_block(), which I suppose is a good thing.

magnum


On 04/21/2012 02:39 AM, Milen Rangelov wrote:
> Yes, there are no native 8-bit writes; stores are 32-bit. When you actually
> write a char to local or global memory using the extension, the compiler
> does some things behind the scenes. uchar writes to global memory use the
> complete path. Writes to local memory have (I guess) an implicit barrier,
> even if each workitem writes only to its own part of local memory, because
> the compiler can't be smart enough to figure that out. Not that you can't
> use byte-addressable stores, but they are slow because the compiler can't
> make assumptions about your code. Fortunately though, bitwise macros work
> well for both hardware vendors, even with vectorized code.
> 
> Anyway that's just my opinion, I am not trying to impose it. The best thing
> is to code several versions and profile/benchmark them.
> 
> 
> 
> On Sat, Apr 21, 2012 at 2:26 AM, magnum <john.magnum@...hmail.com> wrote:
> 
>> On 04/21/2012 01:03 AM, Solar Designer wrote:
>>> On Sat, Apr 21, 2012 at 12:45:56AM +0200, magnum wrote:
>>>> Then I'm afraid you lost me. Just how should I approach this? Should I
>>>> do two separate kernels or should I try some kind of bit-flipping
>>>> madness that just might work on both AMD and nvidia?
>>>
>>> I can't speak for Milen, but I guess that to write a byte you need to
>>> read a naturally aligned 4-byte word, mask out the original byte in it,
>>> OR in your new byte value, and write that word back.  Of course, this is
>>> non-atomic, but you should not be accessing nearby bytes from another
>>> thread anyway.
>>>
>>> An obvious optimization would be to combine multiple byte writes
>>> together such that you read/write fewer words (such as one per 4 bytes).
>>
>> Yes, thanks. I already do things similar to what you say for performance
>> reasons, but the non-aligned cases will get nasty (or tedious at the very
>> least) if I am never allowed to write an unaligned byte. I am really
>> surprised by this limitation; these were not the obstacles I was picturing
>> when I got into this game.
>>
>> The older I get, the older I become :)
>>
>> magnum
>>
>>
>
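
For reference, the read-mask-OR-write sequence described in the quoted text above can be sketched in plain C roughly as follows. This is only an illustration, not code from the thread: the names put_byte and put_4bytes are hypothetical, and little-endian byte order inside each 32-bit word is an assumption. An OpenCL kernel would do the same thing with uint and uchar on a __global or __local buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store one byte at byte offset `index` of a buffer
 * that is only ever accessed as aligned 32-bit words.
 * Assumption: little-endian byte order within each word.
 * Not atomic: other threads must not touch bytes in the same word. */
static void put_byte(uint32_t *buf, size_t index, uint8_t val)
{
    unsigned shift = (unsigned)(index & 3) * 8;  /* bit position inside the word */
    uint32_t word = buf[index >> 2];             /* read the aligned word */

    word &= ~((uint32_t)0xff << shift);          /* mask out the original byte */
    word |= (uint32_t)val << shift;              /* OR in the new byte value */
    buf[index >> 2] = word;                      /* write the word back */
}

/* Hypothetical helper: combine four byte writes into one word store,
 * so that only one word is written per 4 bytes and nothing is read. */
static void put_4bytes(uint32_t *buf, size_t word_index, const uint8_t *src)
{
    buf[word_index] = (uint32_t)src[0]
                    | (uint32_t)src[1] << 8
                    | (uint32_t)src[2] << 16
                    | (uint32_t)src[3] << 24;
}
```

Calling put_byte(buf, 5, 0xab) would change only byte 1 of buf[1] and leave the other three bytes of that word as they were. Whether this beats the byte-addressable store extension on a given device is something to benchmark, as suggested in the thread.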
