john-dev - Re: Problems on recent AMD driver

Follow @Openwall on Twitter for new release announcements and other news

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <082b6bc7f200757201c9d9e12146cc8a@smtp.hushmail.com>
Date: Wed, 21 Nov 2012 22:06:35 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Problems on recent AMD driver

On 21 Nov, 2012, at 19:53 , magnum <john.magnum@...hmail.com> wrote:
> On 21 Nov, 2012, at 19:11 , Claudio André <claudioandre.br@...il.com> wrote:
>> Even crazier; commit a99312a works on Radeon 6770, GTX 570 and CPUs. Without doing the proper allocation? Yes. How? It shouldn't!!
> 
> You mean the host code gets a pointer that is not made with CL_MEM_ALLOC_HOST_PTR? This is not bound to fail, it's just not guaranteed to work... On most platforms I guess it will work fine, especially 64-bit ones.

On second thought I believe it's not violating any spec and it *should* work on all platforms. It maps the memory object (host addressable or not) into host addressable space. The difference is that if you use CL_MEM_ALLOC_HOST_PTR, you should get DMA transfer or zero-copy, while if you don't you might end up with a slow transfer.

Per the Khronos spec (as I understand it) we can do one of these things:

Init:
  1. mem_object = clCreateBuffer(...);
  2. host_ptr = malloc(...); /* or just an array */

Loop:
  3. (write stuff to host_ptr buffer)
  4. clEnqueueWriteBuffer(..., mem_object, ..., host_ptr, ...);
  5. clEnqueueNDRangeKernel(...);
  6. goto 3

or using map and unmap:

Init:
  1. mem_object = clCreateBuffer(...);

Loop:
  2. host_ptr = clEnqueueMapBuffer(..., mem_object, ...);
  3. (write stuff to host_ptr buffer)
  4. clEnqueueUnmapMemObject(..., mem_object, host_ptr, ...);
  5. clEnqueueNDRangeKernel(...);
  6. goto 2

...but per nvidia whitepapers, the following (which is what I do, and others before me) is recommended for using "pinned" or "page-locked" memory. And it's a mix of the above:

Init:
  1. mem_object = clCreateBuffer(...);
  2. host_ptr = clEnqueueMapBuffer(..., mem_object, ...);

Loop:
  3. (write stuff to host_ptr buffer)
  4. clEnqueueWriteBuffer(..., mem_object, ..., host_ptr, ...);
  5. clEnqueueNDRangeKernel(...);
  6. goto 3

So in this last example, we map it once and never unmap - and we are *also* using clEnqueueWriteBuffer(). If you only go per the Khronos spec, this seems to violate things. From the spec of clEnqueueMapBuffer: "The behavior of OpenCL function calls that enqueue commands that write or copy to regions of a memory object that are mapped is undefined". Maybe I should ask on the Khronos forum to get this straightened out for certain.

If someone can see anything I've got wrong in these three, please yell, even if you are not sure. I am not sure %-)

Someone that reads this oughta know. Milen?

magnum

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.