Date: Sat, 24 Mar 2012 23:24:40 +0100
From: magnum <>
Subject: OpenCL vector tactics

Could someone brave tell me why/how/if to use vector types like uint4 in
OpenCL kernels? I do understand SSE2 intrinsics, like sse-intrinsics.c
in john, but as far as I can tell, that approach does not translate well
to OpenCL. I don't quite get how to apply vectors there.

How would I attack it? Would each kernel take four inputs and produce
four outputs? If I have an existing kernel that takes one input and ends
up with one output, should I just convert it to do four things at a time
at every step, just like with SSE2 intrinsics? Is it really that simple?
And if so, what would I set as the local workgroup size? The real size
divided by four?
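
To make the question concrete, here is a minimal sketch of what I mean
by "four at a time". It is plain C rather than OpenCL, so it can be
compiled and run anywhere. The struct u32x4 only stands in for uint4,
the loops over gid stand in for work-items, and every name in it
(u32x4, mix_scalar, mix_vec4, run_scalar, run_vec4) is made up for
illustration. None of it comes from john.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for OpenCL's uint4: four independent 32-bit lanes. */
typedef struct { uint32_t s[4]; } u32x4;

/* Hypothetical per-candidate step (rotate left by 7, add a constant). */
static uint32_t mix_scalar(uint32_t x)
{
    return ((x << 7) | (x >> 25)) + 0x9e3779b9u;
}

/* The same step applied to four candidates at once, lane by lane. */
static u32x4 mix_vec4(u32x4 v)
{
    u32x4 r;
    for (int i = 0; i < 4; i++)
        r.s[i] = ((v.s[i] << 7) | (v.s[i] >> 25)) + 0x9e3779b9u;
    return r;
}

/* Scalar "kernel": n work-items, one input and one output each. */
static void run_scalar(const uint32_t *in, uint32_t *out, size_t n)
{
    for (size_t gid = 0; gid < n; gid++)
        out[gid] = mix_scalar(in[gid]);
}

/* Vectorized "kernel": n / 4 work-items, four inputs and outputs each. */
static void run_vec4(const uint32_t *in, uint32_t *out, size_t n)
{
    assert(n % 4 == 0); /* assumption: input count is a multiple of 4 */
    for (size_t gid = 0; gid < n / 4; gid++) {
        u32x4 v, r;
        for (int i = 0; i < 4; i++)
            v.s[i] = in[4 * gid + i];
        r = mix_vec4(v);
        for (int i = 0; i < 4; i++)
            out[4 * gid + i] = r.s[i];
    }
}
```

In this sketch only the number of work-items shrinks by a factor of
four. Whether the local workgroup size should shrink along with it is
exactly the part I am unsure about.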

And what would this accomplish? Do I understand correctly that this would
benefit CPUs and perhaps AMD GPUs, but likely not NVIDIA? Would it be
detrimental on NVIDIA?

