Date: Tue, 22 Oct 2013 14:10:06 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: OpenCL vectorizing how-to.

On 2013-10-22 04:08, Lukas Odzioba wrote:
> 2013/10/21 magnum <john.magnum@...hmail.com>:
>> One thing I can't understand is why pre-vectorized code with the correct
>> width is not used "as-is" by these compilers. Apparently the compiler first
>> scalarizes it and then re-vectorizes it - with very poor results, at least
>> on Well. OTOH this isn't a problem now that we can supply the requested
>> [lack of] width.
> 
> "(...)We're likely to generate better code(...)" :)
> 
> I guess better is not necessarily faster :)
> 
> http://www.youtube.com/watch?feature=player_detailpage&v=QsoLyvvhRuc#t=853

Thanks, that was interesting. He did not fully answer my question, though. If I supply a kernel with the native width of the device, they could compile it to e.g. AVX2 or AVX-512 instructions right away, with no added execution masks or other overhead.

In some cases even the key buffer as supplied from host code can be vectorized (and we could do the same with the output buffer) - but if they ask me for scalar code, they will obviously get a scalar buffer, so the end result will be unnecessarily complicated vectorized code dealing with that. This particular case does not currently affect any inner loop, though.

I think auto-vectorizing is a really great thing, but I can't see why they refuse to use pre-vectorized code as supplied. Apparently the assumption is that no one vectorizes (or should vectorize) for performance, only for the problem domain, as he puts it at ~15:05. Maybe in the long run they are right. Or maybe future versions of their optimizers will be able to analyze pre-vectorized code and decide whether it can be used without "re-vectorizing".

magnum