Date: Fri, 16 Oct 2015 10:17:27 +0300
From: Solar Designer <>
Subject: Re: 64-bit rotate on AMD GCN

On 2015-10-15 22:25, Pavel Semjanov wrote:
> Not working on small numbers and rotate by 8, like ror (0x220, 8).
> I guess it's bitalign error. The only one mention I found is:

Ouch.  When you say "on small numbers", do you mean only compile-time
constants, or also such numbers computed at runtime?

On Thu, Oct 15, 2015 at 11:02:56PM +0200, magnum wrote:
> What device and driver version(s) did you see that with? I recall Atom 
> told me he'd seen rotate() fail with numbers divisible by 8.  I'm pretty 
> sure he meant the OpenCL function but it could be the same underlying 
> bug. That was in June last year so maybe Cat 14.4 or something. I never 
> saw that very bug surface though.

The closest I had heard of is this comment by Alain:

"The 64 bit rotation is done manually, not using OpenCL rotate.
am_bitalign provides a very small speedup, but note that when used with
multiples of 8 it generate errors, at least when I test it, so we need
to use amd_bytealign then."

I never ran into these issues, even though the code I got into our
md5crypt-opencl actually uses amd_bitalign() only with multiples of 8
(since it does that for the unaligned writes).  I also tried
amd_bytealign(), which ran at similar speed, but I chose to stay with
amd_bitalign() since it's the same as NVIDIA's funnel shift, so it is
easier to substitute with that in our macros.  OTOH, it wouldn't be hard
to multiply the constants by 8 in a macro if necessary.
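
To make the relationship between the two built-ins concrete, here is a
host-side sketch in plain C.  It is not code from JtR or from Alain's
kernels: emu_bitalign(), emu_bytealign(), BYTEALIGN_VIA_BITALIGN(),
ror64_bitalign(), ror64_ref() and check_rotates() are all names made up
for this illustration, and the emulation assumes the semantics documented
for cl_amd_media_ops (a:b taken as a 64-bit value with a as the high
word, shifted right by c bits or c bytes, low 32 bits kept).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed semantics of amd_bitalign(): shift a:b right by (c & 31) bits. */
static uint32_t emu_bitalign(uint32_t a, uint32_t b, uint32_t c)
{
    return (uint32_t)((((uint64_t)a << 32) | b) >> (c & 31));
}

/* Assumed semantics of amd_bytealign(): shift a:b right by (c & 3) bytes. */
static uint32_t emu_bytealign(uint32_t a, uint32_t b, uint32_t c)
{
    return (uint32_t)((((uint64_t)a << 32) | b) >> ((c & 3) * 8));
}

/* "Multiply the constants by 8 in a macro": a byte count turned into a
 * bit count, so one bitalign (or funnel shift) primitive serves both. */
#define BYTEALIGN_VIA_BITALIGN(a, b, c) emu_bitalign((a), (b), (c) * 8)

/* 64-bit rotate right built from two 32-bit bitalign operations. */
static uint64_t ror64_bitalign(uint64_t x, unsigned n)
{
    uint32_t hi = (uint32_t)(x >> 32), lo = (uint32_t)x;

    n &= 63;
    if (n >= 32) {              /* rotating by 32 just swaps the halves */
        uint32_t t = hi;
        hi = lo;
        lo = t;
        n -= 32;
    }
    uint32_t new_lo = emu_bitalign(hi, lo, n);
    uint32_t new_hi = emu_bitalign(lo, hi, n);
    return ((uint64_t)new_hi << 32) | new_lo;
}

/* Plain shift-and-or reference to compare against. */
static uint64_t ror64_ref(uint64_t x, unsigned n)
{
    n &= 63;
    return n ? (x >> n) | (x << (64 - n)) : x;
}

/* Compare both rotates for every count, and both align forms for
 * every byte offset, over a few sample values. */
static void check_rotates(void)
{
    const uint64_t samples[] = {
        0x220, 1, 0x0123456789abcdefULL, ~0ULL << 7
    };

    for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        uint32_t hi = (uint32_t)(samples[i] >> 32);
        uint32_t lo = (uint32_t)samples[i];

        for (unsigned n = 0; n < 64; n++)
            assert(ror64_bitalign(samples[i], n) == ror64_ref(samples[i], n));
        for (uint32_t c = 0; c < 4; c++)
            assert(emu_bytealign(hi, lo, c) == BYTEALIGN_VIA_BITALIGN(hi, lo, c));
    }
}
```

Under these assumed semantics the two forms agree for every multiple of
8, including Pavel's ror(0x220, 8) case, so a mismatch seen on a real
device would point at the compiler or driver rather than at the formula.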

