Date: Fri, 6 Aug 2021 19:23:19 +0200 From: "Stefan Kanthak" <stefan.kanthak@...go.de> To: "Rich Felker" <dalias@...c.org> Cc: "Alexander Monakov" <amonakov@...ras.ru>, "Szabolcs Nagy" <nsz@...t70.net>, <musl@...ts.openwall.com> Subject: Re: [Patch] src/math/i386/remquo.s: remove conditional branch, shorter bit twiddling Rich Felker <dalias@...c.org> wrote: If you don't want patches for assembly modules, please state so VERY CLEAR in your FAQ. > On Fri, Aug 06, 2021 at 12:17:12PM +0200, Stefan Kanthak wrote: >> Alexander Monakov <amonakov@...ras.ru> wrote: >> >> > On Wed, 4 Aug 2021, Stefan Kanthak wrote: >> >> The change just follows by removing 6 LOC/instructions.-) >> > >> > Have you considered collecting the three bits in one go via a multiplication? >> >> No. My mind is not that twisted;-) >> >> > You can first isolate the necessary bits with 'and $0x4300, %eax', then do >> > 'imul $0x910000, %eax, %eax' to put the required bits in EAX[31:29] in the >> > right order, then shift right by 29. Three instructions, 14 bytes. >> >> Thanks, VERY NICE! How did you come up to it? >> >> Revised patch with shorter bit twiddling attached. > > The path forward for all the math asm is moving it to inline asm in C > files, with no flow control or bit/register shuffling in the asm, only > using asm for the single instructions. See how Alexander Monakov did > x86_64 remquol in commit 19f870c3a68a959c7c6ef1de12086ac908920e5e. This commit is for i386 fmod/fmodf/fmodl. The bit twiddling used in <https://git.musl-libc.org/cgit/musl/plain/src/math/x86_64/remquol.c> (which I hadn't noticed yet) and the code GCC generates for it is but (almost) as bad as the original assembly code: | shrl $8, %eax | movl %eax, %ecx | movabsq $8463725162920157216, %rax | rolb $4, %cl | andl $60, %ecx | sarq %cl, %rax | andl $7, %eax vs. | mov %ah,%dl | shr %dl | and $1,%dl | mov %ah,%al | shr $5,%al | and $2,%al | or %al,%dl | mov %ah,%al | shl $2,%al | and $4,%al | or %al,%dl > I haven't read the mul trick here in detail but I believe it should be > duplicable with plain C * operator. It is. > I really do not want to review/merge asm changes that keep this kind > of complex logic in asm when there's no strong motivation for it (like > fixing an actual bug, vs just reducing size or improving speed). The > risk to reward ratio is just not reasonable. Final patch attached! Stefan Download attachment "remquo.patch" of type "application/octet-stream" (3126 bytes)
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.