Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 6 Aug 2021 19:23:19 +0200
From: "Stefan Kanthak" <stefan.kanthak@...go.de>
To: "Rich Felker" <dalias@...c.org>
Cc: "Alexander Monakov" <amonakov@...ras.ru>,
	"Szabolcs Nagy" <nsz@...t70.net>,
	<musl@...ts.openwall.com>
Subject: Re: [Patch] src/math/i386/remquo.s: remove conditional branch, shorter bit twiddling

Rich Felker <dalias@...c.org> wrote:

If you don't want patches for assembly modules, please state so VERY
CLEAR in your FAQ.


> On Fri, Aug 06, 2021 at 12:17:12PM +0200, Stefan Kanthak wrote:
>> Alexander Monakov <amonakov@...ras.ru> wrote:
>> 
>> > On Wed, 4 Aug 2021, Stefan Kanthak wrote:
>> >> The change just follows by removing 6 LOC/instructions.-)
>> > 
>> > Have you considered collecting the three bits in one go via a multiplication?
>> 
>> No. My mind is not that twisted;-)
>> 
>> > You can first isolate the necessary bits with 'and $0x4300, %eax', then do
>> > 'imul $0x910000, %eax, %eax' to put the required bits in EAX[31:29] in the
>> > right order, then shift right by 29. Three instructions, 14 bytes.
>> 
>> Thanks, VERY NICE! How did you come up to it?
>> 
>> Revised patch with shorter bit twiddling attached.
> 
> The path forward for all the math asm is moving it to inline asm in C
> files, with no flow control or bit/register shuffling in the asm, only
> using asm for the single instructions. See how Alexander Monakov did
> x86_64 remquol in commit 19f870c3a68a959c7c6ef1de12086ac908920e5e.

This commit is for i386 fmod/fmodf/fmodl. The bit twiddling used in
<https://git.musl-libc.org/cgit/musl/plain/src/math/x86_64/remquol.c>
(which I hadn't noticed yet) and the code GCC generates for it is but
(almost) as bad as the original assembly code:

|        shrl    $8, %eax
|        movl    %eax, %ecx
|        movabsq $8463725162920157216, %rax
|        rolb    $4, %cl
|        andl    $60, %ecx
|        sarq    %cl, %rax
|        andl    $7, %eax

vs.

|        mov %ah,%dl
|        shr %dl
|        and $1,%dl
|        mov %ah,%al
|        shr $5,%al
|        and $2,%al
|        or %al,%dl
|        mov %ah,%al
|        shl $2,%al
|        and $4,%al
|        or %al,%dl

> I haven't read the mul trick here in detail but I believe it should be
> duplicable with plain C * operator.

It is.

> I really do not want to review/merge asm changes that keep this kind
> of complex logic in asm when there's no strong motivation for it (like
> fixing an actual bug, vs just reducing size or improving speed). The
> risk to reward ratio is just not reasonable.

Final patch attached!

Stefan
Download attachment "remquo.patch" of type "application/octet-stream" (3126 bytes)

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.