musl - Re: Considering x86-64 fenv.s to C

Message-ID: <20200117145350.GR30412@brightrain.aerifal.cx>
Date: Fri, 17 Jan 2020 09:53:50 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Considering x86-64 fenv.s to C

On Fri, Jan 17, 2020 at 02:36:20PM +1100, Damian McGuckin wrote:
> 
> Feedback/Discussion please, especially in terms of what extra
> comments I need to make?  I hope I have not missed anything.
> 
> General Comments
> ****************
> 
> Except where noted, the approach taken to invalid input is to mask
> out the invalid data, use what data is left, and never inform the
> calling program of invalid data.
> 
> The i386 (sometimes), X32 and X86-64 generally need to realise that
> they have both the X87 FPU and the SSE.  Are there scenarios where
> this will not be the case or do we need to plan for future
> scenarios where this will not be the case?
> 
> Do we need to consider what is in the latest IEEE 754-2019 standard
> to see what enhancements are needed or just wait for C2X?
> 
> Other Architectures
> *******************
> 
> Should we look at what is needed for Sparc and Power9 to ensure that
> the (eventually-) chosen abstraction will work with these? Are there
> any other chips which need to be considered? If you look at more
> recent chipset designs, they have all been able to leverage the
> experience of working with IEEE 754 exceptions and rounding and
> follow the same style of use of an exception status and round
> control register. So I think catering for the current crop, plus
> those 2 mentioned above, should be adequate. But am I wrong?
> 
> Is Power9 the same as PowerPC64?  I have never seen one. I know I do
> not know enough about this chip, as the 128-bit floating point
> discussion talks about a Rounding-To-Odd mode. I have tried to read
> the 1358 pages of the ISA 3.0 architecture manual but I have a long
> way to go before I know even 10% of what is in there. Are the newer
> beefy ARMs likely to change what they do in the context of 'fenv'
> routines?
> 
> Also, and I could be wrong, currently MUSL assumes that there is an
> integral type for every floating type.  On some architectures, I
> believe this is not always the case for 128-bit floating point
> numbers. On some Sparcs, I am not sure it was even the case for
> 64-bit numbers, but that was a long time ago. I do not think that this
> restriction will influence anything here.  How it affects MUSL in
> general is another question irrelevant to this discussion.
> 
> Summary
> *******
> 
> aarch64 (arm)
> 
> *	All assembler
> 
> arm (bare)
> 
> *	Empty

I think you missed that there's fenv-hf.S for armhf. The default ABI
does not have fpu/fenv.

> i386
> 
> *	All assembler
> 
> *	The fldenv instruction to update the status registers has a serious
> 	overhead which cannot be avoided in 'feraiseexcept'. No attempt is
> 	made to optimize any unnecessary usage (as occurs in feclearexcept).

One thing we could do in C is write feraiseexcept portably to raise
the exceptions via performing appropriate floating point operations,
rather than directly modifying status word. This would probably be
faster on most if not all archs.

> 	Note that fldenv also makes the 'feclearexcept' routine unavoidably
> 	complex.

Note that the high level C could avoid any action when the flags to be
cleared are already clear.

> *	What is the best way to query '__hwcap' from an inline __asm__ statement,

From *inline* asm you don't have to do it at all. You just write the
branch in C. This is one of the reasons to prefer C.

> 	specifically to verify if SSE instructions have to be supported

It's not to verify that they have to be supported, rather to verify
that they *can* be used. If the bit is not set in hwcap, the
instructions are not there and will fault if executed.

> m68k
> 
> *	In C.
> 
> *	Very clear
> 
> *	feclearexcept and feraiseexcept
> 
> 		if (exception_mask & ~FE_ALL_EXCEPT) return -1;
> 
> 	Different to the way others handle invalid input. Is this
> 	behaviour cast in stone based on standard documentation?

C specifies it as:

    "The feraiseexcept function returns zero if the excepts argument
    is zero or if all the specified exceptions were successfully
    raised. Otherwise, it returns a nonzero value."

It's not 100% clear to me that this is supposed to apply to invalid
arguments rather than just some kind of failure to raise valid
arguments, but I'd err on the side of assuming it does apply if we're
overhauling this code in C. It's a minor issue though.

> mips/mips64/mipsn32
> 
> *	All assembler
> 
> *	Not overly complex.
> 
> powerpc
> 
> *	All assembler
> 
> *	I think that this architecture has more exception bits than IEEE 754
> 	specifies. It has lots of specific cases of FE_INVALID. This needs
> 	to be considered when dealing with FE_INVALID.

I'm trying to remember -- does the hardware lack a unified FE_INVALID
bit, so that you have to emulate it by aggregating the individual
ones? I think that could be hidden inside a primitive that
loads/stores the status word.
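
To illustrate hiding the aggregation inside the load/store primitive, here is a simulation only: the bit positions and the SIM_* names are invented and are NOT the real PowerPC FPSCR layout, and a plain variable stands in for the hardware register.

```c
#include <stdint.h>

/* Invented layout for illustration, not real PowerPC bits. */
#define SIM_INVALID_SUMMARY 0x1u   /* what callers see as FE_INVALID */
#define SIM_INVALID_CAUSES  0xf0u  /* individual invalid-operation causes */

static uint32_t sim_status;        /* stands in for the hardware register */

/* Load: fold any individual invalid cause into the single summary bit. */
uint32_t load_status(void)
{
    uint32_t s = sim_status & ~SIM_INVALID_CAUSES;
    if (sim_status & SIM_INVALID_CAUSES)
        s |= SIM_INVALID_SUMMARY;
    return s;
}

/* Store: clearing the summary bit must also clear every cause bit,
 * otherwise the next load would report invalid again. */
void store_status(uint32_t s)
{
    uint32_t causes = sim_status & SIM_INVALID_CAUSES;
    if (!(s & SIM_INVALID_SUMMARY))
        causes = 0;
    sim_status = (s & ~SIM_INVALID_CAUSES) | causes;
}
```

Everything above the primitive then only ever sees one invalid bit, as on other archs.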

> powerpc64
> 
> *	In C.
> 
> *	Very clear
> 
> *	Note that this architecture has more exception bits than IEEE 754
> 	specifies. It has lots of specific cases of FE_INVALID. This needs
> 	to be considered when dealing with FE_INVALID.
> 
> *	This is the first time I have seen this style of coding to cast a
> 	double to a union and then extract the data as a long.
> 
> 		return (union {double f; long i;}) {get_fpscr_f()}.i;
> 
> 	Is this style of coding universally accepted within MUSL? From my
> 	reading of other routines, it is normally done as
> 
> 		union {double f; long i;} f = { get_fpscr_f() };
> 
> 		return f.i;
> 
> 	Just curious.

Yes, the compound literal form is preferred since it avoids a
gratuitous named variable.
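
Both forms side by side, as a sketch. uint64_t replaces the original's long so the example also behaves the same where long is 32 bits; the function names are hypothetical.

```c
#include <stdint.h>

/* Named-variable form: reinterpret a double's bits via a union. */
uint64_t bits_named(double x)
{
    union { double f; uint64_t i; } u = { x };
    return u.i;
}

/* Compound-literal form: same result, no gratuitous named temporary. */
uint64_t bits_compound(double x)
{
    return (union { double f; uint64_t i; }){ x }.i;
}
```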

> riscv64
> 
> *	All assembler.
> 
> *	Very clear.
> 
> *	The architecture has obviously been done after a review of lots
> 	of experience with the IEEE 754 standard.
> 
> s390x
> 
> *	In C.
> 
> *	Very clear.
> 
> *	Why is __fesetround(int) 'hidden'? Where is fesetround()?

In src/fenv. It does the validity check in C before calling into the
arch backend.

> sh (SuperH??)
> 
> *	In assembler
> 
> *	I know zero about this assembler
> 
> *	There is some peculiarity about updating the environment. I have no
> 	idea what is going on here. Anybody care to elaborate?

The comment about preserving the precision bit is just incorrect as long
as this is an external function. The function call ABI has precision
set to double on function entry. If it's inline asm we might have to
think about ensuring it's safe. I'm not sure how to constrain the
precision on entry to an __asm__ statement, but I assume there must be
a way or you couldn't write inline asm using floating point operands.

> x32
> 
> *	In assembler
> 
> *	Why does 'feclearexcept' disrespect the flags by clearing ALL x86 bits?
> 
> *	Is this really much the same as x86-64 (or am I wrong)?

Yes, they're identical except that pointers in memory occupy only 4
bytes.

> x86_64
> 
> *	In assembler
> 
> *	Why does 'feclearexcept' disrespect the flags by clearing ALL x86 bits?

As you said above, updating x87 status register is expensive because
the only way to write it is to read-modify-write the whole fenv. But
since we know on x86_64 we have sse registers, we can just move all
the flags to the sse register, then use fnclex to clear the x87 one
inexpensively, and the effective set of raised flags remains the same.

I think we could do this on i386 too with a couple of tricks:

1. Do the same thing if sse is available (hwcap check).

2. If sse is not available, clear all flags then re-raise the desired
set via arithmetic operations.

Note that this approach is not compatible with trapping exceptions,
but we don't support them anyway.

Rich