musl - Re: Considering x86-64 fenv.s to C

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LRH.2.02.2001241227030.6298@key0.esi.com.au>
Date: Fri, 24 Jan 2020 15:13:32 +1100 (AEDT)
From: Damian McGuckin <damianm@....com.au>
To: musl@...ts.openwall.com
Subject: Re: Considering x86-64 fenv.s to C


Let me know if I missed something in the following:

On Thu, 23 Jan 2020, Rich Felker wrote:

>> #ifndef FE_ALL_EXCEPT
>> #define FE_VALID_EXCEPT (FE_INEXACT|FE_DIV_BY_ZERO|FE_UNDERFLOW|FE_OVERFLOW)
>> #define FE_ALL_EXCEPT   (FE_INVALID|FE_VALID_EXCEPT)
>> #endif
>
> I don't understand what this is for. FE_ALL_EXCEPT is defined in the
> public fenv.h and FE_VALID_EXCEPT is not used.

For the first question, it was done purely to avoid the need to define 
FE_ALL_EXCEPT in all of the various 'fenv.h' files. I am happy to throw it 
away.  For the second question, at one stage, I considered having the 
concept of what was anything except an INVALID exception.

I will remove those 4 lines.

>> /*
>>  * The usual approach to validating an exception mask is to throw out
>>  * anything which is illegal. Some systems will NOT do this, choosing
>>  * instead to return a non-zero result on encountering an illegal mask
>>  */
>
> The latter is not desirable behavior and not something you need to be
> trying to reproduce. It's a bug or at least an unspecified case we
> don't care about in existing code.

Wonderful. I was purely trying to preserve historical performance. I am 
more than happy to NOT do this. That makes the next problem easy to fix.

>> #ifndef FE_VALIDATE_EXCEPT
>> #define FE_VALIDATE_EXCEPT(e)   e &= FE_ALL_EXCEPT
>> #endif
>
> Write things like this directly in code; don't hide them behind
> macros. Doing that just makes the code hard to read because you can't
> tell what it's doing without searching for the macro definition. (Also
> the macro is not protected by parens as written but that's easily
> fixable.)

Because we can ignore that non-zero return case, we no longer need
an architecture dependent macro for this so it can go directly in
the code. So, again, an easy problem to fix.

>> /*
>>  * There is usually no need to mess the generic valid exceptions bits
>>  * given to 'feclearexcept' and 'feraiseexcept' so define a 'nothing'
>>  * macro for such a scenario - is this the MUSL accepted style????
>>  */
>> #ifndef FE_QUALIFY_CLEAR_EXCEPT
>> #define FE_QUALIFY_CLEAR_EXCEPT(excepts)    (0)
>> #endif
>> #ifndef FE_QUALIFY_RAISE_EXCEPT
>> #define FE_QUALIFY_RAISE_EXCEPT(excepts)    (0)
>> #endif
>
> Can you clarify how these would be defined for powerpc? Could the logic 
> just be embedded in get_sr/set_sr if needed?

Address the second question first, my original aim was to keep any logic 
out of where embedded assembler was used, mainly because of my own very
irregular experience with it. I could also not convince myself that
  get_sr and set_sr may need to know from where they are called. In the 
case of the powerpc64, with the SR and CR being the same, that adds to the 
complexity, so I gave up.

Answering your first question, for the powerpc64.

#define	QUALIFY_CLEAR_EXCEPT(e)	if (e & FE_INVALID) e |= FE_ALL_INVALID
#define	QUALIFY_RAISE_EXCEPT(e)	if (e & FE_INVALID) e |= FE_INVALID_SOFTWARE

An alternative is to avoid these macros and then

 	#ifndef FE_ALL_INVALID
 	#define	FE_ALL_INVALID 0
 	#endif
 	#ifndef FE_INVALID_SOFTWARE
 	#define	FE_INVALID_SOFTWARE 0
 	#endif

and have respectively the following code in feclear.. and feraise..

 	if (excepts & FE_INVALID)
 		excepts |= FE_ALL_INVALID;
and
 	if (excepts & FE_INVALID)
 		excepts |= FE_INVALID_SOFTWARE;

An optimize should see the OR with zero for most architectures and then 
ignore that whole 'if' statement because the action was a NO-OP.

>
>> /*
>>  * Compute the rounding mask with some rigid logic
>>  */
>> #ifndef FE_TO_NEAREST
>> #define FE_TO_NEAREST   0
>> #define ROUND_MASK  ((unsigned int) 0)
>> #else
>> #ifndef FE_TOWARDS_ZERO
>> #define ROUND_MASK  ((unsigned int) 0)
>> #else
>> #ifdef  FE_DOWNWARD
>> #ifdef  FE_UPWARD
>> #define ROUND_MASK  ((unsigned int) (FE_DOWNWARD|FE_UPWARD|FE_TOWARDZERO))
>> #else
>> #define ROUND_MASK  UNSUPPORTED rounding
>> #endif
>> #else
>> #ifdef  FE_UPWARD
>> #define ROUND_MASK  UNSUPPORTED rounding
>> #else
>> #define ROUND_MASK  ((unsigned int) (FETOWARDZERO))
>> #endif
>> #endif
>> #endif
>> #endif
>
> I don't see why you have "UNSUPPORTED rounding" cases here. Any 
> combination should be valid.

My deliberate assumption was that the only legal combinations were

 	round to nearest only
or
 	round to nearest and truncate only
or
 	round to nearest, truncate, and round to either +/- INFINITY

Whether that assumption is valid is another thing but I am happy to be 
proven wrong.  If somebody defined an architecture where that assumption 
was not valid, I wanted the compiler to complain at me.

> Since these are not bitmasks, probably the arch should just have to 
> define the rounding mode mask. For example if the values were 0b000, 
> 0b001, 0b100, and 0b101, but the 0 in the middle was significant, it 
> should probably be treated as part of the mask.

That was not the way I read it but again, I am happy to be proven wrong.
If the net OR'ing of the various types of rounding cannot be treated as a 
mask, then every *BSD libc implementations that I know will need to be 
fixed because they do treat it that way. I did run tests and none of the
ones I tested from BSD and MUSL had that structure. Maybe my test code
was wrong. They all seems to be 2 bits where TO_NEAREST was all bits
zero and one of the others was the some of the other two.

> If you really want to generate the mask, something like
>
> #ifdef FE_TONEAREST
> #define FE_TONEAREST_MASK FE_TONEAREST
> #else
> #define FE_TONEAREST_MASK 0
> #endif
> ...
> #define ROUND_MASK (FE_TONEAREST_MASK | FE_DOWNWARD_MASK | ...)
>
> should be simpler and comprehensive.

I will run a test but I think they produce the same result for all
the architecture for which I have definition files.

>> int
>> feclearexcept(excepts)
>> {
>>     FE_VALIDATE_EXCEPT(excepts);
>>     FE_QUALIFY_CLEAR_EXCEPT(excepts);
>>     setsr(getsr() & ~((unsigned int) excepts));
>>     return(0);
>> }
>
> No function-like parens around return value, no gratuitous casts.

No problems. Sorry, gratuitous casting is a habit to stop various linters 
complaining. I went looking for a MUSL style guide but could not find one.

>> static inline int
>> __raisearithmetically(int excepts)
>> {
>>     /*
>>      * assume single OP is faster than double OP
>>      */
>>     const float one = (float) 1;
>>     const float zero = (float) 0;
>>     const float tiny = (float) 0x1.0p-126;
>>     const float huge = (float) 0x1.0p+126;
>>     volatile float x;
>>
>>     /*
>>      * if it is just a simple exception, arithmetic expressions are optimal
>>      */
>>     switch(excepts)
>>     {
>>     case FE_INVALID:
>>         x = zero, x /= x;
>>         break;
>>     case FE_DIVBYZERO:
>>         x = zero, x = one / x;
>>         break;
>>     case FE_INEXACT:
>>         x = tiny, x += one;
>>         break;
>>     case (FE_OVERFLOW | FE_INEXACT):
>>         x = huge, x *= x;
>>         break;
>>     case (FE_UNDERFLOW | FE_INEXACT):
>>         x = tiny, x *= x;
>>         break;
>>     default: /* if more than one exception exists, a sledgehammer is viable */
>>         setsr(getsr() | ((unsigned int) excepts));
>>         break;
>>     }
>>     return(0);
>> }
>
> This is probably ok (aside from style things mentioned above), but 
> simply recording the flags in software without writing them to the 
> status register at all may be preferable. In that case we would not even 
> need a set_sr primitive, only get_sr and clear_sr.

Yes. I like the idea. But I was deliberately not trying to change anything 
external to 'fenv.c'. Also, is the handling of __pthread_self() protected
into the future?

> Doing this would also simplify the ppc special-casing for FE_INVALID I
> think..

I think only for feraiseexcept as you show.  It does not simplify 
feclearexcept as far as I can see.

>> int
>> feraiseexcept(int excepts)
>> {
>>     FE_VALIDATE_EXCEPT(excepts);
>>     FE_QUALIFY_RAISE_EXCEPT(excepts);
>>     return __raisearithmetically(excepts);
>> }
>
> Then this could just be __pthread_self()->fpsr |= excepts;

Very clean and simple. Is it portable outside of Linux? Not that this is
relevant for MUSL which is designed for Linux-based devices but I do work
on non-Linux (and no C/C++) systems so I am curious.

>> int
>> fegetround(void)
>> {
>>     return (int) (getcr() & ROUND_MASK);
>> }
>>
>> int
>> fesetround(int rounding_mode)
>> {
>>     if ((rounding_mode & ~ROUND_MASK) == 0)
>>     {
>>         unsigned int mode = ((unsigned int) rounding_mode);
>>
>>         return (setcr((getcr() & ~ROUND_MASK) | (mode & ROUND_MASK)), 0);
>>     }
>>     return(-1);
>> }
>
> I would put the error test at the beginning rather than putting the
> whole body inside {}:
>
>     if (rounding_mode & ~ROUND_MASK) return -1;

Happy to.  A while ago, I found that the alternative produces more optimal 
code when dealing with floating point dominant routines. But that is huge 
generalization and I have not tested that generalization in this case.

>> int
>> fegetenv(fenv_t *envp)
>> {
>>     return ((gete(envp)), 0);
>> }
>
> Is there a reason for using comma operator here?

I used it to let the compiler prove that my #defines for gete() and sete() 
worked as if they were a single statement. That gave me some rigour on the 
design of that macro, which in the case of the m68k (and eventually the 
i386), does demand a comma operator.

But for the production code, it will disappear.

>> int
>> fesetenv(fenv_t *envp)
>> {
>>     fenv_t envpd = FE_DFL_ENV_DATA, *e = envp == FE_DFL_ENV ? &envpd : envp;
>>
>>     return ((sete(envp)), 0);
>> }
>
> It might be preferable to use a static const object for default env;
> I'm not sure. Not a big deal either way though.

I thought of that. I used to always do that until 5 years ago until I 
found that an auto variable would often produce more optimal code. It
needs to be checked but it is irrelevant to the fundament design issues.

Ongoing:

I will incorporate your suggestions except for the __pthread_self()->fpsr
tweaks which I like but cannot test across multiple O/S's and then we can
review this.  Should I repost to make sure I got everything and we have a
reference point?

I will post some inline assembler for your review and entertainment but in 
the spirit of a generic template, they should be reviewed in isolation.

Also, I notice that on the mips, the single unsigned in an 'fenv_t' is 
encapsulated in a struct. Is this historical, mandatory, stylistic, a way 
to fix a compiler bug, compatability or something else? I also notice that 
there is a subtle difference in the 'fegetenv' assembler between

 	diff mips/*.S mips64/*.S

 	64c64
 	< 	addiu	$5, $4, 1
 	---
 	> 	daddiu	$5, $4, 1

I guess this is just struct alignment issues? Sorry, it is 20 years since 
I worked with the assembler on a MIPS so my knowledge is really rusty.

Regards - Damian

Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.