Date: Mon, 20 Jan 2020 23:22:31 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Considering x86-64 fenv.s to C

On Tue, Jan 21, 2020 at 02:53:53PM +1100, Damian McGuckin wrote:
> 
> On Thu, 16 Jan 2020, Rich Felker wrote:
> 
> >Would you be interested in assessing what kind of abstraction makes
> >sense here?
> 
> I think it is quite difficult, but eventually feasible.
> 
> Even having one abstract version for i386/x32 and x86_64 is not easy.
> 
> My thoughts were to do an abstraction that works for at least those
> three, simplify this to be even more abstract, and then see how well
> it works for, say, something else. The i386/x32 and x86 are arguably
> among the worst as they effectively have 2 lots of status and control
> registers which are not synced on-chip but that need to be for MUSL.

It's possible that the x86 archs are actually the worst fit for the
abstraction, and should be left separate, while the rest are unified.

> The only assembler in which I have even limited skills is Sparc32/64
> which is not terribly useful for MUSL but in terms of an
> abstraction, may be as good as anything. I will be investing in an
> ARM soon but my skills will be starting from a base of none.

If you don't feel ready to do unification or work on archs you're
unfamiliar with, I think it's okay to either (1) only do the x86 work
now, with no unification, or (2) start the unification in
src/fenv/*.c, but with the arch files left in place in
src/fenv/*/*.[csS] for all the archs that haven't been converted yet.
I don't want to block improvement of the x86 versions just because the
bigger task is too big.

> On Fri, 17 Jan 2020, Rich Felker wrote:
> 
> >As you said above, updating x87 status register is expensive because
> >the only way to write it is to read-modify-write the whole fenv. But
> >since we know on x86_64 we have sse registers, we can just move all
> >the flags to the sse register, then use fnclex to clear the x87 one
> >inexpensively, and the effective set of raised flags remains the same.
> >
> >I think we could do this on i386 too with a couple tricks:
> >
> >1. Do the same thing if sse is available (hwcap check).
> 
> Yes.
> >
> >2. If sse is not available, clear all flags then re-raise the desired
> >set via arithmetic operations.
> 
> That works.  That said, based on a comment earlier today, my
> thoughts are to use an arithmetic expression for the case where only
> a single exception was active, including the pairs INEXACT/OVERFLOW
> and INEXACT/UNDERFLOW, and use a fegetenv/set-register/fesetenv for
> anything more complex.

I think arithmetic should be far better for *any* case it works on.

Another really stupid but perhaps very efficient idea is to just
emulate the flags. Add a TLS slot for an fexcept_t value, move
exceptions there as needed, and OR it into the result when reading
back current exceptions. This would also make it dirt cheap for the
math library to raise any exception it wants, without needing
arithmetic, and it would make it possible to have the math library
return errors via exception flags even on softfloat archs.

Rich
