musl - Re: [PATCH] math: move i386 sqrtf to C

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200110091710.GQ23985@port70.net>
Date: Fri, 10 Jan 2020 10:17:10 +0100
From: Szabolcs Nagy <nsz@...t70.net>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] math: move i386 sqrtf to C

* Rich Felker <dalias@...c.org> [2020-01-09 21:07:47 -0500]:
> On Fri, Jan 10, 2020 at 12:18:58AM +0100, Szabolcs Nagy wrote:
> > i think -fexcess-precision=standard was introduced in
> > gcc 4.5 and to get reliable behaviour before that we
> > needed -ffloat-store.
> 
> I don't think the behavior was "reliable" with -ffloat-store; it's
> wrong with respect to the defined meaning of FLT_EVAL_METHOD.

well reliable in the sense that the results are more
consistent with other targets that have FLT_EVAL_METHOD==0.

> > i think we would need to add back the old annotations
> > to make old compilers safe without -ffloat-store.
> > (fdlibm often raises fenv exceptions via a final rounding
> > before return, those could be often handled more cleanly
> > by __math_oflow etc helpers, but since it was not designed
> > for inline errno handling some normal return paths can
> > raise fp exceptions too and thus need eval_as_* annotation).
> 
> I think I asked you before, but from a standpoint of fenv stuff, I'm
> confused why the eval_as_* things are useful at all; it looks like you
> would need fp_barrier* to ensure they're actually evaluated (e.g. in
> the presence of LTO with a compiler that doesn't honor fenv right).

if you mean the current definition of eval_as_* in musl,
then those are not useful (except in the middle of some
arithmetic expression or assignment to double_t etc
where c99 normally would not drop excess precision)

their point is to annotate where we need to drop excess
precision so we can define it appropriately for specific
targets (and allow all other places to use excess prec).

e.g. in case of 'return 0x1p999*0x1p999' to raise overflow
and return inf, we need fp barrier to avoid const folds and
eval_as_double to drop excess precision so it becomes:

  return eval_as_double(fp_barrier(0x1p999) * 0x1p999);

in principle fp_barrier need not drop excess precision so

  return fp_barrier(fp_barrier(0x1p999) * 0x1p999);

would not be enough, but i ended up using float and double
in fp_barrier instead of float_t and double_t so now it
drops excess precision too (may be a mistake, but we almost
always want to drop excess prec when forcing an evaluation).
separate eval_as_double is still useful since on most targets
it is a nop while fp_barrier is not a nop. (but e.g. i386
could define eval_as_double as fp_barrier)

the bigger problem with fp_barrier is that it just hides
a value (behind volatile or asm volatile) but it should be
forcing the mul operation, e.g. currently

  if (cond)
    return eval_as_double(fp_barrier(0x1p999) * 0x1p999);

may be transformed to

  if (cond)
    x = fp_barrier(0x1p999);
  y = x * 0x1p999;
  if (cond)
    return eval_as_double(y);

by a compiler that assumes mul has no side-effects and
then the mul is evaluated unconditionally (but such
transformation is unlikely so in practice the barrier
works).

i don't think there is a trick that allows us to force
the evaluation of an individual fp op, so for now i'm
happy with fp_barrier and eval_as_*.

> 
> But I think it's also useful to distinguish between possibility of
> wrong exceptions being raised, which is a rather minor issue since
> some widely-used compilers don't even support fenv reasonably at at
> all, and the possibility of wrong values being returned for functions
> where the result is required to be correctly rounded. I would deem it
> a serious problem for sqrt[f] or fma[f] to return the wrong value when
> compiled with gcc3 or pcc. I don't think I would particularly care if
> exceptions failed to be raised properly when compiled with gcc3 or
> pcc, though. So I probably would like to ensure that, whatever code we
> end up with in i386 sqrt[f].c, it it ends up working even if the
> compiler does not handle excess precision correctly.

note that rounding down to double to force an overflow
usually also forces an infinity and without the force
the result would be >DBL_MAX finite which then can cause
problems, just like sqrt with excess prec.

so most eval_as_double has the same importance: if any
of them is missed the result may be wrong.

i have fp_force_eval which has no return value and is
only used to force side-effects, if you don't care about
fenv then that may be defined as nop.
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.