musl - Re: Possible Mistype in exp.c

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190202033929.GK21289@port70.net>
Date: Sat, 2 Feb 2019 04:39:29 +0100
From: Szabolcs Nagy <nsz@...t70.net>
To: musl@...ts.openwall.com
Subject: Re: Possible Mistype in exp.c

* Morten Welinder <mwelinder@...il.com> [2019-02-01 18:26:59 -0500]:
> Is this code guaranteed to produced correctly-rounded results (like crlibm)?
> 
> I ask because having different machines produce different results is a
> no-go for my use case.  The preferred way of solving that is
> correctly-rounded results, but overriding libc's exp with a specific,
> reasonable implementation isn't out of the question.

libc implementations normally do not guarantee portable behaviour.

correct rounding is not practical for the libc. (i think it can have
similar average performance to an almost-cr implementation, but the
worst case will be 100x slower and it will require many big tables.
glibc used to be an exception: it had several double precision
functions impemented with correct rounding in nearest rounding mode,
but they were hard to maintain and nobody liked the >10000x latency
spikes, so it was decided long ago that cr should not be the goal
of glibc. meanwhile the ts 18661-4 spec introduced separate cr
prefixed names for correctly rounded math functions so if glibc
ever gets cr implementations again they will use different names.)

my new implementations are generic c code, and it's possible to get
portable results with the fp semantics of c across targets which
implement annex F with FLT_EVAL_METHOD==0: you have to turn fp
contraction off, disable builtins and other compiler smartness
(musl already does these) and remove any target specific ifdefs or
asm that may introduce non-portable results (in the new code only the
__FP_FAST_FMA checks).

that said, in practice there may be many reasons why a libc does not
provide completely portable behaviour:

- long double cannot be made portable since the format is different
  across targets.

- there are FLT_EVAL_METHOD==2 targets (musl supports i386 and m68k)
  which won't give the same results as ieee754 targets (without a
  correctly-rounding implementation).

- musl does not support other non-ieee754 targets, but they exist
  (and sometimes quite popular e.g. the flush-to-zero fpu mode).
  supporting such targets may become interesting at some point but
  portable results cannot be guaranteed.

- a target may have its own asm implementation if it has special
  instructions (currently only i386 has asm that may give different
  results).

- nan results may be different across targets since they are not
  fully specified by ieee754. there are other target variations such
  as when underflow is raised. so even on ieee754 targets, exactly
  matching behaviour is not always possible.

- fenv status may not be the same across targets (or compilers)
  because musl (and c99) does not care about inexact in many cases
  (so e.g. const folding of (int)1.2 may change it) and because
  some compilers don't support fenv access (e.g. clang, although
  the new code tries to fix that up as much as possible).

- compiler can introduce different results in user code anyway: libc
  only controls the libc cflags, not what's used for user code and
  e.g. gcc can const-fold math calls with correct rounding if it
  knows the inputs.

- not all targets have fma instruction, but fma usage can give
  significantly better performance on some targets and it's not nice
  to make things slower because of other targets. emulating fma is
  not realistic on non-fma targets (slow), but all modern cpus that
  are used for heavy numeric computation have fma so within those
  targets the results can be made portable without compromising
  prefromance. (it requires some changes to the new code not to rely
  on contraction for optimal fma usage which is non-deterministic
  across compilers)

- int rounding instructions can also make a significant difference
  in performance or quality if non-nearest rounding modes need to be
  supported (e.g. i currently don't have a satisfactory solution for
  >= double precision trigonometric argument reduction on targets
  that have no rounding mode independent nearest toint instruction).
  the fdlibm code is broken in non-nearest rounding modes, the fix i
  have is expensive and breaks useful properties, such fix is not
  necessary on some targets if we allow different results.

- i don't know other instructions that make a big difference across
  targets, but new ones may come up (e.g. 1/x and 1/sqrt estimate
  instructions may be useful in some algorithms, or target specific
  cutoff values depending on faster int vs fp cmp) and then someone
  may want to provide target specific optimizations and the libc
  has to evaluate the maintenance cost vs benefits.

> Also, what is the state of tests for the code?

there are no universal tests for math functions, i still maintain
detailed special case checks in libc-test, i ran statistical tests
that verify that the calculated ulp error bounds hold over many
millions of inputs and the glibc math tests pass (with and without
fma). there are a few regressions (e.g. exp2(-1023) raises underflow,
pow(4,1.5) raises inexact exception with the new code) and several
improvements (mostly precision and non-nearest rounding behaviour).
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.