![]() |
|
Message-ID: <20190202033929.GK21289@port70.net> Date: Sat, 2 Feb 2019 04:39:29 +0100 From: Szabolcs Nagy <nsz@...t70.net> To: musl@...ts.openwall.com Subject: Re: Possible Mistype in exp.c * Morten Welinder <mwelinder@...il.com> [2019-02-01 18:26:59 -0500]: > Is this code guaranteed to produced correctly-rounded results (like crlibm)? > > I ask because having different machines produce different results is a > no-go for my use case. The preferred way of solving that is > correctly-rounded results, but overriding libc's exp with a specific, > reasonable implementation isn't out of the question. libc implementations normally do not guarantee portable behaviour. correct rounding is not practical for the libc. (i think it can have similar average performance to an almost-cr implementation, but the worst case will be 100x slower and it will require many big tables. glibc used to be an exception: it had several double precision functions impemented with correct rounding in nearest rounding mode, but they were hard to maintain and nobody liked the >10000x latency spikes, so it was decided long ago that cr should not be the goal of glibc. meanwhile the ts 18661-4 spec introduced separate cr prefixed names for correctly rounded math functions so if glibc ever gets cr implementations again they will use different names.) my new implementations are generic c code, and it's possible to get portable results with the fp semantics of c across targets which implement annex F with FLT_EVAL_METHOD==0: you have to turn fp contraction off, disable builtins and other compiler smartness (musl already does these) and remove any target specific ifdefs or asm that may introduce non-portable results (in the new code only the __FP_FAST_FMA checks). that said, in practice there may be many reasons why a libc does not provide completely portable behaviour: - long double cannot be made portable since the format is different across targets. - there are FLT_EVAL_METHOD==2 targets (musl supports i386 and m68k) which won't give the same results as ieee754 targets (without a correctly-rounding implementation). - musl does not support other non-ieee754 targets, but they exist (and sometimes quite popular e.g. the flush-to-zero fpu mode). supporting such targets may become interesting at some point but portable results cannot be guaranteed. - a target may have its own asm implementation if it has special instructions (currently only i386 has asm that may give different results). - nan results may be different across targets since they are not fully specified by ieee754. there are other target variations such as when underflow is raised. so even on ieee754 targets, exactly matching behaviour is not always possible. - fenv status may not be the same across targets (or compilers) because musl (and c99) does not care about inexact in many cases (so e.g. const folding of (int)1.2 may change it) and because some compilers don't support fenv access (e.g. clang, although the new code tries to fix that up as much as possible). - compiler can introduce different results in user code anyway: libc only controls the libc cflags, not what's used for user code and e.g. gcc can const-fold math calls with correct rounding if it knows the inputs. - not all targets have fma instruction, but fma usage can give significantly better performance on some targets and it's not nice to make things slower because of other targets. emulating fma is not realistic on non-fma targets (slow), but all modern cpus that are used for heavy numeric computation have fma so within those targets the results can be made portable without compromising prefromance. (it requires some changes to the new code not to rely on contraction for optimal fma usage which is non-deterministic across compilers) - int rounding instructions can also make a significant difference in performance or quality if non-nearest rounding modes need to be supported (e.g. i currently don't have a satisfactory solution for >= double precision trigonometric argument reduction on targets that have no rounding mode independent nearest toint instruction). the fdlibm code is broken in non-nearest rounding modes, the fix i have is expensive and breaks useful properties, such fix is not necessary on some targets if we allow different results. - i don't know other instructions that make a big difference across targets, but new ones may come up (e.g. 1/x and 1/sqrt estimate instructions may be useful in some algorithms, or target specific cutoff values depending on faster int vs fp cmp) and then someone may want to provide target specific optimizations and the libc has to evaluate the maintenance cost vs benefits. > Also, what is the state of tests for the code? there are no universal tests for math functions, i still maintain detailed special case checks in libc-test, i ran statistical tests that verify that the calculated ulp error bounds hold over many millions of inputs and the glibc math tests pass (with and without fma). there are a few regressions (e.g. exp2(-1023) raises underflow, pow(4,1.5) raises inexact exception with the new code) and several improvements (mostly precision and non-nearest rounding behaviour).
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.