Date: Mon, 5 Sep 2022 15:08:59 +0200
From: Szabolcs Nagy <nsz@...t70.net>
To: Paul Zimmermann <Paul.Zimmermann@...ia.fr>
Cc: Rich Felker <dalias@...c.org>, musl@...ts.openwall.com
Subject: Re: Re: integration of CORE-MATH routines into Musl?

* Paul Zimmermann <Paul.Zimmermann@...ia.fr> [2022-09-05 11:09:04 +0200]:
> Dear Rich,
>
> > Could you summarize briefly what you have in mind, and what tradeoffs
> > might be? Are these intended to be drop-in replacements for the
> > existing standard functions, or implementations for the "cr" versions
> > thereof? I have not followed closely the "mandatory requirement of
> > correct rounding for mathematical functions in the next revision of
> > the IEEE-754 standard" topic and how it relates to the future of C,
> > but my vague recollection was that the direction folks were leaning
> > was towards a separate set of cr*() functions or something.
>
> the current situation is:
>
> - IEEE 754 does not require correct rounding for mathematical functions
>
> - indeed, the C2X standard will reserve cr_xxx names for correctly rounded
>   functions
>
> - thus mathematical libraries will have essentially 3 choices:
>
>   0) either provide incorrectly rounded functions as (say) exp.
>      This is the current situation.
>
>   1) provide incorrectly rounded functions as exp, and correctly rounded
>      functions as cr_exp.
>
>   2) or provide only exp, with correct rounding (then cr_exp could be an alias
>      for exp)
>
> It seems that LLVM-libc will go for 2), I have no news from other libraries.
>
> > But if
> > it's possible to do correct rounding in a way that's all-wins
> > (performance, size, quality of results) or nearly all wins (maybe
> > slightly larger?), at least for select functions, that seems very
> > interesting.
>
> If you look at Table II from https://hal.inria.fr/hal-03721525, you see that
> for *single* precision functions (binary32), indeed that's all-wins.

when i worked on exp and log i noticed that for single prec it is
easy to do correct rounding with only minor overhead, but it
required either a slightly bigger lookup table or a slightly bigger
polynomial compared to targeting only < 1 ulp error.

a slightly bigger lookup table will not be measurable in a
microbenchmark, but it clearly uses more resources.

(also, measuring against the system library is misleading: your code
is statically linked while the libc is dynamically linked, built
with -fPIC, and does not have the same error handling logic or the
same cflags for security hardening or whatever. such differences can
be significant when the entire math function is about 10 cycles.)

so do we want correct rounding for a little bit of overhead?

my idea was that most users of binary32 care more about performance
than precision (if they want precision they can use binary64 with
only a small overhead on most cpus), so not doing cr was a
deliberate choice.

my answer changes if users start to expect cr versions of math
functions in the libc: 2x code is hard to justify for minor
benefits, so then we should only have cr versions for binary32.

there are libm functions where the current code is low quality and
a fast cr implementation is strictly better, those are welcome i
think.

the binary64 (and binary128) story is different: there it's more
work to get cr_ functions, and in some cases like pow i don't think
there is a low latency solution as the worst case is not known.

for functions where the worst case is known we can provide cr, if
the overhead is not too big (i think 10% overhead is acceptable in
the common case and 10x overhead for the rare half-way cases, but
only if the code/rodata size does not explode).