Date: Mon, 5 Sep 2022 15:08:59 +0200
From: Szabolcs Nagy <>
To: Paul Zimmermann <>
Cc: Rich Felker <>,
Subject: Re: Re: integration of CORE-MATH routines into Musl?

* Paul Zimmermann <> [2022-09-05 11:09:04 +0200]:
>        Dear Rich,
> > Could you summarize briefly what you have in mind, and what tradeoffs
> > might be? Are these intended to be drop-in replacements for the
> > existing standard functions, or implementations for the "cr" versions
> > thereof? I have not followed closely the "mandatory requirement of
> > correct rounding for mathematical functions in the next revision of
> > the IEEE-754 standard" topic and how it relates to the future of C,
> > but my vague recollection was that the direction folks were leaning
> > was towards a separate set of cr*() functions or something.
> the current situation is:
> - IEEE 754 does not require correct rounding for mathematical functions
> - indeed, the C2X standard will reserve cr_xxx names for correctly rounded
>   functions
> - thus mathematical libraries will have essentially 3 choices:
> 0) either provide incorrectly rounded functions as (say) exp.
>    This is the current situation.
> 1) provide incorrectly rounded functions as exp, and correctly rounded
>    functions as cr_exp.
> 2) or provide only exp, with correct rounding (then cr_exp could be an alias
>    for exp)
> It seems that LLVM-libc will go for 2), I have no news from other libraries.
> > But if
> > it's possible to do correct rounding in a way that's all-wins
> > (performance, size, quality of results) or nearly all wins (maybe
> > slightly larger?), at least for select functions, that seems very
> > interesting.
> If you look at Table II from, you see that
> for *single* precision functions (binary32), indeed that's all-wins.

when i worked on exp and log i noticed that for single prec it is
easy to do correct rounding with only minor overhead, but it required
either a bit bigger lookup table or a bit bigger polynomial vs going
for < 1 ulp error only.
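
to make the "< 1 ulp" vs correctly rounded distinction concrete, here is a minimal sketch (not musl code) that measures the error of a binary32 result in ulps against a more precise reference. ulp_of and ulp_error are hypothetical helper names, and the sketch assumes a normal, finite result and measures the ulp in the binade of the returned value.

```c
#include <stdint.h>
#include <string.h>

/* size of one ulp of a normal binary32 value, returned as a double.
   assumption: x is normal (biased exponent 1..254), not zero/subnormal/inf/nan */
double ulp_of(float x)
{
	uint32_t u;
	uint64_t d;
	double r;
	int e;

	memcpy(&u, &x, sizeof u);
	e = (int)(u >> 23 & 0xff);                    /* biased binary32 exponent */
	d = (uint64_t)(e - 127 - 23 + 1023) << 52;    /* bits of 2^(e-127-23) as a double */
	memcpy(&r, &d, sizeof r);
	return r;
}

/* error of the binary32 result `got`, in ulps, relative to the reference
   `want`. assumption: `want` is much more accurate than binary32 */
double ulp_error(float got, double want)
{
	double diff = (double)got - want;
	if (diff < 0)
		diff = -diff;
	return diff / ulp_of(got);
}
```

for a reference value of 1 + 2^-25, the correctly rounded result 1.0f has an error of 0.25 ulp, while its neighbour 1 + 2^-23 has an error of 0.75 ulp: it still meets a "< 1 ulp" target but is not correctly rounded (that needs <= 0.5 ulp).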

a slightly bigger lookup table will not be measurable in a micro
benchmark, but clearly will use more resources. (also measuring
against the system library is misleading: your code is statically linked
while the libc is dynamically linked, built with -fPIC, and does not have
the same error handling logic, the same cflags for security hardening or
whatever. such differences can be significant when the entire math
function is about 10 cycles.)

so do we want correct rounding for a little bit of overhead?

my idea was that most users of binary32 care more about performance
than precision (if they want precision they can use binary64 with only
a small overhead on most cpus). so not doing cr was a deliberate choice.

my answer changes if users start to expect cr versions of math functions
in the libc: 2x code (a cr and a non-cr variant of each function) is hard
to justify for minor benefits, so then we should only have cr versions
for binary32.

there are libm functions where the current code is low quality and
a fast cr implementation is strictly better, those are welcome i think.

the binary64 (and binary128) story is different (there it's more work
to get cr_ functions, in some cases like pow i don't think there is a
low latency solution as the worst case is not known). for functions
where the worst-case is known we can provide cr, if the overhead is not
too big (i think 10% overhead is acceptable in the common case and 10x
overhead for the rare half-way cases, but only if the code/rodata size
does not explode).
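
the "cheap common case, expensive rare case" structure can be sketched as a rounding test followed by a fallback. this is only an illustration under stated assumptions, not an existing musl or CORE-MATH interface: round_is_safe, eval_cr, fast_fn and slow_fn are hypothetical names, the target is binary32 with a double intermediate to keep the example short (a binary64 target would need a wider intermediate such as double-double), and err is assumed to bound the total error of y with some margin for the rounding of y - err and y + err.

```c
#include <stdbool.h>

/* rounding test: y approximates the exact result with |y - exact| <= err.
   rounding is monotonic, so if both ends of [y - err, y + err] round to
   the same float, the exact result rounds to that float as well */
bool round_is_safe(double y, double err, float *out)
{
	float lo = (float)(y - err);
	float hi = (float)(y + err);
	*out = lo;
	return lo == hi;
}

/* hypothetical callbacks: fast returns an approximation and stores its
   error bound in *err, slow is an accurate but expensive fallback */
typedef double (*fast_fn)(float x, double *err);
typedef float (*slow_fn)(float x);

float eval_cr(float x, fast_fn fast, slow_fn slow)
{
	double err;
	double y = fast(x, &err);
	float r;

	if (round_is_safe(y, err, &r))
		return r;      /* common case: small overhead over the plain fast path */
	return slow(x);        /* rare case: result too close to a rounding boundary */
}
```

with y = 1 + 2^-30 and err = 2^-40 the test passes and returns 1.0f. with y = 1 + 2^-24 (exactly half-way between 1.0f and the next float) it fails and the slow path is taken. the slow path can only be bounded in cost when the worst case for the function is known.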
