```Date: Wed, 30 Jan 2019 13:19:47 +0100
From: Szabolcs Nagy <nsz@...t70.net>
To: musl@...ts.openwall.com
Subject: Re: Possible Mistype in exp.c

* Damian McGuckin <damianm@....com.au> [2019-01-30 22:14:01 +1100]:
> On Wed, 30 Jan 2019, Szabolcs Nagy wrote:
> > there is not much published about the algorithms (although there aren't
> > many new ideas in the code,
>
> Sadly, a lot of the existing ideas are not documented well. Even the texts
> on this topic seem to be lacking all the information to actually implement a
> fast low-level routine. They cover fundamentals well and important topics
> like error analysis but omit the basic techniques and little details and
> basic techniques that are needed to make it fast and still achieve that
> 1*ULP (or better) error bound.

i found "Jean-Michel Muller - Elementary Functions" useful for the math.

for < 1 ulp the main trick is computing the result as x+y where x is
exact and y is inexact but much smaller than x in magnitude. (this is
the only way to get worst case error < 1 ulp with an fp operation that
does a rounding and at least one of the inputs is inexact.)

the bigger the distance between x and y in magnitude, the closer you
get to 0.5 ulp (rounding error of the final add), and to get exact x
you sometimes need exact add and mul algorithms.

> > it's mostly about optimizing for modern cpus), but i plan to at least
> > put the polynomial and lookup table generation code in that repo.
>
> Thanks. For some reason, I have this aversion to table lookup because
> suddenly there are lots more numbers hanging about. But, it certainly
> helps to achieve optimal performance. I had a look at the GLIBC exp.c
> which comes out of the Ultimate Libray done by IBM and I find that
> inpossible to read/comprehend especially with its table lookups.

table lookup is needed

1) to make the approximation efficient:

the main reason why fdlibm is slow is that it prefers using div or
large polynomials to table lookups. (the other reason is that it
uses a long chain of dependency between operations, while modern
cpus can evaluate many operations in parallel if there are no
dependencies between them)

2) and to get close to 0.5 ulp errors:

usually there is a polynomial evaluation like c0 + c1*x + ... where
c1*x must be much smaller than c0 to get close to 0.5 ulp error,
which requires reducing x into a very small range around 0, which
usually requires a table lookup (otherwise several terms in the
polynomial would need to be evaluated with multi-precision tricks
which is a lot of operations).

> > the polynomial coefficients were generated with the sollya tool from
> > inria.
>
> Yes, nice tool.  What range reduction is used for those routines and why are
> there several polynomials?  Just curious.

the comment in the code says

/* exp(x) = 2^(k/N) * exp(r), with exp(r) in [2^(-1/2N),2^(1/2N)].  */
/* x = ln2/N*k + r, with int k and r in [-ln2/2N, ln2/2N].  */

so k = toint(N/ln2*x); r = x - ln2/N*k; to compute 2^(k/N) a lookup
table is used (N is the table size), and exp(r) is a polynomial.

the different polynomials were computed for different pecision and table
size requirements and because some targets lack efficient toint in
non-nearest rounding mode so the reduction interval is larger.
in the end the same polynomial is used in glibc, bionic, newlib and
musl (where the algorithms were upstreamed) and the polynomial
error is small enough in the wider intervall so the non-nearest toint
never gives more than 1 ulp error in non-nearest rounding mode.
```

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.