Date: Tue, 25 Feb 2020 12:27:07 -0800
From: enh <enh@...gle.com>
To: libc-coord@...ts.openwall.com
Subject: Re: Behavior of function nan()

Android and iOS use the FreeBSD code.

looking at Android's tests, they all supply "" to nan(3), which
suggests we've never encountered any issues in practice (or we'd have
added some tests). we do have tests that we can parse arbitrary NaNs
with the scanf family and strto[fd]. that was the result of a bug
report, but the bug report was that we historically couldn't parse
*any* NaNs. interestingly, i note that even the new code seems to just
ignore everything between the optional parentheses after "nan", rather
than passing it to nan(3). since our printf(3) never outputs any
parentheses, that seems legal, if weird.

btw, note that NAN is equivalent to nanf(3), not nan(3).

On Tue, Feb 25, 2020 at 6:50 AM Pascal Cuoq <cuoq@...st-in-soft.com> wrote:
>
> Hello all,
>
> long time lurker, first time poster :)
>
> I hope the matter below is on-topic for the list despite being fairly trivial.
>
> The C standard defines a function nan (https://port70.net/~nsz/c/c11/n1570.html#7.12.11.2p1 ):
>
> double nan(const char *tagp);
>
> There exist at least three widespread implementations:
>
> - Musl's implementation ignores its argument (https://git.musl-libc.org/cgit/musl/tree/src/math/nan.c?id=bf9d9dcaa631db9918452d05c188f01c8e5f537f ):
>
> double nan(const char *s)
> {
>     return NAN;
> }
>
> - Glibc eventually calls a function defined in a generic way as STRTOD_NAN in this file:
>
> https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/strtod_nan_main.c;h=db3f05395088d1a0c8d0fe74f71bff00b823eb3d;hb=HEAD
>
> FLOAT
> STRTOD_NAN (const STRING_TYPE *str, STRING_TYPE **endptr, STRING_TYPE endc)
> {
>   const STRING_TYPE *cp = str;
>
>   while ((*cp >= L_('0') && *cp <= L_('9'))
>          || (*cp >= L_('A') && *cp <= L_('Z'))
>          || (*cp >= L_('a') && *cp <= L_('z'))
>          || *cp == L_('_'))
>     ++cp;
>
>   FLOAT retval = NAN;
>   if (*cp != endc)
>     goto out;
>
>   /* This is a system-dependent way to specify the bitmask used for
>      the NaN. We expect it to be a number which is put in the
>      mantissa of the number. */
>   STRING_TYPE *endp;
>   unsigned long long int mant;
>
>   mant = STRTOULL (str, &endp, 0);
>   if (endp == cp)
>     SET_NAN_PAYLOAD (retval, mant);
>
>  out:
>   if (endptr != NULL)
>     *endptr = (STRING_TYPE *) cp;
>   return retval;
> }
>
> - Whereas FreeBSD uses this implementation:
>
> https://github.com/freebsd/freebsd/blob/260ba0bff18bb32b01216d6870c8273cf22246a7/lib/msun/src/s_nan.c
>
> void
> _scan_nan(uint32_t *words, int num_words, const char *s)
> {
>     int si;     /* index into s */
>     int bitpos; /* index into words (in bits) */
>
>     bzero(words, num_words * sizeof(uint32_t));
>
>     /* Allow a leading '0x'. (It's expected, but redundant.) */
>     if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
>         s += 2;
>
>     /* Scan forwards in the string, looking for the end of the sequence. */
>     for (si = 0; isxdigit(s[si]); si++)
>         ;
>
>     /* Scan backwards, filling in the bits in words[] as we go. */
> #if _BYTE_ORDER == _LITTLE_ENDIAN
>     for (bitpos = 0; bitpos < 32 * num_words; bitpos += 4) {
> #else
>     for (bitpos = 32 * num_words - 4; bitpos >= 0; bitpos -= 4) {
> #endif
>         if (--si < 0)
>             break;
>         words[bitpos / 32] |= digittoint(s[si]) << (bitpos % 32);
>     }
> }
>
> Even the latter two do not seem, at a cursory glance, to produce the same results for all string inputs, although they do produce the same results for simple inputs.
>
> I work on what can be described as “a model of a libc as a necessary element for verifying C programs”, so such variations are a bit annoying for me.
>
> It is normal for the C standard to leave some things underspecified, but the function nan() alone is not very convenient or usable, as the FreeBSD man page notes:
>
> https://www.freebsd.org/cgi/man.cgi?query=nan&apropos=0&sektion=0&manpath=FreeBSD+12.1-RELEASE+and+Ports&arch=default&format=html
>
> COMPATIBILITY
>
>      Calling these functions with a non-empty string isn't portable.  Another
>      operating system may translate the string into a different NaN encoding,
>      and furthermore, the meaning of a given NaN encoding varies across
>      machine architectures.  If you understood the innards of a particular
>      platform well enough to know what string to use, then you would have no
>      need for these functions anyway, so don't use them.  Use the NAN macro
>      instead.
>
> With some imagination, one can conceive of programs that would use the function nan() meaningfully:
>
> double nan1 = nan("1");
> double nan2 = nan("2");
> int r = memcmp(&nan1, &nan2, sizeof(double));
>
> The function nan() can also be used by programmers that failed to notice the more convenient macro NAN, with no intent of generating specific NaN representations.
>
> Does anyone remember encountering C programs that made use of the nan() function in such a way that the program behaved differently between musl, Glibc or FreeBSD?
>
> Pascal
>
