Date: Tue, 25 Feb 2020 14:50:42 +0000
From: Pascal Cuoq <cuoq@...st-in-soft.com>
To: "libc-coord@...ts.openwall.com" <libc-coord@...ts.openwall.com>
Subject: Behavior of function nan()

Hello all,

long time lurker, first time poster :)

I hope the matter below is not too trivial, or at least that it is on-topic for the list despite being trivial.

The C standard defines a function nan (https://port70.net/~nsz/c/c11/n1570.html#7.12.11.2p1 ):

double nan(const char *tagp);

There exist at least three widespread implementations:

- Musl's implementation ignores its argument (https://git.musl-libc.org/cgit/musl/tree/src/math/nan.c?id=bf9d9dcaa631db9918452d05c188f01c8e5f537f ):

double nan(const char *s)
{
	return NAN;
}

- Glibc eventually calls a function defined in a generic way as STRTOD_NAN in this file:

https://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/strtod_nan_main.c;h=db3f05395088d1a0c8d0fe74f71bff00b823eb3d;hb=HEAD

FLOAT
STRTOD_NAN (const STRING_TYPE *str, STRING_TYPE **endptr, STRING_TYPE endc)
{
  const STRING_TYPE *cp = str;

  while ((*cp >= L_('0') && *cp <= L_('9'))
         || (*cp >= L_('A') && *cp <= L_('Z'))
         || (*cp >= L_('a') && *cp <= L_('z'))
         || *cp == L_('_'))
    ++cp;

  FLOAT retval = NAN;
  if (*cp != endc)
    goto out;

  /* This is a system-dependent way to specify the bitmask used for
     the NaN. We expect it to be a number which is put in the
     mantissa of the number. */
  STRING_TYPE *endp;
  unsigned long long int mant;

  mant = STRTOULL (str, &endp, 0);
  if (endp == cp)
    SET_NAN_PAYLOAD (retval, mant);

 out:
  if (endptr != NULL)
    *endptr = (STRING_TYPE *) cp;
  return retval;
}

- Whereas FreeBSD uses this implementation:

https://github.com/freebsd/freebsd/blob/260ba0bff18bb32b01216d6870c8273cf22246a7/lib/msun/src/s_nan.c

void
_scan_nan(uint32_t *words, int num_words, const char *s)
{
	int si;		/* index into s */
	int bitpos;	/* index into words (in bits) */

	bzero(words, num_words * sizeof(uint32_t));

	/* Allow a leading '0x'. (It's expected, but redundant.) */
	if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X'))
		s += 2;

	/* Scan forwards in the string, looking for the end of the sequence. */
	for (si = 0; isxdigit(s[si]); si++)
		;

	/* Scan backwards, filling in the bits in words[] as we go. */
#if _BYTE_ORDER == _LITTLE_ENDIAN
	for (bitpos = 0; bitpos < 32 * num_words; bitpos += 4) {
#else
	for (bitpos = 32 * num_words - 4; bitpos >= 0; bitpos -= 4) {
#endif
		if (--si < 0)
			break;
		words[bitpos / 32] |= digittoint(s[si]) << (bitpos % 32);
	}
}

At first glance, even the latter two do not seem to produce the same results for all string inputs, although they do produce the same results for simple inputs.
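To make the suspected divergence concrete, here is a simplified model of how each of the two quoted parsers reads the digits of the tag. This is only a sketch: the function names payload_base0 and payload_hex are hypothetical, neither is the libraries' actual code, and the model ignores whitespace, signs, overflow, and how the value is finally placed into the NaN. It only captures that the Glibc code calls STRTOULL with base 0 and requires the whole tag to be consumed, whereas the FreeBSD code treats every digit as hexadecimal and stops at the first non-hex character.

```c
#include <stdlib.h>

/* Simplified model of the Glibc path quoted above: strtoull() in base 0,
   so "0x..." is hexadecimal, a leading "0" is octal, anything else is
   decimal. The payload is only used if the whole tag was consumed;
   otherwise the default NaN is kept, modelled here as payload 0.
   Hypothetical helper, meant for tags made of letters and digits only. */
unsigned long long payload_base0(const char *tag)
{
    char *end;
    unsigned long long value = strtoull(tag, &end, 0);
    return *end == '\0' ? value : 0;
}

/* Simplified model of the FreeBSD path quoted above: an optional "0x"
   prefix followed by hexadecimal digits; scanning stops at the first
   character that is not a hex digit. Hypothetical helper. */
unsigned long long payload_hex(const char *tag)
{
    return strtoull(tag, NULL, 16);
}
```

Under this model, the tags "1" and "0x10" give the same payload in both, while "10" reads as 10 in the first and 16 in the second, and "ff" is rejected by the first but read as 255 by the second.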

I am working on what can be described as “a model of a libc as a necessary element for verifying C programs”, so such variations are a bit annoying for me.

It is normal for the C standard to leave some things underspecified, but the function nan() alone is not very convenient or usable, as the FreeBSD man page notes:

https://www.freebsd.org/cgi/man.cgi?query=nan&apropos=0&sektion=0&manpath=FreeBSD+12.1-RELEASE+and+Ports&arch=default&format=html

COMPATIBILITY

     Calling these functions with a non-empty string isn't portable.  Another
     operating system may translate the string into a different NaN encoding,
     and furthermore, the meaning of a given NaN encoding varies across
     machine architectures.  If you understood the innards of a particular
     platform well enough to know what string to use, then you would have no
     need for these functions anyway, so don't use them.  Use the NAN macro
     instead.

With some imagination, one can picture programs that would use the function nan() meaningfully:

double nan1 = nan("1");
double nan2 = nan("2");
int r = memcmp(&nan1, &nan2, sizeof(double));

The function nan() can also be used by programmers who have not noticed the more convenient macro NAN and have no intent of generating specific NaN representations.

Does anyone remember encountering C programs that used the nan() function in such a way that the program behaved differently across musl, Glibc, and FreeBSD?

Pascal

