Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date: Wed, 09 Jan 2013 12:02:29 +0100
From: John Spencer <maillist-musl@...fooze.de>
To: musl@...ts.openwall.com
CC: Luca Barbato <lu_zero@...too.org>, basile@...nsource.dyc.edu
Subject: NULL

glibc defines NULL as __null: a magic variable supplied by GCC and 
compatibles which always has pointer context.

musl defines NULL to 0 in C++.
this is correct per the standard, but breaks a lot of software on 64bit 
archs, because it promotes to int.

for example there are 140 variadic functions in glib/gtk, most of them 
depend on a so-called sentinel value (nullptr end marker).
examples:
http://developer.gnome.org/gobject/2.35/gobject-The-Base-Object-Type.html#g-object-new
http://developer.gnome.org/gobject/2.35/gobject-The-Base-Object-Type.html#g-object-get

any C++ program that uses those variadic functions and uses NULL as the 
sentinel, will invoke UB and consequently segfault on musl, with a very 
hard-to-debug backtrace (you cannot inspect variadic arguments in gdb 
(unless you know exactly what kind of asm gcc creates and how to read 
it), but exactly those arguments will end up being bogus).
the first such bug i encountered (in audacity) costed me several hours 
of debugging distributed over 2 days.

note that most example code involving these variadic functions is 
written in C, and happily uses NULL, as the problems surrounding NULL 
are not exactly well-known.
http://gustedt.wordpress.com/2010/11/07/dont-use-null/

so the C++ guys basically just copypaste existing C code snippets (which 
are theoretically broken, but work in any real world implementation), 
and since it works in their environment they assume the code is correct.

after having fixed audacity, i tackled the segfaults encountered in evince.
here not even evince itself is broken, but also poppler, on which it 
depends.
i've fixed the bugs in evince since, but still need to tackle the UB in 
poppler.

frankly speaking i'm not very willing to go into the trouble of 
debugging any C++ gtk application, and reporting the bugs i found.
when i discussed this with evince developers, they were asking all the 
time "why does it work for me ? why didn't it break for anyone ?"
i told them that in C++ NULL expands to 0 and it worked because of luck. 
after discussing with them for about half an hour, i got them to the 
point where they agreed to merge a patch if i come up with one.

had i told them that i use another libc which does things differently 
regarding NULL, i am not sure if i could have convinced them to change 
their code.
at this point the issue in evince is not fixed, i still need to create a 
patch based on their current git master, subscribe to their ML, send 
them the patch and hope that they haven't changed their mind.

so for me, there are 3 options how to deal with issue in the future:

1) go into the trouble of debugging all future C++ apps i will port, 
potentially spending hours to fix each of them, and then even more hours 
discussing this topic on their mailing lists. since sabotage will very 
likely only support a small subset of these programs, there will remain 
a lot of others that stay broken.
the gentoo guys or whoever is going to port them will have a *very* hard 
time figuring out why it crashes for them on musl.
this approach is in my eyes insane, as it will need a huge investment of 
time and nerves, and the number of broken C++ apps is theoretically 
infinite.

(as a side note: when i googled for g_signal_emit, i found multiple bug 
reports to multiple projects, of which the backtraces showed exactly the 
kind of varargs UB that i encountered.
all of them were not fixed, but merely worked around. apparently the 
developers of these projects didnt find the real cause of their bugs.
this is not unlikely, as it is very hard to find out whats going on.
https://bugs.launchpad.net/compiz/+bug/932382
https://trac.transmissionbt.com/ticket/1135 ... )


2) change musl so it is compatible with those apps. this would mean:
#if defined(__GNUC__) && defined(__cplusplus__)
#define NULL __null
#elif defined (__cplusplus__)
#define NULL 0
#else
#define NULL (void *) 0 /* for C code */
#end
this change is the easiest solution: any problem will be magically fixed.

3) create some kind of code analysis tool that will scan C++ code for 
usage of NULL that emits warnings for each occurence of NULL used in a 
variadic function call. the results could then be used to created 
automated patches and used in distros based on musl and sent to the 
corresponding package maintainers.
i suspect it is possible to convince the authors of the "cppcheck" 
program to add such a check, as i alreaded contacted them in the past 
and they seemed very cooperative.
this could be tackled as a side-project even if we decide for option 2.

i'm welcoming any comments on the issue. i'm especially interested what 
the gentoo developers think.

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.