Date: Wed, 09 Jan 2013 12:02:29 +0100 From: John Spencer <maillist-musl@...fooze.de> To: musl@...ts.openwall.com CC: Luca Barbato <lu_zero@...too.org>, basile@...nsource.dyc.edu Subject: NULL glibc defines NULL as __null: a magic variable supplied by GCC and compatibles which always has pointer context. musl defines NULL to 0 in C++. this is correct per the standard, but breaks a lot of software on 64bit archs, because it promotes to int. for example there are 140 variadic functions in glib/gtk, most of them depend on a so-called sentinel value (nullptr end marker). examples: http://developer.gnome.org/gobject/2.35/gobject-The-Base-Object-Type.html#g-object-new http://developer.gnome.org/gobject/2.35/gobject-The-Base-Object-Type.html#g-object-get any C++ program that uses those variadic functions and uses NULL as the sentinel, will invoke UB and consequently segfault on musl, with a very hard-to-debug backtrace (you cannot inspect variadic arguments in gdb (unless you know exactly what kind of asm gcc creates and how to read it), but exactly those arguments will end up being bogus). the first such bug i encountered (in audacity) costed me several hours of debugging distributed over 2 days. note that most example code involving these variadic functions is written in C, and happily uses NULL, as the problems surrounding NULL are not exactly well-known. http://gustedt.wordpress.com/2010/11/07/dont-use-null/ so the C++ guys basically just copypaste existing C code snippets (which are theoretically broken, but work in any real world implementation), and since it works in their environment they assume the code is correct. after having fixed audacity, i tackled the segfaults encountered in evince. here not even evince itself is broken, but also poppler, on which it depends. i've fixed the bugs in evince since, but still need to tackle the UB in poppler. frankly speaking i'm not very willing to go into the trouble of debugging any C++ gtk application, and reporting the bugs i found. when i discussed this with evince developers, they were asking all the time "why does it work for me ? why didn't it break for anyone ?" i told them that in C++ NULL expands to 0 and it worked because of luck. after discussing with them for about half an hour, i got them to the point where they agreed to merge a patch if i come up with one. had i told them that i use another libc which does things differently regarding NULL, i am not sure if i could have convinced them to change their code. at this point the issue in evince is not fixed, i still need to create a patch based on their current git master, subscribe to their ML, send them the patch and hope that they haven't changed their mind. so for me, there are 3 options how to deal with issue in the future: 1) go into the trouble of debugging all future C++ apps i will port, potentially spending hours to fix each of them, and then even more hours discussing this topic on their mailing lists. since sabotage will very likely only support a small subset of these programs, there will remain a lot of others that stay broken. the gentoo guys or whoever is going to port them will have a *very* hard time figuring out why it crashes for them on musl. this approach is in my eyes insane, as it will need a huge investment of time and nerves, and the number of broken C++ apps is theoretically infinite. (as a side note: when i googled for g_signal_emit, i found multiple bug reports to multiple projects, of which the backtraces showed exactly the kind of varargs UB that i encountered. all of them were not fixed, but merely worked around. apparently the developers of these projects didnt find the real cause of their bugs. this is not unlikely, as it is very hard to find out whats going on. https://bugs.launchpad.net/compiz/+bug/932382 https://trac.transmissionbt.com/ticket/1135 ... ) 2) change musl so it is compatible with those apps. this would mean: #if defined(__GNUC__) && defined(__cplusplus__) #define NULL __null #elif defined (__cplusplus__) #define NULL 0 #else #define NULL (void *) 0 /* for C code */ #end this change is the easiest solution: any problem will be magically fixed. 3) create some kind of code analysis tool that will scan C++ code for usage of NULL that emits warnings for each occurence of NULL used in a variadic function call. the results could then be used to created automated patches and used in distros based on musl and sent to the corresponding package maintainers. i suspect it is possible to convince the authors of the "cppcheck" program to add such a check, as i alreaded contacted them in the past and they seemed very cooperative. this could be tackled as a side-project even if we decide for option 2. i'm welcoming any comments on the issue. i'm especially interested what the gentoo developers think.
Powered by blists - more mailing lists
Powered by Openwall GNU/*/Linux - Powered by OpenVZ