musl stdio review All work presented in here was based off the last effective stdio related commit b64539ae06aa91a407359238f4e909adb9bfab3d Unless otherwise stated, the approach taken in the review was to view each stdio related file in alphabeticalized order and setup test harnesses for each with no pre-conditions or invariants until a unit made explicit reference to it, for instance if foo.c had function foo taking pointer to FILE*, and did not check for FILE* to be NULL, but references to foo did, then those were noted and the harness was updated with that invariant. This approach, while tedious ensures that all references maintain the same invariants throughout. The review results are layed out in a tree like structure where the first most column line is the file and each indentation level within it is a function within that translation unit proceeded with a status that is right justified on the 80th column, of either [nothing], indicating that nothing of merit was discovered, [question] for something I'm unsure about, [style] for tricky statements or expressions that made it difficult to understand what was happening, [bug] for things that I precieved as potential bugs, [comment] for things that should be commented and [optimize] for things that can be optimized. Similarly, if a function has multiple tags like [bug] and [comment] then sections pertaining to each are in separate sections delimited by a new line. Functions that are aliases, or wrappers for existing functions are treated together as two cases with an implied fallthrough much like cases in a switch statement. There is two parts to this review. The initial stdio infastructure that is in src/stdio which encompasses the first part of the review and which is the longest. Then there is the supplemental part for the undocumented named "scan helper" interface and stdio implementation details which can be found in src/internal. ======================================================================== src/stdio ======================================================================== __fclose_ca.c: __fclose_ca: [nothing] __fdopen.c: __fdopen: [nothing] fdopen: [nothing] __fmodeflags.c: __fmodeflags: [nothing] __fopen_rb_ca.c: __fopen_rb_ca: [question] Is it possible for this function to be called with a length less than UNGET? [comment] I was unsure what the ca meant until finding a reference in a deeply nested internal header that it was "caller allocated", maybe emphasis that in the source file to make review easier. __lockfile.c: __lockfile: [nothing] __unlockfile: [nothing] __overflow.c: __overflow: [question] The interface contract of this function is it takes int and returns int, yet the passed in character is truncated through narrowing conversion to unsigned char for the purposes of stdio, should that same narrowed conversion be returned or should the unnarrowed result be returned? __stdio_close.c: __dummy: [nothing] __aio_close: [nothing] __stdio_close: [nothing] __stdio_exit.c: close_file: [question] Function acquires lock on FILE but never unlocks it, this looks intended since we're exiting stdio? If that's the case this should probably be commented. __stdio_exit: [question] __stdio_exit_needed: [question] [comment] Function acquires lock via __ofl_lock but never calls __ofl_unlock, this looks intended since we're exiting stdio? If that's the case this should probably be commented. __stdio_read.c: __stdio_read: [style] Tricky expression involving F_EOF/F_ERR and xor masks. __stdio_seek.c: __stdio_seek: [nothing] __stdio_write.c: __stdio_write: [question] Is it possible for len to be less than f->wpos-f->wbase? __stdout_write.c: __stdout_write: [question] Is it possible for len to be less than f->wpos-f->wbase? __string_read.c: __string_read: [nothing] __toread.c: __toread: [comment] It's sister function __towrite comments its' intent, maybe this should as well for consistency. __towrite.c: __towrite: [style] What is the purpose of parenthesis around F_NOWR? does it imply a missing flag in the list, looks suspicious. __uflow.c: __uflow: [nothing] asprintf.c: asprintf: [nothing] clearerr.c: clearerr: [nothing] clearerr_unlocked: [question] An alias for clearerr but clearerr locks, is this correct? or should clearerr_unlocked just remove EOF/ERR flags, and clearerr hold a lock calling clearerr_unlocked? dprintf.c: dprintf: [nothing] ext.c: __flushlbf: [nothing] __fsetlocking: [nothing] __fwriting: [nothing] __freading: [nothing] __freadable: [nothing] __fwritable: [nothing] __flbf: [nothing] __fbufsize: [nothing] __fpending: [nothing] __fpurge: [bug] fpurge: [bug] The interface contract for this function is that it returns void and not int, whereas fpurge returns int, the latter does not even appear to be part of Linux yet musl aliases them to the same. Should musl's return void? ext2.c: __freadahead: [nothing] __freadptr: [nothing] __freadptrinc: [nothing] __fseterr: [nothing] fclose.c: fclose: [style] Checking for null pointer before calling free for f->getln_buf feof.c: [style] There's two instances of feof involved, one provided by stdio_impl that just checks flags without a lock held and this one which means this file has to #undef. That just seems tricky to me, you could see an feof somewhere that is using the macro definition where others could be using a function (depending on includes and include order.) feof: [nothing] feof_unlocked: [question] _IO_feof_unlocked: [question] Similar to clearerr_unlocked, is this correct? The unlocked variants are actually locked. ferror.c: [style] There's two instances of ferror involved, one provided by stdio_impl that just checks flags without a lock held and this one which means this file has to #undef. That just seems tricky to me, you could see an ferror somewhere that is using the macro definition where others could be using a function (depending on includes and include oder.) ferror: [nothing] ferror_unlocked: [question] _IO_ferror_unlocked: [question] Similar to clearerr_unlocked and feof_unlocked, is this correct? The unlocked variants are actually locked. fflush.c: fflush: [nothing] fgetc.c: fgetc: [nothing] fgetln.c: fgetln: [nothing] fgetpos.c: fgetpos: [bug] using *(off_t*) to write _Int64 data into fpos_t structure when memcpy should be used, this breaks strict aliasing. Maybe add an off_t to the fpos_t union? [comment] POSIX states this function shall fail if the data cannot be stored in the fpos_t structure but the structure is 16-byte in size and this is a narrowing conversion. Maybe just comment that it can never fail. This came up in the test harness several times. fgets.c: fgets: [nothing] fgetwc.c: __fgetwc_unlocked_internal: [question] I don't understand the conditional if (l+1 >= 1), is that not the same as just if (l) since the only time when l+1 >=1 is if l==0, even for overflow. In which case the l += !l part in the body is even more concerning because !l will always be false? __fgetwc_unlocked: [nothing] fgetwc: [nothing] fgetwc_unlocked: [nothing] getwc_unlocked: [nothing] fgetws.c: fgetws: [nothing] __fgetws_unlocked: [question] Similar to clearerr_unlocked, feof_unlocked and ferror_unlocked is this correct? The unlocked variants are actually locked. fileno.c: fileno: [nothing] fileno_unlocked: [question] Similar to clearerr_unlocked, feof_unlocked, ferror_unlocked and __fetws_unlocked is this correct? The unlocked variants are actually locked. flockfile.c: flockfile: [nothing] fmemopen.c: mseek: [style] It does goto upwards. Compound literal table to reference whence as a lookup table as a single expression. The function would be less tricky if just rewritten. mread: [nothing] mwrite: [nothing] fmemopen: [bug] The size check for fmemopen cookie to report ENOMEM is incorrect with the calloc below. [style] The cookie allocation handling can be greatly simplified with a nother structure that embeds everything and using the sizeof that structure instead. fopen.c: fopen: [nothing] fopencookie.c: fopencookie: [bug] Several other functions which utilize __ofl_add do the following if (!libc.threaded) f->lock = -1; However this one does not. fprintf.c: fprintf: [nothing] fput.c: fput: [nothing] fputs.c: fputs: [nothing] fputs_unlocked: [question] See all other instances of this question. fputwc.c: __fputwc_unlocked: [nothing] fputwc: [nothing] fputwc_unlocked: [nothing] putwc_unlocked: [nothing] fputws.c: fputws: [nothing] fputws_unlocked: [question] See all other instances of this question. fread.c: fread: [nothing] fread_unlocked: [question] See all other instances of this question. freopen.c: freopen: [bug] In the case of no filename and the F_SETFL system call failing, and also __dup3 failing, the file is still locked. fscanf.c: fscanf: [nothing] __isoc99_fscanf: [nothing] fseek.c: __fseeko_unlocked: [nothing] __fseeko: [nothing] fseek: [nothing] fseeko: [nothing] fsetpos.c: fsetpos: [bug] off_t is written into here incorrectly and also read incorrectly as this breaks strict aliasing. Maybe consider putting off_t in the fpos_t union and referencing it that way to avoid memcpy? ftell.c: __ftello_unlocked: [style] The whole thing is formatted in a confusing to read manner. __ftello: [nothing] ftell: [nothing] ftello: [nothing] ftrylockfile.c: __do_orphaned_stdio_locks: [nothing] __unlocked_locked_file: [nothing] ftrylockfile: [nothing] funlockfile.c: funlockfile: [nothing] fwide.c: fwide: [nothing] fwprintf.c: fwprintf: [nothing] fwrite.c: fwrite: [question] Should there be a check for size*nmemb overflow? fwrite_unlocked: [question] See all other instances of this question. fwscanf.c: fwscanf: [nothing] __isoc99_fwscanf: [nothing] getc.c: getc: [nothing] _IO_getc: [nothing] getc_unlocked.c: getc_unlocked: [nothing] fgetc_unlocked: [nothing] _IO_getc_unlocked: [nothing] getchar.c: getchar: [nothing] getchar_unlocked.c: getchar_unlocked: [nothing] getdelim.c: [style] The MIN macro is never used in this translation unit. getdelim: [nothing] __getdelim: [nothing] getline.c: getline: [nothing] gets.c: gets: [optimize] The length of the string is calculated twice to strip the newline character off it. Why not rewrite it as: if (ret) { size_t i = strlen(s)-1; if (s[i] == '\n') s[i] = 0; } getw.c: getw: [nothing] getwc.c: getwc: [nothing] getwchar.c: getwchar: [nothing] getwchar_unlocked: [question] This isn't actually unlocked. ofl.c: __ofl_lock: [nothing] __ofl_unlock: [nothing] ofl_add.c: __ofl_add: [nothing] open_memstream.c: ms_seek: [style] It does goto upwards. Compound literal table to reference whence as a lookup table as a single expression. The function would be less tricky if just rewritten. ms_write: [nothing] ms_close: [nothing] open_memstream: [question] Why does this allocate a 1 byte buffer? That seems small. [style] This is also doing the weird arithmetic as well for the cookie This function can be rewritten much nicer. open_wmemstream.c: This has all the same problems as openmem_stream.c pclose.c: pclose: [nothing] perror.c: perror: [nothing] popen.c: popen: [nothing] printf.c: printf: [nothing] putc.c: putc: [nothing] _IO_putc: [nothing] putc_unlocked.c: putc_unlocked: [nothing] fputc_unlocked: [nothing] _IO_putc_unlocked: [nothing] putchar.c putchar: [nothing] putchar_unlocked.c putchar_unlocked: [nothing] puts.c: puts: [nothing] putw.c: putw: [nothing] putwc.c: putwc: [nothing] putwchar.c: putwchar: [nothing] putwchar_unlocked: [question] This isn't actually unlocked. remove.c: remove: [nothing] rename.c: rename: [nothing] rewind.c: rewind: [nothing] scanf.c: scanf: [nothing] setbuf.c: setbuf: [nothing] setbuffer.c: setbuffer: [nothing] setlinebuf.c: setlinebuf: [nothing] setvbuf.c: setvbuf: [bug] No locks are ever held on the file when changing fields of it. snprintf.c: snprintf: [nothing] sprintf.c: sprintf: [nothing] sscanf.c: sscanf: [nothing] stdderr.c: [nothing] stdin.c: [nothing] stdout.c: [nothing] swprintf.c: swprintf: [nothing] swscanf.c: swscanf: [nothing] __isoc99_swscanf: [nothing] tempnam.c: tempnam: [nothing] tmpfile.c: tmpfile: [nothing] tmpnam.c; tmpnam: [nothing] ungetc.c: ungetc: [nothing] ungetwc.c: ungetwc: [style] Ignoring the overly long if statement, the non-ascii route below does memcpy(f->rpos -= l, mbc, l) and that is just odd since I read that as f->rpos - l initially which almost looks correct but failing to update rpos which is when I noticed it actually does -=. vasprintf.c: vasprintf: [nothing] vdprintf.c: wrap_write: [nothing] vdprintf: [nothing] vfprintf.c: out: [nothing] pad: [nothing] fmt_x: [nothing] fmt_o: [nothing] fmt_u: [nothing] fmt_fp: [question] getint: [bug] This function overflows on INT_MIN for the else case. printf_core: [style] if (0) cutting into switch statements specified in a very smart order to make for really tiny code. This whole function is impossible to audit. I tried, I really did. Things like this: *(a=z-(p=1))=arg.i; are really tricky. vfprintf: [nothing] vfscanf.c: store_int: [nothing] arg_n: [nothing] vfscanf: [bug] __isoc99_vfscanf: Calculating width using the same 10*n+*c-'0' trick can overflow on INT_MIN. [style] This function gets really difficult to audit further down. It's just very dense and should be split up somehow. vfwprintf.c: Has all the same problems as vfprintf.c vfwscanf.c: Has all the same problems as vfscanf.c vprintf.c: vprintf: [nothing] vscanf.c: vscanf: [nothing] __isoc99_vscanf: [nothing] vsnprintf.c: sn_write: [nothing] vsnprintf: [nothing] vsprintf.c: vsprintf: [nothing] vsscanf.c: do_read: [nothing] vsscanf: [nothing] __isoc99_vsscanf: [nothing] vswprintf.c: sw_write: [nothing] vswprintf: [style] sizeof(FILE) should be sizeof f compound initializer should be used for file instead of what ever is done here, every other function that creates local FILE struct does it as FILE f = { .member = value, ... }, is this different because of the memset? is the memset even needed? vswscanf.c: wstring_read: [nothing] vswscanf: [nothing] __isoc_vswscanf: [nothing] vwprintf.c: vwprintf: [nothing] wvscanf.c: vwscanf: [nothing] __isoc99_vwscanf: [nothing] wprintf.c: wprintf: [nothing] wscanf.c: wscanf: [nothing] ======================================================================== src/internal ======================================================================== stdio_impl.h: [style] FUNLOCK macro has a trailing else which prompted me to look at every single instance of FUNLOCK to make sure all of them contained a semicolon. This is just dangerous, why not use the more common idiom of do { } while (0). [comment] The _IO_FILE is ABI compat with glibc but this isn't documented any where in the header which makes members like mustbezero_{1,2} really confusion and the layout of members within the structure as well, since the size of a FILE can be made much smaller just by sorting the members which is something someone may want to do. [style] feof/ferror macros which are unlocked, this requires that code inside src/stdio actually #undef them when implementing and anything inside there that requires these _must_ avoid including stdio.h over this to ensure it uses the right ones. I would just call these something else like __feof_ui/__ferror_ui (unlocked internal) then no weird #undef is needed and the intent is clear. shgetc.h: [comment] Maybe document this is the "scan helper" interface because it was not obvious what the naming was. shgetc.c: __shlim: [nothing] __shgetc: [nothing] intscan.h: [style] It isn't apparent for why needs to be included. Should just forward declare struct FILE; here instead. intscan.c: __intscan: [nothing] floatscan.h: [style] It isn't apparent for why needs to be included. Should just forward declare struct FILE; here instead. floatscan.c: scanexp: [nothing] decfloat: [nothing] hexfloat: [nothing] __floatscan: [nothing]