musl - Re: stdio glitch & questions

Message-ID: <20181130160951.GS23599@brightrain.aerifal.cx>
Date: Fri, 30 Nov 2018 11:09:51 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: stdio glitch & questions

On Fri, Nov 30, 2018 at 09:51:39PM +1100, Xan Phung wrote:
> Hi,
> 
> A few questions about stdio:
> 
> (1) I notice __toread.c uses angular quotes for <stdio_impl.h> whereas all
> other source files use "stdio_impl.h".  I assume the latter is correct and
> __toread.c's use of angular quotes was a glitch & it should really be
> double quotes... is that correct?

Yes, this doesn't make any difference but it's a style mistake.

> (2) I notice vfprintf first tries to call printf_core with f=0 (line 667)
> then calls printf_core again with f set to the actual file to receive
> output (line 682).  Why is printf_core called twice?  I struggle to
> understand the purpose of the first call with f=0.

To understand this you need to look inside printf_core. When called
with !f, it attempts to collect the %N$-form arguments if they're
used, or bails out early if it detects that normal % arguments are
used. Two passes are needed here because random access to a va_list is
not possible.
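
As a rough illustration of why two passes are needed (this is not musl's
printf_core; mini_format and its helpers are hypothetical names, and only
%N$d and %N$s with N in 1..9 are understood), a first pass can record the
type of each numbered argument, the va_list can then be walked once in
order, and a second pass can produce output with random access to the
saved values:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define MAXARG 9

enum argtype { T_NONE, T_INT, T_STR };
union arg { int i; const char *s; };

/* True if p points at a "%N$d" or "%N$s" directive (N in 1..9). */
static int is_spec(const char *p)
{
    return p[0] == '%' && p[1] >= '1' && p[1] <= '9' && p[2] == '$'
        && (p[3] == 'd' || p[3] == 's');
}

/* Hypothetical mini-formatter; returns length written or -1 on error.
 * Using one index with two different types is not supported. */
static int mini_format(char *out, size_t outsz, const char *fmt, ...)
{
    enum argtype type[MAXARG + 1] = { T_NONE };
    union arg val[MAXARG + 1];
    va_list ap;
    const char *p;
    size_t n = 0;
    int max = 0;

    assert(out && outsz > 0 && fmt);

    /* Pass 1: no output, only record the type of each numbered argument. */
    for (p = fmt; *p; p++) {
        if (is_spec(p)) {
            int k = p[1] - '0';
            type[k] = (p[3] == 'd') ? T_INT : T_STR;
            if (k > max) max = k;
            p += 3;
        }
    }
    /* A gap means an argument of unknown type; it cannot be skipped. */
    for (int k = 1; k <= max; k++)
        if (type[k] == T_NONE) return -1;

    /* A va_list can only be walked front to back, so fetch 1..max in order. */
    va_start(ap, fmt);
    for (int k = 1; k <= max; k++) {
        if (type[k] == T_INT) val[k].i = va_arg(ap, int);
        else val[k].s = va_arg(ap, const char *);
    }
    va_end(ap);

    /* Pass 2: produce output, indexing the saved values in any order. */
    for (p = fmt; *p && n + 1 < outsz; p++) {
        if (is_spec(p)) {
            int k = p[1] - '0';
            int w = (type[k] == T_INT)
                ? snprintf(out + n, outsz - n, "%d", val[k].i)
                : snprintf(out + n, outsz - n, "%s", val[k].s);
            if (w < 0) return -1;
            if ((size_t)w >= outsz - n) { n = outsz - 1; break; }
            n += (size_t)w;
            p += 3;
        } else {
            out[n++] = *p;
        }
    }
    out[n] = '\0';
    return (int)n;
}
```

A call such as mini_format(buf, sizeof buf, "%2$s=%1$d", 42, "x") reads
argument 2 before argument 1 in the output, which is only possible
because both were saved before any output was produced.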

> (3) When I do a step thru the __fwritex function to understand how printf
> works, I notice the resulting writev system calls pass on the output data
> as a two element iovec array, with the 1st element comprising all line
> buffered text up to & including the last variable data item, and then the
> 2nd element comprising the residual format string trailing the last
> variable data item (more often than not just a single '\n').
> 
> For example, printf("error: %s\n", msg) would put all text up to &
> including %s text in first iovec and the second iovec contains only '\n'.
> I understand the rationale of this is to avoid copying the final '\n' to
> the buffer at f->wpos.  (There is actually guaranteed space in the buffer
> itself due to a check at line 10 of fwrite.c).  The use of the array of 2x
> iovec's presumably then relies on Linux kernel scatter-gather I/O to then
> optimally handle the iovec array, ie: that the writev() of 2x iovec is more
> efficient than avoiding the copy of a few additional bytes (often a single
> '\n' byte) into f->wpos, and then using a single write() syscall.

Indeed, in the case where the new data is very short, it's almost
certainly faster to just copy it to the buffer and perform a single
write syscall. Likewise, for reading a single character it's almost
surely faster to perform a single read syscall into the buffer and
then pull the character out of the buffer.

However, conversely, it's possible to see a call to the stdio write
backend (f->write) where the new data is too large to fit in the
buffer. In this case, a writev syscall is almost certainly faster
(fewer trips back and forth between user and kernel space, which are
the dominant cost), and moving data into the buffer is not helpful
because it can't reduce the number of syscalls. Prior to commit
e3cd6c5c265cd481db6e0c5b529855d99f0bda30, fwrite contained heuristic
logic for individual cases, but it couldn't necessarily be optimal
under all usage patterns. After the change, the number of syscalls is
always minimized.
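
A minimal sketch of the large-data case, assuming a hypothetical helper
named flush_with_new_data (this is not the actual __stdio_write code):
the already-buffered bytes and the caller's new data go out in one
writev call, with no copy of the new data into the buffer.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: write the buffered bytes followed by the new
 * data with a single syscall. May return a short count; a real backend
 * would have to loop or record the unwritten remainder. */
static ssize_t flush_with_new_data(int fd,
                                   const void *buffered, size_t buflen,
                                   const void *fresh, size_t freshlen)
{
    struct iovec iov[2] = {
        { .iov_base = (void *)buffered, .iov_len = buflen },
        { .iov_base = (void *)fresh,    .iov_len = freshlen },
    };

    assert(fd >= 0);
    return writev(fd, iov, 2);
}
```

Doing the same with plain write would take either two syscalls or a copy
into a buffer that, by assumption, is too small to hold the new data.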

> Isn't this a big assumption?  With Linux itself, can we really know that
> Linux device drivers are smart enough to do writev() optimally?  Also,
> there is a lot of interest in porting musl to non-Linux os's, many of which
> do not have writev().  (I am porting musl to WebAssembly and to Plan 9).
> 
> I can prepare a patch of a version using write() instead of writev() if
> there is interest in this...

You can emulate readv and writev using the property that short reads
and writes are permissible, copying data through a fixed-size
intermediate buffer on the stack. This is of course suboptimal but
easy to do.
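
A sketch of that idea for writev, under stated assumptions: emu_writev
and EMU_BUFSZ are hypothetical names, and the buffer size of 512 is
arbitrary. As much data as fits is gathered into a stack buffer and
sent with one write; whatever did not fit is simply reported as a short
write, which callers of writev already have to handle.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define EMU_BUFSZ 512  /* arbitrary size chosen for this sketch */

/* Hypothetical writev emulation built on a single write call. */
static ssize_t emu_writev(int fd, const struct iovec *iov, int iovcnt)
{
    char tmp[EMU_BUFSZ];
    size_t n = 0;

    assert(iovcnt >= 0);

    /* Gather as much as fits; the rest is left for the caller's retry. */
    for (int i = 0; i < iovcnt && n < sizeof tmp; i++) {
        size_t k = iov[i].iov_len;
        if (k > sizeof tmp - n) k = sizeof tmp - n;
        if (k) memcpy(tmp + n, iov[i].iov_base, k);
        n += k;
    }

    /* Returning less than the requested total is a legal short write. */
    return write(fd, tmp, n);
}
```

The extra copy is the cost of the emulation; the number of syscalls per
call stays at one.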

Emulation of readv is really sensitive, because breaking it up into
multiple reads can cause inappropriate blocking. Linux actually has a
bug where this can happen anyway -- see commit
2cff36a84f268c09f4c9dc5a1340652c8e298dc0 -- so musl's __stdio_read
already reads the last character of the first (caller-requested) part
through the buffer, and collapses the readv to a read if this makes
the first iov empty.
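
One way an emulated readv can avoid the blocking problem is to issue
exactly one read and scatter whatever it returned. The following is a
sketch only (emu_readv and READ_BUFSZ are hypothetical names, and it is
not what __stdio_read does): the request is clamped to a stack buffer,
and a result shorter than the total requested is returned as a short
read rather than triggering a second read.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define READ_BUFSZ 512  /* arbitrary size chosen for this sketch */

/* Hypothetical readv emulation: one read, then scatter into the iovs. */
static ssize_t emu_readv(int fd, const struct iovec *iov, int iovcnt)
{
    char tmp[READ_BUFSZ];
    size_t want = 0, off = 0;
    ssize_t got;

    assert(iovcnt >= 0);

    /* Total requested, clamped to what the stack buffer can hold. */
    for (int i = 0; i < iovcnt && want < sizeof tmp; i++) {
        size_t k = iov[i].iov_len;
        if (k > sizeof tmp - want) k = sizeof tmp - want;
        want += k;
    }

    /* Exactly one read: never wait a second time for data that may
     * not arrive. */
    got = read(fd, tmp, want);
    if (got <= 0) return got;

    /* Scatter the bytes received across the caller's buffers in order. */
    for (int i = 0; i < iovcnt && off < (size_t)got; i++) {
        size_t k = iov[i].iov_len;
        if (k > (size_t)got - off) k = (size_t)got - off;
        if (k) memcpy(iov[i].iov_base, tmp + off, k);
        off += k;
    }
    return got;
}
```

With 5 bytes available on a pipe and iovs of 3 and 8 bytes, this returns
5 immediately instead of blocking on the second iov.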

It would probably be welcome to make __stdio_write make use of
SYS_write when it would be expected to be faster (len very small), but
I'm not sure what the exact cutoff should be. Switching away from
writev/readv would not be a welcome change though; use of them is very
intentional and it's how musl avoids some pathological slowness under
certain stdio usage patterns.

If you're porting to a system that lacks the underlying syscalls, I
think it probably makes sense to emulate them at the syscall() level
using a strategy like I described above. It's necessary for making the
public readv()/writev() functions work anyway.

Rich