Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 12 Oct 2020 11:02:16 -0400
From: Rich Felker <dalias@...c.org>
To: Jouni Roivas <jouni.roivas@...era.com>
Cc: musl@...ts.openwall.com
Subject: Re: [PATCH] Skip writing first iovec if it's empty

On Mon, Oct 12, 2020 at 03:59:01PM +0300, Jouni Roivas wrote:
> In case the first iovec is empty, skip writing it. Usually writing
> zero length iovec is no-op, but in case of certain special cases this
> causes the write to fail.
> 
> This affects at least cgroups under sysfs, since it doesn't properly
> support writev with multiple iovec. For those kernel tends to handle
> them as simple as possible, passing each iovec separately. In case
> of zero length write into cgroups file causes kernel to return error.
> Thus if writing the first iovec fails for being zero length, it
> causes the whole write to fail even if writing the second iovec would
> succeed. This happens for example when doing unbuffered write with
> musl to a file under cgroups. Fix the issue here, since if kernel
> gets fixed for this specific case, it still doesn't get fixed for
> older kernels, nor any other possible similar case.
> ---
>  src/stdio/__stdio_write.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/src/stdio/__stdio_write.c b/src/stdio/__stdio_write.c
> index d2d89475..eedce03a 100644
> --- a/src/stdio/__stdio_write.c
> +++ b/src/stdio/__stdio_write.c
> @@ -11,6 +11,10 @@ size_t __stdio_write(FILE *f, const unsigned char *buf, size_t len)
>  	size_t rem = iov[0].iov_len + iov[1].iov_len;
>  	int iovcnt = 2;
>  	ssize_t cnt;
> +	if (iov[0].iov_len == 0) {
> +		iov++;
> +		iovcnt--;
> +	}
>  	for (;;) {
>  		cnt = syscall(SYS_writev, f->fd, iov, iovcnt);
>  		if (cnt == rem) {
> -- 
> 2.25.1

I don't think this is sufficient to make writing procfs/sysfs files
via stdio work reliably; their interface is fundamentally incompatible
with flexibility of implementation in buffering. But it might be
preferable to do this still for other reasons. If we go forward with
this I think the commit message needs major rework so as not to be
implying that it makes the procfs/sysfs thing into a supported usage
case.

Note that commit e7eeeb9f2a4a358fb0bbed81e145ef5538ff60f0 did the
analog for __stdio_read in a way that's probably slightly better, but
that would need adaptation to work for the write case.

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.