Date: Mon, 15 Mar 2021 18:29:16 -0400
From: Rich Felker <dalias@...c.org>
To: Alexander Monakov <amonakov@...ras.ru>
Cc: musl@...ts.openwall.com, Dominic Chen <d.c.ddcc@...il.com>
Subject: Re: Issue with fread() and unaligned readv()

On Tue, Mar 16, 2021 at 01:09:16AM +0300, Alexander Monakov wrote:
> On Mon, 15 Mar 2021, Rich Felker wrote:
> 
> > On Mon, Mar 15, 2021 at 05:39:43PM -0400, Dominic Chen wrote:
> > > Not sure this counts as a problem in musl or the application, but
> > > I've been debugging a return error of EINVAL from `fread(&buf, 8,
> > > 16, f)`, where `f = fopen("/proc/self/pagemap", "r")`. Internally,
> > > musl converts this into a call to `readv(f->fd, iov, 2)`, where `iov
> > > = {{iov_base = buf, iov_len = 127}, {iov_base = f->buf, iov_len =
> > > 1024}}`. However, it turns out that the kernel VFS read
> > > implementation inside `pagemap_read` checks that both the file
> > > position and count are divisible by PM_ENTRY_BYTES (8 on x86_64),
> > > otherwise it rejects the read with EINVAL. In comparison, glibc's
> > > `_IO_file_xsgetn` does appear to try to maintain read alignment,
> > > although I haven't looked at it in detail.
> > 
> > You can't use stdio to read or write special files/devices that depend
> > on the reads or writes happening in particular units, because the
> > relationship between stdio operations and the underlying
> > buffer-fill/flush operations on the underlying fd is unspecified. It's
> > really unfortunate that the kernel lies that procfs files are regular
> > files but doesn't give them regular-file semantics, but you really
> > need to use direct operations on the fd in the units the interface
> > requires, rather than stdio, to work with these files.
> 
> Where does iov_len = 127 for the first iov tuple come from, though?
> From fread arguments I'd expect 8 * 16 = 128.
> 
> If musl always does such off-by-one, it is an efficiency issue (forces
> a copy with mismatching source/dest alignment).

It's necessary to work around a kernel bug, whereby the kernel fails
to honor the requirement that a readv of total length n behave
identically to a single read of length n, except for where the data
is stored. For vfs backends that don't implement a proper readv
operation, the kernel executes readv as a sequence of reads. When this
happens, if the amount of data available is exactly the length of the
first iov (the length requested by the application), continuing to the
second iov with no more data available will cause the operation to
block until more data arrives. By reducing the length of the first iov
(the caller's buffer) by 1, we ensure that at least 1 byte of the
second iov (the FILE's buffer) is actually needed to satisfy the
caller, and thus that the call will return without blocking as soon as
everything the caller requested is available.
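
As a rough illustration of the len - 1 trick described above, here is
a minimal sketch. It is not musl's actual code: `struct stream` and
`stream_fill` are hypothetical names, and the 1024-byte buffer is taken
from the iov shown earlier in the thread. With len = 128 it issues the
same 127 + 1024 readv.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical stand-in for a stdio stream; not musl's real FILE layout. */
struct stream {
	int fd;
	unsigned char buf[1024];     /* internal buffer */
	unsigned char *rpos, *rend;  /* unread data left in buf */
};

/* Read up to len bytes into dst; any surplus stays in s->buf.
 * Returns the number of bytes stored in dst, or 0 on EOF or error. */
size_t stream_fill(struct stream *s, unsigned char *dst, size_t len)
{
	assert(len > 0);
	/* The caller's iov is one byte short, so a complete request
	 * always needs at least one byte from the internal buffer. */
	struct iovec iov[2] = {
		{ .iov_base = dst,    .iov_len = len - 1 },
		{ .iov_base = s->buf, .iov_len = sizeof s->buf },
	};
	ssize_t cnt = iov[0].iov_len
		? readv(s->fd, iov, 2)
		: read(s->fd, s->buf, sizeof s->buf);
	if (cnt <= 0) return 0;

	/* Short read: everything landed in the caller's buffer. */
	if ((size_t)cnt <= iov[0].iov_len) return (size_t)cnt;

	/* The rest landed in s->buf; its first byte is the caller's last. */
	s->rpos = s->buf;
	s->rend = s->buf + ((size_t)cnt - iov[0].iov_len);
	dst[len - 1] = *s->rpos++;
	return len;
}
```

The single-byte copy at the end is the cost Alexander asked about; in
this sketch it is paid once per underlying read, not once per byte.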

This exact situation arises all the time with one very common type of
file: tty devices. :(
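
For the pagemap case that started this thread, operating on the fd
directly in the units the interface requires could look something like
the sketch below. `read_records` and `pagemap_entries` are hypothetical
helpers, and the layout assumed here (one 8-byte entry per virtual
page, indexed by page number) is an assumption based on the
PM_ENTRY_BYTES value quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read `count` records of `unit` bytes starting at
 * record number `index`, so that both the file offset and the length
 * are multiples of `unit`. Returns the number of whole records read,
 * or -1 on error. */
ssize_t read_records(int fd, void *dst, size_t unit, size_t count, off_t index)
{
	assert(unit > 0);
	ssize_t n = pread(fd, dst, unit * count, index * (off_t)unit);
	if (n < 0) return -1;
	if ((size_t)n % unit) { errno = EIO; return -1; }  /* torn record */
	return n / (ssize_t)unit;
}

/* Example use: fetch pagemap entries for `count` pages starting at the
 * page containing `addr`. Assumes 8-byte entries indexed by virtual
 * page number. */
ssize_t pagemap_entries(const void *addr, uint64_t *out, size_t count)
{
	int fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0) return -1;
	long pagesz = sysconf(_SC_PAGESIZE);
	ssize_t n = read_records(fd, out, sizeof *out, count,
	                         (off_t)((uintptr_t)addr / (uintptr_t)pagesz));
	close(fd);
	return n;
}
```

Using pread rather than lseek plus read keeps the offset and length
explicit in one call, with no stdio buffering in between to change
either of them.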

Rich
