Date: Tue, 16 Mar 2021 19:54:55 -0400
From: Rich Felker <dalias@...c.org>
To: Alexander Monakov <amonakov@...ras.ru>
Cc: musl@...ts.openwall.com, Dominic Chen <d.c.ddcc@...il.com>
Subject: Re: Issue with fread() and unaligned readv()

On Tue, Mar 16, 2021 at 12:30:11PM +0300, Alexander Monakov wrote:
> On Mon, 15 Mar 2021, Rich Felker wrote:
>
> > > Thanks. Can musl reduce the first iov tuple by, say, 8 bytes rather than
> > > 1 byte, to avoid forcing the kernel to perform a misaligned copy?
> >
> > Well then you have to do more copy in userspace afterwards, and reduce
> > the effective buffer size by a bit, going back to kernel slightly more
> > often or spending extra memory to compensate.
>
> Of course, but shouldn't you consider how it balances against the
> cost to perform a 1K (BUFSIZ) misaligned copy on each read?

Do we have an idea what this cost actually is on popular platforms?
That would help determine if this direction is useful to consider.

> > There's also no strong
> > reason to believe one will be aligned and the other won't, except at
> > beginning of file. The alignment mod 8 depends on file position and
> > access history, and neither the caller's buffer nor the FILE buffer
> > have any inherent alignment.
>
> The alignment of caller's buffer is another matter, I was talking about
> misaligned copy into internal FILE buffer (and even then, at least when
> user buffer was malloc'ed it will be sufficiently aligned).
>
> The buffer in FILE obtained from fopen will be aligned to _Alignof(_IO_FILE)
> in musl thanks to UNGET being 8.

Yes, it's just stdin/out/err that have no inherent alignment. The
buffers from fopen will be aligned to at least 4 and perhaps 8
depending on arch ABI and members of FILE.

> If the file has been repositioned, yes, bets are off, but I think with stdio
> it is quite common to read a file without seeks (could be non-seekable
> in the first place).

Well it's plenty common to work with network, terminal, pipe, etc.
containing non-aligned-record content. But these are probably less
interesting for performance.

Looking at the source, we could probably, instead of using
-!!f->buf_size, set up a negative offset based on len and buf_size, up
to 8, that's computed and saved in the first line of the function. If
the offset is equal to len, the first iov can be skipped entirely,
which tends to make the syscall considerably faster -- optimizing this
out was already desirable. Then, the final

	if (f->buf_size) buf[len-1] = *f->rpos++;

could be replaced with a loop up to this offset that collapses out if
it's zero. This would avoid introducing significant amounts of new
code and might improve things in other cases too.

Rich
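A minimal sketch of the change Rich describes, under stated assumptions. `struct toy_file`, `toy_read`, and `TOY_MAX_TAIL` are hypothetical names standing in for musl's FILE and its read routine, not musl's actual internals, and the sketch calls POSIX readv()/read() directly rather than going through musl's syscall wrappers. The offset policy, `off = ((len - 1) & 7) + 1`, is an assumed reading of "based on len and buf_size, up to 8": it routes 1 to 8 bytes through the FILE buffer so that the part read directly into the caller's buffer is a multiple of 8 bytes. The sketch skips the first iovec when the offset equals `len`, and it replaces the single-byte copy with a bounded loop.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical stand-in for musl's FILE: only the fields this read path uses. */
struct toy_file {
	int fd;
	unsigned char *buf;   /* internal stdio buffer */
	size_t buf_size;      /* 0 means unbuffered */
	unsigned char *rpos;  /* next unread byte in buf */
	unsigned char *rend;  /* end of valid data in buf */
	int eof, err;
};

/* Hypothetical bound on bytes of the request routed through buf
 * (musl's existing code uses 1, via -!!f->buf_size). */
#define TOY_MAX_TAIL 8

size_t toy_read(struct toy_file *f, unsigned char *buf, size_t len)
{
	if (len == 0) return 0;

	/* Computed once up front. Assumed policy: 1..TOY_MAX_TAIL bytes, chosen so
	 * the direct part (len - off) is a multiple of TOY_MAX_TAIL unless capped
	 * by a tiny buf_size. */
	size_t off = 0;
	if (f->buf_size) {
		off = ((len - 1) & (TOY_MAX_TAIL - 1)) + 1;
		if (off > f->buf_size) off = f->buf_size;
	}
	size_t direct = len - off;  /* bytes read straight into the caller's buffer */

	struct iovec iov[2] = {
		{ .iov_base = buf,    .iov_len = direct },
		{ .iov_base = f->buf, .iov_len = f->buf_size },
	};
	ssize_t cnt;
	if (direct == 0)  /* offset equals len: skip the first iovec entirely */
		cnt = read(f->fd, f->buf, f->buf_size);
	else
		cnt = readv(f->fd, iov, f->buf_size ? 2 : 1);

	if (cnt <= 0) {
		if (cnt < 0) f->err = 1; else f->eof = 1;
		return 0;
	}
	if ((size_t)cnt <= direct) return (size_t)cnt;  /* f->buf received nothing */

	size_t avail = (size_t)cnt - direct;  /* bytes that landed in f->buf */
	f->rpos = f->buf;
	f->rend = f->buf + avail;

	/* Generalizes `if (f->buf_size) buf[len-1] = *f->rpos++;`: copy up to
	 * `off` bytes out of the buffer; the loop does nothing when off == 0. */
	size_t n = off < avail ? off : avail;
	for (size_t i = 0; i < n; i++)
		buf[direct + i] = *f->rpos++;
	return direct + n;
}
```

Take a buffered 10-byte request as an example. The sketch reads 8 bytes directly and copies the remaining 2 out of the FILE buffer after the syscall, where the current code would read 9 directly and copy 1. For requests of 8 bytes or fewer, the offset equals `len`, so only a plain read() into the FILE buffer is issued. Because the copy loop now moves up to 8 bytes, it also has to handle a short read that leaves fewer than `off` bytes in the buffer.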