Date: Sun, 11 Jun 2017 14:57:59 -0600
From: Benjamin Slade <slade@...nam.net>
To: Joakim Sindholt <opensource@...sha.com>
Cc: musl@...ts.openwall.com
Subject: Re: ENOSYS/EOPNOTSUPP fallback?

Thank you for the extensive reply.

Just to be clear: I'm just an end-user of flatpak, &c. As far as I can
tell, flatpak makes use of `ostree`, which assumes that the libc will
handle the `dd`-style fallback (I got the impression that flatpak isn't
calling `fallocate` directly itself).

Do you think there's an obvious avenue for following up on this?
Admittedly this is an edge case that won't necessarily affect musl users
on ext4, but it will affect musl users on ZFS (and, I believe,
F2FS). Do you think `ostree` shouldn't rely on the libc for the
fallback? Or should ZFS on Linux implement a fallback for fallocate?

--
Benjamin Slade
  `(pgp_fp: ,(21BA 2AE1 28F6 DF36 110A 0E9C A320 BBE8 2B52 EE19))
    '(sent by mu4e on Emacs running under GNU/Linux . https://gnu.org )
       '(Choose Linux, Choose Freedom . https://linux.com )


On 2017-06-05T06:46:33-0600, Joakim Sindholt <opensource@...sha.com> wrote:

 > On Sun, Jun 04, 2017 at 09:22:27PM -0600, Benjamin Slade wrote:
 > > I ran into what is perhaps a weird edge case. I'm running a system with
 > > musl that uses a ZFS root fs. When I was trying to install some
 > > flatpaks, I got an `fallocate` failure, with no `dd` fallback. Querying
 > > the flatpak team, the fallback to `dd` seems to be something which glibc
 > > does (and so the other components assume it will be taken care of).
 > >
 > > Here is the exchange regarding this issue:
 > > https://github.com/flatpak/flatpak/issues/802

 > To quote the glibc source file linked in the bug:

 >   /* Minimize data transfer for network file systems, by issuing
 >      single-byte write requests spaced by the file system block size.
 >      (Most local file systems have fallocate support, so this fallback
 >      code is not used there.)  */

 >   /* NFS clients do not propagate the block size of the underlying
 >      storage and may report a much larger value which would still
 >      leave holes after the loop below, so we cap the increment at
 >      4096.  */

 >   /* Write a null byte to every block.  This is racy; we currently
 >      lack a better option.  Compare-and-swap against a file mapping
 >      might address local races, but requires interposition of a signal
 >      handler to catch SIGBUS.  */
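 >
 > Roughly, the approach those comments describe looks like this (a
 > simplified sketch of that kind of loop, not glibc's actual code):
 >
 >   #include <errno.h>
 >   #include <sys/statfs.h>
 >   #include <unistd.h>
 >
 >   /* Emulate fallocate by touching every block of the requested
 >      range with a single-byte write.  Simplified sketch of the
 >      glibc fallback; not the actual implementation. */
 >   static int fallback_fallocate(int fd, off_t offset, off_t len)
 >   {
 >       struct statfs f;
 >       off_t increment, end = offset + len;
 >
 >       if (fstatfs(fd, &f) != 0)
 >           return errno;
 >
 >       /* Space the writes by the block size, capped at 4096
 >          because NFS may report a much larger value. */
 >       if (f.f_bsize == 0)
 >           increment = 512;
 >       else if (f.f_bsize < 4096)
 >           increment = f.f_bsize;
 >       else
 >           increment = 4096;
 >
 >       /* Write a null byte into every block.  This is the racy
 >          part: a concurrent writer's data can be clobbered by
 >          these zeros. */
 >       for (; offset < end; offset += increment)
 >           if (pwrite(fd, "", 1, offset) != 1)
 >               return errno;
 >
 >       return 0;
 >   }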

 > Which leaves two massive bugs:
 > 1) unallocated gaps may be left behind, both because of the NFS issue
 > and because other file systems may work on entirely different
 > principles that are not accounted for here; and
 > 2) data currently being written to the file may be overwritten as it's
 > being forcibly allocated (and the forced allocation itself might
 > accomplish nothing; think deduplication).

 > This is not a viable general solution, and furthermore fallocate is
 > mostly just an optimization hint. If it's a hard requirement of your
 > software, I would suggest implementing it in your file system. These
 > operations can only be safely implemented in the kernel.

 > An example:

 > MyFS uses write-time deduplication on unused blocks (and blocks of all
 > zeroes fall under the umbrella of unused). Glibc starts its dance,
 > writing a zero byte to the beginning of each block it perceives; for
 > now, let's just say it has the right block size. MyFS trashes these
 > writes immediately without touching the disk and updates the size
 > metadata, which gets lazily written at some point. There's only 400k
 > left on the disk, and your fallocate of 16G will succeed and run
 > exceptionally fast to boot, but it will have allocated nothing and
 > your next write will fail with ENOSPC.

 > Another example:

 > myutil has two threads running. One thread is constantly writing things
 > to a file. The other thread sometimes writes large chunks of data to the
 > file, so it hints to the kernel to allocate these large chunks by
 > calling fallocate, and only then takes the lock(s) held internally to
 > synchronize the threads. The first thread finds it needs to update
 > something in the section currently being fallocated by glibc's
 > algorithm. Suddenly zero bytes appear at 4k intervals for no discernible
 > reason, overwriting the data.
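 >
 > To make the interleaving concrete, a hypothetical sketch (the file
 > name, sizes, and thread bodies are all invented for illustration):
 >
 >   #include <fcntl.h>
 >   #include <pthread.h>
 >   #include <string.h>
 >   #include <unistd.h>
 >
 >   static int fd;
 >
 >   /* Thread 1: keeps updating a record that happens to live inside
 >      the region the other thread is about to "allocate". */
 >   static void *writer(void *arg)
 >   {
 >       char record[128];
 >       memset(record, 'A', sizeof record);
 >       for (int i = 0; i < 100000; i++)
 >           pwrite(fd, record, sizeof record, 8192);
 >       return NULL;
 >   }
 >
 >   /* Thread 2: "reserves" 16 MiB the way the glibc fallback does,
 >      by writing a zero byte every 4096 bytes.  Any record the
 >      writer has put down in that range can get a zero punched
 >      into it. */
 >   static void *reserver(void *arg)
 >   {
 >       for (off_t off = 0; off < 16 << 20; off += 4096)
 >           pwrite(fd, "", 1, off);
 >       return NULL;
 >   }
 >
 >   int main(void)
 >   {
 >       pthread_t t1, t2;
 >       fd = open("scratch.dat", O_RDWR | O_CREAT, 0644);
 >       pthread_create(&t1, NULL, writer, NULL);
 >       pthread_create(&t2, NULL, reserver, NULL);
 >       pthread_join(t1, NULL);
 >       pthread_join(t2, NULL);
 >       /* The record at offset 8192 may now begin with a '\0'. */
 >       return 0;
 >   }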


 > Personally, I would look into seeing to it that flatpak only uses
 > fallocate as an optimization. The most reliable alternative I can think
 > of would be to do the necessary locking (if any) in the program and
 > fill the entire target section of the file with data from
 > /dev/urandom, though even that may fail spectacularly with transparent
 > compression (albeit that's unlikely).
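 >
 > For the hint-only approach, the call site could look something like
 > this sketch (`preallocate_hint` is an invented name, not flatpak's
 > code):
 >
 >   #define _GNU_SOURCE
 >   #include <errno.h>
 >   #include <fcntl.h>
 >
 >   /* Treat fallocate purely as a hint: preallocate when the file
 >      system supports it, and silently carry on when it doesn't. */
 >   static int preallocate_hint(int fd, off_t offset, off_t len)
 >   {
 >       if (fallocate(fd, 0, offset, len) == 0)
 >           return 0;
 >       if (errno == ENOSYS || errno == EOPNOTSUPP)
 >           return 0;  /* unsupported: just a missed optimization */
 >       return -1;     /* real failure, e.g. ENOSPC */
 >   }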

 > Hope this was at least somewhat helpful.


