Message-ID: <CAMSMCxkQHuDRp6g-ByYrT3vrvzP-HBBoG4bHOWTnqA4j7037xg@mail.gmail.com>
Date: Thu, 10 Jul 2025 10:01:50 -0700
From: Nathan McSween <nwmcsween@...il.com>
To: musl@...ts.openwall.com
Subject: Re: unlink on NFS volume fails silently

See https://github.com/Azure/AKS/issues/1325#issuecomment-713372369.
Does the behavior also happen with coreutils?

On Thu, Jul 10, 2025, 8:44 AM Rich Felker <dalias@...c.org> wrote:

> On Thu, Jul 10, 2025 at 02:58:30PM +1000, Stephen Von Takach wrote:
> > Yeah I see your point and this was closed as a kernel issue:
> > https://gitlab.alpinelinux.org/alpine/aports/-/issues/10960
>
> OK, is your issue unlink falsely succeeding, or readdir skipping
> entries? The latter is a known bug in the kernel NFS client. One of my
> comments on the tracker suggests:
>
>   "The nordirplus option mentioned in one of those tracker threads
>   might be a workaround."
>
> I'm not sure if this is the case, but it might be worth trying.
>
> Note that it's *expected* that an already-in-progress iteration of a
> directory may return entries that were already deleted. The
> unacceptable thing is the opposite: when it skips some entries that
> have not been deleted as a consequence of other things being deleted.
>
> > We're running these two containers on the same kernel and seeing the same
> > behaviour as that alpine issue.
> > Happy to continue working around the issue by using debian userspace to
> > build our service.
> >
> > It does seem crazy that there is clearly an issue, possibly a kernel issue,
> > that is being handwaved away by all parties
>
> It's not "handwaved away" by us. We have determined that there is a
> bug in a component we have no control over, and for which we have no
> sound means of working around.
>
> I'm happy to work together on tracking down the cause to get it fixed,
> but that requires cooperation from someone who's able to reproduce it,
> documenting the exact circumstances under which it occurs (NFS server
> vendor/version, NFS mount options) and either producing a minimal test
> program to reproduce the issue under those conditions, or being
> willing to run a proposed test by someone else.
>
> Even if using Debian/glibc *seems* to make things work for you, I
> think it would be beneficial for you to try to get to the root cause
> of the problem and get it fixed. What we previously found on the
> above-linked ticket was that glibc is not doing anything special that
> should rule out that bug, only that the particular filename
> sizes/counts in the test didn't trigger the bug with glibc.
>
> Again, I don't know if this is the same bug you're hitting (this is
> the first time in the thread you've mentioned readdir if I'm not
> mistaken, as opposed to just unlink) or if there's a second bug in
> play here. If you could at least clarify that, it would be a big help
> to anyone investigating it in the future.
>
> Rich
>
