Message-ID: <20250710154414.GU1827@brightrain.aerifal.cx>
Date: Thu, 10 Jul 2025 11:44:15 -0400
From: Rich Felker <dalias@...c.org>
To: Stephen Von Takach <steve@...ce.technology>
Cc: musl@...ts.openwall.com, Viv Briffa <viv@...ce.technology>
Subject: Re: unlink on NFS volume fails silently

On Thu, Jul 10, 2025 at 02:58:30PM +1000, Stephen Von Takach wrote:
> Yeah I see your point and this was closed as a kernel issue:
> https://gitlab.alpinelinux.org/alpine/aports/-/issues/10960

OK, is your issue unlink falsely succeeding, or readdir skipping
entries? The latter is a known bug in the kernel NFS client. One of my
comments on the tracker suggests:

"The nordirplus option mentioned in one of those tracker threads might
be a workaround."

I'm not sure if this is the case, but it might be worth trying.

Note that it's *expected* that an already-in-progress iteration of a
directory may return entries that were already deleted. The
unacceptable thing is the opposite: when it skips some entries that
have not been deleted as a consequence of other things being deleted.

> We're running these two containers on the same kernel and seeing the same
> behaviour as that alpine issue.
> Happy to continue working around the issue by using debian userspace to
> build our service.
>
> It does seems crazy that there is clearly an issue, possibly a kernel issue
> that is being handwaved away by all parties

It's not "handwaved away" by us. We have determined that there is a bug
in a component we have no control over, and which we have no sound
means of working around. I'm happy to work together on tracking down
the cause to get it fixed, but that requires cooperation from someone
who's able to reproduce it, documenting the exact circumstances under
which it occurs (NFS server vendor/version, NFS mount options) and
either producing a minimal test program to reproduce the issue under
those conditions, or being willing to run a proposed test by someone
else.
Even if using Debian/glibc *seems* to make things work for you, I think
it would be beneficial for you to try to get to the root cause of the
problem and get it fixed. What we previously found on the above-linked
ticket was that glibc is not doing anything special that should rule
out that bug, only that the particular filename sizes/counts in the
test didn't trigger the bug with glibc.

Again, I don't know if this is the same bug you're hitting (this is
the first time in the thread you've mentioned readdir, if I'm not
mistaken, as opposed to just unlink) or if there's a second bug in
play here. If you could at least clarify that, it would be a big help
to anyone investigating it in the future.

Rich
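The message above asks for a minimal test program, and for clarification of whether unlink() is falsely succeeding or readdir() is skipping entries. A minimal sketch of such a program, assuming a POSIX system and an empty directory on the affected NFS mount, creates files, unlinks each one during a single readdir() pass, and then rescans, so the two failure modes land in different counters. The function name `run_unlink_readdir_test`, the `struct unlink_test_result` fields, the file count, and the filename pattern are all hypothetical choices for illustration, not taken from musl, the kernel, or the Alpine tracker.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical result record; field names are illustrative only. */
struct unlink_test_result {
    int created;       /* files created in step 1 */
    int false_success; /* unlink() returned 0 but lstat() still found the name */
    int stale_entries; /* readdir() returned a name already gone (ENOENT): allowed */
    int leftover;      /* names still present after step 2; includes false_success */
};

/* Populate dir with nfiles empty files, unlink each entry while the same
 * readdir() iteration is still in progress, then rescan the directory.
 * Returns 0 when the test ran, -1 on an unexpected I/O error. */
static int run_unlink_readdir_test(const char *dir, int nfiles,
                                   struct unlink_test_result *r)
{
    char path[4096];
    struct dirent *de;
    struct stat st;
    DIR *d;

    memset(r, 0, sizeof *r);

    /* Step 1: create the files. The longish name is an arbitrary choice,
     * so that the listing is likely to span several readdir batches. */
    for (int i = 0; i < nfiles; i++) {
        snprintf(path, sizeof path, "%s/unlink-readdir-test-file-%06d", dir, i);
        int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        close(fd);
        r->created++;
    }

    /* Step 2: delete every entry during a single in-progress iteration. */
    if (!(d = opendir(dir)))
        return -1;
    for (;;) {
        errno = 0;
        if (!(de = readdir(d))) {
            int err = errno; /* nonzero means readdir() failed, not end of dir */
            closedir(d);
            if (err)
                return -1;
            break;
        }
        if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
            continue;
        snprintf(path, sizeof path, "%s/%s", dir, de->d_name);
        if (unlink(path) == 0) {
            /* unlink() claims success; check that the name is really gone. */
            if (lstat(path, &st) == 0)
                r->false_success++;
        } else if (errno == ENOENT) {
            r->stale_entries++;
        } else {
            closedir(d);
            return -1;
        }
    }

    /* Step 3: a fresh scan. Anything left was either skipped by the
     * iteration above or survived a "successful" unlink(). */
    if (!(d = opendir(dir)))
        return -1;
    while ((de = readdir(d)))
        if (strcmp(de->d_name, ".") && strcmp(de->d_name, ".."))
            r->leftover++;
    closedir(d);
    return 0;
}
```

For example, after creating an empty directory on the NFS mount, one could call `run_unlink_readdir_test(dir, 500, &r)` and report the counters together with the NFS server vendor/version and mount options. A nonzero `stale_entries` count is the expected behavior described in the message; `leftover` greater than `false_success` would suggest readdir skipping entries, while a nonzero `false_success` would point at unlink itself. Repeating the run with different `nfiles` values may matter, since the earlier ticket found that triggering the bug depended on filename sizes and counts.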