musl - Re: Re: Potential bug in musl in nftw implementation

Message-ID: <b1db88c8-0ff3-470a-88f5-544049bd5472@lenardszolnoki.com>
Date: Thu, 4 Dec 2025 09:30:49 +0000
From: Lénárd Szolnoki <cpp@...ardszolnoki.com>
To: musl@...ts.openwall.com, Neeraj Sharma <neerajsharma.live@...il.com>,
 Rich Felker <dalias@...c.org>
Subject: Re: Re: Potential bug in musl in nftw implementation



On 03/12/2025 14:55, Neeraj Sharma wrote:
> On Wed, Dec 3, 2025 at 7:08 PM Rich Felker <dalias@...c.org> wrote:
>>
>> Are you saying you'd deem it a better behavior to return with an error
>> when it can't traverse within the fd limit, instead of skipping
>> descent past what it can do within the limit?
>>
>> That's probably more correct. My understanding is that historically,
>> the limit was a depth limit, but POSIX repurposed it into a "fd limit"
>> with an expectation that implementations somehow do the traversal with
>> fewer fds than the depth if the limit is exceeded (or error out?).
>> Trying to actually do it seems error-prone (requires unbounded working
>> space that grows with size of a single directory, vs growth only with
>> depth for fds, and involves caching potentially-stale data) but maybe
>> just erroring so the caller knows to use a larger limit would be the
>> right thing to do here...?
> 
> I would suggest aligning with common understanding across nix in this
> case. This was the main reason for my confusion in the beginning.
> Silently skipping or erroring out both seems unaligned with common
> understanding as in [1], [2], [3]. The documentation in linux [1] is
> more explicit about the functionality than IEEE [2] or opengroup [3]
> in this case.
> 
> Quotes from [1] or linux man page ftw(3).
> 
> "To avoid using up all of the calling process's file descriptors,
> nopenfd specifies the maximum number of directories that ftw() will
> hold open simultaneously. When the search depth exceeds this, ftw()
> will become slower because directories have to be closed and reopened.
> ftw() uses at most one file descriptor for each level in the directory
> tree."
> 
> "The function nftw() is the same as ftw(), except that it has one
> additional argument, flags, and calls fn() with one more argument,
> ftwbuf."

Notably the Linux man page also says this:

"As long as fn() returns 0, ftw() will continue either until it has traversed the entire 
tree, in which case it will return zero, or until it encounters an error (such as a 
malloc(3) failure), in which case it will return -1. "

So allocation failure is on the table, presumably reported with errno set to ENOMEM. 
Allocation failure is merely an example here; otherwise the set of errors seems to be rather 
open-ended. Arguably, running out of working space in any way, or simply refusing to 
work beyond the fd limit, fits this description.
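
For illustration, a minimal sketch of how a caller could observe either outcome: it counts the entries nftw() reports for a given fd limit and keeps errno when the walk fails. The helper names (count_entry, walk_count, make_chain) and the /tmp path template are hypothetical and only serve the example; nothing here reflects how any particular libc implements the walk.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Number of entries the callback has seen during the current walk. */
static int visited;

/* Callback: count every entry and return 0 so that nftw() keeps going. */
static int count_entry(const char *path, const struct stat *sb,
                       int type, struct FTW *ftwbuf)
{
    (void)path; (void)sb; (void)type; (void)ftwbuf;
    visited++;
    return 0;
}

/* Hypothetical helper: walk `root` with the given fd limit.
 * Returns the number of entries visited, or -1 if nftw() reported an
 * error, in which case errno is left as nftw() set it. */
static int walk_count(const char *root, int nopenfd)
{
    visited = 0;
    errno = 0;
    if (nftw(root, count_entry, nopenfd, FTW_PHYS) == -1)
        return -1;
    return visited;
}

/* Hypothetical helper: create a fresh temporary directory containing a
 * chain of `depth` nested directories; the root path is written to buf.
 * Returns 0 on success, -1 on failure. */
static int make_chain(char *buf, size_t len, int depth)
{
    char path[256];

    assert(len >= sizeof "/tmp/nftw-demo-XXXXXX");
    snprintf(buf, len, "/tmp/nftw-demo-XXXXXX");
    if (!mkdtemp(buf))
        return -1;

    snprintf(path, sizeof path, "%s", buf);
    for (int i = 0; i < depth; i++) {
        size_t n = strlen(path);
        if (n + 3 > sizeof path)
            return -1;
        strcpy(path + n, "/d");
        if (mkdir(path, 0700) == -1)
            return -1;
    }
    return 0;
}
```

Calling make_chain(root, sizeof root, 3) and then walk_count(root, 16) should report the root plus three nested directories. Repeating the walk with a limit smaller than the depth, for example walk_count(root, 1), is where implementations may differ: a full count, a short count, or -1 with errno set.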

The POSIX docs don't seem to admit ENOMEM, or running out of working space in any other 
way, as a possible error (apart from the callback setting it), which seems like a defect. 
The traversal algorithm inherently needs more than O(1) space, whether or not part of 
that is allocated in the kernel.
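
To make the space argument concrete, here is a rough sketch of the straightforward approach: a depth-first walk that keeps one directory stream open per level, so the number of simultaneously open directories grows with depth. The functions walk_fd and peak_open_dirs are hypothetical names for this example, and the code is not taken from musl or any other libc.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: depth-first walk holding one open directory per
 * level. `level` is 1 for the root. *peak receives the largest number of
 * directories that were open at the same time.
 * Returns 0 on success, -1 if a directory could not be opened or an
 * entry could not be examined. readdir() errors are not distinguished
 * from end-of-directory in this sketch. */
static int walk_fd(int parent, const char *name, int level, int *peak)
{
    assert(level >= 1);

    int fd = openat(parent, name, O_RDONLY | O_DIRECTORY | O_NOFOLLOW);
    if (fd == -1)
        return -1;

    DIR *d = fdopendir(fd);
    if (!d) {
        close(fd);
        return -1;
    }
    if (level > *peak)
        *peak = level;

    int rc = 0;
    struct dirent *e;
    while (rc == 0 && (e = readdir(d)) != NULL) {
        struct stat st;

        if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, ".."))
            continue;
        if (fstatat(fd, e->d_name, &st, AT_SYMLINK_NOFOLLOW) == -1) {
            rc = -1;
            break;
        }
        /* The parent stream stays open while the child is walked. */
        if (S_ISDIR(st.st_mode))
            rc = walk_fd(fd, e->d_name, level + 1, peak);
    }
    closedir(d); /* also closes fd */
    return rc;
}

/* Hypothetical helper: peak number of simultaneously open directories
 * for a walk rooted at `root`, or -1 on error. */
static int peak_open_dirs(const char *root)
{
    int peak = 0;

    if (walk_fd(AT_FDCWD, root, 1, &peak) == -1)
        return -1;
    return peak;
}
```

Staying under a fixed fd limit would mean closing parent directories and later resuming them, which requires remembering a position or the remaining entries somewhere else; the space does not disappear, it just moves.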

Lénárd

> 
> [1] https://linux.die.net/man/3/nftw
> [2] IEEE Std 1003.1-2001/Cor 2-2004, item XSH/TC2/D6/6 Change Number:
> XSH/TC2/D6/64 [XSH ERN 73] - p52,
> https://pubs.opengroup.org/onlinepubs/7899949099/toc.pdf
> [3] https://pubs.opengroup.org/onlinepubs/000095399/functions/nftw.html
> 
> Regards,
> Neeraj