Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Tue, 18 Aug 2015 19:13:32 -0700
From: Chuck Lever <chuck.lever@...cle.com>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com
Subject: Re: Re: nfs-utils broken with musl: "select: Bad file descriptor"


On Aug 18, 2015, at 6:51 PM, Rich Felker <dalias@...c.org> wrote:

> On Tue, Aug 18, 2015 at 06:44:46PM -0700, Chuck Lever wrote:
>> 
>> On Aug 18, 2015, at 6:24 PM, Rich Felker <dalias@...c.org> wrote:
>> 
>>> On Tue, Aug 18, 2015 at 06:05:01PM -0700, Chuck Lever wrote:
>>>>>> i think this call goes wrong:
>>>>>> 
>>>>>> 
>>>>> http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=blob;f=utils/statd/rmtcall.c;hb=HEAD#l56
>>>>> 
>>>>>> 
>>>>>> it loops for 100 iterations and if all ports are used
>>>>>> according to getservbyport then it FD_SET(sockfd, &SVC_FDSET);
>>>>>> with some random high sockfd (eg. 105) that is closed.
>>>>>> 
>>>>>> ...so should getservbyport fail there?
>>>>>> 
>>>>>> (according to strace it tries ports 883 to 982)
>>>>> 
>>>>> I think the application's expectation is that it fail rather than
>>>>> returning a decimal-string-only service entity. However it looks like
>>>>> the code is written to handle the case where all 100 iterations fail
>>>>> to get an anonymous port. The problem seems to be that, when the loop
>>>>> stops due to hitting the iteration count rather than exiting with
>>>>> break, i has already been incremented past the last tmp_socket slot,
>>>>> so the close loop closes the fd that they actually want to use, later
>>>>> causing EBADF. This is purely an application bug, but it happens not
>>>>> to get noticed if getservbyport fails anywhere along the way, which
>>>>> they expect to happen in the usual case.
>>>> 
>>>> statd_get_socket() is hunting for a privileged source port that
>>>> is not just unused at the moment, but that is also not going to be
>>>> used by some other well-known service. This is a long-lived socket
>>>> that statd uses to communicate with the kernel. It must use a
>>>> privileged port.
>>>> 
>>>> if getservbyport(3) is returning something for every port that
>>>> is tried, then statd_get_socket() will fail to find a usable
>>>> port.
>>>> 
>>>> If it's returning 105, that suggests it has run out of retries.
>>>> It should return -1 in this case. That is a logic bug.
>>>> 
>>>> But is it true that every port returned by bindresvport(3) is
>>>> actually defined in /etc/services? Surely there is one open
>>>> port that can be used. What port does bindresvport(3) start
>>>> with?
>> 
>>> The logic bug is the count-down loop that closes all the temp sockets.
>>> In the case where the loop terminates via break, it leaves the last
>>> one open and only closes the extras. But in the case where where the
>>> loop terminates via the end condition in the for statement, the close
>>> loop closes all the sockets including the one it intends to use.
>> 
>> OK. Do you have a patch?
> 
> Something like this:
> 
> +        if (i == loopcnt) i--;
>         while (--i >= 0)
>                 close(tmp_sockets[i]);
> 
> I don't have a checkout of the source to make a proper diff, and I've
> got too many things I'm trying to do at the moment already. But it
> should be easy to make a real patch from what I wrote above.

I'll see if I can put together a proper patch for Steve, then.
But we're both ooo through at least next week.


>> Still not clear why it would take 100 tries exactly.
> 
> Because getservbyport never fails, and the code path with 100 failures
> to find a port where getservbyport fails was never tested. We can (and
> probably should) make it fail in musl when there's no named service to
> go with the port,

Ah ha. So musl's getservbyport(3) never returns NULL?

http://pubs.opengroup.org/onlinepubs/9699919799/functions/endservent.html

suggests that's incorrect behavior.

Why wasn't this a problem before now? The port hunting logic hasn't
changed since January 2013 (commit eb8229338f).


> but that doesn't change that there's a bug in this
> previously-untested codepath of nfs-utils that's the source of the
> EBADF error from select.

Agreed. Just trying to get a full picture.


--
Chuck Lever



Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.