Date: Tue, 2 Dec 2014 21:48:51 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: __sched_cpucount returns garbage

On Tue, Dec 02, 2014 at 05:33:04PM -0800, Isaac Dunham wrote:
> On Tue, Dec 02, 2014 at 07:11:15PM -0500, Rich Felker wrote:
> > On Sun, Nov 30, 2014 at 12:38:46PM +0100, Szabolcs Nagy wrote:
> > > * Isaac Dunham <ibid.ag@...il.com> [2014-11-29 15:36:33 -0800]:
> > > > I noticed that nproc ended up on the toybox TODO list (via Tizen), and went
> > > > poking about via strace and ltrace to see where it got the cpu count from.
> > > > 
> > > > In the process, I discovered that __sched_cpucount is returning garbage;
> > > 
> > > works here as expected:
> > > 
> > > #define _GNU_SOURCE
> > > #include <sched.h>
> > > int main()
> > > {
> > > 	cpu_set_t s = {0};
> > > 	CPU_SET(3, &s);
> > > 	CPU_SET(7, &s);
> > > 	CPU_SET(24, &s);
> > > 	return __sched_cpucount(sizeof s, &s);
> > > }
> > > 
> > > returns 3
> > > 
> > > > on Alpine Linux on my N270-based netbook (1 physical core but 
> > > > hyperthreading makes it look like 2),
> > > > nproc
> > > > outputs a random number of CPUs ranging from 413 to 472.
> > > 
> > > see where the cpu_set_t argument comes from
> > > (most likely sched_getaffinity syscall)
> > > then see why that is broken
> > > 
> > > __sched_cpucount just counts bit flags
> > 
> > Is it possible that the macros from sched.h are using it wrong, or
> > that nproc is using __sched_cpucount directly rather than using the
> > sched.h macros and expecting different behavior from it (perhaps a
> > mismatch between the musl and glibc behavior, like counting bits vs
> > bytes vs longs)?
> > 
> > Rich
> 
> I have no idea what it's doing; after reading the source, I have *less*
> of an understanding, since it's got half a dozen #ifdefs in the relevant
> code (in lib/nproc.c).
> But I can say that it's returning the result of __sched_cpucount without
> modification (the return matches the output of nproc).
> 
> OK, rereading it:
> We're probably using HAVE_SCHED_GETAFFINITY_LIKE_GLIBC, and CPU_COUNT is
> defined.
> So it ostensibly should be more-or-less:
>   if (sched_getaffinity (0, sizeof (set), &set) == 0)
>     {
>       unsigned long count;
>       count = CPU_COUNT(&set);
>       if (count > 0)
>         return count;
>     }
> BUT... isolating that snippet gives me the expected results...if
> I initialize set to 0, which they *don't*.
> So I guess it's the missing initialization.

I think it's a kernel quirk. It looks like the kernel only fills the
part of the cpuset up to the actual number of cpus the kernel knows
about or supports. The syscall then returns a value (bits? bytes?)
indicating the amount filled, and userspace is responsible for
zero-filling the rest and returning zero. I'll look into the details
and fix it.
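
A minimal sketch of what that userspace handling could look like,
assuming the raw syscall's return value is a byte count (that is how the
Linux sched_setaffinity(2) man page documents it). The function names
getaffinity_zero_filled and count_affinity_cpus are hypothetical, and
this is an illustration of the idea, not musl's actual code:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: call the raw syscall, then zero whatever part
 * of the caller's buffer the kernel did not write. */
int getaffinity_zero_filled(pid_t pid, size_t size, cpu_set_t *set)
{
	/* Assumption: on success the raw syscall returns the number of
	 * bytes it copied into the mask, which may be less than size. */
	long ret = syscall(SYS_sched_getaffinity, pid, size, set);
	if (ret < 0)
		return -1;
	if ((size_t)ret < size)
		memset((unsigned char *)set + ret, 0, size - (size_t)ret);
	return 0;
}

/* Mimics the nproc snippet quoted above: the set is deliberately NOT
 * zeroed first; it is filled with 0xff to stand in for stack garbage. */
int count_affinity_cpus(void)
{
	cpu_set_t set;
	memset(&set, 0xff, sizeof set);
	if (getaffinity_zero_filled(0, sizeof set, &set) != 0)
		return -1;
	return CPU_COUNT(&set);
}
```

With the tail zeroed, a caller that never initializes its cpu_set_t
should still get a sane CPU_COUNT; without the memset in the wrapper,
the 0xff filler past the kernel-written bytes would be counted too.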

Rich
