Date: Sat, 16 Mar 2019 10:28:19 -0400
From: Jonathan Rajotte-Julien <jonathan.rajotte-julien@...icios.com>
To: musl@...ts.openwall.com, Michael Jeanson <mjeanson@...icios.com>,
	Richard Purdie <richard.purdie@...uxfoundation.org>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Subject: Re: sysconf(_SC_NPROCESSORS_CONF) returns the wrong value

> > A simple command line to show this:
> > 
> >   taskset -c 0 nproc --all
> > 
> > This is equivalent to asking sysconf(_SC_NPROCESSORS_CONF).
> 
> the right way to check the sysconf from a shell is getconf

It was only to provide an easy reproducer. But you are right in that nproc
does not expose the complete picture. Thanks for taking the time to reproduce
the base problem.

I mixed up the _NPROCESSORS_ONLN result for glibc: it should have been 4 in the
previous email, since there are 4 online cpus even though the sched affinity is
set to cpu0 only. (nproc was not proving my point for the _NPROCESSORS_ONLN value.)

As you know, you can take a cpu offline easily, but we still need to account for
it in userspace tracing since it can be put back online (see Mathieu Desnoyers'
answer in this thread).

  echo 0 > /sys/devices/system/cpu/cpu3/online

Now on a glibc system:

  $ taskset -c 0 getconf -a |grep NPROC
  _NPROCESSORS_CONF                  4
  _NPROCESSORS_ONLN                  3

This is why we use _NPROCESSORS_CONF and expect it to represent the complete
picture. We do not care much for _NPROCESSORS_ONLN or affinity. This is why the
use of "nproc --all" was sufficient for me (I was wrong).
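
For anyone who prefers to check this from C rather than through getconf, here
is a minimal sketch that queries the same two sysconf values. The function
names (configured_cpus, online_cpus, print_cpu_counts) are only illustrative,
they are not taken from any of our code.

```c
#include <stdio.h>
#include <unistd.h>

/* CPUs configured on the system, including offline ones
 * (this is what we expect _SC_NPROCESSORS_CONF to report). */
long configured_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_CONF);
}

/* CPUs currently online. */
long online_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Print both values in the same layout as "getconf -a | grep NPROC". */
void print_cpu_counts(void)
{
    printf("_NPROCESSORS_CONF                  %ld\n", configured_cpus());
    printf("_NPROCESSORS_ONLN                  %ld\n", online_cpus());
}
```

Calling print_cpu_counts() from a main() run under "taskset -c 0" should, as
far as I can tell, give the same numbers as the getconf invocation above, since
getconf goes through sysconf as well.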

> 
> on glibc system
> $ taskset -c 0 getconf -a |grep NPROC
> _NPROCESSORS_CONF                  8
> _NPROCESSORS_ONLN                  8
> 
> on musl
> $ taskset -c 0 getconf -a |grep NPROC
> _NPROCESSORS_CONF                  1
> _NPROCESSORS_ONLN                  1
> 
> so both values differ (plain nproc returns the affinity number,
> *_ONLN is all the cpus that the kernel schedules to, *_CONF
> includes offline cpus that may be hotplugged)
> 
> these are documented linux extensions so i think musl should follow
> the linux sysconf man page. (but the semantics is not entirely clear
> e.g. there is /sys/devices/system/cpu/possible which can have larger
> number than echo /sys/devices/system/cpu/cpu[0-9]* |wc -w which is
> what glibc seems to be doing for *_CONF)
> 
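
To compare against /sys/devices/system/cpu/possible from a program, something
like the sketch below could be used. It assumes the file holds a plain cpu list
such as "0-3" or "0,2-3"; the helper names (count_cpulist, count_cpulist_file)
are hypothetical, and this is not what glibc or musl actually do.

```c
#include <stdio.h>
#include <stdlib.h>

/* Count the cpus in a list string such as "0-3" or "0,2-3".
 * Returns -1 on malformed input. */
int count_cpulist(const char *s)
{
    int count = 0;

    while (*s != '\0' && *s != '\n') {
        char *end;
        long first = strtol(s, &end, 10);
        if (end == s || first < 0)
            return -1;
        long last = first;
        s = end;
        if (*s == '-') {            /* a range "first-last" */
            s++;
            last = strtol(s, &end, 10);
            if (end == s || last < first)
                return -1;
            s = end;
        }
        count += (int)(last - first + 1);
        if (*s == ',')              /* another entry follows */
            s++;
        else if (*s != '\0' && *s != '\n')
            return -1;
    }
    return count;
}

/* Read the first line of a cpu list file and count its cpus.
 * Returns -1 if the file cannot be read or parsed. */
int count_cpulist_file(const char *path)
{
    char buf[4096];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, sizeof(buf), f);
    fclose(f);
    return line ? count_cpulist(buf) : -1;
}
```

Comparing count_cpulist_file("/sys/devices/system/cpu/possible") with
sysconf(_SC_NPROCESSORS_CONF) would show whether the two differ on a given
machine.
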
> i think we need to know why does a process care if musl returns
> the wrong number? or what are the valid uses of such a number?
> (there are heterogeneous systems like arm big-little, numa systems
> with many sockets, containers, virtualization,.. how deep may a
> user process need to go down in this rabbit hole?)

I'll refer you to Mathieu Desnoyers' answer regarding that (same thread).
It should be approved shortly by a moderator.

Cheers

-- 
Jonathan Rajotte-Julien
EfficiOS
