Date: Thu, 29 Sep 2016 11:52:44 -0700
From: Adhemerval Zanella <>
Subject: Re: Model specific optimizations?

On 29/09/2016 11:13, Rich Felker wrote:
> On Thu, Sep 29, 2016 at 07:08:01PM +0200, Markus Wichmann wrote:
>> On Thu, Sep 29, 2016 at 11:23:54AM -0400, Rich Felker wrote:
>>> What kind of version-checking? Not all systems even give you a way to
>>> version-check.
>> To the extent that they don't, they also don't give you a way to check
>> for features (again, except for executing the instructions and seeing if
>> you get SIGILL). PowerPC (sorry, but that's what I've spent a lot of
>> time on recently) for instance only has the PVR (Processor Version Register).
>> No software I could find online uses another way to detect the features
>> of the CPU.
>> And for systems to not give you a way of detecting system version at
>> runtime and then define optional parts of the ISA would be very dickish,
>> in my opinion. That basically guarantees optional functions won't be
>> used at all.
> On Linux it's supposed to be the kernel which detects availability of
> features (either by feature-specific cpu flags or translating a model
> to flags) but I don't see anything for fsqrt on ppc. :-( How/why did
> they botch this?

Maybe because recent powerpc work in the kernel is POWER-oriented, where
fsqrt has been defined since POWER4.  However, some more recent Freescale
chips (such as e5500 and e6500) also decided not to add the fsqrt instruction.

With GCC you can check for _ARCH_PPCSQ to see if the current arch flags
allow fsqrt.  At runtime I presume programs can check for the hwcap bit
PPC_FEATURE_POWER4; however, it does not help on non-POWER chips which
do support fsqrt.
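
A minimal sketch of both checks, assuming Linux with getauxval()
available. The helper names (hwcap_says_power4, may_use_fsqrt) are
hypothetical, and the fallback value for PPC_FEATURE_POWER4 is copied
from the kernel's powerpc asm/cputable.h so the sketch builds without
that header:

```c
#include <sys/auxv.h>   /* getauxval, AT_HWCAP (Linux: glibc >= 2.16 or musl) */

/* Fallback definition; the value matches the kernel's powerpc
 * asm/cputable.h, which this sketch does not include. */
#ifndef PPC_FEATURE_POWER4
#define PPC_FEATURE_POWER4 0x00080000UL
#endif

/* Compile-time answer: GCC defines _ARCH_PPCSQ when the selected
 * -mcpu/-m flags permit fsqrt. */
#ifdef _ARCH_PPCSQ
#define HAVE_FSQRT_AT_BUILD 1
#else
#define HAVE_FSQRT_AT_BUILD 0
#endif

/* Hypothetical helper: test the POWER4 bit in a given hwcap word. */
int hwcap_says_power4(unsigned long hwcap)
{
    return (hwcap & PPC_FEATURE_POWER4) != 0;
}

/* Hypothetical helper: 1 if fsqrt may be used, 0 if unknown or absent.
 * A 0 here does not prove fsqrt is missing: non-POWER chips that do
 * implement fsqrt will not set the POWER4 bit. */
int may_use_fsqrt(void)
{
    if (HAVE_FSQRT_AT_BUILD)
        return 1;
#if defined(__powerpc__)
    return hwcap_says_power4(getauxval(AT_HWCAP));
#else
    return 0;   /* the hwcap bit has a different meaning on other archs */
#endif
}
```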

Another option, a bit hacky, would be to issue fsqrt and trap SIGILL...

>>> This code contains data races. In order to be safe under musl's memory
>>> model, sqrtfn would have to be volatile and should probably be written
>>> via a_cas_p. It also then has to have type void* and be cast to/from
>>> function pointer type. See clock_gettime.c.
>> Well, yes, I was just throwing shit at a wall to see what sticks. We
>> could also move the function pointer dispatch into a pthread_once block
>> or something. I don't know if any caches need to be cleared then or not.
> pthread_once/call_once would be the nice clean abstraction to use, but
> it's mildly to considerably more expensive, currently involving a full
> barrier. There's a nice technical report on how that can be eliminated
> but it requires TLS, which is also expensive on some archs. In cases
> like this where there's no state other than the function pointer,
> relaxed atomics can simply be used on the reading end and then they're
> always fast.
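
A sketch of that relaxed-atomics dispatch, written with C11
<stdatomic.h> rather than musl's internal a_cas_p on a volatile void *
(which is what clock_gettime.c does). All names here are hypothetical,
soft_sqrt is only an illustrative Newton iteration, and cpu_has_fsqrt
is a placeholder for whatever detection gets chosen:

```c
#include <stdatomic.h>

typedef double (*sqrt_fn)(double);

/* Stand-in software implementation: Newton iteration, for illustration
 * only (not accurate or fast enough for a real libm). */
static double soft_sqrt(double x)
{
    double r = x > 1.0 ? x : 1.0;
    if (x <= 0.0)
        return 0.0;
    for (int i = 0; i < 64; i++)
        r = 0.5 * (r + x / r);
    return r;
}

/* Stand-in for the hardware path; a real one would execute fsqrt. */
static double hard_sqrt(double x) { return soft_sqrt(x); }

/* Placeholder detection: always reports "no fsqrt" in this sketch. */
static int cpu_has_fsqrt(void) { return 0; }

static double sqrt_init(double x);

/* Starts out pointing at the resolver, which replaces itself. */
static _Atomic(sqrt_fn) sqrt_impl = sqrt_init;

/* First call(s) land here. Racing threads all compute and store the same
 * pointer, and no other state is published, so relaxed ordering suffices. */
static double sqrt_init(double x)
{
    sqrt_fn f = cpu_has_fsqrt() ? hard_sqrt : soft_sqrt;
    atomic_store_explicit(&sqrt_impl, f, memory_order_relaxed);
    return f(x);
}

/* Public entry point: one relaxed load, then an indirect call. */
double my_sqrt(double x)
{
    sqrt_fn f = atomic_load_explicit(&sqrt_impl, memory_order_relaxed);
    return f(x);
}
```
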
>>> For some archs, gas produces an error or tags the .o file as needing a
>>> certain ISA level if you use an instruction that's not present in the
>>> baseline ISA. I'm not sure if this is an issue here or not.
>> As I said, fsqrt is defined in the baseline ISA, just marked as
>> optional.
> We're just using words differently. To me, baseline ISA means the part
> of the ISA that all models (or at least all usable models; e.g. for
> x86, pre-486 is not usable without trap-and-emulate of cmpxchg so we
> consider 486 the baseline ISA) support.
>> So any PowerPC implementation is free to include it or not.
>> There are a lot of optional features, and if the gas people made a
>> different subarch for each combination of them, they'd be here all day.
> They've actually done that for some archs...
>>> I think it's the #define sqrt soft_sqrt that's a hack. The inclusion
>>> itself is okay and would be the right way to do this for sure if it
>>> were just a compile-time check and not a runtime one.
>> I meant the define. While it is hacky, it does mean no code duplication
>> and only one externally facing symbol regarding sqrt(), which is the one
>> defined by the standard. Although I am abusing the little known rule
>> about C that if a function is declared as static in its prototype, and
>> the function definition doesn't have an explicit storage class
>> specifier, then the function will be static. Most style guides (rightly)
>> say to have the storage class specifier in the prototype and the
>> definition be the same, because otherwise this gets confusing fast.
>> I guess it goes to show that you should know your language even in the
>> parts you barely ever use (because forbidden), because they might come
>> in handy at some point.
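
For reference, the linkage rule can be shown in a few lines. The names
below (soft_fabs, my_fabs, call_soft) are hypothetical and the function
body is a trivial stand-in for an included generic source file; the
#define mirrors the "#define sqrt soft_sqrt" trick discussed above:

```c
/* Declared static first: this fixes the linkage as internal. */
static double soft_fabs(double);

/* Rename the public name before "including" the generic source. */
#define my_fabs soft_fabs

/* Stand-in for the included generic source. There is no storage-class
 * specifier here, so the definition behaves as if declared extern, and
 * an extern declaration takes the linkage of the visible prior
 * declaration: soft_fabs stays internal to this translation unit. */
double my_fabs(double x)
{
    return x < 0 ? -x : x;
}

#undef my_fabs

/* Only this wrapper is visible to other translation units. */
double call_soft(double x)
{
    return soft_fabs(x);
}
```
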
> Yes, I was a bit surprised first and had to recall the rule, but I
> knew the code was either valid or a constraint violation right away.
> Anyway, I would have no objection right away to doing a patch like
> this that's decided at compile-time based on predefined macros set by
> -march. For runtime choice I think we need to discuss motivation. Are
> you trying to do a powerpc-based distro where you need a universal
> that works optimally on various models? Or would just
> compiling for the right -march meet your needs?
> Rich
