Date: Thu, 29 Sep 2016 14:13:36 -0400
From: Rich Felker
Subject: Re: Model specific optimizations?

On Thu, Sep 29, 2016 at 07:08:01PM +0200, Markus Wichmann wrote:
> On Thu, Sep 29, 2016 at 11:23:54AM -0400, Rich Felker wrote:
> > What kind of version-checking? Not all systems even give you a way to
> > version-check.
> To the extent that they don't, they also don't give you a way to check
> for features (again, except for executing the instructions and seeing if
> you get SIGILL). PowerPC (sorry, but that's what I've spent a lot of time
> on recently) for instance only has the PVR (Processor Version Register).
> No software I could find online uses another way to detect the features
> of the CPU.
> And for systems to not give you a way of detecting system version at
> runtime and then define optional parts of the ISA would be very dickish,
> in my opinion. That basically guarantees optional functions won't be
> used at all.

On Linux it's supposed to be the kernel which detects availability of
features (either by feature-specific cpu flags or translating a model
to flags) but I don't see anything for fsqrt on ppc. :-( How/why did
they botch this?
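
For reference, a minimal sketch of how userspace normally consumes such
kernel-provided flags on Linux, via getauxval(AT_HWCAP). The helper name
cpu_has_feature and the mask passed to it are hypothetical; as noted
above, there is no fsqrt bit on ppc to test for.

```c
#include <sys/auxv.h>

/* Hypothetical helper: returns nonzero if any bit in `mask` is set in
 * the hwcap word the kernel placed in the aux vector. Which bit means
 * what is arch-specific; no real flag name is assumed here. */
static int cpu_has_feature(unsigned long mask)
{
	unsigned long hwcap = getauxval(AT_HWCAP);
	return (hwcap & mask) != 0;
}
```

This only works when the kernel actually exports a bit for the feature;
otherwise the choice is between decoding the model/PVR by hand and
trapping SIGILL.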

> > This code contains data races. In order to be safe under musl's memory
> > model, sqrtfn would have to be volatile and should probably be written
> > via a_cas_p. It also then has to have type void* and be cast to/from
> > function pointer type. See clock_gettime.c.
> Well, yes, I was just throwing shit at a wall to see what sticks. We
> could also move the function pointer dispatch into a pthread_once block
> or something. I don't know if any caches need to be cleared then or not.

pthread_once/call_once would be the nice clean abstraction to use, but
it's mildly to considerably more expensive, currently involving a full
barrier. There's a nice technical report on how that can be eliminated
but it requires TLS, which is also expensive on some archs. In cases
like this where there's no state other than the function pointer,
relaxed atomics can simply be used on the reading end and then they're
always fast.
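
A minimal sketch of that last pattern, written with C11 <stdatomic.h>
rather than musl's internal volatile/a_cas_p primitives (so this is not
how it would look inside musl). All names here (dispatch_sqrt,
sqrt_resolve, cpu_has_fsqrt, hw_sqrt, soft_sqrt) are hypothetical, and
the sqrt bodies are stand-ins.

```c
#include <stdatomic.h>

typedef double (*sqrt_fn)(double);

/* Stand-in for the generic C implementation (Newton iteration from
 * above; sketch only, no NaN/negative handling). */
static double soft_sqrt(double x)
{
	if (!(x > 0)) return 0;
	double r = x > 1 ? x : 1;
	for (;;) {
		double n = 0.5 * (r + x / r);
		if (n >= r) break;   /* stopped decreasing: converged */
		r = n;
	}
	return r;
}

/* Placeholder for a wrapper around the hardware instruction. */
static double hw_sqrt(double x) { return soft_sqrt(x); }

/* Hypothetical probe; a real one would consult the kernel or the PVR. */
static int cpu_has_fsqrt(void) { return 0; }

static double sqrt_resolve(double x);

/* Starts out pointing at the resolver. */
static _Atomic(sqrt_fn) sqrt_impl = sqrt_resolve;

static double sqrt_resolve(double x)
{
	sqrt_fn f = cpu_has_fsqrt() ? hw_sqrt : soft_sqrt;
	/* Racing threads all store the same value, and nothing else is
	 * published through the pointer, so relaxed ordering suffices. */
	atomic_store_explicit(&sqrt_impl, f, memory_order_relaxed);
	return f(x);
}

double dispatch_sqrt(double x)
{
	sqrt_fn f = atomic_load_explicit(&sqrt_impl, memory_order_relaxed);
	return f(x);
}
```

The fast path is one relaxed load plus an indirect call, with no barrier
and no TLS access.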

> > For some archs, gas produces an error or tags the .o file as needing a
> > certain ISA level if you use an instruction that's not present in the
> > baseline ISA. I'm not sure if this is an issue here or not.
> As I said, fsqrt is defined in the baseline ISA, just marked as
> optional.

We're just using words differently. To me, baseline ISA means the part
of the ISA that all models (or at least all usable models; e.g. for
x86, pre-486 is not usable without trap-and-emulate of cmpxchg so we
consider 486 the baseline ISA) support.

> So any PowerPC implementation is free to include it or not.
> There are a lot of optional features, and if the gas people made a
> different subarch for each combination of them, they'd be here all day.

They've actually done that for some archs...

> > I think it's the #define sqrt soft_sqrt that's a hack. The inclusion
> > itself is okay and would be the right way to do this for sure if it
> > were just a compile-time check and not a runtime one.
> I meant the define. While it is hacky, it does mean no code duplication
> and only one externally facing symbol regarding sqrt(), which is the one
> defined by the standard. Although I am abusing the little-known rule
> about C that if a function is declared as static in its prototype, and
> the function definition doesn't have an explicit storage class
> specifier, then the function will be static. Most style guides (rightly)
> say to have the storage class specifier in the prototype and the
> definition be the same, because otherwise this gets confusing fast.
> I guess it goes to show that you should know your language even in the
> parts you barely ever use (because forbidden), because they might come
> in handy at some point.

Yes, I was a bit surprised at first and had to recall the rule, but I
knew right away that the code was either valid or a constraint violation.
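
For anyone else who has to look it up, a small self-contained sketch of
the rule in question. The generic source is inlined here instead of
being pulled in with #include, the body is a stand-in, and sqrt_entry is
a hypothetical name for the externally visible function.

```c
/* The first declaration is static, so soft_sqrt has internal linkage. */
static double soft_sqrt(double);

/* Rename the generic definition, as the proposed patch does before
 * including the generic source file. */
#define sqrt soft_sqrt

/* No storage-class specifier here. Because the earlier static
 * declaration is visible, this definition keeps internal linkage. */
double sqrt(double x)
{
	if (!(x > 0)) return 0;      /* stand-in body: Newton iteration */
	double r = x > 1 ? x : 1;
	for (;;) {
		double n = 0.5 * (r + x / r);
		if (n >= r) break;
		r = n;
	}
	return r;
}

#undef sqrt

/* Hypothetical public entry point that uses the now-static helper. */
double sqrt_entry(double x)
{
	return soft_sqrt(x);
}
```

Reversing the order (a plain declaration first, static later) is the
invalid direction.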

Anyway, I would have no objection right away to a patch like this where
the choice is made at compile time based on predefined macros set by
-march. For runtime choice I think we need to discuss motivation. Are
you trying to do a powerpc-based distro where you need a universal build
that works optimally on various models? Or would just compiling for the
right -march meet your needs?
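
A rough sketch of what the compile-time variant could look like. It
assumes _ARCH_PPCSQ is the macro the compiler predefines when the
selected CPU has fsqrt; check that name against your compiler before
relying on it. portable_sqrt and sqrt_compile_time are hypothetical
names, and the portable body is only a stand-in for the generic code.

```c
/* Stand-in for the generic C implementation (Newton iteration; sketch
 * only, no NaN/negative handling). */
static double portable_sqrt(double x)
{
	if (!(x > 0)) return 0;
	double r = x > 1 ? x : 1;
	for (;;) {
		double n = 0.5 * (r + x / r);
		if (n >= r) break;
		r = n;
	}
	return r;
}

double sqrt_compile_time(double x)
{
#if defined(_ARCH_PPCSQ)
	/* Assumed macro: the target selected by -march/-mcpu has fsqrt. */
	__asm__ ("fsqrt %0, %1" : "=d"(x) : "d"(x));
	return x;
#else
	/* Any other target: fall back to the C code. */
	return portable_sqrt(x);
#endif
}
```

With this approach the decision costs nothing at runtime, but each
binary is only optimal for the models covered by the -march it was
built with.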

