Date: Thu, 29 Sep 2016 16:21:26 +0200
From: Markus Wichmann <nullplan@....net>
To: musl@...ts.openwall.com
Subject: Model specific optimizations?

Hi there,

I wanted to ask if there is any wish for the near future to support
model-specific optimizations. What I mean by that is multiple
implementations of the same function, where the best implementation is
decided at run-time.

One simple example would be PowerPC's fsqrt instruction. The PowerPC
Book 1 defines it as optional and provides no way to tell whether the
currently running processor supports it, short of executing it and
seeing whether you get a SIGILL.

A cursory DuckDuckGo search revealed that Apple uses the instruction as
its sqrt implementation if it detects the CPU capability for it.
However, it only detects that capability by checking the PVR for
known-good bit patterns (currently, the only known PowerPC cores to
support this instruction are the 970 and 970FX, which have a version
field of 0x39 and 0x3c, respectively). x86 and its derived architectures
at least have the cpuid instruction to check for some features, and
admittedly, there are a lot of defined bits. However, glibc's ifunc
initialization function (which selects the implementation) also does a
lot of work finding out the precise make and model of the CPU to set
some more flags.

The reason I ask is that lots of ISAs define optional parts that grow
in popularity until they're seen in all current practical
implementations. Like how x87 started out as a separate device but has
been a fixed part of x86 since the later days of the 486. Same with
MMX, SSE, SSE2. None of these are mandated by the ABI, but they are
available in all practical implementations. And musl is never going to
be able to utilize that in its current form. Oh, alright, the compiler
might support it, but that's different. I also suspect the fsqrt
instruction will be available in more future PowerPC implementations.

If we were to go this route, the question is how to go about it. First
the detection method: Stuff like cpuid or AT_HWCAP are pretty nice,
because they allow for the detection of a feature, whereas version
checking only allows one to find known-good implementations. The latter
means there's a list of known-good values, and that list has to be kept
up-to-date. However, the latter is also pretty much always possible,
while the former isn't always available. The kernel doesn't check for
fsqrt availability, for example.
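
To illustrate the feature-detection side, here is a minimal sketch of an
AT_HWCAP check. It assumes Linux and a libc that provides getauxval() in
<sys/auxv.h>; the helper names hwcap_has() and cpu_has_feature() are made
up for this example, and the mask to pass is architecture-specific and
not shown here.

```c
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP (Linux) */

/* Return 1 if every bit in mask is set in hwcap, else 0.
 * Hypothetical helper, kept separate so it can be checked
 * without depending on the machine it runs on. */
static int hwcap_has(unsigned long hwcap, unsigned long mask)
{
    return (hwcap & mask) == mask;
}

/* Test the capability word the kernel passed in the auxiliary vector.
 * Which bit means what depends on the architecture. */
static int cpu_has_feature(unsigned long mask)
{
    return hwcap_has(getauxval(AT_HWCAP), mask);
}
```

This only works for features the kernel chooses to report, which is
exactly the limitation described above for fsqrt.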

Then organization: Are we going the glibc route, which gathers all
indirect functions in a single section and initializes all of the
pointers at startup (__libc_init_main()), or do we put these checks
separately in each function?
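
As a rough sketch of the first option (one central initialization at
startup), something like the following could work. All names here
(math_dispatch, init_math_dispatch, my_sqrt) are hypothetical and exist
neither in glibc nor in musl, the detection result is passed in as a
plain flag, and both implementations are portable placeholders rather
than real code.

```c
/* Placeholder for the generic C implementation: Newton iteration,
 * only valid for finite x > 0. */
static double sqrt_generic(double x)
{
    double r = x > 1.0 ? x : 1.0;   /* start at or above the root */
    for (;;) {
        double n = 0.5 * (r + x / r);
        if (n >= r)                 /* stopped decreasing: converged */
            break;
        r = n;
    }
    return r;
}

/* Placeholder for the hardware version; a real one would use fsqrt. */
static double sqrt_hw(double x)
{
    return sqrt_generic(x);
}

/* One table for all dispatched functions, with safe defaults. */
struct math_dispatch {
    double (*sqrt)(double);
};
static struct math_dispatch dispatch = { sqrt_generic };

/* Hypothetical hook, assumed to be called once from startup code
 * with the result of whatever detection method is used. */
static void init_math_dispatch(int cpu_has_fsqrt)
{
    dispatch.sqrt = cpu_has_fsqrt ? sqrt_hw : sqrt_generic;
}

/* Public entry point: always one indirect call, never a check. */
double my_sqrt(double x)
{
    return dispatch.sqrt(x);
}
```

The detection cost is paid once, even by programs that never call the
function, whereas a per-function check defers it to the first call.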

To make a practical example, we could implement sqrt() for PowerPC like
this:

static double soft_sqrt(double);
static double hard_sqrt(double);
static double init_sqrt(double);
static double (*sqrtfn)(double) = init_sqrt;

double sqrt(double x) {
    return sqrtfn(x);
}

static double init_sqrt(double x) {
    unsigned long pvr;
    unsigned long ver;
    /* Read the Processor Version Register (SPR 287) */
    asm ("mfspr %0, 287" : "=r"(pvr));
    ver = (pvr >> 16) & 0xffff;
    /* XXX: Add more values for cores with the fsqrt instruction here */
    if (0
        || ver == 0x39  /* PowerPC 970 */
        || ver == 0x3c  /* PowerPC 970FX */
    )
        sqrtfn = hard_sqrt;
    else
        sqrtfn = soft_sqrt;

    return sqrtfn(x);
}


static double hard_sqrt(double x) {
    double r;
    asm ("fsqrt %0, %1": "=d"(r) : "d"(x));
    return r;
}

#define sqrt soft_sqrt
#include "../sqrt.c"


The problem with this is that the same thing would have to be repeated
for sqrtf(), and the same list of known values would have to be
maintained twice, although we could make it a real list (an array, I
mean) and get rid of that issue. But it does add quite a bit of code
and the overhead of an indirect function call, and at the moment it is
only going to be useful to a few people.
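
The array idea could look roughly like this. It is only a sketch: the
names fsqrt_versions and pvr_has_fsqrt are invented for this example,
and the two entries are just the values mentioned earlier in this mail.
The PVR value is passed in as a parameter so the lookup itself does not
depend on the machine.

```c
#include <stddef.h>

/* PVR version fields of cores known to implement fsqrt.
 * Shared by sqrt() and sqrtf(), so it is maintained in one place. */
static const unsigned short fsqrt_versions[] = {
    0x39,   /* PowerPC 970 */
    0x3c,   /* PowerPC 970FX */
};

/* Return 1 if the version field (upper 16 bits) of pvr is listed. */
static int pvr_has_fsqrt(unsigned long pvr)
{
    unsigned long ver = (pvr >> 16) & 0xffff;
    size_t n = sizeof fsqrt_versions / sizeof fsqrt_versions[0];

    for (size_t i = 0; i < n; i++)
        if (fsqrt_versions[i] == ver)
            return 1;
    return 0;
}
```

Both init functions would then call pvr_has_fsqrt() with the value read
from the PVR instead of carrying their own chain of comparisons.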

Also, the inclusion here is a hack. But I couldn't think of a better
way.

Thoughts?

Ciao,
Markus
