Date: Wed, 24 Jun 2015 09:08:15 +0300 (MSK)
From: Alexander Monakov <amonakov@...ras.ru>
To: musl@...ts.openwall.com
Subject: Re: [PATCH 2/5] dynlink.c: compute modulus via magic
 multiplication

> > How can I do the last step, 'x-v*div' without it?
> 
> Ah yes. Would it be preferable to have that in struct udiv then,
> though? Then the caller never has to care about the divisor, and in
> the case where umod doesn't get inlined, the call will be more
> efficient (fewer args).

I doubt that; the caller needs that value ('nbuckets') anyway.

[...]
> Using the post-mul add rather than the saturated increment would make
> this work for 0xffffffff too.

A post-mul add is problematic on 32-bit archs.
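
For readers following along, here is a minimal sketch of what a post-mul add
variant can look like. It is not the code from the patch under review: it
follows the classic textbook scheme for division by an invariant 32-bit
divisor, and the names (struct udiv_sketch, udiv_sketch_init,
udiv_sketch_quot, umod_sketch) are made up for illustration. The add is
written as t + (x-t)/2 so that the intermediate sum stays within 32 bits,
which is the extra work a plain post-mul add would otherwise need on 32-bit.
The divisor is passed by the caller for the final 'x-v*div' step rather than
stored in the struct.

```c
#include <stdint.h>

/* Hypothetical names; not the struct udiv from the patch under review. */
struct udiv_sketch {
	uint32_t mul;  /* magic multiplier m' */
	int s1, s2;    /* shifts applied after the multiply */
};

/* Precompute the constants for a divisor div >= 1. */
static void udiv_sketch_init(struct udiv_sketch *p, uint32_t div)
{
	int l = 0;
	/* l = ceil(log2(div)) */
	while ((UINT64_C(1) << l) < div) l++;
	/* m' = floor(2^32 * (2^l - div) / div) + 1; this fits in 32 bits */
	p->mul = (uint32_t)((((UINT64_C(1) << l) - div) << 32) / div + 1);
	p->s1 = l > 1 ? 1 : l;
	p->s2 = l > 1 ? l - 1 : 0;
}

/* Quotient x / div using one 32x32->64 multiply, an add and two shifts. */
static uint32_t udiv_sketch_quot(uint32_t x, const struct udiv_sketch *p)
{
	/* high half of the product */
	uint32_t t = (uint32_t)(((uint64_t)p->mul * x) >> 32);
	/* post-mul add, written as t + (x-t)/2 so the sum cannot overflow */
	return (t + ((x - t) >> p->s1)) >> p->s2;
}

/* Modulus via the last step discussed above: x - v*div. */
static uint32_t umod_sketch(uint32_t x, uint32_t div, const struct udiv_sketch *p)
{
	uint32_t v = udiv_sketch_quot(x, p);
	return x - v * div;
}
```

A caller would run udiv_sketch_init once per divisor (e.g. when 'nbuckets'
becomes known) and then call umod_sketch(h, nbuckets, &u) for each lookup.
This form also covers power-of-two divisors and 0xffffffff without a special
case, at the cost of the extra subtract and shift.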
 
> Another obvious solution is not using the +32 offset so that the right
> shift can just be 31, but that pessimizes the code on 32-bit archs
> quite a bit.

I agree that a special case for a power-of-two divisor will fire too rarely,
so figuring out a replacement would be nice.

If you (or anyone) want to play with ideas, I'm attaching my test driver for
magic division that tests 2^32-1 inputs for a divisor given on the command
line (the main loop looks odd, but my goal was to have gcc vectorize it).

> > p->s1 check is for this reason: shift are relatively costly, and s1 is rarely
> > non-zero, so try to skip the shift if possible; in the rare case the pre-shift
> > is non-zero, the check allows to skip the saturating increment operation.
> 
> Shifts are costly? Are we talking about P4-era junk? ;-)

I primarily had modern Intel cores in mind, where, as far as I can see, shifts
by %ecx have latency 2.  But even ignoring that, there's the second part of my
argument.

Anyway, it's not trivial to measure the impact on those same "modern Intel
cores", so if anybody can say how this looks on a different platform, I'd
appreciate it.  Removing my if-else there is obviously beneficial for code size.

Alexander
View attachment "udiv.c" of type "TEXT/x-c" (1923 bytes)