Date: Wed, 25 Feb 2015 15:54:31 +0800
From: 邓尧 <torshie@...il.com>
To: musl@...ts.openwall.com
Subject: Re: x86[_64] memset and rep stos

I'm not an expert on micro-optimization, but why not use a dynamic
routine selection system that selects the optimal routine for a given
CPU during program initialization? The routine selection algorithm
could simply be a lookup in a predefined static table. IMO, only a very
small number of functions (like memset, memcpy) would benefit from such
a system, so there would be no code size overhead to worry about.

On Wed, Feb 25, 2015 at 2:12 PM, Rich Felker <dalias@...c.org> wrote:
> Doing some timings on the new proposed memset code, I found it was
> pathologically slow on my Atom D510 (32-bit) when reaching sizes
> around 2k - 16k. Like 4x slower than the old code. Apparently the
> issue is that the work being done to align the destination mod 4
> misaligns it mod higher powers of two, and "rep stos" performs
> pathologically badly when it's not cache-line-aligned, or something
> like that. On my faster 64-bit system alignment mod 16 also seems to
> make a difference, but less - it's 1.5x slower misaligned mod 16.
>
> I also found that on the 32-bit Atom, there seems to be a huge jump in
> speed at size 1024 -- sizes just below 1024 are roughly 2x slower.
> Since it otherwise doesn't make a measurable difference, it seems
> preferable _not_ to try to reduce the length of the rep stos to avoid
> writing the same bytes multiple times but simply use the max allowable
> length.
>
> Combined with the first issue, it seems we should "round up to a
> multiple of 16" rather than "add 16 then round down to a multiple of
> 16". Not only does this avoid reducing the length of the rep stos; it
> also preserves any higher-than-16 alignment that might be preexisting,
> in case even higher alignments are faster.
>
> Rich
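
To make the two rounding rules in the quoted message concrete, here is a
minimal sketch in C. It is not the actual musl memset code (that is
written in assembly); the function names are hypothetical, and it assumes
the pointer being adjusted is the address where the rep stos would begin.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round p up to the next multiple of 16.
 * An already-aligned p is returned unchanged, so any existing
 * alignment above 16 (32, 64, ...) is preserved. */
static uintptr_t round_up_16(uintptr_t p)
{
    return (p + 15) & ~(uintptr_t)15;
}

/* Hypothetical helper: add 16, then round down to a multiple of 16.
 * This always advances by 1..16 bytes, even when p was aligned,
 * which shortens the fill and can drop higher alignment. */
static uintptr_t add_16_round_down(uintptr_t p)
{
    return (p + 16) & ~(uintptr_t)15;
}

/* Spot checks on illustrative addresses (assumed values, not measurements). */
static void check_alignment_examples(void)
{
    /* Already 64-byte aligned: rounding up keeps it, add-then-round loses it. */
    assert(round_up_16(0x1040) == 0x1040);
    assert(add_16_round_down(0x1040) == 0x1050);

    /* Unaligned addresses: both rules give the same result. */
    assert(round_up_16(0x1001) == 0x1010);
    assert(add_16_round_down(0x1001) == 0x1010);
    assert(round_up_16(0x100f) == 0x1010);
    assert(add_16_round_down(0x100f) == 0x1010);
}
```

The two rules differ only when the address is already a multiple of 16:
rounding up leaves it in place, while adding 16 first skips a full 16
bytes and turns a 64-byte-aligned start into one that is only
16-byte-aligned. Calling check_alignment_examples() from a main()
exercises these cases.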