Date: Wed, 25 Feb 2015 15:37:12 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Updated draft of improved memset.s for i386

Here's a new version of the improved i386 memset.s. The main changes
are:

- Alignment to 16-byte boundary rather than 4-byte for rep stosl.

- Preserving existing over-alignment via rounding up instead of
  adding 16 then rounding down.

- Special-casing already-aligned case (saves a few cycles when already
  aligned, maybe 5-10% total run time at sizes just above the rep
  stosl cutoff such as 64).

- Keeping the rep stosl run-length as long as possible rather than
  trying to avoid duplicate stores. This helps a lot (>2x improvement)
  at size 1024 on Atom and shouldn't hurt in general.

At this point I think it should be a net improvement on nearly any x86
system.

I've checked and it passes the current tests in libc-test. I'm not
entirely sure the tests cover all the cases we need though. For the
32-bit version, tests need to cover:

- All sizes 0-62; alignment doesn't matter.

- Sufficiently many sizes >=63 to get all alignments mod 16 for both
  the length and the base pointer.

For the 64-bit versions (either Denys's latest or mine) we also need
coverage for all sizes 63-126 (alignment doesn't matter) and
sufficiently many past that to test all alignments mod 16 for both
length and base.

For the sake of robustness and future-proofing, we should probably be
testing all base and length alignments mod 32 or more up to size 256
or larger.

Rich

View attachment "memset-draft3.s" of type "text/plain" (1171 bytes)
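The coverage described above (every base alignment mod 32 combined with every length up to 256 or larger) could be checked with a harness along the following lines. This is only a rough sketch in C, not the libc-test code: the names check_memset, faulty_memset and run_checks, the guard size, and the upper bound of 320 are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ALIGN_RANGE 32   /* base offsets 0..31 hit every alignment mod 32 */
#define MAX_LEN     320  /* assumed upper bound; spans every length mod 32 past 256 */
#define GUARD       64   /* sentinel bytes kept on each side of the target */

/* Runs fn over all (offset, length) pairs and returns how many failed.
 * A pair fails if the return value is wrong, a byte inside the range
 * was not set, or a byte outside the range was modified. */
static int check_memset(void *(*fn)(void *, int, size_t))
{
    /* Over-aligned buffer, so the offset equals the alignment mod 32. */
    static _Alignas(64) unsigned char buf[GUARD + ALIGN_RANGE + MAX_LEN + GUARD];
    int failures = 0;

    for (size_t off = 0; off < ALIGN_RANGE; off++) {
        for (size_t len = 0; len <= MAX_LEN; len++) {
            size_t start = GUARD + off;
            volatile unsigned char *fill = buf;
            int ok = 1;

            /* Volatile stores keep this loop from being turned into a
             * call to the very function under test. */
            for (size_t i = 0; i < sizeof buf; i++)
                fill[i] = 0xAA;

            if (fn(buf + start, 0x55, len) != buf + start)
                ok = 0;

            for (size_t i = 0; i < sizeof buf; i++) {
                unsigned char want =
                    (i >= start && i < start + len) ? 0x55 : 0xAA;
                if (buf[i] != want) {
                    ok = 0;
                    break;
                }
            }
            failures += !ok;
        }
    }
    return failures;
}

/* Hypothetical broken implementation: drops the final byte once the
 * length reaches 63, mimicking a bug in a large-size code path. */
static void *faulty_memset(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    size_t limit = (n >= 63) ? n - 1 : n;
    for (size_t i = 0; i < limit; i++)
        p[i] = (unsigned char)c;
    return dst;
}

/* Example use: the C library's memset should pass every combination,
 * and the deliberately broken one should be caught. */
static void run_checks(void)
{
    assert(check_memset(memset) == 0);
    assert(check_memset(faulty_memset) > 0);
}
```

Because offsets 0-31 are combined with every length from 0 to 320, this also covers the smaller requirements listed for the 32-bit and 64-bit versions (all sizes 0-62 and 63-126, and all alignments mod 16 beyond that). The guard regions are there to catch stores that land outside the requested range.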