Date: Wed, 25 Feb 2015 15:37:12 -0500
From: Rich Felker <>
Subject: Updated draft of improved memset.s for i386

Here's a new version of the improved i386 memset.s. The main changes are:

- Alignment to 16-byte boundary rather than 4-byte for rep stosl.

- Preserving existing over-alignment via rounding up instead of adding
  16 then rounding down.

- Special-casing already-aligned case (saves a few cycles when already
  aligned, maybe 5-10% total run time at sizes just above the rep
  stosl cutoff such as 64).

- Keeping the rep stosl run-length as long as possible rather than
  trying to avoid duplicate stores. This helps a lot (>2x improvement)
  at size 1024 on Atom and shouldn't hurt in general.
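
To illustrate the second point, here is a minimal C sketch of the two
alignment strategies. It is only the address arithmetic, not the code
from the attached draft, and the function names are made up for this
example.

```c
#include <stdint.h>

/* Hypothetical helper: round p up to the next 16-byte boundary.
 * An address that is already 16-byte aligned is returned unchanged,
 * so any existing over-alignment (32, 64, ...) is preserved. */
uintptr_t round_up_16(uintptr_t p)
{
    return (p + 15) & ~(uintptr_t)15;
}

/* Hypothetical helper: add 16, then round down to a 16-byte boundary.
 * An already-aligned address is pushed forward by a full 16 bytes,
 * which can turn a 32- or 64-byte-aligned start into one that is only
 * 16-byte aligned. */
uintptr_t add_then_round_down_16(uintptr_t p)
{
    return (p + 16) & ~(uintptr_t)15;
}
```

For an unaligned address such as 0x1001 both give 0x1010. They differ
only when the input is already aligned: 0x1000 stays at 0x1000 with
rounding up, but becomes 0x1010 with add-then-round-down.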

At this point I think it should be a net improvement on nearly any x86.

I've checked and it passes the current tests in libc-test. I'm not
entirely sure the tests cover all the cases we need though. For the
32-bit version, tests need to cover:

- All sizes 0-62; alignment doesn't matter.

- Sufficiently many sizes >=63 to get all alignments mod 16 for both
  the length and the base pointer.

For the 64-bit versions (either Denys's latest or mine) we also need
coverage for all sizes 63-126 (alignment doesn't matter) and
sufficiently many past that to test all alignments mod 16 for both
length and base. For the sake of robustness and future-proofing, we
should probably be testing all base and length alignments mod 32 or
more up to size 256 or larger.
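
A rough sketch of what such an exhaustive test could look like in C is
below. This is not the libc-test code; the names check_one and
run_memset_checks, the limits MAX_ALIGN and MAX_SIZE, and the guard
size are all assumptions picked for illustration. It exercises whatever
memset the program is linked against, and checks every base offset
mod 32 against every length from 0 to MAX_SIZE, including the bytes
just outside the target range.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ALIGN 32   /* assumed: base offsets 0..31 are tested */
#define MAX_SIZE  320  /* assumed: lengths 0..320 are tested */
#define GUARD     64   /* assumed: guard bytes on each side */

/* Call memset through a volatile pointer so the compiler cannot
 * replace the call with inline stores or a builtin. */
static void *(*volatile memset_ptr)(void *, int, size_t) = memset;

/* Fill n bytes at base offset off with c; return 0 if exactly those
 * bytes changed and the return value is correct, 1 otherwise. */
static int check_one(size_t off, size_t n, int c)
{
    static _Alignas(64) unsigned char buf[GUARD + MAX_ALIGN + MAX_SIZE + GUARD];
    unsigned char *p = buf + GUARD + off;
    unsigned char fill = (unsigned char)~c;  /* always differs from c */
    size_t i;

    for (i = 0; i < sizeof buf; i++)
        buf[i] = fill;

    if (memset_ptr(p, c, n) != p)
        return 1;

    for (i = 0; i < sizeof buf; i++) {
        int inside = i >= GUARD + off && i < GUARD + off + n;
        unsigned char want = inside ? (unsigned char)c : fill;
        if (buf[i] != want)
            return 1;
    }
    return 0;
}

/* Run every (offset, length) pair; return the number of failures. */
int run_memset_checks(void)
{
    int failures = 0;
    size_t off, n;

    for (off = 0; off < MAX_ALIGN; off++)
        for (n = 0; n <= MAX_SIZE; n++)
            failures += check_one(off, n, 0xa5);
    return failures;
}
```

Calling run_memset_checks() from a test's main and failing on a nonzero
result would be enough. Since the buffer is 64-byte aligned and GUARD is
64, the base pointer's alignment mod 32 is exactly off, and iterating
all lengths up to MAX_SIZE covers every length residue mod 32 as well
as the small-size ranges listed above.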


View attachment "memset-draft3.s" of type "text/plain" (1171 bytes)
