Date: Wed, 11 Feb 2015 02:07:23 +0100
From: Denys Vlasenko <>
To: musl <>
Subject: Re: [PATCH 1/2] x86_64/memset: simple optimizations

On Tue, Feb 10, 2015 at 10:37 PM, Rich Felker <> wrote:
> OK. Based on some casual testing on my Celeron 847:
> - For small sizes, your patches make significant improvement, 20-30%.
> - For rep stosq path, the improvement is minimal (roughly 1-2 cycles).
> - Using 32-bit imul instead of 64-bit makes no difference at all.

That's because Celeron 847 is a Sandy Bridge CPU. Only Intel's "big"
CPUs starting from Nehalem have a fast (and large in transistor count)
integer multiplier capable of a 3-cycle 64-bit multiply.

Many other CPUs are slower at 64-bit multiply, even Intel ones: Atom
takes 13 cycles (!), Silvermont 5 cycles. AMD's Bulldozer takes 6 cycles,
Bobcat 6-7, Jaguar 6, K10 4 cycles.

32-bit imul is 3 or 4 cycles on all these CPUs (except Atom, where it
takes 5).
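
To make the trade-off concrete, here is a rough C sketch of the two ways
to broadcast the fill byte into a 64-bit word. It is only an illustration:
the function names are made up, and the exact instruction sequence in the
patch may differ. The first variant needs one 64-bit multiply; the second
needs a 32-bit multiply plus a shift and an OR, which should be cheaper on
the CPUs with slow 64-bit multipliers listed above.

```c
#include <stdint.h>

/* Hypothetical helper: broadcast c into all 8 bytes with one 64-bit
 * multiply (maps to a 64-bit imul on x86_64). */
static uint64_t broadcast_mul64(unsigned char c)
{
    return (uint64_t)c * 0x0101010101010101ULL;
}

/* Hypothetical helper: broadcast c into 4 bytes with a 32-bit multiply,
 * then copy the low half into the high half with a shift and an OR. */
static uint64_t broadcast_mul32(unsigned char c)
{
    uint32_t lo = (uint32_t)c * 0x01010101U;
    return ((uint64_t)lo << 32) | lo;
}
```

Both functions return the same value for every byte, e.g. 0xABABABABABABABAB
for c = 0xAB, so the choice between them is purely a latency question.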
