Date: Tue, 10 Feb 2015 20:21:26 -0500
From: Rich Felker <dalias@...c.org>
To: Denys Vlasenko <vda.linux@...glemail.com>
Cc: musl <musl@...ts.openwall.com>
Subject: Re: [PATCH 1/2] x86_64/memset: simple optimizations

On Wed, Feb 11, 2015 at 02:07:23AM +0100, Denys Vlasenko wrote:
> On Tue, Feb 10, 2015 at 10:37 PM, Rich Felker <dalias@...c.org> wrote:
> > OK. Based on some casual testing on my Celeron 847:
> >
> > - For small sizes, your patches make significant improvement, 20-30%.
> >
> > - For rep stosq path, the improvement is minimal (roughly 1-2 cycles).
> >
> > - Using 32-bit imul instead of 64-bit makes no difference at all.
>
> That's because Celeron 847 is a Sandy Bridge CPU. Only Intel's "big"
> CPUs starting from Nehalem have fast (and large in transistor count)
> integer multiplier capable of 3-cycle 64-bit multiply.
>
> Many other CPUs are worse, even Intel ones: Atoms are 13-cycle (!),
> Silvermont: 5 cycles. AMD's Bulldozers: 6 cycles, Bobcat: 6-7, Jaguar:
> 6, K10: 4 cycles.
>
> 32-bit imul is 3 or 4 cycles on all these CPUs (well, Atom has 5).

Thanks for the info, and for the patches, which I just committed.

Rich
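
For readers without the patch at hand: the imul being discussed is presumably the one that replicates memset's fill byte across a whole register before the stores. Below is a minimal C sketch of the two variants, assuming that is the use. The function names are made up for illustration, and this is not the code from the patch.

```c
#include <stdint.h>

/* Variant 1: one 64-bit multiply copies the byte into all 8 lanes.
 * This is the multiply that is slow on CPUs with a weak 64-bit imul. */
static uint64_t splat_mul64(unsigned char c)
{
    return (uint64_t)c * 0x0101010101010101ULL;
}

/* Variant 2: a 32-bit multiply fills 4 lanes, then a shift and an OR
 * duplicate the low half into the high half. */
static uint64_t splat_mul32(unsigned char c)
{
    uint32_t lo = (uint32_t)c * 0x01010101u;  /* cannot overflow: c <= 0xff */
    return ((uint64_t)lo << 32) | lo;
}
```

Both functions return the same value for every byte, so any difference between them is in latency only: the second trades a possibly slow 64-bit multiply for a 32-bit multiply plus two cheap ALU operations.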