Date: Tue, 10 Feb 2015 20:21:26 -0500
From: Rich Felker <dalias@...c.org>
To: Denys Vlasenko <vda.linux@...glemail.com>
Cc: musl <musl@...ts.openwall.com>
Subject: Re: [PATCH 1/2] x86_64/memset: simple optimizations

On Wed, Feb 11, 2015 at 02:07:23AM +0100, Denys Vlasenko wrote:
> On Tue, Feb 10, 2015 at 10:37 PM, Rich Felker <dalias@...c.org> wrote:
> > OK. Based on some casual testing on my Celeron 847:
> >
> > - For small sizes, your patches make a significant improvement, 20-30%.
> >
> > - For the rep stosq path, the improvement is minimal (roughly 1-2 cycles).
> >
> > - Using 32-bit imul instead of 64-bit makes no difference at all.
> 
> That's because the Celeron 847 is a Sandy Bridge CPU. Only Intel's "big"
> CPUs starting from Nehalem have a fast (and large in transistor count)
> integer multiplier capable of a 3-cycle 64-bit multiply.
> 
> Many other CPUs are worse, even Intel ones: Atoms are 13-cycle (!),
> Silvermont: 5 cycles. AMD's Bulldozers: 6 cycles, Bobcat: 6-7, Jaguar:
> 6, K10: 4 cycles.
> 
> 32-bit imul is 3 or 4 cycles on all these CPUs (well, Atom has 5).

Thanks for the info, and for the patches, which I just committed.

Rich
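
For readers without the patch at hand: the thread does not spell out what the
multiply is for. The sketch below assumes it is the usual memset step that
replicates the fill byte across a register by multiplying with a constant of
repeated 0x01 bytes. It is written in C rather than the assembly under
discussion, and the function names (broadcast64, broadcast32x2,
check_broadcast) are hypothetical, not taken from musl.

```c
#include <assert.h>
#include <stdint.h>

/* Replicate the low byte of c into all 8 bytes with one 64-bit multiply.
 * Each byte of the product is c * 1, so no carries cross byte lanes. */
static uint64_t broadcast64(int c)
{
    return (uint64_t)(unsigned char)c * 0x0101010101010101ULL;
}

/* Same result using only a 32-bit multiply, then duplicating the
 * 32-bit pattern into the upper half with a shift and an OR. */
static uint64_t broadcast32x2(int c)
{
    uint32_t lo = (uint32_t)(unsigned char)c * 0x01010101U;
    return ((uint64_t)lo << 32) | lo;
}

/* Self-check: both variants agree for every possible fill byte. */
static void check_broadcast(void)
{
    for (int c = 0; c < 256; c++)
        assert(broadcast64(c) == broadcast32x2(c));
}
```

Whether the narrower multiply pays off depends on the CPU, as the cycle counts
quoted above suggest; the sketch only shows that the two forms give the same
fill pattern.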
