Date: Tue, 10 Feb 2015 17:36:48 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH 1/2] x86_64/memset: simple optimizations

On Tue, Feb 10, 2015 at 04:37:56PM -0500, Rich Felker wrote:
> On Tue, Feb 10, 2015 at 10:08:29PM +0100, Denys Vlasenko wrote:
> > On Tue, Feb 10, 2015 at 9:50 PM, Rich Felker <dalias@...c.org> wrote:
> > > On Tue, Feb 10, 2015 at 06:30:56PM +0100, Denys Vlasenko wrote:
> > >> "and $0xff,%esi" is a six-byte insn (81 e6 ff 00 00 00), can use
> > >> 4-byte "movzbl %sil,%esi" (40 0f b6 f6) instead.
> > >> [...]
> > >
> > > Do you want to go ahead with these patches as-is, or consider some of
> > > the other ideas we discussed off-list like avoiding the 64-bit imul
> > > entirely in the small-n case? If you think that's easy as another
> > > incremental change I'll go ahead with these.
> > 
> > I think you can apply these patches without waiting
> > for potential future improvements.
> 
> OK. Based on some casual testing on my Celeron 847:
> 
> - For small sizes, your patches make significant improvement, 20-30%.
> 
> - For rep stosq path, the improvement is minimal (roughly 1-2 cycles).
> 
> - Using 32-bit imul instead of 64-bit makes no difference at all.
> 
> I'll review the patches again for correctness, but so far they look
> good, and it doesn't look like these are things we'd want to back out
> or rewrite for subsequent improvements anyway.
> 
> Thanks!

One more trivial change I might make: since the non-rep-stosq path is
faster for small sizes, changing the jb 1f to jbe 1f sends n == 16 down
that path as well, which significantly improves 16-byte memsets with no
additional code changes.

Rich
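
For readers following along, here is a rough C sketch of the two ideas
touched on in this thread: the jb versus jbe boundary at 16 bytes, and
the byte broadcast that the 64-bit imul performs. It is only an
illustration, not musl's code. The names pick_path_jb, pick_path_jbe,
broadcast_byte and the enum are hypothetical, and the cutoff of 16 is
inferred from the remark about 16-byte memsets rather than quoted from
the assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the two code paths discussed in the thread. */
enum path {
    PATH_SMALL, /* the non-rep-stosq path (label 1 in the asm) */
    PATH_BULK   /* the rep stosq path */
};

/* Models "cmp $16, n; jb 1f": unsigned "below", so only n < 16
 * branches to the small path and n == 16 falls through to rep stosq. */
static enum path pick_path_jb(size_t n)
{
    return n < 16 ? PATH_SMALL : PATH_BULK;
}

/* Models "cmp $16, n; jbe 1f": unsigned "below or equal", so n == 16
 * now takes the small path too. */
static enum path pick_path_jbe(size_t n)
{
    return n <= 16 ? PATH_SMALL : PATH_BULK;
}

/* Zero-extend the low byte of c (what "and $0xff" or "movzbl" does)
 * and replicate it into all eight bytes of a 64-bit word, which is
 * the job of the multiply by 0x0101010101010101. */
static uint64_t broadcast_byte(int c)
{
    return (uint64_t)(unsigned char)c * 0x0101010101010101ULL;
}
```

The two pick_path functions differ only at n == 16, which is why the
change costs nothing in code size: the branch opcode changes and
nothing else does.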
