Date: Thu, 12 Feb 2015 21:36:26 +0100
From: Denys Vlasenko <vda.linux@...glemail.com>
To: musl <musl@...ts.openwall.com>
Subject: Re: [PATCH 1/2] x86_64/memset: avoid multiply insn if possible

On Thu, Feb 12, 2015 at 8:26 PM, Denys Vlasenko
<vda.linux@...glemail.com> wrote:
>> I'd actually like to extend the "short" range up to at least 32 bytes
>> using two 8-byte writes for the middle, unless the savings from using
>> 32-bit imul instead of 64-bit are sufficient to justify 4 4-byte
>> writes for the middle. On the cpu I tested on, the difference is 11
>> cycles vs 32 cycles for non-rep path versus rep path at size 32.
>
> I have mixed feelings about the short path.
>
> On one hand, it's elegant in a contrived way.
>
> On the other hand, the multiple overlapping stores
> must be causing hell in the store unit.
> I'm thinking, maybe there's a faster way to do that.

For example, like in the attached implementation.

This one does not perform eight stores to memory
to fill a 15-byte area; it needs only two.
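
To illustrate the two-store idea in C (a rough sketch only, not the
contents of the attached memset.s): for sizes from 8 to 16 bytes, one
8-byte store at the start and one 8-byte store ending at the last byte
cover the whole area. The function name fill_8_to_16 is made up for this
example, and the multiply is used here just to broadcast the fill byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not taken from the attached memset.s:
 * fill n bytes (8 <= n <= 16) at dst with byte c using two 8-byte stores. */
static void fill_8_to_16(unsigned char *dst, int c, size_t n)
{
    /* Broadcast the fill byte into all eight bytes of a 64-bit word. */
    uint64_t pattern = (uint64_t)(unsigned char)c * 0x0101010101010101ULL;

    /* Store 1 covers bytes [0, 8). memcpy expresses an unaligned store. */
    memcpy(dst, &pattern, 8);

    /* Store 2 covers bytes [n-8, n); it overlaps store 1 when n < 16. */
    memcpy(dst + n - 8, &pattern, 8);
}
```

For n = 15 the two stores overlap by a single byte, so the area is
filled with two writes. Whether this is actually faster than the
existing short path would have to be measured.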

Download attachment "memset.s" of type "application/octet-stream" (1136 bytes)
