Message-ID: <CAPfzE3bRBkreBToPM3Rbf9dgTume=fnTfPM0X_bm4MJXFWGzag@mail.gmail.com>
Date: Fri, 9 Aug 2013 08:17:22 +1200
From: Andre Renaud <andre@...ewatersys.com>
To: musl@...ts.openwall.com
Subject: Re: Optimized C memcpy

Hi Rich,
From the looks of the code, compared to the original bionic assembly,
I assume the remaining speed difference comes from the C code doing
8 discrete store operations, whereas the bionic code batches the words
up in registers and writes them out with a single multiple-store. Would
it be worth having a structure with 8 32-bit ints in it, and doing a
single write of one of these to d (hoping that gcc will catch it and
turn it into an stm instruction)? Unfortunately, that runs the risk that
gcc will decide a 32-byte copy is worth calling memcpy for, resulting in
the recursion issue you've seen previously.
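
Something along these lines is what I have in mind. This is only an
untested sketch: the names block32 and copy_blocks32 are made up for
illustration, it assumes both pointers are already 4-byte aligned, and
the may_alias attribute is a gcc extension. Whether gcc really emits
ldm/stm for the assignment, rather than eight separate stores or a call
to memcpy, would have to be checked in the generated assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-byte block: eight 32-bit words.
 * may_alias (a gcc extension) lets us access arbitrary memory
 * through this type without violating strict aliasing. */
#ifdef __GNUC__
#define BLOCK_MAY_ALIAS __attribute__((__may_alias__))
#else
#define BLOCK_MAY_ALIAS
#endif

struct BLOCK_MAY_ALIAS block32 {
    uint32_t w[8];
};

/* Copy n_blocks blocks of 32 bytes each.
 * Assumption: dest and src are both 4-byte aligned and do not overlap. */
static void copy_blocks32(void *dest, const void *src, size_t n_blocks)
{
    struct block32 *d = dest;
    const struct block32 *s = src;

    /* One struct assignment per iteration: the hope is that the
     * compiler lowers this to a load-multiple/store-multiple pair. */
    for (; n_blocks; n_blocks--)
        *d++ = *s++;
}
```

The point of the struct is that the whole 32-byte move is visible to the
compiler as one operation, so it is free to pick the widest store it
has. The downside is the one above: the same visibility lets it decide
to call memcpy instead.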

Regards,
Andre

On 9 August 2013 03:15, Rich Felker <dalias@...ifal.cx> wrote:
> On Thu, Aug 08, 2013 at 09:03:51AM -0400, Andrew Bradford wrote:
>> > > This is not a replacement for the ARM asm (which is still better), but
>> > > it's a step towards avoiding the need to have written-by-hand assembly
>> > > for every single new arch we add as a prerequisite for tolerable
>> > > performance.
>> >
>> > Sorry if this has been discussed before but Google isn't much help.  Why
>> > is 32 bytes chosen as the block size over other sizes?
>> >
>> > It seems that the code would be fewer lines if blocks were 4 bytes,
>>
>> Sorry, I now see why 4 byte blocks won't work due to the misalignment,
>> but 8 or 16 seem like they should be possible.
>> Is it just the evaluation of the for loop being expensive that's trying
>> to be avoided?
>
> It's purely empirical reasons. 8 is the smallest that would work
> without extra logic to shuffle w/x. 16 runs 50% slower than the ARM
> asm. 32 runs only 25% slower than the ARM asm.
>
> Rich



-- 
Bluewater Systems - An Aiotec Company

Andre Renaud
andre@...ewatersys.com          5 Amuri Park, 404 Barbadoes St
www.bluewatersys.com            PO Box 13 889, Christchurch 8013
www.aiotec.co.nz                New Zealand
Phone: +64 3 3779127            Freecall: Australia 1800 148 751
Fax:   +64 3 3779135            USA 1800 261 2934
