Date: Sun, 11 Aug 2013 13:14:31 +0200
From: Luca Barbato <lu_zero@...too.org>
To: musl@...ts.openwall.com
Subject: Re: Optimized C memcpy [updated]

On 11/08/13 10:13, Rich Felker wrote:
>> Unfortunately this case seems to be compiling to a call to memcpy on
>> powerpc (but nowhere else I found). So I may need to drop the special
>> case for 64-bit alignment. I wish there were some source of knowledge
>> about the cases that can trigger gcc's stupidity, though...
> 
> It turns out mips at certain optimization levels is also generating a
> memcpy for the structure assignments. I think I just need to drop all
> of the structure-assignment tricks and use a mildly unrolled loop with
> uint32_t units for the aligned case. This gives much worse performance
> on ARM, where gcc fails to generate the proper ldmia/stmia without the
> struct, but we have asm we can use for ARM anyway. On other archs, the
> struct copy code does not even seem to help. The simple integer loop
> works just as well.
> 
> I'll do some more experimenting and probably commit the ARM asm soon,
> followed by the C code once I get some better feedback on how it
> performs on real machines.
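A rough sketch of the aligned-case approach described above: no structure assignment, just a mildly unrolled loop over uint32_t units with a byte loop for the tail. This is not musl's code. The name sketch_memcpy is hypothetical, and the __may_alias__ typedef is an assumption about the toolchain (a GCC/clang extension) used to keep the word accesses clear of strict-aliasing problems.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/clang extension (assumed toolchain): accesses through this type may
 * alias any object, so reading bytes as 32-bit words is well defined. */
typedef uint32_t __attribute__((__may_alias__)) u32_alias;

/* Hypothetical name; an illustrative sketch, not musl's memcpy. */
void *sketch_memcpy(void *restrict dest, const void *restrict src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* Take the word path only when both pointers are 4-byte aligned. */
    if ((((uintptr_t)d | (uintptr_t)s) & 3) == 0) {
        /* Mild unrolling: four uint32_t units (16 bytes) per iteration,
         * using plain integer loads/stores instead of struct copies. */
        for (; n >= 16; n -= 16, d += 16, s += 16) {
            u32_alias *dw = (u32_alias *)d;
            const u32_alias *sw = (const u32_alias *)s;
            dw[0] = sw[0];
            dw[1] = sw[1];
            dw[2] = sw[2];
            dw[3] = sw[3];
        }
        /* Remaining whole words. */
        for (; n >= 4; n -= 4, d += 4, s += 4)
            *(u32_alias *)d = *(const u32_alias *)s;
    }

    /* Trailing bytes, or the entire copy when the pointers are misaligned. */
    for (; n; n--)
        *d++ = *s++;
    return dest;
}
```

Two caveats, both hedged: as noted above, gcc may not turn this into ldmia/stmia on ARM, and gcc's loop-distribution pass (controllable with -fno-tree-loop-distribute-patterns) can in some configurations recognize copy loops like the byte tail and emit a memcpy call anyway.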

What about sprinkling volatile here and there, so that gcc cannot turn the copies into calls to memcpy?
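One possible reading of the volatile suggestion, sketched under assumptions: route the word copies through volatile-qualified pointers, so the compiler must perform each load and store as written and cannot replace the loop with a library call. The helper name copy_words_volatile is hypothetical, the caller is assumed to guarantee 4-byte alignment, and the __may_alias__ typedef is a GCC/clang extension.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/clang extension (assumed): may alias any object type. */
typedef uint32_t __attribute__((__may_alias__)) u32_alias;

/* Hypothetical helper: copies nwords 32-bit words. Both pointers are
 * assumed 4-byte aligned. Because every access is volatile, each load and
 * store must be emitted individually; the loop cannot become memcpy(). */
void copy_words_volatile(void *dest, const void *src, size_t nwords)
{
    volatile u32_alias *d = dest;
    const volatile u32_alias *s = src;

    for (size_t i = 0; i < nwords; i++)
        d[i] = s[i];
}
```

The likely price is that volatile also forbids merging or reordering those accesses, so it would probably defeat the combined loads/stores (such as ldmia/stmia) and any vectorization that make the copy fast in the first place.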

lu
