Message-Id: <1375967031.16501.7506555.32466BB3@webmail.messagingengine.com>
Date: Thu, 08 Aug 2013 09:03:51 -0400
From: Andrew Bradford <andrew@...dfordembedded.com>
To: musl@...ts.openwall.com
Subject: Re: Optimized C memcpy

On Thu, Aug 8, 2013, at 08:59 AM, Andrew Bradford wrote:
> On Wed, Aug 7, 2013, at 02:21 PM, Rich Felker wrote:
> > Attached is the latest version of my "pure C" (modulo aliasing issues)
> > memcpy implementation. Compiled with -O3 on arm, it matches the
> > performance of the assembly language memcpy from Bionic for aligned
> > copies, and is only 25% slower than the asm for misaligned copies. And
> > it's only mildly larger. It uses the same principle as the Bionic
> > code: large block copies as aligned 32-bit units for aligned copies,
> > and aligned-load, bitshift-then-or, aligned-store for misaligned
> > copies. This should, in principle, work well on typical risc archs
> > that have plenty of registers but no misaligned load or store support.
> >
> > Unfortunately it only works on little-endian (I haven't thought much
> > yet about how it could be adapted to big-endian), but testing it on
> > qemu-ppc with the endian check disabled (thus wrong behavior)
> > suggested that this approach would work well on there too if we could
> > adapt it. Of course tests under qemu are not worth much; the ARM tests
> > were on real hardware and I'd like to see real-hardware results for
> > other archs (mipsel?) too.
> >
> > This is not a replacement for the ARM asm (which is still better), but
> > it's a step towards avoiding the need to have written-by-hand assembly
> > for every single new arch we add as a prerequisite for tolerable
> > performance.
>
> Sorry if this has been discussed before but Google isn't much help. Why
> is 32 bytes chosen as the block size over other sizes?
>
> It seems that the code would be fewer lines if blocks were 4 bytes,

Sorry, I now see why 4 byte blocks won't work due to the misalignment,
but 8 or 16 seem like they should be possible. Is it just the
evaluation of the for loop being expensive that's trying to be avoided?

Thanks,
Andrew
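
For readers without the attachment, the misaligned path described in the quoted message (aligned load, bitshift-then-or, aligned store) can be illustrated roughly as follows. This is only a sketch and not the attached code: the names `copy_misaligned_words` and `is_little_endian`, the word-array interface, and the one-word-per-iteration loop are all assumptions made for illustration. It assumes a little-endian target and a source that starts 1 to 3 bytes into an aligned 32-bit word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 on a little-endian host, 0 otherwise. */
static int is_little_endian(void)
{
    const uint32_t one = 1;
    return *(const unsigned char *)&one == 1;
}

/*
 * Hypothetical sketch of the shift-then-or idea, little-endian only.
 *
 * Copies nwords * 4 bytes to the aligned words dst[0..nwords-1]. The
 * source bytes start `offset` bytes (1..3) past the aligned word src[0],
 * so every access below is an aligned 32-bit load or store. Note that it
 * reads src[0] through src[nwords], i.e. nwords + 1 aligned words.
 */
static void copy_misaligned_words(uint32_t *dst, const uint32_t *src,
                                  unsigned offset, size_t nwords)
{
    assert(offset >= 1 && offset <= 3);

    const unsigned rshift = 8 * offset;   /* drop bytes before the start */
    const unsigned lshift = 32 - rshift;  /* pull bytes from the next word */

    uint32_t lo = src[0];
    for (size_t i = 0; i < nwords; i++) {
        uint32_t hi = src[i + 1];
        /* On little-endian, the low bytes of the result come from the
         * tail of `lo` and the high bytes from the head of `hi`. */
        dst[i] = (lo >> rshift) | (hi << lshift);
        lo = hi;  /* reuse the word already loaded */
    }
}
```

A loop like this performs one loop test per 4 bytes copied. Handling several words per iteration (eight words would make a 32-byte block) spreads that overhead over more data at the cost of more live registers; whether that is the actual reason for the 32-byte choice is not stated in this message.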