Date: Fri, 16 Sep 2016 18:16:03 -0400
From: Rich Felker <dalias@...c.org>
To: Rob Landley <rob@...dley.net>
Cc: "j-core@...ore.org" <j-core@...ore.org>, musl@...ts.openwall.com
Subject: Re: Re: [J-core] Aligned copies and cacheline conflicts?

On Wed, Sep 14, 2016 at 10:36:45PM -0400, Rich Felker wrote:
> On Wed, Sep 14, 2016 at 07:58:52PM -0500, Rob Landley wrote:
> > On 09/14/2016 07:34 PM, Rich Felker wrote:
> > > I could put a fork of memcpy.c in sh/memcpy.c and work on it there and
> > > only merge it back to the shared one if others test it on other archs
> > > and find it beneficial (or at least not harmful).
> >
> > Both musl and the kernel need it. And yes at the moment it seems
> > architecture-specific, but it's a _big_ performance difference...
>
> I actually think it's justifiable to have in the generic C memcpy,
> from a standpoint that the generic C shouldn't assume an N-way (N>1,
> i.e. not direct mapped) associative cache. Just need to make sure
> changing it doesn't make gcc do something utterly idiotic for other
> archs, I guess. I'll take a look at this.

Attached is a draft memcpy I'm considering for musl. Compared to the
current one, it:

1. Works on 32 bytes per iteration, and adds barriers between the load
   phase and store phase to preclude cache line aliasing between src
   and dest with a direct-mapped cache.

2. Equally unrolls the misaligned src/dest cases.

3. Adjusts the offsets used in the misaligned src/dest loops to all be
   multiples of 4, with the adjustments to make that work outside the
   loops. This helps compilers generate indexed addressing modes (e.g.
   @(4,Rm)) rather than having to resort to arithmetic.

4. Factors the misaligned cases into a common inline function to
   reduce code duplication.

Comments welcome.

Rich

View attachment "memcpy-draft.c" of type "text/plain" (2705 bytes)
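
For readers without the attachment, the load-phase/store-phase split from
item 1 might look roughly like the sketch below. It is not the contents of
memcpy-draft.c. The function name sketch_copy_aligned, the u32 typedef and
the COMPILER_BARRIER macro are made up for illustration, the barrier and
the may_alias attribute use GCC/Clang syntax, and the sketch assumes both
pointers are already 4-byte aligned (it does not cover the misaligned cases
from items 2 to 4).

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC/Clang extension so word accesses may alias any type. */
typedef uint32_t __attribute__((__may_alias__)) u32;

/* Assumption: compiler-only barrier (GCC/Clang inline asm). It keeps the
 * compiler from moving memory accesses across it; it emits no instruction. */
#define COMPILER_BARRIER() __asm__ __volatile__("" : : : "memory")

/* Hypothetical helper: copy n bytes, dest and src both 4-byte aligned. */
void *sketch_copy_aligned(void *restrict dest, const void *restrict src,
                          size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	for (; n >= 32; s += 32, d += 32, n -= 32) {
		/* Load phase: read all 32 bytes before storing anything. */
		u32 w0 = *(const u32 *)(s + 0);
		u32 w1 = *(const u32 *)(s + 4);
		u32 w2 = *(const u32 *)(s + 8);
		u32 w3 = *(const u32 *)(s + 12);
		u32 w4 = *(const u32 *)(s + 16);
		u32 w5 = *(const u32 *)(s + 20);
		u32 w6 = *(const u32 *)(s + 24);
		u32 w7 = *(const u32 *)(s + 28);

		/* Keep the compiler from interleaving loads with stores. */
		COMPILER_BARRIER();

		/* Store phase: write the 32 bytes back out. */
		*(u32 *)(d + 0) = w0;
		*(u32 *)(d + 4) = w1;
		*(u32 *)(d + 8) = w2;
		*(u32 *)(d + 12) = w3;
		*(u32 *)(d + 16) = w4;
		*(u32 *)(d + 20) = w5;
		*(u32 *)(d + 24) = w6;
		*(u32 *)(d + 28) = w7;

		/* Keep the next iteration's loads from being hoisted above
		 * these stores. */
		COMPILER_BARRIER();
	}

	/* Tail: fewer than 32 bytes left, copy bytewise. */
	while (n--)
		*d++ = *s++;

	return dest;
}
```

The idea is that when src and dest map to the same line of a direct-mapped
cache, alternating single loads and stores would evict that line on every
access, whereas grouping the loads and then the stores touches each line
once per phase. Note that the barrier here only constrains the compiler,
not the hardware.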