Message-ID: <20130712031615.GS29800@brightrain.aerifal.cx>
Date: Thu, 11 Jul 2013 23:16:15 -0400
From: Rich Felker <dalias@...ifal.cx>
To: musl@...ts.openwall.com
Subject: Re: Thinking about release
On Fri, Jul 12, 2013 at 10:34:31AM +1200, Andre Renaud wrote:
> I've rejiggled it a bit, and it appears to be working. I wasn't
> entirely sure what you meant about the proper constraints. There is an
> additional reason why 8*4 was used for the align - to force the whole
> loop to work in cache-line blocks. I've now done this explicitly on
> the lead-in by doing the first few copies as 32-bit, then going to the
> full cache-line asm. This has the same performance as the fully native
> assembler. However to get that I had to use the same trick that the
> native assembler uses - doing a load of the next block prior to
> storing this one. I'm a bit concerned that this would mean we'd be
> doing a read that was out of bounds, and I can't entirely see why this
> wouldn't be happening with the existing assembler (but I'm presuming
> it doesn't). Any comments on this side of it?
I was unable to measure any difference in performance of your version
with the prefetch hack versus simply:
	__asm__ __volatile__(
		"ldmia %1!,{a4,v1,v2,v3,v4,v5,v6,v7}\n\t"
		"stmia %0!,{a4,v1,v2,v3,v4,v5,v6,v7}\n\t"
		: "+r"(d), "+r"(s) :
		: "a4", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "memory");
in the inner loop.
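For context, here is a rough sketch of how a loop built around that asm might be structured, with the 32-bit lead-in described in the quoted text. It is not the patch under discussion and not musl's memcpy. The name copy_aligned, the assumption that both pointers are already 4-byte aligned, the ARM-mode guard, and the portable memcpy fallback are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes, assuming dest and src are both
 * 4-byte aligned. Lead-in by words, then 32-byte blocks, then a tail. */
static void copy_aligned(void *dest, const void *src, size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	assert((uintptr_t)d % 4 == 0 && (uintptr_t)s % 4 == 0);

	/* Lead-in: 32-bit copies until the destination is 32-byte aligned. */
	while ((uintptr_t)d % 32 && n >= 4) {
		memcpy(d, s, 4);
		d += 4; s += 4; n -= 4;
	}

	/* Block loop: a block is loaded only while at least 32 bytes
	 * remain, so nothing past the end of the source is read. */
	while (n >= 32) {
#if defined(__arm__) && !defined(__thumb__)
		/* Load eight words, store them; both pointers advance by 32. */
		__asm__ __volatile__(
			"ldmia %1!,{a4,v1,v2,v3,v4,v5,v6,v7}\n\t"
			"stmia %0!,{a4,v1,v2,v3,v4,v5,v6,v7}\n\t"
			: "+r"(d), "+r"(s) :
			: "a4", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "memory");
#else
		/* Stand-in so the sketch also builds on non-ARM targets. */
		memcpy(d, s, 32);
		d += 32; s += 32;
#endif
		n -= 32;
	}

	/* Tail: whatever is left, one byte at a time. */
	while (n--) *d++ = *s++;
}
```

Without the load-ahead, each pass through the block loop only touches the 32 bytes it is about to store, which is why the out-of-bounds question does not come up in this form.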
Rich