Message-ID: <CAPfzE3a8hRbpcmD55D-y9nwaJn6YaD7BA9dhxM7OkwpnHeEc5w@mail.gmail.com>
Date: Fri, 12 Jul 2013 15:36:42 +1200
From: Andre Renaud <andre@...ewatersys.com>
To: musl@...ts.openwall.com
Subject: Re: Thinking about release
> I was unable to measure any difference in performance of your version
> with the prefetch hack versus simply:
>
> __asm__ __volatile__(
> "ldmia %1!,{a4,v1,v2,v3,v4,v5,v6,v7}\n\t"
> "stmia %0!,{a4,v1,v2,v3,v4,v5,v6,v7}\n\t"
> : "+r"(d), "+r"(s) :
> : "a4", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "memory");
What kind of machine were you using? I see a drop from 115MB/s to
105MB/s when I remove the prefetch, even using the code that you
suggested. This is on an Atmel AT91SAM9G45 (ARM926EJ-S @ 400MHz). I'm
assuming this is some subtlety in how the cache operates?
Putting the ldrhi back in restores the speed, i.e.:
__asm__ __volatile__(
"ldmia %1!,{a4,v1,v2,v3,v4,v5,v6,v7}\n\t"
"ldrhi r12, [%1]\n"
"stmia %0!,{a4,v1,v2,v3,v4,v5,v6,v7}\n\t"
: "+r"(d), "+r"(s) :
: "a4", "v1", "v2", "v3", "v4", "v5",
"v6", "v7", "r12", "memory");
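For anyone wanting to reproduce this kind of comparison, here is a
minimal sketch of a throughput measurement. It is not the benchmark
behind the numbers above: the helper name measure_copy_mbps, the buffer
size and the iteration count are assumptions for illustration, and
memcpy stands in for whichever copy loop is under test.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copy `len` bytes `iters` times and return the
 * observed throughput in MB/s (0.0 if the run was too short to time). */
static double measure_copy_mbps(size_t len, int iters)
{
    unsigned char *src = malloc(len);
    unsigned char *dst = malloc(len);
    assert(src && dst);

    /* Touch both buffers up front so page faults are not timed. */
    memset(src, 0xA5, len);
    memset(dst, 0, len);

    clock_t start = clock();
    for (int i = 0; i < iters; i++) {
        memcpy(dst, src, len);  /* swap in the copy loop under test */
        /* Compiler barrier so the copy is not optimised away. */
        __asm__ __volatile__("" : : "r"(dst) : "memory");
    }
    clock_t end = clock();

    /* Sanity check: the destination must match the source. */
    assert(memcmp(dst, src, len) == 0);
    free(src);
    free(dst);

    double secs = (double)(end - start) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        return 0.0;
    return ((double)len * iters) / (1024.0 * 1024.0) / secs;
}
```

Calling something like measure_copy_mbps(8 << 20, 32) once per variant
should be enough to show a gap of the size described above, provided
the buffer is well beyond the data cache size so that the copy is
actually hitting memory.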
Regards,
Andre