Date: Fri, 12 Jul 2013 15:36:42 +1200
From: Andre Renaud <andre@...ewatersys.com>
To: musl@...ts.openwall.com
Subject: Re: Thinking about release

> I was unable to measure any difference in performance of your version
> with the prefetch hack versus simply:
>
>         __asm__ __volatile__(
>         "ldmia %1!,{a4,v1,v2,v3,v4,v5,v6,v7}\n\t"
>         "stmia %0!,{a4,v1,v2,v3,v4,v5,v6,v7}\n\t"
>         : "+r"(d), "+r"(s) :
>         : "a4", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "memory");

What kind of machine were you using? I see throughput drop from 115MB/s
to 105MB/s when I remove the prefetch, even with the code that you
suggested. This is on an Atmel AT91SAM9G45 (ARM926EJ-S @ 400MHz). I'm
assuming this is some subtlety in how the cache operates?
Putting the ldrhi back in restores the speed, i.e.:
        __asm__ __volatile__(
        "ldmia %1!,{a4,v1,v2,v3,v4,v5,v6,v7}\n\t"
        "ldrhi r12, [%1]\n\t"
        "stmia %0!,{a4,v1,v2,v3,v4,v5,v6,v7}\n\t"
        : "+r"(d), "+r"(s) :
        : "a4", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "r12", "memory");

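For anyone wanting to reproduce MB/s figures like the ones above, here is
a minimal sketch of a throughput harness in portable C. It is not the
benchmark I actually ran; the names copy_fn and copy_throughput_mbps are
made up for illustration, and it assumes clock() is fine-grained enough
for the chosen buffer size and iteration count.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature for the copy routine under test
 * (same shape as memcpy). */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Copies `size` bytes `iterations` times with `fn` and returns the
 * throughput in MB/s (1 MB = 1e6 bytes). Returns -1.0 on bad arguments,
 * allocation failure, a wrong copy, or a timer reading too coarse to
 * use. */
double copy_throughput_mbps(copy_fn fn, size_t size, int iterations)
{
    if (fn == NULL || size == 0 || iterations <= 0)
        return -1.0;

    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch both buffers first so page faults are not timed. */
    memset(src, 0xA5, size);
    memset(dst, 0, size);

    /* Volatile pointer so the compiler cannot fold the calls away. */
    copy_fn volatile call = fn;

    clock_t start = clock();
    for (int i = 0; i < iterations; i++)
        call(dst, src, size);
    clock_t end = clock();

    int copied_ok = memcmp(dst, src, size) == 0;
    free(src);
    free(dst);

    double secs = (double)(end - start) / CLOCKS_PER_SEC;
    if (!copied_ok || secs <= 0.0)
        return -1.0;

    return (double)size * (double)iterations / secs / 1e6;
}
```

Calling copy_throughput_mbps(memcpy, 4 << 20, 32) once per memcpy variant
gives comparable numbers. The buffer should be well above the data cache
size, otherwise the prefetch makes no difference to the result.
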
Regards,
Andre
