Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sun, 3 Mar 2013 09:28:23 +1300
From: Andre Renaud <andre@...ewatersys.com>
To: musl@...ts.openwall.com
Subject: Re: Re: [PATCH] Added ARM optimised memcpy implementation

On 1 March 2013 20:26, Szabolcs Nagy <nsz@...t70.net> wrote:
> * nwmcsween@...il.com <nwmcsween@...il.com> [2013-02-28 19:14:28 -0800]:
>> Hmm what does this do that builtins cannot? What I'm asking is why is this more preformant than preload + word-at-a-time.
>>
>
> musl uses naive memcpy if src and dst are not congruent (src%4 != dst%4)
>
> the android asm takes care of that by fetching a 32bytes
> from src into registers and dumping it into dst with
> apropriate shifts
>
> and in the congruent case 32byte alignment is used (cacheline aligned)

It also takes advantage of the arm load/store multiple instructions,
allowing it to manipulate up to 8 32-bit words with a single
instruction.

Regards,
Andre

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.