Date: Fri, 01 Mar 2013 22:33:19 -0600
From: Rob Landley <rob@...dley.net>
To: musl@...ts.openwall.com
Cc: musl@...ts.openwall.com
Subject: Re: ARM optimisations

On 02/28/2013 05:30:51 PM, Rich Felker wrote:
> On Fri, Mar 01, 2013 at 12:15:21PM +1300, Andre Renaud wrote:
> > Hi,
> > Can anyone tell me what the policy for musl is regarding ARM optimised
> > assembly implementations of functions such as memcpy/memmove? I notice
> > that there are i386/x86_64 versions for some of these. Doing some
> > simple testing on an ARM platform I found that an ARM asm
> > implementation of memcpy is ~80% faster than the C one currently in
> > MUSL (this is on an ARMv5, so no NEON instructions or similar).
> >
> > I don't think I'm capable of writing the optimised version entirely
> > myself, however there are various implementations floating around in
> > libraries such as bionic etc... Is it possible to have BSD licensed
> > code brought in to musl (which is MIT licensed)?
> 
> ARM optimizations are welcome as long as they're thoroughly tested,
> not heavily bloated, and support all v4 (including no-thumb) and later
> cpu models, either by using universally-available features or
> conditioning use of features on the .hidden __hwcap provided in musl.
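
[Illustrative sketch of the "conditioning use of features" idea quoted above. It is not musl's code: the feature bit DEMO_HWCAP_FAST_COPY, the function names, and the use of memcpy as a stand-in for an optimised routine are all assumptions made for the example. It only shows the general shape of choosing a routine from a hwcap-style bitmask while keeping a fallback that runs on every CPU model.]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical feature bit, for illustration only. Real bit values come
   from the kernel headers for the target architecture. */
#define DEMO_HWCAP_FAST_COPY (1UL << 0)

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Portable byte-by-byte fallback that works on every CPU model. */
static void *copy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Stand-in for an optimised routine; here it simply defers to memcpy. */
static void *copy_fast(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Pick an implementation from a hwcap-style bitmask. */
static copy_fn select_copy(unsigned long hwcap)
{
    copy_fn impl = (hwcap & DEMO_HWCAP_FAST_COPY) ? copy_fast : copy_generic;
    assert(impl != NULL);
    return impl;
}
```

[Outside libc, a program on Linux could obtain such a mask with getauxval(AT_HWCAP) from <sys/auxv.h>; inside musl the value would come from the hidden __hwcap mentioned above.]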

Out of curiosity, why armv4 no thumb?

I'd actually say that armv5 is probably the one to optimize for,  
because it's somewhere over 80% of the installed base of arm systems  
and generally provides an additional 25% speedup from armv4 to armv5.  
Anything lower than that can use C, anything newer than that can  
benefit from an armv5 version vs C.
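
[Illustrative sketch of that split, done at compile time. It assumes a compiler that predefines the ACLE macro __ARM_ARCH when targeting ARM; the demo_* names are hypothetical, and the "armv5" routine is only a placeholder that calls the C version so the sketch builds anywhere.]

```c
#include <stddef.h>

/* __ARM_ARCH is undefined on non-ARM targets, so those use plain C. */
#if defined(__ARM_ARCH) && __ARM_ARCH >= 5
#define DEMO_USE_ARMV5_COPY 1
#else
#define DEMO_USE_ARMV5_COPY 0
#endif

/* Generic C copy: the only version used below armv5. */
static void *demo_copy_c(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

#if DEMO_USE_ARMV5_COPY
/* Placeholder for a hand-written armv5 routine that would normally live
   in a separate assembly file; here it just defers to the C version. */
static void *demo_copy_armv5(void *dst, const void *src, size_t n)
{
    return demo_copy_c(dst, src, n);
}
#define demo_copy demo_copy_armv5
#else
#define demo_copy demo_copy_c
#endif
```

[Callers use demo_copy() and get whichever version the target selected, so only one optimised variant has to be maintained and tested.]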

The reason armv4 _without_ thumb isn't interesting is you need at  
least armv4t to use EABI, and I had to patch my compiler to make even  
that work because telling it EABI hardwired output to <= armv5l even  
though that wasn't technically required. (Presumably since fixed but  
the point is nobody _noticed_ for several years.)

Newer compilers have dropped support for OABI entirely, and armv4t  
systems aren't that common. (They existed, the Tin Can Tools Nail Board  
used one, but the generic C code works for them. Point is I'm not sure  
they're worth _optimizing_ for if it costs the vast majority of systems  
a 25% performance hit and we don't want to maintain multiple versions.  
If you _have_ an armv5 version, the armv4 one won't/shouldn't get much  
testing.)

I believe armv6 was mostly just SMP extensions, so not worth optimizing  
memcpy for. armv7 is nice but not ubiquitous the way armv5 is, and  
armv7 brings with it the "thumb2" instruction set which means you'd  
need 2 versions depending on what target you wanted to compile for...

Rob
