Date: Thu, 16 Jan 2020 16:21:20 +0100
From: Natanael Copa <>
To: Andre McCurdy <>
Subject: Re: [PATCH 2/2] Add big-endian support to ARM assembler

On Wed, 15 Jan 2020 10:41:08 -0800
Andre McCurdy <> wrote:

> On Wed, Jan 15, 2020 at 7:46 AM Rich Felker <> wrote:
> > On Fri, Sep 13, 2019 at 01:38:34PM -0700, Andre McCurdy wrote:  
> > > On Fri, Sep 13, 2019 at 11:59 AM Rich Felker <> wrote:  
> > > > On Fri, Sep 13, 2019 at 11:44:32AM -0700, Andre McCurdy wrote:  
> > > > > Allow the existing ARM assembler memcpy implementation to be used for
> > > > > both big and little endian targets.  
> > > >
> > > > Nice. I don't want to merge this just before release, but as long as
> > > > it looks ok I should be able to review and merge it afterward.
> > > >
> > > > Note that I'd really like to replace this giant file with C using
> > > > inline asm just for the inner block copies and C for all the flow
> > > > control, but I don't mind merging this first as long as it's correct.  
> > >
> > > Sounds good. I'll wait for your feedback after the upcoming release.  
> >
> > Sorry this dropped off my radar. I'd like to merge at least the thumb
> > part since it's simple enough to review quickly and users have
> > actually complained about memcpy being slow on armv7 with -mthumb as
> > default.  
> Interesting. I wonder what reference the musl C code was compared
> against. From my own benchmarking I didn't find the musl assembler
> to be much faster than the C code. There are armv6 and maybe early
> armv7 CPUs where explicit prefetch instructions make a huge
> difference (much more so than C vs. assembler). Did the users who
> complained about musl memcpy() compare against a memcpy() which uses
> prefetch? For armv7, using NEON may help, although the latest armv7
> cores seem to perform very well with plain old C code too. There are
> lots of trade-offs, so it's impossible for a single implementation to
> be universally optimal. The "arm-mem" routines used on Raspberry Pi
> seem to be very fast for many targets, but unfortunately the armv6
> memcpy generates misaligned accesses, so it isn't suitable for armv5.
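As a rough illustration of the prefetch point above, here is a minimal sketch in plain C. It assumes GCC or Clang (for `__builtin_prefetch` and the `__may_alias__` attribute); `prefetch_memcpy`, `word_t` and `PREFETCH_AHEAD` are hypothetical names, and the 64-byte lookahead is an untuned guess, not a value taken from musl or arm-mem. The word loop is only taken when source and destination share the same alignment, so it does not rely on misaligned accesses (the armv5 concern).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example, assuming GCC or Clang. PREFETCH_AHEAD is an
 * untuned lookahead distance in bytes, chosen only for illustration. */
#define PREFETCH_AHEAD 64

/* Word type that may alias any object, since callers pass byte buffers. */
typedef uint32_t __attribute__((__may_alias__)) word_t;

void *prefetch_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    const uintptr_t mask = sizeof(word_t) - 1;

    /* Use word copies only if src and dst have the same alignment, so the
     * word loop never touches a misaligned address. */
    if ((((uintptr_t)d ^ (uintptr_t)s) & mask) == 0) {
        /* Copy single bytes until both pointers are word-aligned. */
        while (n && ((uintptr_t)d & mask)) {
            *d++ = *s++;
            n--;
        }
        word_t *dw = (word_t *)d;
        const word_t *sw = (const word_t *)s;
        /* Main loop: 16 bytes per iteration. */
        while (n >= 4 * sizeof(word_t)) {
            /* Hint (read, low temporal locality) that data further ahead
             * will be needed soon. The address is only formed while it is
             * within, or one past the end of, the source buffer. */
            if (n >= PREFETCH_AHEAD)
                __builtin_prefetch((const unsigned char *)sw + PREFETCH_AHEAD, 0, 0);
            dw[0] = sw[0];
            dw[1] = sw[1];
            dw[2] = sw[2];
            dw[3] = sw[3];
            dw += 4;
            sw += 4;
            n -= 4 * sizeof(word_t);
        }
        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }
    /* Remaining tail, or the whole copy when the alignments differ. */
    while (n--)
        *d++ = *s++;
    return dst;
}
```

Whether the prefetch hint helps at all depends on the core, so any such change would need benchmarking on the targets in question.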

The Alpine user reported it here:

I don't know whether the __builtin_memcpy or the libc version was used.
I do know that qemu was once surprised that `memcpy` used libc's
non-atomic version instead of gcc's atomic __builtin_memcpy. This
happened because Alpine uses fortify-headers as its FORTIFY_SOURCE
implementation.

Not sure if something similar happened here.
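To make the builtin vs. libc distinction concrete: a FORTIFY_SOURCE implementation typically wraps memcpy in a checking function, and the copy behind that wrapper may end up as a call into libc rather than the compiler's inline expansion of __builtin_memcpy. A minimal sketch, assuming GCC or Clang; `checked_memcpy` and `load_u32` are hypothetical names, and this is not how fortify-headers is actually written.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical FORTIFY_SOURCE-style wrapper, assuming GCC or Clang. It is
 * NOT fortify-headers' actual code. __builtin_object_size(p, 0) yields the
 * number of bytes from p to the end of the object it points into when the
 * compiler can determine it, and (size_t)-1 when it cannot. */
static inline void *checked_memcpy(void *dst, const void *src, size_t n)
{
    size_t dst_size = __builtin_object_size(dst, 0);
    if (dst_size != (size_t)-1 && n > dst_size)
        abort(); /* the copy would overflow the destination */
    /* Whether this becomes an inline copy or an out-of-line call into
     * libc's memcpy is up to the compiler and the headers in use. */
    return memcpy(dst, src, n);
}

/* Code that relies on the compiler's own expansion can request the builtin
 * explicitly. For a small constant size the compiler may emit a single
 * load, but that is an optimisation, not a guarantee of atomicity. */
static inline uint32_t load_u32(const void *p)
{
    uint32_t v;
    __builtin_memcpy(&v, p, sizeof v);
    return v;
}
```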

