Date: Wed, 25 Mar 2020 21:49:11 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] aarch64: add optimized memcpy, memmove and memset

On Wed, Mar 25, 2020 at 10:45:45PM +0100, Szabolcs Nagy wrote:
> minimal edits to upstream version for easier updates
> and because this code was benchmarked across many cores.
>
> gcc generates slow code for the current c implementations.
>
> the integer memcpy was chosen instead of the simd one,
> this performs better on little cores, i think this is
> the more conservative choice for now.

I think this was discussed before on IRC, and I'm not particularly
opposed to these, especially since aarch64 is one of the most important
archs these days. However, I would really like to avoid adding more asm
source files with the function flow written in asm when the only thing
that really needs to benefit from asm is the inner loop body. I know
nothing has happened on this front since we last talked about it, so
it's very possible that the answer is just "we need something with
decent performance in the short term and nobody has cycles to spend on
doing it better right now, and so we should just use the asm files"...

> note: there are upcoming security architectures which
> may mean updates to these functions (BTI - landing pads,
> PAUTH - return address signing, MTE - 16byte tag granule
> may affect optimized strcmp etc, not relevant yet), but
> runtime support for these will need other libc changes.

If these mattered, they'd be another reason to prefer having the
function in C with minimal inline asm or just extensions for unaligned
loads/stores, but MTE is the only one of these that's interesting, and
it doesn't conflict with any current code in musl at all (nothing does
unaligned overreads; they have to be assumed to be able to fault
anyway).

Rich
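
For illustration only, here is a minimal sketch of the alternative described above: the function flow stays in C, and the only non-standard piece is a compiler extension for unaligned loads/stores. This is not musl code and not anything proposed in the thread. The names sketch_copy, load64, store64 and struct u64u are hypothetical, and the packed/may_alias attributes assume a GNU C compatible compiler (gcc or clang).

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: GNU C attributes. "packed" drops the alignment requirement
 * to 1, "may_alias" exempts accesses from strict-aliasing rules. */
struct __attribute__((__packed__, __may_alias__)) u64u {
    uint64_t v;
};

/* Hypothetical helpers: unaligned 64-bit load and store. On targets that
 * permit unaligned access the compiler can typically emit a single
 * load/store instruction for each. */
static inline uint64_t load64(const unsigned char *p)
{
    return ((const struct u64u *)p)->v;
}

static inline void store64(unsigned char *p, uint64_t x)
{
    ((struct u64u *)p)->v = x;
}

/* Hypothetical forward copy for non-overlapping buffers. All control flow
 * is plain C; only the loads and stores rely on the extension. */
void *sketch_copy(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Bulk loop: 16 bytes per iteration, never reading past s + n. */
    for (; n >= 16; n -= 16, d += 16, s += 16) {
        uint64_t a = load64(s);
        uint64_t b = load64(s + 8);
        store64(d, a);
        store64(d + 8, b);
    }

    /* Tail: remaining 0..15 bytes, one at a time. */
    for (; n; n--)
        *d++ = *s++;

    return dst;
}
```

The sketch never reads beyond the source buffer, which matches the remark that unaligned overreads have to be assumed able to fault. A real libc version would also need to keep the compiler from turning the copy loops back into calls to memcpy (for example by building freestanding), and whether the generated inner loop is competitive with hand-written asm is exactly the open question in the thread.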