Date: Wed, 15 Jan 2020 11:35:59 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH 1/2] Add Thumb2 support to ARM assembler memcpy

On Fri, Sep 13, 2019 at 11:44:31AM -0700, Andre McCurdy wrote:
> For Thumb2 compatibility, replace two instances of a single
> instruction "orr with a variable shift" with the two instruction
> equivalent. Neither of the replacements are in a performance critical
> loop.
> ---
>  src/string/arm/memcpy.c    |  2 +-
>  src/string/arm/memcpy_le.S | 17 ++++++++++-------
>  2 files changed, 11 insertions(+), 8 deletions(-)
> 
> diff --git a/src/string/arm/memcpy.c b/src/string/arm/memcpy.c
> index f703c9bd..041614f4 100644
> --- a/src/string/arm/memcpy.c
> +++ b/src/string/arm/memcpy.c
> @@ -1,3 +1,3 @@
> -#if __ARMEB__ || __thumb__
> +#if __ARMEB__
>  #include "../memcpy.c"
>  #endif
> diff --git a/src/string/arm/memcpy_le.S b/src/string/arm/memcpy_le.S
> index 9cfbcb2a..64bc5f9e 100644
> --- a/src/string/arm/memcpy_le.S
> +++ b/src/string/arm/memcpy_le.S
> @@ -1,4 +1,4 @@
> -#if !__ARMEB__ && !__thumb__
> +#if !__ARMEB__
>  
>  /*
>   * Copyright (C) 2008 The Android Open Source Project
> @@ -40,8 +40,9 @@
>   * This file has been modified from the original for use in musl libc.
>   * The main changes are: addition of .type memcpy,%function to make the
>   * code safely callable from thumb mode, adjusting the return
> - * instructions to be compatible with pre-thumb ARM cpus, and removal
> - * of prefetch code that is not compatible with older cpus.
> + * instructions to be compatible with pre-thumb ARM cpus, removal of
> + * prefetch code that is not compatible with older cpus and support for
> + * building as thumb 2.
>   */
>  
>  .syntax unified
> @@ -241,8 +242,9 @@ non_congruent:
>  	beq     2f
>  	ldr     r5, [r1], #4
>  	sub     r2, r2, #4
> -	orr     r4, r3, r5,             lsl lr
> -	mov     r3, r5,                 lsr r12
> +	mov     r4, r5, lsl lr
> +	orr     r4, r4, r3
> +	mov     r3, r5, lsr r12
>  	str     r4, [r0], #4
>  	cmp     r2, #4
>  	bhs     1b

This is outside of loops and not a hot path.

> @@ -348,8 +350,9 @@ less_than_thirtytwo:
>  
>  1:      ldr     r5, [r1], #4
>  	sub     r2, r2, #4
> -	orr     r4, r3, r5,             lsl lr
> -	mov     r3,     r5,                     lsr r12
> +	mov     r4, r5, lsl lr
> +	orr     r4, r4, r3
> +	mov     r3, r5, lsr r12
>  	str     r4, [r0], #4
>  	cmp     r2, #4
>  	bhs     1b

This one is in a loop, but perhaps not terribly critical to
performance. We could keep the old version under #if !__thumb__, but I
doubt it matters, and it looks like hardly anyone is using pre-thumb2
ARM anymore anyway; a show-stopping bug in other things for v6 went
uncaught for over a year.
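
For reference, here is a minimal C sketch of why the two-instruction
replacement computes the same value as the original single orr with a
register-specified shift. The function names are hypothetical, the
helper models ARM's LSL-by-register rule (low byte of the shift
register, amounts of 32 or more yield 0), and the shift amounts 8, 16
and 24 are an assumption about what lr holds on the non-congruent
path. It is an illustration only, not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Model of ARM "lsl by register": only the low byte of the shift
 * register is used, and amounts of 32 or more produce 0. */
uint32_t lsl_reg(uint32_t value, uint32_t shift_reg)
{
    uint32_t amount = shift_reg & 0xff;
    return amount < 32 ? value << amount : 0;
}

/* ARM-only form:  orr r4, r3, r5, lsl lr */
uint32_t orr_shifted(uint32_t r3, uint32_t r5, uint32_t lr)
{
    return r3 | lsl_reg(r5, lr);
}

/* Thumb2-compatible form:  mov r4, r5, lsl lr ; orr r4, r4, r3 */
uint32_t mov_then_orr(uint32_t r3, uint32_t r5, uint32_t lr)
{
    uint32_t r4 = lsl_reg(r5, lr); /* mov r4, r5, lsl lr */
    r4 = r4 | r3;                  /* orr r4, r4, r3     */
    return r4;
}

/* Compare both forms over a few sample values and the assumed shift
 * amounts; returns the number of cases checked. */
int check_forms_agree(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0x80000000u, 0xdeadbeefu, 0xffffffffu
    };
    static const uint32_t shifts[] = { 8, 16, 24 };
    int checked = 0;

    for (int i = 0; i < 5; i++)
        for (int j = 0; j < 5; j++)
            for (int k = 0; k < 3; k++) {
                assert(orr_shifted(samples[i], samples[j], shifts[k]) ==
                       mov_then_orr(samples[i], samples[j], shifts[k]));
                checked++;
            }
    return checked;
}
```

The only cost of the replacement is the extra instruction; r4 is
overwritten in both forms, so no additional scratch register is
needed.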

One cosmetic fix I'd like to make when applying this is keeping the
old gratuitously-ugly formatting just so the actual change isn't
obscured by the formatting-only change on an adjacent line. I can
handle that though.

Rich
