Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sun, 26 Nov 2017 12:15:06 +1300
From: Michael Clark <michaeljclark@....com>
To: musl@...ts.openwall.com
Subject: Re: Do not use 64 bit division if possible

At -O0 and above, clang and gcc strength reduce division by a constant power of two into a right shift (arithmetic or logical depending on signedness of the types).

- https://cx.rv8.io/g/kDrEkB

a_ctz_l is not exactly inexpensive, given it has a multiply, and, negate, shift, load (cache miss).

We’d be better off defining PAGE_SHIFT if we want to be certain the code uses shift when optimisation is disabled, however I trust the compilers to turn the division into a shift.

#ifndef a_ctz_l
#define a_ctz_l a_ctz_l
static inline int a_ctz_l(unsigned long x)
{
        static const char debruijn32[32] = {
                0, 1, 23, 2, 29, 24, 19, 3, 30, 27, 25, 11, 20, 8, 4, 13,
                31, 22, 28, 18, 26, 10, 7, 12, 21, 17, 9, 6, 16, 5, 15, 14
        };
        if (sizeof(long) == 8) return a_ctz_64(x);
        return debruijn32[(x&-x)*0x076be629 >> 27];
}
#endif

If you study the codegen then this might be a better change (including to all other archs).

$ git diff arch/x86_64/bits/limits.h
diff --git a/arch/x86_64/bits/limits.h b/arch/x86_64/bits/limits.h
index 792a30b..32f29bf 100644
--- a/arch/x86_64/bits/limits.h
+++ b/arch/x86_64/bits/limits.h
@@ -1,6 +1,6 @@
 #if defined(_POSIX_SOURCE) || defined(_POSIX_C_SOURCE) \
  || defined(_XOPEN_SOURCE) || defined(_GNU_SOURCE) || defined(_BSD_SOURCE)
-#define PAGE_SIZE 4096
+#define PAGE_SIZE 4096UL
 #define LONG_BIT 64
 #endif
 
 Try removing the UL suffix from the constant in the compiler explorer example above and see the change in codegen.

> On 26/11/2017, at 9:52 AM, David Guillen Fandos <david@...idgf.es> wrote:
> 
> Hey there,
> 
> Just noticed that my binary was getting some gcc functions for integer division in some places coming from musl. I checked and it seems that, even though musl assumes PAGE_SIZE is always power of two, that we divide by it instead of using shifts for that. This results in extra overhead and slow division on platforms that do not have a 64 bit divider (even the ones that do have 32 bit divider).
> 
> So I propose a patch here, let me know what you people think about.
> 
> David
> 
> 
> diff --git a/src/conf/sysconf.c b/src/conf/sysconf.c
> index b8b761d0..aa9fc9d1 100644
> --- a/src/conf/sysconf.c
> +++ b/src/conf/sysconf.c
> @@ -4,6 +4,7 @@ long sysconf(int name)
> #include <sys/sysinfo.h>
> #include "syscall.h"
> #include "libc.h"
> +#include "atomic.h"
> 
> #define JT(x) (-256|(x))
> #define VER JT(1)
> @@ -206,7 +206,7 @@ long sysconf(int name)
> 		if (name==_SC_PHYS_PAGES) mem = si.totalram;
> 		else mem = si.freeram + si.bufferram;
> 		mem *= si.mem_unit;
> -		mem /= PAGE_SIZE;
> +		mem >>= (unsigned)(a_ctz_l(PAGE_SIZE));
> 		return (mem > LONG_MAX) ? LONG_MAX : mem;
> 		case JT_ZERO & 255:
> 		return 0;

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.