Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 29 Nov 2017 15:36:41 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] Wasm support patch 3 (the actual arch/wasm)

On Tue, Nov 28, 2017 at 12:39:03PM +0000, Nicholas Wilson wrote:
> Hi,
> 
> Here's the patch with the actual Wasm arch implementation!
> 
> It's got a certain amount of boilerplate, mainly in "arch/wasm/bits"
> where certain structures have to be defined per-architecture. I just
> copied x32 where possible, since Wasm a new arch and we have free
> choice of how to define things, and x32 is another modern arch
> that's like i386 but with some legacy cruft removed.
> 
> As you can see, I'm using the "static syscalls" approach. There's a
> README explaining the approach in "src/internal/wasm", where I
> explain that most syscalls are to be provided by the embedding
> environment, but some syscalls like brk can actually be implemented
> using native Wasm intrinsics, which I've done in
> "src/internal/wasm".
> 
> The result of these three patches (this and the last two emails), is
> a version of Musl that builds using Clang's Wasm support, and
> creates executable Wasm modules that can use malloc, do I/O, and all
> sorts of other things. I haven't tested the full range of syscalls
> yet, but the ones I have tried work well.
> 
> Ready for feedback!

See inline below:

> diff --git a/arch/wasm/atomic_arch.h b/arch/wasm/atomic_arch.h
> new file mode 100644
> index 00000000..9da04059
> --- /dev/null
> +++ b/arch/wasm/atomic_arch.h
> @@ -0,0 +1,74 @@
> +#include <stdint.h>
> +
> +#define a_ctz_l a_ctz_l
> +static inline int a_ctz_l(unsigned long x)
> +{
> +  return __builtin_ctzl(x);
> +}
> +#define a_ctz_64 a_ctz_64
> +static inline int a_ctz_64(uint64_t x)
> +{
> +  return __builtin_ctzll(x);
> +}
> +

Unsure about whether this is good; you can just omit them and the
higher-level header provides default definitions. I think clang even
optimizes them to a single logical insn.

Also note that coding style is to indent with tabs not spaces.

> +#define a_and_64 a_and_64
> +static inline void a_and_64(volatile uint64_t *p, uint64_t v)
> +{
> +  // TODO use a WebAssembly CAS builtin, when those arrive with the threads feature
> +  //__atomic_fetch_and(p, v, __ATOMIC_SEQ_CST);
> +  *p &= v;
> +}

These need to be non-dummy at some point but we can discuss that
later. However you shouldn't define dummy versions of all of them like
this. Either just define dummy a_ll and a_sc, or dummy a_cas. Let the
higher level framework take care of all the rest unless you really
have a working, better-optimized version of them like x86 does.

> +#define a_crash a_crash
> +static inline void a_crash()
> +{
> +  // This generates the Wasm "unreachable" instruction which traps when reached
> +  __builtin_unreachable();
> +}

Shouldn't it be __builtin_trap()? __builtin_unreachable allows the
compiler to assume the code path is not reachable, which is exactly
the opposite of what we want.

> diff --git a/arch/wasm/bits/alltypes.h.in b/arch/wasm/bits/alltypes.h.in
> new file mode 100644
> index 00000000..ca4e94a2
> --- /dev/null
> +++ b/arch/wasm/bits/alltypes.h.in
> @@ -0,0 +1,33 @@
> +#ifdef __wasm32__
> +#define _Addr int
> +#define _Int64 long long
> +#define _Reg int
> +
> +#elif defined __wasm64__
> +#define _Addr long
> +#define _Int64 long long
> +#define _Reg long
> +
> +#endif

Generally we don't do 32/64-bit archs together as one arch with
#ifdefs but as separate ones. There are a number of additional things
that have to be different, like...

> +TYPEDEF __builtin_va_list va_list;
> +TYPEDEF __builtin_va_list __isoc_va_list;
> +
> +#ifndef __cplusplus
> +TYPEDEF __WCHAR_TYPE__ wchar_t;
> +#endif
> +TYPEDEF __WINT_TYPE__ wint_t;
> +
> +TYPEDEF float float_t;
> +TYPEDEF double double_t;
> +
> +TYPEDEF long time_t;
> +TYPEDEF long suseconds_t;
> +
> +TYPEDEF struct { union { int __i[9]; volatile int __vi[9]; unsigned __s[9]; } __u; } pthread_attr_t;
> +TYPEDEF struct { union { int __i[6]; volatile int __vi[6]; volatile void *volatile __p[6]; } __u; } pthread_mutex_t;
> +TYPEDEF struct { union { int __i[6]; volatile int __vi[6]; volatile void *volatile __p[6]; } __u; } mtx_t;
> +TYPEDEF struct { union { int __i[12]; volatile int __vi[12]; void *__p[12]; } __u; } pthread_cond_t;
> +TYPEDEF struct { union { int __i[12]; volatile int __vi[12]; void *__p[12]; } __u; } cnd_t;
> +TYPEDEF struct { union { int __i[8]; volatile int __vi[8]; void *__p[8]; } __u; } pthread_rwlock_t;
> +TYPEDEF struct { union { int __i[5]; volatile int __vi[5]; void *__p[5]; } __u; } pthread_barrier_t;

..despite these being in the arch headers, musl policy actually
requires all 32-bit archs and all 64-bit archs to have the same
pthread type sizes/definitions. Using the 32-bit definitions on a
64-bit arch will not work because of how pthread_impl.h lays out the
actual usage of the slots; even if it did for some of them it wouldn't
be future-proof.

> diff --git a/arch/wasm/bits/float.h b/arch/wasm/bits/float.h
> new file mode 100644
> index 00000000..9a56ad14
> --- /dev/null
> +++ b/arch/wasm/bits/float.h
> @@ -0,0 +1,16 @@
> +#define FLT_EVAL_METHOD __FLT_EVAL_METHOD__
> +
> +#define LDBL_TRUE_MIN __LDBL_DENORM_MIN__
> +#define LDBL_MIN __LDBL_MIN__
> +#define LDBL_MAX __LDBL_MAX__
> +#define LDBL_EPSILON __LDBL_EPSILON__
> +
> +#define LDBL_MANT_DIG __LDBL_MANT_DIG__
> +#define LDBL_MIN_EXP __LDBL_MIN_EXP__
> +#define LDBL_MAX_EXP __LDBL_MAX_EXP__
> +
> +#define LDBL_DIG __LDBL_DIG__
> +#define LDBL_MIN_10_EXP __LDBL_MIN_10_EXP__
> +#define LDBL_MAX_10_EXP __LDBL_MAX_10_EXP__
> +
> +#define DECIMAL_DIG __DECIMAL_DIG__

These should be defined explicitly to the correct values so that there
are not silently varying arch parameters resulting in undocumentedly
incompatible ABIs. I would assume they're the same as double, no?

> diff --git a/arch/wasm/bits/limits.h b/arch/wasm/bits/limits.h
> new file mode 100644
> index 00000000..649d7d74
> --- /dev/null
> +++ b/arch/wasm/bits/limits.h
> @@ -0,0 +1,9 @@
> +#if defined(_POSIX_SOURCE) || defined(_POSIX_C_SOURCE) \
> + || defined(_XOPEN_SOURCE) || defined(_GNU_SOURCE) || defined(_BSD_SOURCE)
> +// The WebAssembly fixed page size is 64KiB
> +#define PAGE_SIZE 65536
> +#define LONG_BIT (__SIZEOF_LONG__*8)
> +#endif
> +
> +#define LONG_MAX  __LONG_MAX__
> +#define LLONG_MAX __LONG_LONG_MAX__
> diff --git a/arch/wasm/bits/posix.h b/arch/wasm/bits/posix.h
> new file mode 100644
> index 00000000..c37b94c1
> --- /dev/null
> +++ b/arch/wasm/bits/posix.h
> @@ -0,0 +1,2 @@
> +#define _POSIX_V6_LP64_OFF64  1
> +#define _POSIX_V7_LP64_OFF64  1

These differ by 32/64-bit.

> diff --git a/arch/wasm/bits/setjmp.h b/arch/wasm/bits/setjmp.h
> new file mode 100644
> index 00000000..a9262a64
> --- /dev/null
> +++ b/arch/wasm/bits/setjmp.h
> @@ -0,0 +1 @@
> +typedef unsigned long long __jmp_buf[8];

Are you sure that makes sense? What does the wasm register file look
like? What registers are call-saved?

> diff --git a/arch/wasm/bits/signal.h b/arch/wasm/bits/signal.h
> new file mode 100644
> index 00000000..c8d10a81
> --- /dev/null
> +++ b/arch/wasm/bits/signal.h
> @@ -0,0 +1,77 @@
> +#if defined(_POSIX_SOURCE) || defined(_POSIX_C_SOURCE) \
> + || defined(_XOPEN_SOURCE) || defined(_GNU_SOURCE) || defined(_BSD_SOURCE)
> +
> +#if defined(_XOPEN_SOURCE) || defined(_GNU_SOURCE) || defined(_BSD_SOURCE)
> +#define MINSIGSTKSZ 2048
> +#define SIGSTKSZ 8192
> +#endif
> +
> +// I'm not expecting any of this to be actually usable... these definitions are
> +// just the bare minimum so that src/signal/*.c compiles.
> +
> +typedef struct {
> +	unsigned long long __ip_dummy;
> +} mcontext_t;

Likewise here.

> +struct sigaltstack {
> +	void *ss_sp;
> +	int ss_flags;
> +	size_t ss_size;
> +};
> +
> +typedef struct __ucontext {
> +	unsigned long uc_flags;
> +	struct __ucontext *uc_link;
> +//	stack_t uc_stack;
> +	mcontext_t uc_mcontext;
> +	sigset_t uc_sigmask;
> +//	unsigned long long __fpregs_mem[64];
> +} ucontext_t;

Why is uc_stack omitted?

> diff --git a/arch/wasm/bits/syscall.h.in b/arch/wasm/bits/syscall.h.in
> new file mode 100644
> index 00000000..00c31889
> --- /dev/null
> +++ b/arch/wasm/bits/syscall.h.in
> @@ -0,0 +1,321 @@
> +// For Wasm, we don't use syscall numbers!  We statically link in only the
> +// syscalls which are invoked.
> +
> +#define SYS_accept4 syscall_accept4

They probably need casts to an arithmetic type and need to be in a
reserved namespace (like __syscall_accept4, etc.).

> diff --git a/arch/wasm/crt_arch.h b/arch/wasm/crt_arch.h
> new file mode 100644
> index 00000000..e69de29b

If this is empty, how does program entry point get created? How does
__libc_start_main get called?

> diff --git a/arch/wasm/pthread_arch.h b/arch/wasm/pthread_arch.h
> new file mode 100644
> index 00000000..75aac668
> --- /dev/null
> +++ b/arch/wasm/pthread_arch.h
> @@ -0,0 +1,5 @@
> +static inline struct pthread *__pthread_self(void) { return pthread_self(); }

This is a circular definition. You need code to load the thread
pointer from whatever part of the register file it's stored in.

> diff --git a/arch/wasm/reloc.h b/arch/wasm/reloc.h
> new file mode 100644
> index 00000000..eca58d9b
> --- /dev/null
> +++ b/arch/wasm/reloc.h
> @@ -0,0 +1 @@
> +#define CRTJMP(pc,sp) __builtin_unreachable()
> \ No newline at end of file

Needs newline. This is probably not actually important without dynamic
linking.

> diff --git a/arch/wasm/syscall_arch.h b/arch/wasm/syscall_arch.h
> new file mode 100644
> index 00000000..41baf881
> --- /dev/null
> +++ b/arch/wasm/syscall_arch.h
> @@ -0,0 +1,333 @@
> +// For both Wasm32 and Wasm64, we assume that the host environment can't
> +// provide 64-bit types, and split 64-bit values into two arguments.  A Wasm
> +// interpreter *could* provide i64 support, but in practice web browsers aren't
> +// doing that right now, and any i64 we pass out to a syscall will get chopped
> +// to 58-bit precision.
> +#define __SYSCALL_LL_E(x) \
> +((union { long long ll; long l[2]; }){ .ll = x }).l[0], \
> +((union { long long ll; long l[2]; }){ .ll = x }).l[1]
> +#define __SYSCALL_LL_O(x) 0, __SYSCALL_LL_E((x))
> +
> +long __syscall_accept4(long arg1, ...);

...

> +### Details
> +
> +* Due to the type-safe nature of Wasm linkage, syscalls cannot actually be
> +  variadic if defined externally.  Any syscalls called with a variable argument
> +  count, and not provided here by Musl, could be fixed on a case-by-case basis
> +  as needed.
> \ No newline at end of file

If variadic is a problem you could make them just always take 6 fixed
long args and make the syscall_arch.h machinery just pass dummy zeros
for the unused slots.

> diff --git a/src/internal/wasm/syscall_brk.c b/src/internal/wasm/syscall_brk.c
> new file mode 100644
> index 00000000..28c5fc23
> --- /dev/null
> +++ b/src/internal/wasm/syscall_brk.c
> @@ -0,0 +1,17 @@
> +#include <limits.h>
> +#include "syscall.h"
> +
> +long __syscall_brk(long arg1, ...)
> +{
> +  unsigned long newbrk = (unsigned long)arg1;
> +  unsigned long pages = __builtin_wasm_current_memory();
> +  if (newbrk % PAGE_SIZE)
> +    goto end;
> +  unsigned long new_pages = newbrk / PAGE_SIZE;
> +  if (new_pages <= pages || new_pages >= (0xffffffffu / PAGE_SIZE))
> +    goto end;
> +  if (__builtin_wasm_grow_memory(new_pages - pages) != (unsigned long)-1)
> +    pages = new_pages;
> +  end:
> +  return pages * PAGE_SIZE;
> +}

As noted before I think it makes sense to just drop brk. It's not
needed.

> diff --git a/src/internal/wasm/syscall_futex.c b/src/internal/wasm/syscall_futex.c
> new file mode 100644
> index 00000000..9c14fa0b
> --- /dev/null
> +++ b/src/internal/wasm/syscall_futex.c
> @@ -0,0 +1,45 @@
> +#include <futex.h>
> +#include <stdarg.h>
> +#include <errno.h>
> +#include "syscall.h"
> +
> +// Wasm doesn't *yet* have futex(), but it's being planned as part of the
> +// threaded-Wasm support in Spring 2018.
> +//
> +// For now, Wasm is single-threaded and we simply assert that the lock is not
> +// held, and abort if a wait would be required (assume it's a corrupted lock).

I think this is probably a bogus assumption; you can't assume anything
about how futex is being used to implement locks. Note that a
perfectly valid implementation of futex is just one that always
returns success; it will simply result in spinning at 100% cpu load
waiting for the event to happen rather than going to sleep.

This was all a pretty quick, high-level review, but I hope it's
helpful.

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.