Date: Mon, 16 Dec 2019 12:49:50 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: max_align_t mess on i386

On Mon, Dec 16, 2019 at 11:36:42AM -0500, Jeffrey Walton wrote:
> On Mon, Dec 16, 2019 at 10:56 AM Rich Felker <dalias@...c.org> wrote:
> >
> > On Mon, Dec 16, 2019 at 10:30:30AM -0500, Jeffrey Walton wrote:
> > > On Sun, Dec 15, 2019 at 1:22 PM Rich Felker <dalias@...c.org> wrote:
> > > >
> > > > On Sun, Dec 15, 2019 at 01:06:29PM -0500, Jeffrey Walton wrote:
> > > > > On Sat, Dec 14, 2019 at 10:19 AM Rich Felker <dalias@...c.org> wrote:
> > > > > >
> > > > > > In researching how much memory could be saved, and how practical it
> > > > > > would be, for the new malloc to align only to 8-byte boundaries
> > > > > > instead of 16-byte on archs where alignof(max_align_t) is 8 (pretty
> > > > > > much all 32-bit archs), I discovered that GCC quietly changed its
> > > > > > idea of i386 max_align_t to 16-byte alignment in GCC 7, to better
> > > > > > accommodate the new _Float128 access via SSE. Presumably (I haven't
> > > > > > checked) the change is reflected with changes in the psABI document to
> > > > > > make it "official".
> > > > >
> > > > > Be careful with policy changes like this. The malloc (3) man page says:
> > > >
> > > > Generally, you should look to the C11 or POSIX (man 3p) specifications
> > > > for the functions rather than the "man 3" ones, but here it's pretty
> > > > close to the same, just imprecisely worded:
> > > >
> > > > >     The malloc() and calloc() functions return a pointer to the
> > > > >     allocated memory that is suitably aligned for any kind of variable.
> > > > >
> > > > > I expect to be able to use a pointer returned by malloc (and friends)
> > > > > in MMX, SSE and AVX functions.
> > > >
> > > > "Any kind of variable" isn't "any kind of load/store instruction". For
> > > > example you most certainly will not get 32- or 64-byte alignment that
> > > > you may want for AVX-256 or AVX-512 without memalign.
> > >
> > > GCC tells us the largest alignment that we can expect:
> > >
> > >     $ gcc -dM -E - </dev/null | grep -i align
> > >     #define __BIGGEST_ALIGNMENT__ 16
> > >
> > > Because __BIGGEST_ALIGNMENT__ is 16, I don't expect to get 32-byte or
> > > 64-byte aligned buffers.
> >
> > I wasn't aware of this gcc feature. Do you know if it's documented and
> > what it's derived from? It seems to match what max_align_t is expected
> > to be, including on i386 (16) and powerpc (16) and indeed it's only 4
> > on a few 32-bit archs and even 2 on m68k.
> 
> I believe it is documented at
> https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html.
> 
> The linker problem discussed in the same area has bitten me several
> times in the past. It usually arises on 32-bit systems. But PowerPC
> also got me when using AIX.
> 
> > > > A max_align_t
> > > > (and corresponding malloc alignment constraint) that heavily aligned
> > > > would be awful to use, with memory waste possibly exceeding 1000% and
> > > > over 500% likely for real-world data structures. Over-alignment also
> > > > weakens hardening properties by making pointers more predictable.
> > >
> > > It sounds like you are moving the fragmentation problem from the
> > > runtime library to the application. (When fragmentation is a problem).
> >
> > I don't understand what you mean.
> 
> When we can't get properly aligned buffers in userland, then we
> (userland) have to over-commit in our allocators and play the pointer
> games. For example, if I can only get 8-byte aligned pointers, then I
> always have to allocate n+16 bytes, move the pointer 'p' to the right
> for a 16 byte alignment, and store the offset at p-1 so I can delete
> the base pointer on delete/free.

You absolutely should never do this. Pretty much all historical
unix-like systems had (and still have) memalign, POSIX has
posix_memalign with an awkward and error-prone signature (but it's
easy enough to wrap), and C11+ has aligned_alloc. This "over-allocate
and adjust such that it's impossible to just call free" idiom is
something people did on Windows because Windows...
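
To illustrate the "easy enough to wrap" remark, here is a minimal sketch
in C. The names xmemalign, is_aligned and aligned_demo are hypothetical
helpers made up for this example, not part of any libc; only
posix_memalign, aligned_alloc and free are standard interfaces. It
assumes a POSIX.1-2001 and C11 capable libc.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* expose posix_memalign under strict -std=c11 */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: gives posix_memalign a malloc-like signature.
 * align must be a power of two and a multiple of sizeof(void *). */
void *xmemalign(size_t align, size_t size)
{
    void *p = NULL;
    /* posix_memalign reports failure through its return value, not errno. */
    if (posix_memalign(&p, align, size) != 0)
        return NULL;
    return p;
}

/* Hypothetical helper: nonzero if p is a multiple of align (a power of two). */
int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Usage: both results go straight to free(), with no stored offset. */
int aligned_demo(void)
{
    void *a = xmemalign(32, 100);
    /* C11 aligned_alloc: keep size a multiple of the alignment to be safe. */
    void *b = aligned_alloc(64, 128);
    int ok = a && b && is_aligned(a, 32) && is_aligned(b, 64);
    assert(ok);
    free(a);
    free(b);
    return ok ? 0 : -1;
}
```

The point of the sketch is that the pointer handed back is the pointer
that free() accepts, so no offset bookkeeping at p-1 is needed.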

> Those kinds of pointer games are usually played out in the runtime
> library. I can only say "usually" and not always because we have to
> do them on AIX and GNU Hurd (among others).

I don't understand your use of "userland" and "in the runtime
library". The only non-userland allocation is at page granularity (4k
or larger). If you mean at the application level (outside libc), this
is not something you need to do, at all.
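
As a rough way to see what a given toolchain and libc actually promise,
the following C sketch compares alignof(max_align_t) with the
__BIGGEST_ALIGNMENT__ macro mentioned earlier in the thread and checks
one malloc result against the former. The function names are
hypothetical and chosen for this example; __BIGGEST_ALIGNMENT__ is a
compiler-specific predefined macro, so it is guarded with #ifdef.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* max_align_t must be at least as aligned as any scalar type. */
static_assert(alignof(max_align_t) >= alignof(long double),
              "max_align_t is weaker than long double");

/* Hypothetical check: does one malloc result satisfy alignof(max_align_t)?
 * The request is sizeof(max_align_t) bytes so that the object is large
 * enough for the full alignment requirement to apply. */
int malloc_meets_max_align(void)
{
    void *p = malloc(sizeof(max_align_t));
    int ok = p && ((uintptr_t)p % alignof(max_align_t)) == 0;
    free(p);
    return ok;
}

/* Print the two values so they can be compared on the target in question. */
void print_alignment_facts(void)
{
    printf("alignof(max_align_t) = %zu\n", alignof(max_align_t));
#ifdef __BIGGEST_ALIGNMENT__
    printf("__BIGGEST_ALIGNMENT__ = %d\n", (int)__BIGGEST_ALIGNMENT__);
#endif
}
```

Anything stricter than alignof(max_align_t), such as 32- or 64-byte
alignment for AVX buffers, has to be requested explicitly.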

Rich
