Date: Sun, 15 Dec 2019 13:51:25 -0500
From: Rich Felker <>
Subject: Re: max_align_t mess on i386

On Sun, Dec 15, 2019 at 07:23:14PM +0100, Joakim Sindholt wrote:
> On Sun, Dec 15, 2019 at 01:06:29PM -0500, Jeffrey Walton wrote:
> > On Sat, Dec 14, 2019 at 10:19 AM Rich Felker <> wrote:
> > >
> > > In researching how much memory could be saved, and how practical it
> > > would be, for the new malloc to align only to 8-byte boundaries
> > > instead of 16-byte on archs where alignof(max_align_t) is 8 (pretty
> > > much all 32-bit archs), I discovered that GCC quietly changed its
> > > idea of i386 max_align_t to 16-byte alignment in GCC 7, to better
> > > accommodate the new _Float128 access via SSE. Presumably (I haven't
> > > checked) the change is reflected with changes in the psABI document to
> > > make it "official".
> > 
> > Be careful with policy changes like this. The malloc (3) man page says:
> > 
> >     The malloc() and calloc() functions return a pointer to the
> >     allocated memory that is suitably aligned for any kind of variable.
> Your man pages are not the standard, but the standard does have this to
> say:
> > The pointer returned if the allocation succeeds shall be suitably
> > aligned so that it may be assigned to a pointer to any type of object
> > and then used to access such an object in the space allocated (until the
> > space is explicitly freed or reallocated).
> To me this sounds like my next suggestion is technically disallowed.
> > I expect to be able to use a pointer returned by malloc (and friends)
> > in MMX, SSE and AVX functions.
> I might agree, but would it not be feasible to have the alignment of the
> returned pointer be dependent on the size of the allocation? That way,
> if you allocate <16 bytes you can get 8 byte alignment. You might even
> be able to go all the way down to 4 byte alignment for <8 byte
> allocations.

This is a nice idea and the bump allocator (simple_malloc) in musl for
static-linked programs that don't use free does pretty much exactly
that. With a nontrivial allocator it gets more complicated though, and
I don't think there's any way to take advantage of this with the new allocator.

For example, in the new allocator with 4-byte inband slot headers,
16-byte slots don't need 16-byte alignment because the largest object
they can hold is 12 bytes, and the largest alignment such an object
can need is 8 bytes. However, since they're spaced 16 bytes apart,
there's no advantage to being able to misalign them mod 16; as long as
the first one in a run is aligned, all of them are.

The same would apply if we had 8-byte slots, but those are mostly
uninteresting with 4 bytes taken for headers.
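A small C model of the slot arithmetic in the two paragraphs above. The `slot_run` struct and the helpers `floor_pow2`, `needed_align` and `all_slots_aligned` are hypothetical and are not mallocng's actual data structures; the model only tracks where each slot starts and how many bytes a slot leaves for user data after an `hdr`-byte in-band header. It assumes, as the thread does, that alignments are powers of two and never exceed an object's size.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one run of equal-size slots: slot i starts at
   base + i * stride, and hdr bytes of each stride go to the in-band
   header, leaving stride - hdr bytes for user data. */
struct slot_run {
    uintptr_t base;
    size_t stride;
    size_t hdr;
    size_t count;
};

/* Largest power of two <= n (returns 1 for n == 0). */
static size_t floor_pow2(size_t n)
{
    size_t p = 1;
    while (p <= n / 2)
        p *= 2;
    return p;
}

/* Strictest alignment any object that fits in one slot could need:
   its size is at most stride - hdr, and alignment <= size. */
static size_t needed_align(const struct slot_run *r)
{
    assert(r->stride > r->hdr);
    size_t a = floor_pow2(r->stride - r->hdr);
    return a < alignof(max_align_t) ? a : alignof(max_align_t);
}

/* True if every slot in the run starts on a multiple of align. */
static bool all_slots_aligned(const struct slot_run *r, size_t align)
{
    for (size_t i = 0; i < r->count; i++)
        if ((r->base + i * r->stride) % align != 0)
            return false;
    return true;
}
```

With a 16-byte stride and 4-byte header, `needed_align` gives 8, yet shifting `base` from 0x1000 to 0x1008 (8-aligned but not 16-aligned) saves no space: the run still occupies `count * stride` bytes either way. With an 8-byte stride only 4 bytes remain per slot, so `needed_align` drops to 4.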

Taking advantage of it with dlmalloc-type designs that don't involve
evenly-spaced slots is perhaps more practical, but can lead to messy
split/merge since the small underaligned chunks aren't starting on
valid boundaries to merge with adjacent free chunks. I think they'll
tend to eventually get tied up as unusable space at the bottom of
adjacent chunks, unnecessarily limiting the size of the allocations
just below them.
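For contrast, the size-dependent alignment idea is easiest to apply in a bump allocator that never frees. The following is a minimal C sketch in the spirit of simple_malloc, not its actual code: the fixed `arena` buffer and the names `floor_pow2`, `align_for_size` and `bump_alloc` are hypothetical. It assumes an n-byte request can only hold objects of at most n bytes, so it needs at most the largest power of two not exceeding n, capped at alignof(max_align_t).

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed arena standing in for memory obtained from the OS. */
static alignas(max_align_t) unsigned char arena[4096];
static size_t arena_used;

/* Largest power of two <= n (returns 1 for n == 0). */
static size_t floor_pow2(size_t n)
{
    size_t p = 1;
    while (p <= n / 2)
        p *= 2;
    return p;
}

/* Strictest alignment any object of at most n bytes can need. */
static size_t align_for_size(size_t n)
{
    size_t a = floor_pow2(n);
    return a < alignof(max_align_t) ? a : alignof(max_align_t);
}

/* Bump-allocate n bytes, aligned only as much as an n-byte object needs.
   Memory is never reused; returns NULL when the arena is exhausted. */
static void *bump_alloc(size_t n)
{
    size_t align = align_for_size(n);
    uintptr_t start = (uintptr_t)arena + arena_used;
    /* Round up to the next multiple of align (a power of two). */
    uintptr_t p = (start + align - 1) & ~(uintptr_t)(align - 1);
    size_t off = (size_t)(p - (uintptr_t)arena);

    if (off > sizeof arena || n > sizeof arena - off)
        return NULL;
    assert(p % align == 0);
    arena_used = off + n;
    return (void *)p;
}
```

Starting from an empty arena, requests of 1, 4 and 12 bytes land at offsets 0, 4 and 8, where an allocator that aligns everything to 16 bytes would place them at 0, 16 and 32. Nothing here has to be merged later, which is why the trick is cheap in a bump allocator and awkward in the split/merge designs described above.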

> It might violate the standard technically speaking, but I don't know of
> any examples of types smaller than 16 bytes that require 16 byte
> alignment.

It doesn't since no object can have size smaller than its alignment.
(As long as pointer types aren't lossy; if some pointer types lost low
bits, then it would be non-conforming.)
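The size/alignment relationship can be checked at compile time in C11, because sizeof(T) is always a multiple of alignof(T) (the elements of an array are contiguous and each must be aligned). The macro `SIZE_COVERS_ALIGN` and `struct over_aligned` are hypothetical names used only for illustration. The last check shows that even a 1-byte member with a 16-byte alignment request produces a 16-byte type, so no object smaller than 16 bytes can need 16-byte alignment.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* sizeof(T) is a multiple of alignof(T), hence alignof(T) <= sizeof(T). */
#define SIZE_COVERS_ALIGN(T) \
    static_assert(sizeof(T) % alignof(T) == 0, \
                  #T ": sizeof is not a multiple of alignof")

SIZE_COVERS_ALIGN(char);
SIZE_COVERS_ALIGN(short);
SIZE_COVERS_ALIGN(int);
SIZE_COVERS_ALIGN(long long);
SIZE_COVERS_ALIGN(double);
SIZE_COVERS_ALIGN(long double);
SIZE_COVERS_ALIGN(void *);
SIZE_COVERS_ALIGN(max_align_t);

/* A 1-byte member with a 16-byte alignment request: the compiler pads
   the struct's size up to its alignment. */
struct over_aligned {
    alignas(16) char c;
};
static_assert(sizeof(struct over_aligned) == 16,
              "padding brings the size up to the alignment");
```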

