Date: Fri, 23 Oct 2015 10:48:43 -0400
From: Rich Felker <>
To: Denys Vlasenko <>
Cc: musl <>
Subject: Re: [PATCH] configure: add gcc flags for better link-time

On Fri, Oct 23, 2015 at 04:41:17PM +0200, Denys Vlasenko wrote:
> On Fri, Oct 23, 2015 at 3:12 PM, Szabolcs Nagy <> wrote:
> >> +# When linker merges sections, a tiny section (such as one resulting
> >> +# from "static char flag_var") with no alignment restrictions
> >> +# can end up lodged between two more strongly aligned ones (say,
> >> +# "static int global_cnt1/2", both of which want 32-bit alignment).
> >> +# Then this byte-sized "flag_var" gets 3 bytes of padding.
> >> +#
> >> +# With section sorting by alignment, one-byte flag variables have
> >> +# a higher chance of being grouped together and not requiring padding.
> >> +# (It can be made even better. Linker is too dumb.
> >> +# ld needs to grow -Wl,--pack-sections-optimally)
> >> +#
> >> +# For us, this affects the size of only one file:
> >> +#
> >> +tryldflag LDFLAGS_AUTO -Wl,--sort-section=alignment
> >> +tryldflag LDFLAGS_AUTO -Wl,--sort-common
> >
> > i think this came up before
> >
> >
> > it was also noted at some point that the optimal sorting
> > is 'sort by use' so all the unused legacy functions end
> > up on the same page so they never need to be loaded.
> Sure, but that would be quite hard to do.
> How would you reliably know who uses which part of libc
> code?
> OTOH, we don't _need_ to kill ourselves trying to optimize
> that. Optimizing code size is not the big thing here.
> Even though data and bss shrinkage is smaller,
> it is more important.

I agree, data is a lot more important than code here.

> Minimizing the number of data pages is more important
> than text pages. A text page is shared among all processes linked
> to this; data page is allocated in every process
> (as soon as even one byte in this page is written to.
> With only 4 pages in total like in this example, I'm pretty sure
> all of them get dirtied by libc init, use of stdio or malloc).
> Make libc (.data + .bss) fit into one page less and you get about
> as many pages saved as you have processes running.
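
The arithmetic behind that last claim can be sketched as follows. This is
only an illustration: the 4 KiB page size, the helper names, and the
premise that every data page gets dirtied in every process are all
assumptions, not measurements of any real libc.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages; real systems may differ. */
#define ASSUMED_PAGE_SIZE ((size_t)4096)

/* Number of pages needed to hold `bytes` of writable data (.data + .bss). */
static size_t pages_for(size_t bytes)
{
    return (bytes + ASSUMED_PAGE_SIZE - 1) / ASSUMED_PAGE_SIZE;
}

/*
 * Private memory consumed across `nproc` processes, under the assumption
 * that every data page is written to (and so copied) in each process.
 */
static size_t dirty_bytes(size_t data_bss_bytes, size_t nproc)
{
    return pages_for(data_bss_bytes) * ASSUMED_PAGE_SIZE * nproc;
}
```

With these hypothetical helpers, shrinking .data + .bss from 12300 bytes
to 12288 bytes drops one page, so dirty_bytes() for 100 processes falls
by 100 pages.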

FYI all the data/bss in libc except a few large objects _easily_ fits
in a single page. Unfortunately a couple of those large ones (malloc
state & stdio buffers) are used by the majority of programs. I'm still
not sure of the best way to achieve a particular sorting without awful
hacks. Sort by alignment may be a decent approximation of the best
behavior, but I need to check it out.
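
To make the padding effect from the patch comment concrete, here is a toy
model of placing objects back to back, first in the given order and then
sorted by descending alignment. It is a sketch only: struct sec and the
function names are made up for illustration, and this is not a
description of what ld actually does internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an input section: just a size and an alignment. */
struct sec {
    size_t size;
    size_t align; /* must be a power of two */
};

/* Round `off` up to the next multiple of `align`. */
static size_t align_up(size_t off, size_t align)
{
    assert(align && (align & (align - 1)) == 0);
    return (off + align - 1) & ~(align - 1);
}

/* Place sections one after another in array order; return the end offset. */
static size_t layout_end(const struct sec *s, size_t n)
{
    size_t off = 0;
    for (size_t i = 0; i < n; i++)
        off = align_up(off, s[i].align) + s[i].size;
    return off;
}

/* qsort comparator: larger alignment first. */
static int by_align_desc(const void *a, const void *b)
{
    const struct sec *x = a, *y = b;
    return (x->align < y->align) - (x->align > y->align);
}

/* Sort the array in place by descending alignment, then lay it out. */
static size_t layout_end_sorted(struct sec *s, size_t n)
{
    qsort(s, n, sizeof *s, by_align_desc);
    return layout_end(s, n);
}
```

For the sequence int, char, int, char, int (sizes 4, 1, 4, 1, 4 with
matching alignments), layout_end() gives 20 bytes while
layout_end_sorted() gives 14: each char lodged between two ints costs 3
bytes of padding, and sorting removes both gaps.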

Thanks for working on this topic.


