Date: Tue, 19 Feb 2019 21:43:13 -0500
From: Rich Felker <>
Subject: Re: Stdio resource usage

On Tue, Feb 19, 2019 at 03:34:52PM -0800, Nick Bray wrote:
> Other than compiler warnings, the main pain point I ran into porting a
> subset of Musl into a resource constrained environment was the resource
> usage of stdio.

For what it's worth, I think this is better described as "printf" than
"stdio". The rest of stdio is utterly tiny.

> I don't expect any of these modifications to make it
> upstream.  Talking out loud as a FYI / user feedback.  Also curious to see
> if there's any wisdom out there.
> Stack usage of stdio was an issue.  On arm64, printf takes 8k of stack
> which is rough when you only have 4-12k of stack.  This is because fmt_fp
> allocates O(log(MAX_LONG_DOUBLE)) stack space.  It also gets
> inlined into printf so you always take the hit.  (noinline fmt_fp is a

This is a known compiler flaw, hoisting large stack allocations, and
one I've complained a lot about but with little luck. It might be
possible to work around it by making the array a VLA, whose size is 1
or the proper size depending on some condition the compiler can't
easily see, but that's rather awful. It might be worth doing though,
given the lack of progress fixing the bug.
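
A minimal sketch of that VLA workaround, not taken from musl: the names
format_demo, BIG_WORDS and opaque_one are hypothetical, and the volatile
variable stands in for "some condition the compiler can't easily see".
Whether a given compiler actually stops hoisting the allocation has to
be checked in the generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the large scratch array a float formatter needs. */
#define BIG_WORDS 1024

/* volatile, so the compiler cannot prove the value and turn the VLA
 * back into a fixed-size array that it could hoist into the caller. */
static volatile size_t opaque_one = 1;

/* Returns the number of bytes of scratch space this call reserved. */
size_t format_demo(int needs_fp)
{
    /* One element on the cheap path, the full size only when
     * floating point formatting is actually requested. */
    size_t n = needs_fp ? BIG_WORDS * opaque_one : opaque_one;
    assert(n > 0); /* a zero-length VLA would be undefined behavior */

    unsigned scratch[n];
    memset(scratch, 0, sizeof scratch);
    return sizeof scratch; /* sizeof a VLA is evaluated at run time */
}
```

With a 4-byte unsigned, format_demo(0) reserves 4 bytes and
format_demo(1) reserves 4096, so integer-only callers no longer pay for
the floating point buffer. VLAs are optional in C11, so this assumes a
compiler that supports them (GCC and Clang do).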

> Faustian bargain that makes stack usage worse in the worst case... hmmm.)
> On arm64, long double is defined as 128 bits, which not only increases
> stack size because of the larger mantissa, but also pulls in software
> emulation for fp128.  In terms of spec compliance, Musl is doing the right
> thing.  But as a practical matter, none of the programs I care about will
> ever use long double.  So my rough first pass was to reduce the max float
> size from long double to double.  In a later pass, I'll also add a knob to
> remove floating point formatting entirely.

It's kinda unfortunate that aarch64 defined long double as IEEE quad
without hardware implementation of it, but it's probably the right
future-facing choice. I was under the impression that aarch64 was
intended mostly for "large" systems, and that you'd use 32-bit arm
(with much smaller code due to thumb) for tiny space-constrained
systems, though.

> %m calls strerror which pulls in a string table, so removing support for %m
> lets static linking and DCE work its magic.

Yes. Note that %m is needed for a conforming syslog(), which was the
motivation for supporting it in printf.

> I also eliminated %n for
> security hardening reasons.

This actually introduces security bugs by breaking the contract. At
some point I believe there may even have been some parts of musl you
would have broken in dangerous ways, though I'm not sure if that's the
case now. If you have a situation where the format string is
non-constant, that, not %n, is the problem.
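
To illustrate the distinction, here is a small sketch with hypothetical
helper names (log_message, label_width). The first passes untrusted
text as an argument to a constant format, which is safe whether or not
%n is supported. The second is the kind of caller that relies on the %n
contract: if %n were silently dropped, width would keep its initial
value and the caller would act on a wrong offset.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Safe: the format is a constant and the untrusted text is only data.
 * Writing snprintf(out, outsz, untrusted) instead would be the real bug. */
int log_message(char *out, size_t outsz, const char *untrusted)
{
    return snprintf(out, outsz, "%s", untrusted);
}

/* Legitimate %n use with a constant format: record how many bytes
 * precede the value field so a caller can align or overwrite it. */
int label_width(char *out, size_t outsz, const char *key)
{
    int width = 0; /* stays 0 if the implementation ignores %n */
    snprintf(out, outsz, "%s: %nvalue", key, &width);
    return width;
}
```

Passing the string "%n%s%x" to log_message copies those six characters
verbatim, and label_width with the key "pid" reports 5, the length of
"pid: ".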

> The "states" structure is sparse and takes a little more memory than I'd
> like: 464 bytes of rodata.  I don't see any workarounds without deeper
> changes, so for now I am living with it.

I think you'd have a hard time fitting the code to use a more
space-efficient data structure (e.g. binary search of a sorted
non-sparse table with pairs rather than just outputs) in less than the
size difference.
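
A rough sketch of the two shapes being compared, using a made-up
six-entry table rather than musl's real one (the T_* values and the
lookup_sparse and lookup_dense names are hypothetical). The sparse form
is a single indexed load; the sorted-pairs form stores less data but
needs the search loop, and that loop's code size is what has to fit
inside the bytes saved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for a handful of conversion characters. */
enum { T_NONE = 0, T_INT, T_UINT, T_STR, T_CHAR, T_PTR };

/* Sparse form: one byte per character in 'A'..'z', mostly zeros. */
static const unsigned char sparse['z' - 'A' + 1] = {
    ['c' - 'A'] = T_CHAR, ['d' - 'A'] = T_INT,  ['p' - 'A'] = T_PTR,
    ['s' - 'A'] = T_STR,  ['u' - 'A'] = T_UINT, ['x' - 'A'] = T_UINT,
};

int lookup_sparse(int c)
{
    if (c < 'A' || c > 'z') return T_NONE;
    return sparse[c - 'A'];
}

/* Dense form: (key, output) pairs sorted by key. */
struct pair { unsigned char key, val; };
static const struct pair dense[] = {
    {'c', T_CHAR}, {'d', T_INT},  {'p', T_PTR},
    {'s', T_STR},  {'u', T_UINT}, {'x', T_UINT},
};

int lookup_dense(int c)
{
    size_t lo = 0, hi = sizeof dense / sizeof dense[0];
    while (lo < hi) { /* binary search over the half-open range [lo, hi) */
        size_t mid = lo + (hi - lo) / 2;
        if (dense[mid].key == c) return dense[mid].val;
        if (dense[mid].key < c) lo = mid + 1;
        else hi = mid;
    }
    return T_NONE;
}
```

Here the sparse table is 58 bytes and the pair table 12, but the saving
only counts if the extra instructions in lookup_dense cost less than the
difference, which is the comparison described above.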

