Date: Mon, 1 Jul 2013 19:59:55 -0400
From: Rich Felker <dalias@...ifal.cx>
To: musl@...ts.openwall.com
Subject: Re: Request for volunteers

On Mon, Jul 01, 2013 at 10:58:57PM +0200, Szabolcs Nagy wrote:
> i was thinking about this and a few category of tests:
> 
> functional:
> 	black box testing of libc interfaces
> 	(eg input-output test vectors)
> 	tries to achieve good coverage
> regression:
> 	tests for bugs we found
> 	sometimes the bugs are musl specific or arch specific so
> 	i think these are worth keeping separately from the general
> 	functional tests (eg same repo but different dir)
> static:
> 	testing without executing code
> 	eg check the symbols and types in headers
> 	(i have tests that cover all posix headers only using cc)
> 	maybe the binaries can be checked in some way as well
> static-src:
> 	code analysis can be done on the source of musl
> 	(eg sparse, cppcheck, clang-analyzer were tried earlier)
> 	(this is different from the other tests and probably should
> 	be set up separately)
> metrics:
> 	benchmarks and quality of implementation metrics
> 	(eg performance, memory usage, the usual checks, but even
> 	ulp error measurements may be in this category)

One thing we could definitely measure here is "given a program that
just uses interface X, how much crap gets pulled in when static
linking?" I think this could easily be automated to cover tons of
interfaces, but in the interest of signal-to-noise ratio, we would
want to manually select interesting interfaces to run it on.
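
For a concrete idea of the probe, I'd imagine a tiny program that
uses just one interface, built -static, with the harness recording
the resulting binary size. Rough, untested sketch; the interface and
the commands in the comment are only examples:

/* Probe program sketch: uses only one interface (snprintf here, as an
 * example), so the size of the statically linked binary shows how much
 * gets pulled in.  The harness could then record something like:
 *   cc -static -Os probe-snprintf.c -o probe-snprintf
 *   size probe-snprintf     (or: nm --size-sort probe-snprintf)
 */
#include <stdio.h>

int main(void)
{
	char buf[32];
	snprintf(buf, sizeof buf, "%d", 42);
	return buf[0] != '4';
}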

> other tools:
> 	a coverage tool would be useful, i'm not sure if anyone
> 	set up one with musl yet (it's not just good for testing,
> 	but also for determining what interfaces are actually in
> 	use in the libc)

Yes. Coverage from real-world apps can also tell us which interfaces
need tests.

> 	clever fuzzer tools would be nice as well, but i don't know
> 	anything that can be used directly on a library (maybe with
> 	small effort they can be used to get better coverage)

Yes, automatically generating meaningful inputs that meet the
interface contracts is non-trivial.

> as a first step i guess we need to start with the functional
> and regression tests

Agreed, these are the highest priority.

> design goals of the test system:
> - tests should be easy to run, even a single test in isolation
> (so test should be self contained if possible)

Agreed. This is particularly important when trying to fix something
that broke a test, or when testing a new port (since it may be hard to
get the port working enough to test anything if failure of one test
prevents seeing the results of others).

> - output is a report, failure cause should be clear

This would be really nice.

> - the system should not have external dependencies
> (other than libc, posix sh, gnu make: so tests are in .c files with
> simple buildsystem or .sh wrapper)

Agreed.

> - the failure of one test should not interfere with other tests
> (so tests should be in separate .c files each with main() and
> narrow scope, otherwise a build failure can affect a lot of tests)

How do you delineate what constitutes a single test? For example, we
have hundreds of test cases for scanf, and it seems silly for each
input/format combination to be a separate .c file. On the other hand,
my current scanf tests automatically test both byte and wide versions
of both string and stdio versions of scanf; it may be desirable in
principle to separate these into 4 separate files.

My naive feeling would be that deciding "how much can go in one test"
is not a simple rule we can follow, but requires considering what's
being tested, how "low-level" it is, and whether the expected failures
might interfere with other tests. For instance, a test that's looking
for out-of-bounds accesses would not be a candidate for doing a lot in
a single test file, but a test that's merely looking for correct
parsing could possibly get away with testing lots of assertions in a
single file.
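
To make that concrete, a parsing-type test could be one self-contained
.c file with a main() and lots of assertions, each failure printed with
enough context to diagnose it. Rough, untested sketch; the TEST macro
and the particular cases are only illustrative:

/* Self-contained scanf-style test sketch: many assertions in one file,
 * each failure reported with file/line and the offending values. */
#include <stdio.h>
#include <string.h>

static int failures;

#define TEST(cond, ...) do { if (!(cond)) { failures++; \
	printf("FAIL %s:%d: ", __FILE__, __LINE__); \
	printf(__VA_ARGS__); putchar('\n'); } } while (0)

int main(void)
{
	int n = 0, cnt;
	char s[16] = "";

	cnt = sscanf("42 abc", "%d %15s", &n, s);
	TEST(cnt == 2, "got %d conversions, want 2", cnt);
	TEST(n == 42, "n is %d, want 42", n);
	TEST(!strcmp(s, "abc"), "s is \"%s\", want \"abc\"", s);

	cnt = sscanf("abc", "%d", &n);
	TEST(cnt == 0, "matching failure returned %d, want 0", cnt);

	return !!failures;
}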

> - the test system should run on all archs
> (so arch specific and implementation defined things should be treated
> carefully)

It should also run on all libcs, I think, with tests for unsupported
functionality possibly failing at build time.
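
For the optional-feature cases, one way to get the build-time failure
is just guarding on the POSIX option macros. Rough sketch; barriers
are only an example:

/* Sketch: turn "unsupported interface" into a build failure via the
 * option macros, rather than a confusing runtime failure. */
#include <unistd.h>

#if !defined(_POSIX_BARRIERS) || _POSIX_BARRIERS <= 0
#error "barriers not supported; test does not apply"
#endif

#include <pthread.h>

int main(void)
{
	pthread_barrier_t b;
	if (pthread_barrier_init(&b, 0, 1)) return 1;
	pthread_barrier_wait(&b);	/* count 1: returns immediately */
	return pthread_barrier_destroy(&b) != 0;
}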

> - the test results should be robust
> (failures are always reported, deterministically if possible)

I would merely add that this is part of the requirement of minimal
dependency. For example, if you have a fancy test framework that uses
stdio and malloc all over the place in the same process as the test,
it's pretty hard to test stdio and malloc robustly...
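
E.g. a malloc or stdio test could keep its reporting down to raw
write() and _exit(), so the reporting path can't be taken out by the
thing under test. Rough sketch:

/* Sketch: report with write(2)/_exit() only, so a malloc/stdio test
 * does not depend on malloc/stdio for its own reporting, and nothing
 * is lost in stdio buffers if the test crashes. */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static void fail(const char *msg)
{
	write(2, msg, strlen(msg));
	_exit(1);
}

int main(void)
{
	void *p = malloc(123);
	if (!p) fail("FAIL: malloc(123) returned 0\n");
	memset(p, 0xaa, 123);
	free(p);
	return 0;
}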

> - tests should leave the system in clean state
> (or easily cleanable state)

Yes, I think this mainly pertains to temp files, named POSIX IPC or
XSI IPC objects, etc.
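
For temp files, the usual unlink-early pattern gives us the clean
state for free. Sketch (the template path is just an example):

/* Sketch: create with mkstemp() and unlink() immediately; even if the
 * test crashes, no file is left behind, and the open fd stays usable. */
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	char tmp[] = "/tmp/libc-test-XXXXXX";
	int fd = mkstemp(tmp);
	if (fd < 0) return 1;
	unlink(tmp);
	/* ... exercise read/write/lseek/ftruncate etc. on fd ... */
	close(fd);
	return 0;
}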

> 
> some difficulties:
> - some "test framework" functionality would be nice, but can be
> problematic: eg using nice error reporting function on top of stdio
> may cause loss of info because of buffering in case of a crash

I think any fancy "framework" stuff could be purely in the controlling
and reporting layer, outside the address space of the actual tests. We
may however need a good way for the test to communicate its results to
the framework...
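
The simplest channel is probably just the exit status plus whatever
the test writes to stderr; the controlling layer can then be a
separate process that forks/execs each test and classifies the wait
status. Rough sketch (labels and usage are only illustrative):

/* Sketch of a controlling layer outside the tests' address space:
 * run each test binary in its own process, classify the wait status. */
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static const char *run(const char *path)
{
	pid_t pid = fork();
	if (pid < 0) return "ERROR";
	if (pid == 0) {
		execl(path, path, (char *)0);
		_exit(127);	/* could not even start the test */
	}
	int st;
	if (waitpid(pid, &st, 0) < 0) return "ERROR";
	if (WIFSIGNALED(st)) return "CRASH";
	if (WIFEXITED(st) && WEXITSTATUS(st) == 0) return "PASS";
	return "FAIL";
}

int main(int argc, char **argv)
{
	for (int i = 1; i < argc; i++)
		printf("%-5s %s\n", run(argv[i]), argv[i]);
	return 0;
}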

> - special compiler or linker flags (can be maintained in makefile
> or in the .c files as comments)

One thing that comes to mind where tests may need a lot of "build
system" help is testing the dynamic linker.

> - tests may require special environment, filesystem access, etc
> i'm not sure what's the best way to manage that
> (and some tests may need two different uid or other capabilities)

If possible, I think we should make such tests use Linux containers,
so that they can be tested without elevated privileges. I'm not
experienced with containers, but my feeling is that this is the only
reasonable way to get a controlled environment for tests that need
that sort of thing, without having the user/admin do a lot of sketchy
stuff to prepare for a test.

Fortunately these are probably low-priority and could be deferred
until later.
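
If we do go the container route, one possible shape is an
unprivileged user namespace plus a private mount namespace. This is
Linux-specific and only a rough, untested sketch of the idea:

/* Sketch: a root-ish, private environment without real privileges,
 * via an unprivileged user namespace (needs a kernel that permits
 * unprivileged CLONE_NEWUSER).  Everything here is illustrative. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/mount.h>
#include <unistd.h>

int main(void)
{
	uid_t uid = getuid();

	if (unshare(CLONE_NEWUSER | CLONE_NEWNS)) {
		perror("unshare");
		return 2;	/* perhaps report SKIP rather than FAIL */
	}

	/* map our own uid to 0 inside the new namespace */
	FILE *f = fopen("/proc/self/uid_map", "w");
	if (!f) return 2;
	fprintf(f, "0 %u 1\n", (unsigned)uid);
	if (fclose(f)) return 2;

	/* keep mount changes private, then give the test its own /tmp */
	if (mount(0, "/", 0, MS_REC | MS_PRIVATE, 0) ||
	    mount("none", "/tmp", "tmpfs", 0, 0)) {
		perror("mount");
		return 2;
	}

	/* ... run the part of the test that needed a controlled fs ... */
	return 0;
}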

> - i looked at the bug history and many bugs are in hard to
> trigger cornercases (eg various races) or internally invoke ub
> in a way that may be hard to verify in a robust way

Test cases for race conditions make for one of the most interesting
types of test writing. :-) The main key is that you need to keep a
copy of the buggy version around to test against. Such tests would
not have FAILED or PASSED as possible results, but rather FAILED, or
FAILED TO FAIL. :-)
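
The general shape I'd expect: hammer the operation from a couple of
threads for a bounded number of iterations and report whichever of
those two results you got. Rough sketch, where racy_op() and
check_state() stand in for whatever the actual bug was:

/* Race test sketch: two threads hammer a (hypothetical) racy operation
 * and a consistency check decides FAILED vs FAILED TO FAIL. */
#include <pthread.h>
#include <stdio.h>

#define ITERATIONS 100000

static volatile int broken;

static void racy_op(void) { /* placeholder for the buggy operation */ }
static int check_state(void) { return 1; /* placeholder check */ }

static void *hammer(void *arg)
{
	(void)arg;
	for (int i = 0; i < ITERATIONS && !broken; i++) {
		racy_op();
		if (!check_state()) broken = 1;
	}
	return 0;
}

int main(void)
{
	pthread_t t1, t2;
	pthread_create(&t1, 0, hammer, 0);
	pthread_create(&t2, 0, hammer, 0);
	pthread_join(t1, 0);
	pthread_join(t2, 0);
	printf("%s\n", broken ? "FAILED" : "FAILED TO FAIL");
	return broken;
}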

> - some tests may need significant support code to achieve good
> coverage (printf, math, string handling close to 2G,..)
> (in such cases we can go with simple self-contained tests without
> much coverage, but easy maintenance, or with something
> sophisticated)

I don't follow.

> does that sound right?
> 
> i think i can reorganize my libc tests to be more "scalable"
> in these directions..

:)

Rich
