Date: Tue, 2 Jul 2013 04:19:37 +0200
From: Szabolcs Nagy <nsz@...t70.net>
To: musl@...ts.openwall.com
Subject: Re: Request for volunteers

* Rich Felker <dalias@...ifal.cx> [2013-07-01 19:59:55 -0400]:
> On Mon, Jul 01, 2013 at 10:58:57PM +0200, Szabolcs Nagy wrote:
> > - the failure of one test should not interfere with other tests
> > (so tests should be in separate .c files each with main() and
> > narrow scope, otherwise a build failure can affect a lot of tests)
> 
> How do you delineate what constitutes a single test? For example, we
> have hundreds of test cases for scanf, and it seems silly for each
> input/format combination to be a separate .c file. On the other hand,
> my current scanf tests automatically test both byte and wide versions
> of both string and stdio versions of scanf; it may be desirable in
> principle to separate these into 4 separate files.
> 
> My naive feeling would be that deciding "how much can go in one test"
> is not a simple rule we can follow, but requires considering what's
> being tested, how "low-level" it is, and whether the expected failures
> might interfere with other tests. For instance a test that's looking
> for out-of-bounds accesses would not be a candidate for doing a lot in
> a single test file, but a test that's merely looking for correct
> parsing could possibly get away with testing lots of assertions in a
> single file.

yes, the boundary is not clear, but eg the current pthread
test does too many kinds of things in one file

if the 'hundreds of test cases' can be represented as
a simple array of test vectors then that should go into
one file

if many functions want to use the same test vectors then
at some point it's worth moving the vectors out to a
header file and writing separate tests for the different
functions
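
for example, a sketch of how such a vector-driven file could look
(the vectors, the struct layout and the checks here are made up,
just to show the shape, not actual scanf test cases):

#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* hypothetical sscanf vectors: input, format, expected return and value */
static const struct {
	const char *input;
	const char *fmt;
	int ret;
	int val;
} t[] = {
	{ "42",   "%d", 1, 42 },
	{ "  -7", "%d", 1, -7 },
	{ "x",    "%d", 0, 0 },
};

int main(void)
{
	int err = 0;
	for (size_t i = 0; i < sizeof t / sizeof t[0]; i++) {
		int v = 0;
		int r = sscanf(t[i].input, t[i].fmt, &v);
		if (r != t[i].ret || (r == 1 && v != t[i].val)) {
			dprintf(1, "sscanf(\"%s\", \"%s\"): got %d/%d, want %d/%d\n",
				t[i].input, t[i].fmt, r, v, t[i].ret, t[i].val);
			err++;
		}
	}
	return !!err;
}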

> > some difficulties:
> > - some "test framework" functionality would be nice, but can be
> > problematic: eg using a nice error-reporting function on top of stdio
> > may cause loss of info because of buffering in case of a crash
> 
> I think any fancy "framework" stuff could be purely in the controlling
> and reporting layer, outside the address space of the actual tests. We
> may however need a good way for the test to communicate its results to
> the framework...
> 

the simple approach is to make each test a standalone process that
exits with 0 on success

in the failure case it can use dprintf to print error messages to
stdout (dprintf writes straight to the file descriptor, so nothing
is lost in stdio buffers if the test crashes) and the test system
collects the exit status and the messages
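
ie something like this minimal skeleton (just one possible shape for
the convention, not a fixed api; the TEST macro is made up for
illustration):

#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

static int err;

/* dprintf writes straight to fd 1, so the message is not sitting
   in a stdio buffer if the test crashes right after the check */
#define TEST(c, ...) ((void)((c) || \
	(dprintf(1, "FAIL %s:%d: ", __FILE__, __LINE__), \
	 dprintf(1, __VA_ARGS__), err++, 0)))

int main(void)
{
	TEST(1 + 1 == 2, "arithmetic is broken\n");
	/* exit status 0 means the test passed */
	return !!err;
}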

> > - special compiler or linker flags (can be maintained in makefile
> > or in the .c files as comments)
> 
> One thing that comes to mind where tests may need a lot of "build
> system" help is testing the dynamic linker.

yes

and we need to compile with -lpthread -lm -lrt -l...
if the tests are to work on other libcs too

my current solution is using wildcard rules for building
*_dso.c into .so and *.c into executables, and then adding
extra rules and target-specific make variables:

foo: LDFLAGS+=-ldl -rdynamic
foo: foo_dso.so

the other solution i've seen is to put all the build commands
into the .c file as comments:

//RUN cc -c -o $name.o $name.c
//RUN cc -o $name $name.o
...

and use simple shell scripts as the build system
(dependencies are harder to track this way, but the tests
are more self-contained)

> > - tests may require a special environment, filesystem access, etc
> > i'm not sure what's the best way to manage that
> > (and some tests may need two different uid or other capabilities)
> 
> If possible, I think we should make such tests use Linux containers,
> so that they can be tested without elevated privileges. I'm not
> experienced with containers, but my feeling is that this is the only
> reasonable way to get a controlled environment for tests that need
> that sort of thing, without having the user/admin do a lot of sketchy
> stuff to prepare for a test.
> 
> Fortunately these are probably low-priority and could be deferred
> until later.

ok, skip these for now

> > - i looked at the bug history and many bugs are in hard-to-trigger
> > corner cases (eg various races) or internally invoke UB
> > in a way that may be hard to verify in a robust way
> 
> Test cases for race conditions make for one of the most interesting types
> of test writing. :-) The main key is that you need to have around a
> copy of the buggy version to test against. Such tests would not have
> FAILED or PASSED as possible results, but rather FAILED, or FAILED TO
> FAIL. :-)

hm, we could introduce a third result for tests that try to trigger
some bug but are not guaranteed to do so
(eg failed, passed, inconclusive)
but probably that's more confusing than useful
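
if we did go that way, a test that tries to trigger a race could look
roughly like this; exit code 2 for the inconclusive case is a made-up
convention, and trigger_race_once() is a placeholder for the actual
racy operation:

#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* placeholder: a real test would exercise the racy code path here
   and return 1 if the known bug was observed, 0 otherwise */
static int trigger_race_once(void)
{
	return 0;
}

int main(void)
{
	/* bounded retry budget so the test cannot run forever */
	for (int i = 0; i < 100000; i++)
		if (trigger_race_once()) {
			dprintf(1, "race triggered after %d iterations\n", i);
			return 1; /* failed: the bug is present */
		}
	return 2; /* inconclusive: could not trigger the race */
}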

> > - some tests may need significant support code to achieve good
> > coverage (printf, math, string handling close to 2G,..)
> > (in such cases we can go with simple self-contained tests without
> > much coverage, but easy maintenance, or with something
> > sophisticated)
> 
> I don't follow.

i mean for many small functions there is not much difference between
a simple sanity check and full coverage (eg basename can be thoroughly
tested by about 10 input-output pairs)

but for others there can be a huge difference: eg detailed testing of
getaddrinfo requires non-trivial setup with a dns server etc, so it's
much easier to do some sanity checks like gnulib does; a different
example is rand: a real test would be something like the diehard test
suite, while the sanity check is trivial
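
eg a trivial rand sanity check could be just this (range and seed
determinism only, nothing statistical):

#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	int a[10], err = 0;

	/* same seed must reproduce the same sequence,
	   and values must be in [0,RAND_MAX] */
	srand(1);
	for (int i = 0; i < 10; i++) a[i] = rand();
	srand(1);
	for (int i = 0; i < 10; i++) {
		int r = rand();
		if (r != a[i] || r < 0)
			err++;
	}
	if (err) dprintf(1, "rand: bad range or not deterministic for a fixed seed\n");
	return !!err;
}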

so i'm not sure how much engineering should go into the tests:
go for a small maintainable set that touches as many areas of libc
as possible, or go for extensive coverage and develop various tools
and libs that help set up the environment or generate large sets
of test cases (eg my current math tests are closer to the latter)

if the goal is to execute the test suite as a post-commit hook
then there should be a reasonable limit on resource usage, build and
execution time etc, and this limit affects how the code may be
organized, how errors are reported..
(most test systems i've seen are for simple unit tests: they allow
checking a few constraints and then report errors in a nice way;
however, in the case of libc i'd assume that you want to enumerate the
weird corner cases to find bugs more effectively)
