musl - Re: Re: AVL tree: storing balances instead of heights

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20151210121449.GC23362@port70.net>
Date: Thu, 10 Dec 2015 13:14:50 +0100
From: Szabolcs Nagy <nsz@...t70.net>
To: musl@...ts.openwall.com
Cc: Ed Schouten <ed@...i.nl>
Subject: Re: Re: AVL tree: storing balances instead of heights

* Ed Schouten <ed@...i.nl> [2015-12-10 10:21:45 +0100]:
> 
> I did some more research and it turns out that AVL trees also allow
> you to apply top-down rebalancing instead of bottom-up. The reason for
> this is that at each level rebalancing never affects the child along
> the path, only its sibling. This approach isn't covered in literature
> extensively, but this blog post describes the process quite well:
> 
> http://neil.brown.name/blog/20041124101820
> http://neil.brown.name/blog/20041124141849
> 
> The downside of such an approach is that the tree needs to be
> traversed forward twice, meaning that a naïve implementation would
> need to perform a larger number of comparisons. This can be mitigated
> by storing the path during the first traversal. As the number of nodes
> of the tree can never exceed the number of bytes in the address space,
> we know that two uintptr_t's provide enough bits to store any path.
> 
> I've done some measurements and such an implementation only seems to
> be 7% slower for insertions and 4% slower for deletions when compared
> to my previous recursive implementation. The advantage is that
> recursion is eliminated. Memory use is constant, regardless of the
> depth of the tree.
> 

eliminating recursion is not that important.

you can turn depth*stackframe into depth bits + 1 stackframe
stack usage, but avl tree is very balanced so depth is small
(e.g. < 60 with 47bit address space)

there are plenty libc apis that use more stack than that, so
i think the optimizations only make sense if the code size
remains small (or if you provide a non-standard api that's
reasonable and use it somewhere).

> I also benchmarked an iterative bottom-up implementation. This version
> was about 25% faster for deletions, but had the downside of requiring
> an array of parent pointers on the stack (~750 bytes).
> 
> As usual, here are links to my source files:
> 
> https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/search/tsearch.c
> https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/search/tdelete.c
> https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/search/search_impl.h
> 
> And a link to a spreadsheet with some benchmark results, where I
> measure the amount of time and number of comparisons of 10 million
> insertions and deletions of keys in ascending order:
> 
> https://docs.google.com/spreadsheets/d/1Uy1Kgz9SGf_szH3K4FkyxxioHmBlRu9LsQjDqkBw8v8/edit#gid=0
> 

can you add code size column as well?
(sum of size of x86_64 pic .text+.data+.bss)

i assume freebsd does not balance the tree so it timeouts
(which also shows why this api is not useful in practice)

glibc uses red-black trees which do less balancing operations
so insert/delete may be faster, but lookup is slightly slower.
(but it seems it does more comparisions and that can be slow)

i assume the musl code there already has the tdelete balancing
patch applied and it is compiled with -Os.

performance also depends on the allocator since insert/delete
has to malloc/free (yet another issue with the api) musl's
allocator is i think still better for realtime systems than
the jemalloc used in cloudlibc, but it will have worse average
performance in benchmarks like this.

thanks.
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.