Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180920012411.GZ17995@brightrain.aerifal.cx>
Date: Wed, 19 Sep 2018 21:24:11 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] new tsearch implementation

On Wed, Sep 19, 2018 at 02:02:47AM +0200, Szabolcs Nagy wrote:
> new code that's a bit faster and smaller.
> 
> with simple benchmark of randomly adding/deleting nodes
> new code is about 1.5s vs old 2s on my laptop (many
> tsearch/tdelete operations on a tree size around 1000
> nodes and depth around 10).

> From 57e62eb255d835ba6801f704c4c1aabd28d34804 Mon Sep 17 00:00:00 2001
> From: Szabolcs Nagy <nsz@...t70.net>
> Date: Sun, 21 Aug 2016 20:06:56 +0000
> Subject: [PATCH] new tsearch implementation
> 
> Rewrote the AVL tree implementation:
> 
> - It is now non-recursive with fixed stack usage (large enough for
> worst case tree height). twalk and tdestroy are still recursive as
> that's smaller/simpler.
> 
> - Moved unrelated interfaces into separate translation units.
> 
> - The node structure is changed to use indexed children instead of
> left/right pointers, this simplifies the balancing logic.
> 
> - Using void * pointers instead of struct node * in various places,
> because this better fits the api (node address is passed in a void**
> argument, so it is tempting to incorrectly cast it to struct node **).
> 
> - As a further performance improvement the rebalancing now stops
> when it is not needed (subtree height is unchanged). Otherwise
> the behaviour should be the same as before (checked over generated
> random inputs that the resulting tree shape is equivalent).
> 
> - Removed the old copyright notice (including prng related one: it
> should be licensed under the same terms as the rest of the project).
> 
> .text size of pic tsearch + tfind + tdelete + twalk:
> 
>    x86_64 i386 aarch64  arm mips powerpc ppc64le  sh4 m68k s390x
> old   941  899    1220 1068 1852    1400    1600 1008 1008  1488
> new   857  881    1040  976 1564    1192    1360  736  820  1408

Nice! Given that you seem to have done heavy testing I didn't read the
code with a lot of scrutiny, but it all looks good from a casual
reading.

> ---
>  COPYRIGHT                |   6 --
>  src/search/tdelete.c     |  49 ++++++++++
>  src/search/tdestroy.c    |  13 +--
>  src/search/tfind.c       |  20 ++++
>  src/search/tsearch.c     |  88 +++++++++++++++++
>  src/search/tsearch.h     |  13 +++
>  src/search/tsearch_avl.c | 204 ---------------------------------------
>  src/search/twalk.c       |  22 +++++
>  8 files changed, 196 insertions(+), 219 deletions(-)
>  create mode 100644 src/search/tdelete.c
>  create mode 100644 src/search/tfind.c
>  create mode 100644 src/search/tsearch.c
>  create mode 100644 src/search/tsearch.h
>  delete mode 100644 src/search/tsearch_avl.c
>  create mode 100644 src/search/twalk.c
> 
> diff --git a/COPYRIGHT b/COPYRIGHT
> index 8c3a6d19..b8a76045 100644
> --- a/COPYRIGHT
> +++ b/COPYRIGHT
> @@ -126,12 +126,6 @@ in jurisdictions that may not recognize the public domain.
>  The smoothsort implementation (src/stdlib/qsort.c) is Copyright © 2011
>  Valentin Ochs and is licensed under an MIT-style license.
>  
> -The BSD PRNG implementation (src/prng/random.c) and XSI search API
> -(src/search/*.c) functions are Copyright © 2011 Szabolcs Nagy and
> -licensed under following terms: "Permission to use, copy, modify,
> -and/or distribute this code for any purpose with or without fee is
> -hereby granted. There is no warranty."
> -
>  The x86_64 port was written by Nicholas J. Kain and is licensed under
>  the standard MIT terms.

This part fails to apply because your MUA reencoded the attachment to
latin1. Not a big problem (piping thru iconv -f latin1 fixed it) but
it's something you might want to be aware of. Ideally mutt should have
a way to tell it not to reencode attachments, ever, but this generally
works to avoid the problem assuming your data is UTF-8:

set send_charset="us-ascii:utf-8"

> diff --git a/src/search/tsearch.c b/src/search/tsearch.c
> new file mode 100644
> index 00000000..54f9f699
> --- /dev/null
> +++ b/src/search/tsearch.c
> @@ -0,0 +1,88 @@
> +#include <stdlib.h>
> +#include <search.h>
> +#include "tsearch.h"
> +
> +static inline int height(struct node *n) { return n ? n->h : 0; }
> +
> +static int rot(void **p, struct node *x, int dir /* deeper side */)
> +{
> +	struct node *y = x->a[dir];
> +	struct node *z = y->a[!dir];
> +	int hx = x->h;
> +	int hz = height(z);
> +	if (hz > height(y->a[dir])) {
> +		//   x
> +		//  / \ dir          z
> +		// A   y            / \
> +		//    / \   -->    x   y
> +		//   z   D        /|   |\
> +		//  / \          A B   C D
> +		// B   C
> +		x->a[dir] = z->a[!dir];
> +		y->a[!dir] = z->a[dir];
> +		z->a[!dir] = x;
> +		z->a[dir] = y;
> +		x->h = hz;
> +		y->h = hz;
> +		z->h = hz+1;
> +	} else {
> +		//   x               y
> +		//  / \             / \
> +		// A   y    -->    x   D
> +		//    / \         / \
> +		//   z   D       A   z

These comments trigger -Wcomment since they have lines ending in \ in
a //-form comment. I think the easiest solution is converting them to
the style of /**/ comment used elsewhere, with *'s lined up in each
line.

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.