musl - Re: memchr() performance


Message-ID: <20160918204030.GC15995@brightrain.aerifal.cx>
Date: Sun, 18 Sep 2016 16:40:30 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: memchr() performance

On Sun, Sep 18, 2016 at 08:54:22PM +0200, Georg Sauthoff wrote:
> (please CC me as I am not subscribed to this ML)
> 
> Hello,
> 
> fyi, I've done some benchmarking of different memchr() and std::find()
> versions.
> 
> I also included the memchr() version from musl.
> 
> In general, musl's memchr() implementation doesn't perform better than a
> simple unrolled loop (as used in libstdc++ std::find()) - and that is
> consistent over different CPU generations and architectures.
> 
> On recent Intel CPUs it is even slower than a naive implementation:

Are you assuming vectorization of the naive version by the compiler? I
think it's reasonable to assume that on x86_64, but not on 32-bit x86,
since many users build for a baseline ISA that does not have vector ops
(i486 or i586).
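
For reference, the two scalar variants under discussion can be sketched
roughly as below. This is a minimal illustration only: the names
naive_memchr and unrolled_memchr are hypothetical, the 4x unroll factor
is an assumption, and neither function is copied from musl, libstdc++,
or the linked benchmarks. Whether a given compiler vectorizes either
loop depends on the compiler version, flags, and target ISA, so nothing
here should be read as a claim about generated code.

```c
#include <stddef.h>

/* Hypothetical name: byte-at-a-time search, the "naive" baseline. */
void *naive_memchr(const void *src, int c, size_t n)
{
    const unsigned char *s = src;
    unsigned char ch = (unsigned char)c;

    for (; n; s++, n--)
        if (*s == ch)
            return (void *)s;
    return NULL;
}

/* Hypothetical name: same search, manually unrolled four times.
 * The unroll factor of 4 is an assumption made for illustration. */
void *unrolled_memchr(const void *src, int c, size_t n)
{
    const unsigned char *s = src;
    unsigned char ch = (unsigned char)c;

    /* Main loop: four comparisons per iteration. */
    for (; n >= 4; s += 4, n -= 4) {
        if (s[0] == ch) return (void *)s;
        if (s[1] == ch) return (void *)(s + 1);
        if (s[2] == ch) return (void *)(s + 2);
        if (s[3] == ch) return (void *)(s + 3);
    }
    /* Tail: at most three remaining bytes. */
    for (; n; s++, n--)
        if (*s == ch)
            return (void *)s;
    return NULL;
}
```

Both functions follow the memchr() contract: they return a pointer to
the first matching byte within the first n bytes, or NULL if there is
none.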

> https://gms.tf/stdfind-and-memchr-optimizations.html#measurements
> https://gms.tf/sparc-and-ppc-find-benchmark-results.html
> 
> Of course, on x86, other implementations that use SIMD instructions
> perform even better.

I'm aware that musl's memchr (and, more generally, related functions
like strchr, strlen, etc.) does not perform well, but it's not clear
to me what the right solution is, since the different approaches vary
a lot in how they compare with each other depending on the exact CPU
model and compiler. Improving this situation is probably a big project.
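
To make "different approaches" concrete, one portable alternative to a
byte loop is the word-at-a-time trick: test a whole size_t for a
matching byte with a few arithmetic operations. The sketch below shows
that general technique under stated assumptions. The name swar_memchr
is hypothetical, the word is loaded with memcpy() to sidestep alignment
and aliasing concerns, and this is not a copy of any libc's source.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* A word with 0x01 in every byte, and one with 0x80 in every byte. */
#define ONES  ((size_t)-1 / UCHAR_MAX)
#define HIGHS (ONES * (UCHAR_MAX / 2 + 1))
/* Nonzero if and only if some byte of x is zero. */
#define HASZERO(x) (((x) - ONES) & ~(x) & HIGHS)

/* Hypothetical name: word-at-a-time search for byte c in src[0..n). */
void *swar_memchr(const void *src, int c, size_t n)
{
    const unsigned char *s = src;
    unsigned char ch = (unsigned char)c;
    size_t k = ONES * ch;   /* ch replicated into every byte */

    /* Skip whole words that contain no byte equal to ch.
     * XOR with k turns matching bytes into zero bytes. */
    while (n >= sizeof(size_t)) {
        size_t w;
        memcpy(&w, s, sizeof w);
        if (HASZERO(w ^ k))
            break;
        s += sizeof w;
        n -= sizeof w;
    }
    /* Finish byte by byte: either inside the word that matched,
     * or over the final partial word. */
    for (; n; s++, n--)
        if (*s == ch)
            return (void *)s;
    return NULL;
}
```

How this compares with a plain or unrolled byte loop, or with a SIMD
version, is exactly the kind of thing that has to be measured per CPU
and per compiler rather than assumed.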

Rich
