musl - Re: memchr() performance

Message-ID: <20160918204036.GZ1280@port70.net>
Date: Sun, 18 Sep 2016 22:40:36 +0200
From: Szabolcs Nagy <nsz@...t70.net>
To: Georg Sauthoff <mail@...rg.so>
Cc: musl@...ts.openwall.com
Subject: Re: memchr() performance

* Georg Sauthoff <mail@...rg.so> [2016-09-18 20:54:22 +0200]:
> 
> In general, musl's memchr() implementation doesn't perform better than a
> simple unrolled loop (as used in libstdc++ std::find()) - and that is
> consistent over different CPU generations and architectures.
> 

memchr in musl has not been updated in >5 years, so it probably
should be. the last time this came up the position was

"In the particular case of strlen, the naive unrolled strlen with no
OOB access is actually optimal on most or all 32-bit archs, better
than what we have now. I suspect the same is true for strchr and other
related functions."
http://www.openwall.com/lists/musl/2016/01/05/5

but we did not have benchmark numbers at the time. note that
this benchmark does not measure the cost of the extra branch
prediction slots used in the unrolled case.
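
for reference, a minimal sketch of what a "naive unrolled" memchr with no
OOB access could look like. the name memchr_unrolled and the unroll factor
of 4 are assumptions for illustration; this is not musl's code and not the
libstdc++ std::find() loop from the benchmark.

```c
#include <stddef.h>

/* Hypothetical name; illustrative only, not musl's memchr. */
void *memchr_unrolled(const void *src, int c, size_t n)
{
    const unsigned char *s = src;
    unsigned char ch = (unsigned char)c;

    /* Main loop: four byte compares per iteration. The n >= 4 guard
     * means s[0]..s[3] are always inside the buffer. */
    for (; n >= 4; s += 4, n -= 4) {
        if (s[0] == ch) return (void *)s;
        if (s[1] == ch) return (void *)(s + 1);
        if (s[2] == ch) return (void *)(s + 2);
        if (s[3] == ch) return (void *)(s + 3);
    }

    /* Tail: at most three remaining bytes. */
    for (; n; s++, n--)
        if (*s == ch) return (void *)s;

    return NULL;
}
```

each compare in the unrolled body is its own conditional branch, which is
where the extra branch prediction slots come from.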

> On recent Intel CPUs it is even slower than a naive implementation:
> 
> https://gms.tf/stdfind-and-memchr-optimizations.html#measurements
> https://gms.tf/sparc-and-ppc-find-benchmark-results.html
> 
> Of course, on x86, other implementations that use SIMD instructions
> perform even better.
> 

yes simd is expected to be faster.

but that needs asm which is expensive to maintain (there is no
portable simd language extension for c and there is the aliasing
issue: the reinterpret_cast in your code is formally ub).
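
to illustrate the aliasing point: one way to read a buffer a word at a time
in plain c without casting the byte pointer to a wider type is to copy each
word out with memcpy. the sketch below assumes this approach; the name
memchr_words is hypothetical, and whether the memcpy compiles down to a
single load depends on the compiler and target, so this says nothing about
performance.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Classic "does this word contain a zero byte" bit trick. */
#define ONES  ((size_t)-1 / UCHAR_MAX)       /* 0x0101...01 */
#define HIGHS (ONES * (UCHAR_MAX / 2 + 1))   /* 0x8080...80 */
#define HASZERO(x) (((x) - ONES) & ~(x) & HIGHS)

/* Hypothetical name; illustrative only. */
void *memchr_words(const void *src, int c, size_t n)
{
    const unsigned char *s = src;
    unsigned char ch = (unsigned char)c;
    size_t k = ONES * ch;   /* ch replicated into every byte */

    /* Word loop: memcpy into a local size_t is a well-defined load,
     * unlike dereferencing (const size_t *)s. Only whole words that
     * lie inside the buffer are read. */
    for (; n >= sizeof(size_t); s += sizeof(size_t), n -= sizeof(size_t)) {
        size_t w;
        memcpy(&w, s, sizeof w);
        if (HASZERO(w ^ k))
            break;          /* some byte in this word equals ch */
    }

    /* Byte loop: locates the match inside the word found above, or
     * scans the tail that is shorter than one word. */
    for (; n; s++, n--)
        if (*s == ch) return (void *)s;

    return NULL;
}
```

this also needs no alignment handling, since memcpy has no alignment
requirement on its source.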

> Best regards
> Georg
