Date: Sun, 18 Sep 2016 22:40:36 +0200
From: Szabolcs Nagy <nsz@...t70.net>
To: Georg Sauthoff <mail@...rg.so>
Cc: musl@...ts.openwall.com
Subject: Re: memchr() performance

* Georg Sauthoff <mail@...rg.so> [2016-09-18 20:54:22 +0200]:
>
> In general, musl's memchr() implementation doesn't perform better than a
> simple unrolled loop (as used in libstdc++ std::find()) - and that is
> consistent over different CPU generations and architectures.
>

memchr in musl was never updated (same for >5 years)
so probably should be and last time the position was

"In the particular case of strlen, the naive unrolled strlen with no
OOB access is actually optimal on most or all 32-bit archs, better
than what we have now. I suspect the same is true for strchr and
other related functions."
http://www.openwall.com/lists/musl/2016/01/05/5

but we did not have benchmark numbers at the time..

note that this benchmark does not measure the effect
of more branch prediction slots used in the unrolled
case.

> On recent Intel CPUs it is even slower than a naive implementation:
>
> https://gms.tf/stdfind-and-memchr-optimizations.html#measurements
> https://gms.tf/sparc-and-ppc-find-benchmark-results.html
>
> Of course, on x86, other implementations that use SIMD instructions
> perform even better.
>

yes simd is expected to be faster.

but that needs asm which is expensive to maintain
(there is no portable simd language extension for c
and there is the aliasing issue: the reinterpret_cast
in your code is formally ub).

> Best regards
> Georg
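
for illustration only, a minimal sketch of the two approaches
discussed above. this is not the code from musl, libstdc++ or the
linked benchmark: memchr_unrolled and memchr_word are hypothetical
names, the 4x unroll factor is an arbitrary choice, and 8-bit bytes
are assumed. the first function is a naive unrolled byte loop with
no oob access; the second scans a word at a time but loads each word
with memcpy() instead of casting to a wider pointer type, which is
one way to sidestep the aliasing problem in plain c.

```c
#include <stddef.h>
#include <string.h>

/* hypothetical name: naive 4x unrolled byte loop, never reads past s+n */
static void *memchr_unrolled(const void *src, int c, size_t n)
{
    const unsigned char *s = src;
    unsigned char ch = (unsigned char)c;

    for (; n >= 4; s += 4, n -= 4) {
        if (s[0] == ch) return (void *)s;
        if (s[1] == ch) return (void *)(s + 1);
        if (s[2] == ch) return (void *)(s + 2);
        if (s[3] == ch) return (void *)(s + 3);
    }
    /* tail: fewer than 4 bytes left */
    for (; n; s++, n--)
        if (*s == ch) return (void *)s;
    return NULL;
}

/* hypothetical name: word-at-a-time scan, assumes CHAR_BIT == 8.
 * memcpy() into a local size_t replaces the pointer cast, so no
 * object is accessed through an incompatible type. */
static void *memchr_word(const void *src, int c, size_t n)
{
    const unsigned char *s = src;
    unsigned char ch = (unsigned char)c;
    const size_t ones = (size_t)-1 / 0xff;  /* 0x0101...01 */
    const size_t highs = ones * 0x80;       /* 0x8080...80 */
    const size_t k = ones * ch;             /* ch repeated in every byte */

    for (; n >= sizeof(size_t); s += sizeof(size_t), n -= sizeof(size_t)) {
        size_t w;
        memcpy(&w, s, sizeof w);
        w ^= k;                             /* bytes equal to ch become 0 */
        if ((w - ones) & ~w & highs)        /* nonzero iff some byte is 0 */
            break;
    }
    /* locate the exact byte in the matching word, or scan the tail */
    for (; n; s++, n--)
        if (*s == ch) return (void *)s;
    return NULL;
}
```

whether the compiler turns the memcpy() into a single load, and
whether either variant beats the other, depends on the target and
has to be measured.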