Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sat, 11 Feb 2023 17:36:42 +0800 (CST)
From: "David Wang" <00107082@....com>
To: musl@...ts.openwall.com
Subject: Re:Re: Re:Re: qsort





At 2023-02-11 17:22:49, "Markus Wichmann" <nullplan@....net> wrote:

>
>Getting back to this, with the benchmarks recently performed, I notice
>one area where C++ is able to significantly outperform C, and that is in
>the area of generic functions. C++ std::sort is a template function, and
>the compiler can optimize the code for the situation. For example, in
>your case, you only ever need to swap two elements by way of copying
>four bytes, which the compiler can just inline.
>
Yes, I tried -O2 with g++ sort, its performance is much better.






>In C, the same code can only call memcpy(), and a call to cycle() will
>involve many calls to memcpy(), and they cannot be inlined since the
>compiler will not know the copy size at the time it is compiling the
>calls. We end up with tons of calls to memcpy() with a size of four,
>when memcpy() is optimized for large sizes.
>
>Merge sort, as glibc implements it, allows at least a few memcpy() calls
>with larger sizes. Glibc also avoids calling memcpy by duplicating the
>code for common sizes, only resorting to memcpy() for the default case.
>That may be an avenue worth pursuing, at least for cycle().
>


Totally agree with this, lots of memcpy function call overhead could be avoid.




>I'm not quite certain why glibc chose the sizes they chose for
>msort_with_tmp(), but they are branching out for 32-bit integers, 64-bit
>integers, longs (which in a System V ABI are always one of the former
>two) and void* (which in a System V ABI are always the same as a long).
>I think special casing 32-bit and 64-bit integers ought to get us most
>of the way there, since those are the most common sizes that can still
>be copied simply inline.
>
>Also note that glibc further reduces the need for memcpy() by sorting
>indirectly if the size of a single element exceeds 32 bytes. That is of
>course something they can do, since they already are allocating memory
>and have a fall-back strategy in place. Not sure if this is worthwhile
>for musl. At least in the static linking case, it would suddenly pull
>malloc() and free() into executables that previously did not have those.
>
>That last optimization would also not have any bearing on the current
>batch of benchmarks, since in those, a size of four is set in stone.
>
>Ciao,
>Markus

Content of type "text/html" skipped

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.