Date: Thu, 9 Apr 2015 08:50:24 +0200
From: Daniel Cegiełka <daniel.cegielka@...il.com>
To: musl@...ts.openwall.com
Cc: John Mudd <johnbmudd@...il.com>
Subject: Re: musl perf, 20% slower than native build?

2015-04-08 22:59 GMT+02:00 Paul Schutte <sjpschutte@...il.com>:
> Hi Daniel,
>
> Pardon my stupidity, but with what did you replace the memcpy ?

I use a memcpy better suited to my CPU. memcpy latency was very
important to me because it had a big impact on total latency (in my
code). I suspect that most of the latency problems have their cause
in musl's memcpy. This is quite a complex topic, because the optimal
memcpy code depends on how large the copied blocks of memory are.
Sometimes SSE2 will be faster and sometimes AVX2, but heavily
optimized code is not portable (e.g. AVX2), and that is a problem.
Fast memcpy implementations usually use CPUID to choose the right
code path, but such code is bloated and ugly.

Daniel


> Regards
> Paul
>
> On Wed, Apr 8, 2015 at 9:28 PM, Daniel Cegiełka <daniel.cegielka@...il.com>
> wrote:
>>
>> 2015-04-08 21:10 GMT+02:00 John Mudd <johnbmudd@...il.com>:
>>
>> > Here's output from perf record/report for libc. This looks consistent
>> > with
>> > the 5% longer run time.
>> >
>> > native:
>> >      2.20%   python  libc-2.19.so         [.] __memcpy_ssse3
>>
>> >
>> > musl:
>> >      4.74%   python  libc.so              [.] memcpy
>>
>> I was able to get a twofold speed-up (in my code) just by replacing
>> memcpy in musl.
>>
>> Daniel
>
>
