Message-ID: <2296.84.188.252.232.1146177354.squirrel@www.jpberlin.de>
Date: Fri, 28 Apr 2006 00:35:54 +0200 (CEST)
From: sebastian.rother@...erlin.de
To: john-users@...ts.openwall.com
Subject: Re: Performance tuning
> It's not registers which "perform". There are x86/MMX or x86-64/SSE
> instructions which are translated into one or more micro-ops. Some of
> those micro-ops may have latencies of greater than 1 cycle. Both
> micro-op counts and their latencies might differ for micro-ops generated
> for x86/MMX vs. x86-64/SSE. That's the theory - to answer your question
> ("how can it be true").
Interesting :)
> However, I've based my brief analysis primarily on the actual benchmarks
> I had performed. According to those benchmarks, MMX bitwise ops deliver
> better performance per-bit than SSE ones do, despite SSE registers being
> twice wider, on Pentium 3 and on AMD processors - but SSE is actually
> somewhat faster than MMX per-bit on Pentium 4 processors. In other
> words, SSE instructions perform more than twice slower than MMX ones do
> on P3 and AMD, but less than twice slower on P4. Of course, this may
> change with future processors of either or both vendors.
Wouldn't it be better to benchmark (during compilation) on the CPU itself
which kind of implementation performs faster?
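A minimal sketch of that build-time selection idea, written in Python only to show the logic. The two XOR kernels are hypothetical stand-ins for the MMX and SSE code paths, not the real ones, and all names here (`pick_fastest`, `best_time`, `CANDIDATES`) are assumptions made for illustration. Each candidate is timed on the build machine and the fastest one is chosen.

```python
import time


def xor_bytes_wordwise(a: bytes, b: bytes) -> bytes:
    """Stand-in for a 'wide' kernel: XOR both buffers as one big integer."""
    n = len(a)
    x = int.from_bytes(a, "little") ^ int.from_bytes(b, "little")
    return x.to_bytes(n, "little")


def xor_bytes_bytewise(a: bytes, b: bytes) -> bytes:
    """Stand-in for a 'narrow' kernel: XOR one byte at a time."""
    return bytes(x ^ y for x, y in zip(a, b))


def best_time(kernel, a, b, repeats=5, loops=20):
    """Return the best (lowest) wall-clock time over several repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(loops):
            kernel(a, b)
        best = min(best, time.perf_counter() - start)
    return best


def pick_fastest(candidates, a, b):
    """Time every candidate on this machine and return (winner, timings)."""
    timings = {name: best_time(fn, a, b) for name, fn in candidates.items()}
    return min(timings, key=timings.get), timings


# Hypothetical candidate table; a real build would list its actual kernels.
CANDIDATES = {"wordwise": xor_bytes_wordwise, "bytewise": xor_bytes_bytewise}

if __name__ == "__main__":
    a = bytes(range(256)) * 64            # equal-length test buffers
    b = bytes(reversed(range(256))) * 64
    winner, timings = pick_fastest(CANDIDATES, a, b)
    print("selected:", winner)
    for name, seconds in timings.items():
        print(f"  {name}: {seconds:.6f} s")
```

A build script could then pass the selected name to the compiler as a preprocessor define. Taking the best of several repeats rather than the average reduces the influence of other load on the machine during the build.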
You may have benchmarked it on AMD CPUs, but also on AMD64-based CPUs?
The K7 and K8 families are not the same, and I believe you if you say it
does not perform that well on an AMD K7 CPU.
Please point out the CPU families you tested (also, P4 != P4: which core?
And maybe also important: which stepping?).
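For reporting exactly which CPU a benchmark ran on, family, model and stepping can be read from the processor signature that the CPUID instruction (leaf 1) returns in EAX. The following is a sketch of the bit-field decoding only; the function name `decode_cpuid_signature` is hypothetical, and the signature values in the usage note are illustrative inputs rather than results from this thread.

```python
def decode_cpuid_signature(eax: int) -> dict:
    """Decode family/model/stepping from the CPUID leaf 1 EAX value."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is added only when the base family is 0xF.
    if base_family == 0xF:
        family = base_family + ext_family
    else:
        family = base_family

    # Assumption: Intel's rule is applied here, which uses the extended
    # model for base family 0x6 as well as 0xF. AMD documents it only
    # for base family 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model

    return {"family": family, "model": model, "stepping": stepping}


if __name__ == "__main__":
    for signature in (0x0F29, 0x06A0):
        print(hex(signature), decode_cpuid_signature(signature))
```

For example, the signature 0x0F29 decodes to family 15, model 2, stepping 9. On Linux the same three values are also listed in /proc/cpuinfo as "cpu family", "model" and "stepping".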
>> Related to the Co-Processors:
>
> Sebastian, Frank - thank you for the links. I'll have a look a bit
> later and comment in here if appropriate.
No problem ;)
Kind regards,
Sebastian