Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
Date: Fri, 28 Apr 2006 00:48:49 +0400
From: Solar Designer <>
Subject: Re: Performance tuning


You're so religious (this time as it relates to AMD vs. Intel) that
you've missed my point. ;-)

I wrote, speaking of the possibility of taking advantage of the 16
(rather than just 8) SSE registers available on x86-64:

> > The extra registers are indeed very helpful, but the
> > slowdown with the move from MMX to SSE on AMD processors is bad enough
> > that the extra registers, if used to reduce the instruction count and/or
> > to avoid dependencies, would barely compensate for it (of course, this
> > is just my guesstimate).
> >
> > Perhaps this is worth doing for EM64T and for future AMD processors.

I expect _no_ significant speedup (if any) from the use of SSE in
64-bit mode on _current_ AMD64 processors (those I've run benchmarks
on), compared to running MMX code on those processors.  It's roughly
like this: SSE-bitslice-DES is 20% slower than MMX-bitslice-DES on
current AMD processors (both 32- and 64-bit); 16regsSSE-bitslice-DES
might be 20% faster than 8regsSSE-bitslice-DES, so we arrive at
roughly the same performance that we already have with MMX.  Of
course, that's just a guesstimate.

For _current_ EM64T processors, things are different.  On Intel P4
processors, including those with EM64T, SSE-bitslice-DES is already
faster than MMX-bitslice-DES - and 16regsSSE-bitslice-DES might be
faster by another 20%.

Thus, the use of 16 SSE registers is likely beneficial for current EM64T
processors, maybe for future AMD64 processors, but likely not for
current AMD64 processors.

That's my way of thinking.  No religion or politics involved.

(In the above analysis, I considered the trivial conversion from MMX to
SSE only.  It is possible that certain optimizations would make the SSE
code faster on AMD and/or Intel processors.  But this is currently an
unknown for processors of either vendor.)

On Thu, Apr 27, 2006 at 09:49:16PM +0200, wrote:
> For AMD motherboards there's a co-processor available which is
> compatible with the AMD sockets and which is more powerful than an FPGA.
> I don't remember the company anymore, but they produce programmable CPUs
> which can be installed on, e.g., a dual-CPU mainboard (one AMD CPU, one
> co-processor).
> These CPUs are programmable, but they're NOT limited by the PCI bus (like
> FPGA-based cards via PCI). So you could speed up some stuff a lot using
> those co-processors... :)

Please post specific references.


Alexander Peslyak <solar at>
GPG key ID: B35D3598  fp: 6429 0D7E F130 C13E C929  6447 73C3 A290 B35D 3598 - bringing security into open computing environments
