Date: Thu, 25 Dec 2014 12:46:35 +0100
From: magnum <>
Subject: Re: bcrypt BF_X2=3 is not always best

On 2014-12-25 03:14, Solar Designer wrote:
> The 3x interleaving works significantly better than 2x for Intel
> x86-64 CPUs without Hyperthreading (such as Core 2 Duo/Quad), but is
> usually of little help or sometimes even hurts speeds on CPUs that
> are capable of running 2 threads/core.

> I don't know how/whether we can reasonably detect which BF_X2 setting is
> best.  Running benchmarks at build- or run-time is unstable or slow,
> given the variance seen under light unrelated load.  And these would have
> to be full OpenMP benchmarks, because relative speeds are different when
> running only 1 thread.

I think we should use a shared cpu_detect() function for x86, so we can
detect HT, XOP/AVX and other things at run time. Another thing that can
differ a lot between different CPU types is what we usually call
OMP_SCALE - for the nt2 format I believe 1M is best on Bull while just
4K is best on Core i7. Currently that selection is made at build time,
based on the __XOP__ and __AVX__ macros.
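
To make the run-time idea concrete, here is a rough sketch of what such a shared detection helper might look like. It is not the existing code in x86.S; the names decode_cpuid(), detect_cpu_features() and pick_interleave() are made up for illustration, and it assumes GCC or Clang, whose <cpuid.h> provides __get_cpuid(). The 2-vs-3 choice simply mirrors the observation quoted above and would need real benchmarks behind it.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>              /* GCC/Clang helper: __get_cpuid() */
#define HAVE_X86_CPUID 1
#else
#define HAVE_X86_CPUID 0
#endif

/* Hypothetical feature record, for illustration only. */
struct cpu_features {
    int htt;    /* CPUID.1:EDX bit 28 - package may expose >1 logical CPU */
    int avx;    /* CPUID.1:ECX bit 28 - CPU supports AVX */
    int xop;    /* CPUID.80000001h:ECX bit 11 - AMD XOP */
};

/* Decode raw CPUID register values into feature flags.
 * Kept separate from the CPUID call so it can be tested anywhere. */
static struct cpu_features decode_cpuid(unsigned int leaf1_ecx,
                                        unsigned int leaf1_edx,
                                        unsigned int ext1_ecx)
{
    struct cpu_features f;

    f.htt = (leaf1_edx >> 28) & 1;
    f.avx = (leaf1_ecx >> 28) & 1;
    f.xop = (ext1_ecx >> 11) & 1;
    return f;
}

/* Query the running CPU. On non-x86 builds, or if a leaf is not
 * supported, the corresponding flags are reported as 0.
 * Caveats: the HTT bit does not prove SMT is actually enabled, and the
 * AVX bit alone does not prove the OS saves YMM state (that needs an
 * OSXSAVE/XGETBV check, omitted here). */
static struct cpu_features detect_cpu_features(void)
{
    unsigned int eax = 0, ebx = 0;
    unsigned int ecx1 = 0, edx1 = 0, ecx_ext = 0, edx_ext = 0;

#if HAVE_X86_CPUID
    if (!__get_cpuid(1, &eax, &ebx, &ecx1, &edx1))
        ecx1 = edx1 = 0;
    if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx_ext, &edx_ext))
        ecx_ext = 0;
#endif
    (void)eax; (void)ebx; (void)edx_ext;
    return decode_cpuid(ecx1, edx1, ecx_ext);
}

/* Assumed policy, taken from the observation quoted above:
 * 3x interleaving without HT, 2x when 2 threads/core are possible. */
static int pick_interleave(const struct cpu_features *f)
{
    return f->htt ? 2 : 3;
}
```

A format's init code could then call detect_cpu_features() once and use pick_interleave() (and something similar for OMP_SCALE) instead of relying only on build-time macros. Since BF_X2 is a compile-time setting today, that would also mean building more than one code path and selecting between them at run time.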

Looking at x86.S we already have this function... is it usable as-is?

