Date: Mon, 7 Sep 2015 12:39:10 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: SIMD_PARA_SHA256

On Mon, Sep 07, 2015 at 09:31:27AM +0200, magnum wrote:
> On 2015-09-06 21:59, Solar Designer wrote:
> >We should increase SIMD_PARA_SHA256 from 1 to 2 at least for XOP builds.
> 
> magnum@...l:src [bleeding-jumbo]$ ./testparas.pl
> gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
> This will take a while.
> Initial configure...
> Initial build...
> (...)
> gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
> John the Ripper 1.8.0.6-jumbo-1-999-gcc5ae47 OMP [linux-gnu 64-bit XOP-ac]
> 
> hash\para  |       1  |       2  |       3  |       4  |       5  |
> -----------|----------|----------|----------|----------|----------|
> md4        |   18648  |   34096  |   35136  |   35072  | **38140**|
> md4-omp    |  103072  |**171904**|  162344  |  155776  |  152000  |
> md5        |   13452  |   25320  |   24744  |   25936  | **27560**|
> md5-omp    |   74272  |**125184**|  111936  |  106496  |  103360  |
> sha1       |   10808  | **16752**|   15456  |   14848  |   12780  |
> sha1-omp   | **56928**|   54144  |   47808  |   45056  |   42240  |
> sha256     |    3736  |  **7104**|    6444  |    3968  |    1722  |
> sha256-omp |   20736  | **25984**|   23424  |   17280  |    9254  |
> sha512     |    3044  |  **3212**|    2850  |     982  |     683  |
> sha512-omp | **11264**|   10784  |    9504  |    4992  |    3722  |
> 
> $ ../run/john --list=build-info | grep interleaving
> SIMD: XOP, interleaving: MD4:2 MD5:2 SHA1:1 SHA256:1 SHA512:1
> 
> Perhaps we should do so for SHA-1 too if built without OpenMP (unless 
> that is already the case).

This is not obvious.  For fast hashes, we currently disable OpenMP and
suggest use of --fork, and I expect that --fork will show similar
behavior to OpenMP in terms of which interleaving factors are optimal.
Maybe we should test --fork=8 explicitly and decide based on that.
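
However the numbers are gathered (OpenMP or --fork=8), the decision itself is just "pick the factor with the highest rate".  A minimal sketch of that, using two rows copied from the table quoted above; best_para() and the array names are made up for illustration and are not part of the tree:

```c
#include <stddef.h>

/*
 * Return the 1-based interleaving factor with the highest rate.
 * rates[i] is the benchmark result for factor i + 1.
 * Ties go to the smaller factor; returns 0 if n == 0.
 */
int best_para(const unsigned long *rates, size_t n)
{
	size_t i, best = 0;

	if (n == 0)
		return 0;
	for (i = 1; i < n; i++)
		if (rates[i] > rates[best])
			best = i;
	return (int)best + 1;
}

/* Rows copied from the testparas.pl output quoted above (XOP build) */
const unsigned long sha256_omp[] = { 20736, 25984, 23424, 17280, 9254 };
const unsigned long sha1_omp[] = { 56928, 54144, 47808, 45056, 42240 };
```

Feeding it --fork=8 results instead of the OpenMP rows would only change the input arrays, not the logic.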

Unfortunately, we have an instance of the problem Frank is complaining
about for bcrypt.  On many CPUs, the optimal interleaving factors for
these hashes will vary depending on whether we run 1 or 2 threads/core,
and thus they will also vary for HT-capable CPUs vs. not even when the
maximum number of threads or processes is run.  Preferably, we'd have
some generic workaround for that - such as determining HT support at
configure time and using different interleaving factors accordingly, or
including code for both and choosing at runtime.
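
To illustrate the runtime variant, here is a rough sketch, not a proposal for the actual code.  It assumes Linux, where /sys/devices/system/cpu/cpu0/topology/thread_siblings_list lists the logical CPUs sharing a core with cpu0 (e.g. "0,4" or "0-1").  Whether a Bulldozer module is reported that way depends on the kernel, and I have not checked.  All function and struct names are hypothetical, and the example factors are read off the XOP table quoted above on the assumption that the OpenMP rows ran 2 threads per module:

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Count CPUs in a Linux "cpulist" string such as "0,4" or "0-3,8-11".
 * Returns 0 on malformed or empty input.
 */
int count_cpu_list(const char *s)
{
	int count = 0;

	while (*s && *s != '\n') {
		char *end;
		long lo = strtol(s, &end, 10), hi;

		if (end == s)
			return 0;
		hi = lo;
		if (*end == '-') {	/* range "lo-hi" */
			s = end + 1;
			hi = strtol(s, &end, 10);
			if (end == s || hi < lo)
				return 0;
		}
		count += (int)(hi - lo + 1);
		s = end;
		if (*s == ',')
			s++;
		else if (*s && *s != '\n')
			return 0;
	}
	return count;
}

/* Threads per core as seen from cpu0; falls back to 1 if unknown */
int detect_threads_per_core(void)
{
	char buf[256];
	int n = 0;
	FILE *f = fopen(
	    "/sys/devices/system/cpu/cpu0/topology/thread_siblings_list", "r");

	if (f) {
		if (fgets(buf, sizeof(buf), f))
			n = count_cpu_list(buf);
		fclose(f);
	}
	return n > 0 ? n : 1;
}

/* Hypothetical per-hash tuning record: best factor for each case */
struct para_choice {
	int one_thread;		/* 1 thread per core */
	int two_threads;	/* 2 threads per core */
};

int pick_para(const struct para_choice *c, int threads_per_core)
{
	return threads_per_core >= 2 ? c->two_threads : c->one_thread;
}

/* Example values from the XOP table above (non-OpenMP vs. OpenMP rows) */
const struct para_choice sha1_xop = { 2, 1 };
const struct para_choice sha256_xop = { 2, 2 };
```

A caller would then do something like pick_para(&sha1_xop, detect_threads_per_core()) once at startup.  The configure-time variant would run the same detection in the configure script and bake the result into the SIMD_PARA_* defines instead.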

Luckily, XOP pretty much implies being able to run 2 threads or
processes per module at this time.  But chances are the issue is also
seen for Intel CPUs, which is why I talk about HT above.

> >I didn't test this on other machines (nor without XOP).
> 
> Should we consider bull representative for XOP?

Yes.  We don't currently have another XOP machine to test on, and I
think not much changed since Bulldozer.  Also, I think AMD's
APU-specific microarchitectures don't have XOP, but I might be wrong.

> Do you have some gcc-4.9 for bull already built that we could try too?

Not already built, but I can build it easily.  Why 4.9 and not 5.x?

> I'll do some new AVX and AVX2 tests.

Thanks!

Do you have pre-AVX machines to tune SSE2 and SSE4.1 interleaving on?
I expect similar tradeoffs there, for Core 2-like CPUs (no HT) vs. early
Core i7 (HT).

Alexander
