Date: Sun, 26 Jul 2015 15:40:18 +0200
From: Solar Designer <>
Subject: Re: PHC: Lyra2 vs yescrypt benchmarks 2

On Sun, Jul 26, 2015 at 03:15:43PM +0200, Agnieszka Bielec wrote:
> 2015-07-26 2:31 GMT+02:00 Solar Designer <>:
> > On Sat, Jul 25, 2015 at 10:56:42PM +0200, Agnieszka Bielec wrote:
> >> a@...l:~/m/run$ ./john --test --format=lyra2
> >> Will run 8 OpenMP threads
> >> Benchmarking: Lyra2 [Blake2 AVX2]... (8xOMP) DONE
> >
> > Does this build actually use AVX2?  If so, how much slower is an
> > AVX-only build?
> nope :<, my bad,

Ouch.  I'll need to communicate a correction to the PHC community, then.

> I was thinking that it uses AVX2 because Lyra2 uses
> blake2b which has some instructions in SSE4_1

Huh?!  Do you understand how SSE2, SSE4.1, AVX, and AVX2 correspond to
each other and in what ways they differ?  Can you please explain your
understanding to me, so that I see if it's correct or where exactly it
is wrong.  You sound confused pretty badly here.  Perhaps not enough
assembly output reading on your part. ;-)
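
For reference, here is a minimal sketch of how one might check which SIMD
level a build actually targets. It assumes GCC or Clang, which predefine
__SSE2__, __SSE4_1__, __AVX__, and __AVX2__ according to the -m flags in
effect (each higher level also defines the lower ones). The function name
simd_level() is made up for illustration and is not part of JtR or Lyra2.

```c
#include <stdio.h>

/*
 * Hypothetical helper: report the highest x86 SIMD level the compiler
 * was allowed to target.  Assumes GCC/Clang predefined macros, which
 * follow the -m flags (e.g. -msse4.1, -mavx, -mavx2, -march=native).
 */
static const char *simd_level(void)
{
#if defined(__AVX2__)
    return "AVX2";   /* 256-bit integer SIMD; implies AVX */
#elif defined(__AVX__)
    return "AVX";    /* VEX encoding, 256-bit floating-point; integer SIMD stays 128-bit */
#elif defined(__SSE4_1__)
    return "SSE4.1"; /* extra 128-bit instructions on top of SSE2..SSSE3 */
#elif defined(__SSE2__)
    return "SSE2";   /* baseline 128-bit integer SIMD */
#else
    return "none";
#endif
}
```

Printing simd_level() from the build under test (or grepping the
disassembly for ymm registers) would show whether a "Blake2 AVX2" label is
justified. A test such as defined(__SSE4_1__) is also true in AVX and AVX2
builds, so by itself it says nothing about AVX2.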

> #if defined(__SSE4_1__)
> #include "blake2b-load-sse41.h"
> #else
> #include "blake2b-load-sse2.h"
> #endif
> but now I see that these instructions are not used by Lyra2
> (because Lyra2's 'blake2b' uses a different ROUND, similar to blake2b's
> but without LOAD_MSG_).  I don't know if these rounds are the same; they
> look like different things
> round used by Lyra: ROUND_LYRA_SSE in file Sponge_sse.h
> original round: ROUND in file blake2b-round.h

What does any of this have to do with AVX2?

> >> Calculating best global worksize (GWS); max. 1s single kernel invocation.
> >> gws:       256         436 c/s         436 rounds/s 586.434ms per crypt_all()!
> >> gws:       512         832 c/s         832 rounds/s 615.005ms per crypt_all()+
> >> gws:      1024        1477 c/s        1477 rounds/s 693.232ms per crypt_all()+
> >> Local worksize (LWS) 64, global worksize (GWS) 1024
> >> DONE
> >> Speed for cost 1 (t) of 1, cost 2 (m) of 64, cost 3 (c) of 256, cost 4 (p) of 1
> >> Raw:    1077 c/s real, 204800 c/s virtual
> >
> > Why are we getting, here and elsewhere, a higher c/s rate reported for
> > the optimal GWS during auto-tuning than we're getting during a
> > subsequent benchmark?  Is this because auto-tuning is possibly run with
> > too few different passwords (just a guess)?
> the opposite, it seems that Lyra2 is faster with different passwords,
> when I was testing Lyra2 I forgot to upload bench.c to the server,

Please add some debugging output to your modified bench.c, e.g. telling
that random candidate passwords are being generated.  This way, you'd
hopefully notice if/when you use a bench.c other than the intended one.
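
A rough sketch of what such debugging output could look like follows. The
helper random_candidate() and the notice_printed variable are hypothetical,
not taken from the actual bench.c, and rand() is used only because the
quoted message mentions it.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: set once the notice has been written to stderr. */
static int notice_printed = 0;

/*
 * Hypothetical helper: fill out[] with a random NUL-terminated candidate
 * of len - 1 characters, announcing once that random candidates are in use.
 */
static void random_candidate(char *out, size_t len)
{
    static const char charset[] = "abcdefghijklmnopqrstuvwxyz0123456789";
    size_t i;

    if (!notice_printed) {
        fprintf(stderr, "bench: generating random candidate passwords\n");
        notice_printed = 1;
    }
    if (len == 0)
        return;
    for (i = 0; i + 1 < len; i++)
        out[i] = charset[(size_t)rand() % (sizeof(charset) - 1)];
    out[len - 1] = '\0';
}
```

If the notice is missing from a benchmark run, the stock bench.c was most
likely built instead of the modified one.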

> after
> that I uploaded bench.c and tested the speed before that and after and
> somehow overlooked the difference but now I see, so Lyra2 should be
> faster
> these speeds returned by auto-tuning seem to be the same as those
> returned by modified bench.c (for slow hashes, my bench.c uses rand
> which makes a difference when cracking faster hashes, I saw the
> difference at 150k/s)

I think your modified bench.c should be committed to your tree anyway.
Ideally, you'd have it skip the modifications when they are not needed
or when they are harmful.  For example, you may introduce a new format
flag, say FMT_RANDOM, set it in the formats that need it for proper
benchmarking, and check for it in bench.c.
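
The flag idea could be sketched roughly as below. Everything here is an
assumption made for illustration: the bit value chosen for FMT_RANDOM, the
simplified struct fmt_sketch (standing in for the real format descriptor),
and the helper wants_random_candidates().

```c
#include <stdio.h>

/* Hypothetical flag value; a real tree would have to pick an unused bit. */
#define FMT_RANDOM 0x40000000u

/* Simplified stand-in for a format descriptor, for illustration only. */
struct fmt_sketch {
    const char *label;
    unsigned int flags;
};

/* Returns nonzero if the benchmark should use random candidates. */
static int wants_random_candidates(const struct fmt_sketch *fmt)
{
    return (fmt->flags & FMT_RANDOM) != 0;
}
```

The benchmark loop would then call wants_random_candidates() once per
format and fall back to the stock test vectors when it returns zero, so
formats that do not set the flag are benchmarked as before.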

While we might not accept this exact change into the main jumbo tree, I
think you should have it in your tree anyway, since it's easy to make
and it will then be saving you time and avoiding errors like "used a
wrong bench.c".

As to your benchmark results, there are similar speed differences
between c/s rates reported during auto-tuning and during benchmarking
for yescrypt as well.  How do you explain those?

