Date: Sun, 26 Jul 2015 15:15:43 +0200
From: Agnieszka Bielec <>
Subject: Re: PHC: Lyra2 vs yescrypt benchmarks 2

2015-07-26 2:31 GMT+02:00 Solar Designer <>:
> On Sat, Jul 25, 2015 at 10:56:42PM +0200, Agnieszka Bielec wrote:
>> a@...l:~/m/run$ ./john --test --format=lyra2
>> Will run 8 OpenMP threads
>> Benchmarking: Lyra2 [Blake2 AVX2]... (8xOMP) DONE
> Does this build actually use AVX2?  If so, how much slower is an
> AVX-only build?

nope :<, my bad, I was thinking that it uses AVX2 because Lyra2 uses
blake2b, which has some SSE4.1 instructions:
#if defined(__SSE4_1__)
#include "blake2b-load-sse41.h"
#else
#include "blake2b-load-sse2.h"
#endif

but now I see that these instructions are not used by Lyra2
(because Lyra2's 'blake2b' uses a different, though similar, ROUND
without LOAD_MSG_). I don't know if these rounds are the same; they
look like different things

round used by Lyra: ROUND_LYRA_SSE in file Sponge_sse.h
original round: ROUND in file blake2b-round.h
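
For comparison, here is a scalar sketch of the two mixing functions. It is not the code from Sponge_sse.h or blake2b-round.h. The function names are made up for illustration, and it assumes that the Lyra2 round is the BLAKE2b G function with the message-word additions dropped (which is what the missing LOAD_MSG_ suggests). Under that assumption the two agree only when the message words are zero:

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 64-bit word right by n bits (0 < n < 64). */
uint64_t rotr64(uint64_t w, unsigned n)
{
    assert(n > 0 && n < 64);
    return (w >> n) | (w << (64 - n));
}

/* BLAKE2b-style G: mixes four state words and adds message words x and y. */
void g_blake2b(uint64_t v[4], uint64_t x, uint64_t y)
{
    v[0] = v[0] + v[1] + x; v[3] = rotr64(v[3] ^ v[0], 32);
    v[2] = v[2] + v[3];     v[1] = rotr64(v[1] ^ v[2], 24);
    v[0] = v[0] + v[1] + y; v[3] = rotr64(v[3] ^ v[0], 16);
    v[2] = v[2] + v[3];     v[1] = rotr64(v[1] ^ v[2], 63);
}

/* Assumed Lyra2-style G: same additions, XORs and rotations, no message words. */
void g_lyra(uint64_t v[4])
{
    v[0] = v[0] + v[1];     v[3] = rotr64(v[3] ^ v[0], 32);
    v[2] = v[2] + v[3];     v[1] = rotr64(v[1] ^ v[2], 24);
    v[0] = v[0] + v[1];     v[3] = rotr64(v[3] ^ v[0], 16);
    v[2] = v[2] + v[3];     v[1] = rotr64(v[1] ^ v[2], 63);
}

/* Returns 1 if both variants produce the same state for message words x, y. */
int g_variants_agree(uint64_t x, uint64_t y)
{
    uint64_t a[4] = {1, 2, 3, 4};
    uint64_t b[4] = {1, 2, 3, 4};

    g_blake2b(a, x, y);
    g_lyra(b);
    return a[0] == b[0] && a[1] == b[1] && a[2] == b[2] && a[3] == b[3];
}
```

In a full BLAKE2b round, G is applied to the four columns and then the four diagonals of the 16-word state, with the message words chosen by a per-round permutation. The message-free variant has no use for the message-loading headers.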

>> Calculating best global worksize (GWS); max. 1s single kernel invocation.
>> gws:       256         436 c/s         436 rounds/s 586.434ms per crypt_all()!
>> gws:       512         832 c/s         832 rounds/s 615.005ms per crypt_all()+
>> gws:      1024        1477 c/s        1477 rounds/s 693.232ms per crypt_all()+
>> Local worksize (LWS) 64, global worksize (GWS) 1024
>> Speed for cost 1 (t) of 1, cost 2 (m) of 64, cost 3 (c) of 256, cost 4 (p) of 1
>> Raw:    1077 c/s real, 204800 c/s virtual
> Why are we getting, here and elsewhere, a higher c/s rate reported for
> the optimal GWS during auto-tuning than we're getting during a
> subsequent benchmark?  Is this because auto-tuning is possibly run with
> too few different passwords (just a guess)?
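
For reference, the c/s figures in the quoted auto-tuning lines are consistent with GWS divided by the time of one crypt_all() call. A small sketch of that arithmetic (the function name is hypothetical and this is not code from john):

```c
#include <assert.h>

/* Candidates per second when one crypt_all() call processes gws
   candidates in ms milliseconds. */
double rate_cps(unsigned gws, double ms)
{
    assert(ms > 0.0);
    return gws / (ms / 1000.0);
}
```

With the quoted numbers, rate_cps(1024, 693.232) comes out a little above 1477, which matches the reported 1477 c/s once the fraction is dropped, so the gap to the 1077 c/s benchmark result is not a rounding effect.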

the opposite, it seems that Lyra2 is faster with different passwords.
when I was testing Lyra I forgot to upload bench.c to the server; after
that I uploaded bench.c and tested the speed before and after, and
somehow overlooked the difference, but now I see it, so Lyra2 should be

the speeds returned by auto-tuning seem to be the same as those
returned by the modified bench.c (for slow hashes; my bench.c uses rand,
which makes a difference when cracking faster hashes, I saw the
difference at 150k/s)
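
A minimal sketch of what benchmarking with rand-generated candidates could look like. This is not the actual bench.c modification. KEY_LEN and the function names are assumptions chosen for illustration, and the per-character rand() call is exactly the kind of overhead that would show up only with fast hashes:

```c
#include <assert.h>
#include <stdlib.h>

#define KEY_LEN 8   /* hypothetical candidate length */

/* Fill keys[0..n-1] with pseudo-random lowercase strings of KEY_LEN
   characters, so that every benchmark candidate is different. */
void fill_random_keys(char keys[][KEY_LEN + 1], size_t n, unsigned seed)
{
    assert(keys != NULL);
    srand(seed);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < KEY_LEN; j++)
            keys[i][j] = (char)('a' + rand() % 26);
        keys[i][KEY_LEN] = '\0';
    }
}

/* Returns 1 if every key is exactly KEY_LEN lowercase letters followed
   by a terminating NUL, else 0. */
int keys_well_formed(char keys[][KEY_LEN + 1], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < KEY_LEN; j++)
            if (keys[i][j] < 'a' || keys[i][j] > 'z')
                return 0;
        if (keys[i][KEY_LEN] != '\0')
            return 0;
    }
    return 1;
}

/* Generates a small batch with the given seed and checks its shape. */
int random_keys_selfcheck(unsigned seed)
{
    char keys[16][KEY_LEN + 1];

    fill_random_keys(keys, 16, seed);
    return keys_well_formed(keys, 16);
}
```

For a slow hash the cost of generating the candidates is negligible next to crypt_all(), which fits the observation that the difference only became visible at high c/s rates.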
