Message-ID: <CAKGDhHUKO77MOW86F_ZmY=OPN8LjWJdjY6PYCsT-ZiVBWxrLQQ@mail.gmail.com>
Date: Tue, 2 Jun 2015 19:26:52 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Lyra2 on CPU
2015-06-02 3:45 GMT+02:00 Solar Designer <solar@...nwall.com>:
> On Mon, Jun 01, 2015 at 11:57:34PM +0200, Agnieszka Bielec wrote:
>> Lyra 2 uses by default openmp threads in one hashing function.
>
> IIRC, their implementation uses pthreads directly, not via OpenMP.
> Do you know otherwise?
The source code I downloaded recently uses OpenMP.
>
>> nPARALLEL option determines how many omp threads are running. and if
>> nPARALLEL changes, output also changes.
>> nPARALLEL by default equals to 2
>
> What we need is an implementation of Lyra2 that would work for any
> thread-level parallelism setting _without_ necessarily creating any
> threads. In its threads-disabled mode, it would compute those threads'
> portions of work sequentially. This is much like Colin Percival's
> original implementation of scrypt works when called with p > 1.
Yes, I also implemented Lyra2 for nPARALLEL > 1 without any threads:
that is version c), the code with the threads removed.
The results I included are for nPARALLEL=2.
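For illustration, a minimal sketch of computing the nPARALLEL portions sequentially, assuming (as in the threaded code) that the portions exchange data only at barriers. The phase count, the toy mixing in phase_work() and the names toy_hash_sequential() and MAX_PARALLEL are hypothetical stand-ins, not Lyra2's real computation. Each barrier-separated phase is run for logical thread 0, then 1, and so on, and each portion reads only what was published at the previous barrier, so the result does not depend on the order in which the portions are run.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PARALLEL 8  /* hypothetical upper bound for this sketch */
#define NPHASES      3  /* hypothetical number of barrier-separated phases */

/* Toy stand-in for one logical thread's share of one phase. It reads
   only values published at the previous barrier (prev), including the
   neighbouring portion: the kind of cross-thread dependency that the
   barriers in the threaded code exist to protect. */
static uint64_t phase_work(const uint64_t *prev, int nparallel, int phase, int t)
{
    uint64_t other = prev[(t + 1) % nparallel];
    return prev[t] * 6364136223846793005ULL + other + (uint64_t)phase;
}

/* Computes all nparallel portions in one thread, phase by phase. */
void toy_hash_sequential(uint64_t *state, int nparallel)
{
    uint64_t prev[MAX_PARALLEL];

    assert(nparallel >= 1 && nparallel <= MAX_PARALLEL);
    for (int phase = 0; phase < NPHASES; phase++) {
        for (int t = 0; t < nparallel; t++)
            prev[t] = state[t];          /* state as of the last "barrier" */
        for (int t = 0; t < nparallel; t++)
            state[t] = phase_work(prev, nparallel, phase, t);
        /* the end of this loop plays the role of the barrier */
    }
}
```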
> I haven't looked at your code yet - I should.
I've just uploaded my versions to these branches:
a) - "omp_nested"
b) - "lyra"
c) - "lyra_external_threads"
Note that my code still produces compiler warnings. I thought I would
clean up my code after we select the winner.
In version b), the crypt_all function may look unfamiliar. That's because
I had problems with barriers: all my threads were blocked after reaching
a barrier. In the loop for(i=0;i<2;i++), only two threads were running
the LYRA2 function, but printing omp_get_num_threads() showed 8.
I also had problems with barriers on super.
>
>> a)
>> Will run 8 OpenMP threads
>> Benchmarking: Lyra2, Generic Lyra2 [ ]... (8xOMP) DONE
>> Speed for cost 1 (t) of 8, cost 2 (m) of 8
>> Many salts: 4896 c/s real, 848 c/s virtual
>> Only one salt: 5005 c/s real, 856 c/s virtual
>>
>> b)
>> Will run 8 OpenMP threads
>> Benchmarking: Lyra2, Generic Lyra2 [ ]... (8xOMP) DONE
>> Speed for cost 1 (t) of 8, cost 2 (m) of 8
>> Many salts: 6608 c/s real, 876 c/s virtual
>> Only one salt: 7120 c/s real, 935 c/s virtual
>>
>> c)
>> Will run 8 OpenMP threads
>> Benchmarking: Lyra2, Generic Lyra2 [ ]... (8xOMP) DONE
>> Speed for cost 1 (t) of 8, cost 2 (m) of 8
>> Many salts: 7032 c/s real, 943 c/s virtual
>> Only one salt: 7872 c/s real, 1035 c/s virtual
>>
>> without openmp)
>> Benchmarking: Lyra2, Generic Lyra2 [ ]... DONE
>> Speed for cost 1 (t) of 8, cost 2 (m) of 8
>> Many salts: 2130 c/s real, 2152 c/s virtual
>> Only one salt: 2160 c/s real, 2138 c/s virtual
>>
>> I think that method b) is slower because we are using synchronization
>> many times and we have barriers for all omp threads.
>
> Maybe. You can test this hypothesis by benchmarking at higher cost
> settings and thus at lower c/s rates. At lower c/s rates, the
> synchronization overhead becomes relatively lower.
I chose only the highest speeds I observed for these tests:
; 8896/9144
~0.97287839020122484689
; 2312/2368
~0.97635135135135135135
> If confirmed, a way to reduce the overhead at higher c/s rates as well
> would be via computing larger batches of hashes per parallel for loop.
> This is what we normally call OMP_SCALE, but possibly applied at a
> slightly lower level in this case.
A Lyra2 hash uses barriers within a single hash computation, so I'm
not sure this would help. Maybe I don't understand your point.
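One reading of the OMP_SCALE suggestion: a larger batch does not remove the barriers inside each hash, but it amortizes the cost of entering and leaving each parallel region over more hashes. The sketch shows the usual batched loop for the thread-free variant, where each candidate is hashed sequentially by a single OpenMP thread. MIN_KEYS, the OMP_SCALE value, toy_hash() and crypt_all_sketch() are hypothetical placeholders, not John the Ripper's actual format code.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_KEYS  8   /* hypothetical keys per call without scaling */
#define OMP_SCALE 4   /* hypothetical batch multiplier */
#define MAX_KEYS  (MIN_KEYS * OMP_SCALE)

static uint64_t keys[MAX_KEYS], out[MAX_KEYS];

/* Toy per-candidate hash; stands in for one full sequential Lyra2 call. */
static uint64_t toy_hash(uint64_t key)
{
    uint64_t x = key;
    for (int i = 0; i < 1000; i++)
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;
    return x;
}

/* One parallel loop per batch: the fork/join cost is paid once per
   call, so a larger batch (higher OMP_SCALE) spreads it over more
   hashes. Iterations are independent, so no barriers are needed. */
int crypt_all_sketch(int count)
{
    assert(count >= 0 && count <= MAX_KEYS);
#pragma omp parallel for
    for (int i = 0; i < count; i++)
        out[i] = toy_hash(keys[i]);
    return count;
}
```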
>
>> I couldn't find a way how to do it for only x threads.
>
> What do you mean by x here?
Only nPARALLEL threads.
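A possible way to get a team of nPARALLEL threads is OpenMP's num_threads() clause on a separate parallel region. A barrier binds to the innermost enclosing parallel region and must be reached by every thread of that team, and it must not be placed directly inside an omp for worksharing loop; with 8 threads but only 2 loop iterations, the other 6 threads never reach it, which may explain the blocked threads described earlier. The phase structure, the toy mixing and the names toy_hash_omp() and MAX_PARALLEL are hypothetical. If this region is nested inside crypt_all()'s own parallel loop, nested parallelism has to be enabled (for example with omp_set_max_active_levels()), otherwise the inner team may get a single thread.

```c
#include <assert.h>
#include <stdint.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_PARALLEL 8  /* hypothetical upper bound for this sketch */
#define NPHASES      3  /* hypothetical number of barrier-separated phases */

/* Toy stand-in for one logical thread's share of one phase. */
static uint64_t phase_work(const uint64_t *prev, int nparallel, int phase, int t)
{
    uint64_t other = prev[(t + 1) % nparallel];
    return prev[t] * 6364136223846793005ULL + other + (uint64_t)phase;
}

void toy_hash_omp(uint64_t *state, int nparallel)
{
    uint64_t prev[MAX_PARALLEL];   /* shared by the team */

    assert(nparallel >= 1 && nparallel <= MAX_PARALLEL);
    /* Request a team of nparallel threads; the barriers below bind to
       this team only, not to any enclosing team. */
#pragma omp parallel num_threads(nparallel)
    {
        int tid = 0, nth = 1;
#ifdef _OPENMP
        tid = omp_get_thread_num();
        nth = omp_get_num_threads();   /* may be smaller than requested */
#endif
        for (int phase = 0; phase < NPHASES; phase++) {
            /* A smaller team handles several portions per thread. */
            for (int t = tid; t < nparallel; t += nth)
                prev[t] = state[t];
#pragma omp barrier
            for (int t = tid; t < nparallel; t += nth)
                state[t] = phase_work(prev, nparallel, phase, t);
#pragma omp barrier
        }
    }
}
```

Because each thread loops over its share of the portions, the result is the same for any team size, and when compiled without OpenMP the pragmas are ignored and the function computes the portions sequentially.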
>
>> I am leaving to you to decide which method to implement to jtr.
>
> I think the order of our experiments should be as I outlined at the
> start of this reply.
>
> For nPARALLEL, make it a runtime parameter encoded with the hashes.
>
> What other options "like this" are there?
"where PARAMETERS can be:
  nCols = (number of columns, default is 256)
  nThreads = (number of threads, default is 2)
  nRoundsSponge = (number of Rounds performed for reduced sponge function [1 - 12], default is 1)
  bSponge = (number of sponge blocks, bitrate, 8 or 10 or 12, default is 12)
  sponge = (0, 1 or 2, default is 0) 0 means Blake2b, 1 means BlaMka and 2 means half-round BlaMka"
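Encoding nPARALLEL together with the cost parameters in each hash string could look roughly like this. The "$lyra2$" tag, the field order and the names parse_lyra2_params() and struct lyra2_params are hypothetical; no such format is defined in this thread. The parser rejects a wrong tag, a missing trailing '$'-separated field, and values below 1.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical tag and layout, for illustration only:
   $lyra2$<t_cost>$<m_cost>$<nparallel>$<salt and hash ...> */
#define FORMAT_TAG     "$lyra2$"
#define FORMAT_TAG_LEN (sizeof(FORMAT_TAG) - 1)

struct lyra2_params {
    unsigned int t_cost, m_cost, nparallel;
    const char *rest;   /* points into the input: remaining fields */
};

/* Returns 1 on success, 0 if the string does not match the layout. */
int parse_lyra2_params(const char *ciphertext, struct lyra2_params *p)
{
    int consumed = 0;   /* stays 0 unless the trailing '$' matched */

    assert(ciphertext != NULL && p != NULL);
    if (strncmp(ciphertext, FORMAT_TAG, FORMAT_TAG_LEN) != 0)
        return 0;
    if (sscanf(ciphertext + FORMAT_TAG_LEN, "%u$%u$%u$%n",
               &p->t_cost, &p->m_cost, &p->nparallel, &consumed) != 3 ||
        consumed == 0)
        return 0;
    if (p->t_cost < 1 || p->m_cost < 1 || p->nparallel < 1)
        return 0;
    p->rest = ciphertext + FORMAT_TAG_LEN + consumed;
    return 1;
}
```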
> While we're at it, have you moved the memory (de)allocation out of the
> cracking loop? And have you done it for POMELO too, as we had
> discussed - perhaps reusing the allocators from yescrypt? I don't
> recall you reporting on this, so perhaps this task slipped through the
> cracks? If so, can you please (re-)add it to your priorities?
Not yet, for either. I will do both this week.
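For the allocation question, a minimal sketch of keeping one scratch buffer per thread and growing it only when a larger matrix is needed (for example, a salt with a higher memory cost), so that malloc()/free() stay out of the cracking loop. struct scratch, scratch_get() and scratch_free() are hypothetical names; this is not yescrypt's allocator API, which could be reused instead as suggested.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-thread scratch region reused across crypt_all() calls. */
struct scratch {
    void *base;
    size_t size;
};

/* Grow-only: reallocates only when more memory is needed, so repeated
   calls with the same or smaller size do no allocation at all.
   Returns NULL on allocation failure (the old buffer is kept). */
void *scratch_get(struct scratch *s, size_t need)
{
    assert(s != NULL);
    if (need > s->size) {
        void *p = realloc(s->base, need);
        if (!p)
            return NULL;
        s->base = p;
        s->size = need;
    }
    return s->base;
}

/* Called once when the format is done, not per hash. */
void scratch_free(struct scratch *s)
{
    free(s->base);
    s->base = NULL;
    s->size = 0;
}
```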
thanks