john-dev - Re: PHC: Lyra2 on CPU

Message-ID: <CAKGDhHUKO77MOW86F_ZmY=OPN8LjWJdjY6PYCsT-ZiVBWxrLQQ@mail.gmail.com>
Date: Tue, 2 Jun 2015 19:26:52 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Lyra2 on CPU

2015-06-02 3:45 GMT+02:00 Solar Designer <solar@...nwall.com>:
> On Mon, Jun 01, 2015 at 11:57:34PM +0200, Agnieszka Bielec wrote:
>> Lyra 2 uses by default openmp threads in one hashing function.
>
> IIRC, their implementation uses pthreads directly, not via OpenMP.
> Do you know otherwise?

The source code I downloaded recently uses OpenMP.

>
>> nPARALLEL option determines how many omp threads are running. and if
>> nPARALLEL changes, output also changes.
>> nPARALLEL by default equals to 2
>
> What we need is an implementation of Lyra2 that would work for any
> thread-level parallelism setting _without_ necessarily creating any
> threads.  In its threads-disabled mode, it would compute those threads'
> portions of work sequentially.  This is much like Colin Percival's
> original implementation of scrypt works when called with p > 1.

Yes, I also implemented Lyra2 for nPARALLEL > 1 without any threads
(that is version c) with the threads removed).
The results I included are for nPARALLEL=2.
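
The threads-disabled mode discussed above can be sketched roughly as
follows. This is only a toy model, not the Lyra2 code: lane_step(),
toy_hash_sequential() and MAX_LANES are hypothetical names, and the mixing
function is made up. The point it illustrates is that the lanes' portions of
work are computed one after another in each phase, and the place where a
threaded version would hit a barrier becomes a plain loop boundary.

```c
#include <stdint.h>
#include <string.h>

#define MAX_LANES 8 /* arbitrary limit for this sketch */

/* Hypothetical stand-in for the work one lane does between two sync points.
 * It mixes the lane's own state with a neighbour's state from the previous
 * phase, which is why the lanes have to stay in step. */
static uint64_t lane_step(uint64_t own, uint64_t neighbour, unsigned phase)
{
    return (own ^ (neighbour + phase)) * 0x9E3779B97F4A7C15ULL + 1;
}

/* Computes all lanes in the calling thread.
 * Returns 0 on success, -1 on invalid arguments. */
int toy_hash_sequential(uint64_t seed, unsigned lanes, unsigned phases,
                        uint64_t *out)
{
    uint64_t cur[MAX_LANES], next[MAX_LANES], result = 0;
    unsigned l, p;

    if (lanes == 0 || lanes > MAX_LANES || out == NULL)
        return -1;

    for (l = 0; l < lanes; l++)
        cur[l] = seed + l;

    for (p = 0; p < phases; p++) {
        /* Each "thread's" portion of this phase, one after another. */
        for (l = 0; l < lanes; l++)
            next[l] = lane_step(cur[l], cur[(l + 1) % lanes], p);
        /* A threaded version would wait at a barrier here. */
        memcpy(cur, next, lanes * sizeof(cur[0]));
    }

    for (l = 0; l < lanes; l++)
        result ^= cur[l];
    *out = result;
    return 0;
}
```

As with nPARALLEL, the number of lanes is part of the computation in this
toy, so changing it changes the output even though no threads are created.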

> I haven't looked at your code yet - I should.

I've just uploaded my versions to branches:
a) - "omp_nested"
b) - "lyra"
c) - "lyra_external_threads"

But my code still produces warnings. I thought that I would clean up
the code after we select the winner.

In version b) the function crypt_all may look unfamiliar. That is because
I had problems with barriers: all my threads were blocked after reaching
a barrier.
In the loop for(i=0;i<2;i++) only two threads were running the LYRA2
function, but a printf of omp_get_num_threads() was returning 8.
I also had problems with barriers on super.

>
>> a)
>> Will run 8 OpenMP threads
>> Benchmarking: Lyra2, Generic Lyra2 [ ]... (8xOMP) DONE
>> Speed for cost 1 (t) of 8, cost 2 (m) of 8
>> Many salts:     4896 c/s real, 848 c/s virtual
>> Only one salt:  5005 c/s real, 856 c/s virtual
>>
>> b)
>> Will run 8 OpenMP threads
>> Benchmarking: Lyra2, Generic Lyra2 [ ]... (8xOMP) DONE
>> Speed for cost 1 (t) of 8, cost 2 (m) of 8
>> Many salts:     6608 c/s real, 876 c/s virtual
>> Only one salt:  7120 c/s real, 935 c/s virtual
>>
>> c)
>> Will run 8 OpenMP threads
>> Benchmarking: Lyra2, Generic Lyra2 [ ]... (8xOMP) DONE
>> Speed for cost 1 (t) of 8, cost 2 (m) of 8
>> Many salts:     7032 c/s real, 943 c/s virtual
>> Only one salt:  7872 c/s real, 1035 c/s virtual
>>
>> without openmp)
>> Benchmarking: Lyra2, Generic Lyra2 [ ]... DONE
>> Speed for cost 1 (t) of 8, cost 2 (m) of 8
>> Many salts:     2130 c/s real, 2152 c/s virtual
>> Only one salt:  2160 c/s real, 2138 c/s virtual
>>
>> I think that method b) is slower because we are using synchronization
>> many times and we have barriers  for all omp threads.
>
> Maybe.  You can test this hypothesis by benchmarking at higher cost
> settings and thus at lower c/s rates.  At lower c/s rates, the
> synchronization overhead becomes relatively lower.

I picked only the highest speeds I observed in these tests:
8896/9144 ~= 0.9729
2312/2368 ~= 0.9764


> If confirmed, a way to reduce the overhead at higher c/s rates as well
> would be via computing larger batches of hashes per parallel for loop.
> This is what we normally call OMP_SCALE, but possibly applied at a
> slightly lower level in this case.

A Lyra2 hash uses barriers within a single hash computation, so I'm not
sure. Maybe I don't understand your point.
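
One way to read the batching suggestion, assuming each hash is computed
without internal threads (as in version c)): the parallel loop runs over
candidates rather than over lanes, and the batch handed to one loop is made
larger so that the cost of entering and leaving the parallel region is spread
over more hashes. The sketch below is an illustration only; toy_hash(),
batch_size(), crypt_batch() and the OMP_SCALE value are hypothetical, not the
actual format code.

```c
#include <stdint.h>

#define OMP_SCALE 4 /* hypothetical value; would be tuned by benchmarking */

/* Hypothetical stand-in for one complete, thread-free hash computation. */
static uint64_t toy_hash(uint64_t key)
{
    uint64_t x = key;
    int i;

    for (i = 0; i < 64; i++)
        x = (x ^ (x >> 29)) * 0x9E3779B97F4A7C15ULL + (uint64_t)i;
    return x;
}

/* Number of candidates to hand to one crypt_batch() call. */
int batch_size(int threads)
{
    return threads * OMP_SCALE;
}

/* One parallel region per batch. Each iteration computes a whole hash,
 * so the threads only synchronize once, at the end of the loop. */
int crypt_batch(const uint64_t *keys, uint64_t *out, int count)
{
    int i;

#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (i = 0; i < count; i++)
        out[i] = toy_hash(keys[i]);
    return count;
}
```

With 8 threads and this OMP_SCALE, one call would process 32 candidates.
Built without OpenMP, the same code simply runs the loop serially.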

>
>> I couldn't find a way how to do it for only x threads.
>
> What do you mean by x here?

I meant only nPARALLEL threads.

>
>> I am leaving to you to decide which method to implement to jtr.
>
> I think the order of our experiments should be as I outlined at the
> start of this reply.
>

> For nPARALLEL, make it a runtime parameter encoded with the hashes.
>
> What other options "like this" are there?

"where PARAMETERS can be:
      nCols = (number of columns, default is 256)
      nThreads = (number of threads, default is 2)
      nRoundsSponge = (number of Rounds performed for reduced sponge
                      function [1 - 12], default is 1)
      bSponge = (number of sponge blocks, bitrate, 8 or 10 or 12, default is 12)
      sponge = (0, 1 or 2, default is 0) 0 means Blake2b, 1 means
               BlaMka and 2 means half-round BlaMka"
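
Making such options runtime parameters encoded with the hashes could look
roughly like the sketch below. The hash string layout
"$Lyra2$<t>$<m>$<nCols>$<nThreads>$<rest>" is an assumption made up for this
example, as are struct lyra2_params and parse_params(); the remaining options
(nRoundsSponge, bSponge, sponge) would be handled the same way.

```c
#include <stdio.h>

struct lyra2_params {
    unsigned t_cost, m_cost; /* "cost 1 (t)" and "cost 2 (m)" */
    unsigned n_cols;         /* nCols, default 256 */
    unsigned n_threads;      /* nThreads / nPARALLEL, default 2 */
};

/* Parses the parameter prefix of a hash string in the hypothetical form
 *   $Lyra2$<t>$<m>$<nCols>$<nThreads>$<rest>
 * Returns the offset of <rest> on success, -1 on a malformed prefix. */
int parse_params(const char *s, struct lyra2_params *p)
{
    int consumed = 0;

    if (sscanf(s, "$Lyra2$%u$%u$%u$%u$%n", &p->t_cost, &p->m_cost,
               &p->n_cols, &p->n_threads, &consumed) != 4)
        return -1;
    if (consumed == 0) /* the '$' that ends the prefix is missing */
        return -1;
    if (p->n_cols == 0 || p->n_threads == 0)
        return -1;
    return consumed;
}
```

Since the parallelism setting changes the output, it has to be read from the
hash itself in some such way instead of being fixed at compile time.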

> While we're at it, have you moved the memory (de)allocation out of the
> cracking loop?  And have you done it for POMELO too, as we had
> discussed - perhaps reusing the allocators from yescrypt?  I don't
> recall you reporting on this, so perhaps this task slipped through the
> cracks?  If so, can you please (re-)add it to your priorities?

Not yet for either; I will do it this week.
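
A minimal sketch of what moving the allocation out of the cracking loop
means, assuming a scratch region that is allocated once, reused for every
hash and freed at the end. All names here (struct scratch, scratch_init(),
scratch_done(), toy_hash_with()) are hypothetical, and this is not the
yescrypt allocator mentioned above.

```c
#include <stdint.h>
#include <stdlib.h>

struct scratch {
    uint64_t *mem;
    size_t words;
};

/* Allocate the scratch region once, before the cracking loop starts.
 * Returns 0 on success, -1 on failure. */
int scratch_init(struct scratch *s, size_t words)
{
    s->mem = NULL;
    s->words = 0;
    if (words == 0 || words > SIZE_MAX / sizeof(*s->mem))
        return -1;
    s->mem = malloc(words * sizeof(*s->mem));
    if (s->mem == NULL)
        return -1;
    s->words = words;
    return 0;
}

/* Free the scratch region once, after the cracking loop ends. */
void scratch_done(struct scratch *s)
{
    free(s->mem);
    s->mem = NULL;
    s->words = 0;
}

/* Hypothetical stand-in for a memory-hard hash. It overwrites the whole
 * scratch region before reading it, so reusing the region between calls
 * does not change the result. */
uint64_t toy_hash_with(struct scratch *s, uint64_t key)
{
    uint64_t acc = key;
    size_t i;

    for (i = 0; i < s->words; i++) {
        acc = acc * 0x9E3779B97F4A7C15ULL + i;
        s->mem[i] = acc;
    }
    for (i = s->words; i-- > 0; )
        acc ^= s->mem[i] >> 7;
    return acc;
}
```

With OpenMP, each thread would need its own scratch region, for example an
array of these indexed by thread number.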

thanks