Date: Tue, 2 Jun 2015 19:26:52 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Lyra2 on CPU

2015-06-02 3:45 GMT+02:00 Solar Designer <solar@...nwall.com>:
> On Mon, Jun 01, 2015 at 11:57:34PM +0200, Agnieszka Bielec wrote:
>> By default, Lyra2 uses OpenMP threads within a single hash computation.
>
> IIRC, their implementation uses pthreads directly, not via OpenMP.
> Do you know otherwise?

The source code I downloaded recently uses OpenMP.

>
>> The nPARALLEL option determines how many OpenMP threads run, and if
>> nPARALLEL changes, the output also changes.
>> nPARALLEL defaults to 2.
>
> What we need is an implementation of Lyra2 that would work for any
> thread-level parallelism setting _without_ necessarily creating any
> threads.  In its threads-disabled mode, it would compute those threads'
> portions of work sequentially.  This is much like Colin Percival's
> original implementation of scrypt works when called with p > 1.

Yes, I also implemented Lyra2 for nPARALLEL > 1 without any threads
(version c), with the threads removed).
The results I included are for nPARALLEL=2.
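Here is a minimal C sketch of the threads-disabled idea described above. The mixing function, the phase count, and the names toy_hash() and mix() are hypothetical, and this is not Lyra2's algorithm. It only shows nparallel "threads" being computed one after another in a single OS thread, with each barrier reduced to the end of a phase loop. Every lane reads only the state saved at the previous barrier, so processing the lanes forward or in reverse gives the same result. nparallel also takes part in the computation, as nPARALLEL does.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LANES 8
#define PHASES    4   /* toy number of barrier-separated phases */

/* Hypothetical mixing step; NOT Lyra2's sponge, just something cheap. */
static uint64_t mix(uint64_t x, uint64_t y)
{
    x ^= y + 0x9e3779b97f4a7c15ULL + (x << 6) + (x >> 2);
    return x * 0xff51afd7ed558ccdULL;
}

/*
 * Computes all nparallel "threads'" portions of work in one OS thread.
 * A threaded version would have a barrier after each phase; here the
 * barrier is simply the end of the inner loop. Each lane reads only the
 * snapshot taken at the previous barrier, so the order in which lanes are
 * processed within a phase cannot change the result (reverse != 0 flips it).
 */
static uint64_t toy_hash(uint64_t seed, int nparallel, int reverse)
{
    uint64_t cur[MAX_LANES], prev[MAX_LANES], out = 0;

    assert(nparallel >= 1 && nparallel <= MAX_LANES);
    for (int i = 0; i < nparallel; i++)
        cur[i] = mix(seed, (uint64_t)i);

    for (int phase = 0; phase < PHASES; phase++) {
        for (int i = 0; i < nparallel; i++)
            prev[i] = cur[i];                     /* state at the "barrier" */
        for (int k = 0; k < nparallel; k++) {     /* one lane at a time */
            int i = reverse ? nparallel - 1 - k : k;
            cur[i] = mix(prev[i], prev[(i + 1) % nparallel]);
        }
    }

    for (int i = 0; i < nparallel; i++)
        out ^= cur[i];
    return out;
}
```

This matches scrypt's reference behaviour for p > 1 only in spirit. There, the p lanes are independent and are simply looped over. Here, the lanes exchange data at each barrier, so the emulation has to finish a whole phase for all lanes before starting the next one.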

> I haven't looked at your code yet - I should.

I've just uploaded my versions to these branches:
a) - "omp_nested"
b) - "lyra"
c) - "lyra_external_threads"

But my code still produces compiler warnings. I thought I would clean it
up after we select the winner.

In version b), the crypt_all function may look unfamiliar. That's because
I had problems with barriers: all my threads were blocked after reaching
a barrier.
Inside the for(i=0;i<2;i++) loop, only two threads were running the LYRA2
function, but printing omp_get_num_threads() returned 8.
I also had problems with barriers on super.

>
>> a)
>> Will run 8 OpenMP threads
>> Benchmarking: Lyra2, Generic Lyra2 [ ]... (8xOMP) DONE
>> Speed for cost 1 (t) of 8, cost 2 (m) of 8
>> Many salts:     4896 c/s real, 848 c/s virtual
>> Only one salt:  5005 c/s real, 856 c/s virtual
>>
>> b)
>> Will run 8 OpenMP threads
>> Benchmarking: Lyra2, Generic Lyra2 [ ]... (8xOMP) DONE
>> Speed for cost 1 (t) of 8, cost 2 (m) of 8
>> Many salts:     6608 c/s real, 876 c/s virtual
>> Only one salt:  7120 c/s real, 935 c/s virtual
>>
>> c)
>> Will run 8 OpenMP threads
>> Benchmarking: Lyra2, Generic Lyra2 [ ]... (8xOMP) DONE
>> Speed for cost 1 (t) of 8, cost 2 (m) of 8
>> Many salts:     7032 c/s real, 943 c/s virtual
>> Only one salt:  7872 c/s real, 1035 c/s virtual
>>
>> without OpenMP:
>> Benchmarking: Lyra2, Generic Lyra2 [ ]... DONE
>> Speed for cost 1 (t) of 8, cost 2 (m) of 8
>> Many salts:     2130 c/s real, 2152 c/s virtual
>> Only one salt:  2160 c/s real, 2138 c/s virtual
>>
>> I think method b) is slower because it synchronizes many times and
>> uses barriers across all OpenMP threads.
>
> Maybe.  You can test this hypothesis by benchmarking at higher cost
> settings and thus at lower c/s rates.  At lower c/s rates, the
> synchronization overhead becomes relatively lower.

I used only the highest speeds observed in the tests:
        8896/9144 ~ 0.9729
        2312/2368 ~ 0.9764
so the ratio stays about the same (roughly 0.97 to 0.98) at both speed
levels.


> If confirmed, a way to reduce the overhead at higher c/s rates as well
> would be via computing larger batches of hashes per parallel for loop.
> This is what we normally call OMP_SCALE, but possibly applied at a
> slightly lower level in this case.

Lyra2 uses barriers within a single hash computation, so I'm not sure this
applies; maybe I don't understand your point.
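Here is one possible reading of the OMP_SCALE-like suggestion, sketched in C under stated assumptions. The barriers stay inside the hash, but each phase between two barriers processes a whole batch of candidates. That way the synchronization cost is paid once per batch instead of once per hash. NPAR, PHASES, BATCH, mix() and batch_hash() are hypothetical toy stand-ins, not Lyra2 or JtR code. When built with -fopenmp, the NPAR lanes run as OpenMP threads with #pragma omp barrier between phases. Without OpenMP, the same lanes run in turn.

```c
#include <assert.h>
#include <stdint.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define NPAR   2   /* lanes per hash, playing the role of nPARALLEL */
#define PHASES 4   /* barrier-separated phases per hash (toy value) */
#define BATCH  16  /* hypothetical: candidates handled between two barriers */

/* Hypothetical mixing step; NOT Lyra2's sponge. */
static uint64_t mix(uint64_t x, uint64_t y)
{
    x ^= y + 0x9e3779b97f4a7c15ULL + (x << 6) + (x >> 2);
    return x * 0xff51afd7ed558ccdULL;
}

/* One lane's share of one phase, for every candidate in the batch.
   Double buffering: read buffer (phase & 1), write the other one, so no
   lane ever writes what another lane reads in the same phase. */
static void lane_phase(uint64_t st[2][BATCH][NPAR], int phase, int lane)
{
    int rd = phase & 1, wr = rd ^ 1;

    for (int b = 0; b < BATCH; b++)
        st[wr][b][lane] = mix(st[rd][b][lane], st[rd][b][(lane + 1) % NPAR]);
}

/* Hashes BATCH candidates with PHASES barriers in total, i.e.
   PHASES / BATCH barriers per candidate instead of PHASES. */
static void batch_hash(const uint64_t seeds[BATCH], uint64_t out[BATCH])
{
    uint64_t st[2][BATCH][NPAR];

    for (int b = 0; b < BATCH; b++)
        for (int lane = 0; lane < NPAR; lane++)
            st[0][b][lane] = mix(seeds[b], (uint64_t)lane);

#ifdef _OPENMP
#pragma omp parallel num_threads(NPAR)
    {
        int nt = omp_get_num_threads(); /* may be < NPAR; lanes are then shared */

        for (int p = 0; p < PHASES; p++) {
            for (int lane = omp_get_thread_num(); lane < NPAR; lane += nt)
                lane_phase(st, p, lane);
#pragma omp barrier
        }
    }
#else
    for (int p = 0; p < PHASES; p++)          /* no threads: lanes in turn */
        for (int lane = 0; lane < NPAR; lane++)
            lane_phase(st, p, lane);
#endif

    /* The last phase wrote buffer (PHASES & 1). */
    for (int b = 0; b < BATCH; b++) {
        out[b] = 0;
        for (int lane = 0; lane < NPAR; lane++)
            out[b] ^= st[PHASES & 1][b][lane];
    }
}
```

With BATCH = 16 and PHASES = 4, the team goes through 4 barriers for 16 hashes rather than 4 barriers per hash. The price is a working set BATCH times larger, which for Lyra2's real matrices would have to be weighed against cache size.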

>
>> I couldn't find a way to do it for only x threads.
>
> What do you mean by x here?

Only nPARALLEL threads.
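A #pragma omp barrier must be reached by all threads of the enclosing team or by none of them. That could be why threads block when only some of them are inside LYRA2. One way to synchronize only the nPARALLEL threads working on one hash is to give each group its own small barrier object. The sketch below is a classic sense-reversing barrier using C11 atomics. group_barrier_t and its functions are hypothetical helpers, not part of OpenMP, Lyra2 or JtR.

```c
#include <assert.h>
#include <stdatomic.h>

/* Barrier shared by a fixed group of threads (e.g. the nPARALLEL threads
   working on one hash). Hypothetical helper, not an existing API. */
typedef struct {
    atomic_int count;  /* threads still expected in the current round */
    atomic_int sense;  /* flips each time the barrier opens */
    int total;         /* group size */
} group_barrier_t;

static void group_barrier_init(group_barrier_t *b, int total)
{
    assert(total >= 1);
    atomic_init(&b->count, total);
    atomic_init(&b->sense, 0);
    b->total = total;
}

/* Each thread keeps its own local_sense, initialised to 0. */
static void group_barrier_wait(group_barrier_t *b, int *local_sense)
{
    int s = !*local_sense;

    *local_sense = s;
    if (atomic_fetch_sub(&b->count, 1) == 1) {
        /* Last thread to arrive: re-arm for the next round, then release. */
        atomic_store(&b->count, b->total);
        atomic_store(&b->sense, s);
    } else {
        /* Spin until the last arrival flips the shared sense. */
        while (atomic_load(&b->sense) != s)
            ;
    }
}
```

As one possible use, the threads of an OpenMP team could be split into groups by omp_get_thread_num() / nPARALLEL, with each group sharing one barrier object. Waiting threads spin, so this only performs well when every member of a group has its own CPU. With more threads than cores it can become very slow.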

>
>> I'll leave it to you to decide which method to implement in JtR.
>
> I think the order of our experiments should be as I outlined at the
> start of this reply.
>

> For nPARALLEL, make it a runtime parameter encoded with the hashes.
>
> What other options "like this" are there?

"where PARAMETERS can be:
      nCols = (number of columns, default is 256)
      nThreads = (number of threads, default is 2)
      nRoundsSponge = (number of Rounds performed for reduced sponge
function [1 - 12], default is 1)
      bSponge = (number of sponge blocks, bitrate, 8 or 10 or 12, default is 12)
      sponge = (0, 1 or 2, default is 0) 0 means Blake2b, 1 means
BlaMka and 2 means half-round BlaMka"
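Following the suggestion to make nPARALLEL a runtime parameter encoded with the hashes, here is a C sketch that parses it along with the other options listed above. It applies the listed defaults and value ranges. The "key=value,key=value" field format and the names lyra2_params and parse_params are assumptions for illustration, not an existing JtR or Lyra2 hash encoding.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Tunables listed in the Lyra2 usage text quoted above, with its defaults. */
typedef struct {
    int nCols;          /* default 256 */
    int nThreads;       /* nPARALLEL, default 2 */
    int nRoundsSponge;  /* 1..12, default 1 */
    int bSponge;        /* 8, 10 or 12, default 12 */
    int sponge;         /* 0 = Blake2b, 1 = BlaMka, 2 = half-round BlaMka */
} lyra2_params;

/* Parses a hypothetical "key=value,key=value" field taken from a hash
   string. Missing keys keep their defaults. Returns 0 on success, -1 on
   an unknown key, malformed input or out-of-range value. */
static int parse_params(const char *s, lyra2_params *p)
{
    *p = (lyra2_params){ 256, 2, 1, 12, 0 };

    while (*s) {
        char key[16];
        int val, used = 0;

        if (sscanf(s, "%15[^=]=%d%n", key, &val, &used) != 2)
            return -1;
        if (!strcmp(key, "nCols"))              p->nCols = val;
        else if (!strcmp(key, "nThreads"))      p->nThreads = val;
        else if (!strcmp(key, "nRoundsSponge")) p->nRoundsSponge = val;
        else if (!strcmp(key, "bSponge"))       p->bSponge = val;
        else if (!strcmp(key, "sponge"))        p->sponge = val;
        else return -1;

        s += used;
        if (*s == ',')
            s++;
        else if (*s)
            return -1;
    }

    if (p->nCols < 1 || p->nThreads < 1 ||
        p->nRoundsSponge < 1 || p->nRoundsSponge > 12 ||
        (p->bSponge != 8 && p->bSponge != 10 && p->bSponge != 12) ||
        p->sponge < 0 || p->sponge > 2)
        return -1;
    return 0;
}
```

Because nThreads then travels with each hash, the same binary can verify hashes made with different nPARALLEL values, whether or not it actually creates that many threads.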

> While we're at it, have you moved the memory (de)allocation out of the
> cracking loop?  And have you done it for POMELO too, as we had
> discussed - perhaps reusing the allocators from yescrypt?  I don't
> recall you reporting on this, so perhaps this task slipped through the
> cracks?  If so, can you please (re-)add it to your priorities?

Not yet, for either of them. I will do both this week.
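A common way to move (de)allocation out of the cracking loop is to keep one scratch region per thread. The region is only reallocated when a larger cost setting needs more memory. Here is a minimal C sketch along those lines. region_t, region_get() and region_free() are hypothetical names and do not reproduce yescrypt's actual allocator API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-thread scratch region, grown on demand and reused. */
typedef struct {
    void *base;
    size_t size;
} region_t;

/* Returns a buffer of at least `need` bytes, reallocating only when the
   current one is too small; returns NULL if allocation fails. */
static void *region_get(region_t *r, size_t need)
{
    if (r->size < need) {
        free(r->base);
        r->base = malloc(need);
        r->size = r->base ? need : 0;
    }
    return r->base;
}

/* Releases the region once, e.g. when the format is torn down. */
static void region_free(region_t *r)
{
    free(r->base);
    r->base = NULL;
    r->size = 0;
}
```

In a format, one such region per OpenMP thread (indexed by omp_get_thread_num()) could be used inside the crypt loop and freed once at cleanup. Repeated candidates at the same cost then no longer pay for malloc/free.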

thanks
