john-dev - Re: [GSoC] John the Ripper support for PHC finalists

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKGDhHUNezhzh3=-b0CF-xvEYkRUwO6Do2kpCmwwev3dU3d_wQ@mail.gmail.com>
Date: Thu, 7 May 2015 15:30:43 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: [GSoC] John the Ripper support for PHC finalists

2015-05-05 20:00 GMT+02:00 Solar Designer <solar@...nwall.com>:
> On Mon, May 04, 2015 at 01:18:46AM +0200, Agnieszka Bielec wrote:
>> 2015-04-27 3:50 GMT+02:00 Solar Designer <solar@...nwall.com>:
>>
>> > BTW, bumping into total GPU global memory size may be realistic with
>> > these memory-hard hashes.  Our TITAN's 6 GB was the performance
>> > limiting factor in some of the benchmarks here:
>> > http://www.openwall.com/lists/crypt-dev/2014/03/13/1
>>
>> I use only 128MB
>
> What happens if you increase GWS further?  Does performance drop?  What
> if you manually increase GWS even further?  It might happen that the
> auto-tuning finds a local minimum, whereas a higher GWS is optimal.

the speed drops significantly when I make gws x2 bigger

>> for the experimenst i removed almost everything from my code except
>> the biggest bottleneck:
>
> That's a good experiment.
>
>>         v1=vload4(0,S+sMAP(index_local));       \
>>         v= v+(v1<<1); \
>>         v1=v1+(v<<2);             \
>>         vstore4(v1,0,S+sMAP(index_local));               \
>> \
>>         random_number  = S[sMAP(i3)];              \
>
> BTW, can you explain why sMAP is as it is?
>
> #define sMAP(X) ((X)*GID+gid4)
>
> where:
>
>         gid = get_global_id(0);
>         GID = get_global_size(0);
>         gid4 = gid * 4;
>

sMAP, MAP and cMAP stands for coalescing (IIRC it's around 5 (maybe
more) times faster with it)

> Also, I notice there are some if/else in G and H macros.  Are they
> removed during loop unrolling, or do they translate to exec masks in the
> generated code?

I cached values from memory into variables and I must check if
i0==index_global and i0==index_local, it's faster with this. In F all
workitems execute the same if-else branch but not in H. I didn't
disassemble the code yet. I doubt

>> and the gws number with the memory usage were the same, I can nothing
>> to do with this bottleneck
>>
>> but If I remove everything from the code, GWS also doesn't differ
>
> "Everything"?

if I change my function into pomelo_crypt_kernel(args...) { nothing  }
but sorry, this was a false positive, If i set manually gws in this
case everything looks normal


> Yes, caching the index_local portion of S[] in local memory (or
> you can also try private memory on NVIDIA) makes sense to me.  A drawback
> is that for all writes to S[], you'd have to check if the index is low
> enough that the write needs to go to this cached copy (as well as
> possibly to the global copy, to avoid having to perform a similar check
> on global_index reads, or you can use the cache there as well - it's
> unclear which will run faster).
>
>> we could cache this segment but today graphic cards
>> rarely has 64KB of local memory, on super we have 48kB and 32KB
>> it is even worse because this is 64KB for the work-group
>>
>> we don't know how much the __private memory we have , we can only see
>> if the kernel compilation failed or not, but I'm not sure of this
>
> I think you should generally prefer to use local rather than private
> memory for this.

I think so too


> AMD GCN (dev=0 and dev=1 in super) has 64 KB of local memory per CU.
> See http://developer.amd.com/wordpress/media/2013/06/2620_final.pdf
> slide 10.

I checked local memory size using this code

clGetDeviceInfo(devices[gpu_id],CL_DEVICE_LOCAL_MEM_SIZE,sizeof(cl_ulong),&local_memory_size,NULL);
    printf("mamy %llu\n",(unsigned long long) local_memory_size);

and I was getting 48 and 32 KB

thanks
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.