Date: Tue, 1 May 2012 10:58:22 +0530
From: SAYANTAN DATTA <std2048@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Sayantan :Weekly Report #2

> > 1. Implemented a function to find the optimum local work group size in
> > opencl-mscash2.
>
> Is this in magnum-jumbo? It seems not. If so, where is it?
>
> OK. The code currently in magnum-jumbo only achieves 36k c/s on 7970:
>
> user@...l:~/john/magnum-jumbo/src$ ../run/john -te -fo=mscash2-opencl
> -pla=1
> OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
> Using device 0: Tahiti
> Benchmarking: MSCASH2-OPENCL [PBKDF2_HMAC_SHA1]... DONE
> Raw: 35754 c/s real, 50592 c/s virtual

I haven't posted a patch yet because it would require two or more kernels
for better utilization of various GPUs. As my kernel is quite large, I
should first make it as compact as possible before putting two or more
kernels in one file. This could easily be done using function inlining and
macros, but I need more time. In any case, I will try to post the patch by
the 8th or 9th of this month. Currently I have my code on bull, which
produces 73k c/s on 7970; you might test it if you want to.

The rotate() function caused a huge performance drop on 7970 on bull.

> This is puzzling. Perhaps you can try reviewing the generated code (IL
> or native) to figure out the cause of the performance drop? In general,
> I think we (as a team) should learn to do that.

> One task I think you could approach slightly later is trying to
> implement and optimize Eksblowfish on GPU. As discussed before, we
> expect it to be slow, but it'd be useful to have some hard data to prove
> this - or maybe disprove it (unlikely), and to have some OpenCL code
> (and maybe CUDA as well) that we could run on future GPUs easily as they
> become available. Specifically, this may be helpful for design of
> future password hashing methods. Additionally, this OpenCL code may
> happen to be readily capable of making use of AVX2's VSIB addressing
> with Intel's OpenCL SDK - if so, it may actually be faster (on those
> future CPUs) than the existing CPU code for bcrypt, until we implement
> proper AVX2 code more directly (perhaps with intrinsics).
>
> I had mentioned this task in GSoC student selection context before, but
> it may also be approached outside of that context and with slightly
> different goals as above. In that way, it will actually be useful even
> if the implementation is indeed slower than the current CPU code on
> current hardware.

Okay, we will do that.
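
On the rotate() slowdown mentioned above: one common way to check whether the built-in is responsible is to swap in a shift-and-OR rotate and compare the generated code for the two variants. Below is a minimal sketch in plain C, assuming 32-bit words as used by SHA-1. The name rotl32 is illustrative and is not taken from the mscash2 kernel, and nothing here claims that this form is faster on Tahiti.

```c
#include <stdint.h>

/*
 * Rotate a 32-bit word left by n bits using shifts and OR.
 * Hypothetical helper for comparison against the OpenCL rotate()
 * built-in; not part of the actual opencl-mscash2 code.
 * Masking with 31 keeps both shift counts in 0..31, so n == 0
 * (and n >= 32) is well defined.
 */
static inline uint32_t rotl32(uint32_t x, unsigned int n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}
```

In a SHA-1 round this would stand in for calls such as rotate(a, 5) and rotate(b, 30); comparing the IL or native output of both variants would show whether the compiler maps them to the same instructions.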