Date: Sat, 31 Mar 2012 15:42:03 +0530
From: SAYANTAN DATTA <std2048@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: JtR: GPU for slow hashes

On Thu, Mar 29, 2012 at 5:37 AM, Solar Designer <solar@...nwall.com> wrote:
> On Wed, Mar 28, 2012 at 11:21:50PM +0530, SAYANTAN DATTA wrote:
> > On Wed, Mar 28, 2012 at 5:54 PM, SAYANTAN DATTA <std2048@...il.com> wrote:
> > > Here are a few problems I'm facing. Since ATI 4000 series GPUs don't
> > > support byte_addressable_store, I have to work around this problem by
> > > using only uint as the data type for temporary data storage. This problem
> > > exists with many of the hash algorithms already implemented in OpenCL,
> > > e.g. MD5, MD4, etc. However, ATI 5000 series and above seem to support
> > > byte_addressable_store, so the existing code should work fine on 5000
> > > series or newer GPUs, but for the 4000 series or older it needs to be
> > > reimplemented. The workaround also causes some performance penalties.
> ...
> > Since my GPU doesn't support byte_addressable_store, it is becoming an
> > increasingly uphill task to implement the HMAC_SHA1 algorithm. Using
> > uint instead of uchar is a probable solution, but debugging the code
> > becomes very time consuming.
> > I have also considered using 4 uchar16 vectors to replace a single
> > uchar array, but that results in too much branching in the code. If
> > you have any suggestions, please let me know.
>
> I am totally unfamiliar with this - maybe someone else will comment.
> Lukas, Milen, Samuele, Claudio, magnum - maybe some of you?
>
> It is not necessarily a bad thing that the task turned out to be more
> complicated - you have a better chance to demonstrate your ability to
> work on complex tasks in this way. ;-)
>
> Thanks,
>
> Alexander
>

Hi Alexander,

I'm pleased to inform you that I have finished implementing the PBKDF2 step on the GPU (OpenCL).
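For checking GPU output, it helps to keep in mind what the PBKDF2 step computes (RFC 2898): each 20-byte output block is T_i = U_1 xor U_2 xor ... xor U_c, where U_1 = HMAC-SHA1(P, S || INT_32_BE(i)) and U_j = HMAC-SHA1(P, U_{j-1}). Below is a minimal portable C sketch of PBKDF2-HMAC-SHA1 that could serve as a CPU reference. It is not the sample program mentioned in the thread, nor the attached OpenCL code; the function names (sha1_init, hmac_sha1, pbkdf2_sha1, ...) are hypothetical, and the 60-byte salt limit is a simplification made only for this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Minimal SHA-1 (FIPS 180-4), used only as a CPU reference. */
typedef struct {
    uint32_t h[5];
    uint64_t len;     /* total message bytes seen */
    uint8_t buf[64];  /* pending partial block */
    size_t used;
} sha1_ctx;

#define ROL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

static void sha1_block(sha1_ctx *c, const uint8_t *p) {
    uint32_t w[80], a, b, cc, d, e, f, k, t;
    for (int i = 0; i < 16; i++)  /* message words are big-endian */
        w[i] = (uint32_t)p[4*i] << 24 | (uint32_t)p[4*i+1] << 16 |
               (uint32_t)p[4*i+2] << 8 | (uint32_t)p[4*i+3];
    for (int i = 16; i < 80; i++)
        w[i] = ROL(w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16], 1);
    a = c->h[0]; b = c->h[1]; cc = c->h[2]; d = c->h[3]; e = c->h[4];
    for (int i = 0; i < 80; i++) {
        if (i < 20)      { f = (b & cc) | (~b & d);           k = 0x5A827999; }
        else if (i < 40) { f = b ^ cc ^ d;                    k = 0x6ED9EBA1; }
        else if (i < 60) { f = (b & cc) | (b & d) | (cc & d); k = 0x8F1BBCDC; }
        else             { f = b ^ cc ^ d;                    k = 0xCA62C1D6; }
        t = ROL(a, 5) + f + e + k + w[i];
        e = d; d = cc; cc = ROL(b, 30); b = a; a = t;
    }
    c->h[0] += a; c->h[1] += b; c->h[2] += cc; c->h[3] += d; c->h[4] += e;
}

static void sha1_init(sha1_ctx *c) {
    static const uint32_t iv[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE,
                                   0x10325476, 0xC3D2E1F0};
    memcpy(c->h, iv, sizeof iv);
    c->len = 0;
    c->used = 0;
}

static void sha1_update(sha1_ctx *c, const uint8_t *p, size_t n) {
    c->len += n;
    while (n--) {
        c->buf[c->used++] = *p++;
        if (c->used == 64) { sha1_block(c, c->buf); c->used = 0; }
    }
}

static void sha1_final(sha1_ctx *c, uint8_t out[20]) {
    uint64_t bits = c->len * 8;   /* capture length before padding */
    uint8_t pad = 0x80, zero = 0, lenb[8];
    sha1_update(c, &pad, 1);
    while (c->used != 56) sha1_update(c, &zero, 1);
    for (int i = 0; i < 8; i++) lenb[i] = (uint8_t)(bits >> (56 - 8*i));
    sha1_update(c, lenb, 8);
    for (int i = 0; i < 20; i++) out[i] = (uint8_t)(c->h[i/4] >> (24 - 8*(i%4)));
}

/* HMAC-SHA1 (RFC 2104). */
static void hmac_sha1(const uint8_t *key, size_t klen, const uint8_t *msg,
                      size_t mlen, uint8_t out[20]) {
    uint8_t k[64] = {0}, ipad[64], opad[64], inner[20];
    sha1_ctx c;
    if (klen > 64) { sha1_init(&c); sha1_update(&c, key, klen); sha1_final(&c, k); }
    else memcpy(k, key, klen);
    for (int i = 0; i < 64; i++) { ipad[i] = (uint8_t)(k[i] ^ 0x36); opad[i] = (uint8_t)(k[i] ^ 0x5c); }
    sha1_init(&c); sha1_update(&c, ipad, 64); sha1_update(&c, msg, mlen); sha1_final(&c, inner);
    sha1_init(&c); sha1_update(&c, opad, 64); sha1_update(&c, inner, 20); sha1_final(&c, out);
}

/* PBKDF2-HMAC-SHA1; returns -1 if salt > 60 bytes (sketch limit) or iter == 0. */
static int pbkdf2_sha1(const uint8_t *pw, size_t pwlen, const uint8_t *salt, size_t slen,
                       uint32_t iter, uint8_t *dk, size_t dklen) {
    uint8_t s[64], u[20], next[20], t[20];
    if (slen > sizeof s - 4 || iter == 0) return -1;
    for (uint32_t blk = 1; dklen > 0; blk++) {
        memcpy(s, salt, slen);                 /* S || INT_32_BE(blk) */
        s[slen] = (uint8_t)(blk >> 24); s[slen + 1] = (uint8_t)(blk >> 16);
        s[slen + 2] = (uint8_t)(blk >> 8); s[slen + 3] = (uint8_t)blk;
        hmac_sha1(pw, pwlen, s, slen + 4, u);  /* U_1 */
        memcpy(t, u, 20);
        for (uint32_t i = 1; i < iter; i++) {  /* U_2 .. U_c, xor-accumulated */
            hmac_sha1(pw, pwlen, u, 20, next);
            memcpy(u, next, 20);
            for (int j = 0; j < 20; j++) t[j] ^= u[j];
        }
        size_t n = dklen < 20 ? dklen : 20;
        memcpy(dk, t, n); dk += n; dklen -= n;
    }
    return 0;
}
```

With the RFC 6070 test vector P = "password", S = "salt", c = 1, dkLen = 20, pbkdf2_sha1 should produce 0c60c80f961f0e71f3a9b524af6012062fe037a6. On a GPU, each work-item would presumably run the c-iteration loop for one candidate password; this CPU version is meant only as a correctness oracle.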
The code is primarily based on the sample program that you mentioned in the earlier post, but I had to modify it heavily to implement it on the ATI RV790 architecture, which is why it took much longer than expected. I have compared the outputs with the sample code you provided, and they are a perfect match. There is also still room for a lot more optimization.

One drawback I found is that, because the code is very long, the compilation (clBuildProgram()) time is a bit long. As I've already told you, my GPU doesn't support byte_addressable_store, so I had to improvise a workaround, which resulted in lengthier code.

I'm attaching the unoptimized version of the host and device code here.

Regards,
-Sayantan
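The uint workaround discussed in this thread can be sketched as follows. Without cl_khr_byte_addressable_store, OpenCL kernels are restricted from writing to types smaller than 32 bits (char, uchar, short, ...), so a byte buffer is kept as an array of 32-bit words and every byte store becomes a read-modify-write on the enclosing word. This is plain C for illustration, not the attached kernel; the helper names are hypothetical, and the same bit manipulation should carry over to OpenCL C with uint in place of uint32_t.

```c
#include <stdint.h>
#include <stddef.h>

/* Write byte v at byte offset i of a buffer held as 32-bit words.
 * Bytes are placed big-endian within each word (byte 0 in bits 31..24),
 * which is the order SHA-1 reads its message words in. */
static void put_byte_be(uint32_t *w, size_t i, uint8_t v) {
    unsigned shift = 24u - 8u * (unsigned)(i & 3);
    uint32_t mask = (uint32_t)0xFF << shift;
    w[i >> 2] = (w[i >> 2] & ~mask) | ((uint32_t)v << shift);
}

/* Read byte i back out of the word buffer. */
static uint8_t get_byte_be(const uint32_t *w, size_t i) {
    unsigned shift = 24u - 8u * (unsigned)(i & 3);
    return (uint8_t)(w[i >> 2] >> shift);
}

/* Copy n bytes from src into the word buffer starting at byte offset off. */
static void copy_bytes_be(uint32_t *w, size_t off, const uint8_t *src, size_t n) {
    for (size_t k = 0; k < n; k++) put_byte_be(w, off + k, src[k]);
}
```

For example, copying "abc" to offset 0 and then writing the 0x80 padding byte at offset 3 leaves the first word equal to 0x61626380, ready to be used as SHA-1 message word w[0] without byte swapping. The extra masking on every byte write is one plausible source of the performance penalty mentioned in the thread; working on whole words wherever the data is 4-byte aligned would avoid most of it.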