john-dev - Re: JtR: GPU for slow hashes

Message-ID: <CA+TsHUCPLZ2TGrE50OsBgKwPEDt=BjZjaUkAakwxP2GGiMaP7A@mail.gmail.com>
Date: Sat, 31 Mar 2012 15:43:31 +0530
From: SAYANTAN DATTA <std2048@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: JtR: GPU for slow hashes

On Sat, Mar 31, 2012 at 3:42 PM, SAYANTAN DATTA <std2048@...il.com> wrote:

>
>
> On Thu, Mar 29, 2012 at 5:37 AM, Solar Designer <solar@...nwall.com> wrote:
>
>> On Wed, Mar 28, 2012 at 11:21:50PM +0530, SAYANTAN DATTA wrote:
>> > On Wed, Mar 28, 2012 at 5:54 PM, SAYANTAN DATTA <std2048@...il.com> wrote:
>> > >   Here are a few problems I'm facing. Since ATI 4000 series GPUs don't
>> > > support byte_addressable_store, I have to work around this problem by
>> > > using only uint as the data type for temporary data storage. This
>> > > problem exists with many of the hash algorithms already implemented
>> > > in OpenCL, e.g. MD5, MD4, etc. However, ATI 5000 series and above seem
>> > > to support byte_addressable_store. So the existing code should work
>> > > fine on 5000 series or above GPUs, but for 4000 series or below it
>> > > needs to be reimplemented. The workaround also causes some
>> > > performance penalty.
>> ...
>> >   Since my GPU doesn't support byte_addressable_store, it is becoming an
>> > increasingly uphill task to implement the HMAC_SHA1 algorithm. Using
>> > uint[] instead of uchar[] is a probable solution, but debugging the
>> > code becomes very time consuming.
>> >    I have also considered using 4 uchar16 vectors to replace a single
>> > uchar[64] array, but it results in too much branching in the code. If
>> > you have any suggestions, please let me know.
>>
>> I am totally unfamiliar with this - maybe someone else will comment.
>> Lukas, Milen, Samuele, Claudio, magnum - maybe some of you?
>>
>> It is not necessarily a bad thing that the task turned out to be more
>> complicated - you have a better chance to demonstrate your ability to
>> work on complex tasks in this way. ;-)
>>
>> Thanks,
>>
>> Alexander
>>
>
> Hi Alexander,
>
> I'm pleased to inform you that I have finished implementing the PBKDF2
> step on the GPU (OpenCL). The code is primarily based on the sample
> program that you mentioned in the earlier post, but I had to modify it
> heavily to make it work on the ATI RV790 architecture, which is why it
> took a lot more time than expected.
> I have compared the outputs with those of the sample code you provided,
> and they match exactly. There is also room for a lot more optimization.
>     One drawback I found is that, because the code is very long, the
> compilation (clBuildProgram()) time is a bit long. As I've already told
> you, my GPU doesn't support byte_addressable_store, so I had to improvise
> a workaround, which resulted in lengthier code.
>    I'm attaching the unoptimized versions of the host and device code here.
>
> Regards,
> -Sayantan
>
>
>
>
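
The uint-instead-of-uchar workaround described in this thread can be
sketched roughly as follows. This is only an illustration in plain C, not
the code from the attached PBKDF2.cl: the helper names put_byte, get_byte
and demo_first_word are hypothetical, and the little-endian packing order
is an assumption chosen for simplicity. The idea is that every byte write
becomes a read-modify-write on a 32-bit word, so no byte-sized store is
ever issued.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store one byte at byte offset idx inside a buffer
 * that is only ever accessed as 32-bit words. Byte 0 goes into the low
 * 8 bits of word 0 (little-endian packing, an assumption of this sketch). */
static void put_byte(uint32_t *buf, unsigned idx, uint8_t val)
{
    unsigned shift = (idx & 3u) * 8u;            /* bit position in the word */
    uint32_t mask = (uint32_t)0xFFu << shift;    /* bits occupied by the byte */
    buf[idx >> 2] = (buf[idx >> 2] & ~mask) | ((uint32_t)val << shift);
}

/* Hypothetical helper: read back the byte at byte offset idx. */
static uint8_t get_byte(const uint32_t *buf, unsigned idx)
{
    return (uint8_t)(buf[idx >> 2] >> ((idx & 3u) * 8u));
}

/* Packs "abc" followed by a 0x80 padding byte into a 64-byte block held
 * as 16 words, and returns the first word. */
static uint32_t demo_first_word(void)
{
    uint32_t block[16] = {0};
    const char *msg = "abc";
    unsigned i;

    for (i = 0; msg[i] != '\0'; i++)
        put_byte(block, i, (uint8_t)msg[i]);
    put_byte(block, i, 0x80);

    assert(get_byte(block, 0) == 'a');           /* round trip works */
    assert(get_byte(block, 3) == 0x80);
    return block[0];
}
```

With this packing, demo_first_word() returns 0x80636261. SHA-1 treats its
message words as big-endian, so a kernel feeding such a buffer to SHA-1
would either use a shift of (3 - (idx & 3)) * 8 instead or byte-swap each
word afterwards. The extra mask-and-shift work on every byte access is one
plausible source of the longer code and the performance penalty mentioned
above.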

View attachment "MSCash2_sample.cpp" of type "text/x-c++src" (26707 bytes)

View attachment "MSCash2_host.cpp" of type "text/x-c++src" (12685 bytes)

Download attachment "PBKDF2.cl" of type "application/octet-stream" (35849 bytes)