john-dev - Re: Mulit-gpu using claudio's interfaces

Message-ID: <50FAA598.6040508@gmail.com>
Date: Sat, 19 Jan 2013 11:54:32 -0200
From: Claudio André <claudioandre.br@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Mulit-gpu using claudio's interfaces

On 19-01-2013 07:55, magnum wrote:
> On 19 Jan, 2013, at 9:04 , Sayantan Datta <std2048@...il.com> wrote:
>
>> I have integrated the claudio's interfaces in mscash2-opencl:
>>
>> std2048@...l:~/Jtr3/run$ ./john -te -fo=mscash2-opencl -dev=0,1
>> Device 0: GeForce GTX 570
>> Optimal Work Group Size:256
>> Kernel Execution Speed (Higher is better):0.475557
>> Device 1: Tahiti (AMD Radeon HD 7900 Series)
>> Optimal Work Group Size:128
>> Kernel Execution Speed (Higher is better):1.556817
>> Benchmarking: M$ Cache Hash 2 (DCC2) PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
>> Raw:    125427 c/s real, 125427 c/s virtual
>>
>> std2048@...l:~/Jtr3/run$ ./john -te -fo=mscash2-opencl -dev=gpu
>> Device 0: GeForce GTX 570
>> Optimal Work Group Size:128
>> Kernel Execution Speed (Higher is better):0.474687
>> Device 1: Tahiti (AMD Radeon HD 7900 Series)
>> Optimal Work Group Size:128
>> Kernel Execution Speed (Higher is better):1.557031
>> Benchmarking: M$ Cache Hash 2 (DCC2) PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
>> Raw:    126030 c/s real, 125728 c/s virtual
> Good stuff. Ideally we should have a GTX 590 or GTX 690 and an AMD 7990 in Bull so we could test multi-device CUDA as well as multi-device OpenCL with homogenous or heterogenous devices.

For debugging, absolutely.

> I'm not sure about iterations and key length, but shouldn't mscash2 ideally perform similar or better than wpapsk?
>
> Also, there's some problem using three devices:
>
> ../run/john -t -fo:mscash2-opencl --dev=all
> Device 0: GeForce GTX 570
> Optimal Work Group Size:128
> Kernel Execution Speed (Higher is better):0.474681
> Device 1: Tahiti (AMD Radeon HD 7900 Series)
> Optimal Work Group Size:256
> Kernel Execution Speed (Higher is better):1.556828
> Device 2: AMD FX(tm)-8120 Eight-Core Processor
> Optimal Work Group Size:4
> Kernel Execution Speed (Higher is better):0.001337
> Benchmarking: M$ Cache Hash 2 (DCC2) PBKDF2-HMAC-SHA-1 [OpenCL]... Segmentation fault
>
> Maybe they are just too different in speed.
>
> magnum

The '--dev=all' case is something that worries me for a real run.

If you have free time to debug it and gather more information, that 
would be useful. It could also be a bug in the common code.
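On the "too different in speed" guess: if a format splits each batch across devices in proportion to the reported "Kernel Execution Speed" (an assumption here, not something confirmed for mscash2-opencl), the CPU above (0.001337 against 0.474681 + 1.556828 for the two GPUs) would get only about 0.07% of the work, so a tiny or zero per-device share might be one edge case worth ruling out. Below is a minimal sketch of such a split that guarantees every device a minimum share; `split_work` and its parameters are hypothetical names, not part of JtR's common code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not JtR's actual API): split `total` work items
 * across `n` devices in proportion to their measured speeds.
 * Each device is first guaranteed `min_share` items, the rest is divided
 * proportionally (rounded down), and the rounding leftover goes to the
 * fastest device, so the shares always sum to exactly `total`. */
void split_work(const double *speed, size_t n, size_t total,
                size_t min_share, size_t *share)
{
    double sum = 0.0;
    size_t fastest = 0, assigned = 0, rest;

    assert(n > 0 && total >= n * min_share);
    for (size_t i = 0; i < n; i++) {
        assert(speed[i] > 0.0);
        sum += speed[i];
        if (speed[i] > speed[fastest])
            fastest = i;
    }

    rest = total - n * min_share;
    for (size_t i = 0; i < n; i++) {
        /* Floor of the proportional part; the floors never exceed `rest`. */
        share[i] = min_share + (size_t)((double)rest * (speed[i] / sum));
        assigned += share[i];
    }

    /* Hand the rounding leftover to the fastest device. */
    share[fastest] += total - assigned;
}
```

For example, calling it with the three speeds from the log, a batch of 8192 and a minimum of 16 keeps the CPU at 16 or more items instead of the handful a pure proportional split would give it, while the HD 7900 still takes the bulk.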
-----

BTW: would something like this be hard to do? For example:

my_format_crypt_all() // crypt_all() inside a format file.
{
...
// Put work on N cards
enqueue()
...

while (anyGpu.hasWork_toDo) {

   // ****************************************************************
   // ***** (This way) the core can check all candidates from 16000 to
   // ***** 32000 while tasks on the other GPUs are still running.
   if (Gpu[x].finished)
     core_send_event(result = WORK_DONE, start = 16000, finish = 32000)
}
}
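A minimal single-threaded sketch of that loop, assuming each device owns one contiguous candidate range and the core accepts a "work done" callback. All names (`struct slice`, `run_and_report`, `log_done`) are hypothetical, and a countdown (`ticks_left`) stands in for querying the real OpenCL event status.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one device's slice of the candidate batch.
 * `ticks_left` simulates "kernel still running"; a real format would
 * query the device's completion event instead. */
struct slice {
    size_t start, finish;   /* candidate range [start, finish) */
    int ticks_left;         /* simulated remaining kernel time */
    bool reported;          /* already handed to the core? */
};

/* Hypothetical callback into the core: this range is ready to check. */
typedef void (*work_done_fn)(size_t start, size_t finish, void *ctx);

/* Single main thread: poll every device, report each slice as soon as it
 * completes, and return once all slices are reported.
 * Returns the number of polling rounds taken. */
int run_and_report(struct slice *s, size_t n, work_done_fn done, void *ctx)
{
    size_t pending = n;
    int rounds = 0;

    while (pending > 0) {
        rounds++;
        for (size_t i = 0; i < n; i++) {
            if (s[i].reported)
                continue;
            if (s[i].ticks_left > 0)
                s[i].ticks_left--;          /* still "running" */
            if (s[i].ticks_left <= 0) {     /* finished: report it now */
                done(s[i].start, s[i].finish, ctx);
                s[i].reported = true;
                pending--;
            }
        }
    }
    return rounds;
}

/* Example callback: record the order in which ranges were reported. */
struct done_log { size_t start[8]; size_t count; };

void log_done(size_t start, size_t finish, void *ctx)
{
    struct done_log *log = ctx;
    (void)finish;
    if (log->count < 8)
        log->start[log->count++] = start;
}
```

With three slices starting at 0, 16000 and 32000 that take 3, 1 and 5 rounds respectively, the 16000..32000 range is reported first, while the other two devices are still busy. The whole thing still runs in one polling thread, which is the limitation mentioned next.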

----
OK, having only a single (main) thread is a limitation.

Claudio
