Date: Thu, 8 Nov 2012 20:09:54 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Split kernel for OpenCL WPA-PSK

On 8 Nov, 2012, at 19:12 , magnum <john.magnum@...hmail.com> wrote:
> Using device 0: Tahiti
> Local worksize (LWS) 192, Global worksize (GWS) 196608
> Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
> Raw:    66197 c/s real, 137970 c/s virtual

> This code too does over 2.1 billion SHA1/second, but CPU post-processing nearly halves the speed (without OMP). So I'm in the process of moving all of that post-processing to GPU. It's just a couple HMACs more, so I hope to exceed 120K c/s with that in place.

Lol, while digging into that post-processing, I found out that the (CPU-side) prf_512() function in wpapsk.h did four times as much work as needed: it produced an 80-byte key, of which only the first 16 bytes were needed. With this fix alone, the Tahiti figure went up another 35%:
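To illustrate why truncating is safe, here is a minimal sketch (not the actual wpapsk.h code). It assumes the usual 802.11 PRF construction, where each 20-byte output block is an independent HMAC-SHA-1 over label || 0x00 || data || counter. The names prf_full() and kck_only() and the 76-byte data layout (two MACs plus two nonces) are hypothetical, chosen only for this example.

```python
import hashlib
import hmac

LABEL = b"Pairwise key expansion"  # label used by the 802.11 PRF for the PTK


def prf_full(pmk: bytes, data: bytes) -> bytes:
    """Four HMAC-SHA-1 calls, 4 * 20 = 80 bytes of output (the old behaviour)."""
    out = b""
    for counter in range(4):
        msg = LABEL + b"\x00" + data + bytes([counter])
        out += hmac.new(pmk, msg, hashlib.sha1).digest()
    return out


def kck_only(pmk: bytes, data: bytes) -> bytes:
    """One HMAC-SHA-1 call (counter 0), truncated to the 16 bytes we need."""
    msg = LABEL + b"\x00" + data + b"\x00"
    return hmac.new(pmk, msg, hashlib.sha1).digest()[:16]


if __name__ == "__main__":
    pmk = bytes(range(32))   # dummy 32-byte PMK, for illustration only
    data = bytes(76)         # dummy MACs + nonces (assumed 6+6+32+32 bytes)
    # The first 16 bytes are identical, at a quarter of the HMAC calls.
    print(kck_only(pmk, data) == prf_full(pmk, data)[:16])
```

Since the blocks do not depend on each other, the first one can be computed on its own and the other three HMACs simply dropped.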

Using device 0: Tahiti
Local worksize (LWS) 256, Global worksize (GWS) 262144
Benchmarking: WPA-PSK PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
Raw:    89164 c/s real, 296207 c/s virtual
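As a quick sanity check of the figure quoted above, the gain can be computed from the two "real" c/s numbers in the benchmark output (the variable names are just for illustration):

```python
before = 66197  # c/s real, before the prf_512() fix
after = 89164   # c/s real, after the fix

gain = after / before - 1.0
print(f"{gain:.1%}")  # prints 34.7%
```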

This will affect CUDA too. Still, I'm proceeding with implementing all of that post-processing on GPU.

magnum
