Date: Wed, 28 Jan 2015 02:19:14 +0530
From: Sayantan Datta <>
To: john-dev <>
Subject: Re: Discuss Mask-mode with GPUs and changes required for
 its support.

On Wed, Nov 12, 2014 at 5:35 AM, magnum <> wrote:

> On 2014-11-05 16:02, Sayantan Datta wrote:
>> Based on my earlier experience with mask-mode, it was necessary to write
>> separate kernels for benchmark, mask-mode(password on GPU) and non-mask
>> modes. However, as much as I would love to unify them under a common
>> kernel, with all of them following the same code path, it is difficult to
>> do so without making debatable changes.
> Btw here's some food for thought:
> The NT kernel could either (among alternatives):
> a) Only support ISO-8859-1 (like Hashcat).
> b) Transfer base words in UTF-16 to GPU, and use a UTF-16 version of
> GPU-side mask generation.
> c) Support UTF-8/codepage conversions on GPU (NTLMv2 and krb5pa-md5
> kernels currently do this). So we transfer base words in UTF-8 or a CP to
> GPU, apply the mask and finally convert to UTF-16 on GPU.
> d) some combination of b and c. For example, transfer basewords to GPU in
> UTF-8/CP, then convert them to UTF-16 once, finally apply mask with a
> UTF-16 version of mask mode.
> IMHO we should *definitely* have full UTF-8/codepage support, the question
> is how. We will never be quite as fast as Hashcat with NT hashes anyway so
> we should beat it with functionality. So in my book, option a is totally
> out of the question.
> Option b is simplest but typically needs twice the bandwidth for PCI
> transfers (which is not much of a problem when we run hybrid mask) while
> option c needs somewhat more complex GPU code. I guess option b is
> typically fastest for mask mode. However, option c is fastest when not
> using a mask.
> magnum
Regarding bandwidth, I don't understand how transferring UTF16 words as
UTF8 would save PCIe bandwidth. If I am correct, UTF16 can support 65536
values while UTF8 supports only up to 255. So any character with an unsigned
value greater than 255 must be sent as two UTF8 words or a single UTF16
word, and both choices should require the same bandwidth. So I believe we
are only talking about UTF16 words with values <= 255.
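
To put numbers on the bandwidth question, here is a small Python sketch
(not John the Ripper code; the function name transfer_sizes is made up for
illustration) that counts the bytes needed to send a word in each encoding.
For ASCII-only words UTF-8 needs half the bytes of UTF-16, while for
characters in the U+0080..U+07FF range both need two bytes per character.
With a single-byte codepage such as ISO-8859-1, values 128..255 would also
take one byte each.

```python
def transfer_sizes(word):
    """Return (UTF-8 bytes, UTF-16 bytes) needed to transfer `word`."""
    utf8_len = len(word.encode("utf-8"))
    utf16_len = len(word.encode("utf-16-le"))  # little-endian, no BOM
    return utf8_len, utf16_len


if __name__ == "__main__":
    # ASCII, Latin-1 range, and Cyrillic samples (escapes keep this file ASCII).
    samples = ["password", "p\u00e4ssword", "\u043f\u0430\u0440\u043e\u043b\u044c"]
    for sample in samples:
        print(repr(sample), transfer_sizes(sample))
```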

Also, does our mask mode support UTF16 placeholders? Internally, the
design supports only 8-bit characters. Are we somehow converting UTF16
placeholders into UTF8 placeholders?

Currently, we split the mask into two parts: one part generates the
template candidates (in mask.c), while the other part generates the values
to plug into the template (in mask_ext.c). Both activities are performed on
the CPU, but the actual plugging takes place on the GPU. The advantage is
less complex kernel code, which enables us to write a unified kernel for
mask mode, self-test and all other modes with the fewest branch
instructions. For UTF16 words with values <= 255, it is better to generate
the template candidates as UTF8 and the values to plug into the template
as UTF16. The reason is that we can use GPU parallelism to convert the
template candidates from UTF8 to UTF16, while for the plug-in values it is
better to do the conversion once on the CPU to avoid redundant work. I
think this is what you have suggested in option 'd'.
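
A rough Python sketch of the split described above, only to illustrate the
data flow. It is not the mask.c or mask_ext.c code: the function names, the
"?" placeholder character and the list-of-charsets mask representation are
all assumptions made for the example, and it only handles characters in the
Basic Multilingual Plane (one UTF-16 code unit each).

```python
from itertools import product


def make_templates(charsets, gpu_positions):
    """CPU side (the role given to mask.c above): template candidates
    with a placeholder at each GPU position, encoded as UTF-8."""
    fixed = [cs if i not in gpu_positions else "?"
             for i, cs in enumerate(charsets)]
    return ["".join(t).encode("utf-8") for t in product(*fixed)]


def make_plugins(charsets, gpu_positions):
    """CPU side (the role given to mask_ext.c above): plug-in values,
    converted to UTF-16 code units once, before transfer."""
    pools = [charsets[i] for i in gpu_positions]
    return [[ord(c) for c in combo] for combo in product(*pools)]


def kernel(template_utf8, plugins, gpu_positions):
    """Stand-in for one GPU work item: convert its own template to
    UTF-16 once, then plug every value into the placeholder slots."""
    units = [ord(c) for c in template_utf8.decode("utf-8")]  # BMP only
    out = []
    for combo in plugins:
        for pos, unit in zip(gpu_positions, combo):
            units[pos] = unit
        out.append("".join(map(chr, units)))  # a real kernel would hash here
    return out


charsets = ["ab", "xy", "01"]   # one charset per mask position
gpu_positions = [1]             # position filled in on the GPU
templates = make_templates(charsets, gpu_positions)
plugins = make_plugins(charsets, gpu_positions)
candidates = [c for t in templates for c in kernel(t, plugins, gpu_positions)]
```

Here each template is converted independently (one work item per template),
while the plug-in list is converted a single time and shared by all of them.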

