Date: Wed, 28 Jan 2015 02:19:14 +0530
From: Sayantan Datta <std2048@...il.com>
To: john-dev <john-dev@...ts.openwall.com>
Subject: Re: Discuss Mask-mode with GPUs and changes required for its support.

On Wed, Nov 12, 2014 at 5:35 AM, magnum <john.magnum@...hmail.com> wrote:

> On 2014-11-05 16:02, Sayantan Datta wrote:
>
>> Based on my earlier experience with mask-mode, it was necessary to write
>> separate kernels for benchmark, mask-mode(password on GPU) and non-mask
>> modes. However, as much as I would love to unify them under a common
>> kernel, with all of them following the same code path, it is difficult to
>> do so without making debatable changes.
>>
>
> Btw here's some food for thought:
>
> The NT kernel could either (among alternatives):
> a) Only support ISO-8859-1 (like Hashcat).
> b) Transfer base words in UTF-16 to GPU, and use a UTF-16 version of
> GPU-side mask generation.
> c) Support UTF-8/codepage conversions on GPU (NTLMv2 and krb5pa-md5
> kernels currently do this). So we transfer base words in UTF-8 or a CP to
> GPU, apply the mask and finally convert to UTF-16 on GPU.
> d) some combination of b and c. For example, transfer basewords to GPU in
> UTF-8/CP, then convert them to UTF-16 once, finally apply mask with a
> UTF-16 version of mask mode.
>
> IMHO we should *definitely* have full UTF-8/codepage support, the question
> is how. We will never be quite as fast as Hashcat with NT hashes anyway so
> we should beat it with functionality. So in my book, option a is totally
> out of the question.
>
> Option b is simplest but typically needs twice the bandwidth for PCI
> transfers (which is not much of a problem when we run hybrid mask) while
> option c needs somewhat more complex GPU code. I guess option b is
> typically fastest for mask mode. However, option c is fastest when not
> using a mask.
>
> magnum
>

Regarding bandwidth, I don't understand how transferring UTF-16 words as UTF-8 would save PCIe bandwidth.
If I am correct, a UTF-16 code unit can hold 65536 values while a UTF-8 code unit holds only 256 (0 to 255). So any character with an unsigned value greater than 255 must be sent as at least two UTF-8 bytes or as a single UTF-16 word, and UTF-8 saves no bandwidth in that case. So I believe we are only talking about UTF-16 words with values <= 255.

Also, does our mask mode support UTF-16 placeholders? Internally, the design only supports 8-bit characters. Are we somehow converting UTF-16 placeholders into UTF-8 placeholders?

Currently, we split the mask into two parts: one part generates the template candidates (in mask.c) while the other generates the values to plug into the template (in mask_ext.c). Both activities are performed on the CPU, but the actual plugging takes place on the GPU. The advantage is less complex kernel code, which lets us write a unified kernel for mask mode, self-test and all other modes with the fewest branch instructions.

For UTF-16 words with values <= 255, it is better to generate the template candidates as UTF-8 and the values to plug into the template as UTF-16. The reason is that we can use parallelism to convert the template candidates from UTF-8 to UTF-16, while for the plug-in values it is better to do the conversion once on the CPU to avoid redundant work. I think this is what you suggested in option 'd'.

Regards,
Sayantan
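
To make the bandwidth question above concrete, here is a minimal sketch that counts the bytes needed to send a base word in UTF-8, UTF-16 and a single-byte codepage. It is only an illustration using Python's built-in codecs; the helper name `transfer_sizes` and the sample words are assumptions made for this example and are not part of John the Ripper.

```python
def transfer_sizes(word):
    """Return (utf8, utf16, latin1) byte counts for one base word.

    latin1 is None when the word cannot be represented in ISO-8859-1.
    The function name and return shape are hypothetical, for illustration.
    """
    utf8 = len(word.encode("utf-8"))
    utf16 = len(word.encode("utf-16-le"))  # little-endian, no BOM
    try:
        latin1 = len(word.encode("iso-8859-1"))
    except UnicodeEncodeError:
        latin1 = None
    return utf8, utf16, latin1


samples = [
    "password",                              # ASCII only
    "p\u00e4ssw\u00f6rd",                    # two letters with values 128..255
    "\u043f\u0430\u0440\u043e\u043b\u044c",  # Cyrillic, values > 255
    "\u5bc6\u7801",                          # CJK, values > 0x7FF
]

for word in samples:
    utf8, utf16, latin1 = transfer_sizes(word)
    print(f"{ascii(word)}: utf-8={utf8} utf-16={utf16} iso-8859-1={latin1}")
```

Under these assumptions, UTF-8 halves the transfer only for ASCII words, a single-byte codepage halves it for values up to 255, and for larger values UTF-8 needs as many bytes as UTF-16 or more.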