Date: Mon, 14 Sep 2015 22:16:54 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: SHA-1 H()

On Sep 14, 2015, at 8:27 PM, Solar Designer <solar@...nwall.com> wrote:
> 
> On Mon, Sep 14, 2015 at 04:33:55PM +0800, Lei Zhang wrote:
>> In case it's helpful, here're some benchmark figures of JtR running on SDE:
> 
> And, did you fully make use of them (turning all of the
> 3-input basic functions of MD4/MD5/SHA-1/SHA-2 into single instructions)
> or only to define vcmov() for now?

I converted every 3-input function to a single TERNLOG instruction, except for those that were already using a single CMOV (I thought one CMOV was good enough, but forgot that it might be emulated). Now that you mention it, I have also used TERNLOG to emulate CMOV. Here are the latest results:

Benchmarking: Raw-MD4 [MD4 512/512 AVX512F 16x3]... DONE
Raw:	219184 c/s real, 219184 c/s virtual

Benchmarking: Raw-MD5 [MD5 512/512 AVX512F 16x3]... DONE
Raw:	138917 c/s real, 140293 c/s virtual

Benchmarking: Raw-SHA1 [SHA1 512/512 AVX512F 16x]... DONE
Raw:	99216 c/s real, 99216 c/s virtual

Benchmarking: Raw-SHA256 [SHA256 512/512 AVX512F 16x]... DONE
Raw:	48839 c/s real, 49328 c/s virtual

Benchmarking: Raw-SHA512 [SHA512 512/512 AVX512F 8x]... DONE
Raw:	22019 c/s real, 22019 c/s virtual

Compared to the previous figures (please refer to my last message), using TERNLOG to emulate CMOV makes JtR slower on SDE. Maybe SDE's emulation of TERNLOG is just not efficient.
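
For readers who want to see what "one TERNLOG per 3-input function" amounts to, here is a minimal sketch in portable C. It is not the JtR code: ternlog32() is a hypothetical software model of one 32-bit lane, and the IMM_* names are made up for illustration. On real AVX-512F hardware the corresponding intrinsic would be _mm512_ternarylogic_epi32(a, b, c, imm8). The 8-bit truth table for each function is obtained by evaluating that function on the constants 0xF0, 0xCC and 0xAA.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of one 32-bit lane of TERNLOG:
 * at each bit position the three input bits form the index
 * (a << 2) | (b << 1) | c into the 8-bit truth table imm8. */
static uint32_t ternlog32(uint32_t a, uint32_t b, uint32_t c, uint8_t imm8)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        unsigned idx = (((a >> i) & 1u) << 2) |
                       (((b >> i) & 1u) << 1) |
                        ((c >> i) & 1u);
        r |= (uint32_t)((imm8 >> idx) & 1u) << i;
    }
    return r;
}

/* Truth tables: evaluate each function on A=0xF0, B=0xCC, C=0xAA
 * and keep the low 8 bits. Names are illustrative only. */
enum {
    TT_A = 0xF0, TT_B = 0xCC, TT_C = 0xAA,
    /* CMOV / SHA "Ch": a ? b : c */
    IMM_CMOV  = ((TT_A & TT_B) | (~TT_A & TT_C)) & 0xFF,
    /* Majority (SHA-1 / SHA-2 "Maj") */
    IMM_MAJ   = ((TT_A & TT_B) | (TT_A & TT_C) | (TT_B & TT_C)) & 0xFF,
    /* Three-way XOR (SHA-1 parity rounds, MD5 H) */
    IMM_XOR3  = (TT_A ^ TT_B ^ TT_C) & 0xFF,
    /* MD5 I(x, y, z) = y ^ (x | ~z), with a = x, b = y, c = z */
    IMM_MD5_I = (TT_B ^ (TT_A | ~TT_C)) & 0xFF
};

/* Self-check: the table-driven result must match the plain
 * boolean expression for every function above. */
static void check_truth_tables(uint32_t x, uint32_t y, uint32_t z)
{
    assert(ternlog32(x, y, z, IMM_CMOV)  == ((x & y) | (~x & z)));
    assert(ternlog32(x, y, z, IMM_MAJ)   == ((x & y) | (x & z) | (y & z)));
    assert(ternlog32(x, y, z, IMM_XOR3)  == (x ^ y ^ z));
    assert(ternlog32(x, y, z, IMM_MD5_I) == (y ^ (x | ~z)));
}
```

The argument order (selector first for CMOV) is an assumption of this sketch. A real vcmov() macro may order its operands differently, which would change the immediate.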


And here are the results without using any TERNLOG instructions:

Benchmarking: Raw-MD4 [MD4 512/512 AVX512F 16x3]... DONE
Raw:	444356 c/s real, 448800 c/s virtual

Benchmarking: Raw-MD5 [MD5 512/512 AVX512F 16x3]... DONE
Raw:	225172 c/s real, 227424 c/s virtual

Benchmarking: Raw-SHA1 [SHA1 512/512 AVX512F 16x]... DONE
Raw:	212784 c/s real, 212784 c/s virtual

Benchmarking: Raw-SHA256 [SHA256 512/512 AVX512F 16x]... DONE
Raw:	63413 c/s real, 63413 c/s virtual

Benchmarking: Raw-SHA512 [SHA512 512/512 AVX512F 8x]... DONE
Raw:	27440 c/s real, 27168 c/s virtual

I think that further confirms my statement above: SDE's emulation of TERNLOG is inefficient.
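
To put a number on that, here is a small sketch that divides the "real" c/s rate without TERNLOG by the rate with TERNLOG for each format. The struct and function names are made up for illustration. The figures are copied from the two runs above.

```c
#include <assert.h>
#include <stdio.h>

/* One row per format; rates are the "real" c/s figures quoted above. */
struct bench {
    const char *name;
    double with_ternlog;    /* all 3-input functions via TERNLOG */
    double without_ternlog; /* no TERNLOG instructions at all */
};

static const struct bench results[] = {
    { "Raw-MD4",    219184.0, 444356.0 },
    { "Raw-MD5",    138917.0, 225172.0 },
    { "Raw-SHA1",    99216.0, 212784.0 },
    { "Raw-SHA256",  48839.0,  63413.0 },
    { "Raw-SHA512",  22019.0,  27440.0 },
};

/* How many times slower the TERNLOG build runs under SDE. */
static double slowdown(const struct bench *b)
{
    assert(b->with_ternlog > 0.0);
    return b->without_ternlog / b->with_ternlog;
}

/* Print one line per format with its slowdown factor. */
static void print_slowdowns(void)
{
    size_t n = sizeof(results) / sizeof(results[0]);
    for (size_t i = 0; i < n; i++)
        printf("%-11s %.2fx slower with TERNLOG\n",
               results[i].name, slowdown(&results[i]));
}
```

On these figures the TERNLOG build is between about 1.25x (Raw-SHA512) and 2.1x (Raw-SHA1) slower under SDE.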

>> BTW, SDE runs much more smoothly than I expected. At least those formats listed above ran quite fast on it.
> 
> You mean the program's interactive response time, not the c/s rates.

That's exactly what I meant :D


Lei
