Date: Fri, 22 Jun 2012 01:37:29 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: relbench and changed format names

On 2012-06-22 01:21, Frank Dittrich wrote:
> On 06/22/2012 12:49 AM, magnum wrote:
>> On 2012-06-22 00:37, Frank Dittrich wrote:
>>> Benchmarking: NT MD4 [128/128 SSE2 + 32/32]... DONE
>>> Raw:    2202K c/s real, 2224K c/s virtual
>>>
>>> Benchmarking: NT MD4 [128/128 SSE2 intrinsics 12x]... DONE
>>> Raw:    5078K c/s real, 5235K c/s virtual
>>
>> How can my vanilla MD4 NT2 format be *that* much faster than Alain's
>> optimised one that reverses rounds? Something must be amiss.
>
> Don't really know how that happened.
> I can't reproduce it now.
> With
>
> $ ./john --list=build-info
> Version: 1.7.9-jumbo-5+unstable
> Build: linux-x86-sse2 OMP
> Arch: 32-bit LE
> $JOHN is ./
> Rec file version: REC3
> CHARSET_MIN: 32 (0x20)
> CHARSET_MAX: 126 (0x7e)
> CHARSET_LENGTH: 8
> gcc version: 4.6.3
>
> I get much smaller differences now (repeated several times, with similar
> results):
>
> $ ./john --test --format=nt
> Benchmarking: NT MD4 [128/128 SSE2 + 32/32]... DONE
> Raw:	5330K c/s real, 5277K c/s virtual
> $ ./john --test --format=nt2
> Benchmarking: NT MD4 [128/128 SSE2 intrinsics 12x]... DONE
> Raw:	6746K c/s real, 6814K c/s virtual

Still, Alain's format should not be that much slower. Simon's register 
tricks must be damn good. In my head, nt2 should always be slower, but I 
know it's typically faster.

> And with
> $ ./john --list=build-info
> Version: 1.7.9-jumbo-5+unstable
> Build: linux-x86-sse2i
> Arch: 32-bit LE
> $JOHN is ./
> Rec file version: REC3
> CHARSET_MIN: 32 (0x20)
> CHARSET_MAX: 126 (0x7e)
> CHARSET_LENGTH: 8
> gcc version: 4.6.3
>
> I get:
>
> $ ./john --test --format=nt
> Benchmarking: NT MD4 [128/128 SSE2 + 32/32]... DONE
> Raw:	6877K c/s real, 6946K c/s virtual
>
> $ ./john --test --format=nt2
> Benchmarking: NT MD4 [128/128 SSE2 intrinsics 12x]... DONE
> Raw:	6318K c/s real, 6318K c/s virtual

The difference between -sse2 and -sse2i builds should be *zilch* for 
Alain's format, OMP or not. Apparently it is not: at 6877K c/s real, this 
figure is nearly 30% faster than the 5330K c/s above.

Also, it seems icc precompiled intrinsics just make things slower here 
for nt2. Please run "make clean testpara32" and post the output.
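
For reference, the relative differences discussed above can be checked with a small script. This is only a rough sketch in Python, not part of John the Ripper or relbench; the helper names parse_real_cps and percent_change are made up for illustration, and the input is just the "--test" output quoted in this message.

```python
import re

# Benchmark output copied from the two quoted runs (linux-x86-sse2 OMP
# and linux-x86-sse2i). Only the "real" c/s figure is used.
SSE2_OMP = (
    "Benchmarking: NT MD4 [128/128 SSE2 + 32/32]... DONE\n"
    "Raw:\t5330K c/s real, 5277K c/s virtual\n"
    "Benchmarking: NT MD4 [128/128 SSE2 intrinsics 12x]... DONE\n"
    "Raw:\t6746K c/s real, 6814K c/s virtual\n"
)
SSE2I = (
    "Benchmarking: NT MD4 [128/128 SSE2 + 32/32]... DONE\n"
    "Raw:\t6877K c/s real, 6946K c/s virtual\n"
    "Benchmarking: NT MD4 [128/128 SSE2 intrinsics 12x]... DONE\n"
    "Raw:\t6318K c/s real, 6318K c/s virtual\n"
)


def parse_real_cps(text):
    """Hypothetical helper: map each benchmark label to its real c/s."""
    results = {}
    label = None
    for line in text.splitlines():
        m = re.match(r"Benchmarking: (.+?)\.\.\. DONE", line)
        if m:
            label = m.group(1)
            continue
        m = re.match(r"Raw:\s+(\d+)(K?) c/s real", line)
        if m and label is not None:
            # A trailing "K" means thousands of candidates per second.
            results[label] = int(m.group(1)) * (1000 if m.group(2) else 1)
            label = None
    return results


def percent_change(old, new):
    """Hypothetical helper: change from old to new, in percent."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    before = parse_real_cps(SSE2_OMP)
    after = parse_real_cps(SSE2I)
    for name in before:
        print("%-45s %+6.1f%%" % (name, percent_change(before[name], after[name])))
```

The labels are taken verbatim from the "Benchmarking:" lines, so the two builds can only be compared when the format names match exactly.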

magnum
