john-dev - Re: PRINCE

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5e146d0de78e453ab4a8177fc80d6617@smtp.hushmail.com>
Date: Wed, 14 Jan 2015 00:58:36 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
CC: atom <atom@...hcat.net>
Subject: Re: PRINCE

On 2015-01-13 19:49, Solar Designer wrote:
> On Tue, Jan 13, 2015 at 07:21:47PM +0100, magnum wrote:
>> FWIW I did a quick and dirty test today with dropping GMP and using
>> gcc's __uint128_t extension just to see what performance it would get.
>> That was trivial and the boost was pretty significant: 33% faster.
> 
> Great.
> 
> Is 128-bit guaranteed to be large enough not to overflow in PRINCE?

I think we're fine as long as we can represent the total keyspace figure
(cc'ing Atom: is this correct?) and for "rockyou" with default settings
that's 1584345389899053977143859 which is closer to 2^80 than 2^81. BTW
you'd be better off anyway with a wordlist that is smaller than this but
optimized for the task.

> As an alternative, we could have 64-bit saturating math - just detect
> overflow and set the result to the max value if so.  This is trivial for
> addition, but unfortunately is slow for multiplication - we'd have to
> divide to check if there was overflow, which is very slow.  So perhaps
> not a good idea for speed reasons.

I see now the only multiplication we need is not only out of the hot
loop, it's not even in the generation loop at all. It's used when
calculating keyspace and that's it.

In the inner loop we have these >64-bit operations:

	mpz_add_ui()
	mpz_fdiv_ui() [modulo]
	mpz_div_ui()

Since they all end in _ui this looks very suitable for a math.h solution
using 64-bit hi/lo functions.

> As a maybe better alternative, what about "double"?  Have you tried?

Good idea, that was totally trivial given my present code. The modulo
operation now used fmod() and got very slow so I also tried a version
that casted the modulos and divisions to uint128_t and otherwise use
double. That one was actually fastest of them all with a slight margin:

Running in 30s against 1 fake NT hash:
=========================================
double w/ int128 mod/div: 8940 Kp/s  146%
int128:                   8843 Kp/s  144%
GMP:                      6141 Kp/s  100%
double:                   5441 Kp/s   89%

(Note that both in case of "int128" and "double", there *might* be some
optimizations possible - I just did a quick'n'dirty conversion withut
changing anything else. The GMP interface sometimes calls for temp
variables that are no longer needed, and so on. Not sure if any of that
could have significant impact).

Speaking about that modulo operation - could you do that faster than a
full division using 128/64 math.h, or would it be the same work anyway?
It is actually 'mpz % uint64_t' in the original code (and actually that
might already mean a 128/64 function could be faster than using gcc's
uint128_t unless gcc optimizes it accordingly, right?).

magnum
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.