Date: Wed, 14 Jan 2015 00:58:36 +0100 From: magnum <john.magnum@...hmail.com> To: john-dev@...ts.openwall.com CC: atom <atom@...hcat.net> Subject: Re: PRINCE On 2015-01-13 19:49, Solar Designer wrote: > On Tue, Jan 13, 2015 at 07:21:47PM +0100, magnum wrote: >> FWIW I did a quick and dirty test today with dropping GMP and using >> gcc's __uint128_t extension just to see what performance it would get. >> That was trivial and the boost was pretty significant: 33% faster. > > Great. > > Is 128-bit guaranteed to be large enough not to overflow in PRINCE? I think we're fine as long as we can represent the total keyspace figure (cc'ing Atom: is this correct?) and for "rockyou" with default settings that's 1584345389899053977143859 which is closer to 2^80 than 2^81. BTW you'd be better off anyway with a wordlist that is smaller than this but optimized for the task. > As an alternative, we could have 64-bit saturating math - just detect > overflow and set the result to the max value if so. This is trivial for > addition, but unfortunately is slow for multiplication - we'd have to > divide to check if there was overflow, which is very slow. So perhaps > not a good idea for speed reasons. I see now the only multiplication we need is not only out of the hot loop, it's not even in the generation loop at all. It's used when calculating keyspace and that's it. In the inner loop we have these >64-bit operations: mpz_add_ui() mpz_fdiv_ui() [modulo] mpz_div_ui() Since they all end in _ui this looks very suitable for a math.h solution using 64-bit hi/lo functions. > As a maybe better alternative, what about "double"? Have you tried? Good idea, that was totally trivial given my present code. The modulo operation now used fmod() and got very slow so I also tried a version that casted the modulos and divisions to uint128_t and otherwise use double. That one was actually fastest of them all with a slight margin: Running in 30s against 1 fake NT hash: ========================================= double w/ int128 mod/div: 8940 Kp/s 146% int128: 8843 Kp/s 144% GMP: 6141 Kp/s 100% double: 5441 Kp/s 89% (Note that both in case of "int128" and "double", there *might* be some optimizations possible - I just did a quick'n'dirty conversion withut changing anything else. The GMP interface sometimes calls for temp variables that are no longer needed, and so on. Not sure if any of that could have significant impact). Speaking about that modulo operation - could you do that faster than a full division using 128/64 math.h, or would it be the same work anyway? It is actually 'mpz % uint64_t' in the original code (and actually that might already mean a 128/64 function could be faster than using gcc's uint128_t unless gcc optimizes it accordingly, right?). magnum
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.