Date: Sun, 01 Jul 2012 14:30:31 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: raw-sha1-ng reduced binary size

On 2012-07-01 13:51, magnum wrote:
> On 2012-07-01 13:20, Tavis Ormandy wrote:
>> magnum <john.magnum@...hmail.com> wrote:
>>> BTW you still advertise a BINARY_SIZE of 20 octets although you would do
>>> perfectly fine with 4. The difference is far from negligible if you
>>> load a couple of million hashes. I *really* think this should be your #1
>>> goal and it's a walk in the park. Take a look at a recent rawSHA1_fmt_plug.c
>>> and look for BINARY_SIZE vs DIGEST_SIZE.
>>
>> I understand, I'm just not sure it's worth the performance penalty (because
>> I can't treat it like a dqword in cmp_all). I can think of a faster format
>> if I store it redundantly, like:
>>
>> SHA1  =00112233 44556677 aabbccdd eeff3344 eeaa1122
>> BINARY=EEAA1122 EEAA1122 EEAA1122 EEAA1122
>>
>> Then I only have to shuffle it once, instead of once per cmp_all. That's a
>> saving of 4 bytes per hash, and I can still use it like a dqword, is that
>> ok?
> 
> Sure, I did not realize you would end up with a slower cmp_all. There
> should be some way around that.
> 
>> I made both changes, so you can choose. I sent you a pull req for the 16-byte
>> one, but the 4-byte one is here if you prefer:
>>
>> https://github.com/taviso/magnum-jumbo/commit/...
> 
> Cool, I'll do some experiments with all three versions.
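
The replicated-binary idea quoted above can be sketched roughly as follows. The names are hypothetical, not the actual raw-sha1-ng code, and a portable scalar loop stands in for what would be a single SSE _mm_cmpeq_epi32 on two dqwords:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A full SHA-1 digest is 20 bytes (DIGEST_SIZE). Only the last 32-bit
 * word is kept as "binary", but replicated into all four lanes of a
 * dqword (BINARY_SIZE = 16), so cmp_all() can test four SIMD lanes of
 * computed hashes with one 128-bit compare instead of reshuffling per
 * call. */
#define DIGEST_SIZE 20
#define BINARY_SIZE 16

static void make_binary(const uint8_t digest[DIGEST_SIZE],
                        uint32_t binary[4])
{
    uint32_t last;

    /* Take the final digest word and replicate it across all lanes. */
    memcpy(&last, digest + DIGEST_SIZE - 4, 4);
    binary[0] = binary[1] = binary[2] = binary[3] = last;
}

/* crypt_out holds the last digest word produced in each of four SIMD
 * lanes. With SSE2 this whole loop is one _mm_cmpeq_epi32 followed by
 * _mm_movemask_epi8; the scalar equivalent is shown for clarity. */
static int cmp_all(const uint32_t binary[4], const uint32_t crypt_out[4])
{
    int i;

    for (i = 0; i < 4; i++)
        if (binary[i] == crypt_out[i])
            return 1;
    return 0;
}
```

This trades 12 extra bytes per loaded hash (16 instead of 4) for not having to shuffle the candidate words into place on every cmp_all() call.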

I think we are actually not using cmp_all() much, due to the hash
tables. I tried exhausting --inc=digits against 6.5 million hashes:

Original 20-byte version peaks at 876 MB and finishes in 21 seconds.
The 16-byte version peaks at 826 MB and finishes in 21 seconds.
The 4-byte version peaks at 777 MB and finishes in 21 seconds.

The vanilla raw-sha1 format also peaks at 777 MB and also finishes in 21 seconds.

The raw-sha1 in bleeding-jumbo uses a full 20 bytes of binary but
instead does not keep the source hashes (it uses the new get_source()
function for rebuilding them). It only needs 674 MB and finishes in
17 seconds; however, I think I just found a bug in that one, so this
might not be a reliable figure.

magnum
