
Date: Sat, 26 Mar 2011 14:12:14 -0500
From: "jfoug" <jfoug@....net>
To: <johndev@...ts.openwall.com>
Subject: RE: john scalability

>From: magnum
>
>So if we use the "size 2" (4096) hash, we will have 4096 buckets? And if
>we use the "size 4" (1M) hash, we'll have 1M buckets with a lot less in
>each.

Yes.

>The loader produces binaries from the hashes once and for all, and calls
>binary_hash() to decide which bucket each one should be put in?

There are 3 (or 5) binary_hash() functions for each format, and a matching
3 (or 5) get_hash() functions. The loader counts how many candidates there
are, then uses a table (in params.h??) to select which of the hashes to
use. Both 'sides' (i.e. binary_hash[x]() and get_hash[x]()) MUST return
the same value, and the hash table must be generated at the proper size.

The hash tables are all generated at an even 2^(4*(size+1)) size. So
hash[0] has 16 elements, hash[1] has 256, hash[2] has 4096, etc. That is
why you see the same pattern everywhere: hash[0] returns (Val & 0xF),
hash[1] returns (Val & 0xFF), hash[2] returns (Val & 0xFFF), and so on.
This way the number returned has the proper distribution. Thus, we
currently allow for hash tables of 16, 256, 4096, 64K, and 1M. Solar is
adding 16M (0xFFFFFF) and 256M (0xFFFFFFF).

>And after we produce a candidate binary, say we call get_hash() where
>some part of it is (in the 1M case) and'ed with 0xFFFFF and it returns
>0x32154, then we only have to look through the bucket that contains the
>"0x32154" hashes for a match. Something like that?

That is exactly how it works. NOTE that binary_hash() and get_hash() MUST
produce exactly the same end result. They are two different functions,
since the data they work on is different: binary_hash() works with the
base-16 (or base-64) form of the loaded hash, while get_hash() works with
the native results of the crypt function. binary_hash() does not need to
be optimized; it is only used once per candidate at startup. get_hash()
must be fast.
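The masking scheme above can be sketched roughly as follows. This is not
the actual john source, just an illustration of the 2^(4*(size+1)) bucket
pattern; the helper name first_word() is mine, and real formats take the
word from their own binary layout:

```c
#include <stdint.h>
#include <string.h>

/* First 32 bits of a binary hash (illustrative helper, not john's). */
static uint32_t first_word(const void *binary)
{
    uint32_t w;
    memcpy(&w, binary, sizeof w);
    return w;
}

/* binary_hash[x]() family: each size masks off 4 more bits, giving
 * 16, 256, 4096, 64K and 1M buckets respectively. */
static int binary_hash_0(const void *b) { return first_word(b) & 0xF;     }
static int binary_hash_1(const void *b) { return first_word(b) & 0xFF;    }
static int binary_hash_2(const void *b) { return first_word(b) & 0xFFF;   }
static int binary_hash_3(const void *b) { return first_word(b) & 0xFFFF;  }
static int binary_hash_4(const void *b) { return first_word(b) & 0xFFFFF; }
```

The matching get_hash[x]() functions must apply the same masks to the
freshly computed crypt output, so a candidate whose first word masks to
0x32154 is compared only against the loaded hashes in bucket 0x32154.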
It is used once per dictionary word for unsalted hashes, or once per
dictionary word * each salt for salted hashes. It is called a LOT.

>It seems very easy to add them to most formats and I have already added
>them to a couple I needed. It may be academic for some formats but I can
>volunteer to produce a diff with the "missing" functions added to all or
>most formats, if that is something we should have.

It likely is. I took the ones I was 'sure' about at the time. There were
some formats I simply was not comfortable enough with to know that my
changes would be right, so I did not do them. What I did do was put NULLs
into the hash function slots, and made loader.c NOT use those NULL
functions. If you have the time and knowledge to get them right, then go
for it. It will only speed things up for anyone doing a large bulk search
of those formats.

Jim